
phoenix-srun: job 3211349 queued and waiting for resources
phoenix-srun: job 3211349 has been allocated resources
phoenix-srun: Job 3211349 scheduled successfully!
Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
Current PHX_PRIORITY is normal

[2024-06-10 00:33:29,642] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:29,644] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:30,062] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:30,068] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:31,362] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:31,364] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:31,377] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:31,377] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
[2024-06-10 00:33:59,053] [INFO] [comm.py:637:init_distributed] cdb=None
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
[2024-06-10 00:33:59,142] [INFO] [comm.py:637:init_distributed] cdb=None
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
[2024-06-10 00:33:59,313] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-10 00:33:59,325] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-10 00:33:59,325] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-10 00:33:59,325] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-10 00:33:59,325] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-10 00:33:59,325] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-06-10 00:33:59,326] [INFO] [comm.py:637:init_distributed] cdb=None
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=4,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=zero_stage1_config.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=32,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=4e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/runs/Jun10_00-34-00_SH-IDC1-10-140-37-3,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_torch,
optim_args=None,
output_dir=work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=4,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=200,
save_strategy=steps,
save_total_limit=3,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.01,
)
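Note on the arguments above: together they imply a global batch size per optimizer step. The sketch below spells out that arithmetic. It assumes a world size of 8, inferred from the eight processes (ranks 0 to 7) that report in this log, and plain data parallelism with no other batch multiplier. The function name is illustrative and is not part of the training script.

```python
def effective_batch_size(per_device: int, grad_accum: int, world_size: int) -> int:
    """Samples consumed per optimizer step under plain data parallelism."""
    return per_device * grad_accum * world_size


# per_device_train_batch_size=4 and gradient_accumulation_steps=32 come from
# the TrainingArguments dump; world_size=8 is an inference from the logged ranks.
global_batch = effective_batch_size(per_device=4, grad_accum=32, world_size=8)
print(global_batch)  # 1024
```

If the job ran on a different number of GPUs, only the world_size argument needs to change.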
06/10/2024 00:34:00 - INFO - __main__ - Loading Tokenizer: /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file ./tokenizer.model
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file tokenizer.json
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1distributed training: True, 16-bits training: False
[WARNING|logging.py:314] 2024-06-10 00:34:00,722 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[TCSLoader] config_path: ~/petreloss.conf
--> before Client(conf_path)
[WARNING|logging.py:314] 2024-06-10 00:34:00,867 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,932 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,933 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,933 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,934 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,934 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,936 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[TCSLoader] config_path: ~/petreloss.conf
--> before Client(conf_path)
[TCSLoader] config_path: ~/petreloss.conf
--> before Client(conf_path)
[TCSLoader] config_path: ~/petreloss.conf
--> before Client(conf_path)
[TCSLoader] config_path: ~/petreloss.conf
--> before Client(conf_path)
[TCSLoader] config_path: ~/petreloss.conf
--> before Client(conf_path)
[TCSLoader] config_path: ~/petreloss.conf
--> before Client(conf_path)
[TCSLoader] config_path: ~/petreloss.conf
--> before Client(conf_path)
--> after Client(conf_path)
--> after Client(conf_path)
--> after Client(conf_path)
--> after Client(conf_path)
--> after Client(conf_path)
--> after Client(conf_path)
--> after Client(conf_path)
--> after Client(conf_path)
06/10/2024 00:35:20 - INFO - __main__ - Loading InternVLChatModel...
[INFO|configuration_utils.py:727] 2024-06-10 00:35:20,584 >> loading configuration file /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5/config.json
[INFO|configuration_utils.py:792] 2024-06-10 00:35:20,585 >> Model config InternVLChatConfig {
  "_commit_hash": null,
  "_name_or_path": "OpenGVLab/Mini-InternVL-Chat-2B-V1-5",
  "architectures": [
    "InternVLChatModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_internvl_chat.InternVLChatConfig",
    "AutoModel": "modeling_internvl_chat.InternVLChatModel",
    "AutoModelForCausalLM": "modeling_internvl_chat.InternVLChatModel"
  },
  "downsample_ratio": 0.5,
  "dynamic_image_size": true,
  "force_image_size": 448,
  "llm_config": {
    "_name_or_path": "pretrained/internlm2-chat-1_8b",
    "add_cross_attention": false,
    "architectures": [
      "InternLM2ForCausalLM"
    ],
    "attn_implementation": "flash_attention_2",
    "auto_map": {
      "AutoConfig": "configuration_internlm2.InternLM2Config",
      "AutoModel": "modeling_internlm2.InternLM2ForCausalLM",
      "AutoModelForCausalLM": "modeling_internlm2.InternLM2ForCausalLM"
    },
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bias": false,
    "bos_token_id": 1,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": 2,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "silu",
    "hidden_size": 2048,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_range": 0.02,
    "intermediate_size": 8192,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "length_penalty": 1.0,
    "max_length": 20,
    "max_position_embeddings": 32768,
    "min_length": 0,
    "model_type": "internlm2",
    "no_repeat_ngram_size": 0,
    "num_attention_heads": 16,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_hidden_layers": 24,
    "num_key_value_heads": 8,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": 2,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "rms_norm_eps": 1e-05,
    "rope_scaling": {
      "factor": 3.0,
      "type": "dynamic"
    },
    "rope_theta": 1000000,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": false,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "bfloat16",
    "torchscript": false,
    "transformers_version": "4.37.2",
    "typical_p": 1.0,
    "use_bfloat16": false,
    "use_cache": true,
    "vocab_size": 92553
  },
  "max_dynamic_patch": 12,
  "min_dynamic_patch": 1,
  "model_type": "internvl_chat",
  "pad2square": false,
  "ps_version": "v2",
  "select_layer": -1,
  "template": "internlm2-chat",
  "torch_dtype": "bfloat16",
  "transformers_version": null,
  "use_backbone_lora": 0,
  "use_llm_lora": 0,
  "use_thumbnail": true,
  "vision_config": {
    "_name_or_path": "OpenGVLab/InternViT-300M-448px",
    "add_cross_attention": false,
    "architectures": [
      "InternVisionModel"
    ],
    "attention_dropout": 0.0,
    "auto_map": {
      "AutoConfig": "configuration_intern_vit.InternVisionConfig",
      "AutoModel": "modeling_intern_vit.InternVisionModel"
    },
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bos_token_id": null,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "drop_path_rate": 0.1,
    "dropout": 0.0,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": null,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "gelu",
    "hidden_size": 1024,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "image_size": 448,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 4096,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "layer_norm_eps": 1e-06,
    "length_penalty": 1.0,
    "max_length": 20,
    "min_length": 0,
    "model_type": "intern_vit_6b",
    "no_repeat_ngram_size": 0,
    "norm_type": "layer_norm",
    "num_attention_heads": 16,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_channels": 3,
    "num_hidden_layers": 24,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": null,
    "patch_size": 14,
    "prefix": null,
    "problem_type": null,
    "pruned_heads": {},
    "qk_normalization": false,
    "qkv_bias": true,
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": true,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "bfloat16",
    "torchscript": false,
    "transformers_version": "4.37.2",
    "typical_p": 1.0,
    "use_bfloat16": true,
    "use_flash_attn": true
  }
}
06/10/2024 00:35:20 - INFO - __main__ - Using flash_attention_2 for InternLM

[INFO|modeling_utils.py:3473] 2024-06-10 00:35:20,675 >> loading weights file /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-10 00:35:20,708 >> Instantiating InternVLChatModel model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:826] 2024-06-10 00:35:20,710 >> Generate config GenerationConfig {}

[INFO|configuration_utils.py:826] 2024-06-10 00:35:20,796 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2
}

[INFO|modeling_utils.py:4350] 2024-06-10 00:35:36,690 >> All model checkpoint weights were used when initializing InternVLChatModel.

[INFO|modeling_utils.py:4358] 2024-06-10 00:35:36,691 >> All the weights of InternVLChatModel were initialized from the model checkpoint at /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5.
If your task is similar to the task the model of the checkpoint was trained on, you can already use InternVLChatModel for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-10 00:35:36,698 >> loading configuration file /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-10 00:35:36,698 >> Generate config GenerationConfig {}

06/10/2024 00:35:36 - INFO - __main__ - Finished
06/10/2024 00:35:36 - INFO - __main__ - model.config.force_image_size: 448
06/10/2024 00:35:36 - INFO - __main__ - data_args.force_image_size: 448
06/10/2024 00:35:36 - INFO - __main__ - model.config.vision_config.image_size: 448
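Note: the [Dataset] lines that follow report num_image_token: 256. That value is consistent with the logged config (image_size 448, patch_size 14, downsample_ratio 0.5) if the count is the number of ViT patches per tile scaled by the square of the downsample ratio. The sketch below reproduces the arithmetic under that assumption. The function name is illustrative and the formula is inferred from the logged numbers, not quoted from the training code.

```python
def num_image_tokens(image_size: int, patch_size: int, downsample_ratio: float) -> int:
    """Visual tokens per image tile after spatial downsampling (assumed formula)."""
    patches_per_side = image_size // patch_size  # 448 // 14 = 32
    return int(patches_per_side ** 2 * downsample_ratio ** 2)


tokens_per_tile = num_image_tokens(image_size=448, patch_size=14, downsample_ratio=0.5)
print(tokens_per_tile)  # 256

# Rough upper bound per image, assuming max_dynamic_patch=12 tiles plus one
# thumbnail tile (use_thumbnail is true in the config).
max_tiles = 12 + 1
print(tokens_per_tile * max_tiles)  # 3328
```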
06/10/2024 00:35:36 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:35:36 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:35:36 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:35:36 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:35:36 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:35:41 - INFO - __main__ - Add dataset:sharegpt4v_instruct_gpt4-vision_cap100k_0 with length: 102025
06/10/2024 00:35:41 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:35:41 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:35:41 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:35:41 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:35:41 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:35:45 - INFO - __main__ - Add dataset:llava_instruct_150k_zh_0 with length: 157712
06/10/2024 00:35:45 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:35:45 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:35:45 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:35:45 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:35:45 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:34 - INFO - __main__ - Add dataset:sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k_0 with length: 665058
06/10/2024 00:36:34 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:34 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:34 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:34 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:34 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:43 - INFO - __main__ - Add dataset:dvqa_train_200k_0 with length: 200000
06/10/2024 00:36:43 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:43 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:43 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:43 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:43 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:44 - INFO - __main__ - Add dataset:chartqa_train_18k_0 with length: 18317
06/10/2024 00:36:44 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:44 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:44 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:44 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:44 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:45 - INFO - __main__ - Add dataset:ai2d_train_12k_0 with length: 12413
06/10/2024 00:36:45 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:45 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:45 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:45 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:45 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:50 - INFO - __main__ - Add dataset:docvqa_train_10k_0 with length: 10211
06/10/2024 00:36:50 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:50 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:50 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:50 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:50 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:51 - INFO - __main__ - Add dataset:geoqa+_0 with length: 72318
06/10/2024 00:36:51 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:51 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:51 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:51 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:51 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:53 - INFO - __main__ - Add dataset:synthdog_en_0 with length: 29765
06/10/2024 00:36:53 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:53 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:53 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:53 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:53 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:59 - INFO - __main__ - Add dataset:medical_sft_sample500k_0 with length: 499712
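Note: the ten "Add dataset" lines above can be summed to estimate the size of one epoch. The sketch below totals the logged lengths and divides by a global batch of 1024 (4 per device x 32 accumulation steps x an assumed 8 ranks). The step count is only approximate: the log shows a replaced train sampler, and padding or dropped remainders may shift it. The variable names are illustrative.

```python
import math

# Lengths copied from the "Add dataset" log lines above.
dataset_lengths = {
    "sharegpt4v_instruct_gpt4-vision_cap100k_0": 102025,
    "llava_instruct_150k_zh_0": 157712,
    "sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k_0": 665058,
    "dvqa_train_200k_0": 200000,
    "chartqa_train_18k_0": 18317,
    "ai2d_train_12k_0": 12413,
    "docvqa_train_10k_0": 10211,
    "geoqa+_0": 72318,
    "synthdog_en_0": 29765,
    "medical_sft_sample500k_0": 499712,
}

total_samples = sum(dataset_lengths.values())
print(total_samples)  # 1767531

# Assumed global batch: 4 per device * 32 accumulation steps * 8 ranks.
global_batch = 4 * 32 * 8
approx_steps = total_samples // global_batch
print(approx_steps)  # 1726

# warmup_ratio=0.03 from the TrainingArguments dump, rounded up.
approx_warmup_steps = math.ceil(approx_steps * 0.03)
print(approx_warmup_steps)  # 52
```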
06/10/2024 00:36:59 - INFO - __main__ - vision_model.embeddings.class_embedding
06/10/2024 00:36:59 - INFO - __main__ - vision_model.embeddings.position_embedding
06/10/2024 00:36:59 - INFO - __main__ - vision_model.embeddings.patch_embedding.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.embeddings.patch_embedding.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.attn.qkv.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.attn.proj.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.attn.proj.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.mlp.fc1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.mlp.fc1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.mlp.fc2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.mlp.fc2.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.norm1.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.norm1.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.norm2.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.norm2.bias
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.tok_embeddings.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.attention.wqkv.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.attention.wo.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.feed_forward.w1.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.feed_forward.w3.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.feed_forward.w2.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.attention_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.ffn_norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.norm.weight
06/10/2024 00:36:59 - INFO - __main__ - language_model.output.weight
06/10/2024 00:36:59 - INFO - __main__ - mlp1.0.weight
06/10/2024 00:36:59 - INFO - __main__ - mlp1.0.bias
06/10/2024 00:36:59 - INFO - __main__ - mlp1.1.weight
06/10/2024 00:36:59 - INFO - __main__ - mlp1.1.bias
06/10/2024 00:36:59 - INFO - __main__ - mlp1.3.weight
06/10/2024 00:36:59 - INFO - __main__ - mlp1.3.bias
06/10/2024 00:36:59 - WARNING - accelerate.utils.other - Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:571] 2024-06-10 00:36:59,654 >> Using auto half precision backend
[2024-06-10 00:37:00,502] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.13.5, git-hash=unknown, git-branch=unknown
[2024-06-10 00:37:07,652] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 1.6104459762573242 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 1.6218023300170898 seconds
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 1.6295955181121826 seconds
Time to load fused_adam op: 1.6301324367523193 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 1.6191380023956299 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 1.6189420223236084 seconds
[2024-06-10 00:37:10,267] [INFO] [logging.py:96:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2024-06-10 00:37:10,267] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-06-10 00:37:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2024-06-10 00:37:10,303] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2024-06-10 00:37:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 1 optimizer
[2024-06-10 00:37:10,303] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 1000000000
[2024-06-10 00:37:10,303] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 1000000000
[2024-06-10 00:37:10,303] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2024-06-10 00:37:10,303] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 1.625352382659912 seconds
Time to load fused_adam op: 1.6189563274383545 seconds
[2024-06-10 00:37:22,009] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states
[2024-06-10 00:37:22,011] [INFO] [utils.py:801:see_memory_usage] MA 5.51 GB         Max_MA 6.03 GB         CA 6.36 GB         Max_CA 6 GB
[2024-06-10 00:37:22,012] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 111.06 GB, percent = 11.0%
[2024-06-10 00:37:22,506] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states
[2024-06-10 00:37:22,507] [INFO] [utils.py:801:see_memory_usage] MA 5.51 GB         Max_MA 6.54 GB         CA 7.38 GB         Max_CA 7 GB
[2024-06-10 00:37:22,507] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 112.8 GB, percent = 11.2%
[2024-06-10 00:37:22,507] [INFO] [stage_1_and_2.py:539:__init__] optimizer state initialized
[2024-06-10 00:37:23,097] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer
[2024-06-10 00:37:23,098] [INFO] [utils.py:801:see_memory_usage] MA 5.51 GB         Max_MA 5.51 GB         CA 7.38 GB         Max_CA 7 GB
[2024-06-10 00:37:23,099] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 114.43 GB, percent = 11.4%
[2024-06-10 00:37:23,109] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw
[2024-06-10 00:37:23,109] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client callable to create LR scheduler
[2024-06-10 00:37:23,110] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f21320ca7c0>
[2024-06-10 00:37:23,110] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[[0.9, 0.999]]
[2024-06-10 00:37:23,111] [INFO] [config.py:996:print] DeepSpeedEngine configuration:
[2024-06-10 00:37:23,111] [INFO] [config.py:1000:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2024-06-10 00:37:23,111] [INFO] [config.py:1000:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-06-10 00:37:23,111] [INFO] [config.py:1000:print]   amp_enabled .................. False
[2024-06-10 00:37:23,111] [INFO] [config.py:1000:print]   amp_params ................... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   autotuning_config ............ {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   bfloat16_enabled ............. True
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   bfloat16_immediate_grad_update  False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   checkpoint_parallel_write_pipeline  False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   checkpoint_tag_validation_enabled  True
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   checkpoint_tag_validation_fail  False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f210dfd5b50>
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   communication_data_type ...... None
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   curriculum_enabled_legacy .... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   curriculum_params_legacy ..... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   data_efficiency_enabled ...... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   dataloader_drop_last ......... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   disable_allgather ............ False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   dump_state ................... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   dynamic_loss_scale_args ...... None
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   eigenvalue_enabled ........... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   eigenvalue_gas_boundary_resolution  1
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   eigenvalue_layer_num ......... 0
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   eigenvalue_max_iter .......... 100
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   eigenvalue_stability ......... 1e-06
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   eigenvalue_tol ............... 0.01
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   eigenvalue_verbose ........... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   elasticity_enabled ........... False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   flops_profiler_config ........ {
    "enabled": false,
    "recompute_fwd_factor": 0.0,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   fp16_auto_cast ............... None
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   fp16_enabled ................. False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   fp16_master_weights_and_gradients  False
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   global_rank .................. 0
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   grad_accum_dtype ............. None
[2024-06-10 00:37:23,112] [INFO] [config.py:1000:print]   gradient_accumulation_steps .. 32
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   gradient_clipping ............ 1.0
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   gradient_predivide_factor .... 1.0
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   graph_harvesting ............. False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   initial_dynamic_scale ........ 1
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   load_universal_checkpoint .... False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   loss_scale ................... 1.0
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   memory_breakdown ............. False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   mics_hierarchial_params_gather  False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   mics_shard_size .............. -1
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   nebula_config ................ {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   optimizer_legacy_fusion ...... False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   optimizer_name ............... adamw
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   optimizer_params ............. {'lr': 4e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.01}
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   pld_enabled .................. False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   pld_params ................... False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   prescale_gradients ........... False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   scheduler_name ............... None
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   scheduler_params ............. None
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   seq_parallel_communication_data_type  torch.float32
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   sparse_attention ............. None
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   sparse_gradients_enabled ..... False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   steps_per_print .............. inf
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   train_batch_size ............. 1024
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   train_micro_batch_size_per_gpu  4
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   use_data_before_expert_parallel_  False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   use_node_local_storage ....... False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   wall_clock_breakdown ......... True
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   weight_quantization_config ... None
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   world_size ................... 8
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   zero_allow_untested_optimizer  False
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   zero_config .................. stage=1 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=1000000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=1000000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   zero_enabled ................. True
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   zero_force_ds_cpu_optimizer .. True
[2024-06-10 00:37:23,113] [INFO] [config.py:1000:print]   zero_optimization_stage ...... 1
[2024-06-10 00:37:23,114] [INFO] [config.py:986:print_user_config]   json = {
    "zero_optimization": {
        "stage": 1,
        "allgather_partitions": true,
        "allgather_bucket_size": 1.000000e+09,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 1.000000e+09,
        "contiguous_gradients": true
    },
    "fp16": {
        "enabled": false,
        "auto_cast": true,
        "loss_scale": 0,
        "initial_scale_power": 32,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": true
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 4e-05,
            "betas": [0.9, 0.999],
            "eps": 1e-08,
            "weight_decay": 0.01
        }
    },
    "gradient_accumulation_steps": 32,
    "gradient_clipping": 1.0,
    "steps_per_print": inf,
    "train_batch_size": 1.024000e+03,
    "train_micro_batch_size_per_gpu": 4,
    "wall_clock_breakdown": true
}
[INFO|trainer.py:1721] 2024-06-10 00:37:23,114 >> ***** Running training *****
[INFO|trainer.py:1722] 2024-06-10 00:37:23,114 >>   Num examples = 1,767,531
[INFO|trainer.py:1723] 2024-06-10 00:37:23,114 >>   Num Epochs = 1
[INFO|trainer.py:1724] 2024-06-10 00:37:23,114 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1727] 2024-06-10 00:37:23,114 >>   Total train batch size (w. parallel, distributed & accumulation) = 1,024
[INFO|trainer.py:1728] 2024-06-10 00:37:23,114 >>   Gradient Accumulation steps = 32
[INFO|trainer.py:1729] 2024-06-10 00:37:23,114 >>   Total optimization steps = 1,726
[INFO|trainer.py:1730] 2024-06-10 00:37:23,116 >>   Number of trainable parameters = 2,205,754,368
[2024-06-10 00:37:37,731] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:37:37,732] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:37:39,682] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:37:39,775] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:37:39,776] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:37:40,095] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:37:40,384] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:37:41,285] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:38:32,397] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:38:32,403] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:38:40,348] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:38:40,484] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:38:40,487] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:38:40,491] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:38:40,491] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:38:40,522] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:39:37,186] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:39:37,191] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:39:37,637] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:39:43,054] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:39:43,056] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:39:44,430] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:39:44,431] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:39:44,431] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:40:45,784] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:40:50,760] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:40:50,762] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:40:54,757] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:40:54,759] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:40:54,759] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:40:56,597] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:40:56,608] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1929
[2024-06-10 00:43:21,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3239.39 | bwd_microstep: 983.10 | bwd_inner_microstep: 982.88 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899
[2024-06-10 00:43:23,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.59 | bwd_microstep: 1571.79 | bwd_inner_microstep: 1571.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2242
[2024-06-10 00:43:25,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.47 | bwd_microstep: 910.24 | bwd_inner_microstep: 910.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 00:43:27,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1539.10 | bwd_inner_microstep: 1539.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1061
[2024-06-10 00:43:27,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 171.74 | bwd_microstep: 447.24 | bwd_inner_microstep: 447.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 00:43:29,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1312.76 | bwd_inner_microstep: 1312.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717
[2024-06-10 00:43:31,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.86 | bwd_microstep: 1620.97 | bwd_inner_microstep: 1620.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 00:43:34,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.33 | bwd_microstep: 1522.22 | bwd_inner_microstep: 1522.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3502
[2024-06-10 00:43:36,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.58 | bwd_microstep: 1505.43 | bwd_inner_microstep: 1505.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 00:43:37,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.68 | bwd_microstep: 1275.00 | bwd_inner_microstep: 1274.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 00:43:39,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1338.27 | bwd_inner_microstep: 1338.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3515
[2024-06-10 00:43:41,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.23 | bwd_microstep: 1532.36 | bwd_inner_microstep: 1532.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.02
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3660
[2024-06-10 00:43:44,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.44 | bwd_microstep: 1682.01 | bwd_inner_microstep: 1681.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3459
[2024-06-10 00:43:46,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.90 | bwd_microstep: 1337.72 | bwd_inner_microstep: 1337.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 00:43:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.91 | bwd_microstep: 1491.10 | bwd_inner_microstep: 1491.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 00:43:50,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.68 | bwd_microstep: 1499.31 | bwd_inner_microstep: 1499.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3662
[2024-06-10 00:43:52,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.87 | bwd_microstep: 1654.00 | bwd_inner_microstep: 1653.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1975
[2024-06-10 00:43:53,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.84 | bwd_microstep: 732.94 | bwd_inner_microstep: 732.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 00:43:55,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1292.30 | bwd_inner_microstep: 1292.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 00:43:57,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1552.97 | bwd_inner_microstep: 1552.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 00:43:59,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.94 | bwd_microstep: 1497.64 | bwd_inner_microstep: 1497.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626
[2024-06-10 00:44:01,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.26 | bwd_microstep: 1603.77 | bwd_inner_microstep: 1603.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1123
[2024-06-10 00:44:02,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 166.92 | bwd_microstep: 426.97 | bwd_inner_microstep: 426.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909
[2024-06-10 00:44:03,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.18 | bwd_microstep: 778.16 | bwd_inner_microstep: 778.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 00:44:05,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.12 | bwd_microstep: 1375.88 | bwd_inner_microstep: 1375.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 00:44:07,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612
[2024-06-10 00:44:09,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.62 | bwd_microstep: 1340.17 | bwd_inner_microstep: 1340.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 00:44:10,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1285.42 | bwd_inner_microstep: 1285.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2426
[2024-06-10 00:44:12,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.53 | bwd_microstep: 1069.19 | bwd_inner_microstep: 1069.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564
[2024-06-10 00:44:14,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.32 | bwd_microstep: 1331.67 | bwd_inner_microstep: 1331.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 00:44:16,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.40 | bwd_microstep: 1603.49 | bwd_inner_microstep: 1603.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 00:44:22,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.51 | optimizer_step: 7.76
[2024-06-10 00:44:22,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.46 | bwd_microstep: 5122.37 | bwd_inner_microstep: 1868.17 | bwd_allreduce_microstep: 3254.14 | step_microstep: 41.16
[2024-06-10 00:44:22,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 18620.45 | bwd: 45632.92 | bwd_inner: 42377.68 | bwd_allreduce: 3254.48 | step: 41.65
{'loss': 1.4159, 'learning_rate': 7.692307692307694e-07, 'epoch': 0.0}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 00:44:26,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1486.45 | bwd_inner_microstep: 1486.31 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 00:44:28,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.97 | bwd_microstep: 1243.54 | bwd_inner_microstep: 1243.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 00:44:30,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.71 | bwd_microstep: 1342.08 | bwd_inner_microstep: 1342.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 00:44:32,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.51 | bwd_microstep: 1381.06 | bwd_inner_microstep: 1381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 00:44:34,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.19 | bwd_microstep: 1298.53 | bwd_inner_microstep: 1298.33 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4061
[2024-06-10 00:44:36,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.19 | bwd_microstep: 1718.99 | bwd_inner_microstep: 1718.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 00:44:38,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.05 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1953
[2024-06-10 00:44:39,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 823.46 | bwd_inner_microstep: 823.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2074
[2024-06-10 00:44:40,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.53 | bwd_microstep: 820.65 | bwd_inner_microstep: 820.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3494
[2024-06-10 00:44:42,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.52 | bwd_microstep: 1442.17 | bwd_inner_microstep: 1442.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2444
[2024-06-10 00:44:43,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.66 | bwd_microstep: 947.81 | bwd_inner_microstep: 947.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3411
[2024-06-10 00:44:45,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.04 | bwd_microstep: 1387.98 | bwd_inner_microstep: 1387.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 00:44:47,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.66 | bwd_microstep: 1290.39 | bwd_inner_microstep: 1290.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3633
[2024-06-10 00:44:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.48 | bwd_microstep: 1445.69 | bwd_inner_microstep: 1445.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-10 00:44:51,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1560.10 | bwd_inner_microstep: 1560.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640
[2024-06-10 00:44:53,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.53 | bwd_microstep: 1317.28 | bwd_inner_microstep: 1317.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 00:44:55,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1287.68 | bwd_inner_microstep: 1287.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 00:44:57,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1397.25 | bwd_inner_microstep: 1397.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649
[2024-06-10 00:44:59,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.87 | bwd_microstep: 1416.23 | bwd_inner_microstep: 1416.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 00:45:01,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1397.65 | bwd_inner_microstep: 1397.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-10 00:45:03,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.89 | bwd_microstep: 1424.23 | bwd_inner_microstep: 1424.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-10 00:45:05,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.76 | bwd_microstep: 1610.14 | bwd_inner_microstep: 1610.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 00:45:07,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1394.40 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462
[2024-06-10 00:45:09,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.88 | bwd_microstep: 1348.07 | bwd_inner_microstep: 1348.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2042
[2024-06-10 00:45:10,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.61 | bwd_microstep: 718.88 | bwd_inner_microstep: 718.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 00:45:12,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1560.20 | bwd_inner_microstep: 1560.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 00:45:14,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.44 | bwd_microstep: 1661.78 | bwd_inner_microstep: 1661.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3503
[2024-06-10 00:45:16,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.81 | bwd_microstep: 1340.11 | bwd_inner_microstep: 1340.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3595
[2024-06-10 00:45:18,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.65 | bwd_microstep: 1441.71 | bwd_inner_microstep: 1441.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.02
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 00:45:20,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.92 | bwd_microstep: 1444.31 | bwd_inner_microstep: 1444.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2257
[2024-06-10 00:45:21,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.83 | bwd_microstep: 972.29 | bwd_inner_microstep: 972.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 00:45:24,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.17 | optimizer_step: 6.58
[2024-06-10 00:45:24,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.97 | bwd_microstep: 2043.23 | bwd_inner_microstep: 1878.38 | bwd_allreduce_microstep: 164.78 | step_microstep: 38.91
[2024-06-10 00:45:24,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16163.10 | bwd: 43349.19 | bwd_inner: 43183.16 | bwd_allreduce: 165.16 | step: 39.50
{'loss': 1.4336, 'learning_rate': 1.5384615384615387e-06, 'epoch': 0.0}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 00:45:26,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.03 | bwd_microstep: 1292.66 | bwd_inner_microstep: 1292.49 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4038
[2024-06-10 00:45:28,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 1623.48 | bwd_inner_microstep: 1623.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3870
[2024-06-10 00:45:30,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.92 | bwd_microstep: 1526.73 | bwd_inner_microstep: 1526.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 00:45:32,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1385.56 | bwd_inner_microstep: 1385.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 00:45:34,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1353.31 | bwd_inner_microstep: 1353.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 00:45:35,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 794.18 | bwd_inner_microstep: 794.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 00:45:37,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.43 | bwd_microstep: 1292.33 | bwd_inner_microstep: 1292.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-10 00:45:39,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1423.11 | bwd_inner_microstep: 1423.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 00:45:41,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3505
[2024-06-10 00:45:43,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.81 | bwd_microstep: 1548.14 | bwd_inner_microstep: 1548.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 00:45:45,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.14 | bwd_microstep: 1510.60 | bwd_inner_microstep: 1510.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 00:45:47,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.97 | bwd_microstep: 1488.94 | bwd_inner_microstep: 1488.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 00:45:49,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.71 | bwd_microstep: 1439.88 | bwd_inner_microstep: 1439.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659
[2024-06-10 00:45:51,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.59 | bwd_microstep: 1718.04 | bwd_inner_microstep: 1718.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3718
[2024-06-10 00:45:53,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.72 | bwd_microstep: 1475.02 | bwd_inner_microstep: 1474.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 00:45:55,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.61 | bwd_microstep: 1490.16 | bwd_inner_microstep: 1490.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3840
[2024-06-10 00:45:57,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.25 | bwd_microstep: 1456.18 | bwd_inner_microstep: 1456.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484
[2024-06-10 00:45:59,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.44 | bwd_microstep: 1222.26 | bwd_inner_microstep: 1222.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 00:46:01,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.71 | bwd_microstep: 1261.06 | bwd_inner_microstep: 1261.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 00:46:03,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.27 | bwd_microstep: 1466.57 | bwd_inner_microstep: 1466.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2088
[2024-06-10 00:46:04,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.06 | bwd_microstep: 760.02 | bwd_inner_microstep: 759.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 00:46:06,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.32 | bwd_microstep: 1548.40 | bwd_inner_microstep: 1548.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 00:46:08,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.12 | bwd_microstep: 1560.77 | bwd_inner_microstep: 1560.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736
[2024-06-10 00:46:10,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.11 | bwd_microstep: 1639.42 | bwd_inner_microstep: 1639.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 00:46:13,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.50 | bwd_microstep: 1660.93 | bwd_inner_microstep: 1660.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037
[2024-06-10 00:46:14,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.19 | bwd_microstep: 812.68 | bwd_inner_microstep: 812.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465
[2024-06-10 00:46:16,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.26 | bwd_microstep: 1216.70 | bwd_inner_microstep: 1216.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3774
[2024-06-10 00:46:18,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1500.81 | bwd_inner_microstep: 1500.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 00:46:20,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.11 | bwd_microstep: 1550.47 | bwd_inner_microstep: 1550.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3391
[2024-06-10 00:46:22,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.41 | bwd_microstep: 1277.66 | bwd_inner_microstep: 1277.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 00:46:24,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1551.78 | bwd_inner_microstep: 1551.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 00:46:26,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.65
[2024-06-10 00:46:26,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.05 | bwd_microstep: 1549.40 | bwd_inner_microstep: 1541.62 | bwd_allreduce_microstep: 7.73 | step_microstep: 38.51
[2024-06-10 00:46:26,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16677.84 | bwd: 44685.92 | bwd_inner: 44677.13 | bwd_allreduce: 8.01 | step: 40.47
{'loss': 1.4715, 'learning_rate': 2.307692307692308e-06, 'epoch': 0.0}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 00:46:28,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.36 | bwd_microstep: 1286.74 | bwd_inner_microstep: 1286.54 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 00:46:30,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.28 | bwd_microstep: 1461.68 | bwd_inner_microstep: 1461.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3853
[2024-06-10 00:46:32,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.79 | bwd_microstep: 1365.31 | bwd_inner_microstep: 1365.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3752
[2024-06-10 00:46:33,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1404.16 | bwd_inner_microstep: 1404.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 00:46:35,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.93 | bwd_microstep: 1395.40 | bwd_inner_microstep: 1395.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 00:46:37,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1345.19 | bwd_inner_microstep: 1345.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 00:46:39,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.83 | bwd_microstep: 1527.48 | bwd_inner_microstep: 1527.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 00:46:41,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 1279.71 | bwd_inner_microstep: 1279.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3405
[2024-06-10 00:46:43,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.28 | bwd_microstep: 1439.14 | bwd_inner_microstep: 1439.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3399
[2024-06-10 00:46:45,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1393.30 | bwd_inner_microstep: 1393.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1894
[2024-06-10 00:46:46,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.29 | bwd_microstep: 777.37 | bwd_inner_microstep: 777.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3656
[2024-06-10 00:46:49,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.40 | bwd_microstep: 1721.57 | bwd_inner_microstep: 1721.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646
[2024-06-10 00:46:50,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.39 | bwd_microstep: 1422.19 | bwd_inner_microstep: 1422.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 00:46:52,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.07 | bwd_microstep: 1256.97 | bwd_inner_microstep: 1256.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.23
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446
[2024-06-10 00:46:54,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.59 | bwd_microstep: 1380.44 | bwd_inner_microstep: 1380.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 00:46:56,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.42 | bwd_microstep: 1621.47 | bwd_inner_microstep: 1621.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 00:46:57,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.53 | bwd_microstep: 796.34 | bwd_inner_microstep: 796.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 00:46:59,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.29 | bwd_microstep: 1258.08 | bwd_inner_microstep: 1258.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 00:47:01,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.44 | bwd_microstep: 1391.20 | bwd_inner_microstep: 1391.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3846
[2024-06-10 00:47:03,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1529.30 | bwd_inner_microstep: 1529.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3787
[2024-06-10 00:47:05,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.32 | bwd_microstep: 1481.67 | bwd_inner_microstep: 1481.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-10 00:47:07,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.45 | bwd_microstep: 1601.44 | bwd_inner_microstep: 1601.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 00:47:10,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.40 | bwd_microstep: 1561.95 | bwd_inner_microstep: 1561.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 00:47:12,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.85 | bwd_microstep: 1513.23 | bwd_inner_microstep: 1513.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 00:47:14,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.43 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2278
[2024-06-10 00:47:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.17 | bwd_microstep: 942.41 | bwd_inner_microstep: 942.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 00:47:16,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.61 | bwd_microstep: 976.38 | bwd_inner_microstep: 976.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 00:47:18,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.24 | bwd_microstep: 1356.07 | bwd_inner_microstep: 1356.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2107
[2024-06-10 00:47:19,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.77 | bwd_microstep: 1019.33 | bwd_inner_microstep: 1019.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3581
[2024-06-10 00:47:22,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.37 | bwd_microstep: 1535.80 | bwd_inner_microstep: 1535.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 00:47:24,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.57 | bwd_microstep: 1641.69 | bwd_inner_microstep: 1641.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410
[2024-06-10 00:47:28,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.33 | optimizer_step: 6.60
[2024-06-10 00:47:28,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 4001.37 | bwd_inner_microstep: 1642.00 | bwd_allreduce_microstep: 2359.29 | step_microstep: 39.67
[2024-06-10 00:47:28,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16206.56 | bwd: 45972.47 | bwd_inner: 43610.74 | bwd_allreduce: 2359.61 | step: 41.82
{'loss': 1.3975, 'learning_rate': 3.0769230769230774e-06, 'epoch': 0.0}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 00:47:30,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1370.76 | bwd_inner_microstep: 1370.66 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3938
[2024-06-10 00:47:32,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1398.45 | bwd_inner_microstep: 1398.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3890
[2024-06-10 00:47:35,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.19 | bwd_microstep: 1683.78 | bwd_inner_microstep: 1683.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839
[2024-06-10 00:47:37,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.40 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2297
[2024-06-10 00:47:38,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.78 | bwd_microstep: 939.37 | bwd_inner_microstep: 939.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 00:47:39,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.48 | bwd_microstep: 791.81 | bwd_inner_microstep: 791.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 00:47:41,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1253.40 | bwd_inner_microstep: 1253.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 00:47:43,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1288.59 | bwd_inner_microstep: 1288.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 00:47:45,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1392.19 | bwd_inner_microstep: 1392.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897
[2024-06-10 00:47:46,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.10 | bwd_microstep: 685.90 | bwd_inner_microstep: 685.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3737
[2024-06-10 00:47:48,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.00 | bwd_microstep: 1565.85 | bwd_inner_microstep: 1565.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703
[2024-06-10 00:47:50,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.35 | bwd_microstep: 1623.05 | bwd_inner_microstep: 1623.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 00:47:52,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3517
[2024-06-10 00:47:54,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.32 | bwd_microstep: 1655.75 | bwd_inner_microstep: 1655.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 00:47:56,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1596.00 | bwd_inner_microstep: 1595.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3681
[2024-06-10 00:47:58,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.72 | bwd_microstep: 1569.60 | bwd_inner_microstep: 1569.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 00:48:00,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.91 | bwd_microstep: 1557.51 | bwd_inner_microstep: 1557.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 00:48:02,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1343.83 | bwd_inner_microstep: 1343.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2102
[2024-06-10 00:48:03,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.70 | bwd_microstep: 825.48 | bwd_inner_microstep: 825.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 00:48:06,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.79 | bwd_microstep: 1572.03 | bwd_inner_microstep: 1572.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 00:48:08,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.17 | bwd_microstep: 1377.53 | bwd_inner_microstep: 1377.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1930
[2024-06-10 00:48:09,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.19 | bwd_microstep: 761.27 | bwd_inner_microstep: 761.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 00:48:10,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.42 | bwd_microstep: 1396.84 | bwd_inner_microstep: 1396.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 00:48:13,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.93 | bwd_microstep: 1559.87 | bwd_inner_microstep: 1559.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 00:48:15,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.35 | bwd_microstep: 1498.73 | bwd_inner_microstep: 1498.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 00:48:17,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.84 | bwd_microstep: 1501.03 | bwd_inner_microstep: 1501.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 00:48:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1407.93 | bwd_inner_microstep: 1407.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 00:48:21,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.79 | bwd_microstep: 1404.81 | bwd_inner_microstep: 1404.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480
[2024-06-10 00:48:23,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.79 | bwd_microstep: 1345.90 | bwd_inner_microstep: 1345.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808
[2024-06-10 00:48:24,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.35 | bwd_microstep: 1358.75 | bwd_inner_microstep: 1358.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 00:48:26,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.96 | bwd_microstep: 1453.43 | bwd_inner_microstep: 1453.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 00:48:29,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-10 00:48:29,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.29 | bwd_microstep: 1642.18 | bwd_inner_microstep: 1634.32 | bwd_allreduce_microstep: 7.80 | step_microstep: 38.70
[2024-06-10 00:48:29,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16305.44 | bwd: 43625.39 | bwd_inner: 43616.59 | bwd_allreduce: 8.09 | step: 40.86
{'loss': 1.3749, 'learning_rate': 3.846153846153847e-06, 'epoch': 0.0}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 00:48:31,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1481.18 | bwd_inner_microstep: 1481.05 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4464
[2024-06-10 00:48:34,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 1306.31 | bwd_microstep: 1636.77 | bwd_inner_microstep: 1636.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3885
[2024-06-10 00:48:36,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1397.32 | bwd_inner_microstep: 1397.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486
[2024-06-10 00:48:38,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.30 | bwd_microstep: 1349.89 | bwd_inner_microstep: 1349.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-10 00:48:40,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.28 | bwd_microstep: 1551.64 | bwd_inner_microstep: 1551.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 00:48:41,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.71 | bwd_microstep: 1249.95 | bwd_inner_microstep: 1249.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 00:48:43,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.79 | bwd_microstep: 1288.66 | bwd_inner_microstep: 1288.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 00:48:45,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1304.94 | bwd_inner_microstep: 1304.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 00:48:47,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.44 | bwd_microstep: 1657.28 | bwd_inner_microstep: 1657.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 00:48:49,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.92 | bwd_microstep: 1418.71 | bwd_inner_microstep: 1418.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3217
[2024-06-10 00:48:51,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.07 | bwd_microstep: 1274.03 | bwd_inner_microstep: 1274.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 00:48:53,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.80 | bwd_microstep: 1619.71 | bwd_inner_microstep: 1619.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3672
[2024-06-10 00:48:55,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.69 | bwd_microstep: 1436.25 | bwd_inner_microstep: 1436.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-10 00:48:57,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.69 | bwd_microstep: 1451.38 | bwd_inner_microstep: 1451.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942
[2024-06-10 00:48:58,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.62 | bwd_microstep: 891.15 | bwd_inner_microstep: 891.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1916
[2024-06-10 00:49:00,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.90 | bwd_microstep: 844.34 | bwd_inner_microstep: 844.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651
[2024-06-10 00:49:02,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.92 | bwd_microstep: 1528.23 | bwd_inner_microstep: 1528.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 00:49:04,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.79 | bwd_microstep: 1385.88 | bwd_inner_microstep: 1385.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3556
[2024-06-10 00:49:06,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.23 | bwd_microstep: 1699.45 | bwd_inner_microstep: 1699.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 00:49:08,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.00 | bwd_microstep: 1395.29 | bwd_inner_microstep: 1395.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 00:49:10,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.33 | bwd_microstep: 1356.30 | bwd_inner_microstep: 1356.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021
[2024-06-10 00:49:11,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.63 | bwd_microstep: 817.62 | bwd_inner_microstep: 817.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1138
[2024-06-10 00:49:12,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 176.61 | bwd_microstep: 462.15 | bwd_inner_microstep: 462.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3722
[2024-06-10 00:49:13,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.21 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-10 00:49:15,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1469.20 | bwd_inner_microstep: 1469.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 00:49:17,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.92 | bwd_microstep: 1419.12 | bwd_inner_microstep: 1419.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 00:49:19,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.83 | bwd_microstep: 1405.23 | bwd_inner_microstep: 1405.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 00:49:21,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.42 | bwd_microstep: 1281.69 | bwd_inner_microstep: 1281.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3831
[2024-06-10 00:49:23,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.44 | bwd_microstep: 1723.20 | bwd_inner_microstep: 1723.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-10 00:49:26,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.00 | bwd_microstep: 1537.84 | bwd_inner_microstep: 1537.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3581
[2024-06-10 00:49:28,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.05 | bwd_microstep: 1435.24 | bwd_inner_microstep: 1435.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3812
[2024-06-10 00:49:30,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 00:49:30,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.45 | bwd_microstep: 1705.02 | bwd_inner_microstep: 1697.23 | bwd_allreduce_microstep: 7.74 | step_microstep: 39.99
[2024-06-10 00:49:30,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17045.92 | bwd: 43723.27 | bwd_inner: 43714.52 | bwd_allreduce: 8.01 | step: 41.89
{'loss': 1.3896, 'learning_rate': 4.615384615384616e-06, 'epoch': 0.0}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 00:49:32,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.25 | bwd_microstep: 1347.86 | bwd_inner_microstep: 1347.64 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907
[2024-06-10 00:49:34,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.22 | bwd_microstep: 1592.10 | bwd_inner_microstep: 1592.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 00:49:36,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.74 | bwd_microstep: 1503.11 | bwd_inner_microstep: 1503.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 00:49:38,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.36 | bwd_microstep: 1652.65 | bwd_inner_microstep: 1652.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4188
[2024-06-10 00:49:40,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.36 | bwd_microstep: 1593.59 | bwd_inner_microstep: 1593.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-10 00:49:43,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.22 | bwd_microstep: 1553.15 | bwd_inner_microstep: 1553.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 00:49:44,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.11 | bwd_microstep: 1148.55 | bwd_inner_microstep: 1148.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 00:49:45,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.17 | bwd_microstep: 700.47 | bwd_inner_microstep: 700.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3489
[2024-06-10 00:49:47,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.37 | bwd_microstep: 1353.44 | bwd_inner_microstep: 1353.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3690
[2024-06-10 00:49:49,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.54 | bwd_microstep: 1592.27 | bwd_inner_microstep: 1592.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3718
[2024-06-10 00:49:52,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.69 | bwd_microstep: 1730.90 | bwd_inner_microstep: 1730.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 00:49:54,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.21 | bwd_microstep: 1388.09 | bwd_inner_microstep: 1388.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2027
[2024-06-10 00:49:55,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.91 | bwd_microstep: 843.61 | bwd_inner_microstep: 843.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 00:49:57,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.46 | bwd_microstep: 1520.99 | bwd_inner_microstep: 1520.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681
[2024-06-10 00:49:59,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.79 | bwd_microstep: 1336.15 | bwd_inner_microstep: 1336.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2437
[2024-06-10 00:50:00,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.32 | bwd_microstep: 950.11 | bwd_inner_microstep: 950.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 00:50:02,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.98 | bwd_microstep: 1490.70 | bwd_inner_microstep: 1490.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176
[2024-06-10 00:50:03,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.21 | bwd_microstep: 861.16 | bwd_inner_microstep: 861.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 00:50:05,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.90 | bwd_microstep: 1289.96 | bwd_inner_microstep: 1289.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 00:50:07,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.54 | bwd_microstep: 1459.73 | bwd_inner_microstep: 1459.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2179
[2024-06-10 00:50:08,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.29 | bwd_microstep: 860.30 | bwd_inner_microstep: 860.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 00:50:10,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.13 | bwd_microstep: 1291.51 | bwd_inner_microstep: 1291.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 00:50:12,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.88 | bwd_microstep: 1286.31 | bwd_inner_microstep: 1286.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 00:50:13,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.27 | bwd_microstep: 804.03 | bwd_inner_microstep: 804.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3527
[2024-06-10 00:50:15,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.39 | bwd_microstep: 1557.42 | bwd_inner_microstep: 1557.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814
[2024-06-10 00:50:17,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.43 | bwd_microstep: 1585.63 | bwd_inner_microstep: 1585.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2090
[2024-06-10 00:50:19,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.32 | bwd_microstep: 953.83 | bwd_inner_microstep: 953.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3428
[2024-06-10 00:50:21,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.57 | bwd_microstep: 1397.49 | bwd_inner_microstep: 1397.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2265
[2024-06-10 00:50:22,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.85 | bwd_microstep: 1070.62 | bwd_inner_microstep: 1070.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2036
[2024-06-10 00:50:23,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.86 | bwd_microstep: 911.50 | bwd_inner_microstep: 911.22 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462
[2024-06-10 00:50:25,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.82 | bwd_microstep: 1567.55 | bwd_inner_microstep: 1567.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 00:50:33,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.35 | optimizer_step: 6.60
[2024-06-10 00:50:33,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.54 | bwd_microstep: 6465.99 | bwd_inner_microstep: 1753.19 | bwd_allreduce_microstep: 4712.72 | step_microstep: 39.88
[2024-06-10 00:50:33,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15632.28 | bwd: 46660.78 | bwd_inner: 41946.81 | bwd_allreduce: 4713.14 | step: 42.22

  0%|          | 0/1726 [00:00<?, ?it/s]
  0%|          | 1/1726 [06:58<200:44:53, 418.95s/it]
  0%|          | 2/1726 [08:01<100:11:08, 209.20s/it]
  0%|          | 3/1726 [09:03<67:54:01, 141.87s/it]
  0%|          | 4/1726 [10:05<52:53:09, 110.56s/it]
  0%|          | 5/1726 [11:05<44:11:34, 92.44s/it]
  0%|          | 6/1726 [12:07<39:05:01, 81.80s/it]
  0%|          | 7/1
{'loss': 1.3891, 'learning_rate': 5.384615384615385e-06, 'epoch': 0.0}
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 4239
[2024-06-10 00:50:35,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.46 | bwd_microstep: 1763.61 | bwd_inner_microstep: 1763.38 | bwd_allreduce_microstep: 0.16 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 00:50:37,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1378.67 | bwd_inner_microstep: 1378.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3889
[2024-06-10 00:50:39,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.67 | bwd_microstep: 1508.29 | bwd_inner_microstep: 1508.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 00:50:41,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.25 | bwd_microstep: 1452.62 | bwd_inner_microstep: 1452.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 00:50:43,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.35 | bwd_microstep: 1440.72 | bwd_inner_microstep: 1440.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 00:50:45,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.81 | bwd_microstep: 1282.46 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 00:50:47,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.71 | bwd_microstep: 1343.09 | bwd_inner_microstep: 1343.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 00:50:49,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1413.07 | bwd_inner_microstep: 1413.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2203
[2024-06-10 00:50:50,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.67 | bwd_microstep: 866.46 | bwd_inner_microstep: 866.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1957
[2024-06-10 00:50:51,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.44 | bwd_microstep: 828.74 | bwd_inner_microstep: 828.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 00:50:53,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1379.20 | bwd_inner_microstep: 1379.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 00:50:55,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.14 | bwd_microstep: 1534.00 | bwd_inner_microstep: 1533.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3605
[2024-06-10 00:50:57,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.77 | bwd_microstep: 1652.85 | bwd_inner_microstep: 1652.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 00:50:59,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.40 | bwd_microstep: 1285.28 | bwd_inner_microstep: 1285.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 00:51:01,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.08 | bwd_microstep: 1388.78 | bwd_inner_microstep: 1388.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667
[2024-06-10 00:51:03,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.87 | bwd_microstep: 1615.58 | bwd_inner_microstep: 1615.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 00:51:05,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.05 | bwd_microstep: 1496.14 | bwd_inner_microstep: 1495.91 | bwd_allreduce_microstep: 0.12 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 00:51:07,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1345.50 | bwd_inner_microstep: 1345.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-10 00:51:09,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.63 | bwd_microstep: 1515.03 | bwd_inner_microstep: 1515.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-10 00:51:11,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.50 | bwd_microstep: 1463.83 | bwd_inner_microstep: 1463.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 00:51:13,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1408.28 | bwd_inner_microstep: 1408.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 00:51:15,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1399.74 | bwd_inner_microstep: 1399.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3827
[2024-06-10 00:51:17,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.07 | bwd_microstep: 1693.23 | bwd_inner_microstep: 1693.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 00:51:19,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1404.94 | bwd_inner_microstep: 1404.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 00:51:21,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1497.11 | bwd_inner_microstep: 1497.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 00:51:23,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1330.57 | bwd_inner_microstep: 1330.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 00:51:25,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.06 | bwd_microstep: 1352.92 | bwd_inner_microstep: 1352.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 00:51:27,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.32 | bwd_microstep: 1548.96 | bwd_inner_microstep: 1548.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 00:51:29,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.72 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480
[2024-06-10 00:51:31,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.05 | bwd_microstep: 1352.55 | bwd_inner_microstep: 1352.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 00:51:33,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.92 | bwd_microstep: 1309.20 | bwd_inner_microstep: 1309.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181
[2024-06-10 00:51:35,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.96 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 00:51:35,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.57 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1001.83 | bwd_allreduce_microstep: 248.09 | step_microstep: 38.61
[2024-06-10 00:51:35,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16724.26 | bwd: 44907.45 | bwd_inner: 44658.06 | bwd_allreduce: 248.58 | step: 40.65
{'loss': 1.4131, 'learning_rate': 6.153846153846155e-06, 'epoch': 0.0}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3485
[2024-06-10 00:51:37,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.33 | bwd_microstep: 1577.38 | bwd_inner_microstep: 1577.16 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-10 00:51:39,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.25 | bwd_microstep: 1487.25 | bwd_inner_microstep: 1487.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 00:51:41,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.44 | bwd_microstep: 1550.23 | bwd_inner_microstep: 1550.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1345
[2024-06-10 00:51:42,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 201.12 | bwd_microstep: 517.29 | bwd_inner_microstep: 517.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 00:51:44,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.58 | bwd_microstep: 1344.93 | bwd_inner_microstep: 1344.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 00:51:45,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.59 | bwd_microstep: 1283.73 | bwd_inner_microstep: 1283.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 00:51:47,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.10 | bwd_microstep: 1388.20 | bwd_inner_microstep: 1388.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-10 00:51:48,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.40 | bwd_microstep: 807.09 | bwd_inner_microstep: 807.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2054
[2024-06-10 00:51:50,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.46 | bwd_microstep: 849.88 | bwd_inner_microstep: 849.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 00:51:51,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.42 | bwd_microstep: 1345.17 | bwd_inner_microstep: 1345.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3476
[2024-06-10 00:51:54,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.19 | bwd_microstep: 1547.42 | bwd_inner_microstep: 1547.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.30
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3527
[2024-06-10 00:51:55,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.10 | bwd_microstep: 1357.45 | bwd_inner_microstep: 1357.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 00:51:57,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.05 | bwd_microstep: 1350.39 | bwd_inner_microstep: 1350.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.27
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 00:51:59,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.25 | bwd_microstep: 1486.23 | bwd_inner_microstep: 1486.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 00:52:01,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.01 | bwd_microstep: 1252.46 | bwd_inner_microstep: 1252.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 00:52:03,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.63 | bwd_microstep: 1489.67 | bwd_inner_microstep: 1489.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 00:52:05,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.80 | bwd_microstep: 1254.82 | bwd_inner_microstep: 1254.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621
[2024-06-10 00:52:07,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1314.68 | bwd_inner_microstep: 1314.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2134
[2024-06-10 00:52:08,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.17 | bwd_microstep: 928.17 | bwd_inner_microstep: 928.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 00:52:10,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.98 | bwd_microstep: 1559.53 | bwd_inner_microstep: 1559.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 00:52:12,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.89 | bwd_microstep: 1416.88 | bwd_inner_microstep: 1416.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 00:52:14,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1284.00 | bwd_inner_microstep: 1283.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2180
[2024-06-10 00:52:15,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.73 | bwd_microstep: 860.41 | bwd_inner_microstep: 860.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3535
[2024-06-10 00:52:17,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.35 | bwd_microstep: 1474.70 | bwd_inner_microstep: 1474.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837
[2024-06-10 00:52:19,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.66 | bwd_microstep: 1659.97 | bwd_inner_microstep: 1659.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 00:52:21,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1409.43 | bwd_inner_microstep: 1409.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 00:52:23,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1392.36 | bwd_inner_microstep: 1392.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580
[2024-06-10 00:52:25,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.11 | bwd_microstep: 1306.37 | bwd_inner_microstep: 1306.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2001
[2024-06-10 00:52:26,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.60 | bwd_microstep: 774.56 | bwd_inner_microstep: 774.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427
[2024-06-10 00:52:28,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.68 | bwd_microstep: 1445.13 | bwd_inner_microstep: 1445.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2727
[2024-06-10 00:52:30,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.72 | bwd_microstep: 1045.22 | bwd_inner_microstep: 1045.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3581
[2024-06-10 00:52:34,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-10 00:52:34,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.81 | bwd_microstep: 3672.83 | bwd_inner_microstep: 2035.47 | bwd_allreduce_microstep: 1637.31 | step_microstep: 39.31
[2024-06-10 00:52:34,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15586.19 | bwd: 43433.89 | bwd_inner: 41795.43 | bwd_allreduce: 1637.66 | step: 41.74
{'loss': 1.3731, 'learning_rate': 6.923076923076923e-06, 'epoch': 0.01}
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3667
[2024-06-10 00:52:36,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1275.51 | bwd_inner_microstep: 1275.42 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.14
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1968
[2024-06-10 00:52:37,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.08 | bwd_microstep: 734.61 | bwd_inner_microstep: 734.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915
[2024-06-10 00:52:39,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.91 | bwd_microstep: 1594.10 | bwd_inner_microstep: 1594.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3842
[2024-06-10 00:52:41,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.52 | bwd_microstep: 1661.11 | bwd_inner_microstep: 1661.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 00:52:43,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.42 | bwd_microstep: 1542.10 | bwd_inner_microstep: 1542.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 00:52:45,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1253.62 | bwd_inner_microstep: 1253.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 00:52:47,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.54 | bwd_microstep: 1151.89 | bwd_inner_microstep: 1151.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 00:52:48,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.14 | bwd_microstep: 1248.32 | bwd_inner_microstep: 1248.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 00:52:50,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1252.51 | bwd_inner_microstep: 1252.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425
[2024-06-10 00:52:52,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.06 | bwd_microstep: 1156.35 | bwd_inner_microstep: 1156.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3642
[2024-06-10 00:52:54,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.24 | bwd_microstep: 1325.42 | bwd_inner_microstep: 1325.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 00:52:55,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1341.82 | bwd_inner_microstep: 1341.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2945
[2024-06-10 00:52:57,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.53 | bwd_microstep: 1100.23 | bwd_inner_microstep: 1100.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3529
[2024-06-10 00:52:59,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.54 | bwd_microstep: 1520.46 | bwd_inner_microstep: 1520.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 00:53:01,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.20 | bwd_microstep: 1387.11 | bwd_inner_microstep: 1387.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3960
[2024-06-10 00:53:03,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.48 | bwd_microstep: 1588.50 | bwd_inner_microstep: 1588.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3644
[2024-06-10 00:53:05,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1505.50 | bwd_inner_microstep: 1505.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 00:53:08,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.62 | bwd_microstep: 1654.82 | bwd_inner_microstep: 1654.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3670
[2024-06-10 00:53:09,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.74 | bwd_microstep: 1326.95 | bwd_inner_microstep: 1326.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 00:53:11,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.02 | bwd_microstep: 1183.37 | bwd_inner_microstep: 1183.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 00:53:13,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.90 | bwd_microstep: 1190.61 | bwd_inner_microstep: 1190.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3817
[2024-06-10 00:53:15,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.54 | bwd_microstep: 1625.46 | bwd_inner_microstep: 1625.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3655
[2024-06-10 00:53:17,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.48 | bwd_microstep: 1826.38 | bwd_inner_microstep: 1826.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3443
[2024-06-10 00:53:19,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.25 | bwd_microstep: 1413.32 | bwd_inner_microstep: 1413.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914
[2024-06-10 00:53:20,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.29 | bwd_microstep: 691.46 | bwd_inner_microstep: 691.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2446
[2024-06-10 00:53:22,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.00 | bwd_microstep: 855.18 | bwd_inner_microstep: 855.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 00:53:24,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.58 | bwd_microstep: 1406.40 | bwd_inner_microstep: 1406.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542
[2024-06-10 00:53:25,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.95 | bwd_microstep: 1331.90 | bwd_inner_microstep: 1331.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 00:53:27,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.84 | bwd_microstep: 1454.38 | bwd_inner_microstep: 1454.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2278
[2024-06-10 00:53:29,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.99 | bwd_microstep: 849.18 | bwd_inner_microstep: 848.99 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 00:53:31,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.54 | bwd_microstep: 1658.27 | bwd_inner_microstep: 1658.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 00:53:36,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.24 | optimizer_step: 6.57
[2024-06-10 00:53:36,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.64 | bwd_microstep: 4222.54 | bwd_inner_microstep: 1581.66 | bwd_allreduce_microstep: 2640.82 | step_microstep: 38.61
[2024-06-10 00:53:36,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15973.74 | bwd: 45329.41 | bwd_inner: 42687.45 | bwd_allreduce: 2641.16 | step: 40.64
{'loss': 1.3564, 'learning_rate': 7.692307692307694e-06, 'epoch': 0.01}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 00:53:38,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1357.04 | bwd_inner_microstep: 1356.81 | bwd_allreduce_microstep: 0.12 | step_microstep: 0.22
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 00:53:39,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1348.81 | bwd_inner_microstep: 1348.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 00:53:42,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.33 | bwd_microstep: 1656.36 | bwd_inner_microstep: 1656.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 00:53:44,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1404.70 | bwd_inner_microstep: 1404.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 00:53:45,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.70 | bwd_microstep: 1285.11 | bwd_inner_microstep: 1285.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3735
[2024-06-10 00:53:47,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.02 | bwd_microstep: 1339.65 | bwd_inner_microstep: 1339.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-10 00:53:50,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.53 | bwd_microstep: 1635.88 | bwd_inner_microstep: 1635.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 00:53:51,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.64 | bwd_microstep: 1251.01 | bwd_inner_microstep: 1250.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 00:53:53,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.74 | bwd_microstep: 1285.08 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 00:53:54,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.38 | bwd_microstep: 800.30 | bwd_inner_microstep: 800.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487
[2024-06-10 00:53:56,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.85 | bwd_microstep: 1418.61 | bwd_inner_microstep: 1418.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512
[2024-06-10 00:53:58,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.15 | bwd_microstep: 1434.94 | bwd_inner_microstep: 1434.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 00:54:00,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.44 | bwd_microstep: 1286.48 | bwd_inner_microstep: 1286.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3525
[2024-06-10 00:54:02,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.68 | bwd_microstep: 1456.28 | bwd_inner_microstep: 1456.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3663
[2024-06-10 00:54:04,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.20 | bwd_microstep: 1563.09 | bwd_inner_microstep: 1563.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3902
[2024-06-10 00:54:06,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.99 | bwd_microstep: 1719.89 | bwd_inner_microstep: 1719.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2975
[2024-06-10 00:54:08,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.86 | bwd_microstep: 1236.36 | bwd_inner_microstep: 1236.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 00:54:10,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.55 | bwd_microstep: 1388.92 | bwd_inner_microstep: 1388.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 00:54:12,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.78 | bwd_microstep: 1516.00 | bwd_inner_microstep: 1515.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1971
[2024-06-10 00:54:13,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.95 | bwd_microstep: 861.43 | bwd_inner_microstep: 861.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-10 00:54:15,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.09 | bwd_microstep: 1433.44 | bwd_inner_microstep: 1433.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 00:54:17,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 1497.79 | bwd_inner_microstep: 1497.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 00:54:19,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.73 | bwd_microstep: 1259.56 | bwd_inner_microstep: 1259.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 00:54:21,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1381.61 | bwd_inner_microstep: 1381.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3766
[2024-06-10 00:54:23,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.18 | bwd_microstep: 1356.19 | bwd_inner_microstep: 1356.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2249
[2024-06-10 00:54:24,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.30 | bwd_microstep: 876.78 | bwd_inner_microstep: 876.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811
[2024-06-10 00:54:26,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.14 | bwd_microstep: 1358.11 | bwd_inner_microstep: 1358.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 00:54:28,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.61 | bwd_microstep: 1657.82 | bwd_inner_microstep: 1657.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 00:54:30,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.65 | bwd_microstep: 1510.77 | bwd_inner_microstep: 1510.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 00:54:33,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.14 | bwd_microstep: 1657.53 | bwd_inner_microstep: 1657.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3598
[2024-06-10 00:54:35,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.36 | bwd_microstep: 1387.11 | bwd_inner_microstep: 1387.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 00:54:37,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 00:54:37,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.85 | bwd_microstep: 1806.67 | bwd_inner_microstep: 1471.83 | bwd_allreduce_microstep: 334.79 | step_microstep: 38.83
[2024-06-10 00:54:37,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16463.63 | bwd: 44429.37 | bwd_inner: 44093.49 | bwd_allreduce: 335.13 | step: 40.81
{'loss': 1.2771, 'learning_rate': 8.461538461538462e-06, 'epoch': 0.01}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511
[2024-06-10 00:54:39,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.42 | bwd_microstep: 1189.08 | bwd_inner_microstep: 1189.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3924
[2024-06-10 00:54:41,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.29 | bwd_microstep: 1697.70 | bwd_inner_microstep: 1697.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 00:54:43,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.79 | bwd_microstep: 1657.84 | bwd_inner_microstep: 1657.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 00:54:45,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1290.21 | bwd_inner_microstep: 1290.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 00:54:46,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.99 | bwd_microstep: 688.85 | bwd_inner_microstep: 688.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 00:54:47,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.28 | bwd_microstep: 699.82 | bwd_inner_microstep: 699.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 00:54:49,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.82 | bwd_microstep: 1252.40 | bwd_inner_microstep: 1252.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 00:54:51,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.09 | bwd_microstep: 1417.20 | bwd_inner_microstep: 1417.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1985
[2024-06-10 00:54:52,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.90 | bwd_microstep: 711.21 | bwd_inner_microstep: 711.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3530
[2024-06-10 00:54:54,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.87 | bwd_microstep: 1419.51 | bwd_inner_microstep: 1419.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 00:54:55,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.77 | bwd_microstep: 1353.54 | bwd_inner_microstep: 1353.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-10 00:54:57,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.01 | bwd_microstep: 1192.67 | bwd_inner_microstep: 1192.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 00:54:59,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.74 | bwd_microstep: 1382.13 | bwd_inner_microstep: 1382.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 00:55:01,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.66 | bwd_microstep: 1438.10 | bwd_inner_microstep: 1438.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 00:55:02,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.72 | bwd_microstep: 794.44 | bwd_inner_microstep: 794.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 00:55:04,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.97 | bwd_microstep: 1386.99 | bwd_inner_microstep: 1386.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 00:55:06,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1388.34 | bwd_inner_microstep: 1388.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 00:55:08,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.23 | bwd_microstep: 1522.70 | bwd_inner_microstep: 1522.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2293
[2024-06-10 00:55:09,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.85 | bwd_microstep: 883.84 | bwd_inner_microstep: 883.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1037
[2024-06-10 00:55:10,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 154.85 | bwd_microstep: 398.58 | bwd_inner_microstep: 398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-10 00:55:12,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.84 | bwd_microstep: 1304.18 | bwd_inner_microstep: 1304.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 00:55:14,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.00 | bwd_microstep: 1383.04 | bwd_inner_microstep: 1383.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 00:55:16,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.16 | bwd_microstep: 1542.67 | bwd_inner_microstep: 1542.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 00:55:18,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.64 | bwd_microstep: 1617.80 | bwd_inner_microstep: 1617.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572
[2024-06-10 00:55:20,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1596.71 | bwd_inner_microstep: 1596.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 00:55:22,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.08 | bwd_microstep: 1648.83 | bwd_inner_microstep: 1648.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971
[2024-06-10 00:55:23,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.12 | bwd_microstep: 709.01 | bwd_inner_microstep: 708.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3561
[2024-06-10 00:55:25,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.79 | bwd_microstep: 1429.95 | bwd_inner_microstep: 1429.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.59
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 00:55:27,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.11 | bwd_microstep: 1455.06 | bwd_inner_microstep: 1455.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 00:55:29,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.32 | bwd_microstep: 1501.76 | bwd_inner_microstep: 1501.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3238
[2024-06-10 00:55:31,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.49 | bwd_microstep: 1183.48 | bwd_inner_microstep: 1183.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3814
[2024-06-10 00:55:39,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.37 | optimizer_step: 6.60
[2024-06-10 00:55:39,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 7473.72 | bwd_inner_microstep: 1561.33 | bwd_allreduce_microstep: 5912.31 | step_microstep: 40.00
[2024-06-10 00:55:39,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15208.27 | bwd: 46611.37 | bwd_inner: 40698.12 | bwd_allreduce: 5912.56 | step: 43.36
{'loss': 1.3572, 'learning_rate': 9.230769230769232e-06, 'epoch': 0.01}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3460
[2024-06-10 00:55:41,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.01 | bwd_microstep: 1565.07 | bwd_inner_microstep: 1564.84 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3904
[2024-06-10 00:55:43,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.06 | bwd_microstep: 1482.14 | bwd_inner_microstep: 1482.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850
[2024-06-10 00:55:45,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.16 | bwd_microstep: 1459.83 | bwd_inner_microstep: 1459.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 00:55:47,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.54 | bwd_microstep: 1246.99 | bwd_inner_microstep: 1246.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 00:55:48,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.98 | bwd_microstep: 782.61 | bwd_inner_microstep: 782.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 00:55:50,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.06 | bwd_microstep: 1379.30 | bwd_inner_microstep: 1379.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 00:55:52,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.45 | bwd_microstep: 1288.02 | bwd_inner_microstep: 1288.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 00:55:54,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.02 | bwd_microstep: 1384.54 | bwd_inner_microstep: 1384.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 00:55:56,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.20 | bwd_microstep: 1386.92 | bwd_inner_microstep: 1386.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 00:55:57,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.79 | bwd_microstep: 1250.90 | bwd_inner_microstep: 1250.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428
[2024-06-10 00:55:59,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.47 | bwd_microstep: 1155.69 | bwd_inner_microstep: 1155.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1947
[2024-06-10 00:56:00,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.21 | bwd_microstep: 821.58 | bwd_inner_microstep: 821.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 00:56:02,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.10 | bwd_microstep: 1402.42 | bwd_inner_microstep: 1402.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2005
[2024-06-10 00:56:03,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.10 | bwd_microstep: 896.76 | bwd_inner_microstep: 896.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3613
[2024-06-10 00:56:06,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.68 | bwd_microstep: 1712.28 | bwd_inner_microstep: 1712.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 00:56:08,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.44 | bwd_microstep: 1616.83 | bwd_inner_microstep: 1616.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3423
[2024-06-10 00:56:10,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.97 | bwd_microstep: 1281.73 | bwd_inner_microstep: 1281.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 00:56:12,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.49 | bwd_microstep: 1432.45 | bwd_inner_microstep: 1432.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3658
[2024-06-10 00:56:14,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.57 | bwd_microstep: 1683.72 | bwd_inner_microstep: 1683.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3833
[2024-06-10 00:56:16,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.77 | bwd_microstep: 1521.43 | bwd_inner_microstep: 1521.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 00:56:18,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.94 | bwd_microstep: 1284.82 | bwd_inner_microstep: 1284.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1996
[2024-06-10 00:56:19,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.21 | bwd_microstep: 713.65 | bwd_inner_microstep: 713.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-10 00:56:20,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.46 | bwd_microstep: 696.20 | bwd_inner_microstep: 696.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 00:56:21,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.87 | bwd_microstep: 814.37 | bwd_inner_microstep: 814.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3019
[2024-06-10 00:56:23,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.81 | bwd_microstep: 1232.69 | bwd_inner_microstep: 1232.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2272
[2024-06-10 00:56:24,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.32 | bwd_microstep: 1034.97 | bwd_inner_microstep: 1034.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 00:56:26,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1407.67 | bwd_inner_microstep: 1407.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 00:56:28,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1380.52 | bwd_inner_microstep: 1380.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3749
[2024-06-10 00:56:30,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.63 | bwd_microstep: 1603.46 | bwd_inner_microstep: 1603.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 00:56:32,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.39 | bwd_microstep: 1185.39 | bwd_inner_microstep: 1185.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3492
[2024-06-10 00:56:34,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.03 | bwd_microstep: 1335.60 | bwd_inner_microstep: 1335.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 00:56:42,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.37 | optimizer_step: 6.60
[2024-06-10 00:56:42,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.20 | bwd_microstep: 8206.76 | bwd_inner_microstep: 1579.52 | bwd_allreduce_microstep: 6627.17 | step_microstep: 41.65
[2024-06-10 00:56:42,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15352.66 | bwd: 47647.33 | bwd_inner: 41019.05 | bwd_allreduce: 6627.51 | step: 43.83
  0%|          | 7/1726 [13:09<36:04:30, 75.55s/it]
  0%|          | 8/1726 [14:11<33:59:50, 71.24s/it]
  1%|          | 9/1726 [15:11<32:12:48, 67.54s/it]
  1%|          | 10/1726 [16:12<31:19:53, 65.73s/it]
  1%|          | 11/1726 [17:14<30:39:48, 64.37s/it]
  1%|          | 12/1726 [18:16<30:19:49, 63.70s/it]
  1%|          | 13/1726 [19:19<30:15:59, 63.61s/it]
{'loss': 1.3726, 'learning_rate': 1e-05, 'epoch': 0.01}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2888
[2024-06-10 00:56:44,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.92 | bwd_microstep: 1173.10 | bwd_inner_microstep: 1172.87 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 00:56:46,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.59 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3407
[2024-06-10 00:56:48,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.40 | bwd_microstep: 1187.94 | bwd_inner_microstep: 1187.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 00:56:50,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.12 | bwd_microstep: 1547.97 | bwd_inner_microstep: 1547.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 00:56:51,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.37 | bwd_microstep: 1247.60 | bwd_inner_microstep: 1247.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3771
[2024-06-10 00:56:53,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.94 | bwd_microstep: 1436.91 | bwd_inner_microstep: 1436.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-10 00:56:54,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.57 | bwd_microstep: 682.94 | bwd_inner_microstep: 682.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 00:56:56,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.52 | bwd_microstep: 1243.97 | bwd_inner_microstep: 1243.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 00:56:58,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.98 | bwd_microstep: 1533.78 | bwd_inner_microstep: 1533.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3610
[2024-06-10 00:57:00,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.36 | bwd_microstep: 1376.88 | bwd_inner_microstep: 1376.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3377
[2024-06-10 00:57:02,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.64 | bwd_microstep: 1175.99 | bwd_inner_microstep: 1175.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 00:57:04,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1341.41 | bwd_inner_microstep: 1341.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 00:57:05,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.73 | bwd_microstep: 1348.77 | bwd_inner_microstep: 1348.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-10 00:57:08,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.85 | bwd_microstep: 1615.71 | bwd_inner_microstep: 1615.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3521
[2024-06-10 00:57:10,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.91 | bwd_microstep: 1357.43 | bwd_inner_microstep: 1357.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3685
[2024-06-10 00:57:12,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1493.49 | bwd_inner_microstep: 1493.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 00:57:14,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.94 | bwd_microstep: 1503.83 | bwd_inner_microstep: 1503.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 00:57:16,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1609.81 | bwd_inner_microstep: 1609.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 00:57:18,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.33 | bwd_microstep: 1473.21 | bwd_inner_microstep: 1473.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 00:57:20,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.71 | bwd_microstep: 1286.03 | bwd_inner_microstep: 1286.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083
[2024-06-10 00:57:21,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.66 | bwd_microstep: 915.44 | bwd_inner_microstep: 915.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 00:57:23,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.35 | bwd_microstep: 1404.69 | bwd_inner_microstep: 1404.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 00:57:25,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.22 | bwd_microstep: 1298.31 | bwd_inner_microstep: 1298.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292
[2024-06-10 00:57:26,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.01 | bwd_microstep: 880.18 | bwd_inner_microstep: 880.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567
[2024-06-10 00:57:28,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.31 | bwd_microstep: 1335.19 | bwd_inner_microstep: 1335.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 00:57:30,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.21 | bwd_microstep: 1513.03 | bwd_inner_microstep: 1513.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 00:57:32,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.67 | bwd_microstep: 1282.00 | bwd_inner_microstep: 1281.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 00:57:34,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.09 | bwd_microstep: 1354.45 | bwd_inner_microstep: 1354.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-10 00:57:36,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.57 | bwd_microstep: 1463.34 | bwd_inner_microstep: 1463.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607
[2024-06-10 00:57:37,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.80 | bwd_microstep: 1312.79 | bwd_inner_microstep: 1312.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 00:57:40,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.89 | bwd_microstep: 1547.89 | bwd_inner_microstep: 1547.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 00:57:43,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-10 00:57:43,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.38 | bwd_microstep: 2569.76 | bwd_inner_microstep: 1698.29 | bwd_allreduce_microstep: 871.42 | step_microstep: 39.24
[2024-06-10 00:57:43,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16045.11 | bwd: 43798.12 | bwd_inner: 42925.62 | bwd_allreduce: 871.74 | step: 41.42
{'loss': 1.3354, 'learning_rate': 1.076923076923077e-05, 'epoch': 0.01}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 00:57:45,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.82 | bwd_microstep: 1467.69 | bwd_inner_microstep: 1467.55 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3966
[2024-06-10 00:57:47,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.51 | bwd_microstep: 1404.35 | bwd_inner_microstep: 1404.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 00:57:49,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1387.21 | bwd_inner_microstep: 1387.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 00:57:50,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1249.76 | bwd_inner_microstep: 1249.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 00:57:52,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.88 | bwd_microstep: 1391.08 | bwd_inner_microstep: 1391.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3441
[2024-06-10 00:57:54,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.77 | bwd_microstep: 1301.54 | bwd_inner_microstep: 1301.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2514
[2024-06-10 00:57:55,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.09 | bwd_microstep: 935.24 | bwd_inner_microstep: 935.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 00:57:57,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1282.87 | bwd_inner_microstep: 1282.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3789
[2024-06-10 00:57:59,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.99 | bwd_microstep: 1650.40 | bwd_inner_microstep: 1650.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 00:58:02,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.47 | bwd_microstep: 1639.51 | bwd_inner_microstep: 1639.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2033
[2024-06-10 00:58:03,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.20 | bwd_microstep: 907.43 | bwd_inner_microstep: 907.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 00:58:05,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1448.14 | bwd_inner_microstep: 1448.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 00:58:07,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.41 | bwd_microstep: 1521.63 | bwd_inner_microstep: 1521.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3708
[2024-06-10 00:58:09,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.19 | bwd_microstep: 1729.85 | bwd_inner_microstep: 1729.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 00:58:11,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3640
[2024-06-10 00:58:14,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.71 | bwd_microstep: 1682.99 | bwd_inner_microstep: 1682.82 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.26
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1953
[2024-06-10 00:58:15,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.72 | bwd_microstep: 704.33 | bwd_inner_microstep: 704.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 00:58:16,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.52 | bwd_microstep: 1295.68 | bwd_inner_microstep: 1295.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 00:58:18,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.92 | bwd_microstep: 801.05 | bwd_inner_microstep: 801.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611
[2024-06-10 00:58:20,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.77 | bwd_microstep: 1610.07 | bwd_inner_microstep: 1610.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 00:58:22,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.51 | bwd_microstep: 1354.27 | bwd_inner_microstep: 1354.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 00:58:24,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.45 | bwd_microstep: 1387.51 | bwd_inner_microstep: 1387.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 00:58:25,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.49 | bwd_microstep: 977.31 | bwd_inner_microstep: 977.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-10 00:58:26,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.90 | bwd_microstep: 807.00 | bwd_inner_microstep: 806.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3797
[2024-06-10 00:58:28,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.38 | bwd_microstep: 1387.36 | bwd_inner_microstep: 1387.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2039
[2024-06-10 00:58:29,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.07 | bwd_microstep: 841.67 | bwd_inner_microstep: 841.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2195
[2024-06-10 00:58:30,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.17 | bwd_microstep: 895.16 | bwd_inner_microstep: 895.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 00:58:32,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.98 | bwd_microstep: 1383.32 | bwd_inner_microstep: 1383.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2196
[2024-06-10 00:58:33,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.74 | bwd_microstep: 773.14 | bwd_inner_microstep: 773.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-10 00:58:36,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.89 | bwd_microstep: 1637.50 | bwd_inner_microstep: 1637.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2002
[2024-06-10 00:58:37,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.11 | bwd_microstep: 741.76 | bwd_inner_microstep: 741.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 00:58:43,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.38 | optimizer_step: 6.59
[2024-06-10 00:58:43,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 6266.48 | bwd_inner_microstep: 1569.91 | bwd_allreduce_microstep: 4696.50 | step_microstep: 39.99
[2024-06-10 00:58:43,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15140.36 | bwd: 45260.06 | bwd_inner: 40562.30 | bwd_allreduce: 4696.90 | step: 42.42
{'loss': 1.3803, 'learning_rate': 1.1538461538461538e-05, 'epoch': 0.01}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 00:58:45,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.69 | bwd_microstep: 1335.91 | bwd_inner_microstep: 1335.70 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 00:58:46,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.06 | bwd_microstep: 789.48 | bwd_inner_microstep: 789.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3853
[2024-06-10 00:58:48,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.56 | bwd_microstep: 1465.44 | bwd_inner_microstep: 1465.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3862
[2024-06-10 00:58:50,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.31 | bwd_microstep: 1397.66 | bwd_inner_microstep: 1397.59 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 00:58:52,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1401.81 | bwd_inner_microstep: 1401.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 00:58:54,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.71 | bwd_microstep: 1543.67 | bwd_inner_microstep: 1543.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 00:58:57,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.68 | bwd_microstep: 1535.23 | bwd_inner_microstep: 1535.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 00:58:58,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1252.79 | bwd_inner_microstep: 1252.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2215
[2024-06-10 00:59:00,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.99 | bwd_microstep: 958.54 | bwd_inner_microstep: 958.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-10 00:59:01,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.91 | bwd_microstep: 1192.13 | bwd_inner_microstep: 1192.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 738
[2024-06-10 00:59:02,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.46 | bwd_microstep: 300.13 | bwd_inner_microstep: 300.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 00:59:04,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.79 | bwd_microstep: 1393.20 | bwd_inner_microstep: 1393.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3668
[2024-06-10 00:59:06,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.06 | bwd_microstep: 1567.83 | bwd_inner_microstep: 1567.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3444
[2024-06-10 00:59:08,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.61 | bwd_microstep: 1372.10 | bwd_inner_microstep: 1372.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2159
[2024-06-10 00:59:09,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.04 | bwd_microstep: 1048.57 | bwd_inner_microstep: 1048.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3830
[2024-06-10 00:59:12,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.18 | bwd_microstep: 1811.82 | bwd_inner_microstep: 1811.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 00:59:14,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1606.35 | bwd_inner_microstep: 1606.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1921
[2024-06-10 00:59:15,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.17 | bwd_microstep: 729.63 | bwd_inner_microstep: 729.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3602
[2024-06-10 00:59:17,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.86 | bwd_microstep: 1675.67 | bwd_inner_microstep: 1675.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 00:59:19,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.41 | bwd_microstep: 1296.10 | bwd_inner_microstep: 1296.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-10 00:59:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.70 | bwd_microstep: 1418.86 | bwd_inner_microstep: 1418.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 00:59:23,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.92 | bwd_microstep: 1298.51 | bwd_inner_microstep: 1298.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3591
[2024-06-10 00:59:25,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.85 | bwd_microstep: 1538.16 | bwd_inner_microstep: 1538.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1929
[2024-06-10 00:59:26,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.86 | bwd_microstep: 761.77 | bwd_inner_microstep: 761.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 00:59:28,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1393.06 | bwd_inner_microstep: 1393.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576
[2024-06-10 00:59:30,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.08 | bwd_microstep: 1602.82 | bwd_inner_microstep: 1602.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1645
[2024-06-10 00:59:31,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 242.91 | bwd_microstep: 616.66 | bwd_inner_microstep: 616.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3684
[2024-06-10 00:59:33,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1430.44 | bwd_inner_microstep: 1430.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2277
[2024-06-10 00:59:34,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.90 | bwd_microstep: 911.85 | bwd_inner_microstep: 911.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3670
[2024-06-10 00:59:36,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1458.26 | bwd_inner_microstep: 1458.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 00:59:38,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.38 | bwd_microstep: 1254.02 | bwd_inner_microstep: 1253.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-10 00:59:44,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.35 | optimizer_step: 6.60
[2024-06-10 00:59:44,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.67 | bwd_microstep: 5595.33 | bwd_inner_microstep: 1719.57 | bwd_allreduce_microstep: 3875.68 | step_microstep: 41.45
[2024-06-10 00:59:44,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15319.52 | bwd: 44953.85 | bwd_inner: 41077.02 | bwd_allreduce: 3876.04 | step: 43.59
{'loss': 1.3955, 'learning_rate': 1.230769230769231e-05, 'epoch': 0.01}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 00:59:46,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.53 | bwd_microstep: 1374.15 | bwd_inner_microstep: 1373.94 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4389
[2024-06-10 00:59:48,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.49 | bwd_microstep: 1709.60 | bwd_inner_microstep: 1709.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 00:59:50,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.59 | bwd_microstep: 1149.05 | bwd_inner_microstep: 1149.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 910
[2024-06-10 00:59:51,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.24 | bwd_microstep: 374.90 | bwd_inner_microstep: 374.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 00:59:52,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1384.30 | bwd_inner_microstep: 1384.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 00:59:54,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.10 | bwd_microstep: 1433.92 | bwd_inner_microstep: 1433.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3715
[2024-06-10 00:59:57,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.61 | bwd_microstep: 1494.84 | bwd_inner_microstep: 1494.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4010
[2024-06-10 00:59:59,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.89 | bwd_microstep: 1542.31 | bwd_inner_microstep: 1542.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3701
[2024-06-10 01:00:01,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.81 | bwd_microstep: 1595.36 | bwd_inner_microstep: 1595.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 01:00:03,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 1414.27 | bwd_inner_microstep: 1414.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 01:00:05,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.51 | bwd_microstep: 1518.07 | bwd_inner_microstep: 1518.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3696
[2024-06-10 01:00:07,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.24 | bwd_microstep: 1627.09 | bwd_inner_microstep: 1627.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3649
[2024-06-10 01:00:09,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.32 | bwd_microstep: 1553.25 | bwd_inner_microstep: 1553.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477
[2024-06-10 01:00:11,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.61 | bwd_microstep: 1346.74 | bwd_inner_microstep: 1346.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 01:00:13,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.00 | bwd_microstep: 1488.96 | bwd_inner_microstep: 1488.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 01:00:15,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1321.11 | bwd_inner_microstep: 1321.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014
[2024-06-10 01:00:16,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.19 | bwd_microstep: 810.13 | bwd_inner_microstep: 810.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 01:00:18,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.54 | bwd_microstep: 1407.40 | bwd_inner_microstep: 1407.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467
[2024-06-10 01:00:20,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.62 | bwd_microstep: 1184.10 | bwd_inner_microstep: 1184.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3679
[2024-06-10 01:00:22,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.25 | bwd_microstep: 1359.85 | bwd_inner_microstep: 1359.67 | bwd_allreduce_microstep: 0.14 | step_microstep: 0.21
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2133
[2024-06-10 01:00:23,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.32 | bwd_microstep: 836.87 | bwd_inner_microstep: 836.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 01:00:25,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.31 | bwd_microstep: 1300.04 | bwd_inner_microstep: 1300.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 01:00:27,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.74 | bwd_microstep: 1558.94 | bwd_inner_microstep: 1558.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 01:00:29,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1536.84 | bwd_inner_microstep: 1536.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925
[2024-06-10 01:00:30,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.70 | bwd_microstep: 699.82 | bwd_inner_microstep: 699.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3733
[2024-06-10 01:00:32,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.32 | bwd_microstep: 1563.12 | bwd_inner_microstep: 1563.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 01:00:34,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.30 | bwd_microstep: 1299.78 | bwd_inner_microstep: 1299.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 01:00:36,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.28 | bwd_microstep: 1338.53 | bwd_inner_microstep: 1338.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3475
[2024-06-10 01:00:38,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.12 | bwd_microstep: 1442.02 | bwd_inner_microstep: 1441.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3568
[2024-06-10 01:00:40,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.51 | bwd_microstep: 1661.92 | bwd_inner_microstep: 1661.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 01:00:41,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 790.18 | bwd_inner_microstep: 790.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 01:00:45,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.62
[2024-06-10 01:00:45,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.86 | bwd_microstep: 3123.95 | bwd_inner_microstep: 1855.89 | bwd_allreduce_microstep: 1268.00 | step_microstep: 38.82
[2024-06-10 01:00:45,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16040.34 | bwd: 44241.41 | bwd_inner: 42972.17 | bwd_allreduce: 1268.47 | step: 40.94
{'loss': 1.2918, 'learning_rate': 1.3076923076923078e-05, 'epoch': 0.01}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 01:00:47,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1404.07 | bwd_inner_microstep: 1403.86 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.19
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 01:00:48,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.93 | bwd_microstep: 788.90 | bwd_inner_microstep: 788.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3797
[2024-06-10 01:00:50,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1508.79 | bwd_inner_microstep: 1508.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755
[2024-06-10 01:00:52,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.84 | bwd_microstep: 1638.40 | bwd_inner_microstep: 1638.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511
[2024-06-10 01:00:54,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.73 | bwd_microstep: 1351.48 | bwd_inner_microstep: 1351.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 01:00:56,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1387.54 | bwd_inner_microstep: 1387.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1967
[2024-06-10 01:00:57,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.47 | bwd_microstep: 736.37 | bwd_inner_microstep: 736.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 01:00:59,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.06 | bwd_microstep: 1284.81 | bwd_inner_microstep: 1284.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4040
[2024-06-10 01:01:01,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.55 | bwd_microstep: 1457.34 | bwd_inner_microstep: 1457.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686
[2024-06-10 01:01:03,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.79 | bwd_microstep: 1725.00 | bwd_inner_microstep: 1724.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 01:01:05,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1511.23 | bwd_inner_microstep: 1511.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 01:01:07,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.73 | bwd_microstep: 1589.67 | bwd_inner_microstep: 1589.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 01:01:10,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.72 | bwd_microstep: 1592.00 | bwd_inner_microstep: 1591.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-10 01:01:11,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.59 | bwd_microstep: 811.24 | bwd_inner_microstep: 811.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 01:01:13,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1493.03 | bwd_inner_microstep: 1493.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 01:01:15,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.59 | bwd_microstep: 1346.61 | bwd_inner_microstep: 1346.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 01:01:17,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.80 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 01:01:18,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.80 | bwd_microstep: 1293.74 | bwd_inner_microstep: 1293.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 01:01:20,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.70 | bwd_microstep: 1495.14 | bwd_inner_microstep: 1495.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3515
[2024-06-10 01:01:23,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.45 | bwd_microstep: 1552.08 | bwd_inner_microstep: 1552.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 01:01:25,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1492.81 | bwd_inner_microstep: 1492.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2272
[2024-06-10 01:01:26,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.35 | bwd_microstep: 1006.12 | bwd_inner_microstep: 1006.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 01:01:28,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.07 | bwd_microstep: 1187.74 | bwd_inner_microstep: 1187.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 01:01:30,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.10 | bwd_microstep: 1454.37 | bwd_inner_microstep: 1454.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820
[2024-06-10 01:01:32,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.37 | bwd_microstep: 1583.07 | bwd_inner_microstep: 1583.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 01:01:34,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.49 | bwd_microstep: 1555.07 | bwd_inner_microstep: 1555.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3599
[2024-06-10 01:01:36,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.57 | bwd_microstep: 1440.75 | bwd_inner_microstep: 1440.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2308
[2024-06-10 01:01:37,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.43 | bwd_microstep: 986.47 | bwd_inner_microstep: 986.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3801
[2024-06-10 01:01:39,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.30 | bwd_microstep: 1484.28 | bwd_inner_microstep: 1484.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 01:01:42,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.95 | bwd_microstep: 1555.77 | bwd_inner_microstep: 1555.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 01:01:44,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.07 | bwd_microstep: 1406.45 | bwd_inner_microstep: 1406.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3813
[2024-06-10 01:01:47,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.26 | optimizer_step: 6.56
[2024-06-10 01:01:47,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 3173.83 | bwd_inner_microstep: 1566.64 | bwd_allreduce_microstep: 1607.14 | step_microstep: 39.20
[2024-06-10 01:01:47,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16426.57 | bwd: 45681.67 | bwd_inner: 44073.47 | bwd_allreduce: 1607.46 | step: 41.58
{'loss': 1.3429, 'learning_rate': 1.3846153846153847e-05, 'epoch': 0.01}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 01:01:49,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.88 | bwd_microstep: 1286.03 | bwd_inner_microstep: 1285.83 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3969
[2024-06-10 01:01:51,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.71 | bwd_microstep: 1605.95 | bwd_inner_microstep: 1605.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 01:01:53,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1281.01 | bwd_inner_microstep: 1280.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3881
[2024-06-10 01:01:55,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.92 | bwd_microstep: 1511.37 | bwd_inner_microstep: 1511.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 01:01:56,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.96 | bwd_microstep: 792.39 | bwd_inner_microstep: 792.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 01:01:58,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.77 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 01:02:00,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.42 | bwd_microstep: 1288.75 | bwd_inner_microstep: 1288.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1942
[2024-06-10 01:02:01,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.82 | bwd_microstep: 761.82 | bwd_inner_microstep: 761.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3710
[2024-06-10 01:02:03,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.52 | bwd_microstep: 1457.50 | bwd_inner_microstep: 1457.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3556
[2024-06-10 01:02:05,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.79 | bwd_microstep: 1457.65 | bwd_inner_microstep: 1457.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 01:02:07,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.25 | bwd_microstep: 1433.99 | bwd_inner_microstep: 1433.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3569
[2024-06-10 01:02:09,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.33 | bwd_microstep: 1462.26 | bwd_inner_microstep: 1462.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695
[2024-06-10 01:02:11,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.22 | bwd_microstep: 1524.59 | bwd_inner_microstep: 1524.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1851
[2024-06-10 01:02:12,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.16 | bwd_microstep: 675.07 | bwd_inner_microstep: 675.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 01:02:14,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1486.41 | bwd_inner_microstep: 1486.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659
[2024-06-10 01:02:16,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.56 | bwd_microstep: 1449.13 | bwd_inner_microstep: 1449.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3855
[2024-06-10 01:02:19,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.96 | bwd_microstep: 1744.15 | bwd_inner_microstep: 1744.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 01:02:20,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.28 | bwd_microstep: 704.27 | bwd_inner_microstep: 704.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 01:02:22,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.57 | bwd_microstep: 1454.90 | bwd_inner_microstep: 1454.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3835
[2024-06-10 01:02:23,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.69 | bwd_microstep: 1362.04 | bwd_inner_microstep: 1362.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 01:02:26,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.95 | bwd_microstep: 1512.74 | bwd_inner_microstep: 1512.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 01:02:27,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1404.67 | bwd_inner_microstep: 1404.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2174
[2024-06-10 01:02:29,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.84 | bwd_microstep: 952.04 | bwd_inner_microstep: 952.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 01:02:31,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.97 | bwd_microstep: 1650.03 | bwd_inner_microstep: 1650.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1990
[2024-06-10 01:02:32,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.54 | bwd_microstep: 832.67 | bwd_inner_microstep: 832.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2275
[2024-06-10 01:02:34,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.47 | bwd_microstep: 1025.96 | bwd_inner_microstep: 1025.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 01:02:36,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.04 | bwd_microstep: 1507.00 | bwd_inner_microstep: 1506.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760
[2024-06-10 01:02:38,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.57 | bwd_microstep: 1638.00 | bwd_inner_microstep: 1637.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 01:02:40,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.59 | bwd_microstep: 1399.26 | bwd_inner_microstep: 1399.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-10 01:02:42,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.78 | bwd_microstep: 1659.57 | bwd_inner_microstep: 1659.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 01:02:44,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 1512.40 | bwd_inner_microstep: 1512.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3802
[2024-06-10 01:02:51,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.41 | optimizer_step: 6.61
[2024-06-10 01:02:51,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.73 | bwd_microstep: 6132.30 | bwd_inner_microstep: 1525.94 | bwd_allreduce_microstep: 4606.28 | step_microstep: 40.08
[2024-06-10 01:02:51,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15936.06 | bwd: 47349.80 | bwd_inner: 42742.42 | bwd_allreduce: 4606.62 | step: 42.32
{'loss': 1.412, 'learning_rate': 1.4615384615384615e-05, 'epoch': 0.01}


  1%|          | 13/1726 [19:19<30:15:59, 63.61s/it]
  1%|          | 14/1726 [20:19<29:45:48, 62.59s/it]
  1%|          | 15/1726 [21:20<29:29:15, 62.04s/it]
  1%|          | 16/1726 [22:21<29:16:21, 61.63s/it]
  1%|          | 17/1726 [23:22<29:07:01, 61.34s/it]
  1%|          | 18/1726 [24:24<29:16:01, 61.69s/it]
  1%|          | 19/1726 [25:28<29:31:55, 62.28s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982
[2024-06-10 01:02:52,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.06 | bwd_microstep: 891.67 | bwd_inner_microstep: 891.54 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 01:02:54,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.44 | bwd_microstep: 1380.92 | bwd_inner_microstep: 1380.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 01:02:56,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.86 | bwd_microstep: 1344.02 | bwd_inner_microstep: 1343.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-10 01:02:58,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.56 | bwd_microstep: 1311.25 | bwd_inner_microstep: 1311.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-10 01:03:00,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.27 | bwd_microstep: 1438.04 | bwd_inner_microstep: 1438.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3786
[2024-06-10 01:03:02,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.88 | bwd_microstep: 1445.61 | bwd_inner_microstep: 1445.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 01:03:03,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.16 | bwd_microstep: 1189.95 | bwd_inner_microstep: 1189.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 01:03:05,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.98 | bwd_microstep: 1155.64 | bwd_inner_microstep: 1155.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 01:03:06,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.79 | bwd_microstep: 801.22 | bwd_inner_microstep: 801.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1916
[2024-06-10 01:03:07,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.90 | bwd_microstep: 878.61 | bwd_inner_microstep: 878.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-10 01:03:10,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.21 | bwd_microstep: 1589.02 | bwd_inner_microstep: 1588.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3490
[2024-06-10 01:03:12,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.50 | bwd_microstep: 1443.62 | bwd_inner_microstep: 1443.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3614
[2024-06-10 01:03:14,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.43 | bwd_microstep: 1571.45 | bwd_inner_microstep: 1571.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3497
[2024-06-10 01:03:16,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.25 | bwd_microstep: 1431.59 | bwd_inner_microstep: 1431.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3653
[2024-06-10 01:03:18,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.14 | bwd_microstep: 1656.45 | bwd_inner_microstep: 1656.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 01:03:20,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1391.72 | bwd_inner_microstep: 1391.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 01:03:22,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1531.22 | bwd_inner_microstep: 1531.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 01:03:24,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.61 | bwd_microstep: 1404.24 | bwd_inner_microstep: 1404.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926
[2024-06-10 01:03:25,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.45 | bwd_microstep: 697.82 | bwd_inner_microstep: 697.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 01:03:27,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1495.16 | bwd_inner_microstep: 1495.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 01:03:28,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.29 | bwd_microstep: 799.85 | bwd_inner_microstep: 799.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2207
[2024-06-10 01:03:29,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.18 | bwd_microstep: 960.55 | bwd_inner_microstep: 960.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3695
[2024-06-10 01:03:31,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.03 | bwd_microstep: 1337.13 | bwd_inner_microstep: 1337.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482
[2024-06-10 01:03:33,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.50 | bwd_microstep: 1417.49 | bwd_inner_microstep: 1417.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994
[2024-06-10 01:03:34,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.08 | bwd_microstep: 831.96 | bwd_inner_microstep: 831.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 01:03:37,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.71 | bwd_microstep: 1537.05 | bwd_inner_microstep: 1537.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 01:03:38,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.16 | bwd_microstep: 1407.42 | bwd_inner_microstep: 1407.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 01:03:41,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1476.31 | bwd_inner_microstep: 1476.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 01:03:43,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1500.63 | bwd_inner_microstep: 1500.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3770
[2024-06-10 01:03:45,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.96 | bwd_microstep: 1573.34 | bwd_inner_microstep: 1573.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 01:03:47,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.74 | bwd_microstep: 1622.22 | bwd_inner_microstep: 1622.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 01:03:55,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-10 01:03:55,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.30 | bwd_microstep: 7373.46 | bwd_inner_microstep: 1454.68 | bwd_allreduce_microstep: 5918.71 | step_microstep: 39.93
[2024-06-10 01:03:55,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15659.41 | bwd: 47886.67 | bwd_inner: 41966.93 | bwd_allreduce: 5919.00 | step: 42.14
{'loss': 1.374, 'learning_rate': 1.5384615384615387e-05, 'epoch': 0.01}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3475
[2024-06-10 01:03:57,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.27 | bwd_microstep: 1519.26 | bwd_inner_microstep: 1519.10 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4034
[2024-06-10 01:03:59,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.05 | bwd_microstep: 1614.08 | bwd_inner_microstep: 1614.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3964
[2024-06-10 01:04:01,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.11 | bwd_microstep: 1461.74 | bwd_inner_microstep: 1461.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 01:04:03,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 1552.44 | bwd_inner_microstep: 1552.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-10 01:04:06,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.47 | bwd_microstep: 1534.51 | bwd_inner_microstep: 1534.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 01:04:07,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.99 | bwd_microstep: 780.74 | bwd_inner_microstep: 780.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 01:04:09,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.88 | bwd_microstep: 1394.06 | bwd_inner_microstep: 1394.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4070
[2024-06-10 01:04:11,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.23 | bwd_microstep: 1625.44 | bwd_inner_microstep: 1625.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 01:04:13,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1253.75 | bwd_inner_microstep: 1253.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3856
[2024-06-10 01:04:15,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 682.87 | bwd_microstep: 1874.79 | bwd_inner_microstep: 1874.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 01:04:17,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.26 | bwd_microstep: 1277.97 | bwd_inner_microstep: 1277.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3625
[2024-06-10 01:04:19,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1375.38 | bwd_inner_microstep: 1375.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2051
[2024-06-10 01:04:20,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.01 | bwd_microstep: 817.68 | bwd_inner_microstep: 817.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-10 01:04:22,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1407.70 | bwd_inner_microstep: 1407.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3416
[2024-06-10 01:04:24,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.01 | bwd_microstep: 1297.91 | bwd_inner_microstep: 1297.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660
[2024-06-10 01:04:26,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.38 | bwd_microstep: 1718.37 | bwd_inner_microstep: 1718.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420
[2024-06-10 01:04:28,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.12 | bwd_microstep: 1156.45 | bwd_inner_microstep: 1156.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 01:04:30,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1399.33 | bwd_inner_microstep: 1399.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 01:04:31,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.46 | bwd_microstep: 1297.24 | bwd_inner_microstep: 1297.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 01:04:33,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1381.21 | bwd_inner_microstep: 1381.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 01:04:35,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1499.68 | bwd_inner_microstep: 1499.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2651
[2024-06-10 01:04:37,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.74 | bwd_microstep: 1027.40 | bwd_inner_microstep: 1027.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2974
[2024-06-10 01:04:38,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.01 | bwd_microstep: 1107.44 | bwd_inner_microstep: 1107.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2273
[2024-06-10 01:04:39,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.55 | bwd_microstep: 786.00 | bwd_inner_microstep: 785.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 01:04:41,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.92 | bwd_microstep: 1385.23 | bwd_inner_microstep: 1385.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 01:04:43,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.13 | bwd_microstep: 1306.76 | bwd_inner_microstep: 1306.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 01:04:45,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.25 | bwd_microstep: 1353.83 | bwd_inner_microstep: 1353.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3540
[2024-06-10 01:04:47,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.28 | bwd_microstep: 1426.63 | bwd_inner_microstep: 1426.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 01:04:49,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.19 | bwd_microstep: 1400.93 | bwd_inner_microstep: 1400.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759
[2024-06-10 01:04:51,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.02 | bwd_microstep: 1451.29 | bwd_inner_microstep: 1451.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3569
[2024-06-10 01:04:53,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.60 | bwd_microstep: 1347.89 | bwd_inner_microstep: 1347.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775
[2024-06-10 01:04:56,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 01:04:56,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.73 | bwd_microstep: 2700.18 | bwd_inner_microstep: 1945.99 | bwd_allreduce_microstep: 754.13 | step_microstep: 39.18
[2024-06-10 01:04:56,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16342.58 | bwd: 44533.34 | bwd_inner: 43778.17 | bwd_allreduce: 754.43 | step: 41.24
{'loss': 1.3563, 'learning_rate': 1.6153846153846154e-05, 'epoch': 0.01}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 01:04:58,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.07 | bwd_microstep: 1373.44 | bwd_inner_microstep: 1373.25 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4017
[2024-06-10 01:05:00,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.22 | bwd_microstep: 1609.59 | bwd_inner_microstep: 1609.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3930
[2024-06-10 01:05:03,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.55 | bwd_microstep: 1700.08 | bwd_inner_microstep: 1700.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-10 01:05:05,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.98 | bwd_microstep: 1440.84 | bwd_inner_microstep: 1440.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 01:05:06,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.94 | bwd_microstep: 793.56 | bwd_inner_microstep: 793.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 01:05:08,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.71 | bwd_microstep: 1383.35 | bwd_inner_microstep: 1383.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 01:05:09,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1250.14 | bwd_inner_microstep: 1250.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 01:05:11,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.69 | bwd_microstep: 1250.27 | bwd_inner_microstep: 1250.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 01:05:13,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.14 | bwd_microstep: 1286.68 | bwd_inner_microstep: 1286.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933
[2024-06-10 01:05:14,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.71 | bwd_microstep: 730.43 | bwd_inner_microstep: 730.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705
[2024-06-10 01:05:16,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.16 | bwd_microstep: 1632.67 | bwd_inner_microstep: 1632.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1963
[2024-06-10 01:05:17,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.32 | bwd_microstep: 767.02 | bwd_inner_microstep: 766.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 01:05:19,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.46 | bwd_microstep: 1611.25 | bwd_inner_microstep: 1611.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 01:05:21,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.34 | bwd_microstep: 1349.42 | bwd_inner_microstep: 1349.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 01:05:23,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 1517.67 | bwd_inner_microstep: 1517.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2136
[2024-06-10 01:05:25,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.35 | bwd_microstep: 834.32 | bwd_inner_microstep: 834.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2124
[2024-06-10 01:05:26,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.69 | bwd_microstep: 834.27 | bwd_inner_microstep: 834.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099
[2024-06-10 01:05:27,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.93 | bwd_microstep: 920.11 | bwd_inner_microstep: 920.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 01:05:29,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.21 | bwd_microstep: 1398.67 | bwd_inner_microstep: 1398.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 01:05:31,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.74 | bwd_microstep: 1290.95 | bwd_inner_microstep: 1290.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819
[2024-06-10 01:05:33,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.73 | bwd_microstep: 1522.71 | bwd_inner_microstep: 1522.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-10 01:05:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.78 | bwd_microstep: 1640.34 | bwd_inner_microstep: 1640.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 01:05:37,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.00 | bwd_microstep: 1288.16 | bwd_inner_microstep: 1288.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477
[2024-06-10 01:05:39,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.80 | bwd_microstep: 1415.80 | bwd_inner_microstep: 1415.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 01:05:41,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1251.45 | bwd_inner_microstep: 1251.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 01:05:43,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1393.82 | bwd_inner_microstep: 1393.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-10 01:05:45,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.63 | bwd_microstep: 1421.12 | bwd_inner_microstep: 1421.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3429
[2024-06-10 01:05:46,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3765
[2024-06-10 01:05:48,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.35 | bwd_microstep: 1346.25 | bwd_inner_microstep: 1346.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-10 01:05:50,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.17 | bwd_microstep: 1438.72 | bwd_inner_microstep: 1438.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 01:05:52,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.46 | bwd_microstep: 1390.27 | bwd_inner_microstep: 1390.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3759
[2024-06-10 01:05:57,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.39 | optimizer_step: 6.64
[2024-06-10 01:05:57,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.44 | bwd_microstep: 4332.40 | bwd_inner_microstep: 2093.58 | bwd_allreduce_microstep: 2238.75 | step_microstep: 39.96
[2024-06-10 01:05:57,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15864.29 | bwd: 44699.86 | bwd_inner: 42460.05 | bwd_allreduce: 2239.05 | step: 42.23
{'loss': 1.3291, 'learning_rate': 1.6923076923076924e-05, 'epoch': 0.01}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2032
[2024-06-10 01:05:58,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.19 | bwd_microstep: 718.03 | bwd_inner_microstep: 717.88 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3879
[2024-06-10 01:06:00,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.56 | bwd_microstep: 1481.28 | bwd_inner_microstep: 1481.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 01:06:02,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1247.07 | bwd_inner_microstep: 1247.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 01:06:04,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1381.54 | bwd_inner_microstep: 1381.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 01:06:06,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1282.11 | bwd_inner_microstep: 1282.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3724
[2024-06-10 01:06:08,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.77 | bwd_microstep: 1365.03 | bwd_inner_microstep: 1365.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1912
[2024-06-10 01:06:09,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.47 | bwd_microstep: 777.39 | bwd_inner_microstep: 777.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-10 01:06:11,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.98 | bwd_microstep: 1436.46 | bwd_inner_microstep: 1436.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3706
[2024-06-10 01:06:13,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.81 | bwd_microstep: 1680.13 | bwd_inner_microstep: 1680.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3529
[2024-06-10 01:06:15,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.49 | bwd_microstep: 1446.80 | bwd_inner_microstep: 1446.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 01:06:17,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1385.28 | bwd_inner_microstep: 1385.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-10 01:06:19,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.55 | bwd_microstep: 1322.51 | bwd_inner_microstep: 1322.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512
[2024-06-10 01:06:21,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1582.76 | bwd_inner_microstep: 1582.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3614
[2024-06-10 01:06:23,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.59 | bwd_microstep: 1445.37 | bwd_inner_microstep: 1445.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989
[2024-06-10 01:06:24,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.14 | bwd_microstep: 898.27 | bwd_inner_microstep: 898.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974
[2024-06-10 01:06:25,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.26 | bwd_microstep: 771.27 | bwd_inner_microstep: 771.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 01:06:27,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.21 | bwd_microstep: 1386.31 | bwd_inner_microstep: 1386.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3375
[2024-06-10 01:06:29,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.38 | bwd_microstep: 1339.14 | bwd_inner_microstep: 1339.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 01:06:31,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1479.35 | bwd_inner_microstep: 1479.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1931
[2024-06-10 01:06:32,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.20 | bwd_microstep: 823.79 | bwd_inner_microstep: 823.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937
[2024-06-10 01:06:33,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.05 | bwd_microstep: 764.56 | bwd_inner_microstep: 764.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546
[2024-06-10 01:06:35,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.42 | bwd_microstep: 1589.07 | bwd_inner_microstep: 1589.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 01:06:37,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.21 | bwd_microstep: 977.74 | bwd_inner_microstep: 977.35 | bwd_allreduce_microstep: 0.19 | step_microstep: 0.23
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2282
[2024-06-10 01:06:38,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.99 | bwd_microstep: 787.79 | bwd_inner_microstep: 787.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926
[2024-06-10 01:06:39,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.94 | bwd_microstep: 699.46 | bwd_inner_microstep: 699.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-10 01:06:40,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.00 | bwd_microstep: 698.42 | bwd_inner_microstep: 698.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2149
[2024-06-10 01:06:41,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.13 | bwd_microstep: 884.76 | bwd_inner_microstep: 884.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612
[2024-06-10 01:06:43,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.39 | bwd_microstep: 1314.10 | bwd_inner_microstep: 1314.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 01:06:45,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.51 | bwd_microstep: 1356.89 | bwd_inner_microstep: 1356.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 01:06:46,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.98 | bwd_microstep: 1282.84 | bwd_inner_microstep: 1282.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3930
[2024-06-10 01:06:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.80 | bwd_microstep: 1499.25 | bwd_inner_microstep: 1499.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 01:06:59,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.41 | optimizer_step: 6.59
[2024-06-10 01:06:59,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.77 | bwd_microstep: 10180.19 | bwd_inner_microstep: 1749.74 | bwd_allreduce_microstep: 8430.38 | step_microstep: 40.00
[2024-06-10 01:06:59,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14527.42 | bwd: 47285.04 | bwd_inner: 38853.26 | bwd_allreduce: 8430.86 | step: 42.19
{'loss': 1.3112, 'learning_rate': 1.7692307692307694e-05, 'epoch': 0.01}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 01:07:01,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.57 | bwd_microstep: 1268.39 | bwd_inner_microstep: 1268.24 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.19
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3940
[2024-06-10 01:07:03,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.48 | bwd_microstep: 1690.31 | bwd_inner_microstep: 1690.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3886
[2024-06-10 01:07:06,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.22 | bwd_microstep: 1584.38 | bwd_inner_microstep: 1584.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 01:07:08,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.89 | bwd_microstep: 1477.06 | bwd_inner_microstep: 1477.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 01:07:09,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.36 | bwd_microstep: 676.55 | bwd_inner_microstep: 676.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 01:07:10,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.87 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 01:07:12,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.03 | bwd_microstep: 1532.17 | bwd_inner_microstep: 1532.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 01:07:14,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.86 | bwd_microstep: 1253.11 | bwd_inner_microstep: 1253.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-10 01:07:16,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.39 | bwd_microstep: 1197.66 | bwd_inner_microstep: 1197.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3924
[2024-06-10 01:07:18,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.12 | bwd_microstep: 1494.22 | bwd_inner_microstep: 1494.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2054
[2024-06-10 01:07:19,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.86 | bwd_microstep: 726.55 | bwd_inner_microstep: 726.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408
[2024-06-10 01:07:21,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.41 | bwd_microstep: 1282.09 | bwd_inner_microstep: 1282.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 01:07:23,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1498.14 | bwd_inner_microstep: 1498.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3668
[2024-06-10 01:07:25,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.76 | bwd_microstep: 1788.26 | bwd_inner_microstep: 1788.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2471
[2024-06-10 01:07:27,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.59 | bwd_microstep: 989.44 | bwd_inner_microstep: 989.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2183
[2024-06-10 01:07:28,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.74 | bwd_microstep: 920.93 | bwd_inner_microstep: 920.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3519
[2024-06-10 01:07:30,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1548.98 | bwd_inner_microstep: 1548.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-10 01:07:32,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1420.86 | bwd_inner_microstep: 1420.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 01:07:34,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.80 | bwd_microstep: 1450.62 | bwd_inner_microstep: 1450.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3671
[2024-06-10 01:07:36,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.43 | bwd_microstep: 1357.89 | bwd_inner_microstep: 1357.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3617
[2024-06-10 01:07:38,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.22 | bwd_microstep: 1677.21 | bwd_inner_microstep: 1677.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 01:07:40,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.65 | bwd_microstep: 1397.23 | bwd_inner_microstep: 1397.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 01:07:42,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1403.57 | bwd_inner_microstep: 1403.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 01:07:44,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.66 | bwd_microstep: 1494.87 | bwd_inner_microstep: 1494.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894
[2024-06-10 01:07:45,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.78 | bwd_microstep: 686.02 | bwd_inner_microstep: 685.83 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 01:07:47,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1451.72 | bwd_inner_microstep: 1451.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 01:07:49,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1354.52 | bwd_inner_microstep: 1354.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2038
[2024-06-10 01:07:50,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 809.28 | bwd_inner_microstep: 809.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 01:07:52,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.43 | bwd_microstep: 1447.08 | bwd_inner_microstep: 1447.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3780
[2024-06-10 01:07:54,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 1502.67 | bwd_inner_microstep: 1502.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2717
[2024-06-10 01:07:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.50 | bwd_microstep: 1132.44 | bwd_inner_microstep: 1132.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3443
[2024-06-10 01:08:01,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.31 | optimizer_step: 6.58
[2024-06-10 01:08:01,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 4852.78 | bwd_inner_microstep: 1605.89 | bwd_allreduce_microstep: 3246.83 | step_microstep: 38.76
[2024-06-10 01:08:01,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15808.92 | bwd: 45614.09 | bwd_inner: 42366.06 | bwd_allreduce: 3247.22 | step: 40.74
{'loss': 1.3825, 'learning_rate': 1.8461538461538465e-05, 'epoch': 0.01}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 01:08:03,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.55 | bwd_microstep: 1373.25 | bwd_inner_microstep: 1373.03 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3936
[2024-06-10 01:08:05,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.77 | bwd_microstep: 1523.34 | bwd_inner_microstep: 1523.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 01:08:07,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.32 | bwd_microstep: 1290.55 | bwd_inner_microstep: 1290.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 01:08:09,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.29 | bwd_microstep: 1453.47 | bwd_inner_microstep: 1453.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 01:08:11,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.70 | bwd_microstep: 1400.55 | bwd_inner_microstep: 1400.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3787
[2024-06-10 01:08:13,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.51 | bwd_microstep: 1550.90 | bwd_inner_microstep: 1550.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 01:08:15,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.80 | bwd_microstep: 1254.98 | bwd_inner_microstep: 1254.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 01:08:17,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.68 | bwd_microstep: 1487.43 | bwd_inner_microstep: 1487.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1952
[2024-06-10 01:08:18,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.43 | bwd_microstep: 764.43 | bwd_inner_microstep: 764.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 01:08:20,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.14 | bwd_microstep: 1487.78 | bwd_inner_microstep: 1487.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-10 01:08:22,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.41 | bwd_microstep: 1448.46 | bwd_inner_microstep: 1448.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 01:08:24,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.47 | bwd_microstep: 1473.68 | bwd_inner_microstep: 1473.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 01:08:26,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.44 | bwd_microstep: 1358.63 | bwd_inner_microstep: 1358.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 01:08:28,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-10 01:08:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.09 | bwd_microstep: 1589.61 | bwd_inner_microstep: 1589.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 01:08:32,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.27 | bwd_microstep: 1520.95 | bwd_inner_microstep: 1520.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 01:08:34,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.09 | bwd_microstep: 1562.12 | bwd_inner_microstep: 1562.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 01:08:36,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.74 | bwd_microstep: 1161.93 | bwd_inner_microstep: 1161.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3598
[2024-06-10 01:08:38,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.86 | bwd_microstep: 1439.96 | bwd_inner_microstep: 1439.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3532
[2024-06-10 01:08:40,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.59 | bwd_microstep: 1332.14 | bwd_inner_microstep: 1332.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 01:08:41,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1412.20 | bwd_inner_microstep: 1412.02 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.11
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3554
[2024-06-10 01:08:43,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.60 | bwd_microstep: 1367.51 | bwd_inner_microstep: 1367.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3533
[2024-06-10 01:08:45,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1261.93 | bwd_inner_microstep: 1261.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 01:08:47,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.95 | bwd_microstep: 1362.41 | bwd_inner_microstep: 1362.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 01:08:49,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.25 | bwd_microstep: 1406.29 | bwd_inner_microstep: 1406.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 01:08:51,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.93 | bwd_microstep: 1499.94 | bwd_inner_microstep: 1499.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4076
[2024-06-10 01:08:53,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.62 | bwd_microstep: 1733.45 | bwd_inner_microstep: 1733.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2056
[2024-06-10 01:08:55,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.63 | bwd_microstep: 979.87 | bwd_inner_microstep: 979.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3627
[2024-06-10 01:08:57,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.27 | bwd_microstep: 1711.74 | bwd_inner_microstep: 1711.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3575
[2024-06-10 01:08:59,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.88 | bwd_microstep: 1459.29 | bwd_inner_microstep: 1459.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 01:09:01,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1517.55 | bwd_inner_microstep: 1517.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 01:09:04,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.23 | optimizer_step: 6.64
[2024-06-10 01:09:04,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.73 | bwd_microstep: 1793.40 | bwd_inner_microstep: 1785.56 | bwd_allreduce_microstep: 7.79 | step_microstep: 38.68
[2024-06-10 01:09:04,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16936.76 | bwd: 45261.28 | bwd_inner: 45252.29 | bwd_allreduce: 8.16 | step: 41.23
{'loss': 1.3144, 'learning_rate': 1.923076923076923e-05, 'epoch': 0.01}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 01:09:06,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.67 | bwd_microstep: 1352.18 | bwd_inner_microstep: 1351.93 | bwd_allreduce_microstep: 0.14 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4663
[2024-06-10 01:09:08,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.55 | bwd_microstep: 1677.10 | bwd_inner_microstep: 1677.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2370
[2024-06-10 01:09:09,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.91 | bwd_microstep: 901.74 | bwd_inner_microstep: 901.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 01:09:11,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1246.03 | bwd_inner_microstep: 1246.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 01:09:13,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.44 | bwd_microstep: 1382.47 | bwd_inner_microstep: 1382.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397
[2024-06-10 01:09:14,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.48 | bwd_microstep: 1154.43 | bwd_inner_microstep: 1154.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3504
[2024-06-10 01:09:16,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.97 | bwd_microstep: 1193.57 | bwd_inner_microstep: 1193.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 01:09:18,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1282.94 | bwd_inner_microstep: 1282.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-10 01:09:20,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.92 | bwd_microstep: 1634.87 | bwd_inner_microstep: 1634.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 01:09:22,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.13 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2132
[2024-06-10 01:09:23,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.41 | bwd_microstep: 928.95 | bwd_inner_microstep: 928.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 01:09:25,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1283.17 | bwd_inner_microstep: 1283.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3685
[2024-06-10 01:09:27,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.61 | bwd_microstep: 1576.70 | bwd_inner_microstep: 1576.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3710
[2024-06-10 01:09:30,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.86 | bwd_microstep: 1691.68 | bwd_inner_microstep: 1691.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 01:09:32,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1419.95 | bwd_inner_microstep: 1419.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1892
[2024-06-10 01:09:33,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.34 | bwd_microstep: 782.19 | bwd_inner_microstep: 782.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3657
[2024-06-10 01:09:35,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.34 | bwd_microstep: 1590.01 | bwd_inner_microstep: 1589.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 01:09:37,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.74 | bwd_microstep: 1424.61 | bwd_inner_microstep: 1424.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3643
[2024-06-10 01:09:39,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.49 | bwd_microstep: 1349.86 | bwd_inner_microstep: 1349.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 01:09:41,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.20 | bwd_microstep: 1660.38 | bwd_inner_microstep: 1660.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3644
[2024-06-10 01:09:43,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1522.36 | bwd_inner_microstep: 1522.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 01:09:45,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1394.76 | bwd_inner_microstep: 1394.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 01:09:47,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.10 | bwd_microstep: 1356.81 | bwd_inner_microstep: 1356.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 01:09:49,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.75 | bwd_microstep: 1514.08 | bwd_inner_microstep: 1514.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568
[2024-06-10 01:09:51,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.27 | bwd_microstep: 1303.96 | bwd_inner_microstep: 1303.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555
[2024-06-10 01:09:53,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1304.78 | bwd_inner_microstep: 1304.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3954
[2024-06-10 01:09:55,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.23 | bwd_microstep: 1711.88 | bwd_inner_microstep: 1711.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3816
[2024-06-10 01:09:57,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1394.40 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 01:09:59,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.95 | bwd_microstep: 1353.99 | bwd_inner_microstep: 1353.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3556
[2024-06-10 01:10:01,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.02 | bwd_microstep: 1528.27 | bwd_inner_microstep: 1528.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 01:10:03,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.35 | bwd_microstep: 1500.92 | bwd_inner_microstep: 1500.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 01:10:05,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.19 | optimizer_step: 6.63
[2024-06-10 01:10:05,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.22 | bwd_microstep: 1394.44 | bwd_inner_microstep: 1386.76 | bwd_allreduce_microstep: 7.63 | step_microstep: 38.87
[2024-06-10 01:10:05,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16565.16 | bwd: 44199.90 | bwd_inner: 44191.16 | bwd_allreduce: 7.99 | step: 41.00
  1%|          | 20/1726 [26:32<29:45:00, 62.78s/it]
  1%|          | 21/1726 [27:33<29:30:59, 62.32s/it]
  1%|▏         | 22/1726 [28:34<29:18:15, 61.91s/it]
  1%|▏         | 23/1726 [29:36<29:19:33, 61.99s/it]
  1%|▏         | 24/1726 [30:38<29:16:52, 61.93s/it]
  1%|▏         | 25/1726 [31:40<29:21:24, 62.13s/it]
  2%|▏         | 26/1726 [32:42<29:12:06, 6
{'loss': 1.4003, 'learning_rate': 2e-05, 'epoch': 0.02}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 01:10:07,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1392.85 | bwd_inner_microstep: 1392.70 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352
[2024-06-10 01:10:08,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.69 | bwd_microstep: 991.54 | bwd_inner_microstep: 991.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3877
[2024-06-10 01:10:10,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.39 | bwd_microstep: 1547.24 | bwd_inner_microstep: 1547.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477
[2024-06-10 01:10:12,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.54 | bwd_microstep: 1413.57 | bwd_inner_microstep: 1413.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882
[2024-06-10 01:10:13,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.90 | bwd_microstep: 713.74 | bwd_inner_microstep: 713.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3430
[2024-06-10 01:10:15,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.07 | bwd_microstep: 1158.48 | bwd_inner_microstep: 1158.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 01:10:17,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.37 | bwd_microstep: 1529.20 | bwd_inner_microstep: 1529.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 01:10:19,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1249.64 | bwd_inner_microstep: 1249.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 01:10:21,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.67 | bwd_microstep: 1547.09 | bwd_inner_microstep: 1547.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3526
[2024-06-10 01:10:23,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.36 | bwd_microstep: 1447.05 | bwd_inner_microstep: 1447.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 01:10:25,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.79 | bwd_microstep: 1489.36 | bwd_inner_microstep: 1489.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 01:10:27,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.03 | bwd_microstep: 1486.51 | bwd_inner_microstep: 1486.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520
[2024-06-10 01:10:29,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1421.14 | bwd_inner_microstep: 1421.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968
[2024-06-10 01:10:30,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.68 | bwd_microstep: 704.70 | bwd_inner_microstep: 704.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3454
[2024-06-10 01:10:32,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.69 | bwd_microstep: 1190.93 | bwd_inner_microstep: 1190.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 01:10:33,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.67 | bwd_microstep: 1385.86 | bwd_inner_microstep: 1385.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3845
[2024-06-10 01:10:35,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.10 | bwd_microstep: 1468.39 | bwd_inner_microstep: 1468.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 01:10:37,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1358.32 | bwd_inner_microstep: 1358.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-10 01:10:38,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.90 | bwd_microstep: 699.05 | bwd_inner_microstep: 699.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 01:10:41,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.40 | bwd_microstep: 1663.99 | bwd_inner_microstep: 1663.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 01:10:43,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.19 | bwd_microstep: 1653.47 | bwd_inner_microstep: 1653.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1993
[2024-06-10 01:10:44,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.72 | bwd_microstep: 713.29 | bwd_inner_microstep: 713.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 01:10:46,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.12 | bwd_microstep: 1281.27 | bwd_inner_microstep: 1281.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 01:10:48,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1408.09 | bwd_inner_microstep: 1408.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3811
[2024-06-10 01:10:50,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.05 | bwd_microstep: 1720.85 | bwd_inner_microstep: 1720.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3902
[2024-06-10 01:10:52,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.93 | bwd_microstep: 1618.01 | bwd_inner_microstep: 1617.84 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3690
[2024-06-10 01:10:54,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.75 | bwd_microstep: 1424.95 | bwd_inner_microstep: 1424.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689
[2024-06-10 01:10:56,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.54 | bwd_microstep: 1626.45 | bwd_inner_microstep: 1626.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3616
[2024-06-10 01:10:59,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.63 | bwd_microstep: 1647.92 | bwd_inner_microstep: 1647.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 01:11:01,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.21 | bwd_microstep: 1647.63 | bwd_inner_microstep: 1647.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2377
[2024-06-10 01:11:02,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.29 | bwd_microstep: 1003.68 | bwd_inner_microstep: 1003.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-10 01:11:05,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 6.18 | optimizer_step: 6.60
[2024-06-10 01:11:05,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.90 | bwd_microstep: 2183.53 | bwd_inner_microstep: 1754.24 | bwd_allreduce_microstep: 429.24 | step_microstep: 40.91
[2024-06-10 01:11:05,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16153.42 | bwd: 43787.81 | bwd_inner: 43357.42 | bwd_allreduce: 429.59 | step: 43.09
{'loss': 1.3746, 'learning_rate': 2.0769230769230772e-05, 'epoch': 0.02}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 01:11:07,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.36 | bwd_microstep: 1337.94 | bwd_inner_microstep: 1337.78 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 01:11:09,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.26 | bwd_microstep: 1353.42 | bwd_inner_microstep: 1353.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3906
[2024-06-10 01:11:11,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.17 | bwd_microstep: 1694.27 | bwd_inner_microstep: 1694.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 01:11:13,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.58 | bwd_microstep: 1255.61 | bwd_inner_microstep: 1255.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2638
[2024-06-10 01:11:14,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.10 | bwd_microstep: 1019.47 | bwd_inner_microstep: 1019.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 01:11:16,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.54 | bwd_microstep: 1287.61 | bwd_inner_microstep: 1287.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 01:11:18,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1384.94 | bwd_inner_microstep: 1384.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 01:11:20,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.24 | bwd_microstep: 1487.26 | bwd_inner_microstep: 1487.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 01:11:22,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.57 | bwd_microstep: 1354.81 | bwd_inner_microstep: 1354.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 01:11:24,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.21 | bwd_microstep: 1278.93 | bwd_inner_microstep: 1278.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3456
[2024-06-10 01:11:26,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1317.70 | bwd_inner_microstep: 1317.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.45
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511
[2024-06-10 01:11:27,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.31 | bwd_microstep: 1354.95 | bwd_inner_microstep: 1354.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 01:11:30,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.74 | bwd_microstep: 1483.91 | bwd_inner_microstep: 1483.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2155
[2024-06-10 01:11:31,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.39 | bwd_microstep: 952.42 | bwd_inner_microstep: 952.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-10 01:11:33,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.19 | bwd_microstep: 1338.40 | bwd_inner_microstep: 1338.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1947
[2024-06-10 01:11:34,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.13 | bwd_microstep: 852.00 | bwd_inner_microstep: 851.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511
[2024-06-10 01:11:36,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.19 | bwd_microstep: 1196.81 | bwd_inner_microstep: 1196.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 01:11:37,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1411.67 | bwd_inner_microstep: 1411.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291
[2024-06-10 01:11:39,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.53 | bwd_microstep: 882.00 | bwd_inner_microstep: 881.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3694
[2024-06-10 01:11:41,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1331.89 | bwd_inner_microstep: 1331.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 01:11:42,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.59 | bwd_microstep: 1303.49 | bwd_inner_microstep: 1303.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 01:11:44,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1491.79 | bwd_inner_microstep: 1491.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 01:11:46,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.99 | bwd_microstep: 1380.55 | bwd_inner_microstep: 1380.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 01:11:48,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.24 | bwd_microstep: 1395.45 | bwd_inner_microstep: 1395.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 01:11:50,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.12 | bwd_microstep: 1501.88 | bwd_inner_microstep: 1501.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558
[2024-06-10 01:11:52,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.66 | bwd_microstep: 1432.26 | bwd_inner_microstep: 1432.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3752
[2024-06-10 01:11:54,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.18 | bwd_microstep: 1444.29 | bwd_inner_microstep: 1444.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 01:11:56,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1259.20 | bwd_inner_microstep: 1259.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3588
[2024-06-10 01:11:58,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.18 | bwd_microstep: 1677.60 | bwd_inner_microstep: 1677.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 01:12:01,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.26 | bwd_microstep: 1548.70 | bwd_inner_microstep: 1548.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 01:12:02,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1344.01 | bwd_inner_microstep: 1343.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-10 01:12:07,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-10 01:12:07,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.08 | bwd_microstep: 3978.90 | bwd_inner_microstep: 1809.16 | bwd_allreduce_microstep: 2169.69 | step_microstep: 39.54
[2024-06-10 01:12:07,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16099.61 | bwd: 45334.17 | bwd_inner: 43163.45 | bwd_allreduce: 2169.97 | step: 44.03
{'loss': 1.2862, 'learning_rate': 2.153846153846154e-05, 'epoch': 0.02}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 01:12:09,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.54 | bwd_microstep: 1376.97 | bwd_inner_microstep: 1376.76 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-10 01:12:11,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1309.60 | bwd_inner_microstep: 1309.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2335
[2024-06-10 01:12:12,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.13 | bwd_microstep: 884.70 | bwd_inner_microstep: 884.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3419
[2024-06-10 01:12:14,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.11 | bwd_microstep: 1210.62 | bwd_inner_microstep: 1210.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 01:12:15,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.67 | bwd_microstep: 1280.86 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 01:12:18,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.71 | bwd_microstep: 1643.08 | bwd_inner_microstep: 1643.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 01:12:19,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.53 | bwd_microstep: 1153.52 | bwd_inner_microstep: 1153.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755
[2024-06-10 01:12:22,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.70 | bwd_microstep: 1640.51 | bwd_inner_microstep: 1640.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695
[2024-06-10 01:12:24,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.21 | bwd_microstep: 1460.00 | bwd_inner_microstep: 1459.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 01:12:25,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1347.05 | bwd_inner_microstep: 1347.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 01:12:27,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.96 | bwd_microstep: 1485.44 | bwd_inner_microstep: 1485.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1936
[2024-06-10 01:12:29,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.22 | bwd_microstep: 885.51 | bwd_inner_microstep: 885.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656
[2024-06-10 01:12:31,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1520.82 | bwd_inner_microstep: 1520.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 01:12:33,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.21 | bwd_microstep: 1486.33 | bwd_inner_microstep: 1486.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4006
[2024-06-10 01:12:35,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.71 | bwd_microstep: 1617.37 | bwd_inner_microstep: 1617.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551
[2024-06-10 01:12:37,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.18 | bwd_microstep: 1299.09 | bwd_inner_microstep: 1299.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 01:12:38,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.21 | bwd_microstep: 796.00 | bwd_inner_microstep: 795.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-10 01:12:40,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.37 | bwd_microstep: 1614.09 | bwd_inner_microstep: 1614.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 01:12:42,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1393.38 | bwd_inner_microstep: 1393.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 01:12:44,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.98 | bwd_microstep: 1289.44 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 01:12:46,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.52 | bwd_microstep: 1291.66 | bwd_inner_microstep: 1291.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3612
[2024-06-10 01:12:48,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1441.30 | bwd_inner_microstep: 1441.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 01:12:50,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.18 | bwd_microstep: 1285.05 | bwd_inner_microstep: 1285.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3720
[2024-06-10 01:12:51,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1290.58 | bwd_inner_microstep: 1290.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 01:12:53,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.03 | bwd_microstep: 1513.96 | bwd_inner_microstep: 1513.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2184
[2024-06-10 01:12:55,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.80 | bwd_microstep: 859.68 | bwd_inner_microstep: 859.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3592
[2024-06-10 01:12:57,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1441.18 | bwd_inner_microstep: 1441.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 01:12:59,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.22 | bwd_microstep: 1601.03 | bwd_inner_microstep: 1601.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3592
[2024-06-10 01:13:01,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.30 | bwd_microstep: 1459.77 | bwd_inner_microstep: 1459.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3683
[2024-06-10 01:13:03,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.04 | bwd_microstep: 1488.59 | bwd_inner_microstep: 1488.45 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3567
[2024-06-10 01:13:05,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.47 | bwd_microstep: 1696.96 | bwd_inner_microstep: 1696.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 01:13:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 01:13:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.07 | bwd_microstep: 1401.53 | bwd_inner_microstep: 1280.44 | bwd_allreduce_microstep: 121.04 | step_microstep: 38.76
[2024-06-10 01:13:07,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16240.07 | bwd: 43465.73 | bwd_inner: 43343.48 | bwd_allreduce: 121.45 | step: 41.01
{'loss': 1.3549, 'learning_rate': 2.230769230769231e-05, 'epoch': 0.02}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 01:13:09,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.08 | bwd_microstep: 1382.38 | bwd_inner_microstep: 1382.23 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.16
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3472
[2024-06-10 01:13:11,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.69 | bwd_microstep: 1216.11 | bwd_inner_microstep: 1216.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2394
[2024-06-10 01:13:12,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.58 | bwd_microstep: 1003.56 | bwd_inner_microstep: 1003.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 01:13:14,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.07 | bwd_microstep: 1314.31 | bwd_inner_microstep: 1314.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 01:13:16,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.23 | bwd_microstep: 1652.84 | bwd_inner_microstep: 1652.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 01:13:18,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.91 | bwd_microstep: 1384.36 | bwd_inner_microstep: 1384.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3436
[2024-06-10 01:13:20,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.31 | bwd_microstep: 1189.00 | bwd_inner_microstep: 1188.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2085
[2024-06-10 01:13:21,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.11 | bwd_microstep: 823.86 | bwd_inner_microstep: 823.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 01:13:23,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.69 | bwd_microstep: 1298.82 | bwd_inner_microstep: 1298.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3700
[2024-06-10 01:13:25,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.04 | bwd_microstep: 1656.90 | bwd_inner_microstep: 1656.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-10 01:13:26,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.69 | bwd_microstep: 704.40 | bwd_inner_microstep: 704.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 01:13:28,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1311.47 | bwd_inner_microstep: 1311.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 01:13:29,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.56 | bwd_microstep: 1157.50 | bwd_inner_microstep: 1157.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896
[2024-06-10 01:13:32,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.91 | bwd_microstep: 1596.26 | bwd_inner_microstep: 1596.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947
[2024-06-10 01:13:33,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.50 | bwd_microstep: 730.89 | bwd_inner_microstep: 730.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 01:13:34,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1287.27 | bwd_inner_microstep: 1287.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 01:13:36,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1354.94 | bwd_inner_microstep: 1354.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649
[2024-06-10 01:13:38,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1524.34 | bwd_inner_microstep: 1524.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 01:13:40,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.96 | bwd_microstep: 1295.09 | bwd_inner_microstep: 1295.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 01:13:42,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.82 | bwd_microstep: 1433.00 | bwd_inner_microstep: 1432.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-10 01:13:43,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.98 | bwd_microstep: 701.41 | bwd_inner_microstep: 701.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 01:13:45,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.47 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-10 01:13:47,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.80 | bwd_microstep: 1424.57 | bwd_inner_microstep: 1424.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675
[2024-06-10 01:13:49,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.05 | bwd_microstep: 1520.67 | bwd_inner_microstep: 1520.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 01:13:51,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1659.62 | bwd_inner_microstep: 1659.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 01:13:53,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1345.60 | bwd_inner_microstep: 1345.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 01:13:55,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1390.78 | bwd_inner_microstep: 1390.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3596
[2024-06-10 01:13:57,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.49 | bwd_microstep: 1338.95 | bwd_inner_microstep: 1338.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 01:13:59,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 1550.39 | bwd_inner_microstep: 1550.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3346
[2024-06-10 01:14:01,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.14 | bwd_microstep: 1465.13 | bwd_inner_microstep: 1465.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-10 01:14:03,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.23 | bwd_microstep: 1187.58 | bwd_inner_microstep: 1187.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 01:14:07,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.41 | optimizer_step: 6.62
[2024-06-10 01:14:07,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.70 | bwd_microstep: 3716.52 | bwd_inner_microstep: 1576.55 | bwd_allreduce_microstep: 2139.89 | step_microstep: 39.92
[2024-06-10 01:14:07,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15711.65 | bwd: 44026.82 | bwd_inner: 41885.85 | bwd_allreduce: 2140.19 | step: 42.19
{'loss': 1.3456, 'learning_rate': 2.3076923076923076e-05, 'epoch': 0.02}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3476
[2024-06-10 01:14:09,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.54 | bwd_microstep: 1325.56 | bwd_inner_microstep: 1325.40 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3903
[2024-06-10 01:14:11,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.82 | bwd_microstep: 1684.92 | bwd_inner_microstep: 1684.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 01:14:12,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.67 | bwd_microstep: 792.72 | bwd_inner_microstep: 792.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 01:14:14,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.45 | bwd_microstep: 1346.31 | bwd_inner_microstep: 1346.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 01:14:16,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.11 | bwd_microstep: 1391.13 | bwd_inner_microstep: 1391.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3742
[2024-06-10 01:14:18,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.04 | bwd_microstep: 1539.48 | bwd_inner_microstep: 1539.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 01:14:20,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.68 | bwd_microstep: 1413.85 | bwd_inner_microstep: 1413.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-10 01:14:22,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.82 | bwd_microstep: 1154.15 | bwd_inner_microstep: 1154.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2084
[2024-06-10 01:14:23,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.32 | bwd_microstep: 763.49 | bwd_inner_microstep: 763.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449
[2024-06-10 01:14:25,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.19 | bwd_microstep: 1194.23 | bwd_inner_microstep: 1194.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-10 01:14:26,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 1280.50 | bwd_inner_microstep: 1280.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3696
[2024-06-10 01:14:29,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.32 | bwd_microstep: 1624.01 | bwd_inner_microstep: 1623.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.27
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660
[2024-06-10 01:14:31,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.45 | bwd_microstep: 1721.79 | bwd_inner_microstep: 1721.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3461
[2024-06-10 01:14:33,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1425.70 | bwd_inner_microstep: 1425.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3445
[2024-06-10 01:14:35,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.29 | bwd_microstep: 1320.73 | bwd_inner_microstep: 1320.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 01:14:37,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1608.34 | bwd_inner_microstep: 1608.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 01:14:38,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.82 | bwd_microstep: 805.75 | bwd_inner_microstep: 805.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 01:14:40,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.04 | bwd_microstep: 1487.41 | bwd_inner_microstep: 1487.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3623
[2024-06-10 01:14:43,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.09 | bwd_microstep: 1709.91 | bwd_inner_microstep: 1709.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 01:14:44,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.70 | bwd_microstep: 1352.68 | bwd_inner_microstep: 1352.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2065
[2024-06-10 01:14:46,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.80 | bwd_microstep: 917.78 | bwd_inner_microstep: 917.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 01:14:48,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.66 | bwd_microstep: 1404.88 | bwd_inner_microstep: 1404.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 01:14:50,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1493.00 | bwd_inner_microstep: 1492.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 01:14:52,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.05 | bwd_microstep: 1259.46 | bwd_inner_microstep: 1259.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 01:14:53,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.19 | bwd_microstep: 1405.77 | bwd_inner_microstep: 1405.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-10 01:14:55,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.41 | bwd_microstep: 1300.68 | bwd_inner_microstep: 1300.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 01:14:57,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.95 | bwd_microstep: 1287.27 | bwd_inner_microstep: 1287.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2156
[2024-06-10 01:14:58,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.05 | bwd_microstep: 854.09 | bwd_inner_microstep: 854.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 01:15:00,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1405.19 | bwd_inner_microstep: 1405.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 01:15:02,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.47 | bwd_microstep: 1443.11 | bwd_inner_microstep: 1442.95 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-10 01:15:04,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.95 | bwd_microstep: 1602.43 | bwd_inner_microstep: 1602.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 01:15:07,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 01:15:07,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.56 | bwd_microstep: 2280.92 | bwd_inner_microstep: 1740.80 | bwd_allreduce_microstep: 540.05 | step_microstep: 38.88
[2024-06-10 01:15:07,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16068.21 | bwd: 43597.32 | bwd_inner: 43056.06 | bwd_allreduce: 540.39 | step: 41.30
{'loss': 1.3635, 'learning_rate': 2.384615384615385e-05, 'epoch': 0.02}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 01:15:09,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.07 | bwd_microstep: 1474.26 | bwd_inner_microstep: 1474.08 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.20
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2348
[2024-06-10 01:15:11,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.41 | bwd_microstep: 987.70 | bwd_inner_microstep: 987.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 01:15:12,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.48 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-10 01:15:14,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.25 | bwd_microstep: 1153.51 | bwd_inner_microstep: 1153.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1943
[2024-06-10 01:15:15,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.30 | bwd_microstep: 826.89 | bwd_inner_microstep: 826.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 01:15:17,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.25 | bwd_microstep: 1636.99 | bwd_inner_microstep: 1636.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 01:15:19,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.48 | bwd_microstep: 1254.18 | bwd_inner_microstep: 1254.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 01:15:21,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.75 | bwd_microstep: 1253.93 | bwd_inner_microstep: 1253.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 01:15:23,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.01 | bwd_microstep: 1384.31 | bwd_inner_microstep: 1384.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900
[2024-06-10 01:15:24,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.40 | bwd_microstep: 685.54 | bwd_inner_microstep: 685.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 01:15:26,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.27 | bwd_microstep: 1254.71 | bwd_inner_microstep: 1254.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 01:15:27,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.50 | bwd_microstep: 1318.06 | bwd_inner_microstep: 1318.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3495
[2024-06-10 01:15:29,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.51 | bwd_microstep: 1440.29 | bwd_inner_microstep: 1440.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 01:15:30,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.75 | bwd_microstep: 790.06 | bwd_inner_microstep: 790.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 01:15:33,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.64 | bwd_microstep: 1520.56 | bwd_inner_microstep: 1520.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2286
[2024-06-10 01:15:34,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.22 | bwd_microstep: 1071.24 | bwd_inner_microstep: 1071.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 01:15:36,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.55 | bwd_microstep: 1518.89 | bwd_inner_microstep: 1518.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 01:15:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1352.06 | bwd_inner_microstep: 1352.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 01:15:40,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.76 | bwd_microstep: 1394.61 | bwd_inner_microstep: 1394.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 01:15:42,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 1529.03 | bwd_inner_microstep: 1529.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2125
[2024-06-10 01:15:43,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.86 | bwd_microstep: 926.32 | bwd_inner_microstep: 926.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 01:15:45,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.57 | bwd_microstep: 1497.06 | bwd_inner_microstep: 1497.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 01:15:47,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.58 | bwd_microstep: 1284.80 | bwd_inner_microstep: 1284.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3670
[2024-06-10 01:15:49,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.67 | bwd_microstep: 1658.02 | bwd_inner_microstep: 1657.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 01:15:52,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.11 | bwd_microstep: 1498.86 | bwd_inner_microstep: 1498.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3826
[2024-06-10 01:15:53,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.58 | bwd_microstep: 1262.58 | bwd_inner_microstep: 1262.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4114
[2024-06-10 01:15:56,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 683.87 | bwd_microstep: 1848.93 | bwd_inner_microstep: 1848.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-10 01:15:58,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.66 | bwd_microstep: 1582.90 | bwd_inner_microstep: 1582.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3621
[2024-06-10 01:16:00,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.23 | bwd_microstep: 1710.35 | bwd_inner_microstep: 1710.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3563
[2024-06-10 01:16:03,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.11 | bwd_microstep: 1565.44 | bwd_inner_microstep: 1565.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 01:16:04,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.89 | bwd_microstep: 1247.70 | bwd_inner_microstep: 1247.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 01:16:11,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.44 | optimizer_step: 6.63
[2024-06-10 01:16:11,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.23 | bwd_microstep: 6025.09 | bwd_inner_microstep: 1683.92 | bwd_allreduce_microstep: 4341.10 | step_microstep: 40.28
[2024-06-10 01:16:11,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16006.30 | bwd: 47235.13 | bwd_inner: 42892.90 | bwd_allreduce: 4341.43 | step: 42.43
  2%|▏         | 26/1726 [32:42<29:12:06, 61.84s/it]
  2%|▏         | 27/1726 [33:42<28:58:11, 61.38s/it]
  2%|▏         | 28/1726 [34:44<29:00:52, 61.51s/it]
  2%|▏         | 29/1726 [35:44<28:47:47, 61.09s/it]
  2%|▏         | 30/1726 [36:44<28:38:35, 60.80s/it]
  2%|▏         | 31/1726 [37:44<28:31:24, 60.58s/it]
  2%|▏         | 32/1726 [38:48<28:56:16, 61.50s/it]
{'loss': 1.3544, 'learning_rate': 2.461538461538462e-05, 'epoch': 0.02}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 01:16:13,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.95 | bwd_microstep: 1393.68 | bwd_inner_microstep: 1393.52 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469
[2024-06-10 01:16:15,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.56 | bwd_microstep: 1240.56 | bwd_inner_microstep: 1240.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 01:16:16,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.72 | bwd_microstep: 1277.91 | bwd_inner_microstep: 1277.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-10 01:16:19,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.72 | bwd_microstep: 1653.79 | bwd_inner_microstep: 1653.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 01:16:20,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.29 | bwd_microstep: 1287.87 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 01:16:23,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1512.27 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3730
[2024-06-10 01:16:24,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.99 | bwd_microstep: 1337.22 | bwd_inner_microstep: 1337.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4111
[2024-06-10 01:16:27,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.25 | bwd_microstep: 1637.95 | bwd_inner_microstep: 1637.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 01:16:29,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.64 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 01:16:30,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.39 | bwd_microstep: 1292.43 | bwd_inner_microstep: 1292.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450
[2024-06-10 01:16:32,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.02 | bwd_microstep: 1288.84 | bwd_inner_microstep: 1288.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 01:16:34,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.45 | bwd_microstep: 1487.34 | bwd_inner_microstep: 1487.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424
[2024-06-10 01:16:36,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.36 | bwd_microstep: 1288.01 | bwd_inner_microstep: 1287.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299
[2024-06-10 01:16:37,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.14 | bwd_microstep: 989.81 | bwd_inner_microstep: 989.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 01:16:40,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.65 | bwd_microstep: 1614.91 | bwd_inner_microstep: 1614.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1934
[2024-06-10 01:16:41,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.69 | bwd_microstep: 819.93 | bwd_inner_microstep: 819.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-10 01:16:43,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.40 | bwd_microstep: 1531.24 | bwd_inner_microstep: 1531.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3639
[2024-06-10 01:16:45,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.19 | bwd_microstep: 1542.48 | bwd_inner_microstep: 1542.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 01:16:47,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1414.41 | bwd_inner_microstep: 1414.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627
[2024-06-10 01:16:49,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.12 | bwd_microstep: 1314.49 | bwd_inner_microstep: 1314.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 01:16:51,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.59 | bwd_microstep: 1284.62 | bwd_inner_microstep: 1284.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 01:16:53,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.86 | bwd_microstep: 1566.36 | bwd_inner_microstep: 1566.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 01:16:55,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.23 | bwd_microstep: 1500.54 | bwd_inner_microstep: 1500.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3531
[2024-06-10 01:16:57,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.23 | bwd_microstep: 1591.43 | bwd_inner_microstep: 1591.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-10 01:16:59,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.47 | bwd_microstep: 1438.56 | bwd_inner_microstep: 1438.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 01:17:01,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.11 | bwd_microstep: 1249.31 | bwd_inner_microstep: 1249.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 01:17:03,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.49 | bwd_microstep: 1555.55 | bwd_inner_microstep: 1555.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 01:17:05,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.08 | bwd_microstep: 1650.71 | bwd_inner_microstep: 1650.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3569
[2024-06-10 01:17:07,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.78 | bwd_microstep: 1547.56 | bwd_inner_microstep: 1547.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 01:17:09,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.38 | bwd_microstep: 1381.13 | bwd_inner_microstep: 1381.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-10 01:17:11,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.79 | bwd_microstep: 1442.03 | bwd_inner_microstep: 1442.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 01:17:13,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.19 | optimizer_step: 6.64
[2024-06-10 01:17:13,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.93 | bwd_microstep: 1447.00 | bwd_inner_microstep: 1439.17 | bwd_allreduce_microstep: 7.78 | step_microstep: 38.69
[2024-06-10 01:17:13,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16853.40 | bwd: 44973.08 | bwd_inner: 44964.28 | bwd_allreduce: 8.08 | step: 41.03
{'loss': 1.3859, 'learning_rate': 2.5384615384615386e-05, 'epoch': 0.02}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 01:17:15,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.56 | bwd_microstep: 1477.06 | bwd_inner_microstep: 1476.85 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3568
[2024-06-10 01:17:17,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.44 | bwd_microstep: 1206.96 | bwd_inner_microstep: 1206.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 01:17:19,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.23 | bwd_microstep: 1481.06 | bwd_inner_microstep: 1481.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 01:17:21,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1316.01 | bwd_inner_microstep: 1315.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 01:17:23,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1382.88 | bwd_inner_microstep: 1382.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 01:17:25,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.34 | bwd_microstep: 1387.65 | bwd_inner_microstep: 1387.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 01:17:27,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.35 | bwd_microstep: 1386.27 | bwd_inner_microstep: 1386.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1434
[2024-06-10 01:17:27,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 208.71 | bwd_microstep: 536.10 | bwd_inner_microstep: 536.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 01:17:29,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.48 | bwd_microstep: 1293.76 | bwd_inner_microstep: 1293.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3689
[2024-06-10 01:17:31,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.46 | bwd_microstep: 1423.92 | bwd_inner_microstep: 1423.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1928
[2024-06-10 01:17:32,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.59 | bwd_microstep: 824.33 | bwd_inner_microstep: 824.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505
[2024-06-10 01:17:34,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1349.50 | bwd_inner_microstep: 1349.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 01:17:36,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1350.50 | bwd_inner_microstep: 1350.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2324
[2024-06-10 01:17:37,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.23 | bwd_microstep: 891.12 | bwd_inner_microstep: 891.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 01:17:39,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1493.36 | bwd_inner_microstep: 1493.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 01:17:41,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 1374.03 | bwd_inner_microstep: 1374.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3726
[2024-06-10 01:17:43,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.55 | bwd_microstep: 1312.27 | bwd_inner_microstep: 1312.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 01:17:44,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.97 | bwd_microstep: 806.07 | bwd_inner_microstep: 806.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 01:17:46,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.64 | bwd_microstep: 1662.16 | bwd_inner_microstep: 1662.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3399
[2024-06-10 01:17:48,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1406.93 | bwd_inner_microstep: 1406.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 01:17:50,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.47 | bwd_microstep: 1502.93 | bwd_inner_microstep: 1502.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 01:17:52,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.07 | bwd_microstep: 976.86 | bwd_inner_microstep: 976.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 01:17:54,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1386.77 | bwd_inner_microstep: 1386.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 01:17:56,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1363.08 | bwd_inner_microstep: 1363.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807
[2024-06-10 01:17:57,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.47 | bwd_microstep: 1356.84 | bwd_inner_microstep: 1356.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 01:17:59,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.24 | bwd_microstep: 1281.62 | bwd_inner_microstep: 1281.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778
[2024-06-10 01:18:01,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.93 | bwd_microstep: 1355.06 | bwd_inner_microstep: 1355.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3710
[2024-06-10 01:18:03,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1400.86 | bwd_inner_microstep: 1400.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 01:18:05,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.22 | bwd_microstep: 1399.65 | bwd_inner_microstep: 1399.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2010
[2024-06-10 01:18:06,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.10 | bwd_microstep: 898.67 | bwd_inner_microstep: 898.47 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.22
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3622
[2024-06-10 01:18:09,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.18 | bwd_microstep: 1709.63 | bwd_inner_microstep: 1709.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 01:18:16,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.44 | optimizer_step: 6.58
[2024-06-10 01:18:16,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 7273.79 | bwd_inner_microstep: 1701.48 | bwd_allreduce_microstep: 5572.23 | step_microstep: 40.11
[2024-06-10 01:18:16,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15628.20 | bwd: 47267.74 | bwd_inner: 41694.20 | bwd_allreduce: 5572.72 | step: 42.44
{'loss': 1.3453, 'learning_rate': 2.6153846153846157e-05, 'epoch': 0.02}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942
[2024-06-10 01:18:17,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.33 | bwd_microstep: 721.08 | bwd_inner_microstep: 720.92 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-10 01:18:19,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.58 | bwd_microstep: 805.88 | bwd_inner_microstep: 805.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-10 01:18:21,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.91 | bwd_microstep: 1682.55 | bwd_inner_microstep: 1682.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-10 01:18:23,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.20 | bwd_microstep: 1461.54 | bwd_inner_microstep: 1461.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 01:18:24,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.63 | bwd_microstep: 813.51 | bwd_inner_microstep: 813.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 01:18:26,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1344.23 | bwd_inner_microstep: 1344.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 01:18:28,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.25 | bwd_microstep: 1645.28 | bwd_inner_microstep: 1645.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4102
[2024-06-10 01:18:31,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.58 | bwd_microstep: 1736.98 | bwd_inner_microstep: 1736.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750
[2024-06-10 01:18:33,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.72 | bwd_microstep: 1544.28 | bwd_inner_microstep: 1544.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3948
[2024-06-10 01:18:35,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.23 | bwd_microstep: 1603.21 | bwd_inner_microstep: 1603.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3619
[2024-06-10 01:18:37,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 1361.30 | bwd_inner_microstep: 1361.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3643
[2024-06-10 01:18:39,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1607.99 | bwd_inner_microstep: 1607.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674
[2024-06-10 01:18:41,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.05 | bwd_microstep: 1690.10 | bwd_inner_microstep: 1690.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 01:18:43,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.55 | bwd_microstep: 1392.00 | bwd_inner_microstep: 1391.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3514
[2024-06-10 01:18:45,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1517.75 | bwd_inner_microstep: 1517.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 01:18:48,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.44 | bwd_microstep: 1616.22 | bwd_inner_microstep: 1616.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 01:18:49,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.74 | bwd_microstep: 1297.52 | bwd_inner_microstep: 1297.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 01:18:51,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.52 | bwd_microstep: 1523.21 | bwd_inner_microstep: 1523.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2601
[2024-06-10 01:18:53,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.50 | bwd_microstep: 999.57 | bwd_inner_microstep: 999.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3602
[2024-06-10 01:18:55,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 1309.51 | bwd_inner_microstep: 1309.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540
[2024-06-10 01:18:57,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.00 | bwd_microstep: 1336.22 | bwd_inner_microstep: 1336.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3718
[2024-06-10 01:18:58,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.67 | bwd_microstep: 1246.38 | bwd_inner_microstep: 1246.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 01:19:00,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1256.85 | bwd_inner_microstep: 1256.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 01:19:02,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.63 | bwd_microstep: 1608.09 | bwd_inner_microstep: 1608.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 01:19:04,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.73 | bwd_microstep: 1406.04 | bwd_inner_microstep: 1406.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 01:19:06,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.68 | bwd_microstep: 1362.42 | bwd_inner_microstep: 1362.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 01:19:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.75 | bwd_microstep: 1538.83 | bwd_inner_microstep: 1538.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053
[2024-06-10 01:19:09,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.29 | bwd_microstep: 918.93 | bwd_inner_microstep: 918.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3419
[2024-06-10 01:19:11,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.76 | bwd_microstep: 1396.29 | bwd_inner_microstep: 1396.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3418
[2024-06-10 01:19:13,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.22 | bwd_microstep: 1315.50 | bwd_inner_microstep: 1315.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3572
[2024-06-10 01:19:16,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.20 | bwd_microstep: 1700.78 | bwd_inner_microstep: 1700.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 01:19:18,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.18 | optimizer_step: 6.58
[2024-06-10 01:19:18,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.73 | bwd_microstep: 2362.67 | bwd_inner_microstep: 1503.13 | bwd_allreduce_microstep: 859.49 | step_microstep: 39.07
[2024-06-10 01:19:18,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16484.28 | bwd: 45122.75 | bwd_inner: 44262.23 | bwd_allreduce: 859.77 | step: 41.05
{'loss': 1.4121, 'learning_rate': 2.6923076923076927e-05, 'epoch': 0.02}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 01:19:20,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.51 | bwd_microstep: 1243.88 | bwd_inner_microstep: 1243.74 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 01:19:22,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.30 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837
[2024-06-10 01:19:24,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.73 | bwd_microstep: 1659.57 | bwd_inner_microstep: 1659.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2047
[2024-06-10 01:19:25,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.26 | bwd_microstep: 847.47 | bwd_inner_microstep: 847.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 01:19:27,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1388.82 | bwd_inner_microstep: 1388.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 01:19:29,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | bwd_microstep: 1384.00 | bwd_inner_microstep: 1383.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 01:19:30,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.03 | bwd_microstep: 800.07 | bwd_inner_microstep: 800.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870
[2024-06-10 01:19:31,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.50 | bwd_microstep: 679.27 | bwd_inner_microstep: 679.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483
[2024-06-10 01:19:33,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1316.24 | bwd_inner_microstep: 1316.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2196
[2024-06-10 01:19:34,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.58 | bwd_microstep: 909.56 | bwd_inner_microstep: 909.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 01:19:36,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.79 | bwd_microstep: 1280.50 | bwd_inner_microstep: 1280.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 01:19:38,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.16 | bwd_microstep: 1383.79 | bwd_inner_microstep: 1383.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 01:19:40,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.86 | bwd_microstep: 1484.38 | bwd_inner_microstep: 1484.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 01:19:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1481.13 | bwd_inner_microstep: 1481.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-10 01:19:44,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.20 | bwd_microstep: 1431.28 | bwd_inner_microstep: 1431.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498
[2024-06-10 01:19:46,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.10 | bwd_microstep: 1446.45 | bwd_inner_microstep: 1446.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2060
[2024-06-10 01:19:47,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.93 | bwd_microstep: 913.68 | bwd_inner_microstep: 913.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-10 01:19:49,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1423.77 | bwd_inner_microstep: 1423.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 01:19:51,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 1295.67 | bwd_inner_microstep: 1295.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 01:19:53,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.08 | bwd_microstep: 1614.72 | bwd_inner_microstep: 1614.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3633
[2024-06-10 01:19:55,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.25 | bwd_microstep: 1266.42 | bwd_inner_microstep: 1266.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 01:19:57,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1287.70 | bwd_inner_microstep: 1287.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 01:19:59,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.39 | bwd_microstep: 1630.00 | bwd_inner_microstep: 1629.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3820
[2024-06-10 01:20:01,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.79 | bwd_microstep: 1609.45 | bwd_inner_microstep: 1609.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 01:20:03,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 1504.39 | bwd_inner_microstep: 1504.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 01:20:05,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.12 | bwd_microstep: 1357.75 | bwd_inner_microstep: 1357.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 01:20:08,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.05 | bwd_microstep: 1653.18 | bwd_inner_microstep: 1653.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 01:20:10,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1551.64 | bwd_inner_microstep: 1551.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2935
[2024-06-10 01:20:11,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.05 | bwd_microstep: 1038.60 | bwd_inner_microstep: 1038.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3579
[2024-06-10 01:20:13,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.34 | bwd_microstep: 1454.49 | bwd_inner_microstep: 1454.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3573
[2024-06-10 01:20:15,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.82 | bwd_microstep: 1451.95 | bwd_inner_microstep: 1451.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 01:20:19,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.05 | optimizer_gradients: 4.41 | optimizer_step: 6.60
[2024-06-10 01:20:19,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.54 | bwd_microstep: 2847.73 | bwd_inner_microstep: 1740.50 | bwd_allreduce_microstep: 1107.15 | step_microstep: 76.25
[2024-06-10 01:20:19,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15967.09 | bwd: 43888.14 | bwd_inner: 42779.92 | bwd_allreduce: 1107.45 | step: 78.20
{'loss': 1.3533, 'learning_rate': 2.7692307692307694e-05, 'epoch': 0.02}
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1867
[2024-06-10 01:20:20,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.59 | bwd_microstep: 735.80 | bwd_inner_microstep: 735.68 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3967
[2024-06-10 01:20:22,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.67 | bwd_microstep: 1498.06 | bwd_inner_microstep: 1498.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 01:20:24,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.85 | bwd_microstep: 1553.51 | bwd_inner_microstep: 1553.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2461
[2024-06-10 01:20:25,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.45 | bwd_microstep: 951.55 | bwd_inner_microstep: 951.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 01:20:27,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.18 | bwd_microstep: 1251.70 | bwd_inner_microstep: 1251.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4072
[2024-06-10 01:20:29,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.56 | bwd_microstep: 1622.93 | bwd_inner_microstep: 1622.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 01:20:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1287.64 | bwd_inner_microstep: 1287.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3716
[2024-06-10 01:20:33,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.78 | bwd_microstep: 1335.31 | bwd_inner_microstep: 1335.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 01:20:34,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.22 | bwd_microstep: 795.54 | bwd_inner_microstep: 795.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 01:20:36,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1245.51 | bwd_inner_microstep: 1245.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 01:20:38,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.73 | bwd_microstep: 1323.63 | bwd_inner_microstep: 1323.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3476
[2024-06-10 01:20:40,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.88 | bwd_microstep: 1578.35 | bwd_inner_microstep: 1578.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 01:20:42,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.95 | bwd_microstep: 1484.26 | bwd_inner_microstep: 1484.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3509
[2024-06-10 01:20:44,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.76 | bwd_microstep: 1536.07 | bwd_inner_microstep: 1536.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3516
[2024-06-10 01:20:46,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.48 | bwd_microstep: 1320.18 | bwd_inner_microstep: 1320.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2418
[2024-06-10 01:20:47,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.08 | bwd_microstep: 1005.64 | bwd_inner_microstep: 1005.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1627
[2024-06-10 01:20:48,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 229.06 | bwd_microstep: 584.13 | bwd_inner_microstep: 584.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 01:20:50,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.72 | bwd_microstep: 1251.42 | bwd_inner_microstep: 1251.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-10 01:20:52,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.18 | bwd_microstep: 1422.06 | bwd_inner_microstep: 1422.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1989
[2024-06-10 01:20:53,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.58 | bwd_microstep: 739.04 | bwd_inner_microstep: 739.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 01:20:55,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.13 | bwd_microstep: 1351.89 | bwd_inner_microstep: 1351.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 01:20:56,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.43 | bwd_microstep: 823.04 | bwd_inner_microstep: 823.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3544
[2024-06-10 01:20:58,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 1545.50 | bwd_inner_microstep: 1545.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817
[2024-06-10 01:21:00,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1388.18 | bwd_inner_microstep: 1388.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 01:21:01,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.56 | bwd_microstep: 816.69 | bwd_inner_microstep: 816.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 01:21:02,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.23 | bwd_microstep: 979.19 | bwd_inner_microstep: 979.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-10 01:21:05,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.61 | bwd_microstep: 1657.49 | bwd_inner_microstep: 1657.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 01:21:06,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.56 | bwd_microstep: 800.73 | bwd_inner_microstep: 800.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-10 01:21:08,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1555.51 | bwd_inner_microstep: 1555.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032
[2024-06-10 01:21:09,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.30 | bwd_microstep: 811.79 | bwd_inner_microstep: 811.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2057
[2024-06-10 01:21:10,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.63 | bwd_microstep: 1011.70 | bwd_inner_microstep: 1011.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3560
[2024-06-10 01:21:19,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.44 | optimizer_step: 6.60
[2024-06-10 01:21:19,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.48 | bwd_microstep: 8087.64 | bwd_inner_microstep: 1806.14 | bwd_allreduce_microstep: 6281.42 | step_microstep: 40.08
[2024-06-10 01:21:19,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14588.08 | bwd: 45351.69 | bwd_inner: 39069.24 | bwd_allreduce: 6281.71 | step: 42.37
{'loss': 1.2746, 'learning_rate': 2.8461538461538464e-05, 'epoch': 0.02}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418
[2024-06-10 01:21:21,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.24 | bwd_microstep: 1275.98 | bwd_inner_microstep: 1275.83 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3974
[2024-06-10 01:21:23,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.62 | bwd_microstep: 1601.47 | bwd_inner_microstep: 1601.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 01:21:25,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1280.98 | bwd_inner_microstep: 1280.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567
[2024-06-10 01:21:27,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.92 | bwd_microstep: 1300.96 | bwd_inner_microstep: 1300.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2047
[2024-06-10 01:21:28,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.75 | bwd_microstep: 751.18 | bwd_inner_microstep: 751.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 01:21:30,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1645.45 | bwd_inner_microstep: 1645.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 01:21:32,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.42 | bwd_microstep: 1540.89 | bwd_inner_microstep: 1540.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 01:21:34,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.18 | bwd_microstep: 1389.82 | bwd_inner_microstep: 1389.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 843
[2024-06-10 01:21:34,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 134.50 | bwd_microstep: 347.85 | bwd_inner_microstep: 347.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 01:21:36,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1248.58 | bwd_inner_microstep: 1248.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 01:21:38,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.17 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 01:21:40,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.50 | bwd_microstep: 1492.95 | bwd_inner_microstep: 1492.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1924
[2024-06-10 01:21:41,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.28 | bwd_microstep: 838.27 | bwd_inner_microstep: 838.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795
[2024-06-10 01:21:43,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.32 | bwd_microstep: 1543.02 | bwd_inner_microstep: 1542.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 01:21:45,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1414.83 | bwd_inner_microstep: 1414.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3652
[2024-06-10 01:21:48,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.49 | bwd_microstep: 1820.66 | bwd_inner_microstep: 1820.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3644
[2024-06-10 01:21:50,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.87 | bwd_microstep: 1709.77 | bwd_inner_microstep: 1709.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2414
[2024-06-10 01:21:52,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.64 | bwd_microstep: 940.11 | bwd_inner_microstep: 940.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-10 01:21:53,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.84 | bwd_microstep: 1164.29 | bwd_inner_microstep: 1164.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 01:21:55,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.03 | bwd_microstep: 1495.58 | bwd_inner_microstep: 1495.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2057
[2024-06-10 01:21:57,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.09 | bwd_microstep: 961.21 | bwd_inner_microstep: 961.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3830
[2024-06-10 01:21:59,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.86 | bwd_microstep: 1588.55 | bwd_inner_microstep: 1588.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3525
[2024-06-10 01:22:00,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.04 | bwd_microstep: 1198.38 | bwd_inner_microstep: 1198.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 01:22:02,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.53 | bwd_microstep: 980.66 | bwd_inner_microstep: 980.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 01:22:04,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.70 | bwd_microstep: 1283.91 | bwd_inner_microstep: 1283.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446
[2024-06-10 01:22:05,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.21 | bwd_microstep: 1288.24 | bwd_inner_microstep: 1288.05 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.24
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2072
[2024-06-10 01:22:07,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.74 | bwd_microstep: 880.44 | bwd_inner_microstep: 880.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3615
[2024-06-10 01:22:09,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.08 | bwd_microstep: 1541.39 | bwd_inner_microstep: 1541.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3381
[2024-06-10 01:22:10,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.67 | bwd_microstep: 1277.12 | bwd_inner_microstep: 1277.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3051
[2024-06-10 01:22:12,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.81 | bwd_microstep: 1137.42 | bwd_inner_microstep: 1137.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398
[2024-06-10 01:22:14,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.83 | bwd_microstep: 1446.18 | bwd_inner_microstep: 1446.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442
[2024-06-10 01:22:22,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.41 | optimizer_step: 6.61
[2024-06-10 01:22:22,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 7451.84 | bwd_inner_microstep: 1896.57 | bwd_allreduce_microstep: 5555.19 | step_microstep: 39.98
[2024-06-10 01:22:22,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15451.08 | bwd: 47216.07 | bwd_inner: 41659.65 | bwd_allreduce: 5555.61 | step: 42.23
{'loss': 1.323, 'learning_rate': 2.923076923076923e-05, 'epoch': 0.02}


  2%|▏         | 32/1726 [38:48<28:56:16, 61.50s/it]
  2%|▏         | 33/1726 [39:50<29:01:20, 61.71s/it]
  2%|▏         | 34/1726 [40:53<29:13:34, 62.18s/it]
  2%|▏         | 35/1726 [41:55<29:11:02, 62.13s/it]
  2%|▏         | 36/1726 [42:55<28:54:18, 61.57s/it]
  2%|▏         | 37/1726 [43:56<28:42:41, 61.20s/it]
  2%|▏         | 38/1726 [44:59<28:57:20, 61.75s/it]
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2752
[2024-06-10 01:22:24,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.03 | bwd_microstep: 1070.24 | bwd_inner_microstep: 1070.09 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3890
[2024-06-10 01:22:26,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.09 | bwd_microstep: 1386.75 | bwd_inner_microstep: 1386.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 01:22:27,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.44 | bwd_microstep: 1376.37 | bwd_inner_microstep: 1376.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 01:22:30,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.90 | bwd_microstep: 1537.82 | bwd_inner_microstep: 1537.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 01:22:31,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.32 | bwd_microstep: 1149.94 | bwd_inner_microstep: 1149.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-10 01:22:32,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.19 | bwd_microstep: 681.67 | bwd_inner_microstep: 681.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238
[2024-06-10 01:22:33,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.11 | bwd_microstep: 961.46 | bwd_inner_microstep: 961.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 01:22:35,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.51 | bwd_microstep: 1389.85 | bwd_inner_microstep: 1389.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 01:22:37,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1257.67 | bwd_inner_microstep: 1257.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812
[2024-06-10 01:22:39,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.92 | bwd_microstep: 1716.17 | bwd_inner_microstep: 1716.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 01:22:41,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.03 | bwd_microstep: 1376.44 | bwd_inner_microstep: 1376.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3664
[2024-06-10 01:22:44,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.23 | bwd_microstep: 1583.99 | bwd_inner_microstep: 1583.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3481
[2024-06-10 01:22:45,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.32 | bwd_microstep: 1232.59 | bwd_inner_microstep: 1232.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 01:22:47,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.45 | bwd_microstep: 1394.65 | bwd_inner_microstep: 1394.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 01:22:49,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.94 | bwd_microstep: 1488.54 | bwd_inner_microstep: 1488.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-10 01:22:51,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.86 | bwd_microstep: 1622.50 | bwd_inner_microstep: 1622.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 01:22:53,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.71 | bwd_microstep: 1419.21 | bwd_inner_microstep: 1419.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 01:22:55,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1380.02 | bwd_inner_microstep: 1379.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3518
[2024-06-10 01:22:57,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.10 | bwd_microstep: 1416.54 | bwd_inner_microstep: 1416.38 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.22
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-10 01:22:59,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.22 | bwd_microstep: 977.50 | bwd_inner_microstep: 977.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 01:23:01,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.04 | bwd_microstep: 1501.65 | bwd_inner_microstep: 1501.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 01:23:03,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1282.92 | bwd_inner_microstep: 1282.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-10 01:23:05,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.92 | bwd_microstep: 1437.26 | bwd_inner_microstep: 1437.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3609
[2024-06-10 01:23:06,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1249.01 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 01:23:08,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.52 | bwd_microstep: 1285.96 | bwd_inner_microstep: 1285.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 01:23:10,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.79 | bwd_microstep: 1191.85 | bwd_inner_microstep: 1191.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732
[2024-06-10 01:23:12,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.13 | bwd_microstep: 1534.00 | bwd_inner_microstep: 1533.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298
[2024-06-10 01:23:13,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.21 | bwd_microstep: 979.10 | bwd_inner_microstep: 979.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741
[2024-06-10 01:23:15,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.97 | bwd_microstep: 1533.70 | bwd_inner_microstep: 1533.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 01:23:18,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.20 | bwd_microstep: 1651.50 | bwd_inner_microstep: 1651.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3771
[2024-06-10 01:23:20,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.96 | bwd_microstep: 1696.27 | bwd_inner_microstep: 1696.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 01:23:23,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 01:23:23,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.91 | bwd_microstep: 2031.44 | bwd_inner_microstep: 1666.25 | bwd_allreduce_microstep: 365.13 | step_microstep: 39.09
[2024-06-10 01:23:23,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16269.88 | bwd: 43794.64 | bwd_inner: 43428.30 | bwd_allreduce: 365.53 | step: 41.52
{'loss': 1.3599, 'learning_rate': 3.0000000000000004e-05, 'epoch': 0.02}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 01:23:24,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.94 | bwd_microstep: 1388.20 | bwd_inner_microstep: 1387.99 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904
[2024-06-10 01:23:27,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.08 | bwd_microstep: 1588.65 | bwd_inner_microstep: 1588.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4270
[2024-06-10 01:23:29,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.47 | bwd_microstep: 1772.59 | bwd_inner_microstep: 1772.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 01:23:31,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.59 | bwd_microstep: 1243.76 | bwd_inner_microstep: 1243.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-10 01:23:32,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.14 | bwd_microstep: 683.52 | bwd_inner_microstep: 683.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2634
[2024-06-10 01:23:33,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.31 | bwd_microstep: 1113.95 | bwd_inner_microstep: 1113.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2189
[2024-06-10 01:23:35,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.39 | bwd_microstep: 954.64 | bwd_inner_microstep: 954.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 01:23:37,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.88 | bwd_microstep: 1406.61 | bwd_inner_microstep: 1406.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3506
[2024-06-10 01:23:38,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.26 | bwd_microstep: 1196.38 | bwd_inner_microstep: 1196.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 01:23:40,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.66 | bwd_microstep: 1154.42 | bwd_inner_microstep: 1154.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3020
[2024-06-10 01:23:42,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.27 | bwd_microstep: 1230.80 | bwd_inner_microstep: 1230.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2116
[2024-06-10 01:23:43,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.37 | bwd_microstep: 829.34 | bwd_inner_microstep: 829.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 01:23:45,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.55 | bwd_microstep: 1413.75 | bwd_inner_microstep: 1413.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3656
[2024-06-10 01:23:47,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.75 | bwd_microstep: 1718.15 | bwd_inner_microstep: 1718.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482
[2024-06-10 01:23:49,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.08 | bwd_microstep: 1575.50 | bwd_inner_microstep: 1575.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465
[2024-06-10 01:23:51,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1347.27 | bwd_inner_microstep: 1347.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 01:23:53,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.68 | bwd_microstep: 1513.31 | bwd_inner_microstep: 1513.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1968
[2024-06-10 01:23:54,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.48 | bwd_microstep: 825.15 | bwd_inner_microstep: 825.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2017
[2024-06-10 01:23:56,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.44 | bwd_microstep: 841.06 | bwd_inner_microstep: 841.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2289
[2024-06-10 01:23:57,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.50 | bwd_microstep: 1009.85 | bwd_inner_microstep: 1009.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 01:23:59,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.52 | bwd_microstep: 1402.70 | bwd_inner_microstep: 1402.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 01:24:01,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.37 | bwd_microstep: 1648.19 | bwd_inner_microstep: 1648.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 01:24:03,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1296.89 | bwd_inner_microstep: 1296.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 01:24:05,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1350.33 | bwd_inner_microstep: 1350.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587
[2024-06-10 01:24:08,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.94 | bwd_microstep: 2326.93 | bwd_inner_microstep: 2326.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065
[2024-06-10 01:24:09,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.56 | bwd_microstep: 818.58 | bwd_inner_microstep: 818.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 01:24:11,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.97 | bwd_microstep: 1405.66 | bwd_inner_microstep: 1405.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2752
[2024-06-10 01:24:12,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 434.50 | bwd_microstep: 1179.47 | bwd_inner_microstep: 1179.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-10 01:24:14,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.58 | bwd_microstep: 1426.90 | bwd_inner_microstep: 1426.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 01:24:16,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.11 | bwd_microstep: 1492.96 | bwd_inner_microstep: 1492.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-10 01:24:19,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.08 | bwd_microstep: 1639.04 | bwd_inner_microstep: 1639.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3565
[2024-06-10 01:24:25,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.39 | optimizer_step: 6.56
[2024-06-10 01:24:25,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.47 | bwd_microstep: 5609.02 | bwd_inner_microstep: 1735.78 | bwd_allreduce_microstep: 3873.16 | step_microstep: 40.07
[2024-06-10 01:24:25,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15519.28 | bwd: 46403.62 | bwd_inner: 42529.30 | bwd_allreduce: 3873.49 | step: 42.39
{'loss': 1.3784, 'learning_rate': 3.0769230769230774e-05, 'epoch': 0.02}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 01:24:27,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.93 | bwd_microstep: 1478.84 | bwd_inner_microstep: 1478.69 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2499
[2024-06-10 01:24:28,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.91 | bwd_microstep: 834.69 | bwd_inner_microstep: 834.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 01:24:30,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.08 | bwd_microstep: 1151.09 | bwd_inner_microstep: 1151.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3834
[2024-06-10 01:24:32,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.70 | bwd_microstep: 1617.44 | bwd_inner_microstep: 1617.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2250
[2024-06-10 01:24:33,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.93 | bwd_microstep: 966.88 | bwd_inner_microstep: 966.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 01:24:35,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.66 | bwd_microstep: 1285.08 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 01:24:37,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1249.72 | bwd_inner_microstep: 1249.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 01:24:39,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1391.36 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 01:24:41,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1391.29 | bwd_inner_microstep: 1391.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3694
[2024-06-10 01:24:43,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.51 | bwd_microstep: 1580.14 | bwd_inner_microstep: 1580.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3432
[2024-06-10 01:24:44,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.24 | bwd_microstep: 1193.04 | bwd_inner_microstep: 1193.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1887
[2024-06-10 01:24:45,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.60 | bwd_microstep: 682.47 | bwd_inner_microstep: 682.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009
[2024-06-10 01:24:47,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.09 | bwd_microstep: 903.80 | bwd_inner_microstep: 903.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2397
[2024-06-10 01:24:48,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.34 | bwd_microstep: 939.30 | bwd_inner_microstep: 939.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 01:24:50,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.39 | bwd_microstep: 1354.42 | bwd_inner_microstep: 1354.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466
[2024-06-10 01:24:52,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.09 | bwd_microstep: 1569.72 | bwd_inner_microstep: 1569.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 01:24:54,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.72 | bwd_microstep: 1291.40 | bwd_inner_microstep: 1291.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593
[2024-06-10 01:24:56,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.92 | bwd_microstep: 1344.08 | bwd_inner_microstep: 1344.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-10 01:24:58,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.89 | bwd_microstep: 1530.77 | bwd_inner_microstep: 1530.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3468
[2024-06-10 01:24:59,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.12 | bwd_microstep: 1190.06 | bwd_inner_microstep: 1190.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1923
[2024-06-10 01:25:00,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.82 | bwd_microstep: 731.16 | bwd_inner_microstep: 731.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 01:25:02,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.72 | bwd_microstep: 1298.69 | bwd_inner_microstep: 1298.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626
[2024-06-10 01:25:04,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.42 | bwd_microstep: 1615.10 | bwd_inner_microstep: 1615.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 01:25:06,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1402.16 | bwd_inner_microstep: 1402.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3527
[2024-06-10 01:25:08,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1357.14 | bwd_inner_microstep: 1357.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2047
[2024-06-10 01:25:10,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.97 | bwd_microstep: 1004.71 | bwd_inner_microstep: 1004.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2290
[2024-06-10 01:25:11,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.19 | bwd_microstep: 881.02 | bwd_inner_microstep: 881.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746
[2024-06-10 01:25:13,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.47 | bwd_microstep: 1448.86 | bwd_inner_microstep: 1448.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2275
[2024-06-10 01:25:14,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.25 | bwd_microstep: 813.40 | bwd_inner_microstep: 813.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3616
[2024-06-10 01:25:16,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.02 | bwd_microstep: 1710.28 | bwd_inner_microstep: 1710.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 01:25:18,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1402.57 | bwd_inner_microstep: 1402.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3802
[2024-06-10 01:25:25,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.42 | optimizer_step: 6.56
[2024-06-10 01:25:25,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.56 | bwd_microstep: 5712.56 | bwd_inner_microstep: 1985.05 | bwd_allreduce_microstep: 3727.44 | step_microstep: 40.11
[2024-06-10 01:25:25,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15196.97 | bwd: 44323.33 | bwd_inner: 40594.71 | bwd_allreduce: 3727.77 | step: 42.43
{'loss': 1.2985, 'learning_rate': 3.153846153846154e-05, 'epoch': 0.02}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 01:25:27,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.55 | bwd_microstep: 1480.86 | bwd_inner_microstep: 1480.71 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 01:25:29,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.09 | bwd_microstep: 1494.59 | bwd_inner_microstep: 1494.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 01:25:31,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.90 | bwd_microstep: 1498.01 | bwd_inner_microstep: 1497.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 01:25:33,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.78 | bwd_microstep: 1150.04 | bwd_inner_microstep: 1150.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493
[2024-06-10 01:25:35,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1415.97 | bwd_inner_microstep: 1415.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-10 01:25:35,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.79 | bwd_microstep: 686.77 | bwd_inner_microstep: 686.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 01:25:38,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.87 | bwd_microstep: 1539.59 | bwd_inner_microstep: 1539.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 01:25:40,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1384.26 | bwd_inner_microstep: 1384.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 01:25:41,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1393.33 | bwd_inner_microstep: 1393.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 01:25:43,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1497.27 | bwd_inner_microstep: 1497.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3688
[2024-06-10 01:25:46,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.13 | bwd_microstep: 1661.77 | bwd_inner_microstep: 1661.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 01:25:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.79 | bwd_microstep: 1285.84 | bwd_inner_microstep: 1285.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2964
[2024-06-10 01:25:49,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.06 | bwd_microstep: 1198.08 | bwd_inner_microstep: 1198.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-10 01:25:51,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.66 | bwd_microstep: 1282.75 | bwd_inner_microstep: 1282.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3673
[2024-06-10 01:25:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.89 | bwd_microstep: 1478.64 | bwd_inner_microstep: 1478.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 01:25:55,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.71 | bwd_microstep: 1486.21 | bwd_inner_microstep: 1486.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 01:25:57,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.83 | bwd_microstep: 1543.36 | bwd_inner_microstep: 1543.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 01:25:59,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1349.87 | bwd_inner_microstep: 1349.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-10 01:26:01,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.77 | bwd_microstep: 1505.02 | bwd_inner_microstep: 1504.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565
[2024-06-10 01:26:03,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.09 | bwd_microstep: 1595.86 | bwd_inner_microstep: 1595.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3629
[2024-06-10 01:26:05,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1473.96 | bwd_inner_microstep: 1473.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-10 01:26:07,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.12 | bwd_microstep: 978.85 | bwd_inner_microstep: 978.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 01:26:09,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.67 | bwd_microstep: 1315.44 | bwd_inner_microstep: 1315.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2287
[2024-06-10 01:26:10,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.17 | bwd_microstep: 912.78 | bwd_inner_microstep: 912.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3612
[2024-06-10 01:26:12,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.85 | bwd_microstep: 1441.08 | bwd_inner_microstep: 1441.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2019
[2024-06-10 01:26:13,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.04 | bwd_microstep: 718.64 | bwd_inner_microstep: 718.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 01:26:15,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.19 | bwd_microstep: 1320.54 | bwd_inner_microstep: 1320.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562
[2024-06-10 01:26:17,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.79 | bwd_microstep: 1427.34 | bwd_inner_microstep: 1427.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 01:26:19,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.58 | bwd_microstep: 1362.85 | bwd_inner_microstep: 1362.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 01:26:21,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.43 | bwd_microstep: 1453.66 | bwd_inner_microstep: 1453.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 01:26:22,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1346.46 | bwd_inner_microstep: 1346.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1032
[2024-06-10 01:26:26,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.41 | optimizer_step: 6.62
[2024-06-10 01:26:26,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 156.94 | bwd_microstep: 3482.71 | bwd_inner_microstep: 525.85 | bwd_allreduce_microstep: 2956.79 | step_microstep: 41.34
[2024-06-10 01:26:26,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15776.33 | bwd: 45162.44 | bwd_inner: 42204.60 | bwd_allreduce: 2957.09 | step: 44.15
{'loss': 1.3171, 'learning_rate': 3.230769230769231e-05, 'epoch': 0.02}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 01:26:28,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.91 | bwd_microstep: 1278.31 | bwd_inner_microstep: 1278.16 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3467
[2024-06-10 01:26:30,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1332.50 | bwd_inner_microstep: 1332.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 01:26:32,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.17 | bwd_microstep: 1660.52 | bwd_inner_microstep: 1660.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3787
[2024-06-10 01:26:34,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1547.32 | bwd_inner_microstep: 1547.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 01:26:36,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.31 | bwd_microstep: 1400.36 | bwd_inner_microstep: 1400.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 01:26:38,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.34 | bwd_microstep: 1387.65 | bwd_inner_microstep: 1387.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3708
[2024-06-10 01:26:40,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.61 | bwd_microstep: 1235.99 | bwd_inner_microstep: 1235.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 01:26:42,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.62 | bwd_microstep: 1281.27 | bwd_inner_microstep: 1281.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 01:26:43,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.47 | bwd_microstep: 793.50 | bwd_inner_microstep: 793.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3491
[2024-06-10 01:26:45,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.77 | bwd_microstep: 1437.90 | bwd_inner_microstep: 1437.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2374
[2024-06-10 01:26:46,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.37 | bwd_microstep: 1030.48 | bwd_inner_microstep: 1030.37 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.17
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3496
[2024-06-10 01:26:48,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.80 | bwd_microstep: 1680.47 | bwd_inner_microstep: 1680.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 01:26:50,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.30 | bwd_microstep: 1342.56 | bwd_inner_microstep: 1342.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 01:26:52,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.54 | bwd_microstep: 1457.37 | bwd_inner_microstep: 1457.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 01:26:54,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1285.15 | bwd_inner_microstep: 1285.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 01:26:56,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1256.93 | bwd_inner_microstep: 1256.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 01:26:58,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1391.88 | bwd_inner_microstep: 1391.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 01:26:59,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.97 | bwd_microstep: 1290.89 | bwd_inner_microstep: 1290.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 01:27:01,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.99 | bwd_microstep: 1284.18 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 01:27:03,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.06 | bwd_microstep: 1493.20 | bwd_inner_microstep: 1493.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1937
[2024-06-10 01:27:04,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.46 | bwd_microstep: 730.05 | bwd_inner_microstep: 730.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 01:27:06,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1512.41 | bwd_inner_microstep: 1512.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3427
[2024-06-10 01:27:08,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1428.12 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1985
[2024-06-10 01:27:10,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.31 | bwd_microstep: 896.48 | bwd_inner_microstep: 896.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 01:27:12,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.86 | bwd_microstep: 1589.10 | bwd_inner_microstep: 1589.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379
[2024-06-10 01:27:14,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.97 | bwd_microstep: 1436.98 | bwd_inner_microstep: 1436.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1909
[2024-06-10 01:27:15,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.94 | bwd_microstep: 717.02 | bwd_inner_microstep: 716.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379
[2024-06-10 01:27:17,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.19 | bwd_microstep: 1244.85 | bwd_inner_microstep: 1244.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558
[2024-06-10 01:27:18,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.48 | bwd_microstep: 1433.68 | bwd_inner_microstep: 1433.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 01:27:20,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.39 | bwd_microstep: 1306.52 | bwd_inner_microstep: 1306.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-10 01:27:22,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.94 | bwd_microstep: 1440.18 | bwd_inner_microstep: 1440.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 01:27:29,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.37 | optimizer_step: 6.61
[2024-06-10 01:27:29,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.90 | bwd_microstep: 5897.96 | bwd_inner_microstep: 1802.44 | bwd_allreduce_microstep: 4095.45 | step_microstep: 40.24
[2024-06-10 01:27:29,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15842.38 | bwd: 46501.85 | bwd_inner: 42405.22 | bwd_allreduce: 4095.81 | step: 42.49
{'loss': 1.3918, 'learning_rate': 3.307692307692308e-05, 'epoch': 0.02}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3417
[2024-06-10 01:27:31,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.33 | bwd_microstep: 1208.24 | bwd_inner_microstep: 1208.16 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 01:27:32,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1343.49 | bwd_inner_microstep: 1343.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3412
[2024-06-10 01:27:34,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.94 | bwd_microstep: 1297.31 | bwd_inner_microstep: 1297.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 01:27:36,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.90 | bwd_microstep: 1325.98 | bwd_inner_microstep: 1325.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 01:27:38,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.98 | bwd_microstep: 1267.73 | bwd_inner_microstep: 1267.56 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.23
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 01:27:40,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.62 | bwd_microstep: 1282.03 | bwd_inner_microstep: 1282.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 01:27:41,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1387.09 | bwd_inner_microstep: 1387.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2869
[2024-06-10 01:27:43,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.72 | bwd_microstep: 1069.30 | bwd_inner_microstep: 1069.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3706
[2024-06-10 01:27:45,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.41 | bwd_microstep: 1457.95 | bwd_inner_microstep: 1457.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-10 01:27:47,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.25 | bwd_microstep: 1192.84 | bwd_inner_microstep: 1192.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 01:27:49,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.61 | bwd_microstep: 1411.43 | bwd_inner_microstep: 1411.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 01:27:50,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.37 | bwd_microstep: 1379.08 | bwd_inner_microstep: 1379.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3630
[2024-06-10 01:27:53,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.65 | bwd_microstep: 1539.47 | bwd_inner_microstep: 1539.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-10 01:27:55,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.83 | bwd_microstep: 1585.58 | bwd_inner_microstep: 1585.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 01:27:57,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.47 | bwd_microstep: 1486.20 | bwd_inner_microstep: 1486.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-10 01:27:59,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1581.37 | bwd_inner_microstep: 1581.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 01:28:01,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.43 | bwd_microstep: 1489.42 | bwd_inner_microstep: 1489.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3636
[2024-06-10 01:28:03,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.27 | bwd_microstep: 1713.29 | bwd_inner_microstep: 1713.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 01:28:05,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.43 | bwd_microstep: 1504.66 | bwd_inner_microstep: 1504.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 01:28:07,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.94 | bwd_microstep: 1396.41 | bwd_inner_microstep: 1396.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3552
[2024-06-10 01:28:09,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.31 | bwd_microstep: 1478.25 | bwd_inner_microstep: 1478.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 01:28:11,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1282.18 | bwd_inner_microstep: 1282.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613
[2024-06-10 01:28:13,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.10 | bwd_microstep: 1615.62 | bwd_inner_microstep: 1615.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 01:28:15,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.88 | bwd_microstep: 1466.29 | bwd_inner_microstep: 1466.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 01:28:18,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.10 | bwd_microstep: 1526.87 | bwd_inner_microstep: 1526.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 01:28:20,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.86 | bwd_microstep: 1501.67 | bwd_inner_microstep: 1501.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3524
[2024-06-10 01:28:21,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1259.71 | bwd_inner_microstep: 1259.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 01:28:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.53 | bwd_microstep: 1438.16 | bwd_inner_microstep: 1438.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033
[2024-06-10 01:28:25,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.06 | bwd_microstep: 813.61 | bwd_inner_microstep: 813.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 01:28:27,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.09 | bwd_microstep: 1426.62 | bwd_inner_microstep: 1426.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3287
[2024-06-10 01:28:28,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.66 | bwd_microstep: 1320.15 | bwd_inner_microstep: 1320.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 01:28:32,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 7.16
[2024-06-10 01:28:32,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.26 | bwd_microstep: 2746.21 | bwd_inner_microstep: 1533.84 | bwd_allreduce_microstep: 1212.31 | step_microstep: 39.49
[2024-06-10 01:28:32,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16617.43 | bwd: 45794.28 | bwd_inner: 44580.83 | bwd_allreduce: 1212.69 | step: 41.81
{'loss': 1.3376, 'learning_rate': 3.384615384615385e-05, 'epoch': 0.03}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 01:28:33,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1236.25 | bwd_inner_microstep: 1236.10 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 01:28:35,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.60 | bwd_microstep: 1380.65 | bwd_inner_microstep: 1380.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 01:28:37,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.09 | bwd_microstep: 1556.45 | bwd_inner_microstep: 1556.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795
[2024-06-10 01:28:39,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.59 | bwd_microstep: 1448.02 | bwd_inner_microstep: 1447.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 01:28:41,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.32 | bwd_microstep: 1383.87 | bwd_inner_microstep: 1383.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3739
[2024-06-10 01:28:44,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.51 | bwd_microstep: 1637.04 | bwd_inner_microstep: 1637.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 01:28:45,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.40 | bwd_microstep: 1282.45 | bwd_inner_microstep: 1282.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 01:28:47,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.60 | bwd_microstep: 1305.07 | bwd_inner_microstep: 1305.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-10 01:28:49,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.41 | bwd_microstep: 1189.32 | bwd_inner_microstep: 1189.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-10 01:28:51,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.42 | bwd_microstep: 1644.57 | bwd_inner_microstep: 1644.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1974
[2024-06-10 01:28:52,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.57 | bwd_microstep: 753.70 | bwd_inner_microstep: 753.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3547
[2024-06-10 01:28:54,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.27 | bwd_microstep: 1462.55 | bwd_inner_microstep: 1462.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3669
[2024-06-10 01:28:56,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.80 | bwd_microstep: 1591.20 | bwd_inner_microstep: 1591.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-10 01:28:58,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.79 | bwd_microstep: 1451.07 | bwd_inner_microstep: 1451.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 01:29:00,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.91 | bwd_microstep: 1490.77 | bwd_inner_microstep: 1490.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3936
[2024-06-10 01:29:03,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.24 | bwd_microstep: 1692.80 | bwd_inner_microstep: 1692.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3491
[2024-06-10 01:29:05,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.50 | bwd_microstep: 1366.36 | bwd_inner_microstep: 1366.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 01:29:06,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1314.85 | bwd_inner_microstep: 1314.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 01:29:08,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1432.95 | bwd_inner_microstep: 1432.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 01:29:10,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 1354.24 | bwd_inner_microstep: 1354.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 01:29:12,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.91 | bwd_microstep: 1463.16 | bwd_inner_microstep: 1463.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 01:29:14,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.73 | bwd_microstep: 1388.20 | bwd_inner_microstep: 1388.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458
[2024-06-10 01:29:16,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.19 | bwd_microstep: 1214.68 | bwd_inner_microstep: 1214.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2460
[2024-06-10 01:29:17,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.39 | bwd_microstep: 955.44 | bwd_inner_microstep: 955.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-10 01:29:20,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.37 | bwd_microstep: 1604.58 | bwd_inner_microstep: 1604.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3883
[2024-06-10 01:29:22,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.17 | bwd_microstep: 1593.33 | bwd_inner_microstep: 1593.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 01:29:24,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.94 | bwd_microstep: 1401.30 | bwd_inner_microstep: 1401.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814
[2024-06-10 01:29:26,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1585.70 | bwd_inner_microstep: 1585.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 01:29:28,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.94 | bwd_microstep: 1564.04 | bwd_inner_microstep: 1564.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2284
[2024-06-10 01:29:29,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.57 | bwd_microstep: 1013.77 | bwd_inner_microstep: 1013.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 01:29:32,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.39 | bwd_microstep: 1624.26 | bwd_inner_microstep: 1624.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 01:29:34,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 01:29:34,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.31 | bwd_microstep: 1713.20 | bwd_inner_microstep: 877.94 | bwd_allreduce_microstep: 835.17 | step_microstep: 39.27
[2024-06-10 01:29:34,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16576.38 | bwd: 45095.85 | bwd_inner: 44259.61 | bwd_allreduce: 835.50 | step: 41.94
 | 38/1726 [44:59<28:57:20, 61.75s/it]
  2%|▏         | 39/1726 [45:59<28:45:26, 61.37s/it]
  2%|▏         | 40/1726 [47:02<28:52:20, 61.65s/it]
  2%|▏         | 41/1726 [48:02<28:36:41, 61.13s/it]
  2%|▏         | 42/1726 [49:03<28:37:25, 61.19s/it]
  2%|▏         | 43/1726 [50:06<28:49:24, 61.65s/it]
  3%|▎         | 44/1726 [51:08<28:58:05, 62.00s/it]
{'loss': 1.3132, 'learning_rate': 3.461538461538462e-05, 'epoch': 0.03}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 01:29:36,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1472.28 | bwd_inner_microstep: 1472.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 01:29:38,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1277.46 | bwd_inner_microstep: 1277.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3855
[2024-06-10 01:29:40,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.07 | bwd_microstep: 1431.05 | bwd_inner_microstep: 1431.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-10 01:29:42,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.12 | bwd_microstep: 1481.15 | bwd_inner_microstep: 1481.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 01:29:43,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1246.88 | bwd_inner_microstep: 1246.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 01:29:45,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.40 | bwd_microstep: 1381.71 | bwd_inner_microstep: 1381.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 01:29:47,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.11 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2061
[2024-06-10 01:29:48,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.80 | bwd_microstep: 817.19 | bwd_inner_microstep: 817.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 01:29:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.98 | bwd_microstep: 1291.97 | bwd_inner_microstep: 1291.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 01:29:51,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.85 | bwd_microstep: 797.69 | bwd_inner_microstep: 797.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2987
[2024-06-10 01:29:53,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.10 | bwd_microstep: 1110.26 | bwd_inner_microstep: 1110.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 01:29:55,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.03 | bwd_microstep: 1488.29 | bwd_inner_microstep: 1488.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3478
[2024-06-10 01:29:57,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.94 | bwd_microstep: 1365.64 | bwd_inner_microstep: 1365.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 01:29:59,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.36 | bwd_microstep: 1344.23 | bwd_inner_microstep: 1344.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3455
[2024-06-10 01:30:00,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.08 | bwd_microstep: 1416.21 | bwd_inner_microstep: 1416.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2000
[2024-06-10 01:30:02,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.64 | bwd_microstep: 928.51 | bwd_inner_microstep: 928.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1918
[2024-06-10 01:30:03,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.20 | bwd_microstep: 780.00 | bwd_inner_microstep: 779.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 01:30:05,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.63 | bwd_microstep: 1455.60 | bwd_inner_microstep: 1455.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 01:30:07,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1528.03 | bwd_inner_microstep: 1528.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 01:30:09,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.77 | bwd_microstep: 1395.56 | bwd_inner_microstep: 1395.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3621
[2024-06-10 01:30:11,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1345.52 | bwd_inner_microstep: 1345.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 01:30:12,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.10 | bwd_microstep: 700.37 | bwd_inner_microstep: 700.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3667
[2024-06-10 01:30:14,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.58 | bwd_microstep: 1355.81 | bwd_inner_microstep: 1355.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 01:30:16,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1508.97 | bwd_inner_microstep: 1508.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-10 01:30:18,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.61 | bwd_microstep: 1432.58 | bwd_inner_microstep: 1432.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2260
[2024-06-10 01:30:19,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.43 | bwd_microstep: 875.47 | bwd_inner_microstep: 875.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2015
[2024-06-10 01:30:20,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.05 | bwd_microstep: 715.11 | bwd_inner_microstep: 715.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3818
[2024-06-10 01:30:22,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.72 | bwd_microstep: 1727.35 | bwd_inner_microstep: 1727.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3584
[2024-06-10 01:30:25,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.86 | bwd_microstep: 1696.71 | bwd_inner_microstep: 1696.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 01:30:27,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1405.32 | bwd_inner_microstep: 1405.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793
[2024-06-10 01:30:29,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.39 | bwd_microstep: 1642.17 | bwd_inner_microstep: 1642.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2003
[2024-06-10 01:30:35,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.45 | optimizer_step: 6.61
[2024-06-10 01:30:35,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.26 | bwd_microstep: 5807.70 | bwd_inner_microstep: 983.71 | bwd_allreduce_microstep: 4823.92 | step_microstep: 42.32
[2024-06-10 01:30:35,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15264.39 | bwd: 45610.74 | bwd_inner: 40785.76 | bwd_allreduce: 4824.22 | step: 44.93
{'loss': 1.369, 'learning_rate': 3.538461538461539e-05, 'epoch': 0.03}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 01:30:37,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1279.93 | bwd_inner_microstep: 1279.77 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2498
[2024-06-10 01:30:38,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.08 | bwd_microstep: 1025.94 | bwd_inner_microstep: 1025.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 01:30:40,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.76 | bwd_microstep: 1242.49 | bwd_inner_microstep: 1242.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 01:30:42,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1397.42 | bwd_inner_microstep: 1397.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 01:30:44,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.50 | bwd_microstep: 1376.62 | bwd_inner_microstep: 1376.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748
[2024-06-10 01:30:46,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.19 | bwd_microstep: 1636.28 | bwd_inner_microstep: 1636.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 01:30:48,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.57 | bwd_microstep: 1344.17 | bwd_inner_microstep: 1344.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906
[2024-06-10 01:30:49,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.80 | bwd_microstep: 780.62 | bwd_inner_microstep: 780.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 01:30:51,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.64 | bwd_microstep: 1389.49 | bwd_inner_microstep: 1389.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 01:30:52,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.34 | bwd_microstep: 687.80 | bwd_inner_microstep: 687.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 01:30:54,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.58 | bwd_microstep: 1486.69 | bwd_inner_microstep: 1486.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3569
[2024-06-10 01:30:56,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.17 | bwd_microstep: 1526.84 | bwd_inner_microstep: 1526.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-10 01:30:58,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.28 | bwd_microstep: 1620.39 | bwd_inner_microstep: 1620.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2515
[2024-06-10 01:31:00,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.28 | bwd_microstep: 1057.59 | bwd_inner_microstep: 1057.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663
[2024-06-10 01:31:02,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.66 | bwd_microstep: 1623.87 | bwd_inner_microstep: 1623.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1883
[2024-06-10 01:31:03,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.92 | bwd_microstep: 745.13 | bwd_inner_microstep: 745.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1213
[2024-06-10 01:31:04,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 189.82 | bwd_microstep: 497.24 | bwd_inner_microstep: 497.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 01:31:05,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1253.83 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897
[2024-06-10 01:31:06,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.73 | bwd_microstep: 685.37 | bwd_inner_microstep: 685.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 01:31:08,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1389.96 | bwd_inner_microstep: 1389.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-10 01:31:10,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.76 | bwd_microstep: 1290.23 | bwd_inner_microstep: 1290.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-10 01:31:12,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.29 | bwd_microstep: 1191.71 | bwd_inner_microstep: 1191.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3767
[2024-06-10 01:31:14,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.17 | bwd_microstep: 1745.90 | bwd_inner_microstep: 1745.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3576
[2024-06-10 01:31:16,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.88 | bwd_microstep: 1241.19 | bwd_inner_microstep: 1241.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 01:31:17,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.22 | bwd_microstep: 917.41 | bwd_inner_microstep: 917.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062
[2024-06-10 01:31:18,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.31 | bwd_microstep: 815.74 | bwd_inner_microstep: 815.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3590
[2024-06-10 01:31:20,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.62 | bwd_microstep: 1215.92 | bwd_inner_microstep: 1215.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 01:31:22,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.73 | bwd_microstep: 1189.85 | bwd_inner_microstep: 1189.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 01:31:24,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.69 | bwd_microstep: 1545.74 | bwd_inner_microstep: 1545.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555
[2024-06-10 01:31:26,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.42 | bwd_microstep: 1434.34 | bwd_inner_microstep: 1434.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 01:31:28,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.03 | bwd_microstep: 1492.76 | bwd_inner_microstep: 1492.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276
[2024-06-10 01:31:36,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.43 | optimizer_step: 6.62
[2024-06-10 01:31:36,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.70 | bwd_microstep: 8009.15 | bwd_inner_microstep: 1216.02 | bwd_allreduce_microstep: 6793.05 | step_microstep: 40.27
[2024-06-10 01:31:36,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14732.33 | bwd: 46137.65 | bwd_inner: 39343.49 | bwd_allreduce: 6793.36 | step: 42.77
{'loss': 1.2947, 'learning_rate': 3.615384615384616e-05, 'epoch': 0.03}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
[2024-06-10 01:31:38,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.09 | bwd_microstep: 1326.90 | bwd_inner_microstep: 1326.73 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3858
[2024-06-10 01:31:40,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1366.90 | bwd_inner_microstep: 1366.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 01:31:42,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1377.15 | bwd_inner_microstep: 1377.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 01:31:43,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.73 | bwd_microstep: 817.51 | bwd_inner_microstep: 817.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2258
[2024-06-10 01:31:44,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.36 | bwd_microstep: 780.58 | bwd_inner_microstep: 780.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 01:31:46,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.09 | bwd_microstep: 1351.39 | bwd_inner_microstep: 1351.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 01:31:48,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1281.37 | bwd_inner_microstep: 1281.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-10 01:31:49,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.87 | bwd_microstep: 1185.38 | bwd_inner_microstep: 1185.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 01:31:51,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1390.47 | bwd_inner_microstep: 1390.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3733
[2024-06-10 01:31:53,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1365.60 | bwd_inner_microstep: 1365.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1962
[2024-06-10 01:31:54,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.60 | bwd_microstep: 825.88 | bwd_inner_microstep: 825.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 01:31:56,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.58 | bwd_microstep: 1483.41 | bwd_inner_microstep: 1483.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2056
[2024-06-10 01:31:58,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.71 | bwd_microstep: 913.18 | bwd_inner_microstep: 913.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650
[2024-06-10 01:32:00,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.82 | bwd_microstep: 1614.60 | bwd_inner_microstep: 1614.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3512
[2024-06-10 01:32:02,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.10 | bwd_microstep: 1553.46 | bwd_inner_microstep: 1553.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 01:32:03,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.88 | bwd_microstep: 800.27 | bwd_inner_microstep: 800.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2150
[2024-06-10 01:32:04,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.82 | bwd_microstep: 954.71 | bwd_inner_microstep: 954.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3468
[2024-06-10 01:32:07,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.86 | bwd_microstep: 1507.86 | bwd_inner_microstep: 1507.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 01:32:09,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.10 | bwd_microstep: 1529.35 | bwd_inner_microstep: 1529.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288
[2024-06-10 01:32:10,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.93 | bwd_microstep: 880.21 | bwd_inner_microstep: 880.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1965
[2024-06-10 01:32:11,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.64 | bwd_microstep: 704.07 | bwd_inner_microstep: 704.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2172
[2024-06-10 01:32:12,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.66 | bwd_microstep: 856.66 | bwd_inner_microstep: 856.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3824
[2024-06-10 01:32:14,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.37 | bwd_microstep: 1729.38 | bwd_inner_microstep: 1729.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 01:32:17,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1491.74 | bwd_inner_microstep: 1491.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3810
[2024-06-10 01:32:19,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.57 | bwd_microstep: 1687.44 | bwd_inner_microstep: 1687.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 01:32:21,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.01 | bwd_microstep: 1377.96 | bwd_inner_microstep: 1377.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 01:32:23,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.49 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012
[2024-06-10 01:32:24,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.28 | bwd_microstep: 810.12 | bwd_inner_microstep: 810.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 01:32:26,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.80 | bwd_microstep: 1608.81 | bwd_inner_microstep: 1608.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 01:32:28,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.13 | bwd_microstep: 1400.63 | bwd_inner_microstep: 1400.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 01:32:30,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.72 | bwd_microstep: 1282.97 | bwd_inner_microstep: 1282.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3560
[2024-06-10 01:32:38,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.44 | optimizer_step: 6.59
[2024-06-10 01:32:38,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.65 | bwd_microstep: 7427.95 | bwd_inner_microstep: 1731.46 | bwd_allreduce_microstep: 5696.41 | step_microstep: 40.23
[2024-06-10 01:32:38,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15099.71 | bwd: 46081.28 | bwd_inner: 40383.80 | bwd_allreduce: 5696.72 | step: 42.66
{'loss': 1.3644, 'learning_rate': 3.692307692307693e-05, 'epoch': 0.03}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3409
[2024-06-10 01:32:40,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1395.09 | bwd_inner_microstep: 1394.93 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568
[2024-06-10 01:32:42,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.14 | bwd_microstep: 1310.97 | bwd_inner_microstep: 1310.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 01:32:43,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.09 | bwd_microstep: 814.99 | bwd_inner_microstep: 814.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 01:32:44,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.61 | bwd_microstep: 1149.05 | bwd_inner_microstep: 1149.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 01:32:46,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1379.54 | bwd_inner_microstep: 1379.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 01:32:48,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.42 | bwd_microstep: 1253.07 | bwd_inner_microstep: 1253.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1974
[2024-06-10 01:32:49,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.93 | bwd_microstep: 736.30 | bwd_inner_microstep: 736.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 01:32:51,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1292.81 | bwd_inner_microstep: 1292.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 01:32:52,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.38 | bwd_microstep: 793.22 | bwd_inner_microstep: 793.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3507
[2024-06-10 01:32:54,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.20 | bwd_microstep: 1225.86 | bwd_inner_microstep: 1225.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2100
[2024-06-10 01:32:55,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.20 | bwd_microstep: 854.45 | bwd_inner_microstep: 854.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 944
[2024-06-10 01:32:55,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 158.32 | bwd_microstep: 411.84 | bwd_inner_microstep: 411.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 01:32:58,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.84 | bwd_microstep: 1751.91 | bwd_inner_microstep: 1751.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2135
[2024-06-10 01:32:59,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.45 | bwd_microstep: 961.96 | bwd_inner_microstep: 961.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377
[2024-06-10 01:33:01,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.82 | bwd_microstep: 1241.94 | bwd_inner_microstep: 1241.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 01:33:03,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1394.35 | bwd_inner_microstep: 1394.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649
[2024-06-10 01:33:05,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.81 | bwd_microstep: 1417.17 | bwd_inner_microstep: 1417.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3757
[2024-06-10 01:33:07,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.44 | bwd_microstep: 1575.91 | bwd_inner_microstep: 1575.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 01:33:09,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.04 | bwd_microstep: 1287.76 | bwd_inner_microstep: 1287.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3824
[2024-06-10 01:33:11,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.28 | bwd_microstep: 1423.82 | bwd_inner_microstep: 1423.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 01:33:12,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.08 | bwd_microstep: 1259.34 | bwd_inner_microstep: 1259.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 01:33:14,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.17 | bwd_microstep: 1461.70 | bwd_inner_microstep: 1461.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 01:33:17,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.14 | bwd_microstep: 1659.84 | bwd_inner_microstep: 1659.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 01:33:19,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.14 | bwd_microstep: 1363.45 | bwd_inner_microstep: 1363.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3444
[2024-06-10 01:33:20,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.00 | bwd_microstep: 1302.31 | bwd_inner_microstep: 1302.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3611
[2024-06-10 01:33:23,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.30 | bwd_microstep: 1661.03 | bwd_inner_microstep: 1661.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063
[2024-06-10 01:33:24,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.09 | bwd_microstep: 917.65 | bwd_inner_microstep: 917.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003
[2024-06-10 01:33:25,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 900.10 | bwd_inner_microstep: 900.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3801
[2024-06-10 01:33:27,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.61 | bwd_microstep: 1506.54 | bwd_inner_microstep: 1506.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-10 01:33:28,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.06 | bwd_microstep: 811.10 | bwd_inner_microstep: 811.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 01:33:31,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.09 | bwd_microstep: 1657.76 | bwd_inner_microstep: 1657.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565
[2024-06-10 01:33:41,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.42 | optimizer_step: 6.61
[2024-06-10 01:33:41,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.67 | bwd_microstep: 9900.25 | bwd_inner_microstep: 1799.63 | bwd_allreduce_microstep: 8100.55 | step_microstep: 40.18
[2024-06-10 01:33:41,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14918.45 | bwd: 48073.09 | bwd_inner: 39971.45 | bwd_allreduce: 8100.86 | step: 42.22
{'loss': 1.3727, 'learning_rate': 3.769230769230769e-05, 'epoch': 0.03}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466
[2024-06-10 01:33:43,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.16 | bwd_microstep: 1401.14 | bwd_inner_microstep: 1401.01 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3923
[2024-06-10 01:33:45,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1489.03 | bwd_inner_microstep: 1489.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 01:33:47,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.32 | bwd_microstep: 1378.56 | bwd_inner_microstep: 1378.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924
[2024-06-10 01:33:48,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.62 | bwd_microstep: 699.70 | bwd_inner_microstep: 699.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011
[2024-06-10 01:33:49,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.73 | bwd_microstep: 804.16 | bwd_inner_microstep: 804.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-10 01:33:51,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 1422.79 | bwd_inner_microstep: 1422.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 01:33:53,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.99 | bwd_microstep: 1283.85 | bwd_inner_microstep: 1283.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3423
[2024-06-10 01:33:55,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.53 | bwd_microstep: 1216.01 | bwd_inner_microstep: 1215.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2468
[2024-06-10 01:33:56,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.68 | bwd_microstep: 954.12 | bwd_inner_microstep: 954.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 01:33:58,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.18 | bwd_microstep: 1287.94 | bwd_inner_microstep: 1287.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3673
[2024-06-10 01:34:00,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.57 | bwd_microstep: 1821.04 | bwd_inner_microstep: 1821.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032
[2024-06-10 01:34:01,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.23 | bwd_microstep: 906.99 | bwd_inner_microstep: 906.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3379
[2024-06-10 01:34:03,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.18 | bwd_microstep: 1301.89 | bwd_inner_microstep: 1301.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-10 01:34:05,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.97 | bwd_microstep: 1279.11 | bwd_inner_microstep: 1279.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 01:34:07,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.61 | bwd_microstep: 1637.16 | bwd_inner_microstep: 1637.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 01:34:09,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1434.71 | bwd_inner_microstep: 1434.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 01:34:11,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.68 | bwd_microstep: 1393.52 | bwd_inner_microstep: 1393.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-10 01:34:13,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1418.76 | bwd_inner_microstep: 1418.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 01:34:15,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.21 | bwd_microstep: 1517.25 | bwd_inner_microstep: 1517.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 01:34:17,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.13 | bwd_microstep: 1527.28 | bwd_inner_microstep: 1527.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706
[2024-06-10 01:34:19,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.25 | bwd_microstep: 1439.36 | bwd_inner_microstep: 1439.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 01:34:21,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.29 | bwd_microstep: 1441.16 | bwd_inner_microstep: 1441.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 01:34:23,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1397.07 | bwd_inner_microstep: 1397.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 01:34:25,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.24 | bwd_microstep: 1464.61 | bwd_inner_microstep: 1464.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457
[2024-06-10 01:34:27,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.20 | bwd_microstep: 1313.99 | bwd_inner_microstep: 1313.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2723
[2024-06-10 01:34:29,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.94 | bwd_microstep: 1042.14 | bwd_inner_microstep: 1042.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 01:34:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.43 | bwd_microstep: 978.81 | bwd_inner_microstep: 978.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2293
[2024-06-10 01:34:31,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.21 | bwd_microstep: 787.33 | bwd_inner_microstep: 787.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042
[2024-06-10 01:34:32,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.33 | bwd_microstep: 811.89 | bwd_inner_microstep: 811.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031
[2024-06-10 01:34:33,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.66 | bwd_microstep: 809.97 | bwd_inner_microstep: 809.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808
[2024-06-10 01:34:35,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1387.00 | bwd_inner_microstep: 1386.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3449
[2024-06-10 01:34:45,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.42 | optimizer_step: 6.61
[2024-06-10 01:34:45,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.58 | bwd_microstep: 9327.17 | bwd_inner_microstep: 1475.53 | bwd_allreduce_microstep: 7851.56 | step_microstep: 40.18
[2024-06-10 01:34:45,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15166.62 | bwd: 48375.54 | bwd_inner: 40522.92 | bwd_allreduce: 7851.87 | step: 42.69
{'loss': 1.318, 'learning_rate': 3.846153846153846e-05, 'epoch': 0.03}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-10 01:34:47,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.56 | bwd_microstep: 1333.27 | bwd_inner_microstep: 1333.10 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.23
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 01:34:49,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.38 | bwd_microstep: 1472.37 | bwd_inner_microstep: 1472.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3985
[2024-06-10 01:34:51,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.16 | bwd_microstep: 1703.00 | bwd_inner_microstep: 1702.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2276
[2024-06-10 01:34:52,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.61 | bwd_microstep: 813.91 | bwd_inner_microstep: 813.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 01:34:54,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.24 | bwd_microstep: 1379.96 | bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 01:34:57,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.67 | bwd_microstep: 1547.66 | bwd_inner_microstep: 1547.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 01:34:58,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.53 | bwd_microstep: 1245.30 | bwd_inner_microstep: 1245.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511
[2024-06-10 01:35:00,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.85 | bwd_microstep: 1193.14 | bwd_inner_microstep: 1193.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3699
[2024-06-10 01:35:02,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.10 | bwd_microstep: 1553.49 | bwd_inner_microstep: 1553.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3486
[2024-06-10 01:35:04,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.52 | bwd_microstep: 1546.34 | bwd_inner_microstep: 1546.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 01:35:06,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.18 | bwd_microstep: 1385.03 | bwd_inner_microstep: 1385.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 01:35:08,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.95 | bwd_microstep: 1484.27 | bwd_inner_microstep: 1484.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 01:35:10,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.99 | bwd_microstep: 1606.05 | bwd_inner_microstep: 1606.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3466
[2024-06-10 01:35:12,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.93 | bwd_microstep: 1340.71 | bwd_inner_microstep: 1340.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3501
[2024-06-10 01:35:14,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1417.65 | bwd_inner_microstep: 1417.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 01:35:15,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.77 | bwd_microstep: 797.96 | bwd_inner_microstep: 797.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 01:35:17,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.07 | bwd_microstep: 1291.35 | bwd_inner_microstep: 1291.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1907
[2024-06-10 01:35:18,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.75 | bwd_microstep: 687.90 | bwd_inner_microstep: 687.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 01:35:20,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1558.11 | bwd_inner_microstep: 1558.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3452
[2024-06-10 01:35:22,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.31 | bwd_microstep: 1413.92 | bwd_inner_microstep: 1413.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 01:35:24,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.57 | bwd_microstep: 1494.54 | bwd_inner_microstep: 1494.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2111
[2024-06-10 01:35:25,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.00 | bwd_microstep: 921.89 | bwd_inner_microstep: 921.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293
[2024-06-10 01:35:27,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.18 | bwd_microstep: 979.75 | bwd_inner_microstep: 979.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817
[2024-06-10 01:35:29,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.72 | bwd_microstep: 1387.64 | bwd_inner_microstep: 1387.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1460
[2024-06-10 01:35:30,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 210.99 | bwd_microstep: 545.11 | bwd_inner_microstep: 545.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3920
[2024-06-10 01:35:32,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.95 | bwd_microstep: 1697.37 | bwd_inner_microstep: 1697.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-10 01:35:34,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.15 | bwd_microstep: 1357.05 | bwd_inner_microstep: 1357.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3829
[2024-06-10 01:35:36,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.72 | bwd_microstep: 1423.59 | bwd_inner_microstep: 1423.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3710
[2024-06-10 01:35:38,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1364.91 | bwd_inner_microstep: 1364.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-10 01:35:40,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.47 | bwd_microstep: 1597.30 | bwd_inner_microstep: 1597.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 01:35:42,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.72 | bwd_microstep: 1646.85 | bwd_inner_microstep: 1646.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 01:35:46,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.50 | optimizer_gradients: 4.29 | optimizer_step: 6.62
[2024-06-10 01:35:46,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 3641.43 | bwd_inner_microstep: 2019.15 | bwd_allreduce_microstep: 1622.22 | step_microstep: 41.37
[2024-06-10 01:35:46,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16023.84 | bwd: 44828.84 | bwd_inner: 43205.58 | bwd_allreduce: 1622.52 | step: 43.35
 3%|▎         | 45/1726 [52:10<28:57:36, 62.02s/it]
  3%|▎         | 46/1726 [53:12<28:50:12, 61.79s/it]
  3%|▎         | 47/1726 [54:13<28:44:38, 61.63s/it]
  3%|▎         | 48/1726 [55:15<28:43:06, 61.61s/it]
  3%|▎         | 49/1726 [56:18<28:56:46, 62.14s/it]
  3%|▎         | 50/1726 [57:22<29:10:48, 62.68s/it]
  3%|▎         | 51/1726 [58:23<28:57:4
{'loss': 1.3105, 'learning_rate': 3.923076923076923e-05, 'epoch': 0.03}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 01:35:48,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1438.64 | bwd_inner_microstep: 1438.44 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3475
[2024-06-10 01:35:50,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.09 | bwd_microstep: 1330.03 | bwd_inner_microstep: 1330.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 01:35:52,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.39 | bwd_microstep: 1299.94 | bwd_inner_microstep: 1299.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3900
[2024-06-10 01:35:54,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.02 | bwd_microstep: 1685.27 | bwd_inner_microstep: 1685.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 01:35:57,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.18 | bwd_microstep: 1659.50 | bwd_inner_microstep: 1659.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 01:35:59,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.73 | bwd_microstep: 1641.10 | bwd_inner_microstep: 1641.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3724
[2024-06-10 01:36:01,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1396.44 | bwd_inner_microstep: 1396.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 01:36:03,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1524.95 | bwd_inner_microstep: 1524.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 01:36:05,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.81 | bwd_microstep: 1287.29 | bwd_inner_microstep: 1287.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 01:36:06,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.33 | bwd_microstep: 1265.59 | bwd_inner_microstep: 1265.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-10 01:36:08,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1319.77 | bwd_inner_microstep: 1319.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660
[2024-06-10 01:36:10,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1428.59 | bwd_inner_microstep: 1428.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3513
[2024-06-10 01:36:12,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.93 | bwd_microstep: 1453.34 | bwd_inner_microstep: 1453.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 01:36:14,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.01 | bwd_microstep: 1376.45 | bwd_inner_microstep: 1376.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 01:36:16,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.31 | bwd_microstep: 1421.76 | bwd_inner_microstep: 1421.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 01:36:18,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1414.64 | bwd_inner_microstep: 1414.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 01:36:20,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1250.65 | bwd_inner_microstep: 1250.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3401
[2024-06-10 01:36:22,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.27 | bwd_microstep: 1313.21 | bwd_inner_microstep: 1313.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 01:36:24,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.71 | bwd_microstep: 1556.30 | bwd_inner_microstep: 1556.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3530
[2024-06-10 01:36:26,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.25 | bwd_microstep: 1521.89 | bwd_inner_microstep: 1521.20 | bwd_allreduce_microstep: 0.32 | step_microstep: 0.23
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 01:36:27,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.34 | bwd_microstep: 983.95 | bwd_inner_microstep: 983.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3537
[2024-06-10 01:36:29,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.76 | bwd_microstep: 1593.33 | bwd_inner_microstep: 1593.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 01:36:31,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.95 | bwd_microstep: 1477.65 | bwd_inner_microstep: 1477.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3402
[2024-06-10 01:36:33,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.17 | bwd_microstep: 1310.17 | bwd_inner_microstep: 1310.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 01:36:35,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1558.73 | bwd_inner_microstep: 1558.20 | bwd_allreduce_microstep: 0.22 | step_microstep: 0.50
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 01:36:37,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.90 | bwd_microstep: 1164.22 | bwd_inner_microstep: 1164.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398
[2024-06-10 01:36:39,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.56 | bwd_microstep: 1441.81 | bwd_inner_microstep: 1441.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2248
[2024-06-10 01:36:40,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.54 | bwd_microstep: 870.78 | bwd_inner_microstep: 870.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3838
[2024-06-10 01:36:43,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.87 | bwd_microstep: 1724.45 | bwd_inner_microstep: 1724.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 01:36:45,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.49 | bwd_microstep: 1656.42 | bwd_inner_microstep: 1656.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 01:36:47,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.68 | bwd_microstep: 1517.58 | bwd_inner_microstep: 1517.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-10 01:36:50,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.00 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 01:36:50,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.73 | bwd_microstep: 2068.78 | bwd_inner_microstep: 1650.32 | bwd_allreduce_microstep: 418.41 | step_microstep: 38.96
[2024-06-10 01:36:50,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16980.28 | bwd: 45953.31 | bwd_inner: 45532.83 | bwd_allreduce: 419.23 | step: 41.68
{'loss': 1.361, 'learning_rate': 4e-05, 'epoch': 0.03}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 01:36:51,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.37 | bwd_microstep: 793.16 | bwd_inner_microstep: 793.03 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 4069
[2024-06-10 01:36:53,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.35 | bwd_microstep: 1674.56 | bwd_inner_microstep: 1674.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 01:36:55,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.29 | bwd_microstep: 1257.21 | bwd_inner_microstep: 1257.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2606
[2024-06-10 01:36:56,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.31 | bwd_microstep: 1067.94 | bwd_inner_microstep: 1067.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 01:36:58,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.77 | bwd_microstep: 1357.12 | bwd_inner_microstep: 1357.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 01:37:00,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.96 | bwd_microstep: 1442.78 | bwd_inner_microstep: 1442.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3776
[2024-06-10 01:37:02,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.67 | bwd_microstep: 1351.65 | bwd_inner_microstep: 1351.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3407
[2024-06-10 01:37:04,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.43 | bwd_microstep: 1299.31 | bwd_inner_microstep: 1299.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3731
[2024-06-10 01:37:06,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.11 | bwd_microstep: 1339.67 | bwd_inner_microstep: 1339.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2179
[2024-06-10 01:37:07,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.04 | bwd_microstep: 889.25 | bwd_inner_microstep: 889.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 01:37:09,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1485.20 | bwd_inner_microstep: 1485.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 01:37:11,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.69 | bwd_microstep: 1350.32 | bwd_inner_microstep: 1350.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3703
[2024-06-10 01:37:13,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.64 | bwd_microstep: 1680.61 | bwd_inner_microstep: 1680.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3495
[2024-06-10 01:37:15,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.32 | bwd_microstep: 1348.90 | bwd_inner_microstep: 1348.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3689
[2024-06-10 01:37:17,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.96 | bwd_microstep: 1490.90 | bwd_inner_microstep: 1490.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 01:37:19,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.44 | bwd_microstep: 1456.03 | bwd_inner_microstep: 1456.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 01:37:21,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.65 | bwd_microstep: 1506.78 | bwd_inner_microstep: 1506.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 01:37:22,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.92 | bwd_microstep: 814.23 | bwd_inner_microstep: 814.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 957
[2024-06-10 01:37:23,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 154.98 | bwd_microstep: 382.63 | bwd_inner_microstep: 382.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 01:37:25,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1388.05 | bwd_inner_microstep: 1388.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2019
[2024-06-10 01:37:26,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.36 | bwd_microstep: 720.05 | bwd_inner_microstep: 720.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 01:37:28,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.33 | bwd_microstep: 1538.84 | bwd_inner_microstep: 1538.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-10 01:37:30,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.56 | bwd_microstep: 1493.31 | bwd_inner_microstep: 1493.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287
[2024-06-10 01:37:31,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.56 | bwd_microstep: 882.83 | bwd_inner_microstep: 882.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782
[2024-06-10 01:37:33,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.83 | bwd_microstep: 1449.26 | bwd_inner_microstep: 1449.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3617
[2024-06-10 01:37:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.14 | bwd_microstep: 1650.34 | bwd_inner_microstep: 1650.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-10 01:37:37,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.38 | bwd_microstep: 884.58 | bwd_inner_microstep: 884.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238
[2024-06-10 01:37:38,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.94 | bwd_microstep: 967.58 | bwd_inner_microstep: 967.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3004
[2024-06-10 01:37:40,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.70 | bwd_microstep: 1210.40 | bwd_inner_microstep: 1210.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3821
[2024-06-10 01:37:42,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.76 | bwd_microstep: 1574.74 | bwd_inner_microstep: 1574.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2045
[2024-06-10 01:37:43,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.66 | bwd_microstep: 970.30 | bwd_inner_microstep: 970.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3771
[2024-06-10 01:37:52,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.44 | optimizer_step: 6.62
[2024-06-10 01:37:52,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.02 | bwd_microstep: 7819.58 | bwd_inner_microstep: 1625.12 | bwd_allreduce_microstep: 6194.38 | step_microstep: 40.14
[2024-06-10 01:37:52,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15052.56 | bwd: 46538.14 | bwd_inner: 40342.72 | bwd_allreduce: 6194.67 | step: 42.12
{'loss': 1.2792, 'learning_rate': 3.9999964780051985e-05, 'epoch': 0.03}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 01:37:53,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.34 | bwd_microstep: 1285.47 | bwd_inner_microstep: 1285.27 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.21
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1864
[2024-06-10 01:37:54,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.38 | bwd_microstep: 739.35 | bwd_inner_microstep: 739.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3900
[2024-06-10 01:37:57,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.38 | bwd_microstep: 1686.03 | bwd_inner_microstep: 1686.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 01:37:59,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1343.98 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3767
[2024-06-10 01:38:01,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1404.22 | bwd_inner_microstep: 1404.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-10 01:38:03,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.28 | bwd_microstep: 1540.68 | bwd_inner_microstep: 1540.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723
[2024-06-10 01:38:05,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.29 | bwd_microstep: 1639.52 | bwd_inner_microstep: 1639.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 01:38:07,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.42 | bwd_microstep: 1249.23 | bwd_inner_microstep: 1249.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 01:38:08,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1250.50 | bwd_inner_microstep: 1250.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 01:38:10,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.27 | bwd_microstep: 1254.34 | bwd_inner_microstep: 1254.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 01:38:12,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1252.27 | bwd_inner_microstep: 1252.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 01:38:14,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1282.08 | bwd_inner_microstep: 1282.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 01:38:16,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1491.26 | bwd_inner_microstep: 1491.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 01:38:18,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.69 | bwd_microstep: 1386.32 | bwd_inner_microstep: 1386.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3380
[2024-06-10 01:38:20,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1435.59 | bwd_inner_microstep: 1435.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-10 01:38:22,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.98 | bwd_microstep: 1521.11 | bwd_inner_microstep: 1521.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 01:38:24,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.87 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1483.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1987
[2024-06-10 01:38:25,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.96 | bwd_microstep: 896.95 | bwd_inner_microstep: 896.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 01:38:27,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1411.32 | bwd_inner_microstep: 1411.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 01:38:29,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.56 | bwd_microstep: 1496.03 | bwd_inner_microstep: 1496.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 01:38:31,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.58 | bwd_microstep: 1259.30 | bwd_inner_microstep: 1259.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 01:38:33,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.00 | bwd_microstep: 1410.09 | bwd_inner_microstep: 1410.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3556
[2024-06-10 01:38:35,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.74 | bwd_microstep: 1548.81 | bwd_inner_microstep: 1548.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2268
[2024-06-10 01:38:36,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.03 | bwd_microstep: 940.67 | bwd_inner_microstep: 940.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 01:38:38,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1548.04 | bwd_inner_microstep: 1548.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 01:38:40,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.93 | bwd_microstep: 1304.62 | bwd_inner_microstep: 1304.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3600
[2024-06-10 01:38:42,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.56 | bwd_microstep: 1458.19 | bwd_inner_microstep: 1458.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 01:38:44,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.68 | bwd_microstep: 1285.08 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 01:38:46,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.15 | bwd_microstep: 1501.38 | bwd_inner_microstep: 1501.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 01:38:48,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.15 | bwd_microstep: 1652.79 | bwd_inner_microstep: 1652.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 01:38:50,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.68 | bwd_microstep: 1546.15 | bwd_inner_microstep: 1546.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429
[2024-06-10 01:38:55,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 19.80 | optimizer_gradients: 4.22 | optimizer_step: 6.62
[2024-06-10 01:38:55,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.18 | bwd_microstep: 3915.95 | bwd_inner_microstep: 1647.88 | bwd_allreduce_microstep: 2268.01 | step_microstep: 41.65
[2024-06-10 01:38:55,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16452.28 | bwd: 46420.38 | bwd_inner: 44151.31 | bwd_allreduce: 2268.32 | step: 43.90
{'loss': 1.4033, 'learning_rate': 3.9999859120331985e-05, 'epoch': 0.03}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3532
[2024-06-10 01:38:57,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.70 | bwd_microstep: 1592.80 | bwd_inner_microstep: 1592.59 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.18
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 01:38:58,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 787.48 | bwd_inner_microstep: 787.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 01:39:00,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1346.30 | bwd_inner_microstep: 1346.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2708
[2024-06-10 01:39:01,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.42 | bwd_microstep: 991.27 | bwd_inner_microstep: 991.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2258
[2024-06-10 01:39:03,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.05 | bwd_microstep: 968.58 | bwd_inner_microstep: 968.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 01:39:05,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1282.31 | bwd_inner_microstep: 1282.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 859
[2024-06-10 01:39:05,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 139.00 | bwd_microstep: 354.11 | bwd_inner_microstep: 354.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2885
[2024-06-10 01:39:07,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.99 | bwd_microstep: 1126.84 | bwd_inner_microstep: 1126.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 01:39:08,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1251.55 | bwd_inner_microstep: 1251.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 01:39:10,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.07 | bwd_microstep: 1258.90 | bwd_inner_microstep: 1258.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3432
[2024-06-10 01:39:12,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.85 | bwd_microstep: 1285.35 | bwd_inner_microstep: 1285.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 754
[2024-06-10 01:39:12,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 120.70 | bwd_microstep: 306.90 | bwd_inner_microstep: 306.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 01:39:14,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1388.52 | bwd_inner_microstep: 1388.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 01:39:16,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.02 | bwd_microstep: 1490.42 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3388
[2024-06-10 01:39:18,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1444.96 | bwd_inner_microstep: 1444.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3671
[2024-06-10 01:39:21,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.56 | bwd_microstep: 1588.10 | bwd_inner_microstep: 1588.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 01:39:23,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.43 | bwd_microstep: 1518.54 | bwd_inner_microstep: 1518.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 01:39:25,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.35 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 01:39:27,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1462.86 | bwd_inner_microstep: 1462.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 01:39:29,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.21 | bwd_microstep: 1515.51 | bwd_inner_microstep: 1515.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294
[2024-06-10 01:39:30,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.85 | bwd_microstep: 881.24 | bwd_inner_microstep: 881.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3856
[2024-06-10 01:39:32,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.25 | bwd_microstep: 1603.70 | bwd_inner_microstep: 1603.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 01:39:34,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.32 | bwd_microstep: 1561.02 | bwd_inner_microstep: 1560.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 01:39:36,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.95 | bwd_microstep: 1191.28 | bwd_inner_microstep: 1191.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574
[2024-06-10 01:39:38,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1498.67 | bwd_inner_microstep: 1498.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 01:39:40,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.30 | bwd_microstep: 1461.61 | bwd_inner_microstep: 1461.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723
[2024-06-10 01:39:42,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.07 | bwd_microstep: 1438.40 | bwd_inner_microstep: 1438.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3608
[2024-06-10 01:39:44,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.39 | bwd_microstep: 1708.09 | bwd_inner_microstep: 1708.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3769
[2024-06-10 01:39:47,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.32 | bwd_microstep: 1746.57 | bwd_inner_microstep: 1746.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3814
[2024-06-10 01:39:49,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.22 | bwd_microstep: 1729.03 | bwd_inner_microstep: 1729.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3430
[2024-06-10 01:39:51,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.59 | bwd_microstep: 1542.46 | bwd_inner_microstep: 1542.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1945
[2024-06-10 01:39:56,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.37 | optimizer_step: 6.60
[2024-06-10 01:39:56,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.17 | bwd_microstep: 4422.12 | bwd_inner_microstep: 872.12 | bwd_allreduce_microstep: 3549.95 | step_microstep: 39.71
[2024-06-10 01:39:56,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15516.16 | bwd: 45143.29 | bwd_inner: 41592.26 | bwd_allreduce: 3550.26 | step: 41.71
{'loss': 1.4116, 'learning_rate': 3.9999683021212134e-05, 'epoch': 0.03}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 01:39:58,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.63 | bwd_microstep: 1469.25 | bwd_inner_microstep: 1469.12 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 01:40:00,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.09 | bwd_microstep: 1273.71 | bwd_inner_microstep: 1273.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3399
[2024-06-10 01:40:01,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.30 | bwd_microstep: 1183.68 | bwd_inner_microstep: 1183.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 01:40:03,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.45 | bwd_microstep: 1296.82 | bwd_inner_microstep: 1296.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 01:40:05,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 01:40:07,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1529.19 | bwd_inner_microstep: 1529.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-10 01:40:09,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1550.08 | bwd_inner_microstep: 1550.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1888
[2024-06-10 01:40:10,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.98 | bwd_microstep: 683.45 | bwd_inner_microstep: 683.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 01:40:12,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.16 | bwd_microstep: 1292.56 | bwd_inner_microstep: 1292.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428
[2024-06-10 01:40:14,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.75 | bwd_microstep: 1286.27 | bwd_inner_microstep: 1286.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498
[2024-06-10 01:40:16,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1419.03 | bwd_inner_microstep: 1419.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3672
[2024-06-10 01:40:18,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.27 | bwd_microstep: 1530.09 | bwd_inner_microstep: 1530.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 01:40:20,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1489.61 | bwd_inner_microstep: 1489.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498
[2024-06-10 01:40:22,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1414.68 | bwd_inner_microstep: 1414.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 01:40:24,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1432.44 | bwd_inner_microstep: 1432.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 01:40:25,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.90 | bwd_microstep: 974.33 | bwd_inner_microstep: 974.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 01:40:27,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1255.49 | bwd_inner_microstep: 1255.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-10 01:40:29,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1413.73 | bwd_inner_microstep: 1413.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 01:40:31,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1556.63 | bwd_inner_microstep: 1556.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 01:40:33,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.43 | bwd_microstep: 1300.16 | bwd_inner_microstep: 1300.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 01:40:35,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.46 | bwd_microstep: 1510.55 | bwd_inner_microstep: 1510.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 01:40:37,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.11 | bwd_microstep: 1506.16 | bwd_inner_microstep: 1505.71 | bwd_allreduce_microstep: 0.25 | step_microstep: 0.33
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 01:40:39,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.33 | bwd_microstep: 1297.38 | bwd_inner_microstep: 1297.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2146
[2024-06-10 01:40:40,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.12 | bwd_microstep: 820.91 | bwd_inner_microstep: 820.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-10 01:40:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.24 | bwd_microstep: 1157.51 | bwd_inner_microstep: 1157.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 01:40:44,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1550.07 | bwd_inner_microstep: 1550.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 01:40:46,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.60 | bwd_microstep: 1396.21 | bwd_inner_microstep: 1396.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 01:40:48,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.75 | bwd_microstep: 1645.32 | bwd_inner_microstep: 1645.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3598
[2024-06-10 01:40:50,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.40 | bwd_microstep: 1639.53 | bwd_inner_microstep: 1639.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 01:40:52,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.83 | bwd_microstep: 1452.83 | bwd_inner_microstep: 1452.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3575
[2024-06-10 01:40:55,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.76 | bwd_microstep: 1652.20 | bwd_inner_microstep: 1652.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 01:40:57,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.23 | optimizer_step: 6.64
[2024-06-10 01:40:57,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.94 | bwd_microstep: 1537.82 | bwd_inner_microstep: 1529.80 | bwd_allreduce_microstep: 7.97 | step_microstep: 38.63
[2024-06-10 01:40:57,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16442.97 | bwd: 43866.60 | bwd_inner: 43857.26 | bwd_allreduce: 8.50 | step: 40.85
{'loss': 1.3953, 'learning_rate': 3.999943648331265e-05, 'epoch': 0.03}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459
[2024-06-10 01:40:59,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.25 | bwd_microstep: 1568.78 | bwd_inner_microstep: 1568.64 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3941
[2024-06-10 01:41:01,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.44 | bwd_microstep: 1701.11 | bwd_inner_microstep: 1701.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875
[2024-06-10 01:41:03,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.32 | bwd_microstep: 1585.87 | bwd_inner_microstep: 1585.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2349
[2024-06-10 01:41:05,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.11 | bwd_microstep: 988.85 | bwd_inner_microstep: 988.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3860
[2024-06-10 01:41:07,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.91 | bwd_microstep: 1667.99 | bwd_inner_microstep: 1667.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 01:41:09,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.32 | bwd_microstep: 1482.94 | bwd_inner_microstep: 1482.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 01:41:11,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1385.88 | bwd_inner_microstep: 1385.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4082
[2024-06-10 01:41:13,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.74 | bwd_microstep: 1729.01 | bwd_inner_microstep: 1728.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734
[2024-06-10 01:41:15,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.04 | bwd_microstep: 1440.45 | bwd_inner_microstep: 1440.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3715
[2024-06-10 01:41:17,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.25 | bwd_microstep: 1464.79 | bwd_inner_microstep: 1464.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 01:41:19,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1412.96 | bwd_inner_microstep: 1412.86 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.26
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 01:41:21,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.11 | bwd_microstep: 1286.25 | bwd_inner_microstep: 1286.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1050
[2024-06-10 01:41:22,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 149.18 | bwd_microstep: 377.13 | bwd_inner_microstep: 377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482
[2024-06-10 01:41:24,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.69 | bwd_microstep: 1315.24 | bwd_inner_microstep: 1315.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3471
[2024-06-10 01:41:25,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.02 | bwd_microstep: 1251.45 | bwd_inner_microstep: 1251.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3667
[2024-06-10 01:41:27,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.87 | bwd_microstep: 1472.65 | bwd_inner_microstep: 1472.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 01:41:29,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.65 | bwd_microstep: 1479.29 | bwd_inner_microstep: 1479.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3825
[2024-06-10 01:41:32,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.12 | bwd_microstep: 1604.21 | bwd_inner_microstep: 1604.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-10 01:41:33,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.80 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 01:41:35,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.61 | bwd_microstep: 1522.68 | bwd_inner_microstep: 1522.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-10 01:41:38,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.64 | bwd_microstep: 1603.23 | bwd_inner_microstep: 1603.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 01:41:39,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1254.84 | bwd_inner_microstep: 1254.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2307
[2024-06-10 01:41:41,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.31 | bwd_microstep: 981.32 | bwd_inner_microstep: 981.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498
[2024-06-10 01:41:43,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1411.58 | bwd_inner_microstep: 1411.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 01:41:44,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.33 | bwd_microstep: 1355.68 | bwd_inner_microstep: 1355.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 01:41:46,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.68 | bwd_microstep: 1380.92 | bwd_inner_microstep: 1380.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 01:41:48,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.07 | bwd_microstep: 1188.06 | bwd_inner_microstep: 1188.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 01:41:50,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.18 | bwd_microstep: 1402.00 | bwd_inner_microstep: 1401.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-10 01:41:52,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.66 | bwd_microstep: 1635.80 | bwd_inner_microstep: 1635.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-10 01:41:55,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.52 | bwd_microstep: 1644.86 | bwd_inner_microstep: 1644.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 01:41:57,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.01 | bwd_microstep: 1636.28 | bwd_inner_microstep: 1636.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3803
[2024-06-10 01:41:59,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.77 | optimizer_gradients: 4.24 | optimizer_step: 6.63
[2024-06-10 01:41:59,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.67 | bwd_microstep: 1657.60 | bwd_inner_microstep: 1649.54 | bwd_allreduce_microstep: 8.01 | step_microstep: 39.83
[2024-06-10 01:41:59,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16863.15 | bwd: 45137.23 | bwd_inner: 45128.02 | bwd_allreduce: 8.39 | step: 42.56
  3%|▎         | 51/1726 [58:23<28:57:41, 62.25s/it]
  3%|▎         | 52/1726 [59:26<29:05:50, 62.57s/it]
  3%|▎         | 53/1726 [1:00:28<28:59:42, 62.39s/it]
  3%|▎         | 54/1726 [1:01:32<29:06:06, 62.66s/it]
  3%|▎         | 55/1726 [1:02:33<28:51:37, 62.18s/it]
  3%|▎         | 56/1726 [1:03:33<28:38:13, 61.73s/it]
  3%|▎         | 57/1726 [1:04:36<28:42:48, 61.93s/it]
{'loss': 1.3489, 'learning_rate': 3.999911950750184e-05, 'epoch': 0.03}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 01:42:00,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 783.93 | bwd_inner_microstep: 783.78 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4063
[2024-06-10 01:42:02,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.85 | bwd_microstep: 1521.47 | bwd_inner_microstep: 1521.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3884
[2024-06-10 01:42:04,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1587.03 | bwd_inner_microstep: 1587.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3811
[2024-06-10 01:42:06,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.41 | bwd_microstep: 1416.24 | bwd_inner_microstep: 1416.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 01:42:08,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1250.29 | bwd_inner_microstep: 1250.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492
[2024-06-10 01:42:10,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.40 | bwd_microstep: 1191.80 | bwd_inner_microstep: 1191.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 01:42:12,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1349.72 | bwd_inner_microstep: 1349.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 01:42:13,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.91 | bwd_microstep: 1291.77 | bwd_inner_microstep: 1291.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-10 01:42:15,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.68 | bwd_microstep: 1316.17 | bwd_inner_microstep: 1316.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1945
[2024-06-10 01:42:17,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.01 | bwd_microstep: 917.51 | bwd_inner_microstep: 917.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 01:42:19,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1430.41 | bwd_inner_microstep: 1430.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1932
[2024-06-10 01:42:20,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.08 | bwd_microstep: 818.49 | bwd_inner_microstep: 818.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3961
[2024-06-10 01:42:22,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.09 | bwd_microstep: 1799.59 | bwd_inner_microstep: 1799.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3508
[2024-06-10 01:42:24,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1518.56 | bwd_inner_microstep: 1518.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 01:42:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.66 | bwd_microstep: 1493.07 | bwd_inner_microstep: 1493.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913
[2024-06-10 01:42:27,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.26 | bwd_microstep: 783.81 | bwd_inner_microstep: 783.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2644
[2024-06-10 01:42:29,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.08 | bwd_microstep: 1053.34 | bwd_inner_microstep: 1053.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3838
[2024-06-10 01:42:31,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.19 | bwd_microstep: 1423.32 | bwd_inner_microstep: 1423.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3561
[2024-06-10 01:42:33,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1267.97 | bwd_inner_microstep: 1267.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3537
[2024-06-10 01:42:35,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.86 | bwd_microstep: 1426.86 | bwd_inner_microstep: 1426.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 01:42:37,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.79 | bwd_microstep: 1535.37 | bwd_inner_microstep: 1535.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 01:42:39,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1404.57 | bwd_inner_microstep: 1404.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597
[2024-06-10 01:42:41,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1436.48 | bwd_inner_microstep: 1436.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3481
[2024-06-10 01:42:43,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 1530.17 | bwd_inner_microstep: 1530.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3433
[2024-06-10 01:42:44,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.22 | bwd_microstep: 1237.82 | bwd_inner_microstep: 1237.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 01:42:46,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.67 | bwd_microstep: 1382.40 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957
[2024-06-10 01:42:47,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.77 | bwd_microstep: 703.77 | bwd_inner_microstep: 703.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2124
[2024-06-10 01:42:49,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.40 | bwd_microstep: 929.67 | bwd_inner_microstep: 929.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 01:42:51,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.51 | bwd_microstep: 1503.37 | bwd_inner_microstep: 1503.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 01:42:52,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1252.81 | bwd_inner_microstep: 1252.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 01:42:55,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.53 | bwd_microstep: 1616.38 | bwd_inner_microstep: 1616.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2035
[2024-06-10 01:43:01,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.42 | optimizer_step: 6.60
[2024-06-10 01:43:01,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.64 | bwd_microstep: 6034.72 | bwd_inner_microstep: 932.06 | bwd_allreduce_microstep: 5102.58 | step_microstep: 40.14
[2024-06-10 01:43:01,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15410.07 | bwd: 46208.91 | bwd_inner: 41105.27 | bwd_allreduce: 5102.89 | step: 42.64
{'loss': 1.3421, 'learning_rate': 3.9998732094896084e-05, 'epoch': 0.03}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 01:43:03,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.24 | bwd_microstep: 1138.26 | bwd_inner_microstep: 1138.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3941
[2024-06-10 01:43:05,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.02 | bwd_microstep: 1694.92 | bwd_inner_microstep: 1694.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3469
[2024-06-10 01:43:07,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1428.63 | bwd_inner_microstep: 1428.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 01:43:09,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.03 | bwd_microstep: 1483.43 | bwd_inner_microstep: 1483.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 01:43:11,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.40 | bwd_microstep: 1380.81 | bwd_inner_microstep: 1380.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2038
[2024-06-10 01:43:12,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.53 | bwd_microstep: 810.32 | bwd_inner_microstep: 810.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 01:43:14,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.19 | bwd_microstep: 1536.05 | bwd_inner_microstep: 1536.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 01:43:16,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.15 | bwd_microstep: 1286.13 | bwd_inner_microstep: 1286.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 01:43:18,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.08 | bwd_microstep: 1648.74 | bwd_inner_microstep: 1648.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483
[2024-06-10 01:43:20,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1314.55 | bwd_inner_microstep: 1314.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 01:43:22,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.25 | bwd_microstep: 1504.98 | bwd_inner_microstep: 1504.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 01:43:24,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.74 | bwd_microstep: 1392.68 | bwd_inner_microstep: 1392.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2152
[2024-06-10 01:43:25,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.63 | bwd_microstep: 854.54 | bwd_inner_microstep: 854.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3606
[2024-06-10 01:43:28,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.11 | bwd_microstep: 1708.22 | bwd_inner_microstep: 1708.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3489
[2024-06-10 01:43:29,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.50 | bwd_microstep: 1345.85 | bwd_inner_microstep: 1345.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191
[2024-06-10 01:43:31,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.41 | bwd_microstep: 961.05 | bwd_inner_microstep: 961.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2014
[2024-06-10 01:43:32,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.80 | bwd_microstep: 712.74 | bwd_inner_microstep: 712.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2123
[2024-06-10 01:43:33,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.46 | bwd_microstep: 837.78 | bwd_inner_microstep: 837.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 01:43:35,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.03 | bwd_microstep: 1296.88 | bwd_inner_microstep: 1296.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3664
[2024-06-10 01:43:37,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.91 | bwd_microstep: 1328.00 | bwd_inner_microstep: 1327.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2283
[2024-06-10 01:43:38,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.13 | bwd_microstep: 788.09 | bwd_inner_microstep: 788.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 01:43:39,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.49 | bwd_microstep: 807.44 | bwd_inner_microstep: 807.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 01:43:41,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.11 | bwd_microstep: 1306.37 | bwd_inner_microstep: 1306.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 01:43:42,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.08 | bwd_microstep: 1303.84 | bwd_inner_microstep: 1303.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1921
[2024-06-10 01:43:43,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.10 | bwd_microstep: 728.97 | bwd_inner_microstep: 728.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 3299
[2024-06-10 01:43:45,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.81 | bwd_microstep: 1157.05 | bwd_inner_microstep: 1157.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-10 01:43:47,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.49 | bwd_microstep: 1597.46 | bwd_inner_microstep: 1597.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3794
[2024-06-10 01:43:49,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.08 | bwd_microstep: 1584.92 | bwd_inner_microstep: 1584.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2518
[2024-06-10 01:43:51,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.20 | bwd_microstep: 1060.35 | bwd_inner_microstep: 1060.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787
[2024-06-10 01:43:53,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.81 | bwd_microstep: 1451.46 | bwd_inner_microstep: 1451.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3586
[2024-06-10 01:43:55,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.15 | bwd_microstep: 1340.30 | bwd_inner_microstep: 1340.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801
[2024-06-10 01:44:04,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.45 | optimizer_step: 6.60
[2024-06-10 01:44:04,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.20 | bwd_microstep: 8401.97 | bwd_inner_microstep: 1992.34 | bwd_allreduce_microstep: 6409.56 | step_microstep: 40.23
[2024-06-10 01:44:04,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15233.17 | bwd: 47192.82 | bwd_inner: 40782.22 | bwd_allreduce: 6409.86 | step: 42.32
{'loss': 1.3462, 'learning_rate': 3.999827424685986e-05, 'epoch': 0.03}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 01:44:06,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1399.55 | bwd_inner_microstep: 1399.38 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4039
[2024-06-10 01:44:08,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.59 | bwd_microstep: 1614.87 | bwd_inner_microstep: 1614.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3479
[2024-06-10 01:44:10,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.05 | bwd_microstep: 1217.57 | bwd_inner_microstep: 1217.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 01:44:12,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1250.77 | bwd_inner_microstep: 1250.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 01:44:14,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.01 | bwd_microstep: 1478.36 | bwd_inner_microstep: 1478.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796
[2024-06-10 01:44:16,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.85 | bwd_microstep: 1649.86 | bwd_inner_microstep: 1649.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 01:44:17,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.83 | bwd_microstep: 793.39 | bwd_inner_microstep: 793.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3491
[2024-06-10 01:44:19,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.42 | bwd_microstep: 1507.98 | bwd_inner_microstep: 1507.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3531
[2024-06-10 01:44:21,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 1585.79 | bwd_inner_microstep: 1585.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 01:44:23,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1418.31 | bwd_inner_microstep: 1418.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 01:44:25,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.04 | bwd_microstep: 1484.86 | bwd_inner_microstep: 1484.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 01:44:27,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.75 | bwd_microstep: 1252.91 | bwd_inner_microstep: 1252.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476
[2024-06-10 01:44:29,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.75 | bwd_microstep: 1438.51 | bwd_inner_microstep: 1438.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 01:44:31,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.51 | bwd_microstep: 1478.16 | bwd_inner_microstep: 1478.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 01:44:33,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.77 | bwd_microstep: 1621.89 | bwd_inner_microstep: 1621.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417
[2024-06-10 01:44:35,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.43 | bwd_microstep: 1372.71 | bwd_inner_microstep: 1372.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 01:44:37,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.32 | bwd_microstep: 1419.29 | bwd_inner_microstep: 1419.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971
[2024-06-10 01:44:38,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.95 | bwd_microstep: 704.97 | bwd_inner_microstep: 704.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 01:44:39,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.49 | bwd_microstep: 799.64 | bwd_inner_microstep: 799.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 957
[2024-06-10 01:44:40,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.89 | bwd_microstep: 383.19 | bwd_inner_microstep: 383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 01:44:42,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.61 | bwd_microstep: 1655.30 | bwd_inner_microstep: 1655.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 01:44:44,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1389.37 | bwd_inner_microstep: 1389.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-10 01:44:46,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.66 | bwd_microstep: 1287.63 | bwd_inner_microstep: 1287.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3864
[2024-06-10 01:44:48,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.39 | bwd_microstep: 1490.94 | bwd_inner_microstep: 1490.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3806
[2024-06-10 01:44:50,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.03 | bwd_microstep: 1577.94 | bwd_inner_microstep: 1577.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 01:44:52,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.74 | bwd_microstep: 1256.64 | bwd_inner_microstep: 1256.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2272
[2024-06-10 01:44:53,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.23 | bwd_microstep: 875.51 | bwd_inner_microstep: 875.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 01:44:55,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.46 | bwd_microstep: 1608.96 | bwd_inner_microstep: 1608.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3616
[2024-06-10 01:44:57,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.19 | bwd_microstep: 1444.45 | bwd_inner_microstep: 1444.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2240
[2024-06-10 01:44:58,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.60 | bwd_microstep: 867.40 | bwd_inner_microstep: 867.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 01:45:00,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.23 | bwd_microstep: 1534.53 | bwd_inner_microstep: 1534.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 01:45:06,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.42 | optimizer_step: 6.62
[2024-06-10 01:45:06,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.01 | bwd_microstep: 5499.10 | bwd_inner_microstep: 1579.34 | bwd_allreduce_microstep: 3919.69 | step_microstep: 40.03
[2024-06-10 01:45:06,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15833.92 | bwd: 46360.41 | bwd_inner: 42439.66 | bwd_allreduce: 3920.00 | step: 42.39
{'loss': 1.3605, 'learning_rate': 3.99977459650057e-05, 'epoch': 0.03}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2913
[2024-06-10 01:45:08,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.42 | bwd_microstep: 1214.36 | bwd_inner_microstep: 1214.17 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1850
[2024-06-10 01:45:09,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.84 | bwd_microstep: 706.87 | bwd_inner_microstep: 706.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 01:45:11,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.02 | bwd_microstep: 1282.19 | bwd_inner_microstep: 1282.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654
[2024-06-10 01:45:13,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.96 | bwd_microstep: 1426.77 | bwd_inner_microstep: 1426.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 01:45:15,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.43 | bwd_microstep: 1285.22 | bwd_inner_microstep: 1285.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 01:45:16,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.42 | bwd_microstep: 681.96 | bwd_inner_microstep: 681.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 01:45:17,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1289.59 | bwd_inner_microstep: 1289.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 01:45:20,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.32 | bwd_microstep: 1533.66 | bwd_inner_microstep: 1533.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 01:45:21,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.93 | bwd_microstep: 1387.99 | bwd_inner_microstep: 1387.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 01:45:23,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.05 | bwd_microstep: 1252.84 | bwd_inner_microstep: 1252.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1904
[2024-06-10 01:45:24,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.78 | bwd_microstep: 779.14 | bwd_inner_microstep: 779.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-10 01:45:26,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.39 | bwd_microstep: 1585.34 | bwd_inner_microstep: 1585.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2111
[2024-06-10 01:45:28,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.53 | bwd_microstep: 921.15 | bwd_inner_microstep: 921.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1985
[2024-06-10 01:45:29,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.59 | bwd_microstep: 835.81 | bwd_inner_microstep: 835.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 01:45:31,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.21 | bwd_microstep: 1616.59 | bwd_inner_microstep: 1616.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3516
[2024-06-10 01:45:33,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.96 | bwd_microstep: 1194.06 | bwd_inner_microstep: 1194.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465
[2024-06-10 01:45:35,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.67 | bwd_microstep: 1347.58 | bwd_inner_microstep: 1347.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835
[2024-06-10 01:45:37,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1464.51 | bwd_inner_microstep: 1464.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 01:45:39,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.91 | bwd_microstep: 1356.24 | bwd_inner_microstep: 1356.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-10 01:45:41,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.89 | bwd_microstep: 1591.10 | bwd_inner_microstep: 1591.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 01:45:43,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.61 | bwd_microstep: 1561.60 | bwd_inner_microstep: 1561.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 01:45:45,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.43 | bwd_microstep: 1257.33 | bwd_inner_microstep: 1257.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3724
[2024-06-10 01:45:46,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.20 | bwd_microstep: 1342.14 | bwd_inner_microstep: 1342.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009
[2024-06-10 01:45:48,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.17 | bwd_microstep: 804.66 | bwd_inner_microstep: 804.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3775
[2024-06-10 01:45:50,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.13 | bwd_microstep: 1576.42 | bwd_inner_microstep: 1576.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3609
[2024-06-10 01:45:52,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.96 | bwd_microstep: 1538.90 | bwd_inner_microstep: 1538.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3449
[2024-06-10 01:45:54,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.62 | bwd_microstep: 1318.90 | bwd_inner_microstep: 1318.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3779
[2024-06-10 01:45:56,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1384.29 | bwd_inner_microstep: 1384.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2251
[2024-06-10 01:45:57,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.95 | bwd_microstep: 1067.42 | bwd_inner_microstep: 1067.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 01:45:59,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1403.31 | bwd_inner_microstep: 1403.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3575
[2024-06-10 01:46:01,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.53 | bwd_microstep: 1422.28 | bwd_inner_microstep: 1422.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3801
[2024-06-10 01:46:08,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.41 | optimizer_step: 6.59
[2024-06-10 01:46:08,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.80 | bwd_microstep: 6017.61 | bwd_inner_microstep: 1701.70 | bwd_allreduce_microstep: 4315.83 | step_microstep: 40.09
[2024-06-10 01:46:08,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15344.25 | bwd: 45447.85 | bwd_inner: 41130.93 | bwd_allreduce: 4316.16 | step: 42.70
{'loss': 1.389, 'learning_rate': 3.99971472511942e-05, 'epoch': 0.04}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 01:46:09,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.87 | bwd_microstep: 801.71 | bwd_inner_microstep: 801.49 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 01:46:11,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1301.74 | bwd_inner_microstep: 1301.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4281
[2024-06-10 01:46:13,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.39 | bwd_microstep: 1504.83 | bwd_inner_microstep: 1504.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2296
[2024-06-10 01:46:14,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.48 | bwd_microstep: 972.54 | bwd_inner_microstep: 972.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 01:46:16,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1252.41 | bwd_inner_microstep: 1252.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484
[2024-06-10 01:46:17,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.47 | bwd_microstep: 1222.80 | bwd_inner_microstep: 1222.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 01:46:19,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.38 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 01:46:21,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1482.52 | bwd_inner_microstep: 1482.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 01:46:22,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.40 | bwd_microstep: 794.35 | bwd_inner_microstep: 794.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 01:46:24,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.37 | bwd_microstep: 1486.79 | bwd_inner_microstep: 1486.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 01:46:26,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.31 | bwd_microstep: 1389.56 | bwd_inner_microstep: 1389.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3415
[2024-06-10 01:46:28,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1395.06 | bwd_inner_microstep: 1395.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 01:46:30,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.33 | bwd_microstep: 1348.35 | bwd_inner_microstep: 1348.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687
[2024-06-10 01:46:32,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.72 | bwd_microstep: 1522.93 | bwd_inner_microstep: 1522.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1919
[2024-06-10 01:46:33,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.57 | bwd_microstep: 734.20 | bwd_inner_microstep: 734.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3516
[2024-06-10 01:46:35,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.40 | bwd_microstep: 1546.36 | bwd_inner_microstep: 1546.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3934
[2024-06-10 01:46:38,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.10 | bwd_microstep: 1705.26 | bwd_inner_microstep: 1705.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 01:46:39,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.64 | bwd_microstep: 1189.38 | bwd_inner_microstep: 1189.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629
[2024-06-10 01:46:41,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.25 | bwd_microstep: 1315.91 | bwd_inner_microstep: 1315.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3821
[2024-06-10 01:46:43,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1325.92 | bwd_inner_microstep: 1325.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 01:46:45,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.24 | bwd_microstep: 1611.85 | bwd_inner_microstep: 1611.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3454
[2024-06-10 01:46:47,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1418.46 | bwd_inner_microstep: 1418.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 01:46:49,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1284.21 | bwd_inner_microstep: 1284.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 01:46:51,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.80 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3817
[2024-06-10 01:46:53,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.51 | bwd_microstep: 1421.96 | bwd_inner_microstep: 1421.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 01:46:55,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1348.60 | bwd_inner_microstep: 1348.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2768
[2024-06-10 01:46:56,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.87 | bwd_microstep: 1147.07 | bwd_inner_microstep: 1147.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-10 01:46:58,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.02 | bwd_microstep: 1407.12 | bwd_inner_microstep: 1407.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777
[2024-06-10 01:47:00,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.76 | bwd_microstep: 1543.23 | bwd_inner_microstep: 1543.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-10 01:47:03,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.80 | bwd_microstep: 1594.95 | bwd_inner_microstep: 1594.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3594
[2024-06-10 01:47:04,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1338.23 | bwd_inner_microstep: 1338.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 01:47:09,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.42 | optimizer_step: 6.62
[2024-06-10 01:47:09,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 3668.44 | bwd_inner_microstep: 1693.98 | bwd_allreduce_microstep: 1974.39 | step_microstep: 39.94
[2024-06-10 01:47:09,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15946.22 | bwd: 44646.57 | bwd_inner: 42671.09 | bwd_allreduce: 1974.72 | step: 42.37
{'loss': 1.357, 'learning_rate': 3.999647810753404e-05, 'epoch': 0.04}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 01:47:11,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1338.03 | bwd_inner_microstep: 1337.88 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 01:47:12,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.03 | bwd_microstep: 1390.90 | bwd_inner_microstep: 1390.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262
[2024-06-10 01:47:14,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.93 | bwd_microstep: 970.29 | bwd_inner_microstep: 970.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3811
[2024-06-10 01:47:16,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.79 | bwd_microstep: 1318.07 | bwd_inner_microstep: 1318.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 01:47:18,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.88 | bwd_microstep: 1384.64 | bwd_inner_microstep: 1384.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 01:47:19,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1393.25 | bwd_inner_microstep: 1393.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3431
[2024-06-10 01:47:21,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.44 | bwd_microstep: 1155.41 | bwd_inner_microstep: 1155.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 01:47:23,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.99 | bwd_microstep: 1287.03 | bwd_inner_microstep: 1287.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 01:47:24,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.39 | bwd_microstep: 796.94 | bwd_inner_microstep: 796.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 01:47:26,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.09 | bwd_microstep: 1251.04 | bwd_inner_microstep: 1251.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2936
[2024-06-10 01:47:27,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.89 | bwd_microstep: 1038.08 | bwd_inner_microstep: 1038.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 01:47:29,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.60 | bwd_microstep: 1527.68 | bwd_inner_microstep: 1527.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-10 01:47:31,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.09 | bwd_microstep: 1583.23 | bwd_inner_microstep: 1583.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2391
[2024-06-10 01:47:33,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.16 | bwd_microstep: 1003.12 | bwd_inner_microstep: 1003.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 01:47:34,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.41 | bwd_microstep: 796.57 | bwd_inner_microstep: 796.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 01:47:36,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.25 | bwd_microstep: 1508.80 | bwd_inner_microstep: 1508.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958
[2024-06-10 01:47:37,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.42 | bwd_microstep: 704.39 | bwd_inner_microstep: 704.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-10 01:47:39,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1413.29 | bwd_inner_microstep: 1413.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 01:47:41,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1430.35 | bwd_inner_microstep: 1430.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 01:47:43,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.20 | bwd_microstep: 1409.58 | bwd_inner_microstep: 1409.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 01:47:45,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.71 | bwd_microstep: 1506.15 | bwd_inner_microstep: 1506.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3576
[2024-06-10 01:47:47,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.74 | bwd_microstep: 1240.82 | bwd_inner_microstep: 1240.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 01:47:49,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.34 | bwd_microstep: 1490.24 | bwd_inner_microstep: 1490.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 01:47:51,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1415.34 | bwd_inner_microstep: 1415.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 01:47:53,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.35 | bwd_microstep: 1399.89 | bwd_inner_microstep: 1399.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-10 01:47:55,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.15 | bwd_microstep: 1582.51 | bwd_inner_microstep: 1582.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 01:47:57,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.19 | bwd_microstep: 1314.26 | bwd_inner_microstep: 1314.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 01:47:59,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1420.95 | bwd_inner_microstep: 1420.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 01:48:01,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.28 | bwd_microstep: 1457.65 | bwd_inner_microstep: 1457.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 01:48:03,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.20 | bwd_microstep: 1502.23 | bwd_inner_microstep: 1502.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777
[2024-06-10 01:48:05,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.40 | bwd_microstep: 1538.63 | bwd_inner_microstep: 1538.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2922
[2024-06-10 01:48:12,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.43 | optimizer_step: 6.59
[2024-06-10 01:48:12,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 6761.45 | bwd_inner_microstep: 1465.60 | bwd_allreduce_microstep: 5295.78 | step_microstep: 40.14
[2024-06-10 01:48:12,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15750.89 | bwd: 47330.83 | bwd_inner: 42034.00 | bwd_allreduce: 5296.09 | step: 42.55


  3%|▎         | 57/1726 [1:04:36<28:42:48, 61.93s/it]
  3%|▎         | 58/1726 [1:05:38<28:42:22, 61.96s/it]
  3%|▎         | 59/1726 [1:06:41<28:48:26, 62.21s/it]
  3%|▎         | 60/1726 [1:07:43<28:50:28, 62.32s/it]
  4%|▎         | 61/1726 [1:08:44<28:39:57, 61.98s/it]
  4%|▎         | 62/1726 [1:09:45<28:30:35, 61.68s/it]
  4%|▎         | 63/1726 [1:10:49<28:44:25, 62.22s/it]
{'loss': 1.3692, 'learning_rate': 3.999573853638194e-05, 'epoch': 0.04}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 01:48:14,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 01:48:16,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.97 | bwd_microstep: 1492.19 | bwd_inner_microstep: 1492.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2375
[2024-06-10 01:48:17,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.38 | bwd_microstep: 899.09 | bwd_inner_microstep: 899.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 01:48:19,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1379.09 | bwd_inner_microstep: 1379.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2308
[2024-06-10 01:48:21,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.40 | bwd_microstep: 945.60 | bwd_inner_microstep: 945.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 01:48:23,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.60 | bwd_microstep: 1654.07 | bwd_inner_microstep: 1654.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 01:48:25,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1396.98 | bwd_inner_microstep: 1396.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 01:48:27,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1383.87 | bwd_inner_microstep: 1383.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 01:48:29,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.45 | bwd_microstep: 1628.28 | bwd_inner_microstep: 1628.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 01:48:31,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1253.08 | bwd_inner_microstep: 1253.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3502
[2024-06-10 01:48:33,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1347.20 | bwd_inner_microstep: 1347.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 01:48:34,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.10 | bwd_microstep: 1293.90 | bwd_inner_microstep: 1293.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1667
[2024-06-10 01:48:35,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 250.30 | bwd_microstep: 663.96 | bwd_inner_microstep: 663.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-10 01:48:37,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1413.06 | bwd_inner_microstep: 1413.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 01:48:39,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.39 | bwd_microstep: 1293.17 | bwd_inner_microstep: 1293.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 01:48:41,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.74 | bwd_microstep: 1513.10 | bwd_inner_microstep: 1513.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1987
[2024-06-10 01:48:42,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.04 | bwd_microstep: 895.40 | bwd_inner_microstep: 895.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3396
[2024-06-10 01:48:44,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1440.86 | bwd_inner_microstep: 1440.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3824
[2024-06-10 01:48:46,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.79 | bwd_microstep: 1306.69 | bwd_inner_microstep: 1306.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 01:48:48,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.09 | bwd_microstep: 1659.27 | bwd_inner_microstep: 1659.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 01:48:50,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1249.57 | bwd_inner_microstep: 1249.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 01:48:52,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.24 | bwd_microstep: 1499.60 | bwd_inner_microstep: 1499.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 01:48:54,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1380.87 | bwd_inner_microstep: 1380.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 01:48:56,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.65 | bwd_microstep: 1285.20 | bwd_inner_microstep: 1285.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3492
[2024-06-10 01:48:58,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.89 | bwd_microstep: 1430.85 | bwd_inner_microstep: 1430.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2022
[2024-06-10 01:48:59,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.86 | bwd_microstep: 842.99 | bwd_inner_microstep: 842.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3563
[2024-06-10 01:49:01,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.56 | bwd_microstep: 1363.63 | bwd_inner_microstep: 1363.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3162
[2024-06-10 01:49:03,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.73 | bwd_microstep: 1200.09 | bwd_inner_microstep: 1200.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2948
[2024-06-10 01:49:04,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.42 | bwd_microstep: 1173.01 | bwd_inner_microstep: 1172.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484
[2024-06-10 01:49:06,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.47 | bwd_microstep: 1579.39 | bwd_inner_microstep: 1579.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 01:49:08,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.17 | bwd_microstep: 1499.64 | bwd_inner_microstep: 1499.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 01:49:15,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.46 | optimizer_step: 6.60
[2024-06-10 01:49:15,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.52 | bwd_microstep: 5856.81 | bwd_inner_microstep: 1812.16 | bwd_allreduce_microstep: 4044.57 | step_microstep: 39.99
[2024-06-10 01:49:15,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15883.46 | bwd: 46603.73 | bwd_inner: 42558.05 | bwd_allreduce: 4044.91 | step: 42.24
{'loss': 1.3083, 'learning_rate': 3.999492854034266e-05, 'epoch': 0.04}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3537
[2024-06-10 01:49:17,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.44 | bwd_microstep: 1589.49 | bwd_inner_microstep: 1589.29 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.19
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 01:49:19,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.73 | bwd_microstep: 1376.66 | bwd_inner_microstep: 1376.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3850
[2024-06-10 01:49:21,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.66 | bwd_microstep: 1557.83 | bwd_inner_microstep: 1557.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796
[2024-06-10 01:49:23,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.03 | bwd_microstep: 1450.51 | bwd_inner_microstep: 1450.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-10 01:49:25,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.08 | bwd_microstep: 1546.74 | bwd_inner_microstep: 1546.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2912
[2024-06-10 01:49:27,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.15 | bwd_microstep: 1187.93 | bwd_inner_microstep: 1187.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 01:49:29,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.46 | bwd_microstep: 1386.73 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 01:49:30,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 791.77 | bwd_inner_microstep: 791.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 01:49:32,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.90 | bwd_microstep: 1283.38 | bwd_inner_microstep: 1283.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 01:49:34,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.17 | bwd_microstep: 1354.41 | bwd_inner_microstep: 1354.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2428
[2024-06-10 01:49:35,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.54 | bwd_microstep: 945.41 | bwd_inner_microstep: 945.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 01:49:37,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.96 | bwd_microstep: 1260.60 | bwd_inner_microstep: 1260.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3667
[2024-06-10 01:49:39,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1406.16 | bwd_inner_microstep: 1406.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451
[2024-06-10 01:49:41,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1382.50 | bwd_inner_microstep: 1382.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 01:49:42,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1344.88 | bwd_inner_microstep: 1344.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485
[2024-06-10 01:49:44,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.33 | bwd_microstep: 1344.27 | bwd_inner_microstep: 1344.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3527
[2024-06-10 01:49:47,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.74 | bwd_microstep: 1585.00 | bwd_inner_microstep: 1584.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 01:49:48,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.11 | bwd_microstep: 1342.83 | bwd_inner_microstep: 1342.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3428
[2024-06-10 01:49:51,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.25 | bwd_microstep: 1546.52 | bwd_inner_microstep: 1546.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1896
[2024-06-10 01:49:52,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.79 | bwd_microstep: 777.79 | bwd_inner_microstep: 777.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667
[2024-06-10 01:49:53,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1326.93 | bwd_inner_microstep: 1326.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 01:49:55,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.69 | bwd_microstep: 1494.22 | bwd_inner_microstep: 1494.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 01:49:57,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.49 | bwd_microstep: 1284.85 | bwd_inner_microstep: 1284.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 01:49:59,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1396.32 | bwd_inner_microstep: 1396.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2057
[2024-06-10 01:50:00,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.43 | bwd_microstep: 914.28 | bwd_inner_microstep: 914.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3433
[2024-06-10 01:50:02,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.87 | bwd_microstep: 1377.69 | bwd_inner_microstep: 1377.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 01:50:04,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.03 | bwd_microstep: 1461.51 | bwd_inner_microstep: 1461.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3805
[2024-06-10 01:50:06,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1478.85 | bwd_inner_microstep: 1478.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 01:50:09,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1517.74 | bwd_inner_microstep: 1517.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3581
[2024-06-10 01:50:11,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.18 | bwd_microstep: 1424.90 | bwd_inner_microstep: 1424.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 01:50:12,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.24 | bwd_microstep: 1345.74 | bwd_inner_microstep: 1345.60 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 01:50:16,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.27 | optimizer_step: 6.58
[2024-06-10 01:50:16,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.13 | bwd_microstep: 2942.99 | bwd_inner_microstep: 1683.44 | bwd_allreduce_microstep: 1259.49 | step_microstep: 39.23
[2024-06-10 01:50:16,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16125.37 | bwd: 44427.48 | bwd_inner: 43166.80 | bwd_allreduce: 1259.90 | step: 41.92
{'loss': 1.3737, 'learning_rate': 3.999404812226901e-05, 'epoch': 0.04}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 01:50:18,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.76 | bwd_microstep: 1386.87 | bwd_inner_microstep: 1386.64 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3931
[2024-06-10 01:50:20,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.28 | bwd_microstep: 1395.00 | bwd_inner_microstep: 1394.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3884
[2024-06-10 01:50:22,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.10 | bwd_microstep: 1488.70 | bwd_inner_microstep: 1488.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 01:50:24,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.61 | bwd_microstep: 1481.26 | bwd_inner_microstep: 1481.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3828
[2024-06-10 01:50:26,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1406.30 | bwd_inner_microstep: 1406.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 01:50:28,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1249.46 | bwd_inner_microstep: 1249.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-10 01:50:30,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1434.45 | bwd_inner_microstep: 1434.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 01:50:31,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.33 | bwd_microstep: 1187.86 | bwd_inner_microstep: 1187.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3503
[2024-06-10 01:50:33,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.86 | bwd_microstep: 1352.44 | bwd_inner_microstep: 1352.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3612
[2024-06-10 01:50:35,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.07 | bwd_microstep: 1473.74 | bwd_inner_microstep: 1473.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-10 01:50:37,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.94 | bwd_microstep: 1426.90 | bwd_inner_microstep: 1426.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3595
[2024-06-10 01:50:39,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.36 | bwd_microstep: 1272.68 | bwd_inner_microstep: 1272.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1999
[2024-06-10 01:50:40,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.02 | bwd_microstep: 897.14 | bwd_inner_microstep: 897.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 01:50:42,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.71 | bwd_microstep: 1346.92 | bwd_inner_microstep: 1346.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3665
[2024-06-10 01:50:44,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.50 | bwd_microstep: 1553.12 | bwd_inner_microstep: 1553.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3652
[2024-06-10 01:50:46,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.13 | bwd_microstep: 1719.69 | bwd_inner_microstep: 1719.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 01:50:48,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | bwd_microstep: 1392.62 | bwd_inner_microstep: 1392.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 01:50:50,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.29 | bwd_microstep: 1487.78 | bwd_inner_microstep: 1487.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 01:50:52,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1396.29 | bwd_inner_microstep: 1396.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 01:50:54,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.13 | bwd_microstep: 1355.93 | bwd_inner_microstep: 1355.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 01:50:56,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.33 | bwd_microstep: 1261.73 | bwd_inner_microstep: 1261.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-10 01:50:58,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.31 | bwd_microstep: 1613.91 | bwd_inner_microstep: 1613.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 01:51:00,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.88 | bwd_microstep: 1284.52 | bwd_inner_microstep: 1284.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 01:51:02,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.16 | bwd_microstep: 1558.84 | bwd_inner_microstep: 1558.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 01:51:04,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.79 | bwd_microstep: 1285.81 | bwd_inner_microstep: 1285.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2061
[2024-06-10 01:51:05,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.75 | bwd_microstep: 852.84 | bwd_inner_microstep: 852.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-10 01:51:07,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1407.97 | bwd_inner_microstep: 1407.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 01:51:09,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.43 | bwd_microstep: 1348.82 | bwd_inner_microstep: 1348.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 01:51:11,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.79 | bwd_microstep: 1451.76 | bwd_inner_microstep: 1451.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2780
[2024-06-10 01:51:12,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 409.35 | bwd_microstep: 1083.88 | bwd_inner_microstep: 1083.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3600
[2024-06-10 01:51:15,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.88 | bwd_microstep: 1707.46 | bwd_inner_microstep: 1707.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044
[2024-06-10 01:51:16,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.97 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 01:51:16,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.47 | bwd_microstep: 1159.80 | bwd_inner_microstep: 938.71 | bwd_allreduce_microstep: 221.04 | step_microstep: 39.43
[2024-06-10 01:51:16,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16274.75 | bwd: 43722.49 | bwd_inner: 43500.37 | bwd_allreduce: 221.36 | step: 41.67
{'loss': 1.357, 'learning_rate': 3.999309728526181e-05, 'epoch': 0.04}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3398
[2024-06-10 01:51:18,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.89 | bwd_microstep: 1303.29 | bwd_inner_microstep: 1303.07 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3937
[2024-06-10 01:51:20,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.80 | bwd_microstep: 1491.89 | bwd_inner_microstep: 1491.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3871
[2024-06-10 01:51:22,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.05 | bwd_microstep: 1568.38 | bwd_inner_microstep: 1568.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809
[2024-06-10 01:51:24,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1486.08 | bwd_inner_microstep: 1486.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-10 01:51:27,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.73 | bwd_microstep: 1566.20 | bwd_inner_microstep: 1566.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 01:51:28,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.34 | bwd_microstep: 789.96 | bwd_inner_microstep: 789.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 01:51:30,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.14 | bwd_microstep: 1382.60 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 01:51:31,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.85 | bwd_microstep: 1249.24 | bwd_inner_microstep: 1249.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4151
[2024-06-10 01:51:33,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.62 | bwd_microstep: 1541.24 | bwd_inner_microstep: 1541.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 01:51:35,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.50 | bwd_microstep: 802.65 | bwd_inner_microstep: 800.36 | bwd_allreduce_microstep: 2.02 | step_microstep: 0.19
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 01:51:36,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.70 | bwd_microstep: 1288.56 | bwd_inner_microstep: 1288.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 01:51:37,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.32 | bwd_microstep: 799.76 | bwd_inner_microstep: 799.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3441
[2024-06-10 01:51:39,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.10 | bwd_microstep: 1414.61 | bwd_inner_microstep: 1414.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3462
[2024-06-10 01:51:41,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.48 | bwd_microstep: 1330.22 | bwd_inner_microstep: 1330.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969
[2024-06-10 01:51:43,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.15 | bwd_microstep: 894.21 | bwd_inner_microstep: 894.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 01:51:45,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.27 | bwd_microstep: 1526.73 | bwd_inner_microstep: 1526.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 01:51:46,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.32 | bwd_microstep: 1258.14 | bwd_inner_microstep: 1258.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2434
[2024-06-10 01:51:48,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.80 | bwd_microstep: 976.08 | bwd_inner_microstep: 976.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3691
[2024-06-10 01:51:50,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.76 | bwd_microstep: 1458.94 | bwd_inner_microstep: 1458.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3826
[2024-06-10 01:51:52,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.34 | bwd_microstep: 1392.17 | bwd_inner_microstep: 1392.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295
[2024-06-10 01:51:53,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.35 | bwd_microstep: 881.36 | bwd_inner_microstep: 881.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 01:51:55,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1559.97 | bwd_inner_microstep: 1559.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807
[2024-06-10 01:51:57,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.32 | bwd_microstep: 1360.63 | bwd_inner_microstep: 1360.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2101
[2024-06-10 01:51:58,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.37 | bwd_microstep: 921.67 | bwd_inner_microstep: 921.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287
[2024-06-10 01:51:59,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.62 | bwd_microstep: 876.83 | bwd_inner_microstep: 876.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3718
[2024-06-10 01:52:01,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.09 | bwd_microstep: 1384.52 | bwd_inner_microstep: 1384.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2057
[2024-06-10 01:52:02,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.75 | bwd_microstep: 754.56 | bwd_inner_microstep: 754.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 01:52:04,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.06 | bwd_microstep: 1355.76 | bwd_inner_microstep: 1355.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-10 01:52:06,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.72 | bwd_microstep: 1243.73 | bwd_inner_microstep: 1243.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471
[2024-06-10 01:52:08,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.64 | bwd_microstep: 1441.79 | bwd_inner_microstep: 1441.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3740
[2024-06-10 01:52:10,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.76 | bwd_microstep: 1664.40 | bwd_inner_microstep: 1664.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2235
[2024-06-10 01:52:16,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.43 | optimizer_step: 6.59
[2024-06-10 01:52:16,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.15 | bwd_microstep: 5464.80 | bwd_inner_microstep: 1212.87 | bwd_allreduce_microstep: 4251.86 | step_microstep: 40.17
[2024-06-10 01:52:16,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15014.81 | bwd: 44431.03 | bwd_inner: 40175.78 | bwd_allreduce: 4254.30 | step: 42.68
{'loss': 1.3958, 'learning_rate': 3.9992076032669905e-05, 'epoch': 0.04}
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1436
[2024-06-10 01:52:17,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.78 | bwd_microstep: 501.90 | bwd_inner_microstep: 501.74 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 01:52:19,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1404.10 | bwd_inner_microstep: 1404.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3809
[2024-06-10 01:52:21,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.05 | bwd_microstep: 1412.88 | bwd_inner_microstep: 1412.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 01:52:23,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1382.05 | bwd_inner_microstep: 1382.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 01:52:24,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1248.86 | bwd_inner_microstep: 1248.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 01:52:26,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.46 | bwd_microstep: 1378.77 | bwd_inner_microstep: 1378.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 01:52:28,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.98 | bwd_microstep: 1389.06 | bwd_inner_microstep: 1389.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 01:52:29,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.50 | bwd_microstep: 791.84 | bwd_inner_microstep: 791.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 01:52:31,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1282.90 | bwd_inner_microstep: 1282.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2934
[2024-06-10 01:52:33,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.44 | bwd_microstep: 1195.59 | bwd_inner_microstep: 1195.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2011
[2024-06-10 01:52:34,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.47 | bwd_microstep: 898.03 | bwd_inner_microstep: 898.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 01:52:36,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3651
[2024-06-10 01:52:38,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.53 | bwd_microstep: 1651.73 | bwd_inner_microstep: 1651.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1962
[2024-06-10 01:52:39,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.31 | bwd_microstep: 893.84 | bwd_inner_microstep: 893.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3410
[2024-06-10 01:52:42,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.56 | bwd_microstep: 1509.13 | bwd_inner_microstep: 1509.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912
[2024-06-10 01:52:43,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.85 | bwd_microstep: 688.10 | bwd_inner_microstep: 688.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 01:52:44,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.73 | bwd_microstep: 1412.89 | bwd_inner_microstep: 1412.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465
[2024-06-10 01:52:46,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.02 | bwd_microstep: 1346.25 | bwd_inner_microstep: 1346.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2296
[2024-06-10 01:52:47,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.37 | bwd_microstep: 786.41 | bwd_inner_microstep: 786.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 01:52:49,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.10 | bwd_microstep: 978.72 | bwd_inner_microstep: 978.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 01:52:51,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.01 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533
[2024-06-10 01:52:52,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.61 | bwd_microstep: 1230.99 | bwd_inner_microstep: 1230.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 01:52:54,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1256.24 | bwd_inner_microstep: 1256.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473
[2024-06-10 01:52:56,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.48 | bwd_microstep: 1408.72 | bwd_inner_microstep: 1408.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 01:52:58,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.69 | bwd_microstep: 1614.01 | bwd_inner_microstep: 1613.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 01:53:00,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.06 | bwd_microstep: 1494.74 | bwd_inner_microstep: 1494.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 01:53:01,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.06 | bwd_microstep: 880.21 | bwd_inner_microstep: 880.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 01:53:04,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.38 | bwd_microstep: 1475.15 | bwd_inner_microstep: 1475.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3805
[2024-06-10 01:53:06,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1495.32 | bwd_inner_microstep: 1495.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2231
[2024-06-10 01:53:07,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.80 | bwd_microstep: 930.61 | bwd_inner_microstep: 930.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768
[2024-06-10 01:53:09,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.85 | bwd_microstep: 1439.22 | bwd_inner_microstep: 1439.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3474
[2024-06-10 01:53:17,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.43 | optimizer_step: 6.61
[2024-06-10 01:53:17,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.74 | bwd_microstep: 7956.24 | bwd_inner_microstep: 1628.35 | bwd_allreduce_microstep: 6327.81 | step_microstep: 40.17
[2024-06-10 01:53:17,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14883.49 | bwd: 46005.98 | bwd_inner: 39677.08 | bwd_allreduce: 6328.13 | step: 42.67
{'loss': 1.319, 'learning_rate': 3.999098436809014e-05, 'epoch': 0.04}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465
[2024-06-10 01:53:19,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.68 | bwd_microstep: 1430.94 | bwd_inner_microstep: 1430.72 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.18
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3414
[2024-06-10 01:53:21,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.06 | bwd_microstep: 1371.95 | bwd_inner_microstep: 1371.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509
[2024-06-10 01:53:23,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.64 | bwd_microstep: 1453.44 | bwd_inner_microstep: 1453.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865
[2024-06-10 01:53:25,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.41 | bwd_microstep: 1470.98 | bwd_inner_microstep: 1470.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2630
[2024-06-10 01:53:27,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.89 | bwd_microstep: 1021.84 | bwd_inner_microstep: 1021.78 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 01:53:29,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.85 | bwd_microstep: 1403.08 | bwd_inner_microstep: 1403.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2928
[2024-06-10 01:53:30,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.31 | bwd_microstep: 1095.86 | bwd_inner_microstep: 1095.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 01:53:32,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1541.23 | bwd_inner_microstep: 1541.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 01:53:34,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.15 | bwd_microstep: 1535.05 | bwd_inner_microstep: 1535.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 01:53:36,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1393.92 | bwd_inner_microstep: 1393.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3994
[2024-06-10 01:53:39,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1610.59 | bwd_inner_microstep: 1610.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-10 01:53:40,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.60 | bwd_microstep: 808.16 | bwd_inner_microstep: 808.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 01:53:42,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.11 | bwd_microstep: 1524.34 | bwd_inner_microstep: 1524.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3509
[2024-06-10 01:53:44,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.83 | bwd_microstep: 1552.26 | bwd_inner_microstep: 1552.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3553
[2024-06-10 01:53:46,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.69 | bwd_microstep: 1427.71 | bwd_inner_microstep: 1427.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3492
[2024-06-10 01:53:48,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.69 | bwd_microstep: 1444.19 | bwd_inner_microstep: 1444.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638
[2024-06-10 01:53:50,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.82 | bwd_microstep: 1318.37 | bwd_inner_microstep: 1318.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 01:53:52,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.06 | bwd_microstep: 1264.58 | bwd_inner_microstep: 1264.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3548
[2024-06-10 01:53:53,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.61 | bwd_microstep: 1234.84 | bwd_inner_microstep: 1234.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 01:53:55,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.24 | bwd_microstep: 1400.52 | bwd_inner_microstep: 1400.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2535
[2024-06-10 01:53:57,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.23 | bwd_microstep: 997.16 | bwd_inner_microstep: 997.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 01:53:58,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.07 | bwd_microstep: 1360.85 | bwd_inner_microstep: 1360.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 01:54:00,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.90 | bwd_microstep: 1297.61 | bwd_inner_microstep: 1297.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 01:54:02,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.12 | bwd_microstep: 1404.55 | bwd_inner_microstep: 1404.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1993
[2024-06-10 01:54:03,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.38 | bwd_microstep: 836.50 | bwd_inner_microstep: 836.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2098
[2024-06-10 01:54:05,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.24 | bwd_microstep: 970.21 | bwd_inner_microstep: 970.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3810
[2024-06-10 01:54:07,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 1499.55 | bwd_inner_microstep: 1499.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 01:54:09,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.55 | bwd_microstep: 1595.43 | bwd_inner_microstep: 1595.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2233
[2024-06-10 01:54:10,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.78 | bwd_microstep: 836.80 | bwd_inner_microstep: 836.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 01:54:12,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.33 | bwd_microstep: 1650.68 | bwd_inner_microstep: 1650.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 01:54:14,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1346.18 | bwd_inner_microstep: 1346.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3742
[2024-06-10 01:54:19,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 5.38 | optimizer_step: 6.61
[2024-06-10 01:54:19,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.24 | bwd_microstep: 4498.52 | bwd_inner_microstep: 1973.03 | bwd_allreduce_microstep: 2525.41 | step_microstep: 40.99
[2024-06-10 01:54:19,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16083.01 | bwd: 45597.98 | bwd_inner: 43071.37 | bwd_allreduce: 2525.78 | step: 43.47
  4%|▎         | 63/1726 [1:10:49<28:44:25, 62.22s/it]
  4%|▎         | 64/1726 [1:11:52<28:48:50, 62.41s/it]
  4%|▍         | 65/1726 [1:12:53<28:35:36, 61.97s/it]
  4%|▍         | 66/1726 [1:13:53<28:21:23, 61.50s/it]
  4%|▍         | 67/1726 [1:14:53<28:06:37, 61.00s/it]
  4%|▍         | 68/1726 [1:15:54<28:07:52, 61.08s/it]
  4%|▍         | 69/1726 [1:16:56<28:15:04, 61.38s/it]
{'loss': 1.3631, 'learning_rate': 3.9989822295367356e-05, 'epoch': 0.04}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 01:54:21,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.92 | bwd_microstep: 1366.37 | bwd_inner_microstep: 1366.21 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 01:54:23,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1380.70 | bwd_inner_microstep: 1380.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3869
[2024-06-10 01:54:25,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.01 | bwd_microstep: 1563.72 | bwd_inner_microstep: 1563.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2295
[2024-06-10 01:54:27,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.24 | bwd_microstep: 938.24 | bwd_inner_microstep: 938.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 01:54:29,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.15 | bwd_microstep: 1489.69 | bwd_inner_microstep: 1489.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 01:54:30,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.69 | bwd_microstep: 796.14 | bwd_inner_microstep: 796.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 01:54:32,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.75 | bwd_microstep: 1150.76 | bwd_inner_microstep: 1150.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789
[2024-06-10 01:54:34,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.50 | bwd_microstep: 1455.33 | bwd_inner_microstep: 1455.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 01:54:35,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.71 | bwd_microstep: 1253.31 | bwd_inner_microstep: 1253.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 01:54:37,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1491.48 | bwd_inner_microstep: 1491.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3456
[2024-06-10 01:54:39,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.08 | bwd_microstep: 1195.15 | bwd_inner_microstep: 1195.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3578
[2024-06-10 01:54:41,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.99 | bwd_microstep: 1459.18 | bwd_inner_microstep: 1459.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3514
[2024-06-10 01:54:43,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1370.76 | bwd_inner_microstep: 1370.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 01:54:45,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1384.80 | bwd_inner_microstep: 1384.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 01:54:47,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1606.01 | bwd_inner_microstep: 1605.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3550
[2024-06-10 01:54:49,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.83 | bwd_microstep: 1588.74 | bwd_inner_microstep: 1588.55 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3635
[2024-06-10 01:54:51,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.75 | bwd_microstep: 1315.72 | bwd_inner_microstep: 1315.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1480
[2024-06-10 01:54:52,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 211.78 | bwd_microstep: 546.39 | bwd_inner_microstep: 546.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1996
[2024-06-10 01:54:53,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.79 | bwd_microstep: 713.36 | bwd_inner_microstep: 713.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 01:54:55,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.56 | bwd_microstep: 1391.13 | bwd_inner_microstep: 1391.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 01:54:56,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1248.03 | bwd_inner_microstep: 1248.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064
[2024-06-10 01:54:58,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.34 | bwd_microstep: 819.45 | bwd_inner_microstep: 819.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182
[2024-06-10 01:54:59,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.87 | bwd_microstep: 858.65 | bwd_inner_microstep: 858.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 01:55:01,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.04 | bwd_microstep: 1552.82 | bwd_inner_microstep: 1552.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 01:55:03,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.07 | bwd_microstep: 1391.89 | bwd_inner_microstep: 1391.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-10 01:55:05,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1302.71 | bwd_inner_microstep: 1302.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 01:55:07,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.71 | bwd_microstep: 1408.20 | bwd_inner_microstep: 1408.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 01:55:09,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.43 | bwd_microstep: 1658.72 | bwd_inner_microstep: 1658.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787
[2024-06-10 01:55:11,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.91 | bwd_microstep: 1445.18 | bwd_inner_microstep: 1445.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3574
[2024-06-10 01:55:13,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.80 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569
[2024-06-10 01:55:15,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.33 | bwd_microstep: 1365.22 | bwd_inner_microstep: 1365.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 01:55:20,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 6.68 | optimizer_step: 6.61
[2024-06-10 01:55:20,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.46 | bwd_microstep: 4112.83 | bwd_inner_microstep: 1531.23 | bwd_allreduce_microstep: 2581.52 | step_microstep: 42.35
[2024-06-10 01:55:20,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15601.65 | bwd: 44176.54 | bwd_inner: 41593.79 | bwd_allreduce: 2581.91 | step: 44.73
{'loss': 1.3589, 'learning_rate': 3.998858981859436e-05, 'epoch': 0.04}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 01:55:21,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.91 | bwd_microstep: 787.91 | bwd_inner_microstep: 787.78 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650
[2024-06-10 01:55:23,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.03 | bwd_microstep: 1423.86 | bwd_inner_microstep: 1423.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-10 01:55:25,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.28 | bwd_microstep: 1565.28 | bwd_inner_microstep: 1565.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 01:55:27,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.54 | bwd_microstep: 1283.49 | bwd_inner_microstep: 1283.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 01:55:29,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.61 | bwd_microstep: 1642.34 | bwd_inner_microstep: 1642.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 01:55:31,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1282.45 | bwd_inner_microstep: 1282.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 01:55:33,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1405.76 | bwd_inner_microstep: 1405.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 01:55:35,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1414.31 | bwd_inner_microstep: 1414.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 01:55:36,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1254.38 | bwd_inner_microstep: 1254.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 01:55:38,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.53 | bwd_microstep: 1258.43 | bwd_inner_microstep: 1258.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-10 01:55:40,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.34 | bwd_microstep: 1627.12 | bwd_inner_microstep: 1627.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3683
[2024-06-10 01:55:42,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1330.93 | bwd_inner_microstep: 1330.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2128
[2024-06-10 01:55:44,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.87 | bwd_microstep: 956.65 | bwd_inner_microstep: 956.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 01:55:45,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.57 | bwd_microstep: 1287.98 | bwd_inner_microstep: 1287.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 01:55:47,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.94 | bwd_microstep: 1498.34 | bwd_inner_microstep: 1498.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 01:55:49,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.21 | bwd_microstep: 1481.03 | bwd_inner_microstep: 1481.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 01:55:51,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.30 | bwd_microstep: 1476.19 | bwd_inner_microstep: 1476.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650
[2024-06-10 01:55:53,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.66 | bwd_microstep: 1418.66 | bwd_inner_microstep: 1418.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 01:55:55,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1447.38 | bwd_inner_microstep: 1447.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 01:55:57,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1410.39 | bwd_inner_microstep: 1410.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3621
[2024-06-10 01:55:59,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.37 | bwd_microstep: 1539.74 | bwd_inner_microstep: 1539.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3521
[2024-06-10 01:56:01,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.28 | bwd_microstep: 1204.39 | bwd_inner_microstep: 1204.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 929
[2024-06-10 01:56:02,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 149.74 | bwd_microstep: 380.32 | bwd_inner_microstep: 380.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3608
[2024-06-10 01:56:04,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.95 | bwd_microstep: 1445.36 | bwd_inner_microstep: 1445.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2915
[2024-06-10 01:56:05,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.19 | bwd_microstep: 1005.86 | bwd_inner_microstep: 1005.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 01:56:07,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1557.92 | bwd_inner_microstep: 1557.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 01:56:09,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1285.95 | bwd_inner_microstep: 1285.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 01:56:11,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.42 | bwd_microstep: 1464.65 | bwd_inner_microstep: 1464.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3594
[2024-06-10 01:56:13,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.79 | bwd_microstep: 1431.26 | bwd_inner_microstep: 1431.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2240
[2024-06-10 01:56:14,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.62 | bwd_microstep: 898.81 | bwd_inner_microstep: 898.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3803
[2024-06-10 01:56:17,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.78 | bwd_microstep: 1689.87 | bwd_inner_microstep: 1689.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 01:56:23,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.45 | optimizer_step: 6.58
[2024-06-10 01:56:23,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.81 | bwd_microstep: 5706.09 | bwd_inner_microstep: 1525.78 | bwd_allreduce_microstep: 4180.24 | step_microstep: 40.25
[2024-06-10 01:56:23,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15950.30 | bwd: 46863.15 | bwd_inner: 42681.87 | bwd_allreduce: 4180.54 | step: 42.39
{'loss': 1.303, 'learning_rate': 3.9987286942111946e-05, 'epoch': 0.04}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 01:56:24,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 787.09 | bwd_inner_microstep: 786.87 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.15
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3021
[2024-06-10 01:56:26,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.91 | bwd_microstep: 1238.70 | bwd_inner_microstep: 1238.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 01:56:27,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.45 | bwd_microstep: 1280.22 | bwd_inner_microstep: 1280.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 01:56:30,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.63 | bwd_microstep: 1617.02 | bwd_inner_microstep: 1617.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 01:56:31,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1278.05 | bwd_inner_microstep: 1278.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 01:56:33,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.13 | bwd_microstep: 1288.12 | bwd_inner_microstep: 1288.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 01:56:34,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.36 | bwd_microstep: 793.69 | bwd_inner_microstep: 793.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 01:56:36,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.30 | bwd_microstep: 1388.54 | bwd_inner_microstep: 1388.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 01:56:38,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.48 | bwd_microstep: 1155.71 | bwd_inner_microstep: 1155.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1962
[2024-06-10 01:56:39,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.45 | bwd_microstep: 736.94 | bwd_inner_microstep: 736.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 01:56:41,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.58 | bwd_microstep: 1537.96 | bwd_inner_microstep: 1537.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1976
[2024-06-10 01:56:42,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.55 | bwd_microstep: 857.28 | bwd_inner_microstep: 857.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486
[2024-06-10 01:56:44,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.60 | bwd_microstep: 1582.53 | bwd_inner_microstep: 1582.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3517
[2024-06-10 01:56:46,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.55 | bwd_microstep: 1451.45 | bwd_inner_microstep: 1451.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3662
[2024-06-10 01:56:49,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1546.42 | bwd_inner_microstep: 1546.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3397
[2024-06-10 01:56:50,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.32 | bwd_microstep: 1180.53 | bwd_inner_microstep: 1180.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533
[2024-06-10 01:56:52,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.14 | bwd_microstep: 1427.32 | bwd_inner_microstep: 1427.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-10 01:56:54,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1411.49 | bwd_inner_microstep: 1411.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-10 01:56:56,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.48 | bwd_microstep: 1581.34 | bwd_inner_microstep: 1581.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3631
[2024-06-10 01:56:58,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 1560.77 | bwd_inner_microstep: 1560.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697
[2024-06-10 01:57:00,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.35 | bwd_microstep: 1437.95 | bwd_inner_microstep: 1437.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 01:57:02,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.84 | bwd_microstep: 1404.02 | bwd_inner_microstep: 1403.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514
[2024-06-10 01:57:04,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.59 | bwd_microstep: 1197.67 | bwd_inner_microstep: 1197.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 01:57:06,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1555.77 | bwd_inner_microstep: 1555.50 | bwd_allreduce_microstep: 0.19 | step_microstep: 0.29
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 01:57:08,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.83 | bwd_microstep: 1656.39 | bwd_inner_microstep: 1656.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 01:57:11,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.97 | bwd_microstep: 1562.76 | bwd_inner_microstep: 1562.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3745
[2024-06-10 01:57:13,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.56 | bwd_microstep: 1447.39 | bwd_inner_microstep: 1447.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-10 01:57:14,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.79 | bwd_microstep: 704.32 | bwd_inner_microstep: 704.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2280
[2024-06-10 01:57:15,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.36 | bwd_microstep: 1070.28 | bwd_inner_microstep: 1070.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3584
[2024-06-10 01:57:17,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.78 | bwd_microstep: 1566.95 | bwd_inner_microstep: 1566.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3386
[2024-06-10 01:57:19,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.72 | bwd_microstep: 1506.07 | bwd_inner_microstep: 1506.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3405
[2024-06-10 01:57:25,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.45 | optimizer_step: 6.59
[2024-06-10 01:57:25,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.46 | bwd_microstep: 5176.79 | bwd_inner_microstep: 1638.41 | bwd_allreduce_microstep: 3538.31 | step_microstep: 42.02
[2024-06-10 01:57:25,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15847.84 | bwd: 45987.57 | bwd_inner: 42447.88 | bwd_allreduce: 3538.86 | step: 44.56
{'loss': 1.2859, 'learning_rate': 3.9985913670508816e-05, 'epoch': 0.04}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3417
[2024-06-10 01:57:27,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.88 | bwd_microstep: 1176.42 | bwd_inner_microstep: 1176.24 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 01:57:29,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.04 | bwd_microstep: 1484.41 | bwd_inner_microstep: 1484.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3877
[2024-06-10 01:57:31,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.51 | bwd_microstep: 1580.94 | bwd_inner_microstep: 1580.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 01:57:33,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.85 | bwd_microstep: 1486.19 | bwd_inner_microstep: 1486.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2324
[2024-06-10 01:57:34,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.47 | bwd_microstep: 920.96 | bwd_inner_microstep: 920.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1907
[2024-06-10 01:57:35,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.66 | bwd_microstep: 691.02 | bwd_inner_microstep: 690.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 01:57:37,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1248.29 | bwd_inner_microstep: 1248.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 01:57:38,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.98 | bwd_microstep: 794.04 | bwd_inner_microstep: 794.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 01:57:40,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.21 | bwd_microstep: 1249.45 | bwd_inner_microstep: 1249.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 846
[2024-06-10 01:57:40,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 134.56 | bwd_microstep: 351.96 | bwd_inner_microstep: 351.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 01:57:42,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.05 | bwd_microstep: 1351.91 | bwd_inner_microstep: 1351.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3406
[2024-06-10 01:57:44,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.05 | bwd_microstep: 1199.06 | bwd_inner_microstep: 1199.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1953
[2024-06-10 01:57:45,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.39 | bwd_microstep: 888.23 | bwd_inner_microstep: 888.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452
[2024-06-10 01:57:47,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.12 | bwd_microstep: 1314.53 | bwd_inner_microstep: 1314.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 01:57:49,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1254.04 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3698
[2024-06-10 01:57:51,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.88 | bwd_microstep: 1584.19 | bwd_inner_microstep: 1584.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389
[2024-06-10 01:57:53,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1246.12 | bwd_inner_microstep: 1246.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 01:57:55,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1512.18 | bwd_inner_microstep: 1512.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637
[2024-06-10 01:57:57,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.64 | bwd_microstep: 1618.01 | bwd_inner_microstep: 1617.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-10 01:57:59,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.04 | bwd_microstep: 1561.73 | bwd_inner_microstep: 1561.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966
[2024-06-10 01:58:00,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.77 | bwd_microstep: 707.80 | bwd_inner_microstep: 707.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3705
[2024-06-10 01:58:02,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1335.61 | bwd_inner_microstep: 1335.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2001
[2024-06-10 01:58:03,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.64 | bwd_microstep: 740.08 | bwd_inner_microstep: 740.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 01:58:05,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.29 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434
[2024-06-10 01:58:07,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.43 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 01:58:09,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.36 | bwd_microstep: 1441.96 | bwd_inner_microstep: 1441.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 01:58:11,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.81 | bwd_microstep: 1460.60 | bwd_inner_microstep: 1460.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 01:58:13,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.70 | bwd_microstep: 1659.04 | bwd_inner_microstep: 1659.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 01:58:15,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.98 | bwd_microstep: 1508.21 | bwd_inner_microstep: 1508.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-10 01:58:17,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.33 | bwd_microstep: 1443.18 | bwd_inner_microstep: 1443.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2046
[2024-06-10 01:58:18,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.08 | bwd_microstep: 847.92 | bwd_inner_microstep: 847.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 01:58:27,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.44 | optimizer_step: 6.62
[2024-06-10 01:58:27,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.73 | bwd_microstep: 8280.05 | bwd_inner_microstep: 1458.02 | bwd_allreduce_microstep: 6821.96 | step_microstep: 40.31
[2024-06-10 01:58:27,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14868.60 | bwd: 46562.67 | bwd_inner: 39739.64 | bwd_allreduce: 6822.27 | step: 42.58
{'loss': 1.3637, 'learning_rate': 3.998447000862164e-05, 'epoch': 0.04}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1998
[2024-06-10 01:58:28,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.89 | bwd_microstep: 888.93 | bwd_inner_microstep: 888.77 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 01:58:29,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.27 | bwd_microstep: 680.96 | bwd_inner_microstep: 680.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2395
[2024-06-10 01:58:30,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.17 | bwd_microstep: 901.71 | bwd_inner_microstep: 901.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474
[2024-06-10 01:58:33,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.07 | bwd_microstep: 1572.07 | bwd_inner_microstep: 1572.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 01:58:35,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1548.50 | bwd_inner_microstep: 1548.26 | bwd_allreduce_microstep: 0.13 | step_microstep: 0.25
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 01:58:36,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.88 | bwd_microstep: 790.95 | bwd_inner_microstep: 790.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 01:58:38,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.74 | bwd_microstep: 1278.36 | bwd_inner_microstep: 1278.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 01:58:40,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.41 | bwd_microstep: 1474.48 | bwd_inner_microstep: 1474.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 01:58:42,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.17 | bwd_microstep: 1383.50 | bwd_inner_microstep: 1383.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948
[2024-06-10 01:58:43,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.34 | bwd_microstep: 731.26 | bwd_inner_microstep: 731.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-10 01:58:44,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.62 | bwd_microstep: 1317.32 | bwd_inner_microstep: 1317.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 01:58:46,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.55 | bwd_microstep: 1482.85 | bwd_inner_microstep: 1482.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3685
[2024-06-10 01:58:49,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.28 | bwd_microstep: 1786.70 | bwd_inner_microstep: 1786.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-10 01:58:51,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.81 | bwd_microstep: 1579.24 | bwd_inner_microstep: 1579.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 01:58:53,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.41 | bwd_microstep: 1607.14 | bwd_inner_microstep: 1607.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2116
[2024-06-10 01:58:54,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.71 | bwd_microstep: 830.08 | bwd_inner_microstep: 830.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150
[2024-06-10 01:58:56,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.54 | bwd_microstep: 856.45 | bwd_inner_microstep: 856.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 01:58:58,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.00 | bwd_microstep: 1484.77 | bwd_inner_microstep: 1484.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3519
[2024-06-10 01:59:00,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.08 | bwd_microstep: 1517.96 | bwd_inner_microstep: 1517.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3524
[2024-06-10 01:59:01,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.10 | bwd_microstep: 1201.53 | bwd_inner_microstep: 1201.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3108
[2024-06-10 01:59:03,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.29 | bwd_microstep: 1247.08 | bwd_inner_microstep: 1247.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3576
[2024-06-10 01:59:05,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.19 | bwd_microstep: 1662.19 | bwd_inner_microstep: 1662.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3579
[2024-06-10 01:59:07,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.33 | bwd_microstep: 1432.10 | bwd_inner_microstep: 1432.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3420
[2024-06-10 01:59:09,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1516.16 | bwd_inner_microstep: 1516.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3607
[2024-06-10 01:59:12,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.53 | bwd_microstep: 1453.98 | bwd_inner_microstep: 1453.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3770
[2024-06-10 01:59:14,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.93 | bwd_microstep: 1711.70 | bwd_inner_microstep: 1711.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2149
[2024-06-10 01:59:15,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.47 | bwd_microstep: 854.35 | bwd_inner_microstep: 854.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 01:59:17,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.69 | bwd_microstep: 1355.40 | bwd_inner_microstep: 1355.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2233
[2024-06-10 01:59:18,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.59 | bwd_microstep: 867.04 | bwd_inner_microstep: 867.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575
[2024-06-10 01:59:20,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.21 | bwd_microstep: 1503.48 | bwd_inner_microstep: 1503.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567
[2024-06-10 01:59:22,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.40 | bwd_microstep: 1333.56 | bwd_inner_microstep: 1333.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 01:59:26,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.40 | optimizer_step: 6.58
[2024-06-10 01:59:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 2930.40 | bwd_inner_microstep: 1566.65 | bwd_allreduce_microstep: 1363.67 | step_microstep: 39.83
[2024-06-10 01:59:26,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15451.21 | bwd: 42782.31 | bwd_inner: 41417.21 | bwd_allreduce: 1364.18 | step: 42.76
{'loss': 1.3781, 'learning_rate': 3.998295596153497e-05, 'epoch': 0.04}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 01:59:27,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.16 | bwd_microstep: 1276.08 | bwd_inner_microstep: 1275.92 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3873
[2024-06-10 01:59:30,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.28 | bwd_microstep: 1679.79 | bwd_inner_microstep: 1679.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 01:59:32,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1343.60 | bwd_inner_microstep: 1343.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 01:59:33,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1389.88 | bwd_inner_microstep: 1389.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 01:59:35,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.29 | bwd_microstep: 1246.30 | bwd_inner_microstep: 1246.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-10 01:59:37,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.34 | bwd_microstep: 1281.12 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2966
[2024-06-10 01:59:38,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 399.43 | bwd_microstep: 1041.65 | bwd_inner_microstep: 1041.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 01:59:40,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.46 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 01:59:42,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.83 | bwd_microstep: 1282.23 | bwd_inner_microstep: 1282.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 01:59:44,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.17 | bwd_microstep: 1292.38 | bwd_inner_microstep: 1292.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 01:59:46,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-10 01:59:47,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.90 | bwd_microstep: 1158.02 | bwd_inner_microstep: 1157.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-10 01:59:49,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.00 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1927
[2024-06-10 01:59:50,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.62 | bwd_microstep: 821.98 | bwd_inner_microstep: 821.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 01:59:52,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1345.63 | bwd_inner_microstep: 1345.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3399
[2024-06-10 01:59:54,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.40 | bwd_microstep: 1407.70 | bwd_inner_microstep: 1407.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 01:59:56,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.08 | bwd_microstep: 1442.95 | bwd_inner_microstep: 1442.83 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.30
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3668
[2024-06-10 01:59:58,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.63 | bwd_microstep: 1619.74 | bwd_inner_microstep: 1619.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 02:00:00,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.71 | bwd_microstep: 1535.63 | bwd_inner_microstep: 1535.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3570
[2024-06-10 02:00:02,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.58 | bwd_microstep: 1346.65 | bwd_inner_microstep: 1346.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 02:00:04,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1353.11 | bwd_inner_microstep: 1353.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 02:00:06,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.02 | bwd_microstep: 1638.48 | bwd_inner_microstep: 1638.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 02:00:08,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.91 | bwd_microstep: 1453.45 | bwd_inner_microstep: 1453.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 02:00:10,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.38 | bwd_microstep: 878.45 | bwd_inner_microstep: 878.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 02:00:12,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.28 | bwd_microstep: 1400.95 | bwd_inner_microstep: 1400.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045
[2024-06-10 02:00:13,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.41 | bwd_microstep: 910.05 | bwd_inner_microstep: 910.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3818
[2024-06-10 02:00:15,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.96 | bwd_microstep: 1691.37 | bwd_inner_microstep: 1691.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554
[2024-06-10 02:00:17,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.78 | bwd_microstep: 1301.13 | bwd_inner_microstep: 1301.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 02:00:19,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.95 | bwd_microstep: 1396.38 | bwd_inner_microstep: 1396.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3617
[2024-06-10 02:00:21,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.98 | bwd_microstep: 1709.27 | bwd_inner_microstep: 1709.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3566
[2024-06-10 02:00:23,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.04 | bwd_microstep: 1528.55 | bwd_inner_microstep: 1528.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 02:00:25,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 02:00:25,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.34 | bwd_microstep: 1520.01 | bwd_inner_microstep: 1511.91 | bwd_allreduce_microstep: 8.05 | step_microstep: 38.28
[2024-06-10 02:00:25,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16223.38 | bwd: 43251.41 | bwd_inner: 43242.21 | bwd_allreduce: 8.40 | step: 41.43


  4%|▍         | 69/1726 [1:16:56<28:15:04, 61.38s/it]
  4%|▍         | 70/1726 [1:17:56<28:03:59, 61.01s/it]
  4%|▍         | 71/1726 [1:19:00<28:21:07, 61.67s/it]
  4%|▍         | 72/1726 [1:20:02<28:24:44, 61.84s/it]
  4%|▍         | 73/1726 [1:21:04<28:23:31, 61.83s/it]
  4%|▍         | 74/1726 [1:22:02<27:56:10, 60.88s/it]
  4%|▍         | 75/1726 [1:23:02<27:46:51, 60.58s/it]
{'loss': 1.349, 'learning_rate': 3.99813715345813e-05, 'epoch': 0.04}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-10 02:00:27,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.29 | bwd_microstep: 1180.52 | bwd_inner_microstep: 1180.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 02:00:29,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1390.80 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3930
[2024-06-10 02:00:31,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.36 | bwd_microstep: 1502.55 | bwd_inner_microstep: 1502.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4184
[2024-06-10 02:00:33,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.98 | bwd_microstep: 1651.89 | bwd_inner_microstep: 1651.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-10 02:00:35,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.80 | bwd_microstep: 1156.42 | bwd_inner_microstep: 1156.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 02:00:37,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.81 | bwd_microstep: 1533.47 | bwd_inner_microstep: 1533.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-10 02:00:39,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.04 | bwd_microstep: 1431.08 | bwd_inner_microstep: 1431.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 02:00:41,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1390.80 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 02:00:43,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.50 | bwd_microstep: 1392.76 | bwd_inner_microstep: 1392.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3243
[2024-06-10 02:00:45,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1338.53 | bwd_inner_microstep: 1338.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 02:00:47,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1254.44 | bwd_inner_microstep: 1254.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3552
[2024-06-10 02:00:48,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.13 | bwd_microstep: 1330.67 | bwd_inner_microstep: 1330.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 02:00:50,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.44 | bwd_microstep: 1454.57 | bwd_inner_microstep: 1454.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3656
[2024-06-10 02:00:53,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.83 | bwd_microstep: 1824.27 | bwd_inner_microstep: 1824.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3651
[2024-06-10 02:00:55,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.99 | bwd_microstep: 1717.01 | bwd_inner_microstep: 1716.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2051
[2024-06-10 02:00:56,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.30 | bwd_microstep: 880.24 | bwd_inner_microstep: 880.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 02:00:58,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.99 | bwd_microstep: 1312.94 | bwd_inner_microstep: 1312.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3532
[2024-06-10 02:01:00,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.75 | bwd_microstep: 1199.51 | bwd_inner_microstep: 1199.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 02:01:02,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.42 | bwd_microstep: 1290.32 | bwd_inner_microstep: 1290.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 02:01:04,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.40 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 02:01:05,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.71 | bwd_microstep: 1379.10 | bwd_inner_microstep: 1379.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-10 02:01:07,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.02 | bwd_microstep: 1301.71 | bwd_inner_microstep: 1301.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 02:01:09,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.66 | bwd_microstep: 1500.17 | bwd_inner_microstep: 1500.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 02:01:11,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.22 | bwd_microstep: 1403.57 | bwd_inner_microstep: 1403.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3460
[2024-06-10 02:01:13,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.31 | bwd_microstep: 1247.17 | bwd_inner_microstep: 1247.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3478
[2024-06-10 02:01:15,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.32 | bwd_microstep: 1442.35 | bwd_inner_microstep: 1442.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1861
[2024-06-10 02:01:16,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.19 | bwd_microstep: 676.63 | bwd_inner_microstep: 676.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 02:01:18,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1645.03 | bwd_inner_microstep: 1645.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 02:01:20,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.97 | bwd_microstep: 1248.76 | bwd_inner_microstep: 1248.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 02:01:22,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.44 | bwd_microstep: 1437.11 | bwd_inner_microstep: 1437.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3813
[2024-06-10 02:01:24,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.87 | bwd_microstep: 1858.71 | bwd_inner_microstep: 1858.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 02:01:29,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.45 | optimizer_step: 6.59
[2024-06-10 02:01:29,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.95 | bwd_microstep: 3543.79 | bwd_inner_microstep: 1647.42 | bwd_allreduce_microstep: 1896.29 | step_microstep: 40.00
[2024-06-10 02:01:29,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16585.80 | bwd: 46201.41 | bwd_inner: 44304.18 | bwd_allreduce: 1896.54 | step: 44.24
{'loss': 1.3118, 'learning_rate': 3.997971673334095e-05, 'epoch': 0.04}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 02:01:31,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1379.82 | bwd_inner_microstep: 1379.73 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922
[2024-06-10 02:01:32,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.11 | bwd_microstep: 823.14 | bwd_inner_microstep: 823.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3654
[2024-06-10 02:01:34,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.27 | bwd_microstep: 1723.60 | bwd_inner_microstep: 1723.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 02:01:36,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.79 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 02:01:38,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1553.99 | bwd_inner_microstep: 1553.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467
[2024-06-10 02:01:40,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1410.39 | bwd_inner_microstep: 1410.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 02:01:42,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.66 | bwd_microstep: 1250.41 | bwd_inner_microstep: 1250.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-10 02:01:44,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.24 | bwd_microstep: 1415.07 | bwd_inner_microstep: 1415.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-10 02:01:45,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.35 | bwd_microstep: 710.05 | bwd_inner_microstep: 710.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 02:01:46,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.97 | bwd_microstep: 1353.48 | bwd_inner_microstep: 1353.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1968
[2024-06-10 02:01:48,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.10 | bwd_microstep: 827.08 | bwd_inner_microstep: 827.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452
[2024-06-10 02:01:49,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.90 | bwd_microstep: 1323.38 | bwd_inner_microstep: 1323.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637
[2024-06-10 02:01:52,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.45 | bwd_microstep: 1622.62 | bwd_inner_microstep: 1622.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963
[2024-06-10 02:01:53,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.25 | bwd_microstep: 830.45 | bwd_inner_microstep: 830.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-10 02:01:55,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.22 | bwd_microstep: 1622.10 | bwd_inner_microstep: 1622.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-10 02:01:57,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.25 | bwd_microstep: 1615.89 | bwd_inner_microstep: 1615.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 02:01:59,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.52 | bwd_microstep: 1396.85 | bwd_inner_microstep: 1396.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 02:02:01,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1390.31 | bwd_inner_microstep: 1390.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 02:02:03,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.73 | bwd_microstep: 1454.53 | bwd_inner_microstep: 1454.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3829
[2024-06-10 02:02:05,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.78 | bwd_microstep: 1522.96 | bwd_inner_microstep: 1522.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2425
[2024-06-10 02:02:07,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.61 | bwd_microstep: 1067.29 | bwd_inner_microstep: 1067.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1987
[2024-06-10 02:02:08,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.08 | bwd_microstep: 743.69 | bwd_inner_microstep: 743.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 02:02:10,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.29 | bwd_microstep: 1300.83 | bwd_inner_microstep: 1300.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 02:02:12,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1558.12 | bwd_inner_microstep: 1558.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2175
[2024-06-10 02:02:13,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.12 | bwd_microstep: 764.55 | bwd_inner_microstep: 764.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3676
[2024-06-10 02:02:15,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.86 | bwd_microstep: 1676.68 | bwd_inner_microstep: 1676.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3557
[2024-06-10 02:02:17,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1455.08 | bwd_inner_microstep: 1455.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 02:02:19,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.68 | bwd_microstep: 1598.10 | bwd_inner_microstep: 1598.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-10 02:02:21,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1428.29 | bwd_inner_microstep: 1428.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3424
[2024-06-10 02:02:23,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1403.14 | bwd_inner_microstep: 1403.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3426
[2024-06-10 02:02:25,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1477.42 | bwd_inner_microstep: 1477.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 02:02:31,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.44 | optimizer_step: 6.56
[2024-06-10 02:02:31,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 4941.94 | bwd_inner_microstep: 1411.60 | bwd_allreduce_microstep: 3530.27 | step_microstep: 40.23
[2024-06-10 02:02:31,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15792.90 | bwd: 45922.81 | bwd_inner: 42391.53 | bwd_allreduce: 3530.56 | step: 42.68
{'loss': 1.3459, 'learning_rate': 3.997799156364213e-05, 'epoch': 0.04}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 02:02:33,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1472.71 | bwd_inner_microstep: 1472.55 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 02:02:35,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.58 | bwd_microstep: 1297.54 | bwd_inner_microstep: 1297.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3873
[2024-06-10 02:02:37,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.10 | bwd_microstep: 1684.35 | bwd_inner_microstep: 1684.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 02:02:39,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.12 | bwd_microstep: 1381.04 | bwd_inner_microstep: 1381.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3847
[2024-06-10 02:02:41,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.66 | bwd_microstep: 1665.46 | bwd_inner_microstep: 1665.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.26
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 02:02:43,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1389.27 | bwd_inner_microstep: 1389.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 02:02:45,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1391.97 | bwd_inner_microstep: 1391.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2248
[2024-06-10 02:02:46,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.69 | bwd_microstep: 969.39 | bwd_inner_microstep: 969.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 02:02:48,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.90 | bwd_microstep: 1392.33 | bwd_inner_microstep: 1392.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3787
[2024-06-10 02:02:50,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1554.08 | bwd_inner_microstep: 1554.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4080
[2024-06-10 02:02:53,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.13 | bwd_microstep: 1630.58 | bwd_inner_microstep: 1630.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1934
[2024-06-10 02:02:54,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.82 | bwd_microstep: 762.03 | bwd_inner_microstep: 762.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415
[2024-06-10 02:02:55,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.28 | bwd_microstep: 1310.86 | bwd_inner_microstep: 1310.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2187
[2024-06-10 02:02:57,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.79 | bwd_microstep: 1055.48 | bwd_inner_microstep: 1055.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3722
[2024-06-10 02:02:59,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.52 | bwd_microstep: 1840.56 | bwd_inner_microstep: 1840.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 2989
[2024-06-10 02:03:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1401.90 | bwd_inner_microstep: 1401.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-10 02:03:03,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.79 | bwd_microstep: 1188.32 | bwd_inner_microstep: 1188.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 02:03:05,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1380.45 | bwd_inner_microstep: 1380.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 02:03:07,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.49 | bwd_microstep: 1495.46 | bwd_inner_microstep: 1495.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2987
[2024-06-10 02:03:08,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.76 | bwd_microstep: 1017.90 | bwd_inner_microstep: 1017.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 02:03:10,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1393.39 | bwd_inner_microstep: 1393.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 02:03:12,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.44 | bwd_microstep: 1402.07 | bwd_inner_microstep: 1402.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640
[2024-06-10 02:03:14,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.83 | bwd_microstep: 1318.70 | bwd_inner_microstep: 1318.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3592
[2024-06-10 02:03:16,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.39 | bwd_microstep: 1465.40 | bwd_inner_microstep: 1465.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 02:03:18,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.49 | bwd_microstep: 975.73 | bwd_inner_microstep: 975.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1924
[2024-06-10 02:03:19,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.49 | bwd_microstep: 730.86 | bwd_inner_microstep: 730.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3577
[2024-06-10 02:03:21,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.68 | bwd_microstep: 1456.12 | bwd_inner_microstep: 1456.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3825
[2024-06-10 02:03:23,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.21 | bwd_microstep: 1692.25 | bwd_inner_microstep: 1692.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-10 02:03:25,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.66 | bwd_microstep: 1639.23 | bwd_inner_microstep: 1639.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3436
[2024-06-10 02:03:27,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.02 | bwd_microstep: 1518.24 | bwd_inner_microstep: 1518.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2278
[2024-06-10 02:03:29,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.89 | bwd_microstep: 1006.07 | bwd_inner_microstep: 1006.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3807
[2024-06-10 02:03:35,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.44 | optimizer_step: 6.62
[2024-06-10 02:03:35,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 5580.97 | bwd_inner_microstep: 1562.30 | bwd_allreduce_microstep: 4018.59 | step_microstep: 40.12
[2024-06-10 02:03:35,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16171.84 | bwd: 47460.76 | bwd_inner: 43441.05 | bwd_allreduce: 4018.89 | step: 43.11
{'loss': 1.3018, 'learning_rate': 3.997619603156088e-05, 'epoch': 0.05}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-10 02:03:37,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.00 | bwd_microstep: 1405.41 | bwd_inner_microstep: 1405.24 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1915
[2024-06-10 02:03:38,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.57 | bwd_microstep: 783.68 | bwd_inner_microstep: 783.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-10 02:03:40,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.99 | bwd_microstep: 1314.61 | bwd_inner_microstep: 1314.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 02:03:41,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.73 | bwd_microstep: 1278.77 | bwd_inner_microstep: 1278.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788
[2024-06-10 02:03:43,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.93 | bwd_microstep: 1457.47 | bwd_inner_microstep: 1457.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 02:03:45,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.35 | bwd_microstep: 1349.04 | bwd_inner_microstep: 1349.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 02:03:47,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.91 | bwd_microstep: 1287.36 | bwd_inner_microstep: 1287.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914
[2024-06-10 02:03:48,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.87 | bwd_microstep: 689.99 | bwd_inner_microstep: 689.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710
[2024-06-10 02:03:50,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.75 | bwd_microstep: 1426.41 | bwd_inner_microstep: 1426.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 02:03:52,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.91 | bwd_microstep: 1257.23 | bwd_inner_microstep: 1257.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 02:03:54,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.98 | bwd_microstep: 1259.80 | bwd_inner_microstep: 1259.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 02:03:55,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.52 | bwd_microstep: 1293.10 | bwd_inner_microstep: 1293.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 02:03:56,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.27 | bwd_microstep: 797.64 | bwd_inner_microstep: 797.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2090
[2024-06-10 02:03:58,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.76 | bwd_microstep: 824.09 | bwd_inner_microstep: 824.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1991
[2024-06-10 02:03:59,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.77 | bwd_microstep: 898.96 | bwd_inner_microstep: 898.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 02:04:00,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.09 | bwd_microstep: 799.63 | bwd_inner_microstep: 799.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 02:04:02,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.26 | bwd_microstep: 1606.07 | bwd_inner_microstep: 1606.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 02:04:04,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.66 | bwd_microstep: 1511.58 | bwd_inner_microstep: 1511.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3623
[2024-06-10 02:04:07,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.64 | bwd_microstep: 1813.72 | bwd_inner_microstep: 1813.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 02:04:09,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.95 | bwd_microstep: 1584.16 | bwd_inner_microstep: 1584.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-10 02:04:11,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1584.32 | bwd_inner_microstep: 1584.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 02:04:13,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 1608.68 | bwd_inner_microstep: 1608.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 02:04:15,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1379.37 | bwd_inner_microstep: 1379.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-10 02:04:17,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.18 | bwd_microstep: 1434.57 | bwd_inner_microstep: 1434.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2899
[2024-06-10 02:04:19,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.06 | bwd_microstep: 1187.79 | bwd_inner_microstep: 1187.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 02:04:21,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.02 | bwd_microstep: 1489.61 | bwd_inner_microstep: 1489.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 02:04:23,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.24 | bwd_microstep: 1651.01 | bwd_inner_microstep: 1650.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 02:04:24,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.14 | bwd_microstep: 794.10 | bwd_inner_microstep: 794.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 02:04:26,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1282.14 | bwd_inner_microstep: 1282.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3571
[2024-06-10 02:04:28,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.46 | bwd_microstep: 1365.49 | bwd_inner_microstep: 1365.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3599
[2024-06-10 02:04:30,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.32 | bwd_microstep: 1537.27 | bwd_inner_microstep: 1537.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2042
[2024-06-10 02:04:39,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.42 | optimizer_step: 6.59
[2024-06-10 02:04:39,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.34 | bwd_microstep: 8124.21 | bwd_inner_microstep: 1074.91 | bwd_allreduce_microstep: 7049.23 | step_microstep: 40.14
[2024-06-10 02:04:39,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15299.05 | bwd: 48077.33 | bwd_inner: 41027.03 | bwd_allreduce: 7049.54 | step: 42.34
{'loss': 1.3514, 'learning_rate': 3.997433014342106e-05, 'epoch': 0.05}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 02:04:40,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.50 | bwd_microstep: 1245.77 | bwd_inner_microstep: 1245.62 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3916
[2024-06-10 02:04:43,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.68 | bwd_microstep: 1686.64 | bwd_inner_microstep: 1686.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 02:04:45,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.64 | bwd_microstep: 1650.02 | bwd_inner_microstep: 1649.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2279
[2024-06-10 02:04:46,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.80 | bwd_microstep: 811.96 | bwd_inner_microstep: 811.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 02:04:48,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1414.38 | bwd_inner_microstep: 1414.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1972
[2024-06-10 02:04:49,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.59 | bwd_microstep: 830.30 | bwd_inner_microstep: 830.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 02:04:51,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.98 | bwd_microstep: 1253.90 | bwd_inner_microstep: 1253.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 02:04:53,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.46 | bwd_microstep: 1481.45 | bwd_inner_microstep: 1481.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 02:04:55,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1394.17 | bwd_inner_microstep: 1394.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4059
[2024-06-10 02:04:57,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.72 | bwd_microstep: 1631.95 | bwd_inner_microstep: 1631.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3652
[2024-06-10 02:04:59,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 1327.86 | bwd_inner_microstep: 1327.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 02:05:01,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.58 | bwd_microstep: 1350.41 | bwd_inner_microstep: 1350.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3505
[2024-06-10 02:05:03,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1554.46 | bwd_inner_microstep: 1554.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 02:05:05,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.73 | bwd_microstep: 1439.89 | bwd_inner_microstep: 1439.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 02:05:07,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.75 | bwd_microstep: 1496.05 | bwd_inner_microstep: 1496.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-10 02:05:09,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.41 | bwd_microstep: 1622.39 | bwd_inner_microstep: 1622.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 02:05:11,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1392.72 | bwd_inner_microstep: 1392.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 02:05:13,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.47 | bwd_microstep: 1615.97 | bwd_inner_microstep: 1615.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-10 02:05:15,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.54 | bwd_microstep: 1522.58 | bwd_inner_microstep: 1522.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948
[2024-06-10 02:05:16,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.65 | bwd_microstep: 732.83 | bwd_inner_microstep: 732.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 02:05:19,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.82 | bwd_microstep: 1491.12 | bwd_inner_microstep: 1491.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 02:05:20,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.43 | bwd_microstep: 1281.68 | bwd_inner_microstep: 1281.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 02:05:22,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.92 | bwd_microstep: 1360.50 | bwd_inner_microstep: 1360.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 02:05:24,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1354.53 | bwd_inner_microstep: 1354.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3602
[2024-06-10 02:05:26,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.24 | bwd_microstep: 1464.33 | bwd_inner_microstep: 1464.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 02:05:28,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.29 | bwd_microstep: 1652.75 | bwd_inner_microstep: 1652.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-10 02:05:30,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.57 | bwd_microstep: 1409.26 | bwd_inner_microstep: 1409.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 02:05:32,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1505.40 | bwd_inner_microstep: 1505.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2061
[2024-06-10 02:05:34,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.94 | bwd_microstep: 1013.07 | bwd_inner_microstep: 1013.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 02:05:36,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.74 | bwd_microstep: 1450.84 | bwd_inner_microstep: 1450.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 02:05:38,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.98 | bwd_microstep: 1252.77 | bwd_inner_microstep: 1252.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2081
[2024-06-10 02:05:41,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-10 02:05:41,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.15 | bwd_microstep: 2967.48 | bwd_inner_microstep: 1132.14 | bwd_allreduce_microstep: 1835.28 | step_microstep: 39.32
[2024-06-10 02:05:41,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16294.49 | bwd: 45659.47 | bwd_inner: 43823.12 | bwd_allreduce: 1835.59 | step: 41.60
{'loss': 1.326, 'learning_rate': 3.9972393905794304e-05, 'epoch': 0.05}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 02:05:43,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.98 | bwd_microstep: 1478.62 | bwd_inner_microstep: 1478.40 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.19
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-10 02:05:45,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.03 | bwd_microstep: 1582.51 | bwd_inner_microstep: 1582.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-10 02:05:47,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.22 | bwd_microstep: 1584.37 | bwd_inner_microstep: 1584.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3803
[2024-06-10 02:05:49,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.74 | bwd_microstep: 1354.00 | bwd_inner_microstep: 1353.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4165
[2024-06-10 02:05:51,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.51 | bwd_microstep: 1649.39 | bwd_inner_microstep: 1649.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 02:05:52,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.68 | bwd_microstep: 731.38 | bwd_inner_microstep: 731.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 02:05:54,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.36 | bwd_microstep: 1343.83 | bwd_inner_microstep: 1343.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 02:05:56,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.66 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618
[2024-06-10 02:05:58,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.87 | bwd_microstep: 1312.52 | bwd_inner_microstep: 1312.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3530
[2024-06-10 02:06:00,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.11 | bwd_microstep: 1201.46 | bwd_inner_microstep: 1201.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2994
[2024-06-10 02:06:01,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.29 | bwd_microstep: 1109.73 | bwd_inner_microstep: 1109.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446
[2024-06-10 02:06:03,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.07 | bwd_microstep: 1193.03 | bwd_inner_microstep: 1193.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 02:06:05,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.69 | bwd_microstep: 1352.38 | bwd_inner_microstep: 1352.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659
[2024-06-10 02:06:07,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.54 | bwd_microstep: 1718.80 | bwd_inner_microstep: 1718.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 02:06:09,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.71 | bwd_microstep: 1499.29 | bwd_inner_microstep: 1499.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1984
[2024-06-10 02:06:10,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.05 | bwd_microstep: 896.10 | bwd_inner_microstep: 896.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 02:06:12,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.87 | bwd_microstep: 1498.37 | bwd_inner_microstep: 1498.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 02:06:14,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1342.89 | bwd_inner_microstep: 1342.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838
[2024-06-10 02:06:16,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.57 | bwd_microstep: 1465.55 | bwd_inner_microstep: 1465.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-10 02:06:18,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.72 | bwd_microstep: 1318.34 | bwd_inner_microstep: 1318.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1414
[2024-06-10 02:06:19,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 226.59 | bwd_microstep: 594.20 | bwd_inner_microstep: 594.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 02:06:21,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.04 | bwd_microstep: 1625.99 | bwd_inner_microstep: 1625.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3564
[2024-06-10 02:06:23,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.59 | bwd_microstep: 1530.74 | bwd_inner_microstep: 1530.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972
[2024-06-10 02:06:24,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.84 | bwd_microstep: 705.91 | bwd_inner_microstep: 705.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2022
[2024-06-10 02:06:26,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.07 | bwd_microstep: 901.16 | bwd_inner_microstep: 901.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 02:06:28,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.99 | bwd_microstep: 1626.88 | bwd_inner_microstep: 1626.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2054
[2024-06-10 02:06:29,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.68 | bwd_microstep: 816.67 | bwd_inner_microstep: 816.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3859
[2024-06-10 02:06:31,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.02 | bwd_microstep: 1662.86 | bwd_inner_microstep: 1662.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 02:06:33,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.14 | bwd_microstep: 1452.22 | bwd_inner_microstep: 1452.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 02:06:35,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.44 | bwd_microstep: 1357.73 | bwd_inner_microstep: 1357.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 02:06:37,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.61 | bwd_microstep: 1509.17 | bwd_inner_microstep: 1509.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450
[2024-06-10 02:06:42,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.40 | optimizer_step: 6.60
[2024-06-10 02:06:42,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 4203.61 | bwd_inner_microstep: 1609.42 | bwd_allreduce_microstep: 2594.11 | step_microstep: 39.64
[2024-06-10 02:06:42,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15780.17 | bwd: 44902.86 | bwd_inner: 42307.64 | bwd_allreduce: 2594.45 | step: 41.70


  4%|▍         | 75/1726 [1:23:02<27:46:51, 60.58s/it]
  4%|▍         | 76/1726 [1:24:05<28:07:21, 61.36s/it]
  4%|▍         | 77/1726 [1:25:07<28:12:33, 61.58s/it]
  5%|▍         | 78/1726 [1:26:12<28:31:44, 62.32s/it]
  5%|▍         | 79/1726 [1:27:15<28:42:34, 62.75s/it]
  5%|▍         | 80/1726 [1:28:18<28:38:10, 62.63s/it]
  5%|▍         | 81/1726 [1:29:19<28:24:21, 62.16s/it]
{'loss': 1.3213, 'learning_rate': 3.997038732550004e-05, 'epoch': 0.05}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 02:06:44,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1332.61 | bwd_inner_microstep: 1332.45 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.14
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1870
[2024-06-10 02:06:45,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.13 | bwd_microstep: 739.25 | bwd_inner_microstep: 739.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907
[2024-06-10 02:06:47,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.19 | bwd_microstep: 1592.78 | bwd_inner_microstep: 1592.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867
[2024-06-10 02:06:49,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.47 | bwd_microstep: 1565.99 | bwd_inner_microstep: 1565.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841
[2024-06-10 02:06:51,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.82 | bwd_microstep: 1665.07 | bwd_inner_microstep: 1665.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 02:06:53,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.98 | bwd_microstep: 1286.07 | bwd_inner_microstep: 1286.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 02:06:55,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.10 | bwd_microstep: 1545.15 | bwd_inner_microstep: 1545.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3413
[2024-06-10 02:06:57,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.02 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046
[2024-06-10 02:06:58,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.22 | bwd_microstep: 815.30 | bwd_inner_microstep: 815.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882
[2024-06-10 02:06:59,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.64 | bwd_microstep: 685.69 | bwd_inner_microstep: 685.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3720
[2024-06-10 02:07:01,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.48 | bwd_microstep: 1587.86 | bwd_inner_microstep: 1587.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-10 02:07:03,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.86 | bwd_microstep: 1434.92 | bwd_inner_microstep: 1434.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688
[2024-06-10 02:07:05,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.36 | bwd_microstep: 1519.80 | bwd_inner_microstep: 1519.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 02:07:07,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.32 | bwd_microstep: 1380.17 | bwd_inner_microstep: 1380.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3992
[2024-06-10 02:07:10,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.60 | bwd_microstep: 1604.25 | bwd_inner_microstep: 1604.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 02:07:12,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.78 | bwd_microstep: 1481.74 | bwd_inner_microstep: 1481.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 02:07:13,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1341.86 | bwd_inner_microstep: 1341.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1980
[2024-06-10 02:07:15,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.53 | bwd_microstep: 860.88 | bwd_inner_microstep: 860.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 02:07:17,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.72 | bwd_microstep: 1355.70 | bwd_inner_microstep: 1355.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2078
[2024-06-10 02:07:18,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.73 | bwd_microstep: 1016.26 | bwd_inner_microstep: 1016.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 02:07:20,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.32 | bwd_microstep: 1514.54 | bwd_inner_microstep: 1514.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 02:07:22,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1431.96 | bwd_inner_microstep: 1431.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2157
[2024-06-10 02:07:23,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.59 | bwd_microstep: 952.47 | bwd_inner_microstep: 952.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-10 02:07:24,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.98 | bwd_microstep: 806.74 | bwd_inner_microstep: 806.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 02:07:26,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.54 | bwd_microstep: 1354.20 | bwd_inner_microstep: 1354.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3438
[2024-06-10 02:07:28,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1499.89 | bwd_inner_microstep: 1499.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 02:07:30,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.97 | bwd_microstep: 1350.22 | bwd_inner_microstep: 1350.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 02:07:32,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.36 | bwd_microstep: 1290.18 | bwd_inner_microstep: 1290.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 02:07:34,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.07 | bwd_microstep: 1250.17 | bwd_inner_microstep: 1250.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 02:07:36,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.04 | bwd_microstep: 1481.77 | bwd_inner_microstep: 1481.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 02:07:38,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.19 | bwd_microstep: 1504.48 | bwd_inner_microstep: 1504.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2246
[2024-06-10 02:07:52,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.45 | optimizer_step: 6.60
[2024-06-10 02:07:52,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.13 | bwd_microstep: 13344.47 | bwd_inner_microstep: 884.33 | bwd_allreduce_microstep: 12460.07 | step_microstep: 40.13
[2024-06-10 02:07:52,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15446.48 | bwd: 53778.60 | bwd_inner: 41317.45 | bwd_allreduce: 12460.37 | step: 42.67
{'loss': 1.3296, 'learning_rate': 3.996831040960543e-05, 'epoch': 0.05}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 02:07:54,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.18 | bwd_microstep: 1481.30 | bwd_inner_microstep: 1481.14 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 02:07:55,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.43 | bwd_microstep: 1289.32 | bwd_inner_microstep: 1289.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3903
[2024-06-10 02:07:58,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.68 | bwd_microstep: 1584.97 | bwd_inner_microstep: 1584.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3837
[2024-06-10 02:08:00,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.35 | bwd_microstep: 1501.42 | bwd_inner_microstep: 1501.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 02:08:02,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.86 | bwd_microstep: 1383.13 | bwd_inner_microstep: 1383.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3486
[2024-06-10 02:08:03,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.76 | bwd_microstep: 1217.23 | bwd_inner_microstep: 1217.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 02:08:05,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1348.82 | bwd_inner_microstep: 1348.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3410
[2024-06-10 02:08:07,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.75 | bwd_microstep: 1153.29 | bwd_inner_microstep: 1153.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3409
[2024-06-10 02:08:08,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.10 | bwd_microstep: 1213.39 | bwd_inner_microstep: 1213.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 02:08:10,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.71 | bwd_microstep: 1282.86 | bwd_inner_microstep: 1282.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 02:08:11,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.85 | bwd_microstep: 797.60 | bwd_inner_microstep: 797.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-10 02:08:13,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1289.05 | bwd_inner_microstep: 1289.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 02:08:15,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1502.53 | bwd_inner_microstep: 1502.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486
[2024-06-10 02:08:17,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.41 | bwd_microstep: 1344.02 | bwd_inner_microstep: 1343.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3165
[2024-06-10 02:08:19,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.38 | bwd_microstep: 1163.12 | bwd_inner_microstep: 1163.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3429
[2024-06-10 02:08:21,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1316.28 | bwd_inner_microstep: 1316.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448
[2024-06-10 02:08:22,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.90 | bwd_microstep: 1414.42 | bwd_inner_microstep: 1414.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3554
[2024-06-10 02:08:25,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.27 | bwd_microstep: 1548.27 | bwd_inner_microstep: 1548.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 02:08:27,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.70 | bwd_microstep: 1562.29 | bwd_inner_microstep: 1562.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 02:08:28,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.51 | bwd_microstep: 1256.64 | bwd_inner_microstep: 1256.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 02:08:30,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1411.06 | bwd_inner_microstep: 1411.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2069
[2024-06-10 02:08:32,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.12 | bwd_microstep: 882.60 | bwd_inner_microstep: 882.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3672
[2024-06-10 02:08:34,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.67 | bwd_microstep: 1330.50 | bwd_inner_microstep: 1330.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 02:08:35,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.28 | bwd_microstep: 1397.46 | bwd_inner_microstep: 1397.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3555
[2024-06-10 02:08:37,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.16 | bwd_microstep: 1352.01 | bwd_inner_microstep: 1351.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 02:08:39,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.78 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.23
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738
[2024-06-10 02:08:41,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.65 | bwd_microstep: 1444.43 | bwd_inner_microstep: 1444.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-10 02:08:43,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.90 | bwd_microstep: 1313.71 | bwd_inner_microstep: 1313.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3571
[2024-06-10 02:08:45,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.85 | bwd_microstep: 1454.50 | bwd_inner_microstep: 1454.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812
[2024-06-10 02:08:47,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.25 | bwd_microstep: 1622.09 | bwd_inner_microstep: 1622.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 02:08:49,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1439.41 | bwd_inner_microstep: 1439.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 02:08:53,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.42 | optimizer_step: 6.60
[2024-06-10 02:08:53,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.06 | bwd_microstep: 2847.34 | bwd_inner_microstep: 1812.01 | bwd_allreduce_microstep: 1035.25 | step_microstep: 39.94
[2024-06-10 02:08:53,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16308.50 | bwd: 44542.41 | bwd_inner: 43506.10 | bwd_allreduce: 1035.56 | step: 42.50
{'loss': 1.3016, 'learning_rate': 3.996616316542537e-05, 'epoch': 0.05}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540
[2024-06-10 02:08:55,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.92 | bwd_microstep: 1324.72 | bwd_inner_microstep: 1324.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3934
[2024-06-10 02:08:57,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.91 | bwd_microstep: 1699.98 | bwd_inner_microstep: 1699.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 02:08:59,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.50 | bwd_microstep: 1484.19 | bwd_inner_microstep: 1484.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3864
[2024-06-10 02:09:01,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.13 | bwd_microstep: 1634.53 | bwd_inner_microstep: 1634.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 02:09:03,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1252.89 | bwd_inner_microstep: 1252.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402
[2024-06-10 02:09:05,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.25 | bwd_microstep: 1209.81 | bwd_inner_microstep: 1209.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482
[2024-06-10 02:09:07,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.85 | bwd_microstep: 1318.83 | bwd_inner_microstep: 1318.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 02:09:09,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.54 | bwd_microstep: 1428.73 | bwd_inner_microstep: 1428.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 02:09:11,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.19 | bwd_microstep: 1441.70 | bwd_inner_microstep: 1441.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 02:09:12,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.66 | bwd_microstep: 1285.80 | bwd_inner_microstep: 1285.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739
[2024-06-10 02:09:14,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.64 | bwd_microstep: 1433.62 | bwd_inner_microstep: 1433.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3503
[2024-06-10 02:09:16,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.69 | bwd_microstep: 1250.80 | bwd_inner_microstep: 1250.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-10 02:09:18,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.39 | bwd_microstep: 1312.66 | bwd_inner_microstep: 1312.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480
[2024-06-10 02:09:20,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.60 | bwd_microstep: 1347.56 | bwd_inner_microstep: 1347.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452
[2024-06-10 02:09:22,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.76 | bwd_microstep: 1317.33 | bwd_inner_microstep: 1317.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 02:09:24,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.27 | bwd_microstep: 1610.17 | bwd_inner_microstep: 1610.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 02:09:26,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.26 | bwd_microstep: 1482.06 | bwd_inner_microstep: 1481.91 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3668
[2024-06-10 02:09:28,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.80 | bwd_microstep: 1826.36 | bwd_inner_microstep: 1826.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668
[2024-06-10 02:09:30,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.25 | bwd_microstep: 1431.03 | bwd_inner_microstep: 1431.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3674
[2024-06-10 02:09:32,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1239.01 | bwd_inner_microstep: 1238.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 02:09:34,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.48 | bwd_microstep: 1362.67 | bwd_inner_microstep: 1362.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 02:09:36,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.69 | bwd_microstep: 1257.08 | bwd_inner_microstep: 1257.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 02:09:38,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.21 | bwd_microstep: 1664.30 | bwd_inner_microstep: 1664.03 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.19
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 02:09:40,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1282.96 | bwd_inner_microstep: 1282.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729
[2024-06-10 02:09:42,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.38 | bwd_microstep: 1442.34 | bwd_inner_microstep: 1442.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 02:09:44,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.19 | bwd_microstep: 1402.12 | bwd_inner_microstep: 1402.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 02:09:46,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.24 | bwd_microstep: 1562.11 | bwd_inner_microstep: 1562.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 02:09:48,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.65 | bwd_microstep: 1296.95 | bwd_inner_microstep: 1296.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 02:09:49,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.85 | bwd_microstep: 1267.67 | bwd_inner_microstep: 1267.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 02:09:52,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.98 | bwd_microstep: 1633.76 | bwd_inner_microstep: 1633.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 02:09:54,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1488.13 | bwd_inner_microstep: 1488.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491
[2024-06-10 02:09:56,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.19 | optimizer_step: 6.64
[2024-06-10 02:09:56,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1454.86 | bwd_inner_microstep: 1447.01 | bwd_allreduce_microstep: 7.80 | step_microstep: 38.83
[2024-06-10 02:09:56,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17043.97 | bwd: 45446.85 | bwd_inner: 45437.68 | bwd_allreduce: 8.24 | step: 41.48
{'loss': 1.3972, 'learning_rate': 3.996394560052243e-05, 'epoch': 0.05}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 02:09:58,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.18 | bwd_microstep: 1390.57 | bwd_inner_microstep: 1390.40 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421
[2024-06-10 02:09:59,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.41 | bwd_microstep: 1185.80 | bwd_inner_microstep: 1185.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 02:10:01,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1549.71 | bwd_inner_microstep: 1549.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806
[2024-06-10 02:10:04,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.02 | bwd_microstep: 1514.60 | bwd_inner_microstep: 1514.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 02:10:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.72 | bwd_microstep: 1150.78 | bwd_inner_microstep: 1150.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 02:10:07,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.90 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 02:10:09,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.27 | bwd_microstep: 1386.73 | bwd_inner_microstep: 1386.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 02:10:11,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.09 | bwd_microstep: 1443.03 | bwd_inner_microstep: 1443.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3681
[2024-06-10 02:10:13,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1455.78 | bwd_inner_microstep: 1455.58 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 02:10:15,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1507.15 | bwd_inner_microstep: 1507.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3598
[2024-06-10 02:10:17,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.30 | bwd_microstep: 1381.38 | bwd_inner_microstep: 1381.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-10 02:10:19,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1582.54 | bwd_inner_microstep: 1582.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3442
[2024-06-10 02:10:21,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.54 | bwd_microstep: 1313.44 | bwd_inner_microstep: 1313.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 02:10:23,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.99 | bwd_microstep: 1347.83 | bwd_inner_microstep: 1347.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3429
[2024-06-10 02:10:25,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.42 | bwd_microstep: 1380.66 | bwd_inner_microstep: 1380.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-10 02:10:26,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.45 | bwd_microstep: 1193.43 | bwd_inner_microstep: 1193.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 785
[2024-06-10 02:10:27,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 123.30 | bwd_microstep: 313.72 | bwd_inner_microstep: 313.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967
[2024-06-10 02:10:28,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.01 | bwd_microstep: 895.36 | bwd_inner_microstep: 895.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3403
[2024-06-10 02:10:30,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1405.92 | bwd_inner_microstep: 1405.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665
[2024-06-10 02:10:32,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.50 | bwd_microstep: 1329.94 | bwd_inner_microstep: 1329.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3667
[2024-06-10 02:10:34,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1372.82 | bwd_inner_microstep: 1372.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 5482
[2024-06-10 02:10:37,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 1637.87 | bwd_microstep: 2010.76 | bwd_inner_microstep: 2010.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 02:10:39,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1402.99 | bwd_inner_microstep: 1402.76 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.21
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2080
[2024-06-10 02:10:40,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.24 | bwd_microstep: 729.86 | bwd_inner_microstep: 729.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 02:10:42,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.03 | bwd_microstep: 1508.44 | bwd_inner_microstep: 1508.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 02:10:44,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1242.41 | bwd_inner_microstep: 1242.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2020
[2024-06-10 02:10:45,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.85 | bwd_microstep: 837.51 | bwd_inner_microstep: 837.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 02:10:47,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.38 | bwd_microstep: 1456.22 | bwd_inner_microstep: 1456.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 02:10:49,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.84 | bwd_microstep: 975.37 | bwd_inner_microstep: 975.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3802
[2024-06-10 02:10:50,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.07 | bwd_microstep: 1355.57 | bwd_inner_microstep: 1355.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651
[2024-06-10 02:10:52,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.76 | bwd_microstep: 1325.65 | bwd_inner_microstep: 1325.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3749
[2024-06-10 02:10:57,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.44 | optimizer_step: 6.62
[2024-06-10 02:10:57,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.86 | bwd_microstep: 4173.14 | bwd_inner_microstep: 1623.95 | bwd_allreduce_microstep: 2549.11 | step_microstep: 40.06
[2024-06-10 02:10:57,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16557.34 | bwd: 44405.26 | bwd_inner: 41854.69 | bwd_allreduce: 2549.58 | step: 42.32
{'loss': 1.3396, 'learning_rate': 3.9961657722706864e-05, 'epoch': 0.05}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1890
[2024-06-10 02:10:58,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.23 | bwd_microstep: 776.05 | bwd_inner_microstep: 775.90 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 02:11:00,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.37 | bwd_microstep: 1457.93 | bwd_inner_microstep: 1457.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 02:11:02,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.69 | bwd_microstep: 1250.04 | bwd_inner_microstep: 1250.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3796
[2024-06-10 02:11:04,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.96 | bwd_microstep: 1478.07 | bwd_inner_microstep: 1478.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 02:11:06,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.52 | bwd_microstep: 1413.83 | bwd_inner_microstep: 1413.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 02:11:08,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.03 | bwd_microstep: 1286.52 | bwd_inner_microstep: 1286.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3441
[2024-06-10 02:11:09,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.64 | bwd_microstep: 1189.75 | bwd_inner_microstep: 1189.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 02:11:11,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.74 | bwd_microstep: 1161.66 | bwd_inner_microstep: 1161.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 02:11:13,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1251.21 | bwd_inner_microstep: 1251.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 02:11:15,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.65 | bwd_microstep: 1415.72 | bwd_inner_microstep: 1415.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-10 02:11:16,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.61 | bwd_microstep: 1310.23 | bwd_inner_microstep: 1310.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-10 02:11:19,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.10 | bwd_microstep: 1711.21 | bwd_inner_microstep: 1711.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3507
[2024-06-10 02:11:21,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.43 | bwd_microstep: 1519.27 | bwd_inner_microstep: 1519.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960
[2024-06-10 02:11:22,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.95 | bwd_microstep: 895.78 | bwd_inner_microstep: 895.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 02:11:24,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-10 02:11:26,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.88 | bwd_microstep: 1453.18 | bwd_inner_microstep: 1453.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2002
[2024-06-10 02:11:27,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.66 | bwd_microstep: 863.87 | bwd_inner_microstep: 863.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490
[2024-06-10 02:11:29,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.60 | bwd_microstep: 1579.29 | bwd_inner_microstep: 1579.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 02:11:31,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.33 | bwd_microstep: 1523.60 | bwd_inner_microstep: 1523.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 02:11:33,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.84 | bwd_microstep: 787.57 | bwd_inner_microstep: 787.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 02:11:35,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.19 | bwd_microstep: 1754.38 | bwd_inner_microstep: 1754.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3675
[2024-06-10 02:11:37,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.78 | bwd_microstep: 1551.87 | bwd_inner_microstep: 1551.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2072
[2024-06-10 02:11:38,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.45 | bwd_microstep: 822.80 | bwd_inner_microstep: 822.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706
[2024-06-10 02:11:40,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.69 | bwd_microstep: 1435.03 | bwd_inner_microstep: 1435.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 02:11:42,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.08 | bwd_microstep: 1407.64 | bwd_inner_microstep: 1407.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 02:11:44,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1493.56 | bwd_inner_microstep: 1493.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 02:11:46,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1398.10 | bwd_inner_microstep: 1398.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-10 02:11:48,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.28 | bwd_microstep: 1302.72 | bwd_inner_microstep: 1302.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3799
[2024-06-10 02:11:50,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.85 | bwd_microstep: 1656.21 | bwd_inner_microstep: 1656.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-10 02:11:52,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.00 | bwd_microstep: 1438.18 | bwd_inner_microstep: 1438.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 02:11:54,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.59 | bwd_microstep: 1401.93 | bwd_inner_microstep: 1401.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2188
[2024-06-10 02:12:01,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.39 | optimizer_step: 6.60
[2024-06-10 02:12:01,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.28 | bwd_microstep: 6206.60 | bwd_inner_microstep: 1086.82 | bwd_allreduce_microstep: 5119.71 | step_microstep: 39.07
[2024-06-10 02:12:01,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15839.99 | bwd: 47536.65 | bwd_inner: 42415.90 | bwd_allreduce: 5120.01 | step: 41.76
{'loss': 1.3698, 'learning_rate': 3.995929954003657e-05, 'epoch': 0.05}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 02:12:02,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.67 | bwd_microstep: 791.35 | bwd_inner_microstep: 791.19 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 02:12:04,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.13 | bwd_microstep: 1345.06 | bwd_inner_microstep: 1345.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 02:12:06,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1481.59 | bwd_inner_microstep: 1481.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 02:12:08,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.91 | bwd_microstep: 1380.05 | bwd_inner_microstep: 1380.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435
[2024-06-10 02:12:09,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.10 | bwd_microstep: 1153.97 | bwd_inner_microstep: 1153.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 02:12:11,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1386.00 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 02:12:13,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1398.56 | bwd_inner_microstep: 1398.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 02:12:15,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1246.54 | bwd_inner_microstep: 1246.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 02:12:17,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.03 | bwd_microstep: 1252.35 | bwd_inner_microstep: 1252.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 02:12:18,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.51 | bwd_microstep: 1257.21 | bwd_inner_microstep: 1257.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2164
[2024-06-10 02:12:20,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.78 | bwd_microstep: 953.47 | bwd_inner_microstep: 953.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 02:12:22,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.70 | bwd_microstep: 1616.67 | bwd_inner_microstep: 1616.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 02:12:24,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1253.55 | bwd_inner_microstep: 1253.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 02:12:26,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.00 | bwd_microstep: 1347.19 | bwd_inner_microstep: 1347.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3953
[2024-06-10 02:12:28,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.99 | bwd_microstep: 1803.59 | bwd_inner_microstep: 1803.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377
[2024-06-10 02:12:30,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.41 | bwd_microstep: 1338.44 | bwd_inner_microstep: 1338.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2144
[2024-06-10 02:12:31,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.30 | bwd_microstep: 1027.22 | bwd_inner_microstep: 1027.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 02:12:33,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.00 | bwd_microstep: 1407.56 | bwd_inner_microstep: 1407.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3690
[2024-06-10 02:12:35,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.37 | bwd_microstep: 1465.60 | bwd_inner_microstep: 1465.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 02:12:36,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 791.21 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2081
[2024-06-10 02:12:38,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.53 | bwd_microstep: 918.59 | bwd_inner_microstep: 918.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-10 02:12:39,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.89 | bwd_microstep: 977.88 | bwd_inner_microstep: 977.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 02:12:40,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.71 | bwd_microstep: 808.80 | bwd_inner_microstep: 808.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958
[2024-06-10 02:12:41,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.21 | bwd_microstep: 703.17 | bwd_inner_microstep: 703.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3452
[2024-06-10 02:12:43,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.78 | bwd_microstep: 1384.79 | bwd_inner_microstep: 1384.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 02:12:45,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.56 | bwd_microstep: 1465.71 | bwd_inner_microstep: 1465.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-10 02:12:46,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.61 | bwd_microstep: 880.88 | bwd_inner_microstep: 880.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3599
[2024-06-10 02:12:48,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.83 | bwd_microstep: 1557.56 | bwd_inner_microstep: 1557.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3578
[2024-06-10 02:12:51,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.69 | bwd_microstep: 1533.89 | bwd_inner_microstep: 1533.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3771
[2024-06-10 02:12:53,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.16 | bwd_microstep: 1474.61 | bwd_inner_microstep: 1474.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 02:12:54,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1385.14 | bwd_inner_microstep: 1385.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2206
[2024-06-10 02:13:00,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.38 | optimizer_step: 6.60
[2024-06-10 02:13:00,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.93 | bwd_microstep: 4752.14 | bwd_inner_microstep: 1161.45 | bwd_allreduce_microstep: 3590.63 | step_microstep: 39.83
[2024-06-10 02:13:00,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14907.40 | bwd: 43540.38 | bwd_inner: 39948.72 | bwd_allreduce: 3590.92 | step: 42.18


  5%|▍         | 81/1726 [1:29:19<28:24:21, 62.16s/it]
  5%|▍         | 82/1726 [1:30:28<29:24:31, 64.40s/it]
  5%|▍         | 83/1726 [1:31:30<28:57:29, 63.45s/it]
  5%|▍         | 84/1726 [1:32:32<28:51:47, 63.28s/it]
  5%|▍         | 85/1726 [1:33:34<28:34:50, 62.70s/it]
  5%|▍         | 86/1726 [1:34:38<28:42:33, 63.02s/it]
  5%|▌         | 87/1726 [1:35:36<28:07:10, 61.76s/it]
{'loss': 1.2946, 'learning_rate': 3.9956871060817065e-05, 'epoch': 0.05}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 02:13:02,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.23 | bwd_microstep: 1470.11 | bwd_inner_microstep: 1469.94 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.15
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2425
[2024-06-10 02:13:03,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.27 | bwd_microstep: 937.64 | bwd_inner_microstep: 937.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 02:13:05,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.62 | bwd_microstep: 1464.23 | bwd_inner_microstep: 1464.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 02:13:07,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.73 | bwd_microstep: 1354.98 | bwd_inner_microstep: 1354.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 02:13:09,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.54 | bwd_microstep: 1283.46 | bwd_inner_microstep: 1283.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1088
[2024-06-10 02:13:09,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 174.51 | bwd_microstep: 447.99 | bwd_inner_microstep: 447.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 02:13:11,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.74 | bwd_microstep: 1483.19 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 02:13:13,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.54 | bwd_microstep: 1403.94 | bwd_inner_microstep: 1403.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 02:13:15,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.81 | bwd_microstep: 1535.20 | bwd_inner_microstep: 1535.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1882
[2024-06-10 02:13:16,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.54 | bwd_microstep: 728.02 | bwd_inner_microstep: 727.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 02:13:18,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.25 | bwd_microstep: 1354.72 | bwd_inner_microstep: 1354.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3509
[2024-06-10 02:13:20,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.45 | bwd_microstep: 1369.94 | bwd_inner_microstep: 1369.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702
[2024-06-10 02:13:22,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.70 | bwd_microstep: 1430.50 | bwd_inner_microstep: 1430.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 02:13:24,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.92 | bwd_microstep: 1291.96 | bwd_inner_microstep: 1291.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519
[2024-06-10 02:13:26,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.51 | bwd_microstep: 1584.93 | bwd_inner_microstep: 1584.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 02:13:28,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.24 | bwd_microstep: 1511.58 | bwd_inner_microstep: 1511.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 02:13:30,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1261.72 | bwd_inner_microstep: 1261.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665
[2024-06-10 02:13:32,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.31 | bwd_microstep: 1427.36 | bwd_inner_microstep: 1427.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3468
[2024-06-10 02:13:34,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.23 | bwd_microstep: 1442.63 | bwd_inner_microstep: 1442.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 02:13:36,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.54 | bwd_microstep: 1528.69 | bwd_inner_microstep: 1528.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 02:13:38,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.57 | bwd_microstep: 1490.33 | bwd_inner_microstep: 1490.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 02:13:40,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1392.43 | bwd_inner_microstep: 1392.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 02:13:42,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 1436.18 | bwd_inner_microstep: 1436.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3698
[2024-06-10 02:13:44,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.98 | bwd_microstep: 1331.90 | bwd_inner_microstep: 1331.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154
[2024-06-10 02:13:45,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.15 | bwd_microstep: 855.64 | bwd_inner_microstep: 855.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2140
[2024-06-10 02:13:46,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.54 | bwd_microstep: 741.44 | bwd_inner_microstep: 741.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 02:13:48,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.00 | bwd_microstep: 1562.10 | bwd_inner_microstep: 1562.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 02:13:50,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.06 | bwd_microstep: 1378.80 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 02:13:52,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.35 | bwd_microstep: 1353.65 | bwd_inner_microstep: 1353.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-10 02:13:54,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1307.69 | bwd_inner_microstep: 1307.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 02:13:56,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.54 | bwd_microstep: 1539.01 | bwd_inner_microstep: 1538.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 02:14:01,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.98 | optimizer_gradients: 4.45 | optimizer_step: 6.62
[2024-06-10 02:14:01,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.51 | bwd_microstep: 4356.15 | bwd_inner_microstep: 1744.66 | bwd_allreduce_microstep: 2611.41 | step_microstep: 40.42
[2024-06-10 02:14:01,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15842.76 | bwd: 45058.15 | bwd_inner: 42445.67 | bwd_allreduce: 2611.73 | step: 42.59
{'loss': 1.3286, 'learning_rate': 3.9954372293601415e-05, 'epoch': 0.05}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 02:14:03,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.46 | bwd_microstep: 1240.02 | bwd_inner_microstep: 1239.87 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3859
[2024-06-10 02:14:05,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.89 | bwd_microstep: 1659.29 | bwd_inner_microstep: 1659.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 02:14:07,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.27 | bwd_microstep: 1477.31 | bwd_inner_microstep: 1477.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 02:14:09,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.52 | bwd_microstep: 1384.58 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 02:14:11,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.00 | bwd_microstep: 1251.02 | bwd_inner_microstep: 1250.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 02:14:12,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1283.00 | bwd_inner_microstep: 1282.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 02:14:14,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1390.24 | bwd_inner_microstep: 1390.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3586
[2024-06-10 02:14:16,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.90 | bwd_microstep: 1210.30 | bwd_inner_microstep: 1210.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482
[2024-06-10 02:14:18,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.34 | bwd_microstep: 1411.23 | bwd_inner_microstep: 1411.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 02:14:20,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1287.24 | bwd_inner_microstep: 1287.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415
[2024-06-10 02:14:22,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.15 | bwd_microstep: 1309.42 | bwd_inner_microstep: 1309.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3666
[2024-06-10 02:14:24,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.13 | bwd_microstep: 1449.06 | bwd_inner_microstep: 1449.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412
[2024-06-10 02:14:25,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.38 | bwd_microstep: 1282.04 | bwd_inner_microstep: 1282.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-10 02:14:27,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.57 | bwd_microstep: 1438.60 | bwd_inner_microstep: 1438.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3624
[2024-06-10 02:14:29,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.39 | bwd_microstep: 1310.75 | bwd_inner_microstep: 1310.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 02:14:30,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.69 | bwd_microstep: 795.67 | bwd_inner_microstep: 795.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 02:14:32,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.35 | bwd_microstep: 1474.75 | bwd_inner_microstep: 1474.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219
[2024-06-10 02:14:34,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.16 | bwd_microstep: 960.44 | bwd_inner_microstep: 960.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2920
[2024-06-10 02:14:35,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.70 | bwd_microstep: 1323.38 | bwd_inner_microstep: 1323.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 02:14:38,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.37 | bwd_microstep: 1509.67 | bwd_inner_microstep: 1509.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 02:14:40,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.79 | bwd_microstep: 1513.13 | bwd_inner_microstep: 1513.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3824
[2024-06-10 02:14:42,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1389.53 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 02:14:44,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.74 | bwd_microstep: 1644.14 | bwd_inner_microstep: 1644.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3714
[2024-06-10 02:14:46,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.46 | bwd_microstep: 1487.91 | bwd_inner_microstep: 1487.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3799
[2024-06-10 02:14:48,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.97 | bwd_microstep: 1644.89 | bwd_inner_microstep: 1644.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 02:14:50,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.88 | bwd_microstep: 1657.45 | bwd_inner_microstep: 1657.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989
[2024-06-10 02:14:52,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.75 | bwd_microstep: 890.92 | bwd_inner_microstep: 890.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 02:14:53,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1249.25 | bwd_inner_microstep: 1249.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2035
[2024-06-10 02:14:55,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.89 | bwd_microstep: 839.85 | bwd_inner_microstep: 839.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 02:14:56,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1373.46 | bwd_inner_microstep: 1373.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 02:14:58,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.37 | bwd_microstep: 1353.77 | bwd_inner_microstep: 1353.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 02:15:02,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.29 | optimizer_step: 6.59
[2024-06-10 02:15:02,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 3112.01 | bwd_inner_microstep: 1528.45 | bwd_allreduce_microstep: 1583.51 | step_microstep: 39.49
[2024-06-10 02:15:02,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16081.69 | bwd: 44604.37 | bwd_inner: 43019.83 | bwd_allreduce: 1583.80 | step: 42.08
{'loss': 1.3903, 'learning_rate': 3.995180324719029e-05, 'epoch': 0.05}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 02:15:04,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.04 | bwd_microstep: 1491.36 | bwd_inner_microstep: 1491.14 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.21
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3928
[2024-06-10 02:15:06,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.92 | bwd_microstep: 1398.19 | bwd_inner_microstep: 1398.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3881
[2024-06-10 02:15:08,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.38 | bwd_microstep: 1587.55 | bwd_inner_microstep: 1587.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 02:15:10,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.31 | bwd_microstep: 1286.71 | bwd_inner_microstep: 1286.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 02:15:12,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.36 | bwd_microstep: 1487.18 | bwd_inner_microstep: 1487.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782
[2024-06-10 02:15:14,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1352.33 | bwd_inner_microstep: 1352.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 02:15:16,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.82 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 02:15:18,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.76 | bwd_microstep: 1286.92 | bwd_inner_microstep: 1286.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591
[2024-06-10 02:15:19,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.99 | bwd_microstep: 1312.50 | bwd_inner_microstep: 1312.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3717
[2024-06-10 02:15:21,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.75 | bwd_microstep: 1457.24 | bwd_inner_microstep: 1457.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4094
[2024-06-10 02:15:24,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.44 | bwd_microstep: 1528.46 | bwd_inner_microstep: 1528.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434
[2024-06-10 02:15:26,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.13 | bwd_microstep: 1413.25 | bwd_inner_microstep: 1413.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3360
[2024-06-10 02:15:27,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.36 | bwd_microstep: 1211.11 | bwd_inner_microstep: 1211.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1980
[2024-06-10 02:15:28,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.22 | bwd_microstep: 848.82 | bwd_inner_microstep: 848.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3521
[2024-06-10 02:15:30,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.84 | bwd_microstep: 1355.79 | bwd_inner_microstep: 1355.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1853
[2024-06-10 02:15:31,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.18 | bwd_microstep: 676.97 | bwd_inner_microstep: 676.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 02:15:33,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.11 | bwd_microstep: 1488.99 | bwd_inner_microstep: 1488.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 02:15:35,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.53 | bwd_microstep: 1597.67 | bwd_inner_microstep: 1597.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2302
[2024-06-10 02:15:37,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.24 | bwd_microstep: 979.13 | bwd_inner_microstep: 979.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467
[2024-06-10 02:15:38,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.11 | bwd_microstep: 1186.62 | bwd_inner_microstep: 1186.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 02:15:40,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.63 | bwd_microstep: 1303.04 | bwd_inner_microstep: 1303.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 02:15:42,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1389.10 | bwd_inner_microstep: 1389.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 02:15:44,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.55 | bwd_microstep: 1514.86 | bwd_inner_microstep: 1514.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 02:15:45,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.64 | bwd_microstep: 807.40 | bwd_inner_microstep: 807.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3848
[2024-06-10 02:15:47,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1370.86 | bwd_inner_microstep: 1370.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 02:15:49,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.24 | bwd_microstep: 1508.20 | bwd_inner_microstep: 1508.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 02:15:52,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.97 | bwd_microstep: 1653.88 | bwd_inner_microstep: 1653.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 02:15:54,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.98 | bwd_microstep: 1458.43 | bwd_inner_microstep: 1458.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2071
[2024-06-10 02:15:55,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.82 | bwd_microstep: 821.11 | bwd_inner_microstep: 821.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-10 02:15:57,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.92 | bwd_microstep: 1511.99 | bwd_inner_microstep: 1511.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 02:15:59,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.77 | bwd_microstep: 1263.41 | bwd_inner_microstep: 1263.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3827
[2024-06-10 02:16:03,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.38 | optimizer_step: 6.62
[2024-06-10 02:16:03,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.27 | bwd_microstep: 3190.12 | bwd_inner_microstep: 2069.55 | bwd_allreduce_microstep: 1120.50 | step_microstep: 39.62
[2024-06-10 02:16:03,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16074.65 | bwd: 44125.60 | bwd_inner: 43004.00 | bwd_allreduce: 1120.83 | step: 42.16
{'loss': 1.3086, 'learning_rate': 3.9949163930631846e-05, 'epoch': 0.05}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2037
[2024-06-10 02:16:04,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.04 | bwd_microstep: 839.22 | bwd_inner_microstep: 839.07 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3466
[2024-06-10 02:16:06,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.74 | bwd_microstep: 1246.16 | bwd_inner_microstep: 1246.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 02:16:07,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.79 | bwd_microstep: 1387.01 | bwd_inner_microstep: 1386.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 02:16:09,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1284.06 | bwd_inner_microstep: 1284.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3755
[2024-06-10 02:16:11,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1447.37 | bwd_inner_microstep: 1447.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2225
[2024-06-10 02:16:13,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.85 | bwd_microstep: 961.65 | bwd_inner_microstep: 961.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 917
[2024-06-10 02:16:13,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 153.99 | bwd_microstep: 379.01 | bwd_inner_microstep: 378.70 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 02:16:15,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.83 | bwd_microstep: 1491.67 | bwd_inner_microstep: 1491.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3695
[2024-06-10 02:16:17,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.30 | bwd_microstep: 1364.79 | bwd_inner_microstep: 1364.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3431
[2024-06-10 02:16:19,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.51 | bwd_microstep: 1154.90 | bwd_inner_microstep: 1154.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3722
[2024-06-10 02:16:21,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.53 | bwd_microstep: 1670.50 | bwd_inner_microstep: 1670.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3445
[2024-06-10 02:16:23,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1306.97 | bwd_inner_microstep: 1306.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 02:16:25,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.79 | bwd_microstep: 1383.52 | bwd_inner_microstep: 1383.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3928
[2024-06-10 02:16:27,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.02 | bwd_microstep: 1692.55 | bwd_inner_microstep: 1692.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3611
[2024-06-10 02:16:29,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.35 | bwd_microstep: 1457.77 | bwd_inner_microstep: 1457.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 02:16:31,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1391.94 | bwd_inner_microstep: 1391.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 02:16:33,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.96 | bwd_microstep: 1617.29 | bwd_inner_microstep: 1617.03 | bwd_allreduce_microstep: 0.15 | step_microstep: 0.28
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 02:16:35,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.78 | bwd_microstep: 1558.87 | bwd_inner_microstep: 1558.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 02:16:36,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.90 | bwd_microstep: 796.26 | bwd_inner_microstep: 796.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3670
[2024-06-10 02:16:38,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1330.30 | bwd_inner_microstep: 1330.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3827
[2024-06-10 02:16:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.72 | bwd_microstep: 1727.48 | bwd_inner_microstep: 1727.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 02:16:43,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.77 | bwd_microstep: 1534.47 | bwd_inner_microstep: 1534.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 02:16:45,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.59 | bwd_microstep: 1617.52 | bwd_inner_microstep: 1617.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3539
[2024-06-10 02:16:47,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1358.15 | bwd_inner_microstep: 1358.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2025
[2024-06-10 02:16:48,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.34 | bwd_microstep: 747.24 | bwd_inner_microstep: 747.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 02:16:50,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.37 | bwd_microstep: 1455.26 | bwd_inner_microstep: 1455.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 02:16:52,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1430.99 | bwd_inner_microstep: 1430.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2272
[2024-06-10 02:16:53,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.52 | bwd_microstep: 1073.89 | bwd_inner_microstep: 1073.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1917
[2024-06-10 02:16:55,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.48 | bwd_microstep: 843.94 | bwd_inner_microstep: 843.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3590
[2024-06-10 02:16:56,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.94 | bwd_microstep: 1249.04 | bwd_inner_microstep: 1249.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2041
[2024-06-10 02:16:58,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.33 | bwd_microstep: 1005.74 | bwd_inner_microstep: 1005.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1918
[2024-06-10 02:17:03,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.44 | optimizer_step: 6.63
[2024-06-10 02:17:03,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.73 | bwd_microstep: 5360.77 | bwd_inner_microstep: 1019.73 | bwd_allreduce_microstep: 4340.97 | step_microstep: 40.21
[2024-06-10 02:17:03,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15205.77 | bwd: 45166.36 | bwd_inner: 40823.84 | bwd_allreduce: 4341.52 | step: 43.04
{'loss': 1.3781, 'learning_rate': 3.994645435322174e-05, 'epoch': 0.05}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1892
[2024-06-10 02:17:05,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.17 | bwd_microstep: 802.00 | bwd_inner_microstep: 801.85 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 02:17:06,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.85 | bwd_microstep: 1249.70 | bwd_inner_microstep: 1249.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2342
[2024-06-10 02:17:08,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.30 | bwd_microstep: 948.74 | bwd_inner_microstep: 948.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467
[2024-06-10 02:17:10,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.47 | bwd_microstep: 1408.66 | bwd_inner_microstep: 1408.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 02:17:11,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1385.74 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3502
[2024-06-10 02:17:13,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.04 | bwd_microstep: 1223.20 | bwd_inner_microstep: 1223.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 02:17:15,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.15 | bwd_microstep: 1282.25 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2196
[2024-06-10 02:17:16,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.31 | bwd_microstep: 891.10 | bwd_inner_microstep: 891.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 02:17:18,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.80 | bwd_microstep: 1252.71 | bwd_inner_microstep: 1252.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 02:17:20,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.41 | bwd_microstep: 1263.08 | bwd_inner_microstep: 1263.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 02:17:21,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1323.81 | bwd_inner_microstep: 1323.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3504
[2024-06-10 02:17:23,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.27 | bwd_microstep: 1224.12 | bwd_inner_microstep: 1224.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 1960
[2024-06-10 02:17:24,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.29 | bwd_microstep: 950.84 | bwd_inner_microstep: 950.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3643
[2024-06-10 02:17:27,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.78 | bwd_microstep: 1607.66 | bwd_inner_microstep: 1607.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 02:17:29,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1341.12 | bwd_inner_microstep: 1341.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1845
[2024-06-10 02:17:30,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.32 | bwd_microstep: 702.83 | bwd_inner_microstep: 702.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1982
[2024-06-10 02:17:31,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.91 | bwd_microstep: 863.01 | bwd_inner_microstep: 862.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 02:17:33,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.18 | bwd_microstep: 1320.63 | bwd_inner_microstep: 1320.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 02:17:35,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.72 | bwd_microstep: 1666.86 | bwd_inner_microstep: 1666.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 02:17:37,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.43 | bwd_microstep: 1409.55 | bwd_inner_microstep: 1409.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 02:17:39,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.84 | bwd_microstep: 1429.45 | bwd_inner_microstep: 1429.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 02:17:41,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.76 | bwd_microstep: 1557.06 | bwd_inner_microstep: 1557.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 02:17:43,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.30 | bwd_microstep: 1609.10 | bwd_inner_microstep: 1609.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3629
[2024-06-10 02:17:45,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.29 | bwd_microstep: 1222.96 | bwd_inner_microstep: 1222.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606
[2024-06-10 02:17:47,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.85 | bwd_microstep: 1311.34 | bwd_inner_microstep: 1311.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-10 02:17:49,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.79 | bwd_microstep: 1510.38 | bwd_inner_microstep: 1510.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719
[2024-06-10 02:17:51,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.23 | bwd_microstep: 1340.10 | bwd_inner_microstep: 1339.87 | bwd_allreduce_microstep: 0.15 | step_microstep: 0.19
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3448
[2024-06-10 02:17:52,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.28 | bwd_microstep: 1272.70 | bwd_inner_microstep: 1272.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053
[2024-06-10 02:17:54,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.11 | bwd_microstep: 910.18 | bwd_inner_microstep: 910.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 02:17:55,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.64 | bwd_microstep: 974.24 | bwd_inner_microstep: 974.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 02:17:56,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.52 | bwd_microstep: 809.55 | bwd_inner_microstep: 809.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 02:18:03,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.43 | optimizer_step: 6.59
[2024-06-10 02:18:03,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 6354.36 | bwd_inner_microstep: 1852.13 | bwd_allreduce_microstep: 4502.16 | step_microstep: 40.10
[2024-06-10 02:18:03,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14853.08 | bwd: 44419.10 | bwd_inner: 39915.68 | bwd_allreduce: 4502.56 | step: 42.74
{'loss': 1.3091, 'learning_rate': 3.99436745245031e-05, 'epoch': 0.05}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3394
[2024-06-10 02:18:05,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.72 | bwd_microstep: 1358.23 | bwd_inner_microstep: 1358.07 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3980
[2024-06-10 02:18:07,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.11 | bwd_microstep: 1443.54 | bwd_inner_microstep: 1443.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 02:18:09,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1382.22 | bwd_inner_microstep: 1382.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3857
[2024-06-10 02:18:11,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.05 | bwd_microstep: 1561.07 | bwd_inner_microstep: 1561.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 02:18:13,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.91 | bwd_microstep: 1482.15 | bwd_inner_microstep: 1482.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3754
[2024-06-10 02:18:15,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.17 | bwd_microstep: 1447.27 | bwd_inner_microstep: 1447.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 02:18:17,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1390.77 | bwd_inner_microstep: 1390.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 02:18:19,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.52 | bwd_microstep: 1539.97 | bwd_inner_microstep: 1539.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 02:18:21,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.23 | bwd_microstep: 1292.60 | bwd_inner_microstep: 1292.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 02:18:23,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.88 | bwd_microstep: 1354.98 | bwd_inner_microstep: 1354.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 02:18:25,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 02:18:27,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.78 | bwd_microstep: 1382.74 | bwd_inner_microstep: 1382.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3691
[2024-06-10 02:18:29,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.93 | bwd_microstep: 1828.19 | bwd_inner_microstep: 1828.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3918
[2024-06-10 02:18:31,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.71 | bwd_microstep: 1591.05 | bwd_inner_microstep: 1591.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 02:18:33,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.64 | bwd_microstep: 1346.96 | bwd_inner_microstep: 1346.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507
[2024-06-10 02:18:35,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 1417.83 | bwd_inner_microstep: 1417.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3923
[2024-06-10 02:18:37,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.23 | bwd_microstep: 1597.14 | bwd_inner_microstep: 1597.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687
[2024-06-10 02:18:39,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1425.52 | bwd_inner_microstep: 1425.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 02:18:41,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1376.61 | bwd_inner_microstep: 1376.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3746
[2024-06-10 02:18:43,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.15 | bwd_microstep: 1646.54 | bwd_inner_microstep: 1646.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 02:18:45,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.86 | bwd_microstep: 805.15 | bwd_inner_microstep: 805.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 02:18:47,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.95 | bwd_microstep: 1464.66 | bwd_inner_microstep: 1464.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2192
[2024-06-10 02:18:48,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.87 | bwd_microstep: 858.45 | bwd_inner_microstep: 858.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3809
[2024-06-10 02:18:50,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.96 | bwd_microstep: 1406.32 | bwd_inner_microstep: 1406.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3819
[2024-06-10 02:18:52,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.53 | bwd_microstep: 1585.40 | bwd_inner_microstep: 1585.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3433
[2024-06-10 02:18:54,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.73 | bwd_microstep: 1406.00 | bwd_inner_microstep: 1405.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2281
[2024-06-10 02:18:55,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.19 | bwd_microstep: 942.47 | bwd_inner_microstep: 942.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3762
[2024-06-10 02:18:57,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1505.36 | bwd_inner_microstep: 1505.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2264
[2024-06-10 02:18:59,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.04 | bwd_microstep: 905.28 | bwd_inner_microstep: 905.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807
[2024-06-10 02:19:01,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.89 | bwd_microstep: 1621.90 | bwd_inner_microstep: 1621.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3821
[2024-06-10 02:19:03,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.47 | bwd_microstep: 1519.69 | bwd_inner_microstep: 1519.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2941
[2024-06-10 02:19:05,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-10 02:19:05,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.76 | bwd_microstep: 1236.62 | bwd_inner_microstep: 1228.87 | bwd_allreduce_microstep: 7.70 | step_microstep: 38.55
[2024-06-10 02:19:05,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16632.78 | bwd: 44507.61 | bwd_inner: 44498.88 | bwd_allreduce: 7.99 | step: 40.68


  5%|▌         | 87/1726 [1:35:36<28:07:10, 61.76s/it]
  5%|▌         | 88/1726 [1:36:38<28:02:15, 61.62s/it]
  5%|▌         | 89/1726 [1:37:39<27:56:47, 61.46s/it]
  5%|▌         | 90/1726 [1:38:39<27:48:43, 61.20s/it]
  5%|▌         | 91/1726 [1:39:40<27:44:16, 61.07s/it]
  5%|▌         | 92/1726 [1:40:40<27:31:40, 60.65s/it]
  5%|▌         | 93/1726 [1:41:41<27:37:52, 60.91s/it]
{'loss': 1.2806, 'learning_rate': 3.994082445426646e-05, 'epoch': 0.05}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 02:19:06,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.98 | bwd_microstep: 1243.34 | bwd_inner_microstep: 1243.20 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3919
[2024-06-10 02:19:08,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1546.87 | bwd_inner_microstep: 1546.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 02:19:11,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.87 | bwd_microstep: 1653.98 | bwd_inner_microstep: 1653.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 02:19:13,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.44 | bwd_microstep: 1483.80 | bwd_inner_microstep: 1483.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 02:19:15,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1249.21 | bwd_inner_microstep: 1249.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 02:19:16,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1257.46 | bwd_inner_microstep: 1257.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3497
[2024-06-10 02:19:18,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.68 | bwd_microstep: 1222.17 | bwd_inner_microstep: 1222.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 02:19:20,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.07 | bwd_microstep: 1256.47 | bwd_inner_microstep: 1256.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 02:19:21,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.58 | bwd_microstep: 1191.73 | bwd_inner_microstep: 1191.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 02:19:23,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.61 | bwd_microstep: 1358.23 | bwd_inner_microstep: 1358.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3425
[2024-06-10 02:19:25,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1298.01 | bwd_inner_microstep: 1297.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 02:19:27,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1385.32 | bwd_inner_microstep: 1385.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 02:19:29,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.68 | bwd_microstep: 1615.88 | bwd_inner_microstep: 1615.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-10 02:19:31,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.14 | bwd_microstep: 1445.55 | bwd_inner_microstep: 1445.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 02:19:33,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.30 | bwd_microstep: 1345.71 | bwd_inner_microstep: 1345.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 02:19:35,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1481.86 | bwd_inner_microstep: 1481.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3666
[2024-06-10 02:19:37,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.34 | bwd_microstep: 1360.12 | bwd_inner_microstep: 1360.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 02:19:39,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1508.81 | bwd_inner_microstep: 1508.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 02:19:41,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.06 | bwd_microstep: 1257.87 | bwd_inner_microstep: 1257.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 02:19:43,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.35 | bwd_microstep: 1518.63 | bwd_inner_microstep: 1518.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 02:19:45,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.17 | bwd_microstep: 1184.80 | bwd_inner_microstep: 1184.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3833
[2024-06-10 02:19:47,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.73 | bwd_microstep: 1467.10 | bwd_inner_microstep: 1467.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 02:19:48,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1403.99 | bwd_inner_microstep: 1403.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 02:19:50,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1403.67 | bwd_inner_microstep: 1403.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 02:19:52,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.54 | bwd_microstep: 1297.93 | bwd_inner_microstep: 1297.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 02:19:54,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1298.17 | bwd_inner_microstep: 1298.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616
[2024-06-10 02:19:56,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.67 | bwd_microstep: 1345.65 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729
[2024-06-10 02:19:58,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.09 | bwd_microstep: 1634.04 | bwd_inner_microstep: 1634.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743
[2024-06-10 02:20:00,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.46 | bwd_microstep: 1535.04 | bwd_inner_microstep: 1535.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051
[2024-06-10 02:20:02,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.27 | bwd_microstep: 915.85 | bwd_inner_microstep: 915.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3608
[2024-06-10 02:20:04,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.73 | bwd_microstep: 1714.36 | bwd_inner_microstep: 1714.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 02:20:06,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 02:20:06,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 1723.39 | bwd_inner_microstep: 1585.93 | bwd_allreduce_microstep: 137.40 | step_microstep: 38.86
[2024-06-10 02:20:06,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16615.01 | bwd: 44605.03 | bwd_inner: 44466.60 | bwd_allreduce: 137.68 | step: 41.26
{'loss': 1.3446, 'learning_rate': 3.9937904152549746e-05, 'epoch': 0.05}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463
[2024-06-10 02:20:08,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.45 | bwd_microstep: 1435.62 | bwd_inner_microstep: 1435.40 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 02:20:09,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.17 | bwd_microstep: 790.95 | bwd_inner_microstep: 790.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 02:20:11,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.74 | bwd_microstep: 1452.72 | bwd_inner_microstep: 1452.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476
[2024-06-10 02:20:13,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.28 | bwd_microstep: 1443.99 | bwd_inner_microstep: 1443.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3485
[2024-06-10 02:20:15,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.99 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2220
[2024-06-10 02:20:16,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.75 | bwd_microstep: 894.72 | bwd_inner_microstep: 894.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3727
[2024-06-10 02:20:18,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1482.72 | bwd_inner_microstep: 1482.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 02:20:20,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1287.21 | bwd_inner_microstep: 1287.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 02:20:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.67 | bwd_microstep: 1405.37 | bwd_inner_microstep: 1405.28 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.27
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 02:20:24,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.22 | bwd_microstep: 1388.26 | bwd_inner_microstep: 1388.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 02:20:26,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.50 | bwd_microstep: 1287.74 | bwd_inner_microstep: 1287.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1930
[2024-06-10 02:20:27,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.82 | bwd_microstep: 822.12 | bwd_inner_microstep: 822.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 02:20:28,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.57 | bwd_microstep: 802.14 | bwd_inner_microstep: 802.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512
[2024-06-10 02:20:30,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1322.02 | bwd_inner_microstep: 1321.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 02:20:31,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.90 | bwd_microstep: 1155.00 | bwd_inner_microstep: 1154.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 02:20:33,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.11 | bwd_microstep: 1310.36 | bwd_inner_microstep: 1310.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 02:20:35,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.51 | bwd_microstep: 1255.12 | bwd_inner_microstep: 1255.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 02:20:37,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1393.36 | bwd_inner_microstep: 1393.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 02:20:39,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 1494.36 | bwd_inner_microstep: 1494.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 02:20:41,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.58 | bwd_microstep: 1407.20 | bwd_inner_microstep: 1407.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000
[2024-06-10 02:20:42,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.77 | bwd_microstep: 805.39 | bwd_inner_microstep: 805.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 884
[2024-06-10 02:20:43,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.92 | bwd_microstep: 373.32 | bwd_inner_microstep: 373.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 02:20:44,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1298.97 | bwd_inner_microstep: 1298.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3682
[2024-06-10 02:20:47,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1557.91 | bwd_inner_microstep: 1557.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471
[2024-06-10 02:20:49,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1447.62 | bwd_inner_microstep: 1447.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 02:20:51,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.67 | bwd_microstep: 1481.17 | bwd_inner_microstep: 1481.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 02:20:52,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.75 | bwd_microstep: 1347.67 | bwd_inner_microstep: 1347.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3007
[2024-06-10 02:20:54,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.01 | bwd_microstep: 1117.55 | bwd_inner_microstep: 1117.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3630
[2024-06-10 02:20:56,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.56 | bwd_microstep: 1548.01 | bwd_inner_microstep: 1547.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562
[2024-06-10 02:20:58,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.41 | bwd_microstep: 1424.24 | bwd_inner_microstep: 1424.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463
[2024-06-10 02:21:00,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.17 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3581
[2024-06-10 02:21:08,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.45 | optimizer_step: 6.61
[2024-06-10 02:21:08,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.01 | bwd_microstep: 7112.84 | bwd_inner_microstep: 1879.53 | bwd_allreduce_microstep: 5233.23 | step_microstep: 40.10
[2024-06-10 02:21:08,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15224.27 | bwd: 45840.97 | bwd_inner: 40606.55 | bwd_allreduce: 5233.60 | step: 42.46
{'loss': 1.3393, 'learning_rate': 3.993491362963826e-05, 'epoch': 0.06}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1926
[2024-06-10 02:21:09,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.43 | bwd_microstep: 875.91 | bwd_inner_microstep: 875.71 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 02:21:11,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1255.25 | bwd_inner_microstep: 1255.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3877
[2024-06-10 02:21:13,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.56 | bwd_microstep: 1582.82 | bwd_inner_microstep: 1582.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 02:21:15,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.40 | bwd_microstep: 1354.01 | bwd_inner_microstep: 1353.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 02:21:17,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.52 | bwd_microstep: 1633.53 | bwd_inner_microstep: 1633.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 02:21:18,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.88 | bwd_microstep: 973.28 | bwd_inner_microstep: 973.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3717
[2024-06-10 02:21:20,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.86 | bwd_microstep: 1272.85 | bwd_inner_microstep: 1272.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 02:21:22,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.26 | bwd_microstep: 1390.49 | bwd_inner_microstep: 1390.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694
[2024-06-10 02:21:24,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.35 | bwd_microstep: 1629.92 | bwd_inner_microstep: 1629.75 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 02:21:26,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.88 | bwd_microstep: 1435.96 | bwd_inner_microstep: 1435.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478
[2024-06-10 02:21:28,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1318.14 | bwd_inner_microstep: 1318.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 02:21:29,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.53 | bwd_microstep: 795.15 | bwd_inner_microstep: 795.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 02:21:31,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.86 | bwd_microstep: 1522.01 | bwd_inner_microstep: 1521.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2112
[2024-06-10 02:21:32,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.17 | bwd_microstep: 828.11 | bwd_inner_microstep: 828.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3584
[2024-06-10 02:21:35,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.75 | bwd_microstep: 1647.80 | bwd_inner_microstep: 1647.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638
[2024-06-10 02:21:36,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.69 | bwd_microstep: 1315.54 | bwd_inner_microstep: 1315.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-10 02:21:38,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.09 | bwd_microstep: 1449.77 | bwd_inner_microstep: 1449.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 02:21:41,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.40 | bwd_microstep: 1558.58 | bwd_inner_microstep: 1558.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 02:21:42,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.24 | bwd_microstep: 1353.42 | bwd_inner_microstep: 1353.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512
[2024-06-10 02:21:44,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1318.37 | bwd_inner_microstep: 1318.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3606
[2024-06-10 02:21:46,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.21 | bwd_microstep: 1248.14 | bwd_inner_microstep: 1248.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 02:21:48,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.59 | bwd_microstep: 1505.07 | bwd_inner_microstep: 1505.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3473
[2024-06-10 02:21:50,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.88 | bwd_microstep: 1396.98 | bwd_inner_microstep: 1396.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 02:21:52,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1377.94 | bwd_inner_microstep: 1377.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 02:21:54,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.41 | bwd_microstep: 1664.12 | bwd_inner_microstep: 1664.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 02:21:56,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1392.28 | bwd_inner_microstep: 1392.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3412
[2024-06-10 02:21:58,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.54 | bwd_microstep: 1378.62 | bwd_inner_microstep: 1378.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 02:22:00,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1406.35 | bwd_inner_microstep: 1406.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 02:22:02,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.83 | bwd_microstep: 1514.30 | bwd_inner_microstep: 1514.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3468
[2024-06-10 02:22:04,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.94 | bwd_microstep: 1214.19 | bwd_inner_microstep: 1214.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 02:22:06,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 1377.59 | bwd_inner_microstep: 1377.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 02:22:09,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.37 | optimizer_step: 6.61
[2024-06-10 02:22:09,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.05 | bwd_microstep: 2708.08 | bwd_inner_microstep: 1568.04 | bwd_allreduce_microstep: 1139.99 | step_microstep: 39.30
[2024-06-10 02:22:09,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16264.99 | bwd: 44694.62 | bwd_inner: 43553.42 | bwd_allreduce: 1140.37 | step: 42.06
{'loss': 1.3528, 'learning_rate': 3.9931852896064606e-05, 'epoch': 0.06}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 02:22:11,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.71 | bwd_microstep: 1378.96 | bwd_inner_microstep: 1378.81 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 02:22:13,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.24 | bwd_microstep: 1384.39 | bwd_inner_microstep: 1384.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 02:22:15,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.34 | bwd_microstep: 1484.33 | bwd_inner_microstep: 1484.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2342
[2024-06-10 02:22:16,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.55 | bwd_microstep: 989.16 | bwd_inner_microstep: 989.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 02:22:18,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.25 | bwd_microstep: 1253.75 | bwd_inner_microstep: 1253.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399
[2024-06-10 02:22:20,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.15 | bwd_microstep: 1373.02 | bwd_inner_microstep: 1373.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 02:22:22,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.90 | bwd_microstep: 1540.22 | bwd_inner_microstep: 1540.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 02:22:24,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.13 | bwd_microstep: 1396.15 | bwd_inner_microstep: 1396.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3408
[2024-06-10 02:22:26,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.21 | bwd_microstep: 1317.03 | bwd_inner_microstep: 1317.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 02:22:28,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1357.28 | bwd_inner_microstep: 1357.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2137
[2024-06-10 02:22:29,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.79 | bwd_microstep: 830.85 | bwd_inner_microstep: 830.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451
[2024-06-10 02:22:30,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.13 | bwd_microstep: 1198.49 | bwd_inner_microstep: 1198.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2040
[2024-06-10 02:22:32,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.18 | bwd_microstep: 836.76 | bwd_inner_microstep: 836.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 02:22:33,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1345.59 | bwd_inner_microstep: 1345.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512
[2024-06-10 02:22:35,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1436.84 | bwd_inner_microstep: 1436.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3617
[2024-06-10 02:22:38,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.98 | bwd_microstep: 1808.67 | bwd_inner_microstep: 1808.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 02:22:40,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1384.90 | bwd_inner_microstep: 1384.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 02:22:42,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1554.94 | bwd_inner_microstep: 1554.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 02:22:44,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1412.29 | bwd_inner_microstep: 1412.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 02:22:46,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1509.51 | bwd_inner_microstep: 1509.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 02:22:48,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.95 | bwd_microstep: 1383.58 | bwd_inner_microstep: 1383.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546
[2024-06-10 02:22:50,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.03 | bwd_microstep: 1200.60 | bwd_inner_microstep: 1200.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 02:22:52,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.10 | bwd_microstep: 1502.00 | bwd_inner_microstep: 1501.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 02:22:54,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.05 | bwd_microstep: 1404.15 | bwd_inner_microstep: 1404.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3512
[2024-06-10 02:22:55,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.70 | bwd_microstep: 1197.56 | bwd_inner_microstep: 1197.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3555
[2024-06-10 02:22:57,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.20 | bwd_microstep: 1352.07 | bwd_inner_microstep: 1352.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3627
[2024-06-10 02:22:59,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.05 | bwd_microstep: 1538.40 | bwd_inner_microstep: 1538.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3543
[2024-06-10 02:23:01,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.39 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3540
[2024-06-10 02:23:04,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.01 | bwd_microstep: 1639.09 | bwd_inner_microstep: 1639.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3726
[2024-06-10 02:23:06,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.81 | bwd_microstep: 1602.07 | bwd_inner_microstep: 1602.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581
[2024-06-10 02:23:08,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.80 | bwd_microstep: 1699.15 | bwd_inner_microstep: 1699.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 02:23:13,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 02:23:13,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.35 | bwd_microstep: 4071.18 | bwd_inner_microstep: 1857.47 | bwd_allreduce_microstep: 2213.65 | step_microstep: 39.58
[2024-06-10 02:23:13,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16610.17 | bwd: 46801.05 | bwd_inner: 44586.35 | bwd_allreduce: 2213.95 | step: 42.05
{'loss': 1.3918, 'learning_rate': 3.992872196260866e-05, 'epoch': 0.06}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3392
[2024-06-10 02:23:15,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.29 | bwd_microstep: 1302.83 | bwd_inner_microstep: 1302.61 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 02:23:16,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.30 | bwd_microstep: 1280.17 | bwd_inner_microstep: 1280.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3936
[2024-06-10 02:23:19,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.49 | bwd_microstep: 1693.07 | bwd_inner_microstep: 1693.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 02:23:21,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1378.47 | bwd_inner_microstep: 1378.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 02:23:23,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.33 | bwd_microstep: 1400.04 | bwd_inner_microstep: 1400.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783
[2024-06-10 02:23:25,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.54 | bwd_microstep: 1444.56 | bwd_inner_microstep: 1444.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 02:23:26,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.26 | bwd_microstep: 1314.28 | bwd_inner_microstep: 1314.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3607
[2024-06-10 02:23:28,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.87 | bwd_microstep: 1245.19 | bwd_inner_microstep: 1245.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2017
[2024-06-10 02:23:29,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.29 | bwd_microstep: 743.65 | bwd_inner_microstep: 743.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3698
[2024-06-10 02:23:31,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.50 | bwd_microstep: 1588.21 | bwd_inner_microstep: 1588.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 02:23:34,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1621.81 | bwd_inner_microstep: 1621.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 02:23:35,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1343.10 | bwd_inner_microstep: 1343.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3662
[2024-06-10 02:23:37,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1353.85 | bwd_inner_microstep: 1353.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3715
[2024-06-10 02:23:39,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.17 | bwd_microstep: 1338.24 | bwd_inner_microstep: 1338.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3789
[2024-06-10 02:23:41,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.61 | bwd_microstep: 1502.99 | bwd_inner_microstep: 1502.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 02:23:43,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.93 | bwd_microstep: 1478.88 | bwd_inner_microstep: 1478.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937
[2024-06-10 02:23:44,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.87 | bwd_microstep: 702.51 | bwd_inner_microstep: 702.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 02:23:46,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.08 | bwd_microstep: 1253.44 | bwd_inner_microstep: 1253.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 02:23:48,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.05 | bwd_microstep: 1406.03 | bwd_inner_microstep: 1406.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 02:23:50,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.43 | bwd_microstep: 1437.37 | bwd_inner_microstep: 1437.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 02:23:52,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1401.73 | bwd_inner_microstep: 1401.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 02:23:54,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1385.64 | bwd_inner_microstep: 1385.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2196
[2024-06-10 02:23:55,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.69 | bwd_microstep: 862.58 | bwd_inner_microstep: 862.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699
[2024-06-10 02:23:57,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.93 | bwd_microstep: 1334.76 | bwd_inner_microstep: 1334.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 02:23:59,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.90 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 02:24:01,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.40 | bwd_microstep: 1559.18 | bwd_inner_microstep: 1559.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 02:24:03,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.99 | bwd_microstep: 1612.31 | bwd_inner_microstep: 1612.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 02:24:05,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.14 | bwd_microstep: 1288.09 | bwd_inner_microstep: 1288.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-10 02:24:07,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.22 | bwd_microstep: 1610.66 | bwd_inner_microstep: 1610.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2038
[2024-06-10 02:24:08,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.08 | bwd_microstep: 845.11 | bwd_inner_microstep: 845.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2411
[2024-06-10 02:24:10,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.67 | bwd_microstep: 1036.27 | bwd_inner_microstep: 1036.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3588
[2024-06-10 02:24:13,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.32 | optimizer_step: 6.61
[2024-06-10 02:24:13,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.81 | bwd_microstep: 2836.35 | bwd_inner_microstep: 1879.84 | bwd_allreduce_microstep: 956.45 | step_microstep: 39.33
[2024-06-10 02:24:13,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16096.11 | bwd: 44008.36 | bwd_inner: 43050.82 | bwd_allreduce: 956.77 | step: 42.05
{'loss': 1.3496, 'learning_rate': 3.992552084029757e-05, 'epoch': 0.06}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 02:24:15,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.74 | bwd_microstep: 1249.93 | bwd_inner_microstep: 1249.71 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 02:24:17,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.24 | bwd_microstep: 1281.50 | bwd_inner_microstep: 1281.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3538
[2024-06-10 02:24:19,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.33 | bwd_microstep: 1257.57 | bwd_inner_microstep: 1257.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3854
[2024-06-10 02:24:21,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.72 | bwd_microstep: 1561.41 | bwd_inner_microstep: 1561.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3803
[2024-06-10 02:24:23,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.24 | bwd_microstep: 1515.80 | bwd_inner_microstep: 1515.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3800
[2024-06-10 02:24:25,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.83 | bwd_microstep: 1457.00 | bwd_inner_microstep: 1456.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 02:24:27,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.84 | bwd_microstep: 1536.01 | bwd_inner_microstep: 1535.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 02:24:28,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.88 | bwd_microstep: 796.99 | bwd_inner_microstep: 796.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 02:24:30,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1301.54 | bwd_inner_microstep: 1301.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 02:24:32,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1256.31 | bwd_inner_microstep: 1256.14 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 02:24:34,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1490.77 | bwd_inner_microstep: 1490.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 02:24:36,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.05 | bwd_microstep: 1484.80 | bwd_inner_microstep: 1484.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995
[2024-06-10 02:24:37,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.52 | bwd_microstep: 902.97 | bwd_inner_microstep: 902.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 02:24:39,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1390.27 | bwd_inner_microstep: 1390.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504
[2024-06-10 02:24:41,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.04 | bwd_microstep: 1319.39 | bwd_inner_microstep: 1319.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952
[2024-06-10 02:24:42,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.47 | bwd_microstep: 928.23 | bwd_inner_microstep: 928.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3651
[2024-06-10 02:24:44,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.60 | bwd_microstep: 1569.65 | bwd_inner_microstep: 1569.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3606
[2024-06-10 02:24:46,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.54 | bwd_microstep: 1491.97 | bwd_inner_microstep: 1491.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3425
[2024-06-10 02:24:48,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1514.56 | bwd_inner_microstep: 1514.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3666
[2024-06-10 02:24:50,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.82 | bwd_microstep: 1591.40 | bwd_inner_microstep: 1591.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3604
[2024-06-10 02:24:53,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.19 | bwd_microstep: 1640.40 | bwd_inner_microstep: 1640.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3820
[2024-06-10 02:24:55,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.37 | bwd_microstep: 1624.97 | bwd_inner_microstep: 1624.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 02:24:57,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1407.23 | bwd_inner_microstep: 1407.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-10 02:24:59,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.45 | bwd_microstep: 1630.94 | bwd_inner_microstep: 1630.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 02:25:01,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1416.25 | bwd_inner_microstep: 1416.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 02:25:02,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.84 | bwd_microstep: 978.23 | bwd_inner_microstep: 978.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 02:25:04,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.39 | bwd_microstep: 1299.67 | bwd_inner_microstep: 1299.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2267
[2024-06-10 02:25:05,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.29 | bwd_microstep: 875.78 | bwd_inner_microstep: 875.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1406
[2024-06-10 02:25:06,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 209.10 | bwd_microstep: 532.85 | bwd_inner_microstep: 532.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 02:25:08,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3573
[2024-06-10 02:25:10,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1428.88 | bwd_inner_microstep: 1428.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-10 02:25:14,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.28 | optimizer_step: 6.62
[2024-06-10 02:25:14,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.52 | bwd_microstep: 3456.19 | bwd_inner_microstep: 2109.84 | bwd_allreduce_microstep: 1346.30 | step_microstep: 39.58
[2024-06-10 02:25:14,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15954.14 | bwd: 44473.16 | bwd_inner: 43125.64 | bwd_allreduce: 1346.70 | step: 41.99


  5%|▌         | 93/1726 [1:41:41<27:37:52, 60.91s/it]
  5%|▌         | 94/1726 [1:42:43<27:42:33, 61.12s/it]
  6%|▌         | 95/1726 [1:43:44<27:44:10, 61.22s/it]
  6%|▌         | 96/1726 [1:44:46<27:44:16, 61.26s/it]
  6%|▌         | 97/1726 [1:45:50<28:03:57, 62.02s/it]
  6%|▌         | 98/1726 [1:46:50<27:50:30, 61.57s/it]
  6%|▌         | 99/1726 [1:47:51<27:43:25, 61.34s/it]
{'loss': 1.3672, 'learning_rate': 3.9922249540405654e-05, 'epoch': 0.06}
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3452
[2024-06-10 02:25:16,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.67 | bwd_microstep: 1424.26 | bwd_inner_microstep: 1424.04 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 02:25:18,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.78 | bwd_microstep: 1244.80 | bwd_inner_microstep: 1244.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 02:25:20,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.67 | bwd_microstep: 1343.82 | bwd_inner_microstep: 1343.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 02:25:22,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.62 | bwd_microstep: 1411.37 | bwd_inner_microstep: 1411.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 02:25:24,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.14 | bwd_microstep: 1542.08 | bwd_inner_microstep: 1542.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 02:25:25,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.07 | bwd_microstep: 1246.74 | bwd_inner_microstep: 1246.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1862
[2024-06-10 02:25:26,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.59 | bwd_microstep: 681.01 | bwd_inner_microstep: 680.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 02:25:28,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.91 | bwd_microstep: 1351.73 | bwd_inner_microstep: 1351.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 02:25:30,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.21 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 02:25:31,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.01 | bwd_microstep: 791.10 | bwd_inner_microstep: 791.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2470
[2024-06-10 02:25:33,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.12 | bwd_microstep: 954.21 | bwd_inner_microstep: 954.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700
[2024-06-10 02:25:35,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.82 | bwd_microstep: 1429.17 | bwd_inner_microstep: 1429.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 02:25:37,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.35 | bwd_microstep: 1482.66 | bwd_inner_microstep: 1482.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 02:25:39,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.93 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2996
[2024-06-10 02:25:40,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.71 | bwd_microstep: 1272.34 | bwd_inner_microstep: 1272.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 2950
[2024-06-10 02:25:42,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.54 | bwd_microstep: 1311.80 | bwd_inner_microstep: 1311.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-10 02:25:44,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1323.98 | bwd_inner_microstep: 1323.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2099
[2024-06-10 02:25:45,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.92 | bwd_microstep: 833.91 | bwd_inner_microstep: 833.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 02:25:47,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1516.53 | bwd_inner_microstep: 1516.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071
[2024-06-10 02:25:49,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.90 | bwd_microstep: 917.98 | bwd_inner_microstep: 917.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2103
[2024-06-10 02:25:50,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.92 | bwd_microstep: 859.27 | bwd_inner_microstep: 859.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 02:25:52,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.26 | bwd_microstep: 1384.16 | bwd_inner_microstep: 1384.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 02:25:53,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.16 | bwd_microstep: 1258.31 | bwd_inner_microstep: 1258.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 02:25:55,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.76 | bwd_microstep: 1516.42 | bwd_inner_microstep: 1516.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 02:25:58,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.65 | bwd_microstep: 1660.91 | bwd_inner_microstep: 1660.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-10 02:26:00,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.84 | bwd_microstep: 1629.61 | bwd_inner_microstep: 1629.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 02:26:02,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1551.79 | bwd_inner_microstep: 1551.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 02:26:04,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.06 | bwd_microstep: 1509.20 | bwd_inner_microstep: 1509.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3560
[2024-06-10 02:26:07,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.35 | bwd_microstep: 1668.36 | bwd_inner_microstep: 1668.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 02:26:09,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.65 | bwd_microstep: 1504.49 | bwd_inner_microstep: 1504.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 02:26:10,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1350.82 | bwd_inner_microstep: 1350.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3769
[2024-06-10 02:26:13,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 02:26:13,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.83 | bwd_microstep: 1720.97 | bwd_inner_microstep: 1312.56 | bwd_allreduce_microstep: 408.36 | step_microstep: 39.06
[2024-06-10 02:26:13,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15744.87 | bwd: 42476.06 | bwd_inner: 42066.62 | bwd_allreduce: 408.69 | step: 41.02
{'loss': 1.3768, 'learning_rate': 3.991890807445443e-05, 'epoch': 0.06}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969
[2024-06-10 02:26:14,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.78 | bwd_microstep: 890.30 | bwd_inner_microstep: 890.14 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866
[2024-06-10 02:26:16,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.21 | bwd_microstep: 1567.07 | bwd_inner_microstep: 1567.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 02:26:18,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1512.26 | bwd_inner_microstep: 1512.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3428
[2024-06-10 02:26:20,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.81 | bwd_microstep: 1219.23 | bwd_inner_microstep: 1219.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1407
[2024-06-10 02:26:21,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 218.38 | bwd_microstep: 564.15 | bwd_inner_microstep: 564.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 02:26:22,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.32 | bwd_microstep: 1250.62 | bwd_inner_microstep: 1250.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-10 02:26:24,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.15 | bwd_microstep: 1193.97 | bwd_inner_microstep: 1193.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1909
[2024-06-10 02:26:25,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.46 | bwd_microstep: 814.95 | bwd_inner_microstep: 814.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4109
[2024-06-10 02:26:28,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.05 | bwd_microstep: 1657.50 | bwd_inner_microstep: 1657.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3719
[2024-06-10 02:26:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.44 | bwd_microstep: 1801.89 | bwd_inner_microstep: 1801.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498
[2024-06-10 02:26:32,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.42 | bwd_microstep: 1586.16 | bwd_inner_microstep: 1586.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 02:26:34,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.44 | bwd_microstep: 1292.55 | bwd_inner_microstep: 1292.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 02:26:36,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1507.43 | bwd_inner_microstep: 1507.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 02:26:38,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1353.12 | bwd_inner_microstep: 1353.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971
[2024-06-10 02:26:39,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 896.64 | bwd_inner_microstep: 896.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3476
[2024-06-10 02:26:41,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1431.20 | bwd_inner_microstep: 1431.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 02:26:43,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.83 | bwd_microstep: 1452.99 | bwd_inner_microstep: 1452.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3650
[2024-06-10 02:26:45,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.42 | bwd_microstep: 1260.28 | bwd_inner_microstep: 1260.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817
[2024-06-10 02:26:47,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.27 | bwd_microstep: 1755.51 | bwd_inner_microstep: 1755.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3821
[2024-06-10 02:26:49,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1530.64 | bwd_inner_microstep: 1530.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1994
[2024-06-10 02:26:50,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.31 | bwd_microstep: 757.56 | bwd_inner_microstep: 757.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2169
[2024-06-10 02:26:52,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.57 | bwd_microstep: 861.22 | bwd_inner_microstep: 861.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 02:26:54,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.55 | bwd_microstep: 1489.48 | bwd_inner_microstep: 1489.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-10 02:26:55,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.21 | bwd_microstep: 1167.15 | bwd_inner_microstep: 1167.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3449
[2024-06-10 02:26:57,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1291.78 | bwd_inner_microstep: 1291.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 02:26:59,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.87 | bwd_microstep: 1163.45 | bwd_inner_microstep: 1163.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 02:27:01,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.08 | bwd_microstep: 1284.23 | bwd_inner_microstep: 1284.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 02:27:02,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.51 | bwd_microstep: 1159.95 | bwd_inner_microstep: 1159.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 02:27:04,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.00 | bwd_microstep: 1506.21 | bwd_inner_microstep: 1506.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 02:27:06,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.26 | bwd_microstep: 1468.20 | bwd_inner_microstep: 1468.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 02:27:08,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 1284.96 | bwd_inner_microstep: 1284.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3803
[2024-06-10 02:27:15,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.42 | optimizer_step: 6.60
[2024-06-10 02:27:15,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.20 | bwd_microstep: 6516.49 | bwd_inner_microstep: 2065.84 | bwd_allreduce_microstep: 4450.57 | step_microstep: 40.12
[2024-06-10 02:27:15,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15587.66 | bwd: 46489.19 | bwd_inner: 42037.56 | bwd_allreduce: 4450.87 | step: 42.50
{'loss': 1.2732, 'learning_rate': 3.991549645421252e-05, 'epoch': 0.06}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 02:27:17,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.51 | bwd_microstep: 1489.87 | bwd_inner_microstep: 1489.64 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3455
[2024-06-10 02:27:19,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.83 | bwd_microstep: 1304.93 | bwd_inner_microstep: 1304.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 02:27:21,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.32 | bwd_microstep: 1382.38 | bwd_inner_microstep: 1382.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 02:27:23,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.49 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509
[2024-06-10 02:27:25,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.09 | bwd_microstep: 1320.87 | bwd_inner_microstep: 1320.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3482
[2024-06-10 02:27:26,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.15 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 02:27:28,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.96 | bwd_microstep: 1191.05 | bwd_inner_microstep: 1191.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 02:27:30,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1386.56 | bwd_inner_microstep: 1386.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3582
[2024-06-10 02:27:32,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.80 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.58 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 02:27:34,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.27 | bwd_microstep: 1258.94 | bwd_inner_microstep: 1258.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676
[2024-06-10 02:27:36,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.85 | bwd_microstep: 1721.58 | bwd_inner_microstep: 1721.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 02:27:38,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.38 | bwd_microstep: 1390.39 | bwd_inner_microstep: 1390.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 02:27:40,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.71 | bwd_microstep: 1501.97 | bwd_inner_microstep: 1501.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401
[2024-06-10 02:27:42,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.92 | bwd_microstep: 1214.12 | bwd_inner_microstep: 1214.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3404
[2024-06-10 02:27:44,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.62 | bwd_microstep: 1405.27 | bwd_inner_microstep: 1405.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3456
[2024-06-10 02:27:46,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.15 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3591
[2024-06-10 02:27:48,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.87 | bwd_microstep: 1468.89 | bwd_inner_microstep: 1468.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-10 02:27:49,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.52 | bwd_microstep: 801.84 | bwd_inner_microstep: 801.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 02:27:51,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.83 | bwd_microstep: 1320.43 | bwd_inner_microstep: 1320.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 02:27:52,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1395.72 | bwd_inner_microstep: 1395.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-10 02:27:54,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.45 | bwd_microstep: 1303.62 | bwd_inner_microstep: 1303.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3811
[2024-06-10 02:27:57,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.81 | bwd_microstep: 1625.48 | bwd_inner_microstep: 1625.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 02:27:58,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.86 | bwd_microstep: 1286.43 | bwd_inner_microstep: 1286.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-10 02:27:59,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.92 | bwd_microstep: 806.64 | bwd_inner_microstep: 806.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3709
[2024-06-10 02:28:01,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.41 | bwd_microstep: 1467.83 | bwd_inner_microstep: 1467.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2041
[2024-06-10 02:28:03,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.64 | bwd_microstep: 753.70 | bwd_inner_microstep: 753.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 02:28:04,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.91 | bwd_microstep: 1168.64 | bwd_inner_microstep: 1168.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-10 02:28:06,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1452.14 | bwd_inner_microstep: 1452.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 02:28:08,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.85 | bwd_microstep: 1401.97 | bwd_inner_microstep: 1401.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3598
[2024-06-10 02:28:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1341.61 | bwd_inner_microstep: 1341.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 02:28:12,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.22 | bwd_microstep: 1486.23 | bwd_inner_microstep: 1486.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 02:28:16,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.41 | optimizer_step: 6.59
[2024-06-10 02:28:16,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.70 | bwd_microstep: 3452.59 | bwd_inner_microstep: 1506.29 | bwd_allreduce_microstep: 1946.23 | step_microstep: 40.10
[2024-06-10 02:28:16,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15971.90 | bwd: 44459.35 | bwd_inner: 42511.87 | bwd_allreduce: 1946.61 | step: 42.46
{'loss': 1.3019, 'learning_rate': 3.9912014691695614e-05, 'epoch': 0.06}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3382
[2024-06-10 02:28:18,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.69 | bwd_microstep: 1278.61 | bwd_inner_microstep: 1278.46 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 02:28:20,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1379.75 | bwd_inner_microstep: 1379.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 02:28:22,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.36 | bwd_microstep: 1345.82 | bwd_inner_microstep: 1345.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 02:28:24,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.69 | bwd_microstep: 1656.22 | bwd_inner_microstep: 1656.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 02:28:26,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.15 | bwd_microstep: 1482.21 | bwd_inner_microstep: 1482.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 02:28:28,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.68 | bwd_microstep: 1483.36 | bwd_inner_microstep: 1483.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-10 02:28:30,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.72 | bwd_microstep: 1633.45 | bwd_inner_microstep: 1633.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1870
[2024-06-10 02:28:31,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.80 | bwd_microstep: 743.50 | bwd_inner_microstep: 743.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3692
[2024-06-10 02:28:34,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.05 | bwd_microstep: 1665.29 | bwd_inner_microstep: 1665.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-10 02:28:36,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.93 | bwd_microstep: 1433.62 | bwd_inner_microstep: 1433.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3980
[2024-06-10 02:28:38,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.20 | bwd_microstep: 1713.71 | bwd_inner_microstep: 1713.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1963
[2024-06-10 02:28:39,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.89 | bwd_microstep: 847.41 | bwd_inner_microstep: 847.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3506
[2024-06-10 02:28:41,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.60 | bwd_microstep: 1581.35 | bwd_inner_microstep: 1581.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 02:28:43,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.29 | bwd_microstep: 1257.80 | bwd_inner_microstep: 1257.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 02:28:44,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.08 | bwd_microstep: 976.27 | bwd_inner_microstep: 976.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475
[2024-06-10 02:28:46,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.25 | bwd_microstep: 1313.13 | bwd_inner_microstep: 1313.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3558
[2024-06-10 02:28:48,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.88 | bwd_microstep: 1205.21 | bwd_inner_microstep: 1205.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 02:28:50,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.06 | bwd_microstep: 1260.17 | bwd_inner_microstep: 1260.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 02:28:51,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.29 | bwd_microstep: 1258.27 | bwd_inner_microstep: 1258.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 02:28:53,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.01 | bwd_microstep: 1298.18 | bwd_inner_microstep: 1298.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 02:28:55,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1300.29 | bwd_inner_microstep: 1300.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3568
[2024-06-10 02:28:57,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1266.64 | bwd_inner_microstep: 1266.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822
[2024-06-10 02:28:59,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.32 | bwd_microstep: 1495.55 | bwd_inner_microstep: 1495.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 02:29:01,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1494.06 | bwd_inner_microstep: 1494.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 02:29:03,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1406.12 | bwd_inner_microstep: 1406.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 02:29:05,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.69 | bwd_microstep: 1507.53 | bwd_inner_microstep: 1507.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555
[2024-06-10 02:29:07,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.63 | bwd_microstep: 1332.30 | bwd_inner_microstep: 1332.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3801
[2024-06-10 02:29:08,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.33 | bwd_microstep: 1293.83 | bwd_inner_microstep: 1293.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2013
[2024-06-10 02:29:10,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.41 | bwd_microstep: 867.63 | bwd_inner_microstep: 867.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 02:29:12,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.75 | bwd_microstep: 1400.28 | bwd_inner_microstep: 1400.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2268
[2024-06-10 02:29:13,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.39 | bwd_microstep: 1038.24 | bwd_inner_microstep: 1038.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3816
[2024-06-10 02:29:18,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.44 | optimizer_step: 6.59
[2024-06-10 02:29:18,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.41 | bwd_microstep: 4447.01 | bwd_inner_microstep: 1612.56 | bwd_allreduce_microstep: 2834.38 | step_microstep: 40.07
[2024-06-10 02:29:18,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16021.65 | bwd: 45662.87 | bwd_inner: 42827.44 | bwd_allreduce: 2834.68 | step: 42.36
{'loss': 1.3331, 'learning_rate': 3.990846279916649e-05, 'epoch': 0.06}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463
[2024-06-10 02:29:20,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.76 | bwd_microstep: 1242.69 | bwd_inner_microstep: 1242.62 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901
[2024-06-10 02:29:22,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 1584.83 | bwd_inner_microstep: 1584.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3798
[2024-06-10 02:29:24,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.71 | bwd_microstep: 1603.26 | bwd_inner_microstep: 1603.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 02:29:25,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.92 | bwd_microstep: 792.15 | bwd_inner_microstep: 792.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2019
[2024-06-10 02:29:26,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.07 | bwd_microstep: 780.29 | bwd_inner_microstep: 780.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4074
[2024-06-10 02:29:29,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.45 | bwd_microstep: 1693.64 | bwd_inner_microstep: 1693.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 02:29:30,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.67 | bwd_microstep: 803.60 | bwd_inner_microstep: 803.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3842
[2024-06-10 02:29:32,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.36 | bwd_microstep: 1767.92 | bwd_inner_microstep: 1767.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3482
[2024-06-10 02:29:34,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.57 | bwd_microstep: 1223.75 | bwd_inner_microstep: 1223.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3691
[2024-06-10 02:29:36,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.09 | bwd_microstep: 1552.78 | bwd_inner_microstep: 1552.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1984
[2024-06-10 02:29:37,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.22 | bwd_microstep: 855.97 | bwd_inner_microstep: 855.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 02:29:39,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1391.73 | bwd_inner_microstep: 1391.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645
[2024-06-10 02:29:42,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.96 | bwd_microstep: 1713.16 | bwd_inner_microstep: 1713.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 02:29:43,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.38 | bwd_microstep: 1350.83 | bwd_inner_microstep: 1350.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663
[2024-06-10 02:29:46,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.29 | bwd_microstep: 1717.64 | bwd_inner_microstep: 1717.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 02:29:48,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.53 | bwd_microstep: 1475.68 | bwd_inner_microstep: 1475.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3377
[2024-06-10 02:29:50,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1395.25 | bwd_inner_microstep: 1395.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 02:29:52,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.67 | bwd_microstep: 1497.37 | bwd_inner_microstep: 1497.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 02:29:54,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.00 | bwd_microstep: 1464.98 | bwd_inner_microstep: 1464.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 02:29:56,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.60 | bwd_microstep: 1529.79 | bwd_inner_microstep: 1529.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 02:29:58,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.31 | bwd_microstep: 1523.31 | bwd_inner_microstep: 1523.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 02:30:00,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.48 | bwd_microstep: 1288.80 | bwd_inner_microstep: 1288.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 02:30:02,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.44 | bwd_microstep: 1319.28 | bwd_inner_microstep: 1319.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3713
[2024-06-10 02:30:04,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.87 | bwd_microstep: 1338.49 | bwd_inner_microstep: 1338.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2410
[2024-06-10 02:30:05,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.40 | bwd_microstep: 1037.45 | bwd_inner_microstep: 1037.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 02:30:07,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.39 | bwd_microstep: 1402.61 | bwd_inner_microstep: 1402.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3530
[2024-06-10 02:30:09,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.72 | bwd_microstep: 1542.30 | bwd_inner_microstep: 1542.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 02:30:11,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.06 | bwd_microstep: 1191.57 | bwd_inner_microstep: 1191.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 02:30:13,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.25 | bwd_microstep: 1504.09 | bwd_inner_microstep: 1504.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 02:30:15,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.81 | bwd_microstep: 1584.75 | bwd_inner_microstep: 1584.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 02:30:17,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1508.29 | bwd_inner_microstep: 1508.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3591
[2024-06-10 02:30:19,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.24 | optimizer_step: 6.64
[2024-06-10 02:30:19,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1608.10 | bwd_inner_microstep: 1600.07 | bwd_allreduce_microstep: 7.98 | step_microstep: 38.99
[2024-06-10 02:30:19,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16565.18 | bwd: 44286.40 | bwd_inner: 44277.45 | bwd_allreduce: 8.24 | step: 41.47
{'loss': 1.3401, 'learning_rate': 3.990484078913488e-05, 'epoch': 0.06}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-10 02:30:21,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.54 | bwd_microstep: 1323.53 | bwd_inner_microstep: 1323.41 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-10 02:30:23,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1242.67 | bwd_inner_microstep: 1242.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3509
[2024-06-10 02:30:25,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.54 | bwd_microstep: 1226.50 | bwd_inner_microstep: 1226.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 02:30:26,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1249.18 | bwd_inner_microstep: 1249.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-10 02:30:28,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.19 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4092
[2024-06-10 02:30:30,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.78 | bwd_microstep: 1634.51 | bwd_inner_microstep: 1634.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 02:30:32,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1291.42 | bwd_inner_microstep: 1291.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 02:30:34,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.71 | bwd_microstep: 1385.44 | bwd_inner_microstep: 1385.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 02:30:36,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.38 | bwd_microstep: 1251.91 | bwd_inner_microstep: 1251.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2109
[2024-06-10 02:30:37,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.92 | bwd_microstep: 826.54 | bwd_inner_microstep: 826.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 02:30:38,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.78 | bwd_microstep: 701.55 | bwd_inner_microstep: 701.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3504
[2024-06-10 02:30:40,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1581.10 | bwd_inner_microstep: 1581.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 02:30:42,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1246.88 | bwd_inner_microstep: 1246.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653
[2024-06-10 02:30:44,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.70 | bwd_microstep: 1613.34 | bwd_inner_microstep: 1613.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 02:30:46,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.68 | bwd_microstep: 1607.73 | bwd_inner_microstep: 1607.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3660
[2024-06-10 02:30:48,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.38 | bwd_microstep: 1648.20 | bwd_inner_microstep: 1648.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3652
[2024-06-10 02:30:50,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.41 | bwd_microstep: 1455.07 | bwd_inner_microstep: 1455.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 02:30:52,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.85 | bwd_microstep: 1291.89 | bwd_inner_microstep: 1291.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1910
[2024-06-10 02:30:53,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.56 | bwd_microstep: 716.32 | bwd_inner_microstep: 716.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3517
[2024-06-10 02:30:55,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 1356.68 | bwd_inner_microstep: 1356.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 02:30:57,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.22 | bwd_microstep: 1467.01 | bwd_inner_microstep: 1466.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 02:30:59,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.34 | bwd_microstep: 1190.43 | bwd_inner_microstep: 1190.27 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-10 02:31:00,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.47 | bwd_microstep: 805.53 | bwd_inner_microstep: 805.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949
[2024-06-10 02:31:01,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.71 | bwd_microstep: 699.80 | bwd_inner_microstep: 699.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 02:31:03,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.54 | bwd_microstep: 1417.07 | bwd_inner_microstep: 1417.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 02:31:05,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.41 | bwd_microstep: 1399.46 | bwd_inner_microstep: 1399.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 02:31:07,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.97 | bwd_microstep: 1285.88 | bwd_inner_microstep: 1285.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3819
[2024-06-10 02:31:09,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.34 | bwd_microstep: 1787.40 | bwd_inner_microstep: 1787.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919
[2024-06-10 02:31:10,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.90 | bwd_microstep: 721.43 | bwd_inner_microstep: 721.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3711
[2024-06-10 02:31:12,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.07 | bwd_microstep: 1398.18 | bwd_inner_microstep: 1398.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572
[2024-06-10 02:31:14,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.11 | bwd_microstep: 1595.80 | bwd_inner_microstep: 1595.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 02:31:22,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.42 | optimizer_step: 6.62
[2024-06-10 02:31:22,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.34 | bwd_microstep: 6910.97 | bwd_inner_microstep: 1665.75 | bwd_allreduce_microstep: 5245.14 | step_microstep: 40.00
[2024-06-10 02:31:22,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15474.75 | bwd: 46517.31 | bwd_inner: 41270.98 | bwd_allreduce: 5245.50 | step: 42.28


  6%|▌         | 99/1726 [1:47:51<27:43:25, 61.34s/it]
  6%|▌         | 100/1726 [1:48:49<27:20:07, 60.52s/it]
  6%|▌         | 101/1726 [1:49:52<27:34:55, 61.10s/it]
  6%|▌         | 102/1726 [1:50:53<27:31:38, 61.02s/it]
  6%|▌         | 103/1726 [1:51:55<27:39:12, 61.34s/it]
  6%|▌         | 104/1726 [1:52:56<27:37:28, 61.31s/it]
  6%|▌         | 105/1726 [1:53:58<27:45:05, 61.63s/it]
{'loss': 1.3883, 'learning_rate': 3.9901148674357476e-05, 'epoch': 0.06}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459
[2024-06-10 02:31:24,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.02 | bwd_microstep: 1567.13 | bwd_inner_microstep: 1567.01 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495
[2024-06-10 02:31:26,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.18 | bwd_microstep: 1442.19 | bwd_inner_microstep: 1442.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2370
[2024-06-10 02:31:27,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.98 | bwd_microstep: 995.52 | bwd_inner_microstep: 995.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488
[2024-06-10 02:31:29,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.88 | bwd_microstep: 1413.19 | bwd_inner_microstep: 1413.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 02:31:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1252.48 | bwd_inner_microstep: 1252.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 02:31:33,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.04 | bwd_microstep: 1555.58 | bwd_inner_microstep: 1555.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3800
[2024-06-10 02:31:35,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.18 | bwd_microstep: 1478.13 | bwd_inner_microstep: 1478.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 02:31:37,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1383.80 | bwd_inner_microstep: 1383.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 02:31:39,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.10 | bwd_microstep: 1388.04 | bwd_inner_microstep: 1388.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 02:31:41,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.17 | bwd_microstep: 1278.28 | bwd_inner_microstep: 1278.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-10 02:31:43,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.87 | bwd_microstep: 1623.31 | bwd_inner_microstep: 1623.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2109
[2024-06-10 02:31:44,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.06 | bwd_microstep: 888.03 | bwd_inner_microstep: 888.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2089
[2024-06-10 02:31:45,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.02 | bwd_microstep: 822.58 | bwd_inner_microstep: 822.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-10 02:31:47,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.83 | bwd_microstep: 1316.00 | bwd_inner_microstep: 1315.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 02:31:49,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1408.38 | bwd_inner_microstep: 1408.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 02:31:51,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1515.40 | bwd_inner_microstep: 1515.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3906
[2024-06-10 02:31:53,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.59 | bwd_microstep: 1462.58 | bwd_inner_microstep: 1462.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-10 02:31:55,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.01 | bwd_microstep: 1419.31 | bwd_inner_microstep: 1419.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2045
[2024-06-10 02:31:56,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.08 | bwd_microstep: 846.67 | bwd_inner_microstep: 846.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3537
[2024-06-10 02:31:58,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.49 | bwd_microstep: 1358.42 | bwd_inner_microstep: 1358.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 02:31:59,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.49 | bwd_microstep: 802.15 | bwd_inner_microstep: 802.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-10 02:32:02,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.66 | bwd_microstep: 1628.07 | bwd_inner_microstep: 1628.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836
[2024-06-10 02:32:04,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.38 | bwd_microstep: 1757.12 | bwd_inner_microstep: 1757.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 02:32:06,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.44 | bwd_microstep: 1561.20 | bwd_inner_microstep: 1561.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3819
[2024-06-10 02:32:09,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.53 | bwd_microstep: 1818.95 | bwd_inner_microstep: 1818.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 02:32:11,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1343.81 | bwd_inner_microstep: 1343.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3809
[2024-06-10 02:32:13,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.43 | bwd_microstep: 1595.05 | bwd_inner_microstep: 1595.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 02:32:15,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.33 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 02:32:17,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.94 | bwd_microstep: 1187.69 | bwd_inner_microstep: 1187.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 02:32:19,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1479.84 | bwd_inner_microstep: 1479.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919
[2024-06-10 02:32:20,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.53 | bwd_microstep: 689.32 | bwd_inner_microstep: 689.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 02:32:24,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-10 02:32:24,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.87 | bwd_microstep: 3982.86 | bwd_inner_microstep: 1869.31 | bwd_allreduce_microstep: 2113.49 | step_microstep: 39.19
[2024-06-10 02:32:24,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16278.05 | bwd: 45918.23 | bwd_inner: 43803.71 | bwd_allreduce: 2113.78 | step: 40.94
{'loss': 1.3428, 'learning_rate': 3.98973864678379e-05, 'epoch': 0.06}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580
[2024-06-10 02:32:26,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1291.10 | bwd_inner_microstep: 1291.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565
[2024-06-10 02:32:28,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.92 | bwd_microstep: 1330.77 | bwd_inner_microstep: 1330.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 02:32:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.61 | bwd_microstep: 1483.67 | bwd_inner_microstep: 1483.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 02:32:32,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.38 | bwd_microstep: 1448.62 | bwd_inner_microstep: 1448.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3751
[2024-06-10 02:32:34,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1339.57 | bwd_inner_microstep: 1339.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483
[2024-06-10 02:32:36,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.91 | bwd_microstep: 1215.74 | bwd_inner_microstep: 1215.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 02:32:38,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.00 | bwd_microstep: 1529.15 | bwd_inner_microstep: 1529.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 02:32:39,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 792.93 | bwd_inner_microstep: 792.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 02:32:40,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1250.80 | bwd_inner_microstep: 1250.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 02:32:42,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.67 | bwd_microstep: 1350.64 | bwd_inner_microstep: 1350.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651
[2024-06-10 02:32:44,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.48 | bwd_microstep: 1323.94 | bwd_inner_microstep: 1323.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 02:32:46,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1393.00 | bwd_inner_microstep: 1392.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 02:32:47,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.21 | bwd_microstep: 893.52 | bwd_inner_microstep: 893.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 02:32:50,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.59 | bwd_microstep: 1610.82 | bwd_inner_microstep: 1610.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3845
[2024-06-10 02:32:52,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1521.62 | bwd_inner_microstep: 1521.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 897
[2024-06-10 02:32:52,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 136.86 | bwd_microstep: 345.73 | bwd_inner_microstep: 345.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2022
[2024-06-10 02:32:53,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 793.07 | bwd_inner_microstep: 793.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1924
[2024-06-10 02:32:54,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.92 | bwd_microstep: 761.96 | bwd_inner_microstep: 761.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 02:32:56,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1405.59 | bwd_inner_microstep: 1405.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 02:32:58,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.49 | bwd_microstep: 1251.44 | bwd_inner_microstep: 1251.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 02:33:00,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1504.78 | bwd_inner_microstep: 1504.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 895
[2024-06-10 02:33:01,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.92 | bwd_microstep: 371.91 | bwd_inner_microstep: 371.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 02:33:03,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1400.13 | bwd_inner_microstep: 1400.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580
[2024-06-10 02:33:05,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.98 | bwd_microstep: 1603.53 | bwd_inner_microstep: 1603.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 02:33:07,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1298.48 | bwd_inner_microstep: 1298.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596
[2024-06-10 02:33:09,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1440.17 | bwd_inner_microstep: 1440.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011
[2024-06-10 02:33:10,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.01 | bwd_microstep: 803.91 | bwd_inner_microstep: 803.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 02:33:12,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.71 | bwd_microstep: 1541.31 | bwd_inner_microstep: 1541.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 02:33:14,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1460.54 | bwd_inner_microstep: 1460.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 02:33:16,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.69 | bwd_microstep: 1550.76 | bwd_inner_microstep: 1550.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755
[2024-06-10 02:33:18,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.40 | bwd_microstep: 1645.80 | bwd_inner_microstep: 1645.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 02:33:26,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.38 | optimizer_step: 6.62
[2024-06-10 02:33:26,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.44 | bwd_microstep: 7252.48 | bwd_inner_microstep: 2008.78 | bwd_allreduce_microstep: 5243.62 | step_microstep: 39.72
[2024-06-10 02:33:26,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15198.94 | bwd: 46207.50 | bwd_inner: 40962.94 | bwd_allreduce: 5243.87 | step: 41.46
{'loss': 1.3191, 'learning_rate': 3.989355418282663e-05, 'epoch': 0.06}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2424
[2024-06-10 02:33:27,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.11 | bwd_microstep: 963.83 | bwd_inner_microstep: 963.67 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 02:33:29,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.45 | bwd_microstep: 1278.18 | bwd_inner_microstep: 1278.03 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 02:33:31,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1452.23 | bwd_inner_microstep: 1452.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 02:33:33,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.98 | bwd_microstep: 1543.79 | bwd_inner_microstep: 1543.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868
[2024-06-10 02:33:34,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.17 | bwd_microstep: 709.44 | bwd_inner_microstep: 709.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 02:33:36,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1378.56 | bwd_inner_microstep: 1378.41 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.14
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3484
[2024-06-10 02:33:38,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.58 | bwd_microstep: 1244.28 | bwd_inner_microstep: 1244.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 02:33:40,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 02:33:42,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1414.27 | bwd_inner_microstep: 1414.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 02:33:44,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.27 | bwd_microstep: 1386.65 | bwd_inner_microstep: 1386.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 02:33:45,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 02:33:47,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.14 | bwd_microstep: 791.61 | bwd_inner_microstep: 791.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 4073
[2024-06-10 02:33:49,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.26 | bwd_microstep: 1762.72 | bwd_inner_microstep: 1762.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-10 02:33:51,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.37 | bwd_microstep: 1422.19 | bwd_inner_microstep: 1422.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070
[2024-06-10 02:33:52,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.08 | bwd_microstep: 917.57 | bwd_inner_microstep: 917.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3422
[2024-06-10 02:33:54,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.41 | bwd_microstep: 1319.25 | bwd_inner_microstep: 1319.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3445
[2024-06-10 02:33:56,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.06 | bwd_microstep: 1226.60 | bwd_inner_microstep: 1226.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086
[2024-06-10 02:33:57,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.18 | bwd_microstep: 823.70 | bwd_inner_microstep: 823.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442
[2024-06-10 02:33:59,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.58 | bwd_microstep: 1159.01 | bwd_inner_microstep: 1158.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3538
[2024-06-10 02:34:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.83 | bwd_microstep: 1664.54 | bwd_inner_microstep: 1664.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.34
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 02:34:03,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.58 | bwd_microstep: 1283.51 | bwd_inner_microstep: 1283.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3534
[2024-06-10 02:34:05,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.91 | bwd_microstep: 1448.40 | bwd_inner_microstep: 1448.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-10 02:34:06,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.09 | bwd_microstep: 1161.75 | bwd_inner_microstep: 1161.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2016
[2024-06-10 02:34:07,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.70 | bwd_microstep: 902.74 | bwd_inner_microstep: 902.61 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.22
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3602
[2024-06-10 02:34:10,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.92 | bwd_microstep: 1550.92 | bwd_inner_microstep: 1550.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2927
[2024-06-10 02:34:11,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.27 | bwd_microstep: 1254.62 | bwd_inner_microstep: 1254.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 02:34:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.49 | bwd_microstep: 1536.20 | bwd_inner_microstep: 1536.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 02:34:16,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.04 | bwd_microstep: 1489.36 | bwd_inner_microstep: 1489.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3579
[2024-06-10 02:34:18,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.65 | bwd_microstep: 1629.71 | bwd_inner_microstep: 1629.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3850
[2024-06-10 02:34:20,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.09 | bwd_microstep: 1661.55 | bwd_inner_microstep: 1661.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3584
[2024-06-10 02:34:22,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.97 | bwd_microstep: 1697.94 | bwd_inner_microstep: 1697.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645
[2024-06-10 02:34:28,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 02:34:28,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.85 | bwd_microstep: 5423.06 | bwd_inner_microstep: 1938.04 | bwd_allreduce_microstep: 3484.97 | step_microstep: 38.56
[2024-06-10 02:34:28,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15897.09 | bwd: 46133.51 | bwd_inner: 42647.15 | bwd_allreduce: 3485.45 | step: 41.35
{'loss': 1.3984, 'learning_rate': 3.988965183282094e-05, 'epoch': 0.06}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 02:34:30,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.75 | bwd_microstep: 1444.83 | bwd_inner_microstep: 1444.68 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3429
[2024-06-10 02:34:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.09 | bwd_microstep: 1314.90 | bwd_inner_microstep: 1314.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3957
[2024-06-10 02:34:34,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1494.90 | bwd_inner_microstep: 1494.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2388
[2024-06-10 02:34:36,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.45 | bwd_microstep: 934.99 | bwd_inner_microstep: 934.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 02:34:37,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.77 | bwd_microstep: 806.87 | bwd_inner_microstep: 806.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 02:34:39,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.35 | bwd_microstep: 1459.46 | bwd_inner_microstep: 1459.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3764
[2024-06-10 02:34:41,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.41 | bwd_microstep: 1248.16 | bwd_inner_microstep: 1248.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 02:34:43,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1553.33 | bwd_inner_microstep: 1553.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881
[2024-06-10 02:34:44,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.78 | bwd_microstep: 713.92 | bwd_inner_microstep: 713.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3450
[2024-06-10 02:34:45,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.64 | bwd_microstep: 1189.52 | bwd_inner_microstep: 1189.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 718
[2024-06-10 02:34:46,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.72 | bwd_microstep: 294.69 | bwd_inner_microstep: 294.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410
[2024-06-10 02:34:48,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1375.59 | bwd_inner_microstep: 1375.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 02:34:49,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.75 | bwd_microstep: 1256.82 | bwd_inner_microstep: 1256.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-10 02:34:51,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.05 | bwd_microstep: 799.09 | bwd_inner_microstep: 799.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1947
[2024-06-10 02:34:52,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.06 | bwd_microstep: 891.52 | bwd_inner_microstep: 891.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3656
[2024-06-10 02:34:54,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.41 | bwd_microstep: 1687.03 | bwd_inner_microstep: 1686.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3455
[2024-06-10 02:34:56,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1386.16 | bwd_inner_microstep: 1386.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 02:34:58,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.14 | bwd_microstep: 1626.29 | bwd_inner_microstep: 1626.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 02:35:00,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.41 | bwd_microstep: 1279.41 | bwd_inner_microstep: 1279.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 02:35:02,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.63 | bwd_microstep: 1586.58 | bwd_inner_microstep: 1586.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 02:35:04,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.92 | bwd_microstep: 1416.59 | bwd_inner_microstep: 1416.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 02:35:06,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.19 | bwd_microstep: 1482.64 | bwd_inner_microstep: 1482.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3506
[2024-06-10 02:35:08,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.99 | bwd_microstep: 1320.15 | bwd_inner_microstep: 1320.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3542
[2024-06-10 02:35:10,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.72 | bwd_microstep: 1419.99 | bwd_inner_microstep: 1419.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571
[2024-06-10 02:35:12,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1433.09 | bwd_inner_microstep: 1433.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 02:35:14,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1494.64 | bwd_inner_microstep: 1494.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3551
[2024-06-10 02:35:16,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.38 | bwd_microstep: 1528.75 | bwd_inner_microstep: 1528.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 02:35:18,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.34 | bwd_microstep: 1487.35 | bwd_inner_microstep: 1487.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 02:35:20,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.94 | bwd_microstep: 1479.98 | bwd_inner_microstep: 1479.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2264
[2024-06-10 02:35:22,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.95 | bwd_microstep: 999.98 | bwd_inner_microstep: 999.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 02:35:23,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.06 | bwd_microstep: 1292.10 | bwd_inner_microstep: 1292.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 02:35:32,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.41 | optimizer_step: 6.58
[2024-06-10 02:35:32,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.49 | bwd_microstep: 8421.24 | bwd_inner_microstep: 1487.28 | bwd_allreduce_microstep: 6933.88 | step_microstep: 39.93
[2024-06-10 02:35:32,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15401.96 | bwd: 48120.63 | bwd_inner: 41185.59 | bwd_allreduce: 6934.19 | step: 42.14
{'loss': 1.3348, 'learning_rate': 3.988567943156489e-05, 'epoch': 0.06}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 02:35:34,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.96 | bwd_microstep: 1476.85 | bwd_inner_microstep: 1476.69 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 02:35:36,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.20 | bwd_microstep: 1272.50 | bwd_inner_microstep: 1272.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 02:35:38,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.05 | bwd_microstep: 1376.95 | bwd_inner_microstep: 1376.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3924
[2024-06-10 02:35:40,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.38 | bwd_microstep: 1694.62 | bwd_inner_microstep: 1694.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2249
[2024-06-10 02:35:42,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.60 | bwd_microstep: 963.91 | bwd_inner_microstep: 963.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-10 02:35:43,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.79 | bwd_microstep: 1216.17 | bwd_inner_microstep: 1216.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3713
[2024-06-10 02:35:45,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.27 | bwd_microstep: 1393.97 | bwd_inner_microstep: 1393.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 02:35:46,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.52 | bwd_microstep: 792.84 | bwd_inner_microstep: 792.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 02:35:48,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1385.70 | bwd_inner_microstep: 1385.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1888
[2024-06-10 02:35:49,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.43 | bwd_microstep: 687.72 | bwd_inner_microstep: 687.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 02:35:51,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.07 | bwd_microstep: 1258.86 | bwd_inner_microstep: 1258.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3502
[2024-06-10 02:35:53,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.68 | bwd_microstep: 1436.97 | bwd_inner_microstep: 1436.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 02:35:55,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.30 | bwd_microstep: 1486.64 | bwd_inner_microstep: 1486.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 02:35:57,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.87 | bwd_microstep: 1487.97 | bwd_inner_microstep: 1487.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3827
[2024-06-10 02:36:00,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.86 | bwd_microstep: 1756.91 | bwd_inner_microstep: 1756.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2420
[2024-06-10 02:36:01,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.65 | bwd_microstep: 847.99 | bwd_inner_microstep: 847.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3707
[2024-06-10 02:36:03,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1333.45 | bwd_inner_microstep: 1333.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 02:36:05,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.68 | bwd_microstep: 1514.64 | bwd_inner_microstep: 1514.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 02:36:07,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.49 | bwd_microstep: 1499.92 | bwd_inner_microstep: 1499.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-10 02:36:09,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.08 | bwd_microstep: 1437.21 | bwd_inner_microstep: 1437.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3813
[2024-06-10 02:36:11,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.73 | bwd_microstep: 1586.37 | bwd_inner_microstep: 1586.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 02:36:13,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.35 | bwd_microstep: 1286.03 | bwd_inner_microstep: 1286.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 02:36:14,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.33 | bwd_microstep: 978.20 | bwd_inner_microstep: 978.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 02:36:16,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.98 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 02:36:18,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.43 | bwd_microstep: 1284.19 | bwd_inner_microstep: 1284.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044
[2024-06-10 02:36:19,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.52 | bwd_microstep: 903.23 | bwd_inner_microstep: 903.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533
[2024-06-10 02:36:21,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.15 | bwd_microstep: 1418.29 | bwd_inner_microstep: 1418.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2287
[2024-06-10 02:36:22,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.92 | bwd_microstep: 939.58 | bwd_inner_microstep: 939.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 02:36:24,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.54 | bwd_microstep: 1287.73 | bwd_inner_microstep: 1287.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3585
[2024-06-10 02:36:26,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.50 | bwd_microstep: 1701.18 | bwd_inner_microstep: 1701.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911
[2024-06-10 02:36:27,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.93 | bwd_microstep: 782.16 | bwd_inner_microstep: 782.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 02:36:33,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 02:36:33,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.46 | bwd_microstep: 5421.86 | bwd_inner_microstep: 1866.45 | bwd_allreduce_microstep: 3555.35 | step_microstep: 39.28
[2024-06-10 02:36:33,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15536.10 | bwd: 45199.28 | bwd_inner: 41642.90 | bwd_allreduce: 3555.64 | step: 41.13
{'loss': 1.3681, 'learning_rate': 3.988163699304926e-05, 'epoch': 0.06}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429
[2024-06-10 02:36:35,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.65 | bwd_microstep: 1445.27 | bwd_inner_microstep: 1445.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 02:36:37,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.50 | bwd_microstep: 1345.01 | bwd_inner_microstep: 1344.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3867
[2024-06-10 02:36:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1462.96 | bwd_inner_microstep: 1462.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 02:36:41,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.60 | bwd_microstep: 1550.88 | bwd_inner_microstep: 1550.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3795
[2024-06-10 02:36:43,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.25 | bwd_microstep: 1355.28 | bwd_inner_microstep: 1355.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 02:36:44,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.00 | bwd_microstep: 798.15 | bwd_inner_microstep: 798.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 02:36:46,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.72 | bwd_microstep: 789.51 | bwd_inner_microstep: 789.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 02:36:48,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.18 | bwd_microstep: 1414.45 | bwd_inner_microstep: 1414.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3488
[2024-06-10 02:36:49,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.61 | bwd_microstep: 1222.17 | bwd_inner_microstep: 1222.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3576
[2024-06-10 02:36:51,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.22 | bwd_microstep: 1239.58 | bwd_inner_microstep: 1239.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 02:36:53,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1250.45 | bwd_inner_microstep: 1250.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 02:36:54,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.48 | bwd_microstep: 1286.98 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3489
[2024-06-10 02:36:56,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.42 | bwd_microstep: 1224.80 | bwd_inner_microstep: 1224.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3383
[2024-06-10 02:36:58,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.10 | bwd_microstep: 1368.32 | bwd_inner_microstep: 1368.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3032
[2024-06-10 02:37:00,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.99 | bwd_microstep: 1322.75 | bwd_inner_microstep: 1322.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1913
[2024-06-10 02:37:01,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.76 | bwd_microstep: 810.52 | bwd_inner_microstep: 810.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-10 02:37:02,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.31 | bwd_microstep: 812.43 | bwd_inner_microstep: 812.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3449
[2024-06-10 02:37:04,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.80 | bwd_microstep: 1240.35 | bwd_inner_microstep: 1240.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3479
[2024-06-10 02:37:06,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.52 | bwd_microstep: 1218.63 | bwd_inner_microstep: 1218.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 02:37:08,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.19 | bwd_microstep: 1497.77 | bwd_inner_microstep: 1497.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 02:37:10,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.73 | bwd_microstep: 1462.14 | bwd_inner_microstep: 1462.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3682
[2024-06-10 02:37:12,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.42 | bwd_microstep: 1460.13 | bwd_inner_microstep: 1460.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 02:37:13,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1286.29 | bwd_inner_microstep: 1286.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3785
[2024-06-10 02:37:16,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.65 | bwd_microstep: 1513.54 | bwd_inner_microstep: 1513.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4724
[2024-06-10 02:37:18,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.89 | bwd_microstep: 1629.46 | bwd_inner_microstep: 1629.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 02:37:20,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.01 | bwd_microstep: 1383.17 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3594
[2024-06-10 02:37:22,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.03 | bwd_microstep: 1674.62 | bwd_inner_microstep: 1674.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2273
[2024-06-10 02:37:23,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.12 | bwd_microstep: 926.44 | bwd_inner_microstep: 926.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-10 02:37:26,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.86 | bwd_microstep: 1699.78 | bwd_inner_microstep: 1699.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 02:37:28,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.96 | bwd_microstep: 1502.23 | bwd_inner_microstep: 1502.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3752
[2024-06-10 02:37:30,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.97 | bwd_microstep: 1635.73 | bwd_inner_microstep: 1635.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 02:37:34,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.70 | optimizer_gradients: 4.26 | optimizer_step: 6.57
[2024-06-10 02:37:34,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.86 | bwd_microstep: 3442.29 | bwd_inner_microstep: 1113.42 | bwd_allreduce_microstep: 2328.81 | step_microstep: 41.24
[2024-06-10 02:37:34,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15677.26 | bwd: 44272.09 | bwd_inner: 41942.26 | bwd_allreduce: 2329.10 | step: 42.88
  6%|▌         | 105/1726 [1:53:58<27:45:05, 61.63s/it]
  6%|▌         | 106/1726 [1:55:01<27:51:41, 61.91s/it]
  6%|▌         | 107/1726 [1:56:03<27:49:27, 61.87s/it]
  6%|▋         | 108/1726 [1:57:05<27:52:53, 62.04s/it]
  6%|▋         | 109/1726 [1:58:09<28:06:55, 62.59s/it]
  6%|▋         | 110/1726 [1:59:10<27:53:47, 62.15s/it]
  6%|▋         | 111/1726 [2:00:11<27:37:54, 61.59s/it]
{'loss': 1.4085, 'learning_rate': 3.987752453151149e-05, 'epoch': 0.06}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3518
[2024-06-10 02:37:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.64 | bwd_microstep: 1416.91 | bwd_inner_microstep: 1416.80 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3832
[2024-06-10 02:37:38,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1357.77 | bwd_inner_microstep: 1357.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3502
[2024-06-10 02:37:40,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.28 | bwd_microstep: 1447.62 | bwd_inner_microstep: 1447.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559
[2024-06-10 02:37:42,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.67 | bwd_microstep: 1432.79 | bwd_inner_microstep: 1432.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 02:37:43,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1349.07 | bwd_inner_microstep: 1349.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 02:37:45,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.48 | bwd_microstep: 1412.02 | bwd_inner_microstep: 1411.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4042
[2024-06-10 02:37:47,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.11 | bwd_microstep: 1455.38 | bwd_inner_microstep: 1455.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3609
[2024-06-10 02:37:49,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.00 | bwd_microstep: 1219.61 | bwd_inner_microstep: 1219.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1959
[2024-06-10 02:37:50,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.41 | bwd_microstep: 823.43 | bwd_inner_microstep: 823.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 02:37:52,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1346.70 | bwd_inner_microstep: 1346.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3699
[2024-06-10 02:37:55,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.05 | bwd_microstep: 1825.63 | bwd_inner_microstep: 1825.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2711
[2024-06-10 02:37:56,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.62 | bwd_microstep: 1127.78 | bwd_inner_microstep: 1127.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 02:37:58,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.89 | bwd_microstep: 1351.35 | bwd_inner_microstep: 1351.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 02:38:00,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1388.33 | bwd_inner_microstep: 1388.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 02:38:02,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1526.14 | bwd_inner_microstep: 1526.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676
[2024-06-10 02:38:04,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.57 | bwd_microstep: 1626.76 | bwd_inner_microstep: 1626.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3528
[2024-06-10 02:38:06,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.36 | bwd_microstep: 1564.47 | bwd_inner_microstep: 1564.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 02:38:08,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1279.96 | bwd_inner_microstep: 1279.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294
[2024-06-10 02:38:09,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.55 | bwd_microstep: 879.44 | bwd_inner_microstep: 879.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3673
[2024-06-10 02:38:12,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.34 | bwd_microstep: 1657.13 | bwd_inner_microstep: 1657.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 02:38:14,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.15 | bwd_microstep: 1297.88 | bwd_inner_microstep: 1297.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 02:38:16,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.69 | bwd_microstep: 1660.56 | bwd_inner_microstep: 1660.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2124
[2024-06-10 02:38:17,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.59 | bwd_microstep: 800.63 | bwd_inner_microstep: 800.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1895
[2024-06-10 02:38:18,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.96 | bwd_microstep: 719.01 | bwd_inner_microstep: 718.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 02:38:20,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.52 | bwd_microstep: 1407.87 | bwd_inner_microstep: 1407.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3830
[2024-06-10 02:38:22,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.05 | bwd_microstep: 1685.04 | bwd_inner_microstep: 1685.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3789
[2024-06-10 02:38:24,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.38 | bwd_microstep: 1403.28 | bwd_inner_microstep: 1403.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607
[2024-06-10 02:38:26,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1312.12 | bwd_inner_microstep: 1312.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 02:38:28,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.99 | bwd_microstep: 1456.64 | bwd_inner_microstep: 1456.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3426
[2024-06-10 02:38:30,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.68 | bwd_microstep: 1316.07 | bwd_inner_microstep: 1316.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590
[2024-06-10 02:38:32,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.11 | bwd_microstep: 1704.45 | bwd_inner_microstep: 1704.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 02:38:35,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-10 02:38:35,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1980.91 | bwd_inner_microstep: 1793.01 | bwd_allreduce_microstep: 187.85 | step_microstep: 38.28
[2024-06-10 02:38:35,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16412.42 | bwd: 44232.80 | bwd_inner: 44043.95 | bwd_allreduce: 188.12 | step: 40.21
{'loss': 1.3135, 'learning_rate': 3.9873342061435664e-05, 'epoch': 0.06}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 02:38:37,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.40 | bwd_microstep: 1581.55 | bwd_inner_microstep: 1581.45 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 02:38:39,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.53 | bwd_microstep: 1349.23 | bwd_inner_microstep: 1349.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892
[2024-06-10 02:38:41,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.17 | bwd_microstep: 1586.93 | bwd_inner_microstep: 1586.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 02:38:43,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.91 | bwd_microstep: 1342.65 | bwd_inner_microstep: 1342.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 02:38:45,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.80 | bwd_microstep: 1250.11 | bwd_inner_microstep: 1250.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 02:38:46,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.88 | bwd_microstep: 1251.97 | bwd_inner_microstep: 1251.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 02:38:48,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.42 | bwd_microstep: 1303.17 | bwd_inner_microstep: 1303.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 02:38:50,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.65 | bwd_microstep: 1256.58 | bwd_inner_microstep: 1256.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 02:38:52,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.06 | bwd_microstep: 1290.14 | bwd_inner_microstep: 1290.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-10 02:38:53,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.01 | bwd_microstep: 1191.00 | bwd_inner_microstep: 1190.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2978
[2024-06-10 02:38:55,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.73 | bwd_microstep: 1145.12 | bwd_inner_microstep: 1145.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 02:38:57,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.42 | bwd_microstep: 1281.46 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3472
[2024-06-10 02:38:59,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.11 | bwd_microstep: 1449.45 | bwd_inner_microstep: 1449.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-10 02:39:01,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.99 | bwd_microstep: 1722.42 | bwd_inner_microstep: 1722.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 02:39:03,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1391.97 | bwd_inner_microstep: 1391.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655
[2024-06-10 02:39:05,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.56 | bwd_microstep: 1424.73 | bwd_inner_microstep: 1424.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3657
[2024-06-10 02:39:07,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.21 | bwd_microstep: 1323.48 | bwd_inner_microstep: 1323.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413
[2024-06-10 02:39:09,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1445.43 | bwd_inner_microstep: 1445.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 02:39:11,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.90 | bwd_microstep: 1483.95 | bwd_inner_microstep: 1483.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3632
[2024-06-10 02:39:13,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1562.59 | bwd_inner_microstep: 1562.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 02:39:15,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.85 | bwd_microstep: 1282.55 | bwd_inner_microstep: 1282.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3622
[2024-06-10 02:39:17,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.34 | bwd_microstep: 1446.26 | bwd_inner_microstep: 1446.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 02:39:19,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.72 | bwd_microstep: 1662.46 | bwd_inner_microstep: 1662.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3584
[2024-06-10 02:39:21,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.97 | bwd_microstep: 1356.01 | bwd_inner_microstep: 1355.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2218
[2024-06-10 02:39:22,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.78 | bwd_microstep: 863.81 | bwd_inner_microstep: 863.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 02:39:24,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.20 | bwd_microstep: 1500.36 | bwd_inner_microstep: 1500.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 02:39:26,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.64 | bwd_microstep: 1291.17 | bwd_inner_microstep: 1291.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2255
[2024-06-10 02:39:27,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.39 | bwd_microstep: 812.01 | bwd_inner_microstep: 811.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 02:39:29,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.94 | bwd_microstep: 1386.78 | bwd_inner_microstep: 1386.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 02:39:31,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.25 | bwd_microstep: 1420.52 | bwd_inner_microstep: 1420.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 02:39:33,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.82 | bwd_microstep: 1359.02 | bwd_inner_microstep: 1358.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 02:39:37,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 02:39:37,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.79 | bwd_microstep: 3030.70 | bwd_inner_microstep: 1694.74 | bwd_allreduce_microstep: 1335.90 | step_microstep: 38.64
[2024-06-10 02:39:37,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16350.07 | bwd: 45045.61 | bwd_inner: 43708.72 | bwd_allreduce: 1336.18 | step: 40.55
{'loss': 1.3375, 'learning_rate': 3.98690895975524e-05, 'epoch': 0.07}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 02:39:38,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.20 | bwd_microstep: 1339.07 | bwd_inner_microstep: 1339.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4443
[2024-06-10 02:39:41,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.47 | bwd_microstep: 1559.04 | bwd_inner_microstep: 1559.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 02:39:43,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1482.18 | bwd_inner_microstep: 1482.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 02:39:44,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1249.65 | bwd_inner_microstep: 1249.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 02:39:46,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1285.64 | bwd_inner_microstep: 1285.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3713
[2024-06-10 02:39:48,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.42 | bwd_microstep: 1462.52 | bwd_inner_microstep: 1462.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 02:39:50,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1488.68 | bwd_inner_microstep: 1488.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 02:39:52,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.71 | bwd_microstep: 1288.91 | bwd_inner_microstep: 1288.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 02:39:54,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.84 | bwd_microstep: 1306.04 | bwd_inner_microstep: 1306.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3658
[2024-06-10 02:39:56,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.49 | bwd_microstep: 1653.12 | bwd_inner_microstep: 1653.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 02:39:58,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.93 | bwd_microstep: 1346.52 | bwd_inner_microstep: 1346.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 02:40:00,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.28 | bwd_microstep: 1493.91 | bwd_inner_microstep: 1493.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3667
[2024-06-10 02:40:02,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.39 | bwd_microstep: 1686.93 | bwd_inner_microstep: 1686.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3424
[2024-06-10 02:40:04,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1395.16 | bwd_inner_microstep: 1395.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 02:40:06,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.27 | bwd_microstep: 1344.07 | bwd_inner_microstep: 1344.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4044
[2024-06-10 02:40:08,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.20 | bwd_microstep: 1719.91 | bwd_inner_microstep: 1719.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3688
[2024-06-10 02:40:10,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.19 | bwd_microstep: 1363.25 | bwd_inner_microstep: 1363.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3430
[2024-06-10 02:40:12,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1399.35 | bwd_inner_microstep: 1399.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 02:40:14,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.46 | bwd_microstep: 1251.82 | bwd_inner_microstep: 1251.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 02:40:16,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.53 | bwd_microstep: 1408.22 | bwd_inner_microstep: 1408.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 02:40:18,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.54 | bwd_microstep: 1505.74 | bwd_inner_microstep: 1505.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 02:40:20,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.40 | bwd_microstep: 1283.82 | bwd_inner_microstep: 1283.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3602
[2024-06-10 02:40:22,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.26 | bwd_microstep: 1218.02 | bwd_inner_microstep: 1217.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 02:40:23,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.00 | bwd_microstep: 690.45 | bwd_inner_microstep: 690.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1924
[2024-06-10 02:40:24,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.97 | bwd_microstep: 728.00 | bwd_inner_microstep: 727.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 02:40:26,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.21 | bwd_microstep: 1457.07 | bwd_inner_microstep: 1457.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 02:40:28,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.69 | bwd_microstep: 1462.24 | bwd_inner_microstep: 1462.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3574
[2024-06-10 02:40:30,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.89 | bwd_microstep: 1537.05 | bwd_inner_microstep: 1537.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 02:40:32,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.59 | bwd_microstep: 1362.58 | bwd_inner_microstep: 1362.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3549
[2024-06-10 02:40:34,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.79 | bwd_microstep: 1589.73 | bwd_inner_microstep: 1589.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 02:40:36,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.03 | bwd_microstep: 1650.62 | bwd_inner_microstep: 1650.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2073
[2024-06-10 02:40:40,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 02:40:40,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.94 | bwd_microstep: 3096.91 | bwd_inner_microstep: 1083.31 | bwd_allreduce_microstep: 2013.55 | step_microstep: 38.36
[2024-06-10 02:40:40,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16480.79 | bwd: 46106.28 | bwd_inner: 44091.83 | bwd_allreduce: 2013.78 | step: 40.51
{'loss': 1.3184, 'learning_rate': 3.9864767154838864e-05, 'epoch': 0.07}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 02:40:41,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.26 | bwd_microstep: 1272.44 | bwd_inner_microstep: 1272.25 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.19
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3464
[2024-06-10 02:40:43,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.41 | bwd_microstep: 1546.73 | bwd_inner_microstep: 1546.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3854
[2024-06-10 02:40:46,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.08 | bwd_microstep: 1561.66 | bwd_inner_microstep: 1561.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 02:40:48,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.05 | bwd_microstep: 1479.63 | bwd_inner_microstep: 1479.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 02:40:50,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.57 | bwd_microstep: 1551.98 | bwd_inner_microstep: 1551.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-10 02:40:52,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.57 | bwd_microstep: 1641.37 | bwd_inner_microstep: 1641.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-10 02:40:53,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.91 | bwd_microstep: 820.37 | bwd_inner_microstep: 820.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-10 02:40:55,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.21 | bwd_microstep: 1189.51 | bwd_inner_microstep: 1189.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143
[2024-06-10 02:40:56,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.75 | bwd_microstep: 833.12 | bwd_inner_microstep: 833.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 02:40:58,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.75 | bwd_microstep: 1525.27 | bwd_inner_microstep: 1525.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 02:41:00,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.49 | bwd_microstep: 1479.02 | bwd_inner_microstep: 1478.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503
[2024-06-10 02:41:02,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.95 | bwd_microstep: 1585.19 | bwd_inner_microstep: 1585.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 02:41:04,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1338.67 | bwd_inner_microstep: 1338.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 02:41:06,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.86 | bwd_microstep: 1489.07 | bwd_inner_microstep: 1489.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 02:41:08,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.15 | bwd_microstep: 1523.10 | bwd_inner_microstep: 1523.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 02:41:11,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.85 | bwd_microstep: 1618.92 | bwd_inner_microstep: 1618.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3523
[2024-06-10 02:41:13,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.81 | bwd_microstep: 1455.66 | bwd_inner_microstep: 1455.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3691
[2024-06-10 02:41:15,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.98 | bwd_microstep: 1695.94 | bwd_inner_microstep: 1695.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2091
[2024-06-10 02:41:16,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.44 | bwd_microstep: 918.97 | bwd_inner_microstep: 918.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2098
[2024-06-10 02:41:17,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.71 | bwd_microstep: 731.02 | bwd_inner_microstep: 730.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629
[2024-06-10 02:41:20,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.85 | bwd_microstep: 1708.45 | bwd_inner_microstep: 1708.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3499
[2024-06-10 02:41:21,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.69 | bwd_microstep: 1355.68 | bwd_inner_microstep: 1355.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 02:41:23,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.31 | bwd_microstep: 1378.05 | bwd_inner_microstep: 1378.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3595
[2024-06-10 02:41:25,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.34 | bwd_microstep: 1538.32 | bwd_inner_microstep: 1538.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3826
[2024-06-10 02:41:27,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1326.17 | bwd_inner_microstep: 1326.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 02:41:29,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 02:41:31,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.15 | bwd_microstep: 1261.96 | bwd_inner_microstep: 1261.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 02:41:33,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.93 | bwd_microstep: 1359.15 | bwd_inner_microstep: 1359.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 02:41:35,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1418.83 | bwd_inner_microstep: 1418.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 02:41:36,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.94 | bwd_microstep: 974.24 | bwd_inner_microstep: 974.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4034
[2024-06-10 02:41:38,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.93 | bwd_microstep: 1520.65 | bwd_inner_microstep: 1520.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2275
[2024-06-10 02:41:42,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.33 | optimizer_step: 6.59
[2024-06-10 02:41:42,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.34 | bwd_microstep: 3167.29 | bwd_inner_microstep: 883.34 | bwd_allreduce_microstep: 2283.89 | step_microstep: 39.16
[2024-06-10 02:41:42,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16197.40 | bwd: 45646.99 | bwd_inner: 43362.04 | bwd_allreduce: 2284.20 | step: 41.07
{'loss': 1.3137, 'learning_rate': 3.9860374748518676e-05, 'epoch': 0.07}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 02:41:44,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.13 | bwd_microstep: 1481.74 | bwd_inner_microstep: 1481.56 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3938
[2024-06-10 02:41:46,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.20 | bwd_microstep: 1594.90 | bwd_inner_microstep: 1594.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 02:41:48,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.00 | bwd_microstep: 1482.45 | bwd_inner_microstep: 1482.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 02:41:50,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1451.94 | bwd_inner_microstep: 1451.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 02:41:52,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1282.13 | bwd_inner_microstep: 1282.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 02:41:54,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1253.50 | bwd_inner_microstep: 1253.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 02:41:55,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1251.54 | bwd_inner_microstep: 1251.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 02:41:57,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.88 | bwd_microstep: 1258.46 | bwd_inner_microstep: 1258.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 02:41:58,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.49 | bwd_microstep: 700.17 | bwd_inner_microstep: 700.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 02:42:00,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.51 | bwd_microstep: 1280.06 | bwd_inner_microstep: 1280.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 02:42:02,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.93 | bwd_microstep: 1245.92 | bwd_inner_microstep: 1245.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1956
[2024-06-10 02:42:03,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.94 | bwd_microstep: 826.12 | bwd_inner_microstep: 826.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2251
[2024-06-10 02:42:04,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.19 | bwd_microstep: 873.42 | bwd_inner_microstep: 873.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506
[2024-06-10 02:42:06,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.60 | bwd_microstep: 1347.89 | bwd_inner_microstep: 1347.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 02:42:08,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1340.54 | bwd_inner_microstep: 1340.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3656
[2024-06-10 02:42:09,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.35 | bwd_microstep: 1328.96 | bwd_inner_microstep: 1328.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520
[2024-06-10 02:42:11,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.52 | bwd_microstep: 1416.51 | bwd_inner_microstep: 1416.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 02:42:14,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.85 | bwd_microstep: 1559.58 | bwd_inner_microstep: 1559.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 02:42:16,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.26 | bwd_microstep: 1398.92 | bwd_inner_microstep: 1398.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 02:42:17,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.64 | bwd_microstep: 1190.81 | bwd_inner_microstep: 1190.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 02:42:19,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.42 | bwd_microstep: 1288.99 | bwd_inner_microstep: 1288.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 02:42:21,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1467.14 | bwd_inner_microstep: 1467.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 02:42:23,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1394.30 | bwd_inner_microstep: 1394.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618
[2024-06-10 02:42:25,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.77 | bwd_microstep: 1614.77 | bwd_inner_microstep: 1614.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 02:42:27,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.37 | bwd_microstep: 1556.88 | bwd_inner_microstep: 1556.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 02:42:29,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1403.95 | bwd_inner_microstep: 1403.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3531
[2024-06-10 02:42:31,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.34 | bwd_microstep: 1256.73 | bwd_inner_microstep: 1256.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1927
[2024-06-10 02:42:32,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.58 | bwd_microstep: 733.84 | bwd_inner_microstep: 733.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 02:42:34,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.95 | bwd_microstep: 1583.13 | bwd_inner_microstep: 1583.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 02:42:36,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1511.29 | bwd_inner_microstep: 1511.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-10 02:42:38,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1589.47 | bwd_inner_microstep: 1589.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3032
[2024-06-10 02:42:42,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.40 | optimizer_step: 6.61
[2024-06-10 02:42:42,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.22 | bwd_microstep: 3025.34 | bwd_inner_microstep: 1277.57 | bwd_allreduce_microstep: 1747.70 | step_microstep: 39.74
[2024-06-10 02:42:42,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15821.88 | bwd: 43991.42 | bwd_inner: 42242.64 | bwd_allreduce: 1748.01 | step: 41.67
{'loss': 1.3809, 'learning_rate': 3.985591239406187e-05, 'epoch': 0.07}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-10 02:42:45,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.06 | bwd_microstep: 2439.25 | bwd_inner_microstep: 2439.09 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 02:42:47,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1384.13 | bwd_inner_microstep: 1384.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 02:42:49,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.21 | bwd_microstep: 1386.77 | bwd_inner_microstep: 1386.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 02:42:51,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.61 | bwd_microstep: 1489.35 | bwd_inner_microstep: 1489.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 02:42:53,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.65 | bwd_microstep: 1435.15 | bwd_inner_microstep: 1435.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 02:42:55,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.37 | bwd_microstep: 1651.63 | bwd_inner_microstep: 1651.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 02:42:57,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1248.15 | bwd_inner_microstep: 1248.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 02:42:59,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.16 | bwd_microstep: 1388.52 | bwd_inner_microstep: 1388.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 02:43:00,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.46 | bwd_microstep: 1150.38 | bwd_inner_microstep: 1150.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 02:43:01,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.49 | bwd_microstep: 798.87 | bwd_inner_microstep: 798.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3727
[2024-06-10 02:43:04,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.26 | bwd_microstep: 1593.78 | bwd_inner_microstep: 1593.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3405
[2024-06-10 02:43:06,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.56 | bwd_microstep: 1403.54 | bwd_inner_microstep: 1403.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3668
[2024-06-10 02:43:08,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.26 | bwd_microstep: 1821.49 | bwd_inner_microstep: 1821.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 02:43:10,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.64 | bwd_microstep: 1352.78 | bwd_inner_microstep: 1352.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 02:43:12,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.07 | bwd_microstep: 1527.19 | bwd_inner_microstep: 1527.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012
[2024-06-10 02:43:13,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.22 | bwd_microstep: 805.94 | bwd_inner_microstep: 805.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 02:43:15,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.65 | bwd_microstep: 1290.43 | bwd_inner_microstep: 1290.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 02:43:17,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.94 | bwd_microstep: 1493.66 | bwd_inner_microstep: 1493.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 02:43:19,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1550.60 | bwd_inner_microstep: 1550.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 02:43:21,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.06 | bwd_microstep: 1283.14 | bwd_inner_microstep: 1283.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 02:43:23,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.89 | bwd_microstep: 1388.07 | bwd_inner_microstep: 1388.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 02:43:25,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.45 | bwd_microstep: 1399.11 | bwd_inner_microstep: 1399.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434
[2024-06-10 02:43:27,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.65 | bwd_microstep: 1412.60 | bwd_inner_microstep: 1412.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 02:43:29,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.34 | bwd_microstep: 1559.34 | bwd_inner_microstep: 1559.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2213
[2024-06-10 02:43:30,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.25 | bwd_microstep: 865.92 | bwd_inner_microstep: 865.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 02:43:32,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.68 | bwd_microstep: 1658.00 | bwd_inner_microstep: 1657.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 02:43:34,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.42 | bwd_microstep: 1288.76 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968
[2024-06-10 02:43:35,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.77 | bwd_microstep: 705.82 | bwd_inner_microstep: 705.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.77
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 02:43:37,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.85 | bwd_microstep: 1406.61 | bwd_inner_microstep: 1406.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-10 02:43:38,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.25 | bwd_microstep: 968.61 | bwd_inner_microstep: 968.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-10 02:43:41,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1581.56 | bwd_inner_microstep: 1581.31 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 02:43:43,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-10 02:43:43,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.26 | bwd_microstep: 1503.77 | bwd_inner_microstep: 1328.67 | bwd_allreduce_microstep: 175.05 | step_microstep: 40.65
[2024-06-10 02:43:43,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16102.61 | bwd: 44232.94 | bwd_inner: 44056.68 | bwd_allreduce: 175.44 | step: 44.75


  6%|▋         | 111/1726 [2:00:11<27:37:54, 61.59s/it]
  6%|▋         | 112/1726 [2:01:12<27:32:16, 61.42s/it]
  7%|▋         | 113/1726 [2:02:13<27:34:01, 61.53s/it]
  7%|▋         | 114/1726 [2:03:16<27:44:37, 61.96s/it]
  7%|▋         | 115/1726 [2:04:19<27:45:44, 62.04s/it]
  7%|▋         | 116/1726 [2:05:19<27:29:49, 61.48s/it]
  7%|▋         | 117/1726 [2:06:19<27:22:39,
{'loss': 1.3572, 'learning_rate': 3.985138010718483e-05, 'epoch': 0.07}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 02:43:44,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.27 | bwd_microstep: 1290.86 | bwd_inner_microstep: 1290.74 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2413
[2024-06-10 02:43:46,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.43 | bwd_microstep: 1004.19 | bwd_inner_microstep: 1004.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 02:43:48,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.13 | bwd_microstep: 1377.88 | bwd_inner_microstep: 1377.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1931
[2024-06-10 02:43:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.24 | bwd_microstep: 759.80 | bwd_inner_microstep: 759.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 02:43:51,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.71 | bwd_microstep: 1342.93 | bwd_inner_microstep: 1342.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 02:43:53,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.53 | bwd_microstep: 1421.17 | bwd_inner_microstep: 1421.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 02:43:55,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.87 | bwd_microstep: 1502.85 | bwd_inner_microstep: 1502.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 02:43:57,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.68 | bwd_microstep: 1389.89 | bwd_inner_microstep: 1389.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2478
[2024-06-10 02:43:58,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.74 | bwd_microstep: 895.39 | bwd_inner_microstep: 895.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-10 02:44:00,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.78 | bwd_microstep: 1164.67 | bwd_inner_microstep: 1164.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 02:44:00,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.22 | bwd_microstep: 680.23 | bwd_inner_microstep: 680.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3612
[2024-06-10 02:44:03,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1467.02 | bwd_inner_microstep: 1466.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3434
[2024-06-10 02:44:05,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.94 | bwd_microstep: 1481.90 | bwd_inner_microstep: 1481.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 02:44:06,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.66 | bwd_microstep: 1351.11 | bwd_inner_microstep: 1351.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 02:44:09,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.53 | bwd_microstep: 1525.01 | bwd_inner_microstep: 1524.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
[2024-06-10 02:44:10,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.40 | bwd_microstep: 1339.03 | bwd_inner_microstep: 1339.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 02:44:12,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.12 | bwd_microstep: 1279.28 | bwd_inner_microstep: 1279.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475
[2024-06-10 02:44:14,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1414.49 | bwd_inner_microstep: 1414.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 02:44:16,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1384.29 | bwd_inner_microstep: 1384.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-10 02:44:18,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.65 | bwd_microstep: 1615.29 | bwd_inner_microstep: 1615.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3556
[2024-06-10 02:44:20,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.12 | bwd_microstep: 1204.71 | bwd_inner_microstep: 1204.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 02:44:22,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.80 | bwd_microstep: 1489.49 | bwd_inner_microstep: 1489.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-10 02:44:23,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.20 | bwd_microstep: 802.92 | bwd_inner_microstep: 802.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3541
[2024-06-10 02:44:25,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1360.00 | bwd_inner_microstep: 1359.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3727
[2024-06-10 02:44:27,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1401.17 | bwd_inner_microstep: 1401.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3439
[2024-06-10 02:44:29,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.41 | bwd_microstep: 1158.70 | bwd_inner_microstep: 1158.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050
[2024-06-10 02:44:30,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.93 | bwd_microstep: 914.39 | bwd_inner_microstep: 914.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2257
[2024-06-10 02:44:31,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.37 | bwd_microstep: 872.89 | bwd_inner_microstep: 872.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 02:44:33,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.99 | bwd_microstep: 1537.84 | bwd_inner_microstep: 1537.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2048
[2024-06-10 02:44:34,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.31 | bwd_microstep: 873.02 | bwd_inner_microstep: 872.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3807
[2024-06-10 02:44:36,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.22 | bwd_microstep: 1506.62 | bwd_inner_microstep: 1506.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398
[2024-06-10 02:44:45,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.40 | optimizer_step: 6.60
[2024-06-10 02:44:45,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.75 | bwd_microstep: 8389.62 | bwd_inner_microstep: 1635.10 | bwd_allreduce_microstep: 6754.45 | step_microstep: 39.88
[2024-06-10 02:44:45,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15148.74 | bwd: 47198.71 | bwd_inner: 40443.18 | bwd_allreduce: 6754.75 | step: 43.04
{'loss': 1.3803, 'learning_rate': 3.984677790385025e-05, 'epoch': 0.07}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4244
[2024-06-10 02:44:48,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.70 | bwd_microstep: 1708.17 | bwd_inner_microstep: 1708.07 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 02:44:50,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1351.34 | bwd_inner_microstep: 1351.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467
[2024-06-10 02:44:52,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.59 | bwd_microstep: 1410.69 | bwd_inner_microstep: 1410.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 02:44:53,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.07 | bwd_microstep: 1244.30 | bwd_inner_microstep: 1244.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3986
[2024-06-10 02:44:56,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.25 | bwd_microstep: 1605.32 | bwd_inner_microstep: 1605.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 02:44:58,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.72 | bwd_microstep: 1482.80 | bwd_inner_microstep: 1482.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749
[2024-06-10 02:45:00,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.90 | bwd_microstep: 1538.69 | bwd_inner_microstep: 1538.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3400
[2024-06-10 02:45:01,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.88 | bwd_microstep: 1208.57 | bwd_inner_microstep: 1208.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-10 02:45:03,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.55 | bwd_microstep: 1345.67 | bwd_inner_microstep: 1345.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 02:45:05,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.75 | bwd_microstep: 1247.72 | bwd_inner_microstep: 1247.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 02:45:07,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.09 | bwd_microstep: 1289.05 | bwd_inner_microstep: 1289.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1977
[2024-06-10 02:45:08,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.29 | bwd_microstep: 893.44 | bwd_inner_microstep: 893.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3411
[2024-06-10 02:45:10,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1540.76 | bwd_inner_microstep: 1540.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646
[2024-06-10 02:45:12,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.19 | bwd_microstep: 1420.92 | bwd_inner_microstep: 1420.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499
[2024-06-10 02:45:14,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1411.83 | bwd_inner_microstep: 1411.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-10 02:45:16,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.68 | bwd_microstep: 1314.68 | bwd_inner_microstep: 1314.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3696
[2024-06-10 02:45:18,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.35 | bwd_microstep: 1559.41 | bwd_inner_microstep: 1559.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 02:45:20,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.12 | bwd_microstep: 1492.83 | bwd_inner_microstep: 1492.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3513
[2024-06-10 02:45:22,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1225.36 | bwd_inner_microstep: 1225.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1982
[2024-06-10 02:45:23,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.40 | bwd_microstep: 768.50 | bwd_inner_microstep: 768.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 02:45:25,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.17 | bwd_microstep: 1495.80 | bwd_inner_microstep: 1495.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 02:45:27,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1283.48 | bwd_inner_microstep: 1283.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3537
[2024-06-10 02:45:29,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.76 | bwd_microstep: 1451.45 | bwd_inner_microstep: 1451.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 02:45:31,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.08 | bwd_microstep: 1434.97 | bwd_inner_microstep: 1434.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3818
[2024-06-10 02:45:33,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.69 | bwd_microstep: 1686.62 | bwd_inner_microstep: 1686.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2285
[2024-06-10 02:45:34,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.29 | bwd_microstep: 1007.37 | bwd_inner_microstep: 1007.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 02:45:36,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.05 | bwd_microstep: 1415.28 | bwd_inner_microstep: 1415.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 02:45:38,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.69 | bwd_microstep: 1513.14 | bwd_inner_microstep: 1513.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2236
[2024-06-10 02:45:40,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.73 | bwd_microstep: 962.98 | bwd_inner_microstep: 962.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3718
[2024-06-10 02:45:42,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.97 | bwd_microstep: 1604.50 | bwd_inner_microstep: 1604.25 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 02:45:44,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.53 | bwd_microstep: 1602.71 | bwd_inner_microstep: 1602.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 02:45:46,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-10 02:45:46,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.30 | bwd_microstep: 1606.42 | bwd_inner_microstep: 1581.73 | bwd_allreduce_microstep: 24.63 | step_microstep: 38.62
[2024-06-10 02:45:46,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16477.30 | bwd: 44124.81 | bwd_inner: 44098.98 | bwd_allreduce: 25.04 | step: 40.84
{'loss': 1.3228, 'learning_rate': 3.984210580026707e-05, 'epoch': 0.07}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 02:45:48,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.66 | bwd_microstep: 1396.93 | bwd_inner_microstep: 1396.72 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 02:45:50,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.50 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2339
[2024-06-10 02:45:51,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.00 | bwd_microstep: 953.58 | bwd_inner_microstep: 953.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-10 02:45:54,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.73 | bwd_microstep: 1652.20 | bwd_inner_microstep: 1652.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 02:45:55,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1246.38 | bwd_inner_microstep: 1246.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 02:45:57,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1387.85 | bwd_inner_microstep: 1387.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2473
[2024-06-10 02:45:59,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.07 | bwd_microstep: 969.91 | bwd_inner_microstep: 969.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 02:46:01,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.72 | bwd_microstep: 1387.36 | bwd_inner_microstep: 1387.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 02:46:03,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1382.22 | bwd_inner_microstep: 1382.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 02:46:04,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.17 | bwd_microstep: 793.42 | bwd_inner_microstep: 793.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-10 02:46:05,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.54 | bwd_microstep: 1279.60 | bwd_inner_microstep: 1279.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3444
[2024-06-10 02:46:07,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.32 | bwd_microstep: 1191.50 | bwd_inner_microstep: 1191.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 02:46:09,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1316.34 | bwd_inner_microstep: 1316.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 02:46:11,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1288.04 | bwd_inner_microstep: 1288.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 02:46:13,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.24 | bwd_microstep: 1390.73 | bwd_inner_microstep: 1390.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 02:46:15,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.08 | bwd_microstep: 1515.55 | bwd_inner_microstep: 1515.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3387
[2024-06-10 02:46:17,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1340.21 | bwd_inner_microstep: 1340.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 02:46:18,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.81 | bwd_microstep: 1353.70 | bwd_inner_microstep: 1353.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 02:46:20,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.05 | bwd_microstep: 1294.95 | bwd_inner_microstep: 1294.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3626
[2024-06-10 02:46:22,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.64 | bwd_microstep: 1475.67 | bwd_inner_microstep: 1475.54 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 02:46:24,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.80 | bwd_microstep: 980.04 | bwd_inner_microstep: 980.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 02:46:25,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.33 | bwd_microstep: 1192.59 | bwd_inner_microstep: 1192.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 02:46:26,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.40 | bwd_microstep: 803.22 | bwd_inner_microstep: 803.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2026
[2024-06-10 02:46:27,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.58 | bwd_microstep: 717.47 | bwd_inner_microstep: 717.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734
[2024-06-10 02:46:29,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.84 | bwd_microstep: 1436.37 | bwd_inner_microstep: 1436.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 02:46:32,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.90 | bwd_microstep: 1557.13 | bwd_inner_microstep: 1557.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 02:46:34,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.41 | bwd_microstep: 1456.65 | bwd_inner_microstep: 1456.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722
[2024-06-10 02:46:35,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.73 | bwd_microstep: 1340.48 | bwd_inner_microstep: 1340.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2056
[2024-06-10 02:46:37,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.84 | bwd_microstep: 913.12 | bwd_inner_microstep: 913.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281
[2024-06-10 02:46:38,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.61 | bwd_microstep: 912.53 | bwd_inner_microstep: 912.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3572
[2024-06-10 02:46:40,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.45 | bwd_microstep: 1525.05 | bwd_inner_microstep: 1525.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805
[2024-06-10 02:46:49,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.39 | optimizer_step: 6.58
[2024-06-10 02:46:49,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.65 | bwd_microstep: 8048.85 | bwd_inner_microstep: 1992.39 | bwd_allreduce_microstep: 6056.38 | step_microstep: 39.89
[2024-06-10 02:46:49,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15220.89 | bwd: 46784.71 | bwd_inner: 40727.12 | bwd_allreduce: 6056.80 | step: 41.84
{'loss': 1.3245, 'learning_rate': 3.983736381289041e-05, 'epoch': 0.07}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3426
[2024-06-10 02:46:50,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.26 | bwd_microstep: 1185.40 | bwd_inner_microstep: 1185.21 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2485
[2024-06-10 02:46:52,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.24 | bwd_microstep: 925.65 | bwd_inner_microstep: 925.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 02:46:53,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.76 | bwd_microstep: 1276.71 | bwd_inner_microstep: 1276.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 02:46:55,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1277.80 | bwd_inner_microstep: 1277.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-10 02:46:57,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.21 | bwd_microstep: 1539.97 | bwd_inner_microstep: 1539.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 02:46:59,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1354.71 | bwd_inner_microstep: 1354.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 02:47:01,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.29 | bwd_microstep: 1387.45 | bwd_inner_microstep: 1387.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 02:47:02,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.09 | bwd_microstep: 797.41 | bwd_inner_microstep: 797.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 02:47:04,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.28 | bwd_microstep: 1380.62 | bwd_inner_microstep: 1380.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 02:47:06,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.95 | bwd_microstep: 1382.62 | bwd_inner_microstep: 1382.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-10 02:47:08,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1289.59 | bwd_inner_microstep: 1289.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3665
[2024-06-10 02:47:10,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.86 | bwd_microstep: 1583.07 | bwd_inner_microstep: 1583.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3957
[2024-06-10 02:47:12,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.62 | bwd_microstep: 1602.50 | bwd_inner_microstep: 1602.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3498
[2024-06-10 02:47:14,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.95 | bwd_microstep: 1551.65 | bwd_inner_microstep: 1551.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1953
[2024-06-10 02:47:16,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.78 | bwd_microstep: 824.47 | bwd_inner_microstep: 824.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-10 02:47:18,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.17 | bwd_microstep: 1527.51 | bwd_inner_microstep: 1527.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3530
[2024-06-10 02:47:20,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.34 | bwd_microstep: 1418.99 | bwd_inner_microstep: 1418.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-10 02:47:22,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.59 | bwd_microstep: 1444.76 | bwd_inner_microstep: 1444.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2005
[2024-06-10 02:47:23,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.10 | bwd_microstep: 743.08 | bwd_inner_microstep: 743.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3622
[2024-06-10 02:47:24,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.61 | bwd_microstep: 1316.10 | bwd_inner_microstep: 1316.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 02:47:26,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1396.15 | bwd_inner_microstep: 1396.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 02:47:29,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.30 | bwd_microstep: 1564.55 | bwd_inner_microstep: 1564.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3886
[2024-06-10 02:47:31,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.67 | bwd_microstep: 1592.96 | bwd_inner_microstep: 1592.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 02:47:33,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.93 | bwd_microstep: 1357.12 | bwd_inner_microstep: 1357.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1442
[2024-06-10 02:47:33,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 208.72 | bwd_microstep: 541.93 | bwd_inner_microstep: 541.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2057
[2024-06-10 02:47:35,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.90 | bwd_microstep: 818.30 | bwd_inner_microstep: 818.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 02:47:36,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.15 | bwd_microstep: 1420.92 | bwd_inner_microstep: 1420.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 02:47:38,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.85 | bwd_microstep: 1260.05 | bwd_inner_microstep: 1260.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 02:47:40,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1498.32 | bwd_inner_microstep: 1498.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 02:47:42,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.53 | bwd_microstep: 1408.06 | bwd_inner_microstep: 1408.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 02:47:44,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1456.06 | bwd_inner_microstep: 1456.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2286
[2024-06-10 02:47:48,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.44 | optimizer_step: 6.60
[2024-06-10 02:47:48,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.14 | bwd_microstep: 3727.84 | bwd_inner_microstep: 983.78 | bwd_allreduce_microstep: 2743.98 | step_microstep: 39.34
[2024-06-10 02:47:48,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15376.82 | bwd: 43852.33 | bwd_inner: 41107.27 | bwd_allreduce: 2744.31 | step: 41.48
{'loss': 1.3113, 'learning_rate': 3.983255195842152e-05, 'epoch': 0.07}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 02:47:50,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.74 | bwd_microstep: 1471.42 | bwd_inner_microstep: 1471.27 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3931
[2024-06-10 02:47:52,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.97 | bwd_microstep: 1398.34 | bwd_inner_microstep: 1398.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2309
[2024-06-10 02:47:54,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.80 | bwd_microstep: 979.48 | bwd_inner_microstep: 979.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4124
[2024-06-10 02:47:56,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.51 | bwd_microstep: 1639.25 | bwd_inner_microstep: 1639.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 02:47:58,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.93 | bwd_microstep: 1250.81 | bwd_inner_microstep: 1250.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764
[2024-06-10 02:48:00,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.64 | bwd_microstep: 1547.26 | bwd_inner_microstep: 1547.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3491
[2024-06-10 02:48:02,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.17 | bwd_microstep: 1221.96 | bwd_inner_microstep: 1221.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 02:48:03,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.81 | bwd_microstep: 1156.69 | bwd_inner_microstep: 1156.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3499
[2024-06-10 02:48:05,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.05 | bwd_microstep: 1650.90 | bwd_inner_microstep: 1650.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-10 02:48:07,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.50 | bwd_microstep: 1326.85 | bwd_inner_microstep: 1326.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3826
[2024-06-10 02:48:10,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.27 | bwd_microstep: 1757.54 | bwd_inner_microstep: 1757.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3462
[2024-06-10 02:48:12,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.05 | bwd_microstep: 1441.31 | bwd_inner_microstep: 1441.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 02:48:14,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.44 | bwd_microstep: 1354.36 | bwd_inner_microstep: 1354.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 02:48:15,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 1402.03 | bwd_inner_microstep: 1402.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 02:48:17,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.70 | bwd_microstep: 1285.48 | bwd_inner_microstep: 1285.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 02:48:19,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | bwd_microstep: 1380.27 | bwd_inner_microstep: 1380.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-10 02:48:21,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.94 | bwd_microstep: 1526.45 | bwd_inner_microstep: 1526.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 02:48:22,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.61 | bwd_microstep: 696.68 | bwd_inner_microstep: 696.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 02:48:24,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1420.26 | bwd_inner_microstep: 1420.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-10 02:48:26,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.81 | bwd_microstep: 1169.42 | bwd_inner_microstep: 1169.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462
[2024-06-10 02:48:28,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.07 | bwd_microstep: 1185.01 | bwd_inner_microstep: 1184.84 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 02:48:30,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1513.62 | bwd_inner_microstep: 1513.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3607
[2024-06-10 02:48:31,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1342.50 | bwd_inner_microstep: 1342.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 02:48:33,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.84 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3555
[2024-06-10 02:48:35,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.33 | bwd_microstep: 1363.24 | bwd_inner_microstep: 1363.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2075
[2024-06-10 02:48:36,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.11 | bwd_microstep: 860.95 | bwd_inner_microstep: 860.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3574
[2024-06-10 02:48:39,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.95 | bwd_microstep: 1597.70 | bwd_inner_microstep: 1597.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3722
[2024-06-10 02:48:41,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1493.45 | bwd_inner_microstep: 1493.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 02:48:43,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.78 | bwd_microstep: 1392.29 | bwd_inner_microstep: 1392.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 02:48:45,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1553.08 | bwd_inner_microstep: 1553.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3423
[2024-06-10 02:48:47,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.67 | bwd_microstep: 1493.85 | bwd_inner_microstep: 1493.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-10 02:48:49,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 02:48:49,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.23 | bwd_microstep: 1690.16 | bwd_inner_microstep: 1682.35 | bwd_allreduce_microstep: 7.76 | step_microstep: 38.53
[2024-06-10 02:48:49,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16466.54 | bwd: 43847.54 | bwd_inner: 43838.62 | bwd_allreduce: 8.13 | step: 40.70
{'loss': 1.3728, 'learning_rate': 3.982767025380774e-05, 'epoch': 0.07}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 02:48:51,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1367.92 | bwd_inner_microstep: 1367.86 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 02:48:53,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.74 | bwd_microstep: 1357.38 | bwd_inner_microstep: 1357.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 02:48:55,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1381.98 | bwd_inner_microstep: 1381.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3916
[2024-06-10 02:48:57,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.61 | bwd_microstep: 1692.24 | bwd_inner_microstep: 1692.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421
[2024-06-10 02:48:59,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.62 | bwd_microstep: 1187.11 | bwd_inner_microstep: 1187.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1898
[2024-06-10 02:49:00,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.62 | bwd_microstep: 717.82 | bwd_inner_microstep: 717.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3795
[2024-06-10 02:49:02,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.47 | bwd_microstep: 1356.06 | bwd_inner_microstep: 1356.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3482
[2024-06-10 02:49:03,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.26 | bwd_microstep: 1251.63 | bwd_inner_microstep: 1251.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2223
[2024-06-10 02:49:05,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.75 | bwd_microstep: 962.32 | bwd_inner_microstep: 962.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 02:49:06,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.07 | bwd_microstep: 684.28 | bwd_inner_microstep: 684.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925
[2024-06-10 02:49:07,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.38 | bwd_microstep: 730.97 | bwd_inner_microstep: 730.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911
[2024-06-10 02:49:08,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.11 | bwd_microstep: 781.34 | bwd_inner_microstep: 781.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-10 02:49:09,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.94 | bwd_microstep: 1185.65 | bwd_inner_microstep: 1185.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567
[2024-06-10 02:49:11,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.16 | bwd_microstep: 1334.06 | bwd_inner_microstep: 1334.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3801
[2024-06-10 02:49:13,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1502.47 | bwd_inner_microstep: 1502.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2620
[2024-06-10 02:49:15,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.48 | bwd_microstep: 1018.40 | bwd_inner_microstep: 1018.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3392
[2024-06-10 02:49:17,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.56 | bwd_microstep: 1307.06 | bwd_inner_microstep: 1307.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 02:49:18,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.26 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2941
[2024-06-10 02:49:20,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.18 | bwd_microstep: 1104.45 | bwd_inner_microstep: 1104.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 02:49:22,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.18 | bwd_microstep: 1615.42 | bwd_inner_microstep: 1615.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140
[2024-06-10 02:49:23,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.75 | bwd_microstep: 838.80 | bwd_inner_microstep: 838.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 02:49:25,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.39 | bwd_microstep: 1262.18 | bwd_inner_microstep: 1262.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 02:49:27,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.95 | bwd_microstep: 1505.18 | bwd_inner_microstep: 1505.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 02:49:29,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.26 | bwd_microstep: 1408.34 | bwd_inner_microstep: 1408.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3705
[2024-06-10 02:49:31,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.96 | bwd_microstep: 1333.88 | bwd_inner_microstep: 1333.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 02:49:33,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.22 | bwd_microstep: 1192.67 | bwd_inner_microstep: 1192.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3514
[2024-06-10 02:49:34,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.19 | bwd_microstep: 1276.99 | bwd_inner_microstep: 1276.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 02:49:36,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 1518.00 | bwd_inner_microstep: 1517.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 02:49:39,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.15 | bwd_microstep: 1658.20 | bwd_inner_microstep: 1658.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066
[2024-06-10 02:49:40,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.80 | bwd_microstep: 917.72 | bwd_inner_microstep: 917.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 02:49:42,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.78 | bwd_microstep: 1453.29 | bwd_inner_microstep: 1453.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276
[2024-06-10 02:49:49,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.43 | optimizer_step: 6.59
[2024-06-10 02:49:49,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.05 | bwd_microstep: 6486.72 | bwd_inner_microstep: 1145.68 | bwd_allreduce_microstep: 5340.97 | step_microstep: 40.07
[2024-06-10 02:49:49,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14786.20 | bwd: 44673.33 | bwd_inner: 39331.37 | bwd_allreduce: 5341.23 | step: 41.96
  7%|▋         | 117/1726 [2:06:19<27:22:39, 61.26s/it]
  7%|▋         | 118/1726 [2:07:22<27:33:27, 61.70s/it]
  7%|▋         | 119/1726 [2:08:23<27:26:43, 61.48s/it]
  7%|▋         | 120/1726 [2:09:26<27:32:49, 61.75s/it]
  7%|▋         | 121/1726 [2:10:25<27:14:32, 61.10s/it]
  7%|▋         | 122/1726 [2:11:26<27:10:16, 60.98s/it]
  7%|▋         | 123/1726 [2:1
{'loss': 1.2744, 'learning_rate': 3.98227187162424e-05, 'epoch': 0.07}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 02:49:51,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.95 | bwd_microstep: 1364.97 | bwd_inner_microstep: 1364.82 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2441
[2024-06-10 02:49:52,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.26 | bwd_microstep: 945.09 | bwd_inner_microstep: 945.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3929
[2024-06-10 02:49:54,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.80 | bwd_microstep: 1696.50 | bwd_inner_microstep: 1696.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 02:49:56,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1245.35 | bwd_inner_microstep: 1245.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-10 02:49:57,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.43 | bwd_microstep: 687.41 | bwd_inner_microstep: 687.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 02:49:59,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1388.53 | bwd_inner_microstep: 1388.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 02:50:01,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.94 | bwd_microstep: 1245.98 | bwd_inner_microstep: 1245.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903
[2024-06-10 02:50:02,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.33 | bwd_microstep: 681.81 | bwd_inner_microstep: 681.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 02:50:04,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.71 | bwd_microstep: 1385.57 | bwd_inner_microstep: 1385.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 02:50:05,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.84 | bwd_microstep: 1289.40 | bwd_inner_microstep: 1289.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 02:50:07,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.52 | bwd_microstep: 1289.51 | bwd_inner_microstep: 1289.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 02:50:09,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1374.34 | bwd_inner_microstep: 1374.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 02:50:11,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1394.86 | bwd_inner_microstep: 1394.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 02:50:13,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.45 | bwd_microstep: 1348.56 | bwd_inner_microstep: 1348.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-10 02:50:15,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.04 | bwd_microstep: 1577.79 | bwd_inner_microstep: 1577.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2454
[2024-06-10 02:50:16,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.33 | bwd_microstep: 1015.72 | bwd_inner_microstep: 1015.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 02:50:18,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1376.66 | bwd_inner_microstep: 1376.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385
[2024-06-10 02:50:20,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1240.49 | bwd_inner_microstep: 1240.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2413
[2024-06-10 02:50:22,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.52 | bwd_microstep: 1064.26 | bwd_inner_microstep: 1064.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496
[2024-06-10 02:50:24,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.59 | bwd_microstep: 1580.25 | bwd_inner_microstep: 1580.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 02:50:26,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.93 | bwd_microstep: 1455.17 | bwd_inner_microstep: 1455.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3516
[2024-06-10 02:50:28,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.00 | bwd_microstep: 1584.31 | bwd_inner_microstep: 1584.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 02:50:30,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.83 | bwd_microstep: 1650.71 | bwd_inner_microstep: 1650.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 02:50:32,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.74 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 02:50:34,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.77 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4644
[2024-06-10 02:50:37,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 739.15 | bwd_microstep: 1986.29 | bwd_inner_microstep: 1986.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 02:50:38,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.89 | bwd_microstep: 1423.74 | bwd_inner_microstep: 1423.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725
[2024-06-10 02:50:40,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.32 | bwd_microstep: 1338.02 | bwd_inner_microstep: 1337.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3568
[2024-06-10 02:50:42,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.29 | bwd_microstep: 1449.71 | bwd_inner_microstep: 1449.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3624
[2024-06-10 02:50:44,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.30 | bwd_microstep: 1373.01 | bwd_inner_microstep: 1372.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 02:50:46,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.03 | bwd_microstep: 1505.64 | bwd_inner_microstep: 1505.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 02:50:49,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 02:50:49,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.41 | bwd_microstep: 2264.88 | bwd_inner_microstep: 1662.07 | bwd_allreduce_microstep: 602.75 | step_microstep: 38.87
[2024-06-10 02:50:49,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16135.45 | bwd: 43789.94 | bwd_inner: 43186.17 | bwd_allreduce: 603.04 | step: 40.61
{'loss': 1.2521, 'learning_rate': 3.981769736316478e-05, 'epoch': 0.07}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467
[2024-06-10 02:50:51,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.67 | bwd_microstep: 1432.94 | bwd_inner_microstep: 1432.82 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 02:50:53,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.68 | bwd_microstep: 1345.34 | bwd_inner_microstep: 1345.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 02:50:55,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1249.60 | bwd_inner_microstep: 1249.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3416
[2024-06-10 02:50:56,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.96 | bwd_microstep: 1185.52 | bwd_inner_microstep: 1185.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2053
[2024-06-10 02:50:58,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.10 | bwd_microstep: 819.12 | bwd_inner_microstep: 819.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 02:50:59,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.03 | bwd_microstep: 1298.20 | bwd_inner_microstep: 1298.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 02:51:01,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1248.87 | bwd_inner_microstep: 1248.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 02:51:03,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 02:51:05,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.74 | bwd_microstep: 1188.21 | bwd_inner_microstep: 1188.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2892
[2024-06-10 02:51:06,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.67 | bwd_microstep: 1000.77 | bwd_inner_microstep: 1000.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3573
[2024-06-10 02:51:08,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1363.25 | bwd_inner_microstep: 1363.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3664
[2024-06-10 02:51:10,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.17 | bwd_microstep: 1820.06 | bwd_inner_microstep: 1820.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3602
[2024-06-10 02:51:12,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.21 | bwd_microstep: 1467.74 | bwd_inner_microstep: 1467.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3640
[2024-06-10 02:51:15,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.17 | bwd_microstep: 1661.84 | bwd_inner_microstep: 1661.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3510
[2024-06-10 02:51:17,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.46 | bwd_microstep: 1539.03 | bwd_inner_microstep: 1539.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-10 02:51:19,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.79 | bwd_microstep: 1430.00 | bwd_inner_microstep: 1429.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 02:51:21,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1486.31 | bwd_inner_microstep: 1486.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-10 02:51:23,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1342.86 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1987
[2024-06-10 02:51:24,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.36 | bwd_microstep: 897.34 | bwd_inner_microstep: 897.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 02:51:26,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.77 | bwd_microstep: 1600.57 | bwd_inner_microstep: 1600.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930
[2024-06-10 02:51:27,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.70 | bwd_microstep: 696.59 | bwd_inner_microstep: 696.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1902
[2024-06-10 02:51:28,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.56 | bwd_microstep: 717.96 | bwd_inner_microstep: 717.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816
[2024-06-10 02:51:30,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.86 | bwd_microstep: 1358.22 | bwd_inner_microstep: 1358.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 02:51:32,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1386.25 | bwd_inner_microstep: 1386.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3720
[2024-06-10 02:51:34,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.23 | bwd_microstep: 1338.18 | bwd_inner_microstep: 1338.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 02:51:36,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1396.85 | bwd_inner_microstep: 1396.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3536
[2024-06-10 02:51:38,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1378.74 | bwd_inner_microstep: 1378.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-10 02:51:40,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.72 | bwd_microstep: 1328.76 | bwd_inner_microstep: 1328.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 02:51:42,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.12 | bwd_microstep: 1504.24 | bwd_inner_microstep: 1504.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1228
[2024-06-10 02:51:42,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.77 | bwd_microstep: 450.48 | bwd_inner_microstep: 450.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3778
[2024-06-10 02:51:44,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.73 | bwd_microstep: 1628.12 | bwd_inner_microstep: 1628.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3916
[2024-06-10 02:51:50,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.38 | optimizer_step: 6.60
[2024-06-10 02:51:50,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.83 | bwd_microstep: 4673.19 | bwd_inner_microstep: 1791.75 | bwd_allreduce_microstep: 2881.36 | step_microstep: 39.92
[2024-06-10 02:51:50,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15605.32 | bwd: 44626.63 | bwd_inner: 41744.24 | bwd_allreduce: 2881.65 | step: 42.30
{'loss': 1.3595, 'learning_rate': 3.9812606212260075e-05, 'epoch': 0.07}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3452
[2024-06-10 02:51:52,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.76 | bwd_microstep: 1545.02 | bwd_inner_microstep: 1544.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 02:51:54,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.66 | bwd_microstep: 1473.43 | bwd_inner_microstep: 1473.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 02:51:56,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.51 | bwd_microstep: 1244.01 | bwd_inner_microstep: 1243.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 02:51:57,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1283.62 | bwd_inner_microstep: 1283.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 02:51:59,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1249.40 | bwd_inner_microstep: 1249.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 02:52:01,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.52 | bwd_microstep: 1253.35 | bwd_inner_microstep: 1253.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4071
[2024-06-10 02:52:03,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.84 | bwd_microstep: 1535.38 | bwd_inner_microstep: 1535.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 02:52:05,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.48 | bwd_microstep: 1249.02 | bwd_inner_microstep: 1249.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 02:52:07,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.21 | bwd_microstep: 1405.17 | bwd_inner_microstep: 1405.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3527
[2024-06-10 02:52:09,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.95 | bwd_microstep: 1562.11 | bwd_inner_microstep: 1562.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3410
[2024-06-10 02:52:11,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.26 | bwd_microstep: 1491.53 | bwd_inner_microstep: 1491.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 02:52:13,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.80 | bwd_microstep: 1349.02 | bwd_inner_microstep: 1348.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 02:52:15,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.50 | bwd_microstep: 1280.35 | bwd_inner_microstep: 1280.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3490
[2024-06-10 02:52:17,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.71 | bwd_microstep: 1537.62 | bwd_inner_microstep: 1537.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3415
[2024-06-10 02:52:19,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1406.67 | bwd_inner_microstep: 1406.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3513
[2024-06-10 02:52:21,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.15 | bwd_microstep: 1453.86 | bwd_inner_microstep: 1453.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 02:52:23,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1612.59 | bwd_inner_microstep: 1612.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 02:52:25,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.46 | bwd_microstep: 1282.51 | bwd_inner_microstep: 1282.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 02:52:26,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.38 | bwd_microstep: 1291.79 | bwd_inner_microstep: 1291.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3828
[2024-06-10 02:52:28,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1296.60 | bwd_inner_microstep: 1296.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 02:52:30,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1385.84 | bwd_inner_microstep: 1385.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819
[2024-06-10 02:52:32,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.02 | bwd_microstep: 1391.30 | bwd_inner_microstep: 1391.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-10 02:52:34,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.30 | bwd_microstep: 1160.25 | bwd_inner_microstep: 1160.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-10 02:52:36,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.16 | bwd_microstep: 1308.42 | bwd_inner_microstep: 1308.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 02:52:37,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.81 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708
[2024-06-10 02:52:40,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1632.53 | bwd_inner_microstep: 1632.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3554
[2024-06-10 02:52:42,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1452.98 | bwd_inner_microstep: 1452.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2011
[2024-06-10 02:52:43,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.02 | bwd_microstep: 775.28 | bwd_inner_microstep: 775.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 02:52:45,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.65 | bwd_microstep: 1649.29 | bwd_inner_microstep: 1649.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 02:52:47,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 1512.98 | bwd_inner_microstep: 1512.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 02:52:49,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.93 | bwd_microstep: 1477.76 | bwd_inner_microstep: 1477.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3444
[2024-06-10 02:52:51,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-10 02:52:51,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1571.26 | bwd_inner_microstep: 1452.78 | bwd_allreduce_microstep: 118.43 | step_microstep: 38.60
[2024-06-10 02:52:51,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16620.25 | bwd: 44403.73 | bwd_inner: 44284.35 | bwd_allreduce: 118.67 | step: 40.70
{'loss': 1.301, 'learning_rate': 3.980744528145929e-05, 'epoch': 0.07}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 02:52:53,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.36 | bwd_microstep: 1373.04 | bwd_inner_microstep: 1372.91 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 02:52:55,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.57 | bwd_microstep: 1301.66 | bwd_inner_microstep: 1301.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3834
[2024-06-10 02:52:57,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.18 | bwd_microstep: 1587.78 | bwd_inner_microstep: 1587.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3397
[2024-06-10 02:52:59,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.70 | bwd_microstep: 1280.33 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 02:53:01,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1345.62 | bwd_inner_microstep: 1345.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 02:53:02,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.02 | bwd_microstep: 1251.39 | bwd_inner_microstep: 1251.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 02:53:04,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.63 | bwd_microstep: 1402.50 | bwd_inner_microstep: 1402.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 02:53:06,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.89 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418
[2024-06-10 02:53:08,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.15 | bwd_microstep: 1185.85 | bwd_inner_microstep: 1185.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 02:53:10,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.79 | bwd_microstep: 1519.49 | bwd_inner_microstep: 1519.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 02:53:12,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1389.92 | bwd_inner_microstep: 1389.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 02:53:14,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.14 | bwd_microstep: 1304.67 | bwd_inner_microstep: 1304.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3671
[2024-06-10 02:53:16,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.60 | bwd_microstep: 1588.30 | bwd_inner_microstep: 1588.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-10 02:53:18,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.16 | bwd_microstep: 1441.83 | bwd_inner_microstep: 1441.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 02:53:20,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1375.21 | bwd_inner_microstep: 1375.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 02:53:21,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.01 | bwd_microstep: 796.50 | bwd_inner_microstep: 796.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 02:53:23,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.59 | bwd_microstep: 1294.87 | bwd_inner_microstep: 1294.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 02:53:24,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.40 | bwd_microstep: 1333.17 | bwd_inner_microstep: 1333.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 02:53:26,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1281.57 | bwd_inner_microstep: 1281.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 02:53:27,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.46 | bwd_microstep: 806.26 | bwd_inner_microstep: 806.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 02:53:30,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1660.92 | bwd_inner_microstep: 1660.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143
[2024-06-10 02:53:31,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.12 | bwd_microstep: 933.59 | bwd_inner_microstep: 933.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 02:53:32,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.65 | bwd_microstep: 800.53 | bwd_inner_microstep: 800.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 02:53:34,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.85 | bwd_microstep: 1381.77 | bwd_inner_microstep: 1381.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 02:53:36,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.53 | bwd_microstep: 1539.50 | bwd_inner_microstep: 1539.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3855
[2024-06-10 02:53:38,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.53 | bwd_microstep: 1626.10 | bwd_inner_microstep: 1626.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 02:53:40,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1506.07 | bwd_inner_microstep: 1506.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3534
[2024-06-10 02:53:42,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.25 | bwd_microstep: 1420.68 | bwd_inner_microstep: 1420.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 02:53:44,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.77 | bwd_microstep: 1464.08 | bwd_inner_microstep: 1464.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3724
[2024-06-10 02:53:47,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.27 | bwd_microstep: 1704.17 | bwd_inner_microstep: 1704.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 02:53:49,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.22 | bwd_microstep: 1599.36 | bwd_inner_microstep: 1599.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 02:53:54,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.39 | optimizer_step: 6.63
[2024-06-10 02:53:54,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.77 | bwd_microstep: 4515.91 | bwd_inner_microstep: 1690.67 | bwd_allreduce_microstep: 2825.16 | step_microstep: 39.80
[2024-06-10 02:53:54,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16244.05 | bwd: 46263.25 | bwd_inner: 43437.02 | bwd_allreduce: 2825.47 | step: 41.52
{'loss': 1.34, 'learning_rate': 3.980221458893919e-05, 'epoch': 0.07}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3459
[2024-06-10 02:53:56,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.12 | bwd_microstep: 1333.06 | bwd_inner_microstep: 1332.84 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3926
[2024-06-10 02:53:58,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.58 | bwd_microstep: 1690.06 | bwd_inner_microstep: 1690.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 02:54:00,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.14 | bwd_microstep: 1485.74 | bwd_inner_microstep: 1485.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2330
[2024-06-10 02:54:02,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.41 | bwd_microstep: 983.59 | bwd_inner_microstep: 983.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910
[2024-06-10 02:54:03,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.63 | bwd_microstep: 779.90 | bwd_inner_microstep: 779.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 02:54:05,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.27 | bwd_microstep: 1279.13 | bwd_inner_microstep: 1279.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 02:54:06,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.26 | bwd_microstep: 1282.96 | bwd_inner_microstep: 1282.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 02:54:08,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.73 | bwd_microstep: 1378.08 | bwd_inner_microstep: 1378.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 02:54:10,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.34 | bwd_microstep: 1282.06 | bwd_inner_microstep: 1282.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 02:54:12,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.94 | bwd_microstep: 1438.07 | bwd_inner_microstep: 1438.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3483
[2024-06-10 02:54:14,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.52 | bwd_microstep: 1326.22 | bwd_inner_microstep: 1326.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 02:54:16,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.91 | bwd_microstep: 1471.42 | bwd_inner_microstep: 1471.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2120
[2024-06-10 02:54:17,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.21 | bwd_microstep: 923.43 | bwd_inner_microstep: 923.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-10 02:54:19,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.18 | bwd_microstep: 1188.41 | bwd_inner_microstep: 1188.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3409
[2024-06-10 02:54:21,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.22 | bwd_microstep: 1433.84 | bwd_inner_microstep: 1433.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 02:54:23,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1345.17 | bwd_inner_microstep: 1345.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 02:54:25,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1388.93 | bwd_inner_microstep: 1388.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 02:54:26,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.21 | bwd_microstep: 1288.31 | bwd_inner_microstep: 1288.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 02:54:28,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.03 | bwd_microstep: 1515.99 | bwd_inner_microstep: 1515.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2481
[2024-06-10 02:54:30,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.43 | bwd_microstep: 1051.49 | bwd_inner_microstep: 1051.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463
[2024-06-10 02:54:32,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.97 | bwd_microstep: 1243.05 | bwd_inner_microstep: 1243.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 02:54:34,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.42 | bwd_microstep: 1461.66 | bwd_inner_microstep: 1461.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 02:54:36,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1413.14 | bwd_inner_microstep: 1413.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 02:54:38,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1413.22 | bwd_inner_microstep: 1413.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 02:54:39,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.74 | bwd_microstep: 1378.31 | bwd_inner_microstep: 1378.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706
[2024-06-10 02:54:42,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.14 | bwd_microstep: 1631.55 | bwd_inner_microstep: 1631.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 02:54:43,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1254.87 | bwd_inner_microstep: 1254.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 02:54:46,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.30 | bwd_microstep: 1554.83 | bwd_inner_microstep: 1554.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 02:54:47,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.27 | bwd_microstep: 1262.42 | bwd_inner_microstep: 1262.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 02:54:50,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.92 | bwd_microstep: 1654.54 | bwd_inner_microstep: 1654.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2069
[2024-06-10 02:54:51,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.99 | bwd_microstep: 728.34 | bwd_inner_microstep: 728.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577
[2024-06-10 02:54:55,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.27 | optimizer_step: 6.61
[2024-06-10 02:54:55,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.92 | bwd_microstep: 3623.52 | bwd_inner_microstep: 1623.10 | bwd_allreduce_microstep: 2000.36 | step_microstep: 39.35
[2024-06-10 02:54:55,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15908.31 | bwd: 44485.35 | bwd_inner: 42483.91 | bwd_allreduce: 2000.68 | step: 41.53
{'loss': 1.3578, 'learning_rate': 3.979691415312225e-05, 'epoch': 0.07}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 02:54:57,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.88 | bwd_microstep: 1468.09 | bwd_inner_microstep: 1467.86 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-10 02:54:59,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.56 | bwd_microstep: 1302.57 | bwd_inner_microstep: 1302.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3922
[2024-06-10 02:55:01,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.50 | bwd_microstep: 1592.13 | bwd_inner_microstep: 1592.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 02:55:02,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.96 | bwd_microstep: 793.25 | bwd_inner_microstep: 793.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 02:55:04,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.88 | bwd_microstep: 1278.40 | bwd_inner_microstep: 1278.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 02:55:06,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.15 | bwd_microstep: 1402.35 | bwd_inner_microstep: 1402.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971
[2024-06-10 02:55:07,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.61 | bwd_microstep: 705.90 | bwd_inner_microstep: 705.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 02:55:08,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.98 | bwd_microstep: 1251.53 | bwd_inner_microstep: 1251.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 02:55:10,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1296.59 | bwd_inner_microstep: 1296.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 02:55:12,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.74 | bwd_microstep: 1387.10 | bwd_inner_microstep: 1387.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 02:55:14,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.15 | bwd_microstep: 1287.29 | bwd_inner_microstep: 1287.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 02:55:16,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.18 | bwd_microstep: 1251.58 | bwd_inner_microstep: 1251.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 02:55:18,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.79 | bwd_microstep: 1538.28 | bwd_inner_microstep: 1538.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3462
[2024-06-10 02:55:20,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.74 | bwd_microstep: 1243.37 | bwd_inner_microstep: 1243.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3687
[2024-06-10 02:55:22,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.20 | bwd_microstep: 1728.08 | bwd_inner_microstep: 1728.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3384
[2024-06-10 02:55:24,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1307.50 | bwd_inner_microstep: 1307.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 02:55:26,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.44 | bwd_microstep: 1399.44 | bwd_inner_microstep: 1399.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289
[2024-06-10 02:55:27,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.14 | bwd_microstep: 883.57 | bwd_inner_microstep: 883.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628
[2024-06-10 02:55:29,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.79 | bwd_microstep: 1419.17 | bwd_inner_microstep: 1419.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 02:55:31,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.38 | bwd_microstep: 1415.88 | bwd_inner_microstep: 1415.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 02:55:33,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.65 | bwd_microstep: 1457.62 | bwd_inner_microstep: 1457.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 02:55:35,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1430.47 | bwd_inner_microstep: 1430.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3428
[2024-06-10 02:55:37,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1303.58 | bwd_inner_microstep: 1303.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3474
[2024-06-10 02:55:38,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.59 | bwd_microstep: 1343.22 | bwd_inner_microstep: 1343.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3675
[2024-06-10 02:55:41,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.02 | bwd_microstep: 1460.86 | bwd_inner_microstep: 1460.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 02:55:42,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1341.38 | bwd_inner_microstep: 1341.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 02:55:44,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.84 | bwd_microstep: 1293.23 | bwd_inner_microstep: 1293.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3490
[2024-06-10 02:55:46,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.09 | bwd_microstep: 1411.10 | bwd_inner_microstep: 1411.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3725
[2024-06-10 02:55:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.42 | bwd_microstep: 1470.04 | bwd_inner_microstep: 1470.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 02:55:50,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1398.66 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 02:55:52,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.85 | bwd_microstep: 1608.74 | bwd_inner_microstep: 1608.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 580
[2024-06-10 02:55:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.74 | optimizer_step: 6.61
[2024-06-10 02:55:56,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.06 | bwd_microstep: 3126.29 | bwd_inner_microstep: 300.10 | bwd_allreduce_microstep: 2826.07 | step_microstep: 44.89
[2024-06-10 02:55:56,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15702.77 | bwd: 44597.28 | bwd_inner: 41770.04 | bwd_allreduce: 2826.43 | step: 46.85
  7%|▋         | 123/1726 [2:12:26<26:59:58, 60.64s/it]
  7%|▋         | 124/1726 [2:13:26<26:56:15, 60.53s/it]
  7%|▋         | 125/1726 [2:14:27<26:55:54, 60.56s/it]
  7%|▋         | 126/1726 [2:15:28<27:01:44, 60.82s/it]
  7%|▋         | 127/1726 [2:16:31<27:17:16, 61.44s/it]
  7%|▋         | 128/1726 [2:17:32<27:10:57, 61.24s/it]
{'loss': 1.3494, 'learning_rate': 3.979154399267657e-05, 'epoch': 0.07}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 02:55:57,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1293.32 | bwd_inner_microstep: 1293.26 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 02:55:59,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.73 | bwd_microstep: 1394.20 | bwd_inner_microstep: 1394.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 02:56:00,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.76 | bwd_microstep: 803.14 | bwd_inner_microstep: 803.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3842
[2024-06-10 02:56:02,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.44 | bwd_microstep: 1397.26 | bwd_inner_microstep: 1397.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-10 02:56:03,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.32 | bwd_microstep: 697.41 | bwd_inner_microstep: 697.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 02:56:05,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.38 | bwd_microstep: 1477.22 | bwd_inner_microstep: 1477.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 02:56:06,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.12 | bwd_microstep: 793.14 | bwd_inner_microstep: 793.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228
[2024-06-10 02:56:08,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.42 | bwd_microstep: 961.08 | bwd_inner_microstep: 961.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884
[2024-06-10 02:56:09,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.03 | bwd_microstep: 713.55 | bwd_inner_microstep: 713.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3704
[2024-06-10 02:56:11,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1329.33 | bwd_inner_microstep: 1329.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 02:56:12,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.10 | bwd_microstep: 794.68 | bwd_inner_microstep: 794.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462
[2024-06-10 02:56:14,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.61 | bwd_microstep: 1313.72 | bwd_inner_microstep: 1313.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 02:56:15,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.12 | bwd_microstep: 1395.41 | bwd_inner_microstep: 1395.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 02:56:17,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.35 | bwd_microstep: 1344.38 | bwd_inner_microstep: 1344.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708
[2024-06-10 02:56:20,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.21 | bwd_microstep: 1621.58 | bwd_inner_microstep: 1621.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 02:56:22,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.01 | bwd_microstep: 1488.46 | bwd_inner_microstep: 1488.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3641
[2024-06-10 02:56:23,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.59 | bwd_microstep: 1260.93 | bwd_inner_microstep: 1260.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3625
[2024-06-10 02:56:26,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.67 | bwd_microstep: 1539.75 | bwd_inner_microstep: 1539.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1999
[2024-06-10 02:56:27,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.57 | bwd_microstep: 835.42 | bwd_inner_microstep: 835.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3548
[2024-06-10 02:56:28,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.62 | bwd_microstep: 1201.62 | bwd_inner_microstep: 1201.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 02:56:30,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1414.02 | bwd_inner_microstep: 1413.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2074
[2024-06-10 02:56:32,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.92 | bwd_microstep: 919.24 | bwd_inner_microstep: 919.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 02:56:34,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1511.98 | bwd_inner_microstep: 1511.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 02:56:36,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1389.08 | bwd_inner_microstep: 1389.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3645
[2024-06-10 02:56:38,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.56 | bwd_microstep: 1516.92 | bwd_inner_microstep: 1516.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3545
[2024-06-10 02:56:40,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.62 | bwd_microstep: 1596.57 | bwd_inner_microstep: 1596.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559
[2024-06-10 02:56:42,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1433.03 | bwd_inner_microstep: 1433.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 02:56:44,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.37 | bwd_microstep: 1354.30 | bwd_inner_microstep: 1354.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 02:56:45,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.34 | bwd_microstep: 791.26 | bwd_inner_microstep: 791.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3395
[2024-06-10 02:56:47,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.92 | bwd_microstep: 1443.14 | bwd_inner_microstep: 1443.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793
[2024-06-10 02:56:49,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.19 | bwd_microstep: 1643.27 | bwd_inner_microstep: 1643.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 02:56:54,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.41 | optimizer_step: 6.59
[2024-06-10 02:56:54,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.38 | bwd_microstep: 4482.46 | bwd_inner_microstep: 1679.49 | bwd_allreduce_microstep: 2802.89 | step_microstep: 39.88
[2024-06-10 02:56:54,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15081.68 | bwd: 43150.93 | bwd_inner: 40347.05 | bwd_allreduce: 2803.15 | step: 41.78
{'loss': 1.3261, 'learning_rate': 3.978610412651584e-05, 'epoch': 0.08}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 02:56:56,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.52 | bwd_microstep: 1246.08 | bwd_inner_microstep: 1246.02 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402
[2024-06-10 02:56:58,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.22 | bwd_microstep: 1211.51 | bwd_inner_microstep: 1211.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 02:56:59,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.30 | bwd_microstep: 795.46 | bwd_inner_microstep: 795.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3861
[2024-06-10 02:57:01,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.66 | bwd_microstep: 1460.48 | bwd_inner_microstep: 1460.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2275
[2024-06-10 02:57:02,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.78 | bwd_microstep: 815.45 | bwd_inner_microstep: 815.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 02:57:04,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1247.63 | bwd_inner_microstep: 1247.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 02:57:05,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.35 | bwd_microstep: 793.71 | bwd_inner_microstep: 793.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958
[2024-06-10 02:57:06,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.36 | bwd_microstep: 704.10 | bwd_inner_microstep: 704.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 02:57:07,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.98 | bwd_microstep: 1289.28 | bwd_inner_microstep: 1289.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 02:57:08,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.19 | bwd_microstep: 683.80 | bwd_inner_microstep: 683.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3493
[2024-06-10 02:57:10,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.61 | bwd_microstep: 1366.24 | bwd_inner_microstep: 1366.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1941
[2024-06-10 02:57:12,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.09 | bwd_microstep: 857.79 | bwd_inner_microstep: 857.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 02:57:13,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.73 | bwd_microstep: 1408.71 | bwd_inner_microstep: 1408.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 02:57:16,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.74 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.22 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 02:57:17,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.71 | bwd_microstep: 1376.20 | bwd_inner_microstep: 1375.96 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-10 02:57:18,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.67 | bwd_microstep: 720.66 | bwd_inner_microstep: 720.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 02:57:20,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.04 | bwd_microstep: 1355.42 | bwd_inner_microstep: 1355.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2099
[2024-06-10 02:57:21,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.35 | bwd_microstep: 822.74 | bwd_inner_microstep: 822.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 02:57:23,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.72 | bwd_microstep: 1395.58 | bwd_inner_microstep: 1395.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964
[2024-06-10 02:57:24,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.14 | bwd_microstep: 704.08 | bwd_inner_microstep: 704.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 02:57:27,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1558.39 | bwd_inner_microstep: 1558.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 02:57:28,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.26 | bwd_microstep: 1387.54 | bwd_inner_microstep: 1387.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 02:57:30,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.23 | bwd_microstep: 972.06 | bwd_inner_microstep: 972.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3612
[2024-06-10 02:57:32,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.70 | bwd_microstep: 1382.94 | bwd_inner_microstep: 1382.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 02:57:34,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.60 | bwd_microstep: 1540.12 | bwd_inner_microstep: 1540.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 02:57:36,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.75 | bwd_microstep: 1455.75 | bwd_inner_microstep: 1455.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3579
[2024-06-10 02:57:38,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.41 | bwd_microstep: 1428.05 | bwd_inner_microstep: 1428.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3810
[2024-06-10 02:57:40,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.78 | bwd_microstep: 1390.17 | bwd_inner_microstep: 1390.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2193
[2024-06-10 02:57:41,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.99 | bwd_microstep: 863.27 | bwd_inner_microstep: 863.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3565
[2024-06-10 02:57:43,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.60 | bwd_microstep: 1561.79 | bwd_inner_microstep: 1561.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3567
[2024-06-10 02:57:45,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.49 | bwd_microstep: 1433.24 | bwd_inner_microstep: 1433.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3588
[2024-06-10 02:57:57,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.42 | optimizer_step: 6.61
[2024-06-10 02:57:57,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.30 | bwd_microstep: 11085.83 | bwd_inner_microstep: 1537.90 | bwd_allreduce_microstep: 9547.85 | step_microstep: 42.28
[2024-06-10 02:57:57,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14357.24 | bwd: 47804.51 | bwd_inner: 38255.35 | bwd_allreduce: 9548.32 | step: 44.37
{'loss': 1.3943, 'learning_rate': 3.9780594573799234e-05, 'epoch': 0.08}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 02:57:59,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.21 | bwd_microstep: 1373.61 | bwd_inner_microstep: 1373.46 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 02:58:01,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1370.90 | bwd_inner_microstep: 1370.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-10 02:58:03,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.10 | bwd_microstep: 1557.94 | bwd_inner_microstep: 1557.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846
[2024-06-10 02:58:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.24 | bwd_microstep: 1467.63 | bwd_inner_microstep: 1467.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871
[2024-06-10 02:58:07,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.47 | bwd_microstep: 1662.51 | bwd_inner_microstep: 1662.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 02:58:09,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.76 | bwd_microstep: 1384.64 | bwd_inner_microstep: 1384.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2083
[2024-06-10 02:58:10,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.71 | bwd_microstep: 759.48 | bwd_inner_microstep: 759.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 02:58:12,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1385.10 | bwd_inner_microstep: 1385.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 02:58:13,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.57 | bwd_microstep: 788.72 | bwd_inner_microstep: 788.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 02:58:15,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1286.35 | bwd_inner_microstep: 1286.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 02:58:17,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.82 | bwd_microstep: 1286.37 | bwd_inner_microstep: 1286.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 02:58:18,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1347.45 | bwd_inner_microstep: 1347.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3676
[2024-06-10 02:58:21,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.42 | bwd_microstep: 1533.78 | bwd_inner_microstep: 1533.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 02:58:23,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1515.91 | bwd_inner_microstep: 1515.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 02:58:25,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.33 | bwd_microstep: 1480.38 | bwd_inner_microstep: 1480.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3515
[2024-06-10 02:58:26,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.76 | bwd_microstep: 1321.53 | bwd_inner_microstep: 1321.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 02:58:28,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.91 | bwd_microstep: 1321.66 | bwd_inner_microstep: 1321.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3681
[2024-06-10 02:58:30,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.64 | bwd_microstep: 1365.68 | bwd_inner_microstep: 1365.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3511
[2024-06-10 02:58:32,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.81 | bwd_microstep: 1228.07 | bwd_inner_microstep: 1228.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 02:58:34,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1396.45 | bwd_inner_microstep: 1396.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2054
[2024-06-10 02:58:35,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.99 | bwd_microstep: 756.91 | bwd_inner_microstep: 756.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 02:58:37,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.75 | bwd_microstep: 1306.71 | bwd_inner_microstep: 1306.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 02:58:39,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.28 | bwd_microstep: 1560.27 | bwd_inner_microstep: 1560.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686
[2024-06-10 02:58:41,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1435.34 | bwd_inner_microstep: 1435.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 02:58:43,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.44 | bwd_microstep: 1379.91 | bwd_inner_microstep: 1379.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 02:58:45,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1403.00 | bwd_inner_microstep: 1402.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2932
[2024-06-10 02:58:46,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.25 | bwd_microstep: 1099.91 | bwd_inner_microstep: 1099.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 02:58:48,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1412.50 | bwd_inner_microstep: 1412.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3732
[2024-06-10 02:58:50,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.74 | bwd_microstep: 1373.97 | bwd_inner_microstep: 1373.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-10 02:58:52,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.73 | bwd_microstep: 1699.58 | bwd_inner_microstep: 1699.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2019
[2024-06-10 02:58:54,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.00 | bwd_microstep: 903.84 | bwd_inner_microstep: 903.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 02:58:57,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.33 | optimizer_step: 6.59
[2024-06-10 02:58:57,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.79 | bwd_microstep: 2605.94 | bwd_inner_microstep: 1873.95 | bwd_allreduce_microstep: 731.92 | step_microstep: 39.26
[2024-06-10 02:58:57,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16089.18 | bwd: 43772.07 | bwd_inner: 43039.10 | bwd_allreduce: 732.22 | step: 40.93
{'loss': 1.3423, 'learning_rate': 3.9775015353931374e-05, 'epoch': 0.08}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 02:58:59,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.13 | bwd_microstep: 1513.66 | bwd_inner_microstep: 1513.49 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 02:59:01,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1243.77 | bwd_inner_microstep: 1243.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842
[2024-06-10 02:59:03,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.69 | bwd_microstep: 1560.96 | bwd_inner_microstep: 1560.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-10 02:59:05,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.35 | bwd_microstep: 1187.06 | bwd_inner_microstep: 1187.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474
[2024-06-10 02:59:07,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.93 | bwd_microstep: 1412.70 | bwd_inner_microstep: 1412.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 02:59:09,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.99 | bwd_microstep: 1647.03 | bwd_inner_microstep: 1647.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 02:59:11,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1387.20 | bwd_inner_microstep: 1387.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 02:59:13,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.61 | bwd_microstep: 1531.83 | bwd_inner_microstep: 1531.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4080
[2024-06-10 02:59:15,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.33 | bwd_microstep: 1625.43 | bwd_inner_microstep: 1625.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405
[2024-06-10 02:59:17,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.47 | bwd_microstep: 1183.01 | bwd_inner_microstep: 1182.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1956
[2024-06-10 02:59:18,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.63 | bwd_microstep: 859.00 | bwd_inner_microstep: 858.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 02:59:20,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1247.27 | bwd_inner_microstep: 1247.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3396
[2024-06-10 02:59:21,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.60 | bwd_microstep: 1306.90 | bwd_inner_microstep: 1306.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 02:59:23,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.93 | bwd_microstep: 1184.21 | bwd_inner_microstep: 1184.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 02:59:25,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.71 | bwd_microstep: 1415.76 | bwd_inner_microstep: 1415.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 02:59:27,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.34 | bwd_microstep: 1428.76 | bwd_inner_microstep: 1428.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 02:59:29,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.95 | bwd_microstep: 1517.95 | bwd_inner_microstep: 1517.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 02:59:31,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.04 | bwd_microstep: 1290.69 | bwd_inner_microstep: 1290.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2280
[2024-06-10 02:59:32,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.58 | bwd_microstep: 785.35 | bwd_inner_microstep: 785.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 02:59:34,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.24 | bwd_microstep: 1401.07 | bwd_inner_microstep: 1401.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 02:59:36,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.23 | bwd_microstep: 1258.34 | bwd_inner_microstep: 1258.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 02:59:38,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.53 | bwd_microstep: 1659.76 | bwd_inner_microstep: 1659.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 02:59:40,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1391.69 | bwd_inner_microstep: 1391.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-10 02:59:42,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.37 | bwd_microstep: 1613.20 | bwd_inner_microstep: 1613.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3745
[2024-06-10 02:59:44,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.32 | bwd_microstep: 1682.04 | bwd_inner_microstep: 1682.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3554
[2024-06-10 02:59:46,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.89 | bwd_microstep: 1245.78 | bwd_inner_microstep: 1245.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3605
[2024-06-10 02:59:48,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.68 | bwd_microstep: 1363.07 | bwd_inner_microstep: 1363.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434
[2024-06-10 02:59:50,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.41 | bwd_microstep: 1316.55 | bwd_inner_microstep: 1316.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3581
[2024-06-10 02:59:52,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.12 | bwd_microstep: 1453.63 | bwd_inner_microstep: 1453.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 02:59:54,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 1504.87 | bwd_inner_microstep: 1504.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3579
[2024-06-10 02:59:56,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.83 | bwd_microstep: 1698.74 | bwd_inner_microstep: 1698.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 02:59:58,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-10 02:59:58,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1293.39 | bwd_inner_microstep: 1284.69 | bwd_allreduce_microstep: 8.64 | step_microstep: 38.60
[2024-06-10 02:59:58,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16588.64 | bwd: 44210.71 | bwd_inner: 44201.03 | bwd_allreduce: 8.93 | step: 40.29
{'loss': 1.3374, 'learning_rate': 3.976936648656223e-05, 'epoch': 0.08}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 03:00:00,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1338.04 | bwd_inner_microstep: 1337.90 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 03:00:02,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.44 | bwd_microstep: 1385.27 | bwd_inner_microstep: 1385.04 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 03:00:04,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.93 | bwd_microstep: 1284.44 | bwd_inner_microstep: 1284.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3825
[2024-06-10 03:00:06,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.68 | bwd_microstep: 1420.24 | bwd_inner_microstep: 1420.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 03:00:08,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.71 | bwd_microstep: 1487.44 | bwd_inner_microstep: 1487.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488
[2024-06-10 03:00:10,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.41 | bwd_microstep: 1346.60 | bwd_inner_microstep: 1346.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2226
[2024-06-10 03:00:11,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.51 | bwd_microstep: 865.46 | bwd_inner_microstep: 865.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 03:00:13,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 03:00:15,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.79 | bwd_microstep: 1484.46 | bwd_inner_microstep: 1484.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3693
[2024-06-10 03:00:17,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1332.09 | bwd_inner_microstep: 1332.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3427
[2024-06-10 03:00:18,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.59 | bwd_microstep: 1332.26 | bwd_inner_microstep: 1332.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 03:00:20,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1250.86 | bwd_inner_microstep: 1250.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2972
[2024-06-10 03:00:22,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.01 | bwd_microstep: 1156.93 | bwd_inner_microstep: 1156.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 03:00:23,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.70 | bwd_microstep: 1151.48 | bwd_inner_microstep: 1151.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434
[2024-06-10 03:00:25,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1404.40 | bwd_inner_microstep: 1404.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 03:00:27,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.71 | bwd_microstep: 1448.80 | bwd_inner_microstep: 1448.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3699
[2024-06-10 03:00:30,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.89 | bwd_microstep: 1835.30 | bwd_inner_microstep: 1835.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 03:00:31,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.44 | bwd_microstep: 805.83 | bwd_inner_microstep: 805.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 03:00:33,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.74 | bwd_microstep: 1464.84 | bwd_inner_microstep: 1464.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 03:00:35,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.63 | bwd_microstep: 1391.85 | bwd_inner_microstep: 1391.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 03:00:37,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1347.53 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-10 03:00:39,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.23 | bwd_microstep: 1441.18 | bwd_inner_microstep: 1441.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 03:00:41,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.26 | bwd_microstep: 1380.20 | bwd_inner_microstep: 1380.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 03:00:43,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.75 | bwd_microstep: 1353.80 | bwd_inner_microstep: 1353.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 03:00:44,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.17 | bwd_microstep: 1288.98 | bwd_inner_microstep: 1288.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-10 03:00:46,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.13 | bwd_microstep: 1484.34 | bwd_inner_microstep: 1484.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 03:00:49,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.97 | bwd_microstep: 1555.07 | bwd_inner_microstep: 1555.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2169
[2024-06-10 03:00:50,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.22 | bwd_microstep: 954.96 | bwd_inner_microstep: 954.72 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3594
[2024-06-10 03:00:52,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.93 | bwd_microstep: 1533.57 | bwd_inner_microstep: 1533.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3592
[2024-06-10 03:00:54,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.29 | bwd_microstep: 1675.76 | bwd_inner_microstep: 1675.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3600
[2024-06-10 03:00:57,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1671.92 | bwd_inner_microstep: 1671.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3761
[2024-06-10 03:00:59,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.24 | optimizer_step: 6.63
[2024-06-10 03:00:59,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.17 | bwd_microstep: 1646.08 | bwd_inner_microstep: 1638.11 | bwd_allreduce_microstep: 7.92 | step_microstep: 38.57
[2024-06-10 03:00:59,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16427.64 | bwd: 43909.70 | bwd_inner: 43900.34 | bwd_allreduce: 8.41 | step: 41.20
{'loss': 1.3359, 'learning_rate': 3.97636479915871e-05, 'epoch': 0.08}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1931
[2024-06-10 03:01:00,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.79 | bwd_microstep: 726.15 | bwd_inner_microstep: 725.99 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3916
[2024-06-10 03:01:02,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.71 | bwd_microstep: 1430.08 | bwd_inner_microstep: 1430.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3875
[2024-06-10 03:01:04,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.54 | bwd_microstep: 1482.59 | bwd_inner_microstep: 1482.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 03:01:05,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.12 | bwd_microstep: 791.80 | bwd_inner_microstep: 791.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 03:01:07,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.46 | bwd_microstep: 1252.25 | bwd_inner_microstep: 1252.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797
[2024-06-10 03:01:09,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.70 | bwd_microstep: 1650.02 | bwd_inner_microstep: 1649.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4165
[2024-06-10 03:01:11,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.36 | bwd_microstep: 1656.88 | bwd_inner_microstep: 1656.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 03:01:13,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.47 | bwd_microstep: 1289.85 | bwd_inner_microstep: 1289.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 03:01:15,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1390.43 | bwd_inner_microstep: 1390.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 03:01:17,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.18 | bwd_microstep: 1348.76 | bwd_inner_microstep: 1348.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 03:01:19,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1376.74 | bwd_inner_microstep: 1376.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 03:01:21,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1416.87 | bwd_inner_microstep: 1416.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 03:01:22,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.52 | bwd_microstep: 1151.25 | bwd_inner_microstep: 1151.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2136
[2024-06-10 03:01:24,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.21 | bwd_microstep: 832.77 | bwd_inner_microstep: 832.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 03:01:25,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.15 | bwd_microstep: 1349.02 | bwd_inner_microstep: 1348.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2003
[2024-06-10 03:01:27,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.08 | bwd_microstep: 859.69 | bwd_inner_microstep: 859.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3821
[2024-06-10 03:01:29,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.34 | bwd_microstep: 1606.27 | bwd_inner_microstep: 1606.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922
[2024-06-10 03:01:30,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.64 | bwd_microstep: 706.91 | bwd_inner_microstep: 706.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 03:01:31,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.63 | bwd_microstep: 1161.08 | bwd_inner_microstep: 1161.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2471
[2024-06-10 03:01:33,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.17 | bwd_microstep: 1023.23 | bwd_inner_microstep: 1023.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 03:01:35,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.32 | bwd_microstep: 1295.28 | bwd_inner_microstep: 1295.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3512
[2024-06-10 03:01:36,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.89 | bwd_microstep: 1199.00 | bwd_inner_microstep: 1198.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-10 03:01:37,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.30 | bwd_microstep: 702.41 | bwd_inner_microstep: 702.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 03:01:39,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.71 | bwd_microstep: 1561.52 | bwd_inner_microstep: 1561.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3539
[2024-06-10 03:01:41,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.90 | bwd_microstep: 1519.61 | bwd_inner_microstep: 1519.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3009
[2024-06-10 03:01:43,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.50 | bwd_microstep: 1229.99 | bwd_inner_microstep: 1229.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 03:01:45,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.55 | bwd_microstep: 1647.38 | bwd_inner_microstep: 1647.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3809
[2024-06-10 03:01:48,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.81 | bwd_microstep: 1599.13 | bwd_inner_microstep: 1599.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054
[2024-06-10 03:01:49,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.58 | bwd_microstep: 913.95 | bwd_inner_microstep: 913.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806
[2024-06-10 03:01:51,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.56 | bwd_microstep: 1749.02 | bwd_inner_microstep: 1749.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3620
[2024-06-10 03:01:53,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1487.11 | bwd_inner_microstep: 1487.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3437
[2024-06-10 03:02:00,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.38 | optimizer_step: 6.59
[2024-06-10 03:02:00,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.33 | bwd_microstep: 5963.43 | bwd_inner_microstep: 1567.02 | bwd_allreduce_microstep: 4396.34 | step_microstep: 39.91
[2024-06-10 03:02:00,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15334.91 | bwd: 45370.49 | bwd_inner: 40973.09 | bwd_allreduce: 4396.65 | step: 41.91
  7%|▋         | 129/1726 [2:18:32<27:05:35, 61.07s/it]
  8%|▊         | 130/1726 [2:19:31<26:44:57, 60.34s/it]
  8%|▊         | 131/1726 [2:20:33<27:01:27, 61.00s/it]
  8%|▊         | 132/1726 [2:21:34<26:54:22, 60.77s/it]
  8%|▊         | 133/1726 [2:22:35<26:56:36, 60.89s/it]
  8%|▊         | 134/1726 [2:23:36<26:54:17, 60.84s/it]
{'loss': 1.3472, 'learning_rate': 3.975785988914647e-05, 'epoch': 0.08}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 03:02:02,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1361.34 | bwd_inner_microstep: 1361.23 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4002
[2024-06-10 03:02:04,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.53 | bwd_microstep: 1511.71 | bwd_inner_microstep: 1511.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 03:02:06,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1278.21 | bwd_inner_microstep: 1278.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3800
[2024-06-10 03:02:08,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.03 | bwd_microstep: 1446.87 | bwd_inner_microstep: 1446.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 03:02:10,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.54 | bwd_microstep: 1350.90 | bwd_inner_microstep: 1350.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-10 03:02:12,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.18 | bwd_microstep: 1438.53 | bwd_inner_microstep: 1438.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2444
[2024-06-10 03:02:13,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.49 | bwd_microstep: 1046.69 | bwd_inner_microstep: 1046.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 03:02:15,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.29 | bwd_microstep: 1289.36 | bwd_inner_microstep: 1289.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 03:02:17,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 03:02:19,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.13 | bwd_microstep: 1543.51 | bwd_inner_microstep: 1543.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 03:02:20,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.34 | bwd_microstep: 1287.77 | bwd_inner_microstep: 1287.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2095
[2024-06-10 03:02:22,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.37 | bwd_microstep: 921.94 | bwd_inner_microstep: 921.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2119
[2024-06-10 03:02:23,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.86 | bwd_microstep: 890.53 | bwd_inner_microstep: 890.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 03:02:25,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1252.27 | bwd_inner_microstep: 1252.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3700
[2024-06-10 03:02:27,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.83 | bwd_microstep: 1690.47 | bwd_inner_microstep: 1690.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3579
[2024-06-10 03:02:29,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.08 | bwd_microstep: 1696.21 | bwd_inner_microstep: 1696.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 03:02:31,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1395.72 | bwd_inner_microstep: 1395.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 03:02:33,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1376.51 | bwd_inner_microstep: 1376.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 03:02:35,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.21 | bwd_microstep: 1603.89 | bwd_inner_microstep: 1603.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 03:02:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.17 | bwd_microstep: 1396.39 | bwd_inner_microstep: 1396.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 03:02:39,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.25 | bwd_microstep: 1397.27 | bwd_inner_microstep: 1397.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 03:02:41,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.21 | bwd_microstep: 1284.05 | bwd_inner_microstep: 1284.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2357
[2024-06-10 03:02:42,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.65 | bwd_microstep: 895.68 | bwd_inner_microstep: 895.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 03:02:44,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.63 | bwd_microstep: 974.47 | bwd_inner_microstep: 974.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-10 03:02:46,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.36 | bwd_microstep: 1600.27 | bwd_inner_microstep: 1600.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 03:02:47,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.96 | bwd_microstep: 879.40 | bwd_inner_microstep: 879.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 03:02:49,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.49 | bwd_microstep: 1483.75 | bwd_inner_microstep: 1483.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 03:02:51,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.71 | bwd_microstep: 1255.39 | bwd_inner_microstep: 1255.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 03:02:53,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1501.16 | bwd_inner_microstep: 1501.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2195
[2024-06-10 03:02:54,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.13 | bwd_microstep: 768.10 | bwd_inner_microstep: 768.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 03:02:56,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.89 | bwd_microstep: 1298.96 | bwd_inner_microstep: 1298.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 03:03:01,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.39 | optimizer_step: 6.58
[2024-06-10 03:03:01,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.69 | bwd_microstep: 4885.27 | bwd_inner_microstep: 1760.06 | bwd_allreduce_microstep: 3125.14 | step_microstep: 39.88
[2024-06-10 03:03:01,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15672.79 | bwd: 45249.69 | bwd_inner: 42123.51 | bwd_allreduce: 3125.44 | step: 41.96
{'loss': 1.3219, 'learning_rate': 3.9752002199626035e-05, 'epoch': 0.08}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2079
[2024-06-10 03:03:02,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.74 | bwd_microstep: 815.05 | bwd_inner_microstep: 814.88 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3929
[2024-06-10 03:03:05,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1590.64 | bwd_inner_microstep: 1590.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1890
[2024-06-10 03:03:06,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.35 | bwd_microstep: 772.92 | bwd_inner_microstep: 772.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 03:03:08,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1383.88 | bwd_inner_microstep: 1383.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 03:03:10,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1437.61 | bwd_inner_microstep: 1437.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 03:03:11,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.73 | bwd_microstep: 1154.12 | bwd_inner_microstep: 1153.91 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.19
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3727
[2024-06-10 03:03:13,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | bwd_microstep: 1381.18 | bwd_inner_microstep: 1381.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 03:03:15,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.88 | bwd_microstep: 1389.54 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 03:03:17,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1252.39 | bwd_inner_microstep: 1252.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751
[2024-06-10 03:03:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.51 | bwd_microstep: 1446.71 | bwd_inner_microstep: 1446.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 03:03:20,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1289.12 | bwd_inner_microstep: 1289.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 03:03:22,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.00 | bwd_microstep: 1285.96 | bwd_inner_microstep: 1285.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2495
[2024-06-10 03:03:23,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.06 | bwd_microstep: 869.69 | bwd_inner_microstep: 869.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 03:03:26,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.74 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 03:03:27,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.43 | bwd_microstep: 1257.33 | bwd_inner_microstep: 1257.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 03:03:29,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.68 | bwd_microstep: 1483.60 | bwd_inner_microstep: 1483.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 03:03:31,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.37 | bwd_microstep: 1275.77 | bwd_inner_microstep: 1275.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
[2024-06-10 03:03:33,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.17 | bwd_microstep: 1341.00 | bwd_inner_microstep: 1340.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3485
[2024-06-10 03:03:35,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.08 | bwd_microstep: 1220.85 | bwd_inner_microstep: 1220.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 03:03:37,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.83 | bwd_microstep: 1397.86 | bwd_inner_microstep: 1397.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 03:03:38,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.32 | bwd_microstep: 1293.61 | bwd_inner_microstep: 1293.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181
[2024-06-10 03:03:40,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.70 | bwd_microstep: 860.05 | bwd_inner_microstep: 860.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 03:03:41,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.21 | bwd_microstep: 1261.83 | bwd_inner_microstep: 1261.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 03:03:43,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1311.09 | bwd_inner_microstep: 1311.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 03:03:45,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.94 | bwd_microstep: 1514.84 | bwd_inner_microstep: 1514.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 03:03:47,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.39 | bwd_microstep: 1382.23 | bwd_inner_microstep: 1382.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3768
[2024-06-10 03:03:49,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.91 | bwd_microstep: 1349.92 | bwd_inner_microstep: 1349.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 03:03:51,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.43 | bwd_microstep: 1537.37 | bwd_inner_microstep: 1537.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 03:03:53,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1381.32 | bwd_inner_microstep: 1381.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3376
[2024-06-10 03:03:55,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.72 | bwd_microstep: 1242.52 | bwd_inner_microstep: 1242.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 03:03:57,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.42 | bwd_microstep: 1516.64 | bwd_inner_microstep: 1516.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3781
[2024-06-10 03:04:02,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 03:04:02,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.52 | bwd_microstep: 4625.14 | bwd_inner_microstep: 1633.18 | bwd_allreduce_microstep: 2991.90 | step_microstep: 39.50
[2024-06-10 03:04:02,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15674.66 | bwd: 44810.95 | bwd_inner: 41817.85 | bwd_allreduce: 2992.28 | step: 41.42
{'loss': 1.3062, 'learning_rate': 3.9746074943656534e-05, 'epoch': 0.08}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3491
[2024-06-10 03:04:04,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.33 | bwd_microstep: 1539.79 | bwd_inner_microstep: 1539.63 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3929
[2024-06-10 03:04:06,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.34 | bwd_microstep: 1639.42 | bwd_inner_microstep: 1639.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939
[2024-06-10 03:04:09,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.96 | bwd_microstep: 1691.90 | bwd_inner_microstep: 1691.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860
[2024-06-10 03:04:11,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.97 | bwd_microstep: 1563.19 | bwd_inner_microstep: 1563.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 03:04:13,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.23 | bwd_microstep: 1250.39 | bwd_inner_microstep: 1250.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3792
[2024-06-10 03:04:15,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.82 | bwd_microstep: 1580.33 | bwd_inner_microstep: 1580.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 03:04:17,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.93 | bwd_microstep: 1277.16 | bwd_inner_microstep: 1277.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1947
[2024-06-10 03:04:18,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.93 | bwd_microstep: 891.79 | bwd_inner_microstep: 891.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2948
[2024-06-10 03:04:20,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.34 | bwd_microstep: 1294.64 | bwd_inner_microstep: 1294.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-10 03:04:22,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.44 | bwd_microstep: 1439.39 | bwd_inner_microstep: 1439.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 03:04:23,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1343.64 | bwd_inner_microstep: 1343.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2125
[2024-06-10 03:04:25,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.51 | bwd_microstep: 1024.65 | bwd_inner_microstep: 1024.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3499
[2024-06-10 03:04:27,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1521.92 | bwd_inner_microstep: 1521.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2077
[2024-06-10 03:04:28,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.44 | bwd_microstep: 1014.01 | bwd_inner_microstep: 1013.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 03:04:30,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1354.33 | bwd_inner_microstep: 1354.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 03:04:31,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.78 | bwd_microstep: 782.93 | bwd_inner_microstep: 782.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-10 03:04:33,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.94 | bwd_microstep: 1433.39 | bwd_inner_microstep: 1433.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-10 03:04:35,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.69 | bwd_microstep: 1464.21 | bwd_inner_microstep: 1464.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1984
[2024-06-10 03:04:36,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.68 | bwd_microstep: 706.12 | bwd_inner_microstep: 706.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 03:04:38,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1410.32 | bwd_inner_microstep: 1410.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3638
[2024-06-10 03:04:40,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.48 | bwd_microstep: 1448.34 | bwd_inner_microstep: 1448.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 03:04:42,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.05 | bwd_microstep: 1400.85 | bwd_inner_microstep: 1400.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3767
[2024-06-10 03:04:44,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2541
[2024-06-10 03:04:45,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.18 | bwd_microstep: 933.98 | bwd_inner_microstep: 933.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 03:04:47,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.63 | bwd_microstep: 1450.83 | bwd_inner_microstep: 1450.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191
[2024-06-10 03:04:49,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.52 | bwd_microstep: 959.75 | bwd_inner_microstep: 959.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604
[2024-06-10 03:04:51,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.11 | bwd_microstep: 1615.49 | bwd_inner_microstep: 1615.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 03:04:53,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.40 | bwd_microstep: 1515.05 | bwd_inner_microstep: 1515.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1930
[2024-06-10 03:04:54,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.76 | bwd_microstep: 731.48 | bwd_inner_microstep: 731.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3572
[2024-06-10 03:04:56,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.26 | bwd_microstep: 1242.34 | bwd_inner_microstep: 1242.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 03:04:58,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.53 | bwd_microstep: 1481.98 | bwd_inner_microstep: 1481.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3808
[2024-06-10 03:05:04,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 5.09 | optimizer_step: 6.61
[2024-06-10 03:05:04,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.05 | bwd_microstep: 5021.31 | bwd_inner_microstep: 1679.97 | bwd_allreduce_microstep: 3341.28 | step_microstep: 40.26
[2024-06-10 03:05:04,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15655.36 | bwd: 45405.74 | bwd_inner: 42063.42 | bwd_allreduce: 3341.57 | step: 42.01
{'loss': 1.3357, 'learning_rate': 3.974007814211373e-05, 'epoch': 0.08}
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3441
[2024-06-10 03:05:05,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.02 | bwd_microstep: 1391.03 | bwd_inner_microstep: 1390.86 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 03:05:07,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 791.40 | bwd_inner_microstep: 791.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862
[2024-06-10 03:05:09,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.55 | bwd_microstep: 1466.44 | bwd_inner_microstep: 1466.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2923
[2024-06-10 03:05:10,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.86 | bwd_microstep: 1094.94 | bwd_inner_microstep: 1094.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 03:05:12,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1283.88 | bwd_inner_microstep: 1283.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 03:05:14,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.02 | bwd_microstep: 1186.74 | bwd_inner_microstep: 1186.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-10 03:05:16,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.79 | bwd_microstep: 1537.00 | bwd_inner_microstep: 1536.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 03:05:18,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1395.81 | bwd_inner_microstep: 1395.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-10 03:05:20,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.77 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 03:05:21,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1379.27 | bwd_inner_microstep: 1379.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 03:05:23,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.97 | bwd_microstep: 1387.24 | bwd_inner_microstep: 1387.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2171
[2024-06-10 03:05:25,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.09 | bwd_microstep: 954.01 | bwd_inner_microstep: 953.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 03:05:27,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.44 | bwd_microstep: 1478.88 | bwd_inner_microstep: 1478.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 03:05:29,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.55 | bwd_microstep: 1504.42 | bwd_inner_microstep: 1504.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 03:05:31,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.78 | bwd_microstep: 1488.49 | bwd_inner_microstep: 1488.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-10 03:05:33,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.59 | bwd_microstep: 1534.26 | bwd_inner_microstep: 1534.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651
[2024-06-10 03:05:35,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.72 | bwd_microstep: 1426.58 | bwd_inner_microstep: 1426.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 03:05:37,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1250.93 | bwd_inner_microstep: 1250.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 03:05:39,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1495.00 | bwd_inner_microstep: 1494.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 03:05:41,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.93 | bwd_microstep: 1503.00 | bwd_inner_microstep: 1502.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 03:05:43,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.90 | bwd_microstep: 1516.37 | bwd_inner_microstep: 1516.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2284
[2024-06-10 03:05:44,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.87 | bwd_microstep: 912.52 | bwd_inner_microstep: 912.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 03:05:46,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.05 | bwd_microstep: 1516.23 | bwd_inner_microstep: 1516.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2180
[2024-06-10 03:05:48,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.55 | bwd_microstep: 893.56 | bwd_inner_microstep: 893.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2053
[2024-06-10 03:05:49,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.07 | bwd_microstep: 960.31 | bwd_inner_microstep: 960.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 03:05:51,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1511.78 | bwd_inner_microstep: 1511.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 03:05:53,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.93 | bwd_microstep: 1502.74 | bwd_inner_microstep: 1502.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2275
[2024-06-10 03:05:55,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.66 | bwd_microstep: 1070.30 | bwd_inner_microstep: 1070.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 03:05:57,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.07 | bwd_microstep: 1601.97 | bwd_inner_microstep: 1601.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.23
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 03:05:59,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.62 | bwd_microstep: 1540.95 | bwd_inner_microstep: 1540.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 03:06:01,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.87 | bwd_microstep: 1545.79 | bwd_inner_microstep: 1545.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1931
[2024-06-10 03:06:06,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.40 | optimizer_step: 6.62
[2024-06-10 03:06:06,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.13 | bwd_microstep: 4848.40 | bwd_inner_microstep: 874.44 | bwd_allreduce_microstep: 3973.89 | step_microstep: 42.27
[2024-06-10 03:06:06,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15853.61 | bwd: 46420.02 | bwd_inner: 42445.06 | bwd_allreduce: 3974.19 | step: 44.20
{'loss': 1.3064, 'learning_rate': 3.973401181611832e-05, 'epoch': 0.08}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 03:06:08,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.19 | bwd_microstep: 1477.57 | bwd_inner_microstep: 1477.41 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3915
[2024-06-10 03:06:10,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.78 | bwd_microstep: 1523.30 | bwd_inner_microstep: 1523.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 03:06:12,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.01 | bwd_microstep: 1383.69 | bwd_inner_microstep: 1383.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 03:06:13,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.92 | bwd_microstep: 797.73 | bwd_inner_microstep: 797.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438
[2024-06-10 03:06:15,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.12 | bwd_microstep: 1153.84 | bwd_inner_microstep: 1153.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 03:06:17,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1289.27 | bwd_inner_microstep: 1289.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 03:06:18,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.47 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3763
[2024-06-10 03:06:20,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.60 | bwd_microstep: 1347.68 | bwd_inner_microstep: 1347.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 03:06:22,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1397.93 | bwd_inner_microstep: 1397.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 03:06:24,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.82 | bwd_microstep: 1351.24 | bwd_inner_microstep: 1351.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 03:06:25,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.17 | bwd_microstep: 1294.39 | bwd_inner_microstep: 1294.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1989
[2024-06-10 03:06:27,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.18 | bwd_microstep: 866.61 | bwd_inner_microstep: 866.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3680
[2024-06-10 03:06:28,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.91 | bwd_microstep: 1328.12 | bwd_inner_microstep: 1328.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 03:06:30,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.91 | bwd_microstep: 1501.42 | bwd_inner_microstep: 1501.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 03:06:32,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1489.98 | bwd_inner_microstep: 1489.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3383
[2024-06-10 03:06:34,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.34 | bwd_microstep: 1437.30 | bwd_inner_microstep: 1437.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3399
[2024-06-10 03:06:36,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.85 | bwd_microstep: 1445.64 | bwd_inner_microstep: 1445.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.26
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 03:06:38,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.02 | bwd_microstep: 1252.96 | bwd_inner_microstep: 1252.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 03:06:40,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1414.88 | bwd_inner_microstep: 1414.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696
[2024-06-10 03:06:42,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.09 | bwd_microstep: 1533.45 | bwd_inner_microstep: 1533.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627
[2024-06-10 03:06:44,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.39 | bwd_microstep: 1316.92 | bwd_inner_microstep: 1316.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 03:06:46,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.31 | bwd_microstep: 1557.75 | bwd_inner_microstep: 1557.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 03:06:48,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1401.57 | bwd_inner_microstep: 1401.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 03:06:50,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1298.08 | bwd_inner_microstep: 1298.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454
[2024-06-10 03:06:52,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.25 | bwd_microstep: 1385.54 | bwd_inner_microstep: 1385.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525
[2024-06-10 03:06:54,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.58 | bwd_microstep: 1334.77 | bwd_inner_microstep: 1334.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2177
[2024-06-10 03:06:55,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.95 | bwd_microstep: 859.78 | bwd_inner_microstep: 859.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3590
[2024-06-10 03:06:57,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.89 | bwd_microstep: 1534.54 | bwd_inner_microstep: 1534.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 03:06:59,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.68 | bwd_microstep: 1409.87 | bwd_inner_microstep: 1409.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-10 03:07:01,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.75 | bwd_microstep: 1602.69 | bwd_inner_microstep: 1602.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-10 03:07:03,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.84 | bwd_microstep: 1608.92 | bwd_inner_microstep: 1608.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3813
[2024-06-10 03:07:07,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 03:07:07,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 2536.91 | bwd_inner_microstep: 1447.01 | bwd_allreduce_microstep: 1089.85 | step_microstep: 38.83
[2024-06-10 03:07:07,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16051.09 | bwd: 43935.17 | bwd_inner: 42844.26 | bwd_allreduce: 1090.14 | step: 41.01
{'loss': 1.3682, 'learning_rate': 3.972787598703589e-05, 'epoch': 0.08}
  8%|▊         | 135/1726 [2:24:37<26:55:08, 60.91s/it]
  8%|▊         | 136/1726 [2:25:38<26:57:12, 61.03s/it]
  8%|▊         | 137/1726 [2:26:39<26:54:48, 60.97s/it]
  8%|▊         | 138/1726 [2:27:40<26:57:29, 61.11s/it]
  8%|▊         | 139/1726 [2:28:43<27:08:43, 61.58s/it]
  8%|▊         | 140/1726 [2:29:43<26:58:04, 61.21s/it]
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544
[2024-06-10 03:07:09,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1418.10 | bwd_inner_microstep: 1417.89 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 03:07:10,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1396.61 | bwd_inner_microstep: 1396.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 03:07:12,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.88 | bwd_microstep: 1391.57 | bwd_inner_microstep: 1391.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 03:07:14,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.40 | bwd_microstep: 1386.53 | bwd_inner_microstep: 1386.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 03:07:16,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.45 | bwd_microstep: 1486.29 | bwd_inner_microstep: 1486.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3758
[2024-06-10 03:07:18,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.04 | bwd_microstep: 1466.22 | bwd_inner_microstep: 1466.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740
[2024-06-10 03:07:21,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.00 | bwd_microstep: 1635.96 | bwd_inner_microstep: 1635.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 03:07:22,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1348.92 | bwd_inner_microstep: 1348.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 03:07:24,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.97 | bwd_microstep: 795.31 | bwd_inner_microstep: 795.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 03:07:25,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1282.38 | bwd_inner_microstep: 1282.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4105
[2024-06-10 03:07:28,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.30 | bwd_microstep: 1736.61 | bwd_inner_microstep: 1736.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 4068
[2024-06-10 03:07:30,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.08 | bwd_microstep: 1479.25 | bwd_inner_microstep: 1479.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3431
[2024-06-10 03:07:32,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.30 | bwd_microstep: 1422.93 | bwd_inner_microstep: 1422.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664
[2024-06-10 03:07:34,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.32 | bwd_microstep: 1615.10 | bwd_inner_microstep: 1615.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3670
[2024-06-10 03:07:36,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.83 | bwd_microstep: 1547.74 | bwd_inner_microstep: 1547.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3578
[2024-06-10 03:07:38,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.89 | bwd_microstep: 1701.38 | bwd_inner_microstep: 1701.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 03:07:40,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1452.65 | bwd_inner_microstep: 1452.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3141
[2024-06-10 03:07:42,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.98 | bwd_microstep: 1315.11 | bwd_inner_microstep: 1315.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 03:07:44,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1396.54 | bwd_inner_microstep: 1396.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3625
[2024-06-10 03:07:46,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.40 | bwd_microstep: 1223.36 | bwd_inner_microstep: 1223.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 03:07:48,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.46 | bwd_microstep: 1281.50 | bwd_inner_microstep: 1281.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1944
[2024-06-10 03:07:49,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.98 | bwd_microstep: 744.39 | bwd_inner_microstep: 744.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 03:07:51,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1380.70 | bwd_inner_microstep: 1380.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3639
[2024-06-10 03:07:53,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.97 | bwd_microstep: 1441.94 | bwd_inner_microstep: 1441.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3389
[2024-06-10 03:07:55,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.14 | bwd_microstep: 1367.55 | bwd_inner_microstep: 1367.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 03:07:57,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.04 | bwd_microstep: 1556.87 | bwd_inner_microstep: 1556.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 03:07:58,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.11 | bwd_microstep: 809.39 | bwd_inner_microstep: 809.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3626
[2024-06-10 03:08:00,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.99 | bwd_microstep: 1476.55 | bwd_inner_microstep: 1476.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1917
[2024-06-10 03:08:01,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.91 | bwd_microstep: 720.74 | bwd_inner_microstep: 720.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 03:08:03,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.12 | bwd_microstep: 1394.08 | bwd_inner_microstep: 1394.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3597
[2024-06-10 03:08:05,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.06 | bwd_microstep: 1342.07 | bwd_inner_microstep: 1342.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3573
[2024-06-10 03:08:08,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 03:08:08,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 2806.91 | bwd_inner_microstep: 1417.13 | bwd_allreduce_microstep: 1389.72 | step_microstep: 38.87
[2024-06-10 03:08:08,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16270.44 | bwd: 44821.26 | bwd_inner: 43430.46 | bwd_allreduce: 1390.04 | step: 40.80
{'loss': 1.3175, 'learning_rate': 3.972167067647678e-05, 'epoch': 0.08}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 03:08:10,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1368.96 | bwd_inner_microstep: 1368.80 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 03:08:12,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.40 | bwd_microstep: 1482.98 | bwd_inner_microstep: 1482.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 03:08:14,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.02 | bwd_microstep: 1380.61 | bwd_inner_microstep: 1380.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2020
[2024-06-10 03:08:15,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.09 | bwd_microstep: 745.49 | bwd_inner_microstep: 745.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 03:08:17,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.39 | bwd_microstep: 1249.13 | bwd_inner_microstep: 1249.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 03:08:18,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.14 | bwd_microstep: 796.32 | bwd_inner_microstep: 796.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 03:08:19,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.01 | bwd_microstep: 698.35 | bwd_inner_microstep: 698.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 03:08:20,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.03 | bwd_microstep: 1251.68 | bwd_inner_microstep: 1251.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2491
[2024-06-10 03:08:22,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.82 | bwd_microstep: 1027.19 | bwd_inner_microstep: 1027.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 03:08:24,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1359.69 | bwd_inner_microstep: 1359.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3508
[2024-06-10 03:08:26,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.45 | bwd_microstep: 1457.21 | bwd_inner_microstep: 1457.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454
[2024-06-10 03:08:28,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.78 | bwd_microstep: 1379.09 | bwd_inner_microstep: 1379.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3505
[2024-06-10 03:08:30,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.00 | bwd_microstep: 1448.18 | bwd_inner_microstep: 1448.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3500
[2024-06-10 03:08:32,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.17 | bwd_microstep: 1445.05 | bwd_inner_microstep: 1445.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 03:08:34,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.33 | bwd_microstep: 1383.00 | bwd_inner_microstep: 1382.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 03:08:35,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.72 | bwd_microstep: 1283.70 | bwd_inner_microstep: 1283.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1921
[2024-06-10 03:08:36,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.42 | bwd_microstep: 758.31 | bwd_inner_microstep: 758.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 03:08:39,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.40 | bwd_microstep: 1612.69 | bwd_inner_microstep: 1612.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 03:08:41,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.94 | bwd_microstep: 1492.85 | bwd_inner_microstep: 1492.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2289
[2024-06-10 03:08:42,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.05 | bwd_microstep: 1072.28 | bwd_inner_microstep: 1072.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-10 03:08:44,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1417.74 | bwd_inner_microstep: 1417.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 03:08:46,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.31 | bwd_microstep: 1502.81 | bwd_inner_microstep: 1502.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 03:08:48,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.09 | bwd_microstep: 1292.70 | bwd_inner_microstep: 1292.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-10 03:08:50,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.76 | bwd_microstep: 1304.84 | bwd_inner_microstep: 1304.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 03:08:52,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1383.81 | bwd_inner_microstep: 1383.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 03:08:53,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.57 | bwd_microstep: 700.38 | bwd_inner_microstep: 700.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 03:08:55,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.52 | bwd_microstep: 1661.61 | bwd_inner_microstep: 1661.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 03:08:57,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1408.83 | bwd_inner_microstep: 1408.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 03:08:59,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.61 | bwd_microstep: 1659.07 | bwd_inner_microstep: 1659.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589
[2024-06-10 03:09:01,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.41 | bwd_microstep: 1434.49 | bwd_inner_microstep: 1434.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042
[2024-06-10 03:09:02,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.01 | bwd_microstep: 911.00 | bwd_inner_microstep: 910.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 03:09:10,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.38 | optimizer_step: 6.60
[2024-06-10 03:09:10,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.47 | bwd_microstep: 6654.98 | bwd_inner_microstep: 1757.98 | bwd_allreduce_microstep: 4896.93 | step_microstep: 39.92
[2024-06-10 03:09:10,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15344.44 | bwd: 46025.05 | bwd_inner: 41127.07 | bwd_allreduce: 4897.24 | step: 41.90
{'loss': 1.2987, 'learning_rate': 3.971539590629608e-05, 'epoch': 0.08}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-10 03:09:12,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1434.11 | bwd_inner_microstep: 1433.96 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2903
[2024-06-10 03:09:13,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.36 | bwd_microstep: 1189.41 | bwd_inner_microstep: 1189.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2259
[2024-06-10 03:09:15,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.14 | bwd_microstep: 933.94 | bwd_inner_microstep: 933.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 03:09:16,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.67 | bwd_microstep: 1276.83 | bwd_inner_microstep: 1276.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3765
[2024-06-10 03:09:18,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.98 | bwd_microstep: 1346.40 | bwd_inner_microstep: 1346.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 03:09:19,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.88 | bwd_microstep: 792.46 | bwd_inner_microstep: 792.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 03:09:21,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.46 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 03:09:23,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1245.64 | bwd_inner_microstep: 1245.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762
[2024-06-10 03:09:25,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.76 | bwd_microstep: 1474.09 | bwd_inner_microstep: 1474.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 03:09:27,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.95 | bwd_microstep: 1428.98 | bwd_inner_microstep: 1428.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502
[2024-06-10 03:09:29,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1322.88 | bwd_inner_microstep: 1322.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-10 03:09:31,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.97 | bwd_microstep: 1420.00 | bwd_inner_microstep: 1419.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3516
[2024-06-10 03:09:33,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.66 | bwd_microstep: 1451.63 | bwd_inner_microstep: 1451.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2658
[2024-06-10 03:09:34,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.22 | bwd_microstep: 1214.77 | bwd_inner_microstep: 1214.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3526
[2024-06-10 03:09:36,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.26 | bwd_microstep: 1452.84 | bwd_inner_microstep: 1452.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423
[2024-06-10 03:09:38,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.49 | bwd_microstep: 1315.20 | bwd_inner_microstep: 1315.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 03:09:40,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1389.88 | bwd_inner_microstep: 1389.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1910
[2024-06-10 03:09:41,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.51 | bwd_microstep: 720.92 | bwd_inner_microstep: 720.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 03:09:43,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1381.92 | bwd_inner_microstep: 1381.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2306
[2024-06-10 03:09:44,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.36 | bwd_microstep: 891.72 | bwd_inner_microstep: 891.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 03:09:46,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.63 | bwd_microstep: 1260.28 | bwd_inner_microstep: 1260.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 03:09:48,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.03 | bwd_microstep: 1184.61 | bwd_inner_microstep: 1184.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 03:09:50,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.04 | bwd_microstep: 1361.90 | bwd_inner_microstep: 1361.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 03:09:52,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.81 | bwd_microstep: 1561.11 | bwd_inner_microstep: 1561.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-10 03:09:54,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.64 | bwd_microstep: 1581.96 | bwd_inner_microstep: 1581.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 03:09:55,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.67 | bwd_microstep: 688.01 | bwd_inner_microstep: 687.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 03:09:57,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.92 | bwd_microstep: 1564.47 | bwd_inner_microstep: 1564.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812
[2024-06-10 03:09:59,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.35 | bwd_microstep: 1620.11 | bwd_inner_microstep: 1620.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3805
[2024-06-10 03:10:02,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 675.47 | bwd_microstep: 1858.41 | bwd_inner_microstep: 1858.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3779
[2024-06-10 03:10:04,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.33 | bwd_microstep: 1626.98 | bwd_inner_microstep: 1626.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-10 03:10:06,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.79 | bwd_microstep: 1598.10 | bwd_inner_microstep: 1598.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624
[2024-06-10 03:10:11,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.41 | optimizer_step: 6.59
[2024-06-10 03:10:11,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.77 | bwd_microstep: 4083.39 | bwd_inner_microstep: 2020.81 | bwd_allreduce_microstep: 2062.51 | step_microstep: 42.15
[2024-06-10 03:10:11,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15891.26 | bwd: 44956.57 | bwd_inner: 42893.01 | bwd_allreduce: 2062.80 | step: 44.33
{'loss': 1.3612, 'learning_rate': 3.970905169859348e-05, 'epoch': 0.08}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 03:10:13,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.30 | bwd_microstep: 1486.25 | bwd_inner_microstep: 1486.04 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3923
[2024-06-10 03:10:15,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.51 | bwd_microstep: 1491.52 | bwd_inner_microstep: 1491.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 03:10:17,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.80 | bwd_microstep: 1557.03 | bwd_inner_microstep: 1557.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 03:10:19,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.57 | bwd_microstep: 1544.12 | bwd_inner_microstep: 1544.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2627
[2024-06-10 03:10:21,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.05 | bwd_microstep: 1068.28 | bwd_inner_microstep: 1068.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 03:10:23,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1255.97 | bwd_inner_microstep: 1255.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 03:10:24,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.65 | bwd_microstep: 1289.12 | bwd_inner_microstep: 1289.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3752
[2024-06-10 03:10:26,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.56 | bwd_microstep: 1249.74 | bwd_inner_microstep: 1249.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799
[2024-06-10 03:10:28,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1553.58 | bwd_inner_microstep: 1553.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 03:10:30,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.34 | bwd_microstep: 1390.93 | bwd_inner_microstep: 1390.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3678
[2024-06-10 03:10:32,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.29 | bwd_microstep: 1493.07 | bwd_inner_microstep: 1492.97 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.22
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 03:10:34,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.05 | bwd_microstep: 1491.29 | bwd_inner_microstep: 1491.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3425
[2024-06-10 03:10:36,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.64 | bwd_microstep: 1401.39 | bwd_inner_microstep: 1401.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-10 03:10:37,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.85 | bwd_microstep: 681.62 | bwd_inner_microstep: 681.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 03:10:39,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.38 | bwd_microstep: 1461.17 | bwd_inner_microstep: 1461.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 03:10:41,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1282.20 | bwd_inner_microstep: 1282.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 03:10:43,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.60 | bwd_microstep: 1389.90 | bwd_inner_microstep: 1389.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 03:10:45,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1301.17 | bwd_inner_microstep: 1301.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 03:10:47,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.49 | bwd_microstep: 1513.35 | bwd_inner_microstep: 1513.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 03:10:49,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.31 | bwd_microstep: 1435.66 | bwd_inner_microstep: 1435.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 03:10:51,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1614.40 | bwd_inner_microstep: 1614.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479
[2024-06-10 03:10:53,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.43 | bwd_microstep: 1318.28 | bwd_inner_microstep: 1318.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-10 03:10:54,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.68 | bwd_microstep: 704.37 | bwd_inner_microstep: 704.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 585
[2024-06-10 03:10:54,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.01 | bwd_microstep: 258.45 | bwd_inner_microstep: 258.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 03:10:57,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.87 | bwd_microstep: 1660.48 | bwd_inner_microstep: 1660.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2276
[2024-06-10 03:10:58,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.57 | bwd_microstep: 941.39 | bwd_inner_microstep: 941.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3562
[2024-06-10 03:11:00,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.18 | bwd_microstep: 1449.35 | bwd_inner_microstep: 1449.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 03:11:02,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.90 | bwd_microstep: 1554.91 | bwd_inner_microstep: 1554.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3565
[2024-06-10 03:11:04,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.52 | bwd_microstep: 1697.00 | bwd_inner_microstep: 1696.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 03:11:06,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.61 | bwd_microstep: 1497.31 | bwd_inner_microstep: 1497.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3428
[2024-06-10 03:11:08,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.66 | bwd_microstep: 1543.98 | bwd_inner_microstep: 1543.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 03:11:11,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 03:11:11,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.20 | bwd_microstep: 1482.25 | bwd_inner_microstep: 1474.44 | bwd_allreduce_microstep: 7.76 | step_microstep: 38.64
[2024-06-10 03:11:11,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16111.64 | bwd: 43059.59 | bwd_inner: 43050.65 | bwd_allreduce: 8.12 | step: 40.82
{'loss': 1.3015, 'learning_rate': 3.9702638075713265e-05, 'epoch': 0.08}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3406
[2024-06-10 03:11:12,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1370.96 | bwd_inner_microstep: 1370.81 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4011
[2024-06-10 03:11:15,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.39 | bwd_microstep: 1509.92 | bwd_inner_microstep: 1509.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1101
[2024-06-10 03:11:15,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 172.39 | bwd_microstep: 442.61 | bwd_inner_microstep: 442.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 03:11:17,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 03:11:19,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.04 | bwd_microstep: 1483.51 | bwd_inner_microstep: 1483.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3409
[2024-06-10 03:11:21,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.41 | bwd_microstep: 1216.00 | bwd_inner_microstep: 1215.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 03:11:23,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.05 | bwd_microstep: 1643.02 | bwd_inner_microstep: 1642.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-10 03:11:25,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.63 | bwd_microstep: 1431.52 | bwd_inner_microstep: 1431.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3483
[2024-06-10 03:11:27,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.80 | bwd_microstep: 1415.98 | bwd_inner_microstep: 1415.65 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 03:11:29,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1249.51 | bwd_inner_microstep: 1249.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
[2024-06-10 03:11:30,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.01 | bwd_microstep: 1336.52 | bwd_inner_microstep: 1336.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920
[2024-06-10 03:11:31,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.15 | bwd_microstep: 728.59 | bwd_inner_microstep: 728.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377
[2024-06-10 03:11:33,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1242.27 | bwd_inner_microstep: 1242.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3049
[2024-06-10 03:11:35,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.34 | bwd_microstep: 1139.50 | bwd_inner_microstep: 1139.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 03:11:37,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.02 | bwd_microstep: 1346.60 | bwd_inner_microstep: 1346.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 03:11:39,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1515.29 | bwd_inner_microstep: 1515.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 03:11:40,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.32 | bwd_microstep: 1264.25 | bwd_inner_microstep: 1264.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 03:11:43,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.54 | bwd_microstep: 1625.26 | bwd_inner_microstep: 1625.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3669
[2024-06-10 03:11:45,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.39 | bwd_microstep: 1330.46 | bwd_inner_microstep: 1330.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3613
[2024-06-10 03:11:46,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.74 | bwd_microstep: 1216.52 | bwd_inner_microstep: 1216.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 03:11:48,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 1494.27 | bwd_inner_microstep: 1494.10 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.11
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3829
[2024-06-10 03:11:51,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.04 | bwd_microstep: 1592.21 | bwd_inner_microstep: 1592.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702
[2024-06-10 03:11:52,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.60 | bwd_microstep: 1436.93 | bwd_inner_microstep: 1436.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 03:11:55,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.09 | bwd_microstep: 1513.10 | bwd_inner_microstep: 1513.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-10 03:11:57,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.30 | bwd_microstep: 1408.10 | bwd_inner_microstep: 1408.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3682
[2024-06-10 03:11:59,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.23 | bwd_microstep: 1459.10 | bwd_inner_microstep: 1459.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 03:12:00,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.61 | bwd_microstep: 1316.67 | bwd_inner_microstep: 1316.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777
[2024-06-10 03:12:03,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1553.34 | bwd_inner_microstep: 1553.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3690
[2024-06-10 03:12:05,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.57 | bwd_microstep: 1505.06 | bwd_inner_microstep: 1505.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 03:12:07,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.70 | bwd_microstep: 1497.72 | bwd_inner_microstep: 1497.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3478
[2024-06-10 03:12:09,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.11 | bwd_microstep: 1342.21 | bwd_inner_microstep: 1342.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3582
[2024-06-10 03:12:13,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 03:12:13,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.88 | bwd_microstep: 4000.28 | bwd_inner_microstep: 1960.76 | bwd_allreduce_microstep: 2039.46 | step_microstep: 39.19
[2024-06-10 03:12:13,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16370.13 | bwd: 45910.68 | bwd_inner: 43869.83 | bwd_allreduce: 2039.93 | step: 41.27
{'loss': 1.3505, 'learning_rate': 3.9696155060244166e-05, 'epoch': 0.08}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1934
[2024-06-10 03:12:14,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.67 | bwd_microstep: 882.19 | bwd_inner_microstep: 882.03 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3412
[2024-06-10 03:12:16,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.79 | bwd_microstep: 1179.02 | bwd_inner_microstep: 1178.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2730
[2024-06-10 03:12:18,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.65 | bwd_microstep: 1039.06 | bwd_inner_microstep: 1039.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 03:12:20,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.92 | bwd_microstep: 1547.89 | bwd_inner_microstep: 1547.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 03:12:21,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3856
[2024-06-10 03:12:23,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.54 | bwd_microstep: 1465.33 | bwd_inner_microstep: 1465.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2902
[2024-06-10 03:12:25,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.41 | bwd_microstep: 1000.35 | bwd_inner_microstep: 1000.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732
[2024-06-10 03:12:27,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.99 | bwd_microstep: 1532.61 | bwd_inner_microstep: 1532.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486
[2024-06-10 03:12:29,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.82 | bwd_microstep: 1191.52 | bwd_inner_microstep: 1191.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3704
[2024-06-10 03:12:31,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.12 | bwd_microstep: 1559.39 | bwd_inner_microstep: 1559.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 03:12:33,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.60 | bwd_microstep: 1319.35 | bwd_inner_microstep: 1319.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-10 03:12:35,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1419.30 | bwd_inner_microstep: 1419.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4090
[2024-06-10 03:12:37,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.24 | bwd_microstep: 1537.32 | bwd_inner_microstep: 1537.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-10 03:12:39,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.03 | bwd_microstep: 1619.57 | bwd_inner_microstep: 1619.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 03:12:41,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.67 | bwd_microstep: 1268.28 | bwd_inner_microstep: 1268.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 03:12:43,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1397.11 | bwd_inner_microstep: 1397.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 03:12:44,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1378.97 | bwd_inner_microstep: 1378.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381
[2024-06-10 03:12:46,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.74 | bwd_microstep: 1243.41 | bwd_inner_microstep: 1243.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3023
[2024-06-10 03:12:48,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1399.24 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2409
[2024-06-10 03:12:50,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.28 | bwd_microstep: 1031.34 | bwd_inner_microstep: 1031.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 03:12:52,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1525.46 | bwd_inner_microstep: 1525.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 03:12:54,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.80 | bwd_microstep: 1567.74 | bwd_inner_microstep: 1567.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.27
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2186
[2024-06-10 03:12:55,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.43 | bwd_microstep: 862.34 | bwd_inner_microstep: 862.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 03:12:57,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.36 | bwd_microstep: 1301.96 | bwd_inner_microstep: 1301.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 03:12:59,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.94 | bwd_microstep: 1427.40 | bwd_inner_microstep: 1427.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2294
[2024-06-10 03:13:00,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.85 | bwd_microstep: 787.56 | bwd_inner_microstep: 787.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2234
[2024-06-10 03:13:01,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.25 | bwd_microstep: 868.44 | bwd_inner_microstep: 868.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 03:13:03,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.15 | bwd_microstep: 1406.11 | bwd_inner_microstep: 1406.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 03:13:05,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.86 | bwd_microstep: 1303.49 | bwd_inner_microstep: 1303.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 03:13:07,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.19 | bwd_microstep: 1406.17 | bwd_inner_microstep: 1406.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2688
[2024-06-10 03:13:08,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.63 | bwd_microstep: 1125.49 | bwd_inner_microstep: 1125.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906
[2024-06-10 03:13:14,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.39 | optimizer_step: 6.58
[2024-06-10 03:13:14,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.81 | bwd_microstep: 4975.67 | bwd_inner_microstep: 891.21 | bwd_allreduce_microstep: 4084.39 | step_microstep: 40.20
[2024-06-10 03:13:14,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15314.85 | bwd: 44816.18 | bwd_inner: 40730.69 | bwd_allreduce: 4084.68 | step: 42.77
{'loss': 1.3889, 'learning_rate': 3.968960267501933e-05, 'epoch': 0.08}
  8%|▊         | 141/1726 [2:30:45<26:59:08, 61.29s/it]
  8%|▊         | 142/1726 [2:31:47<27:01:43, 61.43s/it]
  8%|▊         | 143/1726 [2:32:48<26:59:10, 61.37s/it]
  8%|▊         | 144/1726 [2:33:47<26:43:49, 60.83s/it]
  8%|▊         | 145/1726 [2:34:50<26:57:19, 61.38s/it]
  8%|▊         | 146/1726 [2:35:50<26:49:31, 61.12s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 03:13:16,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.95 | bwd_microstep: 1335.35 | bwd_inner_microstep: 1335.27 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4638
[2024-06-10 03:13:18,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.60 | bwd_microstep: 1666.97 | bwd_inner_microstep: 1666.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3801
[2024-06-10 03:13:20,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.43 | bwd_microstep: 1507.79 | bwd_inner_microstep: 1507.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 03:13:22,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.13 | bwd_microstep: 1656.71 | bwd_inner_microstep: 1656.62 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.23
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429
[2024-06-10 03:13:24,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.17 | bwd_microstep: 1157.78 | bwd_inner_microstep: 1157.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 03:13:26,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.69 | bwd_microstep: 1284.21 | bwd_inner_microstep: 1284.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 03:13:28,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.23 | bwd_microstep: 1457.34 | bwd_inner_microstep: 1457.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 03:13:30,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.92 | bwd_microstep: 1391.07 | bwd_inner_microstep: 1391.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1902
[2024-06-10 03:13:31,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.21 | bwd_microstep: 751.59 | bwd_inner_microstep: 751.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 03:13:32,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.33 | bwd_microstep: 1250.94 | bwd_inner_microstep: 1250.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491
[2024-06-10 03:13:34,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1416.10 | bwd_inner_microstep: 1416.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3499
[2024-06-10 03:13:36,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.46 | bwd_microstep: 1446.93 | bwd_inner_microstep: 1446.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006
[2024-06-10 03:13:37,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.63 | bwd_microstep: 840.38 | bwd_inner_microstep: 840.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3724
[2024-06-10 03:13:40,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.42 | bwd_microstep: 1735.01 | bwd_inner_microstep: 1734.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3685
[2024-06-10 03:13:42,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.16 | bwd_microstep: 1722.90 | bwd_inner_microstep: 1722.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 03:13:44,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.42 | bwd_microstep: 1294.19 | bwd_inner_microstep: 1294.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 03:13:46,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.24 | bwd_microstep: 1415.37 | bwd_inner_microstep: 1415.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3633
[2024-06-10 03:13:48,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.90 | bwd_microstep: 1319.80 | bwd_inner_microstep: 1319.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3647
[2024-06-10 03:13:50,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.55 | bwd_microstep: 1321.03 | bwd_inner_microstep: 1321.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 03:13:51,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.06 | bwd_microstep: 1283.77 | bwd_inner_microstep: 1283.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3829
[2024-06-10 03:13:54,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.92 | bwd_microstep: 1616.56 | bwd_inner_microstep: 1616.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 03:13:55,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.68 | bwd_microstep: 1286.90 | bwd_inner_microstep: 1286.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 03:13:58,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.17 | bwd_microstep: 1498.20 | bwd_inner_microstep: 1498.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 03:13:59,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.86 | bwd_microstep: 977.27 | bwd_inner_microstep: 977.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 03:14:01,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.08 | bwd_microstep: 1478.19 | bwd_inner_microstep: 1478.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 03:14:03,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.81 | bwd_microstep: 1451.87 | bwd_inner_microstep: 1451.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 03:14:05,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.46 | bwd_microstep: 1656.27 | bwd_inner_microstep: 1656.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3666
[2024-06-10 03:14:07,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.31 | bwd_microstep: 1587.09 | bwd_inner_microstep: 1587.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-10 03:14:10,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.23 | bwd_microstep: 1549.57 | bwd_inner_microstep: 1549.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2031
[2024-06-10 03:14:11,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.39 | bwd_microstep: 873.86 | bwd_inner_microstep: 873.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 03:14:13,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.83 | bwd_microstep: 1636.92 | bwd_inner_microstep: 1636.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 03:14:15,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.22 | optimizer_step: 6.65
[2024-06-10 03:14:15,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.68 | bwd_microstep: 1646.00 | bwd_inner_microstep: 1638.08 | bwd_allreduce_microstep: 7.88 | step_microstep: 38.81
[2024-06-10 03:14:15,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16627.00 | bwd: 44513.98 | bwd_inner: 44505.05 | bwd_allreduce: 8.19 | step: 40.96
{'loss': 1.3667, 'learning_rate': 3.9682980943116236e-05, 'epoch': 0.09}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3394
[2024-06-10 03:14:17,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.55 | bwd_microstep: 1305.14 | bwd_inner_microstep: 1305.01 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-10 03:14:18,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.84 | bwd_microstep: 807.30 | bwd_inner_microstep: 807.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3849
[2024-06-10 03:14:20,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1462.67 | bwd_inner_microstep: 1462.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 03:14:22,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1345.04 | bwd_inner_microstep: 1345.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 03:14:24,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.10 | bwd_microstep: 1656.68 | bwd_inner_microstep: 1656.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3768
[2024-06-10 03:14:26,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.50 | bwd_microstep: 1409.13 | bwd_inner_microstep: 1409.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 03:14:28,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.72 | bwd_microstep: 1350.31 | bwd_inner_microstep: 1350.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 03:14:30,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.04 | bwd_microstep: 1391.19 | bwd_inner_microstep: 1391.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 03:14:32,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.00 | bwd_microstep: 1248.98 | bwd_inner_microstep: 1248.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3399
[2024-06-10 03:14:34,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.89 | bwd_microstep: 1202.30 | bwd_inner_microstep: 1202.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 1971
[2024-06-10 03:14:35,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.43 | bwd_microstep: 879.03 | bwd_inner_microstep: 879.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742
[2024-06-10 03:14:37,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.96 | bwd_microstep: 1632.47 | bwd_inner_microstep: 1632.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3493
[2024-06-10 03:14:39,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 1445.42 | bwd_inner_microstep: 1445.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 03:14:41,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.52 | bwd_microstep: 1287.77 | bwd_inner_microstep: 1287.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 03:14:43,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.64 | bwd_microstep: 1519.98 | bwd_inner_microstep: 1519.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 03:14:45,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1252.80 | bwd_inner_microstep: 1252.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3673
[2024-06-10 03:14:47,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1474.46 | bwd_inner_microstep: 1474.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-10 03:14:49,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.66 | bwd_microstep: 1435.88 | bwd_inner_microstep: 1435.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1992
[2024-06-10 03:14:50,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.21 | bwd_microstep: 834.89 | bwd_inner_microstep: 834.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653
[2024-06-10 03:14:52,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.03 | bwd_microstep: 1429.86 | bwd_inner_microstep: 1429.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 03:14:54,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.23 | bwd_microstep: 1614.84 | bwd_inner_microstep: 1614.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 03:14:56,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.52 | bwd_microstep: 1462.58 | bwd_inner_microstep: 1462.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 03:14:57,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.94 | bwd_microstep: 809.96 | bwd_inner_microstep: 809.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 03:14:59,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.90 | bwd_microstep: 1386.09 | bwd_inner_microstep: 1386.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 03:15:01,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.51 | bwd_microstep: 1281.55 | bwd_inner_microstep: 1281.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2099
[2024-06-10 03:15:02,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.85 | bwd_microstep: 763.83 | bwd_inner_microstep: 763.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 03:15:04,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.42 | bwd_microstep: 1297.31 | bwd_inner_microstep: 1297.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 03:15:06,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.86 | bwd_microstep: 1475.73 | bwd_inner_microstep: 1475.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3568
[2024-06-10 03:15:08,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1461.43 | bwd_inner_microstep: 1461.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 03:15:10,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.02 | bwd_microstep: 1600.46 | bwd_inner_microstep: 1600.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3398
[2024-06-10 03:15:12,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.54 | bwd_microstep: 1308.55 | bwd_inner_microstep: 1308.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 03:15:16,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.43 | optimizer_step: 6.58
[2024-06-10 03:15:16,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.88 | bwd_microstep: 4097.74 | bwd_inner_microstep: 1575.66 | bwd_allreduce_microstep: 2521.98 | step_microstep: 39.90
[2024-06-10 03:15:16,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15829.36 | bwd: 44931.42 | bwd_inner: 42408.37 | bwd_allreduce: 2522.29 | step: 42.05
{'loss': 1.3265, 'learning_rate': 3.967628988785658e-05, 'epoch': 0.09}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 03:15:18,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.85 | bwd_microstep: 1393.43 | bwd_inner_microstep: 1393.32 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3398
[2024-06-10 03:15:20,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.06 | bwd_microstep: 1280.22 | bwd_inner_microstep: 1280.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 03:15:22,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1377.43 | bwd_inner_microstep: 1377.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3855
[2024-06-10 03:15:24,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.57 | bwd_microstep: 1664.75 | bwd_inner_microstep: 1664.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 03:15:26,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.18 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478
[2024-06-10 03:15:28,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.48 | bwd_microstep: 1319.12 | bwd_inner_microstep: 1319.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4046
[2024-06-10 03:15:30,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.80 | bwd_microstep: 1724.10 | bwd_inner_microstep: 1724.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2641
[2024-06-10 03:15:32,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.69 | bwd_microstep: 960.68 | bwd_inner_microstep: 960.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 03:15:34,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1253.06 | bwd_inner_microstep: 1253.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3495
[2024-06-10 03:15:35,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.32 | bwd_microstep: 1349.92 | bwd_inner_microstep: 1349.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3710
[2024-06-10 03:15:38,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.88 | bwd_microstep: 1729.31 | bwd_inner_microstep: 1729.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 03:15:40,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.97 | bwd_microstep: 1353.32 | bwd_inner_microstep: 1353.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495
[2024-06-10 03:15:42,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.73 | bwd_microstep: 1411.23 | bwd_inner_microstep: 1411.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 03:15:44,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1407.23 | bwd_inner_microstep: 1407.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3680
[2024-06-10 03:15:46,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.82 | bwd_microstep: 1587.35 | bwd_inner_microstep: 1587.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3518
[2024-06-10 03:15:48,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1518.27 | bwd_inner_microstep: 1518.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 03:15:50,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.94 | bwd_microstep: 1559.81 | bwd_inner_microstep: 1559.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 03:15:52,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.54 | bwd_microstep: 1407.89 | bwd_inner_microstep: 1407.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 03:15:54,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.39 | bwd_microstep: 1298.49 | bwd_inner_microstep: 1298.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 03:15:56,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.42 | bwd_microstep: 1454.76 | bwd_inner_microstep: 1454.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2071
[2024-06-10 03:15:57,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.19 | bwd_microstep: 821.51 | bwd_inner_microstep: 821.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 03:15:59,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1496.00 | bwd_inner_microstep: 1495.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2087
[2024-06-10 03:16:00,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.65 | bwd_microstep: 922.38 | bwd_inner_microstep: 922.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3708
[2024-06-10 03:16:02,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1463.02 | bwd_inner_microstep: 1462.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3781
[2024-06-10 03:16:04,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.18 | bwd_microstep: 1259.11 | bwd_inner_microstep: 1259.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3614
[2024-06-10 03:16:06,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.03 | bwd_microstep: 1250.84 | bwd_inner_microstep: 1250.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3564
[2024-06-10 03:16:08,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.73 | bwd_microstep: 1428.24 | bwd_inner_microstep: 1428.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 03:16:10,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.78 | bwd_microstep: 1456.27 | bwd_inner_microstep: 1456.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3588
[2024-06-10 03:16:12,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.74 | bwd_microstep: 1452.35 | bwd_inner_microstep: 1452.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3587
[2024-06-10 03:16:14,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.46 | bwd_microstep: 1368.61 | bwd_inner_microstep: 1368.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 03:16:15,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.79 | bwd_microstep: 1286.54 | bwd_inner_microstep: 1286.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2904
[2024-06-10 03:16:19,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 03:16:19,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.27 | bwd_microstep: 3126.51 | bwd_inner_microstep: 1344.94 | bwd_allreduce_microstep: 1781.51 | step_microstep: 41.42
[2024-06-10 03:16:19,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16468.95 | bwd: 45769.51 | bwd_inner: 43987.00 | bwd_allreduce: 1781.80 | step: 43.45
{'loss': 1.3359, 'learning_rate': 3.966952953280623e-05, 'epoch': 0.09}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 03:16:21,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1372.91 | bwd_inner_microstep: 1372.68 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4080
[2024-06-10 03:16:23,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.00 | bwd_microstep: 1624.25 | bwd_inner_microstep: 1624.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4402
[2024-06-10 03:16:25,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1517.44 | bwd_inner_microstep: 1517.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 03:16:27,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.35 | bwd_microstep: 1346.93 | bwd_inner_microstep: 1346.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3910
[2024-06-10 03:16:29,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1488.14 | bwd_inner_microstep: 1488.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2088
[2024-06-10 03:16:30,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.20 | bwd_microstep: 731.94 | bwd_inner_microstep: 731.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-10 03:16:32,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.77 | bwd_microstep: 1541.13 | bwd_inner_microstep: 1540.87 | bwd_allreduce_microstep: 0.14 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 03:16:33,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.74 | bwd_microstep: 792.37 | bwd_inner_microstep: 792.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 03:16:35,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.85 | bwd_microstep: 1297.23 | bwd_inner_microstep: 1297.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490
[2024-06-10 03:16:37,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.50 | bwd_microstep: 1582.16 | bwd_inner_microstep: 1582.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 03:16:40,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.30 | bwd_microstep: 1478.29 | bwd_inner_microstep: 1478.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 03:16:41,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.85 | bwd_microstep: 1341.33 | bwd_inner_microstep: 1341.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3494
[2024-06-10 03:16:44,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.61 | bwd_microstep: 1679.40 | bwd_inner_microstep: 1679.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3513
[2024-06-10 03:16:46,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.93 | bwd_microstep: 1551.27 | bwd_inner_microstep: 1551.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3462
[2024-06-10 03:16:48,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1322.91 | bwd_inner_microstep: 1322.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-10 03:16:49,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.03 | bwd_microstep: 1159.45 | bwd_inner_microstep: 1159.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618
[2024-06-10 03:16:51,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.19 | bwd_microstep: 1609.55 | bwd_inner_microstep: 1609.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 03:16:53,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.06 | bwd_microstep: 1297.97 | bwd_inner_microstep: 1297.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 03:16:55,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1508.98 | bwd_inner_microstep: 1508.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 03:16:57,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.18 | bwd_microstep: 1448.38 | bwd_inner_microstep: 1448.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 03:16:59,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.46 | bwd_microstep: 1556.45 | bwd_inner_microstep: 1556.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1969
[2024-06-10 03:17:00,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.22 | bwd_microstep: 707.59 | bwd_inner_microstep: 707.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2039
[2024-06-10 03:17:02,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.20 | bwd_microstep: 845.00 | bwd_inner_microstep: 844.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692
[2024-06-10 03:17:04,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.99 | bwd_microstep: 1531.25 | bwd_inner_microstep: 1531.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-10 03:17:06,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.11 | bwd_microstep: 1317.76 | bwd_inner_microstep: 1317.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3525
[2024-06-10 03:17:07,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.67 | bwd_microstep: 1260.52 | bwd_inner_microstep: 1260.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3582
[2024-06-10 03:17:10,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.50 | bwd_microstep: 1675.63 | bwd_inner_microstep: 1675.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 03:17:11,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1294.01 | bwd_inner_microstep: 1293.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 03:17:13,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.48 | bwd_microstep: 794.45 | bwd_inner_microstep: 794.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 03:17:15,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.52 | bwd_microstep: 1647.35 | bwd_inner_microstep: 1647.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 03:17:17,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1393.98 | bwd_inner_microstep: 1393.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-10 03:17:22,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.38 | optimizer_step: 6.59
[2024-06-10 03:17:22,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.72 | bwd_microstep: 5026.78 | bwd_inner_microstep: 1805.91 | bwd_allreduce_microstep: 3220.80 | step_microstep: 39.84
[2024-06-10 03:17:22,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16247.07 | bwd: 46742.83 | bwd_inner: 43520.71 | bwd_allreduce: 3221.27 | step: 42.04
{'loss': 1.3458, 'learning_rate': 3.9662699901775114e-05, 'epoch': 0.09}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 03:17:24,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1483.80 | bwd_inner_microstep: 1483.63 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 03:17:26,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.86 | bwd_microstep: 1312.02 | bwd_inner_microstep: 1311.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 03:17:28,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.01 | bwd_microstep: 970.58 | bwd_inner_microstep: 970.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 03:17:29,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.24 | bwd_microstep: 789.83 | bwd_inner_microstep: 789.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884
[2024-06-10 03:17:30,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.42 | bwd_microstep: 712.96 | bwd_inner_microstep: 712.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759
[2024-06-10 03:17:32,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.65 | bwd_microstep: 1445.87 | bwd_inner_microstep: 1445.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 03:17:34,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.30 | bwd_microstep: 1286.64 | bwd_inner_microstep: 1286.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 03:17:36,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.89 | bwd_microstep: 1534.04 | bwd_inner_microstep: 1534.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 03:17:37,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.76 | bwd_microstep: 1285.33 | bwd_inner_microstep: 1285.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 03:17:39,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1345.05 | bwd_inner_microstep: 1345.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-10 03:17:41,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.30 | bwd_microstep: 1421.21 | bwd_inner_microstep: 1421.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1953
[2024-06-10 03:17:42,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.53 | bwd_microstep: 894.40 | bwd_inner_microstep: 894.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3488
[2024-06-10 03:17:45,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.80 | bwd_microstep: 1679.43 | bwd_inner_microstep: 1679.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2397
[2024-06-10 03:17:46,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.10 | bwd_microstep: 1000.47 | bwd_inner_microstep: 1000.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523
[2024-06-10 03:17:48,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.03 | bwd_microstep: 1591.53 | bwd_inner_microstep: 1591.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666
[2024-06-10 03:17:50,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1331.45 | bwd_inner_microstep: 1331.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 03:17:52,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.35 | bwd_microstep: 1295.78 | bwd_inner_microstep: 1295.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 03:17:54,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1391.73 | bwd_inner_microstep: 1391.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469
[2024-06-10 03:17:56,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 1215.17 | bwd_inner_microstep: 1215.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 03:17:57,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.27 | bwd_microstep: 1279.28 | bwd_inner_microstep: 1279.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3819
[2024-06-10 03:17:59,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.09 | bwd_microstep: 1297.78 | bwd_inner_microstep: 1297.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475
[2024-06-10 03:18:01,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1315.30 | bwd_inner_microstep: 1315.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 03:18:02,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.01 | bwd_microstep: 976.37 | bwd_inner_microstep: 976.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 03:18:04,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1499.12 | bwd_inner_microstep: 1499.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 03:18:07,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.80 | bwd_microstep: 1500.59 | bwd_inner_microstep: 1500.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 03:18:09,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.74 | bwd_microstep: 1557.91 | bwd_inner_microstep: 1557.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398
[2024-06-10 03:18:11,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.38 | bwd_microstep: 1440.91 | bwd_inner_microstep: 1440.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3394
[2024-06-10 03:18:13,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1441.13 | bwd_inner_microstep: 1441.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3453
[2024-06-10 03:18:15,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.31 | bwd_microstep: 1517.75 | bwd_inner_microstep: 1517.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3564
[2024-06-10 03:18:17,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.86 | bwd_microstep: 1696.24 | bwd_inner_microstep: 1696.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 03:18:19,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.26 | bwd_microstep: 1407.99 | bwd_inner_microstep: 1407.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384
[2024-06-10 03:18:27,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.40 | optimizer_step: 6.58
[2024-06-10 03:18:27,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 7213.06 | bwd_inner_microstep: 1633.09 | bwd_allreduce_microstep: 5579.88 | step_microstep: 39.92
[2024-06-10 03:18:27,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15910.69 | bwd: 48130.74 | bwd_inner: 42549.78 | bwd_allreduce: 5580.20 | step: 41.87
{'loss': 1.3154, 'learning_rate': 3.9655801018817166e-05, 'epoch': 0.09}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 03:18:29,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1465.67 | bwd_inner_microstep: 1465.49 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2329
[2024-06-10 03:18:30,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.36 | bwd_microstep: 857.31 | bwd_inner_microstep: 857.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3895
[2024-06-10 03:18:32,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.95 | bwd_microstep: 1583.42 | bwd_inner_microstep: 1583.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 03:18:34,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.96 | bwd_microstep: 1536.66 | bwd_inner_microstep: 1536.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 03:18:36,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1244.85 | bwd_inner_microstep: 1244.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 03:18:38,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.36 | bwd_microstep: 1387.52 | bwd_inner_microstep: 1387.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 03:18:40,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.36 | bwd_microstep: 1616.20 | bwd_inner_microstep: 1616.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2641
[2024-06-10 03:18:42,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.63 | bwd_microstep: 1114.03 | bwd_inner_microstep: 1114.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 03:18:44,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1248.45 | bwd_inner_microstep: 1248.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 03:18:45,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1381.23 | bwd_inner_microstep: 1381.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3412
[2024-06-10 03:18:47,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.43 | bwd_microstep: 1377.49 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3502
[2024-06-10 03:18:49,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.54 | bwd_microstep: 1555.22 | bwd_inner_microstep: 1555.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2945
[2024-06-10 03:18:51,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.80 | bwd_microstep: 1196.87 | bwd_inner_microstep: 1196.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 03:18:53,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.48 | bwd_microstep: 1483.63 | bwd_inner_microstep: 1483.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.26
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 03:18:55,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.12 | bwd_microstep: 1482.59 | bwd_inner_microstep: 1482.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-10 03:18:57,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.82 | bwd_microstep: 1289.59 | bwd_inner_microstep: 1289.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-10 03:18:59,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.98 | bwd_microstep: 1560.56 | bwd_inner_microstep: 1560.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 03:19:01,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.33 | bwd_microstep: 1490.44 | bwd_inner_microstep: 1490.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-10 03:19:03,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.25 | bwd_microstep: 1614.03 | bwd_inner_microstep: 1614.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 614
[2024-06-10 03:19:04,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 105.13 | bwd_microstep: 261.18 | bwd_inner_microstep: 261.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3539
[2024-06-10 03:19:06,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.91 | bwd_microstep: 1234.84 | bwd_inner_microstep: 1234.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 03:19:08,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.90 | bwd_microstep: 1637.17 | bwd_inner_microstep: 1637.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3715
[2024-06-10 03:19:10,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1597.25 | bwd_inner_microstep: 1597.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3721
[2024-06-10 03:19:12,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.32 | bwd_microstep: 1464.96 | bwd_inner_microstep: 1464.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3587
[2024-06-10 03:19:14,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.61 | bwd_microstep: 1654.37 | bwd_inner_microstep: 1654.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3620
[2024-06-10 03:19:16,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.03 | bwd_microstep: 1537.28 | bwd_inner_microstep: 1537.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3539
[2024-06-10 03:19:18,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.82 | bwd_microstep: 1523.20 | bwd_inner_microstep: 1523.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413
[2024-06-10 03:19:20,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.00 | bwd_microstep: 1372.08 | bwd_inner_microstep: 1372.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3565
[2024-06-10 03:19:23,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.48 | bwd_microstep: 1694.66 | bwd_inner_microstep: 1694.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-10 03:19:25,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.86 | bwd_microstep: 1537.48 | bwd_inner_microstep: 1537.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 03:19:27,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.02 | bwd_microstep: 1531.87 | bwd_inner_microstep: 1531.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 03:19:29,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.19 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 03:19:29,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.80 | bwd_microstep: 1444.41 | bwd_inner_microstep: 1436.58 | bwd_allreduce_microstep: 7.78 | step_microstep: 39.82
[2024-06-10 03:19:29,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16757.71 | bwd: 44976.56 | bwd_inner: 44967.73 | bwd_allreduce: 8.08 | step: 41.75
{'loss': 1.3312, 'learning_rate': 3.96488329082302e-05, 'epoch': 0.09}
  9%|▊         | 147/1726 [2:36:52<26:51:44, 61.24s/it]
  9%|▊         | 148/1726 [2:37:53<26:49:57, 61.22s/it]
  9%|▊         | 149/1726 [2:38:56<27:00:03, 61.64s/it]
  9%|▊         | 150/1726 [2:39:59<27:12:48, 62.16s/it]
  9%|▊         | 151/1726 [2:41:04<27:29:30, 62.84s/it]
  9%|▉         | 152/1726 [2:42:06<27:22:46, 62.62s/it]
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2682
[2024-06-10 03:19:31,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.19 | bwd_microstep: 1113.95 | bwd_inner_microstep: 1113.88 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 03:19:32,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1382.34 | bwd_inner_microstep: 1382.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 03:19:34,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.45 | bwd_microstep: 1283.53 | bwd_inner_microstep: 1283.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3870
[2024-06-10 03:19:36,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.52 | bwd_microstep: 1667.57 | bwd_inner_microstep: 1667.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 03:19:38,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1345.89 | bwd_inner_microstep: 1345.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868
[2024-06-10 03:19:39,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.24 | bwd_microstep: 712.00 | bwd_inner_microstep: 711.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 03:19:41,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.83 | bwd_microstep: 1253.99 | bwd_inner_microstep: 1253.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 03:19:42,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.38 | bwd_microstep: 792.84 | bwd_inner_microstep: 792.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 03:19:44,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.22 | bwd_microstep: 1295.40 | bwd_inner_microstep: 1295.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 03:19:46,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.17 | bwd_microstep: 1252.85 | bwd_inner_microstep: 1252.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-10 03:19:48,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.47 | bwd_microstep: 1425.85 | bwd_inner_microstep: 1425.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3503
[2024-06-10 03:19:50,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.10 | bwd_microstep: 1339.69 | bwd_inner_microstep: 1339.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 03:19:51,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.78 | bwd_microstep: 1260.90 | bwd_inner_microstep: 1260.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3417
[2024-06-10 03:19:53,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.06 | bwd_microstep: 1185.01 | bwd_inner_microstep: 1184.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 03:19:55,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.75 | bwd_microstep: 1516.39 | bwd_inner_microstep: 1516.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 03:19:57,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.25 | bwd_microstep: 1607.85 | bwd_inner_microstep: 1607.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 03:19:59,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.67 | bwd_microstep: 1513.91 | bwd_inner_microstep: 1513.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 03:20:01,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.93 | bwd_microstep: 1515.54 | bwd_inner_microstep: 1515.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-10 03:20:03,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.55 | bwd_microstep: 1329.28 | bwd_inner_microstep: 1329.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 03:20:04,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.83 | bwd_microstep: 799.10 | bwd_inner_microstep: 799.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-10 03:20:06,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.95 | bwd_microstep: 1186.55 | bwd_inner_microstep: 1186.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540
[2024-06-10 03:20:08,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.73 | bwd_microstep: 1332.30 | bwd_inner_microstep: 1332.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2242
[2024-06-10 03:20:09,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.36 | bwd_microstep: 1004.62 | bwd_inner_microstep: 1004.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 03:20:11,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.34 | bwd_microstep: 1406.70 | bwd_inner_microstep: 1406.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839
[2024-06-10 03:20:13,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1471.75 | bwd_inner_microstep: 1471.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 03:20:15,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.93 | bwd_microstep: 1286.08 | bwd_inner_microstep: 1286.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3712
[2024-06-10 03:20:17,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1497.35 | bwd_inner_microstep: 1497.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.23
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 03:20:19,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1349.76 | bwd_inner_microstep: 1349.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 03:20:21,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 1431.51 | bwd_inner_microstep: 1431.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2232
[2024-06-10 03:20:22,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.66 | bwd_microstep: 1060.48 | bwd_inner_microstep: 1060.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 03:20:25,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.74 | bwd_microstep: 1595.76 | bwd_inner_microstep: 1595.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 03:20:31,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.38 | optimizer_step: 6.61
[2024-06-10 03:20:31,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.17 | bwd_microstep: 5833.18 | bwd_inner_microstep: 1923.17 | bwd_allreduce_microstep: 3909.93 | step_microstep: 39.88
[2024-06-10 03:20:31,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15678.85 | bwd: 46049.93 | bwd_inner: 42139.00 | bwd_allreduce: 3910.21 | step: 41.76
{'loss': 1.345, 'learning_rate': 3.964179559455588e-05, 'epoch': 0.09}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3476
[2024-06-10 03:20:33,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 1588.49 | bwd_inner_microstep: 1588.37 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3937
[2024-06-10 03:20:35,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.61 | bwd_microstep: 1590.93 | bwd_inner_microstep: 1590.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4377
[2024-06-10 03:20:38,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.71 | bwd_microstep: 1705.89 | bwd_inner_microstep: 1705.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3897
[2024-06-10 03:20:40,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.26 | bwd_microstep: 1545.71 | bwd_inner_microstep: 1545.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 03:20:42,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.97 | bwd_microstep: 1549.77 | bwd_inner_microstep: 1549.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 03:20:44,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.87 | bwd_microstep: 1148.50 | bwd_inner_microstep: 1148.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 03:20:46,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.54 | bwd_microstep: 1349.16 | bwd_inner_microstep: 1349.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 03:20:47,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1248.26 | bwd_inner_microstep: 1248.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 03:20:49,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.23 | bwd_microstep: 1412.96 | bwd_inner_microstep: 1412.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 03:20:51,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1285.74 | bwd_inner_microstep: 1285.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3415
[2024-06-10 03:20:53,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.76 | bwd_microstep: 1187.43 | bwd_inner_microstep: 1187.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3499
[2024-06-10 03:20:55,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.08 | bwd_microstep: 1615.18 | bwd_inner_microstep: 1615.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2123
[2024-06-10 03:20:56,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.93 | bwd_microstep: 924.43 | bwd_inner_microstep: 924.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2473
[2024-06-10 03:20:58,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.53 | bwd_microstep: 1050.99 | bwd_inner_microstep: 1050.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2033
[2024-06-10 03:20:59,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.98 | bwd_microstep: 907.79 | bwd_inner_microstep: 907.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 03:21:01,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.39 | bwd_microstep: 1258.09 | bwd_inner_microstep: 1258.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 03:21:02,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.76 | bwd_microstep: 1287.05 | bwd_inner_microstep: 1287.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 03:21:04,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.58 | bwd_microstep: 1348.44 | bwd_inner_microstep: 1348.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3537
[2024-06-10 03:21:06,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.35 | bwd_microstep: 1329.36 | bwd_inner_microstep: 1329.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 03:21:08,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.90 | bwd_microstep: 1509.25 | bwd_inner_microstep: 1509.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3536
[2024-06-10 03:21:10,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.62 | bwd_microstep: 1426.14 | bwd_inner_microstep: 1426.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 03:21:12,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.36 | bwd_microstep: 1506.20 | bwd_inner_microstep: 1506.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 03:21:14,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.64 | bwd_microstep: 1254.04 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2245
[2024-06-10 03:21:15,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.10 | bwd_microstep: 902.27 | bwd_inner_microstep: 902.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 03:21:17,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.49 | bwd_microstep: 1494.71 | bwd_inner_microstep: 1494.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3785
[2024-06-10 03:21:19,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.19 | bwd_microstep: 1455.04 | bwd_inner_microstep: 1455.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 03:21:21,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1408.18 | bwd_inner_microstep: 1408.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591
[2024-06-10 03:21:23,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.26 | bwd_microstep: 1608.91 | bwd_inner_microstep: 1608.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3598
[2024-06-10 03:21:25,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.09 | bwd_microstep: 1272.95 | bwd_inner_microstep: 1272.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463
[2024-06-10 03:21:27,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 1317.85 | bwd_inner_microstep: 1317.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2235
[2024-06-10 03:21:28,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.00 | bwd_microstep: 868.38 | bwd_inner_microstep: 868.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 03:21:34,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.41 | optimizer_step: 6.63
[2024-06-10 03:21:34,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 5291.78 | bwd_inner_microstep: 1755.56 | bwd_allreduce_microstep: 3536.14 | step_microstep: 39.94
[2024-06-10 03:21:34,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16130.15 | bwd: 46649.90 | bwd_inner: 43112.72 | bwd_allreduce: 3536.44 | step: 42.26
{'loss': 1.2805, 'learning_rate': 3.963468910257959e-05, 'epoch': 0.09}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469
[2024-06-10 03:21:36,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.43 | bwd_microstep: 1567.89 | bwd_inner_microstep: 1567.80 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 03:21:38,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.88 | bwd_microstep: 1286.38 | bwd_inner_microstep: 1286.22 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 03:21:40,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1376.73 | bwd_inner_microstep: 1376.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 03:21:42,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1248.35 | bwd_inner_microstep: 1248.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 03:21:43,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.38 | bwd_microstep: 1177.38 | bwd_inner_microstep: 1177.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 03:21:46,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.38 | bwd_microstep: 1650.11 | bwd_inner_microstep: 1650.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490
[2024-06-10 03:21:47,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.13 | bwd_microstep: 1219.39 | bwd_inner_microstep: 1219.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 03:21:49,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1388.61 | bwd_inner_microstep: 1388.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 03:21:50,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.38 | bwd_microstep: 793.91 | bwd_inner_microstep: 793.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3478
[2024-06-10 03:21:52,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.42 | bwd_microstep: 1216.67 | bwd_inner_microstep: 1216.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3678
[2024-06-10 03:21:54,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.16 | bwd_microstep: 1476.62 | bwd_inner_microstep: 1476.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 03:21:56,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.20 | bwd_microstep: 1378.81 | bwd_inner_microstep: 1378.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3410
[2024-06-10 03:21:58,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.53 | bwd_microstep: 1395.06 | bwd_inner_microstep: 1395.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 03:22:00,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.43 | bwd_microstep: 1519.55 | bwd_inner_microstep: 1519.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559
[2024-06-10 03:22:02,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.45 | bwd_microstep: 1596.61 | bwd_inner_microstep: 1596.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2171
[2024-06-10 03:22:03,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.34 | bwd_microstep: 857.58 | bwd_inner_microstep: 857.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 03:22:05,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.00 | bwd_microstep: 798.50 | bwd_inner_microstep: 798.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 03:22:07,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.30 | bwd_microstep: 1431.75 | bwd_inner_microstep: 1431.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 03:22:09,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.86 | bwd_microstep: 1388.39 | bwd_inner_microstep: 1388.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3644
[2024-06-10 03:22:10,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.63 | bwd_microstep: 1420.97 | bwd_inner_microstep: 1420.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2015
[2024-06-10 03:22:12,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.63 | bwd_microstep: 840.27 | bwd_inner_microstep: 840.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 03:22:14,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1558.13 | bwd_inner_microstep: 1558.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3523
[2024-06-10 03:22:16,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.82 | bwd_microstep: 1324.89 | bwd_inner_microstep: 1324.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 03:22:17,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.68 | bwd_microstep: 804.14 | bwd_inner_microstep: 804.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 03:22:19,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.93 | bwd_microstep: 1661.37 | bwd_inner_microstep: 1661.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 03:22:21,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.26 | bwd_microstep: 1292.20 | bwd_inner_microstep: 1292.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3489
[2024-06-10 03:22:23,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.21 | bwd_microstep: 1335.60 | bwd_inner_microstep: 1335.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1429
[2024-06-10 03:22:23,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 208.54 | bwd_microstep: 537.94 | bwd_inner_microstep: 537.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 03:22:26,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.68 | bwd_microstep: 1648.18 | bwd_inner_microstep: 1648.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769
[2024-06-10 03:22:28,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.62 | bwd_microstep: 1346.75 | bwd_inner_microstep: 1346.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 03:22:29,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1249.26 | bwd_inner_microstep: 1249.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 03:22:36,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.82 | optimizer_gradients: 4.40 | optimizer_step: 6.64
[2024-06-10 03:22:36,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.24 | bwd_microstep: 5664.88 | bwd_inner_microstep: 1812.47 | bwd_allreduce_microstep: 3852.33 | step_microstep: 41.22
[2024-06-10 03:22:36,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15533.77 | bwd: 45452.91 | bwd_inner: 41599.43 | bwd_allreduce: 3852.68 | step: 42.99
{'loss': 1.3732, 'learning_rate': 3.962751345733034e-05, 'epoch': 0.09}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 03:22:38,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.29 | bwd_microstep: 1470.97 | bwd_inner_microstep: 1470.81 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3890
[2024-06-10 03:22:40,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.28 | bwd_microstep: 1487.69 | bwd_inner_microstep: 1487.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 03:22:41,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1274.36 | bwd_inner_microstep: 1274.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 03:22:43,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1277.67 | bwd_inner_microstep: 1277.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 03:22:45,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.33 | bwd_microstep: 1532.08 | bwd_inner_microstep: 1532.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3488
[2024-06-10 03:22:47,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.01 | bwd_microstep: 1219.86 | bwd_inner_microstep: 1219.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 03:22:48,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.80 | bwd_microstep: 796.68 | bwd_inner_microstep: 796.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 03:22:50,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.35 | bwd_microstep: 1250.95 | bwd_inner_microstep: 1250.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 03:22:52,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.08 | bwd_microstep: 1444.47 | bwd_inner_microstep: 1444.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3688
[2024-06-10 03:22:54,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.97 | bwd_microstep: 1828.13 | bwd_inner_microstep: 1828.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3768
[2024-06-10 03:22:57,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.93 | bwd_microstep: 1573.55 | bwd_inner_microstep: 1573.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 03:22:58,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1381.51 | bwd_inner_microstep: 1381.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 03:23:00,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.69 | bwd_microstep: 1418.97 | bwd_inner_microstep: 1418.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660
[2024-06-10 03:23:03,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.40 | bwd_microstep: 1719.08 | bwd_inner_microstep: 1719.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3517
[2024-06-10 03:23:05,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1420.50 | bwd_inner_microstep: 1420.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3587
[2024-06-10 03:23:06,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.34 | bwd_microstep: 1212.98 | bwd_inner_microstep: 1212.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 03:23:08,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.71 | bwd_microstep: 1282.12 | bwd_inner_microstep: 1282.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3694
[2024-06-10 03:23:10,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.80 | bwd_microstep: 1268.75 | bwd_inner_microstep: 1268.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469
[2024-06-10 03:23:12,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.01 | bwd_microstep: 1185.98 | bwd_inner_microstep: 1185.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543
[2024-06-10 03:23:13,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1333.06 | bwd_inner_microstep: 1333.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-10 03:23:15,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.28 | bwd_microstep: 1184.57 | bwd_inner_microstep: 1184.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 03:23:17,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1427.81 | bwd_inner_microstep: 1427.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 03:23:19,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.84 | bwd_microstep: 1281.55 | bwd_inner_microstep: 1281.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 03:23:21,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.77 | bwd_microstep: 1285.57 | bwd_inner_microstep: 1285.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 03:23:23,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.63 | bwd_microstep: 1405.35 | bwd_inner_microstep: 1405.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3557
[2024-06-10 03:23:25,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.24 | bwd_microstep: 1529.48 | bwd_inner_microstep: 1529.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3811
[2024-06-10 03:23:27,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.02 | bwd_microstep: 1706.30 | bwd_inner_microstep: 1706.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3818
[2024-06-10 03:23:29,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.55 | bwd_microstep: 1755.52 | bwd_inner_microstep: 1755.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 03:23:31,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.34 | bwd_microstep: 802.83 | bwd_inner_microstep: 802.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3592
[2024-06-10 03:23:33,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1392.66 | bwd_inner_microstep: 1392.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2241
[2024-06-10 03:23:34,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.93 | bwd_microstep: 1068.27 | bwd_inner_microstep: 1068.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 03:23:37,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 03:23:37,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.28 | bwd_microstep: 2141.82 | bwd_inner_microstep: 1744.97 | bwd_allreduce_microstep: 396.80 | step_microstep: 38.68
[2024-06-10 03:23:37,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16458.79 | bwd: 44361.14 | bwd_inner: 43963.30 | bwd_allreduce: 397.09 | step: 40.43
{'loss': 1.39, 'learning_rate': 3.962026868408074e-05, 'epoch': 0.09}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486
[2024-06-10 03:23:39,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.63 | bwd_microstep: 1578.42 | bwd_inner_microstep: 1578.27 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2409
[2024-06-10 03:23:40,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.20 | bwd_microstep: 908.31 | bwd_inner_microstep: 908.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 03:23:42,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.31 | bwd_microstep: 1495.56 | bwd_inner_microstep: 1495.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 03:23:44,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.22 | bwd_microstep: 1345.50 | bwd_inner_microstep: 1345.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 03:23:46,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.36 | bwd_microstep: 1655.80 | bwd_inner_microstep: 1655.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2203
[2024-06-10 03:23:48,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.18 | bwd_microstep: 956.56 | bwd_inner_microstep: 956.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-10 03:23:50,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.52 | bwd_microstep: 1636.02 | bwd_inner_microstep: 1635.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 03:23:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.53 | bwd_microstep: 1485.58 | bwd_inner_microstep: 1485.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 03:23:54,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.80 | bwd_microstep: 1388.72 | bwd_inner_microstep: 1388.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 03:23:56,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1533.97 | bwd_inner_microstep: 1533.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3411
[2024-06-10 03:23:58,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1296.90 | bwd_inner_microstep: 1296.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 03:24:00,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.72 | bwd_microstep: 1259.80 | bwd_inner_microstep: 1259.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 03:24:01,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.48 | bwd_microstep: 1289.04 | bwd_inner_microstep: 1289.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2932
[2024-06-10 03:24:03,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.82 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3656
[2024-06-10 03:24:05,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.12 | bwd_microstep: 1720.54 | bwd_inner_microstep: 1720.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 03:24:07,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.84 | bwd_microstep: 1282.27 | bwd_inner_microstep: 1282.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 03:24:09,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.29 | bwd_microstep: 1314.16 | bwd_inner_microstep: 1314.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1991
[2024-06-10 03:24:10,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.30 | bwd_microstep: 897.81 | bwd_inner_microstep: 897.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 03:24:12,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.90 | bwd_microstep: 1491.18 | bwd_inner_microstep: 1491.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3838
[2024-06-10 03:24:15,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.09 | bwd_microstep: 1690.71 | bwd_inner_microstep: 1690.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 03:24:17,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1391.43 | bwd_inner_microstep: 1391.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433
[2024-06-10 03:24:18,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.53 | bwd_microstep: 1287.46 | bwd_inner_microstep: 1287.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 03:24:21,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.20 | bwd_microstep: 1559.40 | bwd_inner_microstep: 1559.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 03:24:22,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.50 | bwd_microstep: 1186.44 | bwd_inner_microstep: 1186.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 03:24:24,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.19 | bwd_microstep: 1560.46 | bwd_inner_microstep: 1560.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 03:24:26,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.32 | bwd_microstep: 1458.35 | bwd_inner_microstep: 1458.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782
[2024-06-10 03:24:28,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.94 | bwd_microstep: 1456.83 | bwd_inner_microstep: 1456.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 03:24:31,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.22 | bwd_microstep: 1648.46 | bwd_inner_microstep: 1648.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 03:24:33,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1412.21 | bwd_inner_microstep: 1412.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 03:24:35,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.83 | bwd_microstep: 1503.13 | bwd_inner_microstep: 1503.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 03:24:37,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.29 | bwd_microstep: 1406.72 | bwd_inner_microstep: 1406.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769
[2024-06-10 03:24:39,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.19 | optimizer_step: 6.63
[2024-06-10 03:24:39,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.51 | bwd_microstep: 1390.62 | bwd_inner_microstep: 1381.70 | bwd_allreduce_microstep: 8.88 | step_microstep: 38.39
[2024-06-10 03:24:39,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16732.18 | bwd: 44733.05 | bwd_inner: 44723.15 | bwd_allreduce: 9.16 | step: 40.52
{'loss': 1.3768, 'learning_rate': 3.961295480834683e-05, 'epoch': 0.09}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 03:24:41,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1370.65 | bwd_inner_microstep: 1370.50 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-10 03:24:43,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.02 | bwd_microstep: 1663.25 | bwd_inner_microstep: 1663.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 03:24:45,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.82 | bwd_microstep: 1384.65 | bwd_inner_microstep: 1384.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 03:24:46,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.97 | bwd_microstep: 976.60 | bwd_inner_microstep: 976.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3916
[2024-06-10 03:24:48,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.72 | bwd_microstep: 1593.20 | bwd_inner_microstep: 1593.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3823
[2024-06-10 03:24:50,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1417.49 | bwd_inner_microstep: 1417.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 03:24:52,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.12 | bwd_microstep: 1496.52 | bwd_inner_microstep: 1496.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594
[2024-06-10 03:24:54,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.52 | bwd_microstep: 1313.06 | bwd_inner_microstep: 1313.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3415
[2024-06-10 03:24:56,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.01 | bwd_microstep: 1215.87 | bwd_inner_microstep: 1215.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 03:24:58,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1353.22 | bwd_inner_microstep: 1353.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1972
[2024-06-10 03:24:59,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.53 | bwd_microstep: 830.49 | bwd_inner_microstep: 830.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-10 03:25:01,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.38 | bwd_microstep: 1451.32 | bwd_inner_microstep: 1451.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-10 03:25:03,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.33 | bwd_microstep: 1651.17 | bwd_inner_microstep: 1651.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3596
[2024-06-10 03:25:05,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1373.43 | bwd_inner_microstep: 1373.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2899
[2024-06-10 03:25:07,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.27 | bwd_microstep: 1236.75 | bwd_inner_microstep: 1236.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2034
[2024-06-10 03:25:08,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.90 | bwd_microstep: 873.76 | bwd_inner_microstep: 873.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3545
[2024-06-10 03:25:10,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.74 | bwd_microstep: 1351.97 | bwd_inner_microstep: 1351.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2194
[2024-06-10 03:25:11,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.93 | bwd_microstep: 958.94 | bwd_inner_microstep: 958.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 03:25:13,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.71 | bwd_microstep: 1483.18 | bwd_inner_microstep: 1483.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 03:25:15,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.49 | bwd_microstep: 1489.95 | bwd_inner_microstep: 1489.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3730
[2024-06-10 03:25:17,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.24 | bwd_microstep: 1432.24 | bwd_inner_microstep: 1432.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 03:25:19,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1406.61 | bwd_inner_microstep: 1406.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3556
[2024-06-10 03:25:21,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.21 | bwd_microstep: 1209.12 | bwd_inner_microstep: 1209.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 03:25:23,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.04 | bwd_microstep: 1503.39 | bwd_inner_microstep: 1503.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 03:25:25,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.27 | bwd_microstep: 1404.97 | bwd_inner_microstep: 1404.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 03:25:27,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.38 | bwd_microstep: 1559.88 | bwd_inner_microstep: 1559.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3609
[2024-06-10 03:25:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.87 | bwd_microstep: 1543.64 | bwd_inner_microstep: 1543.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3761
[2024-06-10 03:25:31,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1406.43 | bwd_inner_microstep: 1406.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3597
[2024-06-10 03:25:33,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.64 | bwd_microstep: 1342.33 | bwd_inner_microstep: 1342.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-10 03:25:35,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.68 | bwd_microstep: 1483.41 | bwd_inner_microstep: 1483.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 03:25:37,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.70 | bwd_microstep: 1285.81 | bwd_inner_microstep: 1285.58 | bwd_allreduce_microstep: 0.12 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 03:25:40,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 03:25:40,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.96 | bwd_microstep: 2204.97 | bwd_inner_microstep: 1681.36 | bwd_allreduce_microstep: 523.55 | step_microstep: 38.73
[2024-06-10 03:25:40,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16385.59 | bwd: 44268.28 | bwd_inner: 43743.53 | bwd_allreduce: 523.92 | step: 40.88
{'loss': 1.2441, 'learning_rate': 3.960557185588803e-05, 'epoch': 0.09}
52/1726 [2:42:06<27:22:46, 62.62s/it]
  9%|▉         | 153/1726 [2:43:08<27:17:37, 62.47s/it]
  9%|▉         | 154/1726 [2:44:11<27:22:06, 62.68s/it]
  9%|▉         | 155/1726 [2:45:12<27:10:42, 62.28s/it]
  9%|▉         | 156/1726 [2:46:14<27:01:08, 61.95s/it]
  9%|▉         | 157/1726 [2:47:15<26:59:17, 61.92s/it]
  9%|▉         | 158/1726 [2:48:16<26:51:17, 61.66s/it]
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467
[2024-06-10 03:25:42,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.72 | bwd_microstep: 1431.08 | bwd_inner_microstep: 1430.86 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 4573
[2024-06-10 03:25:44,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1494.63 | bwd_inner_microstep: 1494.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 03:25:46,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 1277.48 | bwd_inner_microstep: 1277.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2245
[2024-06-10 03:25:47,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.93 | bwd_microstep: 966.58 | bwd_inner_microstep: 966.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 03:25:49,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1387.16 | bwd_inner_microstep: 1387.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 03:25:50,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.91 | bwd_microstep: 1251.46 | bwd_inner_microstep: 1251.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 03:25:52,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1250.76 | bwd_inner_microstep: 1250.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 03:25:54,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.22 | bwd_microstep: 1388.39 | bwd_inner_microstep: 1388.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3861
[2024-06-10 03:25:56,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.01 | bwd_microstep: 1666.49 | bwd_inner_microstep: 1666.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 03:25:59,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.32 | bwd_microstep: 1523.64 | bwd_inner_microstep: 1523.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 03:26:00,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.23 | bwd_microstep: 1286.29 | bwd_inner_microstep: 1286.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3960
[2024-06-10 03:26:03,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.30 | bwd_microstep: 1601.33 | bwd_inner_microstep: 1601.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 03:26:04,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1378.25 | bwd_inner_microstep: 1378.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 03:26:06,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.03 | bwd_microstep: 1390.25 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 03:26:08,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.43 | bwd_microstep: 1516.17 | bwd_inner_microstep: 1516.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 03:26:11,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.50 | bwd_microstep: 1660.25 | bwd_inner_microstep: 1660.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 03:26:13,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.79 | bwd_microstep: 1427.10 | bwd_inner_microstep: 1427.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 03:26:15,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.06 | bwd_microstep: 1286.59 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 03:26:16,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.99 | bwd_microstep: 1191.92 | bwd_inner_microstep: 1191.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3625
[2024-06-10 03:26:18,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.30 | bwd_microstep: 1445.36 | bwd_inner_microstep: 1445.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665
[2024-06-10 03:26:20,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.51 | bwd_microstep: 1330.03 | bwd_inner_microstep: 1330.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 03:26:22,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.70 | bwd_microstep: 1353.40 | bwd_inner_microstep: 1353.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3670
[2024-06-10 03:26:24,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.27 | bwd_microstep: 1459.18 | bwd_inner_microstep: 1459.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602
[2024-06-10 03:26:26,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1438.18 | bwd_inner_microstep: 1438.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 03:26:28,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.79 | bwd_microstep: 1507.18 | bwd_inner_microstep: 1507.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3598
[2024-06-10 03:26:30,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.13 | bwd_microstep: 1313.50 | bwd_inner_microstep: 1313.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 03:26:32,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.91 | bwd_microstep: 1599.35 | bwd_inner_microstep: 1599.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 03:26:34,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1377.84 | bwd_inner_microstep: 1377.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 03:26:36,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1553.27 | bwd_inner_microstep: 1553.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3567
[2024-06-10 03:26:38,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.84 | bwd_microstep: 1594.71 | bwd_inner_microstep: 1594.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3606
[2024-06-10 03:26:41,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.46 | bwd_microstep: 1708.62 | bwd_inner_microstep: 1708.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3606
[2024-06-10 03:26:43,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.47 | optimizer_step: 6.67
[2024-06-10 03:26:43,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.11 | bwd_microstep: 1491.05 | bwd_inner_microstep: 1482.95 | bwd_allreduce_microstep: 8.05 | step_microstep: 46.59
[2024-06-10 03:26:43,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17055.39 | bwd: 45547.52 | bwd_inner: 45538.39 | bwd_allreduce: 8.37 | step: 48.74
{'loss': 1.3482, 'learning_rate': 3.959811985270708e-05, 'epoch': 0.09}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 03:26:45,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.79 | bwd_microstep: 1338.91 | bwd_inner_microstep: 1338.75 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 03:26:46,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.35 | bwd_microstep: 1385.10 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 03:26:49,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1486.87 | bwd_inner_microstep: 1486.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837
[2024-06-10 03:26:51,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.24 | bwd_microstep: 1454.46 | bwd_inner_microstep: 1454.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 03:26:52,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.83 | bwd_microstep: 1408.92 | bwd_inner_microstep: 1408.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 03:26:54,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.55 | bwd_microstep: 1184.91 | bwd_inner_microstep: 1184.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590
[2024-06-10 03:26:56,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1309.73 | bwd_inner_microstep: 1309.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 03:26:58,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.77 | bwd_microstep: 1401.62 | bwd_inner_microstep: 1401.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 03:26:59,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.98 | bwd_microstep: 796.28 | bwd_inner_microstep: 796.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 03:27:01,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.51 | bwd_microstep: 1284.61 | bwd_inner_microstep: 1284.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703
[2024-06-10 03:27:03,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.44 | bwd_microstep: 1634.57 | bwd_inner_microstep: 1634.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-10 03:27:05,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.17 | bwd_microstep: 1421.24 | bwd_inner_microstep: 1421.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732
[2024-06-10 03:27:07,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.92 | bwd_microstep: 1441.92 | bwd_inner_microstep: 1441.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3691
[2024-06-10 03:27:09,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.85 | bwd_microstep: 1428.94 | bwd_inner_microstep: 1428.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 03:27:11,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.47 | bwd_microstep: 1282.91 | bwd_inner_microstep: 1282.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457
[2024-06-10 03:27:13,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.41 | bwd_microstep: 1341.45 | bwd_inner_microstep: 1341.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 03:27:15,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.62 | bwd_microstep: 1606.78 | bwd_inner_microstep: 1606.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3523
[2024-06-10 03:27:16,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.80 | bwd_microstep: 1201.27 | bwd_inner_microstep: 1201.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 03:27:18,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.71 | bwd_microstep: 1183.79 | bwd_inner_microstep: 1183.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 03:27:19,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.01 | bwd_microstep: 980.20 | bwd_inner_microstep: 980.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 03:27:21,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.23 | bwd_microstep: 1189.30 | bwd_inner_microstep: 1189.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3879
[2024-06-10 03:27:23,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.01 | bwd_microstep: 1689.38 | bwd_inner_microstep: 1689.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 03:27:26,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.23 | bwd_microstep: 1662.61 | bwd_inner_microstep: 1662.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733
[2024-06-10 03:27:28,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.04 | bwd_microstep: 1637.09 | bwd_inner_microstep: 1637.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 03:27:30,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.74 | bwd_microstep: 1187.84 | bwd_inner_microstep: 1187.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 03:27:32,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1352.86 | bwd_inner_microstep: 1352.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 03:27:33,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.23 | bwd_microstep: 795.81 | bwd_inner_microstep: 795.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427
[2024-06-10 03:27:34,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.21 | bwd_microstep: 1283.83 | bwd_inner_microstep: 1283.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596
[2024-06-10 03:27:36,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.06 | bwd_microstep: 1465.20 | bwd_inner_microstep: 1465.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3763
[2024-06-10 03:27:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1400.07 | bwd_inner_microstep: 1400.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.26
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954
[2024-06-10 03:27:39,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.29 | bwd_microstep: 737.54 | bwd_inner_microstep: 737.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2275
[2024-06-10 03:27:43,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.38 | optimizer_step: 6.63
[2024-06-10 03:27:43,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.05 | bwd_microstep: 3398.54 | bwd_inner_microstep: 1176.46 | bwd_allreduce_microstep: 2222.00 | step_microstep: 39.76
[2024-06-10 03:27:43,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15814.49 | bwd: 44374.58 | bwd_inner: 42151.49 | bwd_allreduce: 2222.30 | step: 42.24
{'loss': 1.3937, 'learning_rate': 3.9590598825049896e-05, 'epoch': 0.09}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 03:27:45,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1235.38 | bwd_inner_microstep: 1235.28 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1867
[2024-06-10 03:27:46,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.90 | bwd_microstep: 711.59 | bwd_inner_microstep: 711.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-10 03:27:47,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.57 | bwd_microstep: 969.48 | bwd_inner_microstep: 969.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 03:27:49,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.51 | bwd_microstep: 1371.26 | bwd_inner_microstep: 1371.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 03:27:51,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.49 | bwd_microstep: 1637.34 | bwd_inner_microstep: 1637.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-10 03:27:54,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.77 | bwd_microstep: 1534.18 | bwd_inner_microstep: 1534.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 03:27:55,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.46 | bwd_microstep: 1303.88 | bwd_inner_microstep: 1303.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3726
[2024-06-10 03:27:57,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.47 | bwd_microstep: 1336.20 | bwd_inner_microstep: 1336.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 03:27:59,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.51 | bwd_microstep: 1388.63 | bwd_inner_microstep: 1388.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 03:28:01,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.73 | bwd_microstep: 1515.79 | bwd_inner_microstep: 1515.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 03:28:03,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.45 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 03:28:05,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.73 | bwd_microstep: 1288.71 | bwd_inner_microstep: 1288.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-10 03:28:07,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1422.30 | bwd_inner_microstep: 1422.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1960
[2024-06-10 03:28:08,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.00 | bwd_microstep: 735.87 | bwd_inner_microstep: 735.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002
[2024-06-10 03:28:09,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.74 | bwd_microstep: 898.16 | bwd_inner_microstep: 898.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1984
[2024-06-10 03:28:10,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.71 | bwd_microstep: 859.35 | bwd_inner_microstep: 859.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 03:28:12,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.75 | bwd_microstep: 1384.73 | bwd_inner_microstep: 1384.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 03:28:14,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.43 | bwd_microstep: 1489.10 | bwd_inner_microstep: 1489.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 03:28:16,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.26 | bwd_microstep: 973.65 | bwd_inner_microstep: 973.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 03:28:17,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.05 | bwd_microstep: 1155.70 | bwd_inner_microstep: 1155.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 03:28:19,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.52 | bwd_microstep: 1488.52 | bwd_inner_microstep: 1488.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3745
[2024-06-10 03:28:21,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.36 | bwd_microstep: 1473.41 | bwd_inner_microstep: 1473.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 03:28:23,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1455.69 | bwd_inner_microstep: 1455.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2443
[2024-06-10 03:28:25,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.33 | bwd_microstep: 1046.37 | bwd_inner_microstep: 1046.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 03:28:27,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.41 | bwd_microstep: 1189.84 | bwd_inner_microstep: 1189.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 03:28:28,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.77 | bwd_microstep: 808.29 | bwd_inner_microstep: 808.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-10 03:28:30,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.48 | bwd_microstep: 1536.44 | bwd_inner_microstep: 1536.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2198
[2024-06-10 03:28:31,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.12 | bwd_microstep: 958.97 | bwd_inner_microstep: 958.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 03:28:33,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.71 | bwd_microstep: 1472.96 | bwd_inner_microstep: 1472.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 03:28:35,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1529.18 | bwd_inner_microstep: 1529.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 03:28:37,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1410.15 | bwd_inner_microstep: 1410.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3545
[2024-06-10 03:28:46,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.39 | optimizer_step: 6.59
[2024-06-10 03:28:46,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.52 | bwd_microstep: 8067.19 | bwd_inner_microstep: 1720.28 | bwd_allreduce_microstep: 6346.84 | step_microstep: 39.98
[2024-06-10 03:28:46,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15226.89 | bwd: 47036.29 | bwd_inner: 40688.42 | bwd_allreduce: 6347.15 | step: 42.21
{'loss': 1.3438, 'learning_rate': 3.958300879940549e-05, 'epoch': 0.09}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459
[2024-06-10 03:28:48,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.36 | bwd_microstep: 1562.83 | bwd_inner_microstep: 1562.65 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 03:28:50,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.99 | bwd_microstep: 1150.90 | bwd_inner_microstep: 1150.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 03:28:51,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.59 | bwd_microstep: 1248.77 | bwd_inner_microstep: 1248.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480
[2024-06-10 03:28:53,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.45 | bwd_microstep: 1440.00 | bwd_inner_microstep: 1439.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 03:28:55,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1387.59 | bwd_inner_microstep: 1387.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 03:28:57,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1385.66 | bwd_inner_microstep: 1385.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3746
[2024-06-10 03:28:59,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.10 | bwd_microstep: 1469.87 | bwd_inner_microstep: 1469.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-10 03:29:01,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.05 | bwd_microstep: 1544.96 | bwd_inner_microstep: 1544.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 03:29:03,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.48 | bwd_microstep: 1257.00 | bwd_inner_microstep: 1256.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2129
[2024-06-10 03:29:04,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.87 | bwd_microstep: 959.81 | bwd_inner_microstep: 959.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 03:29:06,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1248.42 | bwd_inner_microstep: 1248.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3529
[2024-06-10 03:29:08,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1588.16 | bwd_inner_microstep: 1588.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 03:29:10,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1345.25 | bwd_inner_microstep: 1345.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 03:29:12,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.02 | bwd_microstep: 1605.70 | bwd_inner_microstep: 1605.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 03:29:14,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.81 | bwd_microstep: 1488.33 | bwd_inner_microstep: 1488.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-10 03:29:17,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.95 | bwd_microstep: 1712.51 | bwd_inner_microstep: 1712.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1978
[2024-06-10 03:29:18,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.65 | bwd_microstep: 831.73 | bwd_inner_microstep: 831.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875
[2024-06-10 03:29:20,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.69 | bwd_microstep: 1586.64 | bwd_inner_microstep: 1586.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 03:29:22,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1393.06 | bwd_inner_microstep: 1393.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 03:29:24,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.52 | bwd_microstep: 1281.67 | bwd_inner_microstep: 1281.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2244
[2024-06-10 03:29:25,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.62 | bwd_microstep: 971.60 | bwd_inner_microstep: 971.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3519
[2024-06-10 03:29:27,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.75 | bwd_microstep: 1256.20 | bwd_inner_microstep: 1256.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 03:29:29,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.52 | bwd_microstep: 1432.55 | bwd_inner_microstep: 1432.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-10 03:29:31,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.19 | bwd_microstep: 1753.51 | bwd_inner_microstep: 1753.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 03:29:33,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1251.02 | bwd_inner_microstep: 1250.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3679
[2024-06-10 03:29:35,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.22 | bwd_microstep: 1553.26 | bwd_inner_microstep: 1553.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2079
[2024-06-10 03:29:36,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.14 | bwd_microstep: 859.22 | bwd_inner_microstep: 859.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3627
[2024-06-10 03:29:38,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.10 | bwd_microstep: 1341.86 | bwd_inner_microstep: 1341.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2021
[2024-06-10 03:29:39,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.34 | bwd_microstep: 864.21 | bwd_inner_microstep: 864.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3846
[2024-06-10 03:29:42,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.42 | bwd_microstep: 1596.52 | bwd_inner_microstep: 1596.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3579
[2024-06-10 03:29:44,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.94 | bwd_microstep: 1697.84 | bwd_inner_microstep: 1697.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 03:29:48,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.59
[2024-06-10 03:29:48,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.64 | bwd_microstep: 3383.24 | bwd_inner_microstep: 1743.94 | bwd_allreduce_microstep: 1639.24 | step_microstep: 39.31
[2024-06-10 03:29:48,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16300.95 | bwd: 45449.92 | bwd_inner: 43809.64 | bwd_allreduce: 1639.53 | step: 41.19
{'loss': 1.3152, 'learning_rate': 3.957534980250588e-05, 'epoch': 0.09}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 03:29:50,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1337.47 | bwd_inner_microstep: 1337.40 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 03:29:52,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.67 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 03:29:53,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.71 | bwd_microstep: 1292.13 | bwd_inner_microstep: 1292.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3854
[2024-06-10 03:29:56,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.25 | bwd_microstep: 1566.50 | bwd_inner_microstep: 1566.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 03:29:58,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1399.97 | bwd_inner_microstep: 1399.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3749
[2024-06-10 03:30:00,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.67 | bwd_microstep: 1437.49 | bwd_inner_microstep: 1437.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4108
[2024-06-10 03:30:02,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.58 | bwd_microstep: 1636.54 | bwd_inner_microstep: 1636.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717
[2024-06-10 03:30:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.97 | bwd_microstep: 1635.60 | bwd_inner_microstep: 1635.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 03:30:06,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1389.68 | bwd_inner_microstep: 1389.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942
[2024-06-10 03:30:07,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.44 | bwd_microstep: 883.90 | bwd_inner_microstep: 883.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4059
[2024-06-10 03:30:09,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.70 | bwd_microstep: 1650.64 | bwd_inner_microstep: 1650.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3495
[2024-06-10 03:30:12,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1543.45 | bwd_inner_microstep: 1543.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3411
[2024-06-10 03:30:14,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.33 | bwd_microstep: 1505.52 | bwd_inner_microstep: 1505.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2125
[2024-06-10 03:30:15,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.84 | bwd_microstep: 927.14 | bwd_inner_microstep: 927.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475
[2024-06-10 03:30:17,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1405.36 | bwd_inner_microstep: 1405.25 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721
[2024-06-10 03:30:19,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.06 | bwd_microstep: 1338.95 | bwd_inner_microstep: 1338.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 03:30:20,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 1254.23 | bwd_inner_microstep: 1254.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3643
[2024-06-10 03:30:22,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1443.32 | bwd_inner_microstep: 1443.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 03:30:24,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1400.31 | bwd_inner_microstep: 1400.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2971
[2024-06-10 03:30:26,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.63 | bwd_microstep: 1108.97 | bwd_inner_microstep: 1108.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 03:30:28,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.62 | bwd_microstep: 1405.23 | bwd_inner_microstep: 1405.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3725
[2024-06-10 03:30:30,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.13 | bwd_microstep: 1243.81 | bwd_inner_microstep: 1243.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 03:30:31,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.43 | bwd_microstep: 1260.75 | bwd_inner_microstep: 1260.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-10 03:30:32,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 699.45 | bwd_inner_microstep: 699.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-10 03:30:34,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.06 | bwd_microstep: 1300.21 | bwd_inner_microstep: 1300.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 03:30:36,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.09 | bwd_microstep: 1512.77 | bwd_inner_microstep: 1512.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3903
[2024-06-10 03:30:38,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1502.43 | bwd_inner_microstep: 1502.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3217
[2024-06-10 03:30:40,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.09 | bwd_microstep: 1211.16 | bwd_inner_microstep: 1211.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3605
[2024-06-10 03:30:42,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.49 | bwd_microstep: 1372.11 | bwd_inner_microstep: 1372.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 03:30:44,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1496.58 | bwd_inner_microstep: 1496.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 03:30:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.24 | bwd_microstep: 1456.66 | bwd_inner_microstep: 1456.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 03:30:49,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.18 | optimizer_step: 6.59
[2024-06-10 03:30:49,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.28 | bwd_microstep: 1937.61 | bwd_inner_microstep: 1735.88 | bwd_allreduce_microstep: 201.68 | step_microstep: 38.71
[2024-06-10 03:30:49,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16344.24 | bwd: 43842.41 | bwd_inner: 43639.68 | bwd_allreduce: 201.97 | step: 40.71
{'loss': 1.3282, 'learning_rate': 3.956762186132604e-05, 'epoch': 0.09}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 5241
[2024-06-10 03:30:51,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 698.39 | bwd_microstep: 1837.83 | bwd_inner_microstep: 1837.59 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 03:30:52,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.94 | bwd_microstep: 792.26 | bwd_inner_microstep: 792.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 03:30:54,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1281.73 | bwd_inner_microstep: 1281.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 03:30:56,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.62 | bwd_microstep: 1294.92 | bwd_inner_microstep: 1294.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 03:30:57,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.64 | bwd_microstep: 1151.04 | bwd_inner_microstep: 1151.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 03:30:59,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1396.12 | bwd_inner_microstep: 1396.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 03:31:02,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.48 | bwd_microstep: 1642.75 | bwd_inner_microstep: 1642.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 03:31:04,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.74 | bwd_microstep: 1487.86 | bwd_inner_microstep: 1487.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 03:31:06,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1388.47 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400
[2024-06-10 03:31:07,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.52 | bwd_microstep: 1190.06 | bwd_inner_microstep: 1190.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3406
[2024-06-10 03:31:09,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.00 | bwd_microstep: 1307.11 | bwd_inner_microstep: 1307.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 03:31:11,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.14 | bwd_microstep: 1484.22 | bwd_inner_microstep: 1484.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3497
[2024-06-10 03:31:13,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.60 | bwd_microstep: 1362.81 | bwd_inner_microstep: 1362.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3418
[2024-06-10 03:31:15,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1400.92 | bwd_inner_microstep: 1400.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 03:31:17,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.60 | bwd_microstep: 1347.98 | bwd_inner_microstep: 1347.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-10 03:31:18,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.20 | bwd_microstep: 1165.25 | bwd_inner_microstep: 1165.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 03:31:20,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.66 | bwd_microstep: 1255.44 | bwd_inner_microstep: 1255.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-10 03:31:21,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.89 | bwd_microstep: 878.54 | bwd_inner_microstep: 878.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 03:31:23,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.93 | bwd_microstep: 1255.83 | bwd_inner_microstep: 1255.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837
[2024-06-10 03:31:25,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.45 | bwd_microstep: 1470.19 | bwd_inner_microstep: 1470.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 03:31:27,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.13 | bwd_microstep: 1431.74 | bwd_inner_microstep: 1431.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 03:31:29,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.82 | bwd_microstep: 1659.03 | bwd_inner_microstep: 1659.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2325
[2024-06-10 03:31:31,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.04 | bwd_microstep: 1087.89 | bwd_inner_microstep: 1087.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3658
[2024-06-10 03:31:33,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.49 | bwd_microstep: 1426.91 | bwd_inner_microstep: 1426.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 03:31:34,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.29 | bwd_microstep: 811.73 | bwd_inner_microstep: 811.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 03:31:36,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.19 | bwd_microstep: 1317.90 | bwd_inner_microstep: 1317.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2470
[2024-06-10 03:31:37,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.97 | bwd_microstep: 955.66 | bwd_inner_microstep: 955.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689
[2024-06-10 03:31:39,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.62 | bwd_microstep: 1625.49 | bwd_inner_microstep: 1625.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083
[2024-06-10 03:31:41,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.38 | bwd_microstep: 916.91 | bwd_inner_microstep: 916.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-10 03:31:43,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.74 | bwd_microstep: 1657.09 | bwd_inner_microstep: 1657.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009
[2024-06-10 03:31:44,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.39 | bwd_microstep: 897.38 | bwd_inner_microstep: 897.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3428
[2024-06-10 03:31:48,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 03:31:48,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.70 | bwd_microstep: 3576.89 | bwd_inner_microstep: 1376.39 | bwd_allreduce_microstep: 2200.44 | step_microstep: 39.26
[2024-06-10 03:31:48,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15546.58 | bwd: 43755.98 | bwd_inner: 41554.44 | bwd_allreduce: 2200.76 | step: 41.40
  9%|▉         | 158/1726 [2:48:16<26:51:17, 61.66s/it]
  9%|▉         | 159/1726 [2:49:19<27:00:49, 62.06s/it]
  9%|▉         | 160/1726 [2:50:20<26:48:09, 61.61s/it]
  9%|▉         | 161/1726 [2:51:23<26:55:07, 61.92s/it]
  9%|▉         | 162/1726 [2:52:25<26:55:44, 61.99s/it]
  9%|▉         | 163/1726 [2:53:25<26:43:35, 61.56s/it]
 10%|▉         | 164/1726 [2:54:25<26:27:53, 60.99s/it]
{'loss': 1.2669, 'learning_rate': 3.955982500308373e-05, 'epoch': 0.1}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014
[2024-06-10 03:31:49,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.77 | bwd_microstep: 890.50 | bwd_inner_microstep: 890.28 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 03:31:51,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.13 | bwd_microstep: 1280.18 | bwd_inner_microstep: 1280.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3919
[2024-06-10 03:31:53,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.00 | bwd_microstep: 1593.65 | bwd_inner_microstep: 1593.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 03:31:56,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.83 | bwd_microstep: 1501.91 | bwd_inner_microstep: 1501.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3874
[2024-06-10 03:31:58,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.42 | bwd_microstep: 1650.21 | bwd_inner_microstep: 1650.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 03:32:00,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1486.46 | bwd_inner_microstep: 1486.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 03:32:01,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.35 | bwd_microstep: 789.85 | bwd_inner_microstep: 789.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 03:32:02,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.06 | bwd_microstep: 792.77 | bwd_inner_microstep: 792.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 03:32:04,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.28 | bwd_microstep: 1388.23 | bwd_inner_microstep: 1388.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907
[2024-06-10 03:32:05,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.66 | bwd_microstep: 783.16 | bwd_inner_microstep: 783.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 03:32:07,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1246.84 | bwd_inner_microstep: 1246.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1966
[2024-06-10 03:32:08,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.41 | bwd_microstep: 888.89 | bwd_inner_microstep: 888.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-10 03:32:10,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.91 | bwd_microstep: 1616.45 | bwd_inner_microstep: 1616.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2800
[2024-06-10 03:32:12,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.10 | bwd_microstep: 1014.43 | bwd_inner_microstep: 1014.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520
[2024-06-10 03:32:14,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.22 | bwd_microstep: 1445.41 | bwd_inner_microstep: 1445.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 03:32:16,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1388.99 | bwd_inner_microstep: 1388.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 03:32:17,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.34 | bwd_microstep: 1261.78 | bwd_inner_microstep: 1261.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3538
[2024-06-10 03:32:19,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.41 | bwd_microstep: 1456.79 | bwd_inner_microstep: 1456.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 03:32:21,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1353.91 | bwd_inner_microstep: 1353.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651
[2024-06-10 03:32:23,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.02 | bwd_microstep: 1525.75 | bwd_inner_microstep: 1525.66 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.22
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 03:32:25,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.16 | bwd_microstep: 1508.60 | bwd_inner_microstep: 1508.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630
[2024-06-10 03:32:28,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.58 | bwd_microstep: 1617.82 | bwd_inner_microstep: 1617.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-10 03:32:30,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.52 | bwd_microstep: 1638.42 | bwd_inner_microstep: 1638.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 03:32:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.69 | bwd_microstep: 1456.03 | bwd_inner_microstep: 1456.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 03:32:34,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1516.51 | bwd_inner_microstep: 1516.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 03:32:36,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1558.19 | bwd_inner_microstep: 1558.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 03:32:38,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1553.11 | bwd_inner_microstep: 1553.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803
[2024-06-10 03:32:40,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1459.49 | bwd_inner_microstep: 1459.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 03:32:42,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.38 | bwd_microstep: 1287.29 | bwd_inner_microstep: 1287.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2923
[2024-06-10 03:32:44,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.32 | bwd_microstep: 1199.44 | bwd_inner_microstep: 1199.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3449
[2024-06-10 03:32:46,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1484.52 | bwd_inner_microstep: 1484.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3807
[2024-06-10 03:32:50,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.29 | optimizer_step: 6.59
[2024-06-10 03:32:50,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.05 | bwd_microstep: 3582.54 | bwd_inner_microstep: 1704.90 | bwd_allreduce_microstep: 1877.58 | step_microstep: 39.34
[2024-06-10 03:32:50,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16124.08 | bwd: 45218.21 | bwd_inner: 43339.43 | bwd_allreduce: 1877.94 | step: 41.38
{'loss': 1.3369, 'learning_rate': 3.955195925523944e-05, 'epoch': 0.1}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916
[2024-06-10 03:32:51,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.17 | bwd_microstep: 780.48 | bwd_inner_microstep: 780.37 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2047
[2024-06-10 03:32:52,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.68 | bwd_microstep: 812.21 | bwd_inner_microstep: 812.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 03:32:54,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.25 | bwd_microstep: 1552.72 | bwd_inner_microstep: 1552.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3877
[2024-06-10 03:32:56,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1490.90 | bwd_inner_microstep: 1490.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 03:32:58,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.26 | bwd_microstep: 1347.17 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2447
[2024-06-10 03:33:00,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.76 | bwd_microstep: 951.31 | bwd_inner_microstep: 951.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 03:33:01,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.59 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 03:33:03,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1386.79 | bwd_inner_microstep: 1386.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 03:33:05,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.97 | bwd_microstep: 1485.08 | bwd_inner_microstep: 1485.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1975
[2024-06-10 03:33:06,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.54 | bwd_microstep: 766.59 | bwd_inner_microstep: 766.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 03:33:08,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.29 | bwd_microstep: 1449.03 | bwd_inner_microstep: 1449.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969
[2024-06-10 03:33:10,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.03 | bwd_microstep: 892.43 | bwd_inner_microstep: 892.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 03:33:12,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.87 | bwd_microstep: 1344.11 | bwd_inner_microstep: 1344.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 03:33:14,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.93 | bwd_microstep: 1450.40 | bwd_inner_microstep: 1450.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3658
[2024-06-10 03:33:16,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.95 | bwd_microstep: 1659.07 | bwd_inner_microstep: 1659.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 03:33:18,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1392.62 | bwd_inner_microstep: 1392.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 03:33:20,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.82 | bwd_microstep: 1516.35 | bwd_inner_microstep: 1516.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 03:33:22,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.53 | bwd_microstep: 1293.38 | bwd_inner_microstep: 1293.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 03:33:24,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1421.79 | bwd_inner_microstep: 1421.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 03:33:25,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.35 | bwd_microstep: 1295.88 | bwd_inner_microstep: 1295.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.27
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 03:33:27,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.60 | bwd_microstep: 1301.46 | bwd_inner_microstep: 1301.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 03:33:29,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.29 | bwd_microstep: 1666.95 | bwd_inner_microstep: 1666.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 03:33:31,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1456.16 | bwd_inner_microstep: 1456.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3563
[2024-06-10 03:33:34,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.43 | bwd_microstep: 1590.70 | bwd_inner_microstep: 1590.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1956
[2024-06-10 03:33:35,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.00 | bwd_microstep: 734.52 | bwd_inner_microstep: 734.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 03:33:37,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1348.07 | bwd_inner_microstep: 1348.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 03:33:38,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.90 | bwd_microstep: 1166.71 | bwd_inner_microstep: 1166.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 03:33:40,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.00 | bwd_microstep: 1652.06 | bwd_inner_microstep: 1652.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598
[2024-06-10 03:33:43,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1604.65 | bwd_inner_microstep: 1604.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 03:33:45,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.58 | bwd_microstep: 1472.44 | bwd_inner_microstep: 1472.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 03:33:47,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1510.31 | bwd_inner_microstep: 1510.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2050
[2024-06-10 03:33:51,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-10 03:33:51,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.45 | bwd_microstep: 3334.26 | bwd_inner_microstep: 1159.99 | bwd_allreduce_microstep: 2174.21 | step_microstep: 39.27
[2024-06-10 03:33:51,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15769.09 | bwd: 44411.85 | bwd_inner: 42236.55 | bwd_allreduce: 2174.50 | step: 41.47
{'loss': 1.3077, 'learning_rate': 3.954402464549628e-05, 'epoch': 0.1}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3399
[2024-06-10 03:33:53,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1438.12 | bwd_inner_microstep: 1437.78 | bwd_allreduce_microstep: 0.15 | step_microstep: 0.26
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2415
[2024-06-10 03:33:54,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.85 | bwd_microstep: 908.67 | bwd_inner_microstep: 908.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-10 03:33:56,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.83 | bwd_microstep: 1539.15 | bwd_inner_microstep: 1539.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2892
[2024-06-10 03:33:58,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.41 | bwd_microstep: 1186.97 | bwd_inner_microstep: 1186.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 03:33:59,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.78 | bwd_microstep: 795.85 | bwd_inner_microstep: 795.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-10 03:34:01,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.58 | bwd_microstep: 1438.12 | bwd_inner_microstep: 1438.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 03:34:02,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.54 | bwd_microstep: 791.15 | bwd_inner_microstep: 791.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3733
[2024-06-10 03:34:04,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 1415.99 | bwd_inner_microstep: 1415.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3588
[2024-06-10 03:34:05,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.45 | bwd_microstep: 1219.58 | bwd_inner_microstep: 1219.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3587
[2024-06-10 03:34:07,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.87 | bwd_microstep: 1438.11 | bwd_inner_microstep: 1438.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2057
[2024-06-10 03:34:09,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.11 | bwd_microstep: 852.73 | bwd_inner_microstep: 852.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670
[2024-06-10 03:34:11,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.16 | bwd_microstep: 1520.80 | bwd_inner_microstep: 1520.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 03:34:13,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.26 | bwd_microstep: 1350.17 | bwd_inner_microstep: 1350.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 03:34:15,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1411.34 | bwd_inner_microstep: 1411.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3742
[2024-06-10 03:34:17,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.68 | bwd_microstep: 1531.71 | bwd_inner_microstep: 1531.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 03:34:19,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.65 | bwd_microstep: 1499.96 | bwd_inner_microstep: 1499.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3518
[2024-06-10 03:34:21,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.94 | bwd_microstep: 1452.49 | bwd_inner_microstep: 1452.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 03:34:23,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.08 | bwd_microstep: 1485.86 | bwd_inner_microstep: 1485.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 03:34:25,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.68 | bwd_microstep: 1661.97 | bwd_inner_microstep: 1661.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 03:34:27,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.39 | bwd_microstep: 1287.66 | bwd_inner_microstep: 1287.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-10 03:34:29,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.29 | bwd_microstep: 1315.12 | bwd_inner_microstep: 1315.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 03:34:31,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.70 | bwd_microstep: 1400.59 | bwd_inner_microstep: 1400.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 03:34:32,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.04 | bwd_microstep: 1258.80 | bwd_inner_microstep: 1258.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2185
[2024-06-10 03:34:34,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.77 | bwd_microstep: 954.78 | bwd_inner_microstep: 954.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-10 03:34:36,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.04 | bwd_microstep: 1627.14 | bwd_inner_microstep: 1627.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 03:34:38,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1555.59 | bwd_inner_microstep: 1555.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 03:34:40,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.63 | bwd_microstep: 1604.35 | bwd_inner_microstep: 1604.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3570
[2024-06-10 03:34:42,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.35 | bwd_microstep: 1208.54 | bwd_inner_microstep: 1208.38 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.16
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3498
[2024-06-10 03:34:44,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.97 | bwd_microstep: 1221.81 | bwd_inner_microstep: 1221.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 03:34:45,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.01 | bwd_microstep: 1322.32 | bwd_inner_microstep: 1322.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776
[2024-06-10 03:34:47,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.48 | bwd_microstep: 1443.32 | bwd_inner_microstep: 1443.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768
[2024-06-10 03:34:54,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.33 | optimizer_step: 6.59
[2024-06-10 03:34:54,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.90 | bwd_microstep: 6081.54 | bwd_inner_microstep: 1665.94 | bwd_allreduce_microstep: 4415.53 | step_microstep: 39.59
[2024-06-10 03:34:54,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15989.98 | bwd: 47220.34 | bwd_inner: 42803.48 | bwd_allreduce: 4415.99 | step: 42.10
{'loss': 1.3495, 'learning_rate': 3.9536021201799934e-05, 'epoch': 0.1}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 03:34:56,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.39 | bwd_microstep: 1373.03 | bwd_inner_microstep: 1372.80 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3913
[2024-06-10 03:34:58,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.58 | bwd_microstep: 1586.54 | bwd_inner_microstep: 1586.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871
[2024-06-10 03:35:01,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.88 | bwd_microstep: 1662.26 | bwd_inner_microstep: 1662.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3796
[2024-06-10 03:35:03,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.71 | bwd_microstep: 1575.35 | bwd_inner_microstep: 1575.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.21
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 03:35:04,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1250.36 | bwd_inner_microstep: 1250.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 03:35:06,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.33 | bwd_microstep: 1286.08 | bwd_inner_microstep: 1286.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3847
[2024-06-10 03:35:08,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.95 | bwd_microstep: 1463.66 | bwd_inner_microstep: 1463.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-10 03:35:10,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.77 | bwd_microstep: 1287.45 | bwd_inner_microstep: 1287.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 03:35:12,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.25 | bwd_microstep: 1282.79 | bwd_inner_microstep: 1282.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1879
[2024-06-10 03:35:13,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.77 | bwd_microstep: 773.59 | bwd_inner_microstep: 773.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-10 03:35:15,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.97 | bwd_microstep: 1412.58 | bwd_inner_microstep: 1412.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3482
[2024-06-10 03:35:17,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.31 | bwd_microstep: 1513.11 | bwd_inner_microstep: 1513.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1938
[2024-06-10 03:35:18,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.95 | bwd_microstep: 887.32 | bwd_inner_microstep: 887.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 03:35:20,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1247.72 | bwd_inner_microstep: 1247.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3974
[2024-06-10 03:35:22,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.85 | bwd_microstep: 1608.90 | bwd_inner_microstep: 1608.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 03:35:24,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.52 | bwd_microstep: 1306.09 | bwd_inner_microstep: 1306.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 03:35:26,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.99 | bwd_microstep: 1393.83 | bwd_inner_microstep: 1393.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 03:35:28,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1414.43 | bwd_inner_microstep: 1414.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3904
[2024-06-10 03:35:30,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.94 | bwd_microstep: 1726.32 | bwd_inner_microstep: 1726.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 03:35:32,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.92 | bwd_microstep: 1416.54 | bwd_inner_microstep: 1416.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 03:35:34,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.65 | bwd_microstep: 1496.47 | bwd_inner_microstep: 1496.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2099
[2024-06-10 03:35:35,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.90 | bwd_microstep: 729.42 | bwd_inner_microstep: 729.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-10 03:35:37,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.99 | bwd_microstep: 1427.53 | bwd_inner_microstep: 1427.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 03:35:39,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.35 | bwd_microstep: 1560.85 | bwd_inner_microstep: 1560.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 03:35:41,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.95 | bwd_microstep: 1282.12 | bwd_inner_microstep: 1282.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 03:35:42,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 800.59 | bwd_inner_microstep: 800.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-10 03:35:44,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.73 | bwd_microstep: 973.72 | bwd_inner_microstep: 973.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 03:35:46,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.77 | bwd_microstep: 1637.94 | bwd_inner_microstep: 1637.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-10 03:35:48,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.45 | bwd_microstep: 1751.62 | bwd_inner_microstep: 1751.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3423
[2024-06-10 03:35:50,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.76 | bwd_microstep: 1474.66 | bwd_inner_microstep: 1474.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3585
[2024-06-10 03:35:52,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.46 | bwd_microstep: 1209.88 | bwd_inner_microstep: 1209.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023
[2024-06-10 03:35:54,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.74 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 03:35:54,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.61 | bwd_microstep: 1884.55 | bwd_inner_microstep: 927.21 | bwd_allreduce_microstep: 957.29 | step_microstep: 40.15
[2024-06-10 03:35:54,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15967.81 | bwd: 43697.31 | bwd_inner: 42738.94 | bwd_allreduce: 957.62 | step: 44.51
{'loss': 1.2892, 'learning_rate': 3.952794895233847e-05, 'epoch': 0.1}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3455
[2024-06-10 03:35:56,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.67 | bwd_microstep: 1478.58 | bwd_inner_microstep: 1478.46 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 03:35:58,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.43 | bwd_microstep: 1184.34 | bwd_inner_microstep: 1184.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 03:36:00,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1375.77 | bwd_inner_microstep: 1375.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-10 03:36:02,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.62 | bwd_microstep: 1645.86 | bwd_inner_microstep: 1645.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 03:36:04,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.96 | bwd_microstep: 1281.63 | bwd_inner_microstep: 1281.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2212
[2024-06-10 03:36:05,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.19 | bwd_microstep: 891.32 | bwd_inner_microstep: 891.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 03:36:07,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.05 | bwd_microstep: 1286.09 | bwd_inner_microstep: 1286.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3696
[2024-06-10 03:36:09,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.12 | bwd_microstep: 1328.57 | bwd_inner_microstep: 1328.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 03:36:10,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.93 | bwd_microstep: 791.42 | bwd_inner_microstep: 791.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697
[2024-06-10 03:36:12,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1432.16 | bwd_inner_microstep: 1432.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3934
[2024-06-10 03:36:14,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.42 | bwd_microstep: 1600.44 | bwd_inner_microstep: 1600.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3672
[2024-06-10 03:36:16,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.86 | bwd_microstep: 1585.64 | bwd_inner_microstep: 1585.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 03:36:18,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.38 | bwd_microstep: 1381.37 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1969
[2024-06-10 03:36:19,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.14 | bwd_microstep: 852.61 | bwd_inner_microstep: 852.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 03:36:21,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1342.08 | bwd_inner_microstep: 1342.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3438
[2024-06-10 03:36:23,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.78 | bwd_microstep: 1517.39 | bwd_inner_microstep: 1517.25 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 03:36:25,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.40 | bwd_microstep: 1523.89 | bwd_inner_microstep: 1523.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3604
[2024-06-10 03:36:27,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.49 | bwd_microstep: 1566.39 | bwd_inner_microstep: 1566.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3497
[2024-06-10 03:36:29,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.24 | bwd_microstep: 1350.61 | bwd_inner_microstep: 1350.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3811
[2024-06-10 03:36:31,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.65 | bwd_microstep: 1321.28 | bwd_inner_microstep: 1321.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1989
[2024-06-10 03:36:32,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.97 | bwd_microstep: 831.13 | bwd_inner_microstep: 831.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 03:36:34,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.08 | bwd_microstep: 1555.08 | bwd_inner_microstep: 1555.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 03:36:36,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.15 | bwd_microstep: 1301.15 | bwd_inner_microstep: 1301.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 03:36:38,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1460.29 | bwd_inner_microstep: 1460.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 03:36:40,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.34 | bwd_microstep: 1451.21 | bwd_inner_microstep: 1451.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820
[2024-06-10 03:36:42,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1478.39 | bwd_inner_microstep: 1478.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 03:36:44,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.98 | bwd_microstep: 1452.34 | bwd_inner_microstep: 1452.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 03:36:46,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.67 | bwd_microstep: 1349.29 | bwd_inner_microstep: 1349.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3395
[2024-06-10 03:36:48,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.95 | bwd_microstep: 1365.21 | bwd_inner_microstep: 1365.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 03:36:50,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.12 | bwd_microstep: 1503.38 | bwd_inner_microstep: 1503.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 03:36:52,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.52 | bwd_microstep: 1591.86 | bwd_inner_microstep: 1591.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3422
[2024-06-10 03:36:55,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.31 | optimizer_step: 6.61
[2024-06-10 03:36:55,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.86 | bwd_microstep: 2377.37 | bwd_inner_microstep: 1775.26 | bwd_allreduce_microstep: 602.05 | step_microstep: 39.46
[2024-06-10 03:36:55,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16365.15 | bwd: 44454.19 | bwd_inner: 43851.01 | bwd_allreduce: 602.39 | step: 41.59
{'loss': 1.3249, 'learning_rate': 3.951980792554231e-05, 'epoch': 0.1}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-10 03:36:56,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.70 | bwd_microstep: 696.68 | bwd_inner_microstep: 696.51 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3918
[2024-06-10 03:36:59,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.91 | bwd_microstep: 1693.92 | bwd_inner_microstep: 1693.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3897
[2024-06-10 03:37:01,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.96 | bwd_microstep: 1587.39 | bwd_inner_microstep: 1587.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 03:37:02,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.95 | bwd_microstep: 789.76 | bwd_inner_microstep: 789.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 03:37:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.26 | bwd_microstep: 1482.72 | bwd_inner_microstep: 1482.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3749
[2024-06-10 03:37:06,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.62 | bwd_microstep: 1344.60 | bwd_inner_microstep: 1344.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408
[2024-06-10 03:37:08,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.92 | bwd_microstep: 1182.77 | bwd_inner_microstep: 1182.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 03:37:09,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.22 | bwd_microstep: 1390.70 | bwd_inner_microstep: 1390.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-10 03:37:10,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.47 | bwd_microstep: 704.86 | bwd_inner_microstep: 704.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730
[2024-06-10 03:37:13,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.52 | bwd_microstep: 1535.04 | bwd_inner_microstep: 1535.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 4045
[2024-06-10 03:37:15,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 705.60 | bwd_microstep: 1928.53 | bwd_inner_microstep: 1928.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533
[2024-06-10 03:37:17,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.93 | bwd_microstep: 1327.34 | bwd_inner_microstep: 1327.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3682
[2024-06-10 03:37:19,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.42 | bwd_microstep: 1688.51 | bwd_inner_microstep: 1688.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3516
[2024-06-10 03:37:21,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1522.25 | bwd_inner_microstep: 1522.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-10 03:37:24,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.88 | bwd_microstep: 1721.00 | bwd_inner_microstep: 1720.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 03:37:26,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.00 | bwd_microstep: 1456.38 | bwd_inner_microstep: 1456.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 03:37:28,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.44 | bwd_microstep: 1348.44 | bwd_inner_microstep: 1348.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 03:37:30,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.83 | bwd_microstep: 1286.83 | bwd_inner_microstep: 1286.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 03:37:31,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.21 | bwd_microstep: 1397.82 | bwd_inner_microstep: 1397.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 03:37:34,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.71 | bwd_microstep: 1499.65 | bwd_inner_microstep: 1499.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 03:37:35,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1426.57 | bwd_inner_microstep: 1426.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 03:37:38,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.46 | bwd_microstep: 1513.82 | bwd_inner_microstep: 1513.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 03:37:39,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.48 | bwd_microstep: 1287.57 | bwd_inner_microstep: 1287.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4078
[2024-06-10 03:37:42,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.92 | bwd_microstep: 1630.46 | bwd_inner_microstep: 1630.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046
[2024-06-10 03:37:43,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.43 | bwd_microstep: 812.04 | bwd_inner_microstep: 812.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 03:37:44,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.40 | bwd_microstep: 1160.29 | bwd_inner_microstep: 1160.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2954
[2024-06-10 03:37:46,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.08 | bwd_microstep: 1294.18 | bwd_inner_microstep: 1294.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-10 03:37:48,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.00 | bwd_microstep: 1319.61 | bwd_inner_microstep: 1319.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2272
[2024-06-10 03:37:49,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.53 | bwd_microstep: 1005.15 | bwd_inner_microstep: 1005.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3814
[2024-06-10 03:37:52,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.24 | bwd_microstep: 1731.21 | bwd_inner_microstep: 1731.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3813
[2024-06-10 03:37:54,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.95 | bwd_microstep: 1720.01 | bwd_inner_microstep: 1719.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 03:37:56,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.62
[2024-06-10 03:37:56,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.38 | bwd_microstep: 1697.00 | bwd_inner_microstep: 1689.05 | bwd_allreduce_microstep: 7.89 | step_microstep: 38.88
[2024-06-10 03:37:56,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16508.25 | bwd: 44183.16 | bwd_inner: 44174.21 | bwd_allreduce: 8.19 | step: 40.99


 10%|▉         | 164/1726 [2:54:25<26:27:53, 60.99s/it]
 10%|▉         | 165/1726 [2:55:27<26:32:39, 61.22s/it]
 10%|▉         | 166/1726 [2:56:27<26:26:31, 61.02s/it]
 10%|▉         | 167/1726 [2:57:31<26:45:30, 61.79s/it]
 10%|▉         | 168/1726 [2:58:31<26:30:54, 61.27s/it]
 10%|▉         | 169/1726 [2:59:32<26:29:27, 61.25s/it]
 10%|▉         | 170/1726 [3:00:33<26:27:05, 61.20s/it]
{'loss': 1.3553, 'learning_rate': 3.951159815008411e-05, 'epoch': 0.1}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 03:37:58,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.36 | bwd_microstep: 1153.75 | bwd_inner_microstep: 1153.63 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3515
[2024-06-10 03:38:00,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.93 | bwd_microstep: 1227.28 | bwd_inner_microstep: 1227.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3906
[2024-06-10 03:38:02,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.35 | bwd_microstep: 1492.09 | bwd_inner_microstep: 1492.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2340
[2024-06-10 03:38:03,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.66 | bwd_microstep: 987.83 | bwd_inner_microstep: 987.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230
[2024-06-10 03:38:05,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.98 | bwd_microstep: 960.42 | bwd_inner_microstep: 960.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4117
[2024-06-10 03:38:07,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.34 | bwd_microstep: 1545.53 | bwd_inner_microstep: 1545.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3729
[2024-06-10 03:38:09,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.24 | bwd_microstep: 1302.60 | bwd_inner_microstep: 1302.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3486
[2024-06-10 03:38:10,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.90 | bwd_microstep: 1237.83 | bwd_inner_microstep: 1237.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 03:38:12,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.53 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 03:38:13,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.97 | bwd_microstep: 798.27 | bwd_inner_microstep: 798.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 03:38:14,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.42 | bwd_microstep: 682.85 | bwd_inner_microstep: 682.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4078
[2024-06-10 03:38:16,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.60 | bwd_microstep: 1440.34 | bwd_inner_microstep: 1440.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 03:38:18,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.74 | bwd_microstep: 1531.15 | bwd_inner_microstep: 1531.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 03:38:19,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.94 | bwd_microstep: 795.68 | bwd_inner_microstep: 795.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 03:38:21,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.16 | bwd_microstep: 1293.29 | bwd_inner_microstep: 1293.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 03:38:23,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.63 | bwd_microstep: 1618.67 | bwd_inner_microstep: 1618.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2647
[2024-06-10 03:38:25,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.35 | bwd_microstep: 1021.09 | bwd_inner_microstep: 1021.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498
[2024-06-10 03:38:27,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.45 | bwd_microstep: 1410.60 | bwd_inner_microstep: 1410.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3536
[2024-06-10 03:38:29,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.54 | bwd_microstep: 1589.77 | bwd_inner_microstep: 1589.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846
[2024-06-10 03:38:31,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.25 | bwd_microstep: 1464.68 | bwd_inner_microstep: 1464.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 03:38:33,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.00 | bwd_microstep: 1352.84 | bwd_inner_microstep: 1352.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1986
[2024-06-10 03:38:34,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.72 | bwd_microstep: 897.13 | bwd_inner_microstep: 897.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3582
[2024-06-10 03:38:36,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.84 | bwd_microstep: 1336.42 | bwd_inner_microstep: 1336.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-10 03:38:38,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.57 | bwd_microstep: 1626.46 | bwd_inner_microstep: 1626.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 03:38:40,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1398.28 | bwd_inner_microstep: 1398.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3491
[2024-06-10 03:38:42,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.65 | bwd_microstep: 1222.28 | bwd_inner_microstep: 1222.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2024
[2024-06-10 03:38:43,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.70 | bwd_microstep: 715.02 | bwd_inner_microstep: 714.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553
[2024-06-10 03:38:45,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.53 | bwd_microstep: 1330.73 | bwd_inner_microstep: 1330.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 03:38:47,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.50 | bwd_microstep: 1399.27 | bwd_inner_microstep: 1399.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 03:38:48,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.30 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2057
[2024-06-10 03:38:49,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.78 | bwd_microstep: 726.82 | bwd_inner_microstep: 726.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 03:38:59,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.39 | optimizer_step: 6.58
[2024-06-10 03:38:59,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.95 | bwd_microstep: 9169.05 | bwd_inner_microstep: 1750.71 | bwd_allreduce_microstep: 7418.26 | step_microstep: 39.90
[2024-06-10 03:38:59,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14997.24 | bwd: 47299.57 | bwd_inner: 39880.27 | bwd_allreduce: 7418.56 | step: 41.62
{'loss': 1.3726, 'learning_rate': 3.9503319654878655e-05, 'epoch': 0.1}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 03:39:01,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.79 | bwd_microstep: 1338.52 | bwd_inner_microstep: 1338.37 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 03:39:03,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.96 | bwd_microstep: 1244.35 | bwd_inner_microstep: 1244.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 03:39:04,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.65 | bwd_microstep: 1248.52 | bwd_inner_microstep: 1248.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 03:39:07,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.97 | bwd_microstep: 1556.47 | bwd_inner_microstep: 1556.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3903
[2024-06-10 03:39:09,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1585.75 | bwd_inner_microstep: 1585.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764
[2024-06-10 03:39:11,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.51 | bwd_microstep: 1436.17 | bwd_inner_microstep: 1436.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 03:39:13,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.91 | bwd_microstep: 1650.01 | bwd_inner_microstep: 1649.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 03:39:14,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.74 | bwd_microstep: 788.38 | bwd_inner_microstep: 788.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 03:39:16,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.02 | bwd_microstep: 1280.78 | bwd_inner_microstep: 1280.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 03:39:18,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1530.06 | bwd_inner_microstep: 1530.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 03:39:20,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.67 | bwd_microstep: 1286.80 | bwd_inner_microstep: 1286.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 03:39:22,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.29 | bwd_microstep: 1388.27 | bwd_inner_microstep: 1388.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-10 03:39:24,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1318.11 | bwd_inner_microstep: 1318.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3685
[2024-06-10 03:39:26,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.57 | bwd_microstep: 1689.94 | bwd_inner_microstep: 1689.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 03:39:28,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.81 | bwd_microstep: 1419.98 | bwd_inner_microstep: 1419.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 03:39:30,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.01 | bwd_microstep: 1351.06 | bwd_inner_microstep: 1351.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1964
[2024-06-10 03:39:31,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.37 | bwd_microstep: 829.32 | bwd_inner_microstep: 829.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 03:39:33,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1396.82 | bwd_inner_microstep: 1396.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3548
[2024-06-10 03:39:35,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.08 | bwd_microstep: 1522.41 | bwd_inner_microstep: 1522.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2933
[2024-06-10 03:39:36,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.68 | bwd_microstep: 1132.61 | bwd_inner_microstep: 1132.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-10 03:39:38,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.18 | bwd_microstep: 1441.01 | bwd_inner_microstep: 1440.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 03:39:40,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1396.89 | bwd_inner_microstep: 1396.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3492
[2024-06-10 03:39:42,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.82 | bwd_microstep: 1337.51 | bwd_inner_microstep: 1337.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 03:39:43,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.41 | bwd_microstep: 699.20 | bwd_inner_microstep: 699.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545
[2024-06-10 03:39:45,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.01 | bwd_microstep: 1330.08 | bwd_inner_microstep: 1330.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 03:39:47,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.36 | bwd_microstep: 1660.35 | bwd_inner_microstep: 1660.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 03:39:49,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.20 | bwd_microstep: 1547.12 | bwd_inner_microstep: 1547.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2152
[2024-06-10 03:39:51,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.53 | bwd_microstep: 852.78 | bwd_inner_microstep: 852.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 03:39:53,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.40 | bwd_microstep: 1557.20 | bwd_inner_microstep: 1557.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 03:39:55,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1286.28 | bwd_inner_microstep: 1286.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2036
[2024-06-10 03:39:56,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.84 | bwd_microstep: 776.30 | bwd_inner_microstep: 776.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451
[2024-06-10 03:39:59,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.40 | optimizer_step: 6.58
[2024-06-10 03:39:59,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.05 | bwd_microstep: 3321.67 | bwd_inner_microstep: 1343.22 | bwd_allreduce_microstep: 1978.38 | step_microstep: 39.85
[2024-06-10 03:40:00,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15789.06 | bwd: 44200.77 | bwd_inner: 42221.35 | bwd_allreduce: 1978.68 | step: 41.97
{'loss': 1.3523, 'learning_rate': 3.9494972469082764e-05, 'epoch': 0.1}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-10 03:40:01,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.29 | bwd_microstep: 1177.10 | bwd_inner_microstep: 1177.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3973
[2024-06-10 03:40:03,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.70 | bwd_microstep: 1343.01 | bwd_inner_microstep: 1342.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 03:40:05,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-10 03:40:07,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.56 | bwd_microstep: 1685.18 | bwd_inner_microstep: 1685.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4409
[2024-06-10 03:40:10,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.68 | bwd_microstep: 1817.98 | bwd_inner_microstep: 1817.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 03:40:11,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.48 | bwd_microstep: 1247.27 | bwd_inner_microstep: 1247.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 03:40:14,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.05 | bwd_microstep: 1487.79 | bwd_inner_microstep: 1487.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-10 03:40:15,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.28 | bwd_microstep: 1190.53 | bwd_inner_microstep: 1190.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 03:40:17,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1396.22 | bwd_inner_microstep: 1396.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1870
[2024-06-10 03:40:18,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.75 | bwd_microstep: 716.58 | bwd_inner_microstep: 716.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 03:40:20,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1388.86 | bwd_inner_microstep: 1388.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409
[2024-06-10 03:40:22,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.26 | bwd_microstep: 1315.55 | bwd_inner_microstep: 1315.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 03:40:24,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1292.08 | bwd_inner_microstep: 1292.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025
[2024-06-10 03:40:25,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.00 | bwd_microstep: 807.87 | bwd_inner_microstep: 807.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 03:40:27,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.43 | bwd_microstep: 1385.26 | bwd_inner_microstep: 1385.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2099
[2024-06-10 03:40:28,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.14 | bwd_microstep: 796.05 | bwd_inner_microstep: 796.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-10 03:40:30,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.79 | bwd_microstep: 1516.50 | bwd_inner_microstep: 1516.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2140
[2024-06-10 03:40:31,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.65 | bwd_microstep: 930.91 | bwd_inner_microstep: 930.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3617
[2024-06-10 03:40:34,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.61 | bwd_microstep: 1761.35 | bwd_inner_microstep: 1761.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677
[2024-06-10 03:40:36,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.50 | bwd_microstep: 1617.77 | bwd_inner_microstep: 1617.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 03:40:38,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.37 | bwd_microstep: 1558.68 | bwd_inner_microstep: 1558.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 03:40:40,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1559.51 | bwd_inner_microstep: 1559.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3540
[2024-06-10 03:40:42,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.38 | bwd_microstep: 1564.15 | bwd_inner_microstep: 1564.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 03:40:44,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.64 | bwd_microstep: 1491.10 | bwd_inner_microstep: 1491.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 03:40:46,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1280.73 | bwd_inner_microstep: 1280.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2555
[2024-06-10 03:40:47,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.58 | bwd_microstep: 969.44 | bwd_inner_microstep: 969.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466
[2024-06-10 03:40:49,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.01 | bwd_microstep: 1214.69 | bwd_inner_microstep: 1214.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 03:40:51,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.26 | bwd_microstep: 1227.51 | bwd_inner_microstep: 1227.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3815
[2024-06-10 03:40:53,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.00 | bwd_microstep: 1706.31 | bwd_inner_microstep: 1706.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-10 03:40:55,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1493.87 | bwd_inner_microstep: 1493.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 03:40:58,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.08 | bwd_microstep: 1652.14 | bwd_inner_microstep: 1652.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2031
[2024-06-10 03:41:02,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.43 | optimizer_step: 6.59
[2024-06-10 03:41:02,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.42 | bwd_microstep: 3646.66 | bwd_inner_microstep: 905.86 | bwd_allreduce_microstep: 2740.72 | step_microstep: 40.01
[2024-06-10 03:41:02,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16027.04 | bwd: 45624.21 | bwd_inner: 42882.55 | bwd_allreduce: 2740.96 | step: 42.07
{'loss': 1.3256, 'learning_rate': 3.9486556622095185e-05, 'epoch': 0.1}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3563
[2024-06-10 03:41:04,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.86 | bwd_microstep: 1454.99 | bwd_inner_microstep: 1454.82 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 03:41:05,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.80 | bwd_microstep: 787.76 | bwd_inner_microstep: 787.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 03:41:06,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.16 | bwd_microstep: 1275.26 | bwd_inner_microstep: 1275.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 03:41:08,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.79 | bwd_microstep: 1399.04 | bwd_inner_microstep: 1399.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-10 03:41:09,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.56 | bwd_microstep: 681.35 | bwd_inner_microstep: 681.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 03:41:10,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.97 | bwd_microstep: 788.90 | bwd_inner_microstep: 788.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1966
[2024-06-10 03:41:11,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.41 | bwd_microstep: 766.94 | bwd_inner_microstep: 766.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2211
[2024-06-10 03:41:13,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.16 | bwd_microstep: 921.19 | bwd_inner_microstep: 921.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 03:41:15,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.47 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 03:41:17,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1415.30 | bwd_inner_microstep: 1415.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 03:41:19,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1387.20 | bwd_inner_microstep: 1387.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3612
[2024-06-10 03:41:21,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.37 | bwd_microstep: 1578.23 | bwd_inner_microstep: 1578.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3669
[2024-06-10 03:41:23,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.88 | bwd_microstep: 1551.60 | bwd_inner_microstep: 1551.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 03:41:25,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.57 | bwd_microstep: 1484.29 | bwd_inner_microstep: 1484.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429
[2024-06-10 03:41:27,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.87 | bwd_microstep: 1449.14 | bwd_inner_microstep: 1449.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 03:41:29,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.68 | bwd_microstep: 1448.06 | bwd_inner_microstep: 1448.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3479
[2024-06-10 03:41:31,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1220.97 | bwd_inner_microstep: 1220.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2074
[2024-06-10 03:41:32,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.12 | bwd_microstep: 918.01 | bwd_inner_microstep: 917.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 03:41:34,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.71 | bwd_microstep: 1286.25 | bwd_inner_microstep: 1286.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 03:41:35,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.20 | bwd_microstep: 1298.49 | bwd_inner_microstep: 1298.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 03:41:37,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1407.95 | bwd_inner_microstep: 1407.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 03:41:39,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.54 | bwd_microstep: 1302.99 | bwd_inner_microstep: 1302.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 03:41:41,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.90 | bwd_microstep: 1354.88 | bwd_inner_microstep: 1354.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 03:41:43,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.98 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 03:41:45,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.41 | bwd_microstep: 1351.53 | bwd_inner_microstep: 1351.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 03:41:47,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1286.09 | bwd_inner_microstep: 1286.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3525
[2024-06-10 03:41:48,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.78 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1940
[2024-06-10 03:41:49,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.78 | bwd_microstep: 763.68 | bwd_inner_microstep: 763.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3779
[2024-06-10 03:41:51,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.85 | bwd_microstep: 1416.08 | bwd_inner_microstep: 1416.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 03:41:53,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.72 | bwd_microstep: 1486.84 | bwd_inner_microstep: 1486.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585
[2024-06-10 03:41:56,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.97 | bwd_microstep: 1536.74 | bwd_inner_microstep: 1536.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049
[2024-06-10 03:42:05,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.42 | optimizer_step: 6.59
[2024-06-10 03:42:05,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.15 | bwd_microstep: 8689.21 | bwd_inner_microstep: 934.51 | bwd_allreduce_microstep: 7754.62 | step_microstep: 40.13
[2024-06-10 03:42:05,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15028.90 | bwd: 47727.04 | bwd_inner: 39971.33 | bwd_allreduce: 7754.93 | step: 42.22
{'loss': 1.3271, 'learning_rate': 3.947807214355648e-05, 'epoch': 0.1}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 03:42:07,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.40 | bwd_microstep: 1344.71 | bwd_inner_microstep: 1344.53 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 03:42:08,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1345.26 | bwd_inner_microstep: 1345.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.98
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 03:42:10,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.67 | bwd_microstep: 1378.84 | bwd_inner_microstep: 1378.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 03:42:12,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.37 | bwd_microstep: 1393.18 | bwd_inner_microstep: 1393.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 4161
[2024-06-10 03:42:14,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.03 | bwd_microstep: 1413.51 | bwd_inner_microstep: 1413.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2652
[2024-06-10 03:42:16,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.80 | bwd_microstep: 1117.00 | bwd_inner_microstep: 1116.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3749
[2024-06-10 03:42:18,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.35 | bwd_microstep: 1434.87 | bwd_inner_microstep: 1434.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 03:42:19,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.86 | bwd_microstep: 1246.96 | bwd_inner_microstep: 1246.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 03:42:21,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.16 | bwd_microstep: 1287.85 | bwd_inner_microstep: 1287.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 03:42:23,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.26 | bwd_microstep: 1387.29 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3504
[2024-06-10 03:42:25,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.87 | bwd_microstep: 1223.91 | bwd_inner_microstep: 1223.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 03:42:27,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.11 | bwd_microstep: 1493.40 | bwd_inner_microstep: 1493.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 03:42:29,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1490.41 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 03:42:31,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.52 | bwd_microstep: 1450.68 | bwd_inner_microstep: 1450.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3662
[2024-06-10 03:42:33,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.42 | bwd_microstep: 1547.89 | bwd_inner_microstep: 1547.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2129
[2024-06-10 03:42:34,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.05 | bwd_microstep: 883.22 | bwd_inner_microstep: 883.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 03:42:36,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1356.68 | bwd_inner_microstep: 1356.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2099
[2024-06-10 03:42:37,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.51 | bwd_microstep: 732.78 | bwd_inner_microstep: 732.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 03:42:39,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1393.82 | bwd_inner_microstep: 1393.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 03:42:41,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.72 | bwd_microstep: 1297.91 | bwd_inner_microstep: 1297.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-10 03:42:43,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.49 | bwd_microstep: 1436.90 | bwd_inner_microstep: 1436.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 03:42:45,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.02 | bwd_microstep: 1500.14 | bwd_inner_microstep: 1500.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954
[2024-06-10 03:42:46,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.11 | bwd_microstep: 705.94 | bwd_inner_microstep: 705.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 03:42:47,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 810.68 | bwd_inner_microstep: 810.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3615
[2024-06-10 03:42:49,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.21 | bwd_microstep: 1542.31 | bwd_inner_microstep: 1542.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 03:42:51,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.72 | bwd_microstep: 1558.45 | bwd_inner_microstep: 1558.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3499
[2024-06-10 03:42:53,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.19 | bwd_microstep: 1446.34 | bwd_inner_microstep: 1446.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 03:42:55,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.70 | bwd_microstep: 1342.84 | bwd_inner_microstep: 1342.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 03:42:57,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.88 | bwd_microstep: 1545.88 | bwd_inner_microstep: 1545.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 03:43:00,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.96 | bwd_microstep: 1658.22 | bwd_inner_microstep: 1658.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 03:43:02,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1506.53 | bwd_inner_microstep: 1506.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 03:43:04,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 03:43:04,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1519.48 | bwd_inner_microstep: 1511.66 | bwd_allreduce_microstep: 7.76 | step_microstep: 38.65
[2024-06-10 03:43:04,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16026.01 | bwd: 42793.91 | bwd_inner: 42785.10 | bwd_allreduce: 8.06 | step: 43.01
{'loss': 1.3189, 'learning_rate': 3.946951906334895e-05, 'epoch': 0.1}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 03:43:06,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.31 | bwd_microstep: 1481.19 | bwd_inner_microstep: 1481.09 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3916
[2024-06-10 03:43:08,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.43 | bwd_microstep: 1556.24 | bwd_inner_microstep: 1556.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4528
[2024-06-10 03:43:10,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.51 | bwd_microstep: 1643.59 | bwd_inner_microstep: 1643.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797
[2024-06-10 03:43:12,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.08 | bwd_microstep: 1446.23 | bwd_inner_microstep: 1446.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2046
[2024-06-10 03:43:13,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.06 | bwd_microstep: 747.43 | bwd_inner_microstep: 747.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 03:43:14,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.78 | bwd_microstep: 793.61 | bwd_inner_microstep: 793.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 03:43:16,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.41 | bwd_microstep: 1277.71 | bwd_inner_microstep: 1277.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3980
[2024-06-10 03:43:19,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.77 | bwd_microstep: 1750.36 | bwd_inner_microstep: 1750.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3696
[2024-06-10 03:43:21,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1345.65 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 03:43:22,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.78 | bwd_microstep: 1348.00 | bwd_inner_microstep: 1347.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3504
[2024-06-10 03:43:25,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.04 | bwd_microstep: 1582.31 | bwd_inner_microstep: 1582.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3393
[2024-06-10 03:43:26,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1276.43 | bwd_inner_microstep: 1276.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 03:43:28,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.80 | bwd_microstep: 1513.03 | bwd_inner_microstep: 1513.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-10 03:43:30,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1245.67 | bwd_inner_microstep: 1245.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 03:43:31,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.70 | bwd_microstep: 729.98 | bwd_inner_microstep: 729.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140
[2024-06-10 03:43:32,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.36 | bwd_microstep: 832.84 | bwd_inner_microstep: 832.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 03:43:34,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.72 | bwd_microstep: 1284.96 | bwd_inner_microstep: 1284.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 03:43:36,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.40 | bwd_microstep: 1408.88 | bwd_inner_microstep: 1408.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 03:43:38,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.94 | bwd_microstep: 1428.77 | bwd_inner_microstep: 1428.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 619
[2024-06-10 03:43:38,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.68 | bwd_microstep: 264.15 | bwd_inner_microstep: 264.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 03:43:40,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.66 | bwd_microstep: 1455.59 | bwd_inner_microstep: 1455.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 03:43:42,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.41 | bwd_microstep: 1298.10 | bwd_inner_microstep: 1298.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3819
[2024-06-10 03:43:44,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.83 | bwd_microstep: 1508.44 | bwd_inner_microstep: 1508.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 03:43:45,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 701.34 | bwd_inner_microstep: 701.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 03:43:47,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1404.61 | bwd_inner_microstep: 1404.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3782
[2024-06-10 03:43:49,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.04 | bwd_microstep: 1388.10 | bwd_inner_microstep: 1388.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 03:43:51,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.10 | bwd_microstep: 1454.73 | bwd_inner_microstep: 1454.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 03:43:53,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.73 | bwd_microstep: 979.95 | bwd_inner_microstep: 979.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 03:43:55,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1503.26 | bwd_inner_microstep: 1503.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3814
[2024-06-10 03:43:57,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.85 | bwd_microstep: 1690.01 | bwd_inner_microstep: 1689.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2914
[2024-06-10 03:43:58,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.81 | bwd_microstep: 1097.99 | bwd_inner_microstep: 1097.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 03:44:05,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.42 | optimizer_step: 6.58
[2024-06-10 03:44:05,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 5780.60 | bwd_inner_microstep: 1527.75 | bwd_allreduce_microstep: 4252.78 | step_microstep: 40.00
[2024-06-10 03:44:05,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15327.98 | bwd: 45219.79 | bwd_inner: 40965.99 | bwd_allreduce: 4253.08 | step: 42.22


 10%|▉         | 170/1726 [3:00:33<26:27:05, 61.20s/it]
 10%|▉         | 171/1726 [3:01:36<26:37:26, 61.64s/it]
 10%|▉         | 172/1726 [3:02:36<26:26:32, 61.26s/it]
 10%|█         | 173/1726 [3:03:38<26:31:32, 61.49s/it]
 10%|█         | 174/1726 [3:04:41<26:43:16, 61.98s/it]
 10%|█         | 175/1726 [3:05:41<26:20:39, 61.15s/it]
 10%|█         | 176/1726 [3:06:42<26:17:54, 61.08s/it]
{'loss': 1.3416, 'learning_rate': 3.946089741159648e-05, 'epoch': 0.1}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3549
[2024-06-10 03:44:07,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.87 | bwd_microstep: 1450.92 | bwd_inner_microstep: 1450.69 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400
[2024-06-10 03:44:09,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.81 | bwd_microstep: 1308.17 | bwd_inner_microstep: 1308.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3951
[2024-06-10 03:44:11,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.51 | bwd_microstep: 1498.69 | bwd_inner_microstep: 1498.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 03:44:13,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.94 | bwd_microstep: 1549.89 | bwd_inner_microstep: 1549.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3869
[2024-06-10 03:44:15,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.39 | bwd_microstep: 1565.59 | bwd_inner_microstep: 1565.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2501
[2024-06-10 03:44:16,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.89 | bwd_microstep: 930.40 | bwd_inner_microstep: 930.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2637
[2024-06-10 03:44:18,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.77 | bwd_microstep: 1023.47 | bwd_inner_microstep: 1023.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 03:44:19,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1258.99 | bwd_inner_microstep: 1258.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3749
[2024-06-10 03:44:21,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.65 | bwd_microstep: 1438.22 | bwd_inner_microstep: 1438.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 03:44:23,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.28 | bwd_microstep: 1252.86 | bwd_inner_microstep: 1252.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2067
[2024-06-10 03:44:24,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.82 | bwd_microstep: 797.80 | bwd_inner_microstep: 797.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-10 03:44:26,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.13 | bwd_microstep: 1451.11 | bwd_inner_microstep: 1451.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2137
[2024-06-10 03:44:28,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.62 | bwd_microstep: 931.24 | bwd_inner_microstep: 931.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 03:44:29,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1383.21 | bwd_inner_microstep: 1383.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 03:44:32,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.23 | bwd_microstep: 1487.32 | bwd_inner_microstep: 1487.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-10 03:44:33,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.71 | bwd_microstep: 1419.34 | bwd_inner_microstep: 1419.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 03:44:35,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.60 | bwd_microstep: 1194.43 | bwd_inner_microstep: 1194.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2332
[2024-06-10 03:44:37,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.81 | bwd_microstep: 989.68 | bwd_inner_microstep: 989.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059
[2024-06-10 03:44:38,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.25 | bwd_microstep: 916.92 | bwd_inner_microstep: 916.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 03:44:40,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.20 | bwd_microstep: 1406.73 | bwd_inner_microstep: 1406.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3550
[2024-06-10 03:44:42,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.74 | bwd_microstep: 1272.06 | bwd_inner_microstep: 1272.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-10 03:44:44,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.04 | bwd_microstep: 1643.95 | bwd_inner_microstep: 1643.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2170
[2024-06-10 03:44:45,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.64 | bwd_microstep: 864.46 | bwd_inner_microstep: 864.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 03:44:47,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.72 | bwd_microstep: 1658.46 | bwd_inner_microstep: 1658.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-10 03:44:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.86 | bwd_microstep: 1509.80 | bwd_inner_microstep: 1509.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2284
[2024-06-10 03:44:51,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.74 | bwd_microstep: 877.55 | bwd_inner_microstep: 877.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 03:44:53,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.52 | bwd_microstep: 1655.98 | bwd_inner_microstep: 1655.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2560
[2024-06-10 03:44:54,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.83 | bwd_microstep: 1065.99 | bwd_inner_microstep: 1065.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2009
[2024-06-10 03:44:55,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.95 | bwd_microstep: 710.39 | bwd_inner_microstep: 710.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3397
[2024-06-10 03:44:57,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.59 | bwd_microstep: 1490.76 | bwd_inner_microstep: 1490.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3818
[2024-06-10 03:45:00,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.50 | bwd_microstep: 1701.07 | bwd_inner_microstep: 1701.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1926
[2024-06-10 03:45:07,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.42 | optimizer_step: 6.60
[2024-06-10 03:45:07,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.81 | bwd_microstep: 6717.71 | bwd_inner_microstep: 826.64 | bwd_allreduce_microstep: 5890.99 | step_microstep: 40.03
[2024-06-10 03:45:07,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15140.10 | bwd: 46423.19 | bwd_inner: 40531.07 | bwd_allreduce: 5891.33 | step: 42.06
{'loss': 1.2773, 'learning_rate': 3.94522072186645e-05, 'epoch': 0.1}
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3411
[2024-06-10 03:45:08,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.59 | bwd_microstep: 1197.18 | bwd_inner_microstep: 1196.98 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 03:45:10,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1243.36 | bwd_inner_microstep: 1243.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 03:45:12,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1372.37 | bwd_inner_microstep: 1372.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 03:45:14,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1391.16 | bwd_inner_microstep: 1391.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 03:45:16,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.50 | bwd_microstep: 1445.05 | bwd_inner_microstep: 1445.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791
[2024-06-10 03:45:18,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.74 | bwd_microstep: 1647.92 | bwd_inner_microstep: 1647.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927
[2024-06-10 03:45:19,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.75 | bwd_microstep: 699.18 | bwd_inner_microstep: 699.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4033
[2024-06-10 03:45:21,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.09 | bwd_microstep: 1551.02 | bwd_inner_microstep: 1551.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 03:45:23,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.86 | bwd_microstep: 1387.90 | bwd_inner_microstep: 1387.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3486
[2024-06-10 03:45:25,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.05 | bwd_microstep: 1434.65 | bwd_inner_microstep: 1434.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 03:45:27,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.66 | bwd_microstep: 1343.01 | bwd_inner_microstep: 1342.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3665
[2024-06-10 03:45:29,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.81 | bwd_microstep: 1389.70 | bwd_inner_microstep: 1389.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651
[2024-06-10 03:45:31,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.05 | bwd_microstep: 1515.41 | bwd_inner_microstep: 1515.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1934
[2024-06-10 03:45:32,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.09 | bwd_microstep: 761.65 | bwd_inner_microstep: 761.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3822
[2024-06-10 03:45:34,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.18 | bwd_microstep: 1356.11 | bwd_inner_microstep: 1356.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 03:45:36,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.28 | bwd_microstep: 1462.87 | bwd_inner_microstep: 1462.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 03:45:38,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.33 | bwd_microstep: 1257.77 | bwd_inner_microstep: 1257.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2081
[2024-06-10 03:45:39,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.06 | bwd_microstep: 822.12 | bwd_inner_microstep: 822.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 03:45:41,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1292.97 | bwd_inner_microstep: 1292.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2104
[2024-06-10 03:45:42,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.63 | bwd_microstep: 829.03 | bwd_inner_microstep: 829.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 03:45:44,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.07 | bwd_microstep: 1260.82 | bwd_inner_microstep: 1260.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-10 03:45:46,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.65 | bwd_microstep: 1327.67 | bwd_inner_microstep: 1327.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 03:45:47,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.87 | bwd_microstep: 1420.12 | bwd_inner_microstep: 1420.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 03:45:49,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.04 | bwd_microstep: 1285.75 | bwd_inner_microstep: 1285.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 03:45:50,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.94 | bwd_microstep: 813.67 | bwd_inner_microstep: 813.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238
[2024-06-10 03:45:52,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.33 | bwd_microstep: 967.71 | bwd_inner_microstep: 967.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 03:45:54,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.65 | bwd_microstep: 1451.94 | bwd_inner_microstep: 1451.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 03:45:56,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1491.48 | bwd_inner_microstep: 1491.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2074
[2024-06-10 03:45:57,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.58 | bwd_microstep: 1012.98 | bwd_inner_microstep: 1012.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2462
[2024-06-10 03:45:59,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.73 | bwd_microstep: 1022.44 | bwd_inner_microstep: 1022.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3814
[2024-06-10 03:46:01,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.12 | bwd_microstep: 1716.03 | bwd_inner_microstep: 1716.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781
[2024-06-10 03:46:05,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.28 | optimizer_step: 6.60
[2024-06-10 03:46:05,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.46 | bwd_microstep: 3537.45 | bwd_inner_microstep: 1865.72 | bwd_allreduce_microstep: 1671.68 | step_microstep: 39.17
[2024-06-10 03:46:05,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15337.57 | bwd: 42708.53 | bwd_inner: 41035.76 | bwd_allreduce: 1671.99 | step: 41.26
{'loss': 1.3478, 'learning_rate': 3.9443448515159815e-05, 'epoch': 0.1}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 03:46:07,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1463.10 | bwd_inner_microstep: 1462.89 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.20
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2398
[2024-06-10 03:46:09,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.62 | bwd_microstep: 1000.15 | bwd_inner_microstep: 1000.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3861
[2024-06-10 03:46:11,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.26 | bwd_microstep: 1661.37 | bwd_inner_microstep: 1661.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-10 03:46:13,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.03 | bwd_microstep: 1560.21 | bwd_inner_microstep: 1560.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2951
[2024-06-10 03:46:15,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.43 | bwd_microstep: 1105.05 | bwd_inner_microstep: 1105.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 03:46:17,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.92 | bwd_microstep: 1439.58 | bwd_inner_microstep: 1439.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 03:46:19,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.46 | bwd_microstep: 1412.32 | bwd_inner_microstep: 1412.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 03:46:20,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1287.53 | bwd_inner_microstep: 1287.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 03:46:22,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.59 | bwd_microstep: 1350.81 | bwd_inner_microstep: 1350.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3619
[2024-06-10 03:46:24,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.01 | bwd_microstep: 1252.03 | bwd_inner_microstep: 1252.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 03:46:26,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.81 | bwd_microstep: 1432.71 | bwd_inner_microstep: 1432.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3729
[2024-06-10 03:46:28,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.40 | bwd_microstep: 1563.76 | bwd_inner_microstep: 1563.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1934
[2024-06-10 03:46:29,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.24 | bwd_microstep: 775.83 | bwd_inner_microstep: 775.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3502
[2024-06-10 03:46:31,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.84 | bwd_microstep: 1469.21 | bwd_inner_microstep: 1469.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 03:46:33,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1344.98 | bwd_inner_microstep: 1344.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-10 03:46:35,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.61 | bwd_microstep: 1625.27 | bwd_inner_microstep: 1625.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477
[2024-06-10 03:46:37,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.34 | bwd_microstep: 1349.43 | bwd_inner_microstep: 1349.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-10 03:46:39,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.53 | bwd_microstep: 1301.31 | bwd_inner_microstep: 1301.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 03:46:41,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.19 | bwd_microstep: 1528.64 | bwd_inner_microstep: 1528.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3474
[2024-06-10 03:46:43,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.85 | bwd_microstep: 1545.67 | bwd_inner_microstep: 1545.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 03:46:45,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.70 | bwd_microstep: 1471.47 | bwd_inner_microstep: 1471.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454
[2024-06-10 03:46:47,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1383.04 | bwd_inner_microstep: 1383.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 03:46:49,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1380.76 | bwd_inner_microstep: 1380.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 03:46:51,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.80 | bwd_microstep: 1283.52 | bwd_inner_microstep: 1283.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 03:46:53,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.49 | bwd_microstep: 1457.45 | bwd_inner_microstep: 1457.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-10 03:46:55,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.33 | bwd_microstep: 1434.96 | bwd_inner_microstep: 1434.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 03:46:56,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.33 | bwd_microstep: 975.74 | bwd_inner_microstep: 975.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 03:46:58,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1298.08 | bwd_inner_microstep: 1298.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-10 03:47:00,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1405.54 | bwd_inner_microstep: 1405.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 03:47:02,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1348.37 | bwd_inner_microstep: 1348.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 03:47:04,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.85 | bwd_microstep: 1647.99 | bwd_inner_microstep: 1647.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3771
[2024-06-10 03:47:06,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.25 | optimizer_step: 6.63
[2024-06-10 03:47:06,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.35 | bwd_microstep: 1481.82 | bwd_inner_microstep: 1473.77 | bwd_allreduce_microstep: 8.00 | step_microstep: 38.59
[2024-06-10 03:47:06,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16502.96 | bwd: 44037.75 | bwd_inner: 44028.62 | bwd_allreduce: 8.35 | step: 40.91
{'loss': 1.3216, 'learning_rate': 3.9434621331930536e-05, 'epoch': 0.1}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427
[2024-06-10 03:47:08,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.88 | bwd_microstep: 1447.59 | bwd_inner_microstep: 1447.42 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3918
[2024-06-10 03:47:10,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.51 | bwd_microstep: 1490.93 | bwd_inner_microstep: 1490.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-10 03:47:12,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.84 | bwd_microstep: 1660.96 | bwd_inner_microstep: 1660.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 03:47:15,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.83 | bwd_microstep: 1548.46 | bwd_inner_microstep: 1548.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3466
[2024-06-10 03:47:16,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.78 | bwd_microstep: 1248.20 | bwd_inner_microstep: 1248.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 03:47:18,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.28 | bwd_microstep: 1281.53 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3721
[2024-06-10 03:47:20,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.40 | bwd_microstep: 1495.33 | bwd_inner_microstep: 1495.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 03:47:22,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1393.22 | bwd_inner_microstep: 1393.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 03:47:24,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.95 | bwd_microstep: 1289.03 | bwd_inner_microstep: 1289.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3753
[2024-06-10 03:47:26,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.28 | bwd_microstep: 1426.92 | bwd_inner_microstep: 1426.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1873
[2024-06-10 03:47:27,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.00 | bwd_microstep: 714.67 | bwd_inner_microstep: 714.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 03:47:29,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.50 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 03:47:31,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1346.22 | bwd_inner_microstep: 1346.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 03:47:32,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1350.83 | bwd_inner_microstep: 1350.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2958
[2024-06-10 03:47:34,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.50 | bwd_microstep: 1203.21 | bwd_inner_microstep: 1203.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-10 03:47:35,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.84 | bwd_microstep: 820.65 | bwd_inner_microstep: 820.49 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.22
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 03:47:37,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.62 | bwd_microstep: 1161.44 | bwd_inner_microstep: 1161.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 03:47:38,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.60 | bwd_microstep: 1191.64 | bwd_inner_microstep: 1191.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-10 03:47:40,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.35 | bwd_microstep: 1316.12 | bwd_inner_microstep: 1316.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 03:47:42,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.77 | bwd_microstep: 1556.37 | bwd_inner_microstep: 1556.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 03:47:44,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.49 | bwd_microstep: 878.02 | bwd_inner_microstep: 877.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 03:47:45,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.12 | bwd_microstep: 1266.59 | bwd_inner_microstep: 1266.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 03:47:47,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.07 | bwd_microstep: 1395.09 | bwd_inner_microstep: 1395.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 03:47:49,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.24 | bwd_microstep: 1255.95 | bwd_inner_microstep: 1255.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 03:47:50,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.72 | bwd_microstep: 700.81 | bwd_inner_microstep: 700.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 629
[2024-06-10 03:47:50,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.81 | bwd_microstep: 264.37 | bwd_inner_microstep: 264.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3813
[2024-06-10 03:47:52,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.40 | bwd_microstep: 1417.87 | bwd_inner_microstep: 1417.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2267
[2024-06-10 03:47:54,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.85 | bwd_microstep: 790.44 | bwd_inner_microstep: 790.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2444
[2024-06-10 03:47:55,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.03 | bwd_microstep: 855.49 | bwd_inner_microstep: 855.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 03:47:57,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.94 | bwd_microstep: 1491.53 | bwd_inner_microstep: 1491.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 03:47:59,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1555.72 | bwd_inner_microstep: 1555.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2375
[2024-06-10 03:48:07,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.79 | optimizer_step: 6.60
[2024-06-10 03:48:07,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.53 | bwd_microstep: 7705.52 | bwd_inner_microstep: 1179.56 | bwd_allreduce_microstep: 6525.82 | step_microstep: 43.39
[2024-06-10 03:48:07,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14773.88 | bwd: 45805.28 | bwd_inner: 39278.13 | bwd_allreduce: 6526.25 | step: 45.65
{'loss': 1.3495, 'learning_rate': 3.942572570006596e-05, 'epoch': 0.1}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3493
[2024-06-10 03:48:09,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.17 | bwd_microstep: 1578.12 | bwd_inner_microstep: 1578.05 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 03:48:11,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1242.95 | bwd_inner_microstep: 1242.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 03:48:13,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.74 | bwd_microstep: 1549.15 | bwd_inner_microstep: 1549.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 03:48:15,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1280.32 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 03:48:17,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.67 | bwd_microstep: 1279.73 | bwd_inner_microstep: 1279.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1932
[2024-06-10 03:48:18,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.41 | bwd_microstep: 728.68 | bwd_inner_microstep: 728.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 03:48:19,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.04 | bwd_microstep: 1246.19 | bwd_inner_microstep: 1246.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3404
[2024-06-10 03:48:21,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.94 | bwd_microstep: 1216.23 | bwd_inner_microstep: 1216.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 03:48:23,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 1387.95 | bwd_inner_microstep: 1387.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 03:48:24,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.48 | bwd_microstep: 792.63 | bwd_inner_microstep: 792.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3683
[2024-06-10 03:48:26,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.40 | bwd_microstep: 1571.48 | bwd_inner_microstep: 1571.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 03:48:28,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.11 | bwd_microstep: 1483.18 | bwd_inner_microstep: 1483.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 03:48:29,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.56 | bwd_microstep: 789.58 | bwd_inner_microstep: 789.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 03:48:31,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.78 | bwd_microstep: 1341.56 | bwd_inner_microstep: 1341.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 03:48:33,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.67 | bwd_microstep: 1341.60 | bwd_inner_microstep: 1341.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3666
[2024-06-10 03:48:35,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.36 | bwd_microstep: 1361.22 | bwd_inner_microstep: 1361.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2116
[2024-06-10 03:48:36,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.77 | bwd_microstep: 929.39 | bwd_inner_microstep: 929.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 03:48:38,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.32 | bwd_microstep: 1353.71 | bwd_inner_microstep: 1353.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3595
[2024-06-10 03:48:40,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.17 | bwd_microstep: 1354.16 | bwd_inner_microstep: 1354.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 03:48:42,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.51 | bwd_microstep: 1512.99 | bwd_inner_microstep: 1512.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3537
[2024-06-10 03:48:44,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.72 | bwd_microstep: 1545.07 | bwd_inner_microstep: 1545.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 03:48:46,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1296.06 | bwd_inner_microstep: 1296.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2257
[2024-06-10 03:48:47,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.69 | bwd_microstep: 874.49 | bwd_inner_microstep: 874.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3538
[2024-06-10 03:48:49,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.24 | bwd_microstep: 1203.98 | bwd_inner_microstep: 1203.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3813
[2024-06-10 03:48:51,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.10 | bwd_microstep: 1723.76 | bwd_inner_microstep: 1723.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 03:48:54,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.53 | bwd_microstep: 1607.47 | bwd_inner_microstep: 1607.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 03:48:56,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.68 | bwd_microstep: 1547.39 | bwd_inner_microstep: 1547.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2059
[2024-06-10 03:48:57,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.50 | bwd_microstep: 724.24 | bwd_inner_microstep: 724.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 03:48:59,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1402.56 | bwd_inner_microstep: 1402.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-10 03:49:01,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.81 | bwd_microstep: 1547.43 | bwd_inner_microstep: 1547.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 03:49:03,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.40 | bwd_microstep: 1298.83 | bwd_inner_microstep: 1298.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1908
[2024-06-10 03:49:07,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-10 03:49:07,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.30 | bwd_microstep: 4418.87 | bwd_inner_microstep: 817.74 | bwd_allreduce_microstep: 3601.07 | step_microstep: 38.79
[2024-06-10 03:49:07,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15347.04 | bwd: 44530.99 | bwd_inner: 40928.95 | bwd_allreduce: 3601.34 | step: 40.48
{'loss': 1.2959, 'learning_rate': 3.9416761650896456e-05, 'epoch': 0.1}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 03:49:09,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.66 | bwd_microstep: 1328.14 | bwd_inner_microstep: 1328.01 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 03:49:11,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1244.89 | bwd_inner_microstep: 1244.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874
[2024-06-10 03:49:13,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.61 | bwd_microstep: 1582.60 | bwd_inner_microstep: 1582.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 03:49:15,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.58 | bwd_microstep: 1242.46 | bwd_inner_microstep: 1242.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 03:49:17,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.36 | bwd_microstep: 1282.38 | bwd_inner_microstep: 1282.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 03:49:18,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.43 | bwd_microstep: 1404.97 | bwd_inner_microstep: 1404.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 03:49:20,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | bwd_microstep: 1384.01 | bwd_inner_microstep: 1383.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 03:49:22,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1384.45 | bwd_inner_microstep: 1384.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 03:49:24,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.00 | bwd_microstep: 1291.78 | bwd_inner_microstep: 1291.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3428
[2024-06-10 03:49:26,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.57 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2115
[2024-06-10 03:49:27,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.36 | bwd_microstep: 1019.87 | bwd_inner_microstep: 1019.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414
[2024-06-10 03:49:29,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.13 | bwd_microstep: 1308.52 | bwd_inner_microstep: 1308.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3921
[2024-06-10 03:49:31,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.88 | bwd_microstep: 1688.45 | bwd_inner_microstep: 1688.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2106
[2024-06-10 03:49:32,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.36 | bwd_microstep: 735.74 | bwd_inner_microstep: 735.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 03:49:35,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1608.59 | bwd_inner_microstep: 1608.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 03:49:37,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1492.89 | bwd_inner_microstep: 1492.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 03:49:38,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1340.80 | bwd_inner_microstep: 1340.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429
[2024-06-10 03:49:40,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.79 | bwd_microstep: 1154.96 | bwd_inner_microstep: 1154.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 03:49:42,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.49 | bwd_microstep: 1253.86 | bwd_inner_microstep: 1253.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 03:49:43,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.93 | bwd_microstep: 977.48 | bwd_inner_microstep: 977.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 03:49:45,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.70 | bwd_microstep: 1301.00 | bwd_inner_microstep: 1300.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2020
[2024-06-10 03:49:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.79 | bwd_microstep: 840.73 | bwd_inner_microstep: 840.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3552
[2024-06-10 03:49:48,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.39 | bwd_microstep: 1431.13 | bwd_inner_microstep: 1431.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-10 03:49:50,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.12 | bwd_microstep: 1313.78 | bwd_inner_microstep: 1313.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3055
[2024-06-10 03:49:51,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 410.74 | bwd_microstep: 1073.88 | bwd_inner_microstep: 1073.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 03:49:53,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.56 | bwd_microstep: 1396.58 | bwd_inner_microstep: 1396.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3557
[2024-06-10 03:49:55,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.93 | bwd_microstep: 1329.76 | bwd_inner_microstep: 1329.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 03:49:57,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.12 | bwd_microstep: 1182.97 | bwd_inner_microstep: 1182.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2284
[2024-06-10 03:49:58,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.62 | bwd_microstep: 911.67 | bwd_inner_microstep: 911.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 03:50:00,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 1407.65 | bwd_inner_microstep: 1407.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683
[2024-06-10 03:50:02,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.14 | bwd_microstep: 1724.73 | bwd_inner_microstep: 1724.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3467
[2024-06-10 03:50:10,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.24 | optimizer_step: 6.63
[2024-06-10 03:50:10,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.42 | bwd_microstep: 7161.05 | bwd_inner_microstep: 1780.96 | bwd_allreduce_microstep: 5380.04 | step_microstep: 38.72
[2024-06-10 03:50:10,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15589.32 | bwd: 46987.88 | bwd_inner: 41606.82 | bwd_allreduce: 5380.32 | step: 40.38


 10%|█         | 176/1726 [3:06:42<26:17:54, 61.08s/it]
 10%|█         | 177/1726 [3:07:43<26:23:33, 61.34s/it]
 10%|█         | 178/1726 [3:08:42<25:59:58, 60.46s/it]
 10%|█         | 179/1726 [3:09:43<26:02:38, 60.61s/it]
 10%|█         | 180/1726 [3:10:44<26:04:27, 60.72s/it]
 10%|█         | 181/1726 [3:11:44<25:59:41, 60.57s/it]
 11%|█         | 182/1726 [3:12:47<26:16:46, 61.27s/it]
{'loss': 1.3726, 'learning_rate': 3.940772921599335e-05, 'epoch': 0.11}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 03:50:11,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.64 | bwd_microstep: 786.55 | bwd_inner_microstep: 786.41 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3935
[2024-06-10 03:50:13,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1495.95 | bwd_inner_microstep: 1495.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 03:50:15,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1271.90 | bwd_inner_microstep: 1271.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945
[2024-06-10 03:50:16,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.07 | bwd_microstep: 726.27 | bwd_inner_microstep: 726.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2786
[2024-06-10 03:50:18,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.72 | bwd_microstep: 1052.96 | bwd_inner_microstep: 1052.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4105
[2024-06-10 03:50:20,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.41 | bwd_microstep: 1734.81 | bwd_inner_microstep: 1734.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791
[2024-06-10 03:50:22,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.09 | bwd_microstep: 1646.96 | bwd_inner_microstep: 1646.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 03:50:24,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1352.35 | bwd_inner_microstep: 1352.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3502
[2024-06-10 03:50:26,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.70 | bwd_microstep: 1220.88 | bwd_inner_microstep: 1220.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700
[2024-06-10 03:50:28,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.30 | bwd_microstep: 1422.23 | bwd_inner_microstep: 1422.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3772
[2024-06-10 03:50:30,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1502.74 | bwd_inner_microstep: 1502.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3426
[2024-06-10 03:50:32,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.43 | bwd_microstep: 1281.08 | bwd_inner_microstep: 1281.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 03:50:33,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.77 | bwd_microstep: 1312.22 | bwd_inner_microstep: 1312.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 03:50:36,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.80 | bwd_microstep: 1479.44 | bwd_inner_microstep: 1479.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3649
[2024-06-10 03:50:38,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.29 | bwd_microstep: 1818.00 | bwd_inner_microstep: 1817.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955
[2024-06-10 03:50:39,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.56 | bwd_microstep: 829.34 | bwd_inner_microstep: 829.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3637
[2024-06-10 03:50:41,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.58 | bwd_microstep: 1441.55 | bwd_inner_microstep: 1441.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 03:50:43,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.47 | bwd_microstep: 1485.06 | bwd_inner_microstep: 1485.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 03:50:45,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.11 | bwd_microstep: 1349.44 | bwd_inner_microstep: 1349.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2157
[2024-06-10 03:50:46,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.74 | bwd_microstep: 821.59 | bwd_inner_microstep: 821.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3682
[2024-06-10 03:50:48,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.39 | bwd_microstep: 1455.01 | bwd_inner_microstep: 1454.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3541
[2024-06-10 03:50:50,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.38 | bwd_microstep: 1559.41 | bwd_inner_microstep: 1559.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649
[2024-06-10 03:50:52,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.68 | bwd_microstep: 1511.32 | bwd_inner_microstep: 1511.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 03:50:55,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.09 | bwd_microstep: 1660.12 | bwd_inner_microstep: 1660.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 03:50:57,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1385.75 | bwd_inner_microstep: 1385.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 03:50:58,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.01 | bwd_microstep: 1256.42 | bwd_inner_microstep: 1256.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-10 03:51:00,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.93 | bwd_microstep: 1442.92 | bwd_inner_microstep: 1442.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938
[2024-06-10 03:51:01,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.71 | bwd_microstep: 728.49 | bwd_inner_microstep: 728.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3056
[2024-06-10 03:51:03,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.59 | bwd_microstep: 1328.41 | bwd_inner_microstep: 1328.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 03:51:05,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.45 | bwd_microstep: 1544.47 | bwd_inner_microstep: 1544.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750
[2024-06-10 03:51:07,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.00 | bwd_microstep: 1443.61 | bwd_inner_microstep: 1443.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 03:51:14,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 03:51:14,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.25 | bwd_microstep: 6025.90 | bwd_inner_microstep: 1426.55 | bwd_allreduce_microstep: 4599.29 | step_microstep: 38.69
[2024-06-10 03:51:14,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15947.73 | bwd: 47373.14 | bwd_inner: 42772.84 | bwd_allreduce: 4599.56 | step: 40.33
{'loss': 1.3788, 'learning_rate': 3.939862842716884e-05, 'epoch': 0.11}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1991
[2024-06-10 03:51:15,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.70 | bwd_microstep: 854.95 | bwd_inner_microstep: 854.85 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3920
[2024-06-10 03:51:17,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.11 | bwd_microstep: 1452.48 | bwd_inner_microstep: 1452.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 03:51:19,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.05 | bwd_microstep: 1456.60 | bwd_inner_microstep: 1456.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 03:51:21,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1279.43 | bwd_inner_microstep: 1279.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1954
[2024-06-10 03:51:22,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.43 | bwd_microstep: 827.07 | bwd_inner_microstep: 827.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293
[2024-06-10 03:51:23,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.63 | bwd_microstep: 974.24 | bwd_inner_microstep: 974.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 03:51:25,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.52 | bwd_microstep: 1249.64 | bwd_inner_microstep: 1249.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 03:51:27,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.92 | bwd_microstep: 1285.53 | bwd_inner_microstep: 1285.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 03:51:29,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.64 | bwd_microstep: 1283.45 | bwd_inner_microstep: 1283.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3597
[2024-06-10 03:51:30,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1245.11 | bwd_inner_microstep: 1245.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-10 03:51:32,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.65 | bwd_microstep: 1158.36 | bwd_inner_microstep: 1158.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1947
[2024-06-10 03:51:33,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.97 | bwd_microstep: 823.41 | bwd_inner_microstep: 823.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 03:51:35,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1390.31 | bwd_inner_microstep: 1390.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2995
[2024-06-10 03:51:37,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.91 | bwd_microstep: 1108.64 | bwd_inner_microstep: 1108.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3397
[2024-06-10 03:51:38,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.74 | bwd_microstep: 1373.38 | bwd_inner_microstep: 1373.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 03:51:41,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.79 | bwd_microstep: 1489.17 | bwd_inner_microstep: 1489.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 03:51:43,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.28 | bwd_microstep: 1485.84 | bwd_inner_microstep: 1485.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 946
[2024-06-10 03:51:43,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.68 | bwd_microstep: 380.71 | bwd_inner_microstep: 380.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 03:51:45,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1381.98 | bwd_inner_microstep: 1381.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 03:51:47,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.91 | bwd_microstep: 1649.36 | bwd_inner_microstep: 1649.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 03:51:49,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.30 | bwd_microstep: 1187.50 | bwd_inner_microstep: 1187.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 03:51:51,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.24 | bwd_microstep: 1410.06 | bwd_inner_microstep: 1410.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 03:51:53,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1461.55 | bwd_inner_microstep: 1461.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 03:51:55,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.60 | bwd_microstep: 1382.73 | bwd_inner_microstep: 1382.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-10 03:51:57,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.33 | bwd_microstep: 1441.94 | bwd_inner_microstep: 1441.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3648
[2024-06-10 03:51:59,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.93 | bwd_microstep: 1253.88 | bwd_inner_microstep: 1253.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3783
[2024-06-10 03:52:01,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.80 | bwd_microstep: 1748.51 | bwd_inner_microstep: 1748.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3822
[2024-06-10 03:52:03,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.57 | bwd_microstep: 1294.74 | bwd_inner_microstep: 1294.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235
[2024-06-10 03:52:04,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.23 | bwd_microstep: 963.91 | bwd_inner_microstep: 963.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3568
[2024-06-10 03:52:06,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.01 | bwd_microstep: 1699.83 | bwd_inner_microstep: 1699.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3807
[2024-06-10 03:52:09,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.56 | bwd_microstep: 1580.51 | bwd_inner_microstep: 1580.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 03:52:15,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.30 | optimizer_step: 6.63
[2024-06-10 03:52:15,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.22 | bwd_microstep: 5525.89 | bwd_inner_microstep: 1445.10 | bwd_allreduce_microstep: 4080.74 | step_microstep: 38.83
[2024-06-10 03:52:15,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15348.46 | bwd: 45100.74 | bwd_inner: 41019.00 | bwd_allreduce: 4081.01 | step: 40.37
{'loss': 1.3427, 'learning_rate': 3.938945931647585e-05, 'epoch': 0.11}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4571
[2024-06-10 03:52:17,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.34 | bwd_microstep: 1741.30 | bwd_inner_microstep: 1741.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3908
[2024-06-10 03:52:19,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.10 | bwd_microstep: 1391.10 | bwd_inner_microstep: 1391.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-10 03:52:21,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1417.93 | bwd_inner_microstep: 1417.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4134
[2024-06-10 03:52:23,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.46 | bwd_microstep: 1641.38 | bwd_inner_microstep: 1641.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3399
[2024-06-10 03:52:25,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.11 | bwd_microstep: 1152.32 | bwd_inner_microstep: 1152.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3760
[2024-06-10 03:52:27,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.86 | bwd_microstep: 1470.64 | bwd_inner_microstep: 1470.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 03:52:28,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.82 | bwd_microstep: 1154.07 | bwd_inner_microstep: 1154.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906
[2024-06-10 03:52:29,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.28 | bwd_microstep: 686.42 | bwd_inner_microstep: 686.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3840
[2024-06-10 03:52:31,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.66 | bwd_microstep: 1456.96 | bwd_inner_microstep: 1456.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 03:52:33,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.25 | bwd_microstep: 1391.73 | bwd_inner_microstep: 1391.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3495
[2024-06-10 03:52:35,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.47 | bwd_microstep: 1435.36 | bwd_inner_microstep: 1435.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 03:52:37,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1347.53 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3431
[2024-06-10 03:52:39,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.75 | bwd_microstep: 1472.34 | bwd_inner_microstep: 1472.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3666
[2024-06-10 03:52:41,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.75 | bwd_microstep: 1579.50 | bwd_inner_microstep: 1579.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3416
[2024-06-10 03:52:43,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.31 | bwd_microstep: 1514.16 | bwd_inner_microstep: 1514.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519
[2024-06-10 03:52:46,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.00 | bwd_microstep: 1583.24 | bwd_inner_microstep: 1583.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275
[2024-06-10 03:52:47,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.00 | bwd_microstep: 875.88 | bwd_inner_microstep: 875.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3468
[2024-06-10 03:52:49,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.86 | bwd_microstep: 1403.26 | bwd_inner_microstep: 1403.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 03:52:51,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 1490.72 | bwd_inner_microstep: 1490.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 03:52:52,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.71 | bwd_microstep: 1160.30 | bwd_inner_microstep: 1160.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140
[2024-06-10 03:52:54,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.75 | bwd_microstep: 835.08 | bwd_inner_microstep: 835.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 03:52:56,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1396.17 | bwd_inner_microstep: 1396.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275
[2024-06-10 03:52:57,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.98 | bwd_microstep: 876.95 | bwd_inner_microstep: 876.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2172
[2024-06-10 03:52:58,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.46 | bwd_microstep: 791.51 | bwd_inner_microstep: 791.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182
[2024-06-10 03:52:59,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.22 | bwd_microstep: 860.98 | bwd_inner_microstep: 860.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 03:53:01,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.40 | bwd_microstep: 1500.49 | bwd_inner_microstep: 1500.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 03:53:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.95 | bwd_microstep: 1375.61 | bwd_inner_microstep: 1375.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 03:53:05,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1346.39 | bwd_inner_microstep: 1346.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 03:53:07,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.84 | bwd_microstep: 1556.76 | bwd_inner_microstep: 1556.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 03:53:09,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.27 | bwd_microstep: 1391.57 | bwd_inner_microstep: 1391.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3576
[2024-06-10 03:53:11,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.37 | bwd_microstep: 1237.79 | bwd_inner_microstep: 1237.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 03:53:28,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-10 03:53:28,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 16836.73 | bwd_inner_microstep: 1597.35 | bwd_allreduce_microstep: 15239.31 | step_microstep: 39.35
[2024-06-10 03:53:28,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15758.48 | bwd: 57372.23 | bwd_inner: 42131.97 | bwd_allreduce: 15239.55 | step: 40.97
{'loss': 1.329, 'learning_rate': 3.938022191620794e-05, 'epoch': 0.11}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3382
[2024-06-10 03:53:30,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.52 | bwd_microstep: 1261.83 | bwd_inner_microstep: 1261.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 03:53:32,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1373.54 | bwd_inner_microstep: 1373.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3894
[2024-06-10 03:53:34,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.97 | bwd_microstep: 1679.41 | bwd_inner_microstep: 1679.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 03:53:36,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.19 | bwd_microstep: 1338.69 | bwd_inner_microstep: 1338.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 03:53:38,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1388.06 | bwd_inner_microstep: 1388.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 03:53:40,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.23 | bwd_microstep: 1549.03 | bwd_inner_microstep: 1549.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3417
[2024-06-10 03:53:42,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.42 | bwd_microstep: 1184.63 | bwd_inner_microstep: 1184.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 03:53:43,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.81 | bwd_microstep: 795.55 | bwd_inner_microstep: 795.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 03:53:45,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1353.08 | bwd_inner_microstep: 1353.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-10 03:53:46,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.34 | bwd_microstep: 688.06 | bwd_inner_microstep: 688.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3974
[2024-06-10 03:53:48,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.31 | bwd_microstep: 1540.40 | bwd_inner_microstep: 1540.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 03:53:49,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.98 | bwd_microstep: 795.30 | bwd_inner_microstep: 795.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 03:53:50,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.23 | bwd_microstep: 794.56 | bwd_inner_microstep: 794.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3486
[2024-06-10 03:53:52,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.14 | bwd_microstep: 1445.42 | bwd_inner_microstep: 1445.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649
[2024-06-10 03:53:54,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.06 | bwd_microstep: 1516.68 | bwd_inner_microstep: 1516.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845
[2024-06-10 03:53:56,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.63 | bwd_microstep: 1657.37 | bwd_inner_microstep: 1657.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3516
[2024-06-10 03:53:59,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.39 | bwd_microstep: 1651.62 | bwd_inner_microstep: 1651.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 03:54:00,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.98 | bwd_microstep: 1387.70 | bwd_inner_microstep: 1387.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 03:54:03,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.43 | bwd_microstep: 1514.78 | bwd_inner_microstep: 1514.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 03:54:04,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.41 | bwd_microstep: 1289.64 | bwd_inner_microstep: 1289.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 03:54:06,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.15 | bwd_microstep: 1379.12 | bwd_inner_microstep: 1379.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3835
[2024-06-10 03:54:08,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.88 | bwd_microstep: 1584.34 | bwd_inner_microstep: 1584.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 03:54:10,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.64 | bwd_microstep: 1403.59 | bwd_inner_microstep: 1403.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2090
[2024-06-10 03:54:12,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.17 | bwd_microstep: 883.47 | bwd_inner_microstep: 883.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3543
[2024-06-10 03:54:14,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.55 | bwd_microstep: 1590.28 | bwd_inner_microstep: 1590.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 03:54:16,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.42 | bwd_microstep: 1402.30 | bwd_inner_microstep: 1402.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3543
[2024-06-10 03:54:18,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.35 | bwd_microstep: 1588.03 | bwd_inner_microstep: 1588.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3449
[2024-06-10 03:54:20,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.64 | bwd_microstep: 1302.89 | bwd_inner_microstep: 1302.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 03:54:22,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1386.19 | bwd_inner_microstep: 1386.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 03:54:24,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.46 | bwd_microstep: 1500.62 | bwd_inner_microstep: 1500.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3754
[2024-06-10 03:54:26,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.52 | bwd_microstep: 1441.64 | bwd_inner_microstep: 1441.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3776
[2024-06-10 03:55:12,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.37 | optimizer_step: 6.59
[2024-06-10 03:55:12,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 46000.57 | bwd_inner_microstep: 1706.97 | bwd_allreduce_microstep: 44293.53 | step_microstep: 39.64
[2024-06-10 03:55:12,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16159.17 | bwd: 87668.43 | bwd_inner: 43373.96 | bwd_allreduce: 44293.77 | step: 41.22
{'loss': 1.3124, 'learning_rate': 3.93709162588992e-05, 'epoch': 0.11}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3426
[2024-06-10 03:55:14,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.42 | bwd_microstep: 1480.61 | bwd_inner_microstep: 1480.55 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 03:55:16,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1370.69 | bwd_inner_microstep: 1370.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2311
[2024-06-10 03:55:18,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.09 | bwd_microstep: 938.85 | bwd_inner_microstep: 938.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 03:55:20,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.20 | bwd_microstep: 1542.74 | bwd_inner_microstep: 1542.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 03:55:21,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.46 | bwd_microstep: 1240.42 | bwd_inner_microstep: 1240.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 03:55:23,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1398.05 | bwd_inner_microstep: 1398.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884
[2024-06-10 03:55:24,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.22 | bwd_microstep: 680.98 | bwd_inner_microstep: 680.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899
[2024-06-10 03:55:25,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.72 | bwd_microstep: 775.09 | bwd_inner_microstep: 775.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-10 03:55:27,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.33 | bwd_microstep: 1311.29 | bwd_inner_microstep: 1311.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.29
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1977
[2024-06-10 03:55:28,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.47 | bwd_microstep: 887.13 | bwd_inner_microstep: 887.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 03:55:30,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.35 | bwd_microstep: 1376.54 | bwd_inner_microstep: 1376.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3666
[2024-06-10 03:55:32,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.12 | bwd_microstep: 1543.38 | bwd_inner_microstep: 1543.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 03:55:34,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.71 | bwd_microstep: 788.59 | bwd_inner_microstep: 788.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681
[2024-06-10 03:55:35,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.78 | bwd_microstep: 1325.38 | bwd_inner_microstep: 1325.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3927
[2024-06-10 03:55:37,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.76 | bwd_microstep: 1359.68 | bwd_inner_microstep: 1359.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 03:55:39,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.05 | bwd_microstep: 1488.72 | bwd_inner_microstep: 1488.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3634
[2024-06-10 03:55:41,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.66 | bwd_microstep: 1434.95 | bwd_inner_microstep: 1434.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3448
[2024-06-10 03:55:43,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1375.59 | bwd_inner_microstep: 1375.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2002
[2024-06-10 03:55:44,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.15 | bwd_microstep: 831.90 | bwd_inner_microstep: 831.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3621
[2024-06-10 03:55:47,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.49 | bwd_microstep: 1702.55 | bwd_inner_microstep: 1702.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 03:55:49,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.00 | bwd_microstep: 1489.91 | bwd_inner_microstep: 1489.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3538
[2024-06-10 03:55:51,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.61 | bwd_microstep: 1321.32 | bwd_inner_microstep: 1321.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 03:55:52,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.36 | bwd_microstep: 1350.37 | bwd_inner_microstep: 1350.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 03:55:55,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.65 | bwd_microstep: 1531.53 | bwd_inner_microstep: 1531.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 03:55:57,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.60 | bwd_microstep: 1512.63 | bwd_inner_microstep: 1512.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785
[2024-06-10 03:55:59,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.95 | bwd_microstep: 1647.29 | bwd_inner_microstep: 1647.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-10 03:56:01,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.33 | bwd_microstep: 1302.05 | bwd_inner_microstep: 1302.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 03:56:03,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.84 | bwd_microstep: 1624.49 | bwd_inner_microstep: 1624.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 03:56:05,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.99 | bwd_microstep: 1656.60 | bwd_inner_microstep: 1656.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 03:56:07,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.53 | bwd_microstep: 1401.24 | bwd_inner_microstep: 1401.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 03:56:09,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1395.83 | bwd_inner_microstep: 1395.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2409
[2024-06-10 03:56:13,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 03:56:13,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.47 | bwd_microstep: 3349.50 | bwd_inner_microstep: 1176.20 | bwd_allreduce_microstep: 2173.25 | step_microstep: 38.61
[2024-06-10 03:56:13,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15764.35 | bwd: 44435.92 | bwd_inner: 42261.71 | bwd_allreduce: 2173.50 | step: 40.61
{'loss': 1.2955, 'learning_rate': 3.936154237732409e-05, 'epoch': 0.11}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 03:56:15,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1469.76 | bwd_inner_microstep: 1469.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 03:56:17,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1378.40 | bwd_inner_microstep: 1378.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 03:56:19,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.10 | bwd_microstep: 1495.76 | bwd_inner_microstep: 1495.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3792
[2024-06-10 03:56:21,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.12 | bwd_microstep: 1509.77 | bwd_inner_microstep: 1509.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 03:56:23,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.34 | bwd_microstep: 1384.98 | bwd_inner_microstep: 1384.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 03:56:25,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.69 | bwd_microstep: 1404.82 | bwd_inner_microstep: 1404.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 03:56:26,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.27 | bwd_microstep: 1153.10 | bwd_inner_microstep: 1153.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3550
[2024-06-10 03:56:28,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.79 | bwd_microstep: 1233.33 | bwd_inner_microstep: 1233.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-10 03:56:30,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.35 | bwd_microstep: 1192.83 | bwd_inner_microstep: 1192.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3686
[2024-06-10 03:56:32,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.79 | bwd_microstep: 1580.69 | bwd_inner_microstep: 1580.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 03:56:34,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1376.59 | bwd_inner_microstep: 1376.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413
[2024-06-10 03:56:36,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.69 | bwd_microstep: 1376.15 | bwd_inner_microstep: 1376.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 03:56:38,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.05 | bwd_microstep: 1620.02 | bwd_inner_microstep: 1620.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687
[2024-06-10 03:56:40,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.32 | bwd_microstep: 1529.02 | bwd_inner_microstep: 1528.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2425
[2024-06-10 03:56:41,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.83 | bwd_microstep: 945.71 | bwd_inner_microstep: 945.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3451
[2024-06-10 03:56:43,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.30 | bwd_microstep: 1318.21 | bwd_inner_microstep: 1318.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3631
[2024-06-10 03:56:45,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.83 | bwd_microstep: 1446.93 | bwd_inner_microstep: 1446.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3456
[2024-06-10 03:56:47,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.47 | bwd_microstep: 1321.78 | bwd_inner_microstep: 1321.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 03:56:49,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1399.48 | bwd_inner_microstep: 1399.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 03:56:51,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.89 | bwd_microstep: 1404.07 | bwd_inner_microstep: 1404.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 03:56:53,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.74 | bwd_microstep: 1454.81 | bwd_inner_microstep: 1454.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 03:56:55,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.31 | bwd_microstep: 1286.38 | bwd_inner_microstep: 1286.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3661
[2024-06-10 03:56:56,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1231.34 | bwd_inner_microstep: 1231.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-10 03:56:58,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.31 | bwd_microstep: 1317.49 | bwd_inner_microstep: 1317.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937
[2024-06-10 03:56:59,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.22 | bwd_microstep: 699.71 | bwd_inner_microstep: 699.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473
[2024-06-10 03:57:01,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.05 | bwd_microstep: 1328.91 | bwd_inner_microstep: 1328.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3687
[2024-06-10 03:57:03,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.04 | bwd_microstep: 1392.87 | bwd_inner_microstep: 1392.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 03:57:05,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.29 | bwd_microstep: 1315.57 | bwd_inner_microstep: 1315.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590
[2024-06-10 03:57:07,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.36 | bwd_microstep: 1703.94 | bwd_inner_microstep: 1703.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 03:57:09,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1400.20 | bwd_inner_microstep: 1400.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 03:57:11,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.01 | bwd_microstep: 1638.90 | bwd_inner_microstep: 1638.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 03:57:15,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.18 | optimizer_step: 6.61
[2024-06-10 03:57:15,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.07 | bwd_microstep: 2796.68 | bwd_inner_microstep: 1529.18 | bwd_allreduce_microstep: 1267.45 | step_microstep: 38.48
[2024-06-10 03:57:15,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16426.46 | bwd: 45108.23 | bwd_inner: 43839.85 | bwd_allreduce: 1267.68 | step: 40.23
 11%|█         | 182/1726 [3:12:47<26:16:46, 61.27s/it]
 11%|█         | 183/1726 [3:13:51<26:34:10, 61.99s/it]
 11%|█         | 184/1726 [3:14:51<26:23:49, 61.63s/it]
 11%|█         | 185/1726 [3:16:05<27:54:01, 65.18s/it]
 11%|█         | 186/1726 [3:17:49<32:53:13, 76.88s/it]
 11%|█         | 187/1726 [3:18:50<30:46:15, 71.98s/it]
 11%|█         | 188/1726 [3:19:51<29:27:25, 68.9
{'loss': 1.3153, 'learning_rate': 3.935210030449738e-05, 'epoch': 0.11}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474
[2024-06-10 03:57:17,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1402.65 | bwd_inner_microstep: 1402.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3468
[2024-06-10 03:57:18,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.22 | bwd_microstep: 1184.95 | bwd_inner_microstep: 1184.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3390
[2024-06-10 03:57:20,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.15 | bwd_microstep: 1274.87 | bwd_inner_microstep: 1274.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 03:57:22,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.46 | bwd_microstep: 1344.05 | bwd_inner_microstep: 1344.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2293
[2024-06-10 03:57:23,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.81 | bwd_microstep: 880.29 | bwd_inner_microstep: 880.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 03:57:25,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.34 | bwd_microstep: 1150.58 | bwd_inner_microstep: 1150.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 03:57:26,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.11 | bwd_microstep: 799.29 | bwd_inner_microstep: 799.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 03:57:27,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.35 | bwd_microstep: 797.65 | bwd_inner_microstep: 797.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617
[2024-06-10 03:57:29,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.17 | bwd_microstep: 1314.81 | bwd_inner_microstep: 1314.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866
[2024-06-10 03:57:30,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.21 | bwd_microstep: 680.10 | bwd_inner_microstep: 680.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3686
[2024-06-10 03:57:32,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.35 | bwd_microstep: 1674.64 | bwd_inner_microstep: 1674.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3511
[2024-06-10 03:57:34,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1518.90 | bwd_inner_microstep: 1518.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3891
[2024-06-10 03:57:37,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.54 | bwd_microstep: 1821.13 | bwd_inner_microstep: 1821.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3487
[2024-06-10 03:57:39,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.86 | bwd_microstep: 1551.78 | bwd_inner_microstep: 1551.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3474
[2024-06-10 03:57:41,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.94 | bwd_microstep: 1441.45 | bwd_inner_microstep: 1441.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3821
[2024-06-10 03:57:43,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.65 | bwd_microstep: 1693.33 | bwd_inner_microstep: 1693.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 03:57:45,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1392.03 | bwd_inner_microstep: 1392.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 03:57:47,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.08 | bwd_microstep: 1377.12 | bwd_inner_microstep: 1377.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 03:57:49,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.58 | bwd_microstep: 1281.87 | bwd_inner_microstep: 1281.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2162
[2024-06-10 03:57:50,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.91 | bwd_microstep: 855.05 | bwd_inner_microstep: 855.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 03:57:52,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1457.04 | bwd_inner_microstep: 1457.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 03:57:54,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1394.19 | bwd_inner_microstep: 1394.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548
[2024-06-10 03:57:56,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 1329.36 | bwd_inner_microstep: 1329.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818
[2024-06-10 03:57:58,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.24 | bwd_microstep: 1582.92 | bwd_inner_microstep: 1582.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2227
[2024-06-10 03:57:59,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.98 | bwd_microstep: 880.02 | bwd_inner_microstep: 880.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2425
[2024-06-10 03:58:01,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.69 | bwd_microstep: 1048.45 | bwd_inner_microstep: 1048.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 03:58:03,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1494.73 | bwd_inner_microstep: 1494.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-10 03:58:05,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1510.00 | bwd_inner_microstep: 1509.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3388
[2024-06-10 03:58:07,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1337.70 | bwd_inner_microstep: 1337.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 03:58:09,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.63 | bwd_microstep: 1443.79 | bwd_inner_microstep: 1443.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2763
[2024-06-10 03:58:10,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.91 | bwd_microstep: 1144.34 | bwd_inner_microstep: 1144.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3455
[2024-06-10 03:58:17,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 03:58:17,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.55 | bwd_microstep: 5974.50 | bwd_inner_microstep: 1345.76 | bwd_allreduce_microstep: 4628.69 | step_microstep: 38.63
[2024-06-10 03:58:17,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15465.25 | bwd: 46033.59 | bwd_inner: 41403.97 | bwd_allreduce: 4628.91 | step: 40.24
{'loss': 1.2953, 'learning_rate': 3.9342590073673995e-05, 'epoch': 0.11}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 03:58:18,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1243.68 | bwd_inner_microstep: 1243.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3913
[2024-06-10 03:58:20,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1587.27 | bwd_inner_microstep: 1587.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 03:58:22,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.95 | bwd_microstep: 1386.44 | bwd_inner_microstep: 1386.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3409
[2024-06-10 03:58:24,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.01 | bwd_microstep: 1373.87 | bwd_inner_microstep: 1373.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 03:58:26,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.77 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1483.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-10 03:58:28,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.84 | bwd_microstep: 1319.43 | bwd_inner_microstep: 1319.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 844
[2024-06-10 03:58:29,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 139.70 | bwd_microstep: 346.83 | bwd_inner_microstep: 346.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 03:58:31,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1385.80 | bwd_inner_microstep: 1385.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 03:58:32,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.16 | bwd_microstep: 1288.24 | bwd_inner_microstep: 1288.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3618
[2024-06-10 03:58:34,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.60 | bwd_microstep: 1441.48 | bwd_inner_microstep: 1441.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3692
[2024-06-10 03:58:36,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.39 | bwd_microstep: 1486.34 | bwd_inner_microstep: 1486.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3695
[2024-06-10 03:58:39,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.36 | bwd_microstep: 1571.59 | bwd_inner_microstep: 1571.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126
[2024-06-10 03:58:40,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.11 | bwd_microstep: 928.18 | bwd_inner_microstep: 928.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 03:58:42,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1389.44 | bwd_inner_microstep: 1389.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 03:58:44,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.51 | bwd_microstep: 1382.09 | bwd_inner_microstep: 1382.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 03:58:46,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.73 | bwd_microstep: 1339.85 | bwd_inner_microstep: 1339.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2104
[2024-06-10 03:58:47,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.56 | bwd_microstep: 822.95 | bwd_inner_microstep: 822.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689
[2024-06-10 03:58:49,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.88 | bwd_microstep: 1528.89 | bwd_inner_microstep: 1528.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 03:58:51,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.29 | bwd_microstep: 1556.36 | bwd_inner_microstep: 1556.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3834
[2024-06-10 03:58:53,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.97 | bwd_microstep: 1521.48 | bwd_inner_microstep: 1521.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 03:58:55,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.01 | bwd_microstep: 1413.28 | bwd_inner_microstep: 1413.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2416
[2024-06-10 03:58:56,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.62 | bwd_microstep: 1035.92 | bwd_inner_microstep: 1035.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752
[2024-06-10 03:58:59,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.87 | bwd_microstep: 1552.14 | bwd_inner_microstep: 1552.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001
[2024-06-10 03:59:00,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.66 | bwd_microstep: 802.25 | bwd_inner_microstep: 802.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789
[2024-06-10 03:59:02,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.97 | bwd_microstep: 1450.18 | bwd_inner_microstep: 1450.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2033
[2024-06-10 03:59:03,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.99 | bwd_microstep: 716.85 | bwd_inner_microstep: 716.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 03:59:05,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 1564.14 | bwd_inner_microstep: 1564.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 03:59:07,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1382.15 | bwd_inner_microstep: 1382.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2927
[2024-06-10 03:59:08,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.43 | bwd_microstep: 1093.99 | bwd_inner_microstep: 1093.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 03:59:11,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.24 | bwd_microstep: 1652.22 | bwd_inner_microstep: 1652.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 03:59:12,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.04 | bwd_microstep: 1343.77 | bwd_inner_microstep: 1343.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052
[2024-06-10 03:59:17,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 03:59:17,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.06 | bwd_microstep: 3872.49 | bwd_inner_microstep: 1041.52 | bwd_allreduce_microstep: 2830.91 | step_microstep: 38.64
[2024-06-10 03:59:17,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15469.69 | bwd: 44262.63 | bwd_inner: 41430.79 | bwd_allreduce: 2831.14 | step: 40.23
{'loss': 1.3589, 'learning_rate': 3.9333011718348925e-05, 'epoch': 0.11}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616
[2024-06-10 03:59:18,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.91 | bwd_microstep: 1303.41 | bwd_inner_microstep: 1303.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1876
[2024-06-10 03:59:19,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.99 | bwd_microstep: 741.06 | bwd_inner_microstep: 741.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 4383
[2024-06-10 03:59:22,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.02 | bwd_microstep: 1563.30 | bwd_inner_microstep: 1563.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2315
[2024-06-10 03:59:23,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.46 | bwd_microstep: 852.49 | bwd_inner_microstep: 852.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 03:59:25,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.06 | bwd_microstep: 1640.17 | bwd_inner_microstep: 1640.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 03:59:27,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.90 | bwd_microstep: 1354.83 | bwd_inner_microstep: 1354.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 03:59:29,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.61 | bwd_microstep: 1534.65 | bwd_inner_microstep: 1534.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 03:59:31,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1251.02 | bwd_inner_microstep: 1251.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2114
[2024-06-10 03:59:32,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.37 | bwd_microstep: 830.41 | bwd_inner_microstep: 830.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 03:59:34,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1431.06 | bwd_inner_microstep: 1431.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 03:59:36,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.54 | bwd_microstep: 1422.48 | bwd_inner_microstep: 1422.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3466
[2024-06-10 03:59:38,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.35 | bwd_microstep: 1341.37 | bwd_inner_microstep: 1341.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593
[2024-06-10 03:59:40,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.81 | bwd_microstep: 1339.68 | bwd_inner_microstep: 1339.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524
[2024-06-10 03:59:41,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.88 | bwd_microstep: 1324.61 | bwd_inner_microstep: 1324.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3399
[2024-06-10 03:59:43,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.93 | bwd_microstep: 1295.67 | bwd_inner_microstep: 1295.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 03:59:45,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.23 | bwd_microstep: 1521.70 | bwd_inner_microstep: 1521.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2973
[2024-06-10 03:59:47,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.58 | bwd_microstep: 1168.99 | bwd_inner_microstep: 1168.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3440
[2024-06-10 03:59:49,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1376.83 | bwd_inner_microstep: 1376.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521
[2024-06-10 03:59:51,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.72 | bwd_microstep: 1414.56 | bwd_inner_microstep: 1414.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 03:59:53,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.64 | bwd_microstep: 1358.56 | bwd_inner_microstep: 1358.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 03:59:55,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.10 | bwd_microstep: 1498.23 | bwd_inner_microstep: 1498.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 03:59:56,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.27 | bwd_microstep: 973.46 | bwd_inner_microstep: 973.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 03:59:58,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.94 | bwd_microstep: 1396.56 | bwd_inner_microstep: 1396.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 04:00:00,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.17 | bwd_microstep: 1357.72 | bwd_inner_microstep: 1357.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 04:00:02,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.87 | bwd_microstep: 1350.17 | bwd_inner_microstep: 1350.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 04:00:04,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.58 | bwd_microstep: 1554.23 | bwd_inner_microstep: 1554.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-10 04:00:06,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.72 | bwd_microstep: 1304.61 | bwd_inner_microstep: 1304.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066
[2024-06-10 04:00:07,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.40 | bwd_microstep: 919.53 | bwd_inner_microstep: 919.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2035
[2024-06-10 04:00:08,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.40 | bwd_microstep: 840.68 | bwd_inner_microstep: 840.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3525
[2024-06-10 04:00:10,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.61 | bwd_microstep: 1552.25 | bwd_inner_microstep: 1552.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 04:00:12,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.76 | bwd_microstep: 1480.58 | bwd_inner_microstep: 1480.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 04:00:15,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-10 04:00:15,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 2049.30 | bwd_inner_microstep: 1512.42 | bwd_allreduce_microstep: 536.83 | step_microstep: 38.32
[2024-06-10 04:00:15,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15675.77 | bwd: 42344.20 | bwd_inner: 41806.46 | bwd_allreduce: 537.05 | step: 39.88
{'loss': 1.3311, 'learning_rate': 3.932336527225709e-05, 'epoch': 0.11}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 04:00:17,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.16 | bwd_microstep: 1593.17 | bwd_inner_microstep: 1593.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3945
[2024-06-10 04:00:20,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.46 | bwd_microstep: 1697.93 | bwd_inner_microstep: 1697.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3894
[2024-06-10 04:00:22,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.97 | bwd_microstep: 1689.99 | bwd_inner_microstep: 1689.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 04:00:24,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1346.47 | bwd_inner_microstep: 1346.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741
[2024-06-10 04:00:26,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.37 | bwd_microstep: 1535.84 | bwd_inner_microstep: 1535.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3731
[2024-06-10 04:00:28,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.90 | bwd_microstep: 1493.59 | bwd_inner_microstep: 1493.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 04:00:30,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1530.47 | bwd_inner_microstep: 1530.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-10 04:00:32,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.83 | bwd_microstep: 1530.21 | bwd_inner_microstep: 1530.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 04:00:33,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.34 | bwd_microstep: 792.11 | bwd_inner_microstep: 792.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3659
[2024-06-10 04:00:35,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.59 | bwd_microstep: 1543.19 | bwd_inner_microstep: 1543.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3667
[2024-06-10 04:00:37,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.57 | bwd_microstep: 1453.36 | bwd_inner_microstep: 1453.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3680
[2024-06-10 04:00:40,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.54 | bwd_microstep: 1723.59 | bwd_inner_microstep: 1723.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 04:00:42,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1375.10 | bwd_inner_microstep: 1375.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3380
[2024-06-10 04:00:43,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.23 | bwd_microstep: 1336.30 | bwd_inner_microstep: 1336.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1971
[2024-06-10 04:00:45,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.46 | bwd_microstep: 827.08 | bwd_inner_microstep: 827.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935
[2024-06-10 04:00:46,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.09 | bwd_microstep: 728.09 | bwd_inner_microstep: 728.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093
[2024-06-10 04:00:47,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.83 | bwd_microstep: 920.05 | bwd_inner_microstep: 920.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 04:00:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1351.91 | bwd_inner_microstep: 1351.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 04:00:51,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1393.31 | bwd_inner_microstep: 1393.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 04:00:53,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.40 | bwd_microstep: 1557.70 | bwd_inner_microstep: 1557.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 04:00:55,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1394.71 | bwd_inner_microstep: 1394.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 04:00:57,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1401.10 | bwd_inner_microstep: 1401.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991
[2024-06-10 04:00:58,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.61 | bwd_microstep: 802.68 | bwd_inner_microstep: 802.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 04:01:00,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.57 | bwd_microstep: 1405.05 | bwd_inner_microstep: 1405.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3789
[2024-06-10 04:01:02,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 1502.11 | bwd_inner_microstep: 1502.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3029
[2024-06-10 04:01:03,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.57 | bwd_microstep: 1137.84 | bwd_inner_microstep: 1137.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733
[2024-06-10 04:01:06,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.49 | bwd_microstep: 1639.01 | bwd_inner_microstep: 1638.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 04:01:08,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.77 | bwd_microstep: 1654.45 | bwd_inner_microstep: 1654.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 04:01:10,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.44 | bwd_microstep: 1450.28 | bwd_inner_microstep: 1450.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2195
[2024-06-10 04:01:11,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.66 | bwd_microstep: 958.88 | bwd_inner_microstep: 958.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3766
[2024-06-10 04:01:14,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.08 | bwd_microstep: 1743.50 | bwd_inner_microstep: 1743.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 04:01:17,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.18 | optimizer_step: 6.58
[2024-06-10 04:01:17,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 2336.65 | bwd_inner_microstep: 1643.24 | bwd_allreduce_microstep: 693.36 | step_microstep: 38.32
[2024-06-10 04:01:17,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16403.97 | bwd: 44845.75 | bwd_inner: 44151.48 | bwd_allreduce: 693.59 | step: 39.92
{'loss': 1.3152, 'learning_rate': 3.931365076937321e-05, 'epoch': 0.11}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2024
[2024-06-10 04:01:18,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.55 | bwd_microstep: 736.70 | bwd_inner_microstep: 736.58 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3898
[2024-06-10 04:01:20,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1420.21 | bwd_inner_microstep: 1420.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 04:01:22,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.09 | bwd_microstep: 1651.68 | bwd_inner_microstep: 1651.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4172
[2024-06-10 04:01:24,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.71 | bwd_microstep: 1653.54 | bwd_inner_microstep: 1653.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3490
[2024-06-10 04:01:26,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1247.40 | bwd_inner_microstep: 1247.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 04:01:28,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1384.96 | bwd_inner_microstep: 1384.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3486
[2024-06-10 04:01:29,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.77 | bwd_microstep: 1219.28 | bwd_inner_microstep: 1219.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3407
[2024-06-10 04:01:32,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.65 | bwd_microstep: 2287.29 | bwd_inner_microstep: 2287.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 04:01:34,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.55 | bwd_microstep: 1314.72 | bwd_inner_microstep: 1314.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3701
[2024-06-10 04:01:36,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1327.70 | bwd_inner_microstep: 1327.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3928
[2024-06-10 04:01:38,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.05 | bwd_microstep: 1662.19 | bwd_inner_microstep: 1662.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 04:01:40,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1484.17 | bwd_inner_microstep: 1484.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 04:01:42,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1395.28 | bwd_inner_microstep: 1395.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1916
[2024-06-10 04:01:43,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.39 | bwd_microstep: 747.93 | bwd_inner_microstep: 747.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 04:01:45,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.73 | bwd_microstep: 1515.50 | bwd_inner_microstep: 1515.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2422
[2024-06-10 04:01:47,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.09 | bwd_microstep: 943.90 | bwd_inner_microstep: 943.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 04:01:49,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1555.66 | bwd_inner_microstep: 1555.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 04:01:51,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1494.92 | bwd_inner_microstep: 1494.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3879
[2024-06-10 04:01:53,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.61 | bwd_microstep: 1615.90 | bwd_inner_microstep: 1615.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-10 04:01:54,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.67 | bwd_microstep: 802.54 | bwd_inner_microstep: 802.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 04:01:56,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.60 | bwd_microstep: 1188.25 | bwd_inner_microstep: 1188.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 04:01:58,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.54 | bwd_microstep: 1658.88 | bwd_inner_microstep: 1658.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 04:02:00,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.29 | bwd_microstep: 1386.00 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 04:02:02,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.70 | bwd_microstep: 1629.66 | bwd_inner_microstep: 1629.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-10 04:02:04,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1289.37 | bwd_inner_microstep: 1289.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848
[2024-06-10 04:02:06,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.90 | bwd_microstep: 1568.95 | bwd_inner_microstep: 1568.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2045
[2024-06-10 04:02:08,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.59 | bwd_microstep: 1005.42 | bwd_inner_microstep: 1005.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2186
[2024-06-10 04:02:09,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.32 | bwd_microstep: 891.52 | bwd_inner_microstep: 891.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-10 04:02:11,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.89 | bwd_microstep: 1619.60 | bwd_inner_microstep: 1619.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3811
[2024-06-10 04:02:13,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.86 | bwd_microstep: 1580.71 | bwd_inner_microstep: 1580.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2272
[2024-06-10 04:02:15,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.60 | bwd_microstep: 1070.62 | bwd_inner_microstep: 1070.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3592
[2024-06-10 04:02:20,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.20 | optimizer_step: 6.55
[2024-06-10 04:02:20,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.21 | bwd_microstep: 4363.10 | bwd_inner_microstep: 1891.79 | bwd_allreduce_microstep: 2471.26 | step_microstep: 38.61
[2024-06-10 04:02:20,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16038.36 | bwd: 46713.56 | bwd_inner: 44241.29 | bwd_allreduce: 2471.54 | step: 40.18
{'loss': 1.3468, 'learning_rate': 3.930386824391173e-05, 'epoch': 0.11}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 04:02:21,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.59 | bwd_microstep: 1275.90 | bwd_inner_microstep: 1275.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3917
[2024-06-10 04:02:24,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.03 | bwd_microstep: 1589.00 | bwd_inner_microstep: 1588.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 04:02:25,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.73 | bwd_microstep: 1317.19 | bwd_inner_microstep: 1317.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 04:02:28,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.36 | bwd_microstep: 1652.55 | bwd_inner_microstep: 1652.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 04:02:30,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.72 | bwd_microstep: 1401.45 | bwd_inner_microstep: 1401.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 04:02:32,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.19 | bwd_microstep: 1534.23 | bwd_inner_microstep: 1534.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 04:02:34,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.95 | bwd_microstep: 1430.90 | bwd_inner_microstep: 1430.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 04:02:36,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.91 | bwd_microstep: 1515.77 | bwd_inner_microstep: 1515.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1972
[2024-06-10 04:02:37,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.40 | bwd_microstep: 892.03 | bwd_inner_microstep: 892.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2169
[2024-06-10 04:02:38,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.81 | bwd_microstep: 981.63 | bwd_inner_microstep: 981.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 04:02:40,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.20 | bwd_microstep: 1308.26 | bwd_inner_microstep: 1308.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3489
[2024-06-10 04:02:42,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.68 | bwd_microstep: 1552.38 | bwd_inner_microstep: 1552.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-10 04:02:45,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.36 | bwd_microstep: 1587.34 | bwd_inner_microstep: 1587.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2438
[2024-06-10 04:02:46,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.58 | bwd_microstep: 920.95 | bwd_inner_microstep: 920.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 04:02:48,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.24 | bwd_microstep: 1404.86 | bwd_inner_microstep: 1404.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1899
[2024-06-10 04:02:49,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.08 | bwd_microstep: 748.44 | bwd_inner_microstep: 748.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-10 04:02:51,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.57 | bwd_microstep: 1589.62 | bwd_inner_microstep: 1589.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 04:02:53,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.98 | bwd_microstep: 1401.69 | bwd_inner_microstep: 1401.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 04:02:55,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.31 | bwd_microstep: 1661.62 | bwd_inner_microstep: 1661.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-10 04:02:57,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1434.03 | bwd_inner_microstep: 1434.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-10 04:02:59,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.86 | bwd_microstep: 1427.05 | bwd_inner_microstep: 1427.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-10 04:03:01,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1285.95 | bwd_inner_microstep: 1285.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545
[2024-06-10 04:03:03,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.45 | bwd_microstep: 1327.98 | bwd_inner_microstep: 1327.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2199
[2024-06-10 04:03:04,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.04 | bwd_microstep: 860.10 | bwd_inner_microstep: 860.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 04:03:06,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.73 | bwd_microstep: 1301.69 | bwd_inner_microstep: 1301.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-10 04:03:08,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.34 | bwd_microstep: 1437.51 | bwd_inner_microstep: 1437.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 04:03:10,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.01 | bwd_microstep: 1501.99 | bwd_inner_microstep: 1501.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 04:03:12,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.05 | bwd_microstep: 1503.89 | bwd_inner_microstep: 1503.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650
[2024-06-10 04:03:14,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1625.20 | bwd_inner_microstep: 1625.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3578
[2024-06-10 04:03:16,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.46 | bwd_microstep: 1364.10 | bwd_inner_microstep: 1364.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2048
[2024-06-10 04:03:17,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.71 | bwd_microstep: 871.27 | bwd_inner_microstep: 871.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 04:03:20,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.19 | optimizer_step: 6.63
[2024-06-10 04:03:20,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1974.38 | bwd_inner_microstep: 1578.55 | bwd_allreduce_microstep: 395.78 | step_microstep: 38.50
[2024-06-10 04:03:20,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16138.43 | bwd: 43680.94 | bwd_inner: 43284.25 | bwd_allreduce: 396.01 | step: 40.11
5s/it]
 11%|█         | 188/1726 [3:19:51<29:27:25, 68.95s/it]
 11%|█         | 189/1726 [3:20:53<28:31:36, 66.82s/it]
 11%|█         | 190/1726 [3:21:53<27:38:44, 64.79s/it]
 11%|█         | 191/1726 [3:22:52<26:48:13, 62.86s/it]
 11%|█         | 192/1726 [3:23:53<26:37:27, 62.48s/it]
 11%|█         | 193/1726 [3:24:56<26:41:06, 62.67s/it]
 11%|█         | 194/1726 [3:25:57<
{'loss': 1.299, 'learning_rate': 3.929401773032664e-05, 'epoch': 0.11}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457
[2024-06-10 04:03:22,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1302.23 | bwd_inner_microstep: 1302.16 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 04:03:24,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1383.44 | bwd_inner_microstep: 1383.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842
[2024-06-10 04:03:26,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.11 | bwd_microstep: 1462.98 | bwd_inner_microstep: 1462.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 04:03:28,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.72 | bwd_microstep: 1438.55 | bwd_inner_microstep: 1438.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 04:03:29,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1344.69 | bwd_inner_microstep: 1344.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801
[2024-06-10 04:03:31,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1454.24 | bwd_inner_microstep: 1454.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 04:03:34,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.18 | bwd_microstep: 1538.51 | bwd_inner_microstep: 1538.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894
[2024-06-10 04:03:35,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.13 | bwd_microstep: 683.24 | bwd_inner_microstep: 683.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3772
[2024-06-10 04:03:36,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.03 | bwd_microstep: 1344.77 | bwd_inner_microstep: 1344.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 04:03:38,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.81 | bwd_microstep: 798.60 | bwd_inner_microstep: 798.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-10 04:03:39,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.01 | bwd_microstep: 1415.51 | bwd_inner_microstep: 1415.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1944
[2024-06-10 04:03:41,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.89 | bwd_microstep: 852.09 | bwd_inner_microstep: 852.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 04:03:42,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1250.04 | bwd_inner_microstep: 1250.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 04:03:43,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.44 | bwd_microstep: 794.60 | bwd_inner_microstep: 794.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3667
[2024-06-10 04:03:46,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.30 | bwd_microstep: 1581.37 | bwd_inner_microstep: 1581.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 04:03:48,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.93 | bwd_microstep: 1581.05 | bwd_inner_microstep: 1581.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3467
[2024-06-10 04:03:50,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.79 | bwd_microstep: 1426.51 | bwd_inner_microstep: 1426.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2109
[2024-06-10 04:03:51,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.16 | bwd_microstep: 730.03 | bwd_inner_microstep: 730.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2048
[2024-06-10 04:03:52,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.29 | bwd_microstep: 718.50 | bwd_inner_microstep: 718.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3506
[2024-06-10 04:03:54,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.95 | bwd_microstep: 1221.58 | bwd_inner_microstep: 1221.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 04:03:55,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1388.52 | bwd_inner_microstep: 1388.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 04:03:57,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1294.99 | bwd_inner_microstep: 1294.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 04:03:59,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1296.98 | bwd_inner_microstep: 1296.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 04:04:01,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.17 | bwd_microstep: 1357.08 | bwd_inner_microstep: 1357.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 04:04:03,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.58 | bwd_microstep: 1516.55 | bwd_inner_microstep: 1516.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-10 04:04:05,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.12 | bwd_microstep: 1608.22 | bwd_inner_microstep: 1608.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933
[2024-06-10 04:04:06,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.69 | bwd_microstep: 730.18 | bwd_inner_microstep: 730.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-10 04:04:08,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.67 | bwd_microstep: 1449.22 | bwd_inner_microstep: 1449.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2664
[2024-06-10 04:04:10,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.48 | bwd_microstep: 1125.44 | bwd_inner_microstep: 1125.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3595
[2024-06-10 04:04:12,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1338.67 | bwd_inner_microstep: 1338.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-10 04:04:14,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.20 | bwd_microstep: 1602.15 | bwd_inner_microstep: 1602.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3385
[2024-06-10 04:04:21,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 04:04:21,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.02 | bwd_microstep: 6230.84 | bwd_inner_microstep: 1554.78 | bwd_allreduce_microstep: 4676.01 | step_microstep: 38.72
[2024-06-10 04:04:21,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15168.20 | bwd: 45261.42 | bwd_inner: 40584.45 | bwd_allreduce: 4676.27 | step: 40.28
{'loss': 1.3028, 'learning_rate': 3.9284099263311416e-05, 'epoch': 0.11}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2503
[2024-06-10 04:04:22,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.82 | bwd_microstep: 1046.89 | bwd_inner_microstep: 1046.75 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4428
[2024-06-10 04:04:24,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.31 | bwd_microstep: 1720.21 | bwd_inner_microstep: 1720.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3979
[2024-06-10 04:04:27,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.54 | bwd_microstep: 1531.56 | bwd_inner_microstep: 1531.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 04:04:28,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.77 | bwd_microstep: 789.87 | bwd_inner_microstep: 789.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937
[2024-06-10 04:04:29,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.67 | bwd_microstep: 695.67 | bwd_inner_microstep: 695.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 04:04:31,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.75 | bwd_microstep: 1481.75 | bwd_inner_microstep: 1481.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3732
[2024-06-10 04:04:33,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.71 | bwd_microstep: 1335.08 | bwd_inner_microstep: 1335.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 04:04:34,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1246.06 | bwd_inner_microstep: 1246.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 04:04:36,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.81 | bwd_microstep: 1431.60 | bwd_inner_microstep: 1431.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3722
[2024-06-10 04:04:38,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.83 | bwd_microstep: 1479.29 | bwd_inner_microstep: 1479.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3513
[2024-06-10 04:04:40,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.27 | bwd_microstep: 1334.50 | bwd_inner_microstep: 1334.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 04:04:42,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.63 | bwd_microstep: 1543.80 | bwd_inner_microstep: 1543.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 04:04:44,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.68 | bwd_microstep: 1497.59 | bwd_inner_microstep: 1497.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660
[2024-06-10 04:04:46,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.50 | bwd_microstep: 1419.10 | bwd_inner_microstep: 1419.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3509
[2024-06-10 04:04:48,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.54 | bwd_microstep: 1556.96 | bwd_inner_microstep: 1556.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983
[2024-06-10 04:04:50,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.62 | bwd_microstep: 892.29 | bwd_inner_microstep: 892.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3593
[2024-06-10 04:04:52,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.36 | bwd_microstep: 1369.63 | bwd_inner_microstep: 1369.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 04:04:54,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1512.49 | bwd_inner_microstep: 1512.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 04:04:56,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1489.22 | bwd_inner_microstep: 1489.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3675
[2024-06-10 04:04:58,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.91 | bwd_microstep: 1670.05 | bwd_inner_microstep: 1670.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2053
[2024-06-10 04:04:59,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.27 | bwd_microstep: 815.50 | bwd_inner_microstep: 815.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 04:05:01,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.97 | bwd_microstep: 1398.48 | bwd_inner_microstep: 1398.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3035
[2024-06-10 04:05:03,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.19 | bwd_microstep: 1136.31 | bwd_inner_microstep: 1136.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 04:05:05,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1554.58 | bwd_inner_microstep: 1554.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 04:05:07,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1517.89 | bwd_inner_microstep: 1517.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 04:05:09,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.94 | bwd_microstep: 1486.59 | bwd_inner_microstep: 1486.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3747
[2024-06-10 04:05:11,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.49 | bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477
[2024-06-10 04:05:13,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1406.80 | bwd_inner_microstep: 1406.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3790
[2024-06-10 04:05:15,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.26 | bwd_microstep: 1687.94 | bwd_inner_microstep: 1687.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 04:05:17,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1556.36 | bwd_inner_microstep: 1556.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799
[2024-06-10 04:05:19,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.57 | bwd_microstep: 1458.25 | bwd_inner_microstep: 1458.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555
[2024-06-10 04:05:22,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 04:05:22,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.08 | bwd_microstep: 2614.91 | bwd_inner_microstep: 1490.28 | bwd_allreduce_microstep: 1124.59 | step_microstep: 38.50
[2024-06-10 04:05:22,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16375.50 | bwd: 45054.49 | bwd_inner: 43928.90 | bwd_allreduce: 1124.87 | step: 40.19
{'loss': 1.3233, 'learning_rate': 3.927411287779882e-05, 'epoch': 0.11}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 04:05:24,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 1239.80 | bwd_inner_microstep: 1239.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 04:05:25,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 795.37 | bwd_inner_microstep: 795.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 04:05:27,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1250.75 | bwd_inner_microstep: 1250.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3893
[2024-06-10 04:05:29,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.17 | bwd_microstep: 1491.47 | bwd_inner_microstep: 1491.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 04:05:31,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.13 | bwd_microstep: 1490.67 | bwd_inner_microstep: 1490.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 04:05:33,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1342.88 | bwd_inner_microstep: 1342.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3798
[2024-06-10 04:05:35,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.47 | bwd_microstep: 1649.76 | bwd_inner_microstep: 1649.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1867
[2024-06-10 04:05:36,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.75 | bwd_microstep: 707.87 | bwd_inner_microstep: 707.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 04:05:38,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.28 | bwd_microstep: 1247.23 | bwd_inner_microstep: 1247.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 04:05:40,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1251.06 | bwd_inner_microstep: 1251.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3505
[2024-06-10 04:05:42,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.54 | bwd_microstep: 1447.82 | bwd_inner_microstep: 1447.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3412
[2024-06-10 04:05:43,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.04 | bwd_microstep: 1329.39 | bwd_inner_microstep: 1329.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2869
[2024-06-10 04:05:45,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.05 | bwd_microstep: 1177.50 | bwd_inner_microstep: 1177.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3949
[2024-06-10 04:05:48,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.78 | bwd_microstep: 1793.76 | bwd_inner_microstep: 1793.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3702
[2024-06-10 04:05:50,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.42 | bwd_microstep: 1461.08 | bwd_inner_microstep: 1461.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461
[2024-06-10 04:05:51,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.35 | bwd_microstep: 1310.53 | bwd_inner_microstep: 1310.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 04:05:53,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.61 | bwd_microstep: 1182.67 | bwd_inner_microstep: 1182.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 04:05:55,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.33 | bwd_microstep: 1295.42 | bwd_inner_microstep: 1295.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 04:05:57,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1417.82 | bwd_inner_microstep: 1417.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 04:05:59,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.03 | bwd_microstep: 1501.42 | bwd_inner_microstep: 1501.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 04:06:01,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1399.73 | bwd_inner_microstep: 1399.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294
[2024-06-10 04:06:02,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.41 | bwd_microstep: 977.79 | bwd_inner_microstep: 977.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 04:06:04,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.10 | bwd_microstep: 1539.74 | bwd_inner_microstep: 1539.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2006
[2024-06-10 04:06:05,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.98 | bwd_microstep: 739.43 | bwd_inner_microstep: 739.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 04:06:07,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1457.57 | bwd_inner_microstep: 1457.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3603
[2024-06-10 04:06:09,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.16 | bwd_microstep: 1216.41 | bwd_inner_microstep: 1216.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 04:06:11,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.30 | bwd_microstep: 1654.55 | bwd_inner_microstep: 1654.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-10 04:06:13,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.67 | bwd_microstep: 1600.63 | bwd_inner_microstep: 1600.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3486
[2024-06-10 04:06:15,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1230.06 | bwd_inner_microstep: 1230.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3765
[2024-06-10 04:06:17,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.73 | bwd_microstep: 1375.03 | bwd_inner_microstep: 1375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 04:06:19,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1395.97 | bwd_inner_microstep: 1395.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 04:06:24,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 04:06:24,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 4390.09 | bwd_inner_microstep: 1560.63 | bwd_allreduce_microstep: 2829.41 | step_microstep: 38.51
[2024-06-10 04:06:24,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15909.63 | bwd: 45361.28 | bwd_inner: 42530.96 | bwd_allreduce: 2829.64 | step: 40.05
{'loss': 1.3466, 'learning_rate': 3.9264058608960874e-05, 'epoch': 0.11}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 04:06:26,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.80 | bwd_microstep: 1471.26 | bwd_inner_microstep: 1471.19 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3906
[2024-06-10 04:06:28,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1491.05 | bwd_inner_microstep: 1491.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 04:06:30,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.39 | bwd_microstep: 1653.20 | bwd_inner_microstep: 1653.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4115
[2024-06-10 04:06:33,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.80 | bwd_microstep: 1635.62 | bwd_inner_microstep: 1635.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-10 04:06:35,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.56 | bwd_microstep: 1634.41 | bwd_inner_microstep: 1634.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 04:06:37,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.18 | bwd_microstep: 1387.42 | bwd_inner_microstep: 1387.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064
[2024-06-10 04:06:38,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.37 | bwd_microstep: 816.29 | bwd_inner_microstep: 816.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 04:06:40,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.65 | bwd_microstep: 1507.36 | bwd_inner_microstep: 1507.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-10 04:06:42,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.93 | bwd_microstep: 1535.89 | bwd_inner_microstep: 1535.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 04:06:44,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.62 | bwd_microstep: 1586.57 | bwd_inner_microstep: 1586.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509
[2024-06-10 04:06:46,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1446.72 | bwd_inner_microstep: 1446.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 04:06:48,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1382.08 | bwd_inner_microstep: 1382.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3051
[2024-06-10 04:06:50,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.05 | bwd_microstep: 1207.82 | bwd_inner_microstep: 1207.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 04:06:52,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1483.27 | bwd_inner_microstep: 1483.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3523
[2024-06-10 04:06:54,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1548.96 | bwd_inner_microstep: 1548.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 04:06:56,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.41 | bwd_microstep: 1569.97 | bwd_inner_microstep: 1569.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3539
[2024-06-10 04:06:58,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.77 | bwd_microstep: 1454.97 | bwd_inner_microstep: 1454.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2183
[2024-06-10 04:06:59,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.03 | bwd_microstep: 890.46 | bwd_inner_microstep: 890.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 04:07:01,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.84 | bwd_microstep: 1393.97 | bwd_inner_microstep: 1393.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2127
[2024-06-10 04:07:03,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.87 | bwd_microstep: 929.21 | bwd_inner_microstep: 929.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 04:07:05,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.56 | bwd_microstep: 1557.29 | bwd_inner_microstep: 1557.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 04:07:07,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.29 | bwd_microstep: 1354.82 | bwd_inner_microstep: 1354.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2237
[2024-06-10 04:07:08,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.55 | bwd_microstep: 868.19 | bwd_inner_microstep: 868.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 04:07:10,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.73 | bwd_microstep: 1404.61 | bwd_inner_microstep: 1404.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 04:07:12,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.50 | bwd_microstep: 1500.13 | bwd_inner_microstep: 1500.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 04:07:14,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.77 | bwd_microstep: 1656.72 | bwd_inner_microstep: 1656.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812
[2024-06-10 04:07:16,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.37 | bwd_microstep: 1357.30 | bwd_inner_microstep: 1357.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602
[2024-06-10 04:07:18,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.95 | bwd_microstep: 1440.82 | bwd_inner_microstep: 1440.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 04:07:20,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.71 | bwd_microstep: 1651.34 | bwd_inner_microstep: 1651.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 04:07:22,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1351.00 | bwd_inner_microstep: 1350.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432
[2024-06-10 04:07:24,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.47 | bwd_microstep: 1158.64 | bwd_inner_microstep: 1158.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 04:07:26,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.17 | optimizer_step: 6.62
[2024-06-10 04:07:26,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.33 | bwd_microstep: 1444.93 | bwd_inner_microstep: 1437.27 | bwd_allreduce_microstep: 7.62 | step_microstep: 38.24
[2024-06-10 04:07:26,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16709.32 | bwd: 44772.32 | bwd_inner: 44763.75 | bwd_allreduce: 7.88 | step: 39.87
{'loss': 1.3649, 'learning_rate': 3.925393649220865e-05, 'epoch': 0.11}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 04:07:28,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.54 | bwd_microstep: 1374.03 | bwd_inner_microstep: 1373.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 04:07:30,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.93 | bwd_microstep: 1285.92 | bwd_inner_microstep: 1285.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3898
[2024-06-10 04:07:32,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.65 | bwd_microstep: 1686.77 | bwd_inner_microstep: 1686.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859
[2024-06-10 04:07:34,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.10 | bwd_microstep: 1567.41 | bwd_inner_microstep: 1567.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2474
[2024-06-10 04:07:35,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.53 | bwd_microstep: 954.02 | bwd_inner_microstep: 954.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 04:07:36,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.14 | bwd_microstep: 791.40 | bwd_inner_microstep: 791.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3772
[2024-06-10 04:07:38,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.04 | bwd_microstep: 1345.41 | bwd_inner_microstep: 1345.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1895
[2024-06-10 04:07:39,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.67 | bwd_microstep: 691.74 | bwd_inner_microstep: 691.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3537
[2024-06-10 04:07:41,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1429.58 | bwd_inner_microstep: 1429.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412
[2024-06-10 04:07:43,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.08 | bwd_microstep: 1276.84 | bwd_inner_microstep: 1276.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 04:07:45,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.58 | bwd_microstep: 1346.94 | bwd_inner_microstep: 1346.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3511
[2024-06-10 04:07:47,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.44 | bwd_microstep: 1684.82 | bwd_inner_microstep: 1684.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2394
[2024-06-10 04:07:48,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.35 | bwd_microstep: 934.87 | bwd_inner_microstep: 934.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 04:07:50,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.02 | bwd_microstep: 825.46 | bwd_inner_microstep: 825.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 04:07:51,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.40 | bwd_microstep: 1354.77 | bwd_inner_microstep: 1354.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-10 04:07:53,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1358.97 | bwd_inner_microstep: 1358.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 04:07:55,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1516.56 | bwd_inner_microstep: 1516.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 04:07:58,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.69 | bwd_microstep: 1631.55 | bwd_inner_microstep: 1631.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 04:07:59,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1288.28 | bwd_inner_microstep: 1288.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622
[2024-06-10 04:08:02,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.33 | bwd_microstep: 1614.63 | bwd_inner_microstep: 1614.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 04:08:04,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.62 | bwd_microstep: 1437.13 | bwd_inner_microstep: 1437.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878
[2024-06-10 04:08:06,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.85 | bwd_microstep: 1588.19 | bwd_inner_microstep: 1588.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-10 04:08:08,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1589.70 | bwd_inner_microstep: 1589.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 04:08:10,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.42 | bwd_microstep: 1280.69 | bwd_inner_microstep: 1280.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-10 04:08:11,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.55 | bwd_microstep: 683.00 | bwd_inner_microstep: 682.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3585
[2024-06-10 04:08:13,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1242.15 | bwd_inner_microstep: 1242.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3829
[2024-06-10 04:08:15,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1503.64 | bwd_inner_microstep: 1503.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-10 04:08:16,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.32 | bwd_microstep: 1340.32 | bwd_inner_microstep: 1340.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 04:08:19,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.20 | bwd_microstep: 1647.63 | bwd_inner_microstep: 1647.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3579
[2024-06-10 04:08:21,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.48 | bwd_microstep: 1464.63 | bwd_inner_microstep: 1464.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 04:08:22,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.09 | bwd_microstep: 973.85 | bwd_inner_microstep: 973.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 04:08:25,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 04:08:25,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.59 | bwd_microstep: 2561.00 | bwd_inner_microstep: 1787.99 | bwd_allreduce_microstep: 772.96 | step_microstep: 38.24
[2024-06-10 04:08:25,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15834.64 | bwd: 43271.91 | bwd_inner: 42498.01 | bwd_allreduce: 773.19 | step: 39.81
{'loss': 1.2949, 'learning_rate': 3.9243746563192184e-05, 'epoch': 0.12}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 04:08:27,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.31 | bwd_microstep: 1438.21 | bwd_inner_microstep: 1438.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 04:08:29,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.34 | bwd_microstep: 1282.48 | bwd_inner_microstep: 1282.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 04:08:31,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1380.18 | bwd_inner_microstep: 1380.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 04:08:33,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1553.78 | bwd_inner_microstep: 1553.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3481
[2024-06-10 04:08:35,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1413.59 | bwd_inner_microstep: 1413.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239
[2024-06-10 04:08:36,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.96 | bwd_microstep: 964.54 | bwd_inner_microstep: 964.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4057
[2024-06-10 04:08:39,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.48 | bwd_microstep: 1554.70 | bwd_inner_microstep: 1554.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181
[2024-06-10 04:08:40,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.42 | bwd_microstep: 857.91 | bwd_inner_microstep: 857.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404
[2024-06-10 04:08:41,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.69 | bwd_microstep: 1186.68 | bwd_inner_microstep: 1186.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 04:08:43,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.91 | bwd_microstep: 1389.54 | bwd_inner_microstep: 1389.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692
[2024-06-10 04:08:45,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1529.93 | bwd_inner_microstep: 1529.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3719
[2024-06-10 04:08:47,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1497.15 | bwd_inner_microstep: 1497.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4005
[2024-06-10 04:08:50,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.51 | bwd_microstep: 1611.32 | bwd_inner_microstep: 1611.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3506
[2024-06-10 04:08:52,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.64 | bwd_microstep: 1517.29 | bwd_inner_microstep: 1517.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 04:08:54,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1383.56 | bwd_inner_microstep: 1383.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 04:08:56,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1493.32 | bwd_inner_microstep: 1493.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518
[2024-06-10 04:08:58,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.75 | bwd_microstep: 1586.58 | bwd_inner_microstep: 1586.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 04:09:00,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.79 | bwd_microstep: 1348.26 | bwd_inner_microstep: 1348.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3468
[2024-06-10 04:09:02,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.39 | bwd_microstep: 1434.74 | bwd_inner_microstep: 1434.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-10 04:09:04,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.77 | bwd_microstep: 1581.65 | bwd_inner_microstep: 1581.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 04:09:06,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.16 | bwd_microstep: 1401.59 | bwd_inner_microstep: 1401.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1995
[2024-06-10 04:09:07,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.54 | bwd_microstep: 832.27 | bwd_inner_microstep: 832.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 04:09:09,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1419.02 | bwd_inner_microstep: 1418.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 04:09:11,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.70 | bwd_microstep: 1510.36 | bwd_inner_microstep: 1510.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-10 04:09:13,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.86 | bwd_microstep: 1607.24 | bwd_inner_microstep: 1607.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443
[2024-06-10 04:09:15,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.96 | bwd_microstep: 1285.23 | bwd_inner_microstep: 1285.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 04:09:17,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.40 | bwd_microstep: 1535.40 | bwd_inner_microstep: 1535.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 04:09:19,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.70 | bwd_microstep: 1511.49 | bwd_inner_microstep: 1511.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3617
[2024-06-10 04:09:21,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.77 | bwd_microstep: 1471.16 | bwd_inner_microstep: 1471.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 04:09:23,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1471.45 | bwd_inner_microstep: 1471.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3456
[2024-06-10 04:09:25,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1515.17 | bwd_inner_microstep: 1515.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 04:09:27,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-10 04:09:27,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.70 | bwd_microstep: 1412.07 | bwd_inner_microstep: 1389.28 | bwd_allreduce_microstep: 22.74 | step_microstep: 38.26
[2024-06-10 04:09:27,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16769.64 | bwd: 44977.89 | bwd_inner: 44954.23 | bwd_allreduce: 22.96 | step: 39.83
 11%|█         | 194/1726 [3:25:57<26:20:57, 61.92s/it]
 11%|█▏        | 195/1726 [3:26:57<26:11:05, 61.57s/it]
 11%|█▏        | 196/1726 [3:27:59<26:11:34, 61.63s/it]
 11%|█▏        | 197/1726 [3:29:01<26:10:21, 61.62s/it]
 11%|█▏        | 198/1726 [3:30:03<26:10:52, 61.68s/it]
 12%|█▏        | 199/1726 [3:31:02<25:52:43, 61.01s/it]
{'loss': 1.3381, 'learning_rate': 3.923348885780037e-05, 'epoch': 0.12}
 12%|█▏        | 200/1726 [3:32:04<26:00:00, 61.34s/it]
[INFO|trainer.py:2936] 2024-06-10 04:09:31,266 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200
[INFO|configuration_utils.py:473] 2024-06-10 04:09:31,270 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/config.json
[INFO|configuration_utils.py:594] 2024-06-10 04:09:31,272 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-10 04:09:40,155 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-10 04:09:40,183 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-10 04:09:40,185 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-10 04:09:40,185 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/added_tokens.json
[2024-06-10 04:09:40,442] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step200 is about to be saved!
[2024-06-10 04:09:40,452] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/mp_rank_00_model_states.pt
[2024-06-10 04:09:40,453] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/mp_rank_00_model_states.pt...
[2024-06-10 04:09:49,901] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/mp_rank_00_model_states.pt.
[2024-06-10 04:09:49,912] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-06-10 04:10:02,919] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-06-10 04:10:02,926] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-06-10 04:10:02,926] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step200 is ready now!
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 04:10:05,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.97 | bwd_microstep: 1483.30 | bwd_inner_microstep: 1483.18 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 04:10:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.14 | bwd_microstep: 1272.03 | bwd_inner_microstep: 1272.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3853
[2024-06-10 04:10:08,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.74 | bwd_microstep: 1387.38 | bwd_inner_microstep: 1387.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 04:10:10,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.05 | bwd_microstep: 1544.19 | bwd_inner_microstep: 1544.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805
[2024-06-10 04:10:12,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.32 | bwd_microstep: 1454.32 | bwd_inner_microstep: 1454.16 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 04:10:14,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.27 | bwd_microstep: 1243.25 | bwd_inner_microstep: 1243.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 04:10:16,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.34 | bwd_microstep: 1249.24 | bwd_inner_microstep: 1249.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 04:10:17,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.02 | bwd_microstep: 790.22 | bwd_inner_microstep: 790.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1888
[2024-06-10 04:10:18,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.62 | bwd_microstep: 714.37 | bwd_inner_microstep: 714.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 04:10:20,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.14 | bwd_microstep: 1390.53 | bwd_inner_microstep: 1390.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509
[2024-06-10 04:10:22,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 1445.12 | bwd_inner_microstep: 1445.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 04:10:24,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1479.95 | bwd_inner_microstep: 1479.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-10 04:10:26,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1447.29 | bwd_inner_microstep: 1447.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3460
[2024-06-10 04:10:28,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.63 | bwd_microstep: 1434.05 | bwd_inner_microstep: 1434.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3386
[2024-06-10 04:10:30,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.15 | bwd_microstep: 1432.28 | bwd_inner_microstep: 1432.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507
[2024-06-10 04:10:32,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.62 | bwd_microstep: 1582.91 | bwd_inner_microstep: 1582.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2038
[2024-06-10 04:10:33,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.92 | bwd_microstep: 717.66 | bwd_inner_microstep: 717.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3636
[2024-06-10 04:10:35,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.46 | bwd_microstep: 1468.68 | bwd_inner_microstep: 1468.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651
[2024-06-10 04:10:37,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1425.99 | bwd_inner_microstep: 1425.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828
[2024-06-10 04:10:39,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.74 | bwd_microstep: 1586.81 | bwd_inner_microstep: 1586.36 | bwd_allreduce_microstep: 0.23 | step_microstep: 0.33
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3551
[2024-06-10 04:10:41,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.51 | bwd_microstep: 1336.69 | bwd_inner_microstep: 1336.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696
[2024-06-10 04:10:43,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1535.25 | bwd_inner_microstep: 1535.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971
[2024-06-10 04:10:44,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.30 | bwd_microstep: 705.35 | bwd_inner_microstep: 705.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957
[2024-06-10 04:10:45,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.91 | bwd_microstep: 703.12 | bwd_inner_microstep: 703.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 04:10:47,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1419.01 | bwd_inner_microstep: 1418.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2028
[2024-06-10 04:10:48,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.56 | bwd_microstep: 842.34 | bwd_inner_microstep: 842.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 04:10:50,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.13 | bwd_microstep: 1567.15 | bwd_inner_microstep: 1567.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016
[2024-06-10 04:10:51,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.95 | bwd_microstep: 712.29 | bwd_inner_microstep: 712.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-10 04:10:54,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.84 | bwd_microstep: 1752.79 | bwd_inner_microstep: 1752.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 04:10:56,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.46 | bwd_microstep: 1601.34 | bwd_inner_microstep: 1601.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2261
[2024-06-10 04:10:58,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.32 | bwd_microstep: 1070.21 | bwd_inner_microstep: 1070.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2233
[2024-06-10 04:11:04,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-10 04:11:04,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.89 | bwd_microstep: 5621.63 | bwd_inner_microstep: 1207.06 | bwd_allreduce_microstep: 4414.50 | step_microstep: 39.45
[2024-06-10 04:11:04,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15335.62 | bwd: 45416.79 | bwd_inner: 41000.77 | bwd_allreduce: 4415.12 | step: 41.57
{'loss': 1.3679, 'learning_rate': 3.9223163412160784e-05, 'epoch': 0.12}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1862
[2024-06-10 04:11:05,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.84 | bwd_microstep: 762.63 | bwd_inner_microstep: 762.47 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 04:11:06,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.71 | bwd_microstep: 676.86 | bwd_inner_microstep: 676.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473
[2024-06-10 04:11:08,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.36 | bwd_microstep: 1408.27 | bwd_inner_microstep: 1408.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764
[2024-06-10 04:11:10,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.39 | bwd_microstep: 1436.28 | bwd_inner_microstep: 1436.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398
[2024-06-10 04:11:11,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.24 | bwd_microstep: 1372.50 | bwd_inner_microstep: 1372.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 04:11:13,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.02 | bwd_microstep: 1400.43 | bwd_inner_microstep: 1400.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 04:11:15,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.40 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 04:11:17,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.23 | bwd_microstep: 1388.82 | bwd_inner_microstep: 1388.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 04:11:19,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.31 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 04:11:21,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.81 | bwd_microstep: 1396.81 | bwd_inner_microstep: 1396.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-10 04:11:23,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.64 | bwd_microstep: 1325.18 | bwd_inner_microstep: 1325.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3683
[2024-06-10 04:11:25,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.28 | bwd_microstep: 1374.87 | bwd_inner_microstep: 1374.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1943
[2024-06-10 04:11:26,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.61 | bwd_microstep: 824.23 | bwd_inner_microstep: 824.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1907
[2024-06-10 04:11:27,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.78 | bwd_microstep: 813.16 | bwd_inner_microstep: 813.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3522
[2024-06-10 04:11:29,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1456.89 | bwd_inner_microstep: 1456.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 04:11:31,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1379.94 | bwd_inner_microstep: 1379.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3823
[2024-06-10 04:11:33,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.39 | bwd_microstep: 1505.61 | bwd_inner_microstep: 1505.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 04:11:35,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.23 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 04:11:37,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.07 | bwd_microstep: 1348.46 | bwd_inner_microstep: 1348.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 04:11:39,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.18 | bwd_microstep: 1490.73 | bwd_inner_microstep: 1490.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 04:11:41,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.53 | bwd_microstep: 1451.95 | bwd_inner_microstep: 1451.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126
[2024-06-10 04:11:42,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.17 | bwd_microstep: 929.71 | bwd_inner_microstep: 929.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3597
[2024-06-10 04:11:44,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.90 | bwd_microstep: 1574.04 | bwd_inner_microstep: 1574.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-10 04:11:46,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.79 | bwd_microstep: 1162.32 | bwd_inner_microstep: 1162.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513
[2024-06-10 04:11:48,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.07 | bwd_microstep: 1321.78 | bwd_inner_microstep: 1321.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 04:11:50,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.97 | bwd_microstep: 1354.24 | bwd_inner_microstep: 1354.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899
[2024-06-10 04:11:51,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.14 | bwd_microstep: 715.14 | bwd_inner_microstep: 715.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-10 04:11:52,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.51 | bwd_microstep: 1309.86 | bwd_inner_microstep: 1309.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3729
[2024-06-10 04:11:54,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1370.21 | bwd_inner_microstep: 1370.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 04:11:56,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1279.36 | bwd_inner_microstep: 1279.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 04:11:58,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.86 | bwd_microstep: 1449.50 | bwd_inner_microstep: 1449.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 04:12:05,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.36 | optimizer_step: 6.61
[2024-06-10 04:12:05,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.56 | bwd_microstep: 6623.09 | bwd_inner_microstep: 1559.45 | bwd_allreduce_microstep: 5063.57 | step_microstep: 39.91
[2024-06-10 04:12:05,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15350.63 | bwd: 45972.87 | bwd_inner: 40908.23 | bwd_allreduce: 5063.88 | step: 41.81
{'loss': 1.3516, 'learning_rate': 3.921277026263959e-05, 'epoch': 0.12}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 04:12:07,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.08 | bwd_microstep: 1465.62 | bwd_inner_microstep: 1465.47 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2325
[2024-06-10 04:12:09,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.09 | bwd_microstep: 916.72 | bwd_inner_microstep: 916.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451
[2024-06-10 04:12:11,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.22 | bwd_microstep: 1412.11 | bwd_inner_microstep: 1412.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 04:12:12,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.36 | bwd_microstep: 1384.36 | bwd_inner_microstep: 1384.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 04:12:15,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.76 | bwd_microstep: 1642.73 | bwd_inner_microstep: 1642.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 04:12:16,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.04 | bwd_microstep: 1282.17 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2888
[2024-06-10 04:12:18,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.93 | bwd_microstep: 1058.39 | bwd_inner_microstep: 1058.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-10 04:12:20,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.54 | bwd_microstep: 1528.27 | bwd_inner_microstep: 1528.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 04:12:22,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.02 | bwd_microstep: 1298.86 | bwd_inner_microstep: 1298.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3607
[2024-06-10 04:12:24,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.89 | bwd_microstep: 1489.09 | bwd_inner_microstep: 1489.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672
[2024-06-10 04:12:26,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.05 | bwd_microstep: 1619.31 | bwd_inner_microstep: 1619.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 04:12:28,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1294.66 | bwd_inner_microstep: 1294.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2372
[2024-06-10 04:12:29,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 409.10 | bwd_microstep: 1095.27 | bwd_inner_microstep: 1095.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637
[2024-06-10 04:12:31,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.24 | bwd_microstep: 1318.37 | bwd_inner_microstep: 1318.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3645
[2024-06-10 04:12:33,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.79 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1281.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 04:12:35,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1509.48 | bwd_inner_microstep: 1509.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3670
[2024-06-10 04:12:37,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.36 | bwd_microstep: 1477.00 | bwd_inner_microstep: 1476.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832
[2024-06-10 04:12:39,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.74 | bwd_microstep: 1389.67 | bwd_inner_microstep: 1389.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 04:12:41,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.61 | bwd_microstep: 1388.29 | bwd_inner_microstep: 1388.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 04:12:43,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.41 | bwd_microstep: 1161.22 | bwd_inner_microstep: 1161.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2295
[2024-06-10 04:12:44,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.59 | bwd_microstep: 817.71 | bwd_inner_microstep: 817.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 04:12:46,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1286.18 | bwd_inner_microstep: 1286.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1981
[2024-06-10 04:12:47,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.56 | bwd_microstep: 705.97 | bwd_inner_microstep: 705.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068
[2024-06-10 04:12:48,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.43 | bwd_microstep: 917.08 | bwd_inner_microstep: 917.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 04:12:50,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.16 | bwd_microstep: 1461.24 | bwd_inner_microstep: 1461.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 04:12:52,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.62 | bwd_microstep: 1453.05 | bwd_inner_microstep: 1453.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3825
[2024-06-10 04:12:54,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.57 | bwd_microstep: 1619.14 | bwd_inner_microstep: 1619.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2219
[2024-06-10 04:12:55,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.34 | bwd_microstep: 865.39 | bwd_inner_microstep: 865.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 04:12:57,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.67 | bwd_microstep: 1602.33 | bwd_inner_microstep: 1602.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 04:12:59,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.60 | bwd_microstep: 790.26 | bwd_inner_microstep: 790.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-10 04:13:01,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.78 | bwd_microstep: 1644.48 | bwd_inner_microstep: 1644.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-10 04:13:06,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 04:13:06,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.91 | bwd_microstep: 4319.28 | bwd_inner_microstep: 1639.48 | bwd_allreduce_microstep: 2679.75 | step_microstep: 39.65
[2024-06-10 04:13:06,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15627.53 | bwd: 44495.73 | bwd_inner: 41814.95 | bwd_allreduce: 2680.05 | step: 41.35
{'loss': 1.3206, 'learning_rate': 3.920230944584141e-05, 'epoch': 0.12}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 04:13:08,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.36 | bwd_microstep: 1331.13 | bwd_inner_microstep: 1331.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3979
[2024-06-10 04:13:10,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.24 | bwd_microstep: 1502.37 | bwd_inner_microstep: 1502.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 04:13:11,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 1242.74 | bwd_inner_microstep: 1242.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 04:13:13,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 1451.85 | bwd_inner_microstep: 1451.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4180
[2024-06-10 04:13:16,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.74 | bwd_microstep: 1551.01 | bwd_inner_microstep: 1550.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 04:13:17,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.58 | bwd_microstep: 1186.99 | bwd_inner_microstep: 1186.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 04:13:19,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.39 | bwd_microstep: 1301.92 | bwd_inner_microstep: 1301.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-10 04:13:21,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.08 | bwd_microstep: 1433.00 | bwd_inner_microstep: 1432.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 04:13:23,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.13 | bwd_microstep: 1187.96 | bwd_inner_microstep: 1187.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 04:13:24,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.68 | bwd_microstep: 1153.26 | bwd_inner_microstep: 1153.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 04:13:26,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.55 | bwd_microstep: 1343.47 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 04:13:28,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1253.50 | bwd_inner_microstep: 1253.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 04:13:30,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.85 | bwd_microstep: 1521.24 | bwd_inner_microstep: 1521.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 04:13:32,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.54 | bwd_microstep: 1289.05 | bwd_inner_microstep: 1289.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1264
[2024-06-10 04:13:32,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 197.35 | bwd_microstep: 519.17 | bwd_inner_microstep: 519.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2603
[2024-06-10 04:13:34,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.39 | bwd_microstep: 1060.20 | bwd_inner_microstep: 1060.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 04:13:36,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.97 | bwd_microstep: 1528.57 | bwd_inner_microstep: 1528.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 04:13:38,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.75 | bwd_microstep: 1398.87 | bwd_inner_microstep: 1398.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925
[2024-06-10 04:13:39,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.45 | bwd_microstep: 697.80 | bwd_inner_microstep: 697.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 04:13:41,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.52 | bwd_microstep: 1663.24 | bwd_inner_microstep: 1663.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 04:13:43,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.18 | bwd_microstep: 1184.07 | bwd_inner_microstep: 1184.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142
[2024-06-10 04:13:44,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.00 | bwd_microstep: 835.36 | bwd_inner_microstep: 835.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 04:13:46,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.72 | bwd_microstep: 1662.47 | bwd_inner_microstep: 1662.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 04:13:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.00 | bwd_microstep: 1310.76 | bwd_inner_microstep: 1310.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 4012
[2024-06-10 04:13:51,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 677.16 | bwd_microstep: 1857.79 | bwd_inner_microstep: 1857.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3430
[2024-06-10 04:13:53,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.93 | bwd_microstep: 1377.47 | bwd_inner_microstep: 1377.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 04:13:54,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.22 | bwd_microstep: 1257.66 | bwd_inner_microstep: 1257.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 04:13:57,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.23 | bwd_microstep: 1656.24 | bwd_inner_microstep: 1656.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3433
[2024-06-10 04:13:59,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1393.85 | bwd_inner_microstep: 1393.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3780
[2024-06-10 04:14:01,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1495.77 | bwd_inner_microstep: 1495.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3543
[2024-06-10 04:14:03,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.68 | bwd_microstep: 1537.64 | bwd_inner_microstep: 1537.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3413
[2024-06-10 04:14:07,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 04:14:07,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.59 | bwd_microstep: 3623.60 | bwd_inner_microstep: 1717.10 | bwd_allreduce_microstep: 1906.44 | step_microstep: 38.79
[2024-06-10 04:14:07,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16041.89 | bwd: 44810.04 | bwd_inner: 42902.67 | bwd_allreduce: 1906.67 | step: 40.48
{'loss': 1.3946, 'learning_rate': 3.919178099860918e-05, 'epoch': 0.12}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 04:14:09,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.43 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2399
[2024-06-10 04:14:10,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.66 | bwd_microstep: 1000.37 | bwd_inner_microstep: 1000.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3855
[2024-06-10 04:14:12,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.19 | bwd_microstep: 1457.47 | bwd_inner_microstep: 1457.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 04:14:14,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1378.84 | bwd_inner_microstep: 1378.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 04:14:16,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.83 | bwd_microstep: 1252.60 | bwd_inner_microstep: 1252.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 04:14:18,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1384.57 | bwd_inner_microstep: 1384.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 04:14:20,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1241.23 | bwd_inner_microstep: 1241.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 04:14:21,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.63 | bwd_microstep: 792.08 | bwd_inner_microstep: 792.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 04:14:23,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.70 | bwd_microstep: 1387.18 | bwd_inner_microstep: 1387.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 04:14:24,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1396.56 | bwd_inner_microstep: 1396.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922
[2024-06-10 04:14:25,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.34 | bwd_microstep: 698.97 | bwd_inner_microstep: 698.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-10 04:14:27,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.03 | bwd_microstep: 1313.14 | bwd_inner_microstep: 1313.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3834
[2024-06-10 04:14:29,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.07 | bwd_microstep: 1522.02 | bwd_inner_microstep: 1522.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3274
[2024-06-10 04:14:31,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.26 | bwd_microstep: 1377.73 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1892
[2024-06-10 04:14:32,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 804.69 | bwd_inner_microstep: 804.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3634
[2024-06-10 04:14:35,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1572.79 | bwd_inner_microstep: 1572.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425
[2024-06-10 04:14:36,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.68 | bwd_microstep: 1405.13 | bwd_inner_microstep: 1405.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2301
[2024-06-10 04:14:38,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.25 | bwd_microstep: 882.83 | bwd_inner_microstep: 882.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3556
[2024-06-10 04:14:40,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.51 | bwd_microstep: 1429.57 | bwd_inner_microstep: 1429.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2389
[2024-06-10 04:14:41,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.06 | bwd_microstep: 845.24 | bwd_inner_microstep: 845.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 04:14:43,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.02 | bwd_microstep: 1453.10 | bwd_inner_microstep: 1453.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 04:14:44,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.36 | bwd_microstep: 706.60 | bwd_inner_microstep: 706.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 04:14:46,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.23 | bwd_microstep: 1657.29 | bwd_inner_microstep: 1657.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 04:14:48,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.95 | bwd_microstep: 1493.77 | bwd_inner_microstep: 1493.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 04:14:50,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1495.41 | bwd_inner_microstep: 1495.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 04:14:52,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1461.69 | bwd_inner_microstep: 1461.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 04:14:55,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1630.49 | bwd_inner_microstep: 1630.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 04:14:57,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1604.49 | bwd_inner_microstep: 1604.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-10 04:14:59,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.53 | bwd_microstep: 1600.93 | bwd_inner_microstep: 1600.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-10 04:15:00,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.48 | bwd_microstep: 810.26 | bwd_inner_microstep: 810.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3610
[2024-06-10 04:15:02,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.13 | bwd_microstep: 1436.52 | bwd_inner_microstep: 1436.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 04:15:08,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.31 | optimizer_step: 6.61
[2024-06-10 04:15:08,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.12 | bwd_microstep: 5502.22 | bwd_inner_microstep: 1575.42 | bwd_allreduce_microstep: 3926.73 | step_microstep: 39.37
[2024-06-10 04:15:08,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15481.47 | bwd: 45373.28 | bwd_inner: 41445.63 | bwd_allreduce: 3926.97 | step: 41.00
{'loss': 1.2976, 'learning_rate': 3.9181184958024045e-05, 'epoch': 0.12}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3445
[2024-06-10 04:15:10,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.29 | bwd_microstep: 1303.01 | bwd_inner_microstep: 1302.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2928
[2024-06-10 04:15:12,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.70 | bwd_microstep: 1133.38 | bwd_inner_microstep: 1133.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753
[2024-06-10 04:15:14,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.47 | bwd_microstep: 1639.25 | bwd_inner_microstep: 1639.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 04:15:15,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.29 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 04:15:18,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1537.15 | bwd_inner_microstep: 1537.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 04:15:19,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.99 | bwd_microstep: 793.75 | bwd_inner_microstep: 793.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3757
[2024-06-10 04:15:21,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1404.75 | bwd_inner_microstep: 1404.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 04:15:23,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.20 | bwd_microstep: 1437.25 | bwd_inner_microstep: 1437.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 04:15:25,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.43 | bwd_microstep: 1393.82 | bwd_inner_microstep: 1393.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893
[2024-06-10 04:15:26,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.26 | bwd_microstep: 713.75 | bwd_inner_microstep: 713.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3491
[2024-06-10 04:15:28,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.59 | bwd_microstep: 1442.28 | bwd_inner_microstep: 1442.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 04:15:30,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.24 | bwd_microstep: 1500.65 | bwd_inner_microstep: 1500.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2090
[2024-06-10 04:15:31,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.61 | bwd_microstep: 944.86 | bwd_inner_microstep: 944.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-10 04:15:33,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.08 | bwd_microstep: 1338.71 | bwd_inner_microstep: 1338.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1951
[2024-06-10 04:15:34,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.77 | bwd_microstep: 825.12 | bwd_inner_microstep: 825.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 04:15:36,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.55 | bwd_microstep: 1289.38 | bwd_inner_microstep: 1289.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 04:15:37,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.29 | bwd_microstep: 1257.52 | bwd_inner_microstep: 1257.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1993
[2024-06-10 04:15:38,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.20 | bwd_microstep: 737.94 | bwd_inner_microstep: 737.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 04:15:40,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1397.38 | bwd_inner_microstep: 1397.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457
[2024-06-10 04:15:42,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.14 | bwd_microstep: 1309.46 | bwd_inner_microstep: 1309.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077
[2024-06-10 04:15:43,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.22 | bwd_microstep: 915.45 | bwd_inner_microstep: 915.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 04:15:45,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.66 | bwd_microstep: 1388.15 | bwd_inner_microstep: 1388.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-10 04:15:47,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.92 | bwd_microstep: 1432.02 | bwd_inner_microstep: 1432.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 04:15:49,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.67 | bwd_microstep: 1315.93 | bwd_inner_microstep: 1315.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2020
[2024-06-10 04:15:50,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.93 | bwd_microstep: 853.64 | bwd_inner_microstep: 853.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 04:15:52,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.93 | bwd_microstep: 1498.13 | bwd_inner_microstep: 1498.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2060
[2024-06-10 04:15:54,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.94 | bwd_microstep: 847.42 | bwd_inner_microstep: 847.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 04:15:56,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.20 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812
[2024-06-10 04:15:58,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.37 | bwd_microstep: 1620.22 | bwd_inner_microstep: 1620.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2270
[2024-06-10 04:15:59,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.56 | bwd_microstep: 935.20 | bwd_inner_microstep: 935.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-10 04:16:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.54 | bwd_microstep: 1244.10 | bwd_inner_microstep: 1244.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 04:16:10,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.35 | optimizer_step: 6.58
[2024-06-10 04:16:10,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 8140.05 | bwd_inner_microstep: 1753.47 | bwd_allreduce_microstep: 6386.51 | step_microstep: 39.65
[2024-06-10 04:16:10,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14870.37 | bwd: 46218.56 | bwd_inner: 39831.08 | bwd_allreduce: 6386.78 | step: 41.30
{'loss': 1.2942, 'learning_rate': 3.9170521361405206e-05, 'epoch': 0.12}

 12%|█▏        | 201/1726 [3:33:40<30:24:55, 71.80s/it]
 12%|█▏        | 202/1726 [3:34:42<29:06:37, 68.76s/it]
 12%|█▏        | 203/1726 [3:35:42<28:02:19, 66.28s/it]
 12%|█▏        | 204/1726 [3:36:44<27:22:36, 64.75s/it]
 12%|█▏        | 205/1726 [3:37:45<26:54:30, 63.69s/it]
 12%|█▏        | 206/1726 [3:38:46<26:36:17, 63.01s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 04:16:11,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1342.14 | bwd_inner_microstep: 1342.05 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2403
[2024-06-10 04:16:13,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.33 | bwd_microstep: 998.92 | bwd_inner_microstep: 998.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3915
[2024-06-10 04:16:15,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1494.89 | bwd_inner_microstep: 1494.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2260
[2024-06-10 04:16:16,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.29 | bwd_microstep: 871.55 | bwd_inner_microstep: 871.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 04:16:18,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.69 | bwd_microstep: 1541.75 | bwd_inner_microstep: 1541.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763
[2024-06-10 04:16:20,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.27 | bwd_microstep: 1640.38 | bwd_inner_microstep: 1640.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 04:16:22,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1249.35 | bwd_inner_microstep: 1249.25 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 04:16:24,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.38 | bwd_microstep: 1252.44 | bwd_inner_microstep: 1252.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 04:16:26,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 04:16:28,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.33 | bwd_microstep: 1382.44 | bwd_inner_microstep: 1382.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3494
[2024-06-10 04:16:30,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.56 | bwd_microstep: 1466.69 | bwd_inner_microstep: 1466.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489
[2024-06-10 04:16:31,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1316.44 | bwd_inner_microstep: 1316.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 04:16:33,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.24 | bwd_microstep: 1350.21 | bwd_inner_microstep: 1350.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3681
[2024-06-10 04:16:36,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.60 | bwd_microstep: 1667.47 | bwd_inner_microstep: 1667.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3387
[2024-06-10 04:16:37,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.62 | bwd_microstep: 1177.34 | bwd_inner_microstep: 1177.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3493
[2024-06-10 04:16:39,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1440.90 | bwd_inner_microstep: 1440.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 04:16:41,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.39 | bwd_microstep: 1289.25 | bwd_inner_microstep: 1289.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 04:16:43,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1492.23 | bwd_inner_microstep: 1492.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3617
[2024-06-10 04:16:46,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.19 | bwd_microstep: 1812.94 | bwd_inner_microstep: 1812.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3530
[2024-06-10 04:16:47,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1373.18 | bwd_inner_microstep: 1373.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 04:16:49,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.45 | bwd_microstep: 1281.71 | bwd_inner_microstep: 1281.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 04:16:51,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1411.06 | bwd_inner_microstep: 1411.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 04:16:53,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.04 | bwd_microstep: 1458.53 | bwd_inner_microstep: 1458.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1997
[2024-06-10 04:16:54,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.60 | bwd_microstep: 737.63 | bwd_inner_microstep: 737.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 04:16:56,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.67 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1278.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3453
[2024-06-10 04:16:58,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1416.32 | bwd_inner_microstep: 1416.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 04:17:00,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1405.00 | bwd_inner_microstep: 1404.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443
[2024-06-10 04:17:01,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.38 | bwd_microstep: 1156.64 | bwd_inner_microstep: 1156.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 04:17:04,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1555.24 | bwd_inner_microstep: 1555.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 04:17:06,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1398.68 | bwd_inner_microstep: 1398.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 04:17:08,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.61 | bwd_microstep: 1502.15 | bwd_inner_microstep: 1502.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924
[2024-06-10 04:17:11,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 04:17:11,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.45 | bwd_microstep: 2987.42 | bwd_inner_microstep: 892.05 | bwd_allreduce_microstep: 2095.32 | step_microstep: 40.86
[2024-06-10 04:17:11,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16010.21 | bwd: 44996.33 | bwd_inner: 42899.95 | bwd_allreduce: 2095.64 | step: 42.57
{'loss': 1.3435, 'learning_rate': 3.915979024630978e-05, 'epoch': 0.12}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2428
[2024-06-10 04:17:12,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.85 | bwd_microstep: 996.47 | bwd_inner_microstep: 996.36 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3903
[2024-06-10 04:17:15,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.25 | bwd_microstep: 1653.72 | bwd_inner_microstep: 1653.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 04:17:16,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1382.84 | bwd_inner_microstep: 1382.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 04:17:19,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.58 | bwd_microstep: 1653.79 | bwd_inner_microstep: 1653.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3418
[2024-06-10 04:17:21,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.49 | bwd_microstep: 1309.19 | bwd_inner_microstep: 1309.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2900
[2024-06-10 04:17:22,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.29 | bwd_microstep: 1093.90 | bwd_inner_microstep: 1093.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951
[2024-06-10 04:17:23,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.28 | bwd_microstep: 730.54 | bwd_inner_microstep: 730.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1872
[2024-06-10 04:17:24,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.67 | bwd_microstep: 741.94 | bwd_inner_microstep: 741.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 04:17:26,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1387.98 | bwd_inner_microstep: 1387.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3681
[2024-06-10 04:17:28,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.79 | bwd_microstep: 1288.86 | bwd_inner_microstep: 1288.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-10 04:17:30,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.88 | bwd_microstep: 1279.83 | bwd_inner_microstep: 1279.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3626
[2024-06-10 04:17:32,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.29 | bwd_microstep: 1463.23 | bwd_inner_microstep: 1463.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505
[2024-06-10 04:17:33,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.08 | bwd_microstep: 1317.39 | bwd_inner_microstep: 1317.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1975
[2024-06-10 04:17:35,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.32 | bwd_microstep: 855.18 | bwd_inner_microstep: 855.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 04:17:37,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1390.80 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2127
[2024-06-10 04:17:38,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.55 | bwd_microstep: 926.05 | bwd_inner_microstep: 926.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3647
[2024-06-10 04:17:40,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.13 | bwd_microstep: 1536.39 | bwd_inner_microstep: 1536.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 04:17:42,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.44 | bwd_microstep: 1607.87 | bwd_inner_microstep: 1607.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 04:17:44,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.81 | bwd_microstep: 1290.23 | bwd_inner_microstep: 1290.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839
[2024-06-10 04:17:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.83 | bwd_microstep: 1456.83 | bwd_inner_microstep: 1456.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822
[2024-06-10 04:17:48,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.57 | bwd_microstep: 1488.31 | bwd_inner_microstep: 1488.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2130
[2024-06-10 04:17:49,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.45 | bwd_microstep: 928.28 | bwd_inner_microstep: 928.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 04:17:51,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1489.30 | bwd_inner_microstep: 1489.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3847
[2024-06-10 04:17:54,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.92 | bwd_microstep: 1563.69 | bwd_inner_microstep: 1563.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 04:17:55,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1395.10 | bwd_inner_microstep: 1395.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2185
[2024-06-10 04:17:57,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.98 | bwd_microstep: 953.25 | bwd_inner_microstep: 953.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 04:17:59,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.86 | bwd_microstep: 1459.49 | bwd_inner_microstep: 1459.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444
[2024-06-10 04:18:00,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.81 | bwd_microstep: 1216.99 | bwd_inner_microstep: 1216.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3566
[2024-06-10 04:18:03,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.23 | bwd_microstep: 1456.58 | bwd_inner_microstep: 1456.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 04:18:05,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.31 | bwd_microstep: 1538.21 | bwd_inner_microstep: 1538.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3474
[2024-06-10 04:18:07,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1508.60 | bwd_inner_microstep: 1508.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 04:18:12,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.27 | optimizer_step: 6.56
[2024-06-10 04:18:12,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 4400.73 | bwd_inner_microstep: 1680.83 | bwd_allreduce_microstep: 2719.85 | step_microstep: 40.83
[2024-06-10 04:18:12,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15682.91 | bwd: 44761.56 | bwd_inner: 42040.69 | bwd_allreduce: 2720.13 | step: 42.55
{'loss': 1.3507, 'learning_rate': 3.914899165053272e-05, 'epoch': 0.12}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868
[2024-06-10 04:18:13,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.41 | bwd_microstep: 699.82 | bwd_inner_microstep: 699.68 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 04:18:15,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1391.36 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4296
[2024-06-10 04:18:17,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.74 | bwd_microstep: 1583.43 | bwd_inner_microstep: 1583.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-10 04:18:19,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.42 | bwd_microstep: 1452.73 | bwd_inner_microstep: 1452.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 04:18:21,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.22 | bwd_microstep: 1551.15 | bwd_inner_microstep: 1551.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 04:18:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.28 | bwd_microstep: 790.73 | bwd_inner_microstep: 790.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-10 04:18:23,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.63 | bwd_microstep: 681.01 | bwd_inner_microstep: 680.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 04:18:25,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.21 | bwd_microstep: 1284.81 | bwd_inner_microstep: 1284.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3720
[2024-06-10 04:18:27,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.42 | bwd_microstep: 1364.67 | bwd_inner_microstep: 1364.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-10 04:18:28,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.25 | bwd_microstep: 799.59 | bwd_inner_microstep: 799.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2120
[2024-06-10 04:18:29,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.24 | bwd_microstep: 767.07 | bwd_inner_microstep: 767.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 04:18:31,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.11 | bwd_microstep: 1382.67 | bwd_inner_microstep: 1382.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 04:18:33,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.13 | bwd_microstep: 1336.86 | bwd_inner_microstep: 1336.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689
[2024-06-10 04:18:35,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.04 | bwd_microstep: 1625.61 | bwd_inner_microstep: 1625.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 04:18:37,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.80 | bwd_microstep: 1519.59 | bwd_inner_microstep: 1519.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903
[2024-06-10 04:18:38,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.59 | bwd_microstep: 686.34 | bwd_inner_microstep: 686.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521
[2024-06-10 04:18:40,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.56 | bwd_microstep: 1425.04 | bwd_inner_microstep: 1425.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3532
[2024-06-10 04:18:42,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1228.63 | bwd_inner_microstep: 1228.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 04:18:43,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1245.89 | bwd_inner_microstep: 1245.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2005
[2024-06-10 04:18:45,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.60 | bwd_microstep: 897.38 | bwd_inner_microstep: 897.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 04:18:46,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.09 | bwd_microstep: 1276.41 | bwd_inner_microstep: 1276.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3690
[2024-06-10 04:18:48,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.25 | bwd_microstep: 1390.19 | bwd_inner_microstep: 1390.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2282
[2024-06-10 04:18:50,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.04 | bwd_microstep: 1036.62 | bwd_inner_microstep: 1036.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 04:18:52,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.15 | bwd_microstep: 1501.40 | bwd_inner_microstep: 1501.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 04:18:54,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1425.51 | bwd_inner_microstep: 1425.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 04:18:55,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.73 | bwd_microstep: 1252.08 | bwd_inner_microstep: 1252.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 04:18:57,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1405.20 | bwd_inner_microstep: 1405.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 04:18:59,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1396.45 | bwd_inner_microstep: 1396.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 04:19:00,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.49 | bwd_microstep: 804.02 | bwd_inner_microstep: 803.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 04:19:02,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.02 | bwd_microstep: 1384.23 | bwd_inner_microstep: 1384.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 04:19:04,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1452.97 | bwd_inner_microstep: 1452.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2280
[2024-06-10 04:19:15,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.39 | optimizer_step: 6.58
[2024-06-10 04:19:15,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.31 | bwd_microstep: 9938.67 | bwd_inner_microstep: 1139.51 | bwd_allreduce_microstep: 8799.09 | step_microstep: 39.73
[2024-06-10 04:19:15,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14689.21 | bwd: 47978.17 | bwd_inner: 39178.03 | bwd_allreduce: 8799.38 | step: 41.35
{'loss': 1.2879, 'learning_rate': 3.91381256121066e-05, 'epoch': 0.12}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 04:19:17,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.92 | bwd_microstep: 1465.54 | bwd_inner_microstep: 1465.46 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3504
[2024-06-10 04:19:18,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.99 | bwd_microstep: 1186.78 | bwd_inner_microstep: 1186.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899
[2024-06-10 04:19:21,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.53 | bwd_microstep: 1580.88 | bwd_inner_microstep: 1580.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 04:19:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.11 | bwd_microstep: 1375.23 | bwd_inner_microstep: 1375.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 461
[2024-06-10 04:19:23,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 91.34 | bwd_microstep: 226.67 | bwd_inner_microstep: 226.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2226
[2024-06-10 04:19:24,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.00 | bwd_microstep: 894.16 | bwd_inner_microstep: 894.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2627
[2024-06-10 04:19:25,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.90 | bwd_microstep: 921.42 | bwd_inner_microstep: 921.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1975
[2024-06-10 04:19:27,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.25 | bwd_microstep: 857.33 | bwd_inner_microstep: 857.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 04:19:28,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.87 | bwd_microstep: 1253.89 | bwd_inner_microstep: 1253.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 04:19:29,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.23 | bwd_microstep: 797.47 | bwd_inner_microstep: 797.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2444
[2024-06-10 04:19:31,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.66 | bwd_microstep: 1017.71 | bwd_inner_microstep: 1017.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 04:19:33,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1390.89 | bwd_inner_microstep: 1390.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4055
[2024-06-10 04:19:35,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.32 | bwd_microstep: 1655.97 | bwd_inner_microstep: 1655.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3517
[2024-06-10 04:19:37,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1366.68 | bwd_inner_microstep: 1366.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2124
[2024-06-10 04:19:38,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.23 | bwd_microstep: 1023.55 | bwd_inner_microstep: 1023.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2509
[2024-06-10 04:19:40,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.85 | bwd_microstep: 995.62 | bwd_inner_microstep: 995.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 04:19:42,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1491.68 | bwd_inner_microstep: 1491.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 04:19:44,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.98 | bwd_microstep: 1384.43 | bwd_inner_microstep: 1384.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 04:19:45,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.41 | bwd_microstep: 792.27 | bwd_inner_microstep: 792.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501
[2024-06-10 04:19:46,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.71 | bwd_microstep: 1225.44 | bwd_inner_microstep: 1225.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 04:19:48,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1393.94 | bwd_inner_microstep: 1393.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604
[2024-06-10 04:19:51,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.76 | bwd_microstep: 1609.92 | bwd_inner_microstep: 1609.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142
[2024-06-10 04:19:52,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.15 | bwd_microstep: 835.08 | bwd_inner_microstep: 835.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 04:19:54,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1401.48 | bwd_inner_microstep: 1401.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 04:19:56,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1392.59 | bwd_inner_microstep: 1392.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 04:19:58,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.04 | bwd_microstep: 1555.13 | bwd_inner_microstep: 1555.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808
[2024-06-10 04:20:00,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1387.29 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3824
[2024-06-10 04:20:02,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.84 | bwd_microstep: 1755.05 | bwd_inner_microstep: 1755.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818
[2024-06-10 04:20:04,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.52 | bwd_microstep: 1614.90 | bwd_inner_microstep: 1614.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 04:20:06,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.25 | bwd_microstep: 1551.58 | bwd_inner_microstep: 1551.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 04:20:09,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.14 | bwd_microstep: 1603.25 | bwd_inner_microstep: 1603.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3581
[2024-06-10 04:20:15,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.35 | optimizer_step: 6.60
[2024-06-10 04:20:15,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.11 | bwd_microstep: 5787.65 | bwd_inner_microstep: 1493.09 | bwd_allreduce_microstep: 4294.50 | step_microstep: 39.38
[2024-06-10 04:20:15,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15099.05 | bwd: 44791.48 | bwd_inner: 40496.00 | bwd_allreduce: 4294.77 | step: 41.02
{'loss': 1.3586, 'learning_rate': 3.912719216930157e-05, 'epoch': 0.12}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484
[2024-06-10 04:20:17,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.17 | bwd_microstep: 1568.06 | bwd_inner_microstep: 1567.98 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4001
[2024-06-10 04:20:19,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 1506.32 | bwd_inner_microstep: 1506.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-10 04:20:21,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.93 | bwd_microstep: 1561.62 | bwd_inner_microstep: 1561.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 04:20:23,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1483.24 | bwd_inner_microstep: 1483.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 04:20:25,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.19 | bwd_microstep: 1485.82 | bwd_inner_microstep: 1485.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2904
[2024-06-10 04:20:27,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.63 | bwd_microstep: 1186.58 | bwd_inner_microstep: 1186.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 04:20:29,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1393.24 | bwd_inner_microstep: 1393.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3594
[2024-06-10 04:20:31,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.82 | bwd_microstep: 1260.39 | bwd_inner_microstep: 1260.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3451
[2024-06-10 04:20:32,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.77 | bwd_microstep: 1222.69 | bwd_inner_microstep: 1222.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 04:20:34,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 805.09 | bwd_inner_microstep: 805.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3504
[2024-06-10 04:20:35,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1351.38 | bwd_inner_microstep: 1351.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 04:20:38,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.24 | bwd_microstep: 1524.94 | bwd_inner_microstep: 1524.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424
[2024-06-10 04:20:40,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.00 | bwd_microstep: 1447.82 | bwd_inner_microstep: 1447.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3421
[2024-06-10 04:20:41,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.25 | bwd_microstep: 1311.35 | bwd_inner_microstep: 1311.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 04:20:43,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1347.57 | bwd_inner_microstep: 1347.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3684
[2024-06-10 04:20:45,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.92 | bwd_microstep: 1487.10 | bwd_inner_microstep: 1487.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 04:20:47,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3791
[2024-06-10 04:20:50,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.64 | bwd_microstep: 1604.54 | bwd_inner_microstep: 1604.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3647
[2024-06-10 04:20:52,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.47 | bwd_microstep: 1578.87 | bwd_inner_microstep: 1578.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1982
[2024-06-10 04:20:53,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.12 | bwd_microstep: 738.66 | bwd_inner_microstep: 738.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 04:20:55,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.39 | bwd_microstep: 1311.99 | bwd_inner_microstep: 1311.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 04:20:57,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.73 | bwd_microstep: 1558.28 | bwd_inner_microstep: 1558.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 04:20:58,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.30 | bwd_microstep: 800.60 | bwd_inner_microstep: 800.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 04:21:00,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.36 | bwd_microstep: 1563.11 | bwd_inner_microstep: 1563.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 04:21:02,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1284.69 | bwd_inner_microstep: 1284.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 04:21:04,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.36 | bwd_microstep: 1406.54 | bwd_inner_microstep: 1406.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2244
[2024-06-10 04:21:05,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.46 | bwd_microstep: 968.90 | bwd_inner_microstep: 968.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3613
[2024-06-10 04:21:07,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.16 | bwd_microstep: 1648.88 | bwd_inner_microstep: 1648.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3768
[2024-06-10 04:21:09,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.95 | bwd_microstep: 1518.14 | bwd_inner_microstep: 1518.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 04:21:11,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.15 | bwd_microstep: 1367.45 | bwd_inner_microstep: 1367.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 04:21:13,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.38 | bwd_microstep: 1502.31 | bwd_inner_microstep: 1502.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 04:21:17,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 04:21:17,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.52 | bwd_microstep: 2780.79 | bwd_inner_microstep: 2020.79 | bwd_allreduce_microstep: 759.95 | step_microstep: 38.63
[2024-06-10 04:21:17,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16378.43 | bwd: 45059.52 | bwd_inner: 44298.60 | bwd_allreduce: 760.22 | step: 40.24
{'loss': 1.2737, 'learning_rate': 3.911619136062515e-05, 'epoch': 0.12}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 04:21:19,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.90 | bwd_microstep: 1473.27 | bwd_inner_microstep: 1473.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 04:21:21,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1250.20 | bwd_inner_microstep: 1250.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 04:21:23,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 1491.38 | bwd_inner_microstep: 1491.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2256
[2024-06-10 04:21:24,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.99 | bwd_microstep: 834.63 | bwd_inner_microstep: 834.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 04:21:25,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.81 | bwd_microstep: 1148.72 | bwd_inner_microstep: 1148.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 04:21:27,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1413.26 | bwd_inner_microstep: 1413.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 04:21:29,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.03 | bwd_microstep: 1288.74 | bwd_inner_microstep: 1288.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 04:21:31,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.15 | bwd_microstep: 1189.94 | bwd_inner_microstep: 1189.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1951
[2024-06-10 04:21:32,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.18 | bwd_microstep: 823.84 | bwd_inner_microstep: 823.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 04:21:34,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1380.65 | bwd_inner_microstep: 1380.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 04:21:36,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1446.81 | bwd_inner_microstep: 1446.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 04:21:38,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.12 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3399
[2024-06-10 04:21:40,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.70 | bwd_microstep: 1390.96 | bwd_inner_microstep: 1390.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 04:21:42,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.70 | bwd_microstep: 1610.95 | bwd_inner_microstep: 1610.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 04:21:44,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.56 | bwd_microstep: 1185.88 | bwd_inner_microstep: 1185.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 04:21:46,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1392.41 | bwd_inner_microstep: 1392.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 04:21:47,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.86 | bwd_microstep: 1399.91 | bwd_inner_microstep: 1399.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 04:21:50,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.48 | bwd_microstep: 1516.61 | bwd_inner_microstep: 1516.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 04:21:52,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.07 | bwd_microstep: 1558.42 | bwd_inner_microstep: 1558.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 04:21:54,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.59 | bwd_microstep: 1616.23 | bwd_inner_microstep: 1616.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 04:21:56,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.02 | bwd_microstep: 1556.15 | bwd_inner_microstep: 1556.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 04:21:58,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.59 | bwd_microstep: 1553.31 | bwd_inner_microstep: 1553.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1998
[2024-06-10 04:21:59,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.98 | bwd_microstep: 740.15 | bwd_inner_microstep: 740.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3541
[2024-06-10 04:22:01,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.53 | bwd_microstep: 1231.97 | bwd_inner_microstep: 1231.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2239
[2024-06-10 04:22:02,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.73 | bwd_microstep: 896.72 | bwd_inner_microstep: 896.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-10 04:22:04,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.70 | bwd_microstep: 1582.67 | bwd_inner_microstep: 1582.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 04:22:06,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1493.10 | bwd_inner_microstep: 1493.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2279
[2024-06-10 04:22:08,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.78 | bwd_microstep: 1072.67 | bwd_inner_microstep: 1072.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 04:22:10,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.84 | bwd_microstep: 1278.90 | bwd_inner_microstep: 1278.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3580
[2024-06-10 04:22:12,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.65 | bwd_microstep: 1699.86 | bwd_inner_microstep: 1699.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3596
[2024-06-10 04:22:14,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.61 | bwd_microstep: 1569.57 | bwd_inner_microstep: 1569.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3764
[2024-06-10 04:22:18,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.59
[2024-06-10 04:22:18,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 3460.33 | bwd_inner_microstep: 1803.16 | bwd_allreduce_microstep: 1657.12 | step_microstep: 38.78
[2024-06-10 04:22:18,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16171.54 | bwd: 45029.67 | bwd_inner: 43371.60 | bwd_allreduce: 1657.35 | step: 40.39
206/1726 [3:38:46<26:36:17, 63.01s/it]
 12%|█▏        | 207/1726 [3:39:48<26:22:40, 62.51s/it]
 12%|█▏        | 208/1726 [3:40:48<26:08:35, 62.00s/it]
 12%|█▏        | 209/1726 [3:41:51<26:15:11, 62.30s/it]
 12%|█▏        | 210/1726 [3:42:52<25:58:27, 61.68s/it]
 12%|█▏        | 211/1726 [3:43:53<25:58:16, 61.71s/it]
 12%|█▏        | 212/1726 [3:44:55<25:55:59, 61.66s/it]
{'loss': 1.3365, 'learning_rate': 3.9105123224822143e-05, 'epoch': 0.12}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3412
[2024-06-10 04:22:20,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.44 | bwd_microstep: 1363.77 | bwd_inner_microstep: 1363.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3960
[2024-06-10 04:22:22,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.65 | bwd_microstep: 1595.09 | bwd_inner_microstep: 1595.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 04:22:24,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1387.95 | bwd_inner_microstep: 1387.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3846
[2024-06-10 04:22:27,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.29 | bwd_microstep: 1662.46 | bwd_inner_microstep: 1662.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469
[2024-06-10 04:22:29,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.39 | bwd_microstep: 1441.94 | bwd_inner_microstep: 1441.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 04:22:30,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1348.85 | bwd_inner_microstep: 1348.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732
[2024-06-10 04:22:32,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1438.36 | bwd_inner_microstep: 1438.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883
[2024-06-10 04:22:33,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.16 | bwd_microstep: 711.49 | bwd_inner_microstep: 711.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-10 04:22:36,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.88 | bwd_microstep: 1640.93 | bwd_inner_microstep: 1640.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499
[2024-06-10 04:22:38,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.91 | bwd_microstep: 1428.32 | bwd_inner_microstep: 1428.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663
[2024-06-10 04:22:40,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.06 | bwd_microstep: 1615.75 | bwd_inner_microstep: 1615.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3425
[2024-06-10 04:22:42,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1395.04 | bwd_inner_microstep: 1395.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3497
[2024-06-10 04:22:44,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.94 | bwd_microstep: 1443.02 | bwd_inner_microstep: 1443.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 04:22:46,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.92 | bwd_microstep: 1488.14 | bwd_inner_microstep: 1488.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2661
[2024-06-10 04:22:48,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.44 | bwd_microstep: 1214.77 | bwd_inner_microstep: 1214.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654
[2024-06-10 04:22:50,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.30 | bwd_microstep: 1618.16 | bwd_inner_microstep: 1618.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 04:22:52,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1400.19 | bwd_inner_microstep: 1400.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2176
[2024-06-10 04:22:53,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.06 | bwd_microstep: 763.78 | bwd_inner_microstep: 763.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 04:22:55,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1385.07 | bwd_inner_microstep: 1385.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 04:22:57,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1386.81 | bwd_inner_microstep: 1386.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-10 04:22:58,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.39 | bwd_microstep: 1302.01 | bwd_inner_microstep: 1301.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2019
[2024-06-10 04:22:59,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.06 | bwd_microstep: 714.23 | bwd_inner_microstep: 714.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 04:23:01,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1495.71 | bwd_inner_microstep: 1495.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2148
[2024-06-10 04:23:03,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.74 | bwd_microstep: 948.77 | bwd_inner_microstep: 948.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566
[2024-06-10 04:23:04,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.79 | bwd_microstep: 1206.29 | bwd_inner_microstep: 1206.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2427
[2024-06-10 04:23:06,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.99 | bwd_microstep: 940.53 | bwd_inner_microstep: 940.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 04:23:07,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1254.19 | bwd_inner_microstep: 1254.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2072
[2024-06-10 04:23:09,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.42 | bwd_microstep: 850.84 | bwd_inner_microstep: 850.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2042
[2024-06-10 04:23:10,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.06 | bwd_microstep: 842.82 | bwd_inner_microstep: 842.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 04:23:12,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.59 | bwd_microstep: 1444.58 | bwd_inner_microstep: 1444.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3812
[2024-06-10 04:23:14,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1384.50 | bwd_inner_microstep: 1384.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3442
[2024-06-10 04:23:20,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 04:23:20,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 5833.15 | bwd_inner_microstep: 1762.40 | bwd_allreduce_microstep: 4070.69 | step_microstep: 38.87
[2024-06-10 04:23:20,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15600.59 | bwd: 45947.51 | bwd_inner: 41875.92 | bwd_allreduce: 4070.92 | step: 40.50
{'loss': 1.3109, 'learning_rate': 3.909398780087445e-05, 'epoch': 0.12}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453
[2024-06-10 04:23:22,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.89 | bwd_microstep: 1281.16 | bwd_inner_microstep: 1281.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 04:23:24,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.67 | bwd_microstep: 1308.70 | bwd_inner_microstep: 1308.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 04:23:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.53 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795
[2024-06-10 04:23:28,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.06 | bwd_microstep: 1648.35 | bwd_inner_microstep: 1648.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 04:23:30,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.10 | bwd_microstep: 1275.88 | bwd_inner_microstep: 1275.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3752
[2024-06-10 04:23:32,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.70 | bwd_microstep: 1638.43 | bwd_inner_microstep: 1638.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3478
[2024-06-10 04:23:34,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.52 | bwd_microstep: 1215.65 | bwd_inner_microstep: 1215.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1890
[2024-06-10 04:23:35,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.76 | bwd_microstep: 745.47 | bwd_inner_microstep: 745.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2960
[2024-06-10 04:23:36,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.33 | bwd_microstep: 1070.97 | bwd_inner_microstep: 1070.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3683
[2024-06-10 04:23:38,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.01 | bwd_microstep: 1457.28 | bwd_inner_microstep: 1457.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3671
[2024-06-10 04:23:40,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.44 | bwd_microstep: 1612.36 | bwd_inner_microstep: 1612.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 04:23:41,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.36 | bwd_microstep: 797.59 | bwd_inner_microstep: 797.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3493
[2024-06-10 04:23:44,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.03 | bwd_microstep: 1549.41 | bwd_inner_microstep: 1549.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3515
[2024-06-10 04:23:45,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.19 | bwd_microstep: 1253.06 | bwd_inner_microstep: 1253.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 04:23:47,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.64 | bwd_microstep: 1511.73 | bwd_inner_microstep: 1511.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 04:23:49,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1494.12 | bwd_inner_microstep: 1494.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 04:23:51,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.13 | bwd_microstep: 1396.29 | bwd_inner_microstep: 1396.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 04:23:53,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1404.57 | bwd_inner_microstep: 1404.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2740
[2024-06-10 04:23:55,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.24 | bwd_microstep: 1043.29 | bwd_inner_microstep: 1043.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 04:23:57,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1496.48 | bwd_inner_microstep: 1496.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 04:23:59,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1398.78 | bwd_inner_microstep: 1398.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2284
[2024-06-10 04:24:00,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.80 | bwd_microstep: 881.21 | bwd_inner_microstep: 881.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 04:24:02,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.62 | bwd_microstep: 1611.17 | bwd_inner_microstep: 1611.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3820
[2024-06-10 04:24:04,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.16 | bwd_microstep: 1262.59 | bwd_inner_microstep: 1262.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-10 04:24:06,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.24 | bwd_microstep: 1488.39 | bwd_inner_microstep: 1488.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 04:24:08,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1554.92 | bwd_inner_microstep: 1554.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 04:24:09,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.85 | bwd_microstep: 988.18 | bwd_inner_microstep: 988.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801
[2024-06-10 04:24:12,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.00 | bwd_microstep: 1450.02 | bwd_inner_microstep: 1449.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 04:24:14,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1555.72 | bwd_inner_microstep: 1555.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 04:24:15,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.89 | bwd_microstep: 972.35 | bwd_inner_microstep: 972.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 04:24:17,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.87 | bwd_microstep: 1277.89 | bwd_inner_microstep: 1277.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3601
[2024-06-10 04:24:23,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-10 04:24:23,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.51 | bwd_microstep: 6063.38 | bwd_inner_microstep: 1791.65 | bwd_allreduce_microstep: 4271.68 | step_microstep: 38.71
[2024-06-10 04:24:23,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15940.05 | bwd: 46985.63 | bwd_inner: 42713.04 | bwd_allreduce: 4271.91 | step: 40.32
{'loss': 1.3205, 'learning_rate': 3.908278512800098e-05, 'epoch': 0.12}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 04:24:25,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.21 | bwd_microstep: 1234.84 | bwd_inner_microstep: 1234.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-10 04:24:26,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.72 | bwd_microstep: 696.73 | bwd_inner_microstep: 696.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 04:24:28,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.69 | bwd_microstep: 1245.76 | bwd_inner_microstep: 1245.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 04:24:30,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.34 | bwd_microstep: 1247.95 | bwd_inner_microstep: 1247.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 04:24:32,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1379.89 | bwd_inner_microstep: 1379.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477
[2024-06-10 04:24:33,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.25 | bwd_microstep: 1215.58 | bwd_inner_microstep: 1215.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 04:24:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.29 | bwd_microstep: 1387.14 | bwd_inner_microstep: 1387.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 04:24:36,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.34 | bwd_microstep: 788.56 | bwd_inner_microstep: 788.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3549
[2024-06-10 04:24:38,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.81 | bwd_microstep: 1449.68 | bwd_inner_microstep: 1449.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3503
[2024-06-10 04:24:40,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.08 | bwd_microstep: 1224.21 | bwd_inner_microstep: 1224.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506
[2024-06-10 04:24:42,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.06 | bwd_microstep: 1351.13 | bwd_inner_microstep: 1351.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497
[2024-06-10 04:24:44,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 1540.59 | bwd_inner_microstep: 1540.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 04:24:46,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1448.68 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 04:24:48,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.54 | bwd_microstep: 1486.40 | bwd_inner_microstep: 1486.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 04:24:50,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1418.68 | bwd_inner_microstep: 1418.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-10 04:24:52,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.01 | bwd_microstep: 1613.56 | bwd_inner_microstep: 1613.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828
[2024-06-10 04:24:54,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1390.43 | bwd_inner_microstep: 1390.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2947
[2024-06-10 04:24:56,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.04 | bwd_microstep: 1200.35 | bwd_inner_microstep: 1200.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666
[2024-06-10 04:24:58,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1326.03 | bwd_inner_microstep: 1326.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 04:24:59,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1279.36 | bwd_inner_microstep: 1279.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 04:25:01,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1400.52 | bwd_inner_microstep: 1400.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 04:25:03,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1556.12 | bwd_inner_microstep: 1556.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 04:25:05,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1291.51 | bwd_inner_microstep: 1291.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 04:25:07,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1398.58 | bwd_inner_microstep: 1398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3819
[2024-06-10 04:25:10,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.71 | bwd_microstep: 1755.70 | bwd_inner_microstep: 1755.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3381
[2024-06-10 04:25:12,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.37 | bwd_microstep: 1439.09 | bwd_inner_microstep: 1439.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 04:25:13,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1358.15 | bwd_inner_microstep: 1358.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3830
[2024-06-10 04:25:15,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.60 | bwd_microstep: 1265.34 | bwd_inner_microstep: 1265.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3567
[2024-06-10 04:25:17,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.87 | bwd_microstep: 1531.46 | bwd_inner_microstep: 1531.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3570
[2024-06-10 04:25:19,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.16 | bwd_microstep: 1562.90 | bwd_inner_microstep: 1562.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 04:25:21,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.05 | bwd_microstep: 1340.38 | bwd_inner_microstep: 1340.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 04:25:24,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.16 | optimizer_step: 6.63
[2024-06-10 04:25:24,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 2356.42 | bwd_inner_microstep: 1565.74 | bwd_allreduce_microstep: 790.62 | step_microstep: 38.19
[2024-06-10 04:25:24,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16224.58 | bwd: 44181.74 | bwd_inner: 43390.19 | bwd_allreduce: 790.85 | step: 39.88
{'loss': 1.3079, 'learning_rate': 3.907151524565749e-05, 'epoch': 0.12}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 04:25:26,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | bwd_microstep: 1282.37 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 04:25:28,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1248.36 | bwd_inner_microstep: 1248.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 04:25:30,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1479.78 | bwd_inner_microstep: 1479.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 04:25:32,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.97 | bwd_microstep: 1502.06 | bwd_inner_microstep: 1502.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4212
[2024-06-10 04:25:34,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.65 | bwd_microstep: 1658.91 | bwd_inner_microstep: 1658.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4085
[2024-06-10 04:25:36,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.38 | bwd_microstep: 1628.57 | bwd_inner_microstep: 1628.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 04:25:38,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1247.35 | bwd_inner_microstep: 1247.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728
[2024-06-10 04:25:40,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.68 | bwd_microstep: 1633.40 | bwd_inner_microstep: 1633.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-10 04:25:43,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.62 | bwd_microstep: 1635.02 | bwd_inner_microstep: 1634.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-10 04:25:45,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.73 | bwd_microstep: 1427.83 | bwd_inner_microstep: 1427.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 04:25:46,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1287.36 | bwd_inner_microstep: 1287.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2017
[2024-06-10 04:25:48,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.26 | bwd_microstep: 835.60 | bwd_inner_microstep: 835.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 04:25:49,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.22 | bwd_microstep: 1286.86 | bwd_inner_microstep: 1286.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3493
[2024-06-10 04:25:51,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1446.85 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2652
[2024-06-10 04:25:53,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.29 | bwd_microstep: 1023.14 | bwd_inner_microstep: 1023.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3445
[2024-06-10 04:25:55,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1412.59 | bwd_inner_microstep: 1412.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 04:25:57,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1339.71 | bwd_inner_microstep: 1339.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3522
[2024-06-10 04:25:58,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1324.16 | bwd_inner_microstep: 1324.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2137
[2024-06-10 04:26:00,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.76 | bwd_microstep: 834.12 | bwd_inner_microstep: 834.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 04:26:01,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.23 | bwd_microstep: 975.14 | bwd_inner_microstep: 975.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 04:26:03,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 04:26:05,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.45 | bwd_microstep: 1459.27 | bwd_inner_microstep: 1459.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 04:26:07,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.41 | bwd_microstep: 1461.26 | bwd_inner_microstep: 1461.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 04:26:09,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.88 | bwd_microstep: 1461.29 | bwd_inner_microstep: 1461.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-10 04:26:11,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1432.87 | bwd_inner_microstep: 1432.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 04:26:13,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.66 | bwd_microstep: 1433.70 | bwd_inner_microstep: 1433.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821
[2024-06-10 04:26:15,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.14 | bwd_microstep: 1585.28 | bwd_inner_microstep: 1585.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596
[2024-06-10 04:26:17,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 1434.16 | bwd_inner_microstep: 1434.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2353
[2024-06-10 04:26:18,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.46 | bwd_microstep: 1025.73 | bwd_inner_microstep: 1025.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3738
[2024-06-10 04:26:21,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1559.73 | bwd_inner_microstep: 1559.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570
[2024-06-10 04:26:23,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.49 | bwd_microstep: 1597.65 | bwd_inner_microstep: 1597.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 04:26:25,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-10 04:26:25,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1529.89 | bwd_inner_microstep: 1522.19 | bwd_allreduce_microstep: 7.65 | step_microstep: 38.22
[2024-06-10 04:26:25,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16417.57 | bwd: 43880.13 | bwd_inner: 43871.56 | bwd_allreduce: 7.88 | step: 39.86
{'loss': 1.3158, 'learning_rate': 3.906017819353645e-05, 'epoch': 0.13}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3384
[2024-06-10 04:26:27,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.17 | bwd_microstep: 1474.85 | bwd_inner_microstep: 1474.77 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4044
[2024-06-10 04:26:29,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.76 | bwd_microstep: 1719.26 | bwd_inner_microstep: 1719.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 04:26:31,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.12 | bwd_microstep: 1378.66 | bwd_inner_microstep: 1378.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060
[2024-06-10 04:26:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.91 | bwd_microstep: 815.12 | bwd_inner_microstep: 815.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-10 04:26:35,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.01 | bwd_microstep: 1654.44 | bwd_inner_microstep: 1654.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 04:26:36,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.97 | bwd_microstep: 819.93 | bwd_inner_microstep: 819.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 04:26:37,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.68 | bwd_microstep: 1151.58 | bwd_inner_microstep: 1151.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2461
[2024-06-10 04:26:39,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.91 | bwd_microstep: 950.25 | bwd_inner_microstep: 950.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 04:26:41,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1388.04 | bwd_inner_microstep: 1388.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 04:26:42,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.36 | bwd_microstep: 1193.65 | bwd_inner_microstep: 1193.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 04:26:44,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1250.52 | bwd_inner_microstep: 1250.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3503
[2024-06-10 04:26:46,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1438.48 | bwd_inner_microstep: 1438.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509
[2024-06-10 04:26:48,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.77 | bwd_microstep: 1317.96 | bwd_inner_microstep: 1317.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1941
[2024-06-10 04:26:49,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.48 | bwd_microstep: 886.32 | bwd_inner_microstep: 886.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-10 04:26:51,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.94 | bwd_microstep: 1708.48 | bwd_inner_microstep: 1708.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381
[2024-06-10 04:26:53,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1242.00 | bwd_inner_microstep: 1241.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960
[2024-06-10 04:26:54,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.16 | bwd_microstep: 824.53 | bwd_inner_microstep: 824.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832
[2024-06-10 04:26:56,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1388.88 | bwd_inner_microstep: 1388.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 04:26:58,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.40 | bwd_microstep: 1310.96 | bwd_inner_microstep: 1310.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2099
[2024-06-10 04:26:59,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.89 | bwd_microstep: 730.59 | bwd_inner_microstep: 730.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 04:27:01,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1252.48 | bwd_inner_microstep: 1252.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2984
[2024-06-10 04:27:02,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.94 | bwd_microstep: 1297.52 | bwd_inner_microstep: 1297.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2000
[2024-06-10 04:27:04,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.16 | bwd_microstep: 831.65 | bwd_inner_microstep: 831.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3626
[2024-06-10 04:27:06,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.87 | bwd_microstep: 1576.82 | bwd_inner_microstep: 1576.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2287
[2024-06-10 04:27:07,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.38 | bwd_microstep: 815.12 | bwd_inner_microstep: 815.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457
[2024-06-10 04:27:09,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.25 | bwd_microstep: 1309.76 | bwd_inner_microstep: 1309.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055
[2024-06-10 04:27:10,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.23 | bwd_microstep: 815.89 | bwd_inner_microstep: 815.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 04:27:12,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1381.61 | bwd_inner_microstep: 1381.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 04:27:14,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.64 | bwd_microstep: 1505.13 | bwd_inner_microstep: 1505.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2240
[2024-06-10 04:27:15,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.86 | bwd_microstep: 803.28 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-10 04:27:17,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.77 | bwd_microstep: 1305.69 | bwd_inner_microstep: 1305.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-10 04:27:24,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 04:27:24,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 7095.15 | bwd_inner_microstep: 1463.54 | bwd_allreduce_microstep: 5631.55 | step_microstep: 38.69
[2024-06-10 04:27:24,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14608.84 | bwd: 44634.63 | bwd_inner: 39002.12 | bwd_allreduce: 5631.84 | step: 40.37
{'loss': 1.3252, 'learning_rate': 3.9048774011566906e-05, 'epoch': 0.13}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 04:27:26,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1367.25 | bwd_inner_microstep: 1367.15 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4414
[2024-06-10 04:27:29,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.07 | bwd_microstep: 1814.42 | bwd_inner_microstep: 1814.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294
[2024-06-10 04:27:30,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.61 | bwd_microstep: 973.35 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 04:27:32,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.77 | bwd_microstep: 1493.86 | bwd_inner_microstep: 1493.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3475
[2024-06-10 04:27:34,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.81 | bwd_microstep: 1341.59 | bwd_inner_microstep: 1341.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 04:27:36,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.74 | bwd_microstep: 1281.15 | bwd_inner_microstep: 1281.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2259
[2024-06-10 04:27:37,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.90 | bwd_microstep: 872.42 | bwd_inner_microstep: 872.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 04:27:39,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1426.34 | bwd_inner_microstep: 1426.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-10 04:27:41,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.13 | bwd_microstep: 1518.69 | bwd_inner_microstep: 1518.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 04:27:43,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1387.91 | bwd_inner_microstep: 1387.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3417
[2024-06-10 04:27:45,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1313.53 | bwd_inner_microstep: 1313.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3516
[2024-06-10 04:27:47,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.49 | bwd_microstep: 1587.46 | bwd_inner_microstep: 1587.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2439
[2024-06-10 04:27:48,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.48 | bwd_microstep: 950.16 | bwd_inner_microstep: 950.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3637
[2024-06-10 04:27:51,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.30 | bwd_microstep: 1546.03 | bwd_inner_microstep: 1546.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 04:27:52,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.62 | bwd_microstep: 1286.23 | bwd_inner_microstep: 1286.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3890
[2024-06-10 04:27:55,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1587.18 | bwd_inner_microstep: 1587.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 04:27:57,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.90 | bwd_microstep: 1487.98 | bwd_inner_microstep: 1487.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098
[2024-06-10 04:27:58,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.37 | bwd_microstep: 919.19 | bwd_inner_microstep: 919.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3709
[2024-06-10 04:28:00,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1338.49 | bwd_inner_microstep: 1338.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071
[2024-06-10 04:28:01,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.24 | bwd_microstep: 916.55 | bwd_inner_microstep: 916.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-10 04:28:03,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1512.20 | bwd_inner_microstep: 1512.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 04:28:04,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.82 | bwd_microstep: 805.23 | bwd_inner_microstep: 805.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 04:28:06,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.76 | bwd_microstep: 1362.47 | bwd_inner_microstep: 1362.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3454
[2024-06-10 04:28:08,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.91 | bwd_microstep: 1193.22 | bwd_inner_microstep: 1193.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2047
[2024-06-10 04:28:09,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.60 | bwd_microstep: 938.45 | bwd_inner_microstep: 938.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430
[2024-06-10 04:28:11,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1287.02 | bwd_inner_microstep: 1286.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2184
[2024-06-10 04:28:12,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.81 | bwd_microstep: 796.97 | bwd_inner_microstep: 796.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2062
[2024-06-10 04:28:13,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.93 | bwd_microstep: 914.69 | bwd_inner_microstep: 914.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 04:28:15,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.06 | bwd_microstep: 1561.00 | bwd_inner_microstep: 1560.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2269
[2024-06-10 04:28:17,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.90 | bwd_microstep: 1068.00 | bwd_inner_microstep: 1067.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2236
[2024-06-10 04:28:18,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.30 | bwd_microstep: 963.54 | bwd_inner_microstep: 963.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3801
[2024-06-10 04:28:25,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 04:28:25,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.32 | bwd_microstep: 6451.15 | bwd_inner_microstep: 1923.05 | bwd_allreduce_microstep: 4528.05 | step_microstep: 38.79
[2024-06-10 04:28:25,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15164.67 | bwd: 45263.75 | bwd_inner: 40734.71 | bwd_allreduce: 4528.33 | step: 40.43


 12%|█▏        | 212/1726 [3:44:55<25:55:59, 61.66s/it]
 12%|█▏        | 213/1726 [3:45:57<25:56:43, 61.73s/it]
 12%|█▏        | 214/1726 [3:47:00<26:07:19, 62.20s/it]
 12%|█▏        | 215/1726 [3:48:01<25:55:21, 61.76s/it]
 13%|█▎        | 216/1726 [3:49:02<25:45:53, 61.43s/it]
 13%|█▎        | 217/1726 [3:50:01<25:31:01, 60.88s/it]
 13%|█▎        | 218/1726 [3:51:02<25:29:13, 60.84s/it]
{'loss': 1.3438, 'learning_rate': 3.9037302739914306e-05, 'epoch': 0.13}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 04:28:27,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.64 | bwd_microstep: 1375.92 | bwd_inner_microstep: 1375.85 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3869
[2024-06-10 04:28:29,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.19 | bwd_microstep: 1655.73 | bwd_inner_microstep: 1655.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 04:28:31,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.97 | bwd_microstep: 1471.81 | bwd_inner_microstep: 1471.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977
[2024-06-10 04:28:33,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.88 | bwd_microstep: 829.96 | bwd_inner_microstep: 829.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2215
[2024-06-10 04:28:34,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.94 | bwd_microstep: 955.98 | bwd_inner_microstep: 955.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 04:28:36,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1250.11 | bwd_inner_microstep: 1250.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 964
[2024-06-10 04:28:36,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 161.30 | bwd_microstep: 418.36 | bwd_inner_microstep: 418.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 04:28:38,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.60 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-10 04:28:40,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.48 | bwd_microstep: 1154.15 | bwd_inner_microstep: 1154.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501
[2024-06-10 04:28:41,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.95 | bwd_microstep: 1220.26 | bwd_inner_microstep: 1220.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454
[2024-06-10 04:28:43,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.31 | bwd_microstep: 1315.74 | bwd_inner_microstep: 1315.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 04:28:45,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.80 | bwd_microstep: 1345.50 | bwd_inner_microstep: 1345.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002
[2024-06-10 04:28:46,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.26 | bwd_microstep: 900.35 | bwd_inner_microstep: 900.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 04:28:48,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.17 | bwd_microstep: 1620.93 | bwd_inner_microstep: 1620.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 04:28:49,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.83 | bwd_microstep: 700.54 | bwd_inner_microstep: 700.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841
[2024-06-10 04:28:51,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1459.11 | bwd_inner_microstep: 1459.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 04:28:54,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.88 | bwd_microstep: 1489.57 | bwd_inner_microstep: 1489.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702
[2024-06-10 04:28:55,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1331.99 | bwd_inner_microstep: 1331.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 04:28:57,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 1413.17 | bwd_inner_microstep: 1413.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461
[2024-06-10 04:28:59,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.47 | bwd_microstep: 1217.96 | bwd_inner_microstep: 1217.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 04:29:01,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.09 | bwd_microstep: 1408.38 | bwd_inner_microstep: 1408.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 04:29:03,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.30 | bwd_microstep: 1258.14 | bwd_inner_microstep: 1258.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-10 04:29:05,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.64 | bwd_microstep: 1613.41 | bwd_inner_microstep: 1613.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3726
[2024-06-10 04:29:07,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.39 | bwd_microstep: 1242.64 | bwd_inner_microstep: 1242.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 04:29:09,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1416.29 | bwd_inner_microstep: 1416.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3555
[2024-06-10 04:29:11,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1444.94 | bwd_inner_microstep: 1444.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3809
[2024-06-10 04:29:13,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 1581.03 | bwd_inner_microstep: 1581.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3599
[2024-06-10 04:29:14,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.29 | bwd_microstep: 1213.59 | bwd_inner_microstep: 1213.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3814
[2024-06-10 04:29:17,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.07 | bwd_microstep: 1850.41 | bwd_inner_microstep: 1850.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 04:29:19,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.55 | bwd_microstep: 1505.97 | bwd_inner_microstep: 1505.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 04:29:21,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.22 | bwd_microstep: 1503.10 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-10 04:29:28,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 04:29:28,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.59 | bwd_microstep: 6117.96 | bwd_inner_microstep: 1804.80 | bwd_allreduce_microstep: 4313.10 | step_microstep: 38.84
[2024-06-10 04:29:28,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15769.53 | bwd: 46566.28 | bwd_inner: 42252.22 | bwd_allreduce: 4313.36 | step: 40.46
{'loss': 1.3661, 'learning_rate': 3.9025764418980426e-05, 'epoch': 0.13}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 04:29:30,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1247.82 | bwd_inner_microstep: 1247.73 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4062
[2024-06-10 04:29:32,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.21 | bwd_microstep: 1714.96 | bwd_inner_microstep: 1714.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2388
[2024-06-10 04:29:33,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.16 | bwd_microstep: 999.41 | bwd_inner_microstep: 999.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816
[2024-06-10 04:29:35,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.63 | bwd_microstep: 1355.79 | bwd_inner_microstep: 1355.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 04:29:37,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1378.92 | bwd_inner_microstep: 1378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 04:29:38,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.98 | bwd_microstep: 791.34 | bwd_inner_microstep: 791.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 04:29:40,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.44 | bwd_microstep: 1285.51 | bwd_inner_microstep: 1285.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-10 04:29:41,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.86 | bwd_microstep: 818.13 | bwd_inner_microstep: 818.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991
[2024-06-10 04:29:42,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.80 | bwd_microstep: 803.93 | bwd_inner_microstep: 803.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2085
[2024-06-10 04:29:43,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.42 | bwd_microstep: 825.35 | bwd_inner_microstep: 825.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3478
[2024-06-10 04:29:45,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.95 | bwd_microstep: 1341.36 | bwd_inner_microstep: 1341.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2114
[2024-06-10 04:29:47,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.80 | bwd_microstep: 926.18 | bwd_inner_microstep: 926.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3705
[2024-06-10 04:29:49,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.24 | bwd_microstep: 1487.51 | bwd_inner_microstep: 1487.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 04:29:51,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.21 | bwd_microstep: 1455.19 | bwd_inner_microstep: 1455.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957
[2024-06-10 04:29:52,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.52 | bwd_microstep: 890.20 | bwd_inner_microstep: 890.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 04:29:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1481.01 | bwd_inner_microstep: 1480.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 04:29:56,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.26 | bwd_microstep: 1524.06 | bwd_inner_microstep: 1524.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-10 04:29:58,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.99 | bwd_microstep: 1706.60 | bwd_inner_microstep: 1706.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919
[2024-06-10 04:29:59,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.01 | bwd_microstep: 719.15 | bwd_inner_microstep: 719.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3629
[2024-06-10 04:30:01,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.55 | bwd_microstep: 1346.97 | bwd_inner_microstep: 1346.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448
[2024-06-10 04:30:03,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1318.48 | bwd_inner_microstep: 1318.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-10 04:30:05,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.33 | bwd_microstep: 1452.05 | bwd_inner_microstep: 1452.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-10 04:30:07,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.95 | bwd_microstep: 1631.11 | bwd_inner_microstep: 1631.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760
[2024-06-10 04:30:10,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.20 | bwd_microstep: 1645.49 | bwd_inner_microstep: 1645.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 04:30:12,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.68 | bwd_microstep: 1541.01 | bwd_inner_microstep: 1540.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 04:30:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | bwd_microstep: 1552.61 | bwd_inner_microstep: 1552.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4067
[2024-06-10 04:30:16,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.15 | bwd_microstep: 1658.01 | bwd_inner_microstep: 1657.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 04:30:18,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1406.99 | bwd_inner_microstep: 1406.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3833
[2024-06-10 04:30:20,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.33 | bwd_microstep: 1727.04 | bwd_inner_microstep: 1727.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 04:30:23,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.60 | bwd_microstep: 1531.04 | bwd_inner_microstep: 1531.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607
[2024-06-10 04:30:24,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.46 | bwd_microstep: 1313.82 | bwd_inner_microstep: 1313.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3587
[2024-06-10 04:30:30,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.46 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 04:30:30,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 5247.90 | bwd_inner_microstep: 1536.23 | bwd_allreduce_microstep: 3711.61 | step_microstep: 39.42
[2024-06-10 04:30:30,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15768.84 | bwd: 46124.96 | bwd_inner: 42412.37 | bwd_allreduce: 3711.88 | step: 41.02
{'loss': 1.3115, 'learning_rate': 3.9014159089403167e-05, 'epoch': 0.13}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-10 04:30:32,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.41 | bwd_microstep: 1327.54 | bwd_inner_microstep: 1327.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 04:30:34,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.04 | bwd_microstep: 1385.49 | bwd_inner_microstep: 1385.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3891
[2024-06-10 04:30:36,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.58 | bwd_microstep: 1583.53 | bwd_inner_microstep: 1583.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3503
[2024-06-10 04:30:38,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.75 | bwd_microstep: 1219.96 | bwd_inner_microstep: 1219.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3798
[2024-06-10 04:30:40,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.48 | bwd_microstep: 1314.99 | bwd_inner_microstep: 1314.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 04:30:41,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1347.35 | bwd_inner_microstep: 1347.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 04:30:43,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.33 | bwd_microstep: 1279.98 | bwd_inner_microstep: 1279.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1895
[2024-06-10 04:30:44,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.70 | bwd_microstep: 778.33 | bwd_inner_microstep: 778.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 04:30:46,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1280.67 | bwd_inner_microstep: 1280.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 04:30:48,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1382.43 | bwd_inner_microstep: 1382.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 04:30:50,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.83 | bwd_microstep: 1347.15 | bwd_inner_microstep: 1347.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 04:30:52,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1383.20 | bwd_inner_microstep: 1383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002
[2024-06-10 04:30:53,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.43 | bwd_microstep: 901.05 | bwd_inner_microstep: 901.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450
[2024-06-10 04:30:55,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.97 | bwd_microstep: 1282.32 | bwd_inner_microstep: 1282.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2154
[2024-06-10 04:30:56,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.83 | bwd_microstep: 786.49 | bwd_inner_microstep: 786.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 04:30:58,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1389.14 | bwd_inner_microstep: 1389.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 04:31:00,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1296.32 | bwd_inner_microstep: 1296.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 04:31:01,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.31 | bwd_microstep: 1187.95 | bwd_inner_microstep: 1187.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 614
[2024-06-10 04:31:02,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.94 | bwd_microstep: 261.53 | bwd_inner_microstep: 261.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3622
[2024-06-10 04:31:03,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.95 | bwd_microstep: 1216.61 | bwd_inner_microstep: 1216.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2136
[2024-06-10 04:31:04,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.97 | bwd_microstep: 834.75 | bwd_inner_microstep: 834.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3544
[2024-06-10 04:31:06,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.49 | bwd_microstep: 1231.26 | bwd_inner_microstep: 1231.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 04:31:08,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.77 | bwd_microstep: 1584.89 | bwd_inner_microstep: 1584.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 04:31:10,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1347.80 | bwd_inner_microstep: 1347.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 04:31:12,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.27 | bwd_microstep: 1556.71 | bwd_inner_microstep: 1556.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1975
[2024-06-10 04:31:13,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.58 | bwd_microstep: 766.93 | bwd_inner_microstep: 766.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4032
[2024-06-10 04:31:15,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1453.40 | bwd_inner_microstep: 1453.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3637
[2024-06-10 04:31:18,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.50 | bwd_microstep: 1540.10 | bwd_inner_microstep: 1540.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 04:31:20,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.49 | bwd_microstep: 1401.73 | bwd_inner_microstep: 1401.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3804
[2024-06-10 04:31:22,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.66 | bwd_microstep: 1758.32 | bwd_inner_microstep: 1758.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3384
[2024-06-10 04:31:24,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.75 | bwd_microstep: 1273.99 | bwd_inner_microstep: 1273.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-10 04:31:30,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 04:31:30,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.82 | bwd_microstep: 5731.03 | bwd_inner_microstep: 1700.77 | bwd_allreduce_microstep: 4030.21 | step_microstep: 38.64
[2024-06-10 04:31:30,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15122.12 | bwd: 44432.96 | bwd_inner: 40401.81 | bwd_allreduce: 4030.45 | step: 40.32
{'loss': 1.3138, 'learning_rate': 3.900248679205644e-05, 'epoch': 0.13}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3380
[2024-06-10 04:31:32,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.18 | bwd_microstep: 1236.03 | bwd_inner_microstep: 1236.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 04:31:33,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.26 | bwd_microstep: 788.58 | bwd_inner_microstep: 788.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3980
[2024-06-10 04:31:35,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1608.12 | bwd_inner_microstep: 1608.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 04:31:37,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1555.68 | bwd_inner_microstep: 1555.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 04:31:38,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.81 | bwd_microstep: 874.62 | bwd_inner_microstep: 874.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3413
[2024-06-10 04:31:40,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1394.38 | bwd_inner_microstep: 1394.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 04:31:42,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.50 | bwd_microstep: 1253.35 | bwd_inner_microstep: 1253.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 04:31:43,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.76 | bwd_microstep: 792.14 | bwd_inner_microstep: 792.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 04:31:45,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.82 | bwd_microstep: 1527.54 | bwd_inner_microstep: 1527.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 04:31:46,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.72 | bwd_microstep: 809.85 | bwd_inner_microstep: 809.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 04:31:48,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.77 | bwd_microstep: 1399.62 | bwd_inner_microstep: 1399.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 04:31:49,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.89 | bwd_microstep: 727.83 | bwd_inner_microstep: 727.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 04:31:51,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1345.39 | bwd_inner_microstep: 1345.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 04:31:53,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.32 | bwd_microstep: 1352.42 | bwd_inner_microstep: 1352.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3640
[2024-06-10 04:31:55,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.44 | bwd_microstep: 1575.22 | bwd_inner_microstep: 1575.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 04:31:57,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1349.56 | bwd_inner_microstep: 1349.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 04:31:59,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.53 | bwd_microstep: 1391.04 | bwd_inner_microstep: 1391.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 04:32:01,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.11 | bwd_microstep: 1391.63 | bwd_inner_microstep: 1391.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 04:32:03,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.93 | bwd_microstep: 1160.64 | bwd_inner_microstep: 1160.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 04:32:04,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.97 | bwd_microstep: 1392.64 | bwd_inner_microstep: 1392.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 04:32:06,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.27 | bwd_microstep: 1288.58 | bwd_inner_microstep: 1288.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1978
[2024-06-10 04:32:07,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.71 | bwd_microstep: 708.16 | bwd_inner_microstep: 708.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 04:32:09,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1253.83 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 04:32:11,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1490.57 | bwd_inner_microstep: 1490.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 04:32:13,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.09 | bwd_microstep: 1548.67 | bwd_inner_microstep: 1548.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 04:32:14,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.51 | bwd_microstep: 805.24 | bwd_inner_microstep: 805.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603
[2024-06-10 04:32:16,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.12 | bwd_microstep: 1469.05 | bwd_inner_microstep: 1469.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 04:32:18,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1434.87 | bwd_inner_microstep: 1434.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815
[2024-06-10 04:32:20,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.98 | bwd_microstep: 1515.26 | bwd_inner_microstep: 1515.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 04:32:23,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.18 | bwd_microstep: 1553.22 | bwd_inner_microstep: 1553.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 04:32:25,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1501.48 | bwd_inner_microstep: 1501.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 04:32:32,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 04:32:32,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.59 | bwd_microstep: 7174.71 | bwd_inner_microstep: 1857.92 | bwd_allreduce_microstep: 5316.73 | step_microstep: 38.65
[2024-06-10 04:32:32,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15392.29 | bwd: 46669.94 | bwd_inner: 41352.28 | bwd_allreduce: 5316.96 | step: 40.22
{'loss': 1.3859, 'learning_rate': 3.8990747568050016e-05, 'epoch': 0.13}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400
[2024-06-10 04:32:34,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.79 | bwd_microstep: 1303.33 | bwd_inner_microstep: 1303.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 04:32:36,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.17 | bwd_microstep: 1398.05 | bwd_inner_microstep: 1398.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 04:32:38,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.90 | bwd_microstep: 1243.66 | bwd_inner_microstep: 1243.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238
[2024-06-10 04:32:39,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.40 | bwd_microstep: 962.11 | bwd_inner_microstep: 962.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 04:32:41,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.33 | bwd_microstep: 1287.95 | bwd_inner_microstep: 1287.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 04:32:43,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.42 | bwd_microstep: 1385.95 | bwd_inner_microstep: 1385.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 04:32:44,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.99 | bwd_microstep: 794.98 | bwd_inner_microstep: 794.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-10 04:32:45,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.91 | bwd_microstep: 680.11 | bwd_inner_microstep: 680.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 04:32:47,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.14 | bwd_microstep: 1185.12 | bwd_inner_microstep: 1185.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3409
[2024-06-10 04:32:48,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.05 | bwd_microstep: 1295.99 | bwd_inner_microstep: 1295.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 04:32:50,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1346.69 | bwd_inner_microstep: 1346.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 04:32:52,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1495.07 | bwd_inner_microstep: 1495.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003
[2024-06-10 04:32:54,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.52 | bwd_microstep: 898.18 | bwd_inner_microstep: 898.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3650
[2024-06-10 04:32:56,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.46 | bwd_microstep: 1472.26 | bwd_inner_microstep: 1472.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 04:32:58,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.52 | bwd_microstep: 1477.43 | bwd_inner_microstep: 1477.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3638
[2024-06-10 04:33:00,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.36 | bwd_microstep: 1553.52 | bwd_inner_microstep: 1553.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2414
[2024-06-10 04:33:01,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.32 | bwd_microstep: 936.19 | bwd_inner_microstep: 936.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3526
[2024-06-10 04:33:03,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.17 | bwd_microstep: 1559.80 | bwd_inner_microstep: 1559.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 04:33:05,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1558.59 | bwd_inner_microstep: 1558.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2166
[2024-06-10 04:33:07,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.58 | bwd_microstep: 951.99 | bwd_inner_microstep: 951.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 04:33:09,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1418.61 | bwd_inner_microstep: 1418.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 04:33:10,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 807.64 | bwd_inner_microstep: 807.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 04:33:12,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1554.86 | bwd_inner_microstep: 1554.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2420
[2024-06-10 04:33:13,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.94 | bwd_microstep: 968.47 | bwd_inner_microstep: 968.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688
[2024-06-10 04:33:15,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.02 | bwd_microstep: 1433.42 | bwd_inner_microstep: 1433.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3610
[2024-06-10 04:33:18,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.89 | bwd_microstep: 1656.38 | bwd_inner_microstep: 1656.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2279
[2024-06-10 04:33:19,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.87 | bwd_microstep: 1008.87 | bwd_inner_microstep: 1008.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 04:33:21,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1350.92 | bwd_inner_microstep: 1350.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429
[2024-06-10 04:33:22,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.41 | bwd_microstep: 1156.32 | bwd_inner_microstep: 1156.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 04:33:24,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 04:33:26,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.75 | bwd_microstep: 1280.44 | bwd_inner_microstep: 1280.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 04:33:35,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.26 | optimizer_step: 6.57
[2024-06-10 04:33:35,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.97 | bwd_microstep: 7850.80 | bwd_inner_microstep: 1691.04 | bwd_allreduce_microstep: 6159.70 | step_microstep: 38.81
[2024-06-10 04:33:35,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15109.70 | bwd: 46672.46 | bwd_inner: 40511.81 | bwd_allreduce: 6159.95 | step: 40.43
{'loss': 1.309, 'learning_rate': 3.897894145872939e-05, 'epoch': 0.13}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 04:33:36,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 784.68 | bwd_inner_microstep: 784.53 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3896
[2024-06-10 04:33:38,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.95 | bwd_microstep: 1385.85 | bwd_inner_microstep: 1385.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 04:33:39,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1379.29 | bwd_inner_microstep: 1379.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3775
[2024-06-10 04:33:41,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.51 | bwd_microstep: 1402.35 | bwd_inner_microstep: 1402.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3233
[2024-06-10 04:33:43,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.78 | bwd_microstep: 1209.25 | bwd_inner_microstep: 1209.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 04:33:45,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1383.54 | bwd_inner_microstep: 1383.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 04:33:47,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.89 | bwd_microstep: 1280.85 | bwd_inner_microstep: 1280.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 04:33:49,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1288.00 | bwd_inner_microstep: 1287.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 04:33:51,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.73 | bwd_microstep: 1478.21 | bwd_inner_microstep: 1478.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 04:33:52,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.89 | bwd_microstep: 1348.32 | bwd_inner_microstep: 1348.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1874
[2024-06-10 04:33:54,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.36 | bwd_microstep: 771.04 | bwd_inner_microstep: 771.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398
[2024-06-10 04:33:56,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.75 | bwd_microstep: 1434.67 | bwd_inner_microstep: 1434.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 04:33:57,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1340.52 | bwd_inner_microstep: 1340.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 04:33:59,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3323
[2024-06-10 04:34:01,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1294.83 | bwd_inner_microstep: 1294.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3651
[2024-06-10 04:34:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.91 | bwd_microstep: 1574.96 | bwd_inner_microstep: 1574.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-10 04:34:05,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.06 | bwd_microstep: 1280.87 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437
[2024-06-10 04:34:07,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.14 | bwd_microstep: 1313.90 | bwd_inner_microstep: 1313.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 04:34:09,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1460.69 | bwd_inner_microstep: 1460.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 04:34:11,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.91 | bwd_microstep: 1494.64 | bwd_inner_microstep: 1494.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176
[2024-06-10 04:34:12,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.10 | bwd_microstep: 856.33 | bwd_inner_microstep: 856.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3546
[2024-06-10 04:34:14,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.77 | bwd_microstep: 1356.67 | bwd_inner_microstep: 1356.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-10 04:34:16,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.89 | bwd_microstep: 1317.59 | bwd_inner_microstep: 1317.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 04:34:18,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.47 | bwd_microstep: 1405.77 | bwd_inner_microstep: 1405.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 04:34:20,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1397.53 | bwd_inner_microstep: 1397.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782
[2024-06-10 04:34:21,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.44 | bwd_microstep: 1350.49 | bwd_inner_microstep: 1350.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 04:34:23,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.36 | bwd_microstep: 1285.61 | bwd_inner_microstep: 1285.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-10 04:34:25,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.35 | bwd_microstep: 1300.34 | bwd_inner_microstep: 1300.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033
[2024-06-10 04:34:26,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.32 | bwd_microstep: 839.51 | bwd_inner_microstep: 839.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3566
[2024-06-10 04:34:28,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1545.82 | bwd_inner_microstep: 1545.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2309
[2024-06-10 04:34:30,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.13 | bwd_microstep: 884.37 | bwd_inner_microstep: 884.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 04:34:37,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-10 04:34:37,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.84 | bwd_microstep: 6615.26 | bwd_inner_microstep: 1692.59 | bwd_allreduce_microstep: 4922.62 | step_microstep: 38.69
[2024-06-10 04:34:37,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15513.18 | bwd: 46309.56 | bwd_inner: 41385.92 | bwd_allreduce: 4922.90 | step: 40.40


 13%|█▎        | 218/1726 [3:51:02<25:29:13, 60.84s/it]
 13%|█▎        | 219/1726 [3:52:05<25:42:05, 61.40s/it]
 13%|█▎        | 220/1726 [3:53:07<25:47:25, 61.65s/it]
 13%|█▎        | 221/1726 [3:54:07<25:33:10, 61.12s/it]
 13%|█▎        | 222/1726 [3:55:09<25:41:46, 61.51s/it]
 13%|█▎        | 223/1726 [3:56:11<25:45:22, 61.69s/it]
 13%|█▎        | 224
{'loss': 1.2389, 'learning_rate': 3.8967068505675594e-05, 'epoch': 0.13}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1905
[2024-06-10 04:34:38,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.34 | bwd_microstep: 803.81 | bwd_inner_microstep: 803.66 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3588
[2024-06-10 04:34:40,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.47 | bwd_microstep: 1455.91 | bwd_inner_microstep: 1455.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741
[2024-06-10 04:34:42,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.57 | bwd_microstep: 1640.73 | bwd_inner_microstep: 1640.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878
[2024-06-10 04:34:43,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.58 | bwd_microstep: 681.29 | bwd_inner_microstep: 681.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932
[2024-06-10 04:34:44,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.56 | bwd_microstep: 700.39 | bwd_inner_microstep: 700.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400
[2024-06-10 04:34:46,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.39 | bwd_microstep: 1305.95 | bwd_inner_microstep: 1305.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 04:34:48,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1280.96 | bwd_inner_microstep: 1280.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 04:34:49,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.55 | bwd_microstep: 801.93 | bwd_inner_microstep: 801.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3125
[2024-06-10 04:34:50,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.68 | bwd_microstep: 1152.61 | bwd_inner_microstep: 1152.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494
[2024-06-10 04:34:52,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.62 | bwd_microstep: 1416.88 | bwd_inner_microstep: 1416.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-10 04:34:53,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.44 | bwd_microstep: 726.66 | bwd_inner_microstep: 726.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 04:34:55,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.76 | bwd_microstep: 1395.03 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 04:34:57,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.88 | bwd_microstep: 1248.19 | bwd_inner_microstep: 1248.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1852
[2024-06-10 04:34:58,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.17 | bwd_microstep: 673.33 | bwd_inner_microstep: 673.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 04:35:00,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.43 | bwd_microstep: 1515.75 | bwd_inner_microstep: 1515.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 04:35:02,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.75 | bwd_microstep: 1289.12 | bwd_inner_microstep: 1289.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510
[2024-06-10 04:35:03,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.59 | bwd_microstep: 1223.29 | bwd_inner_microstep: 1223.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 04:35:05,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.19 | bwd_microstep: 1288.30 | bwd_inner_microstep: 1288.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3453
[2024-06-10 04:35:07,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.31 | bwd_microstep: 1319.55 | bwd_inner_microstep: 1319.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912
[2024-06-10 04:35:08,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.74 | bwd_microstep: 687.87 | bwd_inner_microstep: 687.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3444
[2024-06-10 04:35:10,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.09 | bwd_microstep: 1318.88 | bwd_inner_microstep: 1318.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 04:35:12,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.05 | bwd_microstep: 1296.97 | bwd_inner_microstep: 1296.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3724
[2024-06-10 04:35:14,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.64 | bwd_microstep: 1338.23 | bwd_inner_microstep: 1338.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534
[2024-06-10 04:35:15,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1328.35 | bwd_inner_microstep: 1328.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 04:35:17,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.10 | bwd_microstep: 1463.68 | bwd_inner_microstep: 1463.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3450
[2024-06-10 04:35:19,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.16 | bwd_microstep: 1316.54 | bwd_inner_microstep: 1316.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2245
[2024-06-10 04:35:20,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.52 | bwd_microstep: 873.27 | bwd_inner_microstep: 873.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3748
[2024-06-10 04:35:23,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.28 | bwd_microstep: 1675.37 | bwd_inner_microstep: 1675.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3693
[2024-06-10 04:35:25,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.83 | bwd_microstep: 1606.16 | bwd_inner_microstep: 1606.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 04:35:27,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1411.48 | bwd_inner_microstep: 1411.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 04:35:29,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.29 | bwd_microstep: 1507.13 | bwd_inner_microstep: 1507.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3461
[2024-06-10 04:35:38,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-10 04:35:38,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 8432.23 | bwd_inner_microstep: 1607.91 | bwd_allreduce_microstep: 6824.27 | step_microstep: 38.81
[2024-06-10 04:35:38,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14743.52 | bwd: 46175.84 | bwd_inner: 39350.55 | bwd_allreduce: 6824.55 | step: 40.39
{'loss': 1.3663, 'learning_rate': 3.895512875070513e-05, 'epoch': 0.13}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2977
[2024-06-10 04:35:40,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.32 | bwd_microstep: 1192.61 | bwd_inner_microstep: 1192.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3391
[2024-06-10 04:35:41,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.05 | bwd_microstep: 1144.73 | bwd_inner_microstep: 1144.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 04:35:43,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.46 | bwd_microstep: 1375.51 | bwd_inner_microstep: 1375.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 04:35:44,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.97 | bwd_microstep: 677.97 | bwd_inner_microstep: 677.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3825
[2024-06-10 04:35:46,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.03 | bwd_microstep: 1385.13 | bwd_inner_microstep: 1385.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2246
[2024-06-10 04:35:47,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.88 | bwd_microstep: 899.66 | bwd_inner_microstep: 899.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 04:35:49,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.62 | bwd_microstep: 1537.99 | bwd_inner_microstep: 1537.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 04:35:51,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.08 | bwd_microstep: 1397.50 | bwd_inner_microstep: 1397.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1388
[2024-06-10 04:35:52,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 214.89 | bwd_microstep: 558.02 | bwd_inner_microstep: 558.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945
[2024-06-10 04:35:53,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.85 | bwd_microstep: 698.28 | bwd_inner_microstep: 698.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 04:35:55,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.85 | bwd_microstep: 1387.63 | bwd_inner_microstep: 1387.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1950
[2024-06-10 04:35:56,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.78 | bwd_microstep: 762.80 | bwd_inner_microstep: 762.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2932
[2024-06-10 04:35:58,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.66 | bwd_microstep: 1160.26 | bwd_inner_microstep: 1160.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3426
[2024-06-10 04:35:59,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.57 | bwd_microstep: 1218.80 | bwd_inner_microstep: 1218.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 04:36:01,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1379.24 | bwd_inner_microstep: 1379.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 04:36:03,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.78 | bwd_microstep: 1389.86 | bwd_inner_microstep: 1389.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-10 04:36:05,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.28 | bwd_microstep: 1529.87 | bwd_inner_microstep: 1529.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 04:36:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1517.75 | bwd_inner_microstep: 1517.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293
[2024-06-10 04:36:09,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.71 | bwd_microstep: 979.34 | bwd_inner_microstep: 979.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 04:36:11,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1399.84 | bwd_inner_microstep: 1399.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503
[2024-06-10 04:36:12,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.35 | bwd_microstep: 1191.54 | bwd_inner_microstep: 1191.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 04:36:14,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.35 | bwd_microstep: 1301.87 | bwd_inner_microstep: 1301.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485
[2024-06-10 04:36:16,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.03 | bwd_microstep: 1350.30 | bwd_inner_microstep: 1350.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 04:36:18,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.90 | bwd_microstep: 1158.43 | bwd_inner_microstep: 1158.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465
[2024-06-10 04:36:19,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.26 | bwd_microstep: 1185.13 | bwd_inner_microstep: 1185.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 04:36:21,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.92 | bwd_microstep: 1400.12 | bwd_inner_microstep: 1400.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 04:36:23,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1386.16 | bwd_inner_microstep: 1386.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3510
[2024-06-10 04:36:25,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.27 | bwd_microstep: 1195.96 | bwd_inner_microstep: 1195.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3585
[2024-06-10 04:36:27,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.75 | bwd_microstep: 1574.10 | bwd_inner_microstep: 1574.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-10 04:36:29,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.58 | bwd_microstep: 1440.33 | bwd_inner_microstep: 1440.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899
[2024-06-10 04:36:30,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.26 | bwd_microstep: 778.13 | bwd_inner_microstep: 778.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 04:36:39,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 04:36:39,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.40 | bwd_microstep: 7966.67 | bwd_inner_microstep: 1858.63 | bwd_allreduce_microstep: 6107.99 | step_microstep: 38.74
[2024-06-10 04:36:39,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14783.68 | bwd: 45521.57 | bwd_inner: 39412.66 | bwd_allreduce: 6108.22 | step: 40.38
{'loss': 1.3638, 'learning_rate': 3.894312223586974e-05, 'epoch': 0.13}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1936
[2024-06-10 04:36:40,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.46 | bwd_microstep: 810.05 | bwd_inner_microstep: 809.90 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 04:36:42,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1379.42 | bwd_inner_microstep: 1379.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849
[2024-06-10 04:36:44,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.27 | bwd_microstep: 1558.17 | bwd_inner_microstep: 1558.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-10 04:36:46,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.52 | bwd_microstep: 1646.03 | bwd_inner_microstep: 1646.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 04:36:48,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.23 | bwd_microstep: 1541.62 | bwd_inner_microstep: 1541.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 04:36:50,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1287.80 | bwd_inner_microstep: 1287.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 04:36:51,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.47 | bwd_microstep: 802.95 | bwd_inner_microstep: 802.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497
[2024-06-10 04:36:53,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.68 | bwd_microstep: 1580.05 | bwd_inner_microstep: 1580.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3745
[2024-06-10 04:36:56,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.70 | bwd_microstep: 1733.57 | bwd_inner_microstep: 1733.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-10 04:36:57,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.48 | bwd_microstep: 807.33 | bwd_inner_microstep: 807.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379
[2024-06-10 04:36:59,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1432.50 | bwd_inner_microstep: 1432.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 04:37:01,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.56 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1486.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 04:37:03,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1388.40 | bwd_inner_microstep: 1388.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3513
[2024-06-10 04:37:05,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.28 | bwd_microstep: 1550.30 | bwd_inner_microstep: 1550.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1997
[2024-06-10 04:37:06,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.57 | bwd_microstep: 831.36 | bwd_inner_microstep: 831.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3541
[2024-06-10 04:37:08,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.75 | bwd_microstep: 1586.15 | bwd_inner_microstep: 1586.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3538
[2024-06-10 04:37:10,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.25 | bwd_microstep: 1447.60 | bwd_inner_microstep: 1447.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3570
[2024-06-10 04:37:12,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.31 | bwd_microstep: 1347.92 | bwd_inner_microstep: 1347.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695
[2024-06-10 04:37:14,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.31 | bwd_microstep: 1533.37 | bwd_inner_microstep: 1533.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 04:37:16,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1404.28 | bwd_inner_microstep: 1404.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3916
[2024-06-10 04:37:18,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1489.26 | bwd_inner_microstep: 1489.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 04:37:19,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.07 | bwd_microstep: 789.82 | bwd_inner_microstep: 789.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3663
[2024-06-10 04:37:21,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1323.12 | bwd_inner_microstep: 1323.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 04:37:23,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.49 | bwd_microstep: 1289.29 | bwd_inner_microstep: 1289.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 04:37:25,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.74 | bwd_microstep: 1190.08 | bwd_inner_microstep: 1190.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-10 04:37:27,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.11 | bwd_microstep: 1605.33 | bwd_inner_microstep: 1605.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-10 04:37:29,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.94 | bwd_microstep: 1446.58 | bwd_inner_microstep: 1446.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3660
[2024-06-10 04:37:31,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.07 | bwd_microstep: 1454.36 | bwd_inner_microstep: 1454.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 04:37:33,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.00 | bwd_microstep: 1275.26 | bwd_inner_microstep: 1275.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 04:37:35,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.27 | bwd_microstep: 1653.57 | bwd_inner_microstep: 1653.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055
[2024-06-10 04:37:36,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 813.89 | bwd_inner_microstep: 813.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 04:37:40,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 04:37:40,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.61 | bwd_microstep: 3077.17 | bwd_inner_microstep: 1639.01 | bwd_allreduce_microstep: 1438.10 | step_microstep: 38.44
[2024-06-10 04:37:40,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16061.19 | bwd: 44563.00 | bwd_inner: 43123.89 | bwd_allreduce: 1438.38 | step: 40.10
{'loss': 1.3437, 'learning_rate': 3.893104900345631e-05, 'epoch': 0.13}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3600
[2024-06-10 04:37:42,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.12 | bwd_microstep: 1465.55 | bwd_inner_microstep: 1465.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 04:37:44,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.53 | bwd_microstep: 1377.73 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 04:37:46,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.79 | bwd_microstep: 1478.24 | bwd_inner_microstep: 1478.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068
[2024-06-10 04:37:47,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.49 | bwd_microstep: 818.56 | bwd_inner_microstep: 818.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-10 04:37:49,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.27 | bwd_microstep: 1649.90 | bwd_inner_microstep: 1649.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 04:37:51,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.81 | bwd_microstep: 1482.72 | bwd_inner_microstep: 1482.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 04:37:53,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1388.01 | bwd_inner_microstep: 1387.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 04:37:55,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1398.50 | bwd_inner_microstep: 1398.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 04:37:57,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 04:37:59,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1254.14 | bwd_inner_microstep: 1254.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 04:38:00,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1348.74 | bwd_inner_microstep: 1348.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3477
[2024-06-10 04:38:03,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.02 | bwd_microstep: 1547.73 | bwd_inner_microstep: 1547.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 04:38:05,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.33 | bwd_microstep: 1478.32 | bwd_inner_microstep: 1478.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429
[2024-06-10 04:38:07,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.73 | bwd_microstep: 1449.76 | bwd_inner_microstep: 1449.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2347
[2024-06-10 04:38:08,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.94 | bwd_microstep: 989.40 | bwd_inner_microstep: 989.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 04:38:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1345.82 | bwd_inner_microstep: 1345.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 04:38:12,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.95 | bwd_microstep: 1422.26 | bwd_inner_microstep: 1422.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2027
[2024-06-10 04:38:13,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.92 | bwd_microstep: 744.99 | bwd_inner_microstep: 744.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 04:38:15,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1298.64 | bwd_inner_microstep: 1298.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649
[2024-06-10 04:38:17,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.58 | bwd_microstep: 1518.51 | bwd_inner_microstep: 1518.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 04:38:19,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1415.34 | bwd_inner_microstep: 1415.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-10 04:38:20,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.68 | bwd_microstep: 716.55 | bwd_inner_microstep: 716.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 04:38:22,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828
[2024-06-10 04:38:23,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.35 | bwd_microstep: 1390.97 | bwd_inner_microstep: 1390.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 04:38:26,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1558.90 | bwd_inner_microstep: 1558.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 04:38:28,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.31 | bwd_microstep: 1558.56 | bwd_inner_microstep: 1558.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 04:38:30,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1510.23 | bwd_inner_microstep: 1510.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 04:38:32,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.14 | bwd_microstep: 1489.05 | bwd_inner_microstep: 1489.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3717
[2024-06-10 04:38:34,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1366.29 | bwd_inner_microstep: 1366.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 04:38:36,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.55 | bwd_microstep: 1470.00 | bwd_inner_microstep: 1469.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3468
[2024-06-10 04:38:38,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.67 | bwd_microstep: 1426.15 | bwd_inner_microstep: 1426.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2903
[2024-06-10 04:38:40,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-10 04:38:40,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.30 | bwd_microstep: 1256.34 | bwd_inner_microstep: 1248.70 | bwd_allreduce_microstep: 7.59 | step_microstep: 38.30
[2024-06-10 04:38:40,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16233.50 | bwd: 43385.42 | bwd_inner: 43376.90 | bwd_allreduce: 7.82 | step: 39.84
{'loss': 1.3862, 'learning_rate': 3.8918909095986704e-05, 'epoch': 0.13}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 04:38:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.69 | bwd_microstep: 1448.92 | bwd_inner_microstep: 1448.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 04:38:43,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1344.93 | bwd_inner_microstep: 1344.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 04:38:45,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.86 | bwd_microstep: 1290.14 | bwd_inner_microstep: 1290.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 04:38:47,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1293.46 | bwd_inner_microstep: 1293.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2254
[2024-06-10 04:38:48,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.16 | bwd_microstep: 968.21 | bwd_inner_microstep: 968.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 04:38:50,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1480.58 | bwd_inner_microstep: 1480.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1368
[2024-06-10 04:38:51,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 211.83 | bwd_microstep: 553.71 | bwd_inner_microstep: 553.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-10 04:38:53,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.10 | bwd_microstep: 1527.55 | bwd_inner_microstep: 1527.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 04:38:55,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1247.76 | bwd_inner_microstep: 1247.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 04:38:57,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1388.98 | bwd_inner_microstep: 1388.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2525
[2024-06-10 04:38:58,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.66 | bwd_microstep: 935.03 | bwd_inner_microstep: 935.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416
[2024-06-10 04:39:00,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.10 | bwd_microstep: 1309.05 | bwd_inner_microstep: 1309.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 04:39:02,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.01 | bwd_microstep: 1382.02 | bwd_inner_microstep: 1381.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3720
[2024-06-10 04:39:04,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.34 | bwd_microstep: 1835.41 | bwd_inner_microstep: 1835.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 04:39:06,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1392.70 | bwd_inner_microstep: 1392.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3520
[2024-06-10 04:39:08,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.98 | bwd_microstep: 1358.60 | bwd_inner_microstep: 1358.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 04:39:10,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.69 | bwd_microstep: 1392.00 | bwd_inner_microstep: 1391.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 04:39:12,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.88 | bwd_microstep: 1261.27 | bwd_inner_microstep: 1261.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 04:39:14,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.32 | bwd_microstep: 1489.57 | bwd_inner_microstep: 1489.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457
[2024-06-10 04:39:16,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.13 | bwd_microstep: 1314.11 | bwd_inner_microstep: 1314.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 04:39:18,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.50 | bwd_microstep: 1390.48 | bwd_inner_microstep: 1390.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 04:39:19,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.87 | bwd_microstep: 978.08 | bwd_inner_microstep: 978.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 04:39:20,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.56 | bwd_microstep: 801.20 | bwd_inner_microstep: 801.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-10 04:39:21,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.57 | bwd_microstep: 697.63 | bwd_inner_microstep: 697.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-10 04:39:23,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.25 | bwd_microstep: 1635.36 | bwd_inner_microstep: 1635.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1153
[2024-06-10 04:39:24,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 182.76 | bwd_microstep: 468.56 | bwd_inner_microstep: 468.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3549
[2024-06-10 04:39:26,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.34 | bwd_microstep: 1345.13 | bwd_inner_microstep: 1345.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3420
[2024-06-10 04:39:28,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.75 | bwd_microstep: 1397.00 | bwd_inner_microstep: 1396.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2197
[2024-06-10 04:39:29,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.75 | bwd_microstep: 1020.08 | bwd_inner_microstep: 1020.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 04:39:31,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.76 | bwd_microstep: 1312.07 | bwd_inner_microstep: 1312.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3574
[2024-06-10 04:39:33,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.50 | bwd_microstep: 1570.24 | bwd_inner_microstep: 1570.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3774
[2024-06-10 04:39:40,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-10 04:39:40,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.09 | bwd_microstep: 6512.20 | bwd_inner_microstep: 1902.21 | bwd_allreduce_microstep: 4609.94 | step_microstep: 38.68
[2024-06-10 04:39:40,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15194.13 | bwd: 45342.07 | bwd_inner: 40731.22 | bwd_allreduce: 4610.17 | step: 40.27
{'loss': 1.3597, 'learning_rate': 3.890670255621761e-05, 'epoch': 0.13}
 13%|█▎        | 224/1726 [3:57:13<25:47:55, 61.83s/it]
 13%|█▎        | 225/1726 [3:58:15<25:42:32, 61.66s/it]
 13%|█▎        | 226/1726 [3:59:15<25:33:52, 61.35s/it]
 13%|█▎        | 227/1726 [4:00:16<25:30:01, 61.24s/it]
 13%|█▎        | 228/1726 [4:01:16<25:19:26, 60.86s/it]
 13%|█▎        | 229/1726 [4:02:17<25:18:35, 60.87s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 04:39:42,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.58 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2625
[2024-06-10 04:39:44,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.42 | bwd_microstep: 1011.45 | bwd_inner_microstep: 1011.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2309
[2024-06-10 04:39:45,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.10 | bwd_microstep: 981.29 | bwd_inner_microstep: 981.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-10 04:39:47,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.65 | bwd_microstep: 1539.72 | bwd_inner_microstep: 1539.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 04:39:49,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.44 | bwd_microstep: 1541.17 | bwd_inner_microstep: 1541.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3754
[2024-06-10 04:39:51,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.51 | bwd_microstep: 1402.18 | bwd_inner_microstep: 1402.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1880
[2024-06-10 04:39:52,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.12 | bwd_microstep: 773.46 | bwd_inner_microstep: 773.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-10 04:39:55,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.93 | bwd_microstep: 1630.38 | bwd_inner_microstep: 1630.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3747
[2024-06-10 04:39:57,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1503.03 | bwd_inner_microstep: 1503.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 04:39:59,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1383.70 | bwd_inner_microstep: 1383.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 04:40:00,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.66 | bwd_microstep: 798.58 | bwd_inner_microstep: 798.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 04:40:01,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.24 | bwd_microstep: 1282.89 | bwd_inner_microstep: 1282.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3672
[2024-06-10 04:40:04,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.01 | bwd_microstep: 1485.75 | bwd_inner_microstep: 1485.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911
[2024-06-10 04:40:05,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 779.76 | bwd_inner_microstep: 779.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2147
[2024-06-10 04:40:06,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.60 | bwd_microstep: 948.98 | bwd_inner_microstep: 948.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3689
[2024-06-10 04:40:08,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.14 | bwd_microstep: 1829.10 | bwd_inner_microstep: 1829.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1968
[2024-06-10 04:40:10,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.38 | bwd_microstep: 825.11 | bwd_inner_microstep: 825.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3650
[2024-06-10 04:40:11,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1291.92 | bwd_inner_microstep: 1291.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493
[2024-06-10 04:40:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.92 | bwd_microstep: 1193.23 | bwd_inner_microstep: 1193.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3618
[2024-06-10 04:40:15,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.19 | bwd_microstep: 1267.09 | bwd_inner_microstep: 1267.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 04:40:16,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.20 | bwd_microstep: 799.09 | bwd_inner_microstep: 799.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2147
[2024-06-10 04:40:17,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.03 | bwd_microstep: 853.90 | bwd_inner_microstep: 853.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555
[2024-06-10 04:40:19,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.81 | bwd_microstep: 1331.20 | bwd_inner_microstep: 1331.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3641
[2024-06-10 04:40:21,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.72 | bwd_microstep: 1351.51 | bwd_inner_microstep: 1351.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 04:40:23,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.61 | bwd_microstep: 1502.38 | bwd_inner_microstep: 1502.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 04:40:25,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.46 | bwd_microstep: 1285.54 | bwd_inner_microstep: 1285.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3809
[2024-06-10 04:40:26,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.18 | bwd_microstep: 1291.92 | bwd_inner_microstep: 1291.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2264
[2024-06-10 04:40:28,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.81 | bwd_microstep: 813.55 | bwd_inner_microstep: 813.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3428
[2024-06-10 04:40:29,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.72 | bwd_microstep: 1376.79 | bwd_inner_microstep: 1376.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 04:40:31,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.92 | bwd_microstep: 976.10 | bwd_inner_microstep: 976.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820
[2024-06-10 04:40:33,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1518.77 | bwd_inner_microstep: 1518.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3851
[2024-06-10 04:40:41,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.64
[2024-06-10 04:40:41,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 681.35 | bwd_microstep: 7530.20 | bwd_inner_microstep: 2115.57 | bwd_allreduce_microstep: 5414.57 | step_microstep: 38.53
[2024-06-10 04:40:41,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14926.95 | bwd: 45483.42 | bwd_inner: 40067.88 | bwd_allreduce: 5414.84 | step: 40.21
{'loss': 1.286, 'learning_rate': 3.889442942714041e-05, 'epoch': 0.13}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4586
[2024-06-10 04:40:44,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.72 | bwd_microstep: 1770.38 | bwd_inner_microstep: 1770.29 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 04:40:46,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1393.29 | bwd_inner_microstep: 1393.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 04:40:48,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.37 | bwd_microstep: 1486.38 | bwd_inner_microstep: 1486.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 04:40:50,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.70 | bwd_microstep: 1384.41 | bwd_inner_microstep: 1384.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-10 04:40:52,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.91 | bwd_microstep: 1444.53 | bwd_inner_microstep: 1444.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 04:40:53,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1281.08 | bwd_inner_microstep: 1281.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 04:40:55,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1387.27 | bwd_inner_microstep: 1387.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 04:40:57,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.49 | bwd_microstep: 1152.42 | bwd_inner_microstep: 1152.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 04:40:59,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.17 | bwd_microstep: 1385.43 | bwd_inner_microstep: 1385.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2205
[2024-06-10 04:41:00,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.01 | bwd_microstep: 956.18 | bwd_inner_microstep: 956.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-10 04:41:02,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.15 | bwd_microstep: 1315.17 | bwd_inner_microstep: 1315.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 04:41:04,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1377.49 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482
[2024-06-10 04:41:06,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.07 | bwd_microstep: 1578.85 | bwd_inner_microstep: 1578.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 04:41:08,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1381.96 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424
[2024-06-10 04:41:10,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 1446.82 | bwd_inner_microstep: 1446.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-10 04:41:12,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.76 | bwd_microstep: 1338.01 | bwd_inner_microstep: 1337.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 04:41:13,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.55 | bwd_microstep: 1162.32 | bwd_inner_microstep: 1162.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 04:41:15,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.26 | bwd_microstep: 1487.48 | bwd_inner_microstep: 1487.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 04:41:17,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1282.39 | bwd_inner_microstep: 1282.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 04:41:19,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.75 | bwd_microstep: 1328.01 | bwd_inner_microstep: 1327.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 04:41:21,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.75 | bwd_microstep: 1406.84 | bwd_inner_microstep: 1406.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623
[2024-06-10 04:41:23,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.40 | bwd_microstep: 1312.57 | bwd_inner_microstep: 1312.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 04:41:25,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1397.02 | bwd_inner_microstep: 1397.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 04:41:27,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1494.55 | bwd_inner_microstep: 1494.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 04:41:29,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1386.01 | bwd_inner_microstep: 1385.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 04:41:31,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1491.78 | bwd_inner_microstep: 1491.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3687
[2024-06-10 04:41:33,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.75 | bwd_microstep: 1489.33 | bwd_inner_microstep: 1489.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643
[2024-06-10 04:41:35,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.91 | bwd_microstep: 1315.47 | bwd_inner_microstep: 1315.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 04:41:37,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.03 | bwd_microstep: 1451.27 | bwd_inner_microstep: 1451.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 04:41:39,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.06 | bwd_microstep: 1605.74 | bwd_inner_microstep: 1605.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3476
[2024-06-10 04:41:41,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.68 | bwd_microstep: 1577.63 | bwd_inner_microstep: 1577.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2014
[2024-06-10 04:41:45,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 04:41:45,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.67 | bwd_microstep: 3313.81 | bwd_inner_microstep: 984.54 | bwd_allreduce_microstep: 2329.21 | step_microstep: 38.59
[2024-06-10 04:41:45,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16546.18 | bwd: 46581.96 | bwd_inner: 44251.72 | bwd_allreduce: 2329.49 | step: 40.24
{'loss': 1.3273, 'learning_rate': 3.8882089751980985e-05, 'epoch': 0.13}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-10 04:41:46,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.19 | bwd_microstep: 678.09 | bwd_inner_microstep: 677.95 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 04:41:47,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.12 | bwd_microstep: 1283.72 | bwd_inner_microstep: 1283.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820
[2024-06-10 04:41:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3754
[2024-06-10 04:41:51,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.08 | bwd_microstep: 1340.68 | bwd_inner_microstep: 1340.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463
[2024-06-10 04:41:53,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.82 | bwd_microstep: 1246.66 | bwd_inner_microstep: 1246.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1372
[2024-06-10 04:41:54,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 201.77 | bwd_microstep: 520.87 | bwd_inner_microstep: 520.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3421
[2024-06-10 04:41:55,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.17 | bwd_microstep: 1216.16 | bwd_inner_microstep: 1216.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 04:41:58,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.45 | bwd_microstep: 1630.14 | bwd_inner_microstep: 1630.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420
[2024-06-10 04:41:59,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1157.06 | bwd_inner_microstep: 1157.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-10 04:42:00,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.98 | bwd_microstep: 689.11 | bwd_inner_microstep: 689.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 04:42:02,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1314.15 | bwd_inner_microstep: 1314.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3666
[2024-06-10 04:42:04,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.98 | bwd_microstep: 1671.14 | bwd_inner_microstep: 1671.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 04:42:06,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 1381.56 | bwd_inner_microstep: 1381.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510
[2024-06-10 04:42:08,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1419.87 | bwd_inner_microstep: 1419.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676
[2024-06-10 04:42:10,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.92 | bwd_microstep: 1722.31 | bwd_inner_microstep: 1722.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2158
[2024-06-10 04:42:12,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.78 | bwd_microstep: 822.91 | bwd_inner_microstep: 822.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 04:42:13,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.00 | bwd_microstep: 1353.79 | bwd_inner_microstep: 1353.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3485
[2024-06-10 04:42:15,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 1405.30 | bwd_inner_microstep: 1405.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 04:42:17,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1409.34 | bwd_inner_microstep: 1409.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 04:42:19,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.53 | bwd_microstep: 1360.36 | bwd_inner_microstep: 1360.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 04:42:21,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.35 | bwd_microstep: 1297.98 | bwd_inner_microstep: 1297.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3605
[2024-06-10 04:42:23,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1248.84 | bwd_inner_microstep: 1248.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 04:42:25,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.38 | bwd_microstep: 1382.22 | bwd_inner_microstep: 1382.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 04:42:27,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.98 | bwd_microstep: 1541.02 | bwd_inner_microstep: 1541.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 04:42:29,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.09 | bwd_microstep: 1405.84 | bwd_inner_microstep: 1405.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 04:42:31,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1459.00 | bwd_inner_microstep: 1458.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591
[2024-06-10 04:42:33,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.77 | bwd_microstep: 1599.10 | bwd_inner_microstep: 1599.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3578
[2024-06-10 04:42:35,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1447.14 | bwd_inner_microstep: 1447.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 04:42:37,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.81 | bwd_microstep: 1600.88 | bwd_inner_microstep: 1600.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 04:42:39,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1379.31 | bwd_inner_microstep: 1379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 04:42:41,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.94 | bwd_microstep: 1604.21 | bwd_inner_microstep: 1604.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 04:42:45,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.23 | optimizer_step: 6.57
[2024-06-10 04:42:45,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.72 | bwd_microstep: 3131.73 | bwd_inner_microstep: 1099.72 | bwd_allreduce_microstep: 2031.96 | step_microstep: 38.49
[2024-06-10 04:42:45,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15748.88 | bwd: 44104.39 | bwd_inner: 42071.42 | bwd_allreduce: 2032.24 | step: 40.15
{'loss': 1.3328, 'learning_rate': 3.886968357419961e-05, 'epoch': 0.13}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 04:42:47,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1338.78 | bwd_inner_microstep: 1338.70 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 04:42:48,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1246.31 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3845
[2024-06-10 04:42:50,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1457.74 | bwd_inner_microstep: 1457.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 04:42:52,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 1346.18 | bwd_inner_microstep: 1346.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4061
[2024-06-10 04:42:54,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.63 | bwd_microstep: 1555.01 | bwd_inner_microstep: 1554.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 04:42:56,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1384.55 | bwd_inner_microstep: 1384.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3728
[2024-06-10 04:42:58,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.77 | bwd_microstep: 1464.90 | bwd_inner_microstep: 1464.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 04:43:00,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.34 | bwd_microstep: 1385.09 | bwd_inner_microstep: 1385.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 04:43:02,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.63 | bwd_microstep: 1153.02 | bwd_inner_microstep: 1152.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3585
[2024-06-10 04:43:04,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1239.97 | bwd_inner_microstep: 1239.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 04:43:06,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.39 | bwd_microstep: 1384.78 | bwd_inner_microstep: 1384.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 04:43:07,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.55 | bwd_microstep: 1283.41 | bwd_inner_microstep: 1283.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3670
[2024-06-10 04:43:09,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.16 | bwd_microstep: 1548.09 | bwd_inner_microstep: 1548.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3505
[2024-06-10 04:43:12,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.98 | bwd_microstep: 1498.72 | bwd_inner_microstep: 1498.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 04:43:13,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1247.76 | bwd_inner_microstep: 1247.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 04:43:15,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1850
[2024-06-10 04:43:16,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.29 | bwd_microstep: 672.66 | bwd_inner_microstep: 672.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3422
[2024-06-10 04:43:18,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1544.69 | bwd_inner_microstep: 1544.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963
[2024-06-10 04:43:19,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.56 | bwd_microstep: 824.01 | bwd_inner_microstep: 823.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 04:43:21,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.49 | bwd_microstep: 1488.42 | bwd_inner_microstep: 1488.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2290
[2024-06-10 04:43:23,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.50 | bwd_microstep: 1072.40 | bwd_inner_microstep: 1072.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533
[2024-06-10 04:43:25,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.75 | bwd_microstep: 1331.34 | bwd_inner_microstep: 1331.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3430
[2024-06-10 04:43:27,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1410.28 | bwd_inner_microstep: 1410.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 04:43:29,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 1600.69 | bwd_inner_microstep: 1600.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3664
[2024-06-10 04:43:31,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.60 | bwd_microstep: 1355.74 | bwd_inner_microstep: 1355.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 04:43:33,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1388.69 | bwd_inner_microstep: 1388.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 04:43:35,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.81 | bwd_microstep: 1603.13 | bwd_inner_microstep: 1603.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1914
[2024-06-10 04:43:36,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.16 | bwd_microstep: 719.83 | bwd_inner_microstep: 719.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 04:43:38,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1282.94 | bwd_inner_microstep: 1282.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 04:43:40,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.03 | bwd_microstep: 1649.15 | bwd_inner_microstep: 1649.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 04:43:42,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.75 | bwd_microstep: 1303.98 | bwd_inner_microstep: 1303.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 04:43:48,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 04:43:48,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 5163.64 | bwd_inner_microstep: 1569.80 | bwd_allreduce_microstep: 3593.79 | step_microstep: 38.59
[2024-06-10 04:43:48,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15976.87 | bwd: 46330.66 | bwd_inner: 42735.91 | bwd_allreduce: 3594.05 | step: 40.16
{'loss': 1.3413, 'learning_rate': 3.885721093749078e-05, 'epoch': 0.13}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3390
[2024-06-10 04:43:49,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1391.37 | bwd_inner_microstep: 1391.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 04:43:51,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.87 | bwd_microstep: 1243.18 | bwd_inner_microstep: 1243.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3399
[2024-06-10 04:43:53,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1469.39 | bwd_inner_microstep: 1469.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 04:43:54,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.58 | bwd_microstep: 790.76 | bwd_inner_microstep: 790.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 04:43:56,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1277.61 | bwd_inner_microstep: 1277.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 04:43:58,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.56 | bwd_microstep: 1526.72 | bwd_inner_microstep: 1526.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 04:44:00,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.42 | bwd_microstep: 1483.61 | bwd_inner_microstep: 1483.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4127
[2024-06-10 04:44:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.54 | bwd_microstep: 1639.19 | bwd_inner_microstep: 1639.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 429
[2024-06-10 04:44:03,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 88.67 | bwd_microstep: 217.65 | bwd_inner_microstep: 217.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 04:44:05,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1252.27 | bwd_inner_microstep: 1252.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 04:44:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.25 | bwd_microstep: 1283.84 | bwd_inner_microstep: 1283.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1920
[2024-06-10 04:44:07,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.87 | bwd_microstep: 843.26 | bwd_inner_microstep: 843.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495
[2024-06-10 04:44:09,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.42 | bwd_microstep: 1317.78 | bwd_inner_microstep: 1317.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 04:44:11,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1481.00 | bwd_inner_microstep: 1480.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3468
[2024-06-10 04:44:13,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1391.35 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 04:44:15,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1255.10 | bwd_inner_microstep: 1255.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2317
[2024-06-10 04:44:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.81 | bwd_microstep: 985.78 | bwd_inner_microstep: 985.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 04:44:18,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.93 | bwd_microstep: 1289.43 | bwd_inner_microstep: 1289.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 04:44:20,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1493.79 | bwd_inner_microstep: 1493.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436
[2024-06-10 04:44:22,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.86 | bwd_microstep: 1380.66 | bwd_inner_microstep: 1380.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 04:44:24,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.96 | bwd_microstep: 1496.68 | bwd_inner_microstep: 1496.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 04:44:26,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.81 | bwd_microstep: 1282.79 | bwd_inner_microstep: 1282.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 04:44:28,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1394.92 | bwd_inner_microstep: 1394.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4127
[2024-06-10 04:44:30,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.50 | bwd_microstep: 1643.07 | bwd_inner_microstep: 1643.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3426
[2024-06-10 04:44:32,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.38 | bwd_microstep: 1281.66 | bwd_inner_microstep: 1281.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 04:44:34,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1492.94 | bwd_inner_microstep: 1492.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3440
[2024-06-10 04:44:36,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.04 | bwd_microstep: 1203.54 | bwd_inner_microstep: 1203.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 04:44:38,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.03 | bwd_microstep: 1630.72 | bwd_inner_microstep: 1630.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3382
[2024-06-10 04:44:40,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1388.43 | bwd_inner_microstep: 1388.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3563
[2024-06-10 04:44:42,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.08 | bwd_microstep: 1445.36 | bwd_inner_microstep: 1445.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2947
[2024-06-10 04:44:44,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.32 | bwd_microstep: 1248.73 | bwd_inner_microstep: 1248.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 04:44:49,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.70 | optimizer_gradients: 4.24 | optimizer_step: 6.57
[2024-06-10 04:44:49,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 5046.29 | bwd_inner_microstep: 1685.05 | bwd_allreduce_microstep: 3361.19 | step_microstep: 39.61
[2024-06-10 04:44:49,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15774.51 | bwd: 45568.89 | bwd_inner: 42206.79 | bwd_allreduce: 3361.42 | step: 41.24
{'loss': 1.3456, 'learning_rate': 3.884467188578306e-05, 'epoch': 0.14}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3459
[2024-06-10 04:44:51,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.17 | bwd_microstep: 1329.09 | bwd_inner_microstep: 1329.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4623
[2024-06-10 04:44:54,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 699.63 | bwd_microstep: 1862.74 | bwd_inner_microstep: 1862.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 04:44:56,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.26 | bwd_microstep: 1549.53 | bwd_inner_microstep: 1549.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2346
[2024-06-10 04:44:57,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.98 | bwd_microstep: 985.84 | bwd_inner_microstep: 985.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839
[2024-06-10 04:44:59,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | bwd_microstep: 1555.01 | bwd_inner_microstep: 1554.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 04:45:01,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.84 | bwd_microstep: 1479.68 | bwd_inner_microstep: 1479.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 04:45:03,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1380.28 | bwd_inner_microstep: 1380.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3713
[2024-06-10 04:45:05,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.47 | bwd_microstep: 1332.64 | bwd_inner_microstep: 1332.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1957
[2024-06-10 04:45:06,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.46 | bwd_microstep: 766.11 | bwd_inner_microstep: 766.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3464
[2024-06-10 04:45:08,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.60 | bwd_microstep: 1343.17 | bwd_inner_microstep: 1343.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443
[2024-06-10 04:45:10,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.82 | bwd_microstep: 1158.69 | bwd_inner_microstep: 1158.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 04:45:11,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.79 | bwd_microstep: 1256.70 | bwd_inner_microstep: 1256.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1916
[2024-06-10 04:45:12,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.27 | bwd_microstep: 755.25 | bwd_inner_microstep: 755.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 04:45:14,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1485.89 | bwd_inner_microstep: 1485.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 04:45:16,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.54 | bwd_microstep: 1448.73 | bwd_inner_microstep: 1448.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3465
[2024-06-10 04:45:18,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1423.69 | bwd_inner_microstep: 1423.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676
[2024-06-10 04:45:21,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.10 | bwd_microstep: 1618.06 | bwd_inner_microstep: 1618.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3435
[2024-06-10 04:45:23,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1375.39 | bwd_inner_microstep: 1375.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3933
[2024-06-10 04:45:24,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.18 | bwd_microstep: 1403.93 | bwd_inner_microstep: 1403.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 04:45:26,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.02 | bwd_microstep: 1182.64 | bwd_inner_microstep: 1182.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 04:45:27,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.86 | bwd_microstep: 798.54 | bwd_inner_microstep: 798.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3551
[2024-06-10 04:45:29,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.70 | bwd_microstep: 1525.60 | bwd_inner_microstep: 1525.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3439
[2024-06-10 04:45:31,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.29 | bwd_microstep: 1378.78 | bwd_inner_microstep: 1378.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 04:45:33,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1412.80 | bwd_inner_microstep: 1412.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2114
[2024-06-10 04:45:35,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.77 | bwd_microstep: 956.63 | bwd_inner_microstep: 956.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594
[2024-06-10 04:45:36,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1312.27 | bwd_inner_microstep: 1312.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462
[2024-06-10 04:45:38,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.24 | bwd_microstep: 1182.12 | bwd_inner_microstep: 1182.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-10 04:45:40,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.27 | bwd_microstep: 1161.83 | bwd_inner_microstep: 1161.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 04:45:42,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.83 | bwd_microstep: 1544.60 | bwd_inner_microstep: 1544.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 04:45:44,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.72 | bwd_microstep: 1450.33 | bwd_inner_microstep: 1450.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3806
[2024-06-10 04:45:46,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.78 | bwd_microstep: 1622.00 | bwd_inner_microstep: 1621.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589
[2024-06-10 04:45:50,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 04:45:50,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 3926.87 | bwd_inner_microstep: 1620.69 | bwd_allreduce_microstep: 2306.13 | step_microstep: 38.55
[2024-06-10 04:45:50,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15954.19 | bwd: 44965.46 | bwd_inner: 42658.42 | bwd_allreduce: 2306.35 | step: 40.13
{'loss': 1.3036, 'learning_rate': 3.883206646323892e-05, 'epoch': 0.14}
 13%|█▎        | 230/1726 [4:03:18<25:16:41, 60.83s/it]
 13%|█▎        | 231/1726 [4:04:21<25:35:29, 61.62s/it]
 13%|█▎        | 232/1726 [4:05:22<25:23:47, 61.20s/it]
 13%|█▎        | 233/1726 [4:06:24<25:33:37, 61.63s/it]
 14%|█▎        | 234/1726 [4:07:26<25:32:59, 61.65s/it]
 14%|█▎        | 235/1726 [4:08:27<25:29:03, 61.53s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 04:45:52,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1265.86 | bwd_inner_microstep: 1265.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3979
[2024-06-10 04:45:54,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.09 | bwd_microstep: 1408.69 | bwd_inner_microstep: 1408.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 04:45:56,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.84 | bwd_microstep: 1476.02 | bwd_inner_microstep: 1475.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-10 04:45:58,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.33 | bwd_microstep: 1557.07 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 04:45:59,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.58 | bwd_microstep: 791.67 | bwd_inner_microstep: 791.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 04:46:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.23 | bwd_microstep: 1384.28 | bwd_inner_microstep: 1384.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3783
[2024-06-10 04:46:03,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.33 | bwd_microstep: 1480.45 | bwd_inner_microstep: 1480.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 04:46:06,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1536.62 | bwd_inner_microstep: 1536.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 04:46:07,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.48 | bwd_microstep: 1153.76 | bwd_inner_microstep: 1153.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657
[2024-06-10 04:46:09,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.53 | bwd_microstep: 1622.30 | bwd_inner_microstep: 1622.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3691
[2024-06-10 04:46:11,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.57 | bwd_microstep: 1424.13 | bwd_inner_microstep: 1424.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 04:46:13,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1315.38 | bwd_inner_microstep: 1315.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 04:46:15,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.60 | bwd_microstep: 1281.12 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 04:46:17,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.87 | bwd_microstep: 1253.13 | bwd_inner_microstep: 1253.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 04:46:19,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 1475.33 | bwd_inner_microstep: 1475.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 04:46:21,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.86 | bwd_microstep: 1382.95 | bwd_inner_microstep: 1382.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 04:46:23,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.62 | bwd_microstep: 1485.26 | bwd_inner_microstep: 1485.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3696
[2024-06-10 04:46:25,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.10 | bwd_microstep: 1631.41 | bwd_inner_microstep: 1631.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 04:46:27,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.25 | bwd_microstep: 1463.35 | bwd_inner_microstep: 1463.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 04:46:28,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.12 | bwd_microstep: 797.11 | bwd_inner_microstep: 797.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 04:46:30,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1297.05 | bwd_inner_microstep: 1297.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 04:46:32,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.29 | bwd_microstep: 1522.12 | bwd_inner_microstep: 1522.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 04:46:34,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.12 | bwd_microstep: 1457.19 | bwd_inner_microstep: 1457.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 04:46:36,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.56 | bwd_microstep: 1288.85 | bwd_inner_microstep: 1288.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3721
[2024-06-10 04:46:38,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1563.96 | bwd_inner_microstep: 1563.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 04:46:40,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1558.08 | bwd_inner_microstep: 1558.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-10 04:46:41,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.18 | bwd_microstep: 700.00 | bwd_inner_microstep: 699.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 04:46:43,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 1496.29 | bwd_inner_microstep: 1496.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 04:46:45,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.14 | bwd_microstep: 1403.62 | bwd_inner_microstep: 1403.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 04:46:47,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1608.48 | bwd_inner_microstep: 1608.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3618
[2024-06-10 04:46:50,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.61 | bwd_microstep: 1653.55 | bwd_inner_microstep: 1653.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 04:46:51,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.16 | optimizer_step: 6.57
[2024-06-10 04:46:51,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 1083.08 | bwd_inner_microstep: 819.57 | bwd_allreduce_microstep: 263.46 | step_microstep: 38.49
[2024-06-10 04:46:51,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16305.03 | bwd: 43818.17 | bwd_inner: 43553.81 | bwd_allreduce: 263.69 | step: 40.17
{'loss': 1.3891, 'learning_rate': 3.88193947142546e-05, 'epoch': 0.14}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 04:46:53,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.30 | bwd_microstep: 1272.42 | bwd_inner_microstep: 1272.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3872
[2024-06-10 04:46:55,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.36 | bwd_microstep: 1667.52 | bwd_inner_microstep: 1667.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 04:46:57,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.60 | bwd_microstep: 1311.09 | bwd_inner_microstep: 1311.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 04:46:58,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 790.25 | bwd_inner_microstep: 790.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-10 04:47:00,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.12 | bwd_microstep: 1447.03 | bwd_inner_microstep: 1447.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716
[2024-06-10 04:47:02,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.98 | bwd_microstep: 1463.79 | bwd_inner_microstep: 1463.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 04:47:04,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.51 | bwd_microstep: 1284.18 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 04:47:05,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.64 | bwd_microstep: 809.48 | bwd_inner_microstep: 809.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3484
[2024-06-10 04:47:07,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1497.08 | bwd_inner_microstep: 1497.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 04:47:09,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1378.92 | bwd_inner_microstep: 1378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3382
[2024-06-10 04:47:10,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.15 | bwd_microstep: 1177.66 | bwd_inner_microstep: 1177.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 04:47:12,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.81 | bwd_microstep: 1347.73 | bwd_inner_microstep: 1347.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3778
[2024-06-10 04:47:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.69 | bwd_microstep: 1745.87 | bwd_inner_microstep: 1745.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644
[2024-06-10 04:47:17,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1438.68 | bwd_inner_microstep: 1438.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 04:47:18,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.43 | bwd_microstep: 1292.48 | bwd_inner_microstep: 1292.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3619
[2024-06-10 04:47:20,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.72 | bwd_microstep: 1442.79 | bwd_inner_microstep: 1442.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 04:47:22,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.57 | bwd_microstep: 1289.32 | bwd_inner_microstep: 1289.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3517
[2024-06-10 04:47:24,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.40 | bwd_microstep: 1195.14 | bwd_inner_microstep: 1195.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 04:47:26,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.28 | bwd_microstep: 1321.95 | bwd_inner_microstep: 1321.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 04:47:28,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.43 | bwd_microstep: 1397.83 | bwd_inner_microstep: 1397.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 04:47:29,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1300.23 | bwd_inner_microstep: 1300.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 04:47:31,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1347.89 | bwd_inner_microstep: 1347.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 04:47:33,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.50 | bwd_microstep: 1477.78 | bwd_inner_microstep: 1477.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3554
[2024-06-10 04:47:36,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.18 | bwd_microstep: 1544.24 | bwd_inner_microstep: 1544.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 04:47:37,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.01 | bwd_microstep: 1404.31 | bwd_inner_microstep: 1404.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823
[2024-06-10 04:47:40,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.21 | bwd_microstep: 1585.97 | bwd_inner_microstep: 1585.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814
[2024-06-10 04:47:42,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.70 | bwd_microstep: 1755.69 | bwd_inner_microstep: 1755.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 04:47:44,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1384.08 | bwd_inner_microstep: 1384.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-10 04:47:45,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.58 | bwd_microstep: 977.87 | bwd_inner_microstep: 977.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 04:47:48,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.83 | bwd_microstep: 1661.46 | bwd_inner_microstep: 1661.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 04:47:49,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.54 | bwd_microstep: 1356.51 | bwd_inner_microstep: 1356.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3615
[2024-06-10 04:47:54,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 04:47:54,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.84 | bwd_microstep: 3777.77 | bwd_inner_microstep: 1538.77 | bwd_allreduce_microstep: 2238.94 | step_microstep: 38.70
[2024-06-10 04:47:54,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16396.90 | bwd: 46145.02 | bwd_inner: 43905.12 | bwd_allreduce: 2239.19 | step: 40.30
{'loss': 1.375, 'learning_rate': 3.8806656683459916e-05, 'epoch': 0.14}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4576
[2024-06-10 04:47:56,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 686.99 | bwd_microstep: 1846.18 | bwd_inner_microstep: 1846.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023
[2024-06-10 04:47:57,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.13 | bwd_microstep: 806.36 | bwd_inner_microstep: 806.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 04:47:59,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.31 | bwd_microstep: 1280.51 | bwd_inner_microstep: 1280.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794
[2024-06-10 04:48:02,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.60 | bwd_microstep: 1646.40 | bwd_inner_microstep: 1646.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 04:48:03,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.73 | bwd_microstep: 1146.30 | bwd_inner_microstep: 1146.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-10 04:48:05,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.53 | bwd_microstep: 1639.22 | bwd_inner_microstep: 1639.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 04:48:07,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.88 | bwd_microstep: 1298.70 | bwd_inner_microstep: 1298.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3798
[2024-06-10 04:48:09,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.63 | bwd_microstep: 1349.88 | bwd_inner_microstep: 1349.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 04:48:11,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1298.53 | bwd_inner_microstep: 1298.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 04:48:12,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.04 | bwd_microstep: 797.24 | bwd_inner_microstep: 797.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 04:48:14,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1254.35 | bwd_inner_microstep: 1254.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 04:48:15,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1255.32 | bwd_inner_microstep: 1255.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979
[2024-06-10 04:48:17,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.95 | bwd_microstep: 892.12 | bwd_inner_microstep: 892.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 04:48:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.03 | bwd_microstep: 1484.54 | bwd_inner_microstep: 1484.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 04:48:21,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1531.81 | bwd_inner_microstep: 1531.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 04:48:23,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.65 | bwd_microstep: 1344.34 | bwd_inner_microstep: 1344.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 04:48:25,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.70 | bwd_microstep: 1481.69 | bwd_inner_microstep: 1481.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3575
[2024-06-10 04:48:27,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.10 | bwd_microstep: 1561.70 | bwd_inner_microstep: 1561.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 04:48:29,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1294.83 | bwd_inner_microstep: 1294.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 04:48:30,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1285.33 | bwd_inner_microstep: 1285.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 04:48:32,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.59 | bwd_microstep: 1402.55 | bwd_inner_microstep: 1402.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-10 04:48:34,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.84 | bwd_microstep: 1432.61 | bwd_inner_microstep: 1432.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 04:48:37,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.26 | bwd_microstep: 1563.64 | bwd_inner_microstep: 1563.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3463
[2024-06-10 04:48:39,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1494.01 | bwd_inner_microstep: 1493.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3432
[2024-06-10 04:48:40,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.45 | bwd_microstep: 1311.66 | bwd_inner_microstep: 1311.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 04:48:42,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.03 | bwd_microstep: 809.18 | bwd_inner_microstep: 809.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 04:48:43,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1410.26 | bwd_inner_microstep: 1410.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 04:48:46,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.79 | bwd_microstep: 1648.20 | bwd_inner_microstep: 1648.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 04:48:48,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1447.36 | bwd_inner_microstep: 1447.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 04:48:50,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1508.51 | bwd_inner_microstep: 1508.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 04:48:52,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.82 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3730
[2024-06-10 04:48:54,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 04:48:54,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.23 | bwd_microstep: 2216.07 | bwd_inner_microstep: 1501.34 | bwd_allreduce_microstep: 714.69 | step_microstep: 38.50
[2024-06-10 04:48:54,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16190.84 | bwd: 44022.77 | bwd_inner: 43307.18 | bwd_allreduce: 714.91 | step: 40.09
{'loss': 1.3222, 'learning_rate': 3.879385241571817e-05, 'epoch': 0.14}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 04:48:56,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.78 | bwd_microstep: 1337.65 | bwd_inner_microstep: 1337.54 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 04:48:58,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.52 | bwd_microstep: 1284.68 | bwd_inner_microstep: 1284.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 04:49:00,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.49 | bwd_microstep: 1558.50 | bwd_inner_microstep: 1558.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 04:49:02,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1557.59 | bwd_inner_microstep: 1557.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 04:49:04,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1256.86 | bwd_inner_microstep: 1256.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 04:49:06,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1382.81 | bwd_inner_microstep: 1382.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 04:49:08,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.03 | bwd_microstep: 1550.53 | bwd_inner_microstep: 1550.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-10 04:49:10,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.32 | bwd_microstep: 1537.23 | bwd_inner_microstep: 1537.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 04:49:12,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.75 | bwd_microstep: 1433.59 | bwd_inner_microstep: 1433.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4030
[2024-06-10 04:49:14,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.08 | bwd_microstep: 1449.89 | bwd_inner_microstep: 1449.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 04:49:16,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1345.05 | bwd_inner_microstep: 1345.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1918
[2024-06-10 04:49:17,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.27 | bwd_microstep: 879.32 | bwd_inner_microstep: 879.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 04:49:18,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.96 | bwd_microstep: 791.87 | bwd_inner_microstep: 791.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3736
[2024-06-10 04:49:21,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.80 | bwd_microstep: 1655.01 | bwd_inner_microstep: 1654.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2023
[2024-06-10 04:49:22,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.75 | bwd_microstep: 839.11 | bwd_inner_microstep: 839.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3853
[2024-06-10 04:49:24,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.25 | bwd_microstep: 1763.29 | bwd_inner_microstep: 1763.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3396
[2024-06-10 04:49:26,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.47 | bwd_microstep: 1374.52 | bwd_inner_microstep: 1374.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-10 04:49:28,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.01 | bwd_microstep: 1621.67 | bwd_inner_microstep: 1621.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3553
[2024-06-10 04:49:30,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.30 | bwd_microstep: 1451.32 | bwd_inner_microstep: 1451.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 04:49:32,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1490.16 | bwd_inner_microstep: 1490.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 04:49:35,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.50 | bwd_microstep: 1530.32 | bwd_inner_microstep: 1530.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 04:49:36,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.47 | bwd_microstep: 1290.54 | bwd_inner_microstep: 1290.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835
[2024-06-10 04:49:38,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.90 | bwd_microstep: 1463.67 | bwd_inner_microstep: 1463.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294
[2024-06-10 04:49:40,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.96 | bwd_microstep: 978.35 | bwd_inner_microstep: 978.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2282
[2024-06-10 04:49:41,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.14 | bwd_microstep: 785.76 | bwd_inner_microstep: 785.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2089
[2024-06-10 04:49:42,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.09 | bwd_microstep: 918.46 | bwd_inner_microstep: 918.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 04:49:44,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.85 | bwd_microstep: 1459.41 | bwd_inner_microstep: 1459.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 04:49:46,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.06 | bwd_microstep: 1503.41 | bwd_inner_microstep: 1503.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3614
[2024-06-10 04:49:48,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1539.39 | bwd_inner_microstep: 1539.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-10 04:49:50,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.01 | bwd_microstep: 1441.70 | bwd_inner_microstep: 1441.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 04:49:52,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.30 | bwd_microstep: 1505.11 | bwd_inner_microstep: 1505.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4007
[2024-06-10 04:49:56,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 04:49:56,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.40 | bwd_microstep: 2768.52 | bwd_inner_microstep: 1617.78 | bwd_allreduce_microstep: 1150.69 | step_microstep: 38.46
[2024-06-10 04:49:56,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16251.29 | bwd: 44745.35 | bwd_inner: 43593.66 | bwd_allreduce: 1150.97 | step: 40.07
{'loss': 1.3045, 'learning_rate': 3.8780981956125914e-05, 'epoch': 0.14}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 04:49:58,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.97 | bwd_microstep: 1442.31 | bwd_inner_microstep: 1442.24 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 04:50:00,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.57 | bwd_microstep: 1279.67 | bwd_inner_microstep: 1279.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3792
[2024-06-10 04:50:02,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.32 | bwd_microstep: 1445.89 | bwd_inner_microstep: 1445.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 04:50:03,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 1250.01 | bwd_inner_microstep: 1249.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402
[2024-06-10 04:50:05,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.10 | bwd_microstep: 1211.67 | bwd_inner_microstep: 1211.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 04:50:06,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 791.59 | bwd_inner_microstep: 791.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-10 04:50:08,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1284.11 | bwd_inner_microstep: 1284.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728
[2024-06-10 04:50:10,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1633.57 | bwd_inner_microstep: 1633.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1879
[2024-06-10 04:50:11,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.92 | bwd_microstep: 711.15 | bwd_inner_microstep: 711.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 04:50:13,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.07 | bwd_microstep: 1352.30 | bwd_inner_microstep: 1352.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503
[2024-06-10 04:50:15,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.86 | bwd_microstep: 1419.25 | bwd_inner_microstep: 1419.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3400
[2024-06-10 04:50:17,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.81 | bwd_microstep: 1401.99 | bwd_inner_microstep: 1401.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 04:50:19,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.25 | bwd_microstep: 1280.15 | bwd_inner_microstep: 1280.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 04:50:20,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.85 | bwd_microstep: 1251.01 | bwd_inner_microstep: 1250.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 04:50:22,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.14 | bwd_microstep: 1188.68 | bwd_inner_microstep: 1188.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 04:50:24,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.12 | bwd_microstep: 1406.30 | bwd_inner_microstep: 1406.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522
[2024-06-10 04:50:26,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.67 | bwd_microstep: 1195.24 | bwd_inner_microstep: 1195.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 04:50:27,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1288.56 | bwd_inner_microstep: 1288.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 04:50:29,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1406.81 | bwd_inner_microstep: 1406.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 04:50:32,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.79 | bwd_microstep: 1660.51 | bwd_inner_microstep: 1660.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 04:50:34,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.81 | bwd_microstep: 1557.88 | bwd_inner_microstep: 1557.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3663
[2024-06-10 04:50:36,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.68 | bwd_microstep: 1321.95 | bwd_inner_microstep: 1321.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442
[2024-06-10 04:50:37,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.86 | bwd_microstep: 1155.69 | bwd_inner_microstep: 1155.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 620
[2024-06-10 04:50:38,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.01 | bwd_microstep: 263.26 | bwd_inner_microstep: 263.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750
[2024-06-10 04:50:39,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.84 | bwd_microstep: 1344.92 | bwd_inner_microstep: 1344.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2521
[2024-06-10 04:50:41,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.99 | bwd_microstep: 873.78 | bwd_inner_microstep: 873.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2079
[2024-06-10 04:50:42,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.14 | bwd_microstep: 756.69 | bwd_inner_microstep: 756.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3702
[2024-06-10 04:50:43,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1266.85 | bwd_inner_microstep: 1266.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071
[2024-06-10 04:50:45,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.92 | bwd_microstep: 917.42 | bwd_inner_microstep: 917.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3800
[2024-06-10 04:50:47,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.32 | bwd_microstep: 1511.84 | bwd_inner_microstep: 1511.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605
[2024-06-10 04:50:49,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.30 | bwd_microstep: 1529.39 | bwd_inner_microstep: 1529.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2602
[2024-06-10 04:50:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 04:50:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 431.23 | bwd_microstep: 6194.81 | bwd_inner_microstep: 1326.02 | bwd_allreduce_microstep: 4868.74 | step_microstep: 38.76
[2024-06-10 04:50:56,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14888.47 | bwd: 44595.27 | bwd_inner: 39725.57 | bwd_allreduce: 4868.99 | step: 40.37
{'loss': 1.3398, 'learning_rate': 3.876804535001285e-05, 'epoch': 0.14}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 04:50:58,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1469.83 | bwd_inner_microstep: 1469.71 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 04:50:59,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.82 | bwd_microstep: 1242.22 | bwd_inner_microstep: 1242.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 04:51:01,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.78 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 04:51:03,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.03 | bwd_microstep: 1281.34 | bwd_inner_microstep: 1281.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 04:51:05,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.72 | bwd_microstep: 1445.70 | bwd_inner_microstep: 1445.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2209
[2024-06-10 04:51:06,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.41 | bwd_microstep: 890.87 | bwd_inner_microstep: 890.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958
[2024-06-10 04:51:07,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.81 | bwd_microstep: 703.44 | bwd_inner_microstep: 703.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 04:51:09,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1384.00 | bwd_inner_microstep: 1383.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 04:51:11,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.58 | bwd_microstep: 1632.37 | bwd_inner_microstep: 1632.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 04:51:12,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.16 | bwd_microstep: 700.69 | bwd_inner_microstep: 700.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410
[2024-06-10 04:51:14,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1441.58 | bwd_inner_microstep: 1441.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 04:51:16,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1344.19 | bwd_inner_microstep: 1344.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642
[2024-06-10 04:51:19,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.77 | bwd_microstep: 1708.07 | bwd_inner_microstep: 1708.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 04:51:21,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.12 | bwd_microstep: 1515.70 | bwd_inner_microstep: 1515.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505
[2024-06-10 04:51:22,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.10 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 04:51:25,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.89 | bwd_microstep: 1502.69 | bwd_inner_microstep: 1502.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 04:51:27,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.93 | bwd_microstep: 1426.77 | bwd_inner_microstep: 1426.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3454
[2024-06-10 04:51:28,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.10 | bwd_microstep: 1215.01 | bwd_inner_microstep: 1214.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-10 04:51:30,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.99 | bwd_microstep: 1520.79 | bwd_inner_microstep: 1520.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736
[2024-06-10 04:51:33,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.27 | bwd_microstep: 1647.17 | bwd_inner_microstep: 1647.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-10 04:51:35,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.63 | bwd_microstep: 1539.91 | bwd_inner_microstep: 1539.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 04:51:37,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1399.86 | bwd_inner_microstep: 1399.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3610
[2024-06-10 04:51:39,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.45 | bwd_microstep: 1572.28 | bwd_inner_microstep: 1572.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3824
[2024-06-10 04:51:41,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.69 | bwd_microstep: 1513.16 | bwd_inner_microstep: 1513.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 04:51:43,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.34 | bwd_microstep: 1556.35 | bwd_inner_microstep: 1556.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 04:51:45,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.15 | bwd_microstep: 1498.64 | bwd_inner_microstep: 1498.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 04:51:47,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.56 | bwd_microstep: 1493.94 | bwd_inner_microstep: 1493.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3755
[2024-06-10 04:51:49,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.69 | bwd_microstep: 1573.43 | bwd_inner_microstep: 1573.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051
[2024-06-10 04:51:51,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.44 | bwd_microstep: 912.93 | bwd_inner_microstep: 912.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3802
[2024-06-10 04:51:52,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.25 | bwd_microstep: 1352.42 | bwd_inner_microstep: 1352.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3538
[2024-06-10 04:51:54,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.30 | bwd_microstep: 1195.60 | bwd_inner_microstep: 1195.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2047
[2024-06-10 04:51:56,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 04:51:56,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.07 | bwd_microstep: 1943.71 | bwd_inner_microstep: 856.08 | bwd_allreduce_microstep: 1087.58 | step_microstep: 38.64
[2024-06-10 04:51:56,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16159.28 | bwd: 44349.42 | bwd_inner: 43260.82 | bwd_allreduce: 1087.86 | step: 40.20
▎        | 235/1726 [4:08:27<25:29:03, 61.53s/it]
 14%|█▎        | 236/1726 [4:09:28<25:20:07, 61.21s/it]
 14%|█▎        | 237/1726 [4:10:31<25:31:34, 61.72s/it]
 14%|█▍        | 238/1726 [4:11:31<25:21:56, 61.37s/it]
 14%|█▍        | 239/1726 [4:12:32<25:20:44, 61.36s/it]
 14%|█▍        | 240/1726 [4:13:32<25:08:18, 60.90s/it]
 14%|█▍        | 241/1726 [4:14:33<25:06:57, 60.89s/it]
{'loss': 1.28, 'learning_rate': 3.875504264294161e-05, 'epoch': 0.14}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473
[2024-06-10 04:51:58,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1326.18 | bwd_inner_microstep: 1326.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907
[2024-06-10 04:52:00,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.65 | bwd_microstep: 1588.86 | bwd_inner_microstep: 1588.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 04:52:02,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.35 | bwd_microstep: 1275.73 | bwd_inner_microstep: 1275.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 04:52:04,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.12 | bwd_microstep: 1342.37 | bwd_inner_microstep: 1342.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3483
[2024-06-10 04:52:06,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1427.92 | bwd_inner_microstep: 1427.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 04:52:08,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 1530.62 | bwd_inner_microstep: 1530.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 04:52:10,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.05 | bwd_microstep: 1312.32 | bwd_inner_microstep: 1312.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 04:52:12,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1249.00 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 04:52:13,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.10 | bwd_microstep: 1151.36 | bwd_inner_microstep: 1151.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3423
[2024-06-10 04:52:15,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.14 | bwd_microstep: 1281.45 | bwd_inner_microstep: 1281.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 04:52:17,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.81 | bwd_microstep: 1483.31 | bwd_inner_microstep: 1483.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664
[2024-06-10 04:52:19,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.93 | bwd_microstep: 1611.50 | bwd_inner_microstep: 1611.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 04:52:21,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.94 | bwd_microstep: 1486.79 | bwd_inner_microstep: 1486.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3411
[2024-06-10 04:52:23,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.92 | bwd_microstep: 1444.75 | bwd_inner_microstep: 1444.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2135
[2024-06-10 04:52:25,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.89 | bwd_microstep: 833.18 | bwd_inner_microstep: 833.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101
[2024-06-10 04:52:26,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.41 | bwd_microstep: 826.71 | bwd_inner_microstep: 826.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021
[2024-06-10 04:52:27,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.23 | bwd_microstep: 810.79 | bwd_inner_microstep: 810.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 04:52:29,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.72 | bwd_microstep: 1289.46 | bwd_inner_microstep: 1289.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 04:52:31,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.56 | bwd_microstep: 1383.10 | bwd_inner_microstep: 1383.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 04:52:32,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.68 | bwd_microstep: 1295.93 | bwd_inner_microstep: 1295.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3613
[2024-06-10 04:52:34,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.88 | bwd_microstep: 1374.40 | bwd_inner_microstep: 1374.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 04:52:36,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1350.49 | bwd_inner_microstep: 1350.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 04:52:38,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.16 | bwd_microstep: 1301.18 | bwd_inner_microstep: 1301.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 04:52:40,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.40 | bwd_microstep: 1602.91 | bwd_inner_microstep: 1602.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3719
[2024-06-10 04:52:42,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.86 | bwd_microstep: 1568.63 | bwd_inner_microstep: 1568.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3682
[2024-06-10 04:52:44,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.03 | bwd_microstep: 1548.63 | bwd_inner_microstep: 1548.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3486
[2024-06-10 04:52:46,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.21 | bwd_microstep: 1333.85 | bwd_inner_microstep: 1333.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 04:52:48,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.82 | bwd_microstep: 1556.50 | bwd_inner_microstep: 1556.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 04:52:50,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.31 | bwd_microstep: 980.56 | bwd_inner_microstep: 980.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 04:52:52,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.27 | bwd_microstep: 1439.94 | bwd_inner_microstep: 1439.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270
[2024-06-10 04:52:53,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.83 | bwd_microstep: 880.62 | bwd_inner_microstep: 880.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3764
[2024-06-10 04:52:57,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 04:52:57,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.21 | bwd_microstep: 3931.47 | bwd_inner_microstep: 1671.83 | bwd_allreduce_microstep: 2259.58 | step_microstep: 38.63
[2024-06-10 04:52:57,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15900.39 | bwd: 44820.55 | bwd_inner: 42560.02 | bwd_allreduce: 2259.83 | step: 40.49
{'loss': 1.3461, 'learning_rate': 3.874197388070769e-05, 'epoch': 0.14}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 04:52:59,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1360.85 | bwd_inner_microstep: 1360.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 04:53:01,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1243.20 | bwd_inner_microstep: 1243.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-10 04:53:03,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.41 | bwd_microstep: 1454.80 | bwd_inner_microstep: 1454.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512
[2024-06-10 04:53:05,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1321.84 | bwd_inner_microstep: 1321.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 04:53:07,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.27 | bwd_microstep: 1246.15 | bwd_inner_microstep: 1246.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 04:53:09,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1379.23 | bwd_inner_microstep: 1379.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 04:53:11,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1404.02 | bwd_inner_microstep: 1404.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 04:53:13,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.18 | bwd_microstep: 1485.85 | bwd_inner_microstep: 1485.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 04:53:14,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.20 | bwd_microstep: 1188.48 | bwd_inner_microstep: 1188.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068
[2024-06-10 04:53:15,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.60 | bwd_microstep: 820.87 | bwd_inner_microstep: 820.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3489
[2024-06-10 04:53:17,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.20 | bwd_microstep: 1329.88 | bwd_inner_microstep: 1329.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2369
[2024-06-10 04:53:19,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.73 | bwd_microstep: 1092.20 | bwd_inner_microstep: 1092.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3407
[2024-06-10 04:53:21,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.05 | bwd_microstep: 1536.82 | bwd_inner_microstep: 1536.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 04:53:23,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.13 | bwd_microstep: 1604.28 | bwd_inner_microstep: 1604.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 04:53:25,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.08 | bwd_microstep: 1376.48 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928
[2024-06-10 04:53:26,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.00 | bwd_microstep: 698.24 | bwd_inner_microstep: 698.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 04:53:28,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1255.77 | bwd_inner_microstep: 1255.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 04:53:30,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.60 | bwd_microstep: 1613.63 | bwd_inner_microstep: 1613.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082
[2024-06-10 04:53:31,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.10 | bwd_microstep: 821.47 | bwd_inner_microstep: 821.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3771
[2024-06-10 04:53:33,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.09 | bwd_microstep: 1346.01 | bwd_inner_microstep: 1345.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 04:53:35,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.06 | bwd_microstep: 1611.17 | bwd_inner_microstep: 1611.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 04:53:37,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.59 | bwd_microstep: 1453.21 | bwd_inner_microstep: 1453.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2272
[2024-06-10 04:53:38,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.32 | bwd_microstep: 908.68 | bwd_inner_microstep: 908.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-10 04:53:40,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.60 | bwd_microstep: 1434.92 | bwd_inner_microstep: 1434.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 04:53:42,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1413.25 | bwd_inner_microstep: 1413.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3419
[2024-06-10 04:53:44,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.71 | bwd_microstep: 1373.13 | bwd_inner_microstep: 1373.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 872
[2024-06-10 04:53:45,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.19 | bwd_microstep: 367.26 | bwd_inner_microstep: 367.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2271
[2024-06-10 04:53:46,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.16 | bwd_microstep: 1070.69 | bwd_inner_microstep: 1070.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049
[2024-06-10 04:53:47,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.98 | bwd_microstep: 817.73 | bwd_inner_microstep: 817.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612
[2024-06-10 04:53:49,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.78 | bwd_microstep: 1342.70 | bwd_inner_microstep: 1342.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3414
[2024-06-10 04:53:51,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.12 | bwd_microstep: 1393.07 | bwd_inner_microstep: 1393.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3580
[2024-06-10 04:53:59,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.09 | optimizer_gradients: 4.32 | optimizer_step: 6.60
[2024-06-10 04:53:59,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.31 | bwd_microstep: 7513.49 | bwd_inner_microstep: 1894.01 | bwd_allreduce_microstep: 5619.41 | step_microstep: 40.98
[2024-06-10 04:53:59,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15158.66 | bwd: 46279.37 | bwd_inner: 40659.03 | bwd_allreduce: 5619.64 | step: 42.66
{'loss': 1.2654, 'learning_rate': 3.8728839109339195e-05, 'epoch': 0.14}
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1879
[2024-06-10 04:54:00,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.04 | bwd_microstep: 738.84 | bwd_inner_microstep: 738.67 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3926
[2024-06-10 04:54:02,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.18 | bwd_microstep: 1586.92 | bwd_inner_microstep: 1586.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 04:54:05,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.67 | bwd_microstep: 1548.24 | bwd_inner_microstep: 1548.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 04:54:06,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.60 | bwd_microstep: 1341.72 | bwd_inner_microstep: 1341.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 04:54:09,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.32 | bwd_microstep: 1482.10 | bwd_inner_microstep: 1482.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 04:54:10,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.18 | bwd_microstep: 1380.00 | bwd_inner_microstep: 1379.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 04:54:12,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 04:54:14,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1382.36 | bwd_inner_microstep: 1382.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-10 04:54:16,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.41 | bwd_microstep: 1279.22 | bwd_inner_microstep: 1279.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 04:54:18,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.82 | bwd_microstep: 1486.94 | bwd_inner_microstep: 1486.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 04:54:20,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.19 | bwd_microstep: 1479.17 | bwd_inner_microstep: 1479.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3739
[2024-06-10 04:54:22,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1550.88 | bwd_inner_microstep: 1550.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3494
[2024-06-10 04:54:25,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.99 | bwd_microstep: 1680.13 | bwd_inner_microstep: 1680.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465
[2024-06-10 04:54:27,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.25 | bwd_microstep: 1438.46 | bwd_inner_microstep: 1438.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 04:54:29,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.58 | bwd_microstep: 1522.28 | bwd_inner_microstep: 1522.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 04:54:31,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 1489.87 | bwd_inner_microstep: 1489.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 04:54:33,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.43 | bwd_microstep: 1380.85 | bwd_inner_microstep: 1380.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 04:54:35,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.77 | bwd_microstep: 1470.05 | bwd_inner_microstep: 1470.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657
[2024-06-10 04:54:37,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.85 | bwd_microstep: 1426.59 | bwd_inner_microstep: 1426.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3717
[2024-06-10 04:54:38,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.67 | bwd_microstep: 1341.10 | bwd_inner_microstep: 1341.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 04:54:40,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.68 | bwd_microstep: 1377.14 | bwd_inner_microstep: 1377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2037
[2024-06-10 04:54:41,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.00 | bwd_microstep: 717.32 | bwd_inner_microstep: 717.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 04:54:43,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.41 | bwd_microstep: 1488.91 | bwd_inner_microstep: 1488.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 04:54:45,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.07 | bwd_microstep: 802.75 | bwd_inner_microstep: 802.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746
[2024-06-10 04:54:47,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.18 | bwd_microstep: 1449.88 | bwd_inner_microstep: 1449.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 04:54:48,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.18 | bwd_microstep: 1257.39 | bwd_inner_microstep: 1257.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 04:54:50,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.39 | bwd_microstep: 976.61 | bwd_inner_microstep: 976.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 04:54:52,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.39 | bwd_microstep: 1657.53 | bwd_inner_microstep: 1657.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 04:54:54,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.96 | bwd_microstep: 1399.96 | bwd_inner_microstep: 1399.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068
[2024-06-10 04:54:55,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.99 | bwd_microstep: 819.71 | bwd_inner_microstep: 819.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 04:54:57,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.03 | bwd_microstep: 1476.72 | bwd_inner_microstep: 1476.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 04:54:59,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-10 04:54:59,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1632.12 | bwd_inner_microstep: 1583.49 | bwd_allreduce_microstep: 48.57 | step_microstep: 38.48
[2024-06-10 04:54:59,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16241.89 | bwd: 43449.28 | bwd_inner: 43399.67 | bwd_allreduce: 48.86 | step: 40.14
{'loss': 1.3228, 'learning_rate': 3.871563837509672e-05, 'epoch': 0.14}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 04:55:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.46 | bwd_microstep: 1497.52 | bwd_inner_microstep: 1497.45 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2010
[2024-06-10 04:55:02,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.91 | bwd_microstep: 777.55 | bwd_inner_microstep: 777.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3956
[2024-06-10 04:55:05,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.95 | bwd_microstep: 1600.60 | bwd_inner_microstep: 1600.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2799
[2024-06-10 04:55:06,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.04 | bwd_microstep: 1110.33 | bwd_inner_microstep: 1110.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2926
[2024-06-10 04:55:08,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.34 | bwd_microstep: 1029.56 | bwd_inner_microstep: 1029.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469
[2024-06-10 04:55:09,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.82 | bwd_microstep: 1215.75 | bwd_inner_microstep: 1215.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 04:55:11,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.26 | bwd_microstep: 1537.25 | bwd_inner_microstep: 1537.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 04:55:13,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.21 | bwd_microstep: 1290.24 | bwd_inner_microstep: 1290.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 04:55:15,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1355.67 | bwd_inner_microstep: 1355.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1922
[2024-06-10 04:55:16,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.80 | bwd_microstep: 726.15 | bwd_inner_microstep: 726.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-10 04:55:18,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 04:55:20,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1380.35 | bwd_inner_microstep: 1380.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 04:55:22,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.96 | bwd_microstep: 1408.80 | bwd_inner_microstep: 1408.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490
[2024-06-10 04:55:24,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.74 | bwd_microstep: 1583.76 | bwd_inner_microstep: 1583.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775
[2024-06-10 04:55:26,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.42 | bwd_microstep: 1741.97 | bwd_inner_microstep: 1741.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 04:55:29,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.42 | bwd_microstep: 1607.02 | bwd_inner_microstep: 1607.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2995
[2024-06-10 04:55:30,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.66 | bwd_microstep: 1298.84 | bwd_inner_microstep: 1298.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-10 04:55:32,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.20 | bwd_microstep: 1165.09 | bwd_inner_microstep: 1165.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3517
[2024-06-10 04:55:34,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.94 | bwd_microstep: 1323.90 | bwd_inner_microstep: 1323.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2721
[2024-06-10 04:55:35,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.15 | bwd_microstep: 947.48 | bwd_inner_microstep: 947.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 04:55:37,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.76 | bwd_microstep: 1280.87 | bwd_inner_microstep: 1280.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 04:55:39,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.31 | bwd_microstep: 1454.35 | bwd_inner_microstep: 1454.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 04:55:41,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1558.63 | bwd_inner_microstep: 1558.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3661
[2024-06-10 04:55:43,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.30 | bwd_microstep: 1484.45 | bwd_inner_microstep: 1484.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2509
[2024-06-10 04:55:45,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.87 | bwd_microstep: 1061.21 | bwd_inner_microstep: 1061.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 04:55:46,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.09 | bwd_microstep: 1302.37 | bwd_inner_microstep: 1302.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3810
[2024-06-10 04:55:49,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.85 | bwd_microstep: 1687.76 | bwd_inner_microstep: 1687.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 04:55:51,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1505.43 | bwd_inner_microstep: 1505.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 04:55:53,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.47 | bwd_microstep: 1557.94 | bwd_inner_microstep: 1557.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 04:55:55,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.97 | bwd_microstep: 1392.78 | bwd_inner_microstep: 1392.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 04:55:57,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.12 | bwd_microstep: 1597.69 | bwd_inner_microstep: 1597.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809
[2024-06-10 04:56:25,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.39 | optimizer_step: 6.59
[2024-06-10 04:56:25,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.24 | bwd_microstep: 27118.90 | bwd_inner_microstep: 1991.80 | bwd_allreduce_microstep: 25127.03 | step_microstep: 39.82
[2024-06-10 04:56:25,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16288.16 | bwd: 68887.00 | bwd_inner: 43758.99 | bwd_allreduce: 25127.29 | step: 41.43
{'loss': 1.3175, 'learning_rate': 3.870237172447317e-05, 'epoch': 0.14}
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2629
[2024-06-10 04:56:26,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.39 | bwd_microstep: 912.37 | bwd_inner_microstep: 912.28 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3952
[2024-06-10 04:56:28,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1497.01 | bwd_inner_microstep: 1496.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 04:56:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.32 | bwd_microstep: 1276.26 | bwd_inner_microstep: 1276.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3814
[2024-06-10 04:56:32,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1599.17 | bwd_inner_microstep: 1599.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3776
[2024-06-10 04:56:34,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1503.47 | bwd_inner_microstep: 1503.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 04:56:36,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1280.27 | bwd_inner_microstep: 1280.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2246
[2024-06-10 04:56:37,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.48 | bwd_microstep: 966.68 | bwd_inner_microstep: 966.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3542
[2024-06-10 04:56:39,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.44 | bwd_microstep: 1230.83 | bwd_inner_microstep: 1230.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 04:56:40,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 790.55 | bwd_inner_microstep: 790.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-10 04:56:42,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1284.79 | bwd_inner_microstep: 1284.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2199
[2024-06-10 04:56:43,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.83 | bwd_microstep: 1016.45 | bwd_inner_microstep: 1016.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 04:56:45,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1251.14 | bwd_inner_microstep: 1251.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 04:56:47,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.78 | bwd_microstep: 1256.07 | bwd_inner_microstep: 1256.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1880
[2024-06-10 04:56:48,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.54 | bwd_microstep: 758.97 | bwd_inner_microstep: 758.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 04:56:50,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.21 | bwd_microstep: 1479.99 | bwd_inner_microstep: 1479.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970
[2024-06-10 04:56:51,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.41 | bwd_microstep: 889.82 | bwd_inner_microstep: 889.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-10 04:56:53,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.50 | bwd_microstep: 1529.70 | bwd_inner_microstep: 1529.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527
[2024-06-10 04:56:55,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.03 | bwd_microstep: 1323.87 | bwd_inner_microstep: 1323.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 04:56:57,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1277.16 | bwd_inner_microstep: 1277.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3464
[2024-06-10 04:56:59,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1213.97 | bwd_inner_microstep: 1213.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 04:57:00,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1400.94 | bwd_inner_microstep: 1400.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 04:57:02,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.46 | bwd_microstep: 1405.96 | bwd_inner_microstep: 1405.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 04:57:04,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.16 | bwd_microstep: 1462.30 | bwd_inner_microstep: 1462.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2091
[2024-06-10 04:57:06,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.23 | bwd_microstep: 921.30 | bwd_inner_microstep: 921.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 04:57:08,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.05 | bwd_microstep: 1385.21 | bwd_inner_microstep: 1385.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3771
[2024-06-10 04:57:10,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.75 | bwd_microstep: 1378.21 | bwd_inner_microstep: 1378.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768
[2024-06-10 04:57:12,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.62 | bwd_microstep: 1445.20 | bwd_inner_microstep: 1445.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 04:57:13,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.90 | bwd_microstep: 1302.54 | bwd_inner_microstep: 1302.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3828
[2024-06-10 04:57:16,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.18 | bwd_microstep: 1692.84 | bwd_inner_microstep: 1692.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3593
[2024-06-10 04:57:18,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1369.85 | bwd_inner_microstep: 1369.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 04:57:20,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.28 | bwd_microstep: 1398.97 | bwd_inner_microstep: 1398.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 04:57:26,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.25 | optimizer_step: 6.59
[2024-06-10 04:57:26,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 5657.07 | bwd_inner_microstep: 2009.91 | bwd_allreduce_microstep: 3647.11 | step_microstep: 38.72
[2024-06-10 04:57:26,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15454.90 | bwd: 45158.94 | bwd_inner: 41510.85 | bwd_allreduce: 3647.39 | step: 40.32
{'loss': 1.3045, 'learning_rate': 3.868903920419364e-05, 'epoch': 0.14}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3412
[2024-06-10 04:57:28,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.69 | bwd_microstep: 1440.51 | bwd_inner_microstep: 1440.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677
[2024-06-10 04:57:30,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.46 | bwd_microstep: 1625.24 | bwd_inner_microstep: 1625.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 04:57:32,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1347.24 | bwd_inner_microstep: 1347.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 04:57:34,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.38 | bwd_microstep: 1278.66 | bwd_inner_microstep: 1278.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3881
[2024-06-10 04:57:36,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.80 | bwd_microstep: 1582.76 | bwd_inner_microstep: 1582.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 04:57:37,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.74 | bwd_microstep: 1151.33 | bwd_inner_microstep: 1151.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-10 04:57:39,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1440.18 | bwd_inner_microstep: 1440.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 04:57:41,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.42 | bwd_microstep: 1486.44 | bwd_inner_microstep: 1486.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 04:57:43,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 04:57:45,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.24 | bwd_microstep: 1424.52 | bwd_inner_microstep: 1424.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 04:57:47,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.99 | bwd_microstep: 1255.04 | bwd_inner_microstep: 1255.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431
[2024-06-10 04:57:49,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.48 | bwd_microstep: 1187.26 | bwd_inner_microstep: 1187.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958
[2024-06-10 04:57:50,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.01 | bwd_microstep: 830.20 | bwd_inner_microstep: 830.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-10 04:57:52,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.76 | bwd_microstep: 1613.02 | bwd_inner_microstep: 1612.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1896
[2024-06-10 04:57:53,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.52 | bwd_microstep: 809.91 | bwd_inner_microstep: 809.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 04:57:55,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.46 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 04:57:57,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1486.24 | bwd_inner_microstep: 1486.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517
[2024-06-10 04:57:59,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.22 | bwd_microstep: 1580.93 | bwd_inner_microstep: 1580.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3412
[2024-06-10 04:58:01,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.91 | bwd_microstep: 1471.20 | bwd_inner_microstep: 1471.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2290
[2024-06-10 04:58:03,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.43 | bwd_microstep: 1070.04 | bwd_inner_microstep: 1070.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 04:58:05,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.87 | bwd_microstep: 1486.03 | bwd_inner_microstep: 1486.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 04:58:06,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.12 | bwd_microstep: 980.29 | bwd_inner_microstep: 980.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3607
[2024-06-10 04:58:08,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.38 | bwd_microstep: 1572.00 | bwd_inner_microstep: 1571.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3812
[2024-06-10 04:58:11,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.95 | bwd_microstep: 1823.19 | bwd_inner_microstep: 1823.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3386
[2024-06-10 04:58:13,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.14 | bwd_microstep: 1437.65 | bwd_inner_microstep: 1437.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 04:58:15,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.50 | bwd_microstep: 1405.93 | bwd_inner_microstep: 1405.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 04:58:17,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.74 | bwd_microstep: 1355.64 | bwd_inner_microstep: 1355.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774
[2024-06-10 04:58:19,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.48 | bwd_microstep: 1744.77 | bwd_inner_microstep: 1744.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 04:58:21,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 1496.49 | bwd_inner_microstep: 1496.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-10 04:58:23,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.41 | bwd_microstep: 1639.09 | bwd_inner_microstep: 1639.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2184
[2024-06-10 04:58:25,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.27 | bwd_microstep: 856.86 | bwd_inner_microstep: 856.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3580
[2024-06-10 04:58:27,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 04:58:27,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.45 | bwd_microstep: 2107.53 | bwd_inner_microstep: 1343.30 | bwd_allreduce_microstep: 764.18 | step_microstep: 38.44
[2024-06-10 04:58:27,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16322.69 | bwd: 44716.92 | bwd_inner: 43951.81 | bwd_allreduce: 764.41 | step: 40.00


 14%|█▍        | 241/1726 [4:14:33<25:06:57, 60.89s/it]
 14%|█▍        | 242/1726 [4:15:34<25:07:17, 60.94s/it]
 14%|█▍        | 243/1726 [4:16:36<25:12:33, 61.20s/it]
 14%|█▍        | 244/1726 [4:17:36<25:02:59, 60.85s/it]
 14%|█▍        | 245/1726 [4:19:02<28:04:42, 68.25s/it]
 14%|█▍        | 246/1726 [4:20:03<27:09:34, 66.06s/it]
 14%|█▍        | 247/1726 [4:21:04<26:33:55,
{'loss': 1.3099, 'learning_rate': 3.867564086121519e-05, 'epoch': 0.14}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 04:58:29,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1481.69 | bwd_inner_microstep: 1481.63 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 04:58:31,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.87 | bwd_microstep: 1281.06 | bwd_inner_microstep: 1281.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-10 04:58:33,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.72 | bwd_microstep: 1314.83 | bwd_inner_microstep: 1314.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-10 04:58:35,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1486.17 | bwd_inner_microstep: 1486.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3410
[2024-06-10 04:58:36,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.25 | bwd_microstep: 1150.36 | bwd_inner_microstep: 1150.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 04:58:38,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.74 | bwd_microstep: 1386.98 | bwd_inner_microstep: 1386.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 04:58:40,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.31 | bwd_microstep: 1393.80 | bwd_inner_microstep: 1393.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 04:58:41,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.30 | bwd_microstep: 801.09 | bwd_inner_microstep: 801.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 04:58:43,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1350.51 | bwd_inner_microstep: 1350.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 04:58:45,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.58 | bwd_microstep: 1249.08 | bwd_inner_microstep: 1249.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 04:58:47,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1279.04 | bwd_inner_microstep: 1279.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3673
[2024-06-10 04:58:49,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.66 | bwd_microstep: 1771.42 | bwd_inner_microstep: 1771.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-10 04:58:51,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1577.63 | bwd_inner_microstep: 1577.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 04:58:53,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.99 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3651
[2024-06-10 04:58:56,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.17 | bwd_microstep: 1714.82 | bwd_inner_microstep: 1714.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3647
[2024-06-10 04:58:58,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.11 | bwd_microstep: 1713.36 | bwd_inner_microstep: 1713.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1927
[2024-06-10 04:58:59,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.96 | bwd_microstep: 726.12 | bwd_inner_microstep: 726.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 04:59:01,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1288.96 | bwd_inner_microstep: 1288.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 04:59:03,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.19 | bwd_microstep: 1660.45 | bwd_inner_microstep: 1660.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-10 04:59:05,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.45 | bwd_microstep: 1427.81 | bwd_inner_microstep: 1427.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2003
[2024-06-10 04:59:06,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.03 | bwd_microstep: 710.90 | bwd_inner_microstep: 710.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2015
[2024-06-10 04:59:07,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.14 | bwd_microstep: 744.45 | bwd_inner_microstep: 744.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2047
[2024-06-10 04:59:08,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.56 | bwd_microstep: 813.75 | bwd_inner_microstep: 813.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 04:59:09,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.22 | bwd_microstep: 880.32 | bwd_inner_microstep: 880.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 04:59:12,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.26 | bwd_microstep: 1560.12 | bwd_inner_microstep: 1560.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 905
[2024-06-10 04:59:12,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 137.56 | bwd_microstep: 346.79 | bwd_inner_microstep: 346.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2220
[2024-06-10 04:59:13,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.86 | bwd_microstep: 944.70 | bwd_inner_microstep: 944.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-10 04:59:16,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.96 | bwd_microstep: 1594.54 | bwd_inner_microstep: 1594.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 04:59:18,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.33 | bwd_microstep: 1473.27 | bwd_inner_microstep: 1473.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3569
[2024-06-10 04:59:20,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.73 | bwd_microstep: 1540.82 | bwd_inner_microstep: 1540.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 04:59:22,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.30 | bwd_microstep: 1459.26 | bwd_inner_microstep: 1459.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3806
[2024-06-10 04:59:29,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.94 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 04:59:29,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.71 | bwd_microstep: 6418.13 | bwd_inner_microstep: 1849.21 | bwd_allreduce_microstep: 4568.86 | step_microstep: 38.98
[2024-06-10 04:59:29,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15378.98 | bwd: 45887.96 | bwd_inner: 41318.13 | bwd_allreduce: 4569.12 | step: 40.67
{'loss': 1.3131, 'learning_rate': 3.8662176742726706e-05, 'epoch': 0.14}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 04:59:31,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.91 | bwd_microstep: 1468.60 | bwd_inner_microstep: 1468.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2953
[2024-06-10 04:59:33,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.45 | bwd_microstep: 1266.18 | bwd_inner_microstep: 1266.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841
[2024-06-10 04:59:35,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.06 | bwd_microstep: 1660.09 | bwd_inner_microstep: 1660.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 04:59:37,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 04:59:38,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 04:59:40,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.04 | bwd_microstep: 1480.38 | bwd_inner_microstep: 1480.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 04:59:42,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.84 | bwd_microstep: 1386.10 | bwd_inner_microstep: 1386.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 04:59:44,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.39 | bwd_microstep: 1298.97 | bwd_inner_microstep: 1298.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-10 04:59:46,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.25 | bwd_microstep: 1155.37 | bwd_inner_microstep: 1155.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 04:59:47,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.04 | bwd_microstep: 1151.07 | bwd_inner_microstep: 1151.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 04:59:49,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1392.46 | bwd_inner_microstep: 1392.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 04:59:51,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1389.89 | bwd_inner_microstep: 1389.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2895
[2024-06-10 04:59:53,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.76 | bwd_microstep: 999.31 | bwd_inner_microstep: 999.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1984
[2024-06-10 04:59:54,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.17 | bwd_microstep: 833.48 | bwd_inner_microstep: 833.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510
[2024-06-10 04:59:56,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1412.46 | bwd_inner_microstep: 1412.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 04:59:58,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1492.71 | bwd_inner_microstep: 1492.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2130
[2024-06-10 04:59:59,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.24 | bwd_microstep: 993.81 | bwd_inner_microstep: 993.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2080
[2024-06-10 05:00:00,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.31 | bwd_microstep: 791.20 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498
[2024-06-10 05:00:02,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1418.03 | bwd_inner_microstep: 1418.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 05:00:04,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.29 | bwd_microstep: 1403.91 | bwd_inner_microstep: 1403.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3761
[2024-06-10 05:00:06,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.16 | bwd_microstep: 1346.00 | bwd_inner_microstep: 1345.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922
[2024-06-10 05:00:07,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.04 | bwd_microstep: 697.27 | bwd_inner_microstep: 697.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3695
[2024-06-10 05:00:09,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1333.55 | bwd_inner_microstep: 1333.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3785
[2024-06-10 05:00:11,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1509.57 | bwd_inner_microstep: 1509.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542
[2024-06-10 05:00:13,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.44 | bwd_microstep: 1331.06 | bwd_inner_microstep: 1331.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 05:00:15,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1377.05 | bwd_inner_microstep: 1377.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-10 05:00:17,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.37 | bwd_microstep: 1422.38 | bwd_inner_microstep: 1422.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 05:00:19,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1394.10 | bwd_inner_microstep: 1394.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3776
[2024-06-10 05:00:21,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.99 | bwd_microstep: 1616.38 | bwd_inner_microstep: 1616.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812
[2024-06-10 05:00:23,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.43 | bwd_microstep: 1624.17 | bwd_inner_microstep: 1624.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3802
[2024-06-10 05:00:25,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.15 | bwd_microstep: 1619.67 | bwd_inner_microstep: 1619.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 05:00:29,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 05:00:29,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.31 | bwd_microstep: 3264.22 | bwd_inner_microstep: 1651.50 | bwd_allreduce_microstep: 1612.67 | step_microstep: 41.09
[2024-06-10 05:00:29,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15881.65 | bwd: 44064.72 | bwd_inner: 42451.14 | bwd_allreduce: 1612.90 | step: 42.74
{'loss': 1.302, 'learning_rate': 3.864864689614875e-05, 'epoch': 0.14}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 05:00:31,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.97 | bwd_microstep: 1475.62 | bwd_inner_microstep: 1475.42 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1481
[2024-06-10 05:00:32,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 193.80 | bwd_microstep: 489.15 | bwd_inner_microstep: 489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1862
[2024-06-10 05:00:33,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.66 | bwd_microstep: 739.08 | bwd_inner_microstep: 739.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 05:00:35,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-10 05:00:37,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.83 | bwd_microstep: 1444.49 | bwd_inner_microstep: 1444.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 05:00:39,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.56 | bwd_microstep: 1389.76 | bwd_inner_microstep: 1389.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 05:00:41,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.50 | bwd_microstep: 1431.12 | bwd_inner_microstep: 1431.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 05:00:43,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.10 | bwd_microstep: 1633.48 | bwd_inner_microstep: 1633.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 05:00:45,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.04 | bwd_microstep: 1385.13 | bwd_inner_microstep: 1385.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1947
[2024-06-10 05:00:46,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.57 | bwd_microstep: 762.26 | bwd_inner_microstep: 762.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2652
[2024-06-10 05:00:48,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.97 | bwd_microstep: 1216.38 | bwd_inner_microstep: 1216.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2160
[2024-06-10 05:00:49,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.19 | bwd_microstep: 853.10 | bwd_inner_microstep: 853.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 05:00:51,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.38 | bwd_microstep: 1325.58 | bwd_inner_microstep: 1325.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2633
[2024-06-10 05:00:52,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.96 | bwd_microstep: 1016.40 | bwd_inner_microstep: 1016.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1992
[2024-06-10 05:00:53,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.06 | bwd_microstep: 830.11 | bwd_inner_microstep: 830.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3702
[2024-06-10 05:00:56,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.54 | bwd_microstep: 1728.78 | bwd_inner_microstep: 1728.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657
[2024-06-10 05:00:58,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.42 | bwd_microstep: 1613.75 | bwd_inner_microstep: 1613.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3646
[2024-06-10 05:01:00,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.25 | bwd_microstep: 1664.37 | bwd_inner_microstep: 1664.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-10 05:01:02,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.36 | bwd_microstep: 1528.94 | bwd_inner_microstep: 1528.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3834
[2024-06-10 05:01:04,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1391.34 | bwd_inner_microstep: 1391.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 05:01:06,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.30 | bwd_microstep: 1514.80 | bwd_inner_microstep: 1514.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2002
[2024-06-10 05:01:07,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.22 | bwd_microstep: 740.12 | bwd_inner_microstep: 740.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 05:01:09,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | bwd_microstep: 1379.12 | bwd_inner_microstep: 1379.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822
[2024-06-10 05:01:11,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.46 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-10 05:01:13,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.28 | bwd_microstep: 1172.62 | bwd_inner_microstep: 1172.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665
[2024-06-10 05:01:15,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1327.21 | bwd_inner_microstep: 1327.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3683
[2024-06-10 05:01:16,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.54 | bwd_microstep: 1330.94 | bwd_inner_microstep: 1330.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2705
[2024-06-10 05:01:18,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 434.23 | bwd_microstep: 1167.48 | bwd_inner_microstep: 1167.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2281
[2024-06-10 05:01:19,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.91 | bwd_microstep: 961.49 | bwd_inner_microstep: 961.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2263
[2024-06-10 05:01:21,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.16 | bwd_microstep: 977.84 | bwd_inner_microstep: 977.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3607
[2024-06-10 05:01:23,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.99 | bwd_microstep: 1432.90 | bwd_inner_microstep: 1432.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 05:01:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.56
[2024-06-10 05:01:32,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.11 | bwd_microstep: 8527.46 | bwd_inner_microstep: 1811.97 | bwd_allreduce_microstep: 6715.44 | step_microstep: 38.75
[2024-06-10 05:01:32,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15135.01 | bwd: 47322.05 | bwd_inner: 40605.53 | bwd_allreduce: 6715.75 | step: 40.45
{'loss': 1.3133, 'learning_rate': 3.863505136913337e-05, 'epoch': 0.14}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 05:01:34,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1492.57 | bwd_inner_microstep: 1492.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863
[2024-06-10 05:01:35,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.53 | bwd_microstep: 676.13 | bwd_inner_microstep: 676.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3861
[2024-06-10 05:01:37,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 1423.78 | bwd_inner_microstep: 1423.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2351
[2024-06-10 05:01:38,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.16 | bwd_microstep: 984.52 | bwd_inner_microstep: 984.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 05:01:40,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1543.61 | bwd_inner_microstep: 1543.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2205
[2024-06-10 05:01:42,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.68 | bwd_microstep: 888.73 | bwd_inner_microstep: 888.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 05:01:43,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.57 | bwd_microstep: 790.34 | bwd_inner_microstep: 790.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 05:01:45,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1343.11 | bwd_inner_microstep: 1343.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-10 05:01:46,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.78 | bwd_microstep: 702.62 | bwd_inner_microstep: 702.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 05:01:47,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.61 | bwd_microstep: 1387.77 | bwd_inner_microstep: 1387.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 05:01:49,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.45 | bwd_microstep: 1302.41 | bwd_inner_microstep: 1302.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3666
[2024-06-10 05:01:51,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.95 | bwd_microstep: 1588.03 | bwd_inner_microstep: 1588.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3670
[2024-06-10 05:01:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.02 | bwd_microstep: 1789.28 | bwd_inner_microstep: 1789.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 05:01:56,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.58 | bwd_microstep: 1251.41 | bwd_inner_microstep: 1251.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3615
[2024-06-10 05:01:58,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.64 | bwd_microstep: 1472.68 | bwd_inner_microstep: 1472.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 05:02:00,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.69 | bwd_microstep: 1493.37 | bwd_inner_microstep: 1493.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 05:02:02,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1386.84 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 05:02:04,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.84 | bwd_microstep: 1428.44 | bwd_inner_microstep: 1428.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3705
[2024-06-10 05:02:06,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.60 | bwd_microstep: 1728.92 | bwd_inner_microstep: 1728.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1967
[2024-06-10 05:02:07,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.00 | bwd_microstep: 734.77 | bwd_inner_microstep: 734.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2000
[2024-06-10 05:02:08,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.92 | bwd_microstep: 772.19 | bwd_inner_microstep: 772.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 05:02:10,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.45 | bwd_microstep: 1351.78 | bwd_inner_microstep: 1351.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942
[2024-06-10 05:02:11,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.27 | bwd_microstep: 698.88 | bwd_inner_microstep: 698.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 05:02:13,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.54 | bwd_microstep: 1292.78 | bwd_inner_microstep: 1292.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 05:02:15,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1400.96 | bwd_inner_microstep: 1400.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 05:02:17,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1466.14 | bwd_inner_microstep: 1466.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 05:02:19,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.45 | bwd_microstep: 1563.39 | bwd_inner_microstep: 1563.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 05:02:21,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1406.09 | bwd_inner_microstep: 1406.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3541
[2024-06-10 05:02:23,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1451.31 | bwd_inner_microstep: 1451.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 05:02:25,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.97 | bwd_microstep: 1644.26 | bwd_inner_microstep: 1644.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 05:02:27,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.08 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270
[2024-06-10 05:02:32,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 05:02:32,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.41 | bwd_microstep: 4960.17 | bwd_inner_microstep: 992.43 | bwd_allreduce_microstep: 3967.69 | step_microstep: 38.73
[2024-06-10 05:02:32,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15265.91 | bwd: 44797.88 | bwd_inner: 40829.28 | bwd_allreduce: 3967.93 | step: 40.34
{'loss': 1.4019, 'learning_rate': 3.862139020956395e-05, 'epoch': 0.15}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 05:02:34,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.90 | bwd_microstep: 1442.23 | bwd_inner_microstep: 1442.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450
[2024-06-10 05:02:36,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.59 | bwd_microstep: 1415.20 | bwd_inner_microstep: 1415.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 05:02:38,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.46 | bwd_microstep: 1552.06 | bwd_inner_microstep: 1552.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 05:02:40,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1493.89 | bwd_inner_microstep: 1493.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 05:02:42,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1344.62 | bwd_inner_microstep: 1344.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2222
[2024-06-10 05:02:44,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.23 | bwd_microstep: 863.11 | bwd_inner_microstep: 863.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 05:02:45,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1382.88 | bwd_inner_microstep: 1382.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 05:02:47,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.62 | bwd_microstep: 1359.65 | bwd_inner_microstep: 1359.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 05:02:49,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.47 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 05:02:51,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1312.55 | bwd_inner_microstep: 1312.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3611
[2024-06-10 05:02:53,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 1539.14 | bwd_inner_microstep: 1539.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 05:02:55,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1487.23 | bwd_inner_microstep: 1487.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2718
[2024-06-10 05:02:57,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.38 | bwd_microstep: 1194.63 | bwd_inner_microstep: 1194.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3627
[2024-06-10 05:02:59,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1709.24 | bwd_inner_microstep: 1709.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428
[2024-06-10 05:03:01,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1155.18 | bwd_inner_microstep: 1155.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 950
[2024-06-10 05:03:01,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 147.86 | bwd_microstep: 382.21 | bwd_inner_microstep: 382.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 05:03:03,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1392.93 | bwd_inner_microstep: 1392.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1970
[2024-06-10 05:03:04,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.33 | bwd_microstep: 703.94 | bwd_inner_microstep: 703.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 05:03:06,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.12 | bwd_microstep: 1282.12 | bwd_inner_microstep: 1282.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3002
[2024-06-10 05:03:07,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.98 | bwd_microstep: 1111.46 | bwd_inner_microstep: 1111.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 05:03:09,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.43 | bwd_microstep: 793.66 | bwd_inner_microstep: 793.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 05:03:11,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1556.17 | bwd_inner_microstep: 1556.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 05:03:12,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.61 | bwd_microstep: 800.30 | bwd_inner_microstep: 800.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 05:03:14,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1375.89 | bwd_inner_microstep: 1375.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 05:03:15,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.76 | bwd_microstep: 1260.59 | bwd_inner_microstep: 1260.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3606
[2024-06-10 05:03:18,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.38 | bwd_microstep: 1543.28 | bwd_inner_microstep: 1543.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3615
[2024-06-10 05:03:20,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 1549.74 | bwd_inner_microstep: 1549.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 05:03:22,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.12 | bwd_microstep: 1298.96 | bwd_inner_microstep: 1298.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829
[2024-06-10 05:03:24,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.98 | bwd_microstep: 1758.83 | bwd_inner_microstep: 1758.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 05:03:26,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1280.64 | bwd_inner_microstep: 1280.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 05:03:28,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1494.76 | bwd_inner_microstep: 1494.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3379
[2024-06-10 05:03:32,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 05:03:32,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.56 | bwd_microstep: 3527.64 | bwd_inner_microstep: 1443.92 | bwd_allreduce_microstep: 2083.67 | step_microstep: 38.65
[2024-06-10 05:03:32,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15517.69 | bwd: 43648.39 | bwd_inner: 41563.81 | bwd_allreduce: 2083.90 | step: 40.27
{'loss': 1.3044, 'learning_rate': 3.860766346555501e-05, 'epoch': 0.15}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 05:03:34,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1245.07 | bwd_inner_microstep: 1245.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 05:03:35,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.39 | bwd_microstep: 1309.37 | bwd_inner_microstep: 1309.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-10 05:03:38,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.29 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 05:03:39,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.28 | bwd_microstep: 793.28 | bwd_inner_microstep: 793.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3479
[2024-06-10 05:03:40,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.20 | bwd_microstep: 1331.81 | bwd_inner_microstep: 1331.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 05:03:42,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.41 | bwd_microstep: 1185.06 | bwd_inner_microstep: 1185.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3754
[2024-06-10 05:03:44,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1403.21 | bwd_inner_microstep: 1403.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974
[2024-06-10 05:03:45,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.15 | bwd_microstep: 766.68 | bwd_inner_microstep: 766.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697
[2024-06-10 05:03:47,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.36 | bwd_microstep: 1433.96 | bwd_inner_microstep: 1433.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 05:03:49,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.18 | bwd_microstep: 1248.32 | bwd_inner_microstep: 1248.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 05:03:51,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.11 | bwd_microstep: 1343.48 | bwd_inner_microstep: 1343.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3831
[2024-06-10 05:03:53,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.49 | bwd_microstep: 1490.14 | bwd_inner_microstep: 1490.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3496
[2024-06-10 05:03:55,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.99 | bwd_microstep: 1435.60 | bwd_inner_microstep: 1435.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 05:03:57,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.73 | bwd_microstep: 1486.31 | bwd_inner_microstep: 1486.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427
[2024-06-10 05:03:59,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.92 | bwd_microstep: 1452.74 | bwd_inner_microstep: 1452.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2650
[2024-06-10 05:04:00,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.04 | bwd_microstep: 1115.16 | bwd_inner_microstep: 1115.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 05:04:02,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.94 | bwd_microstep: 1520.17 | bwd_inner_microstep: 1520.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-10 05:04:05,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.72 | bwd_microstep: 1624.74 | bwd_inner_microstep: 1624.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638
[2024-06-10 05:04:07,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.67 | bwd_microstep: 1605.08 | bwd_inner_microstep: 1605.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2066
[2024-06-10 05:04:08,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.18 | bwd_microstep: 948.31 | bwd_inner_microstep: 948.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3488
[2024-06-10 05:04:10,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.98 | bwd_microstep: 1595.29 | bwd_inner_microstep: 1595.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807
[2024-06-10 05:04:13,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.86 | bwd_microstep: 1756.25 | bwd_inner_microstep: 1756.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2142
[2024-06-10 05:04:14,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.12 | bwd_microstep: 931.08 | bwd_inner_microstep: 931.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 05:04:15,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.20 | bwd_microstep: 803.89 | bwd_inner_microstep: 803.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-10 05:04:17,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1432.20 | bwd_inner_microstep: 1432.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555
[2024-06-10 05:04:19,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.60 | bwd_microstep: 1302.02 | bwd_inner_microstep: 1301.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-10 05:04:21,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.91 | bwd_microstep: 1626.64 | bwd_inner_microstep: 1626.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2223
[2024-06-10 05:04:23,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.73 | bwd_microstep: 961.91 | bwd_inner_microstep: 961.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3767
[2024-06-10 05:04:25,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.20 | bwd_microstep: 1573.61 | bwd_inner_microstep: 1573.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 05:04:27,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1398.42 | bwd_inner_microstep: 1398.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-10 05:04:29,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.67 | bwd_microstep: 1536.32 | bwd_inner_microstep: 1536.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 05:04:33,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.63
[2024-06-10 05:04:33,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.75 | bwd_microstep: 3771.34 | bwd_inner_microstep: 1618.22 | bwd_allreduce_microstep: 2153.07 | step_microstep: 38.76
[2024-06-10 05:04:33,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15902.35 | bwd: 44986.46 | bwd_inner: 42832.48 | bwd_allreduce: 2153.29 | step: 40.40
 14%|█▍        | 247/1726 [4:21:04<26:33:55, 64.66s/it]
 14%|█▍        | 248/1726 [4:22:06<26:10:21, 63.75s/it]
 14%|█▍        | 249/1726 [4:23:06<25:43:47, 62.71s/it]
 14%|█▍        | 250/1726 [4:24:09<25:43:25, 62.74s/it]
 15%|█▍        | 251/1726 [4:25:09<25:25:09, 62.04s/it]
 15%|█▍        | 252/1726 [4:26:09<25:05:28, 61.28s/it]
{'loss': 1.3452, 'learning_rate': 3.8593871185452074e-05, 'epoch': 0.15}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2088
[2024-06-10 05:04:34,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.59 | bwd_microstep: 920.13 | bwd_inner_microstep: 920.02 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 05:04:36,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1492.81 | bwd_inner_microstep: 1492.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-10 05:04:38,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.72 | bwd_microstep: 1422.96 | bwd_inner_microstep: 1422.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 05:04:40,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1384.11 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 05:04:41,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.20 | bwd_microstep: 679.52 | bwd_inner_microstep: 679.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 05:04:43,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.14 | bwd_microstep: 1219.54 | bwd_inner_microstep: 1219.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3715
[2024-06-10 05:04:45,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.67 | bwd_microstep: 1463.24 | bwd_inner_microstep: 1463.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884
[2024-06-10 05:04:46,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.05 | bwd_microstep: 712.81 | bwd_inner_microstep: 712.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 05:04:48,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.07 | bwd_microstep: 1389.35 | bwd_inner_microstep: 1389.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497
[2024-06-10 05:04:50,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1416.44 | bwd_inner_microstep: 1416.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3488
[2024-06-10 05:04:52,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.57 | bwd_microstep: 1576.22 | bwd_inner_microstep: 1576.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 05:04:54,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 05:04:56,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.95 | bwd_microstep: 1345.29 | bwd_inner_microstep: 1345.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-10 05:04:58,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.87 | bwd_microstep: 1587.92 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-10 05:05:00,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.70 | bwd_microstep: 1580.24 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843
[2024-06-10 05:05:02,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.45 | bwd_microstep: 1457.29 | bwd_inner_microstep: 1457.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526
[2024-06-10 05:05:04,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.67 | bwd_microstep: 1199.69 | bwd_inner_microstep: 1199.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 05:05:06,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.85 | bwd_microstep: 1490.14 | bwd_inner_microstep: 1490.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 05:05:08,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.32 | bwd_microstep: 1454.05 | bwd_inner_microstep: 1454.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 05:05:10,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.18 | bwd_microstep: 1560.50 | bwd_inner_microstep: 1560.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 05:05:12,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.56 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1918
[2024-06-10 05:05:13,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.82 | bwd_microstep: 720.08 | bwd_inner_microstep: 720.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 05:05:15,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.17 | bwd_microstep: 1423.88 | bwd_inner_microstep: 1423.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1148
[2024-06-10 05:05:15,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 167.26 | bwd_microstep: 430.40 | bwd_inner_microstep: 430.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 05:05:17,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1314.08 | bwd_inner_microstep: 1314.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 05:05:19,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.62 | bwd_microstep: 1283.92 | bwd_inner_microstep: 1283.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1906
[2024-06-10 05:05:20,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.16 | bwd_microstep: 757.50 | bwd_inner_microstep: 757.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3422
[2024-06-10 05:05:22,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.18 | bwd_microstep: 1411.01 | bwd_inner_microstep: 1410.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3575
[2024-06-10 05:05:24,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1460.61 | bwd_inner_microstep: 1460.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3493
[2024-06-10 05:05:26,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.27 | bwd_microstep: 1578.19 | bwd_inner_microstep: 1578.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 05:05:28,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.83 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3429
[2024-06-10 05:05:37,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 05:05:37,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 8121.91 | bwd_inner_microstep: 1566.94 | bwd_allreduce_microstep: 6554.93 | step_microstep: 38.75
[2024-06-10 05:05:37,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15391.41 | bwd: 47767.68 | bwd_inner: 41211.74 | bwd_allreduce: 6555.20 | step: 40.41
{'loss': 1.3368, 'learning_rate': 3.858001341783149e-05, 'epoch': 0.15}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637
[2024-06-10 05:05:39,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1502.50 | bwd_inner_microstep: 1502.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 05:05:40,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1277.25 | bwd_inner_microstep: 1277.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3470
[2024-06-10 05:05:42,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.75 | bwd_microstep: 1439.39 | bwd_inner_microstep: 1439.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 05:05:44,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.00 | bwd_microstep: 1381.62 | bwd_inner_microstep: 1381.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 05:05:46,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.01 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3837
[2024-06-10 05:05:48,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.99 | bwd_microstep: 1629.86 | bwd_inner_microstep: 1629.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091
[2024-06-10 05:05:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.58 | bwd_microstep: 822.87 | bwd_inner_microstep: 822.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-10 05:05:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.59 | bwd_microstep: 1529.09 | bwd_inner_microstep: 1529.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 05:05:54,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.44 | bwd_microstep: 1484.78 | bwd_inner_microstep: 1484.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 05:05:56,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1346.17 | bwd_inner_microstep: 1346.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 05:05:58,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.92 | bwd_microstep: 1617.15 | bwd_inner_microstep: 1617.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 05:06:00,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.21 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1864
[2024-06-10 05:06:01,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.15 | bwd_microstep: 707.94 | bwd_inner_microstep: 707.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3707
[2024-06-10 05:06:03,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.29 | bwd_microstep: 1724.61 | bwd_inner_microstep: 1724.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793
[2024-06-10 05:06:05,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.74 | bwd_microstep: 1643.11 | bwd_inner_microstep: 1643.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 05:06:07,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1389.56 | bwd_inner_microstep: 1389.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3670
[2024-06-10 05:06:09,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.21 | bwd_microstep: 1659.90 | bwd_inner_microstep: 1659.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-10 05:06:11,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1420.11 | bwd_inner_microstep: 1420.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3633
[2024-06-10 05:06:13,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.47 | bwd_microstep: 1473.34 | bwd_inner_microstep: 1473.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3103
[2024-06-10 05:06:15,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1249.90 | bwd_inner_microstep: 1249.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 05:06:17,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.51 | bwd_microstep: 1496.17 | bwd_inner_microstep: 1496.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 05:06:19,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.96 | bwd_microstep: 1504.40 | bwd_inner_microstep: 1504.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-10 05:06:21,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.33 | bwd_microstep: 1360.12 | bwd_inner_microstep: 1360.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865
[2024-06-10 05:06:23,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.59 | bwd_microstep: 1479.55 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046
[2024-06-10 05:06:24,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.30 | bwd_microstep: 910.90 | bwd_inner_microstep: 910.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-10 05:06:26,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.57 | bwd_microstep: 1186.91 | bwd_inner_microstep: 1186.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 05:06:28,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.36 | bwd_microstep: 1658.35 | bwd_inner_microstep: 1658.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 05:06:30,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.29 | bwd_microstep: 1284.43 | bwd_inner_microstep: 1284.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3719
[2024-06-10 05:06:32,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.11 | bwd_microstep: 1300.44 | bwd_inner_microstep: 1300.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767
[2024-06-10 05:06:34,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.39 | bwd_microstep: 1533.65 | bwd_inner_microstep: 1533.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 05:06:36,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1394.76 | bwd_inner_microstep: 1394.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 05:06:40,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 05:06:40,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 3478.61 | bwd_inner_microstep: 1554.67 | bwd_allreduce_microstep: 1923.88 | step_microstep: 38.86
[2024-06-10 05:06:40,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16630.14 | bwd: 46553.38 | bwd_inner: 44628.58 | bwd_allreduce: 1924.11 | step: 40.63
{'loss': 1.3071, 'learning_rate': 3.856609021150022e-05, 'epoch': 0.15}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 05:06:42,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1340.39 | bwd_inner_microstep: 1340.27 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 05:06:44,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.65 | bwd_microstep: 1349.37 | bwd_inner_microstep: 1349.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3914
[2024-06-10 05:06:46,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.78 | bwd_microstep: 1691.53 | bwd_inner_microstep: 1691.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 05:06:48,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.18 | bwd_microstep: 1295.37 | bwd_inner_microstep: 1295.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3742
[2024-06-10 05:06:50,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.33 | bwd_microstep: 1534.59 | bwd_inner_microstep: 1534.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 05:06:52,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1382.50 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 05:06:54,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.85 | bwd_microstep: 1342.57 | bwd_inner_microstep: 1342.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 05:06:56,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 05:06:58,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.18 | bwd_microstep: 1250.06 | bwd_inner_microstep: 1250.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3197
[2024-06-10 05:06:59,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.40 | bwd_microstep: 1170.94 | bwd_inner_microstep: 1170.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3483
[2024-06-10 05:07:01,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.58 | bwd_microstep: 1444.37 | bwd_inner_microstep: 1444.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3705
[2024-06-10 05:07:03,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.18 | bwd_microstep: 1725.22 | bwd_inner_microstep: 1725.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3668
[2024-06-10 05:07:06,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.17 | bwd_microstep: 1719.46 | bwd_inner_microstep: 1719.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 05:07:08,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1486.61 | bwd_inner_microstep: 1486.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2122
[2024-06-10 05:07:09,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.44 | bwd_microstep: 827.33 | bwd_inner_microstep: 827.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427
[2024-06-10 05:07:11,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.79 | bwd_microstep: 1284.67 | bwd_inner_microstep: 1284.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-10 05:07:13,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.51 | bwd_microstep: 1430.64 | bwd_inner_microstep: 1430.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 05:07:15,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.04 | bwd_microstep: 1387.65 | bwd_inner_microstep: 1387.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3825
[2024-06-10 05:07:17,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1482.24 | bwd_inner_microstep: 1482.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 05:07:19,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1511.34 | bwd_inner_microstep: 1511.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 05:07:21,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1460.70 | bwd_inner_microstep: 1460.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612
[2024-06-10 05:07:23,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.52 | bwd_microstep: 1312.96 | bwd_inner_microstep: 1312.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 05:07:24,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1250.07 | bwd_inner_microstep: 1250.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 05:07:26,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.70 | bwd_microstep: 879.66 | bwd_inner_microstep: 879.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 05:07:28,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1475.65 | bwd_inner_microstep: 1475.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-10 05:07:30,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1414.56 | bwd_inner_microstep: 1414.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463
[2024-06-10 05:07:31,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.98 | bwd_microstep: 1316.88 | bwd_inner_microstep: 1316.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2186
[2024-06-10 05:07:33,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.95 | bwd_microstep: 956.69 | bwd_inner_microstep: 956.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2005
[2024-06-10 05:07:34,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.04 | bwd_microstep: 831.70 | bwd_inner_microstep: 831.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 05:07:36,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1380.52 | bwd_inner_microstep: 1380.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 05:07:38,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1396.67 | bwd_inner_microstep: 1396.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3786
[2024-06-10 05:07:42,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 05:07:42,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 3852.58 | bwd_inner_microstep: 1591.96 | bwd_allreduce_microstep: 2260.56 | step_microstep: 38.70
[2024-06-10 05:07:42,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16163.72 | bwd: 45571.87 | bwd_inner: 43310.30 | bwd_allreduce: 2260.86 | step: 40.31
{'loss': 1.3265, 'learning_rate': 3.8552101615495755e-05, 'epoch': 0.15}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 05:07:44,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.81 | bwd_microstep: 1466.02 | bwd_inner_microstep: 1465.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 05:07:46,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1343.96 | bwd_inner_microstep: 1343.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3886
[2024-06-10 05:07:48,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1385.65 | bwd_inner_microstep: 1385.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3793
[2024-06-10 05:07:50,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1478.43 | bwd_inner_microstep: 1478.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 05:07:52,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.98 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 05:07:54,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1386.58 | bwd_inner_microstep: 1386.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 05:07:55,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.49 | bwd_microstep: 680.29 | bwd_inner_microstep: 680.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046
[2024-06-10 05:07:56,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.99 | bwd_microstep: 810.53 | bwd_inner_microstep: 810.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759
[2024-06-10 05:07:58,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1639.59 | bwd_inner_microstep: 1639.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441
[2024-06-10 05:08:00,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.79 | bwd_microstep: 1156.78 | bwd_inner_microstep: 1156.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3479
[2024-06-10 05:08:02,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.50 | bwd_microstep: 1424.26 | bwd_inner_microstep: 1424.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 05:08:03,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.35 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522
[2024-06-10 05:08:05,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1420.75 | bwd_inner_microstep: 1420.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662
[2024-06-10 05:08:07,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.89 | bwd_microstep: 1420.06 | bwd_inner_microstep: 1420.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 05:08:09,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.05 | bwd_microstep: 1514.42 | bwd_inner_microstep: 1514.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1997
[2024-06-10 05:08:10,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.58 | bwd_microstep: 835.21 | bwd_inner_microstep: 835.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3518
[2024-06-10 05:08:12,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1420.43 | bwd_inner_microstep: 1420.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 05:08:14,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.95 | bwd_microstep: 1306.69 | bwd_inner_microstep: 1306.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3631
[2024-06-10 05:08:16,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.62 | bwd_microstep: 1251.38 | bwd_inner_microstep: 1251.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540
[2024-06-10 05:08:18,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.66 | bwd_microstep: 1329.75 | bwd_inner_microstep: 1329.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 05:08:20,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1556.27 | bwd_inner_microstep: 1556.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 05:08:22,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1509.77 | bwd_inner_microstep: 1509.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 05:08:24,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1401.84 | bwd_inner_microstep: 1401.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 05:08:26,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.90 | bwd_microstep: 1496.23 | bwd_inner_microstep: 1496.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 05:08:28,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.98 | bwd_microstep: 1663.01 | bwd_inner_microstep: 1662.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3730
[2024-06-10 05:08:30,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.20 | bwd_microstep: 1339.56 | bwd_inner_microstep: 1339.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 05:08:32,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.55 | bwd_microstep: 1255.58 | bwd_inner_microstep: 1255.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 05:08:34,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.52 | bwd_microstep: 1556.78 | bwd_inner_microstep: 1556.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 05:08:36,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.93 | bwd_microstep: 1341.75 | bwd_inner_microstep: 1341.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3554
[2024-06-10 05:08:38,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.19 | bwd_microstep: 1454.32 | bwd_inner_microstep: 1454.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 05:08:40,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.43 | bwd_microstep: 1555.39 | bwd_inner_microstep: 1555.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 05:08:45,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 05:08:45,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 4410.51 | bwd_inner_microstep: 1682.62 | bwd_allreduce_microstep: 2727.84 | step_microstep: 38.71
[2024-06-10 05:08:45,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16217.49 | bwd: 46093.78 | bwd_inner: 43364.98 | bwd_allreduce: 2728.09 | step: 40.29
{'loss': 1.2914, 'learning_rate': 3.853804767908584e-05, 'epoch': 0.15}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 05:08:47,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.40 | bwd_microstep: 1469.34 | bwd_inner_microstep: 1469.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 05:08:49,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1477.30 | bwd_inner_microstep: 1477.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412
[2024-06-10 05:08:51,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1281.28 | bwd_inner_microstep: 1281.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2320
[2024-06-10 05:08:52,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.91 | bwd_microstep: 981.35 | bwd_inner_microstep: 981.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 05:08:54,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1350.68 | bwd_inner_microstep: 1350.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 05:08:56,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.03 | bwd_microstep: 1282.13 | bwd_inner_microstep: 1282.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3723
[2024-06-10 05:08:58,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.51 | bwd_microstep: 1840.42 | bwd_inner_microstep: 1840.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 05:08:59,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.81 | bwd_microstep: 797.96 | bwd_inner_microstep: 797.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 05:09:01,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.68 | bwd_microstep: 1302.81 | bwd_inner_microstep: 1302.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 05:09:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.14 | bwd_microstep: 1391.69 | bwd_inner_microstep: 1391.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 05:09:05,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.78 | bwd_microstep: 1424.46 | bwd_inner_microstep: 1424.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 05:09:06,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.42 | bwd_microstep: 799.45 | bwd_inner_microstep: 799.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-10 05:09:08,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.69 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2993
[2024-06-10 05:09:10,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.71 | bwd_microstep: 1203.07 | bwd_inner_microstep: 1203.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 05:09:11,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1393.91 | bwd_inner_microstep: 1393.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2942
[2024-06-10 05:09:13,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.33 | bwd_microstep: 1098.19 | bwd_inner_microstep: 1098.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3902
[2024-06-10 05:09:15,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.00 | bwd_microstep: 1685.25 | bwd_inner_microstep: 1685.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-10 05:09:18,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.10 | bwd_microstep: 1577.63 | bwd_inner_microstep: 1577.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3604
[2024-06-10 05:09:19,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.67 | bwd_microstep: 1321.51 | bwd_inner_microstep: 1321.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 05:09:21,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.98 | bwd_microstep: 1283.29 | bwd_inner_microstep: 1283.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3532
[2024-06-10 05:09:23,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.44 | bwd_microstep: 1329.00 | bwd_inner_microstep: 1328.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 05:09:25,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.61 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 05:09:26,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.36 | bwd_microstep: 805.92 | bwd_inner_microstep: 805.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2274
[2024-06-10 05:09:27,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.41 | bwd_microstep: 910.08 | bwd_inner_microstep: 910.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623
[2024-06-10 05:09:29,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.52 | bwd_microstep: 1315.72 | bwd_inner_microstep: 1315.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 05:09:30,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.42 | bwd_microstep: 881.87 | bwd_inner_microstep: 881.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3641
[2024-06-10 05:09:32,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.64 | bwd_microstep: 1667.00 | bwd_inner_microstep: 1666.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3522
[2024-06-10 05:09:34,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.57 | bwd_microstep: 1340.93 | bwd_inner_microstep: 1340.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2272
[2024-06-10 05:09:36,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.58 | bwd_microstep: 934.35 | bwd_inner_microstep: 934.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3471
[2024-06-10 05:09:38,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.56 | bwd_microstep: 1533.16 | bwd_inner_microstep: 1533.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 05:09:40,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.81 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 05:09:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 05:09:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.28 | bwd_microstep: 5744.62 | bwd_inner_microstep: 1753.18 | bwd_allreduce_microstep: 3991.38 | step_microstep: 38.66
[2024-06-10 05:09:46,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15469.14 | bwd: 45445.26 | bwd_inner: 41452.93 | bwd_allreduce: 3991.62 | step: 40.30
{'loss': 1.3321, 'learning_rate': 3.852392845176837e-05, 'epoch': 0.15}
 15%|█▍        | 253/1726 [4:27:10<25:04:06, 61.27s/it]
 15%|█▍        | 254/1726 [4:28:13<25:19:30, 61.94s/it]
 15%|█▍        | 255/1726 [4:29:17<25:30:20, 62.42s/it]
 15%|█▍        | 256/1726 [4:30:19<25:26:51, 62.32s/it]
 15%|█▍        | 257/1726 [4:31:22<25:28:17, 62.42s/it]
 15%|█▍        | 258/1726 [4:32:23<25:18:46, 62.08s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459
[2024-06-10 05:09:48,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.36 | bwd_microstep: 1568.12 | bwd_inner_microstep: 1568.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 05:09:50,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.36 | bwd_microstep: 1255.19 | bwd_inner_microstep: 1255.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3861
[2024-06-10 05:09:52,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1492.65 | bwd_inner_microstep: 1492.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 05:09:54,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.73 | bwd_microstep: 1251.32 | bwd_inner_microstep: 1251.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1998
[2024-06-10 05:09:55,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.50 | bwd_microstep: 738.12 | bwd_inner_microstep: 738.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 05:09:57,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1281.98 | bwd_inner_microstep: 1281.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 05:09:59,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.09 | bwd_microstep: 1387.20 | bwd_inner_microstep: 1387.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3719
[2024-06-10 05:10:00,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.39 | bwd_microstep: 1273.70 | bwd_inner_microstep: 1273.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3755
[2024-06-10 05:10:02,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.71 | bwd_microstep: 1472.66 | bwd_inner_microstep: 1472.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 05:10:04,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.99 | bwd_microstep: 1153.73 | bwd_inner_microstep: 1153.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 05:10:06,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.80 | bwd_microstep: 1629.07 | bwd_inner_microstep: 1629.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 05:10:08,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.43 | bwd_microstep: 1384.49 | bwd_inner_microstep: 1384.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 05:10:10,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.28 | bwd_microstep: 1483.73 | bwd_inner_microstep: 1483.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3551
[2024-06-10 05:10:12,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.61 | bwd_microstep: 1234.41 | bwd_inner_microstep: 1234.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3508
[2024-06-10 05:10:14,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.38 | bwd_microstep: 1221.51 | bwd_inner_microstep: 1221.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 05:10:15,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1289.11 | bwd_inner_microstep: 1289.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2411
[2024-06-10 05:10:17,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.94 | bwd_microstep: 909.79 | bwd_inner_microstep: 909.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000
[2024-06-10 05:10:18,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.70 | bwd_microstep: 804.83 | bwd_inner_microstep: 804.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 05:10:19,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.20 | bwd_microstep: 1158.75 | bwd_inner_microstep: 1158.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2160
[2024-06-10 05:10:20,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.34 | bwd_microstep: 759.77 | bwd_inner_microstep: 759.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 05:10:22,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.97 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2126
[2024-06-10 05:10:23,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.92 | bwd_microstep: 832.75 | bwd_inner_microstep: 832.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2470
[2024-06-10 05:10:25,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.77 | bwd_microstep: 955.16 | bwd_inner_microstep: 955.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-10 05:10:26,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.54 | bwd_microstep: 730.50 | bwd_inner_microstep: 730.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-10 05:10:28,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.18 | bwd_microstep: 1513.17 | bwd_inner_microstep: 1513.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 05:10:30,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.08 | bwd_microstep: 1280.91 | bwd_inner_microstep: 1280.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1908
[2024-06-10 05:10:31,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.43 | bwd_microstep: 780.39 | bwd_inner_microstep: 780.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 05:10:32,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.92 | bwd_microstep: 804.87 | bwd_inner_microstep: 804.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-10 05:10:34,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.65 | bwd_microstep: 1603.33 | bwd_inner_microstep: 1603.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 05:10:36,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.76 | bwd_microstep: 1358.70 | bwd_inner_microstep: 1358.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650
[2024-06-10 05:10:38,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.74 | bwd_microstep: 1612.67 | bwd_inner_microstep: 1612.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-10 05:10:46,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 05:10:46,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.69 | bwd_microstep: 7624.04 | bwd_inner_microstep: 1811.54 | bwd_allreduce_microstep: 5812.45 | step_microstep: 38.69
[2024-06-10 05:10:46,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14741.75 | bwd: 45133.58 | bwd_inner: 39320.23 | bwd_allreduce: 5812.68 | step: 40.43
{'loss': 1.2924, 'learning_rate': 3.8509743983271196e-05, 'epoch': 0.15}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 05:10:48,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1470.59 | bwd_inner_microstep: 1470.51 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2380
[2024-06-10 05:10:50,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.18 | bwd_microstep: 1059.14 | bwd_inner_microstep: 1059.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 05:10:52,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.66 | bwd_microstep: 1340.94 | bwd_inner_microstep: 1340.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 05:10:54,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.71 | bwd_microstep: 1483.69 | bwd_inner_microstep: 1483.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 05:10:56,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.72 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 05:10:57,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.03 | bwd_microstep: 1151.07 | bwd_inner_microstep: 1151.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 05:10:59,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | bwd_microstep: 1385.55 | bwd_inner_microstep: 1385.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 05:11:01,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.20 | bwd_microstep: 1499.04 | bwd_inner_microstep: 1499.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 05:11:03,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1286.26 | bwd_inner_microstep: 1286.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958
[2024-06-10 05:11:04,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.26 | bwd_microstep: 826.72 | bwd_inner_microstep: 826.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2295
[2024-06-10 05:11:05,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.55 | bwd_microstep: 1007.98 | bwd_inner_microstep: 1007.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 05:11:07,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1446.05 | bwd_inner_microstep: 1446.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977
[2024-06-10 05:11:09,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.89 | bwd_microstep: 830.62 | bwd_inner_microstep: 830.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 05:11:11,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.26 | bwd_microstep: 1477.22 | bwd_inner_microstep: 1477.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 05:11:13,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1408.50 | bwd_inner_microstep: 1408.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1965
[2024-06-10 05:11:14,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.24 | bwd_microstep: 747.80 | bwd_inner_microstep: 747.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3552
[2024-06-10 05:11:15,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1279.84 | bwd_inner_microstep: 1279.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 05:11:18,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1608.09 | bwd_inner_microstep: 1608.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 05:11:20,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.99 | bwd_microstep: 1418.17 | bwd_inner_microstep: 1418.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3552
[2024-06-10 05:11:21,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.84 | bwd_microstep: 1204.23 | bwd_inner_microstep: 1204.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-10 05:11:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.82 | bwd_microstep: 811.42 | bwd_inner_microstep: 811.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 05:11:24,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1412.58 | bwd_inner_microstep: 1412.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-10 05:11:26,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1550.83 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822
[2024-06-10 05:11:29,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.77 | bwd_microstep: 1519.15 | bwd_inner_microstep: 1519.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 05:11:31,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.15 | bwd_microstep: 1556.60 | bwd_inner_microstep: 1556.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3822
[2024-06-10 05:11:33,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1418.66 | bwd_inner_microstep: 1418.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3541
[2024-06-10 05:11:34,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.73 | bwd_microstep: 1260.89 | bwd_inner_microstep: 1260.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2242
[2024-06-10 05:11:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.88 | bwd_microstep: 1062.04 | bwd_inner_microstep: 1062.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279
[2024-06-10 05:11:37,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.72 | bwd_microstep: 974.59 | bwd_inner_microstep: 974.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2085
[2024-06-10 05:11:38,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.77 | bwd_microstep: 765.84 | bwd_inner_microstep: 765.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 05:11:40,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.38 | bwd_microstep: 977.26 | bwd_inner_microstep: 977.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3598
[2024-06-10 05:11:49,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 05:11:49,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.09 | bwd_microstep: 8629.26 | bwd_inner_microstep: 1641.11 | bwd_allreduce_microstep: 6988.09 | step_microstep: 38.81
[2024-06-10 05:11:49,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15015.66 | bwd: 47155.51 | bwd_inner: 40166.45 | bwd_allreduce: 6988.36 | step: 40.55
{'loss': 1.2814, 'learning_rate': 3.849549432355192e-05, 'epoch': 0.15}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3446
[2024-06-10 05:11:51,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.95 | bwd_microstep: 1506.85 | bwd_inner_microstep: 1506.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3401
[2024-06-10 05:11:53,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1287.86 | bwd_inner_microstep: 1287.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2637
[2024-06-10 05:11:54,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.96 | bwd_microstep: 1113.62 | bwd_inner_microstep: 1113.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 05:11:56,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.28 | bwd_microstep: 1375.20 | bwd_inner_microstep: 1375.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493
[2024-06-10 05:11:58,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.80 | bwd_microstep: 1185.39 | bwd_inner_microstep: 1185.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-10 05:12:00,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.45 | bwd_microstep: 1544.00 | bwd_inner_microstep: 1543.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 05:12:02,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1478.10 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 05:12:04,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1389.70 | bwd_inner_microstep: 1389.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 05:12:06,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.48 | bwd_microstep: 1431.00 | bwd_inner_microstep: 1430.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-10 05:12:08,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.48 | bwd_microstep: 1438.42 | bwd_inner_microstep: 1438.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 05:12:10,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1346.92 | bwd_inner_microstep: 1346.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3701
[2024-06-10 05:12:12,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.22 | bwd_microstep: 1451.31 | bwd_inner_microstep: 1451.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-10 05:12:14,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.09 | bwd_microstep: 1627.49 | bwd_inner_microstep: 1627.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 05:12:16,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1449.84 | bwd_inner_microstep: 1449.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-10 05:12:17,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.16 | bwd_microstep: 894.37 | bwd_inner_microstep: 894.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 05:12:19,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.93 | bwd_microstep: 1484.16 | bwd_inner_microstep: 1484.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3865
[2024-06-10 05:12:21,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1629.92 | bwd_inner_microstep: 1629.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3873
[2024-06-10 05:12:24,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.36 | bwd_microstep: 1484.07 | bwd_inner_microstep: 1484.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3528
[2024-06-10 05:12:26,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1543.94 | bwd_inner_microstep: 1543.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 05:12:28,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.09 | bwd_microstep: 1488.54 | bwd_inner_microstep: 1488.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 05:12:29,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.11 | bwd_microstep: 1158.92 | bwd_inner_microstep: 1158.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2131
[2024-06-10 05:12:30,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.96 | bwd_microstep: 836.21 | bwd_inner_microstep: 836.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 05:12:33,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.54 | bwd_microstep: 1561.78 | bwd_inner_microstep: 1561.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 05:12:35,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1494.96 | bwd_inner_microstep: 1494.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 05:12:36,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.81 | bwd_microstep: 809.02 | bwd_inner_microstep: 808.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 05:12:38,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.37 | bwd_microstep: 1291.25 | bwd_inner_microstep: 1291.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3539
[2024-06-10 05:12:39,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.63 | bwd_microstep: 1344.15 | bwd_inner_microstep: 1344.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3834
[2024-06-10 05:12:42,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.64 | bwd_microstep: 1587.17 | bwd_inner_microstep: 1587.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 05:12:44,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.56 | bwd_microstep: 1464.70 | bwd_inner_microstep: 1464.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 05:12:46,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1498.09 | bwd_inner_microstep: 1498.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064
[2024-06-10 05:12:47,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.87 | bwd_microstep: 915.03 | bwd_inner_microstep: 915.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 05:12:49,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.90 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-10 05:12:49,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.09 | bwd_microstep: 1515.17 | bwd_inner_microstep: 1507.46 | bwd_allreduce_microstep: 7.66 | step_microstep: 39.55
[2024-06-10 05:12:49,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16300.76 | bwd: 43627.17 | bwd_inner: 43618.59 | bwd_allreduce: 7.89 | step: 41.21
{'loss': 1.2995, 'learning_rate': 3.84811795227978e-05, 'epoch': 0.15}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 05:12:51,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.19 | bwd_microstep: 1392.09 | bwd_inner_microstep: 1392.01 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 05:12:53,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.05 | bwd_microstep: 1250.15 | bwd_inner_microstep: 1250.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3875
[2024-06-10 05:12:55,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.55 | bwd_microstep: 1488.26 | bwd_inner_microstep: 1488.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437
[2024-06-10 05:12:57,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1281.95 | bwd_inner_microstep: 1281.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 05:12:59,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1407.65 | bwd_inner_microstep: 1407.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-10 05:13:00,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.04 | bwd_microstep: 1346.12 | bwd_inner_microstep: 1346.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2205
[2024-06-10 05:13:02,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.09 | bwd_microstep: 1058.57 | bwd_inner_microstep: 1058.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 05:13:04,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1388.69 | bwd_inner_microstep: 1388.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 05:13:06,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.41 | bwd_microstep: 1396.99 | bwd_inner_microstep: 1396.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-10 05:13:08,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.46 | bwd_microstep: 1626.52 | bwd_inner_microstep: 1626.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 05:13:10,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.04 | bwd_microstep: 1536.09 | bwd_inner_microstep: 1536.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 05:13:12,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.33 | bwd_microstep: 1254.28 | bwd_inner_microstep: 1254.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3416
[2024-06-10 05:13:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.30 | bwd_microstep: 1408.45 | bwd_inner_microstep: 1408.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413
[2024-06-10 05:13:16,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.25 | bwd_microstep: 1446.20 | bwd_inner_microstep: 1446.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3646
[2024-06-10 05:13:18,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1712.85 | bwd_inner_microstep: 1712.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3532
[2024-06-10 05:13:20,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.76 | bwd_microstep: 1198.96 | bwd_inner_microstep: 1198.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3523
[2024-06-10 05:13:22,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.44 | bwd_microstep: 1455.51 | bwd_inner_microstep: 1455.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 05:13:24,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.02 | bwd_microstep: 1661.03 | bwd_inner_microstep: 1661.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 05:13:26,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.98 | bwd_microstep: 1187.40 | bwd_inner_microstep: 1187.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2088
[2024-06-10 05:13:27,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.49 | bwd_microstep: 730.30 | bwd_inner_microstep: 730.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-10 05:13:28,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.09 | bwd_microstep: 804.42 | bwd_inner_microstep: 804.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273
[2024-06-10 05:13:29,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.64 | bwd_microstep: 878.58 | bwd_inner_microstep: 878.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 05:13:31,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.71 | bwd_microstep: 1488.13 | bwd_inner_microstep: 1488.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 05:13:33,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.12 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1284.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3637
[2024-06-10 05:13:35,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.84 | bwd_microstep: 1220.06 | bwd_inner_microstep: 1220.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 05:13:37,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1496.48 | bwd_inner_microstep: 1496.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 05:13:39,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.47 | bwd_microstep: 1499.80 | bwd_inner_microstep: 1499.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-10 05:13:40,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.90 | bwd_microstep: 718.71 | bwd_inner_microstep: 718.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 05:13:42,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.73 | bwd_microstep: 1550.01 | bwd_inner_microstep: 1549.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3554
[2024-06-10 05:13:44,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.87 | bwd_microstep: 1474.40 | bwd_inner_microstep: 1474.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3574
[2024-06-10 05:13:46,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1527.44 | bwd_inner_microstep: 1527.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 05:13:51,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 19.60 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 05:13:51,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 4591.73 | bwd_inner_microstep: 1684.91 | bwd_allreduce_microstep: 2906.77 | step_microstep: 41.58
[2024-06-10 05:13:51,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16013.14 | bwd: 45762.72 | bwd_inner: 42854.98 | bwd_allreduce: 2907.03 | step: 43.22
{'loss': 1.3466, 'learning_rate': 3.8466799631425474e-05, 'epoch': 0.15}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3449
[2024-06-10 05:13:53,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.95 | bwd_microstep: 1306.95 | bwd_inner_microstep: 1306.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 05:13:55,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.92 | bwd_microstep: 1374.48 | bwd_inner_microstep: 1374.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 05:13:57,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.18 | bwd_microstep: 1489.66 | bwd_inner_microstep: 1489.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 05:13:59,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1243.13 | bwd_inner_microstep: 1243.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3769
[2024-06-10 05:14:01,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1502.74 | bwd_inner_microstep: 1502.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 05:14:03,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1379.56 | bwd_inner_microstep: 1379.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 05:14:05,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.42 | bwd_microstep: 1407.12 | bwd_inner_microstep: 1407.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 05:14:07,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.87 | bwd_microstep: 1296.67 | bwd_inner_microstep: 1296.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 05:14:09,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.95 | bwd_microstep: 1478.68 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3437
[2024-06-10 05:14:10,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1402.24 | bwd_inner_microstep: 1402.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 05:14:12,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 1444.20 | bwd_inner_microstep: 1444.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2153
[2024-06-10 05:14:14,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.26 | bwd_microstep: 946.85 | bwd_inner_microstep: 946.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3917
[2024-06-10 05:14:16,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.69 | bwd_microstep: 1597.10 | bwd_inner_microstep: 1597.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3417
[2024-06-10 05:14:18,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.55 | bwd_microstep: 1310.79 | bwd_inner_microstep: 1310.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 05:14:20,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.55 | bwd_microstep: 1446.92 | bwd_inner_microstep: 1446.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3627
[2024-06-10 05:14:22,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.36 | bwd_microstep: 1566.68 | bwd_inner_microstep: 1566.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2996
[2024-06-10 05:14:24,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.75 | bwd_microstep: 1203.64 | bwd_inner_microstep: 1203.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 05:14:26,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.84 | bwd_microstep: 1395.43 | bwd_inner_microstep: 1395.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3649
[2024-06-10 05:14:28,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.03 | bwd_microstep: 1447.04 | bwd_inner_microstep: 1447.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 05:14:29,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.13 | bwd_microstep: 1380.44 | bwd_inner_microstep: 1380.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3674
[2024-06-10 05:14:31,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.67 | bwd_microstep: 1327.57 | bwd_inner_microstep: 1327.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 05:14:33,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.18 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702
[2024-06-10 05:14:35,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.26 | bwd_microstep: 1633.77 | bwd_inner_microstep: 1633.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 05:14:37,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.14 | bwd_microstep: 1458.33 | bwd_inner_microstep: 1458.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 05:14:39,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 1393.60 | bwd_inner_microstep: 1393.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2080
[2024-06-10 05:14:40,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.02 | bwd_microstep: 758.53 | bwd_inner_microstep: 758.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2534
[2024-06-10 05:14:42,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.74 | bwd_microstep: 1062.08 | bwd_inner_microstep: 1062.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 05:14:44,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.35 | bwd_microstep: 1754.53 | bwd_inner_microstep: 1754.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875
[2024-06-10 05:14:47,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1588.30 | bwd_inner_microstep: 1588.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2271
[2024-06-10 05:14:48,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.23 | bwd_microstep: 1010.94 | bwd_inner_microstep: 1010.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3567
[2024-06-10 05:14:50,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.73 | bwd_microstep: 1599.57 | bwd_inner_microstep: 1599.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-10 05:14:52,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.16 | optimizer_step: 6.56
[2024-06-10 05:14:52,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1656.68 | bwd_inner_microstep: 1512.08 | bwd_allreduce_microstep: 144.56 | step_microstep: 38.30
[2024-06-10 05:14:52,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16483.18 | bwd: 44249.43 | bwd_inner: 44103.92 | bwd_allreduce: 144.81 | step: 39.91
{'loss': 1.239, 'learning_rate': 3.845235470008084e-05, 'epoch': 0.15}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 05:14:54,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.51 | bwd_microstep: 1467.33 | bwd_inner_microstep: 1467.26 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3920
[2024-06-10 05:14:57,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.92 | bwd_microstep: 1696.66 | bwd_inner_microstep: 1696.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 05:14:58,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 790.22 | bwd_inner_microstep: 790.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 05:15:00,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.11 | bwd_microstep: 1652.55 | bwd_inner_microstep: 1652.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3802
[2024-06-10 05:15:02,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.37 | bwd_microstep: 1599.41 | bwd_inner_microstep: 1599.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 05:15:04,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.37 | bwd_microstep: 1285.47 | bwd_inner_microstep: 1285.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 956
[2024-06-10 05:15:05,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 148.23 | bwd_microstep: 384.62 | bwd_inner_microstep: 384.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3717
[2024-06-10 05:15:07,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.06 | bwd_microstep: 1734.00 | bwd_inner_microstep: 1733.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063
[2024-06-10 05:15:08,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.14 | bwd_microstep: 821.25 | bwd_inner_microstep: 821.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 05:15:10,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.76 | bwd_microstep: 1260.00 | bwd_inner_microstep: 1259.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 05:15:12,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.54 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672
[2024-06-10 05:15:14,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.20 | bwd_microstep: 1719.85 | bwd_inner_microstep: 1719.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 05:15:16,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.15 | bwd_microstep: 1481.82 | bwd_inner_microstep: 1481.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3709
[2024-06-10 05:15:19,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.96 | bwd_microstep: 1728.35 | bwd_inner_microstep: 1728.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625
[2024-06-10 05:15:21,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.50 | bwd_microstep: 1319.50 | bwd_inner_microstep: 1319.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-10 05:15:22,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.21 | bwd_microstep: 1190.07 | bwd_inner_microstep: 1190.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 05:15:24,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.08 | bwd_microstep: 1514.97 | bwd_inner_microstep: 1514.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 05:15:26,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 05:15:28,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.64 | bwd_microstep: 1296.79 | bwd_inner_microstep: 1296.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2299
[2024-06-10 05:15:29,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.68 | bwd_microstep: 882.67 | bwd_inner_microstep: 882.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-10 05:15:31,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.67 | bwd_microstep: 1315.65 | bwd_inner_microstep: 1315.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 05:15:33,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1405.64 | bwd_inner_microstep: 1405.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3883
[2024-06-10 05:15:35,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.97 | bwd_microstep: 1615.29 | bwd_inner_microstep: 1615.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3872
[2024-06-10 05:15:38,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 650.64 | bwd_microstep: 1772.26 | bwd_inner_microstep: 1772.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142
[2024-06-10 05:15:39,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.42 | bwd_microstep: 835.16 | bwd_inner_microstep: 835.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3616
[2024-06-10 05:15:41,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.01 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3717
[2024-06-10 05:15:43,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.08 | bwd_microstep: 1495.37 | bwd_inner_microstep: 1495.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 05:15:45,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.02 | bwd_microstep: 1648.53 | bwd_inner_microstep: 1648.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 05:15:47,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.44 | bwd_microstep: 1499.85 | bwd_inner_microstep: 1499.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 05:15:49,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.53 | bwd_microstep: 1603.23 | bwd_inner_microstep: 1603.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3848
[2024-06-10 05:15:52,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.94 | bwd_microstep: 1761.19 | bwd_inner_microstep: 1761.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3746
[2024-06-10 05:15:54,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 05:15:54,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.93 | bwd_microstep: 1727.26 | bwd_inner_microstep: 1719.54 | bwd_allreduce_microstep: 7.67 | step_microstep: 38.42
[2024-06-10 05:15:54,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16670.29 | bwd: 44756.82 | bwd_inner: 44748.19 | bwd_allreduce: 7.93 | step: 40.01
32:23<25:18:46, 62.08s/it]
 15%|█▌        | 259/1726 [4:33:23<25:04:12, 61.52s/it]
 15%|█▌        | 260/1726 [4:34:26<25:10:29, 61.82s/it]
 15%|█▌        | 261/1726 [4:35:26<24:58:12, 61.36s/it]
 15%|█▌        | 262/1726 [4:36:28<25:02:50, 61.59s/it]
 15%|█▌        | 263/1726 [4:37:29<24:58:06, 61.44s/it]
 15%|█▌        | 264/1726 [4:38:31<24:59:34, 61.54s/it]
{'loss': 1.3251, 'learning_rate': 3.843784477963888e-05, 'epoch': 0.15}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 05:15:56,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.66 | bwd_microstep: 1292.96 | bwd_inner_microstep: 1292.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3916
[2024-06-10 05:15:58,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1490.86 | bwd_inner_microstep: 1490.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4219
[2024-06-10 05:16:00,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.02 | bwd_microstep: 1658.60 | bwd_inner_microstep: 1658.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 05:16:02,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1343.86 | bwd_inner_microstep: 1343.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 05:16:04,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.37 | bwd_microstep: 1403.73 | bwd_inner_microstep: 1403.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 05:16:06,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1248.15 | bwd_inner_microstep: 1248.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2042
[2024-06-10 05:16:07,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.52 | bwd_microstep: 842.39 | bwd_inner_microstep: 842.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 05:16:09,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.46 | bwd_microstep: 1291.38 | bwd_inner_microstep: 1291.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4006
[2024-06-10 05:16:11,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.85 | bwd_microstep: 1716.49 | bwd_inner_microstep: 1716.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2502
[2024-06-10 05:16:12,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.04 | bwd_microstep: 964.82 | bwd_inner_microstep: 964.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2140
[2024-06-10 05:16:14,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.99 | bwd_microstep: 990.01 | bwd_inner_microstep: 989.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3415
[2024-06-10 05:16:16,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 1473.13 | bwd_inner_microstep: 1473.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 05:16:18,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.86 | bwd_microstep: 1442.38 | bwd_inner_microstep: 1442.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 05:16:20,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1392.11 | bwd_inner_microstep: 1392.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 05:16:22,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.41 | bwd_microstep: 1485.68 | bwd_inner_microstep: 1485.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 05:16:24,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.05 | bwd_microstep: 1340.69 | bwd_inner_microstep: 1340.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 05:16:26,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1412.71 | bwd_inner_microstep: 1412.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-10 05:16:28,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.75 | bwd_microstep: 1589.13 | bwd_inner_microstep: 1589.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 05:16:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.28 | bwd_microstep: 1418.46 | bwd_inner_microstep: 1418.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 05:16:32,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.99 | bwd_microstep: 1427.15 | bwd_inner_microstep: 1427.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 05:16:34,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.92 | bwd_microstep: 1297.21 | bwd_inner_microstep: 1297.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3611
[2024-06-10 05:16:35,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.30 | bwd_microstep: 1217.09 | bwd_inner_microstep: 1217.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001
[2024-06-10 05:16:36,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.00 | bwd_microstep: 802.45 | bwd_inner_microstep: 802.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 05:16:38,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.79 | bwd_microstep: 1404.40 | bwd_inner_microstep: 1404.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2015
[2024-06-10 05:16:40,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.65 | bwd_microstep: 931.66 | bwd_inner_microstep: 931.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3778
[2024-06-10 05:16:42,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1381.08 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 05:16:44,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1557.94 | bwd_inner_microstep: 1557.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 05:16:45,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.75 | bwd_microstep: 975.61 | bwd_inner_microstep: 975.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 05:16:47,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1512.60 | bwd_inner_microstep: 1512.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 05:16:49,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1389.60 | bwd_inner_microstep: 1389.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 05:16:51,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1379.33 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576
[2024-06-10 05:16:56,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 05:16:56,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.74 | bwd_microstep: 4272.69 | bwd_inner_microstep: 1804.63 | bwd_allreduce_microstep: 2468.00 | step_microstep: 38.63
[2024-06-10 05:16:56,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16004.22 | bwd: 45346.37 | bwd_inner: 42877.42 | bwd_allreduce: 2468.24 | step: 40.31
{'loss': 1.3241, 'learning_rate': 3.842326992120345e-05, 'epoch': 0.15}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5008
[2024-06-10 05:16:59,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 727.45 | bwd_microstep: 1953.72 | bwd_inner_microstep: 1953.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 05:17:00,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.98 | bwd_microstep: 795.86 | bwd_inner_microstep: 795.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 05:17:01,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1311.78 | bwd_inner_microstep: 1311.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 05:17:04,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1550.68 | bwd_inner_microstep: 1550.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3487
[2024-06-10 05:17:05,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.53 | bwd_microstep: 1235.84 | bwd_inner_microstep: 1235.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 05:17:06,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.98 | bwd_microstep: 796.54 | bwd_inner_microstep: 796.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 05:17:08,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1246.37 | bwd_inner_microstep: 1246.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729
[2024-06-10 05:17:10,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.64 | bwd_microstep: 1634.89 | bwd_inner_microstep: 1634.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 05:17:12,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.72 | bwd_microstep: 1488.83 | bwd_inner_microstep: 1488.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-10 05:17:15,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.38 | bwd_microstep: 1646.55 | bwd_inner_microstep: 1646.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 05:17:16,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.96 | bwd_microstep: 793.24 | bwd_inner_microstep: 793.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 05:17:18,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.83 | bwd_microstep: 1485.75 | bwd_inner_microstep: 1485.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3493
[2024-06-10 05:17:20,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.40 | bwd_microstep: 1270.57 | bwd_inner_microstep: 1270.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418
[2024-06-10 05:17:21,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.26 | bwd_microstep: 1282.22 | bwd_inner_microstep: 1282.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 05:17:23,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1396.51 | bwd_inner_microstep: 1396.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1966
[2024-06-10 05:17:24,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.82 | bwd_microstep: 826.23 | bwd_inner_microstep: 826.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3498
[2024-06-10 05:17:27,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.83 | bwd_microstep: 1631.77 | bwd_inner_microstep: 1631.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3821
[2024-06-10 05:17:29,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.41 | bwd_microstep: 1750.41 | bwd_inner_microstep: 1750.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939
[2024-06-10 05:17:30,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.66 | bwd_microstep: 825.04 | bwd_inner_microstep: 825.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 05:17:32,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.93 | bwd_microstep: 1382.58 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2089
[2024-06-10 05:17:33,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.62 | bwd_microstep: 803.09 | bwd_inner_microstep: 803.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 05:17:35,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.29 | bwd_microstep: 1459.74 | bwd_inner_microstep: 1459.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 05:17:37,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.86 | bwd_microstep: 876.13 | bwd_inner_microstep: 876.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-10 05:17:38,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.18 | bwd_microstep: 1188.63 | bwd_inner_microstep: 1188.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3456
[2024-06-10 05:17:40,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1414.94 | bwd_inner_microstep: 1414.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2273
[2024-06-10 05:17:42,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.68 | bwd_microstep: 1073.37 | bwd_inner_microstep: 1073.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 05:17:43,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1286.81 | bwd_inner_microstep: 1286.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 05:17:45,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1503.89 | bwd_inner_microstep: 1503.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 05:17:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.57 | bwd_microstep: 1262.10 | bwd_inner_microstep: 1262.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 05:17:49,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1351.83 | bwd_inner_microstep: 1351.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809
[2024-06-10 05:17:52,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.95 | bwd_microstep: 1753.65 | bwd_inner_microstep: 1753.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 05:17:56,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 05:17:56,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.62 | bwd_microstep: 3499.72 | bwd_inner_microstep: 1303.46 | bwd_allreduce_microstep: 2196.21 | step_microstep: 38.76
[2024-06-10 05:17:56,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15525.18 | bwd: 43779.27 | bwd_inner: 41582.16 | bwd_allreduce: 2196.44 | step: 40.37
{'loss': 1.3004, 'learning_rate': 3.840863017610714e-05, 'epoch': 0.15}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 05:17:58,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1486.25 | bwd_inner_microstep: 1486.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1980
[2024-06-10 05:17:59,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.15 | bwd_microstep: 768.72 | bwd_inner_microstep: 768.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3891
[2024-06-10 05:18:01,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.11 | bwd_microstep: 1683.89 | bwd_inner_microstep: 1683.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3847
[2024-06-10 05:18:03,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.64 | bwd_microstep: 1361.88 | bwd_inner_microstep: 1361.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851
[2024-06-10 05:18:05,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.71 | bwd_microstep: 1559.49 | bwd_inner_microstep: 1559.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 05:18:07,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 05:18:09,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 05:18:11,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1350.35 | bwd_inner_microstep: 1350.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4087
[2024-06-10 05:18:13,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.22 | bwd_microstep: 1630.11 | bwd_inner_microstep: 1630.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629
[2024-06-10 05:18:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.65 | bwd_microstep: 1315.53 | bwd_inner_microstep: 1315.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510
[2024-06-10 05:18:17,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1348.05 | bwd_inner_microstep: 1348.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 05:18:18,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.74 | bwd_microstep: 791.81 | bwd_inner_microstep: 791.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 05:18:20,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1484.12 | bwd_inner_microstep: 1484.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 05:18:22,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.48 | bwd_microstep: 1585.27 | bwd_inner_microstep: 1585.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 05:18:24,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.47 | bwd_microstep: 1619.69 | bwd_inner_microstep: 1619.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2075
[2024-06-10 05:18:25,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.10 | bwd_microstep: 787.30 | bwd_inner_microstep: 787.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3640
[2024-06-10 05:18:27,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.70 | bwd_microstep: 1541.47 | bwd_inner_microstep: 1541.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657
[2024-06-10 05:18:29,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1428.53 | bwd_inner_microstep: 1428.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 05:18:31,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.12 | bwd_microstep: 1496.29 | bwd_inner_microstep: 1496.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 05:18:33,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1398.26 | bwd_inner_microstep: 1398.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3548
[2024-06-10 05:18:35,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.61 | bwd_microstep: 1560.10 | bwd_inner_microstep: 1560.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-10 05:18:38,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.55 | bwd_microstep: 1611.42 | bwd_inner_microstep: 1611.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 05:18:40,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1557.91 | bwd_inner_microstep: 1557.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 05:18:42,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.50 | bwd_microstep: 1460.89 | bwd_inner_microstep: 1460.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475
[2024-06-10 05:18:44,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.98 | bwd_microstep: 1316.37 | bwd_inner_microstep: 1316.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-10 05:18:45,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.69 | bwd_microstep: 1215.55 | bwd_inner_microstep: 1215.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 05:18:47,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1398.45 | bwd_inner_microstep: 1398.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2213
[2024-06-10 05:18:49,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.05 | bwd_microstep: 863.76 | bwd_inner_microstep: 863.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 05:18:51,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.73 | bwd_microstep: 1497.09 | bwd_inner_microstep: 1497.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451
[2024-06-10 05:18:53,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1418.58 | bwd_inner_microstep: 1418.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3591
[2024-06-10 05:18:54,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.40 | bwd_microstep: 1426.22 | bwd_inner_microstep: 1426.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447
[2024-06-10 05:18:57,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 05:18:57,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1927.14 | bwd_inner_microstep: 1540.08 | bwd_allreduce_microstep: 387.01 | step_microstep: 38.41
[2024-06-10 05:18:57,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16522.11 | bwd: 44619.93 | bwd_inner: 44231.98 | bwd_allreduce: 387.25 | step: 40.14
{'loss': 1.3095, 'learning_rate': 3.839392559591104e-05, 'epoch': 0.15}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 05:18:59,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 1377.21 | bwd_inner_microstep: 1377.15 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3396
[2024-06-10 05:19:01,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.42 | bwd_microstep: 1148.36 | bwd_inner_microstep: 1148.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3904
[2024-06-10 05:19:03,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1494.05 | bwd_inner_microstep: 1494.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 05:19:04,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1281.83 | bwd_inner_microstep: 1281.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-10 05:19:06,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1451.88 | bwd_inner_microstep: 1451.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1867
[2024-06-10 05:19:07,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.29 | bwd_microstep: 710.00 | bwd_inner_microstep: 709.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 05:19:09,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.67 | bwd_microstep: 1186.13 | bwd_inner_microstep: 1186.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 740
[2024-06-10 05:19:09,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.61 | bwd_microstep: 299.85 | bwd_inner_microstep: 299.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3792
[2024-06-10 05:19:11,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.20 | bwd_microstep: 1455.83 | bwd_inner_microstep: 1455.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 05:19:13,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.71 | bwd_microstep: 1282.95 | bwd_inner_microstep: 1282.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881
[2024-06-10 05:19:14,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.36 | bwd_microstep: 711.16 | bwd_inner_microstep: 711.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 05:19:16,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1389.53 | bwd_inner_microstep: 1389.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 05:19:18,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.01 | bwd_microstep: 1386.95 | bwd_inner_microstep: 1386.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 05:19:20,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1388.62 | bwd_inner_microstep: 1388.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 05:19:22,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1346.70 | bwd_inner_microstep: 1346.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-10 05:19:24,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1425.36 | bwd_inner_microstep: 1425.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 05:19:26,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1376.31 | bwd_inner_microstep: 1376.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 05:19:27,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.94 | bwd_microstep: 800.47 | bwd_inner_microstep: 800.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 05:19:29,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.46 | bwd_microstep: 1283.89 | bwd_inner_microstep: 1283.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2149
[2024-06-10 05:19:31,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.68 | bwd_microstep: 1806.87 | bwd_inner_microstep: 1806.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275
[2024-06-10 05:19:32,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.46 | bwd_microstep: 910.17 | bwd_inner_microstep: 910.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 05:19:34,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1397.06 | bwd_inner_microstep: 1397.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 05:19:36,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.50 | bwd_microstep: 1390.10 | bwd_inner_microstep: 1390.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 05:19:38,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1414.23 | bwd_inner_microstep: 1414.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 05:19:40,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.31 | bwd_microstep: 1632.25 | bwd_inner_microstep: 1632.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3542
[2024-06-10 05:19:42,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.82 | bwd_microstep: 1591.51 | bwd_inner_microstep: 1591.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3601
[2024-06-10 05:19:45,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.09 | bwd_microstep: 1704.58 | bwd_inner_microstep: 1704.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3817
[2024-06-10 05:19:47,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.40 | bwd_microstep: 1820.46 | bwd_inner_microstep: 1820.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3589
[2024-06-10 05:19:49,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.22 | bwd_microstep: 1354.81 | bwd_inner_microstep: 1354.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 05:19:51,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.58 | bwd_microstep: 1542.97 | bwd_inner_microstep: 1542.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033
[2024-06-10 05:19:52,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.80 | bwd_microstep: 806.31 | bwd_inner_microstep: 806.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2240
[2024-06-10 05:20:00,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.33 | optimizer_step: 6.63
[2024-06-10 05:20:00,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.98 | bwd_microstep: 7748.80 | bwd_inner_microstep: 983.81 | bwd_allreduce_microstep: 6764.93 | step_microstep: 38.88
[2024-06-10 05:20:00,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15058.37 | bwd: 47917.23 | bwd_inner: 41151.33 | bwd_allreduce: 6765.18 | step: 40.56
{'loss': 1.3688, 'learning_rate': 3.837915623240462e-05, 'epoch': 0.16}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 05:20:02,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.40 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2361
[2024-06-10 05:20:04,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.57 | bwd_microstep: 984.69 | bwd_inner_microstep: 984.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352
[2024-06-10 05:20:05,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.97 | bwd_microstep: 985.55 | bwd_inner_microstep: 985.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3886
[2024-06-10 05:20:07,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1581.38 | bwd_inner_microstep: 1581.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 05:20:09,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1248.50 | bwd_inner_microstep: 1248.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 05:20:11,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.33 | bwd_microstep: 1547.78 | bwd_inner_microstep: 1547.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 05:20:13,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.84 | bwd_microstep: 1386.03 | bwd_inner_microstep: 1386.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-10 05:20:14,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.06 | bwd_microstep: 681.95 | bwd_inner_microstep: 681.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230
[2024-06-10 05:20:15,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.10 | bwd_microstep: 958.18 | bwd_inner_microstep: 958.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 05:20:17,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.36 | bwd_microstep: 1533.57 | bwd_inner_microstep: 1533.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 05:20:19,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709
[2024-06-10 05:20:21,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.79 | bwd_microstep: 1634.93 | bwd_inner_microstep: 1634.67 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 05:20:23,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.78 | bwd_microstep: 1539.40 | bwd_inner_microstep: 1539.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 05:20:25,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.34 | bwd_microstep: 1249.01 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 05:20:27,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.29 | bwd_microstep: 1503.24 | bwd_inner_microstep: 1503.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.32
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642
[2024-06-10 05:20:30,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.40 | bwd_microstep: 1710.48 | bwd_inner_microstep: 1710.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 05:20:32,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.00 | bwd_microstep: 1492.06 | bwd_inner_microstep: 1492.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 05:20:34,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1476.90 | bwd_inner_microstep: 1476.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 05:20:36,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.53 | bwd_microstep: 1514.43 | bwd_inner_microstep: 1514.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2664
[2024-06-10 05:20:37,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.44 | bwd_microstep: 1119.41 | bwd_inner_microstep: 1119.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3608
[2024-06-10 05:20:40,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.24 | bwd_microstep: 1809.14 | bwd_inner_microstep: 1809.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 05:20:42,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1416.39 | bwd_inner_microstep: 1416.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 05:20:44,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1280.57 | bwd_inner_microstep: 1280.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 05:20:45,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.10 | bwd_microstep: 810.07 | bwd_inner_microstep: 810.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 05:20:46,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.31 | bwd_microstep: 1255.67 | bwd_inner_microstep: 1255.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 05:20:49,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.28 | bwd_microstep: 1637.64 | bwd_inner_microstep: 1637.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3524
[2024-06-10 05:20:50,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.77 | bwd_microstep: 1203.43 | bwd_inner_microstep: 1203.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797
[2024-06-10 05:20:52,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1554.33 | bwd_inner_microstep: 1554.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 05:20:55,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.41 | bwd_microstep: 1754.01 | bwd_inner_microstep: 1753.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2625
[2024-06-10 05:20:56,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.46 | bwd_microstep: 1109.70 | bwd_inner_microstep: 1109.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-10 05:20:59,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.38 | bwd_microstep: 1647.62 | bwd_inner_microstep: 1647.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 05:21:02,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.29 | optimizer_step: 6.62
[2024-06-10 05:21:02,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 2536.51 | bwd_inner_microstep: 1686.95 | bwd_allreduce_microstep: 849.49 | step_microstep: 38.94
[2024-06-10 05:21:02,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16332.06 | bwd: 44793.07 | bwd_inner: 43942.25 | bwd_allreduce: 849.92 | step: 41.10
{'loss': 1.3367, 'learning_rate': 3.8364322137605484e-05, 'epoch': 0.16}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866
[2024-06-10 05:21:03,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.66 | bwd_microstep: 671.47 | bwd_inner_microstep: 671.33 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898
[2024-06-10 05:21:05,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1590.92 | bwd_inner_microstep: 1590.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2310
[2024-06-10 05:21:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.52 | bwd_microstep: 979.17 | bwd_inner_microstep: 979.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 05:21:08,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1378.07 | bwd_inner_microstep: 1378.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4255
[2024-06-10 05:21:10,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.28 | bwd_microstep: 1565.62 | bwd_inner_microstep: 1565.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 05:21:12,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.58 | bwd_microstep: 1186.72 | bwd_inner_microstep: 1186.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3667
[2024-06-10 05:21:14,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.94 | bwd_microstep: 1453.18 | bwd_inner_microstep: 1453.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-10 05:21:16,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.02 | bwd_microstep: 1626.70 | bwd_inner_microstep: 1626.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3409
[2024-06-10 05:21:18,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.05 | bwd_microstep: 1177.61 | bwd_inner_microstep: 1177.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427
[2024-06-10 05:21:20,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.85 | bwd_microstep: 1281.98 | bwd_inner_microstep: 1281.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 05:21:22,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 05:21:24,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.55 | bwd_microstep: 1344.12 | bwd_inner_microstep: 1344.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 1954
[2024-06-10 05:21:25,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.95 | bwd_microstep: 947.96 | bwd_inner_microstep: 947.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496
[2024-06-10 05:21:27,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.88 | bwd_microstep: 1574.44 | bwd_inner_microstep: 1574.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 05:21:29,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.13 | bwd_microstep: 1339.11 | bwd_inner_microstep: 1339.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-10 05:21:31,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.48 | bwd_microstep: 1419.68 | bwd_inner_microstep: 1419.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2125
[2024-06-10 05:21:32,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.50 | bwd_microstep: 836.69 | bwd_inner_microstep: 836.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 05:21:34,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1509.36 | bwd_inner_microstep: 1509.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 05:21:36,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1391.45 | bwd_inner_microstep: 1391.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3553
[2024-06-10 05:21:38,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.79 | bwd_microstep: 1201.84 | bwd_inner_microstep: 1201.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 05:21:40,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.89 | bwd_microstep: 1449.75 | bwd_inner_microstep: 1449.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 05:21:41,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.36 | bwd_microstep: 1255.89 | bwd_inner_microstep: 1255.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-10 05:21:43,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.55 | bwd_microstep: 1304.00 | bwd_inner_microstep: 1303.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546
[2024-06-10 05:21:45,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.49 | bwd_microstep: 1326.11 | bwd_inner_microstep: 1326.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3175
[2024-06-10 05:21:47,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.78 | bwd_microstep: 1237.03 | bwd_inner_microstep: 1237.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451
[2024-06-10 05:21:49,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1447.93 | bwd_inner_microstep: 1447.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3588
[2024-06-10 05:21:51,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1564.98 | bwd_inner_microstep: 1564.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639
[2024-06-10 05:21:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.05 | bwd_microstep: 1577.55 | bwd_inner_microstep: 1577.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 05:21:55,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.51 | bwd_microstep: 1474.77 | bwd_inner_microstep: 1474.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 05:21:57,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1393.77 | bwd_inner_microstep: 1393.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 05:21:59,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1343.01 | bwd_inner_microstep: 1342.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 05:22:06,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.39 | optimizer_step: 6.60
[2024-06-10 05:22:06,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.58 | bwd_microstep: 6185.56 | bwd_inner_microstep: 1440.98 | bwd_allreduce_microstep: 4744.51 | step_microstep: 39.82
[2024-06-10 05:22:06,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16015.27 | bwd: 47519.05 | bwd_inner: 42773.51 | bwd_allreduce: 4744.81 | step: 41.42


 15%|█▌        | 264/1726 [4:38:31<24:59:34, 61.54s/it]
 15%|█▌        | 265/1726 [4:39:33<24:59:42, 61.59s/it]
 15%|█▌        | 266/1726 [4:40:32<24:44:33, 61.01s/it]
 15%|█▌        | 267/1726 [4:41:34<24:47:05, 61.16s/it]
 16%|█▌        | 268/1726 [4:42:37<25:01:48, 61.80s/it]
 16%|█▌        | 269/1726 [4:43:39<24:58:33, 61.71s/it]
 16%|█▌        | 270/1726 [4:44:42<25:13:22, 62.36s/it]
{'loss': 1.308, 'learning_rate': 3.834942336375925e-05, 'epoch': 0.16}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466
[2024-06-10 05:22:08,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.32 | bwd_microstep: 1564.66 | bwd_inner_microstep: 1564.55 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 05:22:10,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1283.69 | bwd_inner_microstep: 1283.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3912
[2024-06-10 05:22:12,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.67 | bwd_microstep: 1686.31 | bwd_inner_microstep: 1686.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-10 05:22:14,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.77 | bwd_microstep: 1277.26 | bwd_inner_microstep: 1277.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 05:22:16,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1458.73 | bwd_inner_microstep: 1458.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 05:22:18,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.79 | bwd_microstep: 1553.88 | bwd_inner_microstep: 1553.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3742
[2024-06-10 05:22:20,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1368.20 | bwd_inner_microstep: 1368.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 05:22:22,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1399.90 | bwd_inner_microstep: 1399.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 05:22:24,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1390.37 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3728
[2024-06-10 05:22:26,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.48 | bwd_microstep: 1465.09 | bwd_inner_microstep: 1465.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3533
[2024-06-10 05:22:27,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.04 | bwd_microstep: 1200.10 | bwd_inner_microstep: 1200.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 05:22:30,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.78 | bwd_microstep: 1645.15 | bwd_inner_microstep: 1645.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3416
[2024-06-10 05:22:32,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1409.08 | bwd_inner_microstep: 1409.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 05:22:33,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.26 | bwd_microstep: 1386.74 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3386
[2024-06-10 05:22:35,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1367.72 | bwd_inner_microstep: 1367.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1917
[2024-06-10 05:22:36,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.45 | bwd_microstep: 722.69 | bwd_inner_microstep: 722.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 05:22:37,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.50 | bwd_microstep: 792.93 | bwd_inner_microstep: 792.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 05:22:39,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.53 | bwd_microstep: 1248.49 | bwd_inner_microstep: 1248.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 05:22:41,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.30 | bwd_microstep: 1521.94 | bwd_inner_microstep: 1521.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 05:22:43,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.99 | bwd_microstep: 1297.52 | bwd_inner_microstep: 1297.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-10 05:22:45,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.18 | bwd_microstep: 1296.72 | bwd_inner_microstep: 1296.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835
[2024-06-10 05:22:47,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.03 | bwd_microstep: 1463.14 | bwd_inner_microstep: 1463.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2119
[2024-06-10 05:22:48,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.64 | bwd_microstep: 766.11 | bwd_inner_microstep: 766.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 05:22:50,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.59 | bwd_microstep: 1356.16 | bwd_inner_microstep: 1356.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 05:22:52,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1397.08 | bwd_inner_microstep: 1397.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 05:22:54,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1511.74 | bwd_inner_microstep: 1511.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 05:22:56,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.19 | bwd_microstep: 1601.17 | bwd_inner_microstep: 1601.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-10 05:22:58,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1444.70 | bwd_inner_microstep: 1444.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2253
[2024-06-10 05:22:59,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.29 | bwd_microstep: 869.95 | bwd_inner_microstep: 869.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291
[2024-06-10 05:23:00,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.33 | bwd_microstep: 879.13 | bwd_inner_microstep: 879.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596
[2024-06-10 05:23:02,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1438.29 | bwd_inner_microstep: 1438.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3575
[2024-06-10 05:23:07,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 05:23:07,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.68 | bwd_microstep: 4051.13 | bwd_inner_microstep: 1921.42 | bwd_allreduce_microstep: 2129.66 | step_microstep: 39.10
[2024-06-10 05:23:07,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16030.46 | bwd: 45115.81 | bwd_inner: 42985.14 | bwd_allreduce: 2129.96 | step: 40.78
{'loss': 1.2707, 'learning_rate': 3.833445996333932e-05, 'epoch': 0.16}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2875
[2024-06-10 05:23:09,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.80 | bwd_microstep: 1172.17 | bwd_inner_microstep: 1172.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4016
[2024-06-10 05:23:11,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1507.69 | bwd_inner_microstep: 1507.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3882
[2024-06-10 05:23:13,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.85 | bwd_microstep: 1488.73 | bwd_inner_microstep: 1488.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3840
[2024-06-10 05:23:15,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.69 | bwd_microstep: 1487.20 | bwd_inner_microstep: 1487.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2237
[2024-06-10 05:23:16,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.67 | bwd_microstep: 864.39 | bwd_inner_microstep: 864.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702
[2024-06-10 05:23:18,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.91 | bwd_microstep: 1434.46 | bwd_inner_microstep: 1434.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 05:23:20,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.75 | bwd_microstep: 1540.11 | bwd_inner_microstep: 1540.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 05:23:22,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.40 | bwd_microstep: 1396.93 | bwd_inner_microstep: 1396.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 05:23:23,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.50 | bwd_microstep: 798.98 | bwd_inner_microstep: 798.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-10 05:23:25,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1283.42 | bwd_inner_microstep: 1283.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 05:23:27,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.01 | bwd_microstep: 1284.18 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 05:23:29,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.80 | bwd_microstep: 1384.75 | bwd_inner_microstep: 1384.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 05:23:31,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1386.60 | bwd_inner_microstep: 1386.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2147
[2024-06-10 05:23:32,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.62 | bwd_microstep: 1043.21 | bwd_inner_microstep: 1043.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 05:23:34,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.13 | bwd_microstep: 1483.19 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3447
[2024-06-10 05:23:36,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.39 | bwd_microstep: 1486.06 | bwd_inner_microstep: 1486.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3695
[2024-06-10 05:23:39,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.64 | bwd_microstep: 1665.78 | bwd_inner_microstep: 1665.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 05:23:41,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.24 | bwd_microstep: 1663.11 | bwd_inner_microstep: 1663.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 05:23:43,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1410.28 | bwd_inner_microstep: 1410.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 05:23:45,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.17 | bwd_microstep: 1527.13 | bwd_inner_microstep: 1527.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 05:23:47,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.89 | bwd_microstep: 1464.18 | bwd_inner_microstep: 1464.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 05:23:49,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.20 | bwd_microstep: 1397.71 | bwd_inner_microstep: 1397.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 05:23:51,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.50 | bwd_microstep: 1290.93 | bwd_inner_microstep: 1290.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3543
[2024-06-10 05:23:53,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.23 | bwd_microstep: 1426.64 | bwd_inner_microstep: 1426.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2182
[2024-06-10 05:23:54,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.22 | bwd_microstep: 890.53 | bwd_inner_microstep: 890.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 05:23:56,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.86 | bwd_microstep: 1533.22 | bwd_inner_microstep: 1533.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3778
[2024-06-10 05:23:58,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.66 | bwd_microstep: 1413.01 | bwd_inner_microstep: 1412.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 05:24:00,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.68 | bwd_microstep: 1353.84 | bwd_inner_microstep: 1353.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3581
[2024-06-10 05:24:02,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.21 | bwd_microstep: 1668.60 | bwd_inner_microstep: 1668.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 05:24:04,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.58 | bwd_microstep: 1380.01 | bwd_inner_microstep: 1379.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 05:24:06,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.99 | bwd_microstep: 1402.22 | bwd_inner_microstep: 1402.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 05:24:11,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.62
[2024-06-10 05:24:11,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 4323.32 | bwd_inner_microstep: 1568.00 | bwd_allreduce_microstep: 2755.24 | step_microstep: 41.54
[2024-06-10 05:24:11,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16418.20 | bwd: 46852.62 | bwd_inner: 44096.41 | bwd_allreduce: 2755.49 | step: 43.15
{'loss': 1.3312, 'learning_rate': 3.8319431989046704e-05, 'epoch': 0.16}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 05:24:13,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.88 | bwd_microstep: 1364.53 | bwd_inner_microstep: 1364.44 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3971
[2024-06-10 05:24:15,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.65 | bwd_microstep: 1668.83 | bwd_inner_microstep: 1668.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3887
[2024-06-10 05:24:17,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.23 | bwd_microstep: 1686.82 | bwd_inner_microstep: 1686.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 05:24:19,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.69 | bwd_microstep: 1550.22 | bwd_inner_microstep: 1550.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769
[2024-06-10 05:24:21,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.60 | bwd_microstep: 1346.41 | bwd_inner_microstep: 1346.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 05:24:23,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.23 | bwd_microstep: 1557.33 | bwd_inner_microstep: 1557.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 05:24:26,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1531.50 | bwd_inner_microstep: 1531.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 05:24:27,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1251.61 | bwd_inner_microstep: 1251.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 05:24:29,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.31 | bwd_microstep: 1426.44 | bwd_inner_microstep: 1426.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 05:24:31,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.56 | bwd_microstep: 1262.60 | bwd_inner_microstep: 1262.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2085
[2024-06-10 05:24:32,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.77 | bwd_microstep: 922.82 | bwd_inner_microstep: 922.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 05:24:35,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.73 | bwd_microstep: 1617.44 | bwd_inner_microstep: 1617.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3956
[2024-06-10 05:24:37,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.20 | bwd_microstep: 1696.35 | bwd_inner_microstep: 1696.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1983
[2024-06-10 05:24:38,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.48 | bwd_microstep: 831.23 | bwd_inner_microstep: 831.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-10 05:24:39,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.79 | bwd_microstep: 890.46 | bwd_inner_microstep: 890.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 05:24:41,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1351.13 | bwd_inner_microstep: 1351.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2021
[2024-06-10 05:24:42,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.45 | bwd_microstep: 746.21 | bwd_inner_microstep: 746.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288
[2024-06-10 05:24:43,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.82 | bwd_microstep: 881.36 | bwd_inner_microstep: 881.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-10 05:24:45,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.67 | bwd_microstep: 1201.06 | bwd_inner_microstep: 1201.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 05:24:47,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.61 | bwd_microstep: 1282.85 | bwd_inner_microstep: 1282.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 05:24:49,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1513.81 | bwd_inner_microstep: 1513.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3752
[2024-06-10 05:24:51,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1344.83 | bwd_inner_microstep: 1344.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 05:24:53,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.40 | bwd_microstep: 1408.83 | bwd_inner_microstep: 1408.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2121
[2024-06-10 05:24:54,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.87 | bwd_microstep: 831.20 | bwd_inner_microstep: 831.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2063
[2024-06-10 05:24:55,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.91 | bwd_microstep: 945.40 | bwd_inner_microstep: 945.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 05:24:57,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.22 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3616
[2024-06-10 05:24:59,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.85 | bwd_microstep: 1559.38 | bwd_inner_microstep: 1559.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436
[2024-06-10 05:25:01,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.13 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 05:25:03,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.45 | bwd_microstep: 1645.80 | bwd_inner_microstep: 1645.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 05:25:05,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1477.44 | bwd_inner_microstep: 1477.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 05:25:07,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.68 | bwd_microstep: 1321.48 | bwd_inner_microstep: 1321.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 05:25:12,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 05:25:12,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.78 | bwd_microstep: 3766.97 | bwd_inner_microstep: 1836.60 | bwd_allreduce_microstep: 1930.32 | step_microstep: 38.76
[2024-06-10 05:25:12,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15801.98 | bwd: 44544.68 | bwd_inner: 42613.38 | bwd_allreduce: 1930.60 | step: 40.42
{'loss': 1.3399, 'learning_rate': 3.8304339493809866e-05, 'epoch': 0.16}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 05:25:13,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.23 | bwd_microstep: 1271.74 | bwd_inner_microstep: 1271.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 05:25:15,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.83 | bwd_microstep: 1346.41 | bwd_inner_microstep: 1346.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3887
[2024-06-10 05:25:17,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.82 | bwd_microstep: 1586.98 | bwd_inner_microstep: 1586.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3837
[2024-06-10 05:25:19,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1389.79 | bwd_inner_microstep: 1389.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 05:25:21,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.21 | bwd_microstep: 1274.69 | bwd_inner_microstep: 1274.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3594
[2024-06-10 05:25:23,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.80 | bwd_microstep: 1212.39 | bwd_inner_microstep: 1212.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-10 05:25:25,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.31 | bwd_microstep: 1440.51 | bwd_inner_microstep: 1440.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 05:25:26,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.96 | bwd_microstep: 1283.35 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 05:25:28,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.61 | bwd_microstep: 1388.53 | bwd_inner_microstep: 1388.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3773
[2024-06-10 05:25:30,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1506.78 | bwd_inner_microstep: 1506.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 05:25:32,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.93 | bwd_microstep: 1289.04 | bwd_inner_microstep: 1289.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 05:25:34,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.66 | bwd_microstep: 1488.59 | bwd_inner_microstep: 1488.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3995
[2024-06-10 05:25:37,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 697.93 | bwd_microstep: 1909.66 | bwd_inner_microstep: 1909.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 05:25:39,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1341.94 | bwd_inner_microstep: 1341.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2069
[2024-06-10 05:25:40,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.08 | bwd_microstep: 882.55 | bwd_inner_microstep: 882.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3636
[2024-06-10 05:25:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.76 | bwd_microstep: 1374.81 | bwd_inner_microstep: 1374.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 05:25:44,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.64 | bwd_microstep: 1505.39 | bwd_inner_microstep: 1505.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3519
[2024-06-10 05:25:46,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.06 | bwd_microstep: 1448.30 | bwd_inner_microstep: 1448.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 05:25:48,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.79 | bwd_microstep: 1297.04 | bwd_inner_microstep: 1297.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 05:25:50,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.57 | bwd_microstep: 1288.45 | bwd_inner_microstep: 1288.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040
[2024-06-10 05:25:51,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.54 | bwd_microstep: 812.31 | bwd_inner_microstep: 812.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3616
[2024-06-10 05:25:53,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.17 | bwd_microstep: 1443.39 | bwd_inner_microstep: 1443.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3891
[2024-06-10 05:25:55,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.42 | bwd_microstep: 1618.16 | bwd_inner_microstep: 1618.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 05:25:57,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.49 | bwd_microstep: 1662.05 | bwd_inner_microstep: 1662.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3820
[2024-06-10 05:25:59,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.12 | bwd_microstep: 1507.71 | bwd_inner_microstep: 1507.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 05:26:01,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.14 | bwd_microstep: 1524.34 | bwd_inner_microstep: 1524.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2267
[2024-06-10 05:26:03,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.79 | bwd_microstep: 974.65 | bwd_inner_microstep: 974.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 05:26:04,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1255.55 | bwd_inner_microstep: 1255.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 05:26:07,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.09 | bwd_microstep: 1502.94 | bwd_inner_microstep: 1502.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2270
[2024-06-10 05:26:08,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.10 | bwd_microstep: 1066.66 | bwd_inner_microstep: 1066.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 05:26:10,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.20 | bwd_microstep: 1454.11 | bwd_inner_microstep: 1454.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 05:26:14,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 05:26:14,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.00 | bwd_microstep: 3339.47 | bwd_inner_microstep: 1533.02 | bwd_allreduce_microstep: 1806.40 | step_microstep: 38.75
[2024-06-10 05:26:14,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16339.14 | bwd: 45688.31 | bwd_inner: 43880.96 | bwd_allreduce: 1806.64 | step: 40.40
{'loss': 1.3183, 'learning_rate': 3.828918253078448e-05, 'epoch': 0.16}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-10 05:26:15,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.08 | bwd_microstep: 676.28 | bwd_inner_microstep: 676.15 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 05:26:17,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.32 | bwd_microstep: 1276.42 | bwd_inner_microstep: 1276.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3772
[2024-06-10 05:26:19,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1503.15 | bwd_inner_microstep: 1503.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 05:26:21,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 1486.86 | bwd_inner_microstep: 1486.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-10 05:26:23,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.37 | bwd_microstep: 1527.25 | bwd_inner_microstep: 1527.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775
[2024-06-10 05:26:25,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.81 | bwd_microstep: 1448.46 | bwd_inner_microstep: 1448.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734
[2024-06-10 05:26:27,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.56 | bwd_microstep: 1431.16 | bwd_inner_microstep: 1431.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 05:26:29,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.97 | bwd_microstep: 1287.68 | bwd_inner_microstep: 1287.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438
[2024-06-10 05:26:31,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.38 | bwd_microstep: 1448.23 | bwd_inner_microstep: 1448.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 05:26:33,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1481.94 | bwd_inner_microstep: 1481.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 05:26:35,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1383.75 | bwd_inner_microstep: 1383.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1972
[2024-06-10 05:26:36,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.86 | bwd_microstep: 891.24 | bwd_inner_microstep: 891.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 05:26:38,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.17 | bwd_microstep: 1390.79 | bwd_inner_microstep: 1390.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 05:26:40,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1382.40 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3635
[2024-06-10 05:26:42,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.01 | bwd_microstep: 1531.93 | bwd_inner_microstep: 1531.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 05:26:44,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1500.22 | bwd_inner_microstep: 1500.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 670
[2024-06-10 05:26:44,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 111.60 | bwd_microstep: 279.89 | bwd_inner_microstep: 279.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-10 05:26:46,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1425.20 | bwd_inner_microstep: 1425.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 05:26:48,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.74 | bwd_microstep: 1659.57 | bwd_inner_microstep: 1659.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3710
[2024-06-10 05:26:51,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.23 | bwd_microstep: 1600.83 | bwd_inner_microstep: 1600.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 05:26:52,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.96 | bwd_microstep: 802.74 | bwd_inner_microstep: 802.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 05:26:54,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.22 | bwd_microstep: 1513.53 | bwd_inner_microstep: 1513.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 05:26:56,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.26 | bwd_microstep: 1531.65 | bwd_inner_microstep: 1531.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3523
[2024-06-10 05:26:58,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.49 | bwd_microstep: 1344.09 | bwd_inner_microstep: 1344.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828
[2024-06-10 05:27:00,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1490.68 | bwd_inner_microstep: 1490.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 05:27:02,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1324.56 | bwd_inner_microstep: 1324.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 05:27:04,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.59 | bwd_microstep: 1405.10 | bwd_inner_microstep: 1405.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2027
[2024-06-10 05:27:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.05 | bwd_microstep: 746.37 | bwd_inner_microstep: 746.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3417
[2024-06-10 05:27:06,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.63 | bwd_microstep: 1213.46 | bwd_inner_microstep: 1213.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 05:27:09,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.87 | bwd_microstep: 1544.55 | bwd_inner_microstep: 1544.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3573
[2024-06-10 05:27:11,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.06 | bwd_microstep: 1662.86 | bwd_inner_microstep: 1662.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-10 05:27:15,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-10 05:27:15,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.65 | bwd_microstep: 3297.59 | bwd_inner_microstep: 1739.58 | bwd_allreduce_microstep: 1557.95 | step_microstep: 38.92
[2024-06-10 05:27:15,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15983.97 | bwd: 44490.47 | bwd_inner: 42931.51 | bwd_allreduce: 1558.23 | step: 40.50
{'loss': 1.2838, 'learning_rate': 3.8273961153353296e-05, 'epoch': 0.16}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 05:27:16,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.84 | bwd_microstep: 1141.60 | bwd_inner_microstep: 1141.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2450
[2024-06-10 05:27:18,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.63 | bwd_microstep: 1016.78 | bwd_inner_microstep: 1016.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 05:27:20,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.12 | bwd_microstep: 1655.22 | bwd_inner_microstep: 1655.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793
[2024-06-10 05:27:22,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.11 | bwd_microstep: 1446.85 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-10 05:27:24,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.24 | bwd_microstep: 1438.80 | bwd_inner_microstep: 1438.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 05:27:26,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.31 | bwd_microstep: 1248.97 | bwd_inner_microstep: 1248.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730
[2024-06-10 05:27:28,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.34 | bwd_microstep: 1535.21 | bwd_inner_microstep: 1535.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1927
[2024-06-10 05:27:29,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.65 | bwd_microstep: 727.43 | bwd_inner_microstep: 727.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 05:27:31,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.40 | bwd_microstep: 1256.44 | bwd_inner_microstep: 1256.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 05:27:32,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.15 | bwd_microstep: 1256.86 | bwd_inner_microstep: 1256.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 05:27:35,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.47 | bwd_microstep: 1655.97 | bwd_inner_microstep: 1655.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2012
[2024-06-10 05:27:36,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.43 | bwd_microstep: 773.47 | bwd_inner_microstep: 773.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2684
[2024-06-10 05:27:37,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.73 | bwd_microstep: 1220.30 | bwd_inner_microstep: 1220.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465
[2024-06-10 05:27:39,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.10 | bwd_microstep: 1346.43 | bwd_inner_microstep: 1346.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 05:27:41,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1474.04 | bwd_inner_microstep: 1474.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3542
[2024-06-10 05:27:43,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1523.88 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001
[2024-06-10 05:27:45,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.65 | bwd_microstep: 897.95 | bwd_inner_microstep: 897.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 05:27:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.52 | bwd_microstep: 1282.17 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 05:27:49,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.84 | bwd_microstep: 1609.52 | bwd_inner_microstep: 1609.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 05:27:51,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.70 | bwd_microstep: 1400.75 | bwd_inner_microstep: 1400.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 05:27:52,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1379.33 | bwd_inner_microstep: 1379.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 05:27:54,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1379.74 | bwd_inner_microstep: 1379.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-10 05:27:57,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.82 | bwd_microstep: 1593.61 | bwd_inner_microstep: 1593.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3772
[2024-06-10 05:27:58,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.43 | bwd_microstep: 1251.74 | bwd_inner_microstep: 1251.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729
[2024-06-10 05:28:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.17 | bwd_microstep: 1438.63 | bwd_inner_microstep: 1438.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 05:28:02,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.17 | bwd_microstep: 1500.37 | bwd_inner_microstep: 1500.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 05:28:04,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.11 | bwd_microstep: 1536.81 | bwd_inner_microstep: 1536.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 05:28:06,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1248.91 | bwd_inner_microstep: 1248.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 05:28:08,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 1557.55 | bwd_inner_microstep: 1557.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379
[2024-06-10 05:28:10,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.02 | bwd_microstep: 1242.65 | bwd_inner_microstep: 1242.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 05:28:12,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.13 | bwd_microstep: 1407.28 | bwd_inner_microstep: 1407.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3771
[2024-06-10 05:28:40,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.12 | optimizer_gradients: 4.40 | optimizer_step: 6.60
[2024-06-10 05:28:40,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.92 | bwd_microstep: 27606.96 | bwd_inner_microstep: 1671.11 | bwd_allreduce_microstep: 25935.78 | step_microstep: 41.22
[2024-06-10 05:28:40,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16112.73 | bwd: 69052.25 | bwd_inner: 43115.52 | bwd_allreduce: 25936.02 | step: 42.92


 16%|█▌        | 270/1726 [4:44:42<25:13:22, 62.36s/it]
 16%|█▌        | 271/1726 [4:45:44<25:06:02, 62.10s/it]
 16%|█▌        | 272/1726 [4:46:48<25:16:03, 62.56s/it]
 16%|█▌        | 273/1726 [4:47:48<25:01:26, 62.00s/it]
 16%|█▌        | 274/1726 [4:48:51<25:03:11, 62.12s/it]
 16%|█▌        | 275/1726 [4:49:51<24:52:45, 61.73s/it]
 16%|█▌        | 276/1726 [4:51:
{'loss': 1.267, 'learning_rate': 3.825867541512593e-05, 'epoch': 0.16}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3450
[2024-06-10 05:28:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.04 | bwd_microstep: 1362.93 | bwd_inner_microstep: 1362.83 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3896
[2024-06-10 05:28:44,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.89 | bwd_microstep: 1675.89 | bwd_inner_microstep: 1675.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 05:28:46,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.67 | bwd_microstep: 1372.08 | bwd_inner_microstep: 1372.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 05:28:48,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.50 | bwd_microstep: 1471.34 | bwd_inner_microstep: 1471.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-10 05:28:49,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.29 | bwd_microstep: 786.69 | bwd_inner_microstep: 786.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 05:28:51,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1375.04 | bwd_inner_microstep: 1375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 05:28:53,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.26 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 05:28:55,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1378.46 | bwd_inner_microstep: 1378.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 05:28:57,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.26 | bwd_microstep: 1528.54 | bwd_inner_microstep: 1528.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 05:28:59,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1387.56 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3489
[2024-06-10 05:29:01,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.79 | bwd_microstep: 1442.40 | bwd_inner_microstep: 1442.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 05:29:03,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.46 | bwd_microstep: 1476.08 | bwd_inner_microstep: 1476.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-10 05:29:04,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.93 | bwd_microstep: 890.88 | bwd_inner_microstep: 890.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-10 05:29:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.11 | bwd_microstep: 1438.96 | bwd_inner_microstep: 1438.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451
[2024-06-10 05:29:08,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1413.41 | bwd_inner_microstep: 1413.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 05:29:10,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.60 | bwd_microstep: 795.89 | bwd_inner_microstep: 795.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 05:29:10,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.70 | bwd_microstep: 699.12 | bwd_inner_microstep: 699.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 05:29:12,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.62 | bwd_microstep: 1293.46 | bwd_inner_microstep: 1293.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3532
[2024-06-10 05:29:14,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1229.54 | bwd_inner_microstep: 1229.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521
[2024-06-10 05:29:16,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1422.39 | bwd_inner_microstep: 1422.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 05:29:18,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.43 | bwd_microstep: 1346.62 | bwd_inner_microstep: 1346.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 05:29:20,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.63 | bwd_microstep: 1293.99 | bwd_inner_microstep: 1293.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 05:29:22,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1451.84 | bwd_inner_microstep: 1451.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3560
[2024-06-10 05:29:24,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.47 | bwd_microstep: 1546.76 | bwd_inner_microstep: 1546.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2012
[2024-06-10 05:29:25,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.37 | bwd_microstep: 837.02 | bwd_inner_microstep: 836.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 05:29:27,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.79 | bwd_microstep: 1351.19 | bwd_inner_microstep: 1351.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3760
[2024-06-10 05:29:29,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.48 | bwd_microstep: 1603.56 | bwd_inner_microstep: 1603.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581
[2024-06-10 05:29:31,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.94 | bwd_microstep: 1697.44 | bwd_inner_microstep: 1697.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 05:29:33,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1493.54 | bwd_inner_microstep: 1493.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3547
[2024-06-10 05:29:35,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1519.07 | bwd_inner_microstep: 1519.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 05:29:38,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.51 | bwd_microstep: 1492.03 | bwd_inner_microstep: 1492.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 05:29:44,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-10 05:29:44,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 5480.40 | bwd_inner_microstep: 1438.82 | bwd_allreduce_microstep: 4041.52 | step_microstep: 38.86
[2024-06-10 05:29:44,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16039.98 | bwd: 46934.22 | bwd_inner: 42891.71 | bwd_allreduce: 4041.81 | step: 40.45
{'loss': 1.2909, 'learning_rate': 3.8243325369938674e-05, 'epoch': 0.16}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 05:29:45,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.26 | bwd_microstep: 798.00 | bwd_inner_microstep: 797.85 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3908
[2024-06-10 05:29:47,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.65 | bwd_microstep: 1420.09 | bwd_inner_microstep: 1420.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3459
[2024-06-10 05:29:49,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.05 | bwd_microstep: 1341.30 | bwd_inner_microstep: 1341.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 05:29:51,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.53 | bwd_microstep: 1648.91 | bwd_inner_microstep: 1648.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 05:29:53,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.04 | bwd_microstep: 1546.00 | bwd_inner_microstep: 1545.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 05:29:55,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1283.39 | bwd_inner_microstep: 1283.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2233
[2024-06-10 05:29:56,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.74 | bwd_microstep: 959.95 | bwd_inner_microstep: 959.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-10 05:29:58,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.07 | bwd_microstep: 1187.57 | bwd_inner_microstep: 1187.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 05:30:00,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1481.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2091
[2024-06-10 05:30:01,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.96 | bwd_microstep: 731.29 | bwd_inner_microstep: 731.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 05:30:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1380.41 | bwd_inner_microstep: 1380.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-10 05:30:05,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1449.54 | bwd_inner_microstep: 1449.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3677
[2024-06-10 05:30:07,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.87 | bwd_microstep: 1719.88 | bwd_inner_microstep: 1719.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3517
[2024-06-10 05:30:09,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1418.99 | bwd_inner_microstep: 1418.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618
[2024-06-10 05:30:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.69 | bwd_microstep: 1312.74 | bwd_inner_microstep: 1312.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 05:30:13,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1390.24 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 05:30:15,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.20 | bwd_microstep: 1399.00 | bwd_inner_microstep: 1398.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 05:30:17,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.38 | bwd_microstep: 1515.75 | bwd_inner_microstep: 1515.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 05:30:18,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.53 | bwd_microstep: 1257.33 | bwd_inner_microstep: 1257.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3642
[2024-06-10 05:30:20,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1348.23 | bwd_inner_microstep: 1348.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2268
[2024-06-10 05:30:22,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.97 | bwd_microstep: 1005.25 | bwd_inner_microstep: 1005.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 05:30:24,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1408.00 | bwd_inner_microstep: 1407.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 05:30:26,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.70 | bwd_microstep: 1559.15 | bwd_inner_microstep: 1559.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673
[2024-06-10 05:30:28,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1328.93 | bwd_inner_microstep: 1328.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 05:30:30,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1496.27 | bwd_inner_microstep: 1496.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 05:30:32,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.22 | bwd_microstep: 1376.77 | bwd_inner_microstep: 1376.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3469
[2024-06-10 05:30:34,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1538.38 | bwd_inner_microstep: 1538.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466
[2024-06-10 05:30:36,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1402.84 | bwd_inner_microstep: 1402.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819
[2024-06-10 05:30:38,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.16 | bwd_microstep: 1517.88 | bwd_inner_microstep: 1517.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817
[2024-06-10 05:30:40,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.14 | bwd_microstep: 1752.75 | bwd_inner_microstep: 1752.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2184
[2024-06-10 05:30:42,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.02 | bwd_microstep: 953.05 | bwd_inner_microstep: 953.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 05:30:44,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 05:30:44,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 2179.18 | bwd_inner_microstep: 1616.14 | bwd_allreduce_microstep: 562.99 | step_microstep: 38.35
[2024-06-10 05:30:44,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16263.85 | bwd: 44108.12 | bwd_inner: 43544.10 | bwd_allreduce: 563.28 | step: 39.92
{'loss': 1.3306, 'learning_rate': 3.82279110718543e-05, 'epoch': 0.16}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3449
[2024-06-10 05:30:46,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.96 | bwd_microstep: 1475.14 | bwd_inner_microstep: 1475.08 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3871
[2024-06-10 05:30:48,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.62 | bwd_microstep: 1463.19 | bwd_inner_microstep: 1463.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 05:30:50,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.06 | bwd_microstep: 1489.16 | bwd_inner_microstep: 1489.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 05:30:52,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1390.07 | bwd_inner_microstep: 1390.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 05:30:54,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.86 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 05:30:56,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 05:30:58,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.15 | bwd_microstep: 1383.87 | bwd_inner_microstep: 1383.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 05:30:59,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.53 | bwd_microstep: 792.99 | bwd_inner_microstep: 792.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3613
[2024-06-10 05:31:01,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.15 | bwd_microstep: 1216.69 | bwd_inner_microstep: 1216.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 05:31:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.05 | bwd_inner_microstep: 1385.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495
[2024-06-10 05:31:04,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.22 | bwd_microstep: 1410.39 | bwd_inner_microstep: 1410.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2681
[2024-06-10 05:31:06,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.36 | bwd_microstep: 1075.11 | bwd_inner_microstep: 1075.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3419
[2024-06-10 05:31:08,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.54 | bwd_microstep: 1490.36 | bwd_inner_microstep: 1490.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3626
[2024-06-10 05:31:10,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.76 | bwd_microstep: 1636.23 | bwd_inner_microstep: 1636.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-10 05:31:12,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.64 | bwd_microstep: 1618.43 | bwd_inner_microstep: 1618.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497
[2024-06-10 05:31:14,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1415.99 | bwd_inner_microstep: 1415.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 05:31:16,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.07 | bwd_microstep: 1483.83 | bwd_inner_microstep: 1483.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 05:31:18,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.91 | bwd_microstep: 1480.45 | bwd_inner_microstep: 1480.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989
[2024-06-10 05:31:20,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.27 | bwd_microstep: 896.96 | bwd_inner_microstep: 896.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 05:31:22,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.10 | bwd_microstep: 1552.76 | bwd_inner_microstep: 1552.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 05:31:24,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1500.30 | bwd_inner_microstep: 1500.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 05:31:26,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.92 | bwd_microstep: 1579.81 | bwd_inner_microstep: 1579.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 05:31:28,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.76 | bwd_microstep: 1284.25 | bwd_inner_microstep: 1284.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3552
[2024-06-10 05:31:30,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.08 | bwd_microstep: 1341.66 | bwd_inner_microstep: 1341.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3628
[2024-06-10 05:31:32,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1346.84 | bwd_inner_microstep: 1346.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 05:31:34,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.51 | bwd_microstep: 1547.57 | bwd_inner_microstep: 1547.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3718
[2024-06-10 05:31:36,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.61 | bwd_microstep: 1336.66 | bwd_inner_microstep: 1336.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 05:31:37,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.43 | bwd_microstep: 974.32 | bwd_inner_microstep: 974.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 05:31:39,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.75 | bwd_microstep: 1501.62 | bwd_inner_microstep: 1501.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 05:31:41,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.65 | bwd_microstep: 1505.17 | bwd_inner_microstep: 1505.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 05:31:43,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.41 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 05:31:49,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.62
[2024-06-10 05:31:49,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.85 | bwd_microstep: 5829.98 | bwd_inner_microstep: 1450.02 | bwd_allreduce_microstep: 4379.84 | step_microstep: 39.54
[2024-06-10 05:31:49,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16435.14 | bwd: 48334.90 | bwd_inner: 43954.03 | bwd_allreduce: 4380.15 | step: 41.13
{'loss': 1.3171, 'learning_rate': 3.821243257516188e-05, 'epoch': 0.16}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1865
[2024-06-10 05:31:50,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.13 | bwd_microstep: 763.35 | bwd_inner_microstep: 763.21 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 05:31:52,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.05 | bwd_microstep: 1272.35 | bwd_inner_microstep: 1272.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878
[2024-06-10 05:31:54,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.20 | bwd_microstep: 1579.28 | bwd_inner_microstep: 1579.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 05:31:57,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.61 | bwd_microstep: 1548.23 | bwd_inner_microstep: 1548.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-10 05:31:59,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1416.95 | bwd_inner_microstep: 1416.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 05:32:00,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.92 | bwd_microstep: 696.77 | bwd_inner_microstep: 696.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 05:32:02,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.23 | bwd_microstep: 1478.84 | bwd_inner_microstep: 1478.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 05:32:03,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1378.13 | bwd_inner_microstep: 1378.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1897
[2024-06-10 05:32:04,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.21 | bwd_microstep: 713.48 | bwd_inner_microstep: 713.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 05:32:06,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.23 | bwd_microstep: 1410.91 | bwd_inner_microstep: 1410.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3442
[2024-06-10 05:32:08,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.97 | bwd_microstep: 1190.38 | bwd_inner_microstep: 1190.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2397
[2024-06-10 05:32:09,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.97 | bwd_microstep: 931.93 | bwd_inner_microstep: 931.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1339
[2024-06-10 05:32:10,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 211.11 | bwd_microstep: 546.28 | bwd_inner_microstep: 546.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1981
[2024-06-10 05:32:11,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.79 | bwd_microstep: 895.26 | bwd_inner_microstep: 895.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379
[2024-06-10 05:32:13,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1429.49 | bwd_inner_microstep: 1429.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 05:32:15,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1489.04 | bwd_inner_microstep: 1489.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 05:32:17,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1276.09 | bwd_inner_microstep: 1276.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 05:32:19,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.85 | bwd_microstep: 1356.17 | bwd_inner_microstep: 1356.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 05:32:21,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.28 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 05:32:23,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.13 | bwd_microstep: 1642.62 | bwd_inner_microstep: 1642.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 05:32:25,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.43 | bwd_microstep: 1488.39 | bwd_inner_microstep: 1488.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3859
[2024-06-10 05:32:28,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.94 | bwd_microstep: 1696.10 | bwd_inner_microstep: 1696.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 05:32:30,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.22 | bwd_microstep: 1644.91 | bwd_inner_microstep: 1644.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 05:32:32,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.94 | bwd_microstep: 1502.74 | bwd_inner_microstep: 1502.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 05:32:34,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1312.08 | bwd_inner_microstep: 1312.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 05:32:36,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1393.82 | bwd_inner_microstep: 1393.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 05:32:38,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.08 | bwd_microstep: 1601.90 | bwd_inner_microstep: 1601.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-10 05:32:40,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.03 | bwd_microstep: 1289.43 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 05:32:42,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.23 | bwd_microstep: 1542.90 | bwd_inner_microstep: 1542.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 05:32:44,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.56 | bwd_microstep: 1302.58 | bwd_inner_microstep: 1302.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 05:32:45,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1294.13 | bwd_inner_microstep: 1294.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3772
[2024-06-10 05:32:50,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.38 | optimizer_step: 6.61
[2024-06-10 05:32:50,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.04 | bwd_microstep: 3859.09 | bwd_inner_microstep: 1630.91 | bwd_allreduce_microstep: 2228.10 | step_microstep: 39.50
[2024-06-10 05:32:50,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15736.75 | bwd: 44341.39 | bwd_inner: 42112.25 | bwd_allreduce: 2228.40 | step: 41.14
{'loss': 1.2995, 'learning_rate': 3.8196889934376617e-05, 'epoch': 0.16}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4264
[2024-06-10 05:32:52,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.93 | bwd_microstep: 1490.21 | bwd_inner_microstep: 1490.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 05:32:54,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.20 | bwd_microstep: 1478.07 | bwd_inner_microstep: 1478.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 05:32:56,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.18 | bwd_microstep: 1648.52 | bwd_inner_microstep: 1648.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 05:32:58,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.22 | bwd_microstep: 1181.66 | bwd_inner_microstep: 1181.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-10 05:32:59,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.77 | bwd_microstep: 875.38 | bwd_inner_microstep: 875.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 05:33:01,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.83 | bwd_microstep: 1180.84 | bwd_inner_microstep: 1180.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 05:33:03,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1395.92 | bwd_inner_microstep: 1395.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 05:33:05,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1386.71 | bwd_inner_microstep: 1386.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 05:33:06,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1248.60 | bwd_inner_microstep: 1248.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 05:33:08,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.86 | bwd_microstep: 1523.84 | bwd_inner_microstep: 1523.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-10 05:33:11,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1633.03 | bwd_inner_microstep: 1633.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422
[2024-06-10 05:33:13,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.16 | bwd_microstep: 1374.29 | bwd_inner_microstep: 1374.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785
[2024-06-10 05:33:15,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.67 | bwd_microstep: 1639.49 | bwd_inner_microstep: 1639.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 05:33:17,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 1247.48 | bwd_inner_microstep: 1247.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3505
[2024-06-10 05:33:18,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.84 | bwd_microstep: 1337.96 | bwd_inner_microstep: 1337.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050
[2024-06-10 05:33:20,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.96 | bwd_microstep: 912.86 | bwd_inner_microstep: 912.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3711
[2024-06-10 05:33:22,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.05 | bwd_microstep: 1550.66 | bwd_inner_microstep: 1550.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2138
[2024-06-10 05:33:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.94 | bwd_microstep: 928.31 | bwd_inner_microstep: 928.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 05:33:25,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.74 | bwd_microstep: 1509.01 | bwd_inner_microstep: 1508.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 05:33:27,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1454.72 | bwd_inner_microstep: 1454.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958
[2024-06-10 05:33:28,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.96 | bwd_microstep: 703.44 | bwd_inner_microstep: 703.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 05:33:30,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.74 | bwd_microstep: 1490.70 | bwd_inner_microstep: 1490.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 05:33:32,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.04 | bwd_microstep: 1557.56 | bwd_inner_microstep: 1557.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3613
[2024-06-10 05:33:34,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.78 | bwd_microstep: 1312.99 | bwd_inner_microstep: 1312.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3549
[2024-06-10 05:33:36,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1429.42 | bwd_inner_microstep: 1429.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 05:33:38,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1510.45 | bwd_inner_microstep: 1510.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3606
[2024-06-10 05:33:40,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1576.12 | bwd_inner_microstep: 1576.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2233
[2024-06-10 05:33:42,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.29 | bwd_microstep: 897.67 | bwd_inner_microstep: 897.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 05:33:44,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.22 | bwd_microstep: 1555.72 | bwd_inner_microstep: 1555.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 05:33:46,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 1408.47 | bwd_inner_microstep: 1408.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 05:33:48,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.28 | bwd_microstep: 1300.82 | bwd_inner_microstep: 1300.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 05:33:50,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-10 05:33:50,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1475.02 | bwd_inner_microstep: 1466.56 | bwd_allreduce_microstep: 8.41 | step_microstep: 38.32
[2024-06-10 05:33:50,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16155.37 | bwd: 43215.96 | bwd_inner: 43206.64 | bwd_allreduce: 8.64 | step: 40.03
{'loss': 1.3246, 'learning_rate': 3.81812832042396e-05, 'epoch': 0.16}
 16%|█▌        | 276/1726 [4:51:17<27:44:16, 68.87s/it]
 16%|█▌        | 277/1726 [4:52:20<27:03:01, 67.21s/it]
 16%|█▌        | 278/1726 [4:53:21<26:14:57, 65.26s/it]
 16%|█▌        | 279/1726 [4:54:26<26:12:54, 65.22s/it]
 16%|█▌        | 280/1726 [4:55:27<25:37:10, 63.78s/it]
 16%|█▋        | 281/1726 [4:56:26<25:06:46, 62.56s/it]
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448
[2024-06-10 05:33:52,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1408.81 | bwd_inner_microstep: 1408.73 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3475
[2024-06-10 05:33:54,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.54 | bwd_microstep: 1444.29 | bwd_inner_microstep: 1444.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 05:33:55,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.36 | bwd_microstep: 1379.28 | bwd_inner_microstep: 1379.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 05:33:57,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.54 | bwd_microstep: 1486.54 | bwd_inner_microstep: 1486.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 05:33:59,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.52 | bwd_microstep: 1351.64 | bwd_inner_microstep: 1351.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759
[2024-06-10 05:34:02,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.36 | bwd_microstep: 1636.31 | bwd_inner_microstep: 1636.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3420
[2024-06-10 05:34:03,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.79 | bwd_microstep: 1216.04 | bwd_inner_microstep: 1216.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2476
[2024-06-10 05:34:05,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.95 | bwd_microstep: 982.34 | bwd_inner_microstep: 982.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 05:34:07,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.78 | bwd_microstep: 1531.50 | bwd_inner_microstep: 1531.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 05:34:09,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1392.67 | bwd_inner_microstep: 1392.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3687
[2024-06-10 05:34:11,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.11 | bwd_microstep: 1674.38 | bwd_inner_microstep: 1674.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2016
[2024-06-10 05:34:12,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.84 | bwd_microstep: 777.00 | bwd_inner_microstep: 776.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487
[2024-06-10 05:34:14,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.36 | bwd_microstep: 1580.44 | bwd_inner_microstep: 1580.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3489
[2024-06-10 05:34:16,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1348.54 | bwd_inner_microstep: 1348.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3596
[2024-06-10 05:34:18,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.99 | bwd_microstep: 1342.76 | bwd_inner_microstep: 1342.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 05:34:20,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.32 | bwd_microstep: 1392.31 | bwd_inner_microstep: 1392.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3471
[2024-06-10 05:34:22,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.57 | bwd_microstep: 1532.66 | bwd_inner_microstep: 1532.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 05:34:24,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.10 | bwd_microstep: 1492.84 | bwd_inner_microstep: 1492.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 05:34:26,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1408.42 | bwd_inner_microstep: 1408.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 05:34:28,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1394.78 | bwd_inner_microstep: 1394.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3857
[2024-06-10 05:34:30,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.56 | bwd_microstep: 1568.01 | bwd_inner_microstep: 1567.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 05:34:32,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1294.72 | bwd_inner_microstep: 1294.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 05:34:34,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2079
[2024-06-10 05:34:35,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.47 | bwd_microstep: 1012.70 | bwd_inner_microstep: 1012.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3536
[2024-06-10 05:34:37,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.45 | bwd_microstep: 1591.24 | bwd_inner_microstep: 1591.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-10 05:34:39,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.67 | bwd_microstep: 1360.89 | bwd_inner_microstep: 1360.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434
[2024-06-10 05:34:41,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.17 | bwd_microstep: 1374.43 | bwd_inner_microstep: 1374.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559
[2024-06-10 05:34:43,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.26 | bwd_microstep: 1428.13 | bwd_inner_microstep: 1428.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3119
[2024-06-10 05:34:45,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.65 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 05:34:47,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1452.39 | bwd_inner_microstep: 1452.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 05:34:49,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.31 | bwd_microstep: 1397.71 | bwd_inner_microstep: 1397.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3428
[2024-06-10 05:34:52,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 05:34:52,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.94 | bwd_microstep: 3017.28 | bwd_inner_microstep: 1609.75 | bwd_allreduce_microstep: 1407.48 | step_microstep: 38.70
[2024-06-10 05:34:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16610.54 | bwd: 45947.77 | bwd_inner: 44539.31 | bwd_allreduce: 1407.75 | step: 40.35
{'loss': 1.3765, 'learning_rate': 3.816561243971765e-05, 'epoch': 0.16}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1993
[2024-06-10 05:34:54,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.33 | bwd_microstep: 857.89 | bwd_inner_microstep: 857.74 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938
[2024-06-10 05:34:55,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 696.29 | bwd_inner_microstep: 696.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3420
[2024-06-10 05:34:57,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1411.63 | bwd_inner_microstep: 1411.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 05:34:59,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.75 | bwd_microstep: 1378.05 | bwd_inner_microstep: 1378.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 05:35:00,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.81 | bwd_microstep: 1379.67 | bwd_inner_microstep: 1379.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485
[2024-06-10 05:35:02,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.71 | bwd_microstep: 1345.39 | bwd_inner_microstep: 1345.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 05:35:04,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1384.93 | bwd_inner_microstep: 1384.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 05:35:06,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.30 | bwd_microstep: 1280.97 | bwd_inner_microstep: 1280.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 05:35:08,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.64 | bwd_microstep: 1385.59 | bwd_inner_microstep: 1385.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 05:35:10,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.39 | bwd_microstep: 1292.71 | bwd_inner_microstep: 1292.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1940
[2024-06-10 05:35:11,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.94 | bwd_microstep: 889.38 | bwd_inner_microstep: 889.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729
[2024-06-10 05:35:13,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.84 | bwd_microstep: 1626.82 | bwd_inner_microstep: 1626.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 05:35:15,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1383.10 | bwd_inner_microstep: 1383.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2612
[2024-06-10 05:35:17,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.67 | bwd_microstep: 1110.28 | bwd_inner_microstep: 1110.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 05:35:19,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1483.08 | bwd_inner_microstep: 1483.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-10 05:35:21,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.22 | bwd_microstep: 1707.44 | bwd_inner_microstep: 1707.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 05:35:23,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.22 | bwd_microstep: 1316.30 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-10 05:35:24,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.81 | bwd_microstep: 680.88 | bwd_inner_microstep: 680.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3653
[2024-06-10 05:35:26,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.45 | bwd_microstep: 1326.43 | bwd_inner_microstep: 1326.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3826
[2024-06-10 05:35:28,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1389.35 | bwd_inner_microstep: 1389.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 05:35:29,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.07 | bwd_microstep: 1291.33 | bwd_inner_microstep: 1291.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 05:35:31,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1405.86 | bwd_inner_microstep: 1405.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-10 05:35:33,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.55 | bwd_microstep: 1425.99 | bwd_inner_microstep: 1425.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 05:35:36,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.51 | bwd_microstep: 1663.44 | bwd_inner_microstep: 1663.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 05:35:37,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1411.13 | bwd_inner_microstep: 1411.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291
[2024-06-10 05:35:39,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.90 | bwd_microstep: 882.20 | bwd_inner_microstep: 882.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-10 05:35:41,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.92 | bwd_microstep: 1359.45 | bwd_inner_microstep: 1359.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577
[2024-06-10 05:35:43,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1423.88 | bwd_inner_microstep: 1423.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2384
[2024-06-10 05:35:44,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.14 | bwd_microstep: 1126.37 | bwd_inner_microstep: 1126.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 05:35:46,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.17 | bwd_microstep: 1345.07 | bwd_inner_microstep: 1345.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1924
[2024-06-10 05:35:47,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.38 | bwd_microstep: 776.30 | bwd_inner_microstep: 776.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3433
[2024-06-10 05:35:55,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.39 | optimizer_step: 6.59
[2024-06-10 05:35:55,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.48 | bwd_microstep: 7231.55 | bwd_inner_microstep: 1812.24 | bwd_allreduce_microstep: 5419.24 | step_microstep: 39.68
[2024-06-10 05:35:55,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15330.94 | bwd: 46668.79 | bwd_inner: 41248.49 | bwd_allreduce: 5419.55 | step: 41.25
{'loss': 1.332, 'learning_rate': 3.814987769600312e-05, 'epoch': 0.16}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3417
[2024-06-10 05:35:57,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1396.83 | bwd_inner_microstep: 1396.73 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3393
[2024-06-10 05:35:59,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1240.40 | bwd_inner_microstep: 1240.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851
[2024-06-10 05:36:01,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.65 | bwd_microstep: 1556.43 | bwd_inner_microstep: 1556.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 05:36:03,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.53 | bwd_microstep: 1456.50 | bwd_inner_microstep: 1456.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 05:36:05,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.80 | bwd_microstep: 1479.27 | bwd_inner_microstep: 1479.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 05:36:06,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.74 | bwd_microstep: 1247.12 | bwd_inner_microstep: 1247.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 05:36:08,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.93 | bwd_microstep: 793.63 | bwd_inner_microstep: 793.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 05:36:09,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.65 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422
[2024-06-10 05:36:11,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.88 | bwd_microstep: 1375.15 | bwd_inner_microstep: 1375.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728
[2024-06-10 05:36:13,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.37 | bwd_microstep: 1636.07 | bwd_inner_microstep: 1636.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3569
[2024-06-10 05:36:15,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.57 | bwd_microstep: 1448.01 | bwd_inner_microstep: 1447.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083
[2024-06-10 05:36:17,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.56 | bwd_microstep: 916.86 | bwd_inner_microstep: 916.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 05:36:19,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.10 | bwd_microstep: 1431.43 | bwd_inner_microstep: 1431.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3436
[2024-06-10 05:36:21,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.62 | bwd_microstep: 1474.85 | bwd_inner_microstep: 1474.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 05:36:23,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1391.84 | bwd_inner_microstep: 1391.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 05:36:25,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.33 | bwd_microstep: 1508.43 | bwd_inner_microstep: 1508.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 05:36:27,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1408.65 | bwd_inner_microstep: 1408.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 05:36:28,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.34 | bwd_microstep: 1281.45 | bwd_inner_microstep: 1281.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 05:36:30,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.91 | bwd_microstep: 1394.99 | bwd_inner_microstep: 1394.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3506
[2024-06-10 05:36:32,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.44 | bwd_microstep: 1194.40 | bwd_inner_microstep: 1194.14 | bwd_allreduce_microstep: 0.17 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 05:36:33,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.04 | bwd_microstep: 797.57 | bwd_inner_microstep: 797.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2185
[2024-06-10 05:36:34,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.26 | bwd_microstep: 889.99 | bwd_inner_microstep: 889.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 05:36:37,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.56 | bwd_microstep: 1560.31 | bwd_inner_microstep: 1560.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 05:36:39,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.84 | bwd_microstep: 1503.50 | bwd_inner_microstep: 1503.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3558
[2024-06-10 05:36:41,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.41 | bwd_microstep: 1463.39 | bwd_inner_microstep: 1463.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3771
[2024-06-10 05:36:43,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.51 | bwd_microstep: 1740.96 | bwd_inner_microstep: 1740.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472
[2024-06-10 05:36:45,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1404.71 | bwd_inner_microstep: 1404.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569
[2024-06-10 05:36:47,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1363.33 | bwd_inner_microstep: 1363.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3580
[2024-06-10 05:36:49,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.73 | bwd_microstep: 1242.17 | bwd_inner_microstep: 1242.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3402
[2024-06-10 05:36:50,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.38 | bwd_microstep: 1374.03 | bwd_inner_microstep: 1374.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2982
[2024-06-10 05:36:52,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.77 | bwd_microstep: 1336.73 | bwd_inner_microstep: 1336.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3606
[2024-06-10 05:36:55,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-10 05:36:55,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.43 | bwd_microstep: 2287.03 | bwd_inner_microstep: 1823.12 | bwd_allreduce_microstep: 463.86 | step_microstep: 38.48
[2024-06-10 05:36:55,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16191.97 | bwd: 43842.25 | bwd_inner: 43377.18 | bwd_allreduce: 464.30 | step: 40.15
{'loss': 1.3204, 'learning_rate': 3.8134079028513705e-05, 'epoch': 0.16}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 05:36:57,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.50 | bwd_microstep: 1335.51 | bwd_inner_microstep: 1335.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-10 05:36:59,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.15 | bwd_microstep: 1279.16 | bwd_inner_microstep: 1279.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-10 05:37:01,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1281.16 | bwd_inner_microstep: 1281.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-10 05:37:02,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.22 | bwd_microstep: 1216.58 | bwd_inner_microstep: 1216.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 05:37:04,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1455.52 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 05:37:05,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.89 | bwd_microstep: 797.82 | bwd_inner_microstep: 797.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732
[2024-06-10 05:37:07,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1432.40 | bwd_inner_microstep: 1432.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929
[2024-06-10 05:37:08,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.07 | bwd_microstep: 729.68 | bwd_inner_microstep: 729.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 05:37:10,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.24 | bwd_microstep: 793.02 | bwd_inner_microstep: 792.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753
[2024-06-10 05:37:12,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.89 | bwd_microstep: 1445.50 | bwd_inner_microstep: 1445.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3704
[2024-06-10 05:37:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.43 | bwd_microstep: 1485.03 | bwd_inner_microstep: 1485.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417
[2024-06-10 05:37:16,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1372.94 | bwd_inner_microstep: 1372.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3795
[2024-06-10 05:37:18,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.41 | bwd_microstep: 1717.35 | bwd_inner_microstep: 1717.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 05:37:20,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.39 | bwd_microstep: 1381.90 | bwd_inner_microstep: 1381.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514
[2024-06-10 05:37:22,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.01 | bwd_microstep: 1448.67 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3425
[2024-06-10 05:37:23,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.54 | bwd_microstep: 1217.99 | bwd_inner_microstep: 1217.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952
[2024-06-10 05:37:25,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.00 | bwd_microstep: 891.55 | bwd_inner_microstep: 891.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 05:37:27,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.84 | bwd_microstep: 1300.18 | bwd_inner_microstep: 1300.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 05:37:28,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.84 | bwd_microstep: 1353.09 | bwd_inner_microstep: 1353.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3558
[2024-06-10 05:37:30,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1524.80 | bwd_inner_microstep: 1524.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 05:37:32,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.70 | bwd_microstep: 1280.76 | bwd_inner_microstep: 1280.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 05:37:35,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.44 | bwd_microstep: 1635.40 | bwd_inner_microstep: 1635.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 05:37:37,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1557.34 | bwd_inner_microstep: 1557.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 05:37:38,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.62 | bwd_microstep: 797.67 | bwd_inner_microstep: 797.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 05:37:39,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.60 | bwd_microstep: 1152.57 | bwd_inner_microstep: 1152.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 05:37:41,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.43 | bwd_microstep: 1449.47 | bwd_inner_microstep: 1449.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189
[2024-06-10 05:37:43,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.35 | bwd_microstep: 859.29 | bwd_inner_microstep: 859.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2285
[2024-06-10 05:37:44,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.36 | bwd_microstep: 786.39 | bwd_inner_microstep: 786.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 05:37:46,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1404.77 | bwd_inner_microstep: 1404.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 05:37:48,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.53 | bwd_microstep: 1506.76 | bwd_inner_microstep: 1506.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777
[2024-06-10 05:37:50,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1544.22 | bwd_inner_microstep: 1544.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609
[2024-06-10 05:37:55,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.26 | optimizer_step: 6.61
[2024-06-10 05:37:55,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.97 | bwd_microstep: 4800.80 | bwd_inner_microstep: 1811.01 | bwd_allreduce_microstep: 2989.74 | step_microstep: 38.79
[2024-06-10 05:37:55,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15400.87 | bwd: 44235.29 | bwd_inner: 41244.64 | bwd_allreduce: 2989.97 | step: 40.63
{'loss': 1.2906, 'learning_rate': 3.811821649289221e-05, 'epoch': 0.17}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 05:37:57,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1362.47 | bwd_inner_microstep: 1362.38 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 05:37:59,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.03 | bwd_microstep: 1273.10 | bwd_inner_microstep: 1273.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 05:38:01,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.68 | bwd_microstep: 1474.95 | bwd_inner_microstep: 1474.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 05:38:03,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.97 | bwd_microstep: 1455.89 | bwd_inner_microstep: 1455.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778
[2024-06-10 05:38:05,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.00 | bwd_microstep: 1347.41 | bwd_inner_microstep: 1347.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 05:38:07,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.62 | bwd_microstep: 1530.22 | bwd_inner_microstep: 1530.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 05:38:09,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 05:38:10,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.58 | bwd_microstep: 1154.12 | bwd_inner_microstep: 1154.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3402
[2024-06-10 05:38:12,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1392.17 | bwd_inner_microstep: 1392.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 05:38:14,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.49 | bwd_microstep: 1249.21 | bwd_inner_microstep: 1249.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2137
[2024-06-10 05:38:15,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.48 | bwd_microstep: 927.82 | bwd_inner_microstep: 927.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-10 05:38:17,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1415.96 | bwd_inner_microstep: 1415.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 05:38:20,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.60 | bwd_microstep: 1606.35 | bwd_inner_microstep: 1606.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3383
[2024-06-10 05:38:21,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.18 | bwd_microstep: 1243.95 | bwd_inner_microstep: 1243.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 05:38:23,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.77 | bwd_microstep: 1349.67 | bwd_inner_microstep: 1349.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 05:38:25,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.77 | bwd_microstep: 1409.82 | bwd_inner_microstep: 1409.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 05:38:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.17 | bwd_inner_microstep: 1283.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 05:38:29,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.12 | bwd_microstep: 1509.83 | bwd_inner_microstep: 1509.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 05:38:31,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1318.82 | bwd_inner_microstep: 1318.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 05:38:33,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1297.35 | bwd_inner_microstep: 1297.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167
[2024-06-10 05:38:34,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.98 | bwd_microstep: 856.96 | bwd_inner_microstep: 856.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 05:38:36,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1285.64 | bwd_inner_microstep: 1285.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3670
[2024-06-10 05:38:37,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.42 | bwd_microstep: 1389.11 | bwd_inner_microstep: 1389.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 05:38:39,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1289.62 | bwd_inner_microstep: 1289.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 05:38:41,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1390.28 | bwd_inner_microstep: 1390.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2038
[2024-06-10 05:38:42,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.13 | bwd_microstep: 809.82 | bwd_inner_microstep: 809.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3598
[2024-06-10 05:38:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.43 | bwd_microstep: 1674.59 | bwd_inner_microstep: 1674.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593
[2024-06-10 05:38:46,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1337.65 | bwd_inner_microstep: 1337.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2070
[2024-06-10 05:38:48,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.85 | bwd_microstep: 946.81 | bwd_inner_microstep: 946.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3609
[2024-06-10 05:38:50,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.71 | bwd_microstep: 1771.59 | bwd_inner_microstep: 1771.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 05:38:52,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.60 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2888
[2024-06-10 05:38:57,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.35 | optimizer_step: 6.60
[2024-06-10 05:38:57,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.45 | bwd_microstep: 4655.40 | bwd_inner_microstep: 1303.66 | bwd_allreduce_microstep: 3351.66 | step_microstep: 39.46
[2024-06-10 05:38:57,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15892.33 | bwd: 45790.82 | bwd_inner: 42438.15 | bwd_allreduce: 3351.95 | step: 41.21
{'loss': 1.3258, 'learning_rate': 3.810229014500643e-05, 'epoch': 0.17}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 05:38:59,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.65 | bwd_microstep: 1463.92 | bwd_inner_microstep: 1463.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3887
[2024-06-10 05:39:02,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.64 | bwd_microstep: 1681.83 | bwd_inner_microstep: 1681.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 05:39:04,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.15 | bwd_microstep: 1656.80 | bwd_inner_microstep: 1656.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 05:39:06,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.32 | bwd_microstep: 1345.81 | bwd_inner_microstep: 1345.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4136
[2024-06-10 05:39:08,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.10 | bwd_microstep: 1643.12 | bwd_inner_microstep: 1643.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 05:39:10,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.21 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3754
[2024-06-10 05:39:12,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.28 | bwd_microstep: 1443.83 | bwd_inner_microstep: 1443.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3922
[2024-06-10 05:39:14,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.51 | bwd_microstep: 1594.03 | bwd_inner_microstep: 1594.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 05:39:16,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.53 | bwd_microstep: 1256.63 | bwd_inner_microstep: 1256.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 05:39:17,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.64 | bwd_microstep: 701.65 | bwd_inner_microstep: 701.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 05:39:18,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.12 | bwd_microstep: 1153.02 | bwd_inner_microstep: 1152.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 05:39:20,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.08 | bwd_microstep: 1192.22 | bwd_inner_microstep: 1192.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 05:39:22,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.23 | bwd_microstep: 1407.62 | bwd_inner_microstep: 1407.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583
[2024-06-10 05:39:24,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.17 | bwd_microstep: 1465.25 | bwd_inner_microstep: 1465.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3399
[2024-06-10 05:39:26,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.06 | bwd_microstep: 1392.07 | bwd_inner_microstep: 1392.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3203
[2024-06-10 05:39:27,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.65 | bwd_microstep: 1143.41 | bwd_inner_microstep: 1143.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3614
[2024-06-10 05:39:29,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.75 | bwd_microstep: 1276.44 | bwd_inner_microstep: 1276.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098
[2024-06-10 05:39:31,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.48 | bwd_microstep: 923.05 | bwd_inner_microstep: 923.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 05:39:32,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.76 | bwd_microstep: 1256.55 | bwd_inner_microstep: 1256.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820
[2024-06-10 05:39:34,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.72 | bwd_microstep: 1590.30 | bwd_inner_microstep: 1590.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891
[2024-06-10 05:39:35,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.29 | bwd_microstep: 720.60 | bwd_inner_microstep: 720.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 05:39:37,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.19 | bwd_microstep: 1402.22 | bwd_inner_microstep: 1402.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 05:39:39,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1406.06 | bwd_inner_microstep: 1406.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 05:39:41,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1560.81 | bwd_inner_microstep: 1560.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3604
[2024-06-10 05:39:44,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.22 | bwd_microstep: 1642.42 | bwd_inner_microstep: 1642.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551
[2024-06-10 05:39:46,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.95 | bwd_microstep: 1422.85 | bwd_inner_microstep: 1422.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 05:39:48,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.10 | bwd_microstep: 1431.60 | bwd_inner_microstep: 1431.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 05:39:50,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.75 | bwd_microstep: 1445.69 | bwd_inner_microstep: 1445.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2101
[2024-06-10 05:39:51,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.62 | bwd_microstep: 922.98 | bwd_inner_microstep: 922.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3588
[2024-06-10 05:39:53,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.30 | bwd_microstep: 1637.37 | bwd_inner_microstep: 1637.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805
[2024-06-10 05:39:56,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.18 | bwd_microstep: 1752.00 | bwd_inner_microstep: 1751.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 05:39:58,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-10 05:39:58,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1948.86 | bwd_inner_microstep: 1398.87 | bwd_allreduce_microstep: 549.95 | step_microstep: 38.51
[2024-06-10 05:39:58,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16279.02 | bwd: 44164.41 | bwd_inner: 43613.52 | bwd_allreduce: 550.18 | step: 40.20
{'loss': 1.3549, 'learning_rate': 3.8086300040948854e-05, 'epoch': 0.17}
 16%|█▋        | 282/1726 [4:57:29<25:08:18, 62.67s/it]
 16%|█▋        | 283/1726 [4:58:32<25:04:54, 62.57s/it]
 16%|█▋        | 284/1726 [4:59:32<24:48:09, 61.92s/it]
 17%|█▋        | 285/1726 [5:00:32<24:33:20, 61.35s/it]
 17%|█▋        | 286/1726 [5:01:34<24:37:18, 61.55s/it]
 17%|█▋        | 287/1726 [5:02:35<24:30:48, 61.33s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519
[2024-06-10 05:40:00,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.14 | bwd_microstep: 1582.20 | bwd_inner_microstep: 1582.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 05:40:02,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.32 | bwd_microstep: 1395.32 | bwd_inner_microstep: 1395.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3887
[2024-06-10 05:40:04,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1387.21 | bwd_inner_microstep: 1387.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3404
[2024-06-10 05:40:06,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.54 | bwd_microstep: 1198.61 | bwd_inner_microstep: 1198.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240
[2024-06-10 05:40:07,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.03 | bwd_microstep: 961.36 | bwd_inner_microstep: 961.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 05:40:09,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.90 | bwd_microstep: 1533.35 | bwd_inner_microstep: 1533.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484
[2024-06-10 05:40:11,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1316.00 | bwd_inner_microstep: 1315.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 05:40:13,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1389.26 | bwd_inner_microstep: 1389.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 05:40:15,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.30 | bwd_microstep: 1392.32 | bwd_inner_microstep: 1392.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 05:40:17,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.74 | bwd_microstep: 1387.94 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 05:40:19,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.31 | bwd_microstep: 1395.99 | bwd_inner_microstep: 1395.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3513
[2024-06-10 05:40:20,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.25 | bwd_microstep: 1224.77 | bwd_inner_microstep: 1224.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695
[2024-06-10 05:40:22,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.42 | bwd_microstep: 1465.31 | bwd_inner_microstep: 1465.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-10 05:40:24,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1286.32 | bwd_inner_microstep: 1286.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 05:40:26,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1250.72 | bwd_inner_microstep: 1250.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512
[2024-06-10 05:40:28,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.75 | bwd_microstep: 1588.67 | bwd_inner_microstep: 1588.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-10 05:40:30,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.29 | bwd_microstep: 1607.09 | bwd_inner_microstep: 1607.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2824
[2024-06-10 05:40:32,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.91 | bwd_microstep: 1161.03 | bwd_inner_microstep: 1161.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 05:40:34,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1290.26 | bwd_inner_microstep: 1290.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3838
[2024-06-10 05:40:36,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.75 | bwd_microstep: 1362.53 | bwd_inner_microstep: 1362.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1977
[2024-06-10 05:40:37,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.42 | bwd_microstep: 706.28 | bwd_inner_microstep: 706.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 05:40:39,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1500.64 | bwd_inner_microstep: 1500.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 05:40:41,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.88 | bwd_microstep: 1452.69 | bwd_inner_microstep: 1452.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 05:40:42,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.13 | bwd_microstep: 699.88 | bwd_inner_microstep: 699.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3819
[2024-06-10 05:40:43,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.13 | bwd_microstep: 1260.50 | bwd_inner_microstep: 1260.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 05:40:45,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1400.77 | bwd_inner_microstep: 1400.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2933
[2024-06-10 05:40:47,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.91 | bwd_microstep: 1096.09 | bwd_inner_microstep: 1096.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590
[2024-06-10 05:40:49,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.08 | bwd_microstep: 1711.36 | bwd_inner_microstep: 1711.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3585
[2024-06-10 05:40:51,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.20 | bwd_microstep: 1460.92 | bwd_inner_microstep: 1460.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 05:40:53,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.89 | bwd_microstep: 1446.59 | bwd_inner_microstep: 1446.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 05:40:55,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.01 | bwd_microstep: 1589.95 | bwd_inner_microstep: 1589.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584
[2024-06-10 05:40:58,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 05:40:58,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.58 | bwd_microstep: 2393.43 | bwd_inner_microstep: 1678.77 | bwd_allreduce_microstep: 714.61 | step_microstep: 38.40
[2024-06-10 05:40:58,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16120.96 | bwd: 43895.38 | bwd_inner: 43179.87 | bwd_allreduce: 714.83 | step: 40.02
{'loss': 1.2404, 'learning_rate': 3.807024623703655e-05, 'epoch': 0.17}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469
[2024-06-10 05:41:01,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.30 | bwd_microstep: 1574.76 | bwd_inner_microstep: 1574.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 05:41:03,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.98 | bwd_microstep: 1483.36 | bwd_inner_microstep: 1483.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3796
[2024-06-10 05:41:05,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.34 | bwd_microstep: 1481.63 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 05:41:06,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.44 | bwd_microstep: 781.52 | bwd_inner_microstep: 781.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1968
[2024-06-10 05:41:07,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.14 | bwd_microstep: 858.97 | bwd_inner_microstep: 858.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3493
[2024-06-10 05:41:09,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.01 | bwd_microstep: 1217.25 | bwd_inner_microstep: 1217.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 05:41:11,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.74 | bwd_microstep: 1316.86 | bwd_inner_microstep: 1316.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3909
[2024-06-10 05:41:13,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.96 | bwd_microstep: 1698.45 | bwd_inner_microstep: 1698.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433
[2024-06-10 05:41:15,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1286.33 | bwd_inner_microstep: 1286.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 05:41:16,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1347.02 | bwd_inner_microstep: 1346.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3618
[2024-06-10 05:41:18,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.74 | bwd_microstep: 1373.19 | bwd_inner_microstep: 1373.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3509
[2024-06-10 05:41:21,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.27 | bwd_microstep: 1685.58 | bwd_inner_microstep: 1685.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3633
[2024-06-10 05:41:23,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.26 | bwd_microstep: 1678.08 | bwd_inner_microstep: 1678.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493
[2024-06-10 05:41:25,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1416.37 | bwd_inner_microstep: 1416.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 05:41:27,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.39 | bwd_microstep: 1295.71 | bwd_inner_microstep: 1295.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 05:41:29,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1397.41 | bwd_inner_microstep: 1397.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 05:41:30,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.67 | bwd_microstep: 1277.89 | bwd_inner_microstep: 1277.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 05:41:32,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1393.24 | bwd_inner_microstep: 1393.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2338
[2024-06-10 05:41:34,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.02 | bwd_microstep: 954.45 | bwd_inner_microstep: 954.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2186
[2024-06-10 05:41:35,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.39 | bwd_microstep: 794.23 | bwd_inner_microstep: 794.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3724
[2024-06-10 05:41:37,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.33 | bwd_microstep: 1337.73 | bwd_inner_microstep: 1337.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822
[2024-06-10 05:41:39,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.56 | bwd_microstep: 1691.44 | bwd_inner_microstep: 1691.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 05:41:40,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 700.88 | bwd_inner_microstep: 700.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807
[2024-06-10 05:41:42,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.36 | bwd_microstep: 1360.28 | bwd_inner_microstep: 1360.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 05:41:44,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1381.53 | bwd_inner_microstep: 1381.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 05:41:46,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.24 | bwd_microstep: 1351.41 | bwd_inner_microstep: 1351.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 05:41:47,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1342.39 | bwd_inner_microstep: 1342.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3586
[2024-06-10 05:41:50,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.46 | bwd_microstep: 1704.44 | bwd_inner_microstep: 1704.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2009
[2024-06-10 05:41:51,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.56 | bwd_microstep: 757.60 | bwd_inner_microstep: 757.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830
[2024-06-10 05:41:53,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.33 | bwd_microstep: 1754.86 | bwd_inner_microstep: 1754.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2698
[2024-06-10 05:41:55,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.25 | bwd_microstep: 1132.77 | bwd_inner_microstep: 1132.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799
[2024-06-10 05:42:00,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 05:42:00,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 4535.04 | bwd_inner_microstep: 1633.54 | bwd_allreduce_microstep: 2901.45 | step_microstep: 38.67
[2024-06-10 05:42:00,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15808.87 | bwd: 45362.70 | bwd_inner: 42460.34 | bwd_allreduce: 2901.67 | step: 40.35
{'loss': 1.3264, 'learning_rate': 3.805412878981095e-05, 'epoch': 0.17}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 05:42:02,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.26 | bwd_microstep: 1268.61 | bwd_inner_microstep: 1268.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420
[2024-06-10 05:42:03,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.97 | bwd_microstep: 1151.28 | bwd_inner_microstep: 1151.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4036
[2024-06-10 05:42:06,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.45 | bwd_microstep: 1720.19 | bwd_inner_microstep: 1720.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 05:42:08,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1382.35 | bwd_inner_microstep: 1382.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 05:42:09,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.02 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 05:42:11,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.27 | bwd_microstep: 1386.48 | bwd_inner_microstep: 1386.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3499
[2024-06-10 05:42:13,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1347.95 | bwd_inner_microstep: 1347.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3528
[2024-06-10 05:42:15,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.44 | bwd_microstep: 1229.26 | bwd_inner_microstep: 1229.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3868
[2024-06-10 05:42:17,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.21 | bwd_microstep: 1670.20 | bwd_inner_microstep: 1670.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 05:42:19,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.93 | bwd_microstep: 1154.89 | bwd_inner_microstep: 1154.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3705
[2024-06-10 05:42:21,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.95 | bwd_microstep: 1459.15 | bwd_inner_microstep: 1459.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 05:42:23,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1255.66 | bwd_inner_microstep: 1255.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2093
[2024-06-10 05:42:24,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.09 | bwd_microstep: 759.48 | bwd_inner_microstep: 759.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 05:42:26,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1382.06 | bwd_inner_microstep: 1382.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 05:42:27,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1428.72 | bwd_inner_microstep: 1428.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1850
[2024-06-10 05:42:28,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.24 | bwd_microstep: 702.72 | bwd_inner_microstep: 702.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3646
[2024-06-10 05:42:31,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.39 | bwd_microstep: 1710.87 | bwd_inner_microstep: 1710.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3680
[2024-06-10 05:42:33,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.53 | bwd_microstep: 1723.00 | bwd_inner_microstep: 1722.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3624
[2024-06-10 05:42:35,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.33 | bwd_microstep: 1676.98 | bwd_inner_microstep: 1676.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2159
[2024-06-10 05:42:37,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.77 | bwd_microstep: 759.87 | bwd_inner_microstep: 759.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3835
[2024-06-10 05:42:39,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.59 | bwd_microstep: 1423.58 | bwd_inner_microstep: 1423.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3723
[2024-06-10 05:42:40,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1342.00 | bwd_inner_microstep: 1341.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 05:42:42,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1400.13 | bwd_inner_microstep: 1400.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 05:42:44,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.17 | bwd_microstep: 1407.39 | bwd_inner_microstep: 1407.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551
[2024-06-10 05:42:46,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.57 | bwd_microstep: 1201.45 | bwd_inner_microstep: 1201.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3830
[2024-06-10 05:42:48,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.73 | bwd_microstep: 1489.35 | bwd_inner_microstep: 1489.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 05:42:50,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1354.20 | bwd_inner_microstep: 1354.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796
[2024-06-10 05:42:52,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.22 | bwd_microstep: 1648.46 | bwd_inner_microstep: 1648.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 05:42:54,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1351.44 | bwd_inner_microstep: 1351.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2240
[2024-06-10 05:42:55,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.11 | bwd_microstep: 1062.58 | bwd_inner_microstep: 1062.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 05:42:58,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1553.60 | bwd_inner_microstep: 1553.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3602
[2024-06-10 05:43:03,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.84 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 05:43:03,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.45 | bwd_microstep: 4612.06 | bwd_inner_microstep: 1936.49 | bwd_allreduce_microstep: 2675.51 | step_microstep: 40.77
[2024-06-10 05:43:03,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16249.76 | bwd: 46296.56 | bwd_inner: 43620.15 | bwd_allreduce: 2675.74 | step: 42.46
{'loss': 1.2414, 'learning_rate': 3.80379477560376e-05, 'epoch': 0.17}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971
[2024-06-10 05:43:04,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.81 | bwd_microstep: 887.80 | bwd_inner_microstep: 887.65 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2407
[2024-06-10 05:43:05,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.45 | bwd_microstep: 1001.68 | bwd_inner_microstep: 1001.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3472
[2024-06-10 05:43:07,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.79 | bwd_microstep: 1214.11 | bwd_inner_microstep: 1214.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3866
[2024-06-10 05:43:09,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.98 | bwd_microstep: 1659.22 | bwd_inner_microstep: 1659.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 05:43:11,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1250.40 | bwd_inner_microstep: 1250.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 05:43:13,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.44 | bwd_microstep: 1345.09 | bwd_inner_microstep: 1345.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3733
[2024-06-10 05:43:15,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.19 | bwd_microstep: 1400.53 | bwd_inner_microstep: 1400.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 05:43:17,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.70 | bwd_microstep: 1284.42 | bwd_inner_microstep: 1284.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3435
[2024-06-10 05:43:19,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.79 | bwd_microstep: 1374.22 | bwd_inner_microstep: 1374.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476
[2024-06-10 05:43:21,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.68 | bwd_microstep: 1445.45 | bwd_inner_microstep: 1445.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3677
[2024-06-10 05:43:23,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.02 | bwd_microstep: 1827.17 | bwd_inner_microstep: 1827.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 05:43:25,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.25 | bwd_microstep: 1618.82 | bwd_inner_microstep: 1618.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 05:43:28,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.92 | bwd_microstep: 1534.92 | bwd_inner_microstep: 1534.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693
[2024-06-10 05:43:30,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.90 | bwd_microstep: 1728.15 | bwd_inner_microstep: 1728.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 05:43:32,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.02 | bwd_microstep: 1393.09 | bwd_inner_microstep: 1393.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-10 05:43:34,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.07 | bwd_microstep: 1415.69 | bwd_inner_microstep: 1415.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3522
[2024-06-10 05:43:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.84 | bwd_microstep: 1230.78 | bwd_inner_microstep: 1230.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2713
[2024-06-10 05:43:37,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.14 | bwd_microstep: 1006.55 | bwd_inner_microstep: 1006.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3688
[2024-06-10 05:43:39,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1335.01 | bwd_inner_microstep: 1334.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2103
[2024-06-10 05:43:40,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.35 | bwd_microstep: 823.78 | bwd_inner_microstep: 823.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 05:43:42,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1431.45 | bwd_inner_microstep: 1431.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3858
[2024-06-10 05:43:44,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.14 | bwd_microstep: 1767.92 | bwd_inner_microstep: 1767.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3549
[2024-06-10 05:43:46,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.93 | bwd_microstep: 1198.56 | bwd_inner_microstep: 1198.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 05:43:48,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.30 | bwd_microstep: 1286.34 | bwd_inner_microstep: 1286.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3621
[2024-06-10 05:43:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.26 | bwd_microstep: 1248.33 | bwd_inner_microstep: 1248.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 05:43:52,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1556.19 | bwd_inner_microstep: 1556.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 05:43:54,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.21 | bwd_microstep: 1629.65 | bwd_inner_microstep: 1629.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2959
[2024-06-10 05:43:56,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.85 | bwd_microstep: 1200.20 | bwd_inner_microstep: 1200.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 05:43:57,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1342.94 | bwd_inner_microstep: 1342.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3814
[2024-06-10 05:43:59,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1502.74 | bwd_inner_microstep: 1502.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 05:44:01,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.08 | bwd_microstep: 976.50 | bwd_inner_microstep: 976.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 05:44:04,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.16 | optimizer_step: 6.63
[2024-06-10 05:44:04,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.25 | bwd_microstep: 2208.67 | bwd_inner_microstep: 1489.43 | bwd_allreduce_microstep: 719.19 | step_microstep: 38.33
[2024-06-10 05:44:04,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16198.64 | bwd: 44126.37 | bwd_inner: 43406.17 | bwd_allreduce: 719.47 | step: 39.93
{'loss': 1.2998, 'learning_rate': 3.8021703192706023e-05, 'epoch': 0.17}
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4038
[2024-06-10 05:44:06,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.64 | bwd_microstep: 1807.39 | bwd_inner_microstep: 1807.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 05:44:08,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3478
[2024-06-10 05:44:10,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.07 | bwd_microstep: 1243.32 | bwd_inner_microstep: 1243.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 05:44:11,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.95 | bwd_microstep: 1288.57 | bwd_inner_microstep: 1288.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 05:44:12,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 679.67 | bwd_inner_microstep: 679.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 05:44:14,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.80 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 05:44:16,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.82 | bwd_microstep: 1286.79 | bwd_inner_microstep: 1286.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1388
[2024-06-10 05:44:16,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 204.60 | bwd_microstep: 528.41 | bwd_inner_microstep: 528.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 05:44:18,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1430.69 | bwd_inner_microstep: 1430.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 05:44:20,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.14 | bwd_microstep: 1429.74 | bwd_inner_microstep: 1429.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 05:44:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.57 | bwd_microstep: 1485.99 | bwd_inner_microstep: 1485.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 05:44:25,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.72 | bwd_microstep: 1486.01 | bwd_inner_microstep: 1485.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672
[2024-06-10 05:44:27,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.53 | bwd_microstep: 1721.70 | bwd_inner_microstep: 1721.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 05:44:29,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1250.75 | bwd_inner_microstep: 1250.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3635
[2024-06-10 05:44:31,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.59 | bwd_microstep: 1376.55 | bwd_inner_microstep: 1376.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2119
[2024-06-10 05:44:32,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.32 | bwd_microstep: 827.14 | bwd_inner_microstep: 827.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 05:44:33,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1300.93 | bwd_inner_microstep: 1300.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 05:44:36,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1496.83 | bwd_inner_microstep: 1496.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 05:44:37,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1309.99 | bwd_inner_microstep: 1309.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540
[2024-06-10 05:44:39,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.48 | bwd_microstep: 1199.78 | bwd_inner_microstep: 1199.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 05:44:41,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1497.42 | bwd_inner_microstep: 1497.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1992
[2024-06-10 05:44:42,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.78 | bwd_microstep: 709.02 | bwd_inner_microstep: 708.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3550
[2024-06-10 05:44:44,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.44 | bwd_microstep: 1202.88 | bwd_inner_microstep: 1202.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2032
[2024-06-10 05:44:45,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.61 | bwd_microstep: 747.69 | bwd_inner_microstep: 747.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 05:44:47,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1411.08 | bwd_inner_microstep: 1411.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3601
[2024-06-10 05:44:49,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1342.28 | bwd_inner_microstep: 1342.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4151
[2024-06-10 05:44:51,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.53 | bwd_microstep: 1554.47 | bwd_inner_microstep: 1554.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 05:44:53,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 1615.17 | bwd_inner_microstep: 1615.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-10 05:44:55,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.56 | bwd_microstep: 1557.95 | bwd_inner_microstep: 1557.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3377
[2024-06-10 05:44:57,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.69 | bwd_microstep: 1437.66 | bwd_inner_microstep: 1437.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3644
[2024-06-10 05:44:59,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.01 | bwd_microstep: 1680.29 | bwd_inner_microstep: 1680.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3424
[2024-06-10 05:45:05,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.35 | optimizer_step: 6.65
[2024-06-10 05:45:05,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.85 | bwd_microstep: 4665.84 | bwd_inner_microstep: 1485.71 | bwd_allreduce_microstep: 3180.07 | step_microstep: 39.11
[2024-06-10 05:45:05,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15667.52 | bwd: 45069.22 | bwd_inner: 41888.24 | bwd_allreduce: 3180.29 | step: 40.77
{'loss': 1.3716, 'learning_rate': 3.800539515702949e-05, 'epoch': 0.17}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3472
[2024-06-10 05:45:07,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.41 | bwd_microstep: 1570.11 | bwd_inner_microstep: 1570.02 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933
[2024-06-10 05:45:09,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1594.26 | bwd_inner_microstep: 1594.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 05:45:11,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1309.22 | bwd_inner_microstep: 1309.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-10 05:45:13,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.32 | bwd_microstep: 1444.16 | bwd_inner_microstep: 1444.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2030
[2024-06-10 05:45:14,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.91 | bwd_microstep: 714.36 | bwd_inner_microstep: 714.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 05:45:16,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1458.22 | bwd_inner_microstep: 1458.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3500
[2024-06-10 05:45:18,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.08 | bwd_microstep: 1222.93 | bwd_inner_microstep: 1222.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3737
[2024-06-10 05:45:20,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.30 | bwd_microstep: 1467.32 | bwd_inner_microstep: 1467.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484
[2024-06-10 05:45:21,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.23 | bwd_microstep: 1313.66 | bwd_inner_microstep: 1313.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443
[2024-06-10 05:45:23,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.94 | bwd_microstep: 1289.84 | bwd_inner_microstep: 1289.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 05:45:25,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1485.24 | bwd_inner_microstep: 1485.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1993
[2024-06-10 05:45:26,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.78 | bwd_microstep: 899.84 | bwd_inner_microstep: 899.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1966
[2024-06-10 05:45:28,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.72 | bwd_microstep: 855.47 | bwd_inner_microstep: 855.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 05:45:30,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.84 | bwd_microstep: 1420.76 | bwd_inner_microstep: 1420.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 05:45:32,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.72 | bwd_microstep: 1504.10 | bwd_inner_microstep: 1504.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-10 05:45:34,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1629.89 | bwd_inner_microstep: 1629.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513
[2024-06-10 05:45:36,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.95 | bwd_microstep: 1192.98 | bwd_inner_microstep: 1192.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 05:45:38,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.08 | bwd_microstep: 1399.74 | bwd_inner_microstep: 1399.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-10 05:45:40,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.45 | bwd_microstep: 1635.16 | bwd_inner_microstep: 1635.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 05:45:42,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.43 | bwd_microstep: 1393.22 | bwd_inner_microstep: 1393.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951
[2024-06-10 05:45:43,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.23 | bwd_microstep: 730.82 | bwd_inner_microstep: 730.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2203
[2024-06-10 05:45:44,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.32 | bwd_microstep: 866.17 | bwd_inner_microstep: 866.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717
[2024-06-10 05:45:46,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.68 | bwd_microstep: 1637.65 | bwd_inner_microstep: 1637.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 05:45:48,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.24 | bwd_microstep: 1287.57 | bwd_inner_microstep: 1287.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1979
[2024-06-10 05:45:49,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.36 | bwd_microstep: 828.36 | bwd_inner_microstep: 828.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2281
[2024-06-10 05:45:50,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.06 | bwd_microstep: 937.86 | bwd_inner_microstep: 937.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 05:45:52,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1352.11 | bwd_inner_microstep: 1352.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046
[2024-06-10 05:45:54,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.44 | bwd_microstep: 904.40 | bwd_inner_microstep: 904.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3806
[2024-06-10 05:45:56,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.83 | bwd_microstep: 1610.48 | bwd_inner_microstep: 1610.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580
[2024-06-10 05:45:58,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1453.41 | bwd_inner_microstep: 1453.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 05:46:00,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.41 | bwd_microstep: 1648.39 | bwd_inner_microstep: 1648.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2073
[2024-06-10 05:46:06,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.36 | optimizer_step: 6.60
[2024-06-10 05:46:06,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.04 | bwd_microstep: 5199.33 | bwd_inner_microstep: 1157.30 | bwd_allreduce_microstep: 4041.95 | step_microstep: 39.28
[2024-06-10 05:46:06,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15375.16 | bwd: 45257.04 | bwd_inner: 41214.08 | bwd_allreduce: 4042.25 | step: 40.94
 287/1726 [5:02:35<24:30:48, 61.33s/it]
 17%|█▋        | 288/1726 [5:03:35<24:22:55, 61.04s/it]
 17%|█▋        | 289/1726 [5:04:37<24:25:22, 61.18s/it]
 17%|█▋        | 290/1726 [5:05:40<24:36:40, 61.70s/it]
 17%|█▋        | 291/1726 [5:06:40<24:28:17, 61.39s/it]
 17%|█▋        | 292/1726 [5:07:41<24:25:05, 61.30s/it]
 17%|█▋        | 293/1726 [5:08:42<24:21:46, 61.21s/it]
{'loss': 1.3045, 'learning_rate': 3.798902370644482e-05, 'epoch': 0.17}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540
[2024-06-10 05:46:07,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.47 | bwd_microstep: 1193.94 | bwd_inner_microstep: 1193.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3973
[2024-06-10 05:46:10,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.94 | bwd_microstep: 1696.53 | bwd_inner_microstep: 1696.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 05:46:11,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.56 | bwd_microstep: 788.74 | bwd_inner_microstep: 788.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482
[2024-06-10 05:46:13,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.94 | bwd_microstep: 1322.39 | bwd_inner_microstep: 1322.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-10 05:46:15,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.48 | bwd_microstep: 1443.03 | bwd_inner_microstep: 1443.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407
[2024-06-10 05:46:16,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.88 | bwd_microstep: 1211.26 | bwd_inner_microstep: 1211.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408
[2024-06-10 05:46:18,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.43 | bwd_microstep: 1277.84 | bwd_inner_microstep: 1277.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 05:46:19,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.09 | bwd_microstep: 799.48 | bwd_inner_microstep: 799.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 05:46:21,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1389.48 | bwd_inner_microstep: 1389.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 05:46:23,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.58 | bwd_microstep: 1154.32 | bwd_inner_microstep: 1154.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1903
[2024-06-10 05:46:24,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.16 | bwd_microstep: 746.27 | bwd_inner_microstep: 746.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2191
[2024-06-10 05:46:25,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.53 | bwd_microstep: 1049.39 | bwd_inner_microstep: 1049.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 05:46:27,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.56 | bwd_microstep: 1440.05 | bwd_inner_microstep: 1440.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3536
[2024-06-10 05:46:29,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.35 | bwd_microstep: 1327.65 | bwd_inner_microstep: 1327.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3593
[2024-06-10 05:46:31,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.77 | bwd_microstep: 1574.62 | bwd_inner_microstep: 1574.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2158
[2024-06-10 05:46:32,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.68 | bwd_microstep: 950.42 | bwd_inner_microstep: 950.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 05:46:35,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 1513.03 | bwd_inner_microstep: 1513.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533
[2024-06-10 05:46:37,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1592.10 | bwd_inner_microstep: 1592.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2081
[2024-06-10 05:46:38,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.26 | bwd_microstep: 725.26 | bwd_inner_microstep: 725.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 05:46:39,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.80 | bwd_microstep: 725.81 | bwd_inner_microstep: 725.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 05:46:41,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.91 | bwd_microstep: 1559.68 | bwd_inner_microstep: 1559.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3715
[2024-06-10 05:46:43,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1368.12 | bwd_inner_microstep: 1368.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 05:46:45,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.20 | bwd_microstep: 1298.05 | bwd_inner_microstep: 1298.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 05:46:46,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1350.88 | bwd_inner_microstep: 1350.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3473
[2024-06-10 05:46:48,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1366.00 | bwd_inner_microstep: 1365.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 05:46:50,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.14 | bwd_microstep: 1312.32 | bwd_inner_microstep: 1312.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 05:46:52,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.28 | bwd_microstep: 1503.47 | bwd_inner_microstep: 1503.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-10 05:46:53,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.51 | bwd_microstep: 880.47 | bwd_inner_microstep: 880.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3534
[2024-06-10 05:46:55,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.68 | bwd_microstep: 1199.67 | bwd_inner_microstep: 1199.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3127
[2024-06-10 05:46:57,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.57 | bwd_microstep: 1406.26 | bwd_inner_microstep: 1406.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-10 05:46:59,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.22 | bwd_microstep: 1535.64 | bwd_inner_microstep: 1535.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 05:47:08,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.28 | optimizer_step: 6.60
[2024-06-10 05:47:08,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 7905.89 | bwd_inner_microstep: 1697.33 | bwd_allreduce_microstep: 6208.51 | step_microstep: 39.24
[2024-06-10 05:47:08,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15116.86 | bwd: 46608.07 | bwd_inner: 40398.65 | bwd_allreduce: 6208.74 | step: 40.84
{'loss': 1.2502, 'learning_rate': 3.797258889861216e-05, 'epoch': 0.17}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 05:47:09,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.24 | bwd_microstep: 1241.12 | bwd_inner_microstep: 1241.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3839
[2024-06-10 05:47:12,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.96 | bwd_microstep: 1513.22 | bwd_inner_microstep: 1513.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 05:47:13,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.74 | bwd_microstep: 1243.29 | bwd_inner_microstep: 1243.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 05:47:15,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.24 | bwd_microstep: 1374.01 | bwd_inner_microstep: 1373.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3417
[2024-06-10 05:47:17,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.70 | bwd_microstep: 1184.04 | bwd_inner_microstep: 1184.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3460
[2024-06-10 05:47:19,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.23 | bwd_microstep: 1240.93 | bwd_inner_microstep: 1240.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881
[2024-06-10 05:47:19,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.24 | bwd_microstep: 712.65 | bwd_inner_microstep: 712.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 05:47:21,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1388.46 | bwd_inner_microstep: 1388.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 05:47:23,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1380.97 | bwd_inner_microstep: 1380.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-10 05:47:26,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.49 | bwd_microstep: 1623.36 | bwd_inner_microstep: 1623.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3643
[2024-06-10 05:47:28,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.17 | bwd_microstep: 1576.90 | bwd_inner_microstep: 1576.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1937
[2024-06-10 05:47:29,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.33 | bwd_microstep: 882.52 | bwd_inner_microstep: 882.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2096
[2024-06-10 05:47:30,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.02 | bwd_microstep: 1015.15 | bwd_inner_microstep: 1015.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3901
[2024-06-10 05:47:33,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.47 | bwd_microstep: 1783.24 | bwd_inner_microstep: 1783.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966
[2024-06-10 05:47:34,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.39 | bwd_microstep: 704.45 | bwd_inner_microstep: 704.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 05:47:36,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.86 | bwd_microstep: 1483.47 | bwd_inner_microstep: 1483.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2584
[2024-06-10 05:47:37,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.83 | bwd_microstep: 975.05 | bwd_inner_microstep: 975.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 05:47:38,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.13 | bwd_microstep: 796.01 | bwd_inner_microstep: 795.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 05:47:40,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1407.96 | bwd_inner_microstep: 1407.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3864
[2024-06-10 05:47:43,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.42 | bwd_microstep: 1666.64 | bwd_inner_microstep: 1666.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 05:47:44,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1390.47 | bwd_inner_microstep: 1390.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837
[2024-06-10 05:47:46,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.86 | bwd_microstep: 1456.11 | bwd_inner_microstep: 1456.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2398
[2024-06-10 05:47:48,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.72 | bwd_microstep: 1004.33 | bwd_inner_microstep: 1004.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3609
[2024-06-10 05:47:50,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.24 | bwd_microstep: 1341.94 | bwd_inner_microstep: 1341.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583
[2024-06-10 05:47:52,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.01 | bwd_microstep: 1532.21 | bwd_inner_microstep: 1532.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 05:47:54,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.82 | bwd_microstep: 1549.40 | bwd_inner_microstep: 1549.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688
[2024-06-10 05:47:56,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.54 | bwd_microstep: 1531.17 | bwd_inner_microstep: 1531.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 05:47:58,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.62 | bwd_microstep: 1402.99 | bwd_inner_microstep: 1402.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3567
[2024-06-10 05:48:00,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.00 | bwd_microstep: 1698.34 | bwd_inner_microstep: 1698.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3777
[2024-06-10 05:48:03,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.92 | bwd_microstep: 1590.23 | bwd_inner_microstep: 1590.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3570
[2024-06-10 05:48:04,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.74 | bwd_microstep: 1423.25 | bwd_inner_microstep: 1423.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2015
[2024-06-10 05:48:11,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 05:48:11,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.39 | bwd_microstep: 5671.72 | bwd_inner_microstep: 1024.60 | bwd_allreduce_microstep: 4647.07 | step_microstep: 38.74
[2024-06-10 05:48:11,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15728.61 | bwd: 46785.60 | bwd_inner: 42137.62 | bwd_allreduce: 4647.29 | step: 40.37
{'loss': 1.3508, 'learning_rate': 3.795609079141484e-05, 'epoch': 0.17}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576
[2024-06-10 05:48:13,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.56 | bwd_microstep: 1590.58 | bwd_inner_microstep: 1590.52 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920
[2024-06-10 05:48:14,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.52 | bwd_microstep: 724.25 | bwd_inner_microstep: 724.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2373
[2024-06-10 05:48:15,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.90 | bwd_microstep: 995.84 | bwd_inner_microstep: 995.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3848
[2024-06-10 05:48:17,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.91 | bwd_microstep: 1465.94 | bwd_inner_microstep: 1465.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 05:48:19,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.46 | bwd_microstep: 1498.59 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 05:48:21,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1249.10 | bwd_inner_microstep: 1249.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3796
[2024-06-10 05:48:23,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.23 | bwd_microstep: 1600.91 | bwd_inner_microstep: 1600.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1443
[2024-06-10 05:48:24,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 209.70 | bwd_microstep: 540.07 | bwd_inner_microstep: 540.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 05:48:26,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1432.69 | bwd_inner_microstep: 1432.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434
[2024-06-10 05:48:28,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.21 | bwd_microstep: 1187.11 | bwd_inner_microstep: 1187.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 05:48:29,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 1380.47 | bwd_inner_microstep: 1380.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 05:48:32,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1492.55 | bwd_inner_microstep: 1492.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2285
[2024-06-10 05:48:33,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.41 | bwd_microstep: 1075.63 | bwd_inner_microstep: 1075.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2971
[2024-06-10 05:48:35,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.93 | bwd_microstep: 1332.64 | bwd_inner_microstep: 1332.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3394
[2024-06-10 05:48:37,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1391.72 | bwd_inner_microstep: 1391.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 05:48:39,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1507.52 | bwd_inner_microstep: 1507.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 05:48:41,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1495.55 | bwd_inner_microstep: 1495.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3850
[2024-06-10 05:48:43,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1590.62 | bwd_inner_microstep: 1590.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650
[2024-06-10 05:48:45,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.85 | bwd_microstep: 1612.70 | bwd_inner_microstep: 1612.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 05:48:48,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.72 | bwd_microstep: 1662.39 | bwd_inner_microstep: 1662.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 05:48:50,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.98 | bwd_microstep: 1631.78 | bwd_inner_microstep: 1631.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2005
[2024-06-10 05:48:51,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.72 | bwd_microstep: 712.02 | bwd_inner_microstep: 711.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 05:48:53,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1515.41 | bwd_inner_microstep: 1515.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 635
[2024-06-10 05:48:53,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 105.22 | bwd_microstep: 264.85 | bwd_inner_microstep: 264.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495
[2024-06-10 05:48:55,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.75 | bwd_microstep: 1319.20 | bwd_inner_microstep: 1319.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 05:48:57,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.59 | bwd_microstep: 1401.26 | bwd_inner_microstep: 1401.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 05:48:59,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.71 | bwd_microstep: 1289.14 | bwd_inner_microstep: 1289.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1980
[2024-06-10 05:49:00,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.05 | bwd_microstep: 706.88 | bwd_inner_microstep: 706.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 05:49:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1506.88 | bwd_inner_microstep: 1506.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823
[2024-06-10 05:49:04,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.44 | bwd_microstep: 1582.14 | bwd_inner_microstep: 1582.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2288
[2024-06-10 05:49:06,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.08 | bwd_microstep: 1040.40 | bwd_inner_microstep: 1040.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 05:49:13,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 05:49:13,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 6992.14 | bwd_inner_microstep: 1752.87 | bwd_allreduce_microstep: 5239.22 | step_microstep: 38.92
[2024-06-10 05:49:13,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15454.61 | bwd: 46788.98 | bwd_inner: 41548.80 | bwd_allreduce: 5239.47 | step: 40.55
{'loss': 1.2527, 'learning_rate': 3.793952944295909e-05, 'epoch': 0.17}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 05:49:15,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.00 | bwd_microstep: 1333.49 | bwd_inner_microstep: 1333.40 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3953
[2024-06-10 05:49:17,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.41 | bwd_microstep: 1490.00 | bwd_inner_microstep: 1489.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3868
[2024-06-10 05:49:19,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.75 | bwd_microstep: 1565.44 | bwd_inner_microstep: 1565.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 05:49:21,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1555.15 | bwd_inner_microstep: 1555.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 05:49:22,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.58 | bwd_microstep: 796.25 | bwd_inner_microstep: 796.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 05:49:24,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1386.06 | bwd_inner_microstep: 1386.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 05:49:26,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.26 | bwd_microstep: 1284.30 | bwd_inner_microstep: 1284.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3489
[2024-06-10 05:49:28,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1247.11 | bwd_inner_microstep: 1247.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 05:49:29,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.41 | bwd_microstep: 792.22 | bwd_inner_microstep: 792.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 05:49:31,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.33 | bwd_microstep: 1649.79 | bwd_inner_microstep: 1649.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 05:49:32,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.25 | bwd_microstep: 798.66 | bwd_inner_microstep: 798.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 05:49:34,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.21 | bwd_microstep: 1281.87 | bwd_inner_microstep: 1281.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 05:49:36,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1478.73 | bwd_inner_microstep: 1478.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3688
[2024-06-10 05:49:39,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.66 | bwd_microstep: 1723.05 | bwd_inner_microstep: 1723.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-10 05:49:41,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.93 | bwd_microstep: 1444.28 | bwd_inner_microstep: 1444.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-10 05:49:43,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.35 | bwd_microstep: 1545.63 | bwd_inner_microstep: 1545.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 05:49:45,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.00 | bwd_microstep: 1375.08 | bwd_inner_microstep: 1375.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1997
[2024-06-10 05:49:46,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.00 | bwd_microstep: 709.61 | bwd_inner_microstep: 709.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3528
[2024-06-10 05:49:47,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.37 | bwd_microstep: 1353.90 | bwd_inner_microstep: 1353.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670
[2024-06-10 05:49:50,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.21 | bwd_microstep: 1526.29 | bwd_inner_microstep: 1526.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2041
[2024-06-10 05:49:51,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.75 | bwd_microstep: 845.64 | bwd_inner_microstep: 845.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 05:49:53,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.98 | bwd_microstep: 1554.14 | bwd_inner_microstep: 1554.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2300
[2024-06-10 05:49:54,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.89 | bwd_microstep: 975.74 | bwd_inner_microstep: 975.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 05:49:56,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.08 | bwd_microstep: 1260.47 | bwd_inner_microstep: 1260.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 05:49:58,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1299.80 | bwd_inner_microstep: 1299.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 05:50:00,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1397.33 | bwd_inner_microstep: 1397.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 05:50:02,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.46 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 05:50:04,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.19 | bwd_microstep: 1756.59 | bwd_inner_microstep: 1756.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1996
[2024-06-10 05:50:06,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.86 | bwd_microstep: 897.51 | bwd_inner_microstep: 897.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 05:50:08,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.04 | bwd_microstep: 1544.17 | bwd_inner_microstep: 1544.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235
[2024-06-10 05:50:09,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.20 | bwd_microstep: 963.96 | bwd_inner_microstep: 963.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462
[2024-06-10 05:50:15,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-10 05:50:15,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.22 | bwd_microstep: 5089.58 | bwd_inner_microstep: 1505.76 | bwd_allreduce_microstep: 3583.77 | step_microstep: 38.77
[2024-06-10 05:50:15,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15626.60 | bwd: 45478.18 | bwd_inner: 41893.43 | bwd_allreduce: 3584.04 | step: 40.45
{'loss': 1.3392, 'learning_rate': 3.7922904911573903e-05, 'epoch': 0.17}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4245
[2024-06-10 05:50:17,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.26 | bwd_microstep: 1742.27 | bwd_inner_microstep: 1742.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2405
[2024-06-10 05:50:18,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.02 | bwd_microstep: 1002.88 | bwd_inner_microstep: 1002.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901
[2024-06-10 05:50:21,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.07 | bwd_microstep: 1587.58 | bwd_inner_microstep: 1587.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 4068
[2024-06-10 05:50:23,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1386.42 | bwd_inner_microstep: 1386.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 05:50:24,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1247.83 | bwd_inner_microstep: 1247.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 05:50:25,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 791.81 | bwd_inner_microstep: 791.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 05:50:27,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1254.46 | bwd_inner_microstep: 1254.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-10 05:50:29,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.06 | bwd_microstep: 1320.27 | bwd_inner_microstep: 1320.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2207
[2024-06-10 05:50:30,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.00 | bwd_microstep: 1054.74 | bwd_inner_microstep: 1054.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509
[2024-06-10 05:50:32,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.01 | bwd_microstep: 1446.34 | bwd_inner_microstep: 1446.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2124
[2024-06-10 05:50:34,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.72 | bwd_microstep: 926.78 | bwd_inner_microstep: 926.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 05:50:35,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.96 | bwd_microstep: 1291.31 | bwd_inner_microstep: 1291.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3534
[2024-06-10 05:50:37,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.33 | bwd_microstep: 1446.93 | bwd_inner_microstep: 1446.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982
[2024-06-10 05:50:39,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.52 | bwd_microstep: 826.83 | bwd_inner_microstep: 826.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 05:50:40,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.12 | bwd_microstep: 1340.76 | bwd_inner_microstep: 1340.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3530
[2024-06-10 05:50:43,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.74 | bwd_microstep: 1687.67 | bwd_inner_microstep: 1687.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 05:50:45,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1348.96 | bwd_inner_microstep: 1348.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 05:50:47,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.42 | bwd_microstep: 1385.26 | bwd_inner_microstep: 1385.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627
[2024-06-10 05:50:48,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1317.37 | bwd_inner_microstep: 1317.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 05:50:50,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.53 | bwd_microstep: 1345.23 | bwd_inner_microstep: 1345.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 05:50:52,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1347.54 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077
[2024-06-10 05:50:53,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.06 | bwd_microstep: 819.83 | bwd_inner_microstep: 819.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725
[2024-06-10 05:50:55,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.18 | bwd_microstep: 1338.07 | bwd_inner_microstep: 1338.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 05:50:56,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 878.39 | bwd_inner_microstep: 878.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 05:50:58,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.47 | bwd_microstep: 1383.50 | bwd_inner_microstep: 1383.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 05:51:00,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.07 | bwd_microstep: 1402.73 | bwd_inner_microstep: 1402.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3512
[2024-06-10 05:51:02,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.79 | bwd_microstep: 1194.05 | bwd_inner_microstep: 1194.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 05:51:04,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.67 | bwd_microstep: 1505.28 | bwd_inner_microstep: 1505.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 05:51:06,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.27 | bwd_microstep: 1349.59 | bwd_inner_microstep: 1349.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 05:51:08,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.83 | bwd_microstep: 1297.84 | bwd_inner_microstep: 1297.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 05:51:10,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.43 | bwd_microstep: 1510.22 | bwd_inner_microstep: 1510.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 05:51:15,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.23 | optimizer_step: 6.63
[2024-06-10 05:51:15,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.81 | bwd_microstep: 4528.79 | bwd_inner_microstep: 1803.33 | bwd_allreduce_microstep: 2725.41 | step_microstep: 38.79
[2024-06-10 05:51:15,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15516.13 | bwd: 44307.55 | bwd_inner: 41581.24 | bwd_allreduce: 2725.63 | step: 40.44
{'loss': 1.3538, 'learning_rate': 3.790621725581079e-05, 'epoch': 0.17}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 05:51:17,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.45 | bwd_microstep: 1444.26 | bwd_inner_microstep: 1444.17 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2026
[2024-06-10 05:51:18,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.71 | bwd_microstep: 715.29 | bwd_inner_microstep: 715.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 05:51:20,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.11 | bwd_microstep: 1253.94 | bwd_inner_microstep: 1253.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2338
[2024-06-10 05:51:21,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.69 | bwd_microstep: 825.96 | bwd_inner_microstep: 825.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411
[2024-06-10 05:51:22,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.24 | bwd_microstep: 1184.28 | bwd_inner_microstep: 1184.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 05:51:24,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1392.77 | bwd_inner_microstep: 1392.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 05:51:26,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1488.38 | bwd_inner_microstep: 1488.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219
[2024-06-10 05:51:28,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.27 | bwd_microstep: 962.20 | bwd_inner_microstep: 962.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 05:51:30,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.56 | bwd_microstep: 1604.54 | bwd_inner_microstep: 1604.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1944
[2024-06-10 05:51:31,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.18 | bwd_microstep: 744.86 | bwd_inner_microstep: 744.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 05:51:33,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1352.31 | bwd_inner_microstep: 1352.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-10 05:51:35,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.53 | bwd_microstep: 1510.95 | bwd_inner_microstep: 1510.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717
[2024-06-10 05:51:37,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.85 | bwd_microstep: 1627.27 | bwd_inner_microstep: 1627.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 05:51:39,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1380.79 | bwd_inner_microstep: 1380.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3673
[2024-06-10 05:51:41,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.19 | bwd_microstep: 1654.98 | bwd_inner_microstep: 1654.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3653
[2024-06-10 05:51:43,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1328.94 | bwd_inner_microstep: 1328.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 05:51:45,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.64 | bwd_microstep: 1405.55 | bwd_inner_microstep: 1405.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 05:51:47,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1493.00 | bwd_inner_microstep: 1492.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 05:51:49,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.02 | bwd_microstep: 1303.00 | bwd_inner_microstep: 1302.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 05:51:51,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.35 | bwd_microstep: 1287.44 | bwd_inner_microstep: 1287.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3678
[2024-06-10 05:51:53,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1330.62 | bwd_inner_microstep: 1330.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 05:51:54,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.23 | bwd_microstep: 1402.01 | bwd_inner_microstep: 1401.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 05:51:57,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1556.64 | bwd_inner_microstep: 1556.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 05:51:59,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 1436.07 | bwd_inner_microstep: 1436.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818
[2024-06-10 05:52:01,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.89 | bwd_microstep: 1391.07 | bwd_inner_microstep: 1391.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750
[2024-06-10 05:52:02,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.28 | bwd_microstep: 1345.56 | bwd_inner_microstep: 1345.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 05:52:04,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.98 | bwd_microstep: 1384.60 | bwd_inner_microstep: 1384.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384
[2024-06-10 05:52:06,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.60 | bwd_microstep: 1242.13 | bwd_inner_microstep: 1242.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 05:52:08,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1558.69 | bwd_inner_microstep: 1558.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581
[2024-06-10 05:52:11,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.74 | bwd_microstep: 1697.79 | bwd_inner_microstep: 1697.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 05:52:13,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.91 | bwd_microstep: 1549.59 | bwd_inner_microstep: 1549.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 05:52:18,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.59
[2024-06-10 05:52:18,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 4366.67 | bwd_inner_microstep: 1767.12 | bwd_allreduce_microstep: 2599.49 | step_microstep: 38.67
[2024-06-10 05:52:18,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16266.15 | bwd: 46222.19 | bwd_inner: 43621.73 | bwd_allreduce: 2599.76 | step: 40.36


 17%|█▋        | 293/1726 [5:08:42<24:21:46, 61.21s/it]
 17%|█▋        | 294/1726 [5:09:44<24:26:56, 61.46s/it]
 17%|█▋        | 295/1726 [5:10:47<24:35:56, 61.88s/it]
 17%|█▋        | 296/1726 [5:11:50<24:39:59, 62.10s/it]
 17%|█▋        | 297/1726 [5:12:51<24:34:26, 61.91s/it]
 17%|█▋        | 298/1726 [5:13:52<24:21:01, 61.39s/it]
 17%|█▋        | 299/1726 [5:14:54<24:30:23, 61.82s/it]
{'loss': 1.3184, 'learning_rate': 3.788946653444359e-05, 'epoch': 0.17}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 05:52:20,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1482.93 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 05:52:21,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.65 | bwd_microstep: 1275.70 | bwd_inner_microstep: 1275.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1921
[2024-06-10 05:52:23,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.56 | bwd_microstep: 849.79 | bwd_inner_microstep: 849.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 05:52:25,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.73 | bwd_microstep: 1550.96 | bwd_inner_microstep: 1550.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-10 05:52:26,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.03 | bwd_microstep: 680.07 | bwd_inner_microstep: 680.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3738
[2024-06-10 05:52:28,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.76 | bwd_microstep: 1336.24 | bwd_inner_microstep: 1336.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 05:52:29,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.69 | bwd_microstep: 1288.19 | bwd_inner_microstep: 1288.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 05:52:31,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1530.37 | bwd_inner_microstep: 1530.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 05:52:33,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.65 | bwd_microstep: 1353.38 | bwd_inner_microstep: 1353.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 05:52:35,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.97 | bwd_microstep: 1526.20 | bwd_inner_microstep: 1526.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2118
[2024-06-10 05:52:37,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.60 | bwd_microstep: 831.55 | bwd_inner_microstep: 831.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3496
[2024-06-10 05:52:38,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.16 | bwd_microstep: 1316.09 | bwd_inner_microstep: 1316.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 05:52:40,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.09 | bwd_microstep: 802.79 | bwd_inner_microstep: 802.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3687
[2024-06-10 05:52:42,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.25 | bwd_microstep: 1721.24 | bwd_inner_microstep: 1721.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3683
[2024-06-10 05:52:44,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.78 | bwd_microstep: 1823.33 | bwd_inner_microstep: 1823.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 4025
[2024-06-10 05:52:47,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.68 | bwd_microstep: 1657.20 | bwd_inner_microstep: 1657.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 05:52:49,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.02 | bwd_microstep: 1505.80 | bwd_inner_microstep: 1505.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935
[2024-06-10 05:52:50,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.71 | bwd_microstep: 727.61 | bwd_inner_microstep: 727.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2412
[2024-06-10 05:52:51,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.29 | bwd_microstep: 1033.98 | bwd_inner_microstep: 1033.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511
[2024-06-10 05:52:53,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1417.39 | bwd_inner_microstep: 1417.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-10 05:52:55,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1530.16 | bwd_inner_microstep: 1530.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 05:52:57,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.15 | bwd_microstep: 1185.70 | bwd_inner_microstep: 1185.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2019
[2024-06-10 05:52:58,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.42 | bwd_microstep: 854.31 | bwd_inner_microstep: 854.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598
[2024-06-10 05:53:00,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.00 | bwd_microstep: 1608.54 | bwd_inner_microstep: 1608.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1912
[2024-06-10 05:53:01,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.31 | bwd_microstep: 718.86 | bwd_inner_microstep: 718.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945
[2024-06-10 05:53:02,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.25 | bwd_microstep: 732.95 | bwd_inner_microstep: 732.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 05:53:04,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.27 | bwd_microstep: 1459.61 | bwd_inner_microstep: 1459.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276
[2024-06-10 05:53:06,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.28 | bwd_microstep: 1007.28 | bwd_inner_microstep: 1007.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 05:53:08,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.60 | bwd_microstep: 1632.33 | bwd_inner_microstep: 1632.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-10 05:53:10,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.24 | bwd_microstep: 1504.40 | bwd_inner_microstep: 1504.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3806
[2024-06-10 05:53:12,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.07 | bwd_microstep: 1686.76 | bwd_inner_microstep: 1686.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-10 05:53:18,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 05:53:18,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.31 | bwd_microstep: 5274.87 | bwd_inner_microstep: 1102.46 | bwd_allreduce_microstep: 4172.36 | step_microstep: 38.66
[2024-06-10 05:53:18,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15176.54 | bwd: 44906.69 | bwd_inner: 40733.33 | bwd_allreduce: 4172.64 | step: 40.24
{'loss': 1.3219, 'learning_rate': 3.787265280646825e-05, 'epoch': 0.17}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 05:53:20,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.31 | bwd_microstep: 1437.21 | bwd_inner_microstep: 1437.14 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2821
[2024-06-10 05:53:22,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.48 | bwd_microstep: 1112.62 | bwd_inner_microstep: 1112.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 05:53:23,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1248.35 | bwd_inner_microstep: 1248.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 05:53:25,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1249.01 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 05:53:27,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1353.32 | bwd_inner_microstep: 1353.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 05:53:29,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.92 | bwd_microstep: 1653.98 | bwd_inner_microstep: 1653.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 05:53:31,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1289.11 | bwd_inner_microstep: 1289.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 05:53:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.19 | bwd_microstep: 793.22 | bwd_inner_microstep: 793.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884
[2024-06-10 05:53:33,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.53 | bwd_microstep: 685.79 | bwd_inner_microstep: 685.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 05:53:34,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.67 | bwd_microstep: 794.46 | bwd_inner_microstep: 794.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 05:53:36,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.50 | bwd_microstep: 1348.08 | bwd_inner_microstep: 1348.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3950
[2024-06-10 05:53:38,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.32 | bwd_microstep: 1531.21 | bwd_inner_microstep: 1531.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937
[2024-06-10 05:53:39,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.46 | bwd_microstep: 698.58 | bwd_inner_microstep: 698.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3691
[2024-06-10 05:53:41,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.20 | bwd_microstep: 1414.02 | bwd_inner_microstep: 1413.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660
[2024-06-10 05:53:43,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.72 | bwd_microstep: 1419.81 | bwd_inner_microstep: 1419.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 05:53:45,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.27 | bwd_microstep: 1460.90 | bwd_inner_microstep: 1460.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 05:53:47,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.51 | bwd_microstep: 1394.08 | bwd_inner_microstep: 1394.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 05:53:49,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.14 | bwd_microstep: 1259.00 | bwd_inner_microstep: 1258.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 05:53:50,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.28 | bwd_microstep: 1254.76 | bwd_inner_microstep: 1254.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 05:53:53,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.61 | bwd_microstep: 1659.91 | bwd_inner_microstep: 1659.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667
[2024-06-10 05:53:55,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.16 | bwd_microstep: 1624.42 | bwd_inner_microstep: 1624.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2293
[2024-06-10 05:53:56,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.83 | bwd_microstep: 877.04 | bwd_inner_microstep: 877.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 05:53:58,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.72 | bwd_microstep: 1186.98 | bwd_inner_microstep: 1186.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 05:54:00,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.80 | bwd_microstep: 1441.28 | bwd_inner_microstep: 1441.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3715
[2024-06-10 05:54:02,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.97 | bwd_microstep: 1368.70 | bwd_inner_microstep: 1368.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 05:54:04,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1348.20 | bwd_inner_microstep: 1348.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 05:54:05,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.42 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 05:54:08,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.29 | bwd_microstep: 1548.85 | bwd_inner_microstep: 1548.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2205
[2024-06-10 05:54:09,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.40 | bwd_microstep: 867.84 | bwd_inner_microstep: 867.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3817
[2024-06-10 05:54:11,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.23 | bwd_microstep: 1620.52 | bwd_inner_microstep: 1620.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3399
[2024-06-10 05:54:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.03 | bwd_microstep: 1441.27 | bwd_inner_microstep: 1441.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816
[2024-06-10 05:54:21,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.57
[2024-06-10 05:54:21,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.98 | bwd_microstep: 7714.77 | bwd_inner_microstep: 1988.65 | bwd_allreduce_microstep: 5726.07 | step_microstep: 38.88
[2024-06-10 05:54:21,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15549.80 | bwd: 47475.36 | bwd_inner: 41748.34 | bwd_allreduce: 5726.32 | step: 40.46
{'loss': 1.3533, 'learning_rate': 3.785577613110264e-05, 'epoch': 0.17}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 05:54:23,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.97 | bwd_microstep: 784.07 | bwd_inner_microstep: 783.93 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 05:54:24,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.40 | bwd_microstep: 1342.21 | bwd_inner_microstep: 1342.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2373
[2024-06-10 05:54:26,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.70 | bwd_microstep: 980.33 | bwd_inner_microstep: 980.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 05:54:28,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.05 | bwd_microstep: 1281.37 | bwd_inner_microstep: 1281.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 05:54:29,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.46 | bwd_microstep: 1385.53 | bwd_inner_microstep: 1385.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 05:54:31,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1382.33 | bwd_inner_microstep: 1382.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3560
[2024-06-10 05:54:33,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.80 | bwd_microstep: 1235.29 | bwd_inner_microstep: 1235.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 05:54:35,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.55 | bwd_microstep: 1536.16 | bwd_inner_microstep: 1536.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-10 05:54:37,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.44 | bwd_microstep: 1527.67 | bwd_inner_microstep: 1527.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 05:54:39,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1374.50 | bwd_inner_microstep: 1374.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685
[2024-06-10 05:54:41,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.30 | bwd_microstep: 1520.13 | bwd_inner_microstep: 1520.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 05:54:44,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.23 | bwd_microstep: 1620.64 | bwd_inner_microstep: 1620.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3671
[2024-06-10 05:54:46,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1480.63 | bwd_inner_microstep: 1480.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493
[2024-06-10 05:54:47,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.96 | bwd_microstep: 1315.95 | bwd_inner_microstep: 1315.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657
[2024-06-10 05:54:50,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.54 | bwd_microstep: 1715.90 | bwd_inner_microstep: 1715.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553
[2024-06-10 05:54:52,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.26 | bwd_microstep: 1589.08 | bwd_inner_microstep: 1589.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3682
[2024-06-10 05:54:54,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.34 | bwd_microstep: 1420.34 | bwd_inner_microstep: 1420.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-10 05:54:56,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.50 | bwd_microstep: 1613.63 | bwd_inner_microstep: 1613.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-10 05:54:58,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.58 | bwd_microstep: 1611.31 | bwd_inner_microstep: 1611.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 05:55:00,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.63 | bwd_microstep: 1525.16 | bwd_inner_microstep: 1525.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3712
[2024-06-10 05:55:02,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1237.74 | bwd_inner_microstep: 1237.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3527
[2024-06-10 05:55:04,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.70 | bwd_microstep: 1341.37 | bwd_inner_microstep: 1341.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966
[2024-06-10 05:55:05,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.40 | bwd_microstep: 703.04 | bwd_inner_microstep: 703.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2007
[2024-06-10 05:55:06,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.46 | bwd_microstep: 711.14 | bwd_inner_microstep: 711.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3743
[2024-06-10 05:55:08,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.07 | bwd_microstep: 1272.75 | bwd_inner_microstep: 1272.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 05:55:10,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.66 | bwd_microstep: 1295.96 | bwd_inner_microstep: 1295.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 05:55:12,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1404.15 | bwd_inner_microstep: 1404.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 05:55:13,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1405.63 | bwd_inner_microstep: 1405.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2047
[2024-06-10 05:55:15,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.17 | bwd_microstep: 814.61 | bwd_inner_microstep: 814.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 05:55:16,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.54 | bwd_microstep: 810.74 | bwd_inner_microstep: 810.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2244
[2024-06-10 05:55:17,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.07 | bwd_microstep: 872.47 | bwd_inner_microstep: 872.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467
[2024-06-10 05:55:22,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.26 | optimizer_step: 6.63
[2024-06-10 05:55:22,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.45 | bwd_microstep: 4134.54 | bwd_inner_microstep: 1584.69 | bwd_allreduce_microstep: 2549.80 | step_microstep: 38.77
[2024-06-10 05:55:22,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15587.29 | bwd: 44246.37 | bwd_inner: 41695.55 | bwd_allreduce: 2550.08 | step: 40.40
{'loss': 1.3319, 'learning_rate': 3.783883656778631e-05, 'epoch': 0.17}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 05:55:24,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.31 | bwd_microstep: 1484.22 | bwd_inner_microstep: 1484.16 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 05:55:25,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.04 | bwd_microstep: 1240.45 | bwd_inner_microstep: 1240.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811
[2024-06-10 05:55:27,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.97 | bwd_microstep: 1351.97 | bwd_inner_microstep: 1351.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279
[2024-06-10 05:55:29,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.43 | bwd_microstep: 971.49 | bwd_inner_microstep: 971.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 05:55:30,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.94 | bwd_microstep: 1352.21 | bwd_inner_microstep: 1352.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 05:55:33,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.72 | bwd_microstep: 1529.53 | bwd_inner_microstep: 1529.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725
[2024-06-10 05:55:34,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.74 | bwd_microstep: 1334.43 | bwd_inner_microstep: 1334.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 05:55:36,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1375.55 | bwd_inner_microstep: 1375.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2057
[2024-06-10 05:55:37,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.96 | bwd_microstep: 816.56 | bwd_inner_microstep: 816.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 05:55:40,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.60 | bwd_microstep: 1627.19 | bwd_inner_microstep: 1627.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 05:55:42,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.95 | bwd_microstep: 1387.80 | bwd_inner_microstep: 1387.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1941
[2024-06-10 05:55:43,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.40 | bwd_microstep: 823.68 | bwd_inner_microstep: 823.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689
[2024-06-10 05:55:45,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.44 | bwd_microstep: 1525.07 | bwd_inner_microstep: 1525.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 05:55:47,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1376.93 | bwd_inner_microstep: 1376.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-10 05:55:49,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.65 | bwd_microstep: 1606.72 | bwd_inner_microstep: 1606.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1953
[2024-06-10 05:55:50,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.40 | bwd_microstep: 822.26 | bwd_inner_microstep: 822.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 05:55:51,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.96 | bwd_microstep: 790.88 | bwd_inner_microstep: 790.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 05:55:53,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1551.99 | bwd_inner_microstep: 1551.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 05:55:55,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.85 | bwd_microstep: 1360.37 | bwd_inner_microstep: 1360.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 05:55:57,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.24 | bwd_microstep: 1280.18 | bwd_inner_microstep: 1280.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 05:55:59,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.03 | bwd_microstep: 1417.32 | bwd_inner_microstep: 1417.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 05:56:01,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.51 | bwd_microstep: 1287.12 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 05:56:03,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1255.57 | bwd_inner_microstep: 1255.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2022
[2024-06-10 05:56:04,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.52 | bwd_microstep: 715.57 | bwd_inner_microstep: 715.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 05:56:05,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1294.97 | bwd_inner_microstep: 1294.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 05:56:07,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.33 | bwd_microstep: 1452.38 | bwd_inner_microstep: 1452.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 05:56:10,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.34 | bwd_microstep: 1659.37 | bwd_inner_microstep: 1659.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3604
[2024-06-10 05:56:11,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.27 | bwd_microstep: 1275.79 | bwd_inner_microstep: 1275.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2236
[2024-06-10 05:56:12,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.22 | bwd_microstep: 776.72 | bwd_inner_microstep: 776.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732
[2024-06-10 05:56:15,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.86 | bwd_microstep: 1536.87 | bwd_inner_microstep: 1536.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3535
[2024-06-10 05:56:17,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.40 | bwd_microstep: 1522.93 | bwd_inner_microstep: 1522.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3467
[2024-06-10 05:56:23,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.83 | optimizer_step: 6.58
[2024-06-10 05:56:23,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.52 | bwd_microstep: 5860.14 | bwd_inner_microstep: 1360.02 | bwd_allreduce_microstep: 4500.05 | step_microstep: 39.46
[2024-06-10 05:56:23,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15435.44 | bwd: 45664.25 | bwd_inner: 41163.21 | bwd_allreduce: 4500.32 | step: 41.06
{'loss': 1.3211, 'learning_rate': 3.7821834176180336e-05, 'epoch': 0.18}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462
[2024-06-10 05:56:25,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.39 | bwd_microstep: 1327.12 | bwd_inner_microstep: 1327.03 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919
[2024-06-10 05:56:26,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.98 | bwd_microstep: 716.51 | bwd_inner_microstep: 716.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 05:56:27,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.78 | bwd_microstep: 787.10 | bwd_inner_microstep: 787.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 05:56:28,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.28 | bwd_microstep: 970.36 | bwd_inner_microstep: 970.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3828
[2024-06-10 05:56:30,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.46 | bwd_microstep: 1512.50 | bwd_inner_microstep: 1512.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 05:56:32,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 794.76 | bwd_inner_microstep: 794.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 05:56:33,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.34 | bwd_microstep: 1284.58 | bwd_inner_microstep: 1284.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740
[2024-06-10 05:56:36,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.63 | bwd_microstep: 1635.93 | bwd_inner_microstep: 1635.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 05:56:37,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.59 | bwd_microstep: 791.21 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419
[2024-06-10 05:56:38,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.07 | bwd_microstep: 1279.70 | bwd_inner_microstep: 1279.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916
[2024-06-10 05:56:40,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.15 | bwd_microstep: 780.10 | bwd_inner_microstep: 780.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 05:56:42,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.22 | bwd_microstep: 1476.55 | bwd_inner_microstep: 1476.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-10 05:56:44,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.06 | bwd_microstep: 1622.55 | bwd_inner_microstep: 1622.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3926
[2024-06-10 05:56:46,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 1501.49 | bwd_inner_microstep: 1501.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2096
[2024-06-10 05:56:47,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.10 | bwd_microstep: 823.39 | bwd_inner_microstep: 823.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2119
[2024-06-10 05:56:48,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.94 | bwd_microstep: 928.29 | bwd_inner_microstep: 928.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 05:56:50,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1285.05 | bwd_inner_microstep: 1285.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-10 05:56:52,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.81 | bwd_microstep: 1523.88 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 05:56:54,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.25 | bwd_microstep: 1257.90 | bwd_inner_microstep: 1257.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3619
[2024-06-10 05:56:56,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1437.79 | bwd_inner_microstep: 1437.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 05:56:58,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.68 | bwd_microstep: 1284.97 | bwd_inner_microstep: 1284.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3617
[2024-06-10 05:56:59,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.96 | bwd_microstep: 1246.39 | bwd_inner_microstep: 1246.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3707
[2024-06-10 05:57:01,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.81 | bwd_microstep: 1335.73 | bwd_inner_microstep: 1335.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804
[2024-06-10 05:57:03,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.82 | bwd_microstep: 1359.30 | bwd_inner_microstep: 1359.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3714
[2024-06-10 05:57:05,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.78 | bwd_microstep: 1497.15 | bwd_inner_microstep: 1497.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 05:57:07,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1474.45 | bwd_inner_microstep: 1474.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 05:57:10,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.64 | bwd_microstep: 1649.38 | bwd_inner_microstep: 1649.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3735
[2024-06-10 05:57:12,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.42 | bwd_microstep: 1604.81 | bwd_inner_microstep: 1604.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-10 05:57:14,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.67 | bwd_microstep: 1438.11 | bwd_inner_microstep: 1438.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 05:57:16,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1550.83 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2059
[2024-06-10 05:57:17,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.25 | bwd_microstep: 847.81 | bwd_inner_microstep: 847.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3437
[2024-06-10 05:57:25,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 05:57:25,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.92 | bwd_microstep: 7153.15 | bwd_inner_microstep: 2043.37 | bwd_allreduce_microstep: 5109.71 | step_microstep: 39.14
[2024-06-10 05:57:25,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15208.43 | bwd: 46178.88 | bwd_inner: 41068.16 | bwd_allreduce: 5110.00 | step: 40.81
{'loss': 1.2781, 'learning_rate': 3.7804769016167036e-05, 'epoch': 0.18}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 05:57:27,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1372.15 | bwd_inner_microstep: 1372.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1876
[2024-06-10 05:57:28,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 708.16 | bwd_inner_microstep: 708.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 05:57:30,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1391.08 | bwd_inner_microstep: 1391.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-10 05:57:31,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.84 | bwd_microstep: 1241.65 | bwd_inner_microstep: 1241.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-10 05:57:32,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.56 | bwd_microstep: 794.02 | bwd_inner_microstep: 794.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 05:57:34,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1245.72 | bwd_inner_microstep: 1245.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3726
[2024-06-10 05:57:36,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1494.74 | bwd_inner_microstep: 1494.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 05:57:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1294.76 | bwd_inner_microstep: 1294.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 05:57:40,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.05 | bwd_microstep: 1652.33 | bwd_inner_microstep: 1652.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-10 05:57:41,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.78 | bwd_microstep: 700.27 | bwd_inner_microstep: 700.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 05:57:43,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1388.63 | bwd_inner_microstep: 1388.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 05:57:44,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.83 | bwd_microstep: 795.68 | bwd_inner_microstep: 795.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3675
[2024-06-10 05:57:46,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.72 | bwd_microstep: 1262.23 | bwd_inner_microstep: 1262.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1971
[2024-06-10 05:57:47,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.39 | bwd_microstep: 826.67 | bwd_inner_microstep: 826.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970
[2024-06-10 05:57:48,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.30 | bwd_microstep: 887.76 | bwd_inner_microstep: 887.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450
[2024-06-10 05:57:50,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.39 | bwd_microstep: 1410.51 | bwd_inner_microstep: 1410.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1901
[2024-06-10 05:57:51,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.24 | bwd_microstep: 714.90 | bwd_inner_microstep: 714.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3947
[2024-06-10 05:57:53,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1527.98 | bwd_inner_microstep: 1527.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2124
[2024-06-10 05:57:55,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.70 | bwd_microstep: 862.47 | bwd_inner_microstep: 862.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3621
[2024-06-10 05:57:57,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.26 | bwd_microstep: 1541.30 | bwd_inner_microstep: 1541.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-10 05:57:59,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.91 | bwd_microstep: 1417.21 | bwd_inner_microstep: 1417.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 05:58:01,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.81 | bwd_microstep: 1554.34 | bwd_inner_microstep: 1554.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-10 05:58:02,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.54 | bwd_microstep: 698.43 | bwd_inner_microstep: 698.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 05:58:04,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1292.32 | bwd_inner_microstep: 1292.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 05:58:06,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.64 | bwd_microstep: 1571.21 | bwd_inner_microstep: 1571.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555
[2024-06-10 05:58:08,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.65 | bwd_microstep: 1420.83 | bwd_inner_microstep: 1420.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3429
[2024-06-10 05:58:10,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.13 | bwd_microstep: 1408.87 | bwd_inner_microstep: 1408.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 05:58:12,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1379.07 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 05:58:13,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.40 | bwd_microstep: 1310.55 | bwd_inner_microstep: 1310.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 05:58:16,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.52 | bwd_microstep: 1648.38 | bwd_inner_microstep: 1648.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 05:58:18,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.70 | bwd_microstep: 1500.96 | bwd_inner_microstep: 1500.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 05:58:25,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.61
[2024-06-10 05:58:25,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 6846.23 | bwd_inner_microstep: 1526.57 | bwd_allreduce_microstep: 5319.59 | step_microstep: 39.29
[2024-06-10 05:58:25,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14904.24 | bwd: 45161.45 | bwd_inner: 39840.89 | bwd_allreduce: 5319.86 | step: 40.94


 17%|█▋        | 299/1726 [5:14:54<24:30:23, 61.82s/it]
 17%|█▋        | 300/1726 [5:15:55<24:19:27, 61.41s/it]
 17%|█▋        | 301/1726 [5:16:58<24:32:25, 62.00s/it]
 17%|█▋        | 302/1726 [5:17:58<24:18:27, 61.45s/it]
 18%|█▊        | 303/1726 [5:19:00<24:17:22, 61.45s/it]
 18%|█▊        | 304/1726 [5:20:02<24:18:21, 61.53s/it]
{'loss': 1.2547, 'learning_rate': 3.7787641147849814e-05, 'epoch': 0.18}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 05:58:27,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.82 | bwd_microstep: 1482.51 | bwd_inner_microstep: 1482.44 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3990
[2024-06-10 05:58:29,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.35 | bwd_microstep: 1603.29 | bwd_inner_microstep: 1603.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 05:58:31,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.56 | bwd_microstep: 1289.88 | bwd_inner_microstep: 1289.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 05:58:33,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1342.26 | bwd_inner_microstep: 1342.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938
[2024-06-10 05:58:34,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.71 | bwd_microstep: 727.06 | bwd_inner_microstep: 727.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 05:58:36,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1346.74 | bwd_inner_microstep: 1346.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 05:58:38,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1640.95 | bwd_inner_microstep: 1640.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 05:58:40,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1391.75 | bwd_inner_microstep: 1391.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 05:58:42,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1528.94 | bwd_inner_microstep: 1528.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3407
[2024-06-10 05:58:44,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.95 | bwd_microstep: 1372.31 | bwd_inner_microstep: 1372.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723
[2024-06-10 05:58:46,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.72 | bwd_microstep: 1626.31 | bwd_inner_microstep: 1626.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665
[2024-06-10 05:58:49,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.96 | bwd_microstep: 1719.17 | bwd_inner_microstep: 1719.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3649
[2024-06-10 05:58:51,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.99 | bwd_microstep: 1784.96 | bwd_inner_microstep: 1784.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3550
[2024-06-10 05:58:53,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.02 | bwd_microstep: 1546.67 | bwd_inner_microstep: 1546.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637
[2024-06-10 05:58:55,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1510.04 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3512
[2024-06-10 05:58:57,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.96 | bwd_microstep: 1368.91 | bwd_inner_microstep: 1368.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2972
[2024-06-10 05:58:59,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.55 | bwd_microstep: 1199.46 | bwd_inner_microstep: 1199.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-10 05:59:01,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.89 | bwd_microstep: 1557.31 | bwd_inner_microstep: 1557.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3624
[2024-06-10 05:59:03,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.59 | bwd_microstep: 1345.75 | bwd_inner_microstep: 1345.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 05:59:05,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.17 | bwd_microstep: 1281.24 | bwd_inner_microstep: 1281.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 05:59:07,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1394.16 | bwd_inner_microstep: 1394.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 05:59:09,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.34 | bwd_microstep: 1657.72 | bwd_inner_microstep: 1657.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 05:59:11,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.39 | bwd_microstep: 1160.92 | bwd_inner_microstep: 1160.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 05:59:12,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.77 | bwd_microstep: 1286.42 | bwd_inner_microstep: 1286.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807
[2024-06-10 05:59:15,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.35 | bwd_microstep: 1612.09 | bwd_inner_microstep: 1612.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729
[2024-06-10 05:59:17,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.80 | bwd_microstep: 1628.74 | bwd_inner_microstep: 1628.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 05:59:19,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1552.07 | bwd_inner_microstep: 1552.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-10 05:59:21,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.65 | bwd_microstep: 1358.97 | bwd_inner_microstep: 1358.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 05:59:23,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.96 | bwd_microstep: 1490.80 | bwd_inner_microstep: 1490.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-10 05:59:25,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.25 | bwd_microstep: 1523.53 | bwd_inner_microstep: 1523.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564
[2024-06-10 05:59:27,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.58 | bwd_microstep: 1597.41 | bwd_inner_microstep: 1597.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2997
[2024-06-10 05:59:29,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.17 | optimizer_step: 6.62
[2024-06-10 05:59:29,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.72 | bwd_microstep: 1147.94 | bwd_inner_microstep: 1138.77 | bwd_allreduce_microstep: 9.12 | step_microstep: 38.34
[2024-06-10 05:59:29,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17168.24 | bwd: 46076.33 | bwd_inner: 46066.25 | bwd_allreduce: 9.37 | step: 39.94
{'loss': 1.2627, 'learning_rate': 3.7770450631552946e-05, 'epoch': 0.18}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1931
[2024-06-10 05:59:30,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.60 | bwd_microstep: 823.33 | bwd_inner_microstep: 823.18 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3931
[2024-06-10 05:59:32,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.57 | bwd_microstep: 1593.17 | bwd_inner_microstep: 1593.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837
[2024-06-10 05:59:34,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.57 | bwd_microstep: 1657.55 | bwd_inner_microstep: 1657.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3838
[2024-06-10 05:59:37,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.39 | bwd_microstep: 1518.99 | bwd_inner_microstep: 1518.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 05:59:38,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.84 | bwd_microstep: 1380.41 | bwd_inner_microstep: 1380.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 05:59:40,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1389.68 | bwd_inner_microstep: 1389.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 05:59:42,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.12 | bwd_microstep: 1285.77 | bwd_inner_microstep: 1285.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 05:59:44,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.43 | bwd_microstep: 1152.13 | bwd_inner_microstep: 1152.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-10 05:59:45,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.64 | bwd_microstep: 1191.51 | bwd_inner_microstep: 1191.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 05:59:47,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.75 | bwd_microstep: 1288.35 | bwd_inner_microstep: 1288.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 05:59:49,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1415.12 | bwd_inner_microstep: 1415.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 05:59:51,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.14 | bwd_microstep: 1377.09 | bwd_inner_microstep: 1377.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 05:59:53,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 05:59:55,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.24 | bwd_microstep: 1478.36 | bwd_inner_microstep: 1478.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 05:59:57,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1378.30 | bwd_inner_microstep: 1378.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3627
[2024-06-10 05:59:59,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.67 | bwd_microstep: 1580.23 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 06:00:01,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.02 | bwd_microstep: 1284.76 | bwd_inner_microstep: 1284.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2294
[2024-06-10 06:00:02,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.16 | bwd_microstep: 1026.12 | bwd_inner_microstep: 1026.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2676
[2024-06-10 06:00:04,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.80 | bwd_microstep: 1025.76 | bwd_inner_microstep: 1025.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 06:00:06,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1412.89 | bwd_inner_microstep: 1412.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3649
[2024-06-10 06:00:07,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1323.67 | bwd_inner_microstep: 1323.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739
[2024-06-10 06:00:09,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.67 | bwd_microstep: 1441.38 | bwd_inner_microstep: 1441.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3643
[2024-06-10 06:00:11,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.45 | bwd_microstep: 1252.03 | bwd_inner_microstep: 1252.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3834
[2024-06-10 06:00:13,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.36 | bwd_microstep: 1360.99 | bwd_inner_microstep: 1360.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2079
[2024-06-10 06:00:14,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.83 | bwd_microstep: 729.00 | bwd_inner_microstep: 728.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 06:00:15,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.89 | bwd_microstep: 983.01 | bwd_inner_microstep: 982.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3818
[2024-06-10 06:00:17,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.04 | bwd_microstep: 1266.82 | bwd_inner_microstep: 1266.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593
[2024-06-10 06:00:19,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.97 | bwd_microstep: 1338.62 | bwd_inner_microstep: 1338.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 06:00:21,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1277.94 | bwd_inner_microstep: 1277.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-10 06:00:23,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1592.39 | bwd_inner_microstep: 1592.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3764
[2024-06-10 06:00:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.85 | bwd_microstep: 1469.05 | bwd_inner_microstep: 1469.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3595
[2024-06-10 06:00:29,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 06:00:29,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.20 | bwd_microstep: 3561.10 | bwd_inner_microstep: 1930.96 | bwd_allreduce_microstep: 1630.08 | step_microstep: 76.73
[2024-06-10 06:00:29,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15904.41 | bwd: 44197.78 | bwd_inner: 42566.67 | bwd_allreduce: 1630.37 | step: 78.42
{'loss': 1.3195, 'learning_rate': 3.775319752782133e-05, 'epoch': 0.18}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 06:00:30,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.07 | bwd_microstep: 782.97 | bwd_inner_microstep: 782.84 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2360
[2024-06-10 06:00:32,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.49 | bwd_microstep: 989.74 | bwd_inner_microstep: 989.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3409
[2024-06-10 06:00:34,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.37 | bwd_microstep: 1278.33 | bwd_inner_microstep: 1278.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 06:00:35,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.28 | bwd_microstep: 1280.98 | bwd_inner_microstep: 1280.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3504
[2024-06-10 06:00:37,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.21 | bwd_microstep: 1190.61 | bwd_inner_microstep: 1190.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 06:00:39,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.34 | bwd_microstep: 1532.03 | bwd_inner_microstep: 1532.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2201
[2024-06-10 06:00:40,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.55 | bwd_microstep: 956.53 | bwd_inner_microstep: 956.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 750
[2024-06-10 06:00:41,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 129.19 | bwd_microstep: 302.14 | bwd_inner_microstep: 302.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 06:00:42,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.55 | bwd_microstep: 803.13 | bwd_inner_microstep: 803.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1940
[2024-06-10 06:00:43,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 891.28 | bwd_inner_microstep: 891.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3626
[2024-06-10 06:00:45,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1538.36 | bwd_inner_microstep: 1538.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-10 06:00:48,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.52 | bwd_microstep: 1613.86 | bwd_inner_microstep: 1613.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657
[2024-06-10 06:00:50,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.67 | bwd_microstep: 1612.60 | bwd_inner_microstep: 1612.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3432
[2024-06-10 06:00:52,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.84 | bwd_microstep: 1285.10 | bwd_inner_microstep: 1285.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 06:00:53,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1287.84 | bwd_inner_microstep: 1287.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466
[2024-06-10 06:00:55,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.81 | bwd_microstep: 1213.66 | bwd_inner_microstep: 1213.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 06:00:57,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.27 | bwd_microstep: 1515.30 | bwd_inner_microstep: 1515.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 06:00:59,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.43 | bwd_microstep: 1295.24 | bwd_inner_microstep: 1295.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 06:01:01,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.36 | bwd_microstep: 1184.47 | bwd_inner_microstep: 1184.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3612
[2024-06-10 06:01:02,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.33 | bwd_microstep: 1217.36 | bwd_inner_microstep: 1217.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817
[2024-06-10 06:01:04,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1389.89 | bwd_inner_microstep: 1389.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-10 06:01:06,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.32 | bwd_microstep: 1298.47 | bwd_inner_microstep: 1298.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 06:01:08,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1258.85 | bwd_inner_microstep: 1258.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 06:01:09,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.08 | bwd_microstep: 1258.11 | bwd_inner_microstep: 1258.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 06:01:11,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1291.74 | bwd_inner_microstep: 1291.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 06:01:13,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.53 | bwd_microstep: 1283.69 | bwd_inner_microstep: 1283.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 06:01:15,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1557.94 | bwd_inner_microstep: 1557.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 06:01:17,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.41 | bwd_microstep: 1545.21 | bwd_inner_microstep: 1545.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-10 06:01:20,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.67 | bwd_microstep: 1638.38 | bwd_inner_microstep: 1638.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 06:01:22,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 1595.68 | bwd_inner_microstep: 1595.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3950
[2024-06-10 06:01:24,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.72 | bwd_microstep: 1825.74 | bwd_inner_microstep: 1825.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3698
[2024-06-10 06:01:32,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-10 06:01:32,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.38 | bwd_microstep: 6669.74 | bwd_inner_microstep: 1781.81 | bwd_allreduce_microstep: 4887.87 | step_microstep: 39.00
[2024-06-10 06:01:32,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15511.83 | bwd: 46385.00 | bwd_inner: 41496.12 | bwd_allreduce: 4888.15 | step: 40.71
{'loss': 1.3156, 'learning_rate': 3.7735881897420315e-05, 'epoch': 0.18}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 06:01:33,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.90 | bwd_microstep: 1275.38 | bwd_inner_microstep: 1275.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 06:01:35,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1375.74 | bwd_inner_microstep: 1375.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2730
[2024-06-10 06:01:37,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.57 | bwd_microstep: 994.04 | bwd_inner_microstep: 994.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-10 06:01:38,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.58 | bwd_microstep: 712.29 | bwd_inner_microstep: 712.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 06:01:39,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1346.26 | bwd_inner_microstep: 1346.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567
[2024-06-10 06:01:41,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.51 | bwd_microstep: 1333.34 | bwd_inner_microstep: 1333.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3581
[2024-06-10 06:01:43,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.45 | bwd_microstep: 1208.15 | bwd_inner_microstep: 1208.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493
[2024-06-10 06:01:45,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1191.79 | bwd_inner_microstep: 1191.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3430
[2024-06-10 06:01:46,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.96 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 06:01:47,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.68 | bwd_microstep: 792.64 | bwd_inner_microstep: 792.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3416
[2024-06-10 06:01:49,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.80 | bwd_microstep: 1213.34 | bwd_inner_microstep: 1213.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2140
[2024-06-10 06:01:50,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.24 | bwd_microstep: 927.97 | bwd_inner_microstep: 927.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 06:01:52,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1251.23 | bwd_inner_microstep: 1251.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 06:01:54,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1524.70 | bwd_inner_microstep: 1524.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-10 06:01:56,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1451.20 | bwd_inner_microstep: 1451.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639
[2024-06-10 06:01:58,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.15 | bwd_microstep: 1577.50 | bwd_inner_microstep: 1577.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 06:02:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 06:02:03,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.00 | bwd_microstep: 1613.65 | bwd_inner_microstep: 1613.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3644
[2024-06-10 06:02:04,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1377.62 | bwd_inner_microstep: 1377.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1917
[2024-06-10 06:02:05,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.82 | bwd_microstep: 688.45 | bwd_inner_microstep: 688.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-10 06:02:07,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.59 | bwd_microstep: 1430.27 | bwd_inner_microstep: 1430.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 06:02:09,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1279.23 | bwd_inner_microstep: 1279.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 06:02:11,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.93 | bwd_microstep: 1438.72 | bwd_inner_microstep: 1438.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3602
[2024-06-10 06:02:13,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1369.16 | bwd_inner_microstep: 1369.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 06:02:15,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.06 | bwd_microstep: 1454.32 | bwd_inner_microstep: 1454.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 06:02:17,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.16 | bwd_microstep: 1405.20 | bwd_inner_microstep: 1405.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 06:02:19,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1376.30 | bwd_inner_microstep: 1376.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 06:02:21,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1476.79 | bwd_inner_microstep: 1476.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 06:02:23,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.13 | bwd_microstep: 1288.04 | bwd_inner_microstep: 1288.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3066
[2024-06-10 06:02:24,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.97 | bwd_microstep: 1140.13 | bwd_inner_microstep: 1140.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3426
[2024-06-10 06:02:26,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.60 | bwd_microstep: 1316.25 | bwd_inner_microstep: 1316.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-10 06:02:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 06:02:32,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.50 | bwd_microstep: 5104.37 | bwd_inner_microstep: 1987.75 | bwd_allreduce_microstep: 3116.56 | step_microstep: 38.88
[2024-06-10 06:02:32,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15500.15 | bwd: 44509.87 | bwd_inner: 41392.35 | bwd_allreduce: 3116.81 | step: 40.50
{'loss': 1.3333, 'learning_rate': 3.771850380133545e-05, 'epoch': 0.18}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 06:02:34,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.65 | bwd_microstep: 1372.66 | bwd_inner_microstep: 1372.58 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068
[2024-06-10 06:02:35,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.98 | bwd_microstep: 816.84 | bwd_inner_microstep: 816.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2447
[2024-06-10 06:02:36,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.53 | bwd_microstep: 1014.43 | bwd_inner_microstep: 1014.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3746
[2024-06-10 06:02:39,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.56 | bwd_microstep: 1638.31 | bwd_inner_microstep: 1638.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3741
[2024-06-10 06:02:41,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1465.23 | bwd_inner_microstep: 1465.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 06:02:43,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1388.64 | bwd_inner_microstep: 1388.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-10 06:02:44,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.68 | bwd_microstep: 1159.57 | bwd_inner_microstep: 1159.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 06:02:46,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 1385.54 | bwd_inner_microstep: 1385.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-10 06:02:48,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.31 | bwd_microstep: 1634.48 | bwd_inner_microstep: 1634.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 06:02:50,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1417.06 | bwd_inner_microstep: 1417.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 06:02:52,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.85 | bwd_microstep: 1405.10 | bwd_inner_microstep: 1405.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524
[2024-06-10 06:02:54,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1326.37 | bwd_inner_microstep: 1326.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 06:02:56,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.03 | bwd_microstep: 1485.47 | bwd_inner_microstep: 1485.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-10 06:02:58,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.93 | bwd_microstep: 1447.79 | bwd_inner_microstep: 1447.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 06:03:00,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.72 | bwd_microstep: 1241.52 | bwd_inner_microstep: 1241.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 06:03:02,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.74 | bwd_microstep: 1487.84 | bwd_inner_microstep: 1487.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3680
[2024-06-10 06:03:04,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.34 | bwd_microstep: 1450.19 | bwd_inner_microstep: 1450.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3611
[2024-06-10 06:03:06,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.01 | bwd_microstep: 1371.24 | bwd_inner_microstep: 1371.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3492
[2024-06-10 06:03:08,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.12 | bwd_microstep: 1714.74 | bwd_inner_microstep: 1714.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 06:03:10,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.42 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 06:03:11,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.44 | bwd_microstep: 976.74 | bwd_inner_microstep: 976.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 06:03:13,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.99 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744
[2024-06-10 06:03:15,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.56 | bwd_microstep: 1469.20 | bwd_inner_microstep: 1469.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 06:03:17,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1498.77 | bwd_inner_microstep: 1498.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3610
[2024-06-10 06:03:19,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.07 | bwd_microstep: 1311.53 | bwd_inner_microstep: 1311.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 06:03:21,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.03 | bwd_microstep: 1287.58 | bwd_inner_microstep: 1287.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3777
[2024-06-10 06:03:23,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.07 | bwd_microstep: 1352.07 | bwd_inner_microstep: 1352.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2054
[2024-06-10 06:03:24,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.29 | bwd_microstep: 867.00 | bwd_inner_microstep: 866.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 06:03:26,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.76 | bwd_microstep: 1649.10 | bwd_inner_microstep: 1649.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 06:03:28,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1492.81 | bwd_inner_microstep: 1492.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3482
[2024-06-10 06:03:30,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.56 | bwd_microstep: 1436.54 | bwd_inner_microstep: 1436.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 06:03:34,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.17 | optimizer_step: 6.62
[2024-06-10 06:03:34,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.49 | bwd_microstep: 2908.84 | bwd_inner_microstep: 2012.43 | bwd_allreduce_microstep: 896.37 | step_microstep: 38.43
[2024-06-10 06:03:34,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16396.50 | bwd: 45134.01 | bwd_inner: 44236.68 | bwd_allreduce: 896.63 | step: 40.03
{'loss': 1.2753, 'learning_rate': 3.770106330077231e-05, 'epoch': 0.18}
 18%|█▊        | 305/1726 [5:21:02<24:09:20, 61.20s/it]
 18%|█▊        | 306/1726 [5:22:06<24:25:24, 61.92s/it]
 18%|█▊        | 307/1726 [5:23:06<24:14:15, 61.49s/it]
 18%|█▊        | 308/1726 [5:24:08<24:18:36, 61.72s/it]
 18%|█▊        | 309/1726 [5:25:09<24:07:54, 61.31s/it]
 18%|█▊        | 310/1726 [5:26:11<24:10:56, 61.48s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1959
[2024-06-10 06:03:35,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.80 | bwd_microstep: 891.17 | bwd_inner_microstep: 891.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 06:03:37,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1375.29 | bwd_inner_microstep: 1375.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3858
[2024-06-10 06:03:39,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.11 | bwd_microstep: 1562.04 | bwd_inner_microstep: 1562.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 06:03:41,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.27 | bwd_microstep: 1439.89 | bwd_inner_microstep: 1439.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 06:03:43,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1249.13 | bwd_inner_microstep: 1249.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 06:03:45,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.64 | bwd_microstep: 1633.51 | bwd_inner_microstep: 1633.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1873
[2024-06-10 06:03:46,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.51 | bwd_microstep: 681.13 | bwd_inner_microstep: 681.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3420
[2024-06-10 06:03:48,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.74 | bwd_microstep: 1214.01 | bwd_inner_microstep: 1213.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-10 06:03:49,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.71 | bwd_microstep: 1154.79 | bwd_inner_microstep: 1154.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-10 06:03:52,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.39 | bwd_microstep: 1627.37 | bwd_inner_microstep: 1627.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 06:03:54,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.38 | bwd_microstep: 1523.97 | bwd_inner_microstep: 1523.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678
[2024-06-10 06:03:56,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.96 | bwd_microstep: 1450.90 | bwd_inner_microstep: 1450.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2933
[2024-06-10 06:03:57,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.38 | bwd_microstep: 1204.11 | bwd_inner_microstep: 1204.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 06:03:59,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.00 | bwd_microstep: 1513.71 | bwd_inner_microstep: 1513.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3408
[2024-06-10 06:04:01,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.77 | bwd_microstep: 1370.77 | bwd_inner_microstep: 1370.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3887
[2024-06-10 06:04:04,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 686.34 | bwd_microstep: 1890.92 | bwd_inner_microstep: 1890.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 06:04:06,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.69 | bwd_microstep: 1340.17 | bwd_inner_microstep: 1340.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2665
[2024-06-10 06:04:07,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.01 | bwd_microstep: 1025.34 | bwd_inner_microstep: 1025.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 06:04:09,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1446.39 | bwd_inner_microstep: 1446.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384
[2024-06-10 06:04:11,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.02 | bwd_microstep: 1243.48 | bwd_inner_microstep: 1243.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 06:04:13,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.92 | bwd_microstep: 1352.62 | bwd_inner_microstep: 1352.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3810
[2024-06-10 06:04:15,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.71 | bwd_microstep: 1596.90 | bwd_inner_microstep: 1596.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-10 06:04:17,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1432.06 | bwd_inner_microstep: 1432.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 06:04:18,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.71 | bwd_microstep: 804.04 | bwd_inner_microstep: 804.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3789
[2024-06-10 06:04:20,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.71 | bwd_microstep: 1618.53 | bwd_inner_microstep: 1618.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3570
[2024-06-10 06:04:22,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.81 | bwd_microstep: 1433.01 | bwd_inner_microstep: 1432.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2192
[2024-06-10 06:04:23,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.58 | bwd_microstep: 797.08 | bwd_inner_microstep: 797.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 06:04:25,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1282.14 | bwd_inner_microstep: 1282.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3807
[2024-06-10 06:04:27,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1388.15 | bwd_inner_microstep: 1388.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 06:04:29,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.24 | bwd_microstep: 1504.22 | bwd_inner_microstep: 1504.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 06:04:31,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.00 | bwd_microstep: 1287.17 | bwd_inner_microstep: 1287.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575
[2024-06-10 06:04:35,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-10 06:04:35,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.44 | bwd_microstep: 3673.04 | bwd_inner_microstep: 1683.65 | bwd_allreduce_microstep: 1989.34 | step_microstep: 38.85
[2024-06-10 06:04:35,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16029.77 | bwd: 45007.10 | bwd_inner: 43016.81 | bwd_allreduce: 1989.59 | step: 40.52
{'loss': 1.2932, 'learning_rate': 3.768356045715624e-05, 'epoch': 0.18}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 06:04:37,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1491.14 | bwd_inner_microstep: 1490.98 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860
[2024-06-10 06:04:39,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.95 | bwd_microstep: 1561.66 | bwd_inner_microstep: 1561.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2338
[2024-06-10 06:04:41,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.23 | bwd_microstep: 985.69 | bwd_inner_microstep: 985.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795
[2024-06-10 06:04:43,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.65 | bwd_microstep: 1448.12 | bwd_inner_microstep: 1448.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 06:04:45,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.95 | bwd_microstep: 1633.77 | bwd_inner_microstep: 1633.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 06:04:47,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.23 | bwd_microstep: 1636.10 | bwd_inner_microstep: 1636.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 06:04:49,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1250.72 | bwd_inner_microstep: 1250.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 06:04:51,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1419.30 | bwd_inner_microstep: 1419.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 06:04:53,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1291.94 | bwd_inner_microstep: 1291.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 06:04:55,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1346.80 | bwd_inner_microstep: 1346.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2214
[2024-06-10 06:04:56,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.90 | bwd_microstep: 896.19 | bwd_inner_microstep: 896.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 06:04:58,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1346.72 | bwd_inner_microstep: 1346.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 06:05:00,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.62 | bwd_microstep: 1348.05 | bwd_inner_microstep: 1348.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519
[2024-06-10 06:05:02,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.37 | bwd_microstep: 1588.53 | bwd_inner_microstep: 1588.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2922
[2024-06-10 06:05:03,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.92 | bwd_microstep: 1129.02 | bwd_inner_microstep: 1128.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949
[2024-06-10 06:05:04,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.99 | bwd_microstep: 729.54 | bwd_inner_microstep: 729.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 06:05:06,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1291.10 | bwd_inner_microstep: 1291.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077
[2024-06-10 06:05:07,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.52 | bwd_microstep: 823.74 | bwd_inner_microstep: 823.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 06:05:09,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.98 | bwd_microstep: 1255.79 | bwd_inner_microstep: 1255.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1584
[2024-06-10 06:05:10,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 223.00 | bwd_microstep: 572.26 | bwd_inner_microstep: 572.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3538
[2024-06-10 06:05:12,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.34 | bwd_microstep: 1329.90 | bwd_inner_microstep: 1329.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 06:05:14,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1445.65 | bwd_inner_microstep: 1445.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941
[2024-06-10 06:05:15,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 698.82 | bwd_inner_microstep: 698.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2398
[2024-06-10 06:05:16,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.82 | bwd_microstep: 1125.56 | bwd_inner_microstep: 1125.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3821
[2024-06-10 06:05:18,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.50 | bwd_microstep: 1691.10 | bwd_inner_microstep: 1691.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 06:05:21,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.10 | bwd_microstep: 1659.87 | bwd_inner_microstep: 1659.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3704
[2024-06-10 06:05:23,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.97 | bwd_microstep: 1677.76 | bwd_inner_microstep: 1677.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2282
[2024-06-10 06:05:25,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.09 | bwd_microstep: 1075.54 | bwd_inner_microstep: 1075.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-10 06:05:27,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.58 | bwd_microstep: 1459.93 | bwd_inner_microstep: 1459.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3422
[2024-06-10 06:05:29,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.18 | bwd_microstep: 1540.42 | bwd_inner_microstep: 1540.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777
[2024-06-10 06:05:31,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.13 | bwd_microstep: 1548.85 | bwd_inner_microstep: 1548.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930
[2024-06-10 06:05:38,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 06:05:38,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.00 | bwd_microstep: 6557.09 | bwd_inner_microstep: 786.86 | bwd_allreduce_microstep: 5770.17 | step_microstep: 39.32
[2024-06-10 06:05:38,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15326.15 | bwd: 46856.70 | bwd_inner: 41085.48 | bwd_allreduce: 5770.48 | step: 40.95
{'loss': 1.3306, 'learning_rate': 3.766599533213218e-05, 'epoch': 0.18}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 06:05:40,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.59 | bwd_microstep: 1462.67 | bwd_inner_microstep: 1462.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3913
[2024-06-10 06:05:42,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.52 | bwd_microstep: 1587.25 | bwd_inner_microstep: 1587.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 06:05:43,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.41 | bwd_microstep: 786.26 | bwd_inner_microstep: 786.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 06:05:45,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.16 | bwd_microstep: 1340.38 | bwd_inner_microstep: 1340.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 06:05:47,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1250.22 | bwd_inner_microstep: 1250.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 06:05:49,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1381.95 | bwd_inner_microstep: 1381.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940
[2024-06-10 06:05:49,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.28 | bwd_microstep: 700.31 | bwd_inner_microstep: 700.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2053
[2024-06-10 06:05:51,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.81 | bwd_microstep: 817.96 | bwd_inner_microstep: 817.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 06:05:52,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.05 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2206
[2024-06-10 06:05:54,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.58 | bwd_microstep: 961.89 | bwd_inner_microstep: 961.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3690
[2024-06-10 06:05:56,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.77 | bwd_microstep: 1553.60 | bwd_inner_microstep: 1553.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3627
[2024-06-10 06:05:58,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.20 | bwd_microstep: 1373.98 | bwd_inner_microstep: 1373.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 06:06:00,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3785
[2024-06-10 06:06:02,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.52 | bwd_microstep: 1610.88 | bwd_inner_microstep: 1610.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3463
[2024-06-10 06:06:04,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.71 | bwd_microstep: 1502.87 | bwd_inner_microstep: 1502.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3649
[2024-06-10 06:06:06,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1283.93 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3530
[2024-06-10 06:06:08,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.29 | bwd_microstep: 1559.46 | bwd_inner_microstep: 1559.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093
[2024-06-10 06:06:09,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.68 | bwd_microstep: 921.79 | bwd_inner_microstep: 921.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 06:06:11,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1279.30 | bwd_inner_microstep: 1279.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527
[2024-06-10 06:06:13,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.41 | bwd_microstep: 1328.56 | bwd_inner_microstep: 1328.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3657
[2024-06-10 06:06:15,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.00 | bwd_microstep: 1354.17 | bwd_inner_microstep: 1354.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2132
[2024-06-10 06:06:16,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.93 | bwd_microstep: 931.02 | bwd_inner_microstep: 930.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085
[2024-06-10 06:06:17,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.85 | bwd_microstep: 853.06 | bwd_inner_microstep: 853.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 06:06:19,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1400.82 | bwd_inner_microstep: 1400.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3609
[2024-06-10 06:06:21,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.77 | bwd_microstep: 1588.30 | bwd_inner_microstep: 1588.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 06:06:23,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.01 | bwd_microstep: 1510.28 | bwd_inner_microstep: 1510.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 06:06:25,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.14 | bwd_microstep: 1484.07 | bwd_inner_microstep: 1484.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3625
[2024-06-10 06:06:27,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.41 | bwd_microstep: 1536.95 | bwd_inner_microstep: 1536.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 06:06:29,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.17 | bwd_microstep: 795.33 | bwd_inner_microstep: 795.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3675
[2024-06-10 06:06:31,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.41 | bwd_microstep: 1543.67 | bwd_inner_microstep: 1543.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3597
[2024-06-10 06:06:33,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.62 | bwd_microstep: 1309.17 | bwd_inner_microstep: 1309.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3803
[2024-06-10 06:06:40,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.26 | optimizer_step: 6.62
[2024-06-10 06:06:40,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.65 | bwd_microstep: 6692.21 | bwd_inner_microstep: 1524.32 | bwd_allreduce_microstep: 5167.83 | step_microstep: 39.04
[2024-06-10 06:06:40,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15373.58 | bwd: 46338.18 | bwd_inner: 41169.44 | bwd_allreduce: 5168.06 | step: 40.65
{'loss': 1.3176, 'learning_rate': 3.764836798756439e-05, 'epoch': 0.18}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 06:06:42,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.97 | bwd_microstep: 1497.04 | bwd_inner_microstep: 1497.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3414
[2024-06-10 06:06:44,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.31 | bwd_microstep: 1206.96 | bwd_inner_microstep: 1206.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 06:06:45,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.82 | bwd_microstep: 1282.09 | bwd_inner_microstep: 1282.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 06:06:47,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1243.21 | bwd_inner_microstep: 1243.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 06:06:49,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.42 | bwd_microstep: 1531.31 | bwd_inner_microstep: 1531.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916
[2024-06-10 06:06:50,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.90 | bwd_microstep: 779.68 | bwd_inner_microstep: 779.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 06:06:52,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.97 | bwd_microstep: 1146.79 | bwd_inner_microstep: 1146.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 06:06:53,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.09 | bwd_microstep: 680.13 | bwd_inner_microstep: 680.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3713
[2024-06-10 06:06:55,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1560.42 | bwd_inner_microstep: 1560.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 06:06:57,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.26 | bwd_microstep: 1293.40 | bwd_inner_microstep: 1293.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3408
[2024-06-10 06:06:58,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.70 | bwd_microstep: 1309.48 | bwd_inner_microstep: 1309.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 06:07:00,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1408.87 | bwd_inner_microstep: 1408.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3587
[2024-06-10 06:07:03,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.34 | bwd_microstep: 1574.00 | bwd_inner_microstep: 1573.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3497
[2024-06-10 06:07:05,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.43 | bwd_microstep: 1645.36 | bwd_inner_microstep: 1645.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 06:07:07,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.07 | bwd_microstep: 1350.98 | bwd_inner_microstep: 1350.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1969
[2024-06-10 06:07:08,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.70 | bwd_microstep: 703.16 | bwd_inner_microstep: 703.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 06:07:09,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.72 | bwd_microstep: 795.43 | bwd_inner_microstep: 795.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492
[2024-06-10 06:07:10,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.05 | bwd_microstep: 1189.29 | bwd_inner_microstep: 1189.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 06:07:12,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1426.64 | bwd_inner_microstep: 1426.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3619
[2024-06-10 06:07:14,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.68 | bwd_microstep: 1373.58 | bwd_inner_microstep: 1373.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-10 06:07:16,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.94 | bwd_microstep: 1289.24 | bwd_inner_microstep: 1289.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 06:07:18,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.11 | bwd_microstep: 1486.44 | bwd_inner_microstep: 1486.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 06:07:20,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.14 | bwd_microstep: 1296.26 | bwd_inner_microstep: 1296.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 06:07:22,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.44 | bwd_microstep: 1282.91 | bwd_inner_microstep: 1282.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3820
[2024-06-10 06:07:24,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1401.35 | bwd_inner_microstep: 1401.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 06:07:26,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1504.68 | bwd_inner_microstep: 1504.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3806
[2024-06-10 06:07:28,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.52 | bwd_microstep: 1687.07 | bwd_inner_microstep: 1687.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2045
[2024-06-10 06:07:29,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.17 | bwd_microstep: 1003.52 | bwd_inner_microstep: 1003.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812
[2024-06-10 06:07:31,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.94 | bwd_microstep: 1355.26 | bwd_inner_microstep: 1355.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588
[2024-06-10 06:07:34,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.90 | bwd_microstep: 1702.99 | bwd_inner_microstep: 1702.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563
[2024-06-10 06:07:36,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1422.26 | bwd_inner_microstep: 1422.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 06:07:40,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.26 | optimizer_step: 6.61
[2024-06-10 06:07:40,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.40 | bwd_microstep: 4120.18 | bwd_inner_microstep: 1742.20 | bwd_allreduce_microstep: 2377.92 | step_microstep: 39.00
[2024-06-10 06:07:40,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15731.06 | bwd: 44550.00 | bwd_inner: 42171.17 | bwd_allreduce: 2378.15 | step: 40.63
{'loss': 1.3222, 'learning_rate': 3.763067848553629e-05, 'epoch': 0.18}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2911
[2024-06-10 06:07:42,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.99 | bwd_microstep: 1177.96 | bwd_inner_microstep: 1177.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1935
[2024-06-10 06:07:43,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.41 | bwd_microstep: 850.33 | bwd_inner_microstep: 850.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2305
[2024-06-10 06:07:44,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.87 | bwd_microstep: 881.46 | bwd_inner_microstep: 881.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1931
[2024-06-10 06:07:46,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.53 | bwd_microstep: 819.64 | bwd_inner_microstep: 819.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1895
[2024-06-10 06:07:47,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.00 | bwd_microstep: 683.29 | bwd_inner_microstep: 683.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2450
[2024-06-10 06:07:48,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.13 | bwd_microstep: 977.64 | bwd_inner_microstep: 977.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 06:07:50,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1394.75 | bwd_inner_microstep: 1394.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 06:07:52,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.85 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3408
[2024-06-10 06:07:53,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.36 | bwd_microstep: 1294.62 | bwd_inner_microstep: 1294.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 06:07:55,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1343.44 | bwd_inner_microstep: 1343.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3478
[2024-06-10 06:07:57,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.49 | bwd_microstep: 1412.69 | bwd_inner_microstep: 1412.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957
[2024-06-10 06:07:58,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.75 | bwd_microstep: 891.00 | bwd_inner_microstep: 890.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3489
[2024-06-10 06:08:01,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1575.25 | bwd_inner_microstep: 1575.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3945
[2024-06-10 06:08:03,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.70 | bwd_microstep: 1690.13 | bwd_inner_microstep: 1690.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 06:08:05,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.85 | bwd_microstep: 1283.94 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 06:08:06,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.70 | bwd_microstep: 1288.05 | bwd_inner_microstep: 1288.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-10 06:08:08,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1388.80 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 06:08:10,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1376.14 | bwd_inner_microstep: 1376.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-10 06:08:12,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.42 | bwd_microstep: 1416.93 | bwd_inner_microstep: 1416.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821
[2024-06-10 06:08:14,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.63 | bwd_microstep: 1421.19 | bwd_inner_microstep: 1421.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535
[2024-06-10 06:08:16,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.34 | bwd_microstep: 1450.92 | bwd_inner_microstep: 1450.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2522
[2024-06-10 06:08:18,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.95 | bwd_microstep: 1025.89 | bwd_inner_microstep: 1025.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-10 06:08:20,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.47 | bwd_microstep: 1619.31 | bwd_inner_microstep: 1619.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665
[2024-06-10 06:08:22,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.22 | bwd_microstep: 1722.04 | bwd_inner_microstep: 1722.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3825
[2024-06-10 06:08:24,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1420.58 | bwd_inner_microstep: 1420.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 06:08:26,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.25 | bwd_microstep: 1492.26 | bwd_inner_microstep: 1492.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 06:08:28,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.03 | bwd_microstep: 1451.38 | bwd_inner_microstep: 1451.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3729
[2024-06-10 06:08:30,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.06 | bwd_microstep: 1565.67 | bwd_inner_microstep: 1565.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-10 06:08:33,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.39 | bwd_microstep: 1537.80 | bwd_inner_microstep: 1537.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 06:08:35,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1403.62 | bwd_inner_microstep: 1403.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-10 06:08:37,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.49 | bwd_microstep: 1447.25 | bwd_inner_microstep: 1447.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2058
[2024-06-10 06:08:41,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-10 06:08:41,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.02 | bwd_microstep: 3794.27 | bwd_inner_microstep: 1044.63 | bwd_allreduce_microstep: 2749.59 | step_microstep: 40.78
[2024-06-10 06:08:41,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15571.40 | bwd: 44383.21 | bwd_inner: 41632.67 | bwd_allreduce: 2749.82 | step: 42.45
{'loss': 1.2503, 'learning_rate': 3.7612926888350216e-05, 'epoch': 0.18}
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4670
[2024-06-10 06:08:43,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 732.69 | bwd_microstep: 1957.39 | bwd_inner_microstep: 1957.17 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 06:08:45,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 06:08:48,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.72 | bwd_microstep: 1651.00 | bwd_inner_microstep: 1650.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 06:08:50,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1482.92 | bwd_inner_microstep: 1482.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 06:08:51,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2217
[2024-06-10 06:08:53,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.44 | bwd_microstep: 893.29 | bwd_inner_microstep: 893.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 06:08:55,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.98 | bwd_microstep: 1430.16 | bwd_inner_microstep: 1430.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4230
[2024-06-10 06:08:57,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.85 | bwd_microstep: 1562.49 | bwd_inner_microstep: 1562.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 06:08:58,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.61 | bwd_microstep: 679.64 | bwd_inner_microstep: 679.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3507
[2024-06-10 06:08:59,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.93 | bwd_microstep: 1195.39 | bwd_inner_microstep: 1195.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569
[2024-06-10 06:09:01,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.39 | bwd_microstep: 1206.45 | bwd_inner_microstep: 1206.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 06:09:03,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.59 | bwd_microstep: 1480.01 | bwd_inner_microstep: 1479.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-10 06:09:04,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.29 | bwd_microstep: 889.82 | bwd_inner_microstep: 889.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3513
[2024-06-10 06:09:06,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.49 | bwd_microstep: 1446.72 | bwd_inner_microstep: 1446.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 06:09:09,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.97 | bwd_microstep: 1606.68 | bwd_inner_microstep: 1606.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2147
[2024-06-10 06:09:10,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 950.38 | bwd_inner_microstep: 950.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 06:09:12,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.55 | bwd_microstep: 1485.97 | bwd_inner_microstep: 1485.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385
[2024-06-10 06:09:14,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.11 | bwd_microstep: 1243.30 | bwd_inner_microstep: 1243.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-10 06:09:16,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.49 | bwd_microstep: 1451.84 | bwd_inner_microstep: 1451.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983
[2024-06-10 06:09:17,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.86 | bwd_microstep: 896.19 | bwd_inner_microstep: 896.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3715
[2024-06-10 06:09:19,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.19 | bwd_microstep: 1271.80 | bwd_inner_microstep: 1271.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1983
[2024-06-10 06:09:20,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.58 | bwd_microstep: 751.69 | bwd_inner_microstep: 751.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3709
[2024-06-10 06:09:21,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1333.84 | bwd_inner_microstep: 1333.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813
[2024-06-10 06:09:23,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.75 | bwd_microstep: 1358.43 | bwd_inner_microstep: 1358.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 06:09:26,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.67 | bwd_microstep: 1559.61 | bwd_inner_microstep: 1559.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054
[2024-06-10 06:09:27,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.55 | bwd_microstep: 917.18 | bwd_inner_microstep: 917.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053
[2024-06-10 06:09:28,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.35 | bwd_microstep: 915.24 | bwd_inner_microstep: 915.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2189
[2024-06-10 06:09:29,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.68 | bwd_microstep: 810.89 | bwd_inner_microstep: 810.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615
[2024-06-10 06:09:31,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1445.11 | bwd_inner_microstep: 1445.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 06:09:33,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.16 | bwd_microstep: 1550.66 | bwd_inner_microstep: 1550.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3737
[2024-06-10 06:09:35,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.70 | bwd_microstep: 1340.59 | bwd_inner_microstep: 1340.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3755
[2024-06-10 06:09:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.27 | optimizer_step: 6.59
[2024-06-10 06:09:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.79 | bwd_microstep: 4903.69 | bwd_inner_microstep: 1549.07 | bwd_allreduce_microstep: 3354.57 | step_microstep: 38.90
[2024-06-10 06:09:41,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15295.68 | bwd: 44294.47 | bwd_inner: 40938.83 | bwd_allreduce: 3354.89 | step: 40.63
{'loss': 1.3758, 'learning_rate': 3.7595113258527206e-05, 'epoch': 0.18}
 18%|█▊        | 311/1726 [5:27:12<24:09:13, 61.45s/it]
 18%|█▊        | 312/1726 [5:28:14<24:15:51, 61.78s/it]
 18%|█▊        | 313/1726 [5:29:17<24:16:48, 61.86s/it]
 18%|█▊        | 314/1726 [5:30:17<24:07:04, 61.49s/it]
 18%|█▊        | 315/1726 [5:31:17<23:57:43, 61.14s/it]
 18%|█▊        | 316/1726 [5:32:17<23:48:18, 60.78s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 06:09:42,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.64 | bwd_microstep: 789.20 | bwd_inner_microstep: 789.05 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 06:09:44,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.78 | bwd_microstep: 1393.02 | bwd_inner_microstep: 1393.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 06:09:46,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.94 | bwd_microstep: 1382.21 | bwd_inner_microstep: 1382.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 06:09:48,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1453.54 | bwd_inner_microstep: 1453.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 06:09:49,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.33 | bwd_microstep: 1250.48 | bwd_inner_microstep: 1250.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 06:09:51,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.44 | bwd_microstep: 1283.37 | bwd_inner_microstep: 1283.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580
[2024-06-10 06:09:53,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1304.15 | bwd_inner_microstep: 1304.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 06:09:55,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.47 | bwd_microstep: 1284.77 | bwd_inner_microstep: 1284.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 06:09:56,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.95 | bwd_microstep: 1222.40 | bwd_inner_microstep: 1222.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1966
[2024-06-10 06:09:58,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.46 | bwd_microstep: 825.60 | bwd_inner_microstep: 825.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 06:10:00,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.53 | bwd_microstep: 1522.83 | bwd_inner_microstep: 1522.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-10 06:10:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.69 | bwd_microstep: 1519.52 | bwd_inner_microstep: 1519.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3491
[2024-06-10 06:10:04,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.96 | bwd_microstep: 1584.44 | bwd_inner_microstep: 1584.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3651
[2024-06-10 06:10:06,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.31 | bwd_microstep: 1685.64 | bwd_inner_microstep: 1685.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 06:10:08,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1507.05 | bwd_inner_microstep: 1507.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3555
[2024-06-10 06:10:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.36 | bwd_microstep: 1663.88 | bwd_inner_microstep: 1663.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 06:10:12,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.06 | bwd_microstep: 1283.63 | bwd_inner_microstep: 1283.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 06:10:14,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1515.33 | bwd_inner_microstep: 1515.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 06:10:16,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1354.55 | bwd_inner_microstep: 1354.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 06:10:18,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.52 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2304
[2024-06-10 06:10:20,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.13 | bwd_microstep: 980.52 | bwd_inner_microstep: 980.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 06:10:22,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.30 | bwd_microstep: 1660.47 | bwd_inner_microstep: 1660.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 06:10:23,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.50 | bwd_microstep: 977.29 | bwd_inner_microstep: 977.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 06:10:25,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.05 | bwd_microstep: 975.84 | bwd_inner_microstep: 975.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 06:10:27,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.91 | bwd_microstep: 1399.18 | bwd_inner_microstep: 1399.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 06:10:29,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.06 | bwd_microstep: 1624.74 | bwd_inner_microstep: 1624.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3597
[2024-06-10 06:10:31,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1557.60 | bwd_inner_microstep: 1557.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3815
[2024-06-10 06:10:33,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.54 | bwd_microstep: 1440.57 | bwd_inner_microstep: 1440.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678
[2024-06-10 06:10:35,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.14 | bwd_microstep: 1456.98 | bwd_inner_microstep: 1456.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270
[2024-06-10 06:10:36,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.83 | bwd_microstep: 877.73 | bwd_inner_microstep: 877.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 06:10:38,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.91 | bwd_microstep: 1543.30 | bwd_inner_microstep: 1543.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3767
[2024-06-10 06:10:42,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 06:10:42,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.55 | bwd_microstep: 3323.73 | bwd_inner_microstep: 1786.17 | bwd_allreduce_microstep: 1537.51 | step_microstep: 38.73
[2024-06-10 06:10:42,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16197.77 | bwd: 45024.63 | bwd_inner: 43486.09 | bwd_allreduce: 1537.79 | step: 40.45
{'loss': 1.2901, 'learning_rate': 3.757723765880677e-05, 'epoch': 0.18}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 06:10:44,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1480.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4193
[2024-06-10 06:10:47,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.33 | bwd_microstep: 1751.70 | bwd_inner_microstep: 1751.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3841
[2024-06-10 06:10:49,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.98 | bwd_microstep: 1392.89 | bwd_inner_microstep: 1392.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 06:10:51,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1382.11 | bwd_inner_microstep: 1382.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3774
[2024-06-10 06:10:53,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1502.69 | bwd_inner_microstep: 1502.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 06:10:55,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.98 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 06:10:56,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1248.06 | bwd_inner_microstep: 1248.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 06:10:58,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.97 | bwd_microstep: 1387.46 | bwd_inner_microstep: 1387.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410
[2024-06-10 06:11:00,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.55 | bwd_microstep: 1373.04 | bwd_inner_microstep: 1373.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 06:11:02,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1482.63 | bwd_inner_microstep: 1482.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3007
[2024-06-10 06:11:04,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.96 | bwd_microstep: 1201.92 | bwd_inner_microstep: 1201.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 06:11:06,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.45 | bwd_microstep: 1489.96 | bwd_inner_microstep: 1489.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1967
[2024-06-10 06:11:07,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.38 | bwd_microstep: 828.13 | bwd_inner_microstep: 828.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 06:11:09,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.01 | bwd_microstep: 1345.12 | bwd_inner_microstep: 1345.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 06:11:11,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.15 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 06:11:13,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1408.13 | bwd_inner_microstep: 1408.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2123
[2024-06-10 06:11:14,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.71 | bwd_microstep: 941.75 | bwd_inner_microstep: 941.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 06:11:16,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.14 | bwd_microstep: 1421.69 | bwd_inner_microstep: 1421.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 06:11:18,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1375.98 | bwd_inner_microstep: 1375.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680
[2024-06-10 06:11:20,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.63 | bwd_microstep: 1620.03 | bwd_inner_microstep: 1620.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-10 06:11:22,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.37 | bwd_microstep: 1613.71 | bwd_inner_microstep: 1613.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 06:11:24,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.36 | bwd_microstep: 1186.59 | bwd_inner_microstep: 1186.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 06:11:26,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.05 | bwd_microstep: 1395.27 | bwd_inner_microstep: 1395.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3748
[2024-06-10 06:11:28,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.89 | bwd_microstep: 1542.89 | bwd_inner_microstep: 1542.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061
[2024-06-10 06:11:29,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.19 | bwd_microstep: 915.25 | bwd_inner_microstep: 915.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486
[2024-06-10 06:11:31,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.90 | bwd_microstep: 1347.59 | bwd_inner_microstep: 1347.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755
[2024-06-10 06:11:33,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.49 | bwd_microstep: 1346.33 | bwd_inner_microstep: 1346.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3766
[2024-06-10 06:11:35,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.24 | bwd_microstep: 1347.87 | bwd_inner_microstep: 1347.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567
[2024-06-10 06:11:37,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.44 | bwd_microstep: 1334.85 | bwd_inner_microstep: 1334.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 06:11:39,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.84 | bwd_microstep: 1655.13 | bwd_inner_microstep: 1655.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 06:11:41,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.84 | bwd_microstep: 1282.47 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 06:11:45,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.25 | optimizer_step: 6.59
[2024-06-10 06:11:45,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.99 | bwd_microstep: 3693.44 | bwd_inner_microstep: 1703.25 | bwd_allreduce_microstep: 1990.14 | step_microstep: 38.55
[2024-06-10 06:11:45,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16432.07 | bwd: 45969.88 | bwd_inner: 43978.80 | bwd_allreduce: 1990.37 | step: 40.20
{'loss': 1.3099, 'learning_rate': 3.7559300152146665e-05, 'epoch': 0.18}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 06:11:47,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1278.10 | bwd_inner_microstep: 1278.02 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457
[2024-06-10 06:11:49,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.57 | bwd_microstep: 1341.21 | bwd_inner_microstep: 1341.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3798
[2024-06-10 06:11:51,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.50 | bwd_microstep: 1651.22 | bwd_inner_microstep: 1651.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 06:11:53,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.72 | bwd_microstep: 1281.29 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 06:11:55,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.47 | bwd_microstep: 1404.57 | bwd_inner_microstep: 1404.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3735
[2024-06-10 06:11:57,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.69 | bwd_microstep: 1430.35 | bwd_inner_microstep: 1430.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 06:11:59,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.78 | bwd_microstep: 1386.68 | bwd_inner_microstep: 1386.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 06:12:00,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1253.66 | bwd_inner_microstep: 1253.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3417
[2024-06-10 06:12:02,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1397.10 | bwd_inner_microstep: 1397.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1970
[2024-06-10 06:12:03,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.40 | bwd_microstep: 766.53 | bwd_inner_microstep: 766.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-10 06:12:05,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.42 | bwd_microstep: 1529.69 | bwd_inner_microstep: 1529.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3671
[2024-06-10 06:12:07,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.18 | bwd_microstep: 1551.32 | bwd_inner_microstep: 1551.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3422
[2024-06-10 06:12:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.74 | bwd_microstep: 1214.84 | bwd_inner_microstep: 1214.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 06:12:11,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1391.23 | bwd_inner_microstep: 1391.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3645
[2024-06-10 06:12:13,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.79 | bwd_microstep: 1574.58 | bwd_inner_microstep: 1574.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1940
[2024-06-10 06:12:15,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.91 | bwd_microstep: 888.63 | bwd_inner_microstep: 888.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3622
[2024-06-10 06:12:16,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1435.53 | bwd_inner_microstep: 1435.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3836
[2024-06-10 06:12:19,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.66 | bwd_microstep: 1522.34 | bwd_inner_microstep: 1522.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3543
[2024-06-10 06:12:21,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1441.75 | bwd_inner_microstep: 1441.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 06:12:23,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.44 | bwd_microstep: 1561.77 | bwd_inner_microstep: 1561.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844
[2024-06-10 06:12:25,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1468.18 | bwd_inner_microstep: 1468.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 06:12:27,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.15 | bwd_microstep: 1354.04 | bwd_inner_microstep: 1354.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 06:12:29,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.12 | bwd_microstep: 1395.55 | bwd_inner_microstep: 1395.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 06:12:30,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.15 | bwd_microstep: 1280.34 | bwd_inner_microstep: 1280.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 06:12:32,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.44 | bwd_microstep: 1392.39 | bwd_inner_microstep: 1392.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3444
[2024-06-10 06:12:34,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1401.74 | bwd_inner_microstep: 1401.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 06:12:36,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.38 | bwd_microstep: 1289.14 | bwd_inner_microstep: 1289.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 06:12:38,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1509.48 | bwd_inner_microstep: 1509.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 06:12:40,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1396.84 | bwd_inner_microstep: 1396.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182
[2024-06-10 06:12:41,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.00 | bwd_microstep: 858.72 | bwd_inner_microstep: 858.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3377
[2024-06-10 06:12:43,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.02 | bwd_microstep: 1433.64 | bwd_inner_microstep: 1433.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 06:12:49,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 06:12:49,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 5120.39 | bwd_inner_microstep: 1679.86 | bwd_allreduce_microstep: 3440.47 | step_microstep: 38.86
[2024-06-10 06:12:49,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16358.25 | bwd: 47202.86 | bwd_inner: 43761.40 | bwd_allreduce: 3440.74 | step: 40.50
{'loss': 1.339, 'learning_rate': 3.7541300801722715e-05, 'epoch': 0.18}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 06:12:51,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1278.09 | bwd_inner_microstep: 1278.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4044
[2024-06-10 06:12:53,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.57 | bwd_microstep: 1550.58 | bwd_inner_microstep: 1550.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4033
[2024-06-10 06:12:55,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.79 | bwd_microstep: 1415.58 | bwd_inner_microstep: 1415.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2249
[2024-06-10 06:12:56,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.77 | bwd_microstep: 869.97 | bwd_inner_microstep: 869.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 06:12:58,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.04 | bwd_microstep: 1445.01 | bwd_inner_microstep: 1444.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 06:13:00,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1249.02 | bwd_inner_microstep: 1248.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734
[2024-06-10 06:13:02,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.09 | bwd_microstep: 1532.20 | bwd_inner_microstep: 1532.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4067
[2024-06-10 06:13:04,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.56 | bwd_microstep: 1622.30 | bwd_inner_microstep: 1622.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1963
[2024-06-10 06:13:05,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.20 | bwd_microstep: 702.79 | bwd_inner_microstep: 702.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869
[2024-06-10 06:13:06,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.05 | bwd_microstep: 709.41 | bwd_inner_microstep: 709.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1975
[2024-06-10 06:13:07,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.96 | bwd_microstep: 706.33 | bwd_inner_microstep: 706.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 06:13:09,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.81 | bwd_microstep: 1283.81 | bwd_inner_microstep: 1283.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 06:13:11,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.63 | bwd_microstep: 1482.15 | bwd_inner_microstep: 1482.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 06:13:13,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.48 | bwd_microstep: 1503.10 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 06:13:15,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1374.04 | bwd_inner_microstep: 1374.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 06:13:17,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1246.62 | bwd_inner_microstep: 1246.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3670
[2024-06-10 06:13:19,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.12 | bwd_microstep: 1673.97 | bwd_inner_microstep: 1673.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2175
[2024-06-10 06:13:20,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.75 | bwd_microstep: 889.53 | bwd_inner_microstep: 889.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3575
[2024-06-10 06:13:22,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.53 | bwd_microstep: 1206.99 | bwd_inner_microstep: 1206.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-10 06:13:24,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1294.73 | bwd_inner_microstep: 1294.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 06:13:25,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1296.12 | bwd_inner_microstep: 1296.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3681
[2024-06-10 06:13:27,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.39 | bwd_microstep: 1358.47 | bwd_inner_microstep: 1358.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 06:13:29,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1385.73 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-10 06:13:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.66 | bwd_microstep: 1358.51 | bwd_inner_microstep: 1358.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-10 06:13:32,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.85 | bwd_microstep: 809.85 | bwd_inner_microstep: 809.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 06:13:34,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.36 | bwd_microstep: 1290.58 | bwd_inner_microstep: 1290.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 06:13:36,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.89 | bwd_microstep: 1503.73 | bwd_inner_microstep: 1503.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3771
[2024-06-10 06:13:38,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.97 | bwd_microstep: 1347.66 | bwd_inner_microstep: 1347.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475
[2024-06-10 06:13:40,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.94 | bwd_microstep: 1313.72 | bwd_inner_microstep: 1313.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 06:13:42,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.45 | bwd_microstep: 1380.94 | bwd_inner_microstep: 1380.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2800
[2024-06-10 06:13:43,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.03 | bwd_microstep: 1087.56 | bwd_inner_microstep: 1087.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-10 06:13:50,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-10 06:13:50,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.58 | bwd_microstep: 6019.42 | bwd_inner_microstep: 1741.13 | bwd_allreduce_microstep: 4278.24 | step_microstep: 38.75
[2024-06-10 06:13:50,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.04 | bwd: 45188.52 | bwd_inner: 40909.37 | bwd_allreduce: 4278.47 | step: 40.37
{'loss': 1.318, 'learning_rate': 3.752323967092853e-05, 'epoch': 0.19}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 06:13:52,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.49 | bwd_microstep: 1371.32 | bwd_inner_microstep: 1371.19 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3561
[2024-06-10 06:13:54,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1358.06 | bwd_inner_microstep: 1358.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2373
[2024-06-10 06:13:55,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.48 | bwd_microstep: 838.22 | bwd_inner_microstep: 838.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-10 06:13:57,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1316.93 | bwd_inner_microstep: 1316.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 06:13:59,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.81 | bwd_microstep: 1446.24 | bwd_inner_microstep: 1446.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2435
[2024-06-10 06:14:00,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.34 | bwd_microstep: 946.71 | bwd_inner_microstep: 946.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 06:14:02,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.93 | bwd_microstep: 1455.93 | bwd_inner_microstep: 1455.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 06:14:04,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1345.76 | bwd_inner_microstep: 1345.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4071
[2024-06-10 06:14:06,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.71 | bwd_microstep: 1457.91 | bwd_inner_microstep: 1457.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2085
[2024-06-10 06:14:07,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.42 | bwd_microstep: 819.28 | bwd_inner_microstep: 819.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 06:14:09,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.44 | bwd_microstep: 1533.40 | bwd_inner_microstep: 1533.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3655
[2024-06-10 06:14:11,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1483.58 | bwd_inner_microstep: 1483.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3514
[2024-06-10 06:14:13,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1350.16 | bwd_inner_microstep: 1350.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3534
[2024-06-10 06:14:15,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.44 | bwd_microstep: 1455.76 | bwd_inner_microstep: 1455.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 06:14:17,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1376.67 | bwd_inner_microstep: 1376.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3540
[2024-06-10 06:14:19,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1587.39 | bwd_inner_microstep: 1587.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3478
[2024-06-10 06:14:21,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.85 | bwd_microstep: 1343.74 | bwd_inner_microstep: 1343.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3423
[2024-06-10 06:14:23,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1469.04 | bwd_inner_microstep: 1469.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3429
[2024-06-10 06:14:25,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.08 | bwd_microstep: 1216.36 | bwd_inner_microstep: 1216.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3847
[2024-06-10 06:14:27,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.67 | bwd_microstep: 1392.72 | bwd_inner_microstep: 1392.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-10 06:14:28,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.75 | bwd_microstep: 1186.72 | bwd_inner_microstep: 1186.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 06:14:30,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.73 | bwd_microstep: 1400.78 | bwd_inner_microstep: 1400.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822
[2024-06-10 06:14:32,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1518.09 | bwd_inner_microstep: 1518.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 06:14:34,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.01 | bwd_microstep: 1512.76 | bwd_inner_microstep: 1512.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 06:14:36,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1414.83 | bwd_inner_microstep: 1414.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 06:14:38,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.46 | bwd_microstep: 1255.98 | bwd_inner_microstep: 1255.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 06:14:40,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1554.29 | bwd_inner_microstep: 1554.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 06:14:42,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.44 | bwd_microstep: 1160.33 | bwd_inner_microstep: 1160.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649
[2024-06-10 06:14:44,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.14 | bwd_microstep: 1519.79 | bwd_inner_microstep: 1519.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 06:14:46,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 1456.62 | bwd_inner_microstep: 1456.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3476
[2024-06-10 06:14:48,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 1330.44 | bwd_inner_microstep: 1330.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 06:14:53,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.27 | optimizer_step: 6.56
[2024-06-10 06:14:53,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.99 | bwd_microstep: 4819.30 | bwd_inner_microstep: 1691.40 | bwd_allreduce_microstep: 3127.84 | step_microstep: 38.56
[2024-06-10 06:14:53,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16332.80 | bwd: 46695.15 | bwd_inner: 43566.27 | bwd_allreduce: 3128.13 | step: 40.24
{'loss': 1.3217, 'learning_rate': 3.750511682337531e-05, 'epoch': 0.19}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2492
[2024-06-10 06:14:54,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.58 | bwd_microstep: 917.75 | bwd_inner_microstep: 917.64 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3964
[2024-06-10 06:14:56,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.79 | bwd_microstep: 1401.09 | bwd_inner_microstep: 1401.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3846
[2024-06-10 06:14:59,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.02 | bwd_microstep: 1557.87 | bwd_inner_microstep: 1557.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-10 06:15:01,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1546.73 | bwd_inner_microstep: 1546.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 06:15:02,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.28 | bwd_microstep: 1285.92 | bwd_inner_microstep: 1285.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 06:15:05,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.53 | bwd_microstep: 1643.28 | bwd_inner_microstep: 1643.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709
[2024-06-10 06:15:07,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.99 | bwd_microstep: 1628.41 | bwd_inner_microstep: 1628.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2498
[2024-06-10 06:15:08,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.93 | bwd_microstep: 1027.29 | bwd_inner_microstep: 1027.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687
[2024-06-10 06:15:10,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.11 | bwd_microstep: 1433.48 | bwd_inner_microstep: 1433.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3445
[2024-06-10 06:15:12,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.41 | bwd_microstep: 1219.95 | bwd_inner_microstep: 1219.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2086
[2024-06-10 06:15:13,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.49 | bwd_microstep: 1015.42 | bwd_inner_microstep: 1015.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 06:15:15,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1386.23 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 06:15:17,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1447.69 | bwd_inner_microstep: 1447.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 06:15:19,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.57 | bwd_microstep: 1500.98 | bwd_inner_microstep: 1500.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3425
[2024-06-10 06:15:22,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.53 | bwd_microstep: 1542.60 | bwd_inner_microstep: 1542.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2086
[2024-06-10 06:15:23,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.23 | bwd_microstep: 919.16 | bwd_inner_microstep: 919.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694
[2024-06-10 06:15:25,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.42 | bwd_microstep: 1530.31 | bwd_inner_microstep: 1530.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 06:15:27,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1358.95 | bwd_inner_microstep: 1358.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 06:15:29,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1361.15 | bwd_inner_microstep: 1361.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3896
[2024-06-10 06:15:31,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.49 | bwd_microstep: 1692.52 | bwd_inner_microstep: 1692.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3777
[2024-06-10 06:15:33,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.39 | bwd_microstep: 1715.88 | bwd_inner_microstep: 1715.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-10 06:15:35,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.53 | bwd_microstep: 1409.95 | bwd_inner_microstep: 1409.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 06:15:37,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1513.08 | bwd_inner_microstep: 1513.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430
[2024-06-10 06:15:39,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.80 | bwd_microstep: 1283.54 | bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 06:15:41,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1378.60 | bwd_inner_microstep: 1378.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-10 06:15:43,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.41 | bwd_microstep: 1452.74 | bwd_inner_microstep: 1452.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3707
[2024-06-10 06:15:45,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.46 | bwd_microstep: 1730.11 | bwd_inner_microstep: 1730.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 06:15:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.40 | bwd_microstep: 1507.26 | bwd_inner_microstep: 1507.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 06:15:50,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.15 | bwd_microstep: 1531.27 | bwd_inner_microstep: 1531.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705
[2024-06-10 06:15:52,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.75 | bwd_microstep: 1425.91 | bwd_inner_microstep: 1425.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 06:15:54,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.09 | bwd_microstep: 1398.12 | bwd_inner_microstep: 1398.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 06:15:56,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.35 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-10 06:15:56,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.75 | bwd_microstep: 1471.54 | bwd_inner_microstep: 1338.10 | bwd_allreduce_microstep: 133.40 | step_microstep: 38.96
[2024-06-10 06:15:56,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16826.35 | bwd: 45234.79 | bwd_inner: 45100.39 | bwd_allreduce: 133.67 | step: 40.64
 18%|█▊        | 316/1726 [5:32:17<23:48:18, 60.78s/it]
 18%|█▊        | 317/1726 [5:33:19<23:52:58, 61.02s/it]
 18%|█▊        | 318/1726 [5:34:22<24:04:10, 61.54s/it]
 18%|█▊        | 319/1726 [5:35:26<24:19:52, 62.25s/it]
 19%|█▊        | 320/1726 [5:36:27<24:09:07, 61.84s/it]
 19%|█▊        | 321/1726 [5:37:30<24:18:53, 62.30s/it]
 19%|█▊        | 322/1726 [5:38:32<24:18:44, 62.34s/it]
{'loss': 1.3029, 'learning_rate': 3.7486932322891646e-05, 'epoch': 0.19}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 06:15:57,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.68 | bwd_microstep: 792.03 | bwd_inner_microstep: 791.90 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4165
[2024-06-10 06:15:59,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.48 | bwd_microstep: 1651.48 | bwd_inner_microstep: 1651.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 06:16:01,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1379.66 | bwd_inner_microstep: 1379.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 06:16:03,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.97 | bwd_microstep: 1357.28 | bwd_inner_microstep: 1357.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 06:16:05,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1285.33 | bwd_inner_microstep: 1285.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 06:16:06,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 1379.19 | bwd_inner_microstep: 1379.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 06:16:08,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1288.71 | bwd_inner_microstep: 1288.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 06:16:10,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.26 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 06:16:12,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.29 | bwd_microstep: 1376.23 | bwd_inner_microstep: 1376.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 06:16:14,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.06 | bwd_microstep: 1279.65 | bwd_inner_microstep: 1279.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070
[2024-06-10 06:16:15,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.74 | bwd_microstep: 917.00 | bwd_inner_microstep: 916.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 06:16:17,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1346.89 | bwd_inner_microstep: 1346.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499
[2024-06-10 06:16:19,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1412.68 | bwd_inner_microstep: 1412.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 06:16:20,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.45 | bwd_microstep: 895.33 | bwd_inner_microstep: 895.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3648
[2024-06-10 06:16:22,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.37 | bwd_microstep: 1650.21 | bwd_inner_microstep: 1650.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622
[2024-06-10 06:16:24,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.38 | bwd_microstep: 1344.19 | bwd_inner_microstep: 1344.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 06:16:26,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.26 | bwd_microstep: 1286.69 | bwd_inner_microstep: 1286.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 06:16:28,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.97 | bwd_microstep: 1558.24 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823
[2024-06-10 06:16:30,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.99 | bwd_microstep: 1518.58 | bwd_inner_microstep: 1518.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 06:16:32,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.01 | bwd_microstep: 1259.90 | bwd_inner_microstep: 1259.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2091
[2024-06-10 06:16:33,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.82 | bwd_microstep: 921.15 | bwd_inner_microstep: 921.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3720
[2024-06-10 06:16:35,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1498.15 | bwd_inner_microstep: 1498.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822
[2024-06-10 06:16:38,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.85 | bwd_microstep: 1688.91 | bwd_inner_microstep: 1688.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1120
[2024-06-10 06:16:38,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 181.30 | bwd_microstep: 473.38 | bwd_inner_microstep: 473.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 06:16:40,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.81 | bwd_microstep: 1404.77 | bwd_inner_microstep: 1404.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465
[2024-06-10 06:16:42,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.93 | bwd_microstep: 1186.34 | bwd_inner_microstep: 1186.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 06:16:44,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1455.15 | bwd_inner_microstep: 1455.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3448
[2024-06-10 06:16:46,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.95 | bwd_microstep: 1219.96 | bwd_inner_microstep: 1219.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3813
[2024-06-10 06:16:48,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.75 | bwd_microstep: 1619.77 | bwd_inner_microstep: 1619.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 06:16:50,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.35 | bwd_microstep: 1402.71 | bwd_inner_microstep: 1402.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665
[2024-06-10 06:16:52,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.10 | bwd_microstep: 1718.93 | bwd_inner_microstep: 1718.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 06:16:58,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.39 | optimizer_step: 6.58
[2024-06-10 06:16:58,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.16 | bwd_microstep: 5115.49 | bwd_inner_microstep: 1698.78 | bwd_allreduce_microstep: 3416.63 | step_microstep: 39.61
[2024-06-10 06:16:58,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15893.86 | bwd: 45970.50 | bwd_inner: 42552.83 | bwd_allreduce: 3416.92 | step: 41.24
{'loss': 1.3184, 'learning_rate': 3.746868623352325e-05, 'epoch': 0.19}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 06:17:00,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.46 | bwd_microstep: 1471.96 | bwd_inner_microstep: 1471.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3907
[2024-06-10 06:17:02,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.25 | bwd_microstep: 1483.81 | bwd_inner_microstep: 1483.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 06:17:04,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.15 | bwd_microstep: 1483.03 | bwd_inner_microstep: 1483.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 06:17:06,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.75 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1902
[2024-06-10 06:17:07,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.31 | bwd_microstep: 715.56 | bwd_inner_microstep: 715.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 06:17:09,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.05 | bwd_microstep: 1281.96 | bwd_inner_microstep: 1281.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3588
[2024-06-10 06:17:10,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1307.89 | bwd_inner_microstep: 1307.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 06:17:12,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.59 | bwd_microstep: 1157.03 | bwd_inner_microstep: 1157.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-10 06:17:14,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1317.12 | bwd_inner_microstep: 1317.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3666
[2024-06-10 06:17:16,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.77 | bwd_microstep: 1446.83 | bwd_inner_microstep: 1446.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 06:17:18,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.07 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 06:17:20,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1353.40 | bwd_inner_microstep: 1353.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 06:17:21,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.09 | bwd_microstep: 794.29 | bwd_inner_microstep: 794.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836
[2024-06-10 06:17:23,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.31 | bwd_microstep: 1752.87 | bwd_inner_microstep: 1752.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3421
[2024-06-10 06:17:25,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.37 | bwd_microstep: 1210.86 | bwd_inner_microstep: 1210.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 06:17:27,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1490.07 | bwd_inner_microstep: 1490.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-10 06:17:29,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.34 | bwd_microstep: 1429.23 | bwd_inner_microstep: 1429.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 06:17:31,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 1422.04 | bwd_inner_microstep: 1422.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 06:17:33,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.01 | bwd_microstep: 1482.96 | bwd_inner_microstep: 1482.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 06:17:35,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.27 | bwd_microstep: 1373.06 | bwd_inner_microstep: 1373.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3598
[2024-06-10 06:17:37,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.87 | bwd_microstep: 1568.72 | bwd_inner_microstep: 1568.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-10 06:17:39,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 1294.82 | bwd_inner_microstep: 1294.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 06:17:41,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1405.70 | bwd_inner_microstep: 1405.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 06:17:42,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.57 | bwd_microstep: 1313.14 | bwd_inner_microstep: 1313.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477
[2024-06-10 06:17:44,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.21 | bwd_microstep: 1346.16 | bwd_inner_microstep: 1346.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 06:17:46,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.54 | bwd_microstep: 1458.42 | bwd_inner_microstep: 1458.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3554
[2024-06-10 06:17:48,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.86 | bwd_microstep: 1335.03 | bwd_inner_microstep: 1335.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 06:17:50,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.45 | bwd_microstep: 1194.60 | bwd_inner_microstep: 1194.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 06:17:51,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 802.68 | bwd_inner_microstep: 802.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3869
[2024-06-10 06:17:53,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.97 | bwd_microstep: 1667.88 | bwd_inner_microstep: 1667.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 06:17:55,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.99 | bwd_microstep: 1289.45 | bwd_inner_microstep: 1289.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 06:17:59,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.31 | optimizer_step: 6.59
[2024-06-10 06:17:59,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.18 | bwd_microstep: 3389.42 | bwd_inner_microstep: 1566.94 | bwd_allreduce_microstep: 1822.43 | step_microstep: 38.98
[2024-06-10 06:17:59,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16091.83 | bwd: 44797.82 | bwd_inner: 42974.46 | bwd_allreduce: 1822.65 | step: 40.59
{'loss': 1.3123, 'learning_rate': 3.745037861953274e-05, 'epoch': 0.19}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 06:18:01,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.23 | bwd_microstep: 1332.10 | bwd_inner_microstep: 1331.93 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3903
[2024-06-10 06:18:03,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.95 | bwd_microstep: 1585.28 | bwd_inner_microstep: 1585.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462
[2024-06-10 06:18:05,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.88 | bwd_microstep: 1309.53 | bwd_inner_microstep: 1309.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 06:18:07,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1348.16 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2269
[2024-06-10 06:18:08,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.24 | bwd_microstep: 904.74 | bwd_inner_microstep: 904.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474
[2024-06-10 06:18:10,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.96 | bwd_microstep: 1246.45 | bwd_inner_microstep: 1246.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1869
[2024-06-10 06:18:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.10 | bwd_microstep: 744.52 | bwd_inner_microstep: 744.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 06:18:13,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.54 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 06:18:15,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1398.00 | bwd_inner_microstep: 1397.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 06:18:16,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.60 | bwd_microstep: 1254.24 | bwd_inner_microstep: 1254.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 06:18:17,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.37 | bwd_microstep: 703.97 | bwd_inner_microstep: 703.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 06:18:19,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.90 | bwd_microstep: 1401.78 | bwd_inner_microstep: 1401.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-10 06:18:20,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.68 | bwd_microstep: 715.53 | bwd_inner_microstep: 715.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954
[2024-06-10 06:18:22,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.04 | bwd_microstep: 893.54 | bwd_inner_microstep: 893.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2762
[2024-06-10 06:18:23,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.55 | bwd_microstep: 1238.74 | bwd_inner_microstep: 1238.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2158
[2024-06-10 06:18:25,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.76 | bwd_microstep: 949.22 | bwd_inner_microstep: 949.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-10 06:18:27,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1421.26 | bwd_inner_microstep: 1421.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699
[2024-06-10 06:18:28,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1332.56 | bwd_inner_microstep: 1332.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 06:18:30,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1410.55 | bwd_inner_microstep: 1410.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 06:18:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1282.97 | bwd_inner_microstep: 1282.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 06:18:34,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.52 | bwd_microstep: 1254.68 | bwd_inner_microstep: 1254.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3461
[2024-06-10 06:18:36,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.26 | bwd_microstep: 1443.54 | bwd_inner_microstep: 1443.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 06:18:38,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1398.99 | bwd_inner_microstep: 1398.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 06:18:40,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.75 | bwd_microstep: 1395.82 | bwd_inner_microstep: 1395.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 06:18:42,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.87 | bwd_microstep: 1544.28 | bwd_inner_microstep: 1544.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 06:18:44,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.65 | bwd_microstep: 1637.30 | bwd_inner_microstep: 1637.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 06:18:46,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.20 | bwd_microstep: 1282.64 | bwd_inner_microstep: 1282.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593
[2024-06-10 06:18:48,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.98 | bwd_microstep: 1341.08 | bwd_inner_microstep: 1341.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 06:18:50,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1511.23 | bwd_inner_microstep: 1511.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 06:18:52,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.75 | bwd_microstep: 1343.74 | bwd_inner_microstep: 1343.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3424
[2024-06-10 06:18:53,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.78 | bwd_microstep: 1298.93 | bwd_inner_microstep: 1298.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3577
[2024-06-10 06:19:00,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.38 | optimizer_step: 6.58
[2024-06-10 06:19:00,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 6102.93 | bwd_inner_microstep: 1530.22 | bwd_allreduce_microstep: 4572.64 | step_microstep: 39.73
[2024-06-10 06:19:00,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15325.66 | bwd: 45412.93 | bwd_inner: 40839.21 | bwd_allreduce: 4572.95 | step: 41.37
{'loss': 1.3225, 'learning_rate': 3.743200954539945e-05, 'epoch': 0.19}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 06:19:02,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.10 | bwd_microstep: 1404.55 | bwd_inner_microstep: 1404.38 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4117
[2024-06-10 06:19:04,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.85 | bwd_microstep: 1734.09 | bwd_inner_microstep: 1734.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 06:19:07,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.71 | bwd_microstep: 1479.98 | bwd_inner_microstep: 1479.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2308
[2024-06-10 06:19:08,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.27 | bwd_microstep: 790.64 | bwd_inner_microstep: 790.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 06:19:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1382.26 | bwd_inner_microstep: 1382.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 06:19:11,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.43 | bwd_microstep: 1280.83 | bwd_inner_microstep: 1280.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 06:19:13,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1398.12 | bwd_inner_microstep: 1398.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 06:19:15,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1248.23 | bwd_inner_microstep: 1248.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 06:19:17,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1248.12 | bwd_inner_microstep: 1248.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 06:19:19,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1310.53 | bwd_inner_microstep: 1310.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 06:19:20,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.87 | bwd_microstep: 1285.65 | bwd_inner_microstep: 1285.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 06:19:23,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.62 | bwd_microstep: 1616.24 | bwd_inner_microstep: 1616.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3657
[2024-06-10 06:19:25,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.05 | bwd_microstep: 1474.44 | bwd_inner_microstep: 1474.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451
[2024-06-10 06:19:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.51 | bwd_microstep: 1455.89 | bwd_inner_microstep: 1455.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512
[2024-06-10 06:19:29,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1418.74 | bwd_inner_microstep: 1418.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 06:19:30,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1389.04 | bwd_inner_microstep: 1389.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 06:19:32,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1419.13 | bwd_inner_microstep: 1419.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 06:19:34,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1277.72 | bwd_inner_microstep: 1277.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 06:19:36,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1494.10 | bwd_inner_microstep: 1494.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490
[2024-06-10 06:19:38,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.35 | bwd_microstep: 1348.64 | bwd_inner_microstep: 1348.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-10 06:19:40,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.55 | bwd_microstep: 1326.17 | bwd_inner_microstep: 1326.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3480
[2024-06-10 06:19:42,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1429.22 | bwd_inner_microstep: 1429.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 06:19:44,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1408.49 | bwd_inner_microstep: 1408.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 06:19:46,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.52 | bwd_microstep: 1534.04 | bwd_inner_microstep: 1534.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2285
[2024-06-10 06:19:47,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.76 | bwd_microstep: 1072.24 | bwd_inner_microstep: 1072.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3814
[2024-06-10 06:19:50,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.73 | bwd_microstep: 1506.46 | bwd_inner_microstep: 1506.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 06:19:51,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1416.66 | bwd_inner_microstep: 1416.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3452
[2024-06-10 06:19:53,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1384.81 | bwd_inner_microstep: 1384.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 06:19:55,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.35 | bwd_microstep: 1303.80 | bwd_inner_microstep: 1303.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191
[2024-06-10 06:19:57,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.47 | bwd_microstep: 956.45 | bwd_inner_microstep: 956.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3570
[2024-06-10 06:19:59,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.63 | bwd_microstep: 1478.11 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3572
[2024-06-10 06:20:02,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 06:20:02,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.45 | bwd_microstep: 2447.17 | bwd_inner_microstep: 1734.29 | bwd_allreduce_microstep: 712.83 | step_microstep: 38.69
[2024-06-10 06:20:02,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16460.14 | bwd: 44720.62 | bwd_inner: 44006.73 | bwd_allreduce: 713.14 | step: 40.39
{'loss': 1.2968, 'learning_rate': 3.7413579075819166e-05, 'epoch': 0.19}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3522
[2024-06-10 06:20:04,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.16 | bwd_microstep: 1322.29 | bwd_inner_microstep: 1322.20 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 06:20:05,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.79 | bwd_microstep: 799.49 | bwd_inner_microstep: 799.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3995
[2024-06-10 06:20:07,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.22 | bwd_microstep: 1656.19 | bwd_inner_microstep: 1656.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4289
[2024-06-10 06:20:09,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.13 | bwd_microstep: 1625.53 | bwd_inner_microstep: 1625.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 06:20:11,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1488.93 | bwd_inner_microstep: 1488.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058
[2024-06-10 06:20:12,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.11 | bwd_microstep: 815.72 | bwd_inner_microstep: 815.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 06:20:14,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.40 | bwd_microstep: 1251.61 | bwd_inner_microstep: 1251.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2071
[2024-06-10 06:20:15,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.02 | bwd_microstep: 881.56 | bwd_inner_microstep: 881.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 06:20:16,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.69 | bwd_microstep: 796.09 | bwd_inner_microstep: 796.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 06:20:19,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1533.48 | bwd_inner_microstep: 1533.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 06:20:20,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.26 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 06:20:22,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.26 | bwd_microstep: 1387.56 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995
[2024-06-10 06:20:24,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.95 | bwd_microstep: 896.09 | bwd_inner_microstep: 896.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1947
[2024-06-10 06:20:25,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.51 | bwd_microstep: 824.12 | bwd_inner_microstep: 824.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105
[2024-06-10 06:20:26,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.77 | bwd_microstep: 921.85 | bwd_inner_microstep: 921.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-10 06:20:28,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.07 | bwd_microstep: 1523.12 | bwd_inner_microstep: 1523.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3624
[2024-06-10 06:20:30,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.96 | bwd_microstep: 1277.58 | bwd_inner_microstep: 1277.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 06:20:32,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.37 | bwd_microstep: 1517.94 | bwd_inner_microstep: 1517.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 06:20:34,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 06:20:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1396.77 | bwd_inner_microstep: 1396.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003
[2024-06-10 06:20:37,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.60 | bwd_microstep: 900.26 | bwd_inner_microstep: 900.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 06:20:39,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.99 | bwd_microstep: 1475.13 | bwd_inner_microstep: 1475.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 06:20:41,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1257.77 | bwd_inner_microstep: 1257.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 06:20:42,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.91 | bwd_microstep: 805.98 | bwd_inner_microstep: 805.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462
[2024-06-10 06:20:44,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.78 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 06:20:46,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1183.97 | bwd_inner_microstep: 1183.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 06:20:47,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.59 | bwd_microstep: 1294.73 | bwd_inner_microstep: 1294.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 06:20:49,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.60 | bwd_microstep: 1345.43 | bwd_inner_microstep: 1345.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3441
[2024-06-10 06:20:51,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.44 | bwd_microstep: 1376.91 | bwd_inner_microstep: 1376.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 06:20:53,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1506.11 | bwd_inner_microstep: 1506.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3743
[2024-06-10 06:20:56,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.23 | bwd_microstep: 1737.89 | bwd_inner_microstep: 1737.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-10 06:21:03,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.26 | optimizer_step: 6.58
[2024-06-10 06:21:03,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.32 | bwd_microstep: 7021.06 | bwd_inner_microstep: 1753.42 | bwd_allreduce_microstep: 5267.59 | step_microstep: 39.01
[2024-06-10 06:21:03,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15209.54 | bwd: 45980.10 | bwd_inner: 40711.52 | bwd_allreduce: 5267.86 | step: 40.78
{'loss': 1.3274, 'learning_rate': 3.73950872757039e-05, 'epoch': 0.19}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 06:21:05,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1439.26 | bwd_inner_microstep: 1439.18 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 06:21:07,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1505.50 | bwd_inner_microstep: 1505.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4281
[2024-06-10 06:21:10,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.17 | bwd_microstep: 1769.12 | bwd_inner_microstep: 1769.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834
[2024-06-10 06:21:12,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.05 | bwd_microstep: 1654.91 | bwd_inner_microstep: 1654.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 06:21:14,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.33 | bwd_microstep: 1547.47 | bwd_inner_microstep: 1547.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 06:21:16,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1284.01 | bwd_inner_microstep: 1283.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 06:21:18,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1384.90 | bwd_inner_microstep: 1384.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.72
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 06:21:20,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.48 | bwd_microstep: 1532.43 | bwd_inner_microstep: 1532.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 06:21:22,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.50 | bwd_microstep: 1345.92 | bwd_inner_microstep: 1345.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 06:21:24,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1284.20 | bwd_inner_microstep: 1284.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 06:21:25,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1394.86 | bwd_inner_microstep: 1394.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484
[2024-06-10 06:21:27,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.01 | bwd_microstep: 1413.10 | bwd_inner_microstep: 1413.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 06:21:29,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.24 | bwd_microstep: 1479.05 | bwd_inner_microstep: 1479.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 06:21:31,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.86 | bwd_microstep: 1262.65 | bwd_inner_microstep: 1262.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 06:21:33,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.75 | bwd_microstep: 1481.70 | bwd_inner_microstep: 1481.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-10 06:21:35,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.44 | bwd_microstep: 1452.51 | bwd_inner_microstep: 1452.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3571
[2024-06-10 06:21:37,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.74 | bwd_microstep: 1461.00 | bwd_inner_microstep: 1460.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 06:21:39,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.47 | bwd_microstep: 1485.47 | bwd_inner_microstep: 1485.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 06:21:41,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1391.59 | bwd_inner_microstep: 1391.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529
[2024-06-10 06:21:43,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.88 | bwd_microstep: 1324.96 | bwd_inner_microstep: 1324.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932
[2024-06-10 06:21:44,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.02 | bwd_microstep: 700.12 | bwd_inner_microstep: 700.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2403
[2024-06-10 06:21:45,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.30 | bwd_microstep: 843.17 | bwd_inner_microstep: 843.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 06:21:47,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.84 | bwd_microstep: 1523.48 | bwd_inner_microstep: 1523.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 06:21:49,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1502.35 | bwd_inner_microstep: 1502.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2116
[2024-06-10 06:21:51,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.05 | bwd_microstep: 972.79 | bwd_inner_microstep: 972.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068
[2024-06-10 06:21:52,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.75 | bwd_microstep: 915.30 | bwd_inner_microstep: 915.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 06:21:54,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 1396.28 | bwd_inner_microstep: 1396.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 06:21:56,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1420.18 | bwd_inner_microstep: 1420.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3755
[2024-06-10 06:21:58,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1391.81 | bwd_inner_microstep: 1391.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 06:22:00,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.56 | bwd_microstep: 1497.65 | bwd_inner_microstep: 1497.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-10 06:22:02,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1416.39 | bwd_inner_microstep: 1416.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3730
[2024-06-10 06:22:04,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.65
[2024-06-10 06:22:04,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.45 | bwd_microstep: 1481.63 | bwd_inner_microstep: 1473.97 | bwd_allreduce_microstep: 7.61 | step_microstep: 38.25
[2024-06-10 06:22:04,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16441.96 | bwd: 43955.78 | bwd_inner: 43947.20 | bwd_allreduce: 7.88 | step: 41.64


 19%|█▊        | 322/1726 [5:38:32<24:18:44, 62.34s/it]
 19%|█▊        | 323/1726 [5:39:35<24:16:47, 62.30s/it]
 19%|█▉        | 324/1726 [5:40:36<24:08:17, 61.98s/it]
 19%|█▉        | 325/1726 [5:41:37<24:01:00, 61.71s/it]
 19%|█▉        | 326/1726 [5:42:38<23:58:46, 61.66s/it]
 19%|█▉        | 327/1726 [5:43:40<23:56:56, 61.63s/it]
 19%|█▉        | 328/1726 [5:44:41<23:49:52,
{'loss': 1.3188, 'learning_rate': 3.737653421018168e-05, 'epoch': 0.19}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 06:22:06,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.57 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.77 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 06:22:08,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.94 | bwd_microstep: 1251.50 | bwd_inner_microstep: 1251.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841
[2024-06-10 06:22:10,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.88 | bwd_microstep: 1658.46 | bwd_inner_microstep: 1658.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 06:22:12,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.23 | bwd_microstep: 1278.72 | bwd_inner_microstep: 1278.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 06:22:14,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.58 | bwd_microstep: 1540.09 | bwd_inner_microstep: 1540.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 06:22:15,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.24 | bwd_microstep: 796.62 | bwd_inner_microstep: 796.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 06:22:17,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.67 | bwd_microstep: 1550.19 | bwd_inner_microstep: 1550.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-10 06:22:19,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.89 | bwd_microstep: 1552.59 | bwd_inner_microstep: 1552.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 06:22:21,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.66 | bwd_microstep: 1148.86 | bwd_inner_microstep: 1148.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 06:22:23,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.17 | bwd_microstep: 1289.69 | bwd_inner_microstep: 1289.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2174
[2024-06-10 06:22:24,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.45 | bwd_microstep: 884.96 | bwd_inner_microstep: 884.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 06:22:26,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.29 | bwd_microstep: 1397.31 | bwd_inner_microstep: 1397.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 06:22:27,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.75 | bwd_microstep: 788.30 | bwd_inner_microstep: 788.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988
[2024-06-10 06:22:28,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.17 | bwd_microstep: 895.15 | bwd_inner_microstep: 895.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 06:22:30,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.38 | bwd_microstep: 1476.34 | bwd_inner_microstep: 1476.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3658
[2024-06-10 06:22:32,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.50 | bwd_microstep: 1562.77 | bwd_inner_microstep: 1562.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 06:22:34,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.57 | bwd_microstep: 1617.67 | bwd_inner_microstep: 1617.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 06:22:37,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.10 | bwd_microstep: 1514.41 | bwd_inner_microstep: 1514.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2169
[2024-06-10 06:22:38,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.59 | bwd_microstep: 853.81 | bwd_inner_microstep: 853.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 06:22:39,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.90 | bwd_microstep: 801.09 | bwd_inner_microstep: 801.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-10 06:22:41,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1637.96 | bwd_inner_microstep: 1637.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3838
[2024-06-10 06:22:43,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.06 | bwd_microstep: 1266.49 | bwd_inner_microstep: 1266.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465
[2024-06-10 06:22:45,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1216.26 | bwd_inner_microstep: 1216.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 06:22:47,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.62 | bwd_microstep: 1403.90 | bwd_inner_microstep: 1403.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 06:22:48,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.81 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2284
[2024-06-10 06:22:50,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.22 | bwd_microstep: 910.23 | bwd_inner_microstep: 910.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 06:22:52,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1398.66 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 06:22:53,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.96 | bwd_microstep: 1296.83 | bwd_inner_microstep: 1296.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 06:22:55,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1257.24 | bwd_inner_microstep: 1257.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2269
[2024-06-10 06:22:56,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.50 | bwd_microstep: 842.04 | bwd_inner_microstep: 842.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3582
[2024-06-10 06:22:59,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.68 | bwd_microstep: 1697.27 | bwd_inner_microstep: 1697.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 06:23:04,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.31 | optimizer_step: 6.62
[2024-06-10 06:23:04,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.62 | bwd_microstep: 5266.21 | bwd_inner_microstep: 1585.47 | bwd_allreduce_microstep: 3680.67 | step_microstep: 39.46
[2024-06-10 06:23:04,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15368.93 | bwd: 44721.96 | bwd_inner: 41040.29 | bwd_allreduce: 3680.96 | step: 41.16
{'loss': 1.3459, 'learning_rate': 3.7357919944596305e-05, 'epoch': 0.19}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 06:23:06,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.51 | bwd_microstep: 1391.79 | bwd_inner_microstep: 1391.71 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4321
[2024-06-10 06:23:09,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.61 | bwd_microstep: 1699.35 | bwd_inner_microstep: 1699.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2404
[2024-06-10 06:23:10,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.20 | bwd_microstep: 1002.55 | bwd_inner_microstep: 1002.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270
[2024-06-10 06:23:11,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.06 | bwd_microstep: 871.48 | bwd_inner_microstep: 871.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 06:23:13,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1245.23 | bwd_inner_microstep: 1245.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 06:23:15,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1247.16 | bwd_inner_microstep: 1247.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3426
[2024-06-10 06:23:16,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.20 | bwd_microstep: 1185.72 | bwd_inner_microstep: 1185.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3721
[2024-06-10 06:23:18,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1268.11 | bwd_inner_microstep: 1268.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 06:23:20,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1389.10 | bwd_inner_microstep: 1389.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 06:23:22,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.04 | bwd_microstep: 1484.47 | bwd_inner_microstep: 1484.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 06:23:24,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.54 | bwd_microstep: 1480.93 | bwd_inner_microstep: 1480.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 06:23:26,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.77 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1982
[2024-06-10 06:23:27,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.37 | bwd_microstep: 856.93 | bwd_inner_microstep: 856.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 06:23:29,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1488.75 | bwd_inner_microstep: 1488.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 06:23:31,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1374.74 | bwd_inner_microstep: 1374.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3890
[2024-06-10 06:23:34,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 688.01 | bwd_microstep: 1890.63 | bwd_inner_microstep: 1890.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 06:23:36,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.16 | bwd_microstep: 1276.84 | bwd_inner_microstep: 1276.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3393
[2024-06-10 06:23:38,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1437.46 | bwd_inner_microstep: 1437.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 06:23:40,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.21 | bwd_microstep: 1503.39 | bwd_inner_microstep: 1503.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3534
[2024-06-10 06:23:42,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.71 | bwd_microstep: 1589.31 | bwd_inner_microstep: 1589.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3442
[2024-06-10 06:23:44,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.47 | bwd_microstep: 1411.01 | bwd_inner_microstep: 1410.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 06:23:45,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.46 | bwd_microstep: 972.73 | bwd_inner_microstep: 972.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3444
[2024-06-10 06:23:47,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.56 | bwd_microstep: 1189.46 | bwd_inner_microstep: 1189.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 06:23:49,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1463.93 | bwd_inner_microstep: 1463.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 06:23:51,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1262.53 | bwd_inner_microstep: 1262.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 06:23:52,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.64 | bwd_microstep: 1183.65 | bwd_inner_microstep: 1183.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 06:23:54,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1255.50 | bwd_inner_microstep: 1255.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 06:23:56,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.88 | bwd_microstep: 1406.08 | bwd_inner_microstep: 1406.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 06:23:58,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.50 | bwd_microstep: 1606.59 | bwd_inner_microstep: 1606.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 06:24:00,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.92 | bwd_microstep: 1499.76 | bwd_inner_microstep: 1499.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-10 06:24:02,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.83 | bwd_microstep: 1449.68 | bwd_inner_microstep: 1449.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588
[2024-06-10 06:24:05,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 06:24:05,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.89 | bwd_microstep: 2044.71 | bwd_inner_microstep: 1717.96 | bwd_allreduce_microstep: 326.70 | step_microstep: 38.43
[2024-06-10 06:24:05,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16247.56 | bwd: 43811.63 | bwd_inner: 43483.96 | bwd_allreduce: 326.96 | step: 40.05
{'loss': 1.2818, 'learning_rate': 3.733924454450711e-05, 'epoch': 0.19}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 06:24:07,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1476.72 | bwd_inner_microstep: 1476.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 06:24:09,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1313.11 | bwd_inner_microstep: 1313.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 06:24:10,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.41 | bwd_microstep: 1250.22 | bwd_inner_microstep: 1250.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 06:24:12,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.00 | bwd_microstep: 1356.75 | bwd_inner_microstep: 1356.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795
[2024-06-10 06:24:14,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.38 | bwd_microstep: 1448.08 | bwd_inner_microstep: 1448.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 06:24:16,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1245.07 | bwd_inner_microstep: 1245.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 06:24:18,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.57 | bwd_microstep: 1277.46 | bwd_inner_microstep: 1277.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 06:24:20,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.87 | bwd_microstep: 1253.71 | bwd_inner_microstep: 1253.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2096
[2024-06-10 06:24:21,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.13 | bwd_microstep: 822.80 | bwd_inner_microstep: 822.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469
[2024-06-10 06:24:22,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.20 | bwd_microstep: 1215.47 | bwd_inner_microstep: 1215.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 06:24:24,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.93 | bwd_microstep: 1292.63 | bwd_inner_microstep: 1292.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 06:24:25,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.21 | bwd_microstep: 730.27 | bwd_inner_microstep: 730.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297
[2024-06-10 06:24:26,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.97 | bwd_microstep: 880.90 | bwd_inner_microstep: 880.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3496
[2024-06-10 06:24:29,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.26 | bwd_microstep: 1532.82 | bwd_inner_microstep: 1532.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 06:24:31,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1513.37 | bwd_inner_microstep: 1513.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3609
[2024-06-10 06:24:33,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.84 | bwd_microstep: 1672.80 | bwd_inner_microstep: 1672.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3435
[2024-06-10 06:24:35,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.50 | bwd_microstep: 1218.17 | bwd_inner_microstep: 1218.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-10 06:24:36,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.38 | bwd_microstep: 1320.49 | bwd_inner_microstep: 1320.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 06:24:38,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 792.85 | bwd_inner_microstep: 792.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 06:24:40,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.06 | bwd_microstep: 1561.20 | bwd_inner_microstep: 1561.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 06:24:42,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.30 | bwd_microstep: 1401.06 | bwd_inner_microstep: 1401.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 06:24:44,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1415.14 | bwd_inner_microstep: 1415.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 06:24:46,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.54 | bwd_microstep: 1491.81 | bwd_inner_microstep: 1491.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 06:24:47,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1279.75 | bwd_inner_microstep: 1279.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 06:24:49,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.64 | bwd_microstep: 1246.73 | bwd_inner_microstep: 1246.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 06:24:51,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.69 | bwd_microstep: 1345.51 | bwd_inner_microstep: 1345.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3744
[2024-06-10 06:24:53,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.10 | bwd_microstep: 1598.12 | bwd_inner_microstep: 1598.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 06:24:55,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1388.21 | bwd_inner_microstep: 1388.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055
[2024-06-10 06:24:56,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.48 | bwd_microstep: 817.54 | bwd_inner_microstep: 817.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 06:24:58,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1395.73 | bwd_inner_microstep: 1395.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 06:25:00,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1348.79 | bwd_inner_microstep: 1348.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3731
[2024-06-10 06:25:05,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 06:25:05,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.86 | bwd_microstep: 3991.06 | bwd_inner_microstep: 1894.82 | bwd_allreduce_microstep: 2096.18 | step_microstep: 38.87
[2024-06-10 06:25:05,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15633.80 | bwd: 43894.36 | bwd_inner: 41797.23 | bwd_allreduce: 2096.41 | step: 40.47
{'loss': 1.2829, 'learning_rate': 3.732050807568878e-05, 'epoch': 0.19}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 06:25:06,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1245.57 | bwd_inner_microstep: 1245.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4002
[2024-06-10 06:25:08,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1410.19 | bwd_inner_microstep: 1410.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 06:25:10,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1380.58 | bwd_inner_microstep: 1380.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842
[2024-06-10 06:25:12,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.75 | bwd_microstep: 1463.50 | bwd_inner_microstep: 1463.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 06:25:14,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1281.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 06:25:16,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.13 | bwd_microstep: 1534.44 | bwd_inner_microstep: 1534.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753
[2024-06-10 06:25:18,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.65 | bwd_microstep: 1469.73 | bwd_inner_microstep: 1469.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1868
[2024-06-10 06:25:19,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.17 | bwd_microstep: 744.10 | bwd_inner_microstep: 744.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3432
[2024-06-10 06:25:21,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.35 | bwd_microstep: 1188.51 | bwd_inner_microstep: 1188.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 06:25:23,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.73 | bwd_microstep: 1294.82 | bwd_inner_microstep: 1294.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 06:25:25,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.79 | bwd_microstep: 1322.13 | bwd_inner_microstep: 1322.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3689
[2024-06-10 06:25:27,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.38 | bwd_microstep: 1569.45 | bwd_inner_microstep: 1569.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 06:25:29,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.22 | bwd_microstep: 1490.24 | bwd_inner_microstep: 1490.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-10 06:25:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.72 | bwd_microstep: 1437.94 | bwd_inner_microstep: 1437.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2146
[2024-06-10 06:25:32,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.03 | bwd_microstep: 757.92 | bwd_inner_microstep: 757.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 06:25:34,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.51 | bwd_microstep: 1340.00 | bwd_inner_microstep: 1339.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 06:25:36,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1516.09 | bwd_inner_microstep: 1516.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3424
[2024-06-10 06:25:38,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.97 | bwd_microstep: 1409.97 | bwd_inner_microstep: 1409.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150
[2024-06-10 06:25:39,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.07 | bwd_microstep: 851.27 | bwd_inner_microstep: 851.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-10 06:25:41,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.15 | bwd_microstep: 1435.32 | bwd_inner_microstep: 1435.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 06:25:43,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1514.23 | bwd_inner_microstep: 1514.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705
[2024-06-10 06:25:45,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.20 | bwd_microstep: 1437.00 | bwd_inner_microstep: 1436.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2084
[2024-06-10 06:25:46,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.31 | bwd_microstep: 921.81 | bwd_inner_microstep: 921.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 06:25:48,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.44 | bwd_microstep: 1191.26 | bwd_inner_microstep: 1191.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 06:25:50,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.43 | bwd_microstep: 1287.43 | bwd_inner_microstep: 1287.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 06:25:52,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.61 | bwd_microstep: 1391.94 | bwd_inner_microstep: 1391.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2039
[2024-06-10 06:25:53,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.34 | bwd_microstep: 969.58 | bwd_inner_microstep: 969.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560
[2024-06-10 06:25:55,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.22 | bwd_microstep: 1426.46 | bwd_inner_microstep: 1426.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 06:25:57,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.51 | bwd_microstep: 1257.49 | bwd_inner_microstep: 1257.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3600
[2024-06-10 06:25:59,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.54 | bwd_microstep: 1450.07 | bwd_inner_microstep: 1450.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3430
[2024-06-10 06:26:00,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.27 | bwd_microstep: 1313.08 | bwd_inner_microstep: 1313.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 06:26:08,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.38 | optimizer_step: 6.59
[2024-06-10 06:26:08,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.52 | bwd_microstep: 6483.70 | bwd_inner_microstep: 1659.02 | bwd_allreduce_microstep: 4824.62 | step_microstep: 39.64
[2024-06-10 06:26:08,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15710.66 | bwd: 46787.86 | bwd_inner: 41962.31 | bwd_allreduce: 4824.86 | step: 41.37
{'loss': 1.285, 'learning_rate': 3.730171060413103e-05, 'epoch': 0.19}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 06:26:09,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.58 | bwd_microstep: 778.32 | bwd_inner_microstep: 778.20 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 06:26:11,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.21 | bwd_microstep: 1374.86 | bwd_inner_microstep: 1374.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 06:26:13,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.87 | bwd_microstep: 1476.63 | bwd_inner_microstep: 1476.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-10 06:26:14,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.81 | bwd_microstep: 1309.09 | bwd_inner_microstep: 1309.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3531
[2024-06-10 06:26:16,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.94 | bwd_microstep: 1338.92 | bwd_inner_microstep: 1338.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 06:26:18,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 06:26:19,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.66 | bwd_microstep: 699.35 | bwd_inner_microstep: 699.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1959
[2024-06-10 06:26:20,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.51 | bwd_microstep: 733.15 | bwd_inner_microstep: 733.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692
[2024-06-10 06:26:22,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1523.93 | bwd_inner_microstep: 1523.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 06:26:24,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.80 | bwd_microstep: 1286.96 | bwd_inner_microstep: 1286.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 06:26:26,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1296.44 | bwd_inner_microstep: 1296.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 06:26:28,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 1287.99 | bwd_inner_microstep: 1287.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2099
[2024-06-10 06:26:29,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.00 | bwd_microstep: 1013.08 | bwd_inner_microstep: 1013.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 06:26:31,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.41 | bwd_microstep: 1603.02 | bwd_inner_microstep: 1602.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3427
[2024-06-10 06:26:33,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.47 | bwd_microstep: 1542.02 | bwd_inner_microstep: 1541.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 06:26:35,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.74 | bwd_microstep: 1297.47 | bwd_inner_microstep: 1297.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 06:26:37,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1415.65 | bwd_inner_microstep: 1415.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 06:26:39,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.29 | bwd_microstep: 1459.82 | bwd_inner_microstep: 1459.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 06:26:41,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3833
[2024-06-10 06:26:43,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.24 | bwd_microstep: 1461.33 | bwd_inner_microstep: 1461.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3614
[2024-06-10 06:26:45,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.27 | bwd_microstep: 1569.76 | bwd_inner_microstep: 1569.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 06:26:47,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1374.63 | bwd_inner_microstep: 1374.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2989
[2024-06-10 06:26:49,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.17 | bwd_microstep: 1142.69 | bwd_inner_microstep: 1142.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3535
[2024-06-10 06:26:50,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.82 | bwd_microstep: 1233.63 | bwd_inner_microstep: 1233.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-10 06:26:52,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.86 | bwd_microstep: 1441.94 | bwd_inner_microstep: 1441.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 06:26:55,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1518.66 | bwd_inner_microstep: 1518.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819
[2024-06-10 06:26:57,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.82 | bwd_microstep: 1682.36 | bwd_inner_microstep: 1682.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3532
[2024-06-10 06:26:59,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1585.36 | bwd_inner_microstep: 1585.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585
[2024-06-10 06:27:01,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.78 | bwd_microstep: 1306.18 | bwd_inner_microstep: 1306.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3692
[2024-06-10 06:27:03,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.01 | bwd_microstep: 1791.77 | bwd_inner_microstep: 1791.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2397
[2024-06-10 06:27:05,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.47 | bwd_microstep: 1030.92 | bwd_inner_microstep: 1030.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 06:27:10,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-10 06:27:10,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.89 | bwd_microstep: 5082.04 | bwd_inner_microstep: 1638.96 | bwd_allreduce_microstep: 3443.03 | step_microstep: 38.72
[2024-06-10 06:27:10,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16061.41 | bwd: 46430.82 | bwd_inner: 42986.78 | bwd_allreduce: 3443.31 | step: 40.39
{'loss': 1.2688, 'learning_rate': 3.7282852196038495e-05, 'epoch': 0.19}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2951
[2024-06-10 06:27:12,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.63 | bwd_microstep: 1183.71 | bwd_inner_microstep: 1183.62 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-10 06:27:13,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.33 | bwd_microstep: 787.05 | bwd_inner_microstep: 787.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 06:27:14,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.53 | bwd_microstep: 790.39 | bwd_inner_microstep: 790.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 06:27:16,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1552.65 | bwd_inner_microstep: 1552.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 06:27:18,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.97 | bwd_microstep: 1341.09 | bwd_inner_microstep: 1341.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3762
[2024-06-10 06:27:20,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.97 | bwd_microstep: 1307.70 | bwd_inner_microstep: 1307.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 06:27:22,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1393.65 | bwd_inner_microstep: 1393.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1863
[2024-06-10 06:27:23,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.26 | bwd_microstep: 747.02 | bwd_inner_microstep: 746.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-10 06:27:25,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.93 | bwd_microstep: 1532.16 | bwd_inner_microstep: 1532.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413
[2024-06-10 06:27:27,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1445.15 | bwd_inner_microstep: 1445.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 06:27:29,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.45 | bwd_microstep: 1476.27 | bwd_inner_microstep: 1476.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 06:27:31,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1496.37 | bwd_inner_microstep: 1496.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3492
[2024-06-10 06:27:33,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.23 | bwd_microstep: 1535.18 | bwd_inner_microstep: 1535.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434
[2024-06-10 06:27:35,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1377.87 | bwd_inner_microstep: 1377.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 06:27:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.73 | bwd_microstep: 1294.69 | bwd_inner_microstep: 1294.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983
[2024-06-10 06:27:38,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.27 | bwd_microstep: 897.37 | bwd_inner_microstep: 897.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 06:27:39,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.10 | bwd_microstep: 731.14 | bwd_inner_microstep: 731.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 06:27:41,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.66 | bwd_microstep: 1311.79 | bwd_inner_microstep: 1311.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2424
[2024-06-10 06:27:42,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.51 | bwd_microstep: 844.50 | bwd_inner_microstep: 844.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3630
[2024-06-10 06:27:44,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1316.59 | bwd_inner_microstep: 1316.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 06:27:46,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1412.20 | bwd_inner_microstep: 1412.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-10 06:27:48,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1420.16 | bwd_inner_microstep: 1420.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 06:27:50,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.53 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 06:27:52,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1352.80 | bwd_inner_microstep: 1352.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 06:27:54,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.41 | bwd_microstep: 1354.60 | bwd_inner_microstep: 1354.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3735
[2024-06-10 06:27:56,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 1470.55 | bwd_inner_microstep: 1470.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-10 06:27:58,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1302.24 | bwd_inner_microstep: 1302.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 06:27:59,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1966
[2024-06-10 06:28:01,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.59 | bwd_microstep: 734.23 | bwd_inner_microstep: 734.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3742
[2024-06-10 06:28:03,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.37 | bwd_microstep: 1562.79 | bwd_inner_microstep: 1562.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 06:28:04,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.76 | bwd_microstep: 1253.97 | bwd_inner_microstep: 1253.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 06:28:13,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.36 | optimizer_step: 6.61
[2024-06-10 06:28:13,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 7607.11 | bwd_inner_microstep: 1544.37 | bwd_allreduce_microstep: 6062.67 | step_microstep: 39.35
[2024-06-10 06:28:13,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15187.01 | bwd: 46630.50 | bwd_inner: 40566.84 | bwd_allreduce: 6062.95 | step: 40.97
 19%|█▉        | 328/1726 [5:44:41<23:49:52, 61.37s/it]
 19%|█▉        | 329/1726 [5:45:41<23:42:27, 61.09s/it]
 19%|█▉        | 330/1726 [5:46:42<23:36:41, 60.89s/it]
 19%|█▉        | 331/1726 [5:47:41<23:28:36, 60.59s/it]
 19%|█▉        | 332/1726 [5:48:44<23:43:24, 61.27s/it]
 19%|█▉        | 333/1726 [5:49:47<23:53:22, 61.74s/it]
{'loss': 1.2531, 'learning_rate': 3.726393291783036e-05, 'epoch': 0.19}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3554
[2024-06-10 06:28:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.53 | bwd_microstep: 1590.68 | bwd_inner_microstep: 1590.56 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 06:28:16,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.71 | bwd_microstep: 1241.96 | bwd_inner_microstep: 1241.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 06:28:19,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1550.78 | bwd_inner_microstep: 1550.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 06:28:20,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753
[2024-06-10 06:28:23,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.83 | bwd_microstep: 1639.79 | bwd_inner_microstep: 1639.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 06:28:24,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1343.55 | bwd_inner_microstep: 1343.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 06:28:26,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.13 | bwd_microstep: 1151.52 | bwd_inner_microstep: 1151.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 06:28:28,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1350.12 | bwd_inner_microstep: 1350.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 06:28:30,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.30 | bwd_microstep: 1391.62 | bwd_inner_microstep: 1391.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 06:28:32,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.95 | bwd_microstep: 1391.09 | bwd_inner_microstep: 1391.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2183
[2024-06-10 06:28:33,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.39 | bwd_microstep: 980.73 | bwd_inner_microstep: 980.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2186
[2024-06-10 06:28:35,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.67 | bwd_microstep: 1050.78 | bwd_inner_microstep: 1050.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 06:28:37,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.52 | bwd_microstep: 1489.15 | bwd_inner_microstep: 1489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2121
[2024-06-10 06:28:38,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.14 | bwd_microstep: 828.88 | bwd_inner_microstep: 828.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-10 06:28:40,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.35 | bwd_microstep: 1628.56 | bwd_inner_microstep: 1628.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 06:28:42,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1390.56 | bwd_inner_microstep: 1390.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 06:28:44,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.49 | bwd_microstep: 1583.28 | bwd_inner_microstep: 1583.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989
[2024-06-10 06:28:45,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.22 | bwd_microstep: 897.85 | bwd_inner_microstep: 897.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 06:28:47,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.02 | bwd_microstep: 1350.62 | bwd_inner_microstep: 1350.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 06:28:49,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.41 | bwd_microstep: 1393.39 | bwd_inner_microstep: 1393.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3677
[2024-06-10 06:28:51,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.43 | bwd_microstep: 1690.75 | bwd_inner_microstep: 1690.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 06:28:54,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1557.78 | bwd_inner_microstep: 1557.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-10 06:28:55,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.52 | bwd_microstep: 1188.02 | bwd_inner_microstep: 1187.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 06:28:56,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.67 | bwd_microstep: 698.38 | bwd_inner_microstep: 698.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3664
[2024-06-10 06:28:58,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 1457.64 | bwd_inner_microstep: 1457.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3558
[2024-06-10 06:29:00,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.97 | bwd_microstep: 1560.18 | bwd_inner_microstep: 1560.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-10 06:29:03,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.19 | bwd_microstep: 1653.19 | bwd_inner_microstep: 1653.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 06:29:05,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.41 | bwd_microstep: 1555.03 | bwd_inner_microstep: 1555.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2068
[2024-06-10 06:29:06,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.84 | bwd_microstep: 854.21 | bwd_inner_microstep: 854.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2879
[2024-06-10 06:29:08,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.40 | bwd_microstep: 1181.44 | bwd_inner_microstep: 1181.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3591
[2024-06-10 06:29:10,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.09 | bwd_microstep: 1433.88 | bwd_inner_microstep: 1433.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2295
[2024-06-10 06:29:14,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 06:29:14,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.93 | bwd_microstep: 3620.13 | bwd_inner_microstep: 1147.16 | bwd_allreduce_microstep: 2472.91 | step_microstep: 38.94
[2024-06-10 06:29:14,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15799.84 | bwd: 44940.22 | bwd_inner: 42466.29 | bwd_allreduce: 2473.20 | step: 40.56
{'loss': 1.3125, 'learning_rate': 3.724495283614024e-05, 'epoch': 0.19}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 06:29:15,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.07 | bwd_microstep: 797.27 | bwd_inner_microstep: 797.13 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 06:29:17,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1273.00 | bwd_inner_microstep: 1272.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3798
[2024-06-10 06:29:19,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.25 | bwd_microstep: 1743.34 | bwd_inner_microstep: 1743.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3857
[2024-06-10 06:29:21,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.01 | bwd_microstep: 1660.55 | bwd_inner_microstep: 1660.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 06:29:23,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.63 | bwd_microstep: 1291.65 | bwd_inner_microstep: 1291.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4134
[2024-06-10 06:29:25,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.99 | bwd_microstep: 1738.14 | bwd_inner_microstep: 1738.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971
[2024-06-10 06:29:26,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.03 | bwd_microstep: 705.77 | bwd_inner_microstep: 705.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 06:29:27,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 791.60 | bwd_inner_microstep: 791.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1895
[2024-06-10 06:29:28,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.57 | bwd_microstep: 715.16 | bwd_inner_microstep: 715.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023
[2024-06-10 06:29:30,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.22 | bwd_microstep: 808.10 | bwd_inner_microstep: 808.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3577
[2024-06-10 06:29:31,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1364.36 | bwd_inner_microstep: 1364.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3567
[2024-06-10 06:29:34,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.92 | bwd_microstep: 1526.95 | bwd_inner_microstep: 1526.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499
[2024-06-10 06:29:36,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1584.26 | bwd_inner_microstep: 1584.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3495
[2024-06-10 06:29:38,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.42 | bwd_microstep: 1646.61 | bwd_inner_microstep: 1646.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3549
[2024-06-10 06:29:40,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1588.94 | bwd_inner_microstep: 1588.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4039
[2024-06-10 06:29:42,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.72 | bwd_microstep: 1651.21 | bwd_inner_microstep: 1651.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-10 06:29:43,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.80 | bwd_microstep: 700.94 | bwd_inner_microstep: 700.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-10 06:29:46,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.85 | bwd_microstep: 1484.02 | bwd_inner_microstep: 1484.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 06:29:47,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.54 | bwd_microstep: 1284.61 | bwd_inner_microstep: 1284.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 06:29:50,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.41 | bwd_microstep: 1618.32 | bwd_inner_microstep: 1618.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947
[2024-06-10 06:29:51,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.51 | bwd_microstep: 730.63 | bwd_inner_microstep: 730.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3524
[2024-06-10 06:29:52,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.88 | bwd_microstep: 1198.98 | bwd_inner_microstep: 1198.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 06:29:54,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.60 | bwd_microstep: 1290.62 | bwd_inner_microstep: 1290.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 06:29:56,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1397.98 | bwd_inner_microstep: 1397.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 06:29:58,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1382.29 | bwd_inner_microstep: 1382.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080
[2024-06-10 06:29:59,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.90 | bwd_microstep: 915.85 | bwd_inner_microstep: 915.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 06:30:01,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.56 | bwd_microstep: 1377.70 | bwd_inner_microstep: 1377.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3531
[2024-06-10 06:30:03,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.52 | bwd_microstep: 1524.19 | bwd_inner_microstep: 1524.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3617
[2024-06-10 06:30:05,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.14 | bwd_microstep: 1644.62 | bwd_inner_microstep: 1644.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 06:30:07,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1506.41 | bwd_inner_microstep: 1506.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3819
[2024-06-10 06:30:10,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.19 | bwd_microstep: 1751.34 | bwd_inner_microstep: 1751.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2956
[2024-06-10 06:30:17,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 06:30:17,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.03 | bwd_microstep: 6963.57 | bwd_inner_microstep: 1354.03 | bwd_allreduce_microstep: 5609.49 | step_microstep: 38.85
[2024-06-10 06:30:17,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15659.75 | bwd: 47659.01 | bwd_inner: 42048.50 | bwd_allreduce: 5609.77 | step: 40.52
{'loss': 1.2894, 'learning_rate': 3.722591201781588e-05, 'epoch': 0.19}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 06:30:19,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1431.36 | bwd_inner_microstep: 1431.25 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 06:30:21,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1249.05 | bwd_inner_microstep: 1249.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 06:30:23,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1537.99 | bwd_inner_microstep: 1537.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 06:30:25,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1341.81 | bwd_inner_microstep: 1341.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 06:30:27,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.35 | bwd_microstep: 1480.55 | bwd_inner_microstep: 1480.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 06:30:28,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.17 | bwd_microstep: 792.30 | bwd_inner_microstep: 792.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-10 06:30:29,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.90 | bwd_microstep: 799.17 | bwd_inner_microstep: 799.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 06:30:30,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.90 | bwd_microstep: 801.14 | bwd_inner_microstep: 801.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-10 06:30:32,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.47 | bwd_microstep: 1163.38 | bwd_inner_microstep: 1163.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3686
[2024-06-10 06:30:34,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1328.71 | bwd_inner_microstep: 1328.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-10 06:30:36,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1427.18 | bwd_inner_microstep: 1427.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-10 06:30:38,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.30 | bwd_microstep: 1423.33 | bwd_inner_microstep: 1423.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1942
[2024-06-10 06:30:39,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.31 | bwd_microstep: 850.13 | bwd_inner_microstep: 850.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3655
[2024-06-10 06:30:41,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.98 | bwd_microstep: 1719.07 | bwd_inner_microstep: 1719.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3504
[2024-06-10 06:30:43,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.36 | bwd_microstep: 1367.19 | bwd_inner_microstep: 1367.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 06:30:45,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1378.61 | bwd_inner_microstep: 1378.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 06:30:47,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.84 | bwd_microstep: 1501.81 | bwd_inner_microstep: 1501.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 06:30:49,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.53 | bwd_microstep: 1650.38 | bwd_inner_microstep: 1650.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 06:30:51,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.46 | bwd_microstep: 1423.83 | bwd_inner_microstep: 1423.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 06:30:53,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.39 | bwd_microstep: 1299.90 | bwd_inner_microstep: 1299.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 06:30:55,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.44 | bwd_microstep: 1494.12 | bwd_inner_microstep: 1494.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667
[2024-06-10 06:30:57,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1327.30 | bwd_inner_microstep: 1327.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 06:30:59,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1349.29 | bwd_inner_microstep: 1349.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3554
[2024-06-10 06:31:01,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.46 | bwd_microstep: 1548.57 | bwd_inner_microstep: 1548.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3630
[2024-06-10 06:31:03,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1474.33 | bwd_inner_microstep: 1474.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 06:31:05,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3824
[2024-06-10 06:31:07,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.73 | bwd_microstep: 1754.17 | bwd_inner_microstep: 1754.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2975
[2024-06-10 06:31:09,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.00 | bwd_microstep: 1230.88 | bwd_inner_microstep: 1230.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3450
[2024-06-10 06:31:11,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1520.41 | bwd_inner_microstep: 1520.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 06:31:13,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.90 | bwd_microstep: 1595.21 | bwd_inner_microstep: 1595.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571
[2024-06-10 06:31:15,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.23 | bwd_microstep: 1433.06 | bwd_inner_microstep: 1433.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-10 06:31:18,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 06:31:18,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.24 | bwd_microstep: 2603.66 | bwd_inner_microstep: 1461.24 | bwd_allreduce_microstep: 1142.37 | step_microstep: 38.34
[2024-06-10 06:31:18,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16206.79 | bwd: 44584.90 | bwd_inner: 43441.53 | bwd_allreduce: 1142.65 | step: 40.04
{'loss': 1.3149, 'learning_rate': 3.7206810529918935e-05, 'epoch': 0.2}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942
[2024-06-10 06:31:19,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.88 | bwd_microstep: 721.50 | bwd_inner_microstep: 721.35 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3885
[2024-06-10 06:31:22,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.46 | bwd_microstep: 1535.28 | bwd_inner_microstep: 1535.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3852
[2024-06-10 06:31:24,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.33 | bwd_microstep: 1562.31 | bwd_inner_microstep: 1562.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 06:31:26,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.63 | bwd_microstep: 1344.72 | bwd_inner_microstep: 1344.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590
[2024-06-10 06:31:27,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.60 | bwd_microstep: 1308.74 | bwd_inner_microstep: 1308.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423
[2024-06-10 06:31:29,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.75 | bwd_microstep: 1155.10 | bwd_inner_microstep: 1155.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 06:31:30,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.53 | bwd_microstep: 796.66 | bwd_inner_microstep: 796.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 06:31:32,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.12 | bwd_microstep: 1249.35 | bwd_inner_microstep: 1249.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 06:31:34,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1400.00 | bwd_inner_microstep: 1399.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3700
[2024-06-10 06:31:36,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.31 | bwd_microstep: 1452.47 | bwd_inner_microstep: 1452.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 06:31:38,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.20 | bwd_microstep: 1482.61 | bwd_inner_microstep: 1482.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3502
[2024-06-10 06:31:40,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1437.48 | bwd_inner_microstep: 1437.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960
[2024-06-10 06:31:41,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.73 | bwd_microstep: 892.70 | bwd_inner_microstep: 892.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2129
[2024-06-10 06:31:42,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.30 | bwd_microstep: 926.27 | bwd_inner_microstep: 926.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606
[2024-06-10 06:31:44,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.16 | bwd_microstep: 1310.63 | bwd_inner_microstep: 1310.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 06:31:46,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1349.42 | bwd_inner_microstep: 1349.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 06:31:48,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.84 | bwd_microstep: 1529.98 | bwd_inner_microstep: 1529.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 06:31:50,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1405.56 | bwd_inner_microstep: 1405.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 06:31:52,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.22 | bwd_microstep: 1290.19 | bwd_inner_microstep: 1290.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298
[2024-06-10 06:31:53,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.83 | bwd_microstep: 976.55 | bwd_inner_microstep: 976.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3641
[2024-06-10 06:31:55,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.44 | bwd_microstep: 1583.40 | bwd_inner_microstep: 1583.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 06:31:57,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.96 | bwd_microstep: 1406.07 | bwd_inner_microstep: 1406.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 06:31:59,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.01 | bwd_microstep: 1289.53 | bwd_inner_microstep: 1289.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-10 06:32:01,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.20 | bwd_microstep: 1315.91 | bwd_inner_microstep: 1315.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 06:32:03,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1316.65 | bwd_inner_microstep: 1316.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 06:32:05,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.56 | bwd_microstep: 1559.22 | bwd_inner_microstep: 1559.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 06:32:07,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.62 | bwd_microstep: 1384.09 | bwd_inner_microstep: 1384.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381
[2024-06-10 06:32:09,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1242.69 | bwd_inner_microstep: 1242.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 06:32:11,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.40 | bwd_microstep: 1650.86 | bwd_inner_microstep: 1650.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 06:32:13,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.35 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 06:32:14,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1290.53 | bwd_inner_microstep: 1290.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3812
[2024-06-10 06:32:20,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 06:32:20,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.53 | bwd_microstep: 4927.94 | bwd_inner_microstep: 1907.08 | bwd_allreduce_microstep: 3020.82 | step_microstep: 38.79
[2024-06-10 06:32:20,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15818.94 | bwd: 45380.67 | bwd_inner: 42358.82 | bwd_allreduce: 3021.10 | step: 40.40
{'loss': 1.2819, 'learning_rate': 3.7187648439724755e-05, 'epoch': 0.2}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2017
[2024-06-10 06:32:21,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.85 | bwd_microstep: 833.80 | bwd_inner_microstep: 833.69 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 06:32:23,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.04 | bwd_microstep: 1386.07 | bwd_inner_microstep: 1386.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 06:32:25,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.95 | bwd_microstep: 1398.72 | bwd_inner_microstep: 1398.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 06:32:27,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.37 | bwd_microstep: 1458.69 | bwd_inner_microstep: 1458.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471
[2024-06-10 06:32:29,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.58 | bwd_microstep: 1215.66 | bwd_inner_microstep: 1215.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3745
[2024-06-10 06:32:31,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1486.41 | bwd_inner_microstep: 1486.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 06:32:33,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1379.75 | bwd_inner_microstep: 1379.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 06:32:34,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.01 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2370
[2024-06-10 06:32:36,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.51 | bwd_microstep: 838.72 | bwd_inner_microstep: 838.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400
[2024-06-10 06:32:37,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.65 | bwd_microstep: 1311.39 | bwd_inner_microstep: 1311.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-10 06:32:40,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.88 | bwd_microstep: 1627.92 | bwd_inner_microstep: 1627.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3412
[2024-06-10 06:32:42,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.91 | bwd_microstep: 1295.60 | bwd_inner_microstep: 1295.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3672
[2024-06-10 06:32:44,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.72 | bwd_microstep: 1585.02 | bwd_inner_microstep: 1584.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 06:32:46,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1386.35 | bwd_inner_microstep: 1386.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546
[2024-06-10 06:32:48,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1587.27 | bwd_inner_microstep: 1587.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 06:32:50,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1507.47 | bwd_inner_microstep: 1507.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645
[2024-06-10 06:32:52,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1410.59 | bwd_inner_microstep: 1410.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3638
[2024-06-10 06:32:54,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.00 | bwd_microstep: 1221.28 | bwd_inner_microstep: 1221.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 06:32:55,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.44 | bwd_microstep: 1293.87 | bwd_inner_microstep: 1293.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299
[2024-06-10 06:32:57,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.44 | bwd_microstep: 978.07 | bwd_inner_microstep: 978.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 06:32:59,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.75 | bwd_microstep: 1485.49 | bwd_inner_microstep: 1485.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 06:33:00,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1281.46 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 06:33:03,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1556.27 | bwd_inner_microstep: 1556.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 06:33:05,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1512.82 | bwd_inner_microstep: 1512.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059
[2024-06-10 06:33:06,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.23 | bwd_microstep: 817.99 | bwd_inner_microstep: 817.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2107
[2024-06-10 06:33:07,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.94 | bwd_microstep: 886.57 | bwd_inner_microstep: 886.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604
[2024-06-10 06:33:09,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.52 | bwd_microstep: 1437.71 | bwd_inner_microstep: 1437.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2043
[2024-06-10 06:33:10,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.36 | bwd_microstep: 845.03 | bwd_inner_microstep: 845.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1893
[2024-06-10 06:33:11,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.58 | bwd_microstep: 777.40 | bwd_inner_microstep: 777.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 06:33:13,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1251.10 | bwd_inner_microstep: 1251.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 06:33:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.98 | bwd_microstep: 1251.68 | bwd_inner_microstep: 1251.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2012
[2024-06-10 06:33:20,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 06:33:20,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.72 | bwd_microstep: 5294.04 | bwd_inner_microstep: 1022.30 | bwd_allreduce_microstep: 4271.69 | step_microstep: 38.65
[2024-06-10 06:33:20,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15189.69 | bwd: 44886.72 | bwd_inner: 40614.01 | bwd_allreduce: 4271.97 | step: 40.30
{'loss': 1.3032, 'learning_rate': 3.7168425814722127e-05, 'epoch': 0.2}
 19%|█▉        | 334/1726 [5:50:49<23:55:19, 61.87s/it]
 19%|█▉        | 335/1726 [5:51:50<23:48:53, 61.63s/it]
 19%|█▉        | 336/1726 [5:52:54<24:02:00, 62.24s/it]
 20%|█▉        | 337/1726 [5:53:55<23:53:21, 61.92s/it]
 20%|█▉        | 338/1726 [5:54:57<23:49:44, 61.80s/it]
 20%|█▉        | 339/1726 [5:55:57<23:39:08, 61.39s/it]
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3394
[2024-06-10 06:33:22,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 1390.70 | bwd_inner_microstep: 1390.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 06:33:24,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.86 | bwd_microstep: 1276.69 | bwd_inner_microstep: 1276.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874
[2024-06-10 06:33:26,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.44 | bwd_microstep: 1579.67 | bwd_inner_microstep: 1579.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 06:33:28,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1343.35 | bwd_inner_microstep: 1343.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899
[2024-06-10 06:33:29,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.80 | bwd_microstep: 778.34 | bwd_inner_microstep: 778.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748
[2024-06-10 06:33:32,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.77 | bwd_microstep: 1638.89 | bwd_inner_microstep: 1638.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 06:33:33,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1380.83 | bwd_inner_microstep: 1380.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1913
[2024-06-10 06:33:34,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.26 | bwd_microstep: 715.51 | bwd_inner_microstep: 715.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3489
[2024-06-10 06:33:36,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.32 | bwd_microstep: 1218.96 | bwd_inner_microstep: 1218.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 06:33:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1389.10 | bwd_inner_microstep: 1389.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 06:33:39,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.74 | bwd_microstep: 793.72 | bwd_inner_microstep: 793.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 892
[2024-06-10 06:33:40,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.78 | bwd_microstep: 372.69 | bwd_inner_microstep: 372.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 06:33:42,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.68 | bwd_microstep: 1392.33 | bwd_inner_microstep: 1392.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-10 06:33:43,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.79 | bwd_microstep: 1185.06 | bwd_inner_microstep: 1185.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3663
[2024-06-10 06:33:45,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1330.60 | bwd_inner_microstep: 1330.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3387
[2024-06-10 06:33:47,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.78 | bwd_microstep: 1273.32 | bwd_inner_microstep: 1273.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-10 06:33:49,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.17 | bwd_microstep: 1585.78 | bwd_inner_microstep: 1585.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519
[2024-06-10 06:33:51,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.02 | bwd_microstep: 1581.99 | bwd_inner_microstep: 1581.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3617
[2024-06-10 06:33:54,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.89 | bwd_microstep: 1704.25 | bwd_inner_microstep: 1704.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3817
[2024-06-10 06:33:56,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.91 | bwd_microstep: 1514.87 | bwd_inner_microstep: 1514.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3824
[2024-06-10 06:33:58,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.09 | bwd_microstep: 1601.35 | bwd_inner_microstep: 1601.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 06:34:00,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.21 | bwd_microstep: 1446.01 | bwd_inner_microstep: 1445.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 06:34:02,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.62 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 06:34:04,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.01 | bwd_microstep: 1529.66 | bwd_inner_microstep: 1529.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 06:34:06,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.51 | bwd_microstep: 1494.50 | bwd_inner_microstep: 1494.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3682
[2024-06-10 06:34:08,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1232.86 | bwd_inner_microstep: 1232.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2240
[2024-06-10 06:34:09,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.04 | bwd_microstep: 1060.42 | bwd_inner_microstep: 1060.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445
[2024-06-10 06:34:11,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.08 | bwd_microstep: 1157.41 | bwd_inner_microstep: 1157.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3776
[2024-06-10 06:34:13,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.35 | bwd_microstep: 1681.73 | bwd_inner_microstep: 1681.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 06:34:15,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.46 | bwd_microstep: 1502.36 | bwd_inner_microstep: 1502.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-10 06:34:17,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.03 | bwd_microstep: 1603.41 | bwd_inner_microstep: 1603.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228
[2024-06-10 06:34:23,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 06:34:23,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.39 | bwd_microstep: 5214.74 | bwd_inner_microstep: 1089.57 | bwd_allreduce_microstep: 4125.11 | step_microstep: 38.83
[2024-06-10 06:34:23,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15729.54 | bwd: 46370.48 | bwd_inner: 42244.43 | bwd_allreduce: 4125.35 | step: 40.43
{'loss': 1.3781, 'learning_rate': 3.714914272261302e-05, 'epoch': 0.2}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 06:34:25,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1366.97 | bwd_inner_microstep: 1366.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4022
[2024-06-10 06:34:27,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.18 | bwd_microstep: 1709.42 | bwd_inner_microstep: 1709.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2422
[2024-06-10 06:34:29,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.72 | bwd_microstep: 1002.38 | bwd_inner_microstep: 1002.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2298
[2024-06-10 06:34:30,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.82 | bwd_microstep: 877.68 | bwd_inner_microstep: 877.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 06:34:32,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.85 | bwd_microstep: 1341.93 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 06:34:33,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1250.26 | bwd_inner_microstep: 1250.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 06:34:35,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.22 | bwd_microstep: 1286.26 | bwd_inner_microstep: 1286.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 06:34:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 06:34:39,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.05 | bwd_microstep: 1247.73 | bwd_inner_microstep: 1247.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 06:34:41,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.51 | bwd_microstep: 1249.76 | bwd_inner_microstep: 1249.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 06:34:42,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1418.04 | bwd_inner_microstep: 1418.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 06:34:43,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.40 | bwd_microstep: 698.65 | bwd_inner_microstep: 698.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3449
[2024-06-10 06:34:45,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.33 | bwd_microstep: 1285.01 | bwd_inner_microstep: 1284.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3963
[2024-06-10 06:34:48,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.67 | bwd_microstep: 1800.48 | bwd_inner_microstep: 1800.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086
[2024-06-10 06:34:49,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.87 | bwd_microstep: 824.43 | bwd_inner_microstep: 824.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 06:34:51,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1390.92 | bwd_inner_microstep: 1390.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 06:34:52,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.08 | bwd_microstep: 795.00 | bwd_inner_microstep: 794.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 06:34:54,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 06:34:56,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.76 | bwd_microstep: 1475.92 | bwd_inner_microstep: 1475.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 06:34:58,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.13 | bwd_microstep: 1616.19 | bwd_inner_microstep: 1616.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 747
[2024-06-10 06:34:58,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.57 | bwd_microstep: 302.85 | bwd_inner_microstep: 302.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3559
[2024-06-10 06:35:01,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.01 | bwd_microstep: 1550.00 | bwd_inner_microstep: 1549.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-10 06:35:02,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.94 | bwd_microstep: 699.02 | bwd_inner_microstep: 698.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3712
[2024-06-10 06:35:03,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1236.33 | bwd_inner_microstep: 1236.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469
[2024-06-10 06:35:05,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.20 | bwd_microstep: 1184.29 | bwd_inner_microstep: 1184.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3558
[2024-06-10 06:35:07,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.38 | bwd_microstep: 1361.01 | bwd_inner_microstep: 1360.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 06:35:09,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.90 | bwd_microstep: 1260.32 | bwd_inner_microstep: 1260.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 06:35:10,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3767
[2024-06-10 06:35:13,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.44 | bwd_microstep: 1739.41 | bwd_inner_microstep: 1739.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3580
[2024-06-10 06:35:15,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.14 | bwd_microstep: 1563.03 | bwd_inner_microstep: 1563.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 06:35:17,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.70 | bwd_microstep: 1291.80 | bwd_inner_microstep: 1291.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1020
[2024-06-10 06:35:24,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.28 | optimizer_step: 6.56
[2024-06-10 06:35:24,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.78 | bwd_microstep: 6920.40 | bwd_inner_microstep: 535.16 | bwd_allreduce_microstep: 6385.19 | step_microstep: 39.38
[2024-06-10 06:35:24,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14804.22 | bwd: 45893.27 | bwd_inner: 39507.17 | bwd_allreduce: 6385.42 | step: 41.01
{'loss': 1.264, 'learning_rate': 3.71297992313124e-05, 'epoch': 0.2}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 06:35:26,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.58 | bwd_microstep: 1478.40 | bwd_inner_microstep: 1478.33 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878
[2024-06-10 06:35:27,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.41 | bwd_microstep: 747.83 | bwd_inner_microstep: 747.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477
[2024-06-10 06:35:29,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1340.70 | bwd_inner_microstep: 1340.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3839
[2024-06-10 06:35:31,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1483.96 | bwd_inner_microstep: 1483.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 06:35:33,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.18 | bwd_microstep: 1403.64 | bwd_inner_microstep: 1403.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3754
[2024-06-10 06:35:35,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1370.30 | bwd_inner_microstep: 1370.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2476
[2024-06-10 06:35:36,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.11 | bwd_microstep: 953.11 | bwd_inner_microstep: 953.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617
[2024-06-10 06:35:38,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1313.40 | bwd_inner_microstep: 1313.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-10 06:35:40,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.12 | bwd_microstep: 1632.24 | bwd_inner_microstep: 1632.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3736
[2024-06-10 06:35:42,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.39 | bwd_microstep: 1594.66 | bwd_inner_microstep: 1594.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3792
[2024-06-10 06:35:44,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.76 | bwd_microstep: 1469.64 | bwd_inner_microstep: 1469.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 06:35:46,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1350.07 | bwd_inner_microstep: 1350.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 06:35:48,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1319.67 | bwd_inner_microstep: 1319.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3668
[2024-06-10 06:35:50,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.09 | bwd_microstep: 1618.88 | bwd_inner_microstep: 1618.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3510
[2024-06-10 06:35:52,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1446.19 | bwd_inner_microstep: 1446.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1984
[2024-06-10 06:35:53,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.38 | bwd_microstep: 830.07 | bwd_inner_microstep: 830.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1980
[2024-06-10 06:35:55,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.39 | bwd_microstep: 925.12 | bwd_inner_microstep: 925.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 06:35:57,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.32 | bwd_microstep: 1599.67 | bwd_inner_microstep: 1599.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3445
[2024-06-10 06:35:59,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.60 | bwd_microstep: 1187.82 | bwd_inner_microstep: 1187.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 06:36:01,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.09 | bwd_microstep: 1653.66 | bwd_inner_microstep: 1653.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 06:36:03,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.90 | bwd_microstep: 1313.06 | bwd_inner_microstep: 1313.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 06:36:04,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.15 | bwd_microstep: 1160.29 | bwd_inner_microstep: 1160.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2103
[2024-06-10 06:36:05,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.16 | bwd_microstep: 824.73 | bwd_inner_microstep: 824.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 06:36:07,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.67 | bwd_microstep: 1450.45 | bwd_inner_microstep: 1450.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2187
[2024-06-10 06:36:08,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.53 | bwd_microstep: 767.23 | bwd_inner_microstep: 767.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 06:36:10,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.79 | bwd_microstep: 1256.78 | bwd_inner_microstep: 1256.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 06:36:12,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.41 | bwd_microstep: 1645.89 | bwd_inner_microstep: 1645.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 06:36:15,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.22 | bwd_microstep: 1655.75 | bwd_inner_microstep: 1655.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 06:36:17,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1554.78 | bwd_inner_microstep: 1554.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 06:36:19,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.03 | bwd_microstep: 1545.00 | bwd_inner_microstep: 1544.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 06:36:21,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.72 | bwd_microstep: 1456.94 | bwd_inner_microstep: 1456.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-10 06:36:25,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.36 | optimizer_step: 6.60
[2024-06-10 06:36:25,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 3675.34 | bwd_inner_microstep: 1478.41 | bwd_allreduce_microstep: 2196.86 | step_microstep: 39.53
[2024-06-10 06:36:25,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15955.47 | bwd: 45025.30 | bwd_inner: 42827.45 | bwd_allreduce: 2197.14 | step: 41.25
{'loss': 1.3162, 'learning_rate': 3.7110395408947937e-05, 'epoch': 0.2}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2036
[2024-06-10 06:36:27,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.79 | bwd_microstep: 900.87 | bwd_inner_microstep: 900.71 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4078
[2024-06-10 06:36:29,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.00 | bwd_microstep: 1620.62 | bwd_inner_microstep: 1620.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 06:36:31,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.70 | bwd_microstep: 1650.38 | bwd_inner_microstep: 1650.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 06:36:33,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1283.49 | bwd_inner_microstep: 1283.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 06:36:35,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 06:36:36,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1249.54 | bwd_inner_microstep: 1249.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 06:36:38,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1389.51 | bwd_inner_microstep: 1389.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753
[2024-06-10 06:36:40,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.20 | bwd_microstep: 1445.28 | bwd_inner_microstep: 1445.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 06:36:42,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1384.11 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 06:36:44,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1253.23 | bwd_inner_microstep: 1253.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-10 06:36:46,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 1417.57 | bwd_inner_microstep: 1417.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1954
[2024-06-10 06:36:47,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.40 | bwd_microstep: 858.10 | bwd_inner_microstep: 858.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 06:36:49,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1388.23 | bwd_inner_microstep: 1388.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3510
[2024-06-10 06:36:51,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.23 | bwd_microstep: 1553.97 | bwd_inner_microstep: 1553.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 06:36:53,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.60 | bwd_microstep: 1553.77 | bwd_inner_microstep: 1553.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3696
[2024-06-10 06:36:56,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.06 | bwd_microstep: 1560.73 | bwd_inner_microstep: 1560.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 06:36:57,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.20 | bwd_microstep: 1386.85 | bwd_inner_microstep: 1386.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 06:36:59,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1314.58 | bwd_inner_microstep: 1314.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687
[2024-06-10 06:37:01,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.46 | bwd_microstep: 1435.48 | bwd_inner_microstep: 1435.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 06:37:03,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1506.25 | bwd_inner_microstep: 1506.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 06:37:05,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1553.23 | bwd_inner_microstep: 1553.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612
[2024-06-10 06:37:07,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.92 | bwd_microstep: 1316.37 | bwd_inner_microstep: 1316.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520
[2024-06-10 06:37:09,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.44 | bwd_microstep: 1452.84 | bwd_inner_microstep: 1452.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 06:37:11,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.74 | bwd_microstep: 1497.05 | bwd_inner_microstep: 1497.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 06:37:13,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.41 | bwd_microstep: 1550.30 | bwd_inner_microstep: 1550.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3603
[2024-06-10 06:37:16,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.86 | bwd_microstep: 1707.49 | bwd_inner_microstep: 1707.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3816
[2024-06-10 06:37:18,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.58 | bwd_microstep: 1604.07 | bwd_inner_microstep: 1604.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581
[2024-06-10 06:37:20,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.21 | bwd_microstep: 1697.09 | bwd_inner_microstep: 1697.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3726
[2024-06-10 06:37:23,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.41 | bwd_microstep: 1837.33 | bwd_inner_microstep: 1837.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2232
[2024-06-10 06:37:24,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.96 | bwd_microstep: 993.96 | bwd_inner_microstep: 993.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 06:37:26,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.51 | bwd_microstep: 1405.57 | bwd_inner_microstep: 1405.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 06:37:28,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.21 | optimizer_step: 6.63
[2024-06-10 06:37:28,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1440.56 | bwd_inner_microstep: 1432.80 | bwd_allreduce_microstep: 7.71 | step_microstep: 38.38
[2024-06-10 06:37:28,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16988.25 | bwd: 45593.95 | bwd_inner: 45585.22 | bwd_allreduce: 7.99 | step: 40.06
{'loss': 1.2781, 'learning_rate': 3.7090931323859794e-05, 'epoch': 0.2}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-10 06:37:30,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.99 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2620
[2024-06-10 06:37:32,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.87 | bwd_microstep: 1049.33 | bwd_inner_microstep: 1049.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3873
[2024-06-10 06:37:34,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.66 | bwd_microstep: 1685.85 | bwd_inner_microstep: 1685.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-10 06:37:36,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.19 | bwd_microstep: 1539.66 | bwd_inner_microstep: 1539.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 06:37:38,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.18 | bwd_microstep: 1283.81 | bwd_inner_microstep: 1283.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1863
[2024-06-10 06:37:39,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.33 | bwd_microstep: 709.19 | bwd_inner_microstep: 709.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 06:37:41,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.58 | bwd_microstep: 1541.56 | bwd_inner_microstep: 1541.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 06:37:43,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1251.36 | bwd_inner_microstep: 1251.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 06:37:45,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.75 | bwd_microstep: 1287.28 | bwd_inner_microstep: 1287.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490
[2024-06-10 06:37:46,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.68 | bwd_microstep: 1223.55 | bwd_inner_microstep: 1223.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 06:37:48,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.51 | bwd_microstep: 1248.24 | bwd_inner_microstep: 1248.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1943
[2024-06-10 06:37:49,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.91 | bwd_microstep: 812.71 | bwd_inner_microstep: 812.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3403
[2024-06-10 06:37:51,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1294.70 | bwd_inner_microstep: 1294.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 06:37:53,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1490.61 | bwd_inner_microstep: 1490.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2183
[2024-06-10 06:37:54,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.22 | bwd_microstep: 985.76 | bwd_inner_microstep: 985.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 06:37:56,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.86 | bwd_microstep: 1452.36 | bwd_inner_microstep: 1452.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 06:37:58,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1256.68 | bwd_inner_microstep: 1256.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2480
[2024-06-10 06:37:59,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.95 | bwd_microstep: 956.03 | bwd_inner_microstep: 956.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2147
[2024-06-10 06:38:01,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.71 | bwd_microstep: 883.30 | bwd_inner_microstep: 883.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2940
[2024-06-10 06:38:02,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.76 | bwd_microstep: 1245.41 | bwd_inner_microstep: 1245.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 06:38:04,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1387.34 | bwd_inner_microstep: 1387.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 06:38:06,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1403.39 | bwd_inner_microstep: 1403.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 06:38:08,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.74 | bwd_microstep: 1352.93 | bwd_inner_microstep: 1352.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2286
[2024-06-10 06:38:09,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.82 | bwd_microstep: 914.95 | bwd_inner_microstep: 914.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815
[2024-06-10 06:38:11,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1388.87 | bwd_inner_microstep: 1388.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 06:38:14,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.61 | bwd_microstep: 2021.26 | bwd_inner_microstep: 2021.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 06:38:16,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1411.73 | bwd_inner_microstep: 1411.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 06:38:17,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.00 | bwd_microstep: 1255.20 | bwd_inner_microstep: 1255.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2411
[2024-06-10 06:38:19,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.51 | bwd_microstep: 1003.02 | bwd_inner_microstep: 1002.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 06:38:21,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.66 | bwd_microstep: 1539.88 | bwd_inner_microstep: 1539.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710
[2024-06-10 06:38:23,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1531.55 | bwd_inner_microstep: 1531.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 06:38:27,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 06:38:27,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.50 | bwd_microstep: 3742.34 | bwd_inner_microstep: 1742.28 | bwd_allreduce_microstep: 2000.01 | step_microstep: 38.70
[2024-06-10 06:38:27,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15287.29 | bwd: 43592.14 | bwd_inner: 41591.18 | bwd_allreduce: 2000.25 | step: 40.45
{'loss': 1.3095, 'learning_rate': 3.707140704460037e-05, 'epoch': 0.2}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 06:38:29,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.25 | bwd_microstep: 781.65 | bwd_inner_microstep: 781.57 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 06:38:30,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1378.45 | bwd_inner_microstep: 1378.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 06:38:32,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.62 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1282.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 06:38:34,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.77 | bwd_microstep: 1342.44 | bwd_inner_microstep: 1342.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777
[2024-06-10 06:38:36,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.55 | bwd_microstep: 1546.35 | bwd_inner_microstep: 1546.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-10 06:38:38,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.60 | bwd_microstep: 1640.00 | bwd_inner_microstep: 1639.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719
[2024-06-10 06:38:40,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 1335.01 | bwd_inner_microstep: 1334.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 06:38:42,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1283.83 | bwd_inner_microstep: 1283.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 06:38:44,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3696
[2024-06-10 06:38:46,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.29 | bwd_microstep: 1430.29 | bwd_inner_microstep: 1430.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3585
[2024-06-10 06:38:48,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.08 | bwd_microstep: 1212.93 | bwd_inner_microstep: 1212.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937
[2024-06-10 06:38:49,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.35 | bwd_microstep: 757.92 | bwd_inner_microstep: 757.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2174
[2024-06-10 06:38:50,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.81 | bwd_microstep: 948.19 | bwd_inner_microstep: 948.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-10 06:38:52,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1317.16 | bwd_inner_microstep: 1317.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 06:38:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.55 | bwd_microstep: 1478.76 | bwd_inner_microstep: 1478.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 06:38:56,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.22 | bwd_microstep: 1475.11 | bwd_inner_microstep: 1475.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 06:38:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1391.53 | bwd_inner_microstep: 1391.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 06:39:00,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 1515.42 | bwd_inner_microstep: 1515.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 06:39:02,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1374.84 | bwd_inner_microstep: 1374.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 06:39:04,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.16 | bwd_microstep: 1414.99 | bwd_inner_microstep: 1414.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 06:39:06,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1496.82 | bwd_inner_microstep: 1496.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4123
[2024-06-10 06:39:08,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.75 | bwd_microstep: 1567.31 | bwd_inner_microstep: 1567.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3815
[2024-06-10 06:39:10,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.32 | bwd_microstep: 1624.28 | bwd_inner_microstep: 1624.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 06:39:12,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1381.79 | bwd_inner_microstep: 1381.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3437
[2024-06-10 06:39:14,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.05 | bwd_microstep: 1381.88 | bwd_inner_microstep: 1381.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2188
[2024-06-10 06:39:15,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.27 | bwd_microstep: 795.29 | bwd_inner_microstep: 795.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 06:39:17,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.46 | bwd_microstep: 1396.09 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2552
[2024-06-10 06:39:19,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 424.50 | bwd_microstep: 1154.03 | bwd_inner_microstep: 1154.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762
[2024-06-10 06:39:21,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.87 | bwd_microstep: 1465.29 | bwd_inner_microstep: 1465.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 06:39:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.30 | bwd_microstep: 1598.21 | bwd_inner_microstep: 1598.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 06:39:25,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.86 | bwd_microstep: 1643.14 | bwd_inner_microstep: 1643.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3398
[2024-06-10 06:39:27,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.18 | optimizer_step: 6.64
[2024-06-10 06:39:27,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1432.35 | bwd_inner_microstep: 1424.65 | bwd_allreduce_microstep: 7.66 | step_microstep: 38.39
[2024-06-10 06:39:27,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16192.46 | bwd: 43231.01 | bwd_inner: 43222.39 | bwd_allreduce: 7.92 | step: 40.04
 20%|█▉        | 340/1726 [5:57:00<23:45:26, 61.71s/it]
 20%|█▉        | 341/1726 [5:58:01<23:39:48, 61.51s/it]
 20%|█▉        | 342/1726 [5:59:02<23:37:32, 61.45s/it]
 20%|█▉        | 343/1726 [6:00:05<23:46:48, 61.90s/it]
 20%|█▉        | 344/1726 [6:01:04<23:27:22, 61.10s/it]
 20%|█▉        | 345/1726 [6:02:04<23:17:10, 60.70s/it]
{'loss': 1.281, 'learning_rate': 3.7051822639934086e-05, 'epoch': 0.2}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-10 06:39:28,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.59 | bwd_microstep: 783.20 | bwd_inner_microstep: 783.06 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3985
[2024-06-10 06:39:30,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.32 | bwd_microstep: 1504.95 | bwd_inner_microstep: 1504.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845
[2024-06-10 06:39:33,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.47 | bwd_microstep: 1662.43 | bwd_inner_microstep: 1662.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 06:39:35,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1382.66 | bwd_inner_microstep: 1382.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4217
[2024-06-10 06:39:37,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.72 | bwd_microstep: 1658.54 | bwd_inner_microstep: 1658.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 06:39:39,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.21 | bwd_microstep: 1457.81 | bwd_inner_microstep: 1457.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3576
[2024-06-10 06:39:41,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.70 | bwd_microstep: 1208.91 | bwd_inner_microstep: 1208.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3480
[2024-06-10 06:39:42,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.34 | bwd_microstep: 1333.48 | bwd_inner_microstep: 1333.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3742
[2024-06-10 06:39:44,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.04 | bwd_microstep: 1337.55 | bwd_inner_microstep: 1337.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3425
[2024-06-10 06:39:46,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.77 | bwd_microstep: 1286.09 | bwd_inner_microstep: 1286.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 06:39:48,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.59 | bwd_microstep: 1156.94 | bwd_inner_microstep: 1156.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 06:39:49,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.00 | bwd_microstep: 1255.14 | bwd_inner_microstep: 1255.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941
[2024-06-10 06:39:50,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.12 | bwd_microstep: 729.07 | bwd_inner_microstep: 729.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-10 06:39:52,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.92 | bwd_microstep: 1451.35 | bwd_inner_microstep: 1451.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1885
[2024-06-10 06:39:53,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.55 | bwd_microstep: 684.93 | bwd_inner_microstep: 684.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 06:39:55,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1395.37 | bwd_inner_microstep: 1395.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893
[2024-06-10 06:39:58,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.53 | bwd_microstep: 1689.91 | bwd_inner_microstep: 1689.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 06:39:59,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.87 | bwd_microstep: 1282.55 | bwd_inner_microstep: 1282.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 06:40:01,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.26 | bwd_microstep: 1314.49 | bwd_inner_microstep: 1314.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 06:40:03,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1356.98 | bwd_inner_microstep: 1356.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535
[2024-06-10 06:40:05,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.13 | bwd_microstep: 1457.37 | bwd_inner_microstep: 1457.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593
[2024-06-10 06:40:07,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1343.54 | bwd_inner_microstep: 1343.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3517
[2024-06-10 06:40:09,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.02 | bwd_microstep: 1224.68 | bwd_inner_microstep: 1224.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 06:40:10,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.54 | bwd_microstep: 803.73 | bwd_inner_microstep: 803.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 06:40:12,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.46 | bwd_microstep: 1658.26 | bwd_inner_microstep: 1658.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 06:40:14,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1288.43 | bwd_inner_microstep: 1288.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 06:40:16,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1473.61 | bwd_inner_microstep: 1473.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 06:40:18,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.99 | bwd_microstep: 1652.92 | bwd_inner_microstep: 1652.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 06:40:20,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.08 | bwd_microstep: 1636.77 | bwd_inner_microstep: 1636.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3588
[2024-06-10 06:40:23,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.37 | bwd_microstep: 1568.61 | bwd_inner_microstep: 1568.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-10 06:40:24,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.35 | bwd_microstep: 680.17 | bwd_inner_microstep: 680.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2247
[2024-06-10 06:40:27,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.35 | optimizer_step: 6.59
[2024-06-10 06:40:27,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.33 | bwd_microstep: 3205.12 | bwd_inner_microstep: 1019.69 | bwd_allreduce_microstep: 2185.38 | step_microstep: 38.61
[2024-06-10 06:40:27,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15641.05 | bwd: 43925.59 | bwd_inner: 41739.20 | bwd_allreduce: 2185.66 | step: 40.47
{'loss': 1.2802, 'learning_rate': 3.70321781788371e-05, 'epoch': 0.2}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 06:40:29,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.78 | bwd_microstep: 1271.71 | bwd_inner_microstep: 1271.57 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3887
[2024-06-10 06:40:31,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.66 | bwd_microstep: 1683.59 | bwd_inner_microstep: 1683.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 06:40:33,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1554.30 | bwd_inner_microstep: 1554.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 06:40:35,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.56 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3523
[2024-06-10 06:40:37,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.49 | bwd_microstep: 1229.60 | bwd_inner_microstep: 1229.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793
[2024-06-10 06:40:39,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.10 | bwd_microstep: 1653.20 | bwd_inner_microstep: 1653.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3518
[2024-06-10 06:40:41,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.20 | bwd_microstep: 1256.04 | bwd_inner_microstep: 1256.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902
[2024-06-10 06:40:42,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.72 | bwd_microstep: 686.38 | bwd_inner_microstep: 686.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3737
[2024-06-10 06:40:44,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.99 | bwd_microstep: 1485.70 | bwd_inner_microstep: 1485.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686
[2024-06-10 06:40:46,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.36 | bwd_microstep: 1531.02 | bwd_inner_microstep: 1530.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434
[2024-06-10 06:40:48,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.66 | bwd_microstep: 1414.29 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2086
[2024-06-10 06:40:49,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.50 | bwd_microstep: 760.77 | bwd_inner_microstep: 760.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3657
[2024-06-10 06:40:51,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 1455.39 | bwd_inner_microstep: 1455.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 06:40:53,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1377.47 | bwd_inner_microstep: 1377.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 06:40:55,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1343.23 | bwd_inner_microstep: 1343.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637
[2024-06-10 06:40:57,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.72 | bwd_microstep: 1606.02 | bwd_inner_microstep: 1606.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3521
[2024-06-10 06:40:59,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.09 | bwd_microstep: 1541.54 | bwd_inner_microstep: 1541.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 06:41:01,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1246.16 | bwd_inner_microstep: 1246.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3516
[2024-06-10 06:41:03,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 1244.93 | bwd_inner_microstep: 1244.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 06:41:04,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1296.20 | bwd_inner_microstep: 1296.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3782
[2024-06-10 06:41:06,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.61 | bwd_microstep: 1479.97 | bwd_inner_microstep: 1479.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3541
[2024-06-10 06:41:08,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1375.44 | bwd_inner_microstep: 1375.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721
[2024-06-10 06:41:10,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.50 | bwd_microstep: 1339.94 | bwd_inner_microstep: 1339.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3603
[2024-06-10 06:41:12,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.17 | bwd_microstep: 1543.75 | bwd_inner_microstep: 1543.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 06:41:14,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1473.88 | bwd_inner_microstep: 1473.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2267
[2024-06-10 06:41:16,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.99 | bwd_microstep: 971.17 | bwd_inner_microstep: 971.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2415
[2024-06-10 06:41:17,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.77 | bwd_microstep: 1044.75 | bwd_inner_microstep: 1044.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 06:41:19,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.69 | bwd_microstep: 1392.70 | bwd_inner_microstep: 1392.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 06:41:21,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1296.10 | bwd_inner_microstep: 1296.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479
[2024-06-10 06:41:23,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1315.76 | bwd_inner_microstep: 1315.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279
[2024-06-10 06:41:24,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.96 | bwd_microstep: 880.26 | bwd_inner_microstep: 880.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729
[2024-06-10 06:41:28,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 06:41:28,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 3569.52 | bwd_inner_microstep: 1624.10 | bwd_allreduce_microstep: 1945.37 | step_microstep: 38.81
[2024-06-10 06:41:28,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15977.02 | bwd: 44602.00 | bwd_inner: 42655.60 | bwd_allreduce: 1945.67 | step: 40.60
{'loss': 1.3171, 'learning_rate': 3.7012473730497115e-05, 'epoch': 0.2}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 06:41:30,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.40 | bwd_microstep: 1607.39 | bwd_inner_microstep: 1607.23 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3937
[2024-06-10 06:41:32,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.33 | bwd_microstep: 1491.98 | bwd_inner_microstep: 1491.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 06:41:34,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1482.86 | bwd_inner_microstep: 1482.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3407
[2024-06-10 06:41:36,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.18 | bwd_microstep: 1309.06 | bwd_inner_microstep: 1309.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 06:41:38,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.72 | bwd_microstep: 1505.48 | bwd_inner_microstep: 1505.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473
[2024-06-10 06:41:40,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.34 | bwd_microstep: 1332.27 | bwd_inner_microstep: 1332.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851
[2024-06-10 06:41:42,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.52 | bwd_microstep: 1662.48 | bwd_inner_microstep: 1662.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 06:41:44,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.79 | bwd_microstep: 1246.78 | bwd_inner_microstep: 1246.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2124
[2024-06-10 06:41:45,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.41 | bwd_microstep: 831.15 | bwd_inner_microstep: 831.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643
[2024-06-10 06:41:47,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.93 | bwd_microstep: 1317.31 | bwd_inner_microstep: 1317.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 06:41:49,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.58 | bwd_microstep: 1629.58 | bwd_inner_microstep: 1629.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514
[2024-06-10 06:41:51,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1415.61 | bwd_inner_microstep: 1415.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 06:41:53,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1384.57 | bwd_inner_microstep: 1384.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-10 06:41:55,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.60 | bwd_microstep: 1435.90 | bwd_inner_microstep: 1435.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2906
[2024-06-10 06:41:57,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.09 | bwd_microstep: 1124.15 | bwd_inner_microstep: 1124.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 06:41:59,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.81 | bwd_microstep: 1379.11 | bwd_inner_microstep: 1379.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 06:42:01,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.52 | bwd_microstep: 1501.08 | bwd_inner_microstep: 1501.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653
[2024-06-10 06:42:03,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.29 | bwd_microstep: 1612.28 | bwd_inner_microstep: 1612.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1899
[2024-06-10 06:42:04,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 810.55 | bwd_inner_microstep: 810.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3570
[2024-06-10 06:42:06,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.98 | bwd_microstep: 1207.42 | bwd_inner_microstep: 1207.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-10 06:42:07,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.21 | bwd_microstep: 1186.96 | bwd_inner_microstep: 1186.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3572
[2024-06-10 06:42:10,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.92 | bwd_microstep: 1564.92 | bwd_inner_microstep: 1564.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 06:42:11,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.85 | bwd_microstep: 705.60 | bwd_inner_microstep: 705.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 06:42:12,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.83 | bwd_microstep: 1247.03 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 06:42:14,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.08 | bwd_microstep: 1508.67 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 06:42:16,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1397.40 | bwd_inner_microstep: 1397.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 06:42:18,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.43 | bwd_microstep: 1544.04 | bwd_inner_microstep: 1544.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 06:42:20,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.85 | bwd_microstep: 1394.56 | bwd_inner_microstep: 1394.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3456
[2024-06-10 06:42:22,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.41 | bwd_microstep: 1221.31 | bwd_inner_microstep: 1221.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 06:42:24,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.10 | bwd_microstep: 1654.04 | bwd_inner_microstep: 1654.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555
[2024-06-10 06:42:26,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.32 | bwd_microstep: 1301.37 | bwd_inner_microstep: 1301.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-10 06:42:31,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-10 06:42:31,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.41 | bwd_microstep: 4427.19 | bwd_inner_microstep: 2105.93 | bwd_allreduce_microstep: 2321.20 | step_microstep: 38.79
[2024-06-10 06:42:31,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16368.98 | bwd: 46440.14 | bwd_inner: 44117.90 | bwd_allreduce: 2321.49 | step: 40.52
{'loss': 1.3499, 'learning_rate': 3.699270936431309e-05, 'epoch': 0.2}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1955
[2024-06-10 06:42:32,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.62 | bwd_microstep: 889.33 | bwd_inner_microstep: 889.06 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.19
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3989
[2024-06-10 06:42:34,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.55 | bwd_microstep: 1409.60 | bwd_inner_microstep: 1409.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3898
[2024-06-10 06:42:37,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.54 | bwd_microstep: 1683.35 | bwd_inner_microstep: 1683.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 06:42:39,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1484.52 | bwd_inner_microstep: 1484.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2233
[2024-06-10 06:42:40,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.43 | bwd_microstep: 959.51 | bwd_inner_microstep: 959.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 06:42:42,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1386.19 | bwd_inner_microstep: 1386.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-10 06:42:44,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1547.54 | bwd_inner_microstep: 1547.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728
[2024-06-10 06:42:46,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.44 | bwd_microstep: 1632.87 | bwd_inner_microstep: 1632.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 06:42:48,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.49 | bwd_microstep: 1301.91 | bwd_inner_microstep: 1301.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 06:42:50,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1285.84 | bwd_inner_microstep: 1285.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-10 06:42:52,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.71 | bwd_microstep: 1456.60 | bwd_inner_microstep: 1456.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 06:42:54,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.40 | bwd_microstep: 1481.40 | bwd_inner_microstep: 1481.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1964
[2024-06-10 06:42:55,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.11 | bwd_microstep: 829.99 | bwd_inner_microstep: 829.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 06:42:57,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1345.93 | bwd_inner_microstep: 1345.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2913
[2024-06-10 06:42:59,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.32 | bwd_microstep: 1189.72 | bwd_inner_microstep: 1189.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398
[2024-06-10 06:43:01,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.65 | bwd_microstep: 1440.22 | bwd_inner_microstep: 1440.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 06:43:03,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.66 | bwd_microstep: 1586.59 | bwd_inner_microstep: 1586.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 06:43:04,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.31 | bwd_microstep: 802.31 | bwd_inner_microstep: 802.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 06:43:06,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1391.73 | bwd_inner_microstep: 1391.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 06:43:08,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.69 | bwd_microstep: 1353.79 | bwd_inner_microstep: 1353.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2433
[2024-06-10 06:43:09,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.87 | bwd_microstep: 1042.26 | bwd_inner_microstep: 1042.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-10 06:43:11,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.79 | bwd_microstep: 1358.98 | bwd_inner_microstep: 1358.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 06:43:13,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.74 | bwd_microstep: 1257.24 | bwd_inner_microstep: 1257.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605
[2024-06-10 06:43:15,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.09 | bwd_microstep: 1540.64 | bwd_inner_microstep: 1540.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3812
[2024-06-10 06:43:17,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.00 | bwd_microstep: 1420.98 | bwd_inner_microstep: 1420.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 06:43:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1253.18 | bwd_inner_microstep: 1253.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3549
[2024-06-10 06:43:21,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.18 | bwd_microstep: 1440.73 | bwd_inner_microstep: 1440.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 06:43:23,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.47 | bwd_microstep: 1503.61 | bwd_inner_microstep: 1503.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3770
[2024-06-10 06:43:25,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.92 | bwd_microstep: 1606.98 | bwd_inner_microstep: 1606.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 06:43:27,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1442.49 | bwd_inner_microstep: 1442.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 06:43:29,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.33 | bwd_microstep: 1649.90 | bwd_inner_microstep: 1649.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3738
[2024-06-10 06:43:32,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.21 | optimizer_step: 6.64
[2024-06-10 06:43:32,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.20 | bwd_microstep: 1769.41 | bwd_inner_microstep: 1761.70 | bwd_allreduce_microstep: 7.66 | step_microstep: 38.32
[2024-06-10 06:43:32,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16324.05 | bwd: 43745.36 | bwd_inner: 43736.59 | bwd_allreduce: 7.98 | step: 40.15
{'loss': 1.3545, 'learning_rate': 3.697288514989502e-05, 'epoch': 0.2}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1929
[2024-06-10 06:43:33,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.24 | bwd_microstep: 882.22 | bwd_inner_microstep: 882.06 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3985
[2024-06-10 06:43:35,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.20 | bwd_microstep: 1610.84 | bwd_inner_microstep: 1610.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 06:43:37,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.54 | bwd_microstep: 1553.80 | bwd_inner_microstep: 1553.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840
[2024-06-10 06:43:40,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.03 | bwd_microstep: 1658.49 | bwd_inner_microstep: 1658.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 06:43:41,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1289.58 | bwd_inner_microstep: 1289.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763
[2024-06-10 06:43:44,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 1643.16 | bwd_inner_microstep: 1643.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 06:43:45,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.72 | bwd_microstep: 1248.57 | bwd_inner_microstep: 1248.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2440
[2024-06-10 06:43:47,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.65 | bwd_microstep: 948.31 | bwd_inner_microstep: 948.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3408
[2024-06-10 06:43:48,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.18 | bwd_microstep: 1213.02 | bwd_inner_microstep: 1212.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619
[2024-06-10 06:43:50,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.80 | bwd_microstep: 1316.46 | bwd_inner_microstep: 1316.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 06:43:52,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.35 | bwd_microstep: 1281.33 | bwd_inner_microstep: 1281.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 06:43:54,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1247.52 | bwd_inner_microstep: 1247.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 06:43:56,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.78 | bwd_microstep: 1384.35 | bwd_inner_microstep: 1384.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723
[2024-06-10 06:43:58,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.42 | bwd_microstep: 1626.73 | bwd_inner_microstep: 1626.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 06:43:59,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.26 | bwd_microstep: 892.06 | bwd_inner_microstep: 892.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3535
[2024-06-10 06:44:01,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.31 | bwd_microstep: 1577.32 | bwd_inner_microstep: 1577.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2106
[2024-06-10 06:44:02,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.02 | bwd_microstep: 825.28 | bwd_inner_microstep: 825.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 06:44:04,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 1534.97 | bwd_inner_microstep: 1534.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 06:44:06,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.00 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 06:44:08,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.46 | bwd_microstep: 1463.23 | bwd_inner_microstep: 1463.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3858
[2024-06-10 06:44:10,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1371.18 | bwd_inner_microstep: 1371.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3641
[2024-06-10 06:44:12,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.46 | bwd_microstep: 1318.45 | bwd_inner_microstep: 1318.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 06:44:14,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.85 | bwd_microstep: 1453.47 | bwd_inner_microstep: 1453.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 06:44:16,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 06:44:18,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1509.84 | bwd_inner_microstep: 1509.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2393
[2024-06-10 06:44:19,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.65 | bwd_microstep: 946.29 | bwd_inner_microstep: 946.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3627
[2024-06-10 06:44:22,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.91 | bwd_microstep: 1575.52 | bwd_inner_microstep: 1575.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276
[2024-06-10 06:44:23,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.17 | bwd_microstep: 1070.99 | bwd_inner_microstep: 1070.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2236
[2024-06-10 06:44:24,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.32 | bwd_microstep: 898.91 | bwd_inner_microstep: 898.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3774
[2024-06-10 06:44:26,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.16 | bwd_microstep: 1477.23 | bwd_inner_microstep: 1477.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 06:44:28,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1304.42 | bwd_inner_microstep: 1304.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 06:44:34,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 06:44:34,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.43 | bwd_microstep: 4668.07 | bwd_inner_microstep: 1955.57 | bwd_allreduce_microstep: 2712.45 | step_microstep: 38.78
[2024-06-10 06:44:34,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15912.37 | bwd: 45568.45 | bwd_inner: 42854.98 | bwd_allreduce: 2712.73 | step: 40.44
{'loss': 1.2517, 'learning_rate': 3.6953001157063686e-05, 'epoch': 0.2}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482
[2024-06-10 06:44:36,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.63 | bwd_microstep: 1571.90 | bwd_inner_microstep: 1571.70 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3920
[2024-06-10 06:44:38,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.59 | bwd_microstep: 1484.57 | bwd_inner_microstep: 1484.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 06:44:40,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1343.80 | bwd_inner_microstep: 1343.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 06:44:42,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1476.83 | bwd_inner_microstep: 1476.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 06:44:43,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.71 | bwd_microstep: 1312.95 | bwd_inner_microstep: 1312.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-10 06:44:45,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.72 | bwd_microstep: 1288.50 | bwd_inner_microstep: 1288.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 06:44:47,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.81 | bwd_microstep: 1152.57 | bwd_inner_microstep: 1152.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 06:44:48,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 796.74 | bwd_inner_microstep: 796.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 06:44:50,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.11 | bwd_microstep: 1279.24 | bwd_inner_microstep: 1279.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 06:44:52,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.82 | bwd_microstep: 1483.34 | bwd_inner_microstep: 1483.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2873
[2024-06-10 06:44:53,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.21 | bwd_microstep: 1175.35 | bwd_inner_microstep: 1175.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-10 06:44:56,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1583.98 | bwd_inner_microstep: 1583.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553
[2024-06-10 06:44:58,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1596.20 | bwd_inner_microstep: 1596.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3658
[2024-06-10 06:45:00,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1383.75 | bwd_inner_microstep: 1383.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3682
[2024-06-10 06:45:02,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.89 | bwd_microstep: 1717.47 | bwd_inner_microstep: 1717.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 06:45:03,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.82 | bwd_microstep: 791.21 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 06:45:05,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.32 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 06:45:07,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1255.60 | bwd_inner_microstep: 1255.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 06:45:08,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.11 | bwd_microstep: 1290.29 | bwd_inner_microstep: 1290.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3505
[2024-06-10 06:45:10,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.53 | bwd_microstep: 1225.82 | bwd_inner_microstep: 1225.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 06:45:12,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1412.67 | bwd_inner_microstep: 1412.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 06:45:14,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.77 | bwd_microstep: 1296.19 | bwd_inner_microstep: 1296.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 06:45:16,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.30 | bwd_microstep: 1307.90 | bwd_inner_microstep: 1307.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592
[2024-06-10 06:45:18,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.73 | bwd_microstep: 1310.64 | bwd_inner_microstep: 1310.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451
[2024-06-10 06:45:19,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.69 | bwd_microstep: 1382.44 | bwd_inner_microstep: 1382.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 06:45:21,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.56 | bwd_microstep: 977.18 | bwd_inner_microstep: 977.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3605
[2024-06-10 06:45:23,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.10 | bwd_microstep: 1373.79 | bwd_inner_microstep: 1373.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755
[2024-06-10 06:45:25,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.14 | bwd_microstep: 1344.78 | bwd_inner_microstep: 1344.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-10 06:45:27,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.70 | bwd_microstep: 1478.45 | bwd_inner_microstep: 1478.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 06:45:29,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.08 | bwd_microstep: 1444.49 | bwd_inner_microstep: 1444.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 06:45:31,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.44 | bwd_microstep: 1459.01 | bwd_inner_microstep: 1458.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 06:45:35,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.63
[2024-06-10 06:45:35,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.02 | bwd_microstep: 4279.75 | bwd_inner_microstep: 1691.65 | bwd_allreduce_microstep: 2588.06 | step_microstep: 38.60
[2024-06-10 06:45:35,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16065.73 | bwd: 45562.82 | bwd_inner: 42973.70 | bwd_allreduce: 2588.36 | step: 40.27


 20%|█▉        | 345/1726 [6:02:04<23:17:10, 60.70s/it]
 20%|██        | 346/1726 [6:03:04<23:10:47, 60.47s/it]
 20%|██        | 347/1726 [6:04:05<23:13:00, 60.61s/it]
 20%|██        | 348/1726 [6:05:08<23:29:36, 61.38s/it]
 20%|██        | 349/1726 [6:06:08<23:22:05, 61.09s/it]
 20%|██        | 350/1726 [6:07:10<23:26:09, 61.32s/it]
 20%|██        | 351/1726 [6:08:12<23:29:41, 61.51s/it]
{'loss': 1.2848, 'learning_rate': 3.693305745585041e-05, 'epoch': 0.2}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 06:45:37,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.42 | bwd_microstep: 1392.61 | bwd_inner_microstep: 1392.53 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 06:45:39,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.21 | bwd_microstep: 1380.17 | bwd_inner_microstep: 1380.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 06:45:41,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 1245.60 | bwd_inner_microstep: 1245.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 06:45:43,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 1342.36 | bwd_inner_microstep: 1342.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402
[2024-06-10 06:45:45,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.61 | bwd_microstep: 1209.79 | bwd_inner_microstep: 1209.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2962
[2024-06-10 06:45:46,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.47 | bwd_microstep: 1103.54 | bwd_inner_microstep: 1103.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 06:45:48,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1383.85 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3775
[2024-06-10 06:45:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.70 | bwd_microstep: 1475.26 | bwd_inner_microstep: 1475.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 06:45:52,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.44 | bwd_microstep: 1251.54 | bwd_inner_microstep: 1251.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2221
[2024-06-10 06:45:53,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.31 | bwd_microstep: 832.87 | bwd_inner_microstep: 832.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958
[2024-06-10 06:45:54,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.21 | bwd_microstep: 890.05 | bwd_inner_microstep: 890.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3675
[2024-06-10 06:45:56,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.00 | bwd_microstep: 1485.20 | bwd_inner_microstep: 1485.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3706
[2024-06-10 06:45:59,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.62 | bwd_microstep: 1723.11 | bwd_inner_microstep: 1723.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430
[2024-06-10 06:46:00,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1281.01 | bwd_inner_microstep: 1280.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3508
[2024-06-10 06:46:02,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.91 | bwd_microstep: 1192.61 | bwd_inner_microstep: 1192.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 06:46:04,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1278.31 | bwd_inner_microstep: 1278.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-10 06:46:06,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.95 | bwd_microstep: 1525.49 | bwd_inner_microstep: 1525.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 06:46:08,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1357.24 | bwd_inner_microstep: 1357.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 06:46:10,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.97 | bwd_microstep: 1558.33 | bwd_inner_microstep: 1558.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-10 06:46:12,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.78 | bwd_microstep: 1614.54 | bwd_inner_microstep: 1614.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613
[2024-06-10 06:46:14,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.16 | bwd_microstep: 1613.43 | bwd_inner_microstep: 1613.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604
[2024-06-10 06:46:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.51 | bwd_microstep: 1445.33 | bwd_inner_microstep: 1445.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2565
[2024-06-10 06:46:18,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.92 | bwd_microstep: 1071.09 | bwd_inner_microstep: 1071.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 06:46:20,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.37 | bwd_microstep: 1558.26 | bwd_inner_microstep: 1558.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 06:46:22,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.61 | bwd_microstep: 1466.54 | bwd_inner_microstep: 1466.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 06:46:23,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.02 | bwd_microstep: 705.00 | bwd_inner_microstep: 704.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3726
[2024-06-10 06:46:25,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.00 | bwd_microstep: 1615.11 | bwd_inner_microstep: 1615.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2241
[2024-06-10 06:46:27,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.43 | bwd_microstep: 930.25 | bwd_inner_microstep: 930.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592
[2024-06-10 06:46:29,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.17 | bwd_microstep: 1703.09 | bwd_inner_microstep: 1703.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-10 06:46:30,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.44 | bwd_microstep: 973.36 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 06:46:32,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1353.42 | bwd_inner_microstep: 1353.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 06:46:38,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 06:46:38,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.74 | bwd_microstep: 5464.95 | bwd_inner_microstep: 2167.85 | bwd_allreduce_microstep: 3297.05 | step_microstep: 38.75
[2024-06-10 06:46:38,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15925.00 | bwd: 46423.34 | bwd_inner: 43125.31 | bwd_allreduce: 3297.32 | step: 40.49
{'loss': 1.3025, 'learning_rate': 3.6913054116496797e-05, 'epoch': 0.2}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 06:46:40,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1365.97 | bwd_inner_microstep: 1365.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3898
[2024-06-10 06:46:42,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.12 | bwd_microstep: 1681.65 | bwd_inner_microstep: 1681.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 06:46:44,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 1449.57 | bwd_inner_microstep: 1449.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3873
[2024-06-10 06:46:46,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.29 | bwd_microstep: 1509.14 | bwd_inner_microstep: 1509.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 06:46:48,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1280.09 | bwd_inner_microstep: 1280.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3810
[2024-06-10 06:46:50,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.74 | bwd_microstep: 1512.59 | bwd_inner_microstep: 1512.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3482
[2024-06-10 06:46:52,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.60 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3601
[2024-06-10 06:46:54,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.14 | bwd_microstep: 1309.75 | bwd_inner_microstep: 1309.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 06:46:55,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 790.09 | bwd_inner_microstep: 790.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502
[2024-06-10 06:46:57,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.10 | bwd_microstep: 1192.31 | bwd_inner_microstep: 1192.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 06:46:59,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1391.28 | bwd_inner_microstep: 1391.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 06:47:01,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1497.19 | bwd_inner_microstep: 1497.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2007
[2024-06-10 06:47:02,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.88 | bwd_microstep: 930.32 | bwd_inner_microstep: 930.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 06:47:04,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.19 | bwd_microstep: 1658.15 | bwd_inner_microstep: 1658.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 06:47:06,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1508.81 | bwd_inner_microstep: 1508.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 06:47:08,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.23 | bwd_microstep: 1403.04 | bwd_inner_microstep: 1403.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3468
[2024-06-10 06:47:10,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1500.35 | bwd_inner_microstep: 1500.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 06:47:11,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.70 | bwd_microstep: 790.14 | bwd_inner_microstep: 790.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3632
[2024-06-10 06:47:13,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.84 | bwd_microstep: 1538.05 | bwd_inner_microstep: 1538.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2190
[2024-06-10 06:47:15,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.49 | bwd_microstep: 770.11 | bwd_inner_microstep: 770.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3823
[2024-06-10 06:47:17,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.13 | bwd_microstep: 1638.79 | bwd_inner_microstep: 1638.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 06:47:18,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.94 | bwd_microstep: 980.17 | bwd_inner_microstep: 980.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 06:47:20,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.62 | bwd_microstep: 1526.40 | bwd_inner_microstep: 1526.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299
[2024-06-10 06:47:22,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.72 | bwd_microstep: 979.41 | bwd_inner_microstep: 979.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 06:47:24,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.66 | bwd_microstep: 1409.19 | bwd_inner_microstep: 1409.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 06:47:25,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.79 | bwd_microstep: 1376.81 | bwd_inner_microstep: 1376.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 06:47:27,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.00 | bwd_microstep: 1403.67 | bwd_inner_microstep: 1403.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-10 06:47:29,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.13 | bwd_microstep: 878.64 | bwd_inner_microstep: 878.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 06:47:31,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1411.44 | bwd_inner_microstep: 1411.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398
[2024-06-10 06:47:32,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.57 | bwd_microstep: 1372.05 | bwd_inner_microstep: 1372.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 06:47:34,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 06:47:41,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.38 | optimizer_step: 6.58
[2024-06-10 06:47:41,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.62 | bwd_microstep: 6253.91 | bwd_inner_microstep: 2153.74 | bwd_allreduce_microstep: 4100.11 | step_microstep: 39.31
[2024-06-10 06:47:41,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15826.42 | bwd: 46904.69 | bwd_inner: 42803.62 | bwd_allreduce: 4100.35 | step: 40.91
{'loss': 1.2939, 'learning_rate': 3.689299120945451e-05, 'epoch': 0.2}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 06:47:43,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.44 | bwd_microstep: 1510.74 | bwd_inner_microstep: 1510.64 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 06:47:45,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1379.69 | bwd_inner_microstep: 1379.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2347
[2024-06-10 06:47:47,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.36 | bwd_microstep: 987.44 | bwd_inner_microstep: 987.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 913
[2024-06-10 06:47:47,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.37 | bwd_microstep: 373.23 | bwd_inner_microstep: 373.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 06:47:49,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1247.33 | bwd_inner_microstep: 1247.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3793
[2024-06-10 06:47:51,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3392
[2024-06-10 06:47:53,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | bwd_microstep: 1296.45 | bwd_inner_microstep: 1296.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3784
[2024-06-10 06:47:55,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.82 | bwd_microstep: 1576.00 | bwd_inner_microstep: 1575.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 06:47:57,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1247.24 | bwd_inner_microstep: 1247.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3735
[2024-06-10 06:47:59,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.63 | bwd_microstep: 1731.04 | bwd_inner_microstep: 1731.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 06:48:01,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1349.20 | bwd_inner_microstep: 1349.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3534
[2024-06-10 06:48:03,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.96 | bwd_microstep: 1355.66 | bwd_inner_microstep: 1355.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070
[2024-06-10 06:48:04,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.31 | bwd_microstep: 916.69 | bwd_inner_microstep: 916.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 06:48:06,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.34 | bwd_microstep: 1348.40 | bwd_inner_microstep: 1348.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2455
[2024-06-10 06:48:07,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.52 | bwd_microstep: 1113.84 | bwd_inner_microstep: 1113.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3521
[2024-06-10 06:48:10,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.71 | bwd_microstep: 1691.85 | bwd_inner_microstep: 1691.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3889
[2024-06-10 06:48:12,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.05 | bwd_microstep: 1682.87 | bwd_inner_microstep: 1682.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3675
[2024-06-10 06:48:15,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.58 | bwd_microstep: 1825.82 | bwd_inner_microstep: 1825.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1149
[2024-06-10 06:48:15,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 182.96 | bwd_microstep: 478.43 | bwd_inner_microstep: 478.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3505
[2024-06-10 06:48:17,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.45 | bwd_microstep: 1224.99 | bwd_inner_microstep: 1224.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 06:48:19,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1256.28 | bwd_inner_microstep: 1256.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100
[2024-06-10 06:48:20,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.26 | bwd_microstep: 923.98 | bwd_inner_microstep: 923.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3545
[2024-06-10 06:48:22,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.13 | bwd_microstep: 1356.48 | bwd_inner_microstep: 1356.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3602
[2024-06-10 06:48:24,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.95 | bwd_microstep: 1472.94 | bwd_inner_microstep: 1472.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733
[2024-06-10 06:48:26,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.92 | bwd_microstep: 1630.87 | bwd_inner_microstep: 1630.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2040
[2024-06-10 06:48:27,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.23 | bwd_microstep: 907.29 | bwd_inner_microstep: 907.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3461
[2024-06-10 06:48:29,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.48 | bwd_microstep: 1244.16 | bwd_inner_microstep: 1244.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 06:48:31,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1558.51 | bwd_inner_microstep: 1558.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928
[2024-06-10 06:48:32,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.93 | bwd_microstep: 698.41 | bwd_inner_microstep: 698.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 06:48:34,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.82 | bwd_microstep: 1512.85 | bwd_inner_microstep: 1512.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 06:48:36,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1483.16 | bwd_inner_microstep: 1483.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-10 06:48:43,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-10 06:48:43,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.47 | bwd_microstep: 5818.28 | bwd_inner_microstep: 1863.42 | bwd_allreduce_microstep: 3954.81 | step_microstep: 38.64
[2024-06-10 06:48:43,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15435.36 | bwd: 45681.80 | bwd_inner: 41725.98 | bwd_allreduce: 3955.10 | step: 40.34
{'loss': 1.3562, 'learning_rate': 3.6872868805385004e-05, 'epoch': 0.21}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389
[2024-06-10 06:48:44,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1235.43 | bwd_inner_microstep: 1235.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2450
[2024-06-10 06:48:46,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.46 | bwd_microstep: 949.98 | bwd_inner_microstep: 949.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939
[2024-06-10 06:48:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.82 | bwd_microstep: 1691.74 | bwd_inner_microstep: 1691.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3894
[2024-06-10 06:48:50,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.99 | bwd_microstep: 1584.93 | bwd_inner_microstep: 1584.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3783
[2024-06-10 06:48:52,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.34 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 06:48:53,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.37 | bwd_microstep: 804.75 | bwd_inner_microstep: 804.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 760
[2024-06-10 06:48:54,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.72 | bwd_microstep: 303.79 | bwd_inner_microstep: 303.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 06:48:55,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.45 | bwd_microstep: 797.85 | bwd_inner_microstep: 797.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 06:48:57,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1248.63 | bwd_inner_microstep: 1248.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734
[2024-06-10 06:48:59,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.59 | bwd_microstep: 1535.59 | bwd_inner_microstep: 1535.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-10 06:49:01,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1421.19 | bwd_inner_microstep: 1421.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3715
[2024-06-10 06:49:03,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.23 | bwd_microstep: 1666.86 | bwd_inner_microstep: 1666.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 06:49:05,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1374.30 | bwd_inner_microstep: 1374.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3378
[2024-06-10 06:49:07,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1487.20 | bwd_inner_microstep: 1487.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3698
[2024-06-10 06:49:09,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.13 | bwd_microstep: 1725.80 | bwd_inner_microstep: 1725.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 06:49:11,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1527.01 | bwd_inner_microstep: 1526.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 06:49:14,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.60 | bwd_microstep: 1614.15 | bwd_inner_microstep: 1614.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 06:49:15,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1387.00 | bwd_inner_microstep: 1386.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 06:49:18,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1513.38 | bwd_inner_microstep: 1513.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1269
[2024-06-10 06:49:18,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 174.80 | bwd_microstep: 430.28 | bwd_inner_microstep: 430.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3630
[2024-06-10 06:49:20,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.84 | bwd_microstep: 1479.90 | bwd_inner_microstep: 1479.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3474
[2024-06-10 06:49:22,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.72 | bwd_microstep: 1343.67 | bwd_inner_microstep: 1343.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-10 06:49:24,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.98 | bwd_microstep: 1637.42 | bwd_inner_microstep: 1637.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3609
[2024-06-10 06:49:26,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.50 | bwd_microstep: 1462.88 | bwd_inner_microstep: 1462.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 06:49:28,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1528.43 | bwd_inner_microstep: 1528.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933
[2024-06-10 06:49:31,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.31 | bwd_microstep: 1602.94 | bwd_inner_microstep: 1602.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467
[2024-06-10 06:49:32,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.92 | bwd_microstep: 1185.08 | bwd_inner_microstep: 1185.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083
[2024-06-10 06:49:34,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.26 | bwd_microstep: 918.22 | bwd_inner_microstep: 918.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3817
[2024-06-10 06:49:36,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.48 | bwd_microstep: 1419.37 | bwd_inner_microstep: 1419.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 06:49:37,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.41 | bwd_microstep: 974.77 | bwd_inner_microstep: 974.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 06:49:39,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1509.97 | bwd_inner_microstep: 1509.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424
[2024-06-10 06:49:43,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 06:49:43,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 3335.73 | bwd_inner_microstep: 1754.65 | bwd_allreduce_microstep: 1581.03 | step_microstep: 38.61
[2024-06-10 06:49:43,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15788.01 | bwd: 44047.12 | bwd_inner: 42465.13 | bwd_allreduce: 1581.27 | step: 40.31
{'loss': 1.2431, 'learning_rate': 3.685268697515928e-05, 'epoch': 0.21}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3379
[2024-06-10 06:49:45,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.89 | bwd_microstep: 1263.76 | bwd_inner_microstep: 1263.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4000
[2024-06-10 06:49:47,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.07 | bwd_microstep: 1609.73 | bwd_inner_microstep: 1609.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4342
[2024-06-10 06:49:49,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.64 | bwd_microstep: 1602.82 | bwd_inner_microstep: 1602.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 06:49:50,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.04 | bwd_microstep: 795.75 | bwd_inner_microstep: 795.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 06:49:52,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.08 | bwd_microstep: 1515.79 | bwd_inner_microstep: 1515.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 06:49:54,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.61 | bwd_microstep: 1387.41 | bwd_inner_microstep: 1387.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 06:49:56,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.35 | bwd_microstep: 1252.33 | bwd_inner_microstep: 1252.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729
[2024-06-10 06:49:58,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.25 | bwd_microstep: 1436.51 | bwd_inner_microstep: 1436.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 06:50:00,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1251.14 | bwd_inner_microstep: 1251.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 946
[2024-06-10 06:50:00,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 149.12 | bwd_microstep: 378.91 | bwd_inner_microstep: 378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 873
[2024-06-10 06:50:01,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 152.51 | bwd_microstep: 399.25 | bwd_inner_microstep: 399.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 06:50:03,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.91 | bwd_microstep: 1341.76 | bwd_inner_microstep: 1341.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 06:50:05,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.95 | bwd_microstep: 1342.51 | bwd_inner_microstep: 1342.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 06:50:06,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660
[2024-06-10 06:50:09,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.68 | bwd_microstep: 1724.28 | bwd_inner_microstep: 1724.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 06:50:11,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1646.11 | bwd_inner_microstep: 1646.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 06:50:13,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.53 | bwd_microstep: 1490.19 | bwd_inner_microstep: 1490.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2102
[2024-06-10 06:50:14,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.26 | bwd_microstep: 823.13 | bwd_inner_microstep: 823.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 06:50:16,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1296.11 | bwd_inner_microstep: 1296.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 06:50:18,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1285.94 | bwd_inner_microstep: 1285.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 06:50:19,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.48 | bwd_microstep: 803.48 | bwd_inner_microstep: 803.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3614
[2024-06-10 06:50:21,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1314.01 | bwd_inner_microstep: 1313.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 06:50:22,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.48 | bwd_microstep: 1163.78 | bwd_inner_microstep: 1163.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441
[2024-06-10 06:50:24,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.25 | bwd_microstep: 1157.47 | bwd_inner_microstep: 1157.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3488
[2024-06-10 06:50:26,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.91 | bwd_microstep: 1220.80 | bwd_inner_microstep: 1220.52 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.15
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3556
[2024-06-10 06:50:27,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.89 | bwd_microstep: 1345.78 | bwd_inner_microstep: 1345.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3743
[2024-06-10 06:50:30,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.54 | bwd_microstep: 1738.50 | bwd_inner_microstep: 1738.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 06:50:32,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.65 | bwd_microstep: 1606.70 | bwd_inner_microstep: 1606.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3797
[2024-06-10 06:50:34,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.88 | bwd_microstep: 1684.91 | bwd_inner_microstep: 1684.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 06:50:36,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1403.88 | bwd_inner_microstep: 1403.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3597
[2024-06-10 06:50:39,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.94 | bwd_microstep: 1701.06 | bwd_inner_microstep: 1701.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240
[2024-06-10 06:50:43,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 06:50:43,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.26 | bwd_microstep: 4065.07 | bwd_inner_microstep: 1096.55 | bwd_allreduce_microstep: 2968.46 | step_microstep: 38.76
[2024-06-10 06:50:43,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15446.46 | bwd: 44295.16 | bwd_inner: 41325.60 | bwd_allreduce: 2968.79 | step: 40.55
{'loss': 1.2607, 'learning_rate': 3.683244578985763e-05, 'epoch': 0.21}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3470
[2024-06-10 06:50:45,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.28 | bwd_microstep: 1493.83 | bwd_inner_microstep: 1493.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 06:50:47,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.84 | bwd_microstep: 1282.25 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4268
[2024-06-10 06:50:49,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.85 | bwd_microstep: 1667.64 | bwd_inner_microstep: 1667.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 06:50:51,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1343.64 | bwd_inner_microstep: 1343.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 06:50:53,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.69 | bwd_microstep: 1283.70 | bwd_inner_microstep: 1283.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 06:50:55,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.22 | bwd_microstep: 1352.70 | bwd_inner_microstep: 1352.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442
[2024-06-10 06:50:56,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.03 | bwd_microstep: 1158.74 | bwd_inner_microstep: 1158.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 06:50:58,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.87 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 06:51:00,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.23 | bwd_microstep: 1388.76 | bwd_inner_microstep: 1388.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1900
[2024-06-10 06:51:01,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.67 | bwd_microstep: 748.81 | bwd_inner_microstep: 748.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960
[2024-06-10 06:51:02,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.54 | bwd_microstep: 832.15 | bwd_inner_microstep: 832.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2318
[2024-06-10 06:51:04,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.06 | bwd_microstep: 890.78 | bwd_inner_microstep: 890.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3681
[2024-06-10 06:51:06,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.59 | bwd_microstep: 1723.30 | bwd_inner_microstep: 1723.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3881
[2024-06-10 06:51:08,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.68 | bwd_microstep: 1749.96 | bwd_inner_microstep: 1749.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 06:51:11,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.59 | bwd_microstep: 1620.48 | bwd_inner_microstep: 1620.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 06:51:12,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1383.37 | bwd_inner_microstep: 1383.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2028
[2024-06-10 06:51:14,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.33 | bwd_microstep: 904.35 | bwd_inner_microstep: 904.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 06:51:16,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1409.67 | bwd_inner_microstep: 1409.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 06:51:18,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.74 | bwd_microstep: 1401.98 | bwd_inner_microstep: 1401.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 06:51:20,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1555.55 | bwd_inner_microstep: 1555.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3517
[2024-06-10 06:51:22,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.45 | bwd_microstep: 1555.97 | bwd_inner_microstep: 1555.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830
[2024-06-10 06:51:24,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1366.56 | bwd_inner_microstep: 1366.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3049
[2024-06-10 06:51:25,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.73 | bwd_microstep: 1139.11 | bwd_inner_microstep: 1139.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822
[2024-06-10 06:51:27,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1511.90 | bwd_inner_microstep: 1511.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2110
[2024-06-10 06:51:29,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.69 | bwd_microstep: 924.22 | bwd_inner_microstep: 924.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 06:51:31,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.42 | bwd_microstep: 1660.99 | bwd_inner_microstep: 1660.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2399
[2024-06-10 06:51:32,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.76 | bwd_microstep: 1001.43 | bwd_inner_microstep: 1001.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3529
[2024-06-10 06:51:34,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1354.42 | bwd_inner_microstep: 1354.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546
[2024-06-10 06:51:36,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.99 | bwd_microstep: 1330.15 | bwd_inner_microstep: 1330.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3672
[2024-06-10 06:51:38,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.47 | bwd_microstep: 1482.42 | bwd_inner_microstep: 1482.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3532
[2024-06-10 06:51:41,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.92 | bwd_microstep: 1688.39 | bwd_inner_microstep: 1688.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 06:51:47,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.33 | optimizer_step: 6.58
[2024-06-10 06:51:47,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 5780.62 | bwd_inner_microstep: 1518.75 | bwd_allreduce_microstep: 4261.80 | step_microstep: 38.96
[2024-06-10 06:51:47,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16071.53 | bwd: 47373.67 | bwd_inner: 43110.94 | bwd_allreduce: 4262.04 | step: 40.57


 20%|██        | 351/1726 [6:08:12<23:29:41, 61.51s/it]
 20%|██        | 352/1726 [6:09:15<23:36:48, 61.87s/it]
 20%|██        | 353/1726 [6:10:18<23:44:07, 62.23s/it]
 21%|██        | 354/1726 [6:11:19<23:37:53, 62.01s/it]
 21%|██        | 355/1726 [6:12:20<23:24:30, 61.47s/it]
 21%|██        | 356/1726 [6:13:20<23:14:03, 61.05s/it]
 21%|██        | 357/1726 [6:14
{'loss': 1.2993, 'learning_rate': 3.6812145320769415e-05, 'epoch': 0.21}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 06:51:49,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.31 | bwd_microstep: 1471.17 | bwd_inner_microstep: 1471.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925
[2024-06-10 06:51:50,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.16 | bwd_microstep: 697.40 | bwd_inner_microstep: 697.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3866
[2024-06-10 06:51:52,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.00 | bwd_microstep: 1301.33 | bwd_inner_microstep: 1301.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2343
[2024-06-10 06:51:53,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.74 | bwd_microstep: 923.76 | bwd_inner_microstep: 923.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 06:51:55,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1379.79 | bwd_inner_microstep: 1379.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 06:51:57,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.92 | bwd_microstep: 1543.40 | bwd_inner_microstep: 1543.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2475
[2024-06-10 06:51:58,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.99 | bwd_microstep: 1051.48 | bwd_inner_microstep: 1051.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 06:52:00,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1249.37 | bwd_inner_microstep: 1249.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 06:52:02,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.81 | bwd_microstep: 1399.12 | bwd_inner_microstep: 1399.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 06:52:04,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1249.32 | bwd_inner_microstep: 1249.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 06:52:05,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.37 | bwd_microstep: 806.01 | bwd_inner_microstep: 805.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-10 06:52:07,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.64 | bwd_microstep: 1448.92 | bwd_inner_microstep: 1448.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3514
[2024-06-10 06:52:09,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.21 | bwd_microstep: 1337.60 | bwd_inner_microstep: 1337.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 06:52:11,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.61 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 06:52:13,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.83 | bwd_microstep: 1312.36 | bwd_inner_microstep: 1312.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3686
[2024-06-10 06:52:15,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.19 | bwd_microstep: 1785.26 | bwd_inner_microstep: 1785.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3446
[2024-06-10 06:52:17,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.11 | bwd_microstep: 1305.27 | bwd_inner_microstep: 1305.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1967
[2024-06-10 06:52:18,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.97 | bwd_microstep: 704.72 | bwd_inner_microstep: 704.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3952
[2024-06-10 06:52:20,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.64 | bwd_microstep: 1700.70 | bwd_inner_microstep: 1700.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 06:52:22,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.31 | bwd_microstep: 1616.39 | bwd_inner_microstep: 1616.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2290
[2024-06-10 06:52:24,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.07 | bwd_microstep: 915.33 | bwd_inner_microstep: 915.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076
[2024-06-10 06:52:25,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.44 | bwd_microstep: 916.73 | bwd_inner_microstep: 916.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 06:52:27,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.84 | bwd_microstep: 1661.25 | bwd_inner_microstep: 1661.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 06:52:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.67 | bwd_microstep: 1399.78 | bwd_inner_microstep: 1399.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 06:52:31,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1292.08 | bwd_inner_microstep: 1292.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3552
[2024-06-10 06:52:33,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.24 | bwd_microstep: 1204.69 | bwd_inner_microstep: 1204.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 06:52:34,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.97 | bwd_microstep: 1301.78 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3568
[2024-06-10 06:52:37,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.67 | bwd_microstep: 1631.75 | bwd_inner_microstep: 1631.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596
[2024-06-10 06:52:39,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.79 | bwd_microstep: 1466.89 | bwd_inner_microstep: 1466.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-10 06:52:41,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.07 | bwd_microstep: 1506.64 | bwd_inner_microstep: 1506.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-10 06:52:43,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.89 | bwd_microstep: 1637.34 | bwd_inner_microstep: 1637.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2378
[2024-06-10 06:52:46,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 06:52:46,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.45 | bwd_microstep: 2048.73 | bwd_inner_microstep: 1056.14 | bwd_allreduce_microstep: 992.53 | step_microstep: 38.60
[2024-06-10 06:52:46,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15600.75 | bwd: 42751.17 | bwd_inner: 41757.73 | bwd_allreduce: 992.76 | step: 40.18
{'loss': 1.2862, 'learning_rate': 3.679178563939278e-05, 'epoch': 0.21}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 06:52:47,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.76 | bwd_microstep: 1277.38 | bwd_inner_microstep: 1277.20 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 06:52:49,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1349.88 | bwd_inner_microstep: 1349.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2321
[2024-06-10 06:52:51,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.48 | bwd_microstep: 984.58 | bwd_inner_microstep: 984.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 06:52:52,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1394.00 | bwd_inner_microstep: 1393.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4103
[2024-06-10 06:52:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.25 | bwd_microstep: 1734.29 | bwd_inner_microstep: 1734.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 06:52:57,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.44 | bwd_microstep: 1179.74 | bwd_inner_microstep: 1179.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 06:52:58,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1390.06 | bwd_inner_microstep: 1390.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423
[2024-06-10 06:53:00,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.42 | bwd_microstep: 1155.47 | bwd_inner_microstep: 1155.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 06:53:02,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.29 | bwd_microstep: 1406.86 | bwd_inner_microstep: 1406.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-10 06:53:04,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.20 | bwd_microstep: 1420.21 | bwd_inner_microstep: 1420.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728
[2024-06-10 06:53:06,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.87 | bwd_microstep: 1625.40 | bwd_inner_microstep: 1625.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 06:53:08,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.74 | bwd_microstep: 1317.08 | bwd_inner_microstep: 1317.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-10 06:53:10,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.64 | bwd_microstep: 1592.88 | bwd_inner_microstep: 1592.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3665
[2024-06-10 06:53:12,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.00 | bwd_microstep: 1653.24 | bwd_inner_microstep: 1653.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 06:53:14,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.21 | bwd_microstep: 1389.18 | bwd_inner_microstep: 1389.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1952
[2024-06-10 06:53:15,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.70 | bwd_microstep: 699.20 | bwd_inner_microstep: 699.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 06:53:17,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.00 | bwd_microstep: 1488.38 | bwd_inner_microstep: 1488.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3628
[2024-06-10 06:53:20,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.07 | bwd_microstep: 1539.61 | bwd_inner_microstep: 1539.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 06:53:22,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1497.77 | bwd_inner_microstep: 1497.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 06:53:23,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.71 | bwd_microstep: 1281.88 | bwd_inner_microstep: 1281.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 06:53:25,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.47 | bwd_microstep: 1261.15 | bwd_inner_microstep: 1261.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-10 06:53:27,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.76 | bwd_microstep: 1188.63 | bwd_inner_microstep: 1188.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 06:53:29,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1403.10 | bwd_inner_microstep: 1403.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 06:53:31,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 06:53:33,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.87 | bwd_microstep: 1553.27 | bwd_inner_microstep: 1553.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 06:53:35,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.31 | bwd_microstep: 1503.23 | bwd_inner_microstep: 1503.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2878
[2024-06-10 06:53:37,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.83 | bwd_microstep: 1185.14 | bwd_inner_microstep: 1185.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 06:53:38,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.71 | bwd_microstep: 730.05 | bwd_inner_microstep: 730.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818
[2024-06-10 06:53:40,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.81 | bwd_microstep: 1623.05 | bwd_inner_microstep: 1623.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 06:53:42,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1347.66 | bwd_inner_microstep: 1347.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774
[2024-06-10 06:53:44,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.10 | bwd_microstep: 1741.94 | bwd_inner_microstep: 1741.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-10 06:53:47,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.63
[2024-06-10 06:53:47,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.93 | bwd_microstep: 1658.80 | bwd_inner_microstep: 1650.75 | bwd_allreduce_microstep: 8.00 | step_microstep: 38.08
[2024-06-10 06:53:47,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16495.37 | bwd: 44129.46 | bwd_inner: 44120.41 | bwd_allreduce: 8.30 | step: 39.74
{'loss': 1.2611, 'learning_rate': 3.6771366817434416e-05, 'epoch': 0.21}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3471
[2024-06-10 06:53:49,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 1576.14 | bwd_inner_microstep: 1576.06 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 06:53:51,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1407.34 | bwd_inner_microstep: 1407.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 06:53:53,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1351.18 | bwd_inner_microstep: 1351.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3863
[2024-06-10 06:53:55,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1564.31 | bwd_inner_microstep: 1564.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 06:53:56,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.20 | bwd_microstep: 1191.42 | bwd_inner_microstep: 1191.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 06:53:58,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.10 | bwd_microstep: 1544.16 | bwd_inner_microstep: 1544.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 06:54:00,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 1359.39 | bwd_inner_microstep: 1359.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 06:54:02,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1530.35 | bwd_inner_microstep: 1530.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436
[2024-06-10 06:54:04,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.65 | bwd_microstep: 1380.47 | bwd_inner_microstep: 1380.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1967
[2024-06-10 06:54:06,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.76 | bwd_microstep: 861.45 | bwd_inner_microstep: 861.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3489
[2024-06-10 06:54:08,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.74 | bwd_microstep: 1615.55 | bwd_inner_microstep: 1615.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 06:54:10,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1514.79 | bwd_inner_microstep: 1514.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500
[2024-06-10 06:54:12,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.00 | bwd_microstep: 1585.81 | bwd_inner_microstep: 1585.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3985
[2024-06-10 06:54:14,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1540.83 | bwd_inner_microstep: 1540.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399
[2024-06-10 06:54:16,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.94 | bwd_microstep: 1369.73 | bwd_inner_microstep: 1369.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3531
[2024-06-10 06:54:18,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.90 | bwd_microstep: 1559.82 | bwd_inner_microstep: 1559.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 06:54:20,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.11 | bwd_microstep: 1349.37 | bwd_inner_microstep: 1349.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3649
[2024-06-10 06:54:22,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.94 | bwd_microstep: 1550.15 | bwd_inner_microstep: 1550.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-10 06:54:23,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.53 | bwd_microstep: 801.26 | bwd_inner_microstep: 801.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 06:54:25,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.72 | bwd_microstep: 1404.75 | bwd_inner_microstep: 1404.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 06:54:27,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.42 | bwd_microstep: 1408.12 | bwd_inner_microstep: 1408.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3553
[2024-06-10 06:54:29,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.08 | bwd_microstep: 1233.97 | bwd_inner_microstep: 1233.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294
[2024-06-10 06:54:30,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.76 | bwd_microstep: 980.75 | bwd_inner_microstep: 980.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 06:54:32,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.76 | bwd_microstep: 1453.46 | bwd_inner_microstep: 1453.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 06:54:34,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.78 | bwd_microstep: 1502.90 | bwd_inner_microstep: 1502.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604
[2024-06-10 06:54:36,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.48 | bwd_microstep: 1443.86 | bwd_inner_microstep: 1443.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3307
[2024-06-10 06:54:38,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.88 | bwd_microstep: 1325.05 | bwd_inner_microstep: 1325.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2260
[2024-06-10 06:54:40,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.03 | bwd_microstep: 1067.81 | bwd_inner_microstep: 1067.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 06:54:42,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1394.53 | bwd_inner_microstep: 1394.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 06:54:44,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.96 | bwd_microstep: 1598.14 | bwd_inner_microstep: 1598.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 06:54:46,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1404.62 | bwd_inner_microstep: 1404.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-10 06:54:48,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.18 | optimizer_step: 6.64
[2024-06-10 06:54:48,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.89 | bwd_microstep: 1502.33 | bwd_inner_microstep: 1494.65 | bwd_allreduce_microstep: 7.64 | step_microstep: 38.40
[2024-06-10 06:54:48,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16559.26 | bwd: 44373.85 | bwd_inner: 44365.23 | bwd_allreduce: 7.91 | step: 40.15
{'loss': 1.2731, 'learning_rate': 3.67508889268093e-05, 'epoch': 0.21}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400
[2024-06-10 06:54:50,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.96 | bwd_microstep: 1369.90 | bwd_inner_microstep: 1369.79 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3537
[2024-06-10 06:54:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.46 | bwd_microstep: 1345.43 | bwd_inner_microstep: 1345.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3906
[2024-06-10 06:54:54,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.79 | bwd_microstep: 1452.94 | bwd_inner_microstep: 1452.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 06:54:55,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1315.49 | bwd_inner_microstep: 1315.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798
[2024-06-10 06:54:58,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1550.30 | bwd_inner_microstep: 1550.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796
[2024-06-10 06:55:00,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.07 | bwd_microstep: 1455.53 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 06:55:01,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.93 | bwd_microstep: 1251.65 | bwd_inner_microstep: 1251.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1979
[2024-06-10 06:55:02,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.95 | bwd_microstep: 707.21 | bwd_inner_microstep: 707.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 06:55:04,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.55 | bwd_microstep: 1536.48 | bwd_inner_microstep: 1536.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-10 06:55:05,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.42 | bwd_microstep: 699.76 | bwd_inner_microstep: 699.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 06:55:07,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 06:55:09,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.09 | bwd_microstep: 1392.19 | bwd_inner_microstep: 1392.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 06:55:11,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1375.70 | bwd_inner_microstep: 1375.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3674
[2024-06-10 06:55:13,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.63 | bwd_microstep: 1552.90 | bwd_inner_microstep: 1552.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 06:55:15,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.84 | bwd_microstep: 1474.85 | bwd_inner_microstep: 1474.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2145
[2024-06-10 06:55:17,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.43 | bwd_microstep: 1042.12 | bwd_inner_microstep: 1042.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3449
[2024-06-10 06:55:19,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.61 | bwd_microstep: 1548.17 | bwd_inner_microstep: 1548.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 06:55:21,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1390.37 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 06:55:23,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1438.24 | bwd_inner_microstep: 1438.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822
[2024-06-10 06:55:25,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.95 | bwd_microstep: 1483.36 | bwd_inner_microstep: 1483.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 06:55:27,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.39 | bwd_microstep: 1404.86 | bwd_inner_microstep: 1404.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3461
[2024-06-10 06:55:29,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.08 | bwd_microstep: 1436.31 | bwd_inner_microstep: 1436.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3617
[2024-06-10 06:55:31,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1472.36 | bwd_inner_microstep: 1472.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 06:55:33,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1559.77 | bwd_inner_microstep: 1559.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 06:55:35,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.86 | bwd_microstep: 1392.93 | bwd_inner_microstep: 1392.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3552
[2024-06-10 06:55:37,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.40 | bwd_microstep: 1443.33 | bwd_inner_microstep: 1443.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442
[2024-06-10 06:55:39,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.34 | bwd_microstep: 1452.75 | bwd_inner_microstep: 1452.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 06:55:41,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.55 | bwd_microstep: 1648.48 | bwd_inner_microstep: 1648.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818
[2024-06-10 06:55:43,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1391.98 | bwd_inner_microstep: 1391.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964
[2024-06-10 06:55:44,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.15 | bwd_microstep: 704.42 | bwd_inner_microstep: 704.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3694
[2024-06-10 06:55:46,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1334.88 | bwd_inner_microstep: 1334.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 06:55:50,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 06:55:50,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.68 | bwd_microstep: 3520.31 | bwd_inner_microstep: 1439.67 | bwd_allreduce_microstep: 2080.59 | step_microstep: 38.72
[2024-06-10 06:55:50,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16193.20 | bwd: 45395.58 | bwd_inner: 43313.97 | bwd_allreduce: 2080.88 | step: 40.52
{'loss': 1.2967, 'learning_rate': 3.6730352039640476e-05, 'epoch': 0.21}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 06:55:51,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.39 | bwd_microstep: 1240.72 | bwd_inner_microstep: 1240.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 06:55:53,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.50 | bwd_microstep: 1244.63 | bwd_inner_microstep: 1244.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 06:55:54,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.99 | bwd_microstep: 786.44 | bwd_inner_microstep: 786.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 06:55:56,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.16 | bwd_microstep: 1275.44 | bwd_inner_microstep: 1275.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 06:55:58,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1273.49 | bwd_inner_microstep: 1273.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 06:56:00,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.02 | bwd_microstep: 1639.35 | bwd_inner_microstep: 1639.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 06:56:02,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.26 | bwd_microstep: 1628.93 | bwd_inner_microstep: 1628.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 06:56:04,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1388.19 | bwd_inner_microstep: 1388.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421
[2024-06-10 06:56:06,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.91 | bwd_microstep: 1152.80 | bwd_inner_microstep: 1152.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 06:56:08,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1412.56 | bwd_inner_microstep: 1412.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-10 06:56:09,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.66 | bwd_microstep: 686.91 | bwd_inner_microstep: 686.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 06:56:10,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.47 | bwd_microstep: 792.53 | bwd_inner_microstep: 792.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656
[2024-06-10 06:56:12,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1510.68 | bwd_inner_microstep: 1510.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-10 06:56:14,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1415.73 | bwd_inner_microstep: 1415.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3508
[2024-06-10 06:56:16,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.85 | bwd_microstep: 1686.01 | bwd_inner_microstep: 1685.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3511
[2024-06-10 06:56:18,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.36 | bwd_microstep: 1513.51 | bwd_inner_microstep: 1513.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3498
[2024-06-10 06:56:20,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.47 | bwd_microstep: 1457.14 | bwd_inner_microstep: 1457.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 06:56:22,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.44 | bwd_microstep: 1281.48 | bwd_inner_microstep: 1281.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1969
[2024-06-10 06:56:23,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.99 | bwd_microstep: 857.97 | bwd_inner_microstep: 857.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 06:56:26,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.82 | bwd_microstep: 1623.66 | bwd_inner_microstep: 1623.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503
[2024-06-10 06:56:27,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.91 | bwd_microstep: 1424.33 | bwd_inner_microstep: 1424.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 06:56:30,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.89 | bwd_microstep: 1660.61 | bwd_inner_microstep: 1660.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688
[2024-06-10 06:56:32,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.83 | bwd_microstep: 1431.00 | bwd_inner_microstep: 1430.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1359
[2024-06-10 06:56:32,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.59 | bwd_microstep: 519.21 | bwd_inner_microstep: 519.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3700
[2024-06-10 06:56:35,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.19 | bwd_microstep: 1577.62 | bwd_inner_microstep: 1577.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 06:56:36,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1254.66 | bwd_inner_microstep: 1254.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3675
[2024-06-10 06:56:39,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.62 | bwd_microstep: 1571.67 | bwd_inner_microstep: 1571.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3535
[2024-06-10 06:56:41,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.26 | bwd_microstep: 1591.23 | bwd_inner_microstep: 1591.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 06:56:43,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1546.35 | bwd_inner_microstep: 1546.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2964
[2024-06-10 06:56:45,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.23 | bwd_microstep: 1202.70 | bwd_inner_microstep: 1202.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 06:56:46,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.38 | bwd_microstep: 789.94 | bwd_inner_microstep: 789.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3887
[2024-06-10 06:56:51,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 06:56:51,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.73 | bwd_microstep: 4607.93 | bwd_inner_microstep: 1785.31 | bwd_allreduce_microstep: 2822.57 | step_microstep: 38.68
[2024-06-10 06:56:51,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15713.54 | bwd: 45045.43 | bwd_inner: 42221.95 | bwd_allreduce: 2822.79 | step: 40.45
{'loss': 1.2592, 'learning_rate': 3.6709756228258735e-05, 'epoch': 0.21}
 21%|██        | 357/1726 [6:14:24<23:31:48, 61.88s/it]
 21%|██        | 358/1726 [6:15:22<23:09:00, 60.92s/it]
 21%|██        | 359/1726 [6:16:23<23:08:21, 60.94s/it]
 21%|██        | 360/1726 [6:17:25<23:09:49, 61.05s/it]
 21%|██        | 361/1726 [6:18:26<23:14:55, 61.32s/it]
 21%|██        | 362/1726 [6:19:28<23:12:30, 61.25s/it]
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-10 06:56:53,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.41 | bwd_microstep: 1425.01 | bwd_inner_microstep: 1424.90 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 06:56:55,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.44 | bwd_microstep: 1347.33 | bwd_inner_microstep: 1347.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 06:56:57,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.79 | bwd_microstep: 1546.70 | bwd_inner_microstep: 1546.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 06:56:59,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.89 | bwd_microstep: 1247.40 | bwd_inner_microstep: 1247.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1930
[2024-06-10 06:57:00,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.44 | bwd_microstep: 760.58 | bwd_inner_microstep: 760.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 06:57:02,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.47 | bwd_microstep: 1633.67 | bwd_inner_microstep: 1633.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3415
[2024-06-10 06:57:04,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1185.74 | bwd_inner_microstep: 1185.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 06:57:05,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1386.73 | bwd_inner_microstep: 1386.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 06:57:08,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.29 | bwd_microstep: 1524.81 | bwd_inner_microstep: 1524.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 06:57:09,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.77 | bwd_microstep: 810.61 | bwd_inner_microstep: 810.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3487
[2024-06-10 06:57:11,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1395.06 | bwd_inner_microstep: 1395.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2610
[2024-06-10 06:57:12,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.80 | bwd_microstep: 1002.73 | bwd_inner_microstep: 1002.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 06:57:14,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1418.96 | bwd_inner_microstep: 1418.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 06:57:16,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.20 | bwd_microstep: 1626.86 | bwd_inner_microstep: 1626.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3451
[2024-06-10 06:57:18,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.54 | bwd_microstep: 1620.54 | bwd_inner_microstep: 1620.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 06:57:21,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.33 | bwd_microstep: 1593.35 | bwd_inner_microstep: 1593.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 06:57:23,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1385.01 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 06:57:24,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.79 | bwd_microstep: 1254.97 | bwd_inner_microstep: 1254.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 06:57:26,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.39 | bwd_microstep: 1252.53 | bwd_inner_microstep: 1252.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1972
[2024-06-10 06:57:27,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.87 | bwd_microstep: 861.40 | bwd_inner_microstep: 861.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 06:57:29,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1251.35 | bwd_inner_microstep: 1251.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 06:57:31,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.09 | bwd_microstep: 1183.61 | bwd_inner_microstep: 1183.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3673
[2024-06-10 06:57:32,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1230.82 | bwd_inner_microstep: 1230.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 06:57:34,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1505.47 | bwd_inner_microstep: 1505.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 06:57:37,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.12 | bwd_microstep: 1558.30 | bwd_inner_microstep: 1558.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3550
[2024-06-10 06:57:38,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.13 | bwd_microstep: 1328.11 | bwd_inner_microstep: 1328.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3581
[2024-06-10 06:57:40,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1365.67 | bwd_inner_microstep: 1365.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 06:57:42,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.09 | bwd_microstep: 1256.37 | bwd_inner_microstep: 1256.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 06:57:44,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.06 | bwd_microstep: 1312.84 | bwd_inner_microstep: 1312.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2981
[2024-06-10 06:57:45,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.59 | bwd_microstep: 1141.81 | bwd_inner_microstep: 1141.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3762
[2024-06-10 06:57:48,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.88 | bwd_microstep: 1601.05 | bwd_inner_microstep: 1601.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3441
[2024-06-10 06:57:52,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.25 | optimizer_step: 6.57
[2024-06-10 06:57:52,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.20 | bwd_microstep: 3950.91 | bwd_inner_microstep: 1568.55 | bwd_allreduce_microstep: 2382.31 | step_microstep: 38.80
[2024-06-10 06:57:52,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15924.10 | bwd: 44966.33 | bwd_inner: 42583.00 | bwd_allreduce: 2382.61 | step: 40.47
{'loss': 1.3242, 'learning_rate': 3.6689101565202416e-05, 'epoch': 0.21}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2039
[2024-06-10 06:57:53,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.41 | bwd_microstep: 900.26 | bwd_inner_microstep: 900.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 06:57:55,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.81 | bwd_microstep: 1414.32 | bwd_inner_microstep: 1414.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 06:57:57,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1354.13 | bwd_inner_microstep: 1354.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3481
[2024-06-10 06:57:59,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.15 | bwd_microstep: 1348.65 | bwd_inner_microstep: 1348.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 06:58:01,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.94 | bwd_microstep: 1516.60 | bwd_inner_microstep: 1516.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 06:58:03,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.67 | bwd_microstep: 1188.67 | bwd_inner_microstep: 1188.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 06:58:05,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1286.77 | bwd_inner_microstep: 1286.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870
[2024-06-10 06:58:06,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.08 | bwd_microstep: 680.42 | bwd_inner_microstep: 680.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 06:58:08,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.95 | bwd_microstep: 1536.72 | bwd_inner_microstep: 1536.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 06:58:09,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.74 | bwd_microstep: 684.78 | bwd_inner_microstep: 684.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 06:58:11,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.95 | bwd_microstep: 1392.84 | bwd_inner_microstep: 1392.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3703
[2024-06-10 06:58:13,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.34 | bwd_microstep: 1553.04 | bwd_inner_microstep: 1553.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692
[2024-06-10 06:58:15,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.38 | bwd_microstep: 1460.51 | bwd_inner_microstep: 1460.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3467
[2024-06-10 06:58:17,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.78 | bwd_microstep: 1572.51 | bwd_inner_microstep: 1572.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686
[2024-06-10 06:58:19,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.39 | bwd_microstep: 1726.61 | bwd_inner_microstep: 1726.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 06:58:21,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1378.47 | bwd_inner_microstep: 1378.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 06:58:23,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.08 | bwd_microstep: 1497.23 | bwd_inner_microstep: 1497.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2755
[2024-06-10 06:58:25,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.56 | bwd_microstep: 1173.00 | bwd_inner_microstep: 1172.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 06:58:26,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.50 | bwd_microstep: 881.47 | bwd_inner_microstep: 881.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 06:58:28,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1284.23 | bwd_inner_microstep: 1284.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 06:58:29,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.19 | bwd_microstep: 975.38 | bwd_inner_microstep: 975.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2701
[2024-06-10 06:58:31,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.60 | bwd_microstep: 1129.74 | bwd_inner_microstep: 1129.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3153
[2024-06-10 06:58:32,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.62 | bwd_microstep: 1255.92 | bwd_inner_microstep: 1255.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3805
[2024-06-10 06:58:35,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1516.51 | bwd_inner_microstep: 1516.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2075
[2024-06-10 06:58:36,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.45 | bwd_microstep: 1012.00 | bwd_inner_microstep: 1011.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3535
[2024-06-10 06:58:38,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.19 | bwd_microstep: 1688.68 | bwd_inner_microstep: 1688.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 06:58:40,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.44 | bwd_microstep: 1379.51 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 06:58:42,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.31 | bwd_microstep: 1284.84 | bwd_inner_microstep: 1284.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-10 06:58:43,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.30 | bwd_microstep: 976.19 | bwd_inner_microstep: 976.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 06:58:45,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1283.15 | bwd_inner_microstep: 1283.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076
[2024-06-10 06:58:46,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.00 | bwd_microstep: 822.43 | bwd_inner_microstep: 822.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2242
[2024-06-10 06:58:53,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 06:58:53,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.85 | bwd_microstep: 6635.06 | bwd_inner_microstep: 876.33 | bwd_allreduce_microstep: 5758.68 | step_microstep: 38.61
[2024-06-10 06:58:53,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14976.92 | bwd: 45790.66 | bwd_inner: 40030.96 | bwd_allreduce: 5758.96 | step: 40.30
{'loss': 1.2691, 'learning_rate': 3.6668388123217154e-05, 'epoch': 0.21}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484
[2024-06-10 06:58:55,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.17 | bwd_microstep: 1333.27 | bwd_inner_microstep: 1333.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3949
[2024-06-10 06:58:57,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.15 | bwd_microstep: 1699.32 | bwd_inner_microstep: 1699.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 06:58:59,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.49 | bwd_microstep: 1277.48 | bwd_inner_microstep: 1277.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482
[2024-06-10 06:59:01,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.11 | bwd_microstep: 1411.64 | bwd_inner_microstep: 1411.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-10 06:59:03,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.25 | bwd_microstep: 1438.37 | bwd_inner_microstep: 1438.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 06:59:05,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1385.21 | bwd_inner_microstep: 1385.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 06:59:07,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1395.30 | bwd_inner_microstep: 1395.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898
[2024-06-10 06:59:08,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.96 | bwd_microstep: 684.71 | bwd_inner_microstep: 684.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2197
[2024-06-10 06:59:09,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.46 | bwd_microstep: 829.26 | bwd_inner_microstep: 829.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948
[2024-06-10 06:59:10,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.67 | bwd_microstep: 827.35 | bwd_inner_microstep: 827.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1907
[2024-06-10 06:59:11,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.46 | bwd_microstep: 687.20 | bwd_inner_microstep: 687.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2934
[2024-06-10 06:59:13,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.81 | bwd_microstep: 1165.21 | bwd_inner_microstep: 1165.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1949
[2024-06-10 06:59:14,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.63 | bwd_microstep: 851.27 | bwd_inner_microstep: 851.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3637
[2024-06-10 06:59:16,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1461.98 | bwd_inner_microstep: 1461.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434
[2024-06-10 06:59:18,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1376.86 | bwd_inner_microstep: 1376.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3641
[2024-06-10 06:59:20,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.22 | bwd_microstep: 1707.97 | bwd_inner_microstep: 1707.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3513
[2024-06-10 06:59:23,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.97 | bwd_microstep: 1681.84 | bwd_inner_microstep: 1681.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3661
[2024-06-10 06:59:25,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.80 | bwd_microstep: 1589.84 | bwd_inner_microstep: 1589.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 06:59:27,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.18 | bwd_microstep: 1297.93 | bwd_inner_microstep: 1297.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 06:59:29,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1492.91 | bwd_inner_microstep: 1492.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 06:59:31,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1557.11 | bwd_inner_microstep: 1557.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3586
[2024-06-10 06:59:33,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1309.18 | bwd_inner_microstep: 1309.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3689
[2024-06-10 06:59:35,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.52 | bwd_microstep: 1433.16 | bwd_inner_microstep: 1433.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 06:59:36,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1375.14 | bwd_inner_microstep: 1375.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 06:59:38,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.51 | bwd_microstep: 1253.54 | bwd_inner_microstep: 1253.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 06:59:40,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.41 | bwd_microstep: 1301.78 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3426
[2024-06-10 06:59:42,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.15 | bwd_microstep: 1313.88 | bwd_inner_microstep: 1313.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-10 06:59:44,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.08 | bwd_microstep: 1759.96 | bwd_inner_microstep: 1759.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2054
[2024-06-10 06:59:46,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.27 | bwd_microstep: 946.37 | bwd_inner_microstep: 946.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 06:59:48,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.62 | bwd_microstep: 1544.96 | bwd_inner_microstep: 1544.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 06:59:50,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.07 | bwd_microstep: 1497.35 | bwd_inner_microstep: 1497.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 06:59:53,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 06:59:53,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 2879.88 | bwd_inner_microstep: 1567.99 | bwd_allreduce_microstep: 1311.83 | step_microstep: 38.55
[2024-06-10 06:59:53,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15841.34 | bwd: 43767.22 | bwd_inner: 42454.38 | bwd_allreduce: 1312.11 | step: 40.15
{'loss': 1.2989, 'learning_rate': 3.664761597525557e-05, 'epoch': 0.21}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 06:59:55,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.58 | bwd_microstep: 1467.46 | bwd_inner_microstep: 1467.39 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 06:59:57,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.02 | bwd_microstep: 1384.57 | bwd_inner_microstep: 1384.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 06:59:59,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.80 | bwd_microstep: 1480.65 | bwd_inner_microstep: 1480.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3585
[2024-06-10 07:00:01,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.03 | bwd_microstep: 1238.62 | bwd_inner_microstep: 1238.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 07:00:03,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1380.21 | bwd_inner_microstep: 1380.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 07:00:04,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.62 | bwd_microstep: 798.92 | bwd_inner_microstep: 798.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 07:00:05,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 791.63 | bwd_inner_microstep: 791.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3761
[2024-06-10 07:00:07,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.17 | bwd_microstep: 1341.79 | bwd_inner_microstep: 1341.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 07:00:09,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1343.85 | bwd_inner_microstep: 1343.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 07:00:11,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.75 | bwd_microstep: 1524.40 | bwd_inner_microstep: 1524.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 07:00:13,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.06 | bwd_microstep: 1402.40 | bwd_inner_microstep: 1402.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 07:00:15,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.50 | bwd_microstep: 1495.03 | bwd_inner_microstep: 1495.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 07:00:17,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1279.38 | bwd_inner_microstep: 1279.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492
[2024-06-10 07:00:19,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.24 | bwd_microstep: 1584.30 | bwd_inner_microstep: 1584.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-10 07:00:21,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.30 | bwd_microstep: 1427.98 | bwd_inner_microstep: 1427.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-10 07:00:22,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.05 | bwd_microstep: 804.59 | bwd_inner_microstep: 804.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3682
[2024-06-10 07:00:24,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.52 | bwd_microstep: 1553.07 | bwd_inner_microstep: 1553.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 07:00:26,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1396.41 | bwd_inner_microstep: 1396.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3640
[2024-06-10 07:00:28,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.83 | bwd_microstep: 1709.94 | bwd_inner_microstep: 1709.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 07:00:30,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1485.04 | bwd_inner_microstep: 1485.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 07:00:32,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1398.05 | bwd_inner_microstep: 1398.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 07:00:34,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.57 | bwd_microstep: 1278.15 | bwd_inner_microstep: 1278.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 07:00:36,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.77 | bwd_microstep: 1401.45 | bwd_inner_microstep: 1401.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 07:00:38,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.76 | bwd_microstep: 1500.96 | bwd_inner_microstep: 1500.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-10 07:00:40,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.65 | bwd_microstep: 1314.45 | bwd_inner_microstep: 1314.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2447
[2024-06-10 07:00:41,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.93 | bwd_microstep: 948.54 | bwd_inner_microstep: 948.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2248
[2024-06-10 07:00:43,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.77 | bwd_microstep: 969.58 | bwd_inner_microstep: 969.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 07:00:45,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1508.33 | bwd_inner_microstep: 1508.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2282
[2024-06-10 07:00:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.88 | bwd_microstep: 911.35 | bwd_inner_microstep: 911.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941
[2024-06-10 07:00:47,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.38 | bwd_microstep: 729.64 | bwd_inner_microstep: 729.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2064
[2024-06-10 07:00:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.65 | bwd_microstep: 846.38 | bwd_inner_microstep: 846.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3746
[2024-06-10 07:00:55,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 07:00:55,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.13 | bwd_microstep: 6186.55 | bwd_inner_microstep: 1600.69 | bwd_allreduce_microstep: 4585.81 | step_microstep: 38.56
[2024-06-10 07:00:55,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15459.64 | bwd: 45883.72 | bwd_inner: 41296.94 | bwd_allreduce: 4586.07 | step: 40.23
{'loss': 1.3355, 'learning_rate': 3.662678519447706e-05, 'epoch': 0.21}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 07:00:57,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1374.39 | bwd_inner_microstep: 1374.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4039
[2024-06-10 07:00:59,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.08 | bwd_microstep: 1713.58 | bwd_inner_microstep: 1713.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-10 07:01:01,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.50 | bwd_microstep: 1683.98 | bwd_inner_microstep: 1683.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3863
[2024-06-10 07:01:03,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.69 | bwd_microstep: 1368.18 | bwd_inner_microstep: 1368.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 07:01:06,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1555.61 | bwd_inner_microstep: 1555.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 07:01:07,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.58 | bwd_microstep: 793.72 | bwd_inner_microstep: 793.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 07:01:09,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.11 | bwd_microstep: 1651.39 | bwd_inner_microstep: 1651.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3234
[2024-06-10 07:01:11,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.23 | bwd_microstep: 1180.29 | bwd_inner_microstep: 1180.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3511
[2024-06-10 07:01:12,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.01 | bwd_microstep: 1225.85 | bwd_inner_microstep: 1225.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 07:01:13,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.64 | bwd_microstep: 800.78 | bwd_inner_microstep: 800.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 07:01:15,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.00 | bwd_microstep: 1289.51 | bwd_inner_microstep: 1289.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 07:01:17,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.96 | bwd_microstep: 1523.35 | bwd_inner_microstep: 1523.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3415
[2024-06-10 07:01:19,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.20 | bwd_microstep: 1370.05 | bwd_inner_microstep: 1370.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3454
[2024-06-10 07:01:21,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.64 | bwd_microstep: 1336.57 | bwd_inner_microstep: 1336.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3983
[2024-06-10 07:01:23,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.26 | bwd_microstep: 1375.35 | bwd_inner_microstep: 1375.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 07:01:25,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.67 | bwd_microstep: 1480.56 | bwd_inner_microstep: 1480.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3645
[2024-06-10 07:01:27,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.34 | bwd_microstep: 1644.06 | bwd_inner_microstep: 1644.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 07:01:29,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.07 | bwd_microstep: 1528.93 | bwd_inner_microstep: 1528.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2020
[2024-06-10 07:01:30,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.58 | bwd_microstep: 719.40 | bwd_inner_microstep: 719.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3599
[2024-06-10 07:01:32,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.60 | bwd_microstep: 1474.49 | bwd_inner_microstep: 1474.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 07:01:34,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1357.35 | bwd_inner_microstep: 1357.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 07:01:35,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.30 | bwd_microstep: 805.76 | bwd_inner_microstep: 805.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3686
[2024-06-10 07:01:37,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1391.59 | bwd_inner_microstep: 1391.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3711
[2024-06-10 07:01:39,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1239.41 | bwd_inner_microstep: 1239.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031
[2024-06-10 07:01:40,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.52 | bwd_microstep: 809.59 | bwd_inner_microstep: 809.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2192
[2024-06-10 07:01:41,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.25 | bwd_microstep: 798.28 | bwd_inner_microstep: 798.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 07:01:43,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.06 | bwd_microstep: 1304.59 | bwd_inner_microstep: 1304.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3532
[2024-06-10 07:01:45,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.16 | bwd_microstep: 1559.96 | bwd_inner_microstep: 1559.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2929
[2024-06-10 07:01:47,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.66 | bwd_microstep: 1192.32 | bwd_inner_microstep: 1192.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 07:01:49,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.88 | bwd_microstep: 1474.77 | bwd_inner_microstep: 1474.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 07:01:51,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.23 | bwd_microstep: 1557.16 | bwd_inner_microstep: 1557.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3437
[2024-06-10 07:01:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 07:01:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.12 | bwd_microstep: 3306.65 | bwd_inner_microstep: 1617.45 | bwd_allreduce_microstep: 1689.15 | step_microstep: 38.69
[2024-06-10 07:01:55,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15729.05 | bwd: 43887.47 | bwd_inner: 42197.38 | bwd_allreduce: 1689.38 | step: 40.37
{'loss': 1.3645, 'learning_rate': 3.6605895854247534e-05, 'epoch': 0.21}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1846
[2024-06-10 07:01:56,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.21 | bwd_microstep: 667.78 | bwd_inner_microstep: 667.64 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3947
[2024-06-10 07:01:58,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.08 | bwd_microstep: 1591.74 | bwd_inner_microstep: 1591.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3847
[2024-06-10 07:02:00,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.05 | bwd_microstep: 1661.69 | bwd_inner_microstep: 1661.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2302
[2024-06-10 07:02:02,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.83 | bwd_microstep: 881.19 | bwd_inner_microstep: 881.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 07:02:04,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.19 | bwd_microstep: 1445.02 | bwd_inner_microstep: 1444.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733
[2024-06-10 07:02:06,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.16 | bwd_microstep: 1637.82 | bwd_inner_microstep: 1637.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 07:02:08,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1287.98 | bwd_inner_microstep: 1287.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3404
[2024-06-10 07:02:09,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.11 | bwd_microstep: 1212.40 | bwd_inner_microstep: 1212.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 07:02:11,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1250.97 | bwd_inner_microstep: 1250.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870
[2024-06-10 07:02:12,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.66 | bwd_microstep: 683.47 | bwd_inner_microstep: 683.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3405
[2024-06-10 07:02:14,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.37 | bwd_microstep: 1370.64 | bwd_inner_microstep: 1370.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-10 07:02:16,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1280.56 | bwd_inner_microstep: 1280.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 07:02:18,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.47 | bwd_microstep: 1615.13 | bwd_inner_microstep: 1615.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3510
[2024-06-10 07:02:20,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.42 | bwd_microstep: 1553.76 | bwd_inner_microstep: 1553.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639
[2024-06-10 07:02:22,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.14 | bwd_microstep: 1575.71 | bwd_inner_microstep: 1575.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3648
[2024-06-10 07:02:24,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.15 | bwd_microstep: 1592.15 | bwd_inner_microstep: 1592.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 07:02:26,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.36 | bwd_microstep: 1391.81 | bwd_inner_microstep: 1391.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593
[2024-06-10 07:02:28,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.58 | bwd_microstep: 1343.47 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630
[2024-06-10 07:02:30,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.26 | bwd_microstep: 1614.16 | bwd_inner_microstep: 1614.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 07:02:32,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.41 | bwd_microstep: 974.78 | bwd_inner_microstep: 974.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 07:02:34,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2311
[2024-06-10 07:02:35,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.30 | bwd_microstep: 984.12 | bwd_inner_microstep: 984.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 07:02:37,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.24 | bwd_microstep: 1254.67 | bwd_inner_microstep: 1254.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465
[2024-06-10 07:02:39,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1331.51 | bwd_inner_microstep: 1331.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653
[2024-06-10 07:02:40,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1427.47 | bwd_inner_microstep: 1427.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3602
[2024-06-10 07:02:42,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.74 | bwd_microstep: 1215.75 | bwd_inner_microstep: 1215.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3561
[2024-06-10 07:02:44,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.47 | bwd_microstep: 1523.64 | bwd_inner_microstep: 1523.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 07:02:46,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1554.28 | bwd_inner_microstep: 1554.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2268
[2024-06-10 07:02:48,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.82 | bwd_microstep: 937.16 | bwd_inner_microstep: 937.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 07:02:50,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.18 | bwd_microstep: 1356.47 | bwd_inner_microstep: 1356.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 07:02:52,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.52 | bwd_microstep: 1402.80 | bwd_inner_microstep: 1402.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 07:02:56,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 07:02:56,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 4313.17 | bwd_inner_microstep: 1697.97 | bwd_allreduce_microstep: 2615.15 | step_microstep: 38.72
[2024-06-10 07:02:56,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15914.85 | bwd: 45332.66 | bwd_inner: 42716.50 | bwd_allreduce: 2615.43 | step: 40.43
{'loss': 1.3196, 'learning_rate': 3.6584948028139126e-05, 'epoch': 0.21}
 21%|██        | 363/1726 [6:20:29<23:11:26, 61.25s/it]
 21%|██        | 364/1726 [6:21:30<23:09:33, 61.21s/it]
 21%|██        | 365/1726 [6:22:30<22:59:59, 60.84s/it]
 21%|██        | 366/1726 [6:23:32<23:04:49, 61.10s/it]
 21%|██▏       | 367/1726 [6:24:32<22:56:10, 60.76s/it]
 21%|██▏       | 368/1726 [6:25:33<23:00:51, 61.01s/it]
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478
[2024-06-10 07:02:58,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1305.10 | bwd_inner_microstep: 1305.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3897
[2024-06-10 07:03:00,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.51 | bwd_microstep: 1481.15 | bwd_inner_microstep: 1481.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3930
[2024-06-10 07:03:03,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.25 | bwd_microstep: 1593.19 | bwd_inner_microstep: 1593.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 07:03:05,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.39 | bwd_microstep: 1491.29 | bwd_inner_microstep: 1491.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3770
[2024-06-10 07:03:07,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.69 | bwd_microstep: 1471.49 | bwd_inner_microstep: 1471.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 07:03:08,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.61 | bwd_microstep: 793.36 | bwd_inner_microstep: 793.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 07:03:09,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.99 | bwd_microstep: 794.25 | bwd_inner_microstep: 794.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 07:03:11,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1252.20 | bwd_inner_microstep: 1252.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 07:03:12,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1288.32 | bwd_inner_microstep: 1288.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3504
[2024-06-10 07:03:14,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.60 | bwd_microstep: 1224.07 | bwd_inner_microstep: 1224.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 07:03:16,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1249.45 | bwd_inner_microstep: 1249.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 07:03:18,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509
[2024-06-10 07:03:20,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.58 | bwd_microstep: 1318.93 | bwd_inner_microstep: 1318.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 07:03:22,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.00 | bwd_microstep: 1468.90 | bwd_inner_microstep: 1468.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 07:03:23,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1383.88 | bwd_inner_microstep: 1383.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 07:03:25,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1391.00 | bwd_inner_microstep: 1390.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2184
[2024-06-10 07:03:27,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.35 | bwd_microstep: 867.53 | bwd_inner_microstep: 867.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 07:03:29,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1515.92 | bwd_inner_microstep: 1515.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 07:03:30,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.79 | bwd_microstep: 1284.83 | bwd_inner_microstep: 1284.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 07:03:32,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.53 | bwd_microstep: 1281.11 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 07:03:34,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1399.27 | bwd_inner_microstep: 1399.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3451
[2024-06-10 07:03:36,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.44 | bwd_microstep: 1320.08 | bwd_inner_microstep: 1320.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549
[2024-06-10 07:03:38,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.88 | bwd_microstep: 1331.29 | bwd_inner_microstep: 1331.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-10 07:03:39,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.21 | bwd_microstep: 819.19 | bwd_inner_microstep: 819.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 07:03:41,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1499.36 | bwd_inner_microstep: 1499.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 07:03:43,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.70 | bwd_microstep: 1413.02 | bwd_inner_microstep: 1412.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052
[2024-06-10 07:03:44,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.55 | bwd_microstep: 814.94 | bwd_inner_microstep: 814.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 07:03:46,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.25 | bwd_microstep: 1449.87 | bwd_inner_microstep: 1449.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 07:03:48,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1403.85 | bwd_inner_microstep: 1403.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3603
[2024-06-10 07:03:51,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.90 | bwd_microstep: 1775.78 | bwd_inner_microstep: 1775.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2040
[2024-06-10 07:03:52,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.86 | bwd_microstep: 903.38 | bwd_inner_microstep: 903.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 07:03:58,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 07:03:58,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 5516.94 | bwd_inner_microstep: 1753.21 | bwd_allreduce_microstep: 3763.67 | step_microstep: 38.68
[2024-06-10 07:03:58,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15609.04 | bwd: 45485.97 | bwd_inner: 41721.33 | bwd_allreduce: 3763.93 | step: 40.31
{'loss': 1.3082, 'learning_rate': 3.6563941789929994e-05, 'epoch': 0.21}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 07:04:00,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.43 | bwd_microstep: 1372.68 | bwd_inner_microstep: 1372.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 07:04:02,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1382.98 | bwd_inner_microstep: 1382.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 07:04:04,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.96 | bwd_microstep: 1543.55 | bwd_inner_microstep: 1543.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4121
[2024-06-10 07:04:06,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1437.76 | bwd_inner_microstep: 1437.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 07:04:08,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1279.75 | bwd_inner_microstep: 1279.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 07:04:09,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.42 | bwd_microstep: 1250.46 | bwd_inner_microstep: 1250.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 07:04:11,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.42 | bwd_microstep: 1242.76 | bwd_inner_microstep: 1242.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 07:04:13,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.87 | bwd_microstep: 1389.79 | bwd_inner_microstep: 1389.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882
[2024-06-10 07:04:14,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.44 | bwd_microstep: 710.96 | bwd_inner_microstep: 710.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3730
[2024-06-10 07:04:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.04 | bwd_microstep: 1732.89 | bwd_inner_microstep: 1732.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 705
[2024-06-10 07:04:17,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 117.08 | bwd_microstep: 289.81 | bwd_inner_microstep: 289.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455
[2024-06-10 07:04:19,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.49 | bwd_microstep: 1317.16 | bwd_inner_microstep: 1317.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2004
[2024-06-10 07:04:20,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.88 | bwd_microstep: 831.17 | bwd_inner_microstep: 831.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 07:04:22,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 1342.53 | bwd_inner_microstep: 1342.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-10 07:04:24,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.50 | bwd_microstep: 1627.84 | bwd_inner_microstep: 1627.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3551
[2024-06-10 07:04:26,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.97 | bwd_microstep: 1363.37 | bwd_inner_microstep: 1363.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3704
[2024-06-10 07:04:28,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.93 | bwd_microstep: 1724.66 | bwd_inner_microstep: 1724.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3518
[2024-06-10 07:04:30,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.93 | bwd_microstep: 1513.41 | bwd_inner_microstep: 1513.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 07:04:32,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1506.45 | bwd_inner_microstep: 1506.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2101
[2024-06-10 07:04:34,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.96 | bwd_microstep: 921.11 | bwd_inner_microstep: 921.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3525
[2024-06-10 07:04:36,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.78 | bwd_microstep: 1454.55 | bwd_inner_microstep: 1454.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 07:04:37,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1383.10 | bwd_inner_microstep: 1383.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 07:04:39,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.24 | bwd_microstep: 1463.96 | bwd_inner_microstep: 1463.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 3033
[2024-06-10 07:04:41,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.10 | bwd_microstep: 1091.44 | bwd_inner_microstep: 1091.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2059
[2024-06-10 07:04:42,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.51 | bwd_microstep: 944.24 | bwd_inner_microstep: 944.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 07:04:44,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.89 | bwd_microstep: 1254.18 | bwd_inner_microstep: 1254.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 07:04:46,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.96 | bwd_microstep: 1498.35 | bwd_inner_microstep: 1498.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451
[2024-06-10 07:04:48,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.32 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 07:04:50,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.97 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 07:04:52,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.33 | bwd_microstep: 1506.35 | bwd_inner_microstep: 1506.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 07:04:54,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.92 | bwd_microstep: 1415.19 | bwd_inner_microstep: 1415.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2182
[2024-06-10 07:05:00,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 07:05:00,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.52 | bwd_microstep: 5697.32 | bwd_inner_microstep: 865.13 | bwd_allreduce_microstep: 4832.13 | step_microstep: 38.94
[2024-06-10 07:05:00,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15460.27 | bwd: 46157.75 | bwd_inner: 41324.71 | bwd_allreduce: 4832.37 | step: 40.58
{'loss': 1.305, 'learning_rate': 3.654287721360398e-05, 'epoch': 0.21}
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4489
[2024-06-10 07:05:03,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 709.69 | bwd_microstep: 1915.79 | bwd_inner_microstep: 1915.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 07:05:04,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.41 | bwd_microstep: 1373.17 | bwd_inner_microstep: 1373.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 07:05:06,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1343.52 | bwd_inner_microstep: 1343.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1867
[2024-06-10 07:05:07,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.35 | bwd_microstep: 708.00 | bwd_inner_microstep: 707.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 07:05:09,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.75 | bwd_microstep: 1346.61 | bwd_inner_microstep: 1346.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 07:05:11,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.27 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 07:05:13,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.04 | bwd_microstep: 1550.69 | bwd_inner_microstep: 1550.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 07:05:15,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.99 | bwd_microstep: 1400.67 | bwd_inner_microstep: 1400.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 07:05:16,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.45 | bwd_microstep: 793.69 | bwd_inner_microstep: 793.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 07:05:18,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.76 | bwd_microstep: 1246.82 | bwd_inner_microstep: 1246.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1900
[2024-06-10 07:05:19,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.56 | bwd_microstep: 717.61 | bwd_inner_microstep: 717.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1947
[2024-06-10 07:05:20,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.25 | bwd_microstep: 823.78 | bwd_inner_microstep: 823.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3422
[2024-06-10 07:05:22,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.30 | bwd_microstep: 1396.14 | bwd_inner_microstep: 1396.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3500
[2024-06-10 07:05:24,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.51 | bwd_microstep: 1680.83 | bwd_inner_microstep: 1680.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3645
[2024-06-10 07:05:26,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.20 | bwd_microstep: 1538.42 | bwd_inner_microstep: 1538.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513
[2024-06-10 07:05:28,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.49 | bwd_microstep: 1200.64 | bwd_inner_microstep: 1200.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2142
[2024-06-10 07:05:29,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.63 | bwd_microstep: 866.64 | bwd_inner_microstep: 866.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972
[2024-06-10 07:05:30,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.13 | bwd_microstep: 705.61 | bwd_inner_microstep: 705.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2095
[2024-06-10 07:05:31,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.57 | bwd_microstep: 918.47 | bwd_inner_microstep: 918.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638
[2024-06-10 07:05:33,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1419.19 | bwd_inner_microstep: 1419.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 07:05:35,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.67 | bwd_microstep: 1287.06 | bwd_inner_microstep: 1287.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2293
[2024-06-10 07:05:36,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.49 | bwd_microstep: 819.20 | bwd_inner_microstep: 819.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 07:05:38,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1488.34 | bwd_inner_microstep: 1488.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063
[2024-06-10 07:05:40,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.63 | bwd_microstep: 915.26 | bwd_inner_microstep: 915.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3627
[2024-06-10 07:05:41,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.33 | bwd_microstep: 1373.79 | bwd_inner_microstep: 1373.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3875
[2024-06-10 07:05:44,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.69 | bwd_microstep: 1678.11 | bwd_inner_microstep: 1678.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2241
[2024-06-10 07:05:45,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.29 | bwd_microstep: 1062.56 | bwd_inner_microstep: 1062.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 07:05:47,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.03 | bwd_microstep: 970.26 | bwd_inner_microstep: 970.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 07:05:49,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1449.97 | bwd_inner_microstep: 1449.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 07:05:51,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1498.83 | bwd_inner_microstep: 1498.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2892
[2024-06-10 07:05:52,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.25 | bwd_microstep: 1187.29 | bwd_inner_microstep: 1187.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 07:06:00,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.32 | optimizer_step: 6.61
[2024-06-10 07:06:00,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 7404.00 | bwd_inner_microstep: 1696.82 | bwd_allreduce_microstep: 5707.13 | step_microstep: 38.95
[2024-06-10 07:06:00,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14758.57 | bwd: 45328.02 | bwd_inner: 39619.98 | bwd_allreduce: 5707.37 | step: 40.66
{'loss': 1.2988, 'learning_rate': 3.652175437335041e-05, 'epoch': 0.21}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3471
[2024-06-10 07:06:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.86 | bwd_microstep: 1569.45 | bwd_inner_microstep: 1569.38 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 07:06:04,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.01 | bwd_microstep: 1393.38 | bwd_inner_microstep: 1393.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 07:06:06,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1374.23 | bwd_inner_microstep: 1374.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 07:06:08,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1383.67 | bwd_inner_microstep: 1383.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 07:06:10,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1281.61 | bwd_inner_microstep: 1281.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3770
[2024-06-10 07:06:12,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1494.25 | bwd_inner_microstep: 1494.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3404
[2024-06-10 07:06:14,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.28 | bwd_microstep: 1213.18 | bwd_inner_microstep: 1213.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4041
[2024-06-10 07:06:16,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.06 | bwd_microstep: 1718.85 | bwd_inner_microstep: 1718.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3499
[2024-06-10 07:06:18,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.83 | bwd_microstep: 1319.04 | bwd_inner_microstep: 1319.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 07:06:19,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.16 | bwd_microstep: 796.75 | bwd_inner_microstep: 796.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2199
[2024-06-10 07:06:20,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.20 | bwd_microstep: 797.59 | bwd_inner_microstep: 797.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 07:06:22,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1390.44 | bwd_inner_microstep: 1390.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 07:06:24,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3457
[2024-06-10 07:06:26,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.06 | bwd_microstep: 1330.29 | bwd_inner_microstep: 1330.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 07:06:28,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.69 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1856
[2024-06-10 07:06:29,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.45 | bwd_microstep: 675.50 | bwd_inner_microstep: 675.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 07:06:31,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1486.10 | bwd_inner_microstep: 1486.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3473
[2024-06-10 07:06:33,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.20 | bwd_microstep: 1426.35 | bwd_inner_microstep: 1426.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-10 07:06:35,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.14 | bwd_microstep: 1622.48 | bwd_inner_microstep: 1622.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 07:06:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.40 | bwd_microstep: 1657.94 | bwd_inner_microstep: 1657.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 07:06:39,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1506.69 | bwd_inner_microstep: 1506.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 07:06:41,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.40 | bwd_microstep: 1258.83 | bwd_inner_microstep: 1258.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021
[2024-06-10 07:06:42,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.10 | bwd_microstep: 813.99 | bwd_inner_microstep: 813.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-10 07:06:43,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.87 | bwd_microstep: 883.43 | bwd_inner_microstep: 883.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 07:06:45,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.37 | bwd_microstep: 1459.99 | bwd_inner_microstep: 1459.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 07:06:47,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.72 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1283.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3772
[2024-06-10 07:06:49,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.31 | bwd_microstep: 1346.55 | bwd_inner_microstep: 1346.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 07:06:51,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.63 | bwd_microstep: 1656.73 | bwd_inner_microstep: 1656.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3600
[2024-06-10 07:06:53,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.79 | bwd_microstep: 1466.24 | bwd_inner_microstep: 1466.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3434
[2024-06-10 07:06:55,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.63 | bwd_microstep: 1464.84 | bwd_inner_microstep: 1464.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 07:06:58,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.95 | bwd_microstep: 1650.94 | bwd_inner_microstep: 1650.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2414
[2024-06-10 07:06:59,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-10 07:06:59,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.01 | bwd_microstep: 1168.89 | bwd_inner_microstep: 1161.23 | bwd_allreduce_microstep: 7.62 | step_microstep: 38.37
[2024-06-10 07:06:59,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15958.37 | bwd: 42664.78 | bwd_inner: 42654.89 | bwd_allreduce: 9.16 | step: 40.11
{'loss': 1.2854, 'learning_rate': 3.6500573343563835e-05, 'epoch': 0.22}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 07:07:01,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.16 | bwd_microstep: 1450.53 | bwd_inner_microstep: 1450.39 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 07:07:03,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1344.93 | bwd_inner_microstep: 1344.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3875
[2024-06-10 07:07:05,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.74 | bwd_microstep: 1683.93 | bwd_inner_microstep: 1683.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 07:07:08,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.51 | bwd_microstep: 1642.00 | bwd_inner_microstep: 1641.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3474
[2024-06-10 07:07:10,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1331.44 | bwd_inner_microstep: 1331.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 07:07:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1543.07 | bwd_inner_microstep: 1543.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 07:07:14,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.20 | bwd_microstep: 1378.02 | bwd_inner_microstep: 1377.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 07:07:16,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.17 | bwd_microstep: 1386.16 | bwd_inner_microstep: 1386.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 07:07:17,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1390.36 | bwd_inner_microstep: 1390.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 07:07:19,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.43 | bwd_microstep: 794.47 | bwd_inner_microstep: 794.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 07:07:20,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.50 | bwd_microstep: 1153.14 | bwd_inner_microstep: 1153.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1974
[2024-06-10 07:07:21,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.92 | bwd_microstep: 832.54 | bwd_inner_microstep: 832.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469
[2024-06-10 07:07:23,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.32 | bwd_microstep: 1217.07 | bwd_inner_microstep: 1217.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3895
[2024-06-10 07:07:25,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.17 | bwd_microstep: 1646.90 | bwd_inner_microstep: 1646.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2152
[2024-06-10 07:07:27,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.02 | bwd_microstep: 947.99 | bwd_inner_microstep: 947.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3669
[2024-06-10 07:07:29,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1479.64 | bwd_inner_microstep: 1479.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3675
[2024-06-10 07:07:31,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.02 | bwd_microstep: 1479.43 | bwd_inner_microstep: 1479.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938
[2024-06-10 07:07:32,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.69 | bwd_microstep: 698.70 | bwd_inner_microstep: 698.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 07:07:34,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1350.85 | bwd_inner_microstep: 1350.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 07:07:35,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 07:07:38,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1557.29 | bwd_inner_microstep: 1557.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 07:07:39,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.66 | bwd_microstep: 1257.53 | bwd_inner_microstep: 1257.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1967
[2024-06-10 07:07:40,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.56 | bwd_microstep: 767.27 | bwd_inner_microstep: 767.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 07:07:43,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.69 | bwd_microstep: 1658.71 | bwd_inner_microstep: 1658.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-10 07:07:44,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.30 | bwd_microstep: 1158.75 | bwd_inner_microstep: 1158.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3464
[2024-06-10 07:07:46,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.99 | bwd_microstep: 1246.15 | bwd_inner_microstep: 1246.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3777
[2024-06-10 07:07:48,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.07 | bwd_microstep: 1474.62 | bwd_inner_microstep: 1474.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819
[2024-06-10 07:07:50,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.21 | bwd_microstep: 1717.87 | bwd_inner_microstep: 1717.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3388
[2024-06-10 07:07:52,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.64 | bwd_microstep: 1436.34 | bwd_inner_microstep: 1436.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2046
[2024-06-10 07:07:54,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.42 | bwd_microstep: 874.23 | bwd_inner_microstep: 874.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 07:07:56,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1493.61 | bwd_inner_microstep: 1493.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044
[2024-06-10 07:08:00,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.25 | optimizer_step: 6.63
[2024-06-10 07:08:00,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.08 | bwd_microstep: 4413.65 | bwd_inner_microstep: 1043.02 | bwd_allreduce_microstep: 3370.56 | step_microstep: 38.91
[2024-06-10 07:08:00,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15622.19 | bwd: 45204.76 | bwd_inner: 41833.15 | bwd_allreduce: 3370.87 | step: 40.57
{'loss': 1.333, 'learning_rate': 3.647933419884371e-05, 'epoch': 0.22}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2426
[2024-06-10 07:08:02,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.24 | bwd_microstep: 1027.79 | bwd_inner_microstep: 1027.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 07:08:04,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1246.25 | bwd_inner_microstep: 1246.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 07:08:06,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1401.40 | bwd_inner_microstep: 1401.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3801
[2024-06-10 07:08:07,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.46 | bwd_microstep: 1352.10 | bwd_inner_microstep: 1352.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 07:08:09,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1384.26 | bwd_inner_microstep: 1384.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 07:08:11,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1381.91 | bwd_inner_microstep: 1381.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3462
[2024-06-10 07:08:13,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.91 | bwd_microstep: 1239.71 | bwd_inner_microstep: 1239.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3692
[2024-06-10 07:08:15,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.14 | bwd_microstep: 1617.99 | bwd_inner_microstep: 1617.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 07:08:17,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1375.04 | bwd_inner_microstep: 1375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4023
[2024-06-10 07:08:19,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.84 | bwd_microstep: 1703.63 | bwd_inner_microstep: 1703.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 07:08:22,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.01 | bwd_microstep: 1483.28 | bwd_inner_microstep: 1483.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529
[2024-06-10 07:08:23,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.61 | bwd_microstep: 1326.21 | bwd_inner_microstep: 1326.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2470
[2024-06-10 07:08:25,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.56 | bwd_microstep: 857.91 | bwd_inner_microstep: 857.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2480
[2024-06-10 07:08:26,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.81 | bwd_microstep: 959.01 | bwd_inner_microstep: 958.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 07:08:28,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1400.38 | bwd_inner_microstep: 1400.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2081
[2024-06-10 07:08:29,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.90 | bwd_microstep: 916.40 | bwd_inner_microstep: 916.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 07:08:31,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1451.04 | bwd_inner_microstep: 1451.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 07:08:33,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1284.63 | bwd_inner_microstep: 1284.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3564
[2024-06-10 07:08:35,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.05 | bwd_microstep: 1424.60 | bwd_inner_microstep: 1424.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2083
[2024-06-10 07:08:36,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.81 | bwd_microstep: 822.99 | bwd_inner_microstep: 822.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2543
[2024-06-10 07:08:37,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.56 | bwd_microstep: 969.26 | bwd_inner_microstep: 969.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 07:08:39,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.31 | bwd_microstep: 1489.89 | bwd_inner_microstep: 1489.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591
[2024-06-10 07:08:41,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.70 | bwd_microstep: 1307.13 | bwd_inner_microstep: 1307.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 07:08:43,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.08 | bwd_microstep: 1519.67 | bwd_inner_microstep: 1519.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 07:08:45,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.98 | bwd_microstep: 1561.17 | bwd_inner_microstep: 1561.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3596
[2024-06-10 07:08:47,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.87 | bwd_microstep: 1275.74 | bwd_inner_microstep: 1275.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3548
[2024-06-10 07:08:49,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1556.63 | bwd_inner_microstep: 1556.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 07:08:51,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.70 | bwd_microstep: 1555.84 | bwd_inner_microstep: 1555.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 07:08:54,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.88 | bwd_microstep: 1603.71 | bwd_inner_microstep: 1603.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3434
[2024-06-10 07:08:55,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.78 | bwd_microstep: 1297.02 | bwd_inner_microstep: 1297.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-10 07:08:58,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1594.22 | bwd_inner_microstep: 1594.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 07:09:03,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 07:09:03,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.32 | bwd_microstep: 4731.44 | bwd_inner_microstep: 1676.49 | bwd_allreduce_microstep: 3054.89 | step_microstep: 38.71
[2024-06-10 07:09:03,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16092.16 | bwd: 46118.29 | bwd_inner: 43062.48 | bwd_allreduce: 3055.12 | step: 40.27
 21%|██▏       | 368/1726 [6:25:33<23:00:51, 61.01s/it]
 21%|██▏       | 369/1726 [6:26:35<23:02:46, 61.14s/it]
 21%|██▏       | 370/1726 [6:27:37<23:07:21, 61.39s/it]
 21%|██▏       | 371/1726 [6:28:37<22:59:55, 61.10s/it]
 22%|██▏       | 372/1726 [6:29:36<22:44:30, 60.47s/it]
 22%|██▏       | 373/1726 [6:30:37<22:48:21, 60.68s/it]
 22%|██▏       | 374/1726 [6:31:40<23:00:02, 61.24s/it]
{'loss': 1.3482, 'learning_rate': 3.6458037013994214e-05, 'epoch': 0.22}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 07:09:05,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1387.71 | bwd_inner_microstep: 1387.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3910
[2024-06-10 07:09:07,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1493.91 | bwd_inner_microstep: 1493.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838
[2024-06-10 07:09:09,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.99 | bwd_microstep: 1455.44 | bwd_inner_microstep: 1455.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 07:09:11,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 07:09:13,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.76 | bwd_microstep: 1345.72 | bwd_inner_microstep: 1345.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 07:09:15,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1875
[2024-06-10 07:09:16,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.89 | bwd_microstep: 681.65 | bwd_inner_microstep: 681.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-10 07:09:17,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.16 | bwd_microstep: 1194.31 | bwd_inner_microstep: 1194.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-10 07:09:19,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.10 | bwd_microstep: 1320.67 | bwd_inner_microstep: 1320.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015
[2024-06-10 07:09:20,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.23 | bwd_microstep: 806.56 | bwd_inner_microstep: 806.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1946
[2024-06-10 07:09:21,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.33 | bwd_microstep: 891.93 | bwd_inner_microstep: 891.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 07:09:23,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.43 | bwd_microstep: 1387.81 | bwd_inner_microstep: 1387.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785
[2024-06-10 07:09:26,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.31 | bwd_microstep: 1639.36 | bwd_inner_microstep: 1639.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3441
[2024-06-10 07:09:27,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.95 | bwd_microstep: 1301.63 | bwd_inner_microstep: 1301.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3691
[2024-06-10 07:09:29,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.45 | bwd_microstep: 1326.69 | bwd_inner_microstep: 1326.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 07:09:31,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1397.07 | bwd_inner_microstep: 1397.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 07:09:33,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1389.63 | bwd_inner_microstep: 1389.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-10 07:09:34,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.60 | bwd_microstep: 798.88 | bwd_inner_microstep: 798.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 07:09:36,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.09 | bwd_microstep: 1259.04 | bwd_inner_microstep: 1259.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-10 07:09:38,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.34 | bwd_microstep: 1655.24 | bwd_inner_microstep: 1655.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2167
[2024-06-10 07:09:39,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.84 | bwd_microstep: 953.02 | bwd_inner_microstep: 952.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744
[2024-06-10 07:09:42,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.75 | bwd_microstep: 1472.28 | bwd_inner_microstep: 1472.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 07:09:44,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.13 | bwd_microstep: 1663.81 | bwd_inner_microstep: 1663.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 07:09:46,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.46 | bwd_microstep: 1304.83 | bwd_inner_microstep: 1304.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 07:09:47,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.41 | bwd_microstep: 1302.98 | bwd_inner_microstep: 1302.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-10 07:09:49,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.72 | bwd_microstep: 1202.04 | bwd_inner_microstep: 1202.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3751
[2024-06-10 07:09:51,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1279.97 | bwd_inner_microstep: 1279.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2036
[2024-06-10 07:09:52,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.51 | bwd_microstep: 717.71 | bwd_inner_microstep: 717.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 07:09:54,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.75 | bwd_microstep: 1286.88 | bwd_inner_microstep: 1286.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3474
[2024-06-10 07:09:56,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.03 | bwd_microstep: 1442.57 | bwd_inner_microstep: 1442.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3771
[2024-06-10 07:09:58,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.83 | bwd_microstep: 1741.54 | bwd_inner_microstep: 1741.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 07:10:05,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 07:10:05,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.17 | bwd_microstep: 6190.73 | bwd_inner_microstep: 1871.43 | bwd_allreduce_microstep: 4319.25 | step_microstep: 38.67
[2024-06-10 07:10:05,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15576.82 | bwd: 45957.34 | bwd_inner: 41637.17 | bwd_allreduce: 4319.48 | step: 40.22
{'loss': 1.2833, 'learning_rate': 3.643668186402392e-05, 'epoch': 0.22}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 07:10:07,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.62 | bwd_microstep: 1438.69 | bwd_inner_microstep: 1438.57 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.14
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3987
[2024-06-10 07:10:09,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.81 | bwd_microstep: 1630.59 | bwd_inner_microstep: 1630.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 07:10:10,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.27 | bwd_microstep: 791.39 | bwd_inner_microstep: 791.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3930
[2024-06-10 07:10:12,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.85 | bwd_microstep: 1593.80 | bwd_inner_microstep: 1593.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 07:10:14,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1248.63 | bwd_inner_microstep: 1248.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 07:10:16,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1381.68 | bwd_inner_microstep: 1381.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 07:10:18,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.88 | bwd_microstep: 1289.91 | bwd_inner_microstep: 1289.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 07:10:20,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.15 | bwd_microstep: 1385.47 | bwd_inner_microstep: 1385.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 07:10:22,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3716
[2024-06-10 07:10:23,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.02 | bwd_microstep: 1400.33 | bwd_inner_microstep: 1400.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 07:10:25,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.95 | bwd_microstep: 798.29 | bwd_inner_microstep: 798.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 07:10:26,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.54 | bwd_microstep: 1282.87 | bwd_inner_microstep: 1282.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 07:10:28,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.59 | bwd_microstep: 1486.54 | bwd_inner_microstep: 1486.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3044
[2024-06-10 07:10:30,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.24 | bwd_microstep: 1232.52 | bwd_inner_microstep: 1232.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2988
[2024-06-10 07:10:32,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.89 | bwd_microstep: 1203.50 | bwd_inner_microstep: 1203.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2003
[2024-06-10 07:10:33,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.43 | bwd_microstep: 740.73 | bwd_inner_microstep: 740.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-10 07:10:35,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.50 | bwd_microstep: 1525.45 | bwd_inner_microstep: 1525.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640
[2024-06-10 07:10:37,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.52 | bwd_microstep: 1318.77 | bwd_inner_microstep: 1318.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 07:10:39,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.52 | bwd_microstep: 1297.71 | bwd_inner_microstep: 1297.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546
[2024-06-10 07:10:40,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.03 | bwd_microstep: 1201.67 | bwd_inner_microstep: 1201.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 07:10:42,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.98 | bwd_microstep: 1615.70 | bwd_inner_microstep: 1615.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 07:10:44,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.95 | bwd_microstep: 1516.20 | bwd_inner_microstep: 1516.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1967
[2024-06-10 07:10:45,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.82 | bwd_microstep: 705.00 | bwd_inner_microstep: 704.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 07:10:47,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.81 | bwd_microstep: 1361.19 | bwd_inner_microstep: 1361.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911
[2024-06-10 07:10:50,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.28 | bwd_microstep: 1702.01 | bwd_inner_microstep: 1701.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278
[2024-06-10 07:10:51,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.30 | bwd_microstep: 909.34 | bwd_inner_microstep: 909.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479
[2024-06-10 07:10:53,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1318.36 | bwd_inner_microstep: 1318.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3602
[2024-06-10 07:10:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1573.09 | bwd_inner_microstep: 1573.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-10 07:10:57,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.69 | bwd_microstep: 1508.47 | bwd_inner_microstep: 1508.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 07:10:59,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1399.78 | bwd_inner_microstep: 1399.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3848
[2024-06-10 07:11:01,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.64 | bwd_microstep: 1764.99 | bwd_inner_microstep: 1764.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 07:11:06,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.20 | optimizer_step: 6.56
[2024-06-10 07:11:06,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.35 | bwd_microstep: 3976.50 | bwd_inner_microstep: 1595.41 | bwd_allreduce_microstep: 2381.04 | step_microstep: 38.56
[2024-06-10 07:11:06,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15823.89 | bwd: 44849.16 | bwd_inner: 42467.11 | bwd_allreduce: 2381.33 | step: 40.29
{'loss': 1.3417, 'learning_rate': 3.641526882414553e-05, 'epoch': 0.22}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 07:11:08,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.89 | bwd_microstep: 1332.73 | bwd_inner_microstep: 1332.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3920
[2024-06-10 07:11:10,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.56 | bwd_microstep: 1687.70 | bwd_inner_microstep: 1687.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2397
[2024-06-10 07:11:11,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.82 | bwd_microstep: 906.57 | bwd_inner_microstep: 906.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-10 07:11:14,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.19 | bwd_microstep: 1686.90 | bwd_inner_microstep: 1686.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 07:11:15,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1299.59 | bwd_inner_microstep: 1299.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 07:11:17,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1253.74 | bwd_inner_microstep: 1253.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 07:11:19,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.42 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 07:11:21,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.36 | bwd_microstep: 1389.23 | bwd_inner_microstep: 1389.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915
[2024-06-10 07:11:22,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.83 | bwd_microstep: 687.59 | bwd_inner_microstep: 687.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948
[2024-06-10 07:11:23,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.21 | bwd_microstep: 731.31 | bwd_inner_microstep: 731.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 07:11:24,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.83 | bwd_microstep: 798.66 | bwd_inner_microstep: 798.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3684
[2024-06-10 07:11:26,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.78 | bwd_microstep: 1423.03 | bwd_inner_microstep: 1423.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 07:11:28,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.43 | bwd_microstep: 1293.86 | bwd_inner_microstep: 1293.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3421
[2024-06-10 07:11:30,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.41 | bwd_microstep: 1308.84 | bwd_inner_microstep: 1308.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3960
[2024-06-10 07:11:32,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.24 | bwd_microstep: 1525.88 | bwd_inner_microstep: 1525.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 07:11:34,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1477.13 | bwd_inner_microstep: 1477.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-10 07:11:36,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1513.15 | bwd_inner_microstep: 1513.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 07:11:38,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.73 | bwd_microstep: 1397.25 | bwd_inner_microstep: 1397.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3373
[2024-06-10 07:11:39,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.72 | bwd_microstep: 1210.49 | bwd_inner_microstep: 1210.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 07:11:41,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.14 | bwd_microstep: 1285.67 | bwd_inner_microstep: 1285.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3947
[2024-06-10 07:11:43,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.21 | bwd_microstep: 1505.91 | bwd_inner_microstep: 1505.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2282
[2024-06-10 07:11:44,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.17 | bwd_microstep: 850.10 | bwd_inner_microstep: 850.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 07:11:46,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.51 | bwd_microstep: 1299.41 | bwd_inner_microstep: 1299.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-10 07:11:47,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.24 | bwd_microstep: 810.81 | bwd_inner_microstep: 810.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 07:11:49,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.88 | bwd_microstep: 1382.60 | bwd_inner_microstep: 1382.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 07:11:51,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1394.43 | bwd_inner_microstep: 1394.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903
[2024-06-10 07:11:52,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.44 | bwd_microstep: 685.32 | bwd_inner_microstep: 685.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278
[2024-06-10 07:11:53,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.26 | bwd_microstep: 879.27 | bwd_inner_microstep: 879.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-10 07:11:55,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1403.21 | bwd_inner_microstep: 1403.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585
[2024-06-10 07:11:57,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.68 | bwd_microstep: 1532.36 | bwd_inner_microstep: 1532.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 07:11:59,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3596
[2024-06-10 07:12:06,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.25 | optimizer_step: 6.59
[2024-06-10 07:12:06,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.17 | bwd_microstep: 6082.81 | bwd_inner_microstep: 2048.66 | bwd_allreduce_microstep: 4034.09 | step_microstep: 38.65
[2024-06-10 07:12:06,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15143.05 | bwd: 44569.34 | bwd_inner: 40534.33 | bwd_allreduce: 4034.32 | step: 40.28
{'loss': 1.3252, 'learning_rate': 3.639379796977569e-05, 'epoch': 0.22}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 07:12:08,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1491.50 | bwd_inner_microstep: 1491.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3884
[2024-06-10 07:12:10,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.59 | bwd_microstep: 1412.98 | bwd_inner_microstep: 1412.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3399
[2024-06-10 07:12:12,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.69 | bwd_microstep: 1389.70 | bwd_inner_microstep: 1389.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-10 07:12:14,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.39 | bwd_microstep: 1443.65 | bwd_inner_microstep: 1443.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229
[2024-06-10 07:12:15,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.66 | bwd_microstep: 959.80 | bwd_inner_microstep: 959.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3784
[2024-06-10 07:12:17,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.14 | bwd_microstep: 1396.26 | bwd_inner_microstep: 1396.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 07:12:19,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 874
[2024-06-10 07:12:20,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.14 | bwd_microstep: 366.98 | bwd_inner_microstep: 366.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 07:12:22,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.95 | bwd_microstep: 1388.00 | bwd_inner_microstep: 1387.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 07:12:23,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1255.02 | bwd_inner_microstep: 1254.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 07:12:25,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1299.37 | bwd_inner_microstep: 1299.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 07:12:27,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.98 | bwd_microstep: 1381.84 | bwd_inner_microstep: 1381.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 07:12:29,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.22 | bwd_microstep: 1522.75 | bwd_inner_microstep: 1522.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-10 07:12:31,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.85 | bwd_microstep: 1522.98 | bwd_inner_microstep: 1522.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3672
[2024-06-10 07:12:34,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.67 | bwd_microstep: 1656.52 | bwd_inner_microstep: 1656.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3528
[2024-06-10 07:12:36,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1523.87 | bwd_inner_microstep: 1523.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 07:12:38,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 1490.11 | bwd_inner_microstep: 1490.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1995
[2024-06-10 07:12:39,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.01 | bwd_microstep: 787.55 | bwd_inner_microstep: 787.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 07:12:40,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.87 | bwd_microstep: 796.73 | bwd_inner_microstep: 796.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 07:12:42,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.67 | bwd_microstep: 1379.18 | bwd_inner_microstep: 1379.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3691
[2024-06-10 07:12:44,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1332.00 | bwd_inner_microstep: 1331.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1917
[2024-06-10 07:12:45,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.36 | bwd_microstep: 689.56 | bwd_inner_microstep: 689.40 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 07:12:47,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.95 | bwd_microstep: 1561.77 | bwd_inner_microstep: 1561.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 07:12:49,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1413.12 | bwd_inner_microstep: 1413.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3748
[2024-06-10 07:12:51,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.29 | bwd_microstep: 1441.78 | bwd_inner_microstep: 1441.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 07:12:52,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1257.35 | bwd_inner_microstep: 1257.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1025
[2024-06-10 07:12:53,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 168.61 | bwd_microstep: 433.46 | bwd_inner_microstep: 433.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3604
[2024-06-10 07:12:55,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.06 | bwd_microstep: 1535.97 | bwd_inner_microstep: 1535.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 07:12:57,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.09 | bwd_microstep: 1257.02 | bwd_inner_microstep: 1256.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 07:12:59,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.78 | bwd_microstep: 1522.85 | bwd_inner_microstep: 1522.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3806
[2024-06-10 07:13:02,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.27 | bwd_microstep: 1802.31 | bwd_inner_microstep: 1802.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3392
[2024-06-10 07:13:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.56
[2024-06-10 07:13:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.40 | bwd_microstep: 4209.69 | bwd_inner_microstep: 1632.62 | bwd_allreduce_microstep: 2577.01 | step_microstep: 38.60
[2024-06-10 07:13:06,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15584.79 | bwd: 44401.26 | bwd_inner: 41823.21 | bwd_allreduce: 2577.31 | step: 40.24
{'loss': 1.2705, 'learning_rate': 3.637226937653461e-05, 'epoch': 0.22}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 07:13:08,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1471.68 | bwd_inner_microstep: 1471.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 07:13:10,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.77 | bwd_microstep: 1477.52 | bwd_inner_microstep: 1477.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454
[2024-06-10 07:13:12,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1384.10 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 07:13:14,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1550.15 | bwd_inner_microstep: 1550.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 07:13:16,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.06 | bwd_microstep: 1385.54 | bwd_inner_microstep: 1385.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1885
[2024-06-10 07:13:17,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.90 | bwd_microstep: 682.47 | bwd_inner_microstep: 682.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 07:13:19,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1243.63 | bwd_inner_microstep: 1243.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 07:13:21,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1247.86 | bwd_inner_microstep: 1247.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 07:13:23,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.10 | bwd_microstep: 1251.76 | bwd_inner_microstep: 1251.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2635
[2024-06-10 07:13:24,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.40 | bwd_microstep: 1021.89 | bwd_inner_microstep: 1021.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3540
[2024-06-10 07:13:26,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1426.58 | bwd_inner_microstep: 1426.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1989
[2024-06-10 07:13:27,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.10 | bwd_microstep: 833.32 | bwd_inner_microstep: 833.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 07:13:29,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1428.22 | bwd_inner_microstep: 1428.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 07:13:31,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.58 | bwd_microstep: 1248.38 | bwd_inner_microstep: 1248.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3707
[2024-06-10 07:13:33,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.60 | bwd_microstep: 1725.55 | bwd_inner_microstep: 1725.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 07:13:35,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1389.12 | bwd_inner_microstep: 1389.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434
[2024-06-10 07:13:37,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.48 | bwd_microstep: 1186.61 | bwd_inner_microstep: 1186.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 07:13:38,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1254.21 | bwd_inner_microstep: 1254.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 07:13:40,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.30 | bwd_microstep: 1456.33 | bwd_inner_microstep: 1456.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 07:13:43,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.49 | bwd_microstep: 1656.36 | bwd_inner_microstep: 1656.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465
[2024-06-10 07:13:45,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.48 | bwd_microstep: 1442.10 | bwd_inner_microstep: 1442.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3431
[2024-06-10 07:13:47,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1401.62 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 07:13:48,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.97 | bwd_microstep: 979.94 | bwd_inner_microstep: 979.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 07:13:49,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.20 | bwd_microstep: 698.01 | bwd_inner_microstep: 697.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 07:13:51,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.76 | bwd_microstep: 1377.68 | bwd_inner_microstep: 1377.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2180
[2024-06-10 07:13:52,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.20 | bwd_microstep: 862.12 | bwd_inner_microstep: 862.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2275
[2024-06-10 07:13:53,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.92 | bwd_microstep: 1003.37 | bwd_inner_microstep: 1003.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719
[2024-06-10 07:13:55,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1337.05 | bwd_inner_microstep: 1337.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-10 07:13:57,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.63 | bwd_microstep: 1507.35 | bwd_inner_microstep: 1507.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 07:13:59,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1500.20 | bwd_inner_microstep: 1500.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567
[2024-06-10 07:14:01,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.25 | bwd_microstep: 1331.57 | bwd_inner_microstep: 1331.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1927
[2024-06-10 07:14:07,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 07:14:07,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.98 | bwd_microstep: 5440.02 | bwd_inner_microstep: 938.18 | bwd_allreduce_microstep: 4501.79 | step_microstep: 38.62
[2024-06-10 07:14:07,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15233.79 | bwd: 45202.32 | bwd_inner: 40699.61 | bwd_allreduce: 4502.02 | step: 40.27
{'loss': 1.2972, 'learning_rate': 3.6350683120245906e-05, 'epoch': 0.22}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469
[2024-06-10 07:14:09,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.84 | bwd_microstep: 1570.06 | bwd_inner_microstep: 1570.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 07:14:11,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.10 | bwd_microstep: 1241.90 | bwd_inner_microstep: 1241.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 07:14:13,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.33 | bwd_microstep: 1282.42 | bwd_inner_microstep: 1282.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-10 07:14:15,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.29 | bwd_microstep: 1554.99 | bwd_inner_microstep: 1554.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 07:14:17,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1380.91 | bwd_inner_microstep: 1380.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 07:14:19,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1276.19 | bwd_inner_microstep: 1276.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3406
[2024-06-10 07:14:20,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.04 | bwd_microstep: 1213.71 | bwd_inner_microstep: 1213.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 07:14:22,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1248.97 | bwd_inner_microstep: 1248.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 07:14:24,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1482.22 | bwd_inner_microstep: 1482.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 07:14:26,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1290.67 | bwd_inner_microstep: 1290.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444
[2024-06-10 07:14:28,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.80 | bwd_microstep: 1221.09 | bwd_inner_microstep: 1221.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3676
[2024-06-10 07:14:30,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.71 | bwd_microstep: 1671.19 | bwd_inner_microstep: 1671.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1061
[2024-06-10 07:14:30,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 154.17 | bwd_microstep: 389.84 | bwd_inner_microstep: 389.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670
[2024-06-10 07:14:32,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1524.10 | bwd_inner_microstep: 1524.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3424
[2024-06-10 07:14:34,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1396.66 | bwd_inner_microstep: 1396.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 07:14:36,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1388.37 | bwd_inner_microstep: 1388.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3526
[2024-06-10 07:14:38,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.30 | bwd_microstep: 1441.96 | bwd_inner_microstep: 1441.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3431
[2024-06-10 07:14:40,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.48 | bwd_microstep: 1475.68 | bwd_inner_microstep: 1475.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 07:14:41,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 797.22 | bwd_inner_microstep: 797.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 07:14:43,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1479.23 | bwd_inner_microstep: 1479.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483
[2024-06-10 07:14:45,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.25 | bwd_microstep: 1315.73 | bwd_inner_microstep: 1315.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2766
[2024-06-10 07:14:47,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.93 | bwd_microstep: 1145.17 | bwd_inner_microstep: 1145.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3527
[2024-06-10 07:14:49,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.37 | bwd_microstep: 1690.36 | bwd_inner_microstep: 1690.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 07:14:51,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1585.15 | bwd_inner_microstep: 1585.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 07:14:54,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.74 | bwd_microstep: 1534.19 | bwd_inner_microstep: 1534.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 07:14:56,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1556.83 | bwd_inner_microstep: 1556.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-10 07:14:58,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.56 | bwd_microstep: 1602.05 | bwd_inner_microstep: 1602.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 07:15:00,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1355.10 | bwd_inner_microstep: 1355.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 07:15:02,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1378.04 | bwd_inner_microstep: 1378.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 07:15:04,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.67 | bwd_microstep: 1655.86 | bwd_inner_microstep: 1655.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3557
[2024-06-10 07:15:06,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.11 | bwd_microstep: 1265.83 | bwd_inner_microstep: 1265.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 07:15:10,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 07:15:10,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 3909.98 | bwd_inner_microstep: 1756.43 | bwd_allreduce_microstep: 2153.49 | step_microstep: 38.64
[2024-06-10 07:15:10,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16465.20 | bwd: 46321.65 | bwd_inner: 44167.26 | bwd_allreduce: 2153.72 | step: 40.32


 22%|██▏       | 374/1726 [6:31:40<23:00:02, 61.24s/it]
 22%|██▏       | 375/1726 [6:32:42<23:03:17, 61.43s/it]
 22%|██▏       | 376/1726 [6:33:43<22:59:28, 61.31s/it]
 22%|██▏       | 377/1726 [6:34:43<22:49:56, 60.93s/it]
 22%|██▏       | 378/1726 [6:35:43<22:44:53, 60.75s/it]
 22%|██▏       | 379/1726 [6:36:44<22:44:06, 60.76s/it]
{'loss': 1.2874, 'learning_rate': 3.6329039276936254e-05, 'epoch': 0.22}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 07:15:12,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.20 | bwd_microstep: 1398.26 | bwd_inner_microstep: 1398.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2433
[2024-06-10 07:15:13,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.99 | bwd_microstep: 914.98 | bwd_inner_microstep: 914.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 07:15:15,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.48 | bwd_microstep: 1244.83 | bwd_inner_microstep: 1244.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-10 07:15:17,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.04 | bwd_microstep: 1276.34 | bwd_inner_microstep: 1276.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 07:15:19,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.17 | bwd_microstep: 1384.99 | bwd_inner_microstep: 1384.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040
[2024-06-10 07:15:20,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.87 | bwd_microstep: 809.04 | bwd_inner_microstep: 809.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 07:15:22,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.49 | bwd_microstep: 1250.72 | bwd_inner_microstep: 1250.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 07:15:23,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.65 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 07:15:25,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1253.17 | bwd_inner_microstep: 1253.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 07:15:26,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.47 | bwd_microstep: 1223.49 | bwd_inner_microstep: 1223.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 07:15:28,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.14 | bwd_microstep: 1255.40 | bwd_inner_microstep: 1255.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 1933
[2024-06-10 07:15:29,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.17 | bwd_microstep: 872.26 | bwd_inner_microstep: 872.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418
[2024-06-10 07:15:31,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.77 | bwd_microstep: 1278.81 | bwd_inner_microstep: 1278.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1967
[2024-06-10 07:15:32,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.50 | bwd_microstep: 847.84 | bwd_inner_microstep: 847.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 07:15:33,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.23 | bwd_microstep: 801.14 | bwd_inner_microstep: 801.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2302
[2024-06-10 07:15:34,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.64 | bwd_microstep: 880.00 | bwd_inner_microstep: 879.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681
[2024-06-10 07:15:36,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1328.80 | bwd_inner_microstep: 1328.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3528
[2024-06-10 07:15:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.97 | bwd_microstep: 1230.25 | bwd_inner_microstep: 1230.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2007
[2024-06-10 07:15:39,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.80 | bwd_microstep: 773.71 | bwd_inner_microstep: 773.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3854
[2024-06-10 07:15:41,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.21 | bwd_microstep: 1568.83 | bwd_inner_microstep: 1568.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 07:15:43,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1284.04 | bwd_inner_microstep: 1284.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3540
[2024-06-10 07:15:45,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1543.54 | bwd_inner_microstep: 1543.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702
[2024-06-10 07:15:47,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.18 | bwd_microstep: 1339.40 | bwd_inner_microstep: 1339.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555
[2024-06-10 07:15:49,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.34 | bwd_microstep: 1428.21 | bwd_inner_microstep: 1428.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276
[2024-06-10 07:15:50,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.39 | bwd_microstep: 876.21 | bwd_inner_microstep: 876.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3621
[2024-06-10 07:15:53,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.05 | bwd_microstep: 1709.07 | bwd_inner_microstep: 1709.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 07:15:54,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.64 | bwd_microstep: 1350.40 | bwd_inner_microstep: 1350.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469
[2024-06-10 07:15:57,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.94 | bwd_microstep: 1573.98 | bwd_inner_microstep: 1573.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565
[2024-06-10 07:15:59,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.80 | bwd_microstep: 1591.42 | bwd_inner_microstep: 1591.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 07:16:01,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.94 | bwd_microstep: 1398.07 | bwd_inner_microstep: 1398.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 07:16:03,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.46 | bwd_microstep: 1642.58 | bwd_inner_microstep: 1642.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 07:16:13,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.57
[2024-06-10 07:16:13,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 9677.88 | bwd_inner_microstep: 1543.58 | bwd_allreduce_microstep: 8134.24 | step_microstep: 38.91
[2024-06-10 07:16:13,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14873.65 | bwd: 47798.15 | bwd_inner: 39662.99 | bwd_allreduce: 8134.47 | step: 40.53
{'loss': 1.2614, 'learning_rate': 3.630733792283515e-05, 'epoch': 0.22}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400
[2024-06-10 07:16:15,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.34 | bwd_microstep: 1358.25 | bwd_inner_microstep: 1358.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3920
[2024-06-10 07:16:17,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.31 | bwd_microstep: 1584.42 | bwd_inner_microstep: 1584.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4287
[2024-06-10 07:16:20,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.80 | bwd_microstep: 1766.92 | bwd_inner_microstep: 1766.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 07:16:22,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1380.25 | bwd_inner_microstep: 1380.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 07:16:23,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1244.74 | bwd_inner_microstep: 1244.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3710
[2024-06-10 07:16:25,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1329.21 | bwd_inner_microstep: 1329.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 07:16:27,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1379.58 | bwd_inner_microstep: 1379.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 07:16:29,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1504.95 | bwd_inner_microstep: 1504.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 07:16:31,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.61 | bwd_microstep: 1522.94 | bwd_inner_microstep: 1522.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 07:16:33,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.74 | bwd_microstep: 1277.25 | bwd_inner_microstep: 1277.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 07:16:35,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1293.75 | bwd_inner_microstep: 1293.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-10 07:16:37,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.44 | bwd_microstep: 1338.87 | bwd_inner_microstep: 1338.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3595
[2024-06-10 07:16:39,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.98 | bwd_microstep: 1467.50 | bwd_inner_microstep: 1467.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3512
[2024-06-10 07:16:40,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.04 | bwd_microstep: 1191.23 | bwd_inner_microstep: 1191.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506
[2024-06-10 07:16:42,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1349.55 | bwd_inner_microstep: 1349.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 07:16:44,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.19 | bwd_microstep: 1464.29 | bwd_inner_microstep: 1464.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 07:16:46,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.30 | bwd_microstep: 1556.67 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554
[2024-06-10 07:16:48,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1299.42 | bwd_inner_microstep: 1299.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 07:16:50,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1398.65 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 07:16:52,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.36 | bwd_microstep: 1255.49 | bwd_inner_microstep: 1255.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3734
[2024-06-10 07:16:54,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.17 | bwd_microstep: 1562.19 | bwd_inner_microstep: 1562.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-10 07:16:56,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.90 | bwd_microstep: 1517.03 | bwd_inner_microstep: 1517.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3648
[2024-06-10 07:16:58,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.59 | bwd_microstep: 1543.28 | bwd_inner_microstep: 1543.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377
[2024-06-10 07:17:00,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.19 | bwd_microstep: 1239.91 | bwd_inner_microstep: 1239.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3540
[2024-06-10 07:17:02,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1438.70 | bwd_inner_microstep: 1438.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 07:17:04,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1343.20 | bwd_inner_microstep: 1343.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3814
[2024-06-10 07:17:06,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.39 | bwd_microstep: 1818.29 | bwd_inner_microstep: 1818.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3765
[2024-06-10 07:17:09,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.83 | bwd_microstep: 1736.46 | bwd_inner_microstep: 1736.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3755
[2024-06-10 07:17:11,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.57 | bwd_microstep: 1434.33 | bwd_inner_microstep: 1434.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 07:17:13,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.75 | bwd_microstep: 1542.97 | bwd_inner_microstep: 1542.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 07:17:15,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3387
[2024-06-10 07:17:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.17 | optimizer_step: 6.64
[2024-06-10 07:17:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.65 | bwd_microstep: 1283.31 | bwd_inner_microstep: 1275.12 | bwd_allreduce_microstep: 8.14 | step_microstep: 38.36
[2024-06-10 07:17:16,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17078.46 | bwd: 45673.47 | bwd_inner: 45664.43 | bwd_allreduce: 8.36 | step: 39.99
{'loss': 1.2884, 'learning_rate': 3.6285579134374655e-05, 'epoch': 0.22}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1873
[2024-06-10 07:17:17,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.91 | bwd_microstep: 766.43 | bwd_inner_microstep: 766.28 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865
[2024-06-10 07:17:20,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.14 | bwd_microstep: 1563.05 | bwd_inner_microstep: 1563.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4205
[2024-06-10 07:17:22,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.29 | bwd_microstep: 1757.05 | bwd_inner_microstep: 1757.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 07:17:24,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.35 | bwd_microstep: 1446.85 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486
[2024-06-10 07:17:26,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.37 | bwd_microstep: 1357.04 | bwd_inner_microstep: 1357.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 07:17:28,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1245.96 | bwd_inner_microstep: 1245.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 07:17:29,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1245.90 | bwd_inner_microstep: 1245.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 07:17:31,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1256.00 | bwd_inner_microstep: 1255.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1961
[2024-06-10 07:17:32,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.93 | bwd_microstep: 888.21 | bwd_inner_microstep: 888.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 07:17:34,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.58 | bwd_microstep: 1287.67 | bwd_inner_microstep: 1287.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 07:17:36,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.14 | bwd_microstep: 1282.26 | bwd_inner_microstep: 1282.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-10 07:17:38,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.43 | bwd_microstep: 1284.92 | bwd_inner_microstep: 1284.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 07:17:40,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.14 | bwd_microstep: 1615.02 | bwd_inner_microstep: 1614.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 07:17:42,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1485.04 | bwd_inner_microstep: 1485.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 07:17:44,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1342.57 | bwd_inner_microstep: 1342.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2317
[2024-06-10 07:17:45,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.82 | bwd_microstep: 983.60 | bwd_inner_microstep: 983.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3889
[2024-06-10 07:17:47,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.78 | bwd_microstep: 1590.87 | bwd_inner_microstep: 1590.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3630
[2024-06-10 07:17:49,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.23 | bwd_microstep: 1314.31 | bwd_inner_microstep: 1314.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674
[2024-06-10 07:17:51,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.68 | bwd_microstep: 1687.70 | bwd_inner_microstep: 1687.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3625
[2024-06-10 07:17:54,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.76 | bwd_microstep: 1557.49 | bwd_inner_microstep: 1557.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816
[2024-06-10 07:17:56,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.77 | bwd_microstep: 1623.35 | bwd_inner_microstep: 1623.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2018
[2024-06-10 07:17:57,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.92 | bwd_microstep: 841.09 | bwd_inner_microstep: 841.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032
[2024-06-10 07:17:58,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.50 | bwd_microstep: 807.35 | bwd_inner_microstep: 807.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816
[2024-06-10 07:18:00,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.75 | bwd_microstep: 1514.33 | bwd_inner_microstep: 1514.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533
[2024-06-10 07:18:02,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.73 | bwd_microstep: 1228.09 | bwd_inner_microstep: 1228.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 07:18:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.24 | bwd_microstep: 1523.16 | bwd_inner_microstep: 1523.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 07:18:06,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.57 | bwd_microstep: 1392.76 | bwd_inner_microstep: 1392.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941
[2024-06-10 07:18:07,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.99 | bwd_microstep: 699.68 | bwd_inner_microstep: 699.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-10 07:18:09,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.09 | bwd_microstep: 1322.30 | bwd_inner_microstep: 1322.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 07:18:11,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1345.57 | bwd_inner_microstep: 1345.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3455
[2024-06-10 07:18:13,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.41 | bwd_microstep: 1548.26 | bwd_inner_microstep: 1548.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 07:18:20,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.25 | optimizer_step: 6.58
[2024-06-10 07:18:20,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.93 | bwd_microstep: 7093.34 | bwd_inner_microstep: 1531.96 | bwd_allreduce_microstep: 5561.33 | step_microstep: 38.89
[2024-06-10 07:18:20,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15805.18 | bwd: 47897.24 | bwd_inner: 42334.90 | bwd_allreduce: 5561.61 | step: 40.57
{'loss': 1.3223, 'learning_rate': 3.626376298818911e-05, 'epoch': 0.22}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3478
[2024-06-10 07:18:23,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.70 | bwd_microstep: 1571.23 | bwd_inner_microstep: 1571.16 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3974
[2024-06-10 07:18:25,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.01 | bwd_microstep: 1600.99 | bwd_inner_microstep: 1600.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-10 07:18:27,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1403.57 | bwd_inner_microstep: 1403.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 07:18:29,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.57 | bwd_microstep: 1449.96 | bwd_inner_microstep: 1449.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 07:18:31,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1283.19 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3752
[2024-06-10 07:18:33,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1633.85 | bwd_inner_microstep: 1633.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3734
[2024-06-10 07:18:35,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.02 | bwd_microstep: 1491.57 | bwd_inner_microstep: 1491.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 07:18:37,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1250.43 | bwd_inner_microstep: 1250.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 07:18:38,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.84 | bwd_microstep: 1395.92 | bwd_inner_microstep: 1395.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 07:18:40,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.94 | bwd_microstep: 1275.09 | bwd_inner_microstep: 1275.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 07:18:42,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1287.45 | bwd_inner_microstep: 1287.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2663
[2024-06-10 07:18:44,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.34 | bwd_microstep: 1118.21 | bwd_inner_microstep: 1118.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 07:18:46,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1403.00 | bwd_inner_microstep: 1402.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701
[2024-06-10 07:18:48,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.28 | bwd_microstep: 1616.04 | bwd_inner_microstep: 1616.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3502
[2024-06-10 07:18:50,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.97 | bwd_microstep: 1429.51 | bwd_inner_microstep: 1429.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 07:18:51,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1247.08 | bwd_inner_microstep: 1247.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 07:18:54,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.65 | bwd_microstep: 1599.35 | bwd_inner_microstep: 1599.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3840
[2024-06-10 07:18:56,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.30 | bwd_microstep: 1585.69 | bwd_inner_microstep: 1585.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 07:18:58,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1276.53 | bwd_inner_microstep: 1276.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 07:18:59,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1289.24 | bwd_inner_microstep: 1289.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-10 07:19:01,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.91 | bwd_microstep: 1189.60 | bwd_inner_microstep: 1189.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 07:19:03,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.79 | bwd_microstep: 1509.23 | bwd_inner_microstep: 1509.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 07:19:05,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1525.96 | bwd_inner_microstep: 1525.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 07:19:07,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.22 | bwd_microstep: 1623.06 | bwd_inner_microstep: 1623.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3551
[2024-06-10 07:19:10,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.64 | bwd_microstep: 1561.82 | bwd_inner_microstep: 1561.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 07:19:12,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.83 | bwd_microstep: 1449.99 | bwd_inner_microstep: 1449.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 07:19:14,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1491.16 | bwd_inner_microstep: 1491.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 07:19:16,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1485.80 | bwd_inner_microstep: 1485.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820
[2024-06-10 07:19:18,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1477.86 | bwd_inner_microstep: 1477.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3563
[2024-06-10 07:19:20,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.00 | bwd_microstep: 1330.05 | bwd_inner_microstep: 1330.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 07:19:22,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.21 | bwd_microstep: 1748.99 | bwd_inner_microstep: 1748.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 07:19:24,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.19 | optimizer_step: 6.62
[2024-06-10 07:19:24,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.06 | bwd_microstep: 1382.32 | bwd_inner_microstep: 1374.63 | bwd_allreduce_microstep: 7.65 | step_microstep: 38.50
[2024-06-10 07:19:24,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17197.76 | bwd: 45983.78 | bwd_inner: 45975.18 | bwd_allreduce: 7.90 | step: 40.16
{'loss': 1.3053, 'learning_rate': 3.624188956111487e-05, 'epoch': 0.22}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2008
[2024-06-10 07:19:25,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.55 | bwd_microstep: 891.56 | bwd_inner_microstep: 891.47 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2866
[2024-06-10 07:19:27,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.40 | bwd_microstep: 1027.04 | bwd_inner_microstep: 1027.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484
[2024-06-10 07:19:29,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.78 | bwd_microstep: 1413.15 | bwd_inner_microstep: 1413.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 07:19:30,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.40 | bwd_microstep: 1250.81 | bwd_inner_microstep: 1250.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 07:19:32,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.45 | bwd_microstep: 1249.65 | bwd_inner_microstep: 1249.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 07:19:34,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.56 | bwd_microstep: 1255.26 | bwd_inner_microstep: 1255.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 07:19:36,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1245.69 | bwd_inner_microstep: 1245.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 07:19:37,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.06 | bwd_microstep: 1376.82 | bwd_inner_microstep: 1376.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894
[2024-06-10 07:19:38,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.59 | bwd_microstep: 683.63 | bwd_inner_microstep: 683.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 07:19:40,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.77 | bwd_microstep: 1524.83 | bwd_inner_microstep: 1524.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3502
[2024-06-10 07:19:42,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.73 | bwd_microstep: 1446.64 | bwd_inner_microstep: 1446.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2130
[2024-06-10 07:19:44,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.83 | bwd_microstep: 770.26 | bwd_inner_microstep: 770.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 07:19:45,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.51 | bwd_microstep: 800.29 | bwd_inner_microstep: 800.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3530
[2024-06-10 07:19:47,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.87 | bwd_microstep: 1623.66 | bwd_inner_microstep: 1623.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4002
[2024-06-10 07:19:49,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.40 | bwd_microstep: 1811.45 | bwd_inner_microstep: 1811.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 07:19:51,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.58 | bwd_microstep: 1255.49 | bwd_inner_microstep: 1255.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876
[2024-06-10 07:19:52,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.82 | bwd_microstep: 680.72 | bwd_inner_microstep: 680.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929
[2024-06-10 07:19:53,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.33 | bwd_microstep: 726.03 | bwd_inner_microstep: 726.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3639
[2024-06-10 07:19:55,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.65 | bwd_microstep: 1319.74 | bwd_inner_microstep: 1319.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827
[2024-06-10 07:19:57,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1490.22 | bwd_inner_microstep: 1490.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665
[2024-06-10 07:19:59,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1422.43 | bwd_inner_microstep: 1422.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 07:20:00,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.80 | bwd_microstep: 976.52 | bwd_inner_microstep: 976.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 07:20:02,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1381.38 | bwd_inner_microstep: 1381.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 07:20:04,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.73 | bwd_microstep: 1502.73 | bwd_inner_microstep: 1502.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3543
[2024-06-10 07:20:06,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.11 | bwd_microstep: 1451.09 | bwd_inner_microstep: 1451.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2269
[2024-06-10 07:20:08,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.19 | bwd_microstep: 877.94 | bwd_inner_microstep: 877.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 07:20:09,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.68 | bwd_microstep: 1401.78 | bwd_inner_microstep: 1401.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 07:20:12,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.76 | bwd_microstep: 1651.38 | bwd_inner_microstep: 1651.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 07:20:14,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.79 | bwd_microstep: 1646.60 | bwd_inner_microstep: 1646.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2011
[2024-06-10 07:20:15,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.45 | bwd_microstep: 834.71 | bwd_inner_microstep: 834.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3766
[2024-06-10 07:20:18,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.04 | bwd_microstep: 1742.70 | bwd_inner_microstep: 1742.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 07:20:25,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.39 | optimizer_step: 6.59
[2024-06-10 07:20:25,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 7341.07 | bwd_inner_microstep: 1682.81 | bwd_allreduce_microstep: 5658.19 | step_microstep: 39.59
[2024-06-10 07:20:25,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15100.60 | bwd: 46073.31 | bwd_inner: 40414.11 | bwd_allreduce: 5658.48 | step: 41.20
{'loss': 1.3032, 'learning_rate': 3.621995893019003e-05, 'epoch': 0.22}
 22%|██▏       | 380/1726 [6:37:47<22:59:07, 61.48s/it]
 22%|██▏       | 381/1726 [6:38:50<23:08:25, 61.94s/it]
 22%|██▏       | 382/1726 [6:39:53<23:15:17, 62.29s/it]
 22%|██▏       | 383/1726 [6:40:57<23:26:07, 62.82s/it]
 22%|██▏       | 384/1726 [6:42:01<23:29:54, 63.04s/it]
 22%|██▏       | 385/1726 [6:43:02<23:18:42, 62.58s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 07:20:27,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1365.20 | bwd_inner_microstep: 1365.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 07:20:29,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.30 | bwd_microstep: 1340.21 | bwd_inner_microstep: 1340.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 07:20:31,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.74 | bwd_microstep: 1344.80 | bwd_inner_microstep: 1344.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2252
[2024-06-10 07:20:32,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.75 | bwd_microstep: 967.18 | bwd_inner_microstep: 967.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 07:20:34,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.92 | bwd_microstep: 791.96 | bwd_inner_microstep: 791.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783
[2024-06-10 07:20:36,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.30 | bwd_microstep: 1443.88 | bwd_inner_microstep: 1443.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 07:20:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.47 | bwd_microstep: 1278.23 | bwd_inner_microstep: 1278.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 07:20:39,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.86 | bwd_microstep: 1294.91 | bwd_inner_microstep: 1294.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 07:20:41,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1388.73 | bwd_inner_microstep: 1388.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 07:20:43,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1476.51 | bwd_inner_microstep: 1476.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 07:20:45,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1389.69 | bwd_inner_microstep: 1389.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 07:20:46,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.55 | bwd_microstep: 806.36 | bwd_inner_microstep: 806.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 07:20:48,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1286.34 | bwd_inner_microstep: 1286.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 07:20:50,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.65 | bwd_microstep: 1614.99 | bwd_inner_microstep: 1614.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-10 07:20:52,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.89 | bwd_microstep: 1601.00 | bwd_inner_microstep: 1600.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 07:20:53,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.40 | bwd_microstep: 787.22 | bwd_inner_microstep: 787.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143
[2024-06-10 07:20:55,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.31 | bwd_microstep: 834.63 | bwd_inner_microstep: 834.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 07:20:56,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1392.87 | bwd_inner_microstep: 1392.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 07:20:58,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1397.80 | bwd_inner_microstep: 1397.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 07:21:00,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1427.86 | bwd_inner_microstep: 1427.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 07:21:03,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.71 | bwd_microstep: 1757.21 | bwd_inner_microstep: 1757.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 07:21:05,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.56 | bwd_microstep: 1548.26 | bwd_inner_microstep: 1548.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 07:21:07,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1485.61 | bwd_inner_microstep: 1485.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3819
[2024-06-10 07:21:09,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.99 | bwd_microstep: 1596.65 | bwd_inner_microstep: 1596.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3432
[2024-06-10 07:21:11,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.57 | bwd_microstep: 1399.98 | bwd_inner_microstep: 1399.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1900
[2024-06-10 07:21:12,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.27 | bwd_microstep: 779.90 | bwd_inner_microstep: 779.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-10 07:21:14,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1285.10 | bwd_inner_microstep: 1285.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424
[2024-06-10 07:21:16,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.98 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1542.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 07:21:18,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1590.60 | bwd_inner_microstep: 1590.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3584
[2024-06-10 07:21:20,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.52 | bwd_microstep: 1568.64 | bwd_inner_microstep: 1568.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 07:21:23,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.34 | bwd_microstep: 1539.14 | bwd_inner_microstep: 1539.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 07:21:28,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.36 | optimizer_step: 6.58
[2024-06-10 07:21:28,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.49 | bwd_microstep: 4402.39 | bwd_inner_microstep: 1949.81 | bwd_allreduce_microstep: 2452.51 | step_microstep: 39.24
[2024-06-10 07:21:28,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15989.75 | bwd: 45726.75 | bwd_inner: 43273.28 | bwd_allreduce: 2452.76 | step: 40.81
{'loss': 1.3406, 'learning_rate': 3.6197971172654156e-05, 'epoch': 0.22}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 07:21:29,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1336.75 | bwd_inner_microstep: 1336.68 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 07:21:31,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 793.13 | bwd_inner_microstep: 793.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3869
[2024-06-10 07:21:33,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.67 | bwd_microstep: 1665.39 | bwd_inner_microstep: 1665.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 07:21:34,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.22 | bwd_microstep: 699.35 | bwd_inner_microstep: 699.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760
[2024-06-10 07:21:36,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.70 | bwd_microstep: 1445.25 | bwd_inner_microstep: 1445.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 07:21:38,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1482.29 | bwd_inner_microstep: 1482.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 07:21:40,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1251.22 | bwd_inner_microstep: 1251.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 07:21:41,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.35 | bwd_microstep: 815.28 | bwd_inner_microstep: 815.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 07:21:43,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.25 | bwd_microstep: 1527.49 | bwd_inner_microstep: 1527.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1963
[2024-06-10 07:21:44,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.36 | bwd_microstep: 734.88 | bwd_inner_microstep: 734.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2169
[2024-06-10 07:21:45,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.09 | bwd_microstep: 983.45 | bwd_inner_microstep: 983.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 07:21:47,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.59 | bwd_microstep: 1341.99 | bwd_inner_microstep: 1341.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3409
[2024-06-10 07:21:49,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1509.24 | bwd_inner_microstep: 1509.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 07:21:51,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.84 | bwd_microstep: 1488.28 | bwd_inner_microstep: 1488.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3610
[2024-06-10 07:21:53,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.87 | bwd_microstep: 1671.95 | bwd_inner_microstep: 1671.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829
[2024-06-10 07:21:55,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1489.71 | bwd_inner_microstep: 1489.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298
[2024-06-10 07:21:57,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.31 | bwd_microstep: 978.75 | bwd_inner_microstep: 978.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3689
[2024-06-10 07:21:59,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1329.76 | bwd_inner_microstep: 1329.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617
[2024-06-10 07:22:01,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.44 | bwd_microstep: 1312.87 | bwd_inner_microstep: 1312.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3453
[2024-06-10 07:22:02,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.26 | bwd_microstep: 1338.93 | bwd_inner_microstep: 1338.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 07:22:04,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1510.63 | bwd_inner_microstep: 1510.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 07:22:06,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.06 | bwd_microstep: 1350.95 | bwd_inner_microstep: 1350.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2760
[2024-06-10 07:22:08,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.16 | bwd_microstep: 1048.25 | bwd_inner_microstep: 1048.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3629
[2024-06-10 07:22:09,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.48 | bwd_microstep: 1248.57 | bwd_inner_microstep: 1248.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 07:22:12,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.61 | bwd_microstep: 1493.20 | bwd_inner_microstep: 1493.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3591
[2024-06-10 07:22:13,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1338.38 | bwd_inner_microstep: 1338.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2441
[2024-06-10 07:22:15,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.67 | bwd_microstep: 953.30 | bwd_inner_microstep: 953.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3558
[2024-06-10 07:22:17,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.49 | bwd_microstep: 1562.83 | bwd_inner_microstep: 1562.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 07:22:19,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1489.91 | bwd_inner_microstep: 1489.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 07:22:21,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1396.14 | bwd_inner_microstep: 1396.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-10 07:22:22,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 816.66 | bwd_inner_microstep: 816.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 07:22:28,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 07:22:28,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 5618.05 | bwd_inner_microstep: 1434.03 | bwd_allreduce_microstep: 4183.98 | step_microstep: 38.72
[2024-06-10 07:22:28,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15255.36 | bwd: 45022.85 | bwd_inner: 40837.91 | bwd_allreduce: 4184.24 | step: 40.38
{'loss': 1.2818, 'learning_rate': 3.617592636594801e-05, 'epoch': 0.22}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 07:22:30,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 1331.43 | bwd_inner_microstep: 1331.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 07:22:32,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1243.16 | bwd_inner_microstep: 1243.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 07:22:34,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1380.01 | bwd_inner_microstep: 1379.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 07:22:36,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.43 | bwd_microstep: 1482.31 | bwd_inner_microstep: 1482.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 07:22:37,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1247.43 | bwd_inner_microstep: 1247.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3745
[2024-06-10 07:22:39,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.04 | bwd_microstep: 1340.67 | bwd_inner_microstep: 1340.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2237
[2024-06-10 07:22:41,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.82 | bwd_microstep: 961.92 | bwd_inner_microstep: 961.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498
[2024-06-10 07:22:42,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1322.38 | bwd_inner_microstep: 1322.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 07:22:45,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.45 | bwd_microstep: 1498.85 | bwd_inner_microstep: 1498.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-10 07:22:46,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.09 | bwd_microstep: 1431.98 | bwd_inner_microstep: 1431.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-10 07:22:48,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.00 | bwd_microstep: 1429.92 | bwd_inner_microstep: 1429.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3634
[2024-06-10 07:22:50,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1446.41 | bwd_inner_microstep: 1446.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 07:22:52,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.84 | bwd_microstep: 1254.59 | bwd_inner_microstep: 1254.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 07:22:54,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1407.49 | bwd_inner_microstep: 1407.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3515
[2024-06-10 07:22:56,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.79 | bwd_microstep: 1442.04 | bwd_inner_microstep: 1442.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420
[2024-06-10 07:22:58,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.38 | bwd_microstep: 1440.01 | bwd_inner_microstep: 1439.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3502
[2024-06-10 07:23:00,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1346.53 | bwd_inner_microstep: 1346.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3842
[2024-06-10 07:23:03,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 680.38 | bwd_microstep: 1867.78 | bwd_inner_microstep: 1867.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 07:23:04,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1347.84 | bwd_inner_microstep: 1347.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3441
[2024-06-10 07:23:06,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.64 | bwd_microstep: 1194.39 | bwd_inner_microstep: 1194.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 07:23:08,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1501.95 | bwd_inner_microstep: 1501.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 07:23:10,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 1314.04 | bwd_inner_microstep: 1314.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 07:23:12,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.97 | bwd_microstep: 1163.15 | bwd_inner_microstep: 1163.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 07:23:13,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.38 | bwd_microstep: 1257.03 | bwd_inner_microstep: 1257.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 07:23:15,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1514.38 | bwd_inner_microstep: 1514.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 07:23:17,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.32 | bwd_microstep: 976.48 | bwd_inner_microstep: 976.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 07:23:19,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 1287.39 | bwd_inner_microstep: 1287.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801
[2024-06-10 07:23:21,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.85 | bwd_microstep: 1751.52 | bwd_inner_microstep: 1751.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3759
[2024-06-10 07:23:23,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1499.65 | bwd_inner_microstep: 1499.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2683
[2024-06-10 07:23:24,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.87 | bwd_microstep: 961.31 | bwd_inner_microstep: 961.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3587
[2024-06-10 07:23:27,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.37 | bwd_microstep: 1584.96 | bwd_inner_microstep: 1584.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3717
[2024-06-10 07:23:29,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.62
[2024-06-10 07:23:29,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.73 | bwd_microstep: 1850.65 | bwd_inner_microstep: 1558.51 | bwd_allreduce_microstep: 292.10 | step_microstep: 38.38
[2024-06-10 07:23:29,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16368.75 | bwd: 44079.65 | bwd_inner: 43786.62 | bwd_allreduce: 292.33 | step: 39.95
{'loss': 1.2944, 'learning_rate': 3.61538245877133e-05, 'epoch': 0.22}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 07:23:31,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.72 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 07:23:33,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 07:23:35,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.86 | bwd_microstep: 1376.75 | bwd_inner_microstep: 1376.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859
[2024-06-10 07:23:37,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.27 | bwd_microstep: 1566.52 | bwd_inner_microstep: 1566.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 07:23:39,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.58 | bwd_microstep: 1481.14 | bwd_inner_microstep: 1481.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 07:23:41,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.18 | bwd_microstep: 1550.32 | bwd_inner_microstep: 1550.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-10 07:23:43,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.31 | bwd_microstep: 1185.22 | bwd_inner_microstep: 1185.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743
[2024-06-10 07:23:45,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.04 | bwd_microstep: 1534.30 | bwd_inner_microstep: 1534.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 07:23:47,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1252.21 | bwd_inner_microstep: 1252.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 07:23:49,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.12 | bwd_microstep: 1352.86 | bwd_inner_microstep: 1352.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 07:23:50,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.01 | bwd_microstep: 1400.99 | bwd_inner_microstep: 1400.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 07:23:53,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.48 | bwd_microstep: 1487.92 | bwd_inner_microstep: 1487.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 07:23:54,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.43 | bwd_microstep: 1386.61 | bwd_inner_microstep: 1386.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918
[2024-06-10 07:23:55,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.96 | bwd_microstep: 685.80 | bwd_inner_microstep: 685.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 07:23:57,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.38 | bwd_microstep: 1484.96 | bwd_inner_microstep: 1484.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 07:23:59,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1249.46 | bwd_inner_microstep: 1249.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846
[2024-06-10 07:24:01,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.57 | bwd_microstep: 1460.30 | bwd_inner_microstep: 1460.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 07:24:03,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.73 | bwd_microstep: 1311.94 | bwd_inner_microstep: 1311.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 07:24:05,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1390.00 | bwd_inner_microstep: 1389.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 07:24:07,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1293.14 | bwd_inner_microstep: 1293.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 07:24:09,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1282.67 | bwd_inner_microstep: 1282.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480
[2024-06-10 07:24:10,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.89 | bwd_microstep: 1442.00 | bwd_inner_microstep: 1441.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3816
[2024-06-10 07:24:13,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.26 | bwd_microstep: 1691.94 | bwd_inner_microstep: 1691.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 07:24:15,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.01 | bwd_microstep: 1285.96 | bwd_inner_microstep: 1285.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3681
[2024-06-10 07:24:17,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.08 | bwd_microstep: 1458.30 | bwd_inner_microstep: 1458.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909
[2024-06-10 07:24:18,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.62 | bwd_microstep: 685.46 | bwd_inner_microstep: 685.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3808
[2024-06-10 07:24:20,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.65 | bwd_microstep: 1857.94 | bwd_inner_microstep: 1857.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 07:24:22,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.63 | bwd_microstep: 1650.43 | bwd_inner_microstep: 1650.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611
[2024-06-10 07:24:25,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.07 | bwd_microstep: 1598.36 | bwd_inner_microstep: 1598.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3588
[2024-06-10 07:24:27,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.23 | bwd_microstep: 1438.77 | bwd_inner_microstep: 1438.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 07:24:28,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1376.50 | bwd_inner_microstep: 1376.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3808
[2024-06-10 07:24:31,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 07:24:31,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.18 | bwd_microstep: 1771.66 | bwd_inner_microstep: 1550.23 | bwd_allreduce_microstep: 221.38 | step_microstep: 38.50
[2024-06-10 07:24:31,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16712.50 | bwd: 44820.40 | bwd_inner: 44598.05 | bwd_allreduce: 221.64 | step: 40.14
{'loss': 1.3066, 'learning_rate': 3.6131665915792374e-05, 'epoch': 0.23}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 07:24:33,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.27 | bwd_microstep: 1480.19 | bwd_inner_microstep: 1480.12 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3898
[2024-06-10 07:24:35,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1387.44 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 07:24:37,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1379.39 | bwd_inner_microstep: 1379.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 07:24:38,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.91 | bwd_microstep: 1244.02 | bwd_inner_microstep: 1243.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4134
[2024-06-10 07:24:41,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.38 | bwd_microstep: 1742.01 | bwd_inner_microstep: 1741.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 07:24:43,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1278.16 | bwd_inner_microstep: 1278.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 07:24:45,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1386.26 | bwd_inner_microstep: 1386.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 07:24:46,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1247.84 | bwd_inner_microstep: 1247.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 07:24:48,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.90 | bwd_microstep: 1195.89 | bwd_inner_microstep: 1195.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3433
[2024-06-10 07:24:50,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1403.69 | bwd_inner_microstep: 1403.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988
[2024-06-10 07:24:51,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.45 | bwd_microstep: 894.98 | bwd_inner_microstep: 894.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 07:24:53,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.90 | bwd_microstep: 1596.31 | bwd_inner_microstep: 1596.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3503
[2024-06-10 07:24:55,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.91 | bwd_microstep: 1445.41 | bwd_inner_microstep: 1445.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492
[2024-06-10 07:24:57,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.02 | bwd_microstep: 1575.62 | bwd_inner_microstep: 1575.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3508
[2024-06-10 07:24:59,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.97 | bwd_microstep: 1225.38 | bwd_inner_microstep: 1225.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-10 07:25:01,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1430.79 | bwd_inner_microstep: 1430.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2092
[2024-06-10 07:25:02,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.90 | bwd_microstep: 819.82 | bwd_inner_microstep: 819.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3831
[2024-06-10 07:25:04,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.49 | bwd_microstep: 1359.54 | bwd_inner_microstep: 1359.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 07:25:06,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1397.52 | bwd_inner_microstep: 1397.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 07:25:08,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.62 | bwd_microstep: 1434.19 | bwd_inner_microstep: 1434.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 07:25:10,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.44 | bwd_microstep: 1456.95 | bwd_inner_microstep: 1456.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 07:25:12,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.66 | bwd_microstep: 1460.11 | bwd_inner_microstep: 1460.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 07:25:14,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.02 | bwd_microstep: 1556.58 | bwd_inner_microstep: 1556.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 07:25:16,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1556.71 | bwd_inner_microstep: 1556.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 07:25:19,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.04 | bwd_microstep: 1608.66 | bwd_inner_microstep: 1608.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547
[2024-06-10 07:25:20,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1327.72 | bwd_inner_microstep: 1327.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-10 07:25:23,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.43 | bwd_microstep: 1548.69 | bwd_inner_microstep: 1548.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588
[2024-06-10 07:25:25,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.79 | bwd_microstep: 1606.64 | bwd_inner_microstep: 1606.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3740
[2024-06-10 07:25:27,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.45 | bwd_microstep: 1731.34 | bwd_inner_microstep: 1731.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-10 07:25:29,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.19 | bwd_microstep: 972.03 | bwd_inner_microstep: 972.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3799
[2024-06-10 07:25:31,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.95 | bwd_microstep: 1823.36 | bwd_inner_microstep: 1823.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 07:25:33,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.17 | optimizer_step: 6.64
[2024-06-10 07:25:33,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.74 | bwd_microstep: 1529.72 | bwd_inner_microstep: 1522.06 | bwd_allreduce_microstep: 7.62 | step_microstep: 38.40
[2024-06-10 07:25:33,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16848.75 | bwd: 45102.99 | bwd_inner: 45094.41 | bwd_allreduce: 7.88 | step: 40.00
{'loss': 1.301, 'learning_rate': 3.610945042822794e-05, 'epoch': 0.23}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 07:25:35,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.75 | bwd_microstep: 1340.62 | bwd_inner_microstep: 1340.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3981
[2024-06-10 07:25:37,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.91 | bwd_microstep: 1704.55 | bwd_inner_microstep: 1704.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 07:25:39,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1281.69 | bwd_inner_microstep: 1281.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3772
[2024-06-10 07:25:41,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.64 | bwd_microstep: 1345.50 | bwd_inner_microstep: 1345.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 07:25:43,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.78 | bwd_microstep: 1152.57 | bwd_inner_microstep: 1152.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-10 07:25:45,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.38 | bwd_microstep: 1433.16 | bwd_inner_microstep: 1433.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2070
[2024-06-10 07:25:46,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.47 | bwd_microstep: 789.47 | bwd_inner_microstep: 789.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 07:25:47,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.03 | bwd_microstep: 797.50 | bwd_inner_microstep: 797.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 07:25:49,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1251.97 | bwd_inner_microstep: 1251.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3500
[2024-06-10 07:25:51,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1513.71 | bwd_inner_microstep: 1513.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 07:25:52,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.89 | bwd_microstep: 1343.49 | bwd_inner_microstep: 1343.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 07:25:55,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.35 | bwd_microstep: 1529.09 | bwd_inner_microstep: 1529.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3656
[2024-06-10 07:25:57,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.31 | bwd_microstep: 1616.75 | bwd_inner_microstep: 1616.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420
[2024-06-10 07:25:59,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1448.68 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 07:26:01,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.23 | bwd_microstep: 1485.41 | bwd_inner_microstep: 1485.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989
[2024-06-10 07:26:02,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.01 | bwd_microstep: 898.87 | bwd_inner_microstep: 898.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 07:26:04,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1394.17 | bwd_inner_microstep: 1394.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109
[2024-06-10 07:26:05,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.06 | bwd_microstep: 921.27 | bwd_inner_microstep: 921.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 07:26:07,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.46 | bwd_microstep: 1398.84 | bwd_inner_microstep: 1398.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2024
[2024-06-10 07:26:08,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.16 | bwd_microstep: 744.98 | bwd_inner_microstep: 744.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 07:26:10,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.00 | bwd_microstep: 1292.09 | bwd_inner_microstep: 1292.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 07:26:12,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.73 | bwd_microstep: 1462.77 | bwd_inner_microstep: 1462.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 07:26:14,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.41 | bwd_microstep: 1297.83 | bwd_inner_microstep: 1297.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2111
[2024-06-10 07:26:15,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.76 | bwd_microstep: 923.98 | bwd_inner_microstep: 923.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3533
[2024-06-10 07:26:17,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.28 | bwd_microstep: 1520.25 | bwd_inner_microstep: 1520.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 07:26:19,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 1513.94 | bwd_inner_microstep: 1513.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3810
[2024-06-10 07:26:21,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.46 | bwd_microstep: 1386.86 | bwd_inner_microstep: 1386.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 07:26:23,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.40 | bwd_microstep: 1563.59 | bwd_inner_microstep: 1563.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 07:26:26,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.39 | bwd_microstep: 1600.59 | bwd_inner_microstep: 1600.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 07:26:28,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.25 | bwd_microstep: 1483.06 | bwd_inner_microstep: 1483.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 07:26:30,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.04 | bwd_microstep: 1463.93 | bwd_inner_microstep: 1463.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 07:26:34,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.27 | optimizer_step: 6.60
[2024-06-10 07:26:34,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 3969.02 | bwd_inner_microstep: 1442.11 | bwd_allreduce_microstep: 2526.86 | step_microstep: 38.99
[2024-06-10 07:26:34,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15798.72 | bwd: 44870.26 | bwd_inner: 42342.46 | bwd_allreduce: 2527.10 | step: 40.57
 22%|██▏       | 385/1726 [6:43:02<23:18:42, 62.58s/it]
 22%|██▏       | 386/1726 [6:44:04<23:14:13, 62.43s/it]
 22%|██▏       | 387/1726 [6:45:05<23:01:06, 61.89s/it]
 22%|██▏       | 388/1726 [6:46:06<22:52:46, 61.56s/it]
 23%|██▎       | 389/1726 [6:47:08<22:53:55, 61.66s/it]
 23%|██▎       | 390/1726 [6:48:10<22:57:14, 61.85s/it]
 23%|██▎       | 391/1726 [6:49:11<22:50:38, 61.60s/it]
{'loss': 1.3549, 'learning_rate': 3.608717820326285e-05, 'epoch': 0.23}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1971
[2024-06-10 07:26:35,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.58 | bwd_microstep: 730.33 | bwd_inner_microstep: 730.22 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021
[2024-06-10 07:26:36,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.16 | bwd_microstep: 805.22 | bwd_inner_microstep: 805.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 07:26:38,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.10 | bwd_microstep: 1384.53 | bwd_inner_microstep: 1384.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437
[2024-06-10 07:26:40,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.26 | bwd_microstep: 1312.37 | bwd_inner_microstep: 1312.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 07:26:42,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.83 | bwd_microstep: 1346.90 | bwd_inner_microstep: 1346.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 07:26:44,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.33 | bwd_microstep: 1481.49 | bwd_inner_microstep: 1481.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 07:26:46,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1296.52 | bwd_inner_microstep: 1296.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4087
[2024-06-10 07:26:48,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.64 | bwd_microstep: 1527.19 | bwd_inner_microstep: 1527.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635
[2024-06-10 07:26:50,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.63 | bwd_microstep: 1418.66 | bwd_inner_microstep: 1418.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 07:26:52,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1394.40 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 07:26:54,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.01 | bwd_microstep: 1290.60 | bwd_inner_microstep: 1290.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 07:26:55,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1391.67 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 07:26:57,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.51 | bwd_microstep: 1297.99 | bwd_inner_microstep: 1297.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 07:26:58,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.98 | bwd_microstep: 801.74 | bwd_inner_microstep: 801.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3427
[2024-06-10 07:27:00,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1297.23 | bwd_inner_microstep: 1297.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 07:27:02,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1386.66 | bwd_inner_microstep: 1386.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-10 07:27:04,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.54 | bwd_microstep: 1450.79 | bwd_inner_microstep: 1450.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2670
[2024-06-10 07:27:06,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.52 | bwd_microstep: 1119.72 | bwd_inner_microstep: 1119.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3633
[2024-06-10 07:27:08,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.98 | bwd_microstep: 1538.00 | bwd_inner_microstep: 1537.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 07:27:10,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.54 | bwd_microstep: 1349.19 | bwd_inner_microstep: 1349.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3853
[2024-06-10 07:27:12,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.96 | bwd_microstep: 1627.39 | bwd_inner_microstep: 1627.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 07:27:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.31 | bwd_microstep: 1409.37 | bwd_inner_microstep: 1409.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 07:27:16,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.89 | bwd_microstep: 1480.01 | bwd_inner_microstep: 1479.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3549
[2024-06-10 07:27:18,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.62 | bwd_microstep: 1691.64 | bwd_inner_microstep: 1691.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3487
[2024-06-10 07:27:20,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.19 | bwd_microstep: 1331.91 | bwd_inner_microstep: 1331.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 07:27:22,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1412.13 | bwd_inner_microstep: 1412.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 07:27:23,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.90 | bwd_microstep: 809.48 | bwd_inner_microstep: 809.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3453
[2024-06-10 07:27:25,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.89 | bwd_microstep: 1229.06 | bwd_inner_microstep: 1229.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732
[2024-06-10 07:27:27,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.68 | bwd_microstep: 1628.45 | bwd_inner_microstep: 1628.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1996
[2024-06-10 07:27:28,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.21 | bwd_microstep: 772.30 | bwd_inner_microstep: 772.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 07:27:30,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1492.32 | bwd_inner_microstep: 1492.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 07:27:35,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-10 07:27:35,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.89 | bwd_microstep: 4431.26 | bwd_inner_microstep: 1744.48 | bwd_allreduce_microstep: 2686.72 | step_microstep: 38.66
[2024-06-10 07:27:35,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15766.10 | bwd: 44936.57 | bwd_inner: 42248.84 | bwd_allreduce: 2687.00 | step: 40.22
{'loss': 1.2497, 'learning_rate': 3.6064849319339764e-05, 'epoch': 0.23}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 07:27:37,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1379.29 | bwd_inner_microstep: 1379.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 07:27:39,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1280.37 | bwd_inner_microstep: 1280.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4308
[2024-06-10 07:27:41,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.37 | bwd_microstep: 1511.56 | bwd_inner_microstep: 1511.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 07:27:43,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1241.35 | bwd_inner_microstep: 1241.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-10 07:27:45,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.27 | bwd_microstep: 1444.46 | bwd_inner_microstep: 1444.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 07:27:47,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1383.87 | bwd_inner_microstep: 1383.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 07:27:48,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.44 | bwd_microstep: 1186.84 | bwd_inner_microstep: 1186.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882
[2024-06-10 07:27:49,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.96 | bwd_microstep: 681.30 | bwd_inner_microstep: 681.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 07:27:51,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.85 | bwd_microstep: 1534.11 | bwd_inner_microstep: 1534.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444
[2024-06-10 07:27:53,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.03 | bwd_microstep: 1219.20 | bwd_inner_microstep: 1219.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3515
[2024-06-10 07:27:55,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.88 | bwd_microstep: 1320.11 | bwd_inner_microstep: 1320.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 07:27:57,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1375.41 | bwd_inner_microstep: 1375.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 07:27:59,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1351.26 | bwd_inner_microstep: 1351.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2070
[2024-06-10 07:28:00,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.45 | bwd_microstep: 754.70 | bwd_inner_microstep: 754.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3479
[2024-06-10 07:28:02,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.25 | bwd_microstep: 1581.73 | bwd_inner_microstep: 1581.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3654
[2024-06-10 07:28:04,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.66 | bwd_microstep: 1542.44 | bwd_inner_microstep: 1542.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3552
[2024-06-10 07:28:06,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.53 | bwd_microstep: 1457.06 | bwd_inner_microstep: 1457.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3842
[2024-06-10 07:28:08,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.17 | bwd_microstep: 1598.16 | bwd_inner_microstep: 1598.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 07:28:10,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1258.57 | bwd_inner_microstep: 1258.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 07:28:11,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.30 | bwd_microstep: 808.42 | bwd_inner_microstep: 808.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 07:28:13,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1394.67 | bwd_inner_microstep: 1394.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 07:28:15,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.17 | bwd_microstep: 1298.56 | bwd_inner_microstep: 1298.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-10 07:28:16,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.18 | bwd_microstep: 797.37 | bwd_inner_microstep: 797.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452
[2024-06-10 07:28:18,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.68 | bwd_microstep: 1193.59 | bwd_inner_microstep: 1193.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721
[2024-06-10 07:28:19,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1338.69 | bwd_inner_microstep: 1338.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568
[2024-06-10 07:28:22,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.21 | bwd_microstep: 1595.67 | bwd_inner_microstep: 1595.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 07:28:23,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.77 | bwd_microstep: 1258.14 | bwd_inner_microstep: 1258.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3591
[2024-06-10 07:28:25,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.94 | bwd_microstep: 1527.74 | bwd_inner_microstep: 1527.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2207
[2024-06-10 07:28:27,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.41 | bwd_microstep: 862.04 | bwd_inner_microstep: 862.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 07:28:29,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.39 | bwd_microstep: 1378.19 | bwd_inner_microstep: 1378.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 07:28:31,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.53 | bwd_microstep: 1402.84 | bwd_inner_microstep: 1402.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3453
[2024-06-10 07:28:37,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.63
[2024-06-10 07:28:37,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.38 | bwd_microstep: 6244.48 | bwd_inner_microstep: 1342.83 | bwd_allreduce_microstep: 4901.59 | step_microstep: 38.80
[2024-06-10 07:28:37,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15506.07 | bwd: 46202.24 | bwd_inner: 41299.72 | bwd_allreduce: 4901.82 | step: 40.37
{'loss': 1.2801, 'learning_rate': 3.604246385510088e-05, 'epoch': 0.23}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958
[2024-06-10 07:28:39,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.11 | bwd_microstep: 889.33 | bwd_inner_microstep: 889.18 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 07:28:40,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.96 | bwd_microstep: 791.86 | bwd_inner_microstep: 791.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 07:28:42,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.37 | bwd_microstep: 1624.19 | bwd_inner_microstep: 1624.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 07:28:44,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1283.82 | bwd_inner_microstep: 1283.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 07:28:46,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1484.46 | bwd_inner_microstep: 1484.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 07:28:48,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.41 | bwd_microstep: 1342.96 | bwd_inner_microstep: 1342.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 07:28:49,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1389.86 | bwd_inner_microstep: 1389.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 07:28:51,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.92 | bwd_microstep: 1476.15 | bwd_inner_microstep: 1476.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2192
[2024-06-10 07:28:53,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.04 | bwd_microstep: 890.23 | bwd_inner_microstep: 890.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3739
[2024-06-10 07:28:55,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.38 | bwd_microstep: 1637.19 | bwd_inner_microstep: 1637.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-10 07:28:56,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.50 | bwd_microstep: 797.54 | bwd_inner_microstep: 797.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 07:28:57,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.10 | bwd_microstep: 797.90 | bwd_inner_microstep: 797.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2142
[2024-06-10 07:28:58,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.08 | bwd_microstep: 930.45 | bwd_inner_microstep: 930.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 07:29:00,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.30 | bwd_microstep: 1385.35 | bwd_inner_microstep: 1385.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3429
[2024-06-10 07:29:02,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.05 | bwd_microstep: 1378.52 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 07:29:04,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.46 | bwd_microstep: 1386.18 | bwd_inner_microstep: 1386.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3653
[2024-06-10 07:29:06,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.62 | bwd_microstep: 1573.32 | bwd_inner_microstep: 1573.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 07:29:08,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1479.45 | bwd_inner_microstep: 1479.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 07:29:10,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1244.24 | bwd_inner_microstep: 1244.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3541
[2024-06-10 07:29:12,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.84 | bwd_microstep: 1428.11 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3681
[2024-06-10 07:29:14,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.38 | bwd_microstep: 1391.13 | bwd_inner_microstep: 1391.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 07:29:16,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1509.98 | bwd_inner_microstep: 1509.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 07:29:18,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.42 | bwd_microstep: 1556.81 | bwd_inner_microstep: 1556.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2075
[2024-06-10 07:29:19,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.25 | bwd_microstep: 854.30 | bwd_inner_microstep: 854.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015
[2024-06-10 07:29:21,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.18 | bwd_microstep: 808.08 | bwd_inner_microstep: 808.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2966
[2024-06-10 07:29:22,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.71 | bwd_microstep: 1204.37 | bwd_inner_microstep: 1204.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3529
[2024-06-10 07:29:24,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.41 | bwd_microstep: 1227.68 | bwd_inner_microstep: 1227.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3769
[2024-06-10 07:29:26,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.30 | bwd_microstep: 1576.19 | bwd_inner_microstep: 1576.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3579
[2024-06-10 07:29:28,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.53 | bwd_microstep: 1564.98 | bwd_inner_microstep: 1564.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3812
[2024-06-10 07:29:31,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.99 | bwd_microstep: 1708.06 | bwd_inner_microstep: 1708.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3641
[2024-06-10 07:29:33,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.91 | bwd_microstep: 1439.88 | bwd_inner_microstep: 1439.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3773
[2024-06-10 07:29:40,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.25 | optimizer_step: 6.59
[2024-06-10 07:29:40,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.42 | bwd_microstep: 6238.02 | bwd_inner_microstep: 1980.02 | bwd_allreduce_microstep: 4257.94 | step_microstep: 38.72
[2024-06-10 07:29:40,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15597.52 | bwd: 46290.63 | bwd_inner: 42031.68 | bwd_allreduce: 4258.22 | step: 40.31
{'loss': 1.286, 'learning_rate': 3.602002188938769e-05, 'epoch': 0.23}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 07:29:41,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.17 | bwd_microstep: 1367.95 | bwd_inner_microstep: 1367.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-10 07:29:43,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.30 | bwd_microstep: 1445.21 | bwd_inner_microstep: 1445.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 07:29:45,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.28 | bwd_microstep: 1273.89 | bwd_inner_microstep: 1273.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 07:29:47,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.51 | bwd_microstep: 1269.10 | bwd_inner_microstep: 1269.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2431
[2024-06-10 07:29:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.05 | bwd_microstep: 846.61 | bwd_inner_microstep: 846.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883
[2024-06-10 07:29:49,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.05 | bwd_microstep: 708.49 | bwd_inner_microstep: 708.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 07:29:51,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1380.35 | bwd_inner_microstep: 1380.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 07:29:53,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.61 | bwd_microstep: 1275.61 | bwd_inner_microstep: 1275.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 07:29:55,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1383.08 | bwd_inner_microstep: 1383.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 07:29:56,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.86 | bwd_microstep: 1272.66 | bwd_inner_microstep: 1272.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 07:29:58,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.73 | bwd_microstep: 794.22 | bwd_inner_microstep: 794.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 07:29:59,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.53 | bwd_microstep: 802.78 | bwd_inner_microstep: 802.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411
[2024-06-10 07:30:01,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1308.46 | bwd_inner_microstep: 1308.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910
[2024-06-10 07:30:02,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.67 | bwd_microstep: 777.53 | bwd_inner_microstep: 777.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3690
[2024-06-10 07:30:04,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1553.66 | bwd_inner_microstep: 1553.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 07:30:06,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1489.49 | bwd_inner_microstep: 1489.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 07:30:08,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.88 | bwd_microstep: 1381.20 | bwd_inner_microstep: 1381.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3526
[2024-06-10 07:30:10,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.74 | bwd_microstep: 1358.76 | bwd_inner_microstep: 1358.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940
[2024-06-10 07:30:11,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.53 | bwd_microstep: 698.77 | bwd_inner_microstep: 698.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 07:30:13,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.42 | bwd_microstep: 1506.48 | bwd_inner_microstep: 1506.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 07:30:14,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.90 | bwd_microstep: 801.11 | bwd_inner_microstep: 801.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677
[2024-06-10 07:30:16,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.63 | bwd_microstep: 1626.27 | bwd_inner_microstep: 1626.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 07:30:18,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1559.06 | bwd_inner_microstep: 1559.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 07:30:20,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.14 | bwd_microstep: 1298.02 | bwd_inner_microstep: 1298.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 07:30:22,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.82 | bwd_microstep: 1493.59 | bwd_inner_microstep: 1493.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 07:30:24,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.62 | bwd_microstep: 1755.94 | bwd_inner_microstep: 1755.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284
[2024-06-10 07:30:26,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.94 | bwd_microstep: 1076.94 | bwd_inner_microstep: 1076.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3466
[2024-06-10 07:30:28,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.90 | bwd_microstep: 1505.13 | bwd_inner_microstep: 1505.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 07:30:30,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.20 | bwd_microstep: 1460.46 | bwd_inner_microstep: 1460.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238
[2024-06-10 07:30:31,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.76 | bwd_microstep: 969.80 | bwd_inner_microstep: 969.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-10 07:30:32,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.18 | bwd_microstep: 699.54 | bwd_inner_microstep: 699.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3798
[2024-06-10 07:30:41,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.31 | optimizer_step: 6.61
[2024-06-10 07:30:41,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.56 | bwd_microstep: 7667.00 | bwd_inner_microstep: 1947.18 | bwd_allreduce_microstep: 5719.77 | step_microstep: 38.98
[2024-06-10 07:30:41,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14950.48 | bwd: 45807.18 | bwd_inner: 40086.50 | bwd_allreduce: 5720.00 | step: 40.51
{'loss': 1.3113, 'learning_rate': 3.59975235012407e-05, 'epoch': 0.23}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623
[2024-06-10 07:30:42,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.69 | bwd_microstep: 1301.50 | bwd_inner_microstep: 1301.37 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 07:30:44,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.84 | bwd_microstep: 1375.75 | bwd_inner_microstep: 1375.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546
[2024-06-10 07:30:46,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1327.24 | bwd_inner_microstep: 1327.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 07:30:48,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.70 | bwd_microstep: 1183.60 | bwd_inner_microstep: 1183.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 07:30:50,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1447.66 | bwd_inner_microstep: 1447.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 07:30:51,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.18 | bwd_microstep: 789.76 | bwd_inner_microstep: 789.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 07:30:53,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.46 | bwd_microstep: 1385.34 | bwd_inner_microstep: 1385.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3759
[2024-06-10 07:30:55,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.15 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 07:30:57,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.91 | bwd_microstep: 1530.00 | bwd_inner_microstep: 1529.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069
[2024-06-10 07:30:58,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.48 | bwd_microstep: 821.50 | bwd_inner_microstep: 821.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 07:31:00,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.87 | bwd_microstep: 1153.20 | bwd_inner_microstep: 1153.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3500
[2024-06-10 07:31:02,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.11 | bwd_microstep: 1441.50 | bwd_inner_microstep: 1441.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2012
[2024-06-10 07:31:03,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.32 | bwd_microstep: 854.57 | bwd_inner_microstep: 854.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 07:31:05,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1382.12 | bwd_inner_microstep: 1382.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-10 07:31:07,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.58 | bwd_microstep: 1577.82 | bwd_inner_microstep: 1577.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3661
[2024-06-10 07:31:09,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.33 | bwd_microstep: 1717.57 | bwd_inner_microstep: 1717.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 07:31:11,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1500.78 | bwd_inner_microstep: 1500.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3669
[2024-06-10 07:31:13,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.53 | bwd_microstep: 1567.40 | bwd_inner_microstep: 1567.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 07:31:15,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.51 | bwd_microstep: 1453.84 | bwd_inner_microstep: 1453.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 07:31:17,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1502.94 | bwd_inner_microstep: 1502.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 07:31:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.08 | bwd_microstep: 1647.56 | bwd_inner_microstep: 1647.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3826
[2024-06-10 07:31:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.04 | bwd_microstep: 1855.21 | bwd_inner_microstep: 1855.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 07:31:24,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.58 | bwd_microstep: 1282.62 | bwd_inner_microstep: 1282.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 07:31:26,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1383.68 | bwd_inner_microstep: 1383.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 07:31:28,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1397.96 | bwd_inner_microstep: 1397.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 07:31:30,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.23 | bwd_microstep: 1647.52 | bwd_inner_microstep: 1647.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3476
[2024-06-10 07:31:32,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.89 | bwd_microstep: 1405.89 | bwd_inner_microstep: 1405.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3591
[2024-06-10 07:31:34,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1339.72 | bwd_inner_microstep: 1339.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3573
[2024-06-10 07:31:36,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.55 | bwd_microstep: 1237.27 | bwd_inner_microstep: 1237.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 07:31:37,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1254.67 | bwd_inner_microstep: 1254.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3580
[2024-06-10 07:31:39,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.92 | bwd_microstep: 1332.90 | bwd_inner_microstep: 1332.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 07:31:43,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 07:31:43,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 2973.76 | bwd_inner_microstep: 1451.43 | bwd_allreduce_microstep: 1522.28 | step_microstep: 38.89
[2024-06-10 07:31:43,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16403.77 | bwd: 45415.05 | bwd_inner: 43891.76 | bwd_allreduce: 1522.58 | step: 40.47
{'loss': 1.3275, 'learning_rate': 3.597496876989909e-05, 'epoch': 0.23}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 07:31:45,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1333.38 | bwd_inner_microstep: 1333.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 07:31:46,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1294.63 | bwd_inner_microstep: 1294.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 07:31:48,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.85 | bwd_microstep: 1449.55 | bwd_inner_microstep: 1449.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475
[2024-06-10 07:31:50,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.45 | bwd_microstep: 1411.08 | bwd_inner_microstep: 1411.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3758
[2024-06-10 07:31:53,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.80 | bwd_microstep: 1643.35 | bwd_inner_microstep: 1643.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2484
[2024-06-10 07:31:54,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.23 | bwd_microstep: 953.70 | bwd_inner_microstep: 953.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3728
[2024-06-10 07:31:56,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1364.78 | bwd_inner_microstep: 1364.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949
[2024-06-10 07:31:57,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.84 | bwd_microstep: 731.86 | bwd_inner_microstep: 731.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2914
[2024-06-10 07:31:58,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.87 | bwd_microstep: 1080.67 | bwd_inner_microstep: 1080.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409
[2024-06-10 07:32:00,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.23 | bwd_microstep: 1305.62 | bwd_inner_microstep: 1305.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 07:32:02,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1344.47 | bwd_inner_microstep: 1344.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 07:32:04,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1345.88 | bwd_inner_microstep: 1345.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 07:32:06,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.88 | bwd_microstep: 1499.10 | bwd_inner_microstep: 1499.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 07:32:08,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.46 | bwd_microstep: 1556.65 | bwd_inner_microstep: 1556.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3467
[2024-06-10 07:32:10,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1342.48 | bwd_inner_microstep: 1342.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2104
[2024-06-10 07:32:11,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.68 | bwd_microstep: 920.32 | bwd_inner_microstep: 920.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 07:32:13,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1412.08 | bwd_inner_microstep: 1412.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927
[2024-06-10 07:32:14,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.32 | bwd_microstep: 700.87 | bwd_inner_microstep: 700.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510
[2024-06-10 07:32:16,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.93 | bwd_microstep: 1225.47 | bwd_inner_microstep: 1225.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 07:32:18,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.79 | bwd_microstep: 1297.14 | bwd_inner_microstep: 1297.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3621
[2024-06-10 07:32:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.74 | bwd_microstep: 1250.20 | bwd_inner_microstep: 1250.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 07:32:21,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1400.90 | bwd_inner_microstep: 1400.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 07:32:24,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1556.66 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 07:32:25,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1398.49 | bwd_inner_microstep: 1398.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 07:32:28,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1537.26 | bwd_inner_microstep: 1537.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553
[2024-06-10 07:32:29,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1328.79 | bwd_inner_microstep: 1328.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-10 07:32:31,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.19 | bwd_microstep: 881.63 | bwd_inner_microstep: 881.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571
[2024-06-10 07:32:33,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.81 | bwd_microstep: 1431.82 | bwd_inner_microstep: 1431.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2066
[2024-06-10 07:32:34,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.11 | bwd_microstep: 818.03 | bwd_inner_microstep: 818.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3827
[2024-06-10 07:32:36,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.72 | bwd_microstep: 1616.74 | bwd_inner_microstep: 1616.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3618
[2024-06-10 07:32:38,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.24 | bwd_microstep: 1707.26 | bwd_inner_microstep: 1707.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 07:32:43,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 07:32:43,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.06 | bwd_microstep: 3579.38 | bwd_inner_microstep: 1870.90 | bwd_allreduce_microstep: 1708.44 | step_microstep: 38.86
[2024-06-10 07:32:43,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15704.74 | bwd: 43720.29 | bwd_inner: 42010.92 | bwd_allreduce: 1708.67 | step: 40.43
 23%|██▎       | 391/1726 [6:49:11<22:50:38, 61.60s/it]
 23%|██▎       | 392/1726 [6:50:12<22:45:55, 61.44s/it]
 23%|██▎       | 393/1726 [6:51:14<22:48:58, 61.62s/it]
 23%|██▎       | 394/1726 [6:52:16<22:52:03, 61.80s/it]
 23%|██▎       | 395/1726 [6:53:17<22:46:20, 61.59s/it]
 23%|██▎       | 396/1726 [6:54:20<22:49:08, 61.77s/it]
{'loss': 1.3117, 'learning_rate': 3.5952357774800526e-05, 'epoch': 0.23}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 07:32:44,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.73 | bwd_microstep: 788.26 | bwd_inner_microstep: 788.16 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3910
[2024-06-10 07:32:46,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.84 | bwd_microstep: 1692.22 | bwd_inner_microstep: 1692.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3885
[2024-06-10 07:32:48,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.03 | bwd_microstep: 1584.08 | bwd_inner_microstep: 1584.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 07:32:50,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.83 | bwd_microstep: 1480.20 | bwd_inner_microstep: 1480.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 07:32:52,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.02 | bwd_microstep: 1544.98 | bwd_inner_microstep: 1544.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822
[2024-06-10 07:32:54,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.03 | bwd_microstep: 1384.67 | bwd_inner_microstep: 1384.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 07:32:56,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1550.48 | bwd_inner_microstep: 1550.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 07:32:58,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1249.00 | bwd_inner_microstep: 1248.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 07:33:00,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.99 | bwd_microstep: 1392.03 | bwd_inner_microstep: 1392.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2060
[2024-06-10 07:33:01,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.54 | bwd_microstep: 755.28 | bwd_inner_microstep: 755.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 07:33:03,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.13 | bwd_microstep: 1617.53 | bwd_inner_microstep: 1617.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1860
[2024-06-10 07:33:04,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.07 | bwd_microstep: 708.01 | bwd_inner_microstep: 707.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408
[2024-06-10 07:33:06,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1435.98 | bwd_inner_microstep: 1435.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 07:33:08,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 1437.96 | bwd_inner_microstep: 1437.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3422
[2024-06-10 07:33:10,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.64 | bwd_microstep: 1474.35 | bwd_inner_microstep: 1474.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 07:33:12,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.07 | bwd_microstep: 1290.80 | bwd_inner_microstep: 1290.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 07:33:14,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.06 | bwd_microstep: 1544.58 | bwd_inner_microstep: 1544.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 07:33:16,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1474.13 | bwd_inner_microstep: 1474.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1937
[2024-06-10 07:33:17,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.24 | bwd_microstep: 742.85 | bwd_inner_microstep: 742.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 07:33:19,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1494.52 | bwd_inner_microstep: 1494.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122
[2024-06-10 07:33:21,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.01 | bwd_microstep: 922.60 | bwd_inner_microstep: 922.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2286
[2024-06-10 07:33:22,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.93 | bwd_microstep: 814.81 | bwd_inner_microstep: 814.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176
[2024-06-10 07:33:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.35 | bwd_microstep: 855.91 | bwd_inner_microstep: 855.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2239
[2024-06-10 07:33:24,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.30 | bwd_microstep: 903.53 | bwd_inner_microstep: 903.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 07:33:26,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.55 | bwd_microstep: 1654.02 | bwd_inner_microstep: 1653.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 07:33:28,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240
[2024-06-10 07:33:30,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.02 | bwd_microstep: 964.87 | bwd_inner_microstep: 964.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2269
[2024-06-10 07:33:31,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.29 | bwd_microstep: 876.38 | bwd_inner_microstep: 876.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 07:33:33,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.97 | bwd_microstep: 1559.02 | bwd_inner_microstep: 1558.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 07:33:35,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1399.70 | bwd_inner_microstep: 1399.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 857
[2024-06-10 07:33:36,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 136.75 | bwd_microstep: 348.33 | bwd_inner_microstep: 348.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2263
[2024-06-10 07:33:43,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 07:33:43,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.57 | bwd_microstep: 6758.00 | bwd_inner_microstep: 1105.86 | bwd_allreduce_microstep: 5652.08 | step_microstep: 38.87
[2024-06-10 07:33:43,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14729.97 | bwd: 45111.87 | bwd_inner: 39458.77 | bwd_allreduce: 5652.37 | step: 40.50
{'loss': 1.3336, 'learning_rate': 3.5929690595580804e-05, 'epoch': 0.23}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1949
[2024-06-10 07:33:44,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.39 | bwd_microstep: 881.63 | bwd_inner_microstep: 881.49 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 07:33:46,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1379.34 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 07:33:48,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1374.20 | bwd_inner_microstep: 1374.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3601
[2024-06-10 07:33:50,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.09 | bwd_microstep: 1303.59 | bwd_inner_microstep: 1303.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 07:33:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.91 | bwd_microstep: 1449.14 | bwd_inner_microstep: 1449.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 07:33:53,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1281.53 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 07:33:55,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1403.13 | bwd_inner_microstep: 1403.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662
[2024-06-10 07:33:57,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.40 | bwd_microstep: 1425.48 | bwd_inner_microstep: 1425.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 07:33:59,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.22 | bwd_microstep: 1449.18 | bwd_inner_microstep: 1449.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 07:34:01,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1254.52 | bwd_inner_microstep: 1254.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 07:34:03,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1961
[2024-06-10 07:34:04,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.25 | bwd_microstep: 889.91 | bwd_inner_microstep: 889.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1895
[2024-06-10 07:34:05,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.19 | bwd_microstep: 774.57 | bwd_inner_microstep: 774.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949
[2024-06-10 07:34:06,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.69 | bwd_microstep: 704.56 | bwd_inner_microstep: 704.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 07:34:08,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1408.87 | bwd_inner_microstep: 1408.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 07:34:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.72 | bwd_microstep: 1554.80 | bwd_inner_microstep: 1554.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 07:34:12,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.05 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 07:34:14,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.11 | bwd_microstep: 1429.91 | bwd_inner_microstep: 1429.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856
[2024-06-10 07:34:16,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.35 | bwd_microstep: 1664.72 | bwd_inner_microstep: 1664.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2188
[2024-06-10 07:34:18,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.14 | bwd_microstep: 956.48 | bwd_inner_microstep: 956.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 07:34:19,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1256.25 | bwd_inner_microstep: 1256.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 07:34:22,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.67 | bwd_microstep: 1654.78 | bwd_inner_microstep: 1654.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2291
[2024-06-10 07:34:23,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.23 | bwd_microstep: 941.17 | bwd_inner_microstep: 941.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3543
[2024-06-10 07:34:25,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.46 | bwd_microstep: 1691.97 | bwd_inner_microstep: 1691.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3604
[2024-06-10 07:34:27,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.44 | bwd_microstep: 1271.03 | bwd_inner_microstep: 1271.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511
[2024-06-10 07:34:29,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.72 | bwd_microstep: 1320.76 | bwd_inner_microstep: 1320.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3787
[2024-06-10 07:34:31,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.65 | bwd_microstep: 1751.45 | bwd_inner_microstep: 1751.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 07:34:33,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.05 | bwd_microstep: 1495.28 | bwd_inner_microstep: 1495.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2947
[2024-06-10 07:34:35,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.58 | bwd_microstep: 1290.05 | bwd_inner_microstep: 1290.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611
[2024-06-10 07:34:37,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.89 | bwd_microstep: 1605.38 | bwd_inner_microstep: 1605.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 07:34:40,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.14 | bwd_microstep: 1649.86 | bwd_inner_microstep: 1649.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 07:34:44,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 07:34:44,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.04 | bwd_microstep: 4006.96 | bwd_inner_microstep: 1657.59 | bwd_allreduce_microstep: 2349.31 | step_microstep: 38.66
[2024-06-10 07:34:44,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15928.58 | bwd: 45139.34 | bwd_inner: 42789.01 | bwd_allreduce: 2349.60 | step: 40.27
{'loss': 1.2773, 'learning_rate': 3.590696731207361e-05, 'epoch': 0.23}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3455
[2024-06-10 07:34:46,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.07 | bwd_microstep: 1476.24 | bwd_inner_microstep: 1476.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3958
[2024-06-10 07:34:48,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.95 | bwd_microstep: 1528.94 | bwd_inner_microstep: 1528.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2997
[2024-06-10 07:34:50,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.03 | bwd_microstep: 1109.77 | bwd_inner_microstep: 1109.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 07:34:52,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1346.10 | bwd_inner_microstep: 1346.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 07:34:54,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1552.13 | bwd_inner_microstep: 1552.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 07:34:56,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3748
[2024-06-10 07:34:57,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.44 | bwd_microstep: 1246.67 | bwd_inner_microstep: 1246.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 07:34:59,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1251.19 | bwd_inner_microstep: 1251.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-10 07:35:01,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.63 | bwd_microstep: 1539.32 | bwd_inner_microstep: 1539.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3800
[2024-06-10 07:35:04,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.32 | bwd_microstep: 1617.74 | bwd_inner_microstep: 1617.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 07:35:05,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.42 | bwd_microstep: 1387.44 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675
[2024-06-10 07:35:08,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.73 | bwd_microstep: 1527.28 | bwd_inner_microstep: 1527.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2116
[2024-06-10 07:35:09,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.50 | bwd_microstep: 927.97 | bwd_inner_microstep: 927.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 07:35:11,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.23 | bwd_microstep: 1445.53 | bwd_inner_microstep: 1445.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3424
[2024-06-10 07:35:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.18 | bwd_microstep: 1313.35 | bwd_inner_microstep: 1313.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045
[2024-06-10 07:35:14,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.56 | bwd_microstep: 908.86 | bwd_inner_microstep: 908.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 07:35:16,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.58 | bwd_microstep: 1387.31 | bwd_inner_microstep: 1387.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-10 07:35:17,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.79 | bwd_microstep: 717.70 | bwd_inner_microstep: 717.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 07:35:19,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.52 | bwd_microstep: 1491.63 | bwd_inner_microstep: 1491.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 07:35:21,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.03 | bwd_microstep: 1508.32 | bwd_inner_microstep: 1508.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 07:35:23,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1378.70 | bwd_inner_microstep: 1378.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291
[2024-06-10 07:35:24,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.03 | bwd_microstep: 914.78 | bwd_inner_microstep: 914.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2003
[2024-06-10 07:35:25,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.74 | bwd_microstep: 711.93 | bwd_inner_microstep: 711.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 07:35:27,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1298.40 | bwd_inner_microstep: 1298.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 07:35:29,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.54 | bwd_microstep: 1657.48 | bwd_inner_microstep: 1657.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1893
[2024-06-10 07:35:30,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.21 | bwd_microstep: 684.88 | bwd_inner_microstep: 684.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072
[2024-06-10 07:35:31,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.04 | bwd_microstep: 913.64 | bwd_inner_microstep: 913.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3547
[2024-06-10 07:35:33,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1361.55 | bwd_inner_microstep: 1361.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 07:35:35,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1489.07 | bwd_inner_microstep: 1489.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 07:35:38,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1605.83 | bwd_inner_microstep: 1605.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622
[2024-06-10 07:35:40,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.92 | bwd_microstep: 1604.13 | bwd_inner_microstep: 1604.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2240
[2024-06-10 07:35:46,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.22 | optimizer_step: 6.62
[2024-06-10 07:35:46,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.84 | bwd_microstep: 6144.02 | bwd_inner_microstep: 1130.04 | bwd_allreduce_microstep: 5013.93 | step_microstep: 38.61
[2024-06-10 07:35:46,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15445.09 | bwd: 46426.16 | bwd_inner: 41411.31 | bwd_allreduce: 5014.16 | step: 40.30
{'loss': 1.3035, 'learning_rate': 3.5884188004310244e-05, 'epoch': 0.23}
 23%|██▎       | 397/1726 [6:55:19<22:34:50, 61.17s/it]
 23%|██▎       | 398/1726 [6:56:19<22:27:17, 60.87s/it]
 23%|██▎       | 399/1726 [6:57:21<22:29:53, 61.04s/it]
 23%|██▎       | 400/1726 [6:58:23<22:36:42, 61.39s/it]
[INFO|trainer.py:2936] 2024-06-10 07:35:50,274 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400
[INFO|configuration_utils.py:473] 2024-06-10 07:35:50,278 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/config.json
[INFO|configuration_utils.py:594] 2024-06-10 07:35:50,281 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-10 07:35:58,231 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-10 07:35:58,244 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-10 07:35:58,246 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-10 07:35:58,247 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/added_tokens.json
[2024-06-10 07:35:58,470] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step400 is about to be saved!
[2024-06-10 07:35:58,483] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/mp_rank_00_model_states.pt
[2024-06-10 07:35:58,483] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/mp_rank_00_model_states.pt...
[2024-06-10 07:36:07,136] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/mp_rank_00_model_states.pt.
[2024-06-10 07:36:07,141] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-06-10 07:36:19,250] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-06-10 07:36:19,262] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-06-10 07:36:19,262] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step400 is ready now!
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3392
[2024-06-10 07:36:21,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1289.12 | bwd_inner_microstep: 1289.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878
[2024-06-10 07:36:23,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.66 | bwd_microstep: 1573.14 | bwd_inner_microstep: 1573.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 07:36:25,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.62 | bwd_microstep: 1546.70 | bwd_inner_microstep: 1546.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 07:36:27,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.59 | bwd_microstep: 1444.26 | bwd_inner_microstep: 1444.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 07:36:29,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.41 | bwd_microstep: 1239.03 | bwd_inner_microstep: 1239.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744
[2024-06-10 07:36:31,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.87 | bwd_microstep: 1628.08 | bwd_inner_microstep: 1628.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 07:36:33,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1279.98 | bwd_inner_microstep: 1279.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 07:36:35,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.09 | bwd_microstep: 1382.70 | bwd_inner_microstep: 1382.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 07:36:37,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.16 | bwd_microstep: 1486.09 | bwd_inner_microstep: 1486.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695
[2024-06-10 07:36:39,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.16 | bwd_microstep: 1521.11 | bwd_inner_microstep: 1521.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000
[2024-06-10 07:36:40,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.62 | bwd_microstep: 802.47 | bwd_inner_microstep: 802.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-10 07:36:41,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.24 | bwd_microstep: 798.27 | bwd_inner_microstep: 798.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 07:36:43,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.19 | bwd_microstep: 1397.18 | bwd_inner_microstep: 1397.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3656
[2024-06-10 07:36:45,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1470.59 | bwd_inner_microstep: 1470.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596
[2024-06-10 07:36:47,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.18 | bwd_microstep: 1467.15 | bwd_inner_microstep: 1467.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3504
[2024-06-10 07:36:49,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.91 | bwd_microstep: 1509.99 | bwd_inner_microstep: 1509.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2646
[2024-06-10 07:36:51,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.57 | bwd_microstep: 1112.12 | bwd_inner_microstep: 1112.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1986
[2024-06-10 07:36:52,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.18 | bwd_microstep: 828.41 | bwd_inner_microstep: 828.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-10 07:36:54,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.09 | bwd_microstep: 1614.87 | bwd_inner_microstep: 1614.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-10 07:36:56,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 1432.18 | bwd_inner_microstep: 1432.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 07:36:58,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1381.54 | bwd_inner_microstep: 1381.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3610
[2024-06-10 07:37:00,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.92 | bwd_microstep: 1533.21 | bwd_inner_microstep: 1533.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3624
[2024-06-10 07:37:02,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.12 | bwd_microstep: 1564.13 | bwd_inner_microstep: 1564.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948
[2024-06-10 07:37:03,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.87 | bwd_microstep: 701.29 | bwd_inner_microstep: 701.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-10 07:37:04,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.66 | bwd_microstep: 817.19 | bwd_inner_microstep: 817.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3523
[2024-06-10 07:37:06,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.55 | bwd_microstep: 1194.68 | bwd_inner_microstep: 1194.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3623
[2024-06-10 07:37:08,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.55 | bwd_microstep: 1542.79 | bwd_inner_microstep: 1542.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294
[2024-06-10 07:37:10,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.67 | bwd_microstep: 1072.84 | bwd_inner_microstep: 1072.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 07:37:12,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.50 | bwd_microstep: 1647.64 | bwd_inner_microstep: 1647.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 07:37:14,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.13 | bwd_microstep: 1338.34 | bwd_inner_microstep: 1338.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3766
[2024-06-10 07:37:16,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.44 | bwd_microstep: 1611.60 | bwd_inner_microstep: 1611.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-10 07:37:20,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.45 | optimizer_step: 6.60
[2024-06-10 07:37:20,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 3845.50 | bwd_inner_microstep: 1591.49 | bwd_allreduce_microstep: 2253.93 | step_microstep: 39.33
[2024-06-10 07:37:20,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15907.10 | bwd: 45074.23 | bwd_inner: 42819.34 | bwd_allreduce: 2254.18 | step: 40.93
{'loss': 1.336, 'learning_rate': 3.5861352752519294e-05, 'epoch': 0.23}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400
[2024-06-10 07:37:22,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.46 | bwd_microstep: 1366.10 | bwd_inner_microstep: 1366.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 07:37:24,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.44 | bwd_microstep: 1379.50 | bwd_inner_microstep: 1379.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 07:37:26,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1277.09 | bwd_inner_microstep: 1277.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 07:37:28,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.73 | bwd_microstep: 1383.56 | bwd_inner_microstep: 1383.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 07:37:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.98 | bwd_microstep: 1454.26 | bwd_inner_microstep: 1454.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 07:37:32,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.34 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 07:37:33,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1279.89 | bwd_inner_microstep: 1279.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 07:37:36,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.73 | bwd_microstep: 1528.09 | bwd_inner_microstep: 1528.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3732
[2024-06-10 07:37:38,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.69 | bwd_microstep: 1565.88 | bwd_inner_microstep: 1565.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 07:37:40,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.93 | bwd_microstep: 1476.00 | bwd_inner_microstep: 1475.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3658
[2024-06-10 07:37:42,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.27 | bwd_microstep: 1445.44 | bwd_inner_microstep: 1445.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 07:37:44,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.32 | bwd_microstep: 1294.22 | bwd_inner_microstep: 1294.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 07:37:46,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.89 | bwd_microstep: 1474.72 | bwd_inner_microstep: 1474.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2117
[2024-06-10 07:37:47,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.40 | bwd_microstep: 1020.77 | bwd_inner_microstep: 1020.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 07:37:49,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.71 | bwd_microstep: 1616.32 | bwd_inner_microstep: 1616.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 07:37:51,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.39 | bwd_microstep: 1384.87 | bwd_inner_microstep: 1384.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 07:37:53,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.22 | bwd_microstep: 1288.02 | bwd_inner_microstep: 1287.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 07:37:55,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.03 | bwd_microstep: 1291.31 | bwd_inner_microstep: 1291.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 07:37:57,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.65 | bwd_microstep: 1289.51 | bwd_inner_microstep: 1289.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-10 07:37:59,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.87 | bwd_microstep: 1430.41 | bwd_inner_microstep: 1430.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 07:38:01,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1535.76 | bwd_inner_microstep: 1535.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 07:38:02,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.34 | bwd_microstep: 1291.40 | bwd_inner_microstep: 1291.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 07:38:04,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.46 | bwd_microstep: 1254.35 | bwd_inner_microstep: 1254.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 07:38:06,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1292.06 | bwd_inner_microstep: 1292.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2989
[2024-06-10 07:38:08,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.05 | bwd_microstep: 1206.83 | bwd_inner_microstep: 1206.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2031
[2024-06-10 07:38:09,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.54 | bwd_microstep: 760.66 | bwd_inner_microstep: 760.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3573
[2024-06-10 07:38:11,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.09 | bwd_microstep: 1566.27 | bwd_inner_microstep: 1566.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2279
[2024-06-10 07:38:12,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.29 | bwd_microstep: 937.02 | bwd_inner_microstep: 936.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 07:38:14,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1377.13 | bwd_inner_microstep: 1377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 4810
[2024-06-10 07:38:17,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 747.29 | bwd_microstep: 2000.87 | bwd_inner_microstep: 2000.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 07:38:19,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.85 | bwd_microstep: 1492.21 | bwd_inner_microstep: 1492.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3579
[2024-06-10 07:38:21,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 07:38:21,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1483.36 | bwd_inner_microstep: 1475.60 | bwd_allreduce_microstep: 7.71 | step_microstep: 38.17
[2024-06-10 07:38:21,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16436.19 | bwd: 43829.07 | bwd_inner: 43820.44 | bwd_allreduce: 7.94 | step: 40.14
{'loss': 1.3144, 'learning_rate': 3.583846163712641e-05, 'epoch': 0.23}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 07:38:23,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.42 | bwd_microstep: 1476.85 | bwd_inner_microstep: 1476.77 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 07:38:24,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.50 | bwd_microstep: 699.01 | bwd_inner_microstep: 698.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3854
[2024-06-10 07:38:26,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.58 | bwd_microstep: 1468.28 | bwd_inner_microstep: 1468.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 07:38:28,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.03 | bwd_microstep: 1655.46 | bwd_inner_microstep: 1655.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785
[2024-06-10 07:38:31,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.24 | bwd_microstep: 1648.72 | bwd_inner_microstep: 1648.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 07:38:32,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.63 | bwd_microstep: 1154.47 | bwd_inner_microstep: 1154.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-10 07:38:34,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.61 | bwd_microstep: 1434.15 | bwd_inner_microstep: 1434.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3744
[2024-06-10 07:38:36,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1338.60 | bwd_inner_microstep: 1338.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 07:38:38,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.20 | bwd_microstep: 1288.25 | bwd_inner_microstep: 1288.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3497
[2024-06-10 07:38:39,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.83 | bwd_microstep: 1252.88 | bwd_inner_microstep: 1252.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3696
[2024-06-10 07:38:42,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.50 | bwd_microstep: 1734.36 | bwd_inner_microstep: 1734.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512
[2024-06-10 07:38:44,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.67 | bwd_microstep: 1429.79 | bwd_inner_microstep: 1429.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 07:38:46,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1348.47 | bwd_inner_microstep: 1348.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1924
[2024-06-10 07:38:47,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.07 | bwd_microstep: 758.84 | bwd_inner_microstep: 758.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 07:38:49,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.00 | bwd_microstep: 1302.31 | bwd_inner_microstep: 1302.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530
[2024-06-10 07:38:51,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1590.00 | bwd_inner_microstep: 1589.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648
[2024-06-10 07:38:53,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1410.26 | bwd_inner_microstep: 1410.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3162
[2024-06-10 07:38:55,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.18 | bwd_microstep: 1348.18 | bwd_inner_microstep: 1348.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 07:38:56,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.91 | bwd_microstep: 878.56 | bwd_inner_microstep: 878.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 07:38:58,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.26 | bwd_microstep: 1295.66 | bwd_inner_microstep: 1295.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 07:39:00,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3485
[2024-06-10 07:39:01,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.11 | bwd_microstep: 1432.01 | bwd_inner_microstep: 1431.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827
[2024-06-10 07:39:04,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.70 | bwd_microstep: 1520.93 | bwd_inner_microstep: 1520.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 07:39:06,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1378.94 | bwd_inner_microstep: 1378.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 07:39:07,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1391.77 | bwd_inner_microstep: 1391.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 07:39:09,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1286.25 | bwd_inner_microstep: 1286.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2049
[2024-06-10 07:39:11,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.68 | bwd_microstep: 1007.83 | bwd_inner_microstep: 1007.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3802
[2024-06-10 07:39:13,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.55 | bwd_microstep: 1482.84 | bwd_inner_microstep: 1482.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 07:39:15,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1394.64 | bwd_inner_microstep: 1394.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3612
[2024-06-10 07:39:17,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1709.19 | bwd_inner_microstep: 1709.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3442
[2024-06-10 07:39:19,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.91 | bwd_microstep: 1400.72 | bwd_inner_microstep: 1400.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-10 07:39:23,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 07:39:23,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.45 | bwd_microstep: 3111.26 | bwd_inner_microstep: 1868.07 | bwd_allreduce_microstep: 1243.14 | step_microstep: 38.12
[2024-06-10 07:39:23,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16243.97 | bwd: 45028.79 | bwd_inner: 43784.66 | bwd_allreduce: 1243.42 | step: 39.71
{'loss': 1.3116, 'learning_rate': 3.581551473875397e-05, 'epoch': 0.23}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 07:39:24,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.12 | bwd_microstep: 1370.48 | bwd_inner_microstep: 1370.36 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 07:39:26,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.60 | bwd_microstep: 1384.07 | bwd_inner_microstep: 1384.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 07:39:28,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | bwd_microstep: 1286.63 | bwd_inner_microstep: 1286.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3909
[2024-06-10 07:39:30,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.89 | bwd_microstep: 1538.58 | bwd_inner_microstep: 1538.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 07:39:32,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1481.29 | bwd_inner_microstep: 1481.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884
[2024-06-10 07:39:33,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.50 | bwd_microstep: 709.54 | bwd_inner_microstep: 709.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-10 07:39:36,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.35 | bwd_microstep: 1631.06 | bwd_inner_microstep: 1631.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 07:39:37,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1390.56 | bwd_inner_microstep: 1390.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 07:39:39,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1286.44 | bwd_inner_microstep: 1286.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 07:39:41,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.02 | bwd_microstep: 1384.92 | bwd_inner_microstep: 1384.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902
[2024-06-10 07:39:42,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.20 | bwd_microstep: 686.08 | bwd_inner_microstep: 686.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3673
[2024-06-10 07:39:44,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.39 | bwd_microstep: 1550.36 | bwd_inner_microstep: 1550.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3582
[2024-06-10 07:39:46,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.54 | bwd_microstep: 1434.06 | bwd_inner_microstep: 1434.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 07:39:48,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1382.96 | bwd_inner_microstep: 1382.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 07:39:50,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.93 | bwd_microstep: 1310.44 | bwd_inner_microstep: 1310.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2898
[2024-06-10 07:39:52,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.45 | bwd_microstep: 1185.44 | bwd_inner_microstep: 1185.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 07:39:54,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979
[2024-06-10 07:39:55,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.21 | bwd_microstep: 895.60 | bwd_inner_microstep: 895.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 07:39:57,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.85 | bwd_microstep: 1508.19 | bwd_inner_microstep: 1508.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400
[2024-06-10 07:39:59,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.36 | bwd_microstep: 1368.09 | bwd_inner_microstep: 1368.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 07:40:01,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1284.62 | bwd_inner_microstep: 1284.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3620
[2024-06-10 07:40:03,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.30 | bwd_microstep: 1676.93 | bwd_inner_microstep: 1676.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 07:40:05,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.50 | bwd_microstep: 1252.83 | bwd_inner_microstep: 1252.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3437
[2024-06-10 07:40:07,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.77 | bwd_microstep: 1544.64 | bwd_inner_microstep: 1544.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2424
[2024-06-10 07:40:08,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.73 | bwd_microstep: 1134.34 | bwd_inner_microstep: 1134.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 07:40:10,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.67 | bwd_microstep: 1644.32 | bwd_inner_microstep: 1644.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3821
[2024-06-10 07:40:13,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.91 | bwd_microstep: 1860.31 | bwd_inner_microstep: 1860.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3730
[2024-06-10 07:40:15,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.17 | bwd_microstep: 1664.28 | bwd_inner_microstep: 1664.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080
[2024-06-10 07:40:17,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.02 | bwd_microstep: 916.44 | bwd_inner_microstep: 916.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 07:40:18,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1304.28 | bwd_inner_microstep: 1304.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582
[2024-06-10 07:40:20,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1299.69 | bwd_inner_microstep: 1299.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 07:40:25,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 07:40:25,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.04 | bwd_microstep: 3985.08 | bwd_inner_microstep: 2025.88 | bwd_allreduce_microstep: 1959.12 | step_microstep: 38.36
[2024-06-10 07:40:25,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16174.29 | bwd: 45735.85 | bwd_inner: 43775.69 | bwd_allreduce: 1959.42 | step: 39.93
{'loss': 1.33, 'learning_rate': 3.579251213822085e-05, 'epoch': 0.23}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 07:40:27,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.94 | bwd_microstep: 1485.24 | bwd_inner_microstep: 1485.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 07:40:29,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.61 | bwd_microstep: 1442.87 | bwd_inner_microstep: 1442.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2307
[2024-06-10 07:40:30,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.14 | bwd_microstep: 821.10 | bwd_inner_microstep: 821.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 07:40:32,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.03 | bwd_microstep: 1552.10 | bwd_inner_microstep: 1552.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567
[2024-06-10 07:40:34,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.25 | bwd_microstep: 1297.83 | bwd_inner_microstep: 1297.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 07:40:36,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.88 | bwd_microstep: 1385.33 | bwd_inner_microstep: 1385.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-10 07:40:38,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.07 | bwd_microstep: 1213.57 | bwd_inner_microstep: 1213.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 07:40:39,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 07:40:41,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.22 | bwd_microstep: 792.90 | bwd_inner_microstep: 792.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-10 07:40:42,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 716.05 | bwd_inner_microstep: 716.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 07:40:43,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1390.68 | bwd_inner_microstep: 1390.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 07:40:45,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.82 | bwd_microstep: 1429.53 | bwd_inner_microstep: 1429.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2679
[2024-06-10 07:40:47,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.58 | bwd_microstep: 1058.97 | bwd_inner_microstep: 1058.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2166
[2024-06-10 07:40:48,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.70 | bwd_microstep: 885.77 | bwd_inner_microstep: 885.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3421
[2024-06-10 07:40:50,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1406.24 | bwd_inner_microstep: 1406.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512
[2024-06-10 07:40:52,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.46 | bwd_microstep: 1585.64 | bwd_inner_microstep: 1585.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 07:40:54,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.32 | bwd_microstep: 1288.39 | bwd_inner_microstep: 1288.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 07:40:56,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1390.97 | bwd_inner_microstep: 1390.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2177
[2024-06-10 07:40:57,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.90 | bwd_microstep: 829.25 | bwd_inner_microstep: 829.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 07:40:59,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1296.86 | bwd_inner_microstep: 1296.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2088
[2024-06-10 07:41:00,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.81 | bwd_microstep: 852.84 | bwd_inner_microstep: 852.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3636
[2024-06-10 07:41:02,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1318.65 | bwd_inner_microstep: 1318.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 07:41:04,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.75 | bwd_microstep: 1549.81 | bwd_inner_microstep: 1549.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3819
[2024-06-10 07:41:06,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.00 | bwd_microstep: 1753.49 | bwd_inner_microstep: 1753.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 07:41:08,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1282.30 | bwd_inner_microstep: 1282.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 07:41:10,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.23 | bwd_microstep: 1530.86 | bwd_inner_microstep: 1530.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-10 07:41:13,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.90 | bwd_microstep: 1756.91 | bwd_inner_microstep: 1756.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 07:41:15,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.96 | bwd_microstep: 1747.16 | bwd_inner_microstep: 1747.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 07:41:17,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.68 | bwd_microstep: 1530.65 | bwd_inner_microstep: 1530.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3560
[2024-06-10 07:41:19,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.30 | bwd_microstep: 1589.34 | bwd_inner_microstep: 1589.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2254
[2024-06-10 07:41:21,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.16 | bwd_microstep: 1067.66 | bwd_inner_microstep: 1067.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3825
[2024-06-10 07:41:26,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.26 | optimizer_step: 6.64
[2024-06-10 07:41:26,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.30 | bwd_microstep: 3893.16 | bwd_inner_microstep: 2104.33 | bwd_allreduce_microstep: 1788.77 | step_microstep: 38.40
[2024-06-10 07:41:26,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15866.15 | bwd: 44524.19 | bwd_inner: 42734.51 | bwd_allreduce: 1789.01 | step: 40.05
{'loss': 1.3099, 'learning_rate': 3.5769453916542065e-05, 'epoch': 0.23}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3413
[2024-06-10 07:41:27,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.33 | bwd_microstep: 1180.09 | bwd_inner_microstep: 1180.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1926
[2024-06-10 07:41:28,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.84 | bwd_microstep: 738.87 | bwd_inner_microstep: 738.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2434
[2024-06-10 07:41:30,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.13 | bwd_microstep: 941.51 | bwd_inner_microstep: 941.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3855
[2024-06-10 07:41:32,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.38 | bwd_microstep: 1559.42 | bwd_inner_microstep: 1559.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 07:41:34,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.13 | bwd_microstep: 1542.21 | bwd_inner_microstep: 1542.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 07:41:36,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.23 | bwd_microstep: 1246.86 | bwd_inner_microstep: 1246.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3402
[2024-06-10 07:41:37,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.62 | bwd_microstep: 1179.91 | bwd_inner_microstep: 1179.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 07:41:39,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.07 | bwd_microstep: 1442.41 | bwd_inner_microstep: 1442.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-10 07:41:41,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1428.12 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 07:41:43,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.04 | bwd_microstep: 1390.07 | bwd_inner_microstep: 1390.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 07:41:45,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1389.21 | bwd_inner_microstep: 1389.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3441
[2024-06-10 07:41:47,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.82 | bwd_microstep: 1216.10 | bwd_inner_microstep: 1216.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3443
[2024-06-10 07:41:48,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1310.99 | bwd_inner_microstep: 1310.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458
[2024-06-10 07:41:50,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.77 | bwd_microstep: 1218.55 | bwd_inner_microstep: 1218.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 07:41:52,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.50 | bwd_microstep: 1377.01 | bwd_inner_microstep: 1376.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 07:41:54,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1351.21 | bwd_inner_microstep: 1351.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466
[2024-06-10 07:41:56,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.86 | bwd_microstep: 1573.78 | bwd_inner_microstep: 1573.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1961
[2024-06-10 07:41:57,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.82 | bwd_microstep: 890.48 | bwd_inner_microstep: 890.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3627
[2024-06-10 07:41:59,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.85 | bwd_microstep: 1217.81 | bwd_inner_microstep: 1217.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511
[2024-06-10 07:42:01,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.03 | bwd_microstep: 1194.29 | bwd_inner_microstep: 1194.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-10 07:42:03,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1609.80 | bwd_inner_microstep: 1609.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3880
[2024-06-10 07:42:05,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.05 | bwd_microstep: 1590.40 | bwd_inner_microstep: 1590.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 07:42:07,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1407.03 | bwd_inner_microstep: 1407.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3616
[2024-06-10 07:42:09,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.90 | bwd_microstep: 1538.21 | bwd_inner_microstep: 1538.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 07:42:11,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.94 | bwd_microstep: 1255.30 | bwd_inner_microstep: 1255.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524
[2024-06-10 07:42:13,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1586.64 | bwd_inner_microstep: 1586.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 07:42:15,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.04 | bwd_microstep: 1514.66 | bwd_inner_microstep: 1514.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820
[2024-06-10 07:42:17,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.57 | bwd_microstep: 1507.86 | bwd_inner_microstep: 1507.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3538
[2024-06-10 07:42:19,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.24 | bwd_microstep: 1622.00 | bwd_inner_microstep: 1621.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 07:42:20,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.11 | bwd_microstep: 697.31 | bwd_inner_microstep: 697.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 07:42:23,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1501.46 | bwd_inner_microstep: 1501.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 07:42:27,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.30 | optimizer_step: 6.57
[2024-06-10 07:42:27,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 4300.92 | bwd_inner_microstep: 1701.48 | bwd_allreduce_microstep: 2599.38 | step_microstep: 38.22
[2024-06-10 07:42:27,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16024.56 | bwd: 45520.51 | bwd_inner: 42920.21 | bwd_allreduce: 2599.61 | step: 39.81
{'loss': 1.3202, 'learning_rate': 3.574634015492857e-05, 'epoch': 0.24}

 23%|██▎       | 401/1726 [6:59:57<26:11:16, 71.15s/it]
 23%|██▎       | 402/1726 [7:00:58<25:00:29, 68.00s/it]
 23%|██▎       | 403/1726 [7:01:59<24:17:10, 66.09s/it]
 23%|██▎       | 404/1726 [7:03:02<23:50:46, 64.94s/it]
 23%|██▎       | 405/1726 [7:04:02<23:21:57, 63.68s/it]
 24%|██▎       | 406/1726 [7:05:04<23:09:01, 63.14s/it]

dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 07:42:29,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.46 | bwd_microstep: 1468.08 | bwd_inner_microstep: 1468.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3971
[2024-06-10 07:42:31,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.62 | bwd_microstep: 1437.47 | bwd_inner_microstep: 1437.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 07:42:34,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1479.66 | bwd_inner_microstep: 1479.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-10 07:42:35,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.49 | bwd_microstep: 1443.65 | bwd_inner_microstep: 1443.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 07:42:37,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.39 | bwd_microstep: 791.14 | bwd_inner_microstep: 791.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 07:42:38,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1247.92 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 07:42:40,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.56 | bwd_microstep: 1387.01 | bwd_inner_microstep: 1386.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-10 07:42:42,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.50 | bwd_microstep: 1154.49 | bwd_inner_microstep: 1154.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-10 07:42:44,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.66 | bwd_microstep: 1625.14 | bwd_inner_microstep: 1625.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 07:42:46,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1383.93 | bwd_inner_microstep: 1383.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-10 07:42:47,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.46 | bwd_microstep: 894.19 | bwd_inner_microstep: 894.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 07:42:49,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.42 | bwd_microstep: 1613.97 | bwd_inner_microstep: 1613.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3628
[2024-06-10 07:42:51,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.95 | bwd_microstep: 1458.43 | bwd_inner_microstep: 1458.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992
[2024-06-10 07:42:53,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.20 | bwd_microstep: 899.87 | bwd_inner_microstep: 899.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 07:42:55,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.42 | bwd_microstep: 1495.66 | bwd_inner_microstep: 1495.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 07:42:57,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.92 | bwd_microstep: 1287.97 | bwd_inner_microstep: 1287.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1979
[2024-06-10 07:42:58,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.71 | bwd_microstep: 830.66 | bwd_inner_microstep: 830.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 07:42:59,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.45 | bwd_microstep: 978.31 | bwd_inner_microstep: 978.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3662
[2024-06-10 07:43:01,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.72 | bwd_microstep: 1384.10 | bwd_inner_microstep: 1384.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 07:43:03,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.84 | bwd_microstep: 1599.39 | bwd_inner_microstep: 1599.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 07:43:05,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.38 | bwd_microstep: 1487.84 | bwd_inner_microstep: 1487.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 07:43:07,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.26 | bwd_microstep: 1490.02 | bwd_inner_microstep: 1489.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 07:43:09,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1507.53 | bwd_inner_microstep: 1507.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 07:43:11,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1376.23 | bwd_inner_microstep: 1376.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3770
[2024-06-10 07:43:13,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.26 | bwd_microstep: 1567.04 | bwd_inner_microstep: 1567.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 07:43:15,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1413.35 | bwd_inner_microstep: 1413.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3608
[2024-06-10 07:43:17,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.86 | bwd_microstep: 1554.78 | bwd_inner_microstep: 1554.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 07:43:19,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.89 | bwd_microstep: 1261.84 | bwd_inner_microstep: 1261.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 07:43:21,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1406.74 | bwd_inner_microstep: 1406.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 07:43:23,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.69 | bwd_microstep: 1508.66 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 07:43:25,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.63 | bwd_microstep: 1557.15 | bwd_inner_microstep: 1557.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2469
[2024-06-10 07:43:29,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 07:43:29,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.85 | bwd_microstep: 3036.34 | bwd_inner_microstep: 1190.93 | bwd_allreduce_microstep: 1845.36 | step_microstep: 37.85
[2024-06-10 07:43:29,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16061.73 | bwd: 45028.55 | bwd_inner: 43182.26 | bwd_allreduce: 1845.59 | step: 39.36
{'loss': 1.2825, 'learning_rate': 3.57231709347869e-05, 'epoch': 0.24}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547
[2024-06-10 07:43:31,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1316.60 | bwd_inner_microstep: 1316.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 07:43:32,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1245.64 | bwd_inner_microstep: 1245.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 07:43:34,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1353.42 | bwd_inner_microstep: 1353.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 07:43:36,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.70 | bwd_microstep: 1278.82 | bwd_inner_microstep: 1278.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 07:43:38,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1484.14 | bwd_inner_microstep: 1484.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 07:43:40,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1507.62 | bwd_inner_microstep: 1507.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 07:43:42,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.08 | bwd_microstep: 1403.07 | bwd_inner_microstep: 1403.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4043
[2024-06-10 07:43:44,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.66 | bwd_microstep: 1722.13 | bwd_inner_microstep: 1722.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416
[2024-06-10 07:43:46,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.46 | bwd_microstep: 1153.56 | bwd_inner_microstep: 1153.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907
[2024-06-10 07:43:47,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.98 | bwd_microstep: 778.18 | bwd_inner_microstep: 778.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3499
[2024-06-10 07:43:49,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1447.16 | bwd_inner_microstep: 1447.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2967
[2024-06-10 07:43:51,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.52 | bwd_microstep: 1230.81 | bwd_inner_microstep: 1230.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 07:43:53,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 1504.54 | bwd_inner_microstep: 1504.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2399
[2024-06-10 07:43:54,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.47 | bwd_microstep: 902.00 | bwd_inner_microstep: 901.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 07:43:56,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.60 | bwd_microstep: 1522.67 | bwd_inner_microstep: 1522.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 07:43:58,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1484.65 | bwd_inner_microstep: 1484.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3385
[2024-06-10 07:44:00,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.05 | bwd_microstep: 1177.50 | bwd_inner_microstep: 1177.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2656
[2024-06-10 07:44:02,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.69 | bwd_microstep: 1211.97 | bwd_inner_microstep: 1211.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2087
[2024-06-10 07:44:03,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.40 | bwd_microstep: 1014.85 | bwd_inner_microstep: 1014.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2982
[2024-06-10 07:44:04,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.08 | bwd_microstep: 1016.58 | bwd_inner_microstep: 1016.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486
[2024-06-10 07:44:06,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.36 | bwd_microstep: 1188.53 | bwd_inner_microstep: 1188.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548
[2024-06-10 07:44:08,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 1328.98 | bwd_inner_microstep: 1328.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3615
[2024-06-10 07:44:10,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.22 | bwd_microstep: 1707.97 | bwd_inner_microstep: 1707.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 07:44:12,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.21 | bwd_microstep: 1403.48 | bwd_inner_microstep: 1403.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-10 07:44:14,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.42 | bwd_microstep: 1302.37 | bwd_inner_microstep: 1302.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 07:44:16,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1283.55 | bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 07:44:18,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.02 | bwd_microstep: 1403.19 | bwd_inner_microstep: 1403.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 07:44:19,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.94 | bwd_microstep: 875.70 | bwd_inner_microstep: 875.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 07:44:21,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1499.26 | bwd_inner_microstep: 1499.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807
[2024-06-10 07:44:23,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.49 | bwd_microstep: 1354.87 | bwd_inner_microstep: 1354.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3424
[2024-06-10 07:44:25,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1408.32 | bwd_inner_microstep: 1408.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3388
[2024-06-10 07:44:30,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 07:44:30,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 4883.81 | bwd_inner_microstep: 1515.68 | bwd_allreduce_microstep: 3368.07 | step_microstep: 38.28
[2024-06-10 07:44:30,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15716.29 | bwd: 45395.97 | bwd_inner: 42026.98 | bwd_allreduce: 3368.30 | step: 39.79
{'loss': 1.3278, 'learning_rate': 3.5699946337718934e-05, 'epoch': 0.24}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3383
[2024-06-10 07:44:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.80 | bwd_microstep: 1283.75 | bwd_inner_microstep: 1283.63 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3972
[2024-06-10 07:44:34,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.61 | bwd_microstep: 1599.80 | bwd_inner_microstep: 1599.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 07:44:36,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.42 | bwd_microstep: 1282.90 | bwd_inner_microstep: 1282.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 07:44:38,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.46 | bwd_microstep: 1553.65 | bwd_inner_microstep: 1553.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 07:44:40,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.33 | bwd_microstep: 1287.79 | bwd_inner_microstep: 1287.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 07:44:42,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.20 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 07:44:44,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1246.41 | bwd_inner_microstep: 1246.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 07:44:46,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.91 | bwd_microstep: 1386.17 | bwd_inner_microstep: 1386.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2177
[2024-06-10 07:44:47,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.94 | bwd_microstep: 950.64 | bwd_inner_microstep: 950.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 07:44:49,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1392.59 | bwd_inner_microstep: 1392.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 4050
[2024-06-10 07:44:51,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.39 | bwd_microstep: 1784.50 | bwd_inner_microstep: 1784.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650
[2024-06-10 07:44:53,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.09 | bwd_microstep: 1576.95 | bwd_inner_microstep: 1576.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3501
[2024-06-10 07:44:55,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.54 | bwd_microstep: 1189.48 | bwd_inner_microstep: 1189.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3411
[2024-06-10 07:44:57,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1535.77 | bwd_inner_microstep: 1535.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 07:44:59,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.85 | bwd_microstep: 1606.00 | bwd_inner_microstep: 1605.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 07:45:01,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.89 | bwd_microstep: 1290.79 | bwd_inner_microstep: 1290.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 07:45:03,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1410.37 | bwd_inner_microstep: 1410.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 07:45:05,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1510.87 | bwd_inner_microstep: 1510.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 07:45:07,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.85 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 07:45:09,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.08 | bwd_microstep: 1190.02 | bwd_inner_microstep: 1189.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 07:45:11,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1405.97 | bwd_inner_microstep: 1405.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 07:45:13,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.93 | bwd_microstep: 1356.10 | bwd_inner_microstep: 1356.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 07:45:15,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.08 | bwd_microstep: 1612.73 | bwd_inner_microstep: 1612.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3541
[2024-06-10 07:45:17,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.53 | bwd_microstep: 1199.98 | bwd_inner_microstep: 1199.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 07:45:19,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.23 | bwd_microstep: 1507.34 | bwd_inner_microstep: 1507.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3608
[2024-06-10 07:45:21,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.51 | bwd_microstep: 1704.74 | bwd_inner_microstep: 1704.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 07:45:23,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.63 | bwd_microstep: 973.92 | bwd_inner_microstep: 973.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804
[2024-06-10 07:45:24,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.75 | bwd_microstep: 1357.48 | bwd_inner_microstep: 1357.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 07:45:26,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1352.67 | bwd_inner_microstep: 1352.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 07:45:28,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1455.81 | bwd_inner_microstep: 1455.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-10 07:45:30,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.34 | bwd_microstep: 980.36 | bwd_inner_microstep: 980.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 07:45:32,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 07:45:32,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1443.96 | bwd_inner_microstep: 1436.29 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.69
[2024-06-10 07:45:32,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16586.44 | bwd: 44413.59 | bwd_inner: 44404.95 | bwd_allreduce: 7.93 | step: 39.28
{'loss': 1.3042, 'learning_rate': 3.567666644552159e-05, 'epoch': 0.24}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2886
[2024-06-10 07:45:33,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.50 | bwd_microstep: 1180.91 | bwd_inner_microstep: 1180.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3520
[2024-06-10 07:45:35,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.80 | bwd_microstep: 1507.16 | bwd_inner_microstep: 1507.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734
[2024-06-10 07:45:37,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1533.11 | bwd_inner_microstep: 1533.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 583
[2024-06-10 07:45:38,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 101.22 | bwd_microstep: 254.36 | bwd_inner_microstep: 254.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 07:45:40,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.35 | bwd_microstep: 1280.20 | bwd_inner_microstep: 1280.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 07:45:42,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.44 | bwd_microstep: 1529.74 | bwd_inner_microstep: 1529.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2625
[2024-06-10 07:45:43,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.39 | bwd_microstep: 950.93 | bwd_inner_microstep: 950.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2933
[2024-06-10 07:45:45,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.47 | bwd_microstep: 1064.76 | bwd_inner_microstep: 1064.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-10 07:45:47,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.18 | bwd_microstep: 1528.00 | bwd_inner_microstep: 1527.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 07:45:48,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1249.93 | bwd_inner_microstep: 1249.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2124
[2024-06-10 07:45:50,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.74 | bwd_microstep: 929.18 | bwd_inner_microstep: 929.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1885
[2024-06-10 07:45:51,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.49 | bwd_microstep: 716.36 | bwd_inner_microstep: 716.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4003
[2024-06-10 07:45:53,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.88 | bwd_microstep: 1562.71 | bwd_inner_microstep: 1562.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 07:45:55,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.95 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 07:45:57,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1484.42 | bwd_inner_microstep: 1484.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 07:45:59,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1513.20 | bwd_inner_microstep: 1513.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 07:46:01,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.94 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 07:46:02,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.54 | bwd_microstep: 1349.05 | bwd_inner_microstep: 1349.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3531
[2024-06-10 07:46:04,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.14 | bwd_microstep: 1199.16 | bwd_inner_microstep: 1199.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-10 07:46:06,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.76 | bwd_microstep: 1509.75 | bwd_inner_microstep: 1509.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052
[2024-06-10 07:46:07,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.59 | bwd_microstep: 910.56 | bwd_inner_microstep: 910.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 07:46:09,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.91 | bwd_microstep: 1189.21 | bwd_inner_microstep: 1189.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 07:46:11,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.98 | bwd_microstep: 1554.76 | bwd_inner_microstep: 1554.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 07:46:13,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.21 | bwd_microstep: 1501.80 | bwd_inner_microstep: 1501.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 07:46:15,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.87 | bwd_microstep: 1463.16 | bwd_inner_microstep: 1463.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 07:46:16,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.56 | bwd_microstep: 803.12 | bwd_inner_microstep: 803.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545
[2024-06-10 07:46:18,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.80 | bwd_microstep: 1329.02 | bwd_inner_microstep: 1328.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2059
[2024-06-10 07:46:20,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.18 | bwd_microstep: 943.43 | bwd_inner_microstep: 943.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3590
[2024-06-10 07:46:21,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.24 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3588
[2024-06-10 07:46:23,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.90 | bwd_microstep: 1307.56 | bwd_inner_microstep: 1307.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817
[2024-06-10 07:46:26,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.92 | bwd_microstep: 1748.90 | bwd_inner_microstep: 1748.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 07:46:33,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.32 | optimizer_step: 6.60
[2024-06-10 07:46:33,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 6783.99 | bwd_inner_microstep: 1568.10 | bwd_allreduce_microstep: 5215.83 | step_microstep: 38.70
[2024-06-10 07:46:33,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15187.03 | bwd: 45886.94 | bwd_inner: 40670.18 | bwd_allreduce: 5216.07 | step: 40.22
{'loss': 1.3165, 'learning_rate': 3.5653331340186515e-05, 'epoch': 0.24}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 07:46:35,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.62 | bwd_microstep: 1333.64 | bwd_inner_microstep: 1333.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 07:46:37,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1390.19 | bwd_inner_microstep: 1390.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 07:46:39,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.94 | bwd_microstep: 1450.01 | bwd_inner_microstep: 1449.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 07:46:41,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.75 | bwd_microstep: 1533.74 | bwd_inner_microstep: 1533.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 07:46:43,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.24 | bwd_microstep: 1279.10 | bwd_inner_microstep: 1279.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 07:46:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1384.95 | bwd_inner_microstep: 1384.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 07:46:46,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1249.27 | bwd_inner_microstep: 1249.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 07:46:48,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1350.05 | bwd_inner_microstep: 1350.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 07:46:50,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.05 | bwd_microstep: 1529.21 | bwd_inner_microstep: 1529.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 07:46:52,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1396.85 | bwd_inner_microstep: 1396.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 07:46:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.05 | bwd_microstep: 1193.29 | bwd_inner_microstep: 1193.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073
[2024-06-10 07:46:55,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.31 | bwd_microstep: 818.50 | bwd_inner_microstep: 818.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2763
[2024-06-10 07:46:56,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.28 | bwd_microstep: 1006.94 | bwd_inner_microstep: 1006.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500
[2024-06-10 07:46:59,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1581.06 | bwd_inner_microstep: 1581.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 07:47:01,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1396.32 | bwd_inner_microstep: 1396.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 07:47:02,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.05 | bwd_microstep: 696.62 | bwd_inner_microstep: 696.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3531
[2024-06-10 07:47:03,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1358.43 | bwd_inner_microstep: 1358.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849
[2024-06-10 07:47:06,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.60 | bwd_microstep: 1563.12 | bwd_inner_microstep: 1563.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 07:47:08,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.81 | bwd_microstep: 1527.98 | bwd_inner_microstep: 1527.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 07:47:09,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1297.00 | bwd_inner_microstep: 1296.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 07:47:11,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1434.55 | bwd_inner_microstep: 1434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 07:47:13,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.77 | bwd_microstep: 1402.32 | bwd_inner_microstep: 1402.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 07:47:15,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.16 | bwd_microstep: 1433.40 | bwd_inner_microstep: 1433.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 07:47:17,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1397.03 | bwd_inner_microstep: 1397.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-10 07:47:18,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.68 | bwd_microstep: 701.58 | bwd_inner_microstep: 701.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625
[2024-06-10 07:47:20,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.45 | bwd_microstep: 1313.29 | bwd_inner_microstep: 1313.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 07:47:22,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.97 | bwd_microstep: 1591.51 | bwd_inner_microstep: 1591.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 07:47:24,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1591.63 | bwd_inner_microstep: 1591.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 07:47:26,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1403.82 | bwd_inner_microstep: 1403.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 07:47:28,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.89 | bwd_microstep: 1260.74 | bwd_inner_microstep: 1260.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3590
[2024-06-10 07:47:30,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.40 | bwd_microstep: 1536.11 | bwd_inner_microstep: 1536.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 07:47:34,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-10 07:47:34,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.47 | bwd_microstep: 3593.30 | bwd_inner_microstep: 1462.48 | bwd_allreduce_microstep: 2130.77 | step_microstep: 38.20
[2024-06-10 07:47:34,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16043.96 | bwd: 44995.61 | bwd_inner: 42863.93 | bwd_allreduce: 2131.00 | step: 39.70
{'loss': 1.2854, 'learning_rate': 3.5629941103899834e-05, 'epoch': 0.24}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 4602
[2024-06-10 07:47:37,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.26 | bwd_microstep: 1797.24 | bwd_inner_microstep: 1796.96 | bwd_allreduce_microstep: 0.12 | step_microstep: 0.21
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475
[2024-06-10 07:47:39,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1217.02 | bwd_inner_microstep: 1217.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 07:47:41,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.61 | bwd_microstep: 1652.19 | bwd_inner_microstep: 1652.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 07:47:43,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.11 | bwd_microstep: 1378.97 | bwd_inner_microstep: 1378.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 07:47:44,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.66 | bwd_microstep: 1146.17 | bwd_inner_microstep: 1146.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756
[2024-06-10 07:47:47,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1540.81 | bwd_inner_microstep: 1538.78 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 07:47:49,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.71 | bwd_microstep: 1642.76 | bwd_inner_microstep: 1642.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 07:47:51,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.38 | bwd_microstep: 1301.57 | bwd_inner_microstep: 1301.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3892
[2024-06-10 07:47:53,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.63 | bwd_microstep: 1636.46 | bwd_inner_microstep: 1636.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 07:47:55,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.26 | bwd_microstep: 1385.95 | bwd_inner_microstep: 1385.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3662
[2024-06-10 07:47:57,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.02 | bwd_microstep: 1357.72 | bwd_inner_microstep: 1357.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697
[2024-06-10 07:47:59,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.68 | bwd_microstep: 1429.46 | bwd_inner_microstep: 1429.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 07:48:01,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.25 | bwd_microstep: 1488.03 | bwd_inner_microstep: 1488.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3656
[2024-06-10 07:48:03,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.77 | bwd_microstep: 1554.72 | bwd_inner_microstep: 1554.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 07:48:05,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.74 | bwd_microstep: 1351.91 | bwd_inner_microstep: 1351.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3830
[2024-06-10 07:48:07,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.89 | bwd_microstep: 1796.27 | bwd_inner_microstep: 1796.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 07:48:09,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.89 | bwd_microstep: 1289.69 | bwd_inner_microstep: 1289.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 07:48:11,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1416.73 | bwd_inner_microstep: 1416.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837
[2024-06-10 07:48:13,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.19 | bwd_microstep: 1667.56 | bwd_inner_microstep: 1667.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294
[2024-06-10 07:48:14,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.58 | bwd_microstep: 976.70 | bwd_inner_microstep: 976.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 07:48:16,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.33 | bwd_microstep: 1297.72 | bwd_inner_microstep: 1297.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 07:48:18,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1258.24 | bwd_inner_microstep: 1258.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 07:48:20,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.47 | bwd_microstep: 1517.62 | bwd_inner_microstep: 1517.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 07:48:22,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.40 | bwd_microstep: 1560.69 | bwd_inner_microstep: 1560.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031
[2024-06-10 07:48:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.94 | bwd_microstep: 813.08 | bwd_inner_microstep: 813.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759
[2024-06-10 07:48:26,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.52 | bwd_microstep: 1647.17 | bwd_inner_microstep: 1647.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3558
[2024-06-10 07:48:28,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.30 | bwd_microstep: 1534.95 | bwd_inner_microstep: 1534.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3763
[2024-06-10 07:48:30,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.18 | bwd_microstep: 1712.18 | bwd_inner_microstep: 1712.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524
[2024-06-10 07:48:32,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.70 | bwd_microstep: 1590.36 | bwd_inner_microstep: 1590.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 07:48:34,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1344.16 | bwd_inner_microstep: 1344.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3804
[2024-06-10 07:48:37,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.46 | bwd_microstep: 1684.95 | bwd_inner_microstep: 1684.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 07:48:39,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.67
[2024-06-10 07:48:39,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.13 | bwd_microstep: 1636.50 | bwd_inner_microstep: 1628.72 | bwd_allreduce_microstep: 7.73 | step_microstep: 37.73
[2024-06-10 07:48:39,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17359.77 | bwd: 46625.61 | bwd_inner: 46614.73 | bwd_allreduce: 8.17 | step: 39.96

 24%|██▎       | 406/1726 [7:05:04<23:09:01, 63.14s/it]
 24%|██▎       | 407/1726 [7:06:06<22:56:40, 62.62s/it]
 24%|██▎       | 408/1726 [7:07:07<22:47:52, 62.27s/it]
 24%|██▎       | 409/1726 [7:08:08<22:40:41, 61.99s/it]
 24%|██▍       | 410/1726 [7:09:10<22:35:47, 61.81s/it]
 24%|██▍       | 411/1726 [7:10:11<22:31:52, 61.68s/it]
 24%|██▍       | 412/1726 [7:11:16<22:48:32, 62.49s/it]
{'loss': 1.33, 'learning_rate': 3.560649581904184e-05, 'epoch': 0.24}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 07:48:40,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.69 | bwd_microstep: 790.07 | bwd_inner_microstep: 789.87 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557
[2024-06-10 07:48:42,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.63 | bwd_microstep: 1433.98 | bwd_inner_microstep: 1433.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3861
[2024-06-10 07:48:44,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.23 | bwd_microstep: 1398.68 | bwd_inner_microstep: 1398.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 07:48:46,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.20 | bwd_microstep: 1643.50 | bwd_inner_microstep: 1643.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 07:48:48,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.85 | bwd_microstep: 1188.08 | bwd_inner_microstep: 1188.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 07:48:50,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.13 | bwd_microstep: 1542.06 | bwd_inner_microstep: 1542.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 07:48:52,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.93 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 07:48:54,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1383.70 | bwd_inner_microstep: 1383.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 07:48:56,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1416.87 | bwd_inner_microstep: 1416.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 07:48:58,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1495.27 | bwd_inner_microstep: 1495.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3675
[2024-06-10 07:49:00,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.90 | bwd_microstep: 1551.71 | bwd_inner_microstep: 1551.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3444
[2024-06-10 07:49:02,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.93 | bwd_microstep: 1381.46 | bwd_inner_microstep: 1381.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1965
[2024-06-10 07:49:03,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.30 | bwd_microstep: 865.08 | bwd_inner_microstep: 865.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3657
[2024-06-10 07:49:05,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.81 | bwd_microstep: 1593.48 | bwd_inner_microstep: 1593.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 07:49:07,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1520.70 | bwd_inner_microstep: 1520.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3476
[2024-06-10 07:49:09,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.63 | bwd_microstep: 1367.65 | bwd_inner_microstep: 1367.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 07:49:11,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1389.47 | bwd_inner_microstep: 1389.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2692
[2024-06-10 07:49:12,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.56 | bwd_microstep: 1034.77 | bwd_inner_microstep: 1034.62 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 07:49:14,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1256.19 | bwd_inner_microstep: 1256.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527
[2024-06-10 07:49:16,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.49 | bwd_microstep: 1326.80 | bwd_inner_microstep: 1326.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1007
[2024-06-10 07:49:17,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 163.45 | bwd_microstep: 428.20 | bwd_inner_microstep: 428.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3708
[2024-06-10 07:49:19,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1561.69 | bwd_inner_microstep: 1561.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686
[2024-06-10 07:49:21,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.78 | bwd_microstep: 1435.01 | bwd_inner_microstep: 1434.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-10 07:49:22,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.21 | bwd_microstep: 800.98 | bwd_inner_microstep: 800.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 07:49:24,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1382.61 | bwd_inner_microstep: 1382.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3537
[2024-06-10 07:49:26,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.62 | bwd_microstep: 1451.37 | bwd_inner_microstep: 1451.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-10 07:49:27,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 880.51 | bwd_inner_microstep: 880.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3473
[2024-06-10 07:49:29,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 1426.76 | bwd_inner_microstep: 1426.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 07:49:31,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.31 | bwd_microstep: 1559.55 | bwd_inner_microstep: 1559.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592
[2024-06-10 07:49:33,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.27 | bwd_microstep: 1703.58 | bwd_inner_microstep: 1703.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2049
[2024-06-10 07:49:35,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.51 | bwd_microstep: 911.09 | bwd_inner_microstep: 911.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1394
[2024-06-10 07:49:40,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 07:49:40,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 205.85 | bwd_microstep: 4816.66 | bwd_inner_microstep: 606.70 | bwd_allreduce_microstep: 4209.90 | step_microstep: 38.80
[2024-06-10 07:49:40,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15343.45 | bwd: 45224.13 | bwd_inner: 41013.03 | bwd_allreduce: 4210.28 | step: 40.94
{'loss': 1.2455, 'learning_rate': 3.55829955681867e-05, 'epoch': 0.24}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3477
[2024-06-10 07:49:42,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.57 | bwd_microstep: 1501.62 | bwd_inner_microstep: 1501.54 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 07:49:44,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1386.71 | bwd_inner_microstep: 1386.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 07:49:46,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1379.46 | bwd_inner_microstep: 1379.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3834
[2024-06-10 07:49:48,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.04 | bwd_microstep: 1488.20 | bwd_inner_microstep: 1488.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 07:49:50,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1550.35 | bwd_inner_microstep: 1550.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 07:49:52,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.42 | bwd_microstep: 1545.89 | bwd_inner_microstep: 1545.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 07:49:54,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1532.25 | bwd_inner_microstep: 1532.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750
[2024-06-10 07:49:56,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.74 | bwd_microstep: 1339.84 | bwd_inner_microstep: 1339.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 07:49:58,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.85 | bwd_microstep: 1391.72 | bwd_inner_microstep: 1391.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 739
[2024-06-10 07:49:58,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 120.07 | bwd_microstep: 303.93 | bwd_inner_microstep: 303.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3723
[2024-06-10 07:50:00,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1560.73 | bwd_inner_microstep: 1560.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 07:50:02,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.71 | bwd_microstep: 1486.63 | bwd_inner_microstep: 1486.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3530
[2024-06-10 07:50:04,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.92 | bwd_microstep: 1445.47 | bwd_inner_microstep: 1445.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384
[2024-06-10 07:50:06,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.93 | bwd_microstep: 1434.49 | bwd_inner_microstep: 1434.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3671
[2024-06-10 07:50:09,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.07 | bwd_microstep: 1824.76 | bwd_inner_microstep: 1824.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023
[2024-06-10 07:50:10,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.56 | bwd_microstep: 808.47 | bwd_inner_microstep: 808.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 07:50:12,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1493.34 | bwd_inner_microstep: 1493.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3833
[2024-06-10 07:50:14,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1363.91 | bwd_inner_microstep: 1363.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 07:50:16,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1400.72 | bwd_inner_microstep: 1400.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3679
[2024-06-10 07:50:18,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.65 | bwd_microstep: 1329.50 | bwd_inner_microstep: 1329.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 07:50:20,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1531.28 | bwd_inner_microstep: 1531.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3476
[2024-06-10 07:50:22,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.57 | bwd_microstep: 1216.00 | bwd_inner_microstep: 1215.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3481
[2024-06-10 07:50:23,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.65 | bwd_microstep: 1330.20 | bwd_inner_microstep: 1330.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 07:50:25,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.42 | bwd_microstep: 1163.60 | bwd_inner_microstep: 1163.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009
[2024-06-10 07:50:26,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.60 | bwd_microstep: 803.65 | bwd_inner_microstep: 803.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 07:50:28,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.21 | bwd_microstep: 1280.48 | bwd_inner_microstep: 1280.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3453
[2024-06-10 07:50:30,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.62 | bwd_microstep: 1193.09 | bwd_inner_microstep: 1193.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3597
[2024-06-10 07:50:32,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.82 | bwd_microstep: 1540.23 | bwd_inner_microstep: 1540.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 07:50:34,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.31 | bwd_microstep: 1496.43 | bwd_inner_microstep: 1496.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3475
[2024-06-10 07:50:36,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1332.46 | bwd_inner_microstep: 1332.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805
[2024-06-10 07:50:38,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.50 | bwd_microstep: 1753.93 | bwd_inner_microstep: 1753.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2275
[2024-06-10 07:50:42,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 07:50:42,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.78 | bwd_microstep: 3743.05 | bwd_inner_microstep: 1158.72 | bwd_allreduce_microstep: 2584.28 | step_microstep: 38.08
[2024-06-10 07:50:42,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16210.00 | bwd: 45952.43 | bwd_inner: 43367.18 | bwd_allreduce: 2584.55 | step: 39.79
{'loss': 1.3226, 'learning_rate': 3.5559440434102176e-05, 'epoch': 0.24}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 07:50:44,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1244.96 | bwd_inner_microstep: 1244.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 07:50:46,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1245.72 | bwd_inner_microstep: 1245.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3893
[2024-06-10 07:50:48,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1415.57 | bwd_inner_microstep: 1415.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 07:50:50,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.55 | bwd_microstep: 1481.54 | bwd_inner_microstep: 1481.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293
[2024-06-10 07:50:51,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.96 | bwd_microstep: 973.22 | bwd_inner_microstep: 973.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 07:50:53,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.13 | bwd_microstep: 1633.81 | bwd_inner_microstep: 1633.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 07:50:55,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.64 | bwd_microstep: 1248.11 | bwd_inner_microstep: 1248.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 07:50:57,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1422.88 | bwd_inner_microstep: 1422.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958
[2024-06-10 07:50:58,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.25 | bwd_microstep: 897.01 | bwd_inner_microstep: 896.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3505
[2024-06-10 07:51:00,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 1338.00 | bwd_inner_microstep: 1337.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 07:51:02,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.65 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-10 07:51:04,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.30 | bwd_microstep: 1584.07 | bwd_inner_microstep: 1584.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3420
[2024-06-10 07:51:06,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1395.45 | bwd_inner_microstep: 1395.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1980
[2024-06-10 07:51:07,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.48 | bwd_microstep: 848.30 | bwd_inner_microstep: 848.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 07:51:09,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.42 | bwd_microstep: 798.27 | bwd_inner_microstep: 798.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 950
[2024-06-10 07:51:09,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 147.69 | bwd_microstep: 380.03 | bwd_inner_microstep: 380.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 07:51:11,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1404.96 | bwd_inner_microstep: 1404.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 07:51:12,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.06 | bwd_microstep: 701.56 | bwd_inner_microstep: 701.39 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-10 07:51:13,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.39 | bwd_microstep: 701.63 | bwd_inner_microstep: 701.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 07:51:15,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1519.80 | bwd_inner_microstep: 1519.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 07:51:17,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 1515.06 | bwd_inner_microstep: 1515.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077
[2024-06-10 07:51:18,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.28 | bwd_microstep: 915.52 | bwd_inner_microstep: 915.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 07:51:20,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1381.57 | bwd_inner_microstep: 1381.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086
[2024-06-10 07:51:21,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.28 | bwd_microstep: 821.13 | bwd_inner_microstep: 821.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3880
[2024-06-10 07:51:24,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.51 | bwd_microstep: 1617.38 | bwd_inner_microstep: 1617.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 07:51:25,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1311.26 | bwd_inner_microstep: 1311.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 07:51:28,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1558.13 | bwd_inner_microstep: 1558.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 07:51:30,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.75 | bwd_microstep: 1504.22 | bwd_inner_microstep: 1504.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3771
[2024-06-10 07:51:32,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.50 | bwd_microstep: 1746.16 | bwd_inner_microstep: 1746.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 07:51:34,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1378.11 | bwd_inner_microstep: 1378.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 07:51:36,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1516.83 | bwd_inner_microstep: 1516.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3305
[2024-06-10 07:51:44,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 07:51:44,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.10 | bwd_microstep: 7092.40 | bwd_inner_microstep: 1345.16 | bwd_allreduce_microstep: 5747.19 | step_microstep: 38.03
[2024-06-10 07:51:44,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15018.58 | bwd: 46078.87 | bwd_inner: 40330.65 | bwd_allreduce: 5747.47 | step: 39.74
{'loss': 1.2906, 'learning_rate': 3.553583049974933e-05, 'epoch': 0.24}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 07:51:46,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.01 | bwd_microstep: 1476.90 | bwd_inner_microstep: 1476.76 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-10 07:51:48,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.02 | bwd_microstep: 1554.73 | bwd_inner_microstep: 1554.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 07:51:50,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1483.41 | bwd_inner_microstep: 1483.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-10 07:51:51,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 795.10 | bwd_inner_microstep: 795.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 07:51:52,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.59 | bwd_microstep: 788.17 | bwd_inner_microstep: 788.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 07:51:54,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.30 | bwd_microstep: 1279.25 | bwd_inner_microstep: 1279.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 07:51:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.53 | bwd_microstep: 1283.79 | bwd_inner_microstep: 1283.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 07:51:57,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.41 | bwd_microstep: 1149.58 | bwd_inner_microstep: 1149.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872
[2024-06-10 07:51:58,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.17 | bwd_microstep: 678.98 | bwd_inner_microstep: 678.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 07:52:00,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1282.31 | bwd_inner_microstep: 1282.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 07:52:02,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.25 | bwd_microstep: 1278.51 | bwd_inner_microstep: 1278.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3498
[2024-06-10 07:52:04,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.62 | bwd_microstep: 1433.96 | bwd_inner_microstep: 1433.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3658
[2024-06-10 07:52:06,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1461.64 | bwd_inner_microstep: 1461.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 07:52:08,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1526.30 | bwd_inner_microstep: 1526.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1967
[2024-06-10 07:52:09,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.43 | bwd_microstep: 856.68 | bwd_inner_microstep: 856.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 07:52:11,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1487.91 | bwd_inner_microstep: 1487.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3425
[2024-06-10 07:52:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.68 | bwd_microstep: 1396.06 | bwd_inner_microstep: 1396.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 07:52:15,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1354.09 | bwd_inner_microstep: 1354.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-10 07:52:17,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1443.03 | bwd_inner_microstep: 1443.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 07:52:19,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1349.78 | bwd_inner_microstep: 1349.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3440
[2024-06-10 07:52:20,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.99 | bwd_microstep: 1188.33 | bwd_inner_microstep: 1188.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 07:52:22,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1400.10 | bwd_inner_microstep: 1400.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 07:52:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1391.45 | bwd_inner_microstep: 1391.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 07:52:26,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1409.26 | bwd_inner_microstep: 1409.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 07:52:28,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.99 | bwd_microstep: 1359.13 | bwd_inner_microstep: 1359.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 07:52:30,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1354.70 | bwd_inner_microstep: 1354.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3581
[2024-06-10 07:52:32,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.14 | bwd_microstep: 1330.63 | bwd_inner_microstep: 1330.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 07:52:34,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.03 | bwd_microstep: 1442.79 | bwd_inner_microstep: 1442.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 07:52:36,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1429.84 | bwd_inner_microstep: 1429.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3791
[2024-06-10 07:52:38,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.86 | bwd_microstep: 1354.74 | bwd_inner_microstep: 1354.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3765
[2024-06-10 07:52:40,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.80 | bwd_microstep: 1345.77 | bwd_inner_microstep: 1345.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3770
[2024-06-10 07:52:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-10 07:52:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 2312.12 | bwd_inner_microstep: 1514.24 | bwd_allreduce_microstep: 797.82 | step_microstep: 37.69
[2024-06-10 07:52:42,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15666.75 | bwd: 42679.09 | bwd_inner: 41880.25 | bwd_allreduce: 798.11 | step: 39.27
{'loss': 1.2971, 'learning_rate': 3.5512165848282225e-05, 'epoch': 0.24}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 07:52:44,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.07 | bwd_microstep: 1482.16 | bwd_inner_microstep: 1482.06 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 07:52:45,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.26 | bwd_microstep: 678.46 | bwd_inner_microstep: 678.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 07:52:47,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1376.88 | bwd_inner_microstep: 1376.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3913
[2024-06-10 07:52:49,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.12 | bwd_microstep: 1593.15 | bwd_inner_microstep: 1593.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 07:52:52,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.91 | bwd_microstep: 1485.30 | bwd_inner_microstep: 1485.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 07:52:53,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.53 | bwd_microstep: 808.48 | bwd_inner_microstep: 808.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2666
[2024-06-10 07:52:54,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.59 | bwd_microstep: 1073.69 | bwd_inner_microstep: 1073.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 07:52:56,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.43 | bwd_microstep: 1485.01 | bwd_inner_microstep: 1484.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 07:52:58,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.32 | bwd_microstep: 1250.99 | bwd_inner_microstep: 1250.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3477
[2024-06-10 07:53:00,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.44 | bwd_microstep: 1247.30 | bwd_inner_microstep: 1247.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 07:53:01,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.72 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 07:53:03,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.98 | bwd_microstep: 792.22 | bwd_inner_microstep: 792.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-10 07:53:04,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1424.62 | bwd_inner_microstep: 1424.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495
[2024-06-10 07:53:06,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.76 | bwd_microstep: 1319.50 | bwd_inner_microstep: 1319.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3897
[2024-06-10 07:53:09,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1629.91 | bwd_inner_microstep: 1629.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 07:53:10,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1342.42 | bwd_inner_microstep: 1342.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 07:53:13,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1521.77 | bwd_inner_microstep: 1521.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3839
[2024-06-10 07:53:15,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.36 | bwd_microstep: 1856.24 | bwd_inner_microstep: 1856.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3658
[2024-06-10 07:53:17,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.64 | bwd_microstep: 1567.13 | bwd_inner_microstep: 1567.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3697
[2024-06-10 07:53:20,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.84 | bwd_microstep: 1722.91 | bwd_inner_microstep: 1722.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 07:53:22,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.88 | bwd_microstep: 1507.69 | bwd_inner_microstep: 1507.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3683
[2024-06-10 07:53:23,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1230.86 | bwd_inner_microstep: 1230.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 07:53:26,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1554.59 | bwd_inner_microstep: 1554.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2067
[2024-06-10 07:53:27,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.50 | bwd_microstep: 849.50 | bwd_inner_microstep: 849.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 07:53:29,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.45 | bwd_microstep: 1547.74 | bwd_inner_microstep: 1547.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3561
[2024-06-10 07:53:30,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.67 | bwd_microstep: 1200.04 | bwd_inner_microstep: 1200.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3453
[2024-06-10 07:53:32,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.61 | bwd_microstep: 1219.95 | bwd_inner_microstep: 1219.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 07:53:34,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.90 | bwd_microstep: 972.36 | bwd_inner_microstep: 972.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2049
[2024-06-10 07:53:35,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.80 | bwd_microstep: 910.26 | bwd_inner_microstep: 910.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597
[2024-06-10 07:53:37,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.58 | bwd_microstep: 1445.32 | bwd_inner_microstep: 1445.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428
[2024-06-10 07:53:38,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.77 | bwd_microstep: 1158.07 | bwd_inner_microstep: 1158.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 07:53:44,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.37 | optimizer_step: 6.63
[2024-06-10 07:53:44,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 4942.77 | bwd_inner_microstep: 1567.59 | bwd_allreduce_microstep: 3375.11 | step_microstep: 39.13
[2024-06-10 07:53:44,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15703.64 | bwd: 45481.76 | bwd_inner: 42105.63 | bwd_allreduce: 3375.41 | step: 40.82
{'loss': 1.2912, 'learning_rate': 3.5488446563047645e-05, 'epoch': 0.24}
 24%|██▍       | 412/1726 [7:11:16<22:48:32, 62.49s/it]
 24%|██▍       | 413/1726 [7:12:16<22:37:17, 62.02s/it]
 24%|██▍       | 414/1726 [7:13:19<22:39:31, 62.17s/it]
 24%|██▍       | 415/1726 [7:14:20<22:33:40, 61.95s/it]
 24%|██▍       | 416/1726 [7:15:19<22:11:14, 60.97s/it]
 24%|██▍       | 417/1726 [7:16:21<22:13:59, 61.15s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 07:53:46,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.55 | bwd_microstep: 1464.38 | bwd_inner_microstep: 1464.27 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3393
[2024-06-10 07:53:48,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.35 | bwd_microstep: 1369.61 | bwd_inner_microstep: 1369.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 07:53:50,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.67 | bwd_microstep: 1353.14 | bwd_inner_microstep: 1353.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-10 07:53:52,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.06 | bwd_microstep: 1637.65 | bwd_inner_microstep: 1637.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 07:53:54,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1394.55 | bwd_inner_microstep: 1394.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 07:53:56,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.12 | bwd_microstep: 1281.05 | bwd_inner_microstep: 1281.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 07:53:57,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1254.10 | bwd_inner_microstep: 1254.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 07:53:59,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1390.27 | bwd_inner_microstep: 1390.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 07:54:01,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.71 | bwd_microstep: 1514.19 | bwd_inner_microstep: 1514.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3601
[2024-06-10 07:54:03,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.71 | bwd_microstep: 1311.79 | bwd_inner_microstep: 1311.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 07:54:05,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.73 | bwd_microstep: 1394.33 | bwd_inner_microstep: 1394.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 07:54:07,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.84 | bwd_microstep: 1424.81 | bwd_inner_microstep: 1424.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 07:54:09,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.86 | bwd_microstep: 1389.92 | bwd_inner_microstep: 1389.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 07:54:11,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1384.64 | bwd_inner_microstep: 1384.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1959
[2024-06-10 07:54:12,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.24 | bwd_microstep: 838.66 | bwd_inner_microstep: 838.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 07:54:14,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.33 | bwd_microstep: 1287.43 | bwd_inner_microstep: 1287.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3840
[2024-06-10 07:54:16,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.42 | bwd_microstep: 1486.95 | bwd_inner_microstep: 1486.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2284
[2024-06-10 07:54:17,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.41 | bwd_microstep: 880.15 | bwd_inner_microstep: 880.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 07:54:19,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.02 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 07:54:21,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 1311.21 | bwd_inner_microstep: 1311.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 07:54:23,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.64 | bwd_microstep: 1254.91 | bwd_inner_microstep: 1254.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3531
[2024-06-10 07:54:24,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.33 | bwd_microstep: 1424.70 | bwd_inner_microstep: 1424.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3669
[2024-06-10 07:54:26,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1387.66 | bwd_inner_microstep: 1387.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808
[2024-06-10 07:54:28,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.68 | bwd_microstep: 1356.31 | bwd_inner_microstep: 1356.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643
[2024-06-10 07:54:30,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.90 | bwd_microstep: 1317.66 | bwd_inner_microstep: 1317.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 07:54:32,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1449.97 | bwd_inner_microstep: 1449.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 07:54:34,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1491.92 | bwd_inner_microstep: 1491.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 07:54:36,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.98 | bwd_microstep: 1450.88 | bwd_inner_microstep: 1450.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 07:54:38,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.55 | bwd_microstep: 1654.66 | bwd_inner_microstep: 1654.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065
[2024-06-10 07:54:40,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.32 | bwd_microstep: 818.40 | bwd_inner_microstep: 818.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2234
[2024-06-10 07:54:41,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.29 | bwd_microstep: 959.54 | bwd_inner_microstep: 959.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3556
[2024-06-10 07:54:43,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.14 | optimizer_step: 6.61
[2024-06-10 07:54:43,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.84 | bwd_microstep: 1953.77 | bwd_inner_microstep: 1642.23 | bwd_allreduce_microstep: 311.49 | step_microstep: 37.64
[2024-06-10 07:54:43,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16019.21 | bwd: 43177.87 | bwd_inner: 42865.38 | bwd_allreduce: 311.78 | step: 39.24
{'loss': 1.3108, 'learning_rate': 3.546467272758479e-05, 'epoch': 0.24}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2908
[2024-06-10 07:54:45,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.69 | bwd_microstep: 1085.67 | bwd_inner_microstep: 1085.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 07:54:47,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.66 | bwd_microstep: 1280.53 | bwd_inner_microstep: 1280.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2010
[2024-06-10 07:54:48,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.46 | bwd_microstep: 832.71 | bwd_inner_microstep: 832.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 07:54:50,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.16 | bwd_microstep: 1150.62 | bwd_inner_microstep: 1150.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 07:54:51,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 790.00 | bwd_inner_microstep: 789.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 07:54:52,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 791.48 | bwd_inner_microstep: 791.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-10 07:54:54,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.23 | bwd_microstep: 1422.83 | bwd_inner_microstep: 1422.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722
[2024-06-10 07:54:56,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1335.99 | bwd_inner_microstep: 1335.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3481
[2024-06-10 07:54:57,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.36 | bwd_microstep: 1247.29 | bwd_inner_microstep: 1247.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3682
[2024-06-10 07:54:59,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.91 | bwd_microstep: 1354.40 | bwd_inner_microstep: 1354.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2377
[2024-06-10 07:55:00,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.95 | bwd_microstep: 966.04 | bwd_inner_microstep: 966.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3503
[2024-06-10 07:55:03,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.94 | bwd_microstep: 1549.23 | bwd_inner_microstep: 1549.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 07:55:04,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1348.79 | bwd_inner_microstep: 1348.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1979
[2024-06-10 07:55:06,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.37 | bwd_microstep: 780.12 | bwd_inner_microstep: 780.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3438
[2024-06-10 07:55:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.01 | bwd_microstep: 1203.09 | bwd_inner_microstep: 1203.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3509
[2024-06-10 07:55:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.26 | bwd_microstep: 1685.43 | bwd_inner_microstep: 1685.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3639
[2024-06-10 07:55:12,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.23 | bwd_microstep: 1438.49 | bwd_inner_microstep: 1438.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-10 07:55:14,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.69 | bwd_microstep: 1448.97 | bwd_inner_microstep: 1448.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 07:55:15,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.72 | bwd_microstep: 800.03 | bwd_inner_microstep: 800.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-10 07:55:16,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.46 | bwd_microstep: 976.49 | bwd_inner_microstep: 976.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3873
[2024-06-10 07:55:18,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1489.81 | bwd_inner_microstep: 1489.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 07:55:20,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1608.64 | bwd_inner_microstep: 1608.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 07:55:22,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1377.66 | bwd_inner_microstep: 1377.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534
[2024-06-10 07:55:24,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.28 | bwd_microstep: 1325.71 | bwd_inner_microstep: 1325.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 07:55:26,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.03 | bwd_microstep: 1401.24 | bwd_inner_microstep: 1401.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 07:55:28,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1400.15 | bwd_inner_microstep: 1400.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 07:55:30,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1400.87 | bwd_inner_microstep: 1400.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 07:55:32,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1400.27 | bwd_inner_microstep: 1400.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2064
[2024-06-10 07:55:33,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.05 | bwd_microstep: 724.11 | bwd_inner_microstep: 724.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-10 07:55:35,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.55 | bwd_microstep: 1657.69 | bwd_inner_microstep: 1657.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2197
[2024-06-10 07:55:36,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.16 | bwd_microstep: 1016.89 | bwd_inner_microstep: 1016.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2002
[2024-06-10 07:55:43,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.57
[2024-06-10 07:55:43,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.85 | bwd_microstep: 6159.62 | bwd_inner_microstep: 811.73 | bwd_allreduce_microstep: 5347.84 | step_microstep: 38.11
[2024-06-10 07:55:43,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14650.72 | bwd: 44450.88 | bwd_inner: 39102.13 | bwd_allreduce: 5348.07 | step: 39.65
{'loss': 1.2494, 'learning_rate': 3.544084442562498e-05, 'epoch': 0.24}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1994
[2024-06-10 07:55:44,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.29 | bwd_microstep: 884.00 | bwd_inner_microstep: 883.94 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3895
[2024-06-10 07:55:46,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.47 | bwd_microstep: 1479.05 | bwd_inner_microstep: 1479.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2396
[2024-06-10 07:55:48,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.68 | bwd_microstep: 1032.28 | bwd_inner_microstep: 1032.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 07:55:49,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.23 | bwd_microstep: 1341.81 | bwd_inner_microstep: 1341.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3771
[2024-06-10 07:55:52,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.74 | bwd_microstep: 1499.07 | bwd_inner_microstep: 1499.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3738
[2024-06-10 07:55:53,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.97 | bwd_microstep: 1363.73 | bwd_inner_microstep: 1363.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3479
[2024-06-10 07:55:55,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1330.36 | bwd_inner_microstep: 1330.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405
[2024-06-10 07:55:57,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.23 | bwd_microstep: 1180.02 | bwd_inner_microstep: 1179.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 07:55:59,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1253.04 | bwd_inner_microstep: 1253.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495
[2024-06-10 07:56:01,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.72 | bwd_microstep: 1443.32 | bwd_inner_microstep: 1443.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3514
[2024-06-10 07:56:03,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1512.53 | bwd_inner_microstep: 1512.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3773
[2024-06-10 07:56:05,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.00 | bwd_microstep: 1583.32 | bwd_inner_microstep: 1583.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 07:56:07,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.76 | bwd_microstep: 1582.90 | bwd_inner_microstep: 1582.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3454
[2024-06-10 07:56:09,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.99 | bwd_microstep: 1513.44 | bwd_inner_microstep: 1513.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3682
[2024-06-10 07:56:12,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.60 | bwd_microstep: 1719.93 | bwd_inner_microstep: 1719.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 07:56:14,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.06 | bwd_microstep: 1524.17 | bwd_inner_microstep: 1524.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 07:56:16,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1438.54 | bwd_inner_microstep: 1438.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3622
[2024-06-10 07:56:18,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.00 | bwd_microstep: 1676.74 | bwd_inner_microstep: 1676.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3825
[2024-06-10 07:56:20,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.04 | bwd_microstep: 1385.72 | bwd_inner_microstep: 1385.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893
[2024-06-10 07:56:21,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 711.53 | bwd_inner_microstep: 711.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3606
[2024-06-10 07:56:23,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1243.35 | bwd_inner_microstep: 1243.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615
[2024-06-10 07:56:24,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1438.71 | bwd_inner_microstep: 1438.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 07:56:26,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1307.29 | bwd_inner_microstep: 1307.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3527
[2024-06-10 07:56:28,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.30 | bwd_microstep: 1197.40 | bwd_inner_microstep: 1197.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611
[2024-06-10 07:56:30,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1620.91 | bwd_inner_microstep: 1620.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 07:56:32,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.87 | bwd_microstep: 1495.00 | bwd_inner_microstep: 1494.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 07:56:34,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.66 | bwd_microstep: 1377.18 | bwd_inner_microstep: 1377.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 07:56:36,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.23 | bwd_microstep: 1536.70 | bwd_inner_microstep: 1536.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 07:56:38,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.07 | bwd_microstep: 1611.84 | bwd_inner_microstep: 1611.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488
[2024-06-10 07:56:40,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.27 | bwd_microstep: 1344.89 | bwd_inner_microstep: 1344.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 07:56:42,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.12 | bwd_microstep: 1299.41 | bwd_inner_microstep: 1299.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451
[2024-06-10 07:56:44,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 07:56:44,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.90 | bwd_microstep: 1246.49 | bwd_inner_microstep: 1221.98 | bwd_allreduce_microstep: 24.47 | step_microstep: 37.68
[2024-06-10 07:56:44,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16489.00 | bwd: 44174.68 | bwd_inner: 44149.26 | bwd_allreduce: 24.72 | step: 39.20
{'loss': 1.2994, 'learning_rate': 3.541696174109137e-05, 'epoch': 0.24}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-10 07:56:46,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.56 | bwd_microstep: 1338.02 | bwd_inner_microstep: 1337.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 07:56:48,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.30 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2352
[2024-06-10 07:56:49,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.46 | bwd_microstep: 891.79 | bwd_inner_microstep: 891.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3408
[2024-06-10 07:56:50,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.14 | bwd_microstep: 1213.74 | bwd_inner_microstep: 1213.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-10 07:56:52,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.16 | bwd_microstep: 1437.22 | bwd_inner_microstep: 1437.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 07:56:54,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.27 | bwd_microstep: 1153.24 | bwd_inner_microstep: 1153.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 07:56:55,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.18 | bwd_microstep: 797.26 | bwd_inner_microstep: 797.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 07:56:57,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.75 | bwd_microstep: 1284.04 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 07:56:58,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.71 | bwd_microstep: 679.14 | bwd_inner_microstep: 679.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1971
[2024-06-10 07:56:59,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.46 | bwd_microstep: 735.40 | bwd_inner_microstep: 735.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952
[2024-06-10 07:57:00,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.25 | bwd_microstep: 888.17 | bwd_inner_microstep: 888.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 07:57:02,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1347.96 | bwd_inner_microstep: 1347.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-10 07:57:04,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.41 | bwd_microstep: 1432.30 | bwd_inner_microstep: 1432.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3057
[2024-06-10 07:57:06,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.70 | bwd_microstep: 1297.67 | bwd_inner_microstep: 1297.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 07:57:08,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1515.80 | bwd_inner_microstep: 1515.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413
[2024-06-10 07:57:10,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.65 | bwd_microstep: 1365.94 | bwd_inner_microstep: 1365.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2486
[2024-06-10 07:57:11,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.97 | bwd_microstep: 1001.66 | bwd_inner_microstep: 1001.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021
[2024-06-10 07:57:12,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.62 | bwd_microstep: 805.54 | bwd_inner_microstep: 805.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 07:57:14,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.69 | bwd_microstep: 1292.86 | bwd_inner_microstep: 1292.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3526
[2024-06-10 07:57:16,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1422.65 | bwd_inner_microstep: 1422.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 07:57:18,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1294.91 | bwd_inner_microstep: 1294.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-10 07:57:20,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1322.24 | bwd_inner_microstep: 1322.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758
[2024-06-10 07:57:22,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.37 | bwd_microstep: 1445.02 | bwd_inner_microstep: 1444.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478
[2024-06-10 07:57:23,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.49 | bwd_microstep: 1314.52 | bwd_inner_microstep: 1314.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2043
[2024-06-10 07:57:26,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.62 | bwd_microstep: 1935.16 | bwd_inner_microstep: 1935.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182
[2024-06-10 07:57:27,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.32 | bwd_microstep: 856.96 | bwd_inner_microstep: 856.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 07:57:29,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.79 | bwd_microstep: 1457.26 | bwd_inner_microstep: 1457.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 07:57:31,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3433
[2024-06-10 07:57:33,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1374.50 | bwd_inner_microstep: 1374.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2961
[2024-06-10 07:57:34,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.06 | bwd_microstep: 1262.08 | bwd_inner_microstep: 1262.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3765
[2024-06-10 07:57:37,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.29 | bwd_microstep: 1739.13 | bwd_inner_microstep: 1739.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3764
[2024-06-10 07:57:48,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.34 | optimizer_step: 6.60
[2024-06-10 07:57:48,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.49 | bwd_microstep: 10421.32 | bwd_inner_microstep: 2295.46 | bwd_allreduce_microstep: 8125.80 | step_microstep: 39.06
[2024-06-10 07:57:48,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14688.87 | bwd: 48852.85 | bwd_inner: 40726.12 | bwd_allreduce: 8126.04 | step: 40.60
{'loss': 1.2845, 'learning_rate': 3.5393024758098645e-05, 'epoch': 0.24}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2108
[2024-06-10 07:57:49,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.03 | bwd_microstep: 913.31 | bwd_inner_microstep: 913.17 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401
[2024-06-10 07:57:51,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.38 | bwd_microstep: 1213.87 | bwd_inner_microstep: 1213.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3416
[2024-06-10 07:57:52,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.64 | bwd_microstep: 1175.52 | bwd_inner_microstep: 1175.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4168
[2024-06-10 07:57:55,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.19 | bwd_microstep: 1747.79 | bwd_inner_microstep: 1747.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 07:57:57,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.32 | bwd_microstep: 1478.91 | bwd_inner_microstep: 1478.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2070
[2024-06-10 07:57:58,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.08 | bwd_microstep: 726.06 | bwd_inner_microstep: 726.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 07:58:00,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1401.92 | bwd_inner_microstep: 1401.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-10 07:58:02,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.33 | bwd_microstep: 1564.61 | bwd_inner_microstep: 1564.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 713
[2024-06-10 07:58:02,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.00 | bwd_microstep: 291.87 | bwd_inner_microstep: 291.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-10 07:58:05,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1574.41 | bwd_inner_microstep: 1574.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662
[2024-06-10 07:58:06,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.22 | bwd_microstep: 1422.23 | bwd_inner_microstep: 1422.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507
[2024-06-10 07:58:09,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.54 | bwd_microstep: 1577.57 | bwd_inner_microstep: 1577.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1914
[2024-06-10 07:58:10,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.08 | bwd_microstep: 780.51 | bwd_inner_microstep: 780.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 07:58:12,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1373.97 | bwd_inner_microstep: 1373.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3099
[2024-06-10 07:58:13,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1246.63 | bwd_inner_microstep: 1246.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 07:58:15,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1391.87 | bwd_inner_microstep: 1391.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 07:58:17,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1301.67 | bwd_inner_microstep: 1301.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 07:58:19,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1556.76 | bwd_inner_microstep: 1556.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 07:58:21,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.91 | bwd_microstep: 1192.89 | bwd_inner_microstep: 1192.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 07:58:23,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.83 | bwd_microstep: 1289.14 | bwd_inner_microstep: 1289.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 07:58:25,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1413.34 | bwd_inner_microstep: 1413.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046
[2024-06-10 07:58:26,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.70 | bwd_microstep: 808.94 | bwd_inner_microstep: 808.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3719
[2024-06-10 07:58:28,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.66 | bwd_microstep: 1496.07 | bwd_inner_microstep: 1496.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 07:58:30,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1293.25 | bwd_inner_microstep: 1293.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 07:58:32,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.64 | bwd_microstep: 1381.89 | bwd_inner_microstep: 1381.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 07:58:34,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.93 | bwd_microstep: 1448.01 | bwd_inner_microstep: 1447.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 07:58:36,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.05 | bwd_microstep: 1629.69 | bwd_inner_microstep: 1629.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3429
[2024-06-10 07:58:38,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1399.43 | bwd_inner_microstep: 1399.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3554
[2024-06-10 07:58:40,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.42 | bwd_microstep: 1331.91 | bwd_inner_microstep: 1331.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3395
[2024-06-10 07:58:41,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.02 | bwd_microstep: 1408.95 | bwd_inner_microstep: 1408.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584
[2024-06-10 07:58:43,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1425.36 | bwd_inner_microstep: 1425.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777
[2024-06-10 07:58:50,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.36 | optimizer_step: 6.58
[2024-06-10 07:58:50,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.45 | bwd_microstep: 5649.07 | bwd_inner_microstep: 1875.03 | bwd_allreduce_microstep: 3773.97 | step_microstep: 38.80
[2024-06-10 07:58:50,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15698.54 | bwd: 45907.44 | bwd_inner: 42132.43 | bwd_allreduce: 3774.27 | step: 40.51
{'loss': 1.2664, 'learning_rate': 3.5369033560952756e-05, 'epoch': 0.24}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 07:58:52,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1272.64 | bwd_inner_microstep: 1272.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860
[2024-06-10 07:58:54,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.80 | bwd_microstep: 1559.90 | bwd_inner_microstep: 1559.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3896
[2024-06-10 07:58:56,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.25 | bwd_microstep: 1417.89 | bwd_inner_microstep: 1417.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798
[2024-06-10 07:58:58,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.56 | bwd_microstep: 1445.15 | bwd_inner_microstep: 1445.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2297
[2024-06-10 07:58:59,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.61 | bwd_microstep: 910.93 | bwd_inner_microstep: 910.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3630
[2024-06-10 07:59:01,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.03 | bwd_microstep: 1314.54 | bwd_inner_microstep: 1314.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1955
[2024-06-10 07:59:02,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.20 | bwd_microstep: 733.05 | bwd_inner_microstep: 733.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3172
[2024-06-10 07:59:04,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1320.26 | bwd_inner_microstep: 1320.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 07:59:05,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.63 | bwd_microstep: 1382.72 | bwd_inner_microstep: 1382.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2009
[2024-06-10 07:59:07,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.77 | bwd_microstep: 866.56 | bwd_inner_microstep: 866.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 07:59:09,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1496.88 | bwd_inner_microstep: 1496.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3551
[2024-06-10 07:59:10,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.28 | bwd_microstep: 1238.25 | bwd_inner_microstep: 1238.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3471
[2024-06-10 07:59:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.21 | bwd_microstep: 1360.02 | bwd_inner_microstep: 1359.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 07:59:14,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.74 | bwd_microstep: 1485.34 | bwd_inner_microstep: 1485.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 07:59:16,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1346.47 | bwd_inner_microstep: 1346.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 07:59:18,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1384.07 | bwd_inner_microstep: 1384.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 07:59:20,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.03 | bwd_microstep: 1399.49 | bwd_inner_microstep: 1399.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 07:59:22,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1555.80 | bwd_inner_microstep: 1555.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 07:59:24,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1400.34 | bwd_inner_microstep: 1400.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 07:59:26,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.16 | bwd_microstep: 1659.78 | bwd_inner_microstep: 1659.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 07:59:28,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.40 | bwd_microstep: 1286.37 | bwd_inner_microstep: 1286.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 07:59:30,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1460.69 | bwd_inner_microstep: 1460.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-10 07:59:32,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.80 | bwd_microstep: 1298.21 | bwd_inner_microstep: 1298.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 07:59:34,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.58 | bwd_microstep: 1407.79 | bwd_inner_microstep: 1407.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2166
[2024-06-10 07:59:35,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.27 | bwd_microstep: 853.78 | bwd_inner_microstep: 853.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3826
[2024-06-10 07:59:37,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.63 | bwd_microstep: 1517.73 | bwd_inner_microstep: 1517.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 07:59:39,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.70 | bwd_microstep: 1256.02 | bwd_inner_microstep: 1255.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2460
[2024-06-10 07:59:40,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.65 | bwd_microstep: 1050.89 | bwd_inner_microstep: 1050.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 07:59:42,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1415.58 | bwd_inner_microstep: 1415.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3613
[2024-06-10 07:59:44,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.78 | bwd_microstep: 1434.79 | bwd_inner_microstep: 1434.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 07:59:46,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.40 | bwd_microstep: 1495.65 | bwd_inner_microstep: 1495.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2733
[2024-06-10 07:59:53,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.35 | optimizer_step: 6.59
[2024-06-10 07:59:53,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.75 | bwd_microstep: 5598.16 | bwd_inner_microstep: 1324.08 | bwd_allreduce_microstep: 4274.02 | step_microstep: 38.79
[2024-06-10 07:59:53,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15831.06 | bwd: 46625.80 | bwd_inner: 42350.85 | bwd_allreduce: 4274.26 | step: 40.39
 24%|██▍       | 418/1726 [7:17:20<22:02:31, 60.67s/it]
 24%|██▍       | 419/1726 [7:18:20<21:53:26, 60.30s/it]
 24%|██▍       | 420/1726 [7:19:21<21:57:03, 60.51s/it]
 24%|██▍       | 421/1726 [7:20:25<22:18:02, 61.52s/it]
 24%|██▍       | 422/1726 [7:21:26<22:19:52, 61.65s/it]
 25%|██▍       | 423/1726 [7:22:29<22:26:19, 61.99s/it]
{'loss': 1.2779, 'learning_rate': 3.534498823415056e-05, 'epoch': 0.25}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 07:59:54,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1371.82 | bwd_inner_microstep: 1371.68 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 07:59:56,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.65 | bwd_microstep: 1279.92 | bwd_inner_microstep: 1279.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3984
[2024-06-10 07:59:59,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.15 | bwd_microstep: 1704.73 | bwd_inner_microstep: 1704.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 08:00:00,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1345.41 | bwd_inner_microstep: 1345.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3765
[2024-06-10 08:00:02,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.02 | bwd_microstep: 1473.43 | bwd_inner_microstep: 1473.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3540
[2024-06-10 08:00:04,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1424.79 | bwd_inner_microstep: 1424.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1945
[2024-06-10 08:00:05,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.37 | bwd_microstep: 760.13 | bwd_inner_microstep: 760.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 08:00:07,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1248.43 | bwd_inner_microstep: 1248.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 08:00:09,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.82 | bwd_microstep: 1528.61 | bwd_inner_microstep: 1528.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 08:00:11,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2174
[2024-06-10 08:00:13,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.95 | bwd_microstep: 952.16 | bwd_inner_microstep: 952.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 08:00:14,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.93 | bwd_microstep: 1388.17 | bwd_inner_microstep: 1388.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 08:00:16,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.99 | bwd_microstep: 1380.74 | bwd_inner_microstep: 1380.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-10 08:00:19,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.24 | bwd_microstep: 1613.56 | bwd_inner_microstep: 1613.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3427
[2024-06-10 08:00:21,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.01 | bwd_microstep: 1409.65 | bwd_inner_microstep: 1409.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3642
[2024-06-10 08:00:22,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650
[2024-06-10 08:00:24,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.54 | bwd_microstep: 1424.32 | bwd_inner_microstep: 1424.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645
[2024-06-10 08:00:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1415.71 | bwd_inner_microstep: 1415.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 08:00:28,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1509.68 | bwd_inner_microstep: 1509.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070
[2024-06-10 08:00:29,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.15 | bwd_microstep: 815.53 | bwd_inner_microstep: 815.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3458
[2024-06-10 08:00:31,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.86 | bwd_microstep: 1355.74 | bwd_inner_microstep: 1355.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830
[2024-06-10 08:00:34,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.67 | bwd_microstep: 1757.07 | bwd_inner_microstep: 1757.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 08:00:36,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.64 | bwd_microstep: 1646.54 | bwd_inner_microstep: 1646.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 08:00:38,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1353.75 | bwd_inner_microstep: 1353.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 08:00:40,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.46 | bwd_microstep: 1259.16 | bwd_inner_microstep: 1259.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2206
[2024-06-10 08:00:41,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.00 | bwd_microstep: 767.19 | bwd_inner_microstep: 767.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2268
[2024-06-10 08:00:42,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.88 | bwd_microstep: 876.04 | bwd_inner_microstep: 876.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3771
[2024-06-10 08:00:44,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.72 | bwd_microstep: 1562.34 | bwd_inner_microstep: 1562.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564
[2024-06-10 08:00:46,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.54 | bwd_microstep: 1331.89 | bwd_inner_microstep: 1331.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2278
[2024-06-10 08:00:47,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.59 | bwd_microstep: 1070.86 | bwd_inner_microstep: 1070.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 08:00:49,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.88 | bwd_microstep: 1405.60 | bwd_inner_microstep: 1405.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-10 08:00:56,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 08:00:56,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.72 | bwd_microstep: 6385.80 | bwd_inner_microstep: 1525.78 | bwd_allreduce_microstep: 4859.97 | step_microstep: 38.28
[2024-06-10 08:00:56,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15880.71 | bwd: 47457.17 | bwd_inner: 42596.16 | bwd_allreduce: 4860.27 | step: 39.87
{'loss': 1.3103, 'learning_rate': 3.532088886237956e-05, 'epoch': 0.25}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3619
[2024-06-10 08:00:58,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.82 | bwd_microstep: 1428.20 | bwd_inner_microstep: 1428.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 08:01:00,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1476.54 | bwd_inner_microstep: 1476.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901
[2024-06-10 08:01:02,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1587.33 | bwd_inner_microstep: 1587.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 08:01:04,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.15 | bwd_microstep: 793.62 | bwd_inner_microstep: 793.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 08:01:06,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1450.47 | bwd_inner_microstep: 1450.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3746
[2024-06-10 08:01:08,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.58 | bwd_microstep: 1638.72 | bwd_inner_microstep: 1638.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 08:01:10,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.17 | bwd_microstep: 1382.65 | bwd_inner_microstep: 1382.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 08:01:11,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1254.80 | bwd_inner_microstep: 1254.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1961
[2024-06-10 08:01:12,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.46 | bwd_microstep: 766.32 | bwd_inner_microstep: 766.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2084
[2024-06-10 08:01:14,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.87 | bwd_microstep: 883.58 | bwd_inner_microstep: 883.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 08:01:16,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 08:01:17,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1351.72 | bwd_inner_microstep: 1351.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 08:01:19,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.97 | bwd_microstep: 1454.86 | bwd_inner_microstep: 1454.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3517
[2024-06-10 08:01:21,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.48 | bwd_microstep: 1430.70 | bwd_inner_microstep: 1430.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906
[2024-06-10 08:01:22,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.04 | bwd_microstep: 686.48 | bwd_inner_microstep: 686.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 08:01:24,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1402.77 | bwd_inner_microstep: 1402.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295
[2024-06-10 08:01:26,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 882.68 | bwd_inner_microstep: 882.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-10 08:01:27,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1318.51 | bwd_inner_microstep: 1318.30 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 08:01:29,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1378.51 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930
[2024-06-10 08:01:30,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.41 | bwd_microstep: 697.92 | bwd_inner_microstep: 697.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 08:01:32,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1396.88 | bwd_inner_microstep: 1396.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2180
[2024-06-10 08:01:33,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.63 | bwd_microstep: 858.44 | bwd_inner_microstep: 858.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1901
[2024-06-10 08:01:34,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.86 | bwd_microstep: 686.62 | bwd_inner_microstep: 686.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3689
[2024-06-10 08:01:37,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.34 | bwd_microstep: 1725.24 | bwd_inner_microstep: 1725.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3653
[2024-06-10 08:01:39,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1560.78 | bwd_inner_microstep: 1560.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3825
[2024-06-10 08:01:41,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.41 | bwd_microstep: 1389.54 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3889
[2024-06-10 08:01:43,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.97 | bwd_microstep: 1636.69 | bwd_inner_microstep: 1636.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-10 08:01:45,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.51 | bwd_microstep: 1758.90 | bwd_inner_microstep: 1758.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 08:01:48,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.33 | bwd_microstep: 1548.97 | bwd_inner_microstep: 1548.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 08:01:50,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.95 | bwd_microstep: 1595.41 | bwd_inner_microstep: 1595.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 08:01:52,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1505.64 | bwd_inner_microstep: 1505.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 08:01:58,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 08:01:58,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.29 | bwd_microstep: 5342.64 | bwd_inner_microstep: 1615.47 | bwd_allreduce_microstep: 3727.12 | step_microstep: 38.12
[2024-06-10 08:01:58,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15592.93 | bwd: 45650.21 | bwd_inner: 41922.00 | bwd_allreduce: 3727.43 | step: 39.78
{'loss': 1.2599, 'learning_rate': 3.5296735530517646e-05, 'epoch': 0.25}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 08:02:00,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.89 | bwd_microstep: 1467.73 | bwd_inner_microstep: 1467.53 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2635
[2024-06-10 08:02:01,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.74 | bwd_microstep: 1049.96 | bwd_inner_microstep: 1049.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-10 08:02:02,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.31 | bwd_microstep: 809.21 | bwd_inner_microstep: 809.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 08:02:04,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.84 | bwd_microstep: 1481.17 | bwd_inner_microstep: 1481.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 08:02:06,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.57 | bwd_microstep: 1245.89 | bwd_inner_microstep: 1245.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 08:02:08,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1379.58 | bwd_inner_microstep: 1379.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 08:02:10,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.80 | bwd_microstep: 1384.44 | bwd_inner_microstep: 1384.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 08:02:12,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.27 | bwd_microstep: 1152.96 | bwd_inner_microstep: 1152.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 08:02:13,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.71 | bwd_microstep: 1285.70 | bwd_inner_microstep: 1285.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 08:02:15,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1395.93 | bwd_inner_microstep: 1395.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 08:02:17,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.51 | bwd_microstep: 1429.01 | bwd_inner_microstep: 1428.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 08:02:19,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.10 | bwd_microstep: 1417.67 | bwd_inner_microstep: 1417.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1921
[2024-06-10 08:02:20,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.28 | bwd_microstep: 760.38 | bwd_inner_microstep: 760.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3516
[2024-06-10 08:02:22,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.72 | bwd_microstep: 1238.85 | bwd_inner_microstep: 1238.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3678
[2024-06-10 08:02:24,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.10 | bwd_microstep: 1480.10 | bwd_inner_microstep: 1480.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 08:02:26,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.86 | bwd_microstep: 1381.16 | bwd_inner_microstep: 1381.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051
[2024-06-10 08:02:27,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.20 | bwd_microstep: 913.04 | bwd_inner_microstep: 913.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080
[2024-06-10 08:02:29,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.16 | bwd_microstep: 921.78 | bwd_inner_microstep: 921.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3630
[2024-06-10 08:02:31,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.20 | bwd_microstep: 1708.40 | bwd_inner_microstep: 1708.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3640
[2024-06-10 08:02:33,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1541.05 | bwd_inner_microstep: 1541.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 08:02:35,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1460.17 | bwd_inner_microstep: 1460.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3813
[2024-06-10 08:02:37,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.30 | bwd_microstep: 1480.33 | bwd_inner_microstep: 1480.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 08:02:39,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.94 | bwd_microstep: 1452.88 | bwd_inner_microstep: 1452.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 08:02:40,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.98 | bwd_microstep: 800.48 | bwd_inner_microstep: 800.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437
[2024-06-10 08:02:42,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.98 | bwd_microstep: 1313.40 | bwd_inner_microstep: 1313.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3829
[2024-06-10 08:02:44,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.86 | bwd_microstep: 1515.55 | bwd_inner_microstep: 1515.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 08:02:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1351.30 | bwd_inner_microstep: 1351.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 08:02:48,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.36 | bwd_microstep: 1661.76 | bwd_inner_microstep: 1661.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 08:02:50,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.40 | bwd_microstep: 1377.73 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 08:02:52,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1442.99 | bwd_inner_microstep: 1442.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751
[2024-06-10 08:02:54,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.91 | bwd_microstep: 1634.94 | bwd_inner_microstep: 1634.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2895
[2024-06-10 08:02:57,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 08:02:57,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.19 | bwd_microstep: 2262.42 | bwd_inner_microstep: 1269.59 | bwd_allreduce_microstep: 992.78 | step_microstep: 37.73
[2024-06-10 08:02:57,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15746.69 | bwd: 43198.01 | bwd_inner: 42204.13 | bwd_allreduce: 993.09 | step: 39.43
{'loss': 1.2757, 'learning_rate': 3.527252832363271e-05, 'epoch': 0.25}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 08:02:59,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1465.11 | bwd_inner_microstep: 1465.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 08:03:01,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.47 | bwd_microstep: 1274.60 | bwd_inner_microstep: 1274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3811
[2024-06-10 08:03:03,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1402.84 | bwd_inner_microstep: 1402.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 08:03:05,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.20 | bwd_microstep: 1480.04 | bwd_inner_microstep: 1480.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 08:03:07,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.21 | bwd_microstep: 1377.03 | bwd_inner_microstep: 1377.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 08:03:09,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.60 | bwd_microstep: 1532.22 | bwd_inner_microstep: 1532.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059
[2024-06-10 08:03:10,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.80 | bwd_microstep: 817.28 | bwd_inner_microstep: 817.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3701
[2024-06-10 08:03:12,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.35 | bwd_microstep: 1328.24 | bwd_inner_microstep: 1328.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-10 08:03:14,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.79 | bwd_microstep: 1415.73 | bwd_inner_microstep: 1415.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 08:03:16,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.08 | bwd_microstep: 1274.21 | bwd_inner_microstep: 1274.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-10 08:03:18,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1583.02 | bwd_inner_microstep: 1582.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448
[2024-06-10 08:03:20,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1411.42 | bwd_inner_microstep: 1411.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 08:03:22,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1483.46 | bwd_inner_microstep: 1483.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 08:03:24,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.26 | bwd_microstep: 1480.82 | bwd_inner_microstep: 1480.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 08:03:26,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1480.65 | bwd_inner_microstep: 1480.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 08:03:28,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1344.63 | bwd_inner_microstep: 1344.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 08:03:29,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.64 | bwd_microstep: 809.59 | bwd_inner_microstep: 809.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 08:03:31,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.64 | bwd_microstep: 1404.55 | bwd_inner_microstep: 1404.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 08:03:33,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.17 | bwd_microstep: 1291.91 | bwd_inner_microstep: 1291.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2218
[2024-06-10 08:03:34,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.60 | bwd_microstep: 961.53 | bwd_inner_microstep: 961.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 08:03:36,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1499.09 | bwd_inner_microstep: 1499.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 08:03:38,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.11 | bwd_microstep: 1440.03 | bwd_inner_microstep: 1440.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842
[2024-06-10 08:03:40,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1459.75 | bwd_inner_microstep: 1459.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 08:03:42,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.13 | bwd_microstep: 1494.03 | bwd_inner_microstep: 1494.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576
[2024-06-10 08:03:44,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.16 | bwd_microstep: 1338.98 | bwd_inner_microstep: 1338.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011
[2024-06-10 08:03:45,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.65 | bwd_microstep: 811.14 | bwd_inner_microstep: 811.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 08:03:47,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.82 | bwd_microstep: 1497.32 | bwd_inner_microstep: 1497.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 08:03:49,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1488.08 | bwd_inner_microstep: 1488.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-10 08:03:51,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.34 | bwd_microstep: 1432.98 | bwd_inner_microstep: 1432.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 08:03:53,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.51 | bwd_microstep: 1499.44 | bwd_inner_microstep: 1499.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 08:03:55,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1391.03 | bwd_inner_microstep: 1391.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3606
[2024-06-10 08:03:57,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-10 08:03:57,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.82 | bwd_microstep: 1612.27 | bwd_inner_microstep: 1604.54 | bwd_allreduce_microstep: 7.69 | step_microstep: 37.75
[2024-06-10 08:03:57,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16298.92 | bwd: 43582.98 | bwd_inner: 43574.39 | bwd_allreduce: 7.92 | step: 39.35
{'loss': 1.2866, 'learning_rate': 3.524826732698241e-05, 'epoch': 0.25}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410
[2024-06-10 08:03:59,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.90 | bwd_microstep: 1435.73 | bwd_inner_microstep: 1435.60 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3462
[2024-06-10 08:04:01,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.54 | bwd_microstep: 1210.48 | bwd_inner_microstep: 1210.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 08:04:02,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.51 | bwd_microstep: 971.40 | bwd_inner_microstep: 971.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3766
[2024-06-10 08:04:04,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.45 | bwd_microstep: 1340.49 | bwd_inner_microstep: 1340.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 08:04:06,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1245.38 | bwd_inner_microstep: 1245.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3586
[2024-06-10 08:04:08,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.31 | bwd_microstep: 1212.70 | bwd_inner_microstep: 1212.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 08:04:10,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1480.45 | bwd_inner_microstep: 1480.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.19
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 08:04:12,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.58 | bwd_microstep: 1621.29 | bwd_inner_microstep: 1621.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3055
[2024-06-10 08:04:13,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.13 | bwd_microstep: 1138.56 | bwd_inner_microstep: 1138.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 08:04:16,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1510.77 | bwd_inner_microstep: 1510.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1992
[2024-06-10 08:04:17,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.47 | bwd_microstep: 772.44 | bwd_inner_microstep: 772.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 08:04:18,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1280.74 | bwd_inner_microstep: 1280.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 08:04:20,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.54 | bwd_microstep: 1398.99 | bwd_inner_microstep: 1398.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637
[2024-06-10 08:04:22,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1409.38 | bwd_inner_microstep: 1409.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 08:04:25,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.67 | bwd_microstep: 1624.94 | bwd_inner_microstep: 1624.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 08:04:26,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.87 | bwd_microstep: 1287.69 | bwd_inner_microstep: 1287.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2309
[2024-06-10 08:04:28,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.65 | bwd_microstep: 886.59 | bwd_inner_microstep: 886.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2485
[2024-06-10 08:04:29,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.76 | bwd_microstep: 965.13 | bwd_inner_microstep: 965.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2099
[2024-06-10 08:04:30,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.35 | bwd_microstep: 825.23 | bwd_inner_microstep: 825.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154
[2024-06-10 08:04:31,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.58 | bwd_microstep: 854.99 | bwd_inner_microstep: 854.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3668
[2024-06-10 08:04:33,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1328.25 | bwd_inner_microstep: 1328.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2327
[2024-06-10 08:04:34,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.44 | bwd_microstep: 891.68 | bwd_inner_microstep: 891.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3598
[2024-06-10 08:04:36,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1339.15 | bwd_inner_microstep: 1339.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-10 08:04:37,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.19 | bwd_microstep: 734.24 | bwd_inner_microstep: 734.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 08:04:39,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1506.72 | bwd_inner_microstep: 1506.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1895
[2024-06-10 08:04:40,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.64 | bwd_microstep: 750.19 | bwd_inner_microstep: 750.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 08:04:43,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.65 | bwd_microstep: 1645.10 | bwd_inner_microstep: 1645.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 08:04:44,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.84 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 08:04:46,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1344.42 | bwd_inner_microstep: 1344.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 08:04:48,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.50 | bwd_microstep: 1632.74 | bwd_inner_microstep: 1632.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 08:04:50,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.73 | bwd_microstep: 1493.21 | bwd_inner_microstep: 1493.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 08:05:00,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.34 | optimizer_step: 6.58
[2024-06-10 08:05:00,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 9157.38 | bwd_inner_microstep: 1567.11 | bwd_allreduce_microstep: 7590.21 | step_microstep: 38.77
[2024-06-10 08:05:00,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14957.71 | bwd: 47582.92 | bwd_inner: 39991.67 | bwd_allreduce: 7590.52 | step: 41.48
{'loss': 1.3058, 'learning_rate': 3.522395262601386e-05, 'epoch': 0.25}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 08:05:02,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.79 | bwd_microstep: 1506.61 | bwd_inner_microstep: 1506.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2394
[2024-06-10 08:05:04,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.13 | bwd_microstep: 997.07 | bwd_inner_microstep: 997.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842
[2024-06-10 08:05:06,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.70 | bwd_microstep: 1555.96 | bwd_inner_microstep: 1555.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 08:05:08,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.33 | bwd_microstep: 1277.62 | bwd_inner_microstep: 1277.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297
[2024-06-10 08:05:09,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.34 | bwd_microstep: 974.60 | bwd_inner_microstep: 974.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 08:05:11,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1381.79 | bwd_inner_microstep: 1381.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 08:05:13,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1384.84 | bwd_inner_microstep: 1384.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 08:05:15,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.33 | bwd_microstep: 1293.67 | bwd_inner_microstep: 1293.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 08:05:16,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.32 | bwd_microstep: 1401.74 | bwd_inner_microstep: 1401.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 08:05:17,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.09 | bwd_microstep: 680.35 | bwd_inner_microstep: 680.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 08:05:19,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1388.56 | bwd_inner_microstep: 1388.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093
[2024-06-10 08:05:21,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.17 | bwd_microstep: 919.01 | bwd_inner_microstep: 918.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508
[2024-06-10 08:05:23,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 1413.38 | bwd_inner_microstep: 1413.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 08:05:25,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.07 | bwd_microstep: 1633.60 | bwd_inner_microstep: 1633.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2193
[2024-06-10 08:05:26,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.78 | bwd_microstep: 987.16 | bwd_inner_microstep: 987.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3523
[2024-06-10 08:05:28,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.66 | bwd_microstep: 1254.77 | bwd_inner_microstep: 1254.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3501
[2024-06-10 08:05:30,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.64 | bwd_microstep: 1416.50 | bwd_inner_microstep: 1416.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433
[2024-06-10 08:05:32,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.31 | bwd_microstep: 1185.39 | bwd_inner_microstep: 1185.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2956
[2024-06-10 08:05:33,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.55 | bwd_microstep: 1010.77 | bwd_inner_microstep: 1010.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3621
[2024-06-10 08:05:35,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.89 | bwd_microstep: 1435.04 | bwd_inner_microstep: 1435.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2183
[2024-06-10 08:05:36,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.33 | bwd_microstep: 857.04 | bwd_inner_microstep: 857.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 08:05:38,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2187
[2024-06-10 08:05:39,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.44 | bwd_microstep: 765.94 | bwd_inner_microstep: 765.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 08:05:41,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.97 | bwd_microstep: 1299.90 | bwd_inner_microstep: 1299.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3550
[2024-06-10 08:05:43,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 1421.58 | bwd_inner_microstep: 1421.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 08:05:44,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.81 | bwd_microstep: 981.82 | bwd_inner_microstep: 981.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 08:05:46,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1276.47 | bwd_inner_microstep: 1276.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 08:05:48,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.28 | bwd_microstep: 1478.06 | bwd_inner_microstep: 1478.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 08:05:50,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.89 | bwd_microstep: 1405.75 | bwd_inner_microstep: 1405.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3604
[2024-06-10 08:05:52,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1706.66 | bwd_inner_microstep: 1706.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3584
[2024-06-10 08:05:54,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 1458.09 | bwd_inner_microstep: 1458.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622
[2024-06-10 08:06:01,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 08:06:01,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.44 | bwd_microstep: 5977.52 | bwd_inner_microstep: 1817.16 | bwd_allreduce_microstep: 4160.30 | step_microstep: 38.32
[2024-06-10 08:06:01,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15286.31 | bwd: 45134.22 | bwd_inner: 40972.99 | bwd_allreduce: 4160.53 | step: 39.87


 25%|██▍       | 423/1726 [7:22:29<22:26:19, 61.99s/it]
 25%|██▍       | 424/1726 [7:23:33<22:36:14, 62.50s/it]
 25%|██▍       | 425/1726 [7:24:35<22:29:15, 62.23s/it]
 25%|██▍       | 426/1726 [7:25:34<22:09:09, 61.35s/it]
 25%|██▍       | 427/1726 [7:26:34<22:00:51, 61.01s/it]
 25%|██▍       | 428/1726 [7:27:37<22:11:57, 61.57s/it]
 25%|██▍       | 429/1726 [7:28:38<22:05:41,
{'loss': 1.2706, 'learning_rate': 3.5199584306363296e-05, 'epoch': 0.25}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 08:06:03,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.05 | bwd_microstep: 1372.74 | bwd_inner_microstep: 1372.65 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 08:06:05,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.04 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3893
[2024-06-10 08:06:07,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.57 | bwd_microstep: 1579.23 | bwd_inner_microstep: 1579.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 08:06:09,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.50 | bwd_microstep: 1274.33 | bwd_inner_microstep: 1274.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 08:06:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.63 | bwd_microstep: 1273.55 | bwd_inner_microstep: 1273.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-10 08:06:13,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.94 | bwd_microstep: 1628.78 | bwd_inner_microstep: 1628.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 08:06:14,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1296.57 | bwd_inner_microstep: 1296.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1914
[2024-06-10 08:06:15,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.75 | bwd_microstep: 714.24 | bwd_inner_microstep: 714.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 08:06:17,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.92 | bwd_microstep: 1387.94 | bwd_inner_microstep: 1387.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3696
[2024-06-10 08:06:20,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1629.01 | bwd_inner_microstep: 1628.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3812
[2024-06-10 08:06:22,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.13 | bwd_microstep: 1534.04 | bwd_inner_microstep: 1534.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483
[2024-06-10 08:06:23,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.11 | bwd_microstep: 1315.07 | bwd_inner_microstep: 1315.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 08:06:26,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.77 | bwd_microstep: 1486.01 | bwd_inner_microstep: 1485.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452
[2024-06-10 08:06:27,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.50 | bwd_microstep: 1314.17 | bwd_inner_microstep: 1314.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909
[2024-06-10 08:06:28,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.28 | bwd_microstep: 780.48 | bwd_inner_microstep: 780.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 08:06:30,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.04 | bwd_microstep: 1388.72 | bwd_inner_microstep: 1388.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 08:06:33,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.75 | bwd_microstep: 1619.31 | bwd_inner_microstep: 1619.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 08:06:35,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1400.15 | bwd_inner_microstep: 1400.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2409
[2024-06-10 08:06:36,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.45 | bwd_microstep: 938.12 | bwd_inner_microstep: 938.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1958
[2024-06-10 08:06:37,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.64 | bwd_microstep: 768.27 | bwd_inner_microstep: 768.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3516
[2024-06-10 08:06:39,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.03 | bwd_microstep: 1198.92 | bwd_inner_microstep: 1198.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 08:06:41,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.70 | bwd_microstep: 1460.74 | bwd_inner_microstep: 1460.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 08:06:42,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.69 | bwd_microstep: 1384.58 | bwd_inner_microstep: 1384.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2014
[2024-06-10 08:06:44,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.75 | bwd_microstep: 842.23 | bwd_inner_microstep: 842.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 08:06:46,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.09 | bwd_microstep: 1556.54 | bwd_inner_microstep: 1556.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2017
[2024-06-10 08:06:47,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.96 | bwd_microstep: 717.24 | bwd_inner_microstep: 717.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2070
[2024-06-10 08:06:48,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.88 | bwd_microstep: 851.94 | bwd_inner_microstep: 851.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2270
[2024-06-10 08:06:49,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.95 | bwd_microstep: 1068.79 | bwd_inner_microstep: 1068.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3433
[2024-06-10 08:06:51,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.27 | bwd_microstep: 1217.47 | bwd_inner_microstep: 1217.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 08:06:53,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.97 | bwd_microstep: 1555.42 | bwd_inner_microstep: 1555.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3586
[2024-06-10 08:06:56,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.90 | bwd_microstep: 1750.19 | bwd_inner_microstep: 1750.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3806
[2024-06-10 08:07:01,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 08:07:01,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.40 | bwd_microstep: 4726.96 | bwd_inner_microstep: 1806.32 | bwd_allreduce_microstep: 2920.59 | step_microstep: 39.67
[2024-06-10 08:07:01,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15404.97 | bwd: 44313.30 | bwd_inner: 41391.72 | bwd_allreduce: 2920.87 | step: 41.44
{'loss': 1.3353, 'learning_rate': 3.517516245385582e-05, 'epoch': 0.25}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2428
[2024-06-10 08:07:02,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.41 | bwd_microstep: 937.08 | bwd_inner_microstep: 936.90 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1137
[2024-06-10 08:07:03,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.87 | bwd_microstep: 458.89 | bwd_inner_microstep: 458.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 08:07:05,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.74 | bwd_microstep: 1278.09 | bwd_inner_microstep: 1278.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1939
[2024-06-10 08:07:06,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.68 | bwd_microstep: 743.67 | bwd_inner_microstep: 743.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3748
[2024-06-10 08:07:08,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.47 | bwd_microstep: 1444.84 | bwd_inner_microstep: 1444.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 775
[2024-06-10 08:07:08,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.14 | bwd_microstep: 306.45 | bwd_inner_microstep: 306.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 753
[2024-06-10 08:07:09,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 117.26 | bwd_microstep: 302.88 | bwd_inner_microstep: 302.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4080
[2024-06-10 08:07:11,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.49 | bwd_microstep: 1628.38 | bwd_inner_microstep: 1628.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3405
[2024-06-10 08:07:13,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.87 | bwd_microstep: 1372.13 | bwd_inner_microstep: 1372.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 08:07:15,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1510.33 | bwd_inner_microstep: 1510.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3399
[2024-06-10 08:07:17,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.77 | bwd_microstep: 1358.57 | bwd_inner_microstep: 1358.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 08:07:19,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.79 | bwd_microstep: 1479.69 | bwd_inner_microstep: 1479.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3104
[2024-06-10 08:07:20,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.30 | bwd_microstep: 1058.97 | bwd_inner_microstep: 1058.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 08:07:22,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.63 | bwd_microstep: 1315.22 | bwd_inner_microstep: 1315.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 08:07:24,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1416.38 | bwd_inner_microstep: 1416.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 08:07:26,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.03 | bwd_microstep: 1386.36 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 08:07:28,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.53 | bwd_microstep: 1658.23 | bwd_inner_microstep: 1658.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 08:07:30,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.17 | bwd_microstep: 1294.49 | bwd_inner_microstep: 1294.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514
[2024-06-10 08:07:32,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1421.75 | bwd_inner_microstep: 1421.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3725
[2024-06-10 08:07:34,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1369.56 | bwd_inner_microstep: 1369.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 608
[2024-06-10 08:07:34,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.66 | bwd_microstep: 262.34 | bwd_inner_microstep: 262.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 08:07:36,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1556.59 | bwd_inner_microstep: 1556.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 08:07:38,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.84 | bwd_microstep: 1463.19 | bwd_inner_microstep: 1463.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3610
[2024-06-10 08:07:41,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.99 | bwd_microstep: 1642.70 | bwd_inner_microstep: 1642.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3622
[2024-06-10 08:07:43,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1468.27 | bwd_inner_microstep: 1468.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 08:07:45,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.50 | bwd_microstep: 1354.32 | bwd_inner_microstep: 1354.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 08:07:47,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.66 | bwd_microstep: 1599.41 | bwd_inner_microstep: 1599.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 08:07:48,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.27 | bwd_microstep: 976.21 | bwd_inner_microstep: 976.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3804
[2024-06-10 08:07:51,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.90 | bwd_microstep: 1748.47 | bwd_inner_microstep: 1748.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3452
[2024-06-10 08:07:52,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1407.52 | bwd_inner_microstep: 1407.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2725
[2024-06-10 08:07:54,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.00 | bwd_microstep: 1137.44 | bwd_inner_microstep: 1137.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3816
[2024-06-10 08:08:02,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.36 | optimizer_step: 6.61
[2024-06-10 08:08:02,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.79 | bwd_microstep: 7046.25 | bwd_inner_microstep: 1560.94 | bwd_allreduce_microstep: 5485.24 | step_microstep: 38.98
[2024-06-10 08:08:02,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14872.60 | bwd: 45404.71 | bwd_inner: 39918.40 | bwd_allreduce: 5485.55 | step: 40.77
{'loss': 1.2834, 'learning_rate': 3.515068715450508e-05, 'epoch': 0.25}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1932
[2024-06-10 08:08:03,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.68 | bwd_microstep: 877.25 | bwd_inner_microstep: 877.10 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 08:08:05,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.15 | bwd_microstep: 1472.24 | bwd_inner_microstep: 1472.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 08:08:07,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1278.56 | bwd_inner_microstep: 1278.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 08:08:09,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.41 | bwd_microstep: 1651.56 | bwd_inner_microstep: 1651.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3481
[2024-06-10 08:08:11,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.16 | bwd_microstep: 1245.36 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 08:08:12,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.30 | bwd_microstep: 792.31 | bwd_inner_microstep: 792.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 08:08:13,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.72 | bwd_microstep: 797.84 | bwd_inner_microstep: 797.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2016
[2024-06-10 08:08:14,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.78 | bwd_microstep: 899.61 | bwd_inner_microstep: 899.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 08:08:16,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1483.95 | bwd_inner_microstep: 1483.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 08:08:18,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1382.63 | bwd_inner_microstep: 1382.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 08:08:20,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.11 | bwd_microstep: 1347.18 | bwd_inner_microstep: 1347.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3674
[2024-06-10 08:08:22,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.16 | bwd_microstep: 1788.27 | bwd_inner_microstep: 1788.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-10 08:08:24,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1417.76 | bwd_inner_microstep: 1417.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 08:08:25,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.99 | bwd_microstep: 799.26 | bwd_inner_microstep: 799.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3841
[2024-06-10 08:08:28,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.30 | bwd_microstep: 1560.15 | bwd_inner_microstep: 1560.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 08:08:29,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1353.88 | bwd_inner_microstep: 1353.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3908
[2024-06-10 08:08:32,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.60 | bwd_microstep: 1685.89 | bwd_inner_microstep: 1685.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2413
[2024-06-10 08:08:33,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.69 | bwd_microstep: 1103.89 | bwd_inner_microstep: 1103.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-10 08:08:35,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.16 | bwd_microstep: 1568.73 | bwd_inner_microstep: 1568.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 08:08:38,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1535.34 | bwd_inner_microstep: 1535.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 08:08:40,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1381.10 | bwd_inner_microstep: 1381.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 08:08:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.16 | bwd_microstep: 1284.59 | bwd_inner_microstep: 1284.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3610
[2024-06-10 08:08:43,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.26 | bwd_microstep: 1212.95 | bwd_inner_microstep: 1212.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 08:08:45,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1376.89 | bwd_inner_microstep: 1376.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3811
[2024-06-10 08:08:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.95 | bwd_microstep: 1415.30 | bwd_inner_microstep: 1415.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2358
[2024-06-10 08:08:48,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.26 | bwd_microstep: 896.22 | bwd_inner_microstep: 896.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2276
[2024-06-10 08:08:49,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.33 | bwd_microstep: 907.64 | bwd_inner_microstep: 907.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2642
[2024-06-10 08:08:51,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.04 | bwd_microstep: 1166.99 | bwd_inner_microstep: 1166.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3588
[2024-06-10 08:08:53,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.75 | bwd_microstep: 1566.55 | bwd_inner_microstep: 1566.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3573
[2024-06-10 08:08:55,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.83 | bwd_microstep: 1568.46 | bwd_inner_microstep: 1568.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 08:08:57,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.93 | bwd_microstep: 1599.53 | bwd_inner_microstep: 1599.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3784
[2024-06-10 08:09:03,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 08:09:03,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.37 | bwd_microstep: 4572.15 | bwd_inner_microstep: 1984.23 | bwd_allreduce_microstep: 2587.87 | step_microstep: 38.01
[2024-06-10 08:09:03,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15719.50 | bwd: 44990.05 | bwd_inner: 42401.17 | bwd_allreduce: 2588.15 | step: 39.68
{'loss': 1.2642, 'learning_rate': 3.5126158494512926e-05, 'epoch': 0.25}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 08:09:05,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.78 | bwd_microstep: 1597.35 | bwd_inner_microstep: 1597.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 08:09:07,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.93 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933
[2024-06-10 08:09:08,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.21 | bwd_microstep: 819.81 | bwd_inner_microstep: 819.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1875
[2024-06-10 08:09:09,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 706.68 | bwd_inner_microstep: 706.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 08:09:11,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.20 | bwd_microstep: 1178.31 | bwd_inner_microstep: 1178.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016
[2024-06-10 08:09:12,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.02 | bwd_microstep: 806.27 | bwd_inner_microstep: 806.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 08:09:13,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 08:09:15,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.90 | bwd_microstep: 1152.29 | bwd_inner_microstep: 1152.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 889
[2024-06-10 08:09:16,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.07 | bwd_microstep: 368.60 | bwd_inner_microstep: 368.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 08:09:17,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.65 | bwd_microstep: 1343.82 | bwd_inner_microstep: 1343.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2692
[2024-06-10 08:09:19,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.42 | bwd_microstep: 1225.19 | bwd_inner_microstep: 1225.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-10 08:09:21,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.97 | bwd_microstep: 1426.82 | bwd_inner_microstep: 1426.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 08:09:23,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1287.25 | bwd_inner_microstep: 1287.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 08:09:25,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1288.50 | bwd_inner_microstep: 1288.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 08:09:26,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.07 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3507
[2024-06-10 08:09:28,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.44 | bwd_microstep: 1353.03 | bwd_inner_microstep: 1353.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 08:09:30,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1400.76 | bwd_inner_microstep: 1400.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 08:09:32,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.63 | bwd_microstep: 1397.84 | bwd_inner_microstep: 1397.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 08:09:34,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1515.60 | bwd_inner_microstep: 1515.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3526
[2024-06-10 08:09:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.35 | bwd_microstep: 1417.35 | bwd_inner_microstep: 1417.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 08:09:38,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1558.24 | bwd_inner_microstep: 1558.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 08:09:40,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1283.92 | bwd_inner_microstep: 1283.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2141
[2024-06-10 08:09:41,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.97 | bwd_microstep: 835.41 | bwd_inner_microstep: 835.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3600
[2024-06-10 08:09:43,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.71 | bwd_microstep: 1539.42 | bwd_inner_microstep: 1539.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535
[2024-06-10 08:09:45,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.34 | bwd_microstep: 1456.31 | bwd_inner_microstep: 1456.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 08:09:47,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1393.07 | bwd_inner_microstep: 1393.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3816
[2024-06-10 08:09:50,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.96 | bwd_microstep: 1690.47 | bwd_inner_microstep: 1690.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281
[2024-06-10 08:09:51,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.29 | bwd_microstep: 909.14 | bwd_inner_microstep: 909.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922
[2024-06-10 08:09:52,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.43 | bwd_microstep: 822.18 | bwd_inner_microstep: 822.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 08:09:54,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1345.31 | bwd_inner_microstep: 1345.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 08:09:56,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.88 | bwd_microstep: 1431.01 | bwd_inner_microstep: 1430.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3774
[2024-06-10 08:10:03,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 08:10:03,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.04 | bwd_microstep: 6260.86 | bwd_inner_microstep: 1777.96 | bwd_allreduce_microstep: 4482.85 | step_microstep: 38.06
[2024-06-10 08:10:03,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15036.57 | bwd: 44720.21 | bwd_inner: 40236.44 | bwd_allreduce: 4483.08 | step: 39.64
{'loss': 1.3244, 'learning_rate': 3.5101576560269195e-05, 'epoch': 0.25}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 08:10:05,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.98 | bwd_microstep: 1467.15 | bwd_inner_microstep: 1467.05 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3852
[2024-06-10 08:10:07,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.13 | bwd_microstep: 1360.79 | bwd_inner_microstep: 1360.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 08:10:09,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1383.62 | bwd_inner_microstep: 1383.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-10 08:10:11,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.20 | bwd_microstep: 1539.53 | bwd_inner_microstep: 1539.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 08:10:13,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.17 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 08:10:14,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1244.59 | bwd_inner_microstep: 1244.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 08:10:16,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1245.84 | bwd_inner_microstep: 1245.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3475
[2024-06-10 08:10:18,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.79 | bwd_microstep: 1229.43 | bwd_inner_microstep: 1229.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 08:10:20,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.60 | bwd_microstep: 1386.44 | bwd_inner_microstep: 1386.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949
[2024-06-10 08:10:21,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.91 | bwd_microstep: 728.64 | bwd_inner_microstep: 728.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3506
[2024-06-10 08:10:23,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.28 | bwd_microstep: 1437.15 | bwd_inner_microstep: 1437.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3953
[2024-06-10 08:10:25,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.82 | bwd_microstep: 1845.19 | bwd_inner_microstep: 1845.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3671
[2024-06-10 08:10:28,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.99 | bwd_microstep: 1657.60 | bwd_inner_microstep: 1657.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3428
[2024-06-10 08:10:29,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.47 | bwd_microstep: 1216.59 | bwd_inner_microstep: 1216.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 08:10:31,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.56 | bwd_microstep: 1320.07 | bwd_inner_microstep: 1320.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 08:10:33,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.02 | bwd_microstep: 1512.50 | bwd_inner_microstep: 1512.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 08:10:35,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1281.73 | bwd_inner_microstep: 1281.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 08:10:37,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.76 | bwd_microstep: 1391.51 | bwd_inner_microstep: 1391.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 08:10:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.17 | bwd_microstep: 1383.64 | bwd_inner_microstep: 1383.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 08:10:41,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1555.49 | bwd_inner_microstep: 1555.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 08:10:43,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1282.91 | bwd_inner_microstep: 1282.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 08:10:44,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1279.63 | bwd_inner_microstep: 1279.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782
[2024-06-10 08:10:46,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.60 | bwd_microstep: 1447.23 | bwd_inner_microstep: 1447.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 08:10:49,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.77 | bwd_microstep: 1486.83 | bwd_inner_microstep: 1486.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 08:10:50,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.39 | bwd_microstep: 1403.84 | bwd_inner_microstep: 1403.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548
[2024-06-10 08:10:52,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.73 | bwd_microstep: 1329.53 | bwd_inner_microstep: 1329.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3607
[2024-06-10 08:10:54,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.69 | bwd_microstep: 1572.66 | bwd_inner_microstep: 1572.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609
[2024-06-10 08:10:57,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.22 | bwd_microstep: 1610.89 | bwd_inner_microstep: 1610.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 08:10:59,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1514.25 | bwd_inner_microstep: 1514.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2280
[2024-06-10 08:11:00,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.94 | bwd_microstep: 1022.93 | bwd_inner_microstep: 1022.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583
[2024-06-10 08:11:02,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.21 | bwd_microstep: 1239.02 | bwd_inner_microstep: 1238.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3455
[2024-06-10 08:11:05,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.15 | optimizer_step: 6.63
[2024-06-10 08:11:05,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.65 | bwd_microstep: 2165.68 | bwd_inner_microstep: 1570.41 | bwd_allreduce_microstep: 595.23 | step_microstep: 37.81
[2024-06-10 08:11:05,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16552.34 | bwd: 44928.67 | bwd_inner: 44332.45 | bwd_allreduce: 595.50 | step: 39.41
{'loss': 1.3248, 'learning_rate': 3.507694143835132e-05, 'epoch': 0.25}
 25%|██▍       | 429/1726 [7:28:38<22:05:41, 61.33s/it]
 25%|██▍       | 430/1726 [7:29:38<21:56:29, 60.95s/it]
 25%|██▍       | 431/1726 [7:30:38<21:53:23, 60.85s/it]
 25%|██▌       | 432/1726 [7:31:39<21:53:42, 60.91s/it]
 25%|██▌       | 433/1726 [7:32:40<21:47:27, 60.67s/it]
 25%|██▌       | 434/1726 [7:33:41<21:53:53, 61.02s/it]
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 08:11:06,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.45 | bwd_microstep: 1270.85 | bwd_inner_microstep: 1270.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3392
[2024-06-10 08:11:08,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.19 | bwd_microstep: 1146.54 | bwd_inner_microstep: 1146.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-10 08:11:10,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1548.40 | bwd_inner_microstep: 1548.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-10 08:11:12,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.03 | bwd_microstep: 1562.38 | bwd_inner_microstep: 1562.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-10 08:11:14,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1437.87 | bwd_inner_microstep: 1437.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 08:11:16,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 1246.52 | bwd_inner_microstep: 1246.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1895
[2024-06-10 08:11:17,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.56 | bwd_microstep: 777.02 | bwd_inner_microstep: 776.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 08:11:19,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1286.35 | bwd_inner_microstep: 1286.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2088
[2024-06-10 08:11:20,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.77 | bwd_microstep: 729.78 | bwd_inner_microstep: 729.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1991
[2024-06-10 08:11:21,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.07 | bwd_microstep: 862.68 | bwd_inner_microstep: 862.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 08:11:23,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.33 | bwd_microstep: 1286.65 | bwd_inner_microstep: 1286.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 08:11:25,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.17 | bwd_microstep: 1378.42 | bwd_inner_microstep: 1378.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 08:11:27,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1513.74 | bwd_inner_microstep: 1513.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2815
[2024-06-10 08:11:28,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.80 | bwd_microstep: 1078.21 | bwd_inner_microstep: 1078.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883
[2024-06-10 08:11:29,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.11 | bwd_microstep: 709.74 | bwd_inner_microstep: 709.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 08:11:30,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.51 | bwd_microstep: 791.62 | bwd_inner_microstep: 791.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-10 08:11:32,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.40 | bwd_microstep: 1156.03 | bwd_inner_microstep: 1156.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3833
[2024-06-10 08:11:34,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.31 | bwd_microstep: 1358.45 | bwd_inner_microstep: 1358.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 634
[2024-06-10 08:11:34,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.65 | bwd_microstep: 264.78 | bwd_inner_microstep: 264.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3439
[2024-06-10 08:11:36,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.66 | bwd_microstep: 1298.98 | bwd_inner_microstep: 1298.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 08:11:38,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.10 | bwd_microstep: 1460.75 | bwd_inner_microstep: 1460.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 08:11:40,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1557.27 | bwd_inner_microstep: 1557.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 08:11:42,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3845
[2024-06-10 08:11:45,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.32 | bwd_microstep: 1698.75 | bwd_inner_microstep: 1698.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2286
[2024-06-10 08:11:46,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.34 | bwd_microstep: 783.80 | bwd_inner_microstep: 783.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3806
[2024-06-10 08:11:48,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.85 | bwd_microstep: 1413.52 | bwd_inner_microstep: 1413.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451
[2024-06-10 08:11:50,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.20 | bwd_microstep: 1454.58 | bwd_inner_microstep: 1454.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3682
[2024-06-10 08:11:52,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.97 | bwd_microstep: 1458.72 | bwd_inner_microstep: 1458.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 08:11:54,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.12 | bwd_microstep: 1500.22 | bwd_inner_microstep: 1500.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 08:11:56,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.05 | bwd_microstep: 1607.64 | bwd_inner_microstep: 1607.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 08:11:58,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1375.80 | bwd_inner_microstep: 1375.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767
[2024-06-10 08:12:05,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.31 | optimizer_step: 6.59
[2024-06-10 08:12:05,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.88 | bwd_microstep: 6290.94 | bwd_inner_microstep: 1752.60 | bwd_allreduce_microstep: 4538.29 | step_microstep: 38.26
[2024-06-10 08:12:05,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15072.64 | bwd: 44953.67 | bwd_inner: 40414.46 | bwd_allreduce: 4538.53 | step: 39.83
{'loss': 1.2968, 'learning_rate': 3.5052253215524086e-05, 'epoch': 0.25}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 08:12:07,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.78 | bwd_microstep: 1347.18 | bwd_inner_microstep: 1346.97 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 08:12:09,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.69 | bwd_microstep: 1246.68 | bwd_inner_microstep: 1246.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866
[2024-06-10 08:12:11,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.15 | bwd_microstep: 1564.97 | bwd_inner_microstep: 1564.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 08:12:12,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1252.20 | bwd_inner_microstep: 1252.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 08:12:15,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.07 | bwd_microstep: 1479.55 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 08:12:16,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1958
[2024-06-10 08:12:17,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.16 | bwd_microstep: 764.34 | bwd_inner_microstep: 764.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113
[2024-06-10 08:12:18,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.42 | bwd_microstep: 828.44 | bwd_inner_microstep: 828.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 08:12:20,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.94 | bwd_microstep: 1286.48 | bwd_inner_microstep: 1286.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 08:12:22,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.93 | bwd_microstep: 1384.71 | bwd_inner_microstep: 1384.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 08:12:24,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.32 | bwd_microstep: 1528.19 | bwd_inner_microstep: 1528.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3510
[2024-06-10 08:12:26,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.22 | bwd_microstep: 1445.35 | bwd_inner_microstep: 1445.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-10 08:12:28,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.23 | bwd_microstep: 1612.64 | bwd_inner_microstep: 1612.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646
[2024-06-10 08:12:31,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.20 | bwd_microstep: 1611.63 | bwd_inner_microstep: 1611.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3567
[2024-06-10 08:12:33,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-10 08:12:35,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.90 | bwd_microstep: 1617.46 | bwd_inner_microstep: 1617.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3528
[2024-06-10 08:12:37,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.63 | bwd_microstep: 1619.33 | bwd_inner_microstep: 1619.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.32
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 08:12:39,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1376.24 | bwd_inner_microstep: 1376.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 08:12:41,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1508.26 | bwd_inner_microstep: 1508.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 08:12:43,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.79 | bwd_microstep: 1490.56 | bwd_inner_microstep: 1490.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 08:12:45,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.20 | bwd_microstep: 1402.16 | bwd_inner_microstep: 1402.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2116
[2024-06-10 08:12:46,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.02 | bwd_microstep: 892.41 | bwd_inner_microstep: 892.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-10 08:12:49,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.11 | bwd_microstep: 1613.00 | bwd_inner_microstep: 1612.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819
[2024-06-10 08:12:51,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 1517.86 | bwd_inner_microstep: 1517.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3713
[2024-06-10 08:12:53,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 1495.74 | bwd_inner_microstep: 1495.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 08:12:55,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1490.78 | bwd_inner_microstep: 1490.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 08:12:57,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.57 | bwd_microstep: 1258.55 | bwd_inner_microstep: 1258.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-10 08:12:59,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.63 | bwd_microstep: 1644.17 | bwd_inner_microstep: 1644.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 08:13:01,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1412.01 | bwd_inner_microstep: 1411.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3638
[2024-06-10 08:13:03,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.16 | bwd_microstep: 1650.33 | bwd_inner_microstep: 1650.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3608
[2024-06-10 08:13:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.49 | bwd_microstep: 1534.60 | bwd_inner_microstep: 1534.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 08:13:07,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 08:13:07,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.15 | bwd_microstep: 1472.91 | bwd_inner_microstep: 1464.84 | bwd_allreduce_microstep: 8.03 | step_microstep: 38.63
[2024-06-10 08:13:07,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16791.16 | bwd: 45046.66 | bwd_inner: 45037.56 | bwd_allreduce: 8.34 | step: 42.55
{'loss': 1.2959, 'learning_rate': 3.502751197873927e-05, 'epoch': 0.25}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 08:13:09,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1243.64 | bwd_inner_microstep: 1243.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 08:13:11,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1383.97 | bwd_inner_microstep: 1383.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 08:13:12,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.87 | bwd_microstep: 790.31 | bwd_inner_microstep: 790.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 08:13:14,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 1552.91 | bwd_inner_microstep: 1552.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 08:13:16,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.22 | bwd_microstep: 1386.02 | bwd_inner_microstep: 1386.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 08:13:18,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.15 | bwd_microstep: 1385.69 | bwd_inner_microstep: 1385.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2247
[2024-06-10 08:13:19,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.01 | bwd_microstep: 868.89 | bwd_inner_microstep: 868.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3730
[2024-06-10 08:13:21,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1438.56 | bwd_inner_microstep: 1438.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2450
[2024-06-10 08:13:22,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.08 | bwd_microstep: 919.78 | bwd_inner_microstep: 919.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 08:13:24,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.18 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705
[2024-06-10 08:13:26,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.88 | bwd_microstep: 1628.32 | bwd_inner_microstep: 1628.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3412
[2024-06-10 08:13:28,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.89 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 08:13:30,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.56 | bwd_microstep: 1286.25 | bwd_inner_microstep: 1286.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3446
[2024-06-10 08:13:32,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.26 | bwd_microstep: 1301.14 | bwd_inner_microstep: 1301.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 08:13:33,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.87 | bwd_microstep: 1281.86 | bwd_inner_microstep: 1281.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3526
[2024-06-10 08:13:35,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.36 | bwd_microstep: 1436.41 | bwd_inner_microstep: 1436.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 08:13:37,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.00 | bwd_microstep: 1318.37 | bwd_inner_microstep: 1318.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 647
[2024-06-10 08:13:38,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.76 | bwd_microstep: 274.48 | bwd_inner_microstep: 274.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 08:13:40,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.44 | bwd_microstep: 1404.48 | bwd_inner_microstep: 1404.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 08:13:41,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1297.18 | bwd_inner_microstep: 1297.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 08:13:43,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.77 | bwd_microstep: 1404.47 | bwd_inner_microstep: 1404.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 08:13:45,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1406.49 | bwd_inner_microstep: 1406.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 08:13:47,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1393.58 | bwd_inner_microstep: 1393.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545
[2024-06-10 08:13:49,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.75 | bwd_microstep: 1327.71 | bwd_inner_microstep: 1327.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1914
[2024-06-10 08:13:50,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.31 | bwd_microstep: 749.81 | bwd_inner_microstep: 749.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626
[2024-06-10 08:13:52,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1604.87 | bwd_inner_microstep: 1604.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3572
[2024-06-10 08:13:54,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.98 | bwd_microstep: 1565.67 | bwd_inner_microstep: 1565.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3769
[2024-06-10 08:13:57,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.25 | bwd_microstep: 1604.82 | bwd_inner_microstep: 1604.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3813
[2024-06-10 08:13:59,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.26 | bwd_microstep: 1817.20 | bwd_inner_microstep: 1817.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 08:14:01,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.65 | bwd_microstep: 1560.01 | bwd_inner_microstep: 1559.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 08:14:03,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1638.06 | bwd_inner_microstep: 1638.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2264
[2024-06-10 08:14:08,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 08:14:08,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.99 | bwd_microstep: 3869.61 | bwd_inner_microstep: 1174.72 | bwd_allreduce_microstep: 2694.84 | step_microstep: 38.16
[2024-06-10 08:14:08,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15658.13 | bwd: 44612.07 | bwd_inner: 41916.32 | bwd_allreduce: 2695.07 | step: 39.72
{'loss': 1.2882, 'learning_rate': 3.500271781513539e-05, 'epoch': 0.25}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 08:14:10,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1280.82 | bwd_inner_microstep: 1280.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 08:14:11,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.69 | bwd_microstep: 1152.48 | bwd_inner_microstep: 1152.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842
[2024-06-10 08:14:13,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.16 | bwd_microstep: 1557.91 | bwd_inner_microstep: 1557.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-10 08:14:15,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.62 | bwd_microstep: 1277.46 | bwd_inner_microstep: 1277.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 08:14:17,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.55 | bwd_microstep: 1383.96 | bwd_inner_microstep: 1383.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4128
[2024-06-10 08:14:19,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.52 | bwd_microstep: 1601.16 | bwd_inner_microstep: 1601.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 08:14:21,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1483.63 | bwd_inner_microstep: 1483.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 08:14:22,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.14 | bwd_microstep: 794.02 | bwd_inner_microstep: 793.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 08:14:23,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.19 | bwd_microstep: 796.25 | bwd_inner_microstep: 796.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3797
[2024-06-10 08:14:26,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.57 | bwd_microstep: 1746.16 | bwd_inner_microstep: 1746.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3412
[2024-06-10 08:14:28,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.02 | bwd_microstep: 1444.40 | bwd_inner_microstep: 1444.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 08:14:30,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.99 | bwd_microstep: 1485.76 | bwd_inner_microstep: 1485.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3434
[2024-06-10 08:14:32,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.89 | bwd_microstep: 1542.04 | bwd_inner_microstep: 1542.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2871
[2024-06-10 08:14:34,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.64 | bwd_microstep: 1080.07 | bwd_inner_microstep: 1080.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-10 08:14:35,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1315.56 | bwd_inner_microstep: 1315.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3411
[2024-06-10 08:14:37,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.55 | bwd_microstep: 1299.89 | bwd_inner_microstep: 1299.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3497
[2024-06-10 08:14:39,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.47 | bwd_microstep: 1367.49 | bwd_inner_microstep: 1367.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 08:14:41,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.15 | bwd_microstep: 1617.08 | bwd_inner_microstep: 1617.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-10 08:14:43,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.48 | bwd_microstep: 1312.80 | bwd_inner_microstep: 1312.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 08:14:45,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.73 | bwd_microstep: 1165.18 | bwd_inner_microstep: 1165.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2083
[2024-06-10 08:14:46,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.15 | bwd_microstep: 759.49 | bwd_inner_microstep: 759.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2103
[2024-06-10 08:14:47,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.58 | bwd_microstep: 827.83 | bwd_inner_microstep: 827.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1918
[2024-06-10 08:14:48,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.97 | bwd_microstep: 720.97 | bwd_inner_microstep: 720.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 08:14:50,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1395.98 | bwd_inner_microstep: 1395.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 08:14:52,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.92 | bwd_microstep: 1355.63 | bwd_inner_microstep: 1355.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3679
[2024-06-10 08:14:54,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1327.82 | bwd_inner_microstep: 1327.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 08:14:56,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.03 | bwd_microstep: 1461.70 | bwd_inner_microstep: 1461.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3726
[2024-06-10 08:14:58,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1469.73 | bwd_inner_microstep: 1469.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736
[2024-06-10 08:15:00,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.78 | bwd_microstep: 1637.98 | bwd_inner_microstep: 1637.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291
[2024-06-10 08:15:01,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.16 | bwd_microstep: 913.85 | bwd_inner_microstep: 913.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3816
[2024-06-10 08:15:04,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.65 | bwd_microstep: 1725.21 | bwd_inner_microstep: 1725.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3819
[2024-06-10 08:15:47,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-10 08:15:47,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.81 | bwd_microstep: 43331.91 | bwd_inner_microstep: 1617.81 | bwd_allreduce_microstep: 41714.03 | step_microstep: 38.95
[2024-06-10 08:15:47,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15635.64 | bwd: 83632.24 | bwd_inner: 41917.25 | bwd_allreduce: 41714.26 | step: 40.71
{'loss': 1.2628, 'learning_rate': 3.4977870812037355e-05, 'epoch': 0.25}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 08:15:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1473.23 | bwd_inner_microstep: 1473.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 08:15:51,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.67 | bwd_microstep: 1238.66 | bwd_inner_microstep: 1238.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3909
[2024-06-10 08:15:53,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.90 | bwd_microstep: 1514.82 | bwd_inner_microstep: 1514.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 08:15:55,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.98 | bwd_microstep: 1478.80 | bwd_inner_microstep: 1478.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 08:15:57,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.41 | bwd_microstep: 1274.29 | bwd_inner_microstep: 1274.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 08:15:59,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.71 | bwd_microstep: 1276.62 | bwd_inner_microstep: 1276.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 08:16:01,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1338.56 | bwd_inner_microstep: 1338.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 08:16:02,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075
[2024-06-10 08:16:04,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.59 | bwd_microstep: 817.68 | bwd_inner_microstep: 817.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4119
[2024-06-10 08:16:06,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.96 | bwd_microstep: 1639.69 | bwd_inner_microstep: 1639.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3717
[2024-06-10 08:16:08,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.96 | bwd_microstep: 1590.00 | bwd_inner_microstep: 1589.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3985
[2024-06-10 08:16:10,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.26 | bwd_microstep: 1594.73 | bwd_inner_microstep: 1594.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3698
[2024-06-10 08:16:13,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.41 | bwd_microstep: 1720.81 | bwd_inner_microstep: 1720.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 08:16:15,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.51 | bwd_microstep: 1516.51 | bwd_inner_microstep: 1516.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3653
[2024-06-10 08:16:17,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.14 | bwd_microstep: 1710.03 | bwd_inner_microstep: 1710.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 08:16:19,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.20 | bwd_microstep: 1447.78 | bwd_inner_microstep: 1447.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2617
[2024-06-10 08:16:21,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.96 | bwd_microstep: 1044.47 | bwd_inner_microstep: 1044.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 08:16:23,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1506.17 | bwd_inner_microstep: 1506.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1978
[2024-06-10 08:16:24,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.99 | bwd_microstep: 734.79 | bwd_inner_microstep: 734.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3541
[2024-06-10 08:16:25,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.85 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 08:16:28,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.53 | bwd_microstep: 1596.44 | bwd_inner_microstep: 1596.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 08:16:30,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.19 | bwd_microstep: 1355.27 | bwd_inner_microstep: 1355.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288
[2024-06-10 08:16:31,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.02 | bwd_microstep: 878.84 | bwd_inner_microstep: 878.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-10 08:16:33,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.14 | bwd_microstep: 1304.02 | bwd_inner_microstep: 1303.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 08:16:34,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1388.25 | bwd_inner_microstep: 1388.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 08:16:36,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.18 | bwd_microstep: 1189.67 | bwd_inner_microstep: 1189.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 08:16:38,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1380.12 | bwd_inner_microstep: 1380.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3728
[2024-06-10 08:16:40,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.77 | bwd_microstep: 1462.51 | bwd_inner_microstep: 1462.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3401
[2024-06-10 08:16:42,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1388.52 | bwd_inner_microstep: 1388.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-10 08:16:44,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.98 | bwd_microstep: 1630.18 | bwd_inner_microstep: 1630.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 08:16:46,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.70 | bwd_microstep: 1348.44 | bwd_inner_microstep: 1348.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3793
[2024-06-10 08:16:48,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.66
[2024-06-10 08:16:48,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.30 | bwd_microstep: 1535.26 | bwd_inner_microstep: 1527.33 | bwd_allreduce_microstep: 7.88 | step_microstep: 37.75
[2024-06-10 08:16:48,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16445.52 | bwd: 44010.89 | bwd_inner: 44002.08 | bwd_allreduce: 8.11 | step: 39.36
{'loss': 1.2565, 'learning_rate': 3.4952971056956186e-05, 'epoch': 0.25}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3538
[2024-06-10 08:16:50,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1586.83 | bwd_inner_microstep: 1586.75 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3983
[2024-06-10 08:16:53,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.55 | bwd_microstep: 1605.15 | bwd_inner_microstep: 1605.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 08:16:55,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.51 | bwd_microstep: 1479.57 | bwd_inner_microstep: 1479.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 08:16:57,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1556.45 | bwd_inner_microstep: 1556.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 782
[2024-06-10 08:16:57,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 127.75 | bwd_microstep: 311.82 | bwd_inner_microstep: 311.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3758
[2024-06-10 08:17:00,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.01 | bwd_microstep: 1634.99 | bwd_inner_microstep: 1634.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 08:17:01,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.42 | bwd_microstep: 1187.42 | bwd_inner_microstep: 1187.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3525
[2024-06-10 08:17:03,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1438.94 | bwd_inner_microstep: 1438.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 08:17:05,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.82 | bwd_microstep: 1614.00 | bwd_inner_microstep: 1613.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663
[2024-06-10 08:17:08,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.92 | bwd_microstep: 1615.38 | bwd_inner_microstep: 1615.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680
[2024-06-10 08:17:10,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.65 | bwd_microstep: 1623.54 | bwd_inner_microstep: 1623.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3386
[2024-06-10 08:17:12,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.34 | bwd_microstep: 1272.63 | bwd_inner_microstep: 1272.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709
[2024-06-10 08:17:14,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.52 | bwd_microstep: 1619.43 | bwd_inner_microstep: 1619.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 08:17:16,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.48 | bwd_microstep: 1348.72 | bwd_inner_microstep: 1348.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466
[2024-06-10 08:17:18,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.10 | bwd_microstep: 1573.03 | bwd_inner_microstep: 1573.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3651
[2024-06-10 08:17:20,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.44 | bwd_microstep: 1545.58 | bwd_inner_microstep: 1545.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3468
[2024-06-10 08:17:22,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.61 | bwd_microstep: 1244.50 | bwd_inner_microstep: 1244.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526
[2024-06-10 08:17:23,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.05 | bwd_microstep: 1196.96 | bwd_inner_microstep: 1196.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3442
[2024-06-10 08:17:25,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.68 | bwd_microstep: 1379.80 | bwd_inner_microstep: 1379.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-10 08:17:26,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.88 | bwd_microstep: 695.92 | bwd_inner_microstep: 695.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 08:17:28,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1374.16 | bwd_inner_microstep: 1374.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 08:17:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.36 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 08:17:32,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1400.07 | bwd_inner_microstep: 1400.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 08:17:34,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.40 | bwd_microstep: 1554.71 | bwd_inner_microstep: 1554.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 08:17:36,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.52 | bwd_microstep: 1381.02 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2170
[2024-06-10 08:17:37,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.80 | bwd_microstep: 886.32 | bwd_inner_microstep: 886.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508
[2024-06-10 08:17:39,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1415.02 | bwd_inner_microstep: 1415.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 08:17:41,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1379.89 | bwd_inner_microstep: 1379.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014
[2024-06-10 08:17:42,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.69 | bwd_microstep: 898.38 | bwd_inner_microstep: 898.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3815
[2024-06-10 08:17:45,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.37 | bwd_microstep: 1854.71 | bwd_inner_microstep: 1854.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 08:17:47,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1549.94 | bwd_inner_microstep: 1549.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3435
[2024-06-10 08:17:50,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 08:17:50,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.43 | bwd_microstep: 2736.55 | bwd_inner_microstep: 1340.26 | bwd_allreduce_microstep: 1396.24 | step_microstep: 37.88
[2024-06-10 08:17:50,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16343.49 | bwd: 45245.94 | bwd_inner: 43848.72 | bwd_allreduce: 1396.52 | step: 39.58
 25%|██▌       | 435/1726 [7:34:42<21:48:34, 60.82s/it]
 25%|██▌       | 436/1726 [7:35:44<21:56:27, 61.23s/it]
 25%|██▌       | 437/1726 [7:36:45<21:51:28, 61.05s/it]
 25%|██▌       | 438/1726 [7:38:24<25:58:55, 72.62s/it]
 25%|██▌       | 439/1726 [7:39:25<24:41:41, 69.08s/it]
 25%|██▌       | 440/1726 [7:40:27<23:54:45, 66.94s/it]
{'loss': 1.3234, 'learning_rate': 3.492801863758868e-05, 'epoch': 0.25}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 08:17:52,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.65 | bwd_microstep: 1472.35 | bwd_inner_microstep: 1472.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3938
[2024-06-10 08:17:54,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 1592.54 | bwd_inner_microstep: 1592.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 08:17:56,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1249.81 | bwd_inner_microstep: 1249.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 08:17:58,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1483.68 | bwd_inner_microstep: 1483.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 08:18:00,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1284.17 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 08:18:02,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.10 | bwd_microstep: 1541.04 | bwd_inner_microstep: 1541.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783
[2024-06-10 08:18:04,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.23 | bwd_microstep: 1446.83 | bwd_inner_microstep: 1446.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493
[2024-06-10 08:18:06,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.93 | bwd_microstep: 1192.20 | bwd_inner_microstep: 1192.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896
[2024-06-10 08:18:08,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.62 | bwd_microstep: 1583.72 | bwd_inner_microstep: 1583.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 08:18:10,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1389.13 | bwd_inner_microstep: 1389.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1886
[2024-06-10 08:18:11,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.22 | bwd_microstep: 773.30 | bwd_inner_microstep: 773.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 08:18:13,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.84 | bwd_microstep: 1521.15 | bwd_inner_microstep: 1521.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3639
[2024-06-10 08:18:15,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1314.32 | bwd_inner_microstep: 1314.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 08:18:17,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.02 | bwd_microstep: 1449.39 | bwd_inner_microstep: 1449.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 08:18:19,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1496.52 | bwd_inner_microstep: 1496.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3871
[2024-06-10 08:18:21,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.34 | bwd_microstep: 1558.84 | bwd_inner_microstep: 1558.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 08:18:23,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.41 | bwd_microstep: 1491.14 | bwd_inner_microstep: 1491.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 08:18:25,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1406.63 | bwd_inner_microstep: 1406.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1272
[2024-06-10 08:18:26,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 187.84 | bwd_microstep: 487.58 | bwd_inner_microstep: 487.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 08:18:28,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 08:18:30,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1557.01 | bwd_inner_microstep: 1556.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 08:18:32,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.91 | bwd_microstep: 1293.88 | bwd_inner_microstep: 1293.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 08:18:34,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1498.40 | bwd_inner_microstep: 1498.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3730
[2024-06-10 08:18:35,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1304.06 | bwd_inner_microstep: 1304.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510
[2024-06-10 08:18:37,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1347.90 | bwd_inner_microstep: 1347.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2724
[2024-06-10 08:18:39,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.21 | bwd_microstep: 1134.35 | bwd_inner_microstep: 1134.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 08:18:41,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.83 | bwd_microstep: 1248.80 | bwd_inner_microstep: 1248.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1110
[2024-06-10 08:18:41,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.73 | bwd_microstep: 442.42 | bwd_inner_microstep: 442.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3577
[2024-06-10 08:18:43,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.77 | bwd_microstep: 1558.70 | bwd_inner_microstep: 1558.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 08:18:46,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.95 | bwd_microstep: 1655.29 | bwd_inner_microstep: 1655.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3582
[2024-06-10 08:18:48,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.90 | bwd_microstep: 1661.36 | bwd_inner_microstep: 1661.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-10 08:18:50,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.19 | optimizer_step: 6.63
[2024-06-10 08:18:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.55 | bwd_microstep: 1698.17 | bwd_inner_microstep: 1690.36 | bwd_allreduce_microstep: 7.76 | step_microstep: 37.60
[2024-06-10 08:18:50,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16251.33 | bwd: 43520.92 | bwd_inner: 43512.26 | bwd_allreduce: 7.99 | step: 39.19
{'loss': 1.2654, 'learning_rate': 3.490301364181714e-05, 'epoch': 0.26}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 08:18:52,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1276.48 | bwd_inner_microstep: 1276.39 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3398
[2024-06-10 08:18:54,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.04 | bwd_microstep: 1151.03 | bwd_inner_microstep: 1151.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904
[2024-06-10 08:18:56,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1585.32 | bwd_inner_microstep: 1585.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-10 08:18:58,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.23 | bwd_microstep: 1560.48 | bwd_inner_microstep: 1560.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297
[2024-06-10 08:18:59,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.97 | bwd_microstep: 972.81 | bwd_inner_microstep: 972.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 08:19:01,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.38 | bwd_microstep: 1376.34 | bwd_inner_microstep: 1376.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869
[2024-06-10 08:19:02,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.27 | bwd_microstep: 707.64 | bwd_inner_microstep: 707.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 08:19:03,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.97 | bwd_microstep: 799.24 | bwd_inner_microstep: 799.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3696
[2024-06-10 08:19:06,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.94 | bwd_microstep: 1656.98 | bwd_inner_microstep: 1656.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3442
[2024-06-10 08:19:07,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.28 | bwd_microstep: 1317.14 | bwd_inner_microstep: 1317.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 08:19:09,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.10 | bwd_microstep: 1278.39 | bwd_inner_microstep: 1278.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 08:19:11,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.87 | bwd_microstep: 1445.46 | bwd_inner_microstep: 1445.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3491
[2024-06-10 08:19:13,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1506.94 | bwd_inner_microstep: 1506.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1966
[2024-06-10 08:19:14,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.53 | bwd_microstep: 762.58 | bwd_inner_microstep: 762.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1975
[2024-06-10 08:19:15,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.40 | bwd_microstep: 705.10 | bwd_inner_microstep: 705.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3843
[2024-06-10 08:19:17,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1365.22 | bwd_inner_microstep: 1365.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2735
[2024-06-10 08:19:19,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.74 | bwd_microstep: 944.54 | bwd_inner_microstep: 944.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2082
[2024-06-10 08:19:20,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.23 | bwd_microstep: 916.99 | bwd_inner_microstep: 916.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 08:19:22,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1287.10 | bwd_inner_microstep: 1287.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 08:19:24,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1516.95 | bwd_inner_microstep: 1516.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 537
[2024-06-10 08:19:24,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 97.49 | bwd_microstep: 245.33 | bwd_inner_microstep: 245.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2092
[2024-06-10 08:19:25,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.56 | bwd_microstep: 790.75 | bwd_inner_microstep: 790.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 08:19:26,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.59 | bwd_microstep: 700.00 | bwd_inner_microstep: 699.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1905
[2024-06-10 08:19:27,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.45 | bwd_microstep: 686.59 | bwd_inner_microstep: 686.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2183
[2024-06-10 08:19:28,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.25 | bwd_microstep: 808.76 | bwd_inner_microstep: 808.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2011
[2024-06-10 08:19:29,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.94 | bwd_microstep: 841.00 | bwd_inner_microstep: 840.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 08:19:31,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1383.40 | bwd_inner_microstep: 1383.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3769
[2024-06-10 08:19:33,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.80 | bwd_microstep: 1570.30 | bwd_inner_microstep: 1570.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 08:19:36,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1498.01 | bwd_inner_microstep: 1497.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-10 08:19:37,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.23 | bwd_microstep: 973.40 | bwd_inner_microstep: 973.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3570
[2024-06-10 08:19:39,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1524.23 | bwd_inner_microstep: 1524.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 08:19:51,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.37 | optimizer_step: 6.63
[2024-06-10 08:19:51,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 11121.23 | bwd_inner_microstep: 1749.39 | bwd_allreduce_microstep: 9371.77 | step_microstep: 39.03
[2024-06-10 08:19:51,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 13809.40 | bwd: 46275.79 | bwd_inner: 36903.02 | bwd_allreduce: 9372.06 | step: 40.86
{'loss': 1.2614, 'learning_rate': 3.4877956157709024e-05, 'epoch': 0.26}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 08:19:53,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.19 | bwd_microstep: 1375.62 | bwd_inner_microstep: 1375.43 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 08:19:55,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.93 | bwd_microstep: 1474.23 | bwd_inner_microstep: 1474.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3818
[2024-06-10 08:19:57,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.75 | bwd_microstep: 1511.21 | bwd_inner_microstep: 1511.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 08:19:59,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.88 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752
[2024-06-10 08:20:01,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.20 | bwd_microstep: 1535.03 | bwd_inner_microstep: 1535.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 08:20:03,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.30 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733
[2024-06-10 08:20:05,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.10 | bwd_microstep: 1428.66 | bwd_inner_microstep: 1428.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-10 08:20:06,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.27 | bwd_microstep: 1190.39 | bwd_inner_microstep: 1190.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2373
[2024-06-10 08:20:08,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.42 | bwd_microstep: 998.52 | bwd_inner_microstep: 998.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3589
[2024-06-10 08:20:10,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.23 | bwd_microstep: 1366.16 | bwd_inner_microstep: 1366.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2184
[2024-06-10 08:20:11,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.19 | bwd_microstep: 951.13 | bwd_inner_microstep: 951.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 08:20:13,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.68 | bwd_microstep: 1480.99 | bwd_inner_microstep: 1480.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-10 08:20:15,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.90 | bwd_microstep: 1610.85 | bwd_inner_microstep: 1610.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3656
[2024-06-10 08:20:18,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.56 | bwd_microstep: 1819.89 | bwd_inner_microstep: 1819.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 08:20:20,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1471.94 | bwd_inner_microstep: 1471.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 08:20:22,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1294.41 | bwd_inner_microstep: 1294.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2032
[2024-06-10 08:20:23,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.48 | bwd_microstep: 838.39 | bwd_inner_microstep: 838.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611
[2024-06-10 08:20:25,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.99 | bwd_microstep: 1610.37 | bwd_inner_microstep: 1610.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699
[2024-06-10 08:20:27,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1332.45 | bwd_inner_microstep: 1332.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3522
[2024-06-10 08:20:29,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1323.37 | bwd_inner_microstep: 1323.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 08:20:31,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1509.90 | bwd_inner_microstep: 1509.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555
[2024-06-10 08:20:32,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.25 | bwd_microstep: 1202.26 | bwd_inner_microstep: 1202.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 08:20:34,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.89 | bwd_microstep: 879.31 | bwd_inner_microstep: 879.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450
[2024-06-10 08:20:35,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.07 | bwd_microstep: 1284.52 | bwd_inner_microstep: 1284.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 08:20:37,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1396.09 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 08:20:39,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1351.11 | bwd_inner_microstep: 1351.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812
[2024-06-10 08:20:41,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.11 | bwd_microstep: 1357.55 | bwd_inner_microstep: 1357.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 08:20:43,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1560.66 | bwd_inner_microstep: 1560.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2024
[2024-06-10 08:20:44,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.31 | bwd_microstep: 846.60 | bwd_inner_microstep: 846.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 08:20:46,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1404.26 | bwd_inner_microstep: 1404.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3592
[2024-06-10 08:20:48,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.21 | bwd_microstep: 1439.07 | bwd_inner_microstep: 1439.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 08:20:50,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 08:20:50,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.14 | bwd_microstep: 990.03 | bwd_inner_microstep: 816.64 | bwd_allreduce_microstep: 173.32 | step_microstep: 39.08
[2024-06-10 08:20:50,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15880.34 | bwd: 42599.71 | bwd_inner: 42425.32 | bwd_allreduce: 173.63 | step: 40.77
{'loss': 1.3337, 'learning_rate': 3.485284627351667e-05, 'epoch': 0.26}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 08:20:51,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.73 | bwd_microstep: 1142.21 | bwd_inner_microstep: 1142.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4055
[2024-06-10 08:20:53,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.35 | bwd_microstep: 1618.26 | bwd_inner_microstep: 1618.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 08:20:55,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1245.21 | bwd_inner_microstep: 1245.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 08:20:57,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1491.66 | bwd_inner_microstep: 1491.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-10 08:20:59,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.75 | bwd_microstep: 1662.38 | bwd_inner_microstep: 1662.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 08:21:01,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.09 | bwd_microstep: 1151.16 | bwd_inner_microstep: 1151.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 08:21:03,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.18 | bwd_microstep: 1532.61 | bwd_inner_microstep: 1532.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443
[2024-06-10 08:21:05,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.53 | bwd_microstep: 1284.31 | bwd_inner_microstep: 1284.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 08:21:07,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.77 | bwd_microstep: 1281.69 | bwd_inner_microstep: 1281.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 08:21:09,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1388.33 | bwd_inner_microstep: 1388.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 08:21:10,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.37 | bwd_microstep: 797.34 | bwd_inner_microstep: 797.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 08:21:11,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1258.21 | bwd_inner_microstep: 1258.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-10 08:21:13,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1287.18 | bwd_inner_microstep: 1287.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514
[2024-06-10 08:21:15,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.57 | bwd_microstep: 1449.77 | bwd_inner_microstep: 1449.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 08:21:17,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.90 | bwd_microstep: 1351.61 | bwd_inner_microstep: 1351.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2938
[2024-06-10 08:21:19,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.32 | bwd_microstep: 1181.09 | bwd_inner_microstep: 1181.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3481
[2024-06-10 08:21:21,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.36 | bwd_microstep: 1427.77 | bwd_inner_microstep: 1427.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 08:21:23,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.55 | bwd_microstep: 1500.53 | bwd_inner_microstep: 1500.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3827
[2024-06-10 08:21:25,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.01 | bwd_microstep: 1754.69 | bwd_inner_microstep: 1754.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 08:21:27,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 1382.20 | bwd_inner_microstep: 1382.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-10 08:21:29,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.90 | bwd_microstep: 1288.30 | bwd_inner_microstep: 1288.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3737
[2024-06-10 08:21:31,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1336.24 | bwd_inner_microstep: 1336.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3565
[2024-06-10 08:21:33,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.18 | bwd_microstep: 1237.39 | bwd_inner_microstep: 1237.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3734
[2024-06-10 08:21:34,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1242.15 | bwd_inner_microstep: 1242.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2409
[2024-06-10 08:21:36,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.58 | bwd_microstep: 1135.69 | bwd_inner_microstep: 1135.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3611
[2024-06-10 08:21:38,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.39 | bwd_microstep: 1539.85 | bwd_inner_microstep: 1539.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2201
[2024-06-10 08:21:39,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.90 | bwd_microstep: 955.68 | bwd_inner_microstep: 955.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3411
[2024-06-10 08:21:41,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.26 | bwd_microstep: 1370.23 | bwd_inner_microstep: 1370.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 08:21:43,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.04 | bwd_microstep: 1648.43 | bwd_inner_microstep: 1648.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 08:21:45,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1399.38 | bwd_inner_microstep: 1399.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 08:21:47,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1508.64 | bwd_inner_microstep: 1508.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 08:21:53,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 08:21:53,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.24 | bwd_microstep: 4599.69 | bwd_inner_microstep: 1718.78 | bwd_allreduce_microstep: 2880.86 | step_microstep: 38.09
[2024-06-10 08:21:53,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16255.97 | bwd: 46449.91 | bwd_inner: 43568.14 | bwd_allreduce: 2881.09 | step: 39.66
{'loss': 1.2171, 'learning_rate': 3.482768407767695e-05, 'epoch': 0.26}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461
[2024-06-10 08:21:55,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.42 | bwd_microstep: 1564.12 | bwd_inner_microstep: 1563.92 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 08:21:57,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1348.59 | bwd_inner_microstep: 1348.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875
[2024-06-10 08:21:59,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.17 | bwd_microstep: 1577.32 | bwd_inner_microstep: 1577.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 08:22:01,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.29 | bwd_microstep: 1240.34 | bwd_inner_microstep: 1240.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2452
[2024-06-10 08:22:02,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.99 | bwd_microstep: 1042.02 | bwd_inner_microstep: 1041.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 08:22:04,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.13 | bwd_inner_microstep: 1283.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998
[2024-06-10 08:22:05,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.70 | bwd_microstep: 707.33 | bwd_inner_microstep: 707.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2199
[2024-06-10 08:22:06,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.88 | bwd_microstep: 764.86 | bwd_inner_microstep: 764.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 08:22:08,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1246.71 | bwd_inner_microstep: 1246.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 08:22:09,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1248.86 | bwd_inner_microstep: 1248.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-10 08:22:11,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.53 | bwd_microstep: 1419.10 | bwd_inner_microstep: 1419.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 08:22:12,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.55 | bwd_microstep: 793.62 | bwd_inner_microstep: 793.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 08:22:14,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1478.18 | bwd_inner_microstep: 1478.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3933
[2024-06-10 08:22:17,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.58 | bwd_microstep: 1557.40 | bwd_inner_microstep: 1557.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2305
[2024-06-10 08:22:18,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.10 | bwd_microstep: 1077.73 | bwd_inner_microstep: 1077.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 08:22:20,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1347.89 | bwd_inner_microstep: 1347.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 08:22:22,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1509.38 | bwd_inner_microstep: 1509.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3613
[2024-06-10 08:22:24,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.13 | bwd_microstep: 1466.53 | bwd_inner_microstep: 1466.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-10 08:22:26,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.62 | bwd_microstep: 1319.15 | bwd_inner_microstep: 1319.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 08:22:28,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.24 | bwd_microstep: 1451.67 | bwd_inner_microstep: 1451.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 08:22:29,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.00 | bwd_microstep: 977.11 | bwd_inner_microstep: 977.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1980
[2024-06-10 08:22:30,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.72 | bwd_microstep: 704.76 | bwd_inner_microstep: 704.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 08:22:32,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1390.22 | bwd_inner_microstep: 1390.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 08:22:34,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.98 | bwd_microstep: 1659.25 | bwd_inner_microstep: 1659.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-10 08:22:37,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.86 | bwd_microstep: 1595.88 | bwd_inner_microstep: 1595.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 08:22:39,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.39 | bwd_microstep: 1645.35 | bwd_inner_microstep: 1645.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2267
[2024-06-10 08:22:40,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.70 | bwd_microstep: 810.71 | bwd_inner_microstep: 810.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 08:22:42,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.80 | bwd_microstep: 1486.44 | bwd_inner_microstep: 1486.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2916
[2024-06-10 08:22:44,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.45 | bwd_microstep: 1190.28 | bwd_inner_microstep: 1190.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2236
[2024-06-10 08:22:45,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.44 | bwd_microstep: 864.84 | bwd_inner_microstep: 864.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 08:22:47,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.89 | bwd_microstep: 1294.17 | bwd_inner_microstep: 1294.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 08:22:53,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 08:22:53,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.27 | bwd_microstep: 5682.22 | bwd_inner_microstep: 786.28 | bwd_allreduce_microstep: 4895.89 | step_microstep: 37.97
[2024-06-10 08:22:53,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14934.13 | bwd: 44745.18 | bwd_inner: 39848.24 | bwd_allreduce: 4896.19 | step: 39.79
{'loss': 1.3222, 'learning_rate': 3.4802469658810984e-05, 'epoch': 0.26}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3378
[2024-06-10 08:22:54,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.77 | bwd_microstep: 1232.88 | bwd_inner_microstep: 1232.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 08:22:56,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1377.12 | bwd_inner_microstep: 1377.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3907
[2024-06-10 08:22:58,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1493.51 | bwd_inner_microstep: 1493.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843
[2024-06-10 08:23:00,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1455.21 | bwd_inner_microstep: 1455.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 08:23:02,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.13 | bwd_microstep: 1447.76 | bwd_inner_microstep: 1447.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 08:23:04,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.25 | bwd_microstep: 1251.03 | bwd_inner_microstep: 1251.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 08:23:06,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.93 | bwd_microstep: 1147.43 | bwd_inner_microstep: 1147.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062
[2024-06-10 08:23:07,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.95 | bwd_microstep: 816.36 | bwd_inner_microstep: 816.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 08:23:09,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.39 | bwd_microstep: 1292.49 | bwd_inner_microstep: 1292.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1978
[2024-06-10 08:23:10,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.79 | bwd_microstep: 704.47 | bwd_inner_microstep: 704.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3432
[2024-06-10 08:23:11,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1281.66 | bwd_inner_microstep: 1281.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 08:23:13,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1338.93 | bwd_inner_microstep: 1338.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3775
[2024-06-10 08:23:15,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.02 | bwd_microstep: 1568.59 | bwd_inner_microstep: 1568.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 08:23:17,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.56 | bwd_microstep: 1385.04 | bwd_inner_microstep: 1385.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-10 08:23:19,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1520.16 | bwd_inner_microstep: 1520.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 08:23:21,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.72 | bwd_microstep: 1301.74 | bwd_inner_microstep: 1301.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3635
[2024-06-10 08:23:24,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.38 | bwd_microstep: 1711.45 | bwd_inner_microstep: 1711.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 08:23:25,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.79 | bwd_microstep: 1277.83 | bwd_inner_microstep: 1277.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 08:23:27,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.94 | bwd_microstep: 1495.66 | bwd_inner_microstep: 1495.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 08:23:30,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.65 | bwd_microstep: 1659.72 | bwd_inner_microstep: 1659.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3931
[2024-06-10 08:23:32,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1495.42 | bwd_inner_microstep: 1495.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 08:23:34,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.06 | bwd_microstep: 1528.32 | bwd_inner_microstep: 1528.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3810
[2024-06-10 08:23:36,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.09 | bwd_microstep: 1582.20 | bwd_inner_microstep: 1582.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972
[2024-06-10 08:23:37,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.69 | bwd_microstep: 704.85 | bwd_inner_microstep: 704.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 08:23:39,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1495.93 | bwd_inner_microstep: 1495.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 08:23:41,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.13 | bwd_microstep: 1397.58 | bwd_inner_microstep: 1397.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 08:23:43,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.36 | bwd_microstep: 1453.07 | bwd_inner_microstep: 1453.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3422
[2024-06-10 08:23:45,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.59 | bwd_microstep: 1198.71 | bwd_inner_microstep: 1198.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 08:23:47,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.62 | bwd_microstep: 1632.93 | bwd_inner_microstep: 1632.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 08:23:49,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.00 | bwd_microstep: 1487.29 | bwd_inner_microstep: 1487.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 08:23:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.84 | bwd_microstep: 1547.17 | bwd_inner_microstep: 1547.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 08:23:55,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.46 | optimizer_step: 6.60
[2024-06-10 08:23:55,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.38 | bwd_microstep: 3025.75 | bwd_inner_microstep: 1574.59 | bwd_allreduce_microstep: 1451.09 | step_microstep: 43.03
[2024-06-10 08:23:55,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16361.87 | bwd: 45308.33 | bwd_inner: 43856.30 | bwd_allreduce: 1451.33 | step: 44.79


 25%|██▌       | 440/1726 [7:40:27<23:54:45, 66.94s/it]
 26%|██▌       | 441/1726 [7:41:27<23:09:52, 64.90s/it]
 26%|██▌       | 442/1726 [7:42:27<22:40:05, 63.56s/it]
 26%|██▌       | 443/1726 [7:43:26<22:08:39, 62.14s/it]
 26%|██▌       | 444/1726 [7:44:29<22:13:28, 62.41s/it]
 26%|██▌       | 445/1726 [7:45:29<21:57:09, 61.69s/it]
 26%|██▌       | 446/1726 [7:46:31<21:58
{'loss': 1.2795, 'learning_rate': 3.477720310572383e-05, 'epoch': 0.26}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 08:23:57,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.22 | bwd_microstep: 1489.72 | bwd_inner_microstep: 1489.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2338
[2024-06-10 08:23:58,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.57 | bwd_microstep: 984.11 | bwd_inner_microstep: 984.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 08:24:00,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.72 | bwd_microstep: 1479.38 | bwd_inner_microstep: 1479.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 08:24:02,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.88 | bwd_microstep: 1475.58 | bwd_inner_microstep: 1475.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 08:24:03,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.03 | bwd_microstep: 787.21 | bwd_inner_microstep: 787.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2705
[2024-06-10 08:24:05,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.71 | bwd_microstep: 1032.34 | bwd_inner_microstep: 1032.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 08:24:07,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.60 | bwd_microstep: 1296.70 | bwd_inner_microstep: 1296.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 08:24:08,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1313.01 | bwd_inner_microstep: 1312.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1454
[2024-06-10 08:24:09,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 210.12 | bwd_microstep: 540.34 | bwd_inner_microstep: 540.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869
[2024-06-10 08:24:10,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.58 | bwd_microstep: 711.89 | bwd_inner_microstep: 711.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3668
[2024-06-10 08:24:13,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.06 | bwd_microstep: 1772.85 | bwd_inner_microstep: 1772.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 08:24:14,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.75 | bwd_microstep: 1408.37 | bwd_inner_microstep: 1408.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3712
[2024-06-10 08:24:17,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.21 | bwd_microstep: 1779.04 | bwd_inner_microstep: 1779.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 08:24:19,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1379.98 | bwd_inner_microstep: 1379.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 08:24:21,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1556.18 | bwd_inner_microstep: 1556.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 08:24:23,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.57 | bwd_microstep: 1296.51 | bwd_inner_microstep: 1296.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 08:24:25,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.76 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 08:24:26,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.25 | bwd_microstep: 803.76 | bwd_inner_microstep: 803.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1967
[2024-06-10 08:24:27,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.93 | bwd_microstep: 734.08 | bwd_inner_microstep: 734.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816
[2024-06-10 08:24:29,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.62 | bwd_microstep: 1515.08 | bwd_inner_microstep: 1515.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 08:24:31,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1396.56 | bwd_inner_microstep: 1396.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3723
[2024-06-10 08:24:33,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1466.20 | bwd_inner_microstep: 1466.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 08:24:35,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.53 | bwd_microstep: 1657.98 | bwd_inner_microstep: 1657.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 08:24:37,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.36 | bwd_microstep: 1300.79 | bwd_inner_microstep: 1300.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 08:24:39,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.07 | bwd_microstep: 1287.26 | bwd_inner_microstep: 1287.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3623
[2024-06-10 08:24:41,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.19 | bwd_microstep: 1579.36 | bwd_inner_microstep: 1579.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3709
[2024-06-10 08:24:43,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.11 | bwd_microstep: 1730.28 | bwd_inner_microstep: 1730.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-10 08:24:45,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.28 | bwd_microstep: 1415.91 | bwd_inner_microstep: 1415.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 08:24:47,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.76 | bwd_microstep: 1555.19 | bwd_inner_microstep: 1555.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590
[2024-06-10 08:24:50,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.40 | bwd_microstep: 1706.48 | bwd_inner_microstep: 1706.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 08:24:52,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.08 | bwd_microstep: 1635.37 | bwd_inner_microstep: 1635.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 08:24:56,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.18 | optimizer_step: 6.58
[2024-06-10 08:24:56,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.86 | bwd_microstep: 3075.82 | bwd_inner_microstep: 1805.79 | bwd_allreduce_microstep: 1269.99 | step_microstep: 37.89
[2024-06-10 08:24:56,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16049.81 | bwd: 44449.92 | bwd_inner: 43178.97 | bwd_allreduce: 1270.23 | step: 39.50
{'loss': 1.2685, 'learning_rate': 3.475188450740417e-05, 'epoch': 0.26}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 08:24:57,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.62 | bwd_microstep: 1336.19 | bwd_inner_microstep: 1336.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3600
[2024-06-10 08:25:00,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.06 | bwd_microstep: 1539.10 | bwd_inner_microstep: 1539.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3880
[2024-06-10 08:25:02,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.48 | bwd_microstep: 1582.05 | bwd_inner_microstep: 1582.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 08:25:04,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.11 | bwd_microstep: 1651.66 | bwd_inner_microstep: 1651.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 08:25:06,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1379.69 | bwd_inner_microstep: 1379.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 08:25:08,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1347.83 | bwd_inner_microstep: 1347.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 08:25:10,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1531.80 | bwd_inner_microstep: 1531.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 08:25:12,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1384.54 | bwd_inner_microstep: 1384.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 08:25:13,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.72 | bwd_microstep: 1154.13 | bwd_inner_microstep: 1154.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1908
[2024-06-10 08:25:14,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.12 | bwd_microstep: 718.31 | bwd_inner_microstep: 718.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3693
[2024-06-10 08:25:16,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.39 | bwd_microstep: 1488.15 | bwd_inner_microstep: 1488.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 08:25:18,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1253.40 | bwd_inner_microstep: 1253.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3629
[2024-06-10 08:25:20,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1280.67 | bwd_inner_microstep: 1280.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 08:25:22,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1379.19 | bwd_inner_microstep: 1379.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 08:25:23,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.39 | bwd_microstep: 781.10 | bwd_inner_microstep: 781.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 08:25:25,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.93 | bwd_microstep: 1520.40 | bwd_inner_microstep: 1520.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478
[2024-06-10 08:25:27,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.27 | bwd_microstep: 1313.45 | bwd_inner_microstep: 1313.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 08:25:29,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.67 | bwd_microstep: 1546.91 | bwd_inner_microstep: 1546.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 08:25:31,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1515.45 | bwd_inner_microstep: 1515.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 08:25:33,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1393.28 | bwd_inner_microstep: 1393.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 08:25:35,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.83 | bwd_microstep: 1511.69 | bwd_inner_microstep: 1511.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2037
[2024-06-10 08:25:36,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.17 | bwd_microstep: 745.73 | bwd_inner_microstep: 745.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 08:25:38,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.58 | bwd_microstep: 1454.25 | bwd_inner_microstep: 1454.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-10 08:25:40,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.09 | bwd_microstep: 1309.35 | bwd_inner_microstep: 1309.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2100
[2024-06-10 08:25:41,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.73 | bwd_microstep: 823.95 | bwd_inner_microstep: 823.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 08:25:43,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1281.27 | bwd_inner_microstep: 1281.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3634
[2024-06-10 08:25:45,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.27 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1315.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3568
[2024-06-10 08:25:46,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1240.85 | bwd_inner_microstep: 1240.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3538
[2024-06-10 08:25:48,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.26 | bwd_microstep: 1536.22 | bwd_inner_microstep: 1536.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3567
[2024-06-10 08:25:51,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.97 | bwd_microstep: 1594.36 | bwd_inner_microstep: 1594.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807
[2024-06-10 08:25:53,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.31 | bwd_microstep: 1608.50 | bwd_inner_microstep: 1608.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-10 08:25:57,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 08:25:57,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 3481.04 | bwd_inner_microstep: 1904.83 | bwd_allreduce_microstep: 1576.16 | step_microstep: 38.02
[2024-06-10 08:25:57,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16085.36 | bwd: 44999.63 | bwd_inner: 43422.57 | bwd_allreduce: 1576.39 | step: 39.56
{'loss': 1.2854, 'learning_rate': 3.4726513953023944e-05, 'epoch': 0.26}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 08:25:59,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.29 | bwd_microstep: 1442.81 | bwd_inner_microstep: 1442.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 08:26:01,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.50 | bwd_microstep: 1249.07 | bwd_inner_microstep: 1249.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 08:26:02,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.94 | bwd_microstep: 1284.93 | bwd_inner_microstep: 1284.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 08:26:04,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.04 | bwd_microstep: 1346.99 | bwd_inner_microstep: 1346.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413
[2024-06-10 08:26:06,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.06 | bwd_microstep: 1374.61 | bwd_inner_microstep: 1374.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 08:26:08,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1383.60 | bwd_inner_microstep: 1383.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2023
[2024-06-10 08:26:09,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.29 | bwd_microstep: 715.07 | bwd_inner_microstep: 715.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 963
[2024-06-10 08:26:10,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 149.69 | bwd_microstep: 385.81 | bwd_inner_microstep: 385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 08:26:11,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.23 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1286.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710
[2024-06-10 08:26:13,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1426.30 | bwd_inner_microstep: 1426.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3670
[2024-06-10 08:26:16,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.92 | bwd_microstep: 1716.43 | bwd_inner_microstep: 1716.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-10 08:26:18,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.23 | bwd_microstep: 1621.45 | bwd_inner_microstep: 1621.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 08:26:20,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.21 | bwd_microstep: 1611.99 | bwd_inner_microstep: 1611.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3682
[2024-06-10 08:26:22,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1481.03 | bwd_inner_microstep: 1481.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 08:26:24,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.10 | bwd_microstep: 1407.33 | bwd_inner_microstep: 1407.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3417
[2024-06-10 08:26:26,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.22 | bwd_microstep: 1406.07 | bwd_inner_microstep: 1406.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-10 08:26:28,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.56 | bwd_microstep: 1500.94 | bwd_inner_microstep: 1500.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2561
[2024-06-10 08:26:30,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.04 | bwd_microstep: 1068.31 | bwd_inner_microstep: 1068.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2719
[2024-06-10 08:26:31,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.25 | bwd_microstep: 1034.72 | bwd_inner_microstep: 1034.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 08:26:32,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.81 | bwd_microstep: 702.62 | bwd_inner_microstep: 702.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3672
[2024-06-10 08:26:34,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.22 | bwd_microstep: 1554.85 | bwd_inner_microstep: 1554.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191
[2024-06-10 08:26:36,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.96 | bwd_microstep: 957.97 | bwd_inner_microstep: 957.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547
[2024-06-10 08:26:37,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1327.71 | bwd_inner_microstep: 1327.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 08:26:40,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.71 | bwd_microstep: 1638.88 | bwd_inner_microstep: 1638.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 08:26:42,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.38 | bwd_microstep: 1511.22 | bwd_inner_microstep: 1511.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 08:26:44,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1512.10 | bwd_inner_microstep: 1512.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 08:26:46,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.93 | bwd_microstep: 1407.48 | bwd_inner_microstep: 1407.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3745
[2024-06-10 08:26:48,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.52 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 08:26:50,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1376.92 | bwd_inner_microstep: 1376.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 08:26:52,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.93 | bwd_microstep: 1501.54 | bwd_inner_microstep: 1501.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3772
[2024-06-10 08:26:54,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.58 | bwd_microstep: 1465.80 | bwd_inner_microstep: 1465.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2274
[2024-06-10 08:26:56,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 08:26:56,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.76 | bwd_microstep: 2107.11 | bwd_inner_microstep: 1022.95 | bwd_allreduce_microstep: 1084.12 | step_microstep: 37.81
[2024-06-10 08:26:56,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15670.01 | bwd: 43151.21 | bwd_inner: 42066.19 | bwd_allreduce: 1084.34 | step: 39.37
{'loss': 1.2651, 'learning_rate': 3.470109153193815e-05, 'epoch': 0.26}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 08:26:58,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.56 | bwd_microstep: 1339.17 | bwd_inner_microstep: 1339.10 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 4668
[2024-06-10 08:27:00,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1482.82 | bwd_inner_microstep: 1482.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925
[2024-06-10 08:27:02,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.85 | bwd_microstep: 1589.74 | bwd_inner_microstep: 1589.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 08:27:04,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.18 | bwd_microstep: 1246.97 | bwd_inner_microstep: 1246.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705
[2024-06-10 08:27:06,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1432.83 | bwd_inner_microstep: 1432.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 08:27:08,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1344.77 | bwd_inner_microstep: 1344.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 08:27:10,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.35 | bwd_microstep: 1280.65 | bwd_inner_microstep: 1280.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 08:27:12,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.79 | bwd_microstep: 1481.48 | bwd_inner_microstep: 1481.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 08:27:14,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1910
[2024-06-10 08:27:15,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.11 | bwd_microstep: 753.87 | bwd_inner_microstep: 753.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3580
[2024-06-10 08:27:16,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1366.29 | bwd_inner_microstep: 1366.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-10 08:27:18,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1279.05 | bwd_inner_microstep: 1279.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3416
[2024-06-10 08:27:20,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1374.88 | bwd_inner_microstep: 1374.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 08:27:22,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2132
[2024-06-10 08:27:23,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.75 | bwd_microstep: 927.96 | bwd_inner_microstep: 927.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 08:27:25,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.82 | bwd_microstep: 1453.08 | bwd_inner_microstep: 1453.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 08:27:27,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1384.75 | bwd_inner_microstep: 1384.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 08:27:29,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.88 | bwd_microstep: 1492.36 | bwd_inner_microstep: 1492.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 08:27:31,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.88 | bwd_microstep: 1462.71 | bwd_inner_microstep: 1462.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 08:27:32,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.64 | bwd_microstep: 801.53 | bwd_inner_microstep: 801.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 08:27:34,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1185.37 | bwd_inner_microstep: 1185.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 08:27:36,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1510.85 | bwd_inner_microstep: 1510.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 08:27:38,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.18 | bwd_microstep: 1296.64 | bwd_inner_microstep: 1296.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 08:27:40,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1376.67 | bwd_inner_microstep: 1376.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3603
[2024-06-10 08:27:42,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.15 | bwd_microstep: 1215.06 | bwd_inner_microstep: 1215.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3549
[2024-06-10 08:27:44,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.53 | bwd_microstep: 1540.60 | bwd_inner_microstep: 1540.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475
[2024-06-10 08:27:46,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.33 | bwd_microstep: 1318.81 | bwd_inner_microstep: 1318.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 08:27:48,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1502.48 | bwd_inner_microstep: 1502.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3810
[2024-06-10 08:27:50,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.79 | bwd_microstep: 1418.25 | bwd_inner_microstep: 1418.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3822
[2024-06-10 08:27:52,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.56 | bwd_microstep: 1722.96 | bwd_inner_microstep: 1722.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 08:27:54,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1486.58 | bwd_inner_microstep: 1486.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-10 08:27:59,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 08:27:59,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.00 | bwd_microstep: 3992.35 | bwd_inner_microstep: 1688.97 | bwd_allreduce_microstep: 2303.33 | step_microstep: 38.34
[2024-06-10 08:27:59,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16300.00 | bwd: 45837.05 | bwd_inner: 43532.75 | bwd_allreduce: 2303.59 | step: 40.14
{'loss': 1.2745, 'learning_rate': 3.467561733368439e-05, 'epoch': 0.26}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3487
[2024-06-10 08:28:00,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.22 | bwd_microstep: 1340.14 | bwd_inner_microstep: 1340.06 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 08:28:03,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1490.82 | bwd_inner_microstep: 1490.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 08:28:04,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.49 | bwd_microstep: 1381.19 | bwd_inner_microstep: 1381.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 08:28:07,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.87 | bwd_microstep: 1539.82 | bwd_inner_microstep: 1539.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-10 08:28:08,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.69 | bwd_microstep: 1187.03 | bwd_inner_microstep: 1187.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-10 08:28:10,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.51 | bwd_microstep: 1415.61 | bwd_inner_microstep: 1415.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 08:28:12,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1283.35 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3448
[2024-06-10 08:28:14,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.87 | bwd_microstep: 1299.04 | bwd_inner_microstep: 1299.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485
[2024-06-10 08:28:16,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.37 | bwd_microstep: 1345.38 | bwd_inner_microstep: 1345.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 08:28:18,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1470.02 | bwd_inner_microstep: 1470.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 08:28:20,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.25 | bwd_microstep: 1522.73 | bwd_inner_microstep: 1522.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3656
[2024-06-10 08:28:22,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.05 | bwd_microstep: 1560.79 | bwd_inner_microstep: 1560.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-10 08:28:24,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.35 | bwd_microstep: 1524.21 | bwd_inner_microstep: 1524.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 08:28:26,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1280.41 | bwd_inner_microstep: 1280.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694
[2024-06-10 08:28:28,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.86 | bwd_microstep: 1523.25 | bwd_inner_microstep: 1523.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3464
[2024-06-10 08:28:30,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.57 | bwd_microstep: 1606.10 | bwd_inner_microstep: 1606.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643
[2024-06-10 08:28:32,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.89 | bwd_microstep: 1318.22 | bwd_inner_microstep: 1318.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 08:28:34,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.45 | bwd_microstep: 1628.19 | bwd_inner_microstep: 1628.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 08:28:36,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.27 | bwd_microstep: 1500.75 | bwd_inner_microstep: 1500.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 08:28:38,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1410.33 | bwd_inner_microstep: 1410.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 08:28:40,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.48 | bwd_microstep: 1453.71 | bwd_inner_microstep: 1453.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 08:28:42,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.42 | bwd_microstep: 1416.67 | bwd_inner_microstep: 1416.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909
[2024-06-10 08:28:43,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.23 | bwd_microstep: 686.38 | bwd_inner_microstep: 686.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2080
[2024-06-10 08:28:44,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.94 | bwd_microstep: 727.18 | bwd_inner_microstep: 727.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-10 08:28:46,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.28 | bwd_microstep: 1423.51 | bwd_inner_microstep: 1423.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 08:28:48,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.70 | bwd_microstep: 1346.23 | bwd_inner_microstep: 1346.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3568
[2024-06-10 08:28:50,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.69 | bwd_microstep: 1350.50 | bwd_inner_microstep: 1350.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794
[2024-06-10 08:28:52,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.17 | bwd_microstep: 1446.58 | bwd_inner_microstep: 1446.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 08:28:54,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.81 | bwd_microstep: 1645.79 | bwd_inner_microstep: 1645.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 08:28:56,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.52 | bwd_microstep: 1547.60 | bwd_inner_microstep: 1547.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782
[2024-06-10 08:28:58,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.66 | bwd_microstep: 1449.74 | bwd_inner_microstep: 1449.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624
[2024-06-10 08:29:00,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.19 | optimizer_step: 6.64
[2024-06-10 08:29:00,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.29 | bwd_microstep: 1576.83 | bwd_inner_microstep: 1569.10 | bwd_allreduce_microstep: 7.68 | step_microstep: 37.69
[2024-06-10 08:29:00,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16723.07 | bwd: 44698.12 | bwd_inner: 44689.47 | bwd_allreduce: 7.95 | step: 39.44
{'loss': 1.269, 'learning_rate': 3.465009144798268e-05, 'epoch': 0.26}
 26%|██▌       | 446/1726 [7:46:31<21:58:18, 61.80s/it]
 26%|██▌       | 447/1726 [7:47:32<21:51:17, 61.51s/it]
 26%|██▌       | 448/1726 [7:48:34<21:49:38, 61.49s/it]
 26%|██▌       | 449/1726 [7:49:33<21:33:45, 60.79s/it]
 26%|██▌       | 450/1726 [7:50:35<21:43:37, 61.30s/it]
 26%|██▌       | 451/1726 [7:51:37<21:45:36, 61.44s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 08:29:02,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.82 | bwd_microstep: 1279.72 | bwd_inner_microstep: 1279.58 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2632
[2024-06-10 08:29:04,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.17 | bwd_microstep: 1050.46 | bwd_inner_microstep: 1050.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 08:29:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1298.03 | bwd_inner_microstep: 1298.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791
[2024-06-10 08:29:08,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.29 | bwd_microstep: 1645.38 | bwd_inner_microstep: 1645.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 08:29:09,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.04 | bwd_microstep: 1273.66 | bwd_inner_microstep: 1273.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-10 08:29:11,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.54 | bwd_microstep: 1443.44 | bwd_inner_microstep: 1443.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 08:29:13,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1243.89 | bwd_inner_microstep: 1243.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4035
[2024-06-10 08:29:15,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.13 | bwd_microstep: 1419.83 | bwd_inner_microstep: 1419.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884
[2024-06-10 08:29:16,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.15 | bwd_microstep: 682.44 | bwd_inner_microstep: 682.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 08:29:18,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.49 | bwd_microstep: 1479.86 | bwd_inner_microstep: 1479.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 08:29:20,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1378.35 | bwd_inner_microstep: 1378.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3562
[2024-06-10 08:29:22,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.92 | bwd_microstep: 1447.17 | bwd_inner_microstep: 1447.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-10 08:29:24,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.23 | bwd_microstep: 1538.70 | bwd_inner_microstep: 1538.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 08:29:26,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.43 | bwd_microstep: 1389.49 | bwd_inner_microstep: 1389.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499
[2024-06-10 08:29:28,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1408.54 | bwd_inner_microstep: 1408.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 08:29:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1281.24 | bwd_inner_microstep: 1281.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 08:29:32,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.84 | bwd_microstep: 1556.06 | bwd_inner_microstep: 1556.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101
[2024-06-10 08:29:33,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.25 | bwd_microstep: 826.91 | bwd_inner_microstep: 826.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 08:29:35,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.10 | bwd_microstep: 1406.06 | bwd_inner_microstep: 1406.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 08:29:37,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.98 | bwd_microstep: 1286.56 | bwd_inner_microstep: 1286.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 08:29:39,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1507.74 | bwd_inner_microstep: 1507.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2254
[2024-06-10 08:29:40,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.84 | bwd_microstep: 871.37 | bwd_inner_microstep: 871.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 08:29:41,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.47 | bwd_microstep: 801.63 | bwd_inner_microstep: 801.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 08:29:43,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.10 | bwd_microstep: 1297.08 | bwd_inner_microstep: 1297.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 08:29:45,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.02 | bwd_microstep: 1288.78 | bwd_inner_microstep: 1288.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 08:29:47,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.34 | bwd_microstep: 1656.98 | bwd_inner_microstep: 1656.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 08:29:49,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1397.87 | bwd_inner_microstep: 1397.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281
[2024-06-10 08:29:50,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.21 | bwd_microstep: 908.74 | bwd_inner_microstep: 908.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 08:29:51,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.21 | bwd_microstep: 802.80 | bwd_inner_microstep: 802.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 08:29:53,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1487.92 | bwd_inner_microstep: 1487.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3949
[2024-06-10 08:29:56,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.23 | bwd_microstep: 1593.15 | bwd_inner_microstep: 1593.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3583
[2024-06-10 08:30:01,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.62
[2024-06-10 08:30:01,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.61 | bwd_microstep: 4271.29 | bwd_inner_microstep: 1782.33 | bwd_allreduce_microstep: 2488.90 | step_microstep: 38.89
[2024-06-10 08:30:01,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15620.81 | bwd: 44221.19 | bwd_inner: 41731.24 | bwd_allreduce: 2489.19 | step: 40.51
{'loss': 1.309, 'learning_rate': 3.462451396473505e-05, 'epoch': 0.26}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 08:30:02,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.87 | bwd_microstep: 879.20 | bwd_inner_microstep: 879.06 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3948
[2024-06-10 08:30:04,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.26 | bwd_microstep: 1592.64 | bwd_inner_microstep: 1592.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 08:30:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1381.32 | bwd_inner_microstep: 1381.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 08:30:08,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.56 | bwd_microstep: 1542.19 | bwd_inner_microstep: 1542.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 08:30:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.93 | bwd_microstep: 1284.94 | bwd_inner_microstep: 1284.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 08:30:12,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1250.98 | bwd_inner_microstep: 1250.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722
[2024-06-10 08:30:13,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.30 | bwd_microstep: 1335.07 | bwd_inner_microstep: 1335.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 08:30:15,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1385.90 | bwd_inner_microstep: 1385.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2657
[2024-06-10 08:30:17,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.75 | bwd_microstep: 1101.78 | bwd_inner_microstep: 1101.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 08:30:19,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1389.17 | bwd_inner_microstep: 1389.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 08:30:21,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.22 | bwd_microstep: 1432.87 | bwd_inner_microstep: 1432.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-10 08:30:23,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1413.43 | bwd_inner_microstep: 1413.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3001
[2024-06-10 08:30:24,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.07 | bwd_microstep: 1171.51 | bwd_inner_microstep: 1171.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3461
[2024-06-10 08:30:26,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.93 | bwd_microstep: 1437.64 | bwd_inner_microstep: 1437.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687
[2024-06-10 08:30:28,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.58 | bwd_microstep: 1524.80 | bwd_inner_microstep: 1524.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3503
[2024-06-10 08:30:31,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1616.70 | bwd_inner_microstep: 1616.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2084
[2024-06-10 08:30:32,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.77 | bwd_microstep: 818.29 | bwd_inner_microstep: 818.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429
[2024-06-10 08:30:34,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.72 | bwd_microstep: 1448.68 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 08:30:36,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1514.12 | bwd_inner_microstep: 1514.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644
[2024-06-10 08:30:38,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1446.39 | bwd_inner_microstep: 1446.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 08:30:40,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1489.80 | bwd_inner_microstep: 1489.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-10 08:30:42,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.37 | bwd_microstep: 1325.62 | bwd_inner_microstep: 1325.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 08:30:44,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 1455.42 | bwd_inner_microstep: 1455.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 08:30:46,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.98 | bwd_microstep: 1526.34 | bwd_inner_microstep: 1526.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 08:30:48,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1399.07 | bwd_inner_microstep: 1399.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465
[2024-06-10 08:30:49,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.53 | bwd_microstep: 1184.11 | bwd_inner_microstep: 1184.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-10 08:30:52,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.95 | bwd_microstep: 1533.99 | bwd_inner_microstep: 1533.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3621
[2024-06-10 08:30:54,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.86 | bwd_microstep: 1443.74 | bwd_inner_microstep: 1443.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 08:30:56,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1550.25 | bwd_inner_microstep: 1550.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3593
[2024-06-10 08:30:58,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.82 | bwd_microstep: 1704.49 | bwd_inner_microstep: 1704.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769
[2024-06-10 08:31:00,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.94 | bwd_microstep: 1365.91 | bwd_inner_microstep: 1365.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424
[2024-06-10 08:31:02,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.17 | optimizer_step: 6.65
[2024-06-10 08:31:02,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.91 | bwd_microstep: 1481.34 | bwd_inner_microstep: 1473.67 | bwd_allreduce_microstep: 7.62 | step_microstep: 37.65
[2024-06-10 08:31:02,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16595.33 | bwd: 44427.71 | bwd_inner: 44419.08 | bwd_allreduce: 7.90 | step: 39.24
{'loss': 1.3053, 'learning_rate': 3.459888497402526e-05, 'epoch': 0.26}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 08:31:04,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.64 | bwd_microstep: 1444.89 | bwd_inner_microstep: 1444.83 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2356
[2024-06-10 08:31:05,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.53 | bwd_microstep: 893.17 | bwd_inner_microstep: 893.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 08:31:07,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.85 | bwd_microstep: 1457.59 | bwd_inner_microstep: 1457.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925
[2024-06-10 08:31:09,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.68 | bwd_microstep: 1596.35 | bwd_inner_microstep: 1596.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 08:31:11,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.33 | bwd_microstep: 1288.39 | bwd_inner_microstep: 1288.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3830
[2024-06-10 08:31:13,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.55 | bwd_microstep: 1322.00 | bwd_inner_microstep: 1321.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 08:31:15,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1300.27 | bwd_inner_microstep: 1300.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 08:31:17,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1254.03 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2178
[2024-06-10 08:31:18,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.01 | bwd_microstep: 985.91 | bwd_inner_microstep: 985.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3420
[2024-06-10 08:31:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.87 | bwd_microstep: 1315.54 | bwd_inner_microstep: 1315.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 08:31:22,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1388.79 | bwd_inner_microstep: 1388.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 08:31:23,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.34 | bwd_microstep: 791.84 | bwd_inner_microstep: 791.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 08:31:25,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1282.23 | bwd_inner_microstep: 1282.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688
[2024-06-10 08:31:26,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.99 | bwd_microstep: 1424.00 | bwd_inner_microstep: 1423.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3620
[2024-06-10 08:31:29,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.00 | bwd_microstep: 1810.98 | bwd_inner_microstep: 1810.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 08:31:31,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.59 | bwd_microstep: 1283.43 | bwd_inner_microstep: 1283.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 08:31:33,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.16 | bwd_microstep: 1558.54 | bwd_inner_microstep: 1558.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 08:31:35,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1294.55 | bwd_inner_microstep: 1294.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-10 08:31:37,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.06 | bwd_microstep: 1424.48 | bwd_inner_microstep: 1424.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3626
[2024-06-10 08:31:39,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.30 | bwd_microstep: 1475.05 | bwd_inner_microstep: 1475.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 08:31:41,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1295.61 | bwd_inner_microstep: 1295.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529
[2024-06-10 08:31:42,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.02 | bwd_microstep: 1329.23 | bwd_inner_microstep: 1329.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3730
[2024-06-10 08:31:44,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.34 | bwd_microstep: 1277.25 | bwd_inner_microstep: 1277.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3613
[2024-06-10 08:31:46,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.74 | bwd_microstep: 1275.64 | bwd_inner_microstep: 1275.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 08:31:47,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.34 | bwd_microstep: 695.94 | bwd_inner_microstep: 695.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-10 08:31:49,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.69 | bwd_microstep: 1439.73 | bwd_inner_microstep: 1439.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3766
[2024-06-10 08:31:51,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1375.91 | bwd_inner_microstep: 1375.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 08:31:53,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1509.94 | bwd_inner_microstep: 1509.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 08:31:55,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.23 | bwd_microstep: 1502.34 | bwd_inner_microstep: 1502.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801
[2024-06-10 08:31:57,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1451.22 | bwd_inner_microstep: 1451.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 08:31:59,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.89 | bwd_microstep: 1648.05 | bwd_inner_microstep: 1648.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-10 08:32:02,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.14 | optimizer_step: 6.58
[2024-06-10 08:32:02,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 2310.14 | bwd_inner_microstep: 1819.88 | bwd_allreduce_microstep: 490.22 | step_microstep: 37.54
[2024-06-10 08:32:02,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16162.79 | bwd: 43703.07 | bwd_inner: 43211.90 | bwd_allreduce: 490.46 | step: 39.18
{'loss': 1.2901, 'learning_rate': 3.4573204566118476e-05, 'epoch': 0.26}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2008
[2024-06-10 08:32:03,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.92 | bwd_microstep: 890.52 | bwd_inner_microstep: 890.43 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 08:32:05,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.37 | bwd_microstep: 1292.59 | bwd_inner_microstep: 1292.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3854
[2024-06-10 08:32:07,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.55 | bwd_microstep: 1363.71 | bwd_inner_microstep: 1363.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4317
[2024-06-10 08:32:09,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.56 | bwd_microstep: 1585.79 | bwd_inner_microstep: 1585.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 08:32:10,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.13 | bwd_microstep: 791.36 | bwd_inner_microstep: 791.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 08:32:12,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1310.04 | bwd_inner_microstep: 1310.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 08:32:14,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.54 | bwd_microstep: 1254.24 | bwd_inner_microstep: 1254.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 718
[2024-06-10 08:32:14,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 115.84 | bwd_microstep: 290.34 | bwd_inner_microstep: 290.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 08:32:16,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1431.40 | bwd_inner_microstep: 1431.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 08:32:18,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.07 | bwd_microstep: 1161.33 | bwd_inner_microstep: 1161.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1876
[2024-06-10 08:32:19,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.11 | bwd_microstep: 742.79 | bwd_inner_microstep: 742.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1966
[2024-06-10 08:32:20,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.10 | bwd_microstep: 853.57 | bwd_inner_microstep: 853.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681
[2024-06-10 08:32:22,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.07 | bwd_microstep: 1514.93 | bwd_inner_microstep: 1514.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 08:32:24,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.88 | bwd_microstep: 1347.71 | bwd_inner_microstep: 1347.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 08:32:26,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.45 | bwd_microstep: 1285.21 | bwd_inner_microstep: 1285.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 08:32:28,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1511.62 | bwd_inner_microstep: 1511.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3871
[2024-06-10 08:32:30,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.35 | bwd_microstep: 1498.05 | bwd_inner_microstep: 1498.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 08:32:32,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1553.98 | bwd_inner_microstep: 1553.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2448
[2024-06-10 08:32:33,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.54 | bwd_microstep: 853.67 | bwd_inner_microstep: 853.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2421
[2024-06-10 08:32:35,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.66 | bwd_microstep: 1036.77 | bwd_inner_microstep: 1036.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927
[2024-06-10 08:32:36,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.79 | bwd_microstep: 697.47 | bwd_inner_microstep: 697.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 08:32:38,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.60 | bwd_microstep: 1294.52 | bwd_inner_microstep: 1294.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 08:32:39,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.01 | bwd_microstep: 802.70 | bwd_inner_microstep: 802.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4148
[2024-06-10 08:32:41,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.79 | bwd_microstep: 1647.65 | bwd_inner_microstep: 1647.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 08:32:43,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.78 | bwd_microstep: 1389.44 | bwd_inner_microstep: 1389.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2017
[2024-06-10 08:32:44,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.69 | bwd_microstep: 839.44 | bwd_inner_microstep: 839.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2841
[2024-06-10 08:32:46,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.55 | bwd_microstep: 1159.55 | bwd_inner_microstep: 1159.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3564
[2024-06-10 08:32:47,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.65 | bwd_microstep: 1344.72 | bwd_inner_microstep: 1344.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3827
[2024-06-10 08:32:50,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.96 | bwd_microstep: 1750.64 | bwd_inner_microstep: 1750.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2562
[2024-06-10 08:32:51,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 409.92 | bwd_microstep: 1101.34 | bwd_inner_microstep: 1101.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816
[2024-06-10 08:32:53,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.54 | bwd_microstep: 1514.25 | bwd_inner_microstep: 1514.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787
[2024-06-10 08:33:02,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 08:33:02,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.43 | bwd_microstep: 7800.95 | bwd_inner_microstep: 1628.74 | bwd_allreduce_microstep: 6172.15 | step_microstep: 37.95
[2024-06-10 08:33:02,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14498.34 | bwd: 44912.29 | bwd_inner: 38739.16 | bwd_allreduce: 6172.42 | step: 39.49
{'loss': 1.3042, 'learning_rate': 3.4547472831460976e-05, 'epoch': 0.26}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 08:33:04,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1588.63 | bwd_inner_microstep: 1588.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4511
[2024-06-10 08:33:06,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.49 | bwd_microstep: 1635.85 | bwd_inner_microstep: 1635.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 08:33:08,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.06 | bwd_microstep: 1286.52 | bwd_inner_microstep: 1286.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 08:33:10,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.38 | bwd_microstep: 1544.78 | bwd_inner_microstep: 1544.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 08:33:12,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.90 | bwd_microstep: 1478.98 | bwd_inner_microstep: 1478.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2588
[2024-06-10 08:33:14,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.74 | bwd_microstep: 1041.07 | bwd_inner_microstep: 1041.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2260
[2024-06-10 08:33:15,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.62 | bwd_microstep: 967.74 | bwd_inner_microstep: 967.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 08:33:17,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.57 | bwd_microstep: 1277.82 | bwd_inner_microstep: 1277.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3436
[2024-06-10 08:33:18,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.96 | bwd_microstep: 1184.19 | bwd_inner_microstep: 1184.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 08:33:20,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1384.20 | bwd_inner_microstep: 1384.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 08:33:22,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.06 | bwd_microstep: 1300.45 | bwd_inner_microstep: 1300.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3647
[2024-06-10 08:33:24,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1541.41 | bwd_inner_microstep: 1541.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 08:33:26,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.49 | bwd_microstep: 1286.21 | bwd_inner_microstep: 1286.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969
[2024-06-10 08:33:27,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.31 | bwd_microstep: 891.72 | bwd_inner_microstep: 891.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3651
[2024-06-10 08:33:30,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.79 | bwd_microstep: 1642.42 | bwd_inner_microstep: 1642.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 08:33:31,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.57 | bwd_microstep: 1373.39 | bwd_inner_microstep: 1373.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2015
[2024-06-10 08:33:33,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.04 | bwd_microstep: 831.41 | bwd_inner_microstep: 831.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3629
[2024-06-10 08:33:35,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.47 | bwd_microstep: 1807.82 | bwd_inner_microstep: 1807.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 08:33:37,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1408.74 | bwd_inner_microstep: 1408.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-10 08:33:39,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.59 | bwd_microstep: 1503.77 | bwd_inner_microstep: 1503.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 08:33:41,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.25 | bwd_microstep: 1554.83 | bwd_inner_microstep: 1554.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 08:33:43,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.62 | bwd_microstep: 1294.70 | bwd_inner_microstep: 1294.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3584
[2024-06-10 08:33:45,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.41 | bwd_microstep: 1207.23 | bwd_inner_microstep: 1207.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616
[2024-06-10 08:33:47,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.97 | bwd_microstep: 1340.35 | bwd_inner_microstep: 1340.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 08:33:48,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1350.19 | bwd_inner_microstep: 1350.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 08:33:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 1388.70 | bwd_inner_microstep: 1388.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599
[2024-06-10 08:33:53,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.00 | bwd_microstep: 1702.04 | bwd_inner_microstep: 1702.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738
[2024-06-10 08:33:55,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.20 | bwd_microstep: 1432.74 | bwd_inner_microstep: 1432.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3769
[2024-06-10 08:33:57,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.88 | bwd_microstep: 1507.65 | bwd_inner_microstep: 1507.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571
[2024-06-10 08:33:59,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.48 | bwd_microstep: 1429.74 | bwd_inner_microstep: 1429.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2215
[2024-06-10 08:34:00,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.60 | bwd_microstep: 954.90 | bwd_inner_microstep: 954.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3595
[2024-06-10 08:34:02,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.15 | optimizer_step: 6.57
[2024-06-10 08:34:02,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.25 | bwd_microstep: 1842.51 | bwd_inner_microstep: 1455.26 | bwd_allreduce_microstep: 387.21 | step_microstep: 37.67
[2024-06-10 08:34:02,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16282.74 | bwd: 43982.67 | bwd_inner: 43594.56 | bwd_allreduce: 387.43 | step: 39.23
{'loss': 1.263, 'learning_rate': 3.452168986067979e-05, 'epoch': 0.26}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3456
[2024-06-10 08:34:05,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.45 | bwd_microstep: 1494.21 | bwd_inner_microstep: 1494.11 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3882
[2024-06-10 08:34:07,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1487.43 | bwd_inner_microstep: 1487.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 08:34:09,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1376.07 | bwd_inner_microstep: 1376.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 08:34:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.61 | bwd_microstep: 1650.51 | bwd_inner_microstep: 1650.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 08:34:13,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.57 | bwd_microstep: 1448.41 | bwd_inner_microstep: 1448.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3425
[2024-06-10 08:34:15,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.99 | bwd_microstep: 1311.20 | bwd_inner_microstep: 1311.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 08:34:16,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1277.34 | bwd_inner_microstep: 1277.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-10 08:34:18,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1533.56 | bwd_inner_microstep: 1533.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3773
[2024-06-10 08:34:21,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.47 | bwd_microstep: 1505.59 | bwd_inner_microstep: 1505.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522
[2024-06-10 08:34:23,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1423.00 | bwd_inner_microstep: 1422.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 08:34:24,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.18 | bwd_microstep: 1347.71 | bwd_inner_microstep: 1347.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 08:34:26,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1390.04 | bwd_inner_microstep: 1390.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 08:34:28,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.20 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-10 08:34:30,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.47 | bwd_microstep: 1413.75 | bwd_inner_microstep: 1413.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-10 08:34:32,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.91 | bwd_microstep: 1604.27 | bwd_inner_microstep: 1604.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 08:34:34,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.25 | bwd_microstep: 1425.54 | bwd_inner_microstep: 1425.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3715
[2024-06-10 08:34:37,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1697.40 | bwd_inner_microstep: 1697.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 08:34:39,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.50 | bwd_microstep: 1605.60 | bwd_inner_microstep: 1605.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543
[2024-06-10 08:34:41,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.88 | bwd_microstep: 1326.70 | bwd_inner_microstep: 1326.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-10 08:34:43,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1355.88 | bwd_inner_microstep: 1355.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 08:34:45,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1498.75 | bwd_inner_microstep: 1498.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 08:34:46,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.14 | bwd_microstep: 1295.66 | bwd_inner_microstep: 1295.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-10 08:34:49,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.64 | bwd_microstep: 1510.01 | bwd_inner_microstep: 1509.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 08:34:51,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.20 | bwd_microstep: 1497.22 | bwd_inner_microstep: 1497.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 08:34:53,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.18 | bwd_microstep: 1401.38 | bwd_inner_microstep: 1401.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 08:34:54,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.02 | bwd_microstep: 1351.17 | bwd_inner_microstep: 1351.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 08:34:56,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.46 | bwd_microstep: 1446.34 | bwd_inner_microstep: 1446.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 08:34:58,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.35 | bwd_microstep: 1346.29 | bwd_inner_microstep: 1346.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 08:35:01,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.06 | bwd_microstep: 1650.68 | bwd_inner_microstep: 1650.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 08:35:03,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.80 | bwd_microstep: 1458.32 | bwd_inner_microstep: 1458.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3608
[2024-06-10 08:35:05,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1550.29 | bwd_inner_microstep: 1550.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 08:35:07,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.16 | optimizer_step: 6.64
[2024-06-10 08:35:07,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.49 | bwd_microstep: 1512.34 | bwd_inner_microstep: 1504.61 | bwd_allreduce_microstep: 7.68 | step_microstep: 37.63
[2024-06-10 08:35:07,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17370.28 | bwd: 46577.93 | bwd_inner: 46569.26 | bwd_allreduce: 7.96 | step: 39.20
 26%|██▌       | 452/1726 [7:52:37<21:36:35, 61.06s/it]
 26%|██▌       | 453/1726 [7:53:39<21:37:31, 61.16s/it]
 26%|██▋       | 454/1726 [7:54:39<21:30:26, 60.87s/it]
 26%|██▋       | 455/1726 [7:55:39<21:22:15, 60.53s/it]
 26%|██▋       | 456/1726 [7:56:39<21:21:45, 60.56s/it]
 26%|██▋       | 457/1726 [7:57:44<21:44:29, 61.68s/it]
{'loss': 1.2736, 'learning_rate': 3.44958557445824e-05, 'epoch': 0.26}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 08:35:09,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.11 | bwd_microstep: 1338.62 | bwd_inner_microstep: 1338.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3917
[2024-06-10 08:35:11,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.37 | bwd_microstep: 1695.36 | bwd_inner_microstep: 1695.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 08:35:13,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1348.94 | bwd_inner_microstep: 1348.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865
[2024-06-10 08:35:15,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.17 | bwd_microstep: 1462.62 | bwd_inner_microstep: 1462.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 08:35:16,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.55 | bwd_microstep: 793.89 | bwd_inner_microstep: 793.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 08:35:18,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.11 | bwd_microstep: 1478.52 | bwd_inner_microstep: 1478.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3682
[2024-06-10 08:35:20,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.21 | bwd_microstep: 1386.28 | bwd_inner_microstep: 1386.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501
[2024-06-10 08:35:22,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.75 | bwd_microstep: 1222.48 | bwd_inner_microstep: 1222.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3506
[2024-06-10 08:35:23,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.03 | bwd_microstep: 1222.21 | bwd_inner_microstep: 1222.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-10 08:35:25,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.85 | bwd_microstep: 1154.68 | bwd_inner_microstep: 1154.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 08:35:27,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.59 | bwd_microstep: 1381.14 | bwd_inner_microstep: 1381.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 08:35:28,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.00 | bwd_microstep: 804.81 | bwd_inner_microstep: 804.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3378
[2024-06-10 08:35:30,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.49 | bwd_microstep: 1336.02 | bwd_inner_microstep: 1335.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 08:35:32,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.33 | bwd_microstep: 1394.27 | bwd_inner_microstep: 1394.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 08:35:34,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1292.28 | bwd_inner_microstep: 1292.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 08:35:35,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.38 | bwd_microstep: 801.33 | bwd_inner_microstep: 801.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-10 08:35:37,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.99 | bwd_microstep: 1418.03 | bwd_inner_microstep: 1418.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-10 08:35:39,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1438.91 | bwd_inner_microstep: 1438.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 08:35:41,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1396.78 | bwd_inner_microstep: 1396.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 08:35:42,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.99 | bwd_microstep: 1402.87 | bwd_inner_microstep: 1402.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3624
[2024-06-10 08:35:44,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.93 | bwd_microstep: 1313.77 | bwd_inner_microstep: 1313.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2027
[2024-06-10 08:35:45,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.46 | bwd_microstep: 839.68 | bwd_inner_microstep: 839.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 08:35:47,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.23 | bwd_inner_microstep: 1283.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3633
[2024-06-10 08:35:49,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.54 | bwd_microstep: 1318.96 | bwd_inner_microstep: 1318.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 08:35:51,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1415.23 | bwd_inner_microstep: 1415.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3629
[2024-06-10 08:35:53,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.69 | bwd_microstep: 1539.95 | bwd_inner_microstep: 1539.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2295
[2024-06-10 08:35:55,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.82 | bwd_microstep: 1076.04 | bwd_inner_microstep: 1076.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 08:35:56,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.02 | bwd_microstep: 977.46 | bwd_inner_microstep: 977.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 08:35:58,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.68 | bwd_microstep: 1475.11 | bwd_inner_microstep: 1475.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3011
[2024-06-10 08:36:00,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 434.66 | bwd_microstep: 1136.37 | bwd_inner_microstep: 1136.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3598
[2024-06-10 08:36:01,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.83 | bwd_microstep: 1347.42 | bwd_inner_microstep: 1347.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777
[2024-06-10 08:36:07,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 08:36:07,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.02 | bwd_microstep: 5167.74 | bwd_inner_microstep: 1752.97 | bwd_allreduce_microstep: 3414.72 | step_microstep: 38.17
[2024-06-10 08:36:07,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15440.57 | bwd: 44661.01 | bwd_inner: 41245.39 | bwd_allreduce: 3414.94 | step: 39.88
{'loss': 1.3033, 'learning_rate': 3.4469970574156436e-05, 'epoch': 0.27}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 08:36:09,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.76 | bwd_microstep: 1267.56 | bwd_inner_microstep: 1267.37 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.65
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1088
[2024-06-10 08:36:10,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 168.50 | bwd_microstep: 434.56 | bwd_inner_microstep: 434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 08:36:11,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.41 | bwd_microstep: 1343.98 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 08:36:14,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.45 | bwd_microstep: 1557.70 | bwd_inner_microstep: 1557.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483
[2024-06-10 08:36:15,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.89 | bwd_microstep: 1216.17 | bwd_inner_microstep: 1216.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1413
[2024-06-10 08:36:16,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 225.52 | bwd_microstep: 597.25 | bwd_inner_microstep: 597.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 08:36:18,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1385.87 | bwd_inner_microstep: 1385.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405
[2024-06-10 08:36:20,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.62 | bwd_microstep: 1312.72 | bwd_inner_microstep: 1312.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 08:36:21,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.74 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 08:36:23,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3649
[2024-06-10 08:36:25,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.12 | bwd_microstep: 1324.62 | bwd_inner_microstep: 1324.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3723
[2024-06-10 08:36:27,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.01 | bwd_microstep: 1729.81 | bwd_inner_microstep: 1729.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 08:36:29,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.63 | bwd_microstep: 1288.45 | bwd_inner_microstep: 1288.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377
[2024-06-10 08:36:31,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.55 | bwd_microstep: 1332.61 | bwd_inner_microstep: 1332.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 08:36:33,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1384.48 | bwd_inner_microstep: 1384.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687
[2024-06-10 08:36:35,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.42 | bwd_microstep: 1622.35 | bwd_inner_microstep: 1622.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-10 08:36:37,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.27 | bwd_microstep: 1515.26 | bwd_inner_microstep: 1515.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967
[2024-06-10 08:36:38,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.05 | bwd_microstep: 890.93 | bwd_inner_microstep: 890.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 08:36:40,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1512.83 | bwd_inner_microstep: 1512.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2308
[2024-06-10 08:36:42,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.37 | bwd_microstep: 883.81 | bwd_inner_microstep: 883.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 08:36:43,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1429.65 | bwd_inner_microstep: 1429.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469
[2024-06-10 08:36:45,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.52 | bwd_microstep: 1244.55 | bwd_inner_microstep: 1244.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-10 08:36:47,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1315.01 | bwd_inner_microstep: 1314.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3535
[2024-06-10 08:36:49,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.38 | bwd_microstep: 1588.47 | bwd_inner_microstep: 1588.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 08:36:52,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.12 | bwd_microstep: 1645.76 | bwd_inner_microstep: 1645.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2273
[2024-06-10 08:36:53,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.59 | bwd_microstep: 813.63 | bwd_inner_microstep: 813.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 08:36:54,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | bwd_microstep: 1294.35 | bwd_inner_microstep: 1294.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 08:36:56,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3596
[2024-06-10 08:36:58,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1246.89 | bwd_inner_microstep: 1246.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568
[2024-06-10 08:37:00,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.33 | bwd_microstep: 1593.60 | bwd_inner_microstep: 1593.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2037
[2024-06-10 08:37:02,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.33 | bwd_microstep: 907.00 | bwd_inner_microstep: 906.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2638
[2024-06-10 08:37:08,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.37 | optimizer_step: 6.60
[2024-06-10 08:37:08,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.52 | bwd_microstep: 6064.34 | bwd_inner_microstep: 1148.46 | bwd_allreduce_microstep: 4915.81 | step_microstep: 39.11
[2024-06-10 08:37:08,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15128.62 | bwd: 45320.36 | bwd_inner: 40403.47 | bwd_allreduce: 4916.13 | step: 40.88
{'loss': 1.2928, 'learning_rate': 3.444403444056934e-05, 'epoch': 0.27}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 08:37:10,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.06 | bwd_microstep: 1236.55 | bwd_inner_microstep: 1236.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 08:37:12,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896
[2024-06-10 08:37:14,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.02 | bwd_microstep: 1584.35 | bwd_inner_microstep: 1584.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 08:37:16,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.92 | bwd_microstep: 1552.66 | bwd_inner_microstep: 1552.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865
[2024-06-10 08:37:18,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.71 | bwd_microstep: 1460.82 | bwd_inner_microstep: 1460.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3753
[2024-06-10 08:37:20,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.91 | bwd_microstep: 1501.71 | bwd_inner_microstep: 1501.52 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 08:37:21,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.55 | bwd_microstep: 794.51 | bwd_inner_microstep: 794.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3716
[2024-06-10 08:37:23,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.82 | bwd_microstep: 1239.56 | bwd_inner_microstep: 1239.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 08:37:25,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1282.88 | bwd_inner_microstep: 1282.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 08:37:26,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.65 | bwd_microstep: 801.46 | bwd_inner_microstep: 801.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4091
[2024-06-10 08:37:28,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.55 | bwd_microstep: 1532.92 | bwd_inner_microstep: 1532.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 08:37:30,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.40 | bwd_microstep: 1254.31 | bwd_inner_microstep: 1254.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1910
[2024-06-10 08:37:31,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.85 | bwd_microstep: 749.26 | bwd_inner_microstep: 749.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2154
[2024-06-10 08:37:32,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.78 | bwd_microstep: 887.24 | bwd_inner_microstep: 887.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 08:37:34,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.75 | bwd_microstep: 1347.78 | bwd_inner_microstep: 1347.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 08:37:36,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.34 | bwd_microstep: 1344.71 | bwd_inner_microstep: 1344.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 08:37:37,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1278.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2082
[2024-06-10 08:37:38,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.98 | bwd_microstep: 724.94 | bwd_inner_microstep: 724.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 08:37:40,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.69 | bwd_microstep: 1434.68 | bwd_inner_microstep: 1434.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3698
[2024-06-10 08:37:42,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1335.51 | bwd_inner_microstep: 1335.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 08:37:44,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1411.72 | bwd_inner_microstep: 1411.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 08:37:46,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1560.09 | bwd_inner_microstep: 1560.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702
[2024-06-10 08:37:48,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1427.33 | bwd_inner_microstep: 1427.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3825
[2024-06-10 08:37:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1407.39 | bwd_inner_microstep: 1407.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3556
[2024-06-10 08:37:52,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.97 | bwd_microstep: 1327.24 | bwd_inner_microstep: 1327.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469
[2024-06-10 08:37:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1249.54 | bwd_inner_microstep: 1249.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-10 08:37:56,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.11 | bwd_microstep: 1435.19 | bwd_inner_microstep: 1435.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 08:37:58,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.98 | bwd_microstep: 1450.60 | bwd_inner_microstep: 1450.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 08:38:00,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.59 | bwd_microstep: 1353.80 | bwd_inner_microstep: 1353.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-10 08:38:01,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.62 | bwd_microstep: 1184.97 | bwd_inner_microstep: 1184.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2918
[2024-06-10 08:38:03,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.92 | bwd_microstep: 1290.74 | bwd_inner_microstep: 1290.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3047
[2024-06-10 08:38:10,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.30 | optimizer_step: 6.62
[2024-06-10 08:38:10,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.47 | bwd_microstep: 5888.53 | bwd_inner_microstep: 1507.28 | bwd_allreduce_microstep: 4381.19 | step_microstep: 38.68
[2024-06-10 08:38:10,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15503.98 | bwd: 45720.68 | bwd_inner: 41338.44 | bwd_allreduce: 4381.49 | step: 40.38
{'loss': 1.3046, 'learning_rate': 3.4418047435168025e-05, 'epoch': 0.27}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1964
[2024-06-10 08:38:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.36 | bwd_microstep: 824.65 | bwd_inner_microstep: 824.52 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3664
[2024-06-10 08:38:13,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.62 | bwd_microstep: 1717.19 | bwd_inner_microstep: 1717.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 08:38:15,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.38 | bwd_microstep: 1378.81 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3405
[2024-06-10 08:38:17,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.17 | bwd_microstep: 1294.47 | bwd_inner_microstep: 1294.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 08:38:19,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.73 | bwd_microstep: 1555.69 | bwd_inner_microstep: 1555.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 08:38:21,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1545.12 | bwd_inner_microstep: 1545.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3741
[2024-06-10 08:38:23,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1431.50 | bwd_inner_microstep: 1431.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 08:38:25,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1411.05 | bwd_inner_microstep: 1411.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3902
[2024-06-10 08:38:27,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.09 | bwd_microstep: 1587.23 | bwd_inner_microstep: 1587.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2148
[2024-06-10 08:38:28,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.55 | bwd_microstep: 788.25 | bwd_inner_microstep: 788.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3619
[2024-06-10 08:38:30,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.95 | bwd_microstep: 1221.68 | bwd_inner_microstep: 1221.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687
[2024-06-10 08:38:32,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.92 | bwd_microstep: 1627.63 | bwd_inner_microstep: 1627.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2165
[2024-06-10 08:38:34,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.26 | bwd_microstep: 953.93 | bwd_inner_microstep: 953.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2922
[2024-06-10 08:38:35,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.38 | bwd_microstep: 1129.37 | bwd_inner_microstep: 1129.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3731
[2024-06-10 08:38:37,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.50 | bwd_microstep: 1630.14 | bwd_inner_microstep: 1630.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523
[2024-06-10 08:38:40,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.42 | bwd_microstep: 1584.37 | bwd_inner_microstep: 1584.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2386
[2024-06-10 08:38:41,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.18 | bwd_microstep: 1000.69 | bwd_inner_microstep: 1000.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 08:38:42,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.15 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 08:38:44,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.66 | bwd_microstep: 1282.27 | bwd_inner_microstep: 1282.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1992
[2024-06-10 08:38:45,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.68 | bwd_microstep: 829.63 | bwd_inner_microstep: 829.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 08:38:47,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.47 | bwd_microstep: 1458.24 | bwd_inner_microstep: 1458.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 08:38:49,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.97 | bwd_microstep: 1463.68 | bwd_inner_microstep: 1463.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2979
[2024-06-10 08:38:51,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.55 | bwd_microstep: 1202.01 | bwd_inner_microstep: 1201.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 08:38:53,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1494.64 | bwd_inner_microstep: 1494.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3800
[2024-06-10 08:38:55,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.24 | bwd_microstep: 1518.01 | bwd_inner_microstep: 1517.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 08:38:57,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.47 | bwd_microstep: 1440.34 | bwd_inner_microstep: 1440.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2022
[2024-06-10 08:38:58,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.46 | bwd_microstep: 743.13 | bwd_inner_microstep: 743.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 08:39:00,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.25 | bwd_microstep: 1397.32 | bwd_inner_microstep: 1397.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 08:39:02,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1354.14 | bwd_inner_microstep: 1354.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 08:39:04,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.88 | bwd_microstep: 1550.74 | bwd_inner_microstep: 1550.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 08:39:06,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.65 | bwd_microstep: 1478.92 | bwd_inner_microstep: 1478.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3617
[2024-06-10 08:39:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 08:39:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.64 | bwd_microstep: 4630.23 | bwd_inner_microstep: 1739.63 | bwd_allreduce_microstep: 2890.55 | step_microstep: 37.97
[2024-06-10 08:39:11,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15797.78 | bwd: 45325.86 | bwd_inner: 42434.31 | bwd_allreduce: 2890.82 | step: 39.70
{'loss': 1.2734, 'learning_rate': 3.4392009649478596e-05, 'epoch': 0.27}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 08:39:13,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.66 | bwd_microstep: 1365.05 | bwd_inner_microstep: 1364.95 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3920
[2024-06-10 08:39:15,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.80 | bwd_microstep: 1591.55 | bwd_inner_microstep: 1591.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3835
[2024-06-10 08:39:17,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.40 | bwd_microstep: 1389.26 | bwd_inner_microstep: 1389.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 08:39:19,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.74 | bwd_microstep: 1383.76 | bwd_inner_microstep: 1383.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 08:39:20,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.35 | bwd_microstep: 817.01 | bwd_inner_microstep: 816.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 08:39:22,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.75 | bwd_microstep: 1247.64 | bwd_inner_microstep: 1247.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 08:39:24,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.66 | bwd_microstep: 1390.32 | bwd_inner_microstep: 1390.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-10 08:39:25,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.74 | bwd_microstep: 680.31 | bwd_inner_microstep: 680.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 08:39:27,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.43 | bwd_microstep: 1282.60 | bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 08:39:28,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.27 | bwd_microstep: 1316.24 | bwd_inner_microstep: 1316.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672
[2024-06-10 08:39:31,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.70 | bwd_microstep: 1614.83 | bwd_inner_microstep: 1614.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2873
[2024-06-10 08:39:32,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.26 | bwd_microstep: 990.75 | bwd_inner_microstep: 990.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 08:39:34,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.57 | bwd_microstep: 1380.77 | bwd_inner_microstep: 1380.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 08:39:36,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.42 | bwd_microstep: 1346.05 | bwd_inner_microstep: 1346.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3614
[2024-06-10 08:39:38,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.43 | bwd_microstep: 1705.86 | bwd_inner_microstep: 1705.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3635
[2024-06-10 08:39:40,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.42 | bwd_microstep: 1249.75 | bwd_inner_microstep: 1249.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 08:39:42,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.19 | bwd_microstep: 1511.21 | bwd_inner_microstep: 1511.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 08:39:44,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.26 | bwd_microstep: 1298.90 | bwd_inner_microstep: 1298.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-10 08:39:46,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.37 | bwd_microstep: 1527.10 | bwd_inner_microstep: 1527.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 08:39:48,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.84 | bwd_microstep: 1554.62 | bwd_inner_microstep: 1554.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3829
[2024-06-10 08:39:50,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.28 | bwd_microstep: 1265.01 | bwd_inner_microstep: 1264.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 08:39:51,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 802.18 | bwd_inner_microstep: 802.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 08:39:53,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.73 | bwd_microstep: 1506.94 | bwd_inner_microstep: 1506.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 634
[2024-06-10 08:39:53,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.14 | bwd_microstep: 264.61 | bwd_inner_microstep: 264.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 08:39:55,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.91 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 08:39:57,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.33 | bwd_microstep: 1302.31 | bwd_inner_microstep: 1302.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949
[2024-06-10 08:39:58,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.49 | bwd_microstep: 702.10 | bwd_inner_microstep: 702.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 08:40:00,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.09 | bwd_microstep: 1604.88 | bwd_inner_microstep: 1604.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2003
[2024-06-10 08:40:01,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.19 | bwd_microstep: 832.50 | bwd_inner_microstep: 832.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934
[2024-06-10 08:40:02,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.35 | bwd_microstep: 727.89 | bwd_inner_microstep: 727.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3569
[2024-06-10 08:40:05,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.02 | bwd_microstep: 1665.78 | bwd_inner_microstep: 1665.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 08:40:13,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.35 | optimizer_step: 6.60
[2024-06-10 08:40:13,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.01 | bwd_microstep: 8433.24 | bwd_inner_microstep: 1101.23 | bwd_allreduce_microstep: 7331.94 | step_microstep: 38.85
[2024-06-10 08:40:14,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14899.13 | bwd: 47132.45 | bwd_inner: 39799.50 | bwd_allreduce: 7332.23 | step: 40.77
{'loss': 1.2294, 'learning_rate': 3.4365921175206e-05, 'epoch': 0.27}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 08:40:16,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.79 | bwd_microstep: 1475.62 | bwd_inner_microstep: 1475.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3982
[2024-06-10 08:40:18,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.88 | bwd_microstep: 1464.81 | bwd_inner_microstep: 1464.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 08:40:19,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1376.36 | bwd_inner_microstep: 1376.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3852
[2024-06-10 08:40:22,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.92 | bwd_microstep: 1662.09 | bwd_inner_microstep: 1662.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 08:40:24,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1648.61 | bwd_inner_microstep: 1648.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 08:40:26,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.45 | bwd_microstep: 1344.04 | bwd_inner_microstep: 1344.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 08:40:27,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.21 | bwd_microstep: 682.30 | bwd_inner_microstep: 682.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 08:40:29,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490
[2024-06-10 08:40:31,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1345.27 | bwd_inner_microstep: 1345.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690
[2024-06-10 08:40:32,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.94 | bwd_microstep: 1328.78 | bwd_inner_microstep: 1328.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 08:40:34,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1253.63 | bwd_inner_microstep: 1253.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 08:40:36,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.48 | bwd_microstep: 1530.80 | bwd_inner_microstep: 1530.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 08:40:38,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.25 | bwd_microstep: 1257.53 | bwd_inner_microstep: 1257.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902
[2024-06-10 08:40:39,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.71 | bwd_microstep: 689.98 | bwd_inner_microstep: 689.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 08:40:41,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1382.89 | bwd_inner_microstep: 1382.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 08:40:43,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.11 | bwd_microstep: 1258.04 | bwd_inner_microstep: 1258.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3423
[2024-06-10 08:40:45,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1408.44 | bwd_inner_microstep: 1408.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 08:40:47,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.30 | bwd_microstep: 1518.99 | bwd_inner_microstep: 1518.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3538
[2024-06-10 08:40:49,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.91 | bwd_microstep: 1422.52 | bwd_inner_microstep: 1422.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-10 08:40:50,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.20 | bwd_microstep: 799.36 | bwd_inner_microstep: 799.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-10 08:40:52,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1309.96 | bwd_inner_microstep: 1309.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3744
[2024-06-10 08:40:54,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1560.98 | bwd_inner_microstep: 1560.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 08:40:56,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.87 | bwd_microstep: 1399.19 | bwd_inner_microstep: 1399.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2075
[2024-06-10 08:40:57,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.94 | bwd_microstep: 916.85 | bwd_inner_microstep: 916.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3946
[2024-06-10 08:41:00,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 693.88 | bwd_microstep: 1913.38 | bwd_inner_microstep: 1913.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3571
[2024-06-10 08:41:02,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.26 | bwd_microstep: 1561.21 | bwd_inner_microstep: 1561.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 08:41:03,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.98 | bwd_microstep: 920.88 | bwd_inner_microstep: 920.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 08:41:05,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.12 | bwd_microstep: 1588.42 | bwd_inner_microstep: 1588.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781
[2024-06-10 08:41:07,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.61 | bwd_microstep: 1643.79 | bwd_inner_microstep: 1643.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 08:41:10,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1507.14 | bwd_inner_microstep: 1507.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425
[2024-06-10 08:41:11,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1405.06 | bwd_inner_microstep: 1405.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3779
[2024-06-10 08:41:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 08:41:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.19 | bwd_microstep: 5040.80 | bwd_inner_microstep: 1823.34 | bwd_allreduce_microstep: 3217.41 | step_microstep: 38.50
[2024-06-10 08:41:17,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16275.47 | bwd: 47002.62 | bwd_inner: 43784.30 | bwd_allreduce: 3217.65 | step: 40.13


 26%|██▋       | 457/1726 [7:57:44<21:44:29, 61.68s/it]
 27%|██▋       | 458/1726 [7:58:44<21:35:42, 61.31s/it]
 27%|██▋       | 459/1726 [7:59:45<21:31:29, 61.16s/it]
 27%|██▋       | 460/1726 [8:00:46<21:33:09, 61.29s/it]
 27%|██▋       | 461/1726 [8:01:48<21:33:23, 61.35s/it]
 27%|██▋       | 462/1726 [8:02:50<21:38:55, 61.66s/it]
 27%|██▋       | 463/1726 [8:03:54<
{'loss': 1.2398, 'learning_rate': 3.43397821042337e-05, 'epoch': 0.27}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 08:41:19,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1366.90 | bwd_inner_microstep: 1366.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3989
[2024-06-10 08:41:21,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.72 | bwd_microstep: 1434.89 | bwd_inner_microstep: 1434.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 08:41:23,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.59 | bwd_microstep: 1546.38 | bwd_inner_microstep: 1546.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 08:41:24,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.43 | bwd_microstep: 794.92 | bwd_inner_microstep: 794.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739
[2024-06-10 08:41:26,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.63 | bwd_microstep: 1434.46 | bwd_inner_microstep: 1434.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 08:41:28,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1377.77 | bwd_inner_microstep: 1377.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 08:41:30,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.76 | bwd_microstep: 1246.73 | bwd_inner_microstep: 1246.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696
[2024-06-10 08:41:32,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.10 | bwd_microstep: 1518.99 | bwd_inner_microstep: 1518.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 08:41:34,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1377.68 | bwd_inner_microstep: 1377.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987
[2024-06-10 08:41:35,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.41 | bwd_microstep: 710.57 | bwd_inner_microstep: 710.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3419
[2024-06-10 08:41:37,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.66 | bwd_microstep: 1389.79 | bwd_inner_microstep: 1389.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3970
[2024-06-10 08:41:39,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.29 | bwd_microstep: 1694.24 | bwd_inner_microstep: 1694.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 08:41:41,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1380.50 | bwd_inner_microstep: 1380.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648
[2024-06-10 08:41:43,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.84 | bwd_microstep: 1410.78 | bwd_inner_microstep: 1410.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-10 08:41:44,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.89 | bwd_microstep: 976.32 | bwd_inner_microstep: 976.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-10 08:41:46,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1417.75 | bwd_inner_microstep: 1417.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972
[2024-06-10 08:41:47,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.79 | bwd_microstep: 703.30 | bwd_inner_microstep: 703.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-10 08:41:49,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.38 | bwd_microstep: 1317.23 | bwd_inner_microstep: 1317.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 08:41:51,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1415.86 | bwd_inner_microstep: 1415.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-10 08:41:53,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.14 | bwd_microstep: 1327.64 | bwd_inner_microstep: 1327.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 08:41:55,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1285.07 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 08:41:56,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.64 | bwd_microstep: 1318.21 | bwd_inner_microstep: 1318.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 08:41:59,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.51 | bwd_microstep: 1556.34 | bwd_inner_microstep: 1556.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-10 08:42:01,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1387.80 | bwd_inner_microstep: 1387.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462
[2024-06-10 08:42:02,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.20 | bwd_microstep: 1313.51 | bwd_inner_microstep: 1313.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3399
[2024-06-10 08:42:04,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1441.57 | bwd_inner_microstep: 1441.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385
[2024-06-10 08:42:06,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.08 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2056
[2024-06-10 08:42:07,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.30 | bwd_microstep: 812.89 | bwd_inner_microstep: 812.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 08:42:09,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1493.76 | bwd_inner_microstep: 1493.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3624
[2024-06-10 08:42:11,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.54 | bwd_microstep: 1452.92 | bwd_inner_microstep: 1452.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2222
[2024-06-10 08:42:13,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.26 | bwd_microstep: 800.31 | bwd_inner_microstep: 800.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 08:42:18,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.21 | optimizer_step: 6.63
[2024-06-10 08:42:18,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.96 | bwd_microstep: 4459.00 | bwd_inner_microstep: 1755.89 | bwd_allreduce_microstep: 2703.06 | step_microstep: 38.17
[2024-06-10 08:42:18,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15634.66 | bwd: 44503.80 | bwd_inner: 41799.84 | bwd_allreduce: 2703.29 | step: 39.82
{'loss': 1.2896, 'learning_rate': 3.4313592528623384e-05, 'epoch': 0.27}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1890
[2024-06-10 08:42:19,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.15 | bwd_microstep: 766.45 | bwd_inner_microstep: 766.36 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 08:42:20,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1277.51 | bwd_inner_microstep: 1277.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 08:42:23,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.62 | bwd_microstep: 1496.90 | bwd_inner_microstep: 1496.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3852
[2024-06-10 08:42:25,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.20 | bwd_microstep: 1518.48 | bwd_inner_microstep: 1518.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 08:42:27,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1396.60 | bwd_inner_microstep: 1396.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2455
[2024-06-10 08:42:28,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.56 | bwd_microstep: 948.98 | bwd_inner_microstep: 948.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 08:42:30,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 1245.82 | bwd_inner_microstep: 1245.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 08:42:31,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.55 | bwd_microstep: 1248.22 | bwd_inner_microstep: 1248.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 08:42:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.13 | bwd_microstep: 1615.24 | bwd_inner_microstep: 1615.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3652
[2024-06-10 08:42:36,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.41 | bwd_microstep: 1481.96 | bwd_inner_microstep: 1481.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 08:42:37,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.14 | bwd_microstep: 1336.52 | bwd_inner_microstep: 1336.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3636
[2024-06-10 08:42:40,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.04 | bwd_microstep: 1576.18 | bwd_inner_microstep: 1576.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-10 08:42:41,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.87 | bwd_microstep: 707.44 | bwd_inner_microstep: 707.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-10 08:42:42,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.69 | bwd_microstep: 1310.84 | bwd_inner_microstep: 1310.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 08:42:44,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1278.07 | bwd_inner_microstep: 1278.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-10 08:42:46,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1513.30 | bwd_inner_microstep: 1513.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 08:42:48,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1512.27 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 08:42:50,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.77 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 08:42:52,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1284.02 | bwd_inner_microstep: 1283.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2493
[2024-06-10 08:42:53,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.19 | bwd_microstep: 1054.79 | bwd_inner_microstep: 1054.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 08:42:55,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.24 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 08:42:57,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.89 | bwd_microstep: 1259.94 | bwd_inner_microstep: 1259.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3598
[2024-06-10 08:42:59,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.14 | bwd_microstep: 1437.70 | bwd_inner_microstep: 1437.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 08:43:01,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.93 | bwd_microstep: 1657.74 | bwd_inner_microstep: 1657.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 08:43:02,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.19 | bwd_microstep: 877.28 | bwd_inner_microstep: 877.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 08:43:04,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1411.98 | bwd_inner_microstep: 1411.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 08:43:06,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1432.74 | bwd_inner_microstep: 1432.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3598
[2024-06-10 08:43:09,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.09 | bwd_microstep: 1525.71 | bwd_inner_microstep: 1525.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3611
[2024-06-10 08:43:11,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.48 | bwd_microstep: 1477.39 | bwd_inner_microstep: 1477.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3566
[2024-06-10 08:43:13,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.14 | bwd_microstep: 1459.44 | bwd_inner_microstep: 1459.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3441
[2024-06-10 08:43:14,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 08:43:19,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-10 08:43:19,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 4105.19 | bwd_inner_microstep: 1686.75 | bwd_allreduce_microstep: 2418.39 | step_microstep: 37.93
[2024-06-10 08:43:19,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15983.56 | bwd: 45272.33 | bwd_inner: 42852.95 | bwd_allreduce: 2418.67 | step: 39.48
{'loss': 1.2771, 'learning_rate': 3.428735254061458e-05, 'epoch': 0.27}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3479
[2024-06-10 08:43:21,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.34 | bwd_microstep: 1568.82 | bwd_inner_microstep: 1568.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3957
[2024-06-10 08:43:24,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.89 | bwd_microstep: 1697.71 | bwd_inner_microstep: 1697.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865
[2024-06-10 08:43:26,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.58 | bwd_microstep: 1559.90 | bwd_inner_microstep: 1559.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834
[2024-06-10 08:43:28,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.28 | bwd_microstep: 1653.07 | bwd_inner_microstep: 1653.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3768
[2024-06-10 08:43:30,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.18 | bwd_microstep: 1490.52 | bwd_inner_microstep: 1490.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3763
[2024-06-10 08:43:32,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.13 | bwd_microstep: 1609.04 | bwd_inner_microstep: 1609.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 08:43:34,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1414.28 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4119
[2024-06-10 08:43:37,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.33 | bwd_microstep: 1740.78 | bwd_inner_microstep: 1740.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 08:43:39,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.68 | bwd_microstep: 1516.36 | bwd_inner_microstep: 1516.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-10 08:43:40,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.16 | bwd_microstep: 891.38 | bwd_inner_microstep: 891.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 08:43:42,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1618.23 | bwd_inner_microstep: 1618.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 08:43:44,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.46 | bwd_microstep: 1517.41 | bwd_inner_microstep: 1517.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 08:43:46,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.31 | bwd_microstep: 1344.38 | bwd_inner_microstep: 1344.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 08:43:48,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1346.83 | bwd_inner_microstep: 1346.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3393
[2024-06-10 08:43:50,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1245.80 | bwd_inner_microstep: 1245.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 08:43:52,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.68 | bwd_microstep: 1514.59 | bwd_inner_microstep: 1514.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 08:43:54,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.02 | bwd_microstep: 1252.28 | bwd_inner_microstep: 1252.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 08:43:56,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.55 | bwd_microstep: 1346.81 | bwd_inner_microstep: 1346.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301
[2024-06-10 08:43:57,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.46 | bwd_microstep: 979.61 | bwd_inner_microstep: 979.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289
[2024-06-10 08:43:58,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.82 | bwd_microstep: 878.41 | bwd_inner_microstep: 878.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 08:44:00,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.22 | bwd_microstep: 1440.14 | bwd_inner_microstep: 1440.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 08:44:02,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1555.75 | bwd_inner_microstep: 1555.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 08:44:03,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.79 | bwd_microstep: 805.02 | bwd_inner_microstep: 804.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 08:44:05,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.64 | bwd_microstep: 1503.73 | bwd_inner_microstep: 1503.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 08:44:07,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.63 | bwd_microstep: 1460.57 | bwd_inner_microstep: 1460.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2191
[2024-06-10 08:44:09,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.57 | bwd_microstep: 861.86 | bwd_inner_microstep: 861.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 08:44:11,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.26 | bwd_microstep: 1503.80 | bwd_inner_microstep: 1503.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3577
[2024-06-10 08:44:13,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.10 | bwd_microstep: 1522.92 | bwd_inner_microstep: 1522.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033
[2024-06-10 08:44:14,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.62 | bwd_microstep: 840.89 | bwd_inner_microstep: 840.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3642
[2024-06-10 08:44:16,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.47 | bwd_microstep: 1352.47 | bwd_inner_microstep: 1352.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 08:44:18,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1411.20 | bwd_inner_microstep: 1411.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3780
[2024-06-10 08:44:22,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 08:44:22,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 3286.41 | bwd_inner_microstep: 1773.45 | bwd_allreduce_microstep: 1512.90 | step_microstep: 38.28
[2024-06-10 08:44:22,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16370.00 | bwd: 45730.99 | bwd_inner: 44217.18 | bwd_allreduce: 1513.13 | step: 40.05
{'loss': 1.2856, 'learning_rate': 3.4261062232624405e-05, 'epoch': 0.27}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 08:44:24,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.90 | bwd_microstep: 1469.81 | bwd_inner_microstep: 1469.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 08:44:25,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1290.62 | bwd_inner_microstep: 1290.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3863
[2024-06-10 08:44:28,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.71 | bwd_microstep: 1510.62 | bwd_inner_microstep: 1510.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 08:44:29,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.36 | bwd_microstep: 1245.75 | bwd_inner_microstep: 1245.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 08:44:31,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.14 | bwd_microstep: 1436.08 | bwd_inner_microstep: 1436.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 08:44:33,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1382.71 | bwd_inner_microstep: 1382.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 08:44:35,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1530.63 | bwd_inner_microstep: 1530.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407
[2024-06-10 08:44:37,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.39 | bwd_microstep: 1211.14 | bwd_inner_microstep: 1211.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942
[2024-06-10 08:44:38,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.56 | bwd_microstep: 884.74 | bwd_inner_microstep: 884.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3491
[2024-06-10 08:44:40,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.83 | bwd_microstep: 1580.41 | bwd_inner_microstep: 1580.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3016
[2024-06-10 08:44:42,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 431.70 | bwd_microstep: 1130.03 | bwd_inner_microstep: 1130.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3850
[2024-06-10 08:44:44,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.56 | bwd_microstep: 1562.46 | bwd_inner_microstep: 1562.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2028
[2024-06-10 08:44:45,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.46 | bwd_microstep: 906.00 | bwd_inner_microstep: 905.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3497
[2024-06-10 08:44:47,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.28 | bwd_microstep: 1332.06 | bwd_inner_microstep: 1332.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502
[2024-06-10 08:44:49,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.26 | bwd_microstep: 1319.62 | bwd_inner_microstep: 1319.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1980
[2024-06-10 08:44:50,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.04 | bwd_microstep: 706.97 | bwd_inner_microstep: 706.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3624
[2024-06-10 08:44:52,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.26 | bwd_microstep: 1444.17 | bwd_inner_microstep: 1444.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3829
[2024-06-10 08:44:54,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.01 | bwd_microstep: 1586.44 | bwd_inner_microstep: 1586.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3724
[2024-06-10 08:44:56,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.81 | bwd_microstep: 1465.58 | bwd_inner_microstep: 1465.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3694
[2024-06-10 08:44:58,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.69 | bwd_microstep: 1397.64 | bwd_inner_microstep: 1397.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 08:45:00,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1297.35 | bwd_inner_microstep: 1297.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 08:45:02,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.86 | bwd_microstep: 1460.41 | bwd_inner_microstep: 1460.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 08:45:04,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.72 | bwd_microstep: 1391.92 | bwd_inner_microstep: 1391.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 08:45:06,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.66 | bwd_microstep: 1398.22 | bwd_inner_microstep: 1398.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3638
[2024-06-10 08:45:08,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1346.97 | bwd_inner_microstep: 1346.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 08:45:10,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1391.81 | bwd_inner_microstep: 1391.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3599
[2024-06-10 08:45:12,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.62 | bwd_microstep: 1570.13 | bwd_inner_microstep: 1570.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 08:45:14,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.77 | bwd_microstep: 1377.17 | bwd_inner_microstep: 1377.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044
[2024-06-10 08:45:15,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.45 | bwd_microstep: 906.44 | bwd_inner_microstep: 906.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3810
[2024-06-10 08:45:17,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.99 | bwd_microstep: 1723.88 | bwd_inner_microstep: 1723.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 08:45:19,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.30 | bwd_microstep: 1506.73 | bwd_inner_microstep: 1506.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2116
[2024-06-10 08:45:24,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 08:45:24,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.21 | bwd_microstep: 3897.44 | bwd_inner_microstep: 1082.48 | bwd_allreduce_microstep: 2814.92 | step_microstep: 37.81
[2024-06-10 08:45:24,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15960.73 | bwd: 45661.96 | bwd_inner: 42846.10 | bwd_allreduce: 2815.16 | step: 39.52
{'loss': 1.3667, 'learning_rate': 3.423472169724719e-05, 'epoch': 0.27}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397
[2024-06-10 08:45:26,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.46 | bwd_microstep: 1393.77 | bwd_inner_microstep: 1393.54 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 08:45:27,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.54 | bwd_microstep: 786.81 | bwd_inner_microstep: 786.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3864
[2024-06-10 08:45:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.24 | bwd_microstep: 1559.56 | bwd_inner_microstep: 1559.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 08:45:31,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.96 | bwd_microstep: 1484.36 | bwd_inner_microstep: 1484.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 08:45:33,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.92 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 08:45:35,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.20 | bwd_microstep: 1428.61 | bwd_inner_microstep: 1428.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732
[2024-06-10 08:45:37,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.96 | bwd_microstep: 1630.06 | bwd_inner_microstep: 1630.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2188
[2024-06-10 08:45:38,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.51 | bwd_microstep: 955.13 | bwd_inner_microstep: 955.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3707
[2024-06-10 08:45:40,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1557.85 | bwd_inner_microstep: 1557.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1885
[2024-06-10 08:45:41,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.04 | bwd_microstep: 684.34 | bwd_inner_microstep: 684.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 08:45:43,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.85 | bwd_microstep: 796.20 | bwd_inner_microstep: 796.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622
[2024-06-10 08:45:44,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.19 | bwd_microstep: 1344.04 | bwd_inner_microstep: 1344.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 08:45:46,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.09 | bwd_microstep: 1288.76 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-10 08:45:48,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.50 | bwd_microstep: 1196.29 | bwd_inner_microstep: 1196.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2608
[2024-06-10 08:45:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.10 | bwd_microstep: 1204.64 | bwd_inner_microstep: 1204.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3638
[2024-06-10 08:45:52,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.57 | bwd_microstep: 1711.10 | bwd_inner_microstep: 1711.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461
[2024-06-10 08:45:54,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.27 | bwd_microstep: 1312.07 | bwd_inner_microstep: 1312.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 08:45:56,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1490.90 | bwd_inner_microstep: 1490.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 08:45:58,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.11 | bwd_microstep: 1528.42 | bwd_inner_microstep: 1528.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2010
[2024-06-10 08:45:59,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.67 | bwd_microstep: 832.44 | bwd_inner_microstep: 832.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291
[2024-06-10 08:46:00,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.04 | bwd_microstep: 913.13 | bwd_inner_microstep: 913.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 08:46:03,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.47 | bwd_microstep: 1659.48 | bwd_inner_microstep: 1659.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 08:46:05,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.52 | bwd_microstep: 1658.93 | bwd_inner_microstep: 1658.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 08:46:07,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.26 | bwd_microstep: 1746.77 | bwd_inner_microstep: 1746.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 08:46:09,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1400.27 | bwd_inner_microstep: 1400.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3428
[2024-06-10 08:46:11,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.78 | bwd_microstep: 1543.22 | bwd_inner_microstep: 1543.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-10 08:46:13,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.26 | bwd_microstep: 1535.03 | bwd_inner_microstep: 1535.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476
[2024-06-10 08:46:15,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.64 | bwd_microstep: 1441.98 | bwd_inner_microstep: 1441.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3574
[2024-06-10 08:46:17,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.91 | bwd_microstep: 1426.71 | bwd_inner_microstep: 1426.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3568
[2024-06-10 08:46:19,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1529.38 | bwd_inner_microstep: 1529.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-10 08:46:21,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.21 | bwd_microstep: 969.17 | bwd_inner_microstep: 969.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 08:46:24,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 08:46:24,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.72 | bwd_microstep: 2608.93 | bwd_inner_microstep: 1334.03 | bwd_allreduce_microstep: 1274.85 | step_microstep: 37.91
[2024-06-10 08:46:24,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15911.53 | bwd: 43998.87 | bwd_inner: 42722.93 | bwd_allreduce: 1275.16 | step: 39.54
{'loss': 1.3183, 'learning_rate': 3.420833102725415e-05, 'epoch': 0.27}
 27%|██▋       | 463/1726 [8:03:54<21:50:23, 62.25s/it]
 27%|██▋       | 464/1726 [8:04:54<21:38:07, 61.72s/it]
 27%|██▋       | 465/1726 [8:05:56<21:36:16, 61.68s/it]
 27%|██▋       | 466/1726 [8:06:58<21:40:09, 61.91s/it]
 27%|██▋       | 467/1726 [8:08:00<21:39:30, 61.93s/it]
 27%|██▋       | 468/1726 [8:09:01<21:27:59, 61.43s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 08:46:26,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.70 | bwd_microstep: 1478.84 | bwd_inner_microstep: 1478.64 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 08:46:28,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1407.81 | bwd_inner_microstep: 1407.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565
[2024-06-10 08:46:30,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.59 | bwd_microstep: 1329.74 | bwd_inner_microstep: 1329.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4202
[2024-06-10 08:46:32,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.56 | bwd_microstep: 1754.86 | bwd_inner_microstep: 1754.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 08:46:34,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1281.47 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 08:46:35,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.58 | bwd_microstep: 791.06 | bwd_inner_microstep: 791.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 08:46:37,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1392.79 | bwd_inner_microstep: 1392.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 08:46:39,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1287.84 | bwd_inner_microstep: 1287.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 08:46:41,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1399.98 | bwd_inner_microstep: 1399.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453
[2024-06-10 08:46:42,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.53 | bwd_microstep: 1288.06 | bwd_inner_microstep: 1288.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1918
[2024-06-10 08:46:44,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.39 | bwd_microstep: 810.77 | bwd_inner_microstep: 810.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3669
[2024-06-10 08:46:46,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.77 | bwd_microstep: 1719.40 | bwd_inner_microstep: 1719.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3638
[2024-06-10 08:46:48,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.20 | bwd_microstep: 1681.11 | bwd_inner_microstep: 1681.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 08:46:50,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1417.59 | bwd_inner_microstep: 1417.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 08:46:52,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.58 | bwd_microstep: 1278.38 | bwd_inner_microstep: 1278.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 08:46:54,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.49 | bwd_microstep: 1288.39 | bwd_inner_microstep: 1288.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 08:46:56,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.25 | bwd_microstep: 1389.57 | bwd_inner_microstep: 1389.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947
[2024-06-10 08:46:57,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.74 | bwd_microstep: 729.81 | bwd_inner_microstep: 729.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3550
[2024-06-10 08:46:58,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.59 | bwd_microstep: 1232.59 | bwd_inner_microstep: 1232.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 08:47:00,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.66 | bwd_microstep: 1452.69 | bwd_inner_microstep: 1452.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3491
[2024-06-10 08:47:02,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.18 | bwd_microstep: 1222.48 | bwd_inner_microstep: 1222.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3454
[2024-06-10 08:47:04,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.78 | bwd_microstep: 1192.73 | bwd_inner_microstep: 1192.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3706
[2024-06-10 08:47:06,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.92 | bwd_microstep: 1663.51 | bwd_inner_microstep: 1663.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829
[2024-06-10 08:47:08,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.58 | bwd_microstep: 1361.42 | bwd_inner_microstep: 1361.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 08:47:10,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.78 | bwd_microstep: 1647.68 | bwd_inner_microstep: 1647.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797
[2024-06-10 08:47:12,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.09 | bwd_microstep: 1541.87 | bwd_inner_microstep: 1541.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3727
[2024-06-10 08:47:14,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.68 | bwd_microstep: 1559.91 | bwd_inner_microstep: 1559.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 08:47:17,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1505.90 | bwd_inner_microstep: 1505.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3107
[2024-06-10 08:47:18,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.40 | bwd_microstep: 1247.21 | bwd_inner_microstep: 1247.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2267
[2024-06-10 08:47:20,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.10 | bwd_microstep: 1073.03 | bwd_inner_microstep: 1073.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603
[2024-06-10 08:47:22,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.87 | bwd_microstep: 1466.73 | bwd_inner_microstep: 1466.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2268
[2024-06-10 08:47:26,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 08:47:26,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.59 | bwd_microstep: 3675.16 | bwd_inner_microstep: 990.44 | bwd_allreduce_microstep: 2684.67 | step_microstep: 38.12
[2024-06-10 08:47:26,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16004.19 | bwd: 45570.40 | bwd_inner: 42884.68 | bwd_allreduce: 2684.98 | step: 39.96
{'loss': 1.2618, 'learning_rate': 3.4181890315593104e-05, 'epoch': 0.27}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1861
[2024-06-10 08:47:27,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.14 | bwd_microstep: 673.49 | bwd_inner_microstep: 673.38 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3877
[2024-06-10 08:47:29,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.32 | bwd_microstep: 1678.11 | bwd_inner_microstep: 1678.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 08:47:31,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.60 | bwd_microstep: 1551.22 | bwd_inner_microstep: 1551.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 08:47:33,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.72 | bwd_microstep: 1545.31 | bwd_inner_microstep: 1545.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 08:47:35,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1383.67 | bwd_inner_microstep: 1383.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1975
[2024-06-10 08:47:36,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.71 | bwd_microstep: 708.31 | bwd_inner_microstep: 708.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 08:47:38,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.31 | bwd_microstep: 1448.30 | bwd_inner_microstep: 1448.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-10 08:47:40,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.65 | bwd_microstep: 1190.40 | bwd_inner_microstep: 1190.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2086
[2024-06-10 08:47:41,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.31 | bwd_microstep: 728.95 | bwd_inner_microstep: 728.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1937
[2024-06-10 08:47:42,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.78 | bwd_microstep: 726.49 | bwd_inner_microstep: 726.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3496
[2024-06-10 08:47:44,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1317.58 | bwd_inner_microstep: 1317.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-10 08:47:46,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.74 | bwd_microstep: 1625.13 | bwd_inner_microstep: 1625.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3445
[2024-06-10 08:47:48,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.70 | bwd_microstep: 1478.51 | bwd_inner_microstep: 1478.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 08:47:50,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1485.50 | bwd_inner_microstep: 1485.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 08:47:52,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1510.35 | bwd_inner_microstep: 1510.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1904
[2024-06-10 08:47:53,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.71 | bwd_microstep: 778.46 | bwd_inner_microstep: 778.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938
[2024-06-10 08:47:54,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.93 | bwd_microstep: 728.93 | bwd_inner_microstep: 728.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 08:47:56,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.50 | bwd_microstep: 1523.38 | bwd_inner_microstep: 1523.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 08:47:58,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1509.33 | bwd_inner_microstep: 1509.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3554
[2024-06-10 08:48:00,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.72 | bwd_microstep: 1359.22 | bwd_inner_microstep: 1359.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 08:48:02,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.42 | bwd_microstep: 1528.97 | bwd_inner_microstep: 1528.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 08:48:04,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1396.73 | bwd_inner_microstep: 1396.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3793
[2024-06-10 08:48:06,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.76 | bwd_microstep: 1357.49 | bwd_inner_microstep: 1357.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 08:48:08,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1280.64 | bwd_inner_microstep: 1280.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-10 08:48:09,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.60 | bwd_microstep: 973.76 | bwd_inner_microstep: 973.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3669
[2024-06-10 08:48:11,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.31 | bwd_microstep: 1387.06 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 08:48:13,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.92 | bwd_microstep: 1462.54 | bwd_inner_microstep: 1462.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3725
[2024-06-10 08:48:15,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1512.43 | bwd_inner_microstep: 1512.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 08:48:17,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.90 | bwd_microstep: 1284.36 | bwd_inner_microstep: 1284.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-10 08:48:19,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1420.73 | bwd_inner_microstep: 1420.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 08:48:21,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.54 | bwd_microstep: 1544.33 | bwd_inner_microstep: 1544.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-10 08:48:26,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.36 | optimizer_step: 6.58
[2024-06-10 08:48:26,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.36 | bwd_microstep: 4207.43 | bwd_inner_microstep: 1613.34 | bwd_allreduce_microstep: 2594.01 | step_microstep: 38.94
[2024-06-10 08:48:26,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15572.72 | bwd: 44307.12 | bwd_inner: 41712.09 | bwd_allreduce: 2594.30 | step: 41.59
{'loss': 1.2858, 'learning_rate': 3.4155399655388076e-05, 'epoch': 0.27}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381
[2024-06-10 08:48:28,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.87 | bwd_microstep: 1232.51 | bwd_inner_microstep: 1232.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3906
[2024-06-10 08:48:30,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1515.66 | bwd_inner_microstep: 1515.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-10 08:48:31,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.67 | bwd_microstep: 675.77 | bwd_inner_microstep: 675.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1856
[2024-06-10 08:48:32,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 258.70 | bwd_microstep: 672.72 | bwd_inner_microstep: 672.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863
[2024-06-10 08:48:33,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.55 | bwd_microstep: 676.64 | bwd_inner_microstep: 676.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 08:48:35,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.83 | bwd_microstep: 1487.22 | bwd_inner_microstep: 1487.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 08:48:37,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.87 | bwd_microstep: 1279.76 | bwd_inner_microstep: 1279.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-10 08:48:39,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.41 | bwd_microstep: 1633.05 | bwd_inner_microstep: 1633.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 08:48:41,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1380.70 | bwd_inner_microstep: 1380.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 08:48:42,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.31 | bwd_microstep: 1283.35 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 08:48:44,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1349.77 | bwd_inner_microstep: 1349.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554
[2024-06-10 08:48:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1301.77 | bwd_inner_microstep: 1301.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 08:48:48,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1348.55 | bwd_inner_microstep: 1348.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4002
[2024-06-10 08:48:50,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.82 | bwd_microstep: 1809.52 | bwd_inner_microstep: 1809.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-10 08:48:53,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.36 | bwd_microstep: 1558.21 | bwd_inner_microstep: 1558.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834
[2024-06-10 08:48:55,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.09 | bwd_microstep: 1647.56 | bwd_inner_microstep: 1647.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 08:48:57,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.60 | bwd_microstep: 1256.24 | bwd_inner_microstep: 1256.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1999
[2024-06-10 08:48:58,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.98 | bwd_microstep: 863.14 | bwd_inner_microstep: 863.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649
[2024-06-10 08:49:00,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1428.25 | bwd_inner_microstep: 1428.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3607
[2024-06-10 08:49:02,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 1553.75 | bwd_inner_microstep: 1553.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 08:49:04,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.30 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 08:49:06,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.91 | bwd_microstep: 1382.11 | bwd_inner_microstep: 1382.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3833
[2024-06-10 08:49:08,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1356.57 | bwd_inner_microstep: 1356.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 08:49:09,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1388.73 | bwd_inner_microstep: 1388.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 08:49:12,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.11 | bwd_microstep: 1507.29 | bwd_inner_microstep: 1507.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 08:49:13,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.77 | bwd_microstep: 1404.46 | bwd_inner_microstep: 1404.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3679
[2024-06-10 08:49:15,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.36 | bwd_microstep: 1460.80 | bwd_inner_microstep: 1460.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3698
[2024-06-10 08:49:18,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.86 | bwd_microstep: 1554.92 | bwd_inner_microstep: 1554.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-10 08:49:19,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.56 | bwd_microstep: 809.29 | bwd_inner_microstep: 809.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3630
[2024-06-10 08:49:20,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.74 | bwd_microstep: 1216.69 | bwd_inner_microstep: 1216.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 08:49:22,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1285.55 | bwd_inner_microstep: 1285.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 08:49:26,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 08:49:26,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 3036.65 | bwd_inner_microstep: 1568.42 | bwd_allreduce_microstep: 1468.18 | step_microstep: 38.13
[2024-06-10 08:49:26,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15767.86 | bwd: 43641.70 | bwd_inner: 42172.60 | bwd_allreduce: 1468.41 | step: 40.01
{'loss': 1.3368, 'learning_rate': 3.412885913993905e-05, 'epoch': 0.27}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-10 08:49:28,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.23 | bwd_microstep: 1277.32 | bwd_inner_microstep: 1277.20 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.23
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3853
[2024-06-10 08:49:30,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.03 | bwd_microstep: 1588.79 | bwd_inner_microstep: 1588.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 08:49:32,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.43 | bwd_microstep: 1552.75 | bwd_inner_microstep: 1552.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3504
[2024-06-10 08:49:34,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.63 | bwd_microstep: 1347.97 | bwd_inner_microstep: 1347.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4119
[2024-06-10 08:49:36,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.38 | bwd_microstep: 1734.70 | bwd_inner_microstep: 1734.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 08:49:38,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1245.11 | bwd_inner_microstep: 1245.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 08:49:40,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1246.13 | bwd_inner_microstep: 1246.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 08:49:41,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1299.01 | bwd_inner_microstep: 1298.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 08:49:43,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 792.88 | bwd_inner_microstep: 792.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 08:49:44,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.36 | bwd_microstep: 1150.68 | bwd_inner_microstep: 1150.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4047
[2024-06-10 08:49:46,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.04 | bwd_microstep: 1521.14 | bwd_inner_microstep: 1521.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4067
[2024-06-10 08:49:48,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.59 | bwd_microstep: 1646.80 | bwd_inner_microstep: 1646.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2122
[2024-06-10 08:49:50,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.72 | bwd_microstep: 827.31 | bwd_inner_microstep: 827.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 08:49:51,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1352.34 | bwd_inner_microstep: 1352.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667
[2024-06-10 08:49:54,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.28 | bwd_microstep: 1623.84 | bwd_inner_microstep: 1623.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1919
[2024-06-10 08:49:55,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.23 | bwd_microstep: 874.83 | bwd_inner_microstep: 874.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3434
[2024-06-10 08:49:57,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.41 | bwd_microstep: 1295.16 | bwd_inner_microstep: 1295.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 08:49:58,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.93 | bwd_microstep: 796.79 | bwd_inner_microstep: 796.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 08:50:00,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.01 | bwd_microstep: 1285.48 | bwd_inner_microstep: 1285.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2144
[2024-06-10 08:50:01,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.58 | bwd_microstep: 834.12 | bwd_inner_microstep: 834.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 08:50:02,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.38 | bwd_microstep: 1186.24 | bwd_inner_microstep: 1186.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 08:50:05,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.54 | bwd_microstep: 1512.24 | bwd_inner_microstep: 1512.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 08:50:06,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.07 | bwd_microstep: 974.27 | bwd_inner_microstep: 974.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1905
[2024-06-10 08:50:07,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.83 | bwd_microstep: 685.78 | bwd_inner_microstep: 685.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3796
[2024-06-10 08:50:09,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.49 | bwd_microstep: 1482.14 | bwd_inner_microstep: 1482.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451
[2024-06-10 08:50:11,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.12 | bwd_microstep: 1382.14 | bwd_inner_microstep: 1382.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-10 08:50:13,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.21 | bwd_microstep: 1550.03 | bwd_inner_microstep: 1550.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 08:50:15,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.47 | bwd_microstep: 1502.24 | bwd_inner_microstep: 1502.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3807
[2024-06-10 08:50:17,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1498.91 | bwd_inner_microstep: 1498.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 08:50:19,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.67 | bwd_microstep: 1498.14 | bwd_inner_microstep: 1498.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 2948
[2024-06-10 08:50:21,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.79 | bwd_microstep: 1360.09 | bwd_inner_microstep: 1360.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3593
[2024-06-10 08:50:28,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 08:50:28,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.46 | bwd_microstep: 5988.72 | bwd_inner_microstep: 2012.95 | bwd_allreduce_microstep: 3975.72 | step_microstep: 38.27
[2024-06-10 08:50:28,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15537.50 | bwd: 45914.12 | bwd_inner: 41937.39 | bwd_allreduce: 3976.00 | step: 39.91
{'loss': 1.3039, 'learning_rate': 3.410226886272159e-05, 'epoch': 0.27}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 08:50:29,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.53 | bwd_microstep: 1367.69 | bwd_inner_microstep: 1367.62 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4038
[2024-06-10 08:50:32,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1615.61 | bwd_inner_microstep: 1615.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3865
[2024-06-10 08:50:34,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.41 | bwd_microstep: 1362.25 | bwd_inner_microstep: 1362.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3795
[2024-06-10 08:50:36,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1398.84 | bwd_inner_microstep: 1398.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 08:50:38,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.24 | bwd_microstep: 1480.01 | bwd_inner_microstep: 1479.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 08:50:39,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1346.27 | bwd_inner_microstep: 1346.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 08:50:42,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.40 | bwd_microstep: 1541.49 | bwd_inner_microstep: 1541.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 08:50:43,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1247.12 | bwd_inner_microstep: 1247.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-10 08:50:45,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.12 | bwd_microstep: 1182.53 | bwd_inner_microstep: 1182.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3609
[2024-06-10 08:50:47,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.55 | bwd_microstep: 1372.39 | bwd_inner_microstep: 1372.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-10 08:50:49,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1320.15 | bwd_inner_microstep: 1320.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 08:50:50,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.96 | bwd_microstep: 1311.57 | bwd_inner_microstep: 1311.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 08:50:52,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1447.22 | bwd_inner_microstep: 1447.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 08:50:54,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.78 | bwd_microstep: 788.14 | bwd_inner_microstep: 788.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 08:50:56,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.47 | bwd_microstep: 1519.51 | bwd_inner_microstep: 1519.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 08:50:57,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.21 | bwd_microstep: 1273.88 | bwd_inner_microstep: 1273.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-10 08:50:59,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.67 | bwd_microstep: 1339.02 | bwd_inner_microstep: 1338.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 08:51:01,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.72 | bwd_microstep: 1189.86 | bwd_inner_microstep: 1189.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3642
[2024-06-10 08:51:03,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.06 | bwd_microstep: 1543.47 | bwd_inner_microstep: 1543.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636
[2024-06-10 08:51:05,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1614.25 | bwd_inner_microstep: 1614.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2733
[2024-06-10 08:51:07,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.46 | bwd_microstep: 946.85 | bwd_inner_microstep: 946.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2405
[2024-06-10 08:51:08,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.90 | bwd_microstep: 842.43 | bwd_inner_microstep: 842.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 08:51:10,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.07 | bwd_microstep: 1617.00 | bwd_inner_microstep: 1616.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2254
[2024-06-10 08:51:11,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.92 | bwd_microstep: 968.51 | bwd_inner_microstep: 968.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3825
[2024-06-10 08:51:14,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.57 | bwd_microstep: 1727.27 | bwd_inner_microstep: 1727.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3613
[2024-06-10 08:51:15,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.67 | bwd_microstep: 1310.27 | bwd_inner_microstep: 1310.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 08:51:18,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1559.78 | bwd_inner_microstep: 1559.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3835
[2024-06-10 08:51:20,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.02 | bwd_microstep: 1760.55 | bwd_inner_microstep: 1760.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 08:51:22,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.08 | bwd_microstep: 1397.24 | bwd_inner_microstep: 1397.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 08:51:24,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.75 | bwd_microstep: 1494.59 | bwd_inner_microstep: 1494.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 08:51:26,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.41 | bwd_microstep: 1651.94 | bwd_inner_microstep: 1651.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3852
[2024-06-10 08:51:29,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-10 08:51:29,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.21 | bwd_microstep: 1602.55 | bwd_inner_microstep: 1594.82 | bwd_allreduce_microstep: 7.68 | step_microstep: 37.54
[2024-06-10 08:51:29,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16495.59 | bwd: 44140.29 | bwd_inner: 44131.65 | bwd_allreduce: 7.93 | step: 39.07
{'loss': 1.3178, 'learning_rate': 3.40756289173865e-05, 'epoch': 0.27}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 08:51:30,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1343.98 | bwd_inner_microstep: 1343.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 08:51:32,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.38 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3846
[2024-06-10 08:51:35,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.68 | bwd_microstep: 1611.48 | bwd_inner_microstep: 1611.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2278
[2024-06-10 08:51:36,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.20 | bwd_microstep: 935.88 | bwd_inner_microstep: 935.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 08:51:37,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 791.78 | bwd_inner_microstep: 791.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753
[2024-06-10 08:51:39,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.01 | bwd_microstep: 1472.10 | bwd_inner_microstep: 1472.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 08:51:41,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.51 | bwd_microstep: 1283.70 | bwd_inner_microstep: 1283.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 08:51:43,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.55 | bwd_microstep: 1286.43 | bwd_inner_microstep: 1286.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 08:51:44,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1255.45 | bwd_inner_microstep: 1255.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 08:51:46,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.92 | bwd_microstep: 1154.73 | bwd_inner_microstep: 1154.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2685
[2024-06-10 08:51:47,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.42 | bwd_microstep: 999.42 | bwd_inner_microstep: 999.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937
[2024-06-10 08:51:48,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.46 | bwd_microstep: 760.30 | bwd_inner_microstep: 760.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1975
[2024-06-10 08:51:49,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.20 | bwd_microstep: 734.82 | bwd_inner_microstep: 734.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3544
[2024-06-10 08:51:51,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.63 | bwd_microstep: 1448.63 | bwd_inner_microstep: 1448.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 08:51:53,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.42 | bwd_microstep: 1384.62 | bwd_inner_microstep: 1384.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3468
[2024-06-10 08:51:55,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.85 | bwd_microstep: 1438.74 | bwd_inner_microstep: 1438.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 08:51:57,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1493.03 | bwd_inner_microstep: 1493.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 08:51:59,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1380.72 | bwd_inner_microstep: 1380.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 08:52:01,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.71 | bwd_microstep: 1560.90 | bwd_inner_microstep: 1560.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3818
[2024-06-10 08:52:03,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.32 | bwd_microstep: 1480.90 | bwd_inner_microstep: 1480.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2032
[2024-06-10 08:52:04,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.21 | bwd_microstep: 716.15 | bwd_inner_microstep: 716.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 08:52:07,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.53 | bwd_microstep: 1536.27 | bwd_inner_microstep: 1536.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 08:52:08,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1287.66 | bwd_inner_microstep: 1287.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279
[2024-06-10 08:52:10,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.46 | bwd_microstep: 879.01 | bwd_inner_microstep: 878.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 08:52:12,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.80 | bwd_microstep: 1658.83 | bwd_inner_microstep: 1658.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 08:52:14,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.09 | bwd_microstep: 1430.28 | bwd_inner_microstep: 1430.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3553
[2024-06-10 08:52:16,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.43 | bwd_microstep: 1697.23 | bwd_inner_microstep: 1697.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 08:52:18,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.86 | bwd_microstep: 1654.58 | bwd_inner_microstep: 1654.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023
[2024-06-10 08:52:20,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.67 | bwd_microstep: 805.67 | bwd_inner_microstep: 805.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2278
[2024-06-10 08:52:21,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.93 | bwd_microstep: 1004.02 | bwd_inner_microstep: 1003.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 08:52:23,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.36 | bwd_microstep: 1454.26 | bwd_inner_microstep: 1454.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 08:52:29,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.31 | optimizer_step: 6.65
[2024-06-10 08:52:29,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.48 | bwd_microstep: 5326.72 | bwd_inner_microstep: 1547.83 | bwd_allreduce_microstep: 3778.83 | step_microstep: 38.38
[2024-06-10 08:52:29,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15236.65 | bwd: 44655.71 | bwd_inner: 40875.92 | bwd_allreduce: 3779.07 | step: 39.96
21:27:59, 61.43s/it]
 27%|██▋       | 469/1726 [8:10:03<21:30:05, 61.58s/it]
 27%|██▋       | 470/1726 [8:11:03<21:20:33, 61.17s/it]
 27%|██▋       | 471/1726 [8:12:03<21:10:39, 60.75s/it]
 27%|██▋       | 472/1726 [8:13:04<21:16:07, 61.06s/it]
 27%|██▋       | 473/1726 [8:14:05<21:14:35, 61.03s/it]
 27%|██▋       | 474/1726 [8:15:06<21:08:34, 60.79s/it]
{'loss': 1.2566, 'learning_rate': 3.404893939775955e-05, 'epoch': 0.27}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 08:52:31,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1241.35 | bwd_inner_microstep: 1241.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 08:52:32,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.60 | bwd_microstep: 786.67 | bwd_inner_microstep: 786.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 08:52:34,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1506.77 | bwd_inner_microstep: 1506.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 08:52:35,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 08:52:37,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.91 | bwd_microstep: 1386.01 | bwd_inner_microstep: 1385.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 08:52:39,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.80 | bwd_microstep: 1283.80 | bwd_inner_microstep: 1283.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 08:52:41,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1388.31 | bwd_inner_microstep: 1388.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 08:52:43,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.67 | bwd_microstep: 1385.50 | bwd_inner_microstep: 1385.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 08:52:45,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.09 | bwd_microstep: 1294.50 | bwd_inner_microstep: 1294.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 08:52:47,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1399.85 | bwd_inner_microstep: 1399.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1904
[2024-06-10 08:52:48,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.21 | bwd_microstep: 808.67 | bwd_inner_microstep: 808.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3569
[2024-06-10 08:52:50,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.13 | bwd_microstep: 1447.64 | bwd_inner_microstep: 1447.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 08:52:52,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1345.12 | bwd_inner_microstep: 1345.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646
[2024-06-10 08:52:54,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 1607.04 | bwd_inner_microstep: 1607.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-10 08:52:56,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.43 | bwd_microstep: 1584.62 | bwd_inner_microstep: 1584.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 08:52:58,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.73 | bwd_microstep: 1308.18 | bwd_inner_microstep: 1308.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 08:53:00,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1255.19 | bwd_inner_microstep: 1255.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 08:53:01,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.83 | bwd_microstep: 1289.42 | bwd_inner_microstep: 1289.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3429
[2024-06-10 08:53:03,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.54 | bwd_microstep: 1410.46 | bwd_inner_microstep: 1410.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3467
[2024-06-10 08:53:05,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1425.83 | bwd_inner_microstep: 1425.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2186
[2024-06-10 08:53:07,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.70 | bwd_microstep: 859.06 | bwd_inner_microstep: 859.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 08:53:08,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1404.04 | bwd_inner_microstep: 1404.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 08:53:10,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.32 | bwd_microstep: 1484.92 | bwd_inner_microstep: 1484.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3814
[2024-06-10 08:53:13,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.90 | bwd_microstep: 1502.66 | bwd_inner_microstep: 1502.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2288
[2024-06-10 08:53:14,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.63 | bwd_microstep: 1041.06 | bwd_inner_microstep: 1041.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 08:53:16,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1296.16 | bwd_inner_microstep: 1296.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 08:53:18,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1559.82 | bwd_inner_microstep: 1559.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702
[2024-06-10 08:53:20,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.75 | bwd_microstep: 1427.07 | bwd_inner_microstep: 1427.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 08:53:22,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.59 | bwd_microstep: 1189.98 | bwd_inner_microstep: 1189.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 08:53:24,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1396.99 | bwd_inner_microstep: 1396.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3567
[2024-06-10 08:53:26,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.45 | bwd_microstep: 1590.48 | bwd_inner_microstep: 1590.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2230
[2024-06-10 08:53:30,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 08:53:30,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.61 | bwd_microstep: 4328.14 | bwd_inner_microstep: 946.04 | bwd_allreduce_microstep: 3382.05 | step_microstep: 37.96
[2024-06-10 08:53:30,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15776.22 | bwd: 45481.63 | bwd_inner: 42098.67 | bwd_allreduce: 3382.28 | step: 39.49
{'loss': 1.2282, 'learning_rate': 3.4022200397841056e-05, 'epoch': 0.28}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 08:53:32,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1374.27 | bwd_inner_microstep: 1374.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4574
[2024-06-10 08:53:35,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.77 | bwd_microstep: 1686.12 | bwd_inner_microstep: 1686.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 08:53:36,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.56 | bwd_microstep: 1341.24 | bwd_inner_microstep: 1341.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1939
[2024-06-10 08:53:38,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.03 | bwd_microstep: 759.56 | bwd_inner_microstep: 759.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3567
[2024-06-10 08:53:39,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.35 | bwd_microstep: 1262.35 | bwd_inner_microstep: 1262.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1955
[2024-06-10 08:53:40,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.51 | bwd_microstep: 728.71 | bwd_inner_microstep: 728.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3749
[2024-06-10 08:53:43,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.10 | bwd_microstep: 1635.98 | bwd_inner_microstep: 1635.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2181
[2024-06-10 08:53:44,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.94 | bwd_microstep: 887.16 | bwd_inner_microstep: 887.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 08:53:46,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.00 | bwd_microstep: 1380.18 | bwd_inner_microstep: 1380.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 08:53:48,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.43 | bwd_microstep: 1383.79 | bwd_inner_microstep: 1383.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694
[2024-06-10 08:53:50,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.12 | bwd_microstep: 1639.36 | bwd_inner_microstep: 1639.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-10 08:53:52,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.10 | bwd_microstep: 1430.10 | bwd_inner_microstep: 1430.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 08:53:54,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1449.62 | bwd_inner_microstep: 1449.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3785
[2024-06-10 08:53:56,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.55 | bwd_microstep: 1602.77 | bwd_inner_microstep: 1602.73 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 08:53:58,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1241.85 | bwd_inner_microstep: 1241.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3382
[2024-06-10 08:53:59,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.58 | bwd_microstep: 1175.98 | bwd_inner_microstep: 1175.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3703
[2024-06-10 08:54:02,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.41 | bwd_microstep: 1727.76 | bwd_inner_microstep: 1727.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3428
[2024-06-10 08:54:04,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1410.77 | bwd_inner_microstep: 1410.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3674
[2024-06-10 08:54:05,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.91 | bwd_microstep: 1292.96 | bwd_inner_microstep: 1292.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 08:54:07,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.37 | bwd_microstep: 1393.94 | bwd_inner_microstep: 1393.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3574
[2024-06-10 08:54:10,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1551.88 | bwd_inner_microstep: 1551.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 08:54:11,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635
[2024-06-10 08:54:13,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1419.36 | bwd_inner_microstep: 1419.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 08:54:16,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.39 | bwd_microstep: 1508.18 | bwd_inner_microstep: 1508.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 08:54:18,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.52 | bwd_microstep: 1459.67 | bwd_inner_microstep: 1459.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 08:54:20,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.34 | bwd_microstep: 1556.87 | bwd_inner_microstep: 1556.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 08:54:22,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.97 | bwd_microstep: 1394.59 | bwd_inner_microstep: 1394.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 08:54:24,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.35 | bwd_microstep: 1512.45 | bwd_inner_microstep: 1512.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 08:54:25,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1282.90 | bwd_inner_microstep: 1282.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-10 08:54:27,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1310.67 | bwd_inner_microstep: 1310.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 08:54:29,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.12 | bwd_microstep: 1312.93 | bwd_inner_microstep: 1312.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 08:54:33,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 08:54:33,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.52 | bwd_microstep: 3371.92 | bwd_inner_microstep: 1439.08 | bwd_allreduce_microstep: 1932.79 | step_microstep: 38.02
[2024-06-10 08:54:33,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16400.96 | bwd: 45881.55 | bwd_inner: 43947.84 | bwd_allreduce: 1933.03 | step: 39.48
{'loss': 1.2975, 'learning_rate': 3.3995412011805657e-05, 'epoch': 0.28}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 08:54:35,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.96 | bwd_microstep: 1338.55 | bwd_inner_microstep: 1338.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3941
[2024-06-10 08:54:37,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.34 | bwd_microstep: 1694.21 | bwd_inner_microstep: 1694.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2384
[2024-06-10 08:54:38,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.82 | bwd_microstep: 899.71 | bwd_inner_microstep: 899.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 08:54:41,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.21 | bwd_microstep: 1547.47 | bwd_inner_microstep: 1547.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1944
[2024-06-10 08:54:42,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.79 | bwd_microstep: 758.40 | bwd_inner_microstep: 758.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3416
[2024-06-10 08:54:43,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.18 | bwd_microstep: 1181.35 | bwd_inner_microstep: 1181.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3406
[2024-06-10 08:54:45,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1292.97 | bwd_inner_microstep: 1292.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 08:54:47,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.27 | bwd_microstep: 1249.50 | bwd_inner_microstep: 1249.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797
[2024-06-10 08:54:49,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.85 | bwd_microstep: 1548.53 | bwd_inner_microstep: 1548.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-10 08:54:51,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.34 | bwd_microstep: 1536.68 | bwd_inner_microstep: 1536.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 08:54:53,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.25 | bwd_microstep: 1427.07 | bwd_inner_microstep: 1427.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2089
[2024-06-10 08:54:54,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.20 | bwd_microstep: 729.28 | bwd_inner_microstep: 729.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628
[2024-06-10 08:54:56,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.68 | bwd_microstep: 1410.24 | bwd_inner_microstep: 1410.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3634
[2024-06-10 08:54:58,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.60 | bwd_microstep: 1542.49 | bwd_inner_microstep: 1542.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2127
[2024-06-10 08:55:00,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.36 | bwd_microstep: 1022.03 | bwd_inner_microstep: 1022.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2894
[2024-06-10 08:55:01,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.00 | bwd_microstep: 1273.97 | bwd_inner_microstep: 1273.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 08:55:03,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1509.28 | bwd_inner_microstep: 1509.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 08:55:05,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.01 | bwd_microstep: 1452.92 | bwd_inner_microstep: 1452.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1980
[2024-06-10 08:55:07,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.54 | bwd_microstep: 843.99 | bwd_inner_microstep: 843.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3543
[2024-06-10 08:55:08,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.48 | bwd_microstep: 1355.69 | bwd_inner_microstep: 1355.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3610
[2024-06-10 08:55:10,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.02 | bwd_microstep: 1373.00 | bwd_inner_microstep: 1372.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3716
[2024-06-10 08:55:12,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.86 | bwd_microstep: 1340.93 | bwd_inner_microstep: 1340.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 08:55:14,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.77 | bwd_microstep: 1609.76 | bwd_inner_microstep: 1609.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 08:55:17,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.82 | bwd_microstep: 1653.89 | bwd_inner_microstep: 1653.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2280
[2024-06-10 08:55:18,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.77 | bwd_microstep: 1006.14 | bwd_inner_microstep: 1006.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 08:55:20,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.22 | bwd_microstep: 1605.58 | bwd_inner_microstep: 1605.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143
[2024-06-10 08:55:22,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.09 | bwd_microstep: 932.54 | bwd_inner_microstep: 932.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3526
[2024-06-10 08:55:23,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1365.09 | bwd_inner_microstep: 1365.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702
[2024-06-10 08:55:26,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.80 | bwd_microstep: 1631.79 | bwd_inner_microstep: 1631.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 08:55:27,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1298.39 | bwd_inner_microstep: 1298.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 08:55:30,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.31 | bwd_microstep: 1556.82 | bwd_inner_microstep: 1556.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-10 08:55:35,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.33 | optimizer_step: 6.63
[2024-06-10 08:55:35,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.56 | bwd_microstep: 4534.88 | bwd_inner_microstep: 1637.54 | bwd_allreduce_microstep: 2897.27 | step_microstep: 38.70
[2024-06-10 08:55:35,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15844.67 | bwd: 45523.16 | bwd_inner: 42624.96 | bwd_allreduce: 2897.51 | step: 40.25
{'loss': 1.3316, 'learning_rate': 3.396857433400192e-05, 'epoch': 0.28}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 08:55:37,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.42 | bwd_microstep: 1463.00 | bwd_inner_microstep: 1462.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3921
[2024-06-10 08:55:39,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1691.02 | bwd_inner_microstep: 1691.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3877
[2024-06-10 08:55:41,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.91 | bwd_microstep: 1678.41 | bwd_inner_microstep: 1678.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 08:55:43,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1388.97 | bwd_inner_microstep: 1388.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 08:55:45,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1381.33 | bwd_inner_microstep: 1381.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 08:55:47,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.60 | bwd_microstep: 1281.40 | bwd_inner_microstep: 1281.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 08:55:49,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1277.51 | bwd_inner_microstep: 1277.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 08:55:50,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.72 | bwd_microstep: 1146.64 | bwd_inner_microstep: 1146.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 08:55:52,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1252.12 | bwd_inner_microstep: 1252.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 08:55:54,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.82 | bwd_microstep: 1344.14 | bwd_inner_microstep: 1344.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3483
[2024-06-10 08:55:56,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.57 | bwd_microstep: 1544.09 | bwd_inner_microstep: 1544.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2164
[2024-06-10 08:55:57,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.28 | bwd_microstep: 949.63 | bwd_inner_microstep: 949.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2659
[2024-06-10 08:55:59,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.60 | bwd_microstep: 1021.13 | bwd_inner_microstep: 1021.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1978
[2024-06-10 08:56:00,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.81 | bwd_microstep: 828.17 | bwd_inner_microstep: 828.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 08:56:02,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.49 | bwd_microstep: 1509.66 | bwd_inner_microstep: 1509.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 08:56:04,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.48 | bwd_microstep: 1158.27 | bwd_inner_microstep: 1158.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1905
[2024-06-10 08:56:05,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.86 | bwd_microstep: 684.14 | bwd_inner_microstep: 684.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 08:56:07,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1382.50 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2447
[2024-06-10 08:56:08,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.38 | bwd_microstep: 1048.06 | bwd_inner_microstep: 1048.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 08:56:10,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.57 | bwd_microstep: 1255.96 | bwd_inner_microstep: 1255.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637
[2024-06-10 08:56:12,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1513.48 | bwd_inner_microstep: 1513.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 08:56:14,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.08 | bwd_microstep: 1660.07 | bwd_inner_microstep: 1660.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1400
[2024-06-10 08:56:15,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 206.40 | bwd_microstep: 529.05 | bwd_inner_microstep: 529.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 08:56:17,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.27 | bwd_microstep: 1503.29 | bwd_inner_microstep: 1503.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-10 08:56:19,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1419.53 | bwd_inner_microstep: 1419.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3470
[2024-06-10 08:56:21,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.51 | bwd_microstep: 1428.52 | bwd_inner_microstep: 1428.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2223
[2024-06-10 08:56:22,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.99 | bwd_microstep: 925.87 | bwd_inner_microstep: 925.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 08:56:25,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.04 | bwd_microstep: 1751.31 | bwd_inner_microstep: 1751.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-10 08:56:27,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.81 | bwd_microstep: 1753.91 | bwd_inner_microstep: 1753.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 08:56:29,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.09 | bwd_microstep: 1405.09 | bwd_inner_microstep: 1405.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-10 08:56:31,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.29 | bwd_microstep: 1600.49 | bwd_inner_microstep: 1600.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 08:56:36,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.63
[2024-06-10 08:56:36,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.33 | bwd_microstep: 3826.82 | bwd_inner_microstep: 1667.15 | bwd_allreduce_microstep: 2159.62 | step_microstep: 37.98
[2024-06-10 08:56:36,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15823.27 | bwd: 44603.61 | bwd_inner: 42443.06 | bwd_allreduce: 2159.85 | step: 39.63
{'loss': 1.267, 'learning_rate': 3.394168745895199e-05, 'epoch': 0.28}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 08:56:38,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.73 | bwd_microstep: 1480.35 | bwd_inner_microstep: 1480.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4004
[2024-06-10 08:56:40,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.24 | bwd_microstep: 1605.68 | bwd_inner_microstep: 1605.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3399
[2024-06-10 08:56:41,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.16 | bwd_microstep: 1145.53 | bwd_inner_microstep: 1145.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-10 08:56:43,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.37 | bwd_microstep: 1523.42 | bwd_inner_microstep: 1523.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2185
[2024-06-10 08:56:45,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.39 | bwd_microstep: 951.32 | bwd_inner_microstep: 951.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 08:56:46,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.03 | bwd_microstep: 788.98 | bwd_inner_microstep: 788.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-10 08:56:48,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.72 | bwd_microstep: 1429.96 | bwd_inner_microstep: 1429.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 08:56:50,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.08 | bwd_microstep: 1390.41 | bwd_inner_microstep: 1390.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 08:56:52,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1376.79 | bwd_inner_microstep: 1376.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680
[2024-06-10 08:56:54,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.40 | bwd_microstep: 1616.55 | bwd_inner_microstep: 1616.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891
[2024-06-10 08:56:55,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.02 | bwd_microstep: 776.12 | bwd_inner_microstep: 776.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 08:56:57,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.76 | bwd_microstep: 1375.57 | bwd_inner_microstep: 1375.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 08:56:59,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.34 | bwd_microstep: 1312.50 | bwd_inner_microstep: 1312.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3651
[2024-06-10 08:57:01,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.86 | bwd_microstep: 1451.77 | bwd_inner_microstep: 1451.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3642
[2024-06-10 08:57:02,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.08 | bwd_microstep: 1315.75 | bwd_inner_microstep: 1315.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3761
[2024-06-10 08:57:04,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.97 | bwd_microstep: 1403.65 | bwd_inner_microstep: 1403.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 08:57:06,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.86 | bwd_microstep: 1405.12 | bwd_inner_microstep: 1405.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1413
[2024-06-10 08:57:07,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 205.91 | bwd_microstep: 531.80 | bwd_inner_microstep: 531.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 08:57:08,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.23 | bwd_microstep: 795.91 | bwd_inner_microstep: 795.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-10 08:57:09,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.20 | bwd_microstep: 801.12 | bwd_inner_microstep: 801.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2351
[2024-06-10 08:57:11,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.29 | bwd_microstep: 926.78 | bwd_inner_microstep: 926.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 08:57:13,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.46 | bwd_microstep: 1453.21 | bwd_inner_microstep: 1453.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 08:57:15,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1396.84 | bwd_inner_microstep: 1396.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-10 08:57:17,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.21 | bwd_microstep: 1542.06 | bwd_inner_microstep: 1542.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3601
[2024-06-10 08:57:19,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.20 | bwd_microstep: 1537.70 | bwd_inner_microstep: 1537.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011
[2024-06-10 08:57:20,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.46 | bwd_microstep: 803.77 | bwd_inner_microstep: 803.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 08:57:22,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.82 | bwd_microstep: 1497.28 | bwd_inner_microstep: 1497.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2177
[2024-06-10 08:57:23,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.65 | bwd_microstep: 1049.59 | bwd_inner_microstep: 1049.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2278
[2024-06-10 08:57:25,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.21 | bwd_microstep: 1006.67 | bwd_inner_microstep: 1006.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752
[2024-06-10 08:57:27,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.74 | bwd_microstep: 1540.14 | bwd_inner_microstep: 1540.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3569
[2024-06-10 08:57:29,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.26 | bwd_microstep: 1443.83 | bwd_inner_microstep: 1443.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 08:57:38,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 19.01 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 08:57:38,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 8442.47 | bwd_inner_microstep: 1689.88 | bwd_allreduce_microstep: 6752.53 | step_microstep: 40.33
[2024-06-10 08:57:38,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15046.04 | bwd: 47118.65 | bwd_inner: 40365.20 | bwd_allreduce: 6752.77 | step: 41.80
{'loss': 1.2801, 'learning_rate': 3.39147514813513e-05, 'epoch': 0.28}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496
[2024-06-10 08:57:40,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.73 | bwd_microstep: 1575.11 | bwd_inner_microstep: 1575.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 08:57:42,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.87 | bwd_microstep: 1269.92 | bwd_inner_microstep: 1269.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-10 08:57:44,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.51 | bwd_microstep: 1552.97 | bwd_inner_microstep: 1552.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417
[2024-06-10 08:57:46,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.17 | bwd_microstep: 1373.19 | bwd_inner_microstep: 1373.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 08:57:48,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1247.45 | bwd_inner_microstep: 1247.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 08:57:49,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.21 | bwd_microstep: 1241.72 | bwd_inner_microstep: 1241.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 08:57:51,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.42 | bwd_microstep: 1382.92 | bwd_inner_microstep: 1382.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3740
[2024-06-10 08:57:53,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1459.11 | bwd_inner_microstep: 1459.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 08:57:55,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.73 | bwd_microstep: 1389.40 | bwd_inner_microstep: 1389.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3727
[2024-06-10 08:57:57,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.43 | bwd_microstep: 1457.40 | bwd_inner_microstep: 1457.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 08:57:59,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.02 | bwd_microstep: 1415.14 | bwd_inner_microstep: 1415.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692
[2024-06-10 08:58:01,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.73 | bwd_microstep: 1422.46 | bwd_inner_microstep: 1422.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 08:58:03,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.91 | bwd_microstep: 1245.93 | bwd_inner_microstep: 1245.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 08:58:05,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.87 | bwd_microstep: 1279.25 | bwd_inner_microstep: 1279.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 08:58:06,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.59 | bwd_microstep: 1203.04 | bwd_inner_microstep: 1203.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 08:58:08,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1284.69 | bwd_inner_microstep: 1284.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 08:58:10,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.97 | bwd_microstep: 1279.91 | bwd_inner_microstep: 1279.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 08:58:12,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1511.14 | bwd_inner_microstep: 1511.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2135
[2024-06-10 08:58:13,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.39 | bwd_microstep: 928.48 | bwd_inner_microstep: 928.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 08:58:15,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.08 | bwd_microstep: 1557.92 | bwd_inner_microstep: 1557.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-10 08:58:16,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.56 | bwd_microstep: 686.94 | bwd_inner_microstep: 686.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 08:58:18,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1250.16 | bwd_inner_microstep: 1250.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 08:58:20,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1388.79 | bwd_inner_microstep: 1388.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1908
[2024-06-10 08:58:21,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.39 | bwd_microstep: 812.11 | bwd_inner_microstep: 812.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-10 08:58:23,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.64 | bwd_microstep: 1608.70 | bwd_inner_microstep: 1608.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3526
[2024-06-10 08:58:25,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 1228.00 | bwd_inner_microstep: 1227.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3816
[2024-06-10 08:58:27,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.42 | bwd_microstep: 1599.54 | bwd_inner_microstep: 1599.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3768
[2024-06-10 08:58:29,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.42 | bwd_microstep: 1414.98 | bwd_inner_microstep: 1414.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 08:58:31,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 1345.81 | bwd_inner_microstep: 1345.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 08:58:33,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1550.68 | bwd_inner_microstep: 1550.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3590
[2024-06-10 08:58:35,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.48 | bwd_microstep: 1465.75 | bwd_inner_microstep: 1465.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 08:58:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 08:58:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1546.02 | bwd_inner_microstep: 1538.28 | bwd_allreduce_microstep: 7.69 | step_microstep: 37.66
[2024-06-10 08:58:37,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16108.28 | bwd: 42974.63 | bwd_inner: 42966.05 | bwd_allreduce: 7.92 | step: 39.25


 27%|██▋       | 474/1726 [8:15:06<21:08:34, 60.79s/it]
 28%|██▊       | 475/1726 [8:16:07<21:12:35, 61.04s/it]
 28%|██▊       | 476/1726 [8:17:10<21:21:28, 61.51s/it]
 28%|██▊       | 477/1726 [8:18:11<21:21:43, 61.57s/it]
 28%|██▊       | 478/1726 [8:19:12<21:15:45, 61.33s/it]
 28%|██▊       | 479/1726 [8:20:15<21:21:58, 61.68s/it]
 28%|██▊       | 480/1726 [8:2
{'loss': 1.3054, 'learning_rate': 3.388776649606823e-05, 'epoch': 0.28}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 08:58:40,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.52 | bwd_microstep: 1495.41 | bwd_inner_microstep: 1495.30 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1785
[2024-06-10 08:58:40,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 224.77 | bwd_microstep: 573.95 | bwd_inner_microstep: 573.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 08:58:43,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.28 | bwd_microstep: 1664.32 | bwd_inner_microstep: 1664.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3846
[2024-06-10 08:58:45,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.57 | bwd_microstep: 1662.11 | bwd_inner_microstep: 1662.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 08:58:47,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.45 | bwd_microstep: 1249.17 | bwd_inner_microstep: 1249.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-10 08:58:49,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.64 | bwd_microstep: 1443.36 | bwd_inner_microstep: 1443.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733
[2024-06-10 08:58:51,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1432.07 | bwd_inner_microstep: 1432.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 08:58:53,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.80 | bwd_microstep: 1388.65 | bwd_inner_microstep: 1388.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954
[2024-06-10 08:58:54,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.92 | bwd_microstep: 732.90 | bwd_inner_microstep: 732.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3502
[2024-06-10 08:58:55,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1349.15 | bwd_inner_microstep: 1349.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 08:58:57,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.26 | bwd_microstep: 1486.90 | bwd_inner_microstep: 1486.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 08:59:00,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.82 | bwd_microstep: 1520.82 | bwd_inner_microstep: 1520.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696
[2024-06-10 08:59:02,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.21 | bwd_microstep: 1527.30 | bwd_inner_microstep: 1527.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 08:59:04,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.44 | bwd_microstep: 1511.62 | bwd_inner_microstep: 1511.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746
[2024-06-10 08:59:06,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.07 | bwd_microstep: 1442.58 | bwd_inner_microstep: 1442.35 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.15
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3659
[2024-06-10 08:59:08,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.21 | bwd_microstep: 1369.03 | bwd_inner_microstep: 1369.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3470
[2024-06-10 08:59:10,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.51 | bwd_microstep: 1521.33 | bwd_inner_microstep: 1521.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-10 08:59:12,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1420.83 | bwd_inner_microstep: 1420.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575
[2024-06-10 08:59:14,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1500.56 | bwd_inner_microstep: 1500.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 08:59:16,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.43 | bwd_microstep: 1401.99 | bwd_inner_microstep: 1401.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3854
[2024-06-10 08:59:18,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.75 | bwd_microstep: 1571.54 | bwd_inner_microstep: 1571.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 08:59:20,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.27 | bwd_microstep: 1289.64 | bwd_inner_microstep: 1289.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3503
[2024-06-10 08:59:21,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.80 | bwd_microstep: 1319.94 | bwd_inner_microstep: 1319.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 08:59:24,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.38 | bwd_microstep: 1481.69 | bwd_inner_microstep: 1481.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-10 08:59:26,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1574.51 | bwd_inner_microstep: 1574.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 08:59:28,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 1511.28 | bwd_inner_microstep: 1511.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3455
[2024-06-10 08:59:30,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.12 | bwd_microstep: 1404.82 | bwd_inner_microstep: 1404.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-10 08:59:32,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1449.81 | bwd_inner_microstep: 1449.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064
[2024-06-10 08:59:33,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.72 | bwd_microstep: 818.70 | bwd_inner_microstep: 818.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 08:59:35,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.06 | bwd_microstep: 1556.95 | bwd_inner_microstep: 1556.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 08:59:37,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.08 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3510
[2024-06-10 08:59:39,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.30 | optimizer_step: 6.59
[2024-06-10 08:59:39,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.99 | bwd_microstep: 1236.75 | bwd_inner_microstep: 1225.93 | bwd_allreduce_microstep: 10.77 | step_microstep: 38.74
[2024-06-10 08:59:39,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16526.67 | bwd: 44197.37 | bwd_inner: 44185.47 | bwd_allreduce: 11.10 | step: 40.44
{'loss': 1.2471, 'learning_rate': 3.3860732598143754e-05, 'epoch': 0.28}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 08:59:41,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.69 | bwd_microstep: 1468.23 | bwd_inner_microstep: 1468.09 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3921
[2024-06-10 08:59:43,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.73 | bwd_microstep: 1591.99 | bwd_inner_microstep: 1591.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 08:59:45,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.27 | bwd_microstep: 1555.73 | bwd_inner_microstep: 1555.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 08:59:47,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.07 | bwd_microstep: 1343.65 | bwd_inner_microstep: 1343.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 08:59:48,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.55 | bwd_microstep: 1247.76 | bwd_inner_microstep: 1247.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 08:59:50,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.13 | bwd_microstep: 1282.76 | bwd_inner_microstep: 1282.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 08:59:52,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1412.68 | bwd_inner_microstep: 1412.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1860
[2024-06-10 08:59:53,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.83 | bwd_microstep: 679.01 | bwd_inner_microstep: 678.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 08:59:55,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1388.66 | bwd_inner_microstep: 1388.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 08:59:57,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1248.87 | bwd_inner_microstep: 1248.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3697
[2024-06-10 08:59:59,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.43 | bwd_microstep: 1755.66 | bwd_inner_microstep: 1755.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3698
[2024-06-10 09:00:01,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.61 | bwd_microstep: 1641.01 | bwd_inner_microstep: 1640.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458
[2024-06-10 09:00:03,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1218.54 | bwd_inner_microstep: 1218.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-10 09:00:05,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.76 | bwd_microstep: 1614.14 | bwd_inner_microstep: 1614.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3656
[2024-06-10 09:00:08,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.74 | bwd_microstep: 1621.71 | bwd_inner_microstep: 1621.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3455
[2024-06-10 09:00:09,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.40 | bwd_microstep: 1222.24 | bwd_inner_microstep: 1222.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 09:00:11,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.86 | bwd_microstep: 1408.18 | bwd_inner_microstep: 1408.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 09:00:13,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1293.89 | bwd_inner_microstep: 1293.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023
[2024-06-10 09:00:14,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.03 | bwd_microstep: 806.80 | bwd_inner_microstep: 806.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 09:00:16,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.44 | bwd_microstep: 1283.98 | bwd_inner_microstep: 1283.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 09:00:18,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.61 | bwd_microstep: 1186.80 | bwd_inner_microstep: 1186.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 09:00:20,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.99 | bwd_microstep: 1655.89 | bwd_inner_microstep: 1655.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 09:00:22,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1497.92 | bwd_inner_microstep: 1497.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3825
[2024-06-10 09:00:24,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.74 | bwd_microstep: 1624.82 | bwd_inner_microstep: 1624.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2044
[2024-06-10 09:00:25,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.04 | bwd_microstep: 746.47 | bwd_inner_microstep: 746.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3415
[2024-06-10 09:00:27,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.87 | bwd_microstep: 1402.44 | bwd_inner_microstep: 1402.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2188
[2024-06-10 09:00:29,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.35 | bwd_microstep: 1052.22 | bwd_inner_microstep: 1052.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 09:00:31,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1501.93 | bwd_inner_microstep: 1501.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3768
[2024-06-10 09:00:33,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.58 | bwd_microstep: 1570.94 | bwd_inner_microstep: 1570.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3802
[2024-06-10 09:00:35,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.49 | bwd_microstep: 1718.88 | bwd_inner_microstep: 1718.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3596
[2024-06-10 09:00:37,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.10 | bwd_microstep: 1670.61 | bwd_inner_microstep: 1670.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3761
[2024-06-10 09:00:40,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.16 | optimizer_step: 6.63
[2024-06-10 09:00:40,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.78 | bwd_microstep: 1511.37 | bwd_inner_microstep: 1503.68 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.44
[2024-06-10 09:00:40,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16510.83 | bwd: 44225.81 | bwd_inner: 44217.14 | bwd_allreduce: 7.94 | step: 39.19
{'loss': 1.3154, 'learning_rate': 3.383364988279113e-05, 'epoch': 0.28}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 09:00:42,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.99 | bwd_microstep: 1470.66 | bwd_inner_microstep: 1470.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 09:00:43,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.24 | bwd_microstep: 1295.21 | bwd_inner_microstep: 1295.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 09:00:45,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.97 | bwd_microstep: 788.97 | bwd_inner_microstep: 788.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2217
[2024-06-10 09:00:46,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.76 | bwd_microstep: 956.49 | bwd_inner_microstep: 956.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 09:00:48,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1390.57 | bwd_inner_microstep: 1390.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 09:00:50,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.22 | bwd_microstep: 1388.41 | bwd_inner_microstep: 1388.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009
[2024-06-10 09:00:51,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.35 | bwd_microstep: 803.79 | bwd_inner_microstep: 803.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 09:00:53,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1345.47 | bwd_inner_microstep: 1345.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-10 09:00:55,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.77 | bwd_microstep: 1632.64 | bwd_inner_microstep: 1632.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1948
[2024-06-10 09:00:56,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.30 | bwd_microstep: 889.95 | bwd_inner_microstep: 889.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 09:00:58,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.92 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3530
[2024-06-10 09:01:00,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.72 | bwd_microstep: 1421.58 | bwd_inner_microstep: 1421.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-10 09:01:02,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.55 | bwd_microstep: 1619.91 | bwd_inner_microstep: 1619.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 09:01:04,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1481.51 | bwd_inner_microstep: 1481.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 09:01:06,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1384.45 | bwd_inner_microstep: 1384.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 09:01:08,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.53 | bwd_microstep: 1392.82 | bwd_inner_microstep: 1392.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2472
[2024-06-10 09:01:10,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.31 | bwd_microstep: 954.49 | bwd_inner_microstep: 954.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2147
[2024-06-10 09:01:11,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.94 | bwd_microstep: 851.86 | bwd_inner_microstep: 851.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3561
[2024-06-10 09:01:12,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.01 | bwd_microstep: 1234.25 | bwd_inner_microstep: 1234.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3500
[2024-06-10 09:01:14,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1439.41 | bwd_inner_microstep: 1439.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 09:01:16,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.61 | bwd_microstep: 974.75 | bwd_inner_microstep: 974.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 09:01:18,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.18 | bwd_microstep: 1655.47 | bwd_inner_microstep: 1655.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 09:01:20,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.56 | bwd_microstep: 1380.64 | bwd_inner_microstep: 1380.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1992
[2024-06-10 09:01:21,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.81 | bwd_microstep: 758.71 | bwd_inner_microstep: 758.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 09:01:23,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.72 | bwd_microstep: 1282.75 | bwd_inner_microstep: 1282.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3583
[2024-06-10 09:01:25,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.95 | bwd_microstep: 1799.88 | bwd_inner_microstep: 1799.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-10 09:01:27,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.61 | bwd_microstep: 1557.27 | bwd_inner_microstep: 1557.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 09:01:29,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.93 | bwd_microstep: 1315.02 | bwd_inner_microstep: 1314.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 09:01:31,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.34 | bwd_microstep: 1557.97 | bwd_inner_microstep: 1557.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4380
[2024-06-10 09:01:34,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.30 | bwd_microstep: 1816.79 | bwd_inner_microstep: 1816.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 09:01:36,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.70 | bwd_microstep: 1597.66 | bwd_inner_microstep: 1597.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3762
[2024-06-10 09:01:41,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 09:01:41,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 4644.22 | bwd_inner_microstep: 1679.53 | bwd_allreduce_microstep: 2964.64 | step_microstep: 37.95
[2024-06-10 09:01:41,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15856.02 | bwd: 45565.24 | bwd_inner: 42599.67 | bwd_allreduce: 2964.87 | step: 39.53
{'loss': 1.295, 'learning_rate': 3.380651844539553e-05, 'epoch': 0.28}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982
[2024-06-10 09:01:43,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.19 | bwd_microstep: 895.22 | bwd_inner_microstep: 895.09 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2392
[2024-06-10 09:01:44,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.71 | bwd_microstep: 997.95 | bwd_inner_microstep: 997.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 09:01:45,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.88 | bwd_microstep: 787.71 | bwd_inner_microstep: 787.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 09:01:47,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.40 | bwd_microstep: 1286.41 | bwd_inner_microstep: 1286.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 09:01:49,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.92 | bwd_microstep: 1531.23 | bwd_inner_microstep: 1531.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3410
[2024-06-10 09:01:51,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.10 | bwd_microstep: 1149.09 | bwd_inner_microstep: 1149.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 09:01:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.89 | bwd_microstep: 1387.39 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 09:01:54,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.87 | bwd_microstep: 797.58 | bwd_inner_microstep: 797.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 09:01:55,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.29 | bwd_microstep: 1249.93 | bwd_inner_microstep: 1249.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783
[2024-06-10 09:01:57,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.97 | bwd_microstep: 1446.17 | bwd_inner_microstep: 1446.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505
[2024-06-10 09:01:59,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.43 | bwd_microstep: 1348.04 | bwd_inner_microstep: 1348.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 09:02:01,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1249.24 | bwd_inner_microstep: 1249.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580
[2024-06-10 09:02:03,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.21 | bwd_microstep: 1600.94 | bwd_inner_microstep: 1600.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1897
[2024-06-10 09:02:04,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.77 | bwd_microstep: 712.50 | bwd_inner_microstep: 712.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 09:02:06,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.86 | bwd_microstep: 1289.05 | bwd_inner_microstep: 1289.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3501
[2024-06-10 09:02:08,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.69 | bwd_microstep: 1514.14 | bwd_inner_microstep: 1514.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3519
[2024-06-10 09:02:10,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.01 | bwd_microstep: 1222.44 | bwd_inner_microstep: 1222.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 09:02:11,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.20 | bwd_microstep: 1288.74 | bwd_inner_microstep: 1288.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3641
[2024-06-10 09:02:14,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.92 | bwd_microstep: 1614.48 | bwd_inner_microstep: 1614.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2290
[2024-06-10 09:02:15,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.73 | bwd_microstep: 940.66 | bwd_inner_microstep: 940.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 09:02:17,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1386.63 | bwd_inner_microstep: 1386.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 09:02:19,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.12 | bwd_microstep: 1452.15 | bwd_inner_microstep: 1452.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3709
[2024-06-10 09:02:21,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.50 | bwd_microstep: 1333.42 | bwd_inner_microstep: 1333.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 09:02:23,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.72 | bwd_microstep: 1665.57 | bwd_inner_microstep: 1665.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 09:02:25,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 09:02:27,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.55 | bwd_microstep: 1498.78 | bwd_inner_microstep: 1498.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3591
[2024-06-10 09:02:29,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.59 | bwd_microstep: 1273.32 | bwd_inner_microstep: 1273.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 09:02:31,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.39 | bwd_microstep: 1245.11 | bwd_inner_microstep: 1245.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3774
[2024-06-10 09:02:33,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 1505.73 | bwd_inner_microstep: 1505.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3817
[2024-06-10 09:02:35,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.47 | bwd_microstep: 1803.04 | bwd_inner_microstep: 1803.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3574
[2024-06-10 09:02:37,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1426.70 | bwd_inner_microstep: 1426.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3442
[2024-06-10 09:02:44,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 09:02:44,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.83 | bwd_microstep: 6681.77 | bwd_inner_microstep: 1759.76 | bwd_allreduce_microstep: 4921.96 | step_microstep: 37.96
[2024-06-10 09:02:44,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15658.64 | bwd: 46978.60 | bwd_inner: 42055.63 | bwd_allreduce: 4922.24 | step: 39.54
{'loss': 1.2921, 'learning_rate': 3.377933838151374e-05, 'epoch': 0.28}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 09:02:46,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.92 | bwd_microstep: 1239.55 | bwd_inner_microstep: 1239.41 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3531
[2024-06-10 09:02:48,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.18 | bwd_microstep: 1194.10 | bwd_inner_microstep: 1194.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 09:02:50,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.39 | bwd_microstep: 1644.72 | bwd_inner_microstep: 1644.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2299
[2024-06-10 09:02:51,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.28 | bwd_microstep: 848.09 | bwd_inner_microstep: 848.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2022
[2024-06-10 09:02:52,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.24 | bwd_microstep: 713.99 | bwd_inner_microstep: 713.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416
[2024-06-10 09:02:54,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1311.89 | bwd_inner_microstep: 1311.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 09:02:56,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.38 | bwd_microstep: 1380.84 | bwd_inner_microstep: 1380.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710
[2024-06-10 09:02:58,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.28 | bwd_microstep: 1526.41 | bwd_inner_microstep: 1526.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 09:03:00,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.41 | bwd_microstep: 1247.17 | bwd_inner_microstep: 1247.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 09:03:02,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.45 | bwd_microstep: 1296.83 | bwd_inner_microstep: 1296.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 09:03:03,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.23 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3504
[2024-06-10 09:03:05,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1347.61 | bwd_inner_microstep: 1347.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2395
[2024-06-10 09:03:07,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 409.26 | bwd_microstep: 1098.27 | bwd_inner_microstep: 1098.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 09:03:09,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.97 | bwd_microstep: 1473.16 | bwd_inner_microstep: 1473.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3669
[2024-06-10 09:03:11,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.17 | bwd_microstep: 1454.11 | bwd_inner_microstep: 1454.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 09:03:13,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.18 | bwd_microstep: 1449.95 | bwd_inner_microstep: 1449.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 09:03:15,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.20 | bwd_microstep: 1487.17 | bwd_inner_microstep: 1487.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 09:03:17,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1343.75 | bwd_inner_microstep: 1343.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3834
[2024-06-10 09:03:19,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.71 | bwd_microstep: 1583.86 | bwd_inner_microstep: 1583.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 09:03:21,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.64 | bwd_microstep: 1460.87 | bwd_inner_microstep: 1460.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-10 09:03:23,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.49 | bwd_microstep: 1454.08 | bwd_inner_microstep: 1454.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 09:03:25,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1263.48 | bwd_inner_microstep: 1263.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3825
[2024-06-10 09:03:27,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.62 | bwd_microstep: 1480.53 | bwd_inner_microstep: 1480.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2077
[2024-06-10 09:03:28,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.20 | bwd_microstep: 787.64 | bwd_inner_microstep: 787.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815
[2024-06-10 09:03:30,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.52 | bwd_microstep: 1515.86 | bwd_inner_microstep: 1515.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3552
[2024-06-10 09:03:32,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.61 | bwd_microstep: 1459.10 | bwd_inner_microstep: 1459.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 09:03:34,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.56 | bwd_microstep: 1557.40 | bwd_inner_microstep: 1557.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753
[2024-06-10 09:03:36,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.34 | bwd_microstep: 1442.73 | bwd_inner_microstep: 1442.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 09:03:38,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1551.72 | bwd_inner_microstep: 1551.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 09:03:40,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.37 | bwd_microstep: 1499.95 | bwd_inner_microstep: 1499.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3578
[2024-06-10 09:03:42,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.93 | bwd_microstep: 1422.61 | bwd_inner_microstep: 1422.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 09:03:46,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 09:03:46,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.69 | bwd_microstep: 3469.85 | bwd_inner_microstep: 1756.51 | bwd_allreduce_microstep: 1713.29 | step_microstep: 38.06
[2024-06-10 09:03:46,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16272.31 | bwd: 45350.13 | bwd_inner: 43635.81 | bwd_allreduce: 1713.57 | step: 39.66
{'loss': 1.2995, 'learning_rate': 3.37521097868738e-05, 'epoch': 0.28}
 28%|██▊       | 480/1726 [8:21:14<21:06:54, 61.01s/it]
 28%|██▊       | 481/1726 [8:22:15<21:06:20, 61.03s/it]
 28%|██▊       | 482/1726 [8:23:16<21:05:41, 61.05s/it]
 28%|██▊       | 483/1726 [8:24:18<21:09:08, 61.26s/it]
 28%|██▊       | 484/1726 [8:25:21<21:18:47, 61.78s/it]
 28%|██▊       | 485/1726 [8:26:23<21:18:58, 61.84s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 09:03:48,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1364.86 | bwd_inner_microstep: 1364.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 09:03:49,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.73 | bwd_microstep: 801.01 | bwd_inner_microstep: 800.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2245
[2024-06-10 09:03:51,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.98 | bwd_microstep: 900.17 | bwd_inner_microstep: 900.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3846
[2024-06-10 09:03:53,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.50 | bwd_microstep: 1556.63 | bwd_inner_microstep: 1556.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 09:03:55,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.91 | bwd_microstep: 1381.96 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 09:03:56,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.90 | bwd_microstep: 1244.86 | bwd_inner_microstep: 1244.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3666
[2024-06-10 09:03:58,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.01 | bwd_microstep: 1453.72 | bwd_inner_microstep: 1453.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3491
[2024-06-10 09:04:00,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.81 | bwd_microstep: 1509.34 | bwd_inner_microstep: 1509.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3488
[2024-06-10 09:04:02,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.36 | bwd_microstep: 1361.47 | bwd_inner_microstep: 1361.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3662
[2024-06-10 09:04:05,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.35 | bwd_microstep: 1652.96 | bwd_inner_microstep: 1652.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 09:04:07,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.62 | bwd_microstep: 1509.71 | bwd_inner_microstep: 1509.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 09:04:09,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1381.60 | bwd_inner_microstep: 1381.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2128
[2024-06-10 09:04:10,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.66 | bwd_microstep: 801.83 | bwd_inner_microstep: 801.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-10 09:04:12,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.72 | bwd_microstep: 1603.93 | bwd_inner_microstep: 1603.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 09:04:13,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.69 | bwd_microstep: 801.82 | bwd_inner_microstep: 801.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 09:04:15,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.77 | bwd_microstep: 1455.56 | bwd_inner_microstep: 1455.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 09:04:17,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.16 | bwd_microstep: 1292.43 | bwd_inner_microstep: 1292.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3540
[2024-06-10 09:04:19,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1360.41 | bwd_inner_microstep: 1360.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 09:04:21,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 1509.92 | bwd_inner_microstep: 1509.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 09:04:23,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.91 | bwd_microstep: 1316.30 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 09:04:24,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.48 | bwd_microstep: 1158.26 | bwd_inner_microstep: 1158.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 09:04:26,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.90 | bwd_microstep: 1189.39 | bwd_inner_microstep: 1189.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 09:04:28,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.32 | bwd_microstep: 1255.12 | bwd_inner_microstep: 1255.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 09:04:30,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.03 | bwd_microstep: 1532.26 | bwd_inner_microstep: 1532.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2184
[2024-06-10 09:04:31,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.83 | bwd_microstep: 826.55 | bwd_inner_microstep: 826.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-10 09:04:33,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.67 | bwd_microstep: 1284.56 | bwd_inner_microstep: 1284.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 09:04:35,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1647.74 | bwd_inner_microstep: 1647.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 09:04:37,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1376.36 | bwd_inner_microstep: 1376.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 09:04:39,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1515.77 | bwd_inner_microstep: 1515.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 09:04:41,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.33 | bwd_microstep: 1654.53 | bwd_inner_microstep: 1654.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3800
[2024-06-10 09:04:44,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.54 | bwd_microstep: 1750.18 | bwd_inner_microstep: 1750.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 09:04:46,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 09:04:46,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.75 | bwd_microstep: 1850.10 | bwd_inner_microstep: 1491.39 | bwd_allreduce_microstep: 358.65 | step_microstep: 38.94
[2024-06-10 09:04:46,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16031.67 | bwd: 43301.31 | bwd_inner: 42941.75 | bwd_allreduce: 358.88 | step: 40.99
{'loss': 1.3183, 'learning_rate': 3.372483275737468e-05, 'epoch': 0.28}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3411
[2024-06-10 09:04:48,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.33 | bwd_microstep: 1363.22 | bwd_inner_microstep: 1363.13 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3571
[2024-06-10 09:04:50,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1367.40 | bwd_inner_microstep: 1367.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 09:04:51,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.76 | bwd_microstep: 788.12 | bwd_inner_microstep: 788.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 09:04:53,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.23 | bwd_microstep: 1381.64 | bwd_inner_microstep: 1381.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1440
[2024-06-10 09:04:54,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 220.16 | bwd_microstep: 571.97 | bwd_inner_microstep: 571.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4120
[2024-06-10 09:04:56,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.11 | bwd_microstep: 1640.44 | bwd_inner_microstep: 1640.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 09:04:58,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.05 | bwd_microstep: 1281.79 | bwd_inner_microstep: 1281.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 09:05:00,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.86 | bwd_microstep: 1385.94 | bwd_inner_microstep: 1385.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 09:05:01,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.73 | bwd_microstep: 1254.04 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3408
[2024-06-10 09:05:03,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.89 | bwd_microstep: 1308.45 | bwd_inner_microstep: 1308.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1999
[2024-06-10 09:05:04,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.26 | bwd_microstep: 901.78 | bwd_inner_microstep: 901.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-10 09:05:06,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.50 | bwd_microstep: 1448.52 | bwd_inner_microstep: 1448.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 09:05:08,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1487.20 | bwd_inner_microstep: 1487.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 09:05:10,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.43 | bwd_microstep: 1353.78 | bwd_inner_microstep: 1353.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3632
[2024-06-10 09:05:12,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.55 | bwd_microstep: 1348.25 | bwd_inner_microstep: 1348.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3651
[2024-06-10 09:05:14,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.84 | bwd_microstep: 1580.87 | bwd_inner_microstep: 1580.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3433
[2024-06-10 09:05:16,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.94 | bwd_microstep: 1267.90 | bwd_inner_microstep: 1267.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 09:05:18,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1277.61 | bwd_inner_microstep: 1277.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3678
[2024-06-10 09:05:20,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.01 | bwd_microstep: 1328.79 | bwd_inner_microstep: 1328.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 09:05:22,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.25 | bwd_microstep: 1296.91 | bwd_inner_microstep: 1296.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3915
[2024-06-10 09:05:24,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.14 | bwd_microstep: 1699.31 | bwd_inner_microstep: 1699.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2066
[2024-06-10 09:05:25,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.25 | bwd_microstep: 760.62 | bwd_inner_microstep: 760.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 09:05:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.61 | bwd_microstep: 1189.63 | bwd_inner_microstep: 1189.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2457
[2024-06-10 09:05:28,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.13 | bwd_microstep: 925.00 | bwd_inner_microstep: 924.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3822
[2024-06-10 09:05:30,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.77 | bwd_microstep: 1419.09 | bwd_inner_microstep: 1419.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 09:05:32,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.65 | bwd_microstep: 1660.27 | bwd_inner_microstep: 1660.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3892
[2024-06-10 09:05:34,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.65 | bwd_microstep: 1694.79 | bwd_inner_microstep: 1694.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 09:05:37,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.44 | bwd_microstep: 1553.51 | bwd_inner_microstep: 1553.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 09:05:38,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.62 | bwd_microstep: 1285.53 | bwd_inner_microstep: 1285.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 09:05:40,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1419.50 | bwd_inner_microstep: 1419.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 09:05:42,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1404.87 | bwd_inner_microstep: 1404.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3442
[2024-06-10 09:05:49,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.27 | optimizer_step: 6.60
[2024-06-10 09:05:49,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 5786.12 | bwd_inner_microstep: 1759.10 | bwd_allreduce_microstep: 4026.97 | step_microstep: 38.29
[2024-06-10 09:05:49,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15832.30 | bwd: 46432.88 | bwd_inner: 42404.92 | bwd_allreduce: 4027.25 | step: 40.20
{'loss': 1.2904, 'learning_rate': 3.369750738908593e-05, 'epoch': 0.28}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 09:05:51,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1391.74 | bwd_inner_microstep: 1391.48 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3949
[2024-06-10 09:05:53,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.47 | bwd_microstep: 1699.87 | bwd_inner_microstep: 1699.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882
[2024-06-10 09:05:54,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.58 | bwd_microstep: 707.67 | bwd_inner_microstep: 707.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-10 09:05:56,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.26 | bwd_microstep: 1184.96 | bwd_inner_microstep: 1184.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2054
[2024-06-10 09:05:57,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.62 | bwd_microstep: 815.46 | bwd_inner_microstep: 815.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230
[2024-06-10 09:05:58,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.93 | bwd_microstep: 960.27 | bwd_inner_microstep: 960.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-10 09:06:00,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.51 | bwd_microstep: 1628.38 | bwd_inner_microstep: 1628.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 09:06:02,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.17 | bwd_microstep: 1380.15 | bwd_inner_microstep: 1380.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 09:06:04,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.21 | bwd_microstep: 1385.75 | bwd_inner_microstep: 1385.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901
[2024-06-10 09:06:06,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.34 | bwd_microstep: 1590.42 | bwd_inner_microstep: 1590.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 09:06:08,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1296.84 | bwd_inner_microstep: 1296.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3629
[2024-06-10 09:06:10,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.43 | bwd_microstep: 1361.73 | bwd_inner_microstep: 1361.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3730
[2024-06-10 09:06:12,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.88 | bwd_microstep: 1698.58 | bwd_inner_microstep: 1698.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3670
[2024-06-10 09:06:15,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.37 | bwd_microstep: 1773.35 | bwd_inner_microstep: 1773.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-10 09:06:17,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1521.67 | bwd_inner_microstep: 1521.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3639
[2024-06-10 09:06:19,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.91 | bwd_microstep: 1813.44 | bwd_inner_microstep: 1813.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 09:06:21,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.34 | bwd_microstep: 1586.91 | bwd_inner_microstep: 1586.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2395
[2024-06-10 09:06:23,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.44 | bwd_microstep: 1006.86 | bwd_inner_microstep: 1006.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2305
[2024-06-10 09:06:24,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.24 | bwd_microstep: 917.39 | bwd_inner_microstep: 917.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818
[2024-06-10 09:06:26,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.84 | bwd_microstep: 1584.98 | bwd_inner_microstep: 1584.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 09:06:28,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1512.69 | bwd_inner_microstep: 1512.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 09:06:30,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.57 | bwd_microstep: 1258.80 | bwd_inner_microstep: 1258.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 09:06:32,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.29 | bwd_microstep: 1377.77 | bwd_inner_microstep: 1377.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3745
[2024-06-10 09:06:34,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.01 | bwd_microstep: 1371.86 | bwd_inner_microstep: 1371.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1912
[2024-06-10 09:06:35,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.53 | bwd_microstep: 757.34 | bwd_inner_microstep: 757.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-10 09:06:37,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.71 | bwd_microstep: 1182.60 | bwd_inner_microstep: 1182.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2276
[2024-06-10 09:06:38,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.19 | bwd_microstep: 812.24 | bwd_inner_microstep: 812.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3583
[2024-06-10 09:06:40,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1333.96 | bwd_inner_microstep: 1333.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2016
[2024-06-10 09:06:41,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.98 | bwd_microstep: 759.27 | bwd_inner_microstep: 759.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3568
[2024-06-10 09:06:43,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.57 | bwd_microstep: 1568.75 | bwd_inner_microstep: 1568.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 09:06:45,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.37 | bwd_microstep: 1514.29 | bwd_inner_microstep: 1514.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3583
[2024-06-10 09:06:50,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 09:06:50,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.89 | bwd_microstep: 4255.25 | bwd_inner_microstep: 1626.99 | bwd_allreduce_microstep: 2628.20 | step_microstep: 38.25
[2024-06-10 09:06:50,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15764.49 | bwd: 45011.27 | bwd_inner: 42381.94 | bwd_allreduce: 2628.55 | step: 40.01
{'loss': 1.2839, 'learning_rate': 3.367013377824737e-05, 'epoch': 0.28}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2008
[2024-06-10 09:06:51,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.47 | bwd_microstep: 861.00 | bwd_inner_microstep: 860.81 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2342
[2024-06-10 09:06:52,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.96 | bwd_microstep: 992.79 | bwd_inner_microstep: 992.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 09:06:54,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.61 | bwd_microstep: 1553.50 | bwd_inner_microstep: 1553.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 09:06:56,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.42 | bwd_microstep: 1314.48 | bwd_inner_microstep: 1314.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 09:06:58,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.21 | bwd_microstep: 1477.48 | bwd_inner_microstep: 1477.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4011
[2024-06-10 09:07:00,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.33 | bwd_microstep: 1508.42 | bwd_inner_microstep: 1508.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 09:07:02,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.12 | bwd_microstep: 1389.58 | bwd_inner_microstep: 1389.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3448
[2024-06-10 09:07:04,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1304.20 | bwd_inner_microstep: 1304.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2092
[2024-06-10 09:07:05,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.28 | bwd_microstep: 927.26 | bwd_inner_microstep: 927.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1941
[2024-06-10 09:07:07,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.00 | bwd_microstep: 885.96 | bwd_inner_microstep: 885.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1974
[2024-06-10 09:07:08,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.26 | bwd_microstep: 891.39 | bwd_inner_microstep: 891.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 09:07:10,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.03 | bwd_microstep: 1380.03 | bwd_inner_microstep: 1380.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3489
[2024-06-10 09:07:12,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1512.11 | bwd_inner_microstep: 1512.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1440
[2024-06-10 09:07:13,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 237.74 | bwd_microstep: 630.17 | bwd_inner_microstep: 630.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 09:07:15,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1377.54 | bwd_inner_microstep: 1377.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2450
[2024-06-10 09:07:16,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.50 | bwd_microstep: 952.08 | bwd_inner_microstep: 952.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 09:07:18,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1280.59 | bwd_inner_microstep: 1280.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 09:07:20,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1499.05 | bwd_inner_microstep: 1499.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 09:07:21,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.93 | bwd_microstep: 804.68 | bwd_inner_microstep: 804.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3707
[2024-06-10 09:07:23,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.34 | bwd_microstep: 1561.57 | bwd_inner_microstep: 1561.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3536
[2024-06-10 09:07:25,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.56 | bwd_microstep: 1418.61 | bwd_inner_microstep: 1418.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-10 09:07:27,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.59 | bwd_microstep: 1357.06 | bwd_inner_microstep: 1357.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 09:07:29,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1409.47 | bwd_inner_microstep: 1409.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 09:07:31,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.07 | bwd_microstep: 1500.61 | bwd_inner_microstep: 1500.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 09:07:32,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.85 | bwd_microstep: 975.58 | bwd_inner_microstep: 975.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710
[2024-06-10 09:07:34,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.46 | bwd_microstep: 1532.71 | bwd_inner_microstep: 1532.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3435
[2024-06-10 09:07:36,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.11 | bwd_microstep: 1318.01 | bwd_inner_microstep: 1317.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3627
[2024-06-10 09:07:38,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.31 | bwd_microstep: 1467.16 | bwd_inner_microstep: 1467.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3464
[2024-06-10 09:07:40,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.20 | bwd_microstep: 1436.51 | bwd_inner_microstep: 1436.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3058
[2024-06-10 09:07:42,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.51 | bwd_microstep: 1300.12 | bwd_inner_microstep: 1300.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 09:07:43,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.25 | bwd_microstep: 975.75 | bwd_inner_microstep: 975.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 09:07:50,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 09:07:50,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.02 | bwd_microstep: 5650.31 | bwd_inner_microstep: 1858.33 | bwd_allreduce_microstep: 3791.92 | step_microstep: 38.24
[2024-06-10 09:07:50,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15114.59 | bwd: 44445.82 | bwd_inner: 40652.84 | bwd_allreduce: 3792.23 | step: 39.94
{'loss': 1.2314, 'learning_rate': 3.364271202126871e-05, 'epoch': 0.28}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 09:07:52,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.80 | bwd_microstep: 1439.25 | bwd_inner_microstep: 1439.17 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925
[2024-06-10 09:07:54,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1589.79 | bwd_inner_microstep: 1589.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 09:07:56,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.21 | bwd_microstep: 1280.35 | bwd_inner_microstep: 1280.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3801
[2024-06-10 09:07:58,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.06 | bwd_microstep: 1511.56 | bwd_inner_microstep: 1511.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 09:08:00,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1341.71 | bwd_inner_microstep: 1341.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 09:08:02,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 1387.13 | bwd_inner_microstep: 1387.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 09:08:04,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.07 | bwd_microstep: 1489.50 | bwd_inner_microstep: 1489.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 09:08:05,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.45 | bwd_microstep: 1280.53 | bwd_inner_microstep: 1280.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 09:08:06,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 793.64 | bwd_inner_microstep: 793.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3614
[2024-06-10 09:08:08,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.76 | bwd_microstep: 1435.85 | bwd_inner_microstep: 1435.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423
[2024-06-10 09:08:10,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.08 | bwd_microstep: 1308.77 | bwd_inner_microstep: 1308.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2654
[2024-06-10 09:08:12,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.24 | bwd_microstep: 1053.69 | bwd_inner_microstep: 1053.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3694
[2024-06-10 09:08:14,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.65 | bwd_microstep: 1724.47 | bwd_inner_microstep: 1724.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 09:08:15,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.93 | bwd_microstep: 794.94 | bwd_inner_microstep: 794.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3653
[2024-06-10 09:08:18,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.79 | bwd_microstep: 1784.46 | bwd_inner_microstep: 1784.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3472
[2024-06-10 09:08:20,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.24 | bwd_microstep: 1542.58 | bwd_inner_microstep: 1542.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 09:08:22,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.10 | bwd_microstep: 1500.69 | bwd_inner_microstep: 1500.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 09:08:24,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.84 | bwd_microstep: 1393.49 | bwd_inner_microstep: 1393.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 09:08:26,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.67 | bwd_microstep: 1614.27 | bwd_inner_microstep: 1614.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 09:08:28,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1463.73 | bwd_inner_microstep: 1463.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501
[2024-06-10 09:08:30,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.33 | bwd_microstep: 1224.07 | bwd_inner_microstep: 1224.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 09:08:31,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.28 | bwd_microstep: 1287.72 | bwd_inner_microstep: 1287.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823
[2024-06-10 09:08:34,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.46 | bwd_microstep: 1581.48 | bwd_inner_microstep: 1581.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3606
[2024-06-10 09:08:36,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.32 | bwd_microstep: 1536.28 | bwd_inner_microstep: 1536.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3835
[2024-06-10 09:08:38,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.83 | bwd_microstep: 1588.82 | bwd_inner_microstep: 1588.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3454
[2024-06-10 09:08:40,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.08 | bwd_microstep: 1210.45 | bwd_inner_microstep: 1210.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3607
[2024-06-10 09:08:42,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.91 | bwd_microstep: 1580.99 | bwd_inner_microstep: 1580.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3619
[2024-06-10 09:08:44,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.05 | bwd_microstep: 1457.90 | bwd_inner_microstep: 1457.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-10 09:08:46,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.32 | bwd_microstep: 1505.99 | bwd_inner_microstep: 1505.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2919
[2024-06-10 09:08:48,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.26 | bwd_microstep: 1190.83 | bwd_inner_microstep: 1190.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 09:08:50,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.27 | bwd_microstep: 1647.23 | bwd_inner_microstep: 1647.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-10 09:08:52,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.65
[2024-06-10 09:08:52,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.67 | bwd_microstep: 1637.66 | bwd_inner_microstep: 1629.90 | bwd_allreduce_microstep: 7.71 | step_microstep: 37.73
[2024-06-10 09:08:52,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16850.59 | bwd: 45179.83 | bwd_inner: 45171.16 | bwd_allreduce: 7.98 | step: 39.40
{'loss': 1.2937, 'learning_rate': 3.3615242214729226e-05, 'epoch': 0.28}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 09:08:54,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.75 | bwd_microstep: 1491.73 | bwd_inner_microstep: 1491.52 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 09:08:56,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.55 | bwd_microstep: 1247.37 | bwd_inner_microstep: 1247.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4360
[2024-06-10 09:08:58,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.38 | bwd_microstep: 1709.23 | bwd_inner_microstep: 1709.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3768
[2024-06-10 09:09:00,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1503.97 | bwd_inner_microstep: 1503.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 09:09:02,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1347.20 | bwd_inner_microstep: 1347.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750
[2024-06-10 09:09:04,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.72 | bwd_microstep: 1537.27 | bwd_inner_microstep: 1537.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 09:09:06,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 1396.39 | bwd_inner_microstep: 1396.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3735
[2024-06-10 09:09:08,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.49 | bwd_microstep: 1468.10 | bwd_inner_microstep: 1468.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 09:09:10,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.78 | bwd_microstep: 1182.93 | bwd_inner_microstep: 1182.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 09:09:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.87 | bwd_microstep: 1280.19 | bwd_inner_microstep: 1280.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 09:09:13,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1251.71 | bwd_inner_microstep: 1251.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-10 09:09:15,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.24 | bwd_microstep: 1317.10 | bwd_inner_microstep: 1317.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3455
[2024-06-10 09:09:17,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.85 | bwd_microstep: 1385.21 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3659
[2024-06-10 09:09:19,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.96 | bwd_microstep: 1468.47 | bwd_inner_microstep: 1468.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2052
[2024-06-10 09:09:20,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.80 | bwd_microstep: 850.75 | bwd_inner_microstep: 850.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2436
[2024-06-10 09:09:22,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.65 | bwd_microstep: 1015.47 | bwd_inner_microstep: 1015.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2704
[2024-06-10 09:09:23,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.23 | bwd_microstep: 1131.11 | bwd_inner_microstep: 1131.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 09:09:25,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.84 | bwd_microstep: 1308.39 | bwd_inner_microstep: 1308.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 09:09:27,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.94 | bwd_microstep: 1493.57 | bwd_inner_microstep: 1493.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 09:09:29,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059
[2024-06-10 09:09:30,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.04 | bwd_microstep: 914.04 | bwd_inner_microstep: 914.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3682
[2024-06-10 09:09:32,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.20 | bwd_microstep: 1433.95 | bwd_inner_microstep: 1433.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 09:09:34,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1552.22 | bwd_inner_microstep: 1552.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3607
[2024-06-10 09:09:36,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.97 | bwd_microstep: 1215.16 | bwd_inner_microstep: 1215.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695
[2024-06-10 09:09:38,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.93 | bwd_microstep: 1529.46 | bwd_inner_microstep: 1529.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 09:09:40,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.52 | bwd_microstep: 1402.25 | bwd_inner_microstep: 1402.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4955
[2024-06-10 09:09:43,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 667.03 | bwd_microstep: 1763.15 | bwd_inner_microstep: 1763.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 09:09:45,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.55 | bwd_microstep: 1513.53 | bwd_inner_microstep: 1513.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3427
[2024-06-10 09:09:47,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.63 | bwd_microstep: 1374.40 | bwd_inner_microstep: 1374.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 09:09:49,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.37 | bwd_microstep: 1493.59 | bwd_inner_microstep: 1493.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 09:09:51,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.42 | bwd_microstep: 1478.07 | bwd_inner_microstep: 1478.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 09:09:53,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 09:09:53,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.17 | bwd_microstep: 1514.27 | bwd_inner_microstep: 1506.48 | bwd_allreduce_microstep: 7.74 | step_microstep: 37.62
[2024-06-10 09:09:53,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16466.79 | bwd: 43856.23 | bwd_inner: 43847.29 | bwd_allreduce: 8.13 | step: 39.49
 28%|██▊       | 486/1726 [8:27:23<21:04:44, 61.20s/it]
 28%|██▊       | 487/1726 [8:28:25<21:12:33, 61.62s/it]
 28%|██▊       | 488/1726 [8:29:27<21:08:30, 61.48s/it]
 28%|██▊       | 489/1726 [8:30:26<20:57:46, 61.01s/it]
 28%|██▊       | 490/1726 [8:31:29<21:05:22, 61.43s/it]
 28%|██▊       | 491/1726 [8:32:30<20:59:47, 61.20s/it]
{'loss': 1.3113, 'learning_rate': 3.358772445537745e-05, 'epoch': 0.28}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3477
[2024-06-10 09:09:55,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.04 | bwd_microstep: 1575.52 | bwd_inner_microstep: 1575.42 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3642
[2024-06-10 09:09:57,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 1451.63 | bwd_inner_microstep: 1451.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908
[2024-06-10 09:09:59,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.86 | bwd_microstep: 1590.14 | bwd_inner_microstep: 1590.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 09:10:01,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1378.98 | bwd_inner_microstep: 1378.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3802
[2024-06-10 09:10:03,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.40 | bwd_microstep: 1602.97 | bwd_inner_microstep: 1602.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049
[2024-06-10 09:10:04,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.15 | bwd_microstep: 816.84 | bwd_inner_microstep: 816.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 09:10:06,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.96 | bwd_microstep: 1484.46 | bwd_inner_microstep: 1484.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 09:10:08,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1381.55 | bwd_inner_microstep: 1381.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 09:10:10,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1249.83 | bwd_inner_microstep: 1249.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 09:10:12,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.42 | bwd_microstep: 1531.26 | bwd_inner_microstep: 1531.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 09:10:14,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.34 | bwd_microstep: 1288.46 | bwd_inner_microstep: 1288.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2085
[2024-06-10 09:10:15,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.02 | bwd_microstep: 828.58 | bwd_inner_microstep: 828.44 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.19
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-10 09:10:17,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.81 | bwd_microstep: 1627.62 | bwd_inner_microstep: 1627.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1995
[2024-06-10 09:10:18,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.28 | bwd_microstep: 708.24 | bwd_inner_microstep: 708.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526
[2024-06-10 09:10:20,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1328.83 | bwd_inner_microstep: 1328.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 09:10:22,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.06 | bwd_microstep: 1437.57 | bwd_inner_microstep: 1437.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3648
[2024-06-10 09:10:25,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.94 | bwd_microstep: 1710.65 | bwd_inner_microstep: 1710.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3546
[2024-06-10 09:10:27,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.72 | bwd_microstep: 1693.18 | bwd_inner_microstep: 1693.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 09:10:29,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.98 | bwd_microstep: 1344.80 | bwd_inner_microstep: 1344.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1990
[2024-06-10 09:10:30,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.22 | bwd_microstep: 740.27 | bwd_inner_microstep: 740.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3514
[2024-06-10 09:10:32,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.76 | bwd_microstep: 1533.63 | bwd_inner_microstep: 1533.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3824
[2024-06-10 09:10:34,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.05 | bwd_microstep: 1619.11 | bwd_inner_microstep: 1619.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 09:10:36,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1557.42 | bwd_inner_microstep: 1557.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3814
[2024-06-10 09:10:38,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.51 | bwd_microstep: 1620.98 | bwd_inner_microstep: 1620.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 09:10:40,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.34 | bwd_microstep: 1182.83 | bwd_inner_microstep: 1182.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 09:10:42,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.44 | bwd_microstep: 1554.97 | bwd_inner_microstep: 1554.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 09:10:44,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1601.27 | bwd_inner_microstep: 1601.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3609
[2024-06-10 09:10:46,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.21 | bwd_microstep: 1371.44 | bwd_inner_microstep: 1371.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2301
[2024-06-10 09:10:48,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.68 | bwd_microstep: 1077.15 | bwd_inner_microstep: 1077.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 09:10:50,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1477.50 | bwd_inner_microstep: 1477.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2291
[2024-06-10 09:10:51,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.91 | bwd_microstep: 1074.08 | bwd_inner_microstep: 1074.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 09:10:54,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-10 09:10:54,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.38 | bwd_microstep: 2309.87 | bwd_inner_microstep: 1771.36 | bwd_allreduce_microstep: 538.46 | step_microstep: 37.74
[2024-06-10 09:10:54,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16390.98 | bwd: 44751.72 | bwd_inner: 44212.10 | bwd_allreduce: 538.82 | step: 39.58
{'loss': 1.2944, 'learning_rate': 3.356015884013077e-05, 'epoch': 0.29}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869
[2024-06-10 09:10:55,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.67 | bwd_microstep: 700.36 | bwd_inner_microstep: 700.22 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2302
[2024-06-10 09:10:57,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.69 | bwd_microstep: 1072.05 | bwd_inner_microstep: 1072.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2715
[2024-06-10 09:10:58,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.89 | bwd_microstep: 1052.00 | bwd_inner_microstep: 1051.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767
[2024-06-10 09:11:00,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.91 | bwd_microstep: 1542.25 | bwd_inner_microstep: 1542.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798
[2024-06-10 09:11:02,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1546.17 | bwd_inner_microstep: 1546.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748
[2024-06-10 09:11:05,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.44 | bwd_microstep: 1637.97 | bwd_inner_microstep: 1637.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469
[2024-06-10 09:11:06,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.54 | bwd_microstep: 1215.71 | bwd_inner_microstep: 1215.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 09:11:08,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1249.06 | bwd_inner_microstep: 1249.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 09:11:10,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.64 | bwd_microstep: 1345.86 | bwd_inner_microstep: 1345.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 09:11:12,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1494.50 | bwd_inner_microstep: 1494.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3712
[2024-06-10 09:11:15,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.92 | bwd_microstep: 1798.66 | bwd_inner_microstep: 1798.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 09:11:17,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 1495.80 | bwd_inner_microstep: 1495.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 09:11:18,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1341.49 | bwd_inner_microstep: 1341.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653
[2024-06-10 09:11:20,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1414.60 | bwd_inner_microstep: 1414.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-10 09:11:23,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.28 | bwd_microstep: 1578.87 | bwd_inner_microstep: 1578.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3839
[2024-06-10 09:11:25,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.92 | bwd_microstep: 1606.73 | bwd_inner_microstep: 1606.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3449
[2024-06-10 09:11:27,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1292.44 | bwd_inner_microstep: 1292.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 09:11:28,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.84 | bwd_microstep: 1296.60 | bwd_inner_microstep: 1296.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 09:11:29,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.98 | bwd_microstep: 809.74 | bwd_inner_microstep: 809.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3829
[2024-06-10 09:11:32,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.98 | bwd_microstep: 1509.95 | bwd_inner_microstep: 1509.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428
[2024-06-10 09:11:33,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.63 | bwd_microstep: 1156.75 | bwd_inner_microstep: 1156.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3713
[2024-06-10 09:11:35,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.66 | bwd_microstep: 1385.93 | bwd_inner_microstep: 1385.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827
[2024-06-10 09:11:37,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1481.85 | bwd_inner_microstep: 1481.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2272
[2024-06-10 09:11:38,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.17 | bwd_microstep: 970.92 | bwd_inner_microstep: 970.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3426
[2024-06-10 09:11:40,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.83 | bwd_microstep: 1311.22 | bwd_inner_microstep: 1311.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 09:11:42,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.80 | bwd_microstep: 1454.06 | bwd_inner_microstep: 1454.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3818
[2024-06-10 09:11:44,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.59 | bwd_microstep: 1485.20 | bwd_inner_microstep: 1485.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-10 09:11:46,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.82 | bwd_microstep: 1303.31 | bwd_inner_microstep: 1303.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2244
[2024-06-10 09:11:47,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.68 | bwd_microstep: 872.52 | bwd_inner_microstep: 872.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 09:11:49,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1447.62 | bwd_inner_microstep: 1447.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 09:11:51,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.54 | bwd_microstep: 1303.01 | bwd_inner_microstep: 1302.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 09:11:57,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 09:11:57,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.01 | bwd_microstep: 5101.50 | bwd_inner_microstep: 1567.15 | bwd_allreduce_microstep: 3534.29 | step_microstep: 38.26
[2024-06-10 09:11:57,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15945.26 | bwd: 46274.70 | bwd_inner: 42739.39 | bwd_allreduce: 3534.58 | step: 39.80
{'loss': 1.2668, 'learning_rate': 3.353254546607515e-05, 'epoch': 0.29}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462
[2024-06-10 09:11:59,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.51 | bwd_microstep: 1568.16 | bwd_inner_microstep: 1567.96 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 09:12:01,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1275.37 | bwd_inner_microstep: 1275.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 09:12:03,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1340.04 | bwd_inner_microstep: 1340.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2912
[2024-06-10 09:12:04,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.54 | bwd_microstep: 1190.97 | bwd_inner_microstep: 1190.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-10 09:12:06,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.86 | bwd_microstep: 1437.85 | bwd_inner_microstep: 1437.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-10 09:12:08,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.54 | bwd_microstep: 1181.72 | bwd_inner_microstep: 1181.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 09:12:10,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1393.35 | bwd_inner_microstep: 1393.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 09:12:12,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 09:12:13,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.65 | bwd_microstep: 793.59 | bwd_inner_microstep: 793.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1887
[2024-06-10 09:12:14,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.89 | bwd_microstep: 715.00 | bwd_inner_microstep: 714.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 09:12:16,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1384.37 | bwd_inner_microstep: 1384.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 09:12:17,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1292.77 | bwd_inner_microstep: 1292.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3657
[2024-06-10 09:12:19,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.88 | bwd_microstep: 1355.85 | bwd_inner_microstep: 1355.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1945
[2024-06-10 09:12:21,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.89 | bwd_microstep: 884.54 | bwd_inner_microstep: 884.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 09:12:23,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.89 | bwd_microstep: 1451.06 | bwd_inner_microstep: 1451.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2104
[2024-06-10 09:12:24,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.35 | bwd_microstep: 1015.18 | bwd_inner_microstep: 1015.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954
[2024-06-10 09:12:25,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.22 | bwd_microstep: 892.40 | bwd_inner_microstep: 892.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 09:12:27,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1488.20 | bwd_inner_microstep: 1488.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 09:12:29,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.74 | bwd_microstep: 1496.57 | bwd_inner_microstep: 1496.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 09:12:31,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1255.70 | bwd_inner_microstep: 1255.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2468
[2024-06-10 09:12:33,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.13 | bwd_microstep: 1081.33 | bwd_inner_microstep: 1081.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438
[2024-06-10 09:12:35,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1447.63 | bwd_inner_microstep: 1447.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3803
[2024-06-10 09:12:37,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.18 | bwd_microstep: 1475.26 | bwd_inner_microstep: 1475.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3472
[2024-06-10 09:12:38,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.98 | bwd_microstep: 1331.55 | bwd_inner_microstep: 1331.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891
[2024-06-10 09:12:39,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.89 | bwd_microstep: 779.87 | bwd_inner_microstep: 779.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 09:12:42,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 1557.52 | bwd_inner_microstep: 1557.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 09:12:44,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.14 | bwd_microstep: 1387.78 | bwd_inner_microstep: 1387.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-10 09:12:46,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.48 | bwd_microstep: 1639.27 | bwd_inner_microstep: 1639.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805
[2024-06-10 09:12:48,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.21 | bwd_microstep: 1452.55 | bwd_inner_microstep: 1452.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 09:12:50,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.33 | bwd_microstep: 1555.52 | bwd_inner_microstep: 1555.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-10 09:12:52,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.58 | bwd_microstep: 1645.57 | bwd_inner_microstep: 1645.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3382
[2024-06-10 09:12:58,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 09:12:58,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 4815.10 | bwd_inner_microstep: 1599.58 | bwd_allreduce_microstep: 3215.46 | step_microstep: 38.19
[2024-06-10 09:12:58,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15525.48 | bwd: 44859.69 | bwd_inner: 41643.16 | bwd_allreduce: 3215.78 | step: 40.37
{'loss': 1.2818, 'learning_rate': 3.350488443046475e-05, 'epoch': 0.29}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3542
[2024-06-10 09:13:00,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.96 | bwd_microstep: 1593.62 | bwd_inner_microstep: 1593.41 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3936
[2024-06-10 09:13:02,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.67 | bwd_microstep: 1688.31 | bwd_inner_microstep: 1688.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 09:13:04,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1477.67 | bwd_inner_microstep: 1477.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 09:13:06,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1491.65 | bwd_inner_microstep: 1491.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2903
[2024-06-10 09:13:08,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.97 | bwd_microstep: 1190.99 | bwd_inner_microstep: 1190.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-10 09:13:09,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.11 | bwd_microstep: 789.15 | bwd_inner_microstep: 789.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 09:13:11,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.82 | bwd_microstep: 1245.49 | bwd_inner_microstep: 1245.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 09:13:12,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.85 | bwd_microstep: 795.71 | bwd_inner_microstep: 795.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 09:13:14,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1249.21 | bwd_inner_microstep: 1249.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 09:13:15,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.18 | bwd_microstep: 1252.33 | bwd_inner_microstep: 1252.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 09:13:17,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1347.33 | bwd_inner_microstep: 1347.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 09:13:19,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.58 | bwd_microstep: 1380.61 | bwd_inner_microstep: 1380.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 09:13:21,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1349.87 | bwd_inner_microstep: 1349.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3622
[2024-06-10 09:13:23,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1313.34 | bwd_inner_microstep: 1313.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503
[2024-06-10 09:13:24,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.83 | bwd_microstep: 1191.99 | bwd_inner_microstep: 1191.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2496
[2024-06-10 09:13:26,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.36 | bwd_microstep: 1055.90 | bwd_inner_microstep: 1055.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3900
[2024-06-10 09:13:28,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.77 | bwd_microstep: 1697.51 | bwd_inner_microstep: 1697.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 09:13:30,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.00 | bwd_microstep: 1658.03 | bwd_inner_microstep: 1658.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 09:13:32,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.04 | bwd_microstep: 978.73 | bwd_inner_microstep: 978.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3990
[2024-06-10 09:13:34,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.56 | bwd_microstep: 1514.90 | bwd_inner_microstep: 1514.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3858
[2024-06-10 09:13:36,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1367.90 | bwd_inner_microstep: 1367.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 09:13:38,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1460.58 | bwd_inner_microstep: 1460.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 09:13:40,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1311.78 | bwd_inner_microstep: 1311.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808
[2024-06-10 09:13:42,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.16 | bwd_microstep: 1586.77 | bwd_inner_microstep: 1586.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3537
[2024-06-10 09:13:44,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.92 | bwd_microstep: 1518.45 | bwd_inner_microstep: 1518.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3535
[2024-06-10 09:13:46,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.12 | bwd_microstep: 1586.04 | bwd_inner_microstep: 1586.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3411
[2024-06-10 09:13:48,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.90 | bwd_microstep: 1439.64 | bwd_inner_microstep: 1439.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599
[2024-06-10 09:13:50,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.98 | bwd_microstep: 1705.26 | bwd_inner_microstep: 1705.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3555
[2024-06-10 09:13:52,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.58 | bwd_microstep: 1449.71 | bwd_inner_microstep: 1449.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3579
[2024-06-10 09:13:54,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.17 | bwd_microstep: 1349.32 | bwd_inner_microstep: 1349.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807
[2024-06-10 09:13:57,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.14 | bwd_microstep: 1751.84 | bwd_inner_microstep: 1751.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-10 09:13:59,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.19 | optimizer_step: 6.64
[2024-06-10 09:13:59,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.80 | bwd_microstep: 1636.43 | bwd_inner_microstep: 1628.66 | bwd_allreduce_microstep: 7.72 | step_microstep: 37.63
[2024-06-10 09:13:59,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16576.51 | bwd: 44426.08 | bwd_inner: 44417.30 | bwd_allreduce: 8.03 | step: 39.28
{'loss': 1.2847, 'learning_rate': 3.347717583072159e-05, 'epoch': 0.29}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 09:14:01,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1245.39 | bwd_inner_microstep: 1245.27 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3898
[2024-06-10 09:14:03,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.35 | bwd_microstep: 1688.52 | bwd_inner_microstep: 1688.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-10 09:14:05,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.43 | bwd_microstep: 1312.92 | bwd_inner_microstep: 1312.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882
[2024-06-10 09:14:06,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.17 | bwd_microstep: 711.82 | bwd_inner_microstep: 711.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 870
[2024-06-10 09:14:06,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.38 | bwd_microstep: 366.97 | bwd_inner_microstep: 366.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2212
[2024-06-10 09:14:08,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.34 | bwd_microstep: 957.68 | bwd_inner_microstep: 957.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 09:14:10,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.31 | bwd_microstep: 1649.39 | bwd_inner_microstep: 1649.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 09:14:12,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1247.92 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1886
[2024-06-10 09:14:13,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.80 | bwd_microstep: 682.12 | bwd_inner_microstep: 682.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3974
[2024-06-10 09:14:15,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.38 | bwd_microstep: 1637.79 | bwd_inner_microstep: 1637.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2402
[2024-06-10 09:14:16,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.17 | bwd_microstep: 846.84 | bwd_inner_microstep: 846.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 09:14:18,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1389.78 | bwd_inner_microstep: 1389.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3641
[2024-06-10 09:14:20,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.56 | bwd_microstep: 1346.97 | bwd_inner_microstep: 1346.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-10 09:14:21,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.99 | bwd_microstep: 1161.30 | bwd_inner_microstep: 1161.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2387
[2024-06-10 09:14:23,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.60 | bwd_microstep: 934.33 | bwd_inner_microstep: 934.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 09:14:25,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.68 | bwd_microstep: 1380.63 | bwd_inner_microstep: 1380.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667
[2024-06-10 09:14:27,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.38 | bwd_microstep: 1420.63 | bwd_inner_microstep: 1420.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 09:14:29,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.09 | bwd_microstep: 1391.08 | bwd_inner_microstep: 1391.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3607
[2024-06-10 09:14:31,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.18 | bwd_microstep: 1704.71 | bwd_inner_microstep: 1704.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3829
[2024-06-10 09:14:33,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.03 | bwd_microstep: 1821.97 | bwd_inner_microstep: 1821.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 09:14:35,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.36 | bwd_microstep: 1286.28 | bwd_inner_microstep: 1286.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3536
[2024-06-10 09:14:37,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.41 | bwd_microstep: 1592.41 | bwd_inner_microstep: 1592.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 09:14:39,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.77 | bwd_microstep: 1520.49 | bwd_inner_microstep: 1520.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443
[2024-06-10 09:14:41,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.12 | bwd_microstep: 1293.59 | bwd_inner_microstep: 1293.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 09:14:43,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.64 | bwd_microstep: 1303.62 | bwd_inner_microstep: 1303.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 09:14:45,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 1403.88 | bwd_inner_microstep: 1403.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 09:14:47,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.99 | bwd_microstep: 1539.03 | bwd_inner_microstep: 1539.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 09:14:49,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.90 | bwd_microstep: 1282.99 | bwd_inner_microstep: 1282.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3541
[2024-06-10 09:14:51,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.56 | bwd_microstep: 1457.88 | bwd_inner_microstep: 1457.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 09:14:52,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.02 | bwd_microstep: 976.43 | bwd_inner_microstep: 976.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3653
[2024-06-10 09:14:54,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.13 | bwd_microstep: 1474.56 | bwd_inner_microstep: 1474.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3571
[2024-06-10 09:15:01,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.31 | optimizer_step: 6.59
[2024-06-10 09:15:01,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.16 | bwd_microstep: 6510.01 | bwd_inner_microstep: 1526.71 | bwd_allreduce_microstep: 4983.24 | step_microstep: 38.31
[2024-06-10 09:15:01,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15520.73 | bwd: 46539.96 | bwd_inner: 41555.69 | bwd_allreduce: 4983.54 | step: 40.06
{'loss': 1.2778, 'learning_rate': 3.344941976443521e-05, 'epoch': 0.29}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919
[2024-06-10 09:15:02,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.75 | bwd_microstep: 712.16 | bwd_inner_microstep: 712.02 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3916
[2024-06-10 09:15:04,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.53 | bwd_microstep: 1439.10 | bwd_inner_microstep: 1439.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 09:15:06,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.37 | bwd_microstep: 1245.56 | bwd_inner_microstep: 1245.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 09:15:08,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1378.87 | bwd_inner_microstep: 1378.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 09:15:10,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.33 | bwd_microstep: 1247.02 | bwd_inner_microstep: 1246.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 09:15:12,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1382.22 | bwd_inner_microstep: 1382.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3433
[2024-06-10 09:15:13,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1302.11 | bwd_inner_microstep: 1301.96 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 09:15:15,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.32 | bwd_microstep: 1341.41 | bwd_inner_microstep: 1341.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 09:15:18,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.06 | bwd_microstep: 1621.88 | bwd_inner_microstep: 1621.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3677
[2024-06-10 09:15:20,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.33 | bwd_microstep: 1663.70 | bwd_inner_microstep: 1663.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3404
[2024-06-10 09:15:22,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1443.05 | bwd_inner_microstep: 1443.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 09:15:24,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.87 | bwd_microstep: 1485.33 | bwd_inner_microstep: 1485.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3518
[2024-06-10 09:15:26,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.05 | bwd_microstep: 1352.18 | bwd_inner_microstep: 1352.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650
[2024-06-10 09:15:28,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.96 | bwd_microstep: 1622.37 | bwd_inner_microstep: 1622.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3690
[2024-06-10 09:15:30,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.47 | bwd_microstep: 1430.02 | bwd_inner_microstep: 1429.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3843
[2024-06-10 09:15:32,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.95 | bwd_microstep: 1365.19 | bwd_inner_microstep: 1365.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 09:15:34,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.82 | bwd_microstep: 1283.43 | bwd_inner_microstep: 1283.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 09:15:35,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1351.22 | bwd_inner_microstep: 1351.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 09:15:39,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 2682.91 | bwd_inner_microstep: 2682.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2269
[2024-06-10 09:15:40,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.35 | bwd_microstep: 809.49 | bwd_inner_microstep: 809.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 09:15:42,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1484.11 | bwd_inner_microstep: 1484.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 09:15:44,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.50 | bwd_microstep: 1373.98 | bwd_inner_microstep: 1373.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 612
[2024-06-10 09:15:44,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.87 | bwd_microstep: 260.51 | bwd_inner_microstep: 260.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2073
[2024-06-10 09:15:45,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.39 | bwd_microstep: 846.90 | bwd_inner_microstep: 846.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 09:15:46,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.98 | bwd_microstep: 800.03 | bwd_inner_microstep: 800.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3468
[2024-06-10 09:15:48,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.93 | bwd_microstep: 1211.41 | bwd_inner_microstep: 1211.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3530
[2024-06-10 09:15:50,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.95 | bwd_microstep: 1437.86 | bwd_inner_microstep: 1437.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1997
[2024-06-10 09:15:51,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.21 | bwd_microstep: 830.43 | bwd_inner_microstep: 830.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2230
[2024-06-10 09:15:53,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.67 | bwd_microstep: 1022.75 | bwd_inner_microstep: 1022.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 09:15:55,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.50 | bwd_microstep: 1450.30 | bwd_inner_microstep: 1450.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3609
[2024-06-10 09:15:57,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.27 | bwd_microstep: 1807.71 | bwd_inner_microstep: 1807.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2228
[2024-06-10 09:16:03,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.39 | optimizer_step: 6.59
[2024-06-10 09:16:03,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.31 | bwd_microstep: 5817.92 | bwd_inner_microstep: 1145.61 | bwd_allreduce_microstep: 4672.25 | step_microstep: 40.69
[2024-06-10 09:16:03,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15114.81 | bwd: 46503.17 | bwd_inner: 41829.79 | bwd_allreduce: 4672.58 | step: 42.50


 28%|██▊       | 491/1726 [8:32:30<20:59:47, 61.20s/it]
 29%|██▊       | 492/1726 [8:33:31<21:00:38, 61.30s/it]
 29%|██▊       | 493/1726 [8:34:34<21:07:24, 61.67s/it]
 29%|██▊       | 494/1726 [8:35:34<21:00:42, 61.40s/it]
 29%|██▊       | 495/1726 [8:36:36<20:59:24, 61.38s/it]
 29%|██▊       | 496/1726 [8:37:38<21:04:40, 61.69s/it]
 29%|██▉       | 497/1726
{'loss': 1.2518, 'learning_rate': 3.342161632936234e-05, 'epoch': 0.29}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1914
[2024-06-10 09:16:04,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.91 | bwd_microstep: 771.31 | bwd_inner_microstep: 771.18 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 09:16:06,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.88 | bwd_microstep: 1340.57 | bwd_inner_microstep: 1340.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 09:16:08,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.34 | bwd_microstep: 1340.82 | bwd_inner_microstep: 1340.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 09:16:10,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1399.27 | bwd_inner_microstep: 1399.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3486
[2024-06-10 09:16:12,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.72 | bwd_microstep: 1216.84 | bwd_inner_microstep: 1216.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3792
[2024-06-10 09:16:14,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.28 | bwd_microstep: 1548.49 | bwd_inner_microstep: 1548.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 09:16:15,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.08 | bwd_microstep: 790.36 | bwd_inner_microstep: 790.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 09:16:17,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.88 | bwd_microstep: 1352.11 | bwd_inner_microstep: 1352.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 09:16:19,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.72 | bwd_microstep: 1246.04 | bwd_inner_microstep: 1246.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 09:16:20,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.83 | bwd_microstep: 1186.19 | bwd_inner_microstep: 1186.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-10 09:16:22,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.42 | bwd_microstep: 1640.43 | bwd_inner_microstep: 1640.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628
[2024-06-10 09:16:24,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1415.13 | bwd_inner_microstep: 1415.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 09:16:26,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.80 | bwd_microstep: 1401.60 | bwd_inner_microstep: 1401.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3510
[2024-06-10 09:16:28,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 1446.05 | bwd_inner_microstep: 1446.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3511
[2024-06-10 09:16:30,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1431.54 | bwd_inner_microstep: 1431.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 09:16:32,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1390.56 | bwd_inner_microstep: 1390.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 09:16:34,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.47 | bwd_microstep: 1484.75 | bwd_inner_microstep: 1484.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3633
[2024-06-10 09:16:36,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1312.07 | bwd_inner_microstep: 1312.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2498
[2024-06-10 09:16:38,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.61 | bwd_microstep: 1054.53 | bwd_inner_microstep: 1054.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1977
[2024-06-10 09:16:39,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.21 | bwd_microstep: 765.71 | bwd_inner_microstep: 765.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1991
[2024-06-10 09:16:40,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.36 | bwd_microstep: 896.17 | bwd_inner_microstep: 896.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1943
[2024-06-10 09:16:41,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.24 | bwd_microstep: 728.87 | bwd_inner_microstep: 728.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 615
[2024-06-10 09:16:41,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.61 | bwd_microstep: 260.83 | bwd_inner_microstep: 260.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 09:16:43,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.75 | bwd_microstep: 1645.61 | bwd_inner_microstep: 1645.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 09:16:45,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.16 | bwd_microstep: 1252.62 | bwd_inner_microstep: 1252.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 09:16:48,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.93 | bwd_microstep: 1660.28 | bwd_inner_microstep: 1660.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 09:16:49,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.58 | bwd_microstep: 1277.56 | bwd_inner_microstep: 1277.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 09:16:51,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1255.44 | bwd_inner_microstep: 1255.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 09:16:53,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.00 | bwd_microstep: 1504.66 | bwd_inner_microstep: 1504.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 09:16:55,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1277.96 | bwd_inner_microstep: 1277.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797
[2024-06-10 09:16:57,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.22 | bwd_microstep: 1656.32 | bwd_inner_microstep: 1656.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 09:17:03,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.64
[2024-06-10 09:17:03,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.18 | bwd_microstep: 4865.96 | bwd_inner_microstep: 1954.14 | bwd_allreduce_microstep: 2911.78 | step_microstep: 37.94
[2024-06-10 09:17:03,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15157.55 | bwd: 43816.66 | bwd_inner: 40903.88 | bwd_allreduce: 2912.05 | step: 39.39
{'loss': 1.2756, 'learning_rate': 3.339376562342653e-05, 'epoch': 0.29}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 09:17:05,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.62 | bwd_microstep: 1393.43 | bwd_inner_microstep: 1393.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1877
[2024-06-10 09:17:06,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.11 | bwd_microstep: 708.97 | bwd_inner_microstep: 708.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4339
[2024-06-10 09:17:08,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.89 | bwd_microstep: 1598.18 | bwd_inner_microstep: 1598.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-10 09:17:10,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.70 | bwd_microstep: 1512.16 | bwd_inner_microstep: 1512.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 09:17:11,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.48 | bwd_microstep: 792.30 | bwd_inner_microstep: 792.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 09:17:13,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.09 | bwd_microstep: 1247.26 | bwd_inner_microstep: 1247.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 09:17:14,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.84 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 09:17:17,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.70 | bwd_microstep: 1631.70 | bwd_inner_microstep: 1631.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919
[2024-06-10 09:17:18,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.35 | bwd_microstep: 715.42 | bwd_inner_microstep: 715.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3453
[2024-06-10 09:17:20,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1381.75 | bwd_inner_microstep: 1381.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3630
[2024-06-10 09:17:22,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.95 | bwd_microstep: 1436.12 | bwd_inner_microstep: 1436.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3680
[2024-06-10 09:17:24,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.10 | bwd_microstep: 1660.27 | bwd_inner_microstep: 1660.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580
[2024-06-10 09:17:26,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.01 | bwd_microstep: 1302.24 | bwd_inner_microstep: 1302.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3652
[2024-06-10 09:17:28,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.59 | bwd_microstep: 1649.75 | bwd_inner_microstep: 1649.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 09:17:30,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.68 | bwd_microstep: 1472.79 | bwd_inner_microstep: 1472.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3654
[2024-06-10 09:17:32,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.55 | bwd_microstep: 1579.06 | bwd_inner_microstep: 1579.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3531
[2024-06-10 09:17:34,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.17 | bwd_microstep: 1684.27 | bwd_inner_microstep: 1684.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3125
[2024-06-10 09:17:36,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.58 | bwd_microstep: 1188.35 | bwd_inner_microstep: 1188.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 09:17:38,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.39 | bwd_microstep: 1392.92 | bwd_inner_microstep: 1392.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 09:17:40,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.35 | bwd_microstep: 1657.77 | bwd_inner_microstep: 1657.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-10 09:17:42,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.25 | bwd_microstep: 1309.60 | bwd_inner_microstep: 1309.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3535
[2024-06-10 09:17:44,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.36 | bwd_microstep: 1197.11 | bwd_inner_microstep: 1197.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 09:17:46,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.34 | bwd_microstep: 1401.26 | bwd_inner_microstep: 1401.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 09:17:47,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.53 | bwd_microstep: 1254.78 | bwd_inner_microstep: 1254.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 09:17:49,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1396.00 | bwd_inner_microstep: 1395.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2158
[2024-06-10 09:17:50,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.61 | bwd_microstep: 759.93 | bwd_inner_microstep: 759.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025
[2024-06-10 09:17:52,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 806.00 | bwd_inner_microstep: 805.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 09:17:53,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.70 | bwd_microstep: 1374.90 | bwd_inner_microstep: 1374.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 09:17:56,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.85 | bwd_microstep: 1655.89 | bwd_inner_microstep: 1655.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3787
[2024-06-10 09:17:58,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.99 | bwd_microstep: 1579.26 | bwd_inner_microstep: 1579.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278
[2024-06-10 09:17:59,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.83 | bwd_microstep: 908.11 | bwd_inner_microstep: 908.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3822
[2024-06-10 09:18:03,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 09:18:03,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.99 | bwd_microstep: 3509.70 | bwd_inner_microstep: 1808.68 | bwd_allreduce_microstep: 1700.96 | step_microstep: 38.34
[2024-06-10 09:18:03,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15902.07 | bwd: 44440.61 | bwd_inner: 42738.73 | bwd_allreduce: 1701.19 | step: 39.83
{'loss': 1.2762, 'learning_rate': 3.3365867744717827e-05, 'epoch': 0.29}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 09:18:05,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.34 | bwd_microstep: 1476.29 | bwd_inner_microstep: 1476.08 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 09:18:06,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.05 | bwd_microstep: 789.15 | bwd_inner_microstep: 789.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 09:18:09,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.26 | bwd_microstep: 1649.10 | bwd_inner_microstep: 1649.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 09:18:10,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.31 | bwd_microstep: 1283.53 | bwd_inner_microstep: 1283.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793
[2024-06-10 09:18:12,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.23 | bwd_microstep: 1446.02 | bwd_inner_microstep: 1445.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 09:18:14,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.14 | bwd_microstep: 1274.65 | bwd_inner_microstep: 1274.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 09:18:16,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.92 | bwd_microstep: 1247.51 | bwd_inner_microstep: 1247.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 09:18:18,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1388.97 | bwd_inner_microstep: 1388.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 09:18:19,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.35 | bwd_microstep: 807.75 | bwd_inner_microstep: 807.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 09:18:21,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1391.44 | bwd_inner_microstep: 1391.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2156
[2024-06-10 09:18:22,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.27 | bwd_microstep: 853.59 | bwd_inner_microstep: 853.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2194
[2024-06-10 09:18:23,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.95 | bwd_microstep: 992.07 | bwd_inner_microstep: 992.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521
[2024-06-10 09:18:25,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1414.82 | bwd_inner_microstep: 1414.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 09:18:27,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.23 | bwd_microstep: 1346.56 | bwd_inner_microstep: 1346.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2133
[2024-06-10 09:18:29,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.81 | bwd_microstep: 927.60 | bwd_inner_microstep: 927.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3025
[2024-06-10 09:18:30,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.21 | bwd_microstep: 1230.09 | bwd_inner_microstep: 1230.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723
[2024-06-10 09:18:33,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.98 | bwd_microstep: 1625.16 | bwd_inner_microstep: 1625.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 09:18:35,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.52 | bwd_microstep: 1490.28 | bwd_inner_microstep: 1490.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 09:18:36,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1285.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3914
[2024-06-10 09:18:39,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.04 | bwd_microstep: 1589.26 | bwd_inner_microstep: 1589.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3672
[2024-06-10 09:18:40,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1356.55 | bwd_inner_microstep: 1356.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3734
[2024-06-10 09:18:42,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1463.01 | bwd_inner_microstep: 1462.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 09:18:45,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 1509.64 | bwd_inner_microstep: 1509.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 09:18:46,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.26 | bwd_microstep: 700.25 | bwd_inner_microstep: 700.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 09:18:47,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.13 | bwd_microstep: 1258.42 | bwd_inner_microstep: 1258.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2188
[2024-06-10 09:18:48,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.87 | bwd_microstep: 861.25 | bwd_inner_microstep: 861.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3717
[2024-06-10 09:18:50,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.93 | bwd_microstep: 1397.29 | bwd_inner_microstep: 1397.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3569
[2024-06-10 09:18:52,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.63 | bwd_microstep: 1444.99 | bwd_inner_microstep: 1444.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-10 09:18:55,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.04 | bwd_microstep: 1600.17 | bwd_inner_microstep: 1600.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 09:18:57,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.12 | bwd_microstep: 1506.17 | bwd_inner_microstep: 1506.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.26
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 09:18:59,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.97 | bwd_microstep: 1452.97 | bwd_inner_microstep: 1452.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3809
[2024-06-10 09:19:03,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.72 | optimizer_step: 6.61
[2024-06-10 09:19:03,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 4054.17 | bwd_inner_microstep: 1561.04 | bwd_allreduce_microstep: 2493.01 | step_microstep: 46.25
[2024-06-10 09:19:03,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15538.27 | bwd: 44114.78 | bwd_inner: 41620.62 | bwd_allreduce: 2493.37 | step: 49.58
{'loss': 1.2377, 'learning_rate': 3.3337922791492406e-05, 'epoch': 0.29}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 09:19:05,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1491.51 | bwd_inner_microstep: 1491.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3872
[2024-06-10 09:19:08,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.29 | bwd_microstep: 1665.73 | bwd_inner_microstep: 1665.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4240
[2024-06-10 09:19:10,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.61 | bwd_microstep: 1564.93 | bwd_inner_microstep: 1564.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 09:19:12,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.75 | bwd_microstep: 1553.29 | bwd_inner_microstep: 1553.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 09:19:14,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.00 | bwd_microstep: 1251.15 | bwd_inner_microstep: 1251.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 09:19:16,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.06 | bwd_microstep: 1546.07 | bwd_inner_microstep: 1546.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 09:19:18,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.82 | bwd_microstep: 1403.10 | bwd_inner_microstep: 1403.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 09:19:20,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1391.21 | bwd_inner_microstep: 1391.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 09:19:22,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.15 | bwd_microstep: 1387.90 | bwd_inner_microstep: 1387.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3501
[2024-06-10 09:19:23,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.44 | bwd_microstep: 1254.68 | bwd_inner_microstep: 1254.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2179
[2024-06-10 09:19:25,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.30 | bwd_microstep: 954.03 | bwd_inner_microstep: 954.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 09:19:27,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.35 | bwd_microstep: 1355.09 | bwd_inner_microstep: 1355.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3464
[2024-06-10 09:19:28,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.35 | bwd_microstep: 1341.92 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 09:19:31,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.55 | bwd_microstep: 1650.89 | bwd_inner_microstep: 1650.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958
[2024-06-10 09:19:32,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.55 | bwd_microstep: 831.34 | bwd_inner_microstep: 831.13 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2001
[2024-06-10 09:19:33,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.12 | bwd_microstep: 712.40 | bwd_inner_microstep: 712.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 09:19:35,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1493.76 | bwd_inner_microstep: 1493.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 09:19:37,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1407.25 | bwd_inner_microstep: 1407.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 09:19:39,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1398.75 | bwd_inner_microstep: 1398.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 09:19:41,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.57 | bwd_microstep: 1258.78 | bwd_inner_microstep: 1258.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 09:19:43,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.26 | bwd_microstep: 1565.32 | bwd_inner_microstep: 1565.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3493
[2024-06-10 09:19:44,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.80 | bwd_microstep: 1227.39 | bwd_inner_microstep: 1227.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 09:19:47,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.75 | bwd_microstep: 1660.51 | bwd_inner_microstep: 1660.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 09:19:48,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.05 | bwd_microstep: 810.05 | bwd_inner_microstep: 810.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 09:19:50,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1348.68 | bwd_inner_microstep: 1348.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 09:19:52,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1458.31 | bwd_inner_microstep: 1458.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 09:19:54,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.49 | bwd_microstep: 1582.59 | bwd_inner_microstep: 1582.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 09:19:56,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1395.79 | bwd_inner_microstep: 1395.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 09:19:58,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.46 | bwd_microstep: 1647.96 | bwd_inner_microstep: 1647.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 09:20:00,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.41 | bwd_microstep: 1348.10 | bwd_inner_microstep: 1348.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3590
[2024-06-10 09:20:02,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.28 | bwd_microstep: 1808.78 | bwd_inner_microstep: 1808.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 09:20:06,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 09:20:06,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.77 | bwd_microstep: 3458.21 | bwd_inner_microstep: 1572.72 | bwd_allreduce_microstep: 1885.44 | step_microstep: 38.35
[2024-06-10 09:20:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16519.96 | bwd: 46225.51 | bwd_inner: 44338.98 | bwd_allreduce: 1885.75 | step: 40.42
{'loss': 1.2419, 'learning_rate': 3.3309930862172245e-05, 'epoch': 0.29}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481
[2024-06-10 09:20:09,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.58 | bwd_microstep: 1576.22 | bwd_inner_microstep: 1576.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2222
[2024-06-10 09:20:10,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.82 | bwd_microstep: 873.53 | bwd_inner_microstep: 873.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 09:20:12,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.07 | bwd_microstep: 1245.73 | bwd_inner_microstep: 1245.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-10 09:20:14,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.28 | bwd_microstep: 1563.86 | bwd_inner_microstep: 1563.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3868
[2024-06-10 09:20:16,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.27 | bwd_microstep: 1666.27 | bwd_inner_microstep: 1666.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900
[2024-06-10 09:20:17,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.36 | bwd_microstep: 684.14 | bwd_inner_microstep: 684.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2576
[2024-06-10 09:20:18,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.58 | bwd_microstep: 1041.14 | bwd_inner_microstep: 1041.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 09:20:20,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.15 | bwd_microstep: 1286.74 | bwd_inner_microstep: 1286.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 09:20:22,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.19 | bwd_microstep: 1386.94 | bwd_inner_microstep: 1386.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3413
[2024-06-10 09:20:24,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.64 | bwd_microstep: 1213.80 | bwd_inner_microstep: 1213.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 09:20:26,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.48 | bwd_microstep: 1288.90 | bwd_inner_microstep: 1288.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3682
[2024-06-10 09:20:28,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.86 | bwd_microstep: 1658.64 | bwd_inner_microstep: 1658.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 09:20:30,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.72 | bwd_microstep: 1389.53 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3484
[2024-06-10 09:20:32,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1542.31 | bwd_inner_microstep: 1542.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 09:20:34,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1341.58 | bwd_inner_microstep: 1341.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 900
[2024-06-10 09:20:34,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.96 | bwd_microstep: 372.13 | bwd_inner_microstep: 372.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 09:20:36,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.20 | bwd_microstep: 1376.23 | bwd_inner_microstep: 1376.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2924
[2024-06-10 09:20:38,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.25 | bwd_microstep: 1285.90 | bwd_inner_microstep: 1285.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 09:20:40,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.95 | bwd_microstep: 1387.01 | bwd_inner_microstep: 1386.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3678
[2024-06-10 09:20:42,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.48 | bwd_microstep: 1327.41 | bwd_inner_microstep: 1327.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 09:20:44,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.33 | bwd_microstep: 1658.48 | bwd_inner_microstep: 1658.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294
[2024-06-10 09:20:45,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.96 | bwd_microstep: 1073.92 | bwd_inner_microstep: 1073.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 09:20:48,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.34 | bwd_microstep: 1507.22 | bwd_inner_microstep: 1507.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 09:20:49,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.66 | bwd_microstep: 1300.43 | bwd_inner_microstep: 1300.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 09:20:51,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.34 | bwd_microstep: 1337.83 | bwd_inner_microstep: 1337.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-10 09:20:53,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.83 | bwd_microstep: 1307.15 | bwd_inner_microstep: 1307.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 09:20:55,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.91 | bwd_microstep: 1281.66 | bwd_inner_microstep: 1281.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565
[2024-06-10 09:20:57,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.06 | bwd_microstep: 1596.18 | bwd_inner_microstep: 1596.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3624
[2024-06-10 09:20:59,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.42 | bwd_microstep: 1708.10 | bwd_inner_microstep: 1708.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2035
[2024-06-10 09:21:00,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.01 | bwd_microstep: 841.88 | bwd_inner_microstep: 841.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 09:21:03,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.12 | bwd_microstep: 1525.10 | bwd_inner_microstep: 1525.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812
[2024-06-10 09:21:07,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 09:21:07,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.98 | bwd_microstep: 3793.38 | bwd_inner_microstep: 1721.69 | bwd_allreduce_microstep: 2071.64 | step_microstep: 38.24
[2024-06-10 09:21:07,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15765.32 | bwd: 44439.36 | bwd_inner: 42366.81 | bwd_allreduce: 2071.87 | step: 39.76
{'loss': 1.3103, 'learning_rate': 3.328189205534479e-05, 'epoch': 0.29}
 29%|██▉       | 497/1726 [8:38:40<21:05:17, 61.77s/it]
 29%|██▉       | 498/1726 [8:39:39<20:49:03, 61.03s/it]
 29%|██▉       | 499/1726 [8:40:40<20:45:52, 60.92s/it]
 29%|██▉       | 500/1726 [8:41:40<20:39:20, 60.65s/it]
 29%|██▉       | 501/1726 [8:42:43<20:53:31, 61.40s/it]
 29%|██▉       | 502/1726 [8:43:44<20:47:18, 61.14s/it]
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557
[2024-06-10 09:21:09,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.68 | bwd_microstep: 1419.03 | bwd_inner_microstep: 1418.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4691
[2024-06-10 09:21:11,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.02 | bwd_microstep: 1584.05 | bwd_inner_microstep: 1584.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3958
[2024-06-10 09:21:13,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1400.73 | bwd_inner_microstep: 1400.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 09:21:15,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1379.28 | bwd_inner_microstep: 1379.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298
[2024-06-10 09:21:16,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.53 | bwd_microstep: 971.40 | bwd_inner_microstep: 971.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 09:21:17,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 793.93 | bwd_inner_microstep: 793.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 09:21:19,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.56 | bwd_microstep: 1415.37 | bwd_inner_microstep: 1415.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 09:21:21,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1397.94 | bwd_inner_microstep: 1397.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3866
[2024-06-10 09:21:23,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.74 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3989
[2024-06-10 09:21:26,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.74 | bwd_microstep: 1613.04 | bwd_inner_microstep: 1613.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.26
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 09:21:27,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1390.90 | bwd_inner_microstep: 1390.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 09:21:29,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1347.17 | bwd_inner_microstep: 1347.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 09:21:31,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1294.79 | bwd_inner_microstep: 1294.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-10 09:21:33,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.90 | bwd_microstep: 1422.37 | bwd_inner_microstep: 1422.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1947
[2024-06-10 09:21:34,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.87 | bwd_microstep: 890.93 | bwd_inner_microstep: 890.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3454
[2024-06-10 09:21:36,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1302.21 | bwd_inner_microstep: 1302.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 09:21:38,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.74 | bwd_microstep: 1405.51 | bwd_inner_microstep: 1405.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3835
[2024-06-10 09:21:40,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.03 | bwd_microstep: 1701.64 | bwd_inner_microstep: 1701.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2681
[2024-06-10 09:21:42,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.77 | bwd_microstep: 1223.95 | bwd_inner_microstep: 1223.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 09:21:44,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.43 | bwd_microstep: 1390.20 | bwd_inner_microstep: 1390.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3641
[2024-06-10 09:21:46,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.76 | bwd_microstep: 1543.95 | bwd_inner_microstep: 1543.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 09:21:48,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.00 | bwd_microstep: 1382.59 | bwd_inner_microstep: 1382.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 09:21:50,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1403.94 | bwd_inner_microstep: 1403.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3447
[2024-06-10 09:21:52,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.24 | bwd_microstep: 1190.48 | bwd_inner_microstep: 1190.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 09:21:54,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1556.97 | bwd_inner_microstep: 1556.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623
[2024-06-10 09:21:56,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.41 | bwd_microstep: 1607.46 | bwd_inner_microstep: 1607.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3641
[2024-06-10 09:21:58,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.22 | bwd_microstep: 1620.56 | bwd_inner_microstep: 1620.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 09:21:59,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.23 | bwd_microstep: 805.80 | bwd_inner_microstep: 805.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 09:22:02,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.17 | bwd_microstep: 1658.70 | bwd_inner_microstep: 1658.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 09:22:04,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.81 | bwd_microstep: 1544.28 | bwd_inner_microstep: 1544.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3569
[2024-06-10 09:22:06,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.97 | bwd_microstep: 1698.46 | bwd_inner_microstep: 1698.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3781
[2024-06-10 09:22:09,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.17 | optimizer_step: 6.62
[2024-06-10 09:22:09,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.62 | bwd_microstep: 1922.49 | bwd_inner_microstep: 1526.91 | bwd_allreduce_microstep: 395.53 | step_microstep: 37.73
[2024-06-10 09:22:09,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16541.39 | bwd: 44679.35 | bwd_inner: 44282.89 | bwd_allreduce: 395.76 | step: 39.85
{'loss': 1.3266, 'learning_rate': 3.325380646976255e-05, 'epoch': 0.29}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 09:22:11,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.76 | bwd_microstep: 1475.79 | bwd_inner_microstep: 1475.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 09:22:12,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.41 | bwd_microstep: 1287.56 | bwd_inner_microstep: 1287.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 09:22:14,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.18 | bwd_microstep: 1449.60 | bwd_inner_microstep: 1449.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 09:22:17,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1553.01 | bwd_inner_microstep: 1552.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 09:22:18,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1345.22 | bwd_inner_microstep: 1345.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 09:22:20,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1247.56 | bwd_inner_microstep: 1247.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 09:22:22,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.71 | bwd_microstep: 1483.07 | bwd_inner_microstep: 1483.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3724
[2024-06-10 09:22:24,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.39 | bwd_microstep: 1242.40 | bwd_inner_microstep: 1242.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-10 09:22:26,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.17 | bwd_microstep: 1637.99 | bwd_inner_microstep: 1637.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 09:22:27,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.64 | bwd_microstep: 799.72 | bwd_inner_microstep: 799.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 09:22:29,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.10 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1281.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3567
[2024-06-10 09:22:31,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1267.58 | bwd_inner_microstep: 1267.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 09:22:33,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1347.46 | bwd_inner_microstep: 1347.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3423
[2024-06-10 09:22:34,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.51 | bwd_microstep: 1239.45 | bwd_inner_microstep: 1239.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3637
[2024-06-10 09:22:37,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.85 | bwd_microstep: 1557.94 | bwd_inner_microstep: 1557.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3526
[2024-06-10 09:22:39,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.65 | bwd_microstep: 1558.76 | bwd_inner_microstep: 1558.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3897
[2024-06-10 09:22:41,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.51 | bwd_microstep: 1589.96 | bwd_inner_microstep: 1589.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618
[2024-06-10 09:22:43,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.80 | bwd_microstep: 1603.40 | bwd_inner_microstep: 1603.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2494
[2024-06-10 09:22:45,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.01 | bwd_microstep: 1116.82 | bwd_inner_microstep: 1116.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-10 09:22:47,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.66 | bwd_microstep: 1586.75 | bwd_inner_microstep: 1586.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 09:22:49,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1555.86 | bwd_inner_microstep: 1555.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 09:22:51,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.26 | bwd_microstep: 1386.91 | bwd_inner_microstep: 1386.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3803
[2024-06-10 09:22:53,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.57 | bwd_microstep: 1623.04 | bwd_inner_microstep: 1623.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 09:22:55,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.57 | bwd_microstep: 1356.63 | bwd_inner_microstep: 1356.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814
[2024-06-10 09:22:57,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.74 | bwd_microstep: 1755.78 | bwd_inner_microstep: 1755.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 09:23:00,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.75 | bwd_microstep: 1568.90 | bwd_inner_microstep: 1568.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.23
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3555
[2024-06-10 09:23:02,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.34 | bwd_microstep: 1692.65 | bwd_inner_microstep: 1692.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 09:23:04,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.48 | bwd_microstep: 1408.35 | bwd_inner_microstep: 1408.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-10 09:23:05,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.53 | bwd_microstep: 980.20 | bwd_inner_microstep: 980.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 09:23:07,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.54 | bwd_microstep: 1183.98 | bwd_inner_microstep: 1183.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2236
[2024-06-10 09:23:08,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.10 | bwd_microstep: 898.50 | bwd_inner_microstep: 898.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 09:23:12,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 09:23:12,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 3336.95 | bwd_inner_microstep: 1422.90 | bwd_allreduce_microstep: 1914.00 | step_microstep: 38.10
[2024-06-10 09:23:12,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16567.48 | bwd: 46419.82 | bwd_inner: 44504.90 | bwd_allreduce: 1914.23 | step: 40.03
{'loss': 1.2699, 'learning_rate': 3.322567420434283e-05, 'epoch': 0.29}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 09:23:14,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.03 | bwd_microstep: 1330.73 | bwd_inner_microstep: 1330.64 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044
[2024-06-10 09:23:15,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.47 | bwd_microstep: 813.53 | bwd_inner_microstep: 813.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837
[2024-06-10 09:23:17,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.27 | bwd_microstep: 1450.12 | bwd_inner_microstep: 1450.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3361
[2024-06-10 09:23:19,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.02 | bwd_microstep: 1334.35 | bwd_inner_microstep: 1334.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 09:23:21,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.00 | bwd_microstep: 1486.86 | bwd_inner_microstep: 1486.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741
[2024-06-10 09:23:23,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.35 | bwd_microstep: 1633.48 | bwd_inner_microstep: 1633.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2260
[2024-06-10 09:23:24,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.25 | bwd_microstep: 903.66 | bwd_inner_microstep: 903.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1941
[2024-06-10 09:23:25,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.21 | bwd_microstep: 762.01 | bwd_inner_microstep: 761.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 09:23:27,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1531.04 | bwd_inner_microstep: 1531.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 09:23:29,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.71 | bwd_microstep: 1386.97 | bwd_inner_microstep: 1386.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 09:23:31,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.95 | bwd_microstep: 1391.03 | bwd_inner_microstep: 1391.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1965
[2024-06-10 09:23:32,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.90 | bwd_microstep: 848.66 | bwd_inner_microstep: 848.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983
[2024-06-10 09:23:34,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.17 | bwd_microstep: 895.54 | bwd_inner_microstep: 895.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3691
[2024-06-10 09:23:36,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.34 | bwd_microstep: 1724.83 | bwd_inner_microstep: 1724.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 09:23:38,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.28 | bwd_microstep: 1478.34 | bwd_inner_microstep: 1478.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-10 09:23:40,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.44 | bwd_microstep: 1420.18 | bwd_inner_microstep: 1420.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 09:23:42,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.34 | bwd_microstep: 1447.31 | bwd_inner_microstep: 1447.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3404
[2024-06-10 09:23:44,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1439.77 | bwd_inner_microstep: 1439.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 09:23:46,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.59 | bwd_microstep: 1414.51 | bwd_inner_microstep: 1414.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 09:23:48,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.19 | bwd_microstep: 1292.64 | bwd_inner_microstep: 1292.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3527
[2024-06-10 09:23:50,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.63 | bwd_microstep: 1454.06 | bwd_inner_microstep: 1454.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 09:23:52,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.98 | bwd_microstep: 1418.24 | bwd_inner_microstep: 1418.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3550
[2024-06-10 09:23:53,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.58 | bwd_microstep: 1200.54 | bwd_inner_microstep: 1200.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 09:23:55,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1290.56 | bwd_inner_microstep: 1290.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637
[2024-06-10 09:23:57,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1409.59 | bwd_inner_microstep: 1409.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-10 09:23:59,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.61 | bwd_microstep: 1190.20 | bwd_inner_microstep: 1190.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 09:24:01,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.22 | bwd_microstep: 1378.96 | bwd_inner_microstep: 1378.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 09:24:03,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1356.96 | bwd_inner_microstep: 1356.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3542
[2024-06-10 09:24:05,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.27 | bwd_microstep: 1354.89 | bwd_inner_microstep: 1354.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-10 09:24:07,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.71 | bwd_microstep: 1596.33 | bwd_inner_microstep: 1596.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 09:24:08,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1250.93 | bwd_inner_microstep: 1250.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044
[2024-06-10 09:24:14,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 09:24:14,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.49 | bwd_microstep: 5335.98 | bwd_inner_microstep: 1043.89 | bwd_allreduce_microstep: 4292.04 | step_microstep: 38.41
[2024-06-10 09:24:14,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15660.03 | bwd: 46222.85 | bwd_inner: 41929.83 | bwd_allreduce: 4292.32 | step: 40.02
{'loss': 1.326, 'learning_rate': 3.3197495358167314e-05, 'epoch': 0.29}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 09:24:16,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.20 | bwd_microstep: 1369.80 | bwd_inner_microstep: 1369.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3905
[2024-06-10 09:24:18,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.23 | bwd_microstep: 1585.92 | bwd_inner_microstep: 1585.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 09:24:20,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.01 | bwd_microstep: 1340.60 | bwd_inner_microstep: 1340.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796
[2024-06-10 09:24:22,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.33 | bwd_microstep: 1445.63 | bwd_inner_microstep: 1445.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 09:24:24,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1243.87 | bwd_inner_microstep: 1243.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 09:24:26,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.25 | bwd_microstep: 1283.94 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 09:24:27,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1246.97 | bwd_inner_microstep: 1246.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 09:24:29,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.33 | bwd_microstep: 1531.32 | bwd_inner_microstep: 1531.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 09:24:31,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1250.50 | bwd_inner_microstep: 1250.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 09:24:33,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.64 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 09:24:35,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1253.67 | bwd_inner_microstep: 1253.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3503
[2024-06-10 09:24:37,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.58 | bwd_microstep: 1345.48 | bwd_inner_microstep: 1345.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-10 09:24:39,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.43 | bwd_microstep: 1451.27 | bwd_inner_microstep: 1451.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 09:24:41,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.77 | bwd_microstep: 1487.89 | bwd_inner_microstep: 1487.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2211
[2024-06-10 09:24:42,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.89 | bwd_microstep: 1059.56 | bwd_inner_microstep: 1059.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 09:24:44,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.62 | bwd_microstep: 1456.55 | bwd_inner_microstep: 1456.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 09:24:46,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1283.65 | bwd_inner_microstep: 1283.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 09:24:48,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.82 | bwd_microstep: 1294.59 | bwd_inner_microstep: 1294.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1991
[2024-06-10 09:24:49,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.09 | bwd_microstep: 707.82 | bwd_inner_microstep: 707.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 09:24:51,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.17 | bwd_microstep: 1428.08 | bwd_inner_microstep: 1428.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2289
[2024-06-10 09:24:52,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.25 | bwd_microstep: 938.13 | bwd_inner_microstep: 938.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2081
[2024-06-10 09:24:53,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.69 | bwd_microstep: 822.04 | bwd_inner_microstep: 822.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2274
[2024-06-10 09:24:54,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.01 | bwd_microstep: 1037.16 | bwd_inner_microstep: 1037.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3605
[2024-06-10 09:24:57,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.83 | bwd_microstep: 1643.20 | bwd_inner_microstep: 1643.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3560
[2024-06-10 09:24:59,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.45 | bwd_microstep: 1527.52 | bwd_inner_microstep: 1527.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3734
[2024-06-10 09:25:01,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.32 | bwd_microstep: 1601.53 | bwd_inner_microstep: 1601.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3557
[2024-06-10 09:25:03,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.33 | bwd_microstep: 1589.05 | bwd_inner_microstep: 1589.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3648
[2024-06-10 09:25:05,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.18 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-10 09:25:07,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.40 | bwd_microstep: 1407.27 | bwd_inner_microstep: 1407.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 09:25:09,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.13 | bwd_microstep: 1552.07 | bwd_inner_microstep: 1552.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 09:25:11,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.20 | bwd_microstep: 1340.27 | bwd_inner_microstep: 1340.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3765
[2024-06-10 09:25:15,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.30 | optimizer_step: 6.63
[2024-06-10 09:25:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.07 | bwd_microstep: 3190.95 | bwd_inner_microstep: 1974.21 | bwd_allreduce_microstep: 1216.67 | step_microstep: 38.54
[2024-06-10 09:25:15,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16027.23 | bwd: 44252.99 | bwd_inner: 43035.36 | bwd_allreduce: 1216.92 | step: 40.38
{'loss': 1.2455, 'learning_rate': 3.3169270030481754e-05, 'epoch': 0.29}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 09:25:17,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.21 | bwd_microstep: 1280.86 | bwd_inner_microstep: 1280.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 09:25:18,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.45 | bwd_microstep: 1249.36 | bwd_inner_microstep: 1249.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 09:25:20,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.16 | bwd_microstep: 1497.19 | bwd_inner_microstep: 1497.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793
[2024-06-10 09:25:22,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.50 | bwd_microstep: 1448.27 | bwd_inner_microstep: 1448.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 09:25:24,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.89 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938
[2024-06-10 09:25:25,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.79 | bwd_microstep: 700.53 | bwd_inner_microstep: 700.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 09:25:27,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1191.09 | bwd_inner_microstep: 1191.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 09:25:29,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.59 | bwd_microstep: 1384.14 | bwd_inner_microstep: 1384.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743
[2024-06-10 09:25:31,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.02 | bwd_microstep: 1536.20 | bwd_inner_microstep: 1536.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1907
[2024-06-10 09:25:32,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.65 | bwd_microstep: 687.11 | bwd_inner_microstep: 687.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 09:25:34,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.53 | bwd_microstep: 1289.85 | bwd_inner_microstep: 1289.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924
[2024-06-10 09:25:35,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.36 | bwd_microstep: 697.36 | bwd_inner_microstep: 697.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2929
[2024-06-10 09:25:36,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.71 | bwd_microstep: 1094.35 | bwd_inner_microstep: 1094.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 09:25:38,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1385.79 | bwd_inner_microstep: 1385.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3656
[2024-06-10 09:25:40,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.72 | bwd_microstep: 1719.22 | bwd_inner_microstep: 1719.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504
[2024-06-10 09:25:42,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1322.13 | bwd_inner_microstep: 1322.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3633
[2024-06-10 09:25:44,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.63 | bwd_microstep: 1492.83 | bwd_inner_microstep: 1492.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 09:25:46,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.38 | bwd_microstep: 1254.91 | bwd_inner_microstep: 1254.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2207
[2024-06-10 09:25:47,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.68 | bwd_microstep: 882.22 | bwd_inner_microstep: 882.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 09:25:49,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.06 | bwd_microstep: 1188.89 | bwd_inner_microstep: 1188.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-10 09:25:51,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1526.95 | bwd_inner_microstep: 1526.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288
[2024-06-10 09:25:52,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.05 | bwd_microstep: 884.38 | bwd_inner_microstep: 884.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 09:25:54,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.49 | bwd_microstep: 1500.08 | bwd_inner_microstep: 1500.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 09:25:56,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.38 | bwd_microstep: 1290.44 | bwd_inner_microstep: 1290.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3818
[2024-06-10 09:25:58,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.21 | bwd_microstep: 1487.99 | bwd_inner_microstep: 1487.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605
[2024-06-10 09:26:00,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1440.14 | bwd_inner_microstep: 1440.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-10 09:26:02,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.68 | bwd_microstep: 1645.67 | bwd_inner_microstep: 1645.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042
[2024-06-10 09:26:04,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.58 | bwd_microstep: 901.80 | bwd_inner_microstep: 901.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016
[2024-06-10 09:26:05,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.02 | bwd_microstep: 802.44 | bwd_inner_microstep: 802.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3574
[2024-06-10 09:26:07,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.09 | bwd_microstep: 1626.60 | bwd_inner_microstep: 1626.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3597
[2024-06-10 09:26:09,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.94 | bwd_microstep: 1638.62 | bwd_inner_microstep: 1638.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3580
[2024-06-10 09:26:15,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-10 09:26:15,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.95 | bwd_microstep: 5353.36 | bwd_inner_microstep: 1884.91 | bwd_allreduce_microstep: 3468.38 | step_microstep: 38.64
[2024-06-10 09:26:15,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.86 | bwd: 44648.54 | bwd_inner: 41179.23 | bwd_allreduce: 3468.61 | step: 40.16
{'loss': 1.2581, 'learning_rate': 3.3140998320695606e-05, 'epoch': 0.29}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 09:26:17,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.61 | bwd_microstep: 1334.83 | bwd_inner_microstep: 1334.64 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3857
[2024-06-10 09:26:19,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.94 | bwd_microstep: 1492.80 | bwd_inner_microstep: 1492.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3903
[2024-06-10 09:26:21,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.82 | bwd_microstep: 1548.92 | bwd_inner_microstep: 1548.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 09:26:23,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.79 | bwd_microstep: 1395.43 | bwd_inner_microstep: 1395.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 09:26:25,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.27 | bwd_microstep: 1342.28 | bwd_inner_microstep: 1342.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935
[2024-06-10 09:26:26,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.12 | bwd_microstep: 726.60 | bwd_inner_microstep: 726.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 09:26:27,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.49 | bwd_microstep: 791.17 | bwd_inner_microstep: 791.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 859
[2024-06-10 09:26:28,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 137.13 | bwd_microstep: 350.88 | bwd_inner_microstep: 350.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3534
[2024-06-10 09:26:29,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.81 | bwd_microstep: 1346.57 | bwd_inner_microstep: 1346.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3704
[2024-06-10 09:26:32,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.03 | bwd_microstep: 1627.31 | bwd_inner_microstep: 1627.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-10 09:26:34,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1533.13 | bwd_inner_microstep: 1533.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 09:26:36,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.41 | bwd_microstep: 1386.49 | bwd_inner_microstep: 1386.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-10 09:26:38,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.98 | bwd_microstep: 1581.23 | bwd_inner_microstep: 1581.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3537
[2024-06-10 09:26:40,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.94 | bwd_microstep: 1624.09 | bwd_inner_microstep: 1624.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694
[2024-06-10 09:26:42,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.90 | bwd_microstep: 1526.66 | bwd_inner_microstep: 1526.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476
[2024-06-10 09:26:44,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.56 | bwd_microstep: 1349.68 | bwd_inner_microstep: 1349.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497
[2024-06-10 09:26:46,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1419.86 | bwd_inner_microstep: 1419.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 09:26:48,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1292.67 | bwd_inner_microstep: 1292.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 09:26:50,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.53 | bwd_microstep: 1378.81 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 09:26:52,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1289.24 | bwd_inner_microstep: 1289.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3782
[2024-06-10 09:26:54,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.65 | bwd_microstep: 1587.53 | bwd_inner_microstep: 1587.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 09:26:56,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.83 | bwd_microstep: 1458.86 | bwd_inner_microstep: 1458.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 09:26:58,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.97 | bwd_microstep: 1309.32 | bwd_inner_microstep: 1309.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3736
[2024-06-10 09:27:00,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.65 | bwd_microstep: 1471.78 | bwd_inner_microstep: 1471.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3539
[2024-06-10 09:27:02,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.94 | bwd_microstep: 1522.82 | bwd_inner_microstep: 1522.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 09:27:04,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.30 | bwd_microstep: 1499.43 | bwd_inner_microstep: 1499.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2064
[2024-06-10 09:27:05,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.12 | bwd_microstep: 862.75 | bwd_inner_microstep: 862.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3468
[2024-06-10 09:27:07,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.65 | bwd_microstep: 1242.35 | bwd_inner_microstep: 1242.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 09:27:09,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.51 | bwd_microstep: 1652.80 | bwd_inner_microstep: 1652.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3593
[2024-06-10 09:27:11,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.60 | bwd_microstep: 1354.92 | bwd_inner_microstep: 1354.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2091
[2024-06-10 09:27:12,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.16 | bwd_microstep: 918.26 | bwd_inner_microstep: 918.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3804
[2024-06-10 09:27:15,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-10 09:27:15,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.43 | bwd_microstep: 2607.31 | bwd_inner_microstep: 1918.11 | bwd_allreduce_microstep: 689.15 | step_microstep: 38.04
[2024-06-10 09:27:15,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16088.84 | bwd: 43826.82 | bwd_inner: 43136.62 | bwd_allreduce: 689.44 | step: 39.86
 [8:43:44<20:47:18, 61.14s/it]
 29%|██▉       | 503/1726 [8:44:45<20:49:03, 61.28s/it]
 29%|██▉       | 504/1726 [8:45:49<21:00:43, 61.90s/it]
 29%|██▉       | 505/1726 [8:46:51<21:01:40, 62.00s/it]
 29%|██▉       | 506/1726 [8:47:52<20:52:20, 61.59s/it]
 29%|██▉       | 507/1726 [8:48:52<20:43:39, 61.21s/it]
 29%|██▉       | 508/1726 [8:49:52<20:37:00, 60.94s/it]
{'loss': 1.3095, 'learning_rate': 3.311268032838169e-05, 'epoch': 0.29}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3541
[2024-06-10 09:27:18,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.79 | bwd_microstep: 1593.89 | bwd_inner_microstep: 1593.75 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 09:27:20,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1375.89 | bwd_inner_microstep: 1375.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 09:27:21,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.11 | bwd_microstep: 794.73 | bwd_inner_microstep: 794.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 09:27:23,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.22 | bwd_microstep: 1450.47 | bwd_inner_microstep: 1450.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2945
[2024-06-10 09:27:24,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.29 | bwd_microstep: 1098.25 | bwd_inner_microstep: 1098.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 09:27:26,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.04 | bwd_microstep: 1185.88 | bwd_inner_microstep: 1185.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 09:27:28,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.73 | bwd_microstep: 1538.81 | bwd_inner_microstep: 1538.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 09:27:30,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.46 | bwd_microstep: 1281.62 | bwd_inner_microstep: 1281.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949
[2024-06-10 09:27:31,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.49 | bwd_microstep: 700.65 | bwd_inner_microstep: 700.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1931
[2024-06-10 09:27:32,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.03 | bwd_microstep: 821.86 | bwd_inner_microstep: 821.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1876
[2024-06-10 09:27:33,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.79 | bwd_microstep: 715.75 | bwd_inner_microstep: 715.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 09:27:35,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1250.87 | bwd_inner_microstep: 1250.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629
[2024-06-10 09:27:37,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.48 | bwd_microstep: 1707.65 | bwd_inner_microstep: 1707.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637
[2024-06-10 09:27:39,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.37 | bwd_microstep: 1418.20 | bwd_inner_microstep: 1418.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 09:27:41,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1489.66 | bwd_inner_microstep: 1489.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 09:27:43,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1257.91 | bwd_inner_microstep: 1257.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3504
[2024-06-10 09:27:45,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.40 | bwd_microstep: 1555.78 | bwd_inner_microstep: 1555.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-10 09:27:47,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.16 | bwd_microstep: 1361.02 | bwd_inner_microstep: 1360.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295
[2024-06-10 09:27:48,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 877.03 | bwd_inner_microstep: 877.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3631
[2024-06-10 09:27:50,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1314.61 | bwd_inner_microstep: 1314.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3430
[2024-06-10 09:27:51,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.36 | bwd_microstep: 1158.58 | bwd_inner_microstep: 1158.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-10 09:27:53,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.18 | bwd_microstep: 1313.05 | bwd_inner_microstep: 1313.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 09:27:55,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1393.86 | bwd_inner_microstep: 1393.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 09:27:57,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.68 | bwd_microstep: 1373.37 | bwd_inner_microstep: 1373.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2074
[2024-06-10 09:27:58,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.01 | bwd_microstep: 728.69 | bwd_inner_microstep: 728.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 09:28:00,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.91 | bwd_microstep: 1400.51 | bwd_inner_microstep: 1400.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3496
[2024-06-10 09:28:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.31 | bwd_microstep: 1413.47 | bwd_inner_microstep: 1413.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 09:28:04,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.03 | bwd_microstep: 1412.39 | bwd_inner_microstep: 1412.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.76
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2956
[2024-06-10 09:28:06,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.04 | bwd_microstep: 1196.65 | bwd_inner_microstep: 1196.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 09:28:08,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.74 | bwd_microstep: 1615.66 | bwd_inner_microstep: 1615.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-10 09:28:10,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1341.34 | bwd_inner_microstep: 1341.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2015
[2024-06-10 09:28:17,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.36 | optimizer_step: 6.60
[2024-06-10 09:28:17,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.84 | bwd_microstep: 7066.87 | bwd_inner_microstep: 1020.76 | bwd_allreduce_microstep: 6046.04 | step_microstep: 38.86
[2024-06-10 09:28:17,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15053.09 | bwd: 46204.98 | bwd_inner: 40157.89 | bwd_allreduce: 6046.35 | step: 42.68
{'loss': 1.3673, 'learning_rate': 3.3084316153275824e-05, 'epoch': 0.29}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 09:28:18,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.57 | bwd_microstep: 674.47 | bwd_inner_microstep: 674.31 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 09:28:20,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.32 | bwd_microstep: 1242.27 | bwd_inner_microstep: 1242.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3946
[2024-06-10 09:28:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.97 | bwd_microstep: 1689.86 | bwd_inner_microstep: 1689.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 09:28:24,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1485.34 | bwd_inner_microstep: 1485.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 09:28:26,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1654.66 | bwd_inner_microstep: 1654.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3758
[2024-06-10 09:28:28,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.10 | bwd_microstep: 1472.33 | bwd_inner_microstep: 1472.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738
[2024-06-10 09:28:30,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1433.48 | bwd_inner_microstep: 1433.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 09:28:33,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.50 | bwd_microstep: 1634.28 | bwd_inner_microstep: 1634.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1962
[2024-06-10 09:28:34,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.41 | bwd_microstep: 768.25 | bwd_inner_microstep: 768.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2489
[2024-06-10 09:28:35,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.15 | bwd_microstep: 1149.32 | bwd_inner_microstep: 1149.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2669
[2024-06-10 09:28:37,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.87 | bwd_microstep: 1027.67 | bwd_inner_microstep: 1027.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3529
[2024-06-10 09:28:39,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.58 | bwd_microstep: 1360.64 | bwd_inner_microstep: 1360.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3656
[2024-06-10 09:28:41,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.96 | bwd_microstep: 1720.69 | bwd_inner_microstep: 1720.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3704
[2024-06-10 09:28:43,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.98 | bwd_microstep: 1725.24 | bwd_inner_microstep: 1725.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 09:28:45,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.83 | bwd_microstep: 1291.35 | bwd_inner_microstep: 1291.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2512
[2024-06-10 09:28:46,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.27 | bwd_microstep: 962.55 | bwd_inner_microstep: 962.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2036
[2024-06-10 09:28:48,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.90 | bwd_microstep: 812.93 | bwd_inner_microstep: 812.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 09:28:50,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.85 | bwd_microstep: 1522.89 | bwd_inner_microstep: 1522.68 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 09:28:52,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.67 | bwd_microstep: 1496.20 | bwd_inner_microstep: 1496.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 09:28:54,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.93 | bwd_microstep: 1283.05 | bwd_inner_microstep: 1283.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 09:28:55,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1400.88 | bwd_inner_microstep: 1400.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903
[2024-06-10 09:28:56,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.64 | bwd_microstep: 690.60 | bwd_inner_microstep: 690.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-10 09:28:57,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.21 | bwd_microstep: 699.21 | bwd_inner_microstep: 699.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1953
[2024-06-10 09:28:58,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.34 | bwd_microstep: 704.33 | bwd_inner_microstep: 704.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3588
[2024-06-10 09:29:01,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.58 | bwd_microstep: 1642.60 | bwd_inner_microstep: 1642.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3830
[2024-06-10 09:29:03,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.28 | bwd_microstep: 1605.29 | bwd_inner_microstep: 1605.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2256
[2024-06-10 09:29:04,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.69 | bwd_microstep: 874.84 | bwd_inner_microstep: 874.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 09:29:05,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.49 | bwd_microstep: 972.27 | bwd_inner_microstep: 972.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 09:29:07,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1413.28 | bwd_inner_microstep: 1413.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3563
[2024-06-10 09:29:09,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.32 | bwd_microstep: 1237.83 | bwd_inner_microstep: 1237.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808
[2024-06-10 09:29:11,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.21 | bwd_microstep: 1576.92 | bwd_inner_microstep: 1576.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 09:29:19,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.32 | optimizer_step: 6.56
[2024-06-10 09:29:19,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.38 | bwd_microstep: 6797.63 | bwd_inner_microstep: 1690.34 | bwd_allreduce_microstep: 5107.23 | step_microstep: 38.74
[2024-06-10 09:29:19,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15236.95 | bwd: 46023.21 | bwd_inner: 40914.76 | bwd_allreduce: 5107.60 | step: 40.48
{'loss': 1.3137, 'learning_rate': 3.305590589527648e-05, 'epoch': 0.3}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 09:29:21,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1393.50 | bwd_inner_microstep: 1393.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3936
[2024-06-10 09:29:23,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.59 | bwd_microstep: 1591.83 | bwd_inner_microstep: 1591.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 09:29:25,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.50 | bwd_microstep: 1249.29 | bwd_inner_microstep: 1249.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850
[2024-06-10 09:29:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.70 | bwd_microstep: 1465.33 | bwd_inner_microstep: 1465.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 09:29:28,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1384.81 | bwd_inner_microstep: 1384.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 09:29:30,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1382.64 | bwd_inner_microstep: 1382.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 09:29:33,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.33 | bwd_microstep: 1529.15 | bwd_inner_microstep: 1529.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 09:29:34,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.55 | bwd_microstep: 1286.64 | bwd_inner_microstep: 1286.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706
[2024-06-10 09:29:37,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.08 | bwd_microstep: 1626.17 | bwd_inner_microstep: 1626.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075
[2024-06-10 09:29:38,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.25 | bwd_microstep: 821.93 | bwd_inner_microstep: 821.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3636
[2024-06-10 09:29:39,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.09 | bwd_microstep: 1317.54 | bwd_inner_microstep: 1317.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 09:29:41,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1394.23 | bwd_inner_microstep: 1393.79 | bwd_allreduce_microstep: 0.23 | step_microstep: 0.66
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3588
[2024-06-10 09:29:43,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.42 | bwd_microstep: 1270.32 | bwd_inner_microstep: 1270.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513
[2024-06-10 09:29:45,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1317.72 | bwd_inner_microstep: 1317.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 09:29:47,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.67 | bwd_microstep: 1518.88 | bwd_inner_microstep: 1518.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 09:29:49,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.54 | bwd_microstep: 1477.69 | bwd_inner_microstep: 1477.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 09:29:51,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.34 | bwd_microstep: 1391.12 | bwd_inner_microstep: 1390.79 | bwd_allreduce_microstep: 0.20 | step_microstep: 0.34
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 09:29:53,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1349.87 | bwd_inner_microstep: 1349.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-10 09:29:55,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.40 | bwd_microstep: 1335.94 | bwd_inner_microstep: 1335.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 09:29:57,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.43 | bwd_microstep: 1391.18 | bwd_inner_microstep: 1391.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 09:29:59,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.98 | bwd_microstep: 1308.25 | bwd_inner_microstep: 1308.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 09:30:01,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.96 | bwd_microstep: 1604.01 | bwd_inner_microstep: 1603.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 09:30:03,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1398.56 | bwd_inner_microstep: 1398.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3783
[2024-06-10 09:30:05,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.39 | bwd_microstep: 1356.02 | bwd_inner_microstep: 1355.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2182
[2024-06-10 09:30:06,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.01 | bwd_microstep: 953.88 | bwd_inner_microstep: 953.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3600
[2024-06-10 09:30:08,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.58 | bwd_microstep: 1582.89 | bwd_inner_microstep: 1582.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1955
[2024-06-10 09:30:09,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.78 | bwd_microstep: 764.69 | bwd_inner_microstep: 764.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 09:30:11,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.94 | bwd_microstep: 1460.46 | bwd_inner_microstep: 1460.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3589
[2024-06-10 09:30:13,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1241.78 | bwd_inner_microstep: 1241.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 09:30:15,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.98 | bwd_microstep: 1509.17 | bwd_inner_microstep: 1509.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2542
[2024-06-10 09:30:16,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.93 | bwd_microstep: 876.62 | bwd_inner_microstep: 876.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426
[2024-06-10 09:30:20,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.33 | optimizer_step: 6.59
[2024-06-10 09:30:20,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.57 | bwd_microstep: 2979.63 | bwd_inner_microstep: 1761.18 | bwd_allreduce_microstep: 1218.40 | step_microstep: 40.68
[2024-06-10 09:30:20,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16161.62 | bwd: 44531.78 | bwd_inner: 43311.82 | bwd_allreduce: 1219.08 | step: 43.53
{'loss': 1.258, 'learning_rate': 3.302744965444445e-05, 'epoch': 0.3}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 09:30:22,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.55 | bwd_microstep: 1333.54 | bwd_inner_microstep: 1333.32 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4006
[2024-06-10 09:30:24,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.28 | bwd_microstep: 1610.92 | bwd_inner_microstep: 1610.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2903
[2024-06-10 09:30:25,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.71 | bwd_microstep: 1032.21 | bwd_inner_microstep: 1032.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 09:30:27,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1397.18 | bwd_inner_microstep: 1397.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 09:30:29,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1277.36 | bwd_inner_microstep: 1277.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-10 09:30:31,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.46 | bwd_microstep: 1530.64 | bwd_inner_microstep: 1530.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476
[2024-06-10 09:30:33,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.81 | bwd_microstep: 1348.27 | bwd_inner_microstep: 1348.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 09:30:35,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1380.10 | bwd_inner_microstep: 1380.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 09:30:37,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1288.98 | bwd_inner_microstep: 1288.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1899
[2024-06-10 09:30:38,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.99 | bwd_microstep: 731.55 | bwd_inner_microstep: 731.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2308
[2024-06-10 09:30:39,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.08 | bwd_microstep: 918.55 | bwd_inner_microstep: 918.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3453
[2024-06-10 09:30:41,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.70 | bwd_microstep: 1303.36 | bwd_inner_microstep: 1303.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 09:30:43,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.28 | bwd_microstep: 1379.24 | bwd_inner_microstep: 1379.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 09:30:45,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 1511.45 | bwd_inner_microstep: 1511.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 09:30:47,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.35 | bwd_microstep: 1409.51 | bwd_inner_microstep: 1409.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 09:30:49,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.65 | bwd_microstep: 1382.24 | bwd_inner_microstep: 1382.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 09:30:50,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.87 | bwd_microstep: 1302.15 | bwd_inner_microstep: 1302.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 09:30:53,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.09 | bwd_microstep: 1660.24 | bwd_inner_microstep: 1660.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 09:30:55,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.54 | bwd_microstep: 1512.75 | bwd_inner_microstep: 1512.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 09:30:56,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.40 | bwd_microstep: 976.24 | bwd_inner_microstep: 976.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 09:30:58,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.64 | bwd_microstep: 1406.41 | bwd_inner_microstep: 1406.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 09:31:00,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1383.79 | bwd_inner_microstep: 1383.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 09:31:01,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.07 | bwd_microstep: 877.47 | bwd_inner_microstep: 877.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999
[2024-06-10 09:31:02,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.50 | bwd_microstep: 740.77 | bwd_inner_microstep: 740.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 09:31:04,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1352.04 | bwd_inner_microstep: 1352.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-10 09:31:06,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.05 | bwd_microstep: 1549.51 | bwd_inner_microstep: 1549.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2276
[2024-06-10 09:31:07,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.23 | bwd_microstep: 937.48 | bwd_inner_microstep: 937.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 09:31:09,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1249.57 | bwd_inner_microstep: 1249.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 09:31:11,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1413.24 | bwd_inner_microstep: 1413.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 09:31:13,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.94 | bwd_microstep: 1597.72 | bwd_inner_microstep: 1597.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 09:31:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.37 | bwd_microstep: 1450.15 | bwd_inner_microstep: 1450.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 09:31:20,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 09:31:20,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 4049.54 | bwd_inner_microstep: 1685.39 | bwd_allreduce_microstep: 2364.10 | step_microstep: 37.92
[2024-06-10 09:31:20,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15638.20 | bwd: 44294.21 | bwd_inner: 41929.04 | bwd_allreduce: 2364.42 | step: 39.75
{'loss': 1.3153, 'learning_rate': 3.2998947531002456e-05, 'epoch': 0.3}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 09:31:22,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.88 | bwd_microstep: 1488.11 | bwd_inner_microstep: 1488.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3401
[2024-06-10 09:31:24,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.70 | bwd_microstep: 1325.50 | bwd_inner_microstep: 1325.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 09:31:26,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.82 | bwd_microstep: 1547.92 | bwd_inner_microstep: 1547.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 09:31:28,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.97 | bwd_microstep: 1246.18 | bwd_inner_microstep: 1246.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2242
[2024-06-10 09:31:29,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.68 | bwd_microstep: 898.69 | bwd_inner_microstep: 898.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-10 09:31:31,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1438.34 | bwd_inner_microstep: 1438.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941
[2024-06-10 09:31:32,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.87 | bwd_microstep: 727.82 | bwd_inner_microstep: 727.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4742
[2024-06-10 09:31:35,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 718.83 | bwd_microstep: 1916.09 | bwd_inner_microstep: 1916.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3542
[2024-06-10 09:31:37,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.79 | bwd_microstep: 1473.76 | bwd_inner_microstep: 1473.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 09:31:39,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1344.28 | bwd_inner_microstep: 1344.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-10 09:31:41,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1449.35 | bwd_inner_microstep: 1449.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052
[2024-06-10 09:31:42,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.73 | bwd_microstep: 911.09 | bwd_inner_microstep: 911.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2828
[2024-06-10 09:31:43,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.15 | bwd_microstep: 1156.89 | bwd_inner_microstep: 1156.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3526
[2024-06-10 09:31:45,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.44 | bwd_microstep: 1341.72 | bwd_inner_microstep: 1341.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 09:31:47,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1344.65 | bwd_inner_microstep: 1344.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651
[2024-06-10 09:31:49,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.23 | bwd_microstep: 1415.91 | bwd_inner_microstep: 1415.83 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2038
[2024-06-10 09:31:50,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.61 | bwd_microstep: 905.89 | bwd_inner_microstep: 905.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-10 09:31:52,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.40 | bwd_microstep: 1187.16 | bwd_inner_microstep: 1187.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2308
[2024-06-10 09:31:53,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.32 | bwd_microstep: 854.44 | bwd_inner_microstep: 854.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 09:31:54,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.44 | bwd_microstep: 696.50 | bwd_inner_microstep: 696.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-10 09:31:56,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.04 | bwd_microstep: 1321.39 | bwd_inner_microstep: 1321.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3627
[2024-06-10 09:31:58,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.57 | bwd_microstep: 1660.68 | bwd_inner_microstep: 1660.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637
[2024-06-10 09:32:00,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 1416.91 | bwd_inner_microstep: 1416.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 09:32:02,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.87 | bwd_microstep: 1490.64 | bwd_inner_microstep: 1490.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109
[2024-06-10 09:32:04,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.57 | bwd_microstep: 923.64 | bwd_inner_microstep: 923.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 09:32:06,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1414.87 | bwd_inner_microstep: 1414.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 09:32:07,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1354.90 | bwd_inner_microstep: 1354.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 09:32:09,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.27 | bwd_microstep: 1417.46 | bwd_inner_microstep: 1417.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 09:32:11,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1510.29 | bwd_inner_microstep: 1510.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-10 09:32:13,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1413.55 | bwd_inner_microstep: 1413.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-10 09:32:15,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.58 | bwd_microstep: 1312.56 | bwd_inner_microstep: 1312.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-10 09:32:19,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 09:32:19,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.20 | bwd_microstep: 3252.56 | bwd_inner_microstep: 1473.68 | bwd_allreduce_microstep: 1778.83 | step_microstep: 38.22
[2024-06-10 09:32:19,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15468.93 | bwd: 43159.79 | bwd_inner: 41379.95 | bwd_allreduce: 1779.11 | step: 40.06
{'loss': 1.2664, 'learning_rate': 3.2970399625334836e-05, 'epoch': 0.3}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 09:32:21,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.79 | bwd_microstep: 1473.87 | bwd_inner_microstep: 1473.76 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3894
[2024-06-10 09:32:23,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1487.22 | bwd_inner_microstep: 1487.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 09:32:25,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1391.76 | bwd_inner_microstep: 1391.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2316
[2024-06-10 09:32:26,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.08 | bwd_microstep: 948.24 | bwd_inner_microstep: 948.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3755
[2024-06-10 09:32:29,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.05 | bwd_microstep: 1744.04 | bwd_inner_microstep: 1744.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 09:32:31,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1342.18 | bwd_inner_microstep: 1342.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 09:32:32,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.85 | bwd_microstep: 1245.12 | bwd_inner_microstep: 1245.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 09:32:34,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1248.50 | bwd_inner_microstep: 1248.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2462
[2024-06-10 09:32:35,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.57 | bwd_microstep: 953.22 | bwd_inner_microstep: 953.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3556
[2024-06-10 09:32:37,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.40 | bwd_microstep: 1362.31 | bwd_inner_microstep: 1362.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1879
[2024-06-10 09:32:38,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.58 | bwd_microstep: 776.54 | bwd_inner_microstep: 776.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3432
[2024-06-10 09:32:40,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.34 | bwd_microstep: 1285.49 | bwd_inner_microstep: 1285.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661
[2024-06-10 09:32:42,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.72 | bwd_microstep: 1613.86 | bwd_inner_microstep: 1613.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 09:32:44,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1493.61 | bwd_inner_microstep: 1493.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3838
[2024-06-10 09:32:46,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.08 | bwd_microstep: 1361.28 | bwd_inner_microstep: 1361.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 09:32:48,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1411.95 | bwd_inner_microstep: 1411.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 09:32:50,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.58 | bwd_microstep: 1491.57 | bwd_inner_microstep: 1491.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 09:32:52,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.09 | bwd_microstep: 1286.56 | bwd_inner_microstep: 1286.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1972
[2024-06-10 09:32:53,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.98 | bwd_microstep: 734.31 | bwd_inner_microstep: 734.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 09:32:55,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.53 | bwd_microstep: 1287.53 | bwd_inner_microstep: 1287.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-10 09:32:57,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.35 | bwd_microstep: 1512.09 | bwd_inner_microstep: 1512.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 09:32:59,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1464.62 | bwd_inner_microstep: 1464.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 09:33:01,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.86 | bwd_microstep: 1412.16 | bwd_inner_microstep: 1412.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3418
[2024-06-10 09:33:03,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.25 | bwd_microstep: 1378.00 | bwd_inner_microstep: 1377.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3441
[2024-06-10 09:33:05,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.09 | bwd_microstep: 1413.89 | bwd_inner_microstep: 1413.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3641
[2024-06-10 09:33:07,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.97 | bwd_microstep: 1712.46 | bwd_inner_microstep: 1712.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729
[2024-06-10 09:33:09,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.43 | bwd_microstep: 1626.80 | bwd_inner_microstep: 1626.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242
[2024-06-10 09:33:11,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.54 | bwd_microstep: 967.06 | bwd_inner_microstep: 967.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 09:33:13,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.41 | bwd_microstep: 1502.68 | bwd_inner_microstep: 1502.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-10 09:33:15,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.15 | bwd_microstep: 1316.58 | bwd_inner_microstep: 1316.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3399
[2024-06-10 09:33:17,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.01 | bwd_microstep: 1441.32 | bwd_inner_microstep: 1441.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239
[2024-06-10 09:33:19,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 09:33:19,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.08 | bwd_microstep: 2183.60 | bwd_inner_microstep: 1091.16 | bwd_allreduce_microstep: 1092.39 | step_microstep: 37.70
[2024-06-10 09:33:19,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15960.57 | bwd: 43870.46 | bwd_inner: 42777.07 | bwd_allreduce: 1092.67 | step: 39.37


 29%|██▉       | 508/1726 [8:49:52<20:37:00, 60.94s/it]
 29%|██▉       | 509/1726 [8:50:54<20:40:08, 61.14s/it]
 30%|██▉       | 510/1726 [8:51:55<20:42:01, 61.28s/it]
 30%|██▉       | 511/1726 [8:52:56<20:39:41, 61.22s/it]
 30%|██▉       | 512/1726 [8:53:57<20:32:56, 60.94s/it]
 30%|██▉       | 513/1726 [8:54:56<20:20:05, 60.35s/it]
 30%|██▉       | 514
{'loss': 1.3432, 'learning_rate': 3.294180603798716e-05, 'epoch': 0.3}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 09:33:21,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.65 | bwd_microstep: 1473.85 | bwd_inner_microstep: 1473.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 09:33:23,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.76 | bwd_microstep: 1250.88 | bwd_inner_microstep: 1250.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1946
[2024-06-10 09:33:24,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.12 | bwd_microstep: 763.95 | bwd_inner_microstep: 763.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 09:33:26,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.65 | bwd_microstep: 1655.72 | bwd_inner_microstep: 1655.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 09:33:27,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.20 | bwd_microstep: 791.24 | bwd_inner_microstep: 791.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-10 09:33:30,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.76 | bwd_microstep: 1559.06 | bwd_inner_microstep: 1559.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 09:33:31,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.66 | bwd_microstep: 1357.17 | bwd_inner_microstep: 1357.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 09:33:33,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.35 | bwd_microstep: 1346.42 | bwd_inner_microstep: 1346.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 09:33:35,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.18 | bwd_microstep: 1389.30 | bwd_inner_microstep: 1389.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-10 09:33:37,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.46 | bwd_microstep: 1276.44 | bwd_inner_microstep: 1276.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 09:33:39,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.01 | bwd_microstep: 1312.51 | bwd_inner_microstep: 1312.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-10 09:33:41,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1281.91 | bwd_inner_microstep: 1281.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 09:33:42,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.83 | bwd_microstep: 790.76 | bwd_inner_microstep: 790.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3667
[2024-06-10 09:33:44,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.56 | bwd_microstep: 1584.26 | bwd_inner_microstep: 1584.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504
[2024-06-10 09:33:46,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.58 | bwd_microstep: 1445.38 | bwd_inner_microstep: 1445.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 09:33:48,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.72 | bwd_microstep: 1481.74 | bwd_inner_microstep: 1481.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524
[2024-06-10 09:33:50,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.51 | bwd_microstep: 1595.95 | bwd_inner_microstep: 1595.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-10 09:33:52,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.05 | bwd_microstep: 1438.79 | bwd_inner_microstep: 1438.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-10 09:33:54,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.28 | bwd_microstep: 1158.64 | bwd_inner_microstep: 1158.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 09:33:56,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1397.98 | bwd_inner_microstep: 1397.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 09:33:57,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.57 | bwd_microstep: 1295.55 | bwd_inner_microstep: 1295.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 09:33:59,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 09:34:00,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.47 | bwd_microstep: 801.73 | bwd_inner_microstep: 801.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 09:34:03,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.92 | bwd_microstep: 1459.90 | bwd_inner_microstep: 1459.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 09:34:05,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1455.31 | bwd_inner_microstep: 1455.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3540
[2024-06-10 09:34:06,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.45 | bwd_microstep: 1359.43 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 09:34:08,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.47 | bwd_microstep: 1299.44 | bwd_inner_microstep: 1299.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2255
[2024-06-10 09:34:09,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.67 | bwd_microstep: 808.47 | bwd_inner_microstep: 808.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3818
[2024-06-10 09:34:12,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.83 | bwd_microstep: 1717.85 | bwd_inner_microstep: 1717.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3429
[2024-06-10 09:34:14,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.88 | bwd_microstep: 1544.59 | bwd_inner_microstep: 1544.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3620
[2024-06-10 09:34:16,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.49 | bwd_microstep: 1711.47 | bwd_inner_microstep: 1711.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459
[2024-06-10 09:34:21,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 09:34:21,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.12 | bwd_microstep: 4123.63 | bwd_inner_microstep: 1782.30 | bwd_allreduce_microstep: 2341.27 | step_microstep: 38.23
[2024-06-10 09:34:21,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16012.79 | bwd: 45328.72 | bwd_inner: 42986.53 | bwd_allreduce: 2341.51 | step: 39.89
{'loss': 1.268, 'learning_rate': 3.291316686966589e-05, 'epoch': 0.3}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 09:34:23,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.55 | bwd_microstep: 1434.15 | bwd_inner_microstep: 1433.99 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2389
[2024-06-10 09:34:24,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 410.44 | bwd_microstep: 1097.48 | bwd_inner_microstep: 1097.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904
[2024-06-10 09:34:27,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.79 | bwd_microstep: 1586.40 | bwd_inner_microstep: 1586.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 09:34:28,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1376.37 | bwd_inner_microstep: 1376.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3847
[2024-06-10 09:34:31,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.75 | bwd_microstep: 1457.89 | bwd_inner_microstep: 1457.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 09:34:32,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.74 | bwd_microstep: 973.58 | bwd_inner_microstep: 973.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733
[2024-06-10 09:34:34,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.02 | bwd_microstep: 1632.73 | bwd_inner_microstep: 1632.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3750
[2024-06-10 09:34:36,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.80 | bwd_microstep: 1604.16 | bwd_inner_microstep: 1604.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3710
[2024-06-10 09:34:38,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.44 | bwd_microstep: 1460.26 | bwd_inner_microstep: 1460.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2186
[2024-06-10 09:34:40,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.64 | bwd_microstep: 983.15 | bwd_inner_microstep: 983.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1965
[2024-06-10 09:34:41,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.30 | bwd_microstep: 857.15 | bwd_inner_microstep: 857.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3706
[2024-06-10 09:34:43,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.09 | bwd_microstep: 1724.29 | bwd_inner_microstep: 1724.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032
[2024-06-10 09:34:44,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.55 | bwd_microstep: 905.95 | bwd_inner_microstep: 905.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 09:34:46,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.30 | bwd_microstep: 1274.18 | bwd_inner_microstep: 1274.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 09:34:48,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1387.98 | bwd_inner_microstep: 1387.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1972
[2024-06-10 09:34:49,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.62 | bwd_microstep: 735.38 | bwd_inner_microstep: 735.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1845
[2024-06-10 09:34:50,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.86 | bwd_microstep: 700.69 | bwd_inner_microstep: 700.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3560
[2024-06-10 09:34:52,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.75 | bwd_microstep: 1557.42 | bwd_inner_microstep: 1557.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3514
[2024-06-10 09:34:55,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.91 | bwd_microstep: 1684.43 | bwd_inner_microstep: 1684.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3465
[2024-06-10 09:34:57,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.39 | bwd_microstep: 1428.75 | bwd_inner_microstep: 1428.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 09:34:59,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.25 | bwd_microstep: 1538.13 | bwd_inner_microstep: 1538.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142
[2024-06-10 09:35:00,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.47 | bwd_microstep: 834.86 | bwd_inner_microstep: 834.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3831
[2024-06-10 09:35:02,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.80 | bwd_microstep: 1484.91 | bwd_inner_microstep: 1484.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-10 09:35:03,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.20 | bwd_microstep: 812.48 | bwd_inner_microstep: 812.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 09:35:05,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 1554.84 | bwd_inner_microstep: 1554.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 09:35:07,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1410.42 | bwd_inner_microstep: 1410.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 09:35:09,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1558.29 | bwd_inner_microstep: 1558.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 09:35:11,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.47 | bwd_microstep: 1314.70 | bwd_inner_microstep: 1314.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3611
[2024-06-10 09:35:13,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.59 | bwd_microstep: 1708.53 | bwd_inner_microstep: 1708.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801
[2024-06-10 09:35:16,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.67 | bwd_microstep: 1460.44 | bwd_inner_microstep: 1460.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3767
[2024-06-10 09:35:18,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.95 | bwd_microstep: 1503.26 | bwd_inner_microstep: 1503.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 09:35:21,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 09:35:21,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.19 | bwd_microstep: 3168.12 | bwd_inner_microstep: 1666.44 | bwd_allreduce_microstep: 1501.64 | step_microstep: 37.94
[2024-06-10 09:35:21,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15815.57 | bwd: 44211.42 | bwd_inner: 42708.75 | bwd_allreduce: 1501.93 | step: 39.85
{'loss': 1.2592, 'learning_rate': 3.2884482221238044e-05, 'epoch': 0.3}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1984
[2024-06-10 09:35:22,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.27 | bwd_microstep: 849.70 | bwd_inner_microstep: 849.57 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1907
[2024-06-10 09:35:24,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.62 | bwd_microstep: 743.47 | bwd_inner_microstep: 743.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3484
[2024-06-10 09:35:25,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.24 | bwd_microstep: 1431.13 | bwd_inner_microstep: 1431.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-10 09:35:28,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.34 | bwd_microstep: 1639.05 | bwd_inner_microstep: 1639.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3745
[2024-06-10 09:35:30,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.58 | bwd_microstep: 1435.30 | bwd_inner_microstep: 1435.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 09:35:31,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.61 | bwd_microstep: 1280.69 | bwd_inner_microstep: 1280.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-10 09:35:33,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.86 | bwd_microstep: 1302.80 | bwd_inner_microstep: 1302.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2092
[2024-06-10 09:35:34,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.73 | bwd_microstep: 822.00 | bwd_inner_microstep: 821.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3504
[2024-06-10 09:35:36,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.78 | bwd_microstep: 1364.55 | bwd_inner_microstep: 1364.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 09:35:38,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1343.45 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3627
[2024-06-10 09:35:40,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.44 | bwd_microstep: 1458.21 | bwd_inner_microstep: 1458.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2654
[2024-06-10 09:35:42,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.45 | bwd_microstep: 1018.64 | bwd_inner_microstep: 1018.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381
[2024-06-10 09:35:43,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.32 | bwd_microstep: 1242.72 | bwd_inner_microstep: 1242.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3518
[2024-06-10 09:35:45,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.68 | bwd_microstep: 1550.95 | bwd_inner_microstep: 1550.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 09:35:48,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.10 | bwd_microstep: 1506.04 | bwd_inner_microstep: 1506.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3686
[2024-06-10 09:35:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.79 | bwd_microstep: 1753.70 | bwd_inner_microstep: 1753.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2008
[2024-06-10 09:35:51,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.32 | bwd_microstep: 772.38 | bwd_inner_microstep: 772.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 09:35:52,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.68 | bwd_microstep: 800.30 | bwd_inner_microstep: 800.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 09:35:54,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.72 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 09:35:55,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.03 | bwd_microstep: 699.22 | bwd_inner_microstep: 699.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 09:35:56,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.93 | bwd_microstep: 808.79 | bwd_inner_microstep: 808.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744
[2024-06-10 09:35:58,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.79 | bwd_microstep: 1636.04 | bwd_inner_microstep: 1636.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 09:36:00,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1411.65 | bwd_inner_microstep: 1411.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 09:36:02,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1509.79 | bwd_inner_microstep: 1509.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3673
[2024-06-10 09:36:05,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.69 | bwd_microstep: 1659.13 | bwd_inner_microstep: 1659.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2279
[2024-06-10 09:36:06,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.76 | bwd_microstep: 1021.76 | bwd_inner_microstep: 1021.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3820
[2024-06-10 09:36:08,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.90 | bwd_microstep: 1724.08 | bwd_inner_microstep: 1724.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-10 09:36:11,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.51 | bwd_microstep: 1545.32 | bwd_inner_microstep: 1545.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 09:36:12,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1247.65 | bwd_inner_microstep: 1247.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 09:36:14,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.84 | bwd_microstep: 1503.04 | bwd_inner_microstep: 1503.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 09:36:16,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.06 | bwd_microstep: 1344.95 | bwd_inner_microstep: 1344.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469
[2024-06-10 09:36:23,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.70 | optimizer_gradients: 4.37 | optimizer_step: 6.62
[2024-06-10 09:36:23,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.05 | bwd_microstep: 6507.31 | bwd_inner_microstep: 1622.47 | bwd_allreduce_microstep: 4884.77 | step_microstep: 39.93
[2024-06-10 09:36:23,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15402.76 | bwd: 46338.13 | bwd_inner: 41452.35 | bwd_allreduce: 4885.06 | step: 41.57
{'loss': 1.3005, 'learning_rate': 3.285575219373079e-05, 'epoch': 0.3}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 09:36:25,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.10 | bwd_microstep: 1336.20 | bwd_inner_microstep: 1336.13 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 09:36:27,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1383.52 | bwd_inner_microstep: 1383.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1953
[2024-06-10 09:36:28,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.19 | bwd_microstep: 851.55 | bwd_inner_microstep: 851.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 09:36:30,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.99 | bwd_microstep: 1344.11 | bwd_inner_microstep: 1344.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 09:36:32,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1494.59 | bwd_inner_microstep: 1494.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 09:36:34,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.54 | bwd_microstep: 1475.97 | bwd_inner_microstep: 1475.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 09:36:36,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.01 | bwd_microstep: 1285.29 | bwd_inner_microstep: 1285.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748
[2024-06-10 09:36:38,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.23 | bwd_microstep: 1634.48 | bwd_inner_microstep: 1634.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 09:36:40,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1249.98 | bwd_inner_microstep: 1249.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 09:36:42,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1249.10 | bwd_inner_microstep: 1248.92 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 09:36:44,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.72 | bwd_microstep: 1512.35 | bwd_inner_microstep: 1512.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529
[2024-06-10 09:36:46,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1322.25 | bwd_inner_microstep: 1322.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3668
[2024-06-10 09:36:48,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1551.19 | bwd_inner_microstep: 1551.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 09:36:50,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.64 | bwd_microstep: 1318.26 | bwd_inner_microstep: 1318.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3460
[2024-06-10 09:36:52,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.03 | bwd_microstep: 1346.20 | bwd_inner_microstep: 1346.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3388
[2024-06-10 09:36:53,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.68 | bwd_microstep: 1144.74 | bwd_inner_microstep: 1144.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 09:36:55,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1394.41 | bwd_inner_microstep: 1394.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526
[2024-06-10 09:36:57,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.06 | bwd_microstep: 1197.74 | bwd_inner_microstep: 1197.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3483
[2024-06-10 09:36:59,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.51 | bwd_microstep: 1408.07 | bwd_inner_microstep: 1408.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-10 09:37:01,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.58 | bwd_microstep: 1607.79 | bwd_inner_microstep: 1607.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3682
[2024-06-10 09:37:03,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1331.36 | bwd_inner_microstep: 1331.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-10 09:37:04,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.77 | bwd_microstep: 974.06 | bwd_inner_microstep: 974.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451
[2024-06-10 09:37:06,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.98 | bwd_microstep: 1196.42 | bwd_inner_microstep: 1196.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3826
[2024-06-10 09:37:08,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.45 | bwd_microstep: 1625.37 | bwd_inner_microstep: 1625.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2179
[2024-06-10 09:37:09,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.12 | bwd_microstep: 800.49 | bwd_inner_microstep: 800.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3425
[2024-06-10 09:37:11,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1283.04 | bwd_inner_microstep: 1283.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 09:37:13,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1451.17 | bwd_inner_microstep: 1451.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-10 09:37:15,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.15 | bwd_microstep: 1754.25 | bwd_inner_microstep: 1754.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919
[2024-06-10 09:37:16,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.56 | bwd_microstep: 690.32 | bwd_inner_microstep: 690.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 09:37:18,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.16 | bwd_microstep: 1414.27 | bwd_inner_microstep: 1414.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 09:37:20,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.98 | bwd_microstep: 1555.02 | bwd_inner_microstep: 1555.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 09:37:25,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.34 | optimizer_step: 6.58
[2024-06-10 09:37:25,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.47 | bwd_microstep: 4448.05 | bwd_inner_microstep: 1964.21 | bwd_allreduce_microstep: 2483.77 | step_microstep: 38.76
[2024-06-10 09:37:25,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16006.89 | bwd: 45631.66 | bwd_inner: 43146.77 | bwd_allreduce: 2484.10 | step: 40.49
{'loss': 1.2665, 'learning_rate': 3.282697688833114e-05, 'epoch': 0.3}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-10 09:37:27,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 1433.93 | bwd_inner_microstep: 1433.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 09:37:29,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.37 | bwd_microstep: 1388.04 | bwd_inner_microstep: 1388.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3412
[2024-06-10 09:37:31,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.62 | bwd_microstep: 1209.59 | bwd_inner_microstep: 1209.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1985
[2024-06-10 09:37:32,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.91 | bwd_microstep: 706.26 | bwd_inner_microstep: 706.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 09:37:34,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1352.06 | bwd_inner_microstep: 1352.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3276
[2024-06-10 09:37:35,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.59 | bwd_microstep: 1217.76 | bwd_inner_microstep: 1217.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878
[2024-06-10 09:37:36,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.27 | bwd_microstep: 678.72 | bwd_inner_microstep: 678.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1886
[2024-06-10 09:37:37,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.00 | bwd_microstep: 713.71 | bwd_inner_microstep: 713.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 09:37:40,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.97 | bwd_microstep: 1644.52 | bwd_inner_microstep: 1644.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567
[2024-06-10 09:37:42,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.03 | bwd_microstep: 1334.38 | bwd_inner_microstep: 1334.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 09:37:43,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1388.28 | bwd_inner_microstep: 1388.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 09:37:45,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.37 | bwd_microstep: 1384.76 | bwd_inner_microstep: 1384.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635
[2024-06-10 09:37:47,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1416.32 | bwd_inner_microstep: 1416.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 09:37:49,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.76 | bwd_microstep: 1549.47 | bwd_inner_microstep: 1549.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3451
[2024-06-10 09:37:52,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1479.47 | bwd_inner_microstep: 1479.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-10 09:37:54,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.19 | bwd_microstep: 1519.17 | bwd_inner_microstep: 1519.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3449
[2024-06-10 09:37:56,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.64 | bwd_microstep: 1551.97 | bwd_inner_microstep: 1551.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3543
[2024-06-10 09:37:58,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.21 | bwd_microstep: 1563.62 | bwd_inner_microstep: 1563.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 09:38:00,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1294.58 | bwd_inner_microstep: 1294.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 09:38:01,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.17 | bwd_microstep: 1293.76 | bwd_inner_microstep: 1293.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-10 09:38:03,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.99 | bwd_microstep: 1287.99 | bwd_inner_microstep: 1287.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 09:38:05,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1399.13 | bwd_inner_microstep: 1399.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 09:38:07,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.10 | bwd_microstep: 1488.90 | bwd_inner_microstep: 1488.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 09:38:09,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.72 | bwd_microstep: 1493.75 | bwd_inner_microstep: 1493.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 09:38:11,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 1408.56 | bwd_inner_microstep: 1408.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3609
[2024-06-10 09:38:13,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1537.65 | bwd_inner_microstep: 1537.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3440
[2024-06-10 09:38:15,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.53 | bwd_microstep: 1218.41 | bwd_inner_microstep: 1218.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 09:38:17,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.41 | bwd_microstep: 1314.26 | bwd_inner_microstep: 1314.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 09:38:19,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.61 | bwd_microstep: 1508.62 | bwd_inner_microstep: 1508.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816
[2024-06-10 09:38:21,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.04 | bwd_microstep: 1357.51 | bwd_inner_microstep: 1357.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3770
[2024-06-10 09:38:23,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1500.31 | bwd_inner_microstep: 1500.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3775
[2024-06-10 09:38:26,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.18 | optimizer_step: 6.58
[2024-06-10 09:38:26,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.98 | bwd_microstep: 2542.24 | bwd_inner_microstep: 1512.32 | bwd_allreduce_microstep: 1029.87 | step_microstep: 37.90
[2024-06-10 09:38:26,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16122.43 | bwd: 44177.72 | bwd_inner: 43146.94 | bwd_allreduce: 1030.11 | step: 39.44
{'loss': 1.2934, 'learning_rate': 3.279815640638557e-05, 'epoch': 0.3}
 30%|██▉       | 514/1726 [8:55:56<20:18:06, 60.30s/it]
 30%|██▉       | 515/1726 [8:56:58<20:25:36, 60.72s/it]
 30%|██▉       | 516/1726 [8:57:58<20:22:34, 60.62s/it]
 30%|██▉       | 517/1726 [8:59:00<20:30:26, 61.06s/it]
 30%|███       | 518/1726 [9:00:02<20:35:01, 61.34s/it]
 30%|███       | 519/1726 [9:01:03<20:29:47, 61.13s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 09:38:28,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.42 | bwd_microstep: 1476.39 | bwd_inner_microstep: 1476.27 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 09:38:30,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.84 | bwd_microstep: 1483.52 | bwd_inner_microstep: 1483.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491
[2024-06-10 09:38:32,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1415.73 | bwd_inner_microstep: 1415.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287
[2024-06-10 09:38:33,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.37 | bwd_microstep: 876.37 | bwd_inner_microstep: 876.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 09:38:35,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1309.81 | bwd_inner_microstep: 1309.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756
[2024-06-10 09:38:37,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1437.75 | bwd_inner_microstep: 1437.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-10 09:38:39,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.01 | bwd_microstep: 1386.77 | bwd_inner_microstep: 1386.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 09:38:41,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1534.77 | bwd_inner_microstep: 1534.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2493
[2024-06-10 09:38:42,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.52 | bwd_microstep: 926.52 | bwd_inner_microstep: 926.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 09:38:44,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.40 | bwd_microstep: 1288.72 | bwd_inner_microstep: 1288.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 09:38:45,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.29 | bwd_microstep: 796.53 | bwd_inner_microstep: 796.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3704
[2024-06-10 09:38:47,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.47 | bwd_microstep: 1555.17 | bwd_inner_microstep: 1555.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 09:38:49,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.25 | bwd_microstep: 1347.73 | bwd_inner_microstep: 1347.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2134
[2024-06-10 09:38:51,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.70 | bwd_microstep: 931.55 | bwd_inner_microstep: 931.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 09:38:52,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.43 | bwd_microstep: 1340.51 | bwd_inner_microstep: 1340.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 09:38:54,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.13 | bwd_microstep: 1444.10 | bwd_inner_microstep: 1444.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3526
[2024-06-10 09:38:57,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.83 | bwd_microstep: 1584.94 | bwd_inner_microstep: 1584.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2293
[2024-06-10 09:38:58,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.41 | bwd_microstep: 1009.23 | bwd_inner_microstep: 1009.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 09:38:59,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.16 | bwd_microstep: 789.95 | bwd_inner_microstep: 789.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3538
[2024-06-10 09:39:01,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.82 | bwd_microstep: 1690.19 | bwd_inner_microstep: 1690.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3616
[2024-06-10 09:39:04,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1537.76 | bwd_inner_microstep: 1537.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2176
[2024-06-10 09:39:05,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.60 | bwd_microstep: 955.07 | bwd_inner_microstep: 955.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 09:39:07,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.46 | bwd_microstep: 1281.30 | bwd_inner_microstep: 1281.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 09:39:09,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.12 | bwd_microstep: 1398.36 | bwd_inner_microstep: 1398.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3650
[2024-06-10 09:39:10,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.19 | bwd_microstep: 1327.01 | bwd_inner_microstep: 1326.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 09:39:13,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.77 | bwd_microstep: 1560.12 | bwd_inner_microstep: 1560.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-10 09:39:15,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.33 | bwd_microstep: 1498.59 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1894
[2024-06-10 09:39:16,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.52 | bwd_microstep: 750.37 | bwd_inner_microstep: 750.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2267
[2024-06-10 09:39:17,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.75 | bwd_microstep: 1067.78 | bwd_inner_microstep: 1067.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3777
[2024-06-10 09:39:19,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.69 | bwd_microstep: 1580.83 | bwd_inner_microstep: 1580.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 09:39:21,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.47 | bwd_microstep: 1397.02 | bwd_inner_microstep: 1396.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-10 09:39:27,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.35 | optimizer_step: 6.59
[2024-06-10 09:39:27,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.73 | bwd_microstep: 4767.64 | bwd_inner_microstep: 1645.62 | bwd_allreduce_microstep: 3121.96 | step_microstep: 41.79
[2024-06-10 09:39:27,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15509.29 | bwd: 44748.12 | bwd_inner: 41625.13 | bwd_allreduce: 3122.25 | step: 43.46
{'loss': 1.2667, 'learning_rate': 3.276929084939967e-05, 'epoch': 0.3}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 09:39:28,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 1344.10 | bwd_inner_microstep: 1343.92 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 09:39:31,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.87 | bwd_microstep: 1476.41 | bwd_inner_microstep: 1476.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856
[2024-06-10 09:39:33,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.48 | bwd_microstep: 1661.33 | bwd_inner_microstep: 1661.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2309
[2024-06-10 09:39:34,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.71 | bwd_microstep: 980.89 | bwd_inner_microstep: 980.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 09:39:36,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.37 | bwd_microstep: 1540.93 | bwd_inner_microstep: 1540.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 09:39:37,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.55 | bwd_microstep: 679.45 | bwd_inner_microstep: 679.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1948
[2024-06-10 09:39:38,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.06 | bwd_microstep: 852.96 | bwd_inner_microstep: 852.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 09:39:40,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1251.29 | bwd_inner_microstep: 1251.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 09:39:42,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.55 | bwd_microstep: 1298.19 | bwd_inner_microstep: 1298.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741
[2024-06-10 09:39:44,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.20 | bwd_microstep: 1634.01 | bwd_inner_microstep: 1633.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 09:39:46,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1258.33 | bwd_inner_microstep: 1258.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 09:39:48,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1256.43 | bwd_inner_microstep: 1256.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 09:39:50,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1526.52 | bwd_inner_microstep: 1526.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 09:39:52,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2482
[2024-06-10 09:39:53,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.54 | bwd_microstep: 1051.43 | bwd_inner_microstep: 1051.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3668
[2024-06-10 09:39:55,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.38 | bwd_microstep: 1448.86 | bwd_inner_microstep: 1448.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2291
[2024-06-10 09:39:57,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.05 | bwd_microstep: 1075.10 | bwd_inner_microstep: 1075.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 09:39:58,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1283.21 | bwd_inner_microstep: 1283.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3645
[2024-06-10 09:40:00,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1472.46 | bwd_inner_microstep: 1472.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499
[2024-06-10 09:40:03,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1576.34 | bwd_inner_microstep: 1576.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524
[2024-06-10 09:40:04,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.57 | bwd_microstep: 1325.16 | bwd_inner_microstep: 1325.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 09:40:07,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.10 | bwd_microstep: 1530.30 | bwd_inner_microstep: 1530.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 09:40:08,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.44 | bwd_microstep: 1356.96 | bwd_inner_microstep: 1356.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1998
[2024-06-10 09:40:09,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.17 | bwd_microstep: 739.01 | bwd_inner_microstep: 738.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2970
[2024-06-10 09:40:11,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.53 | bwd_microstep: 1015.28 | bwd_inner_microstep: 1015.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 09:40:13,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.37 | bwd_microstep: 1452.31 | bwd_inner_microstep: 1452.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 09:40:15,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.24 | bwd_microstep: 1534.36 | bwd_inner_microstep: 1534.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3782
[2024-06-10 09:40:17,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1412.46 | bwd_inner_microstep: 1412.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3534
[2024-06-10 09:40:19,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1356.62 | bwd_inner_microstep: 1356.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 09:40:21,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.88 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1481.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3685
[2024-06-10 09:40:23,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.67 | bwd_microstep: 1720.72 | bwd_inner_microstep: 1720.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-10 09:40:28,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 09:40:28,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.61 | bwd_microstep: 4218.28 | bwd_inner_microstep: 1815.69 | bwd_allreduce_microstep: 2402.53 | step_microstep: 37.93
[2024-06-10 09:40:28,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15930.25 | bwd: 45196.54 | bwd_inner: 42792.96 | bwd_allreduce: 2402.83 | step: 39.48
{'loss': 1.2458, 'learning_rate': 3.274038031903778e-05, 'epoch': 0.3}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-10 09:40:30,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1328.18 | bwd_inner_microstep: 1328.10 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3891
[2024-06-10 09:40:32,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.96 | bwd_microstep: 1679.29 | bwd_inner_microstep: 1679.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866
[2024-06-10 09:40:34,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.15 | bwd_microstep: 1561.42 | bwd_inner_microstep: 1561.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 09:40:36,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.63 | bwd_microstep: 1378.94 | bwd_inner_microstep: 1378.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3758
[2024-06-10 09:40:39,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.14 | bwd_microstep: 1637.71 | bwd_inner_microstep: 1637.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 09:40:40,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.54 | bwd_microstep: 809.73 | bwd_inner_microstep: 809.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 09:40:41,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1247.05 | bwd_inner_microstep: 1247.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3562
[2024-06-10 09:40:44,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.96 | bwd_microstep: 1566.95 | bwd_inner_microstep: 1566.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 09:40:45,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.47 | bwd_microstep: 792.06 | bwd_inner_microstep: 792.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494
[2024-06-10 09:40:47,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.98 | bwd_microstep: 1406.75 | bwd_inner_microstep: 1406.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 09:40:48,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.82 | bwd_microstep: 1302.84 | bwd_inner_microstep: 1302.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 09:40:50,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.02 | bwd_microstep: 1353.83 | bwd_inner_microstep: 1353.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 09:40:52,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.03 | bwd_microstep: 1511.30 | bwd_inner_microstep: 1511.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3667
[2024-06-10 09:40:55,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.61 | bwd_microstep: 1652.51 | bwd_inner_microstep: 1652.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3556
[2024-06-10 09:40:57,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.29 | bwd_microstep: 1443.51 | bwd_inner_microstep: 1443.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080
[2024-06-10 09:40:58,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.68 | bwd_microstep: 917.60 | bwd_inner_microstep: 917.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3555
[2024-06-10 09:41:00,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.73 | bwd_microstep: 1591.76 | bwd_inner_microstep: 1591.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659
[2024-06-10 09:41:02,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.54 | bwd_microstep: 1718.35 | bwd_inner_microstep: 1718.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1995
[2024-06-10 09:41:04,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.09 | bwd_microstep: 930.90 | bwd_inner_microstep: 930.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2025
[2024-06-10 09:41:05,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.53 | bwd_microstep: 716.10 | bwd_inner_microstep: 716.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686
[2024-06-10 09:41:07,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.75 | bwd_microstep: 1430.89 | bwd_inner_microstep: 1430.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 09:41:09,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.07 | bwd_microstep: 1656.86 | bwd_inner_microstep: 1656.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 09:41:11,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.19 | bwd_microstep: 1357.59 | bwd_inner_microstep: 1357.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 09:41:13,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.75 | bwd_microstep: 1281.73 | bwd_inner_microstep: 1281.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-10 09:41:14,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.84 | bwd_microstep: 821.01 | bwd_inner_microstep: 820.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 641
[2024-06-10 09:41:14,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.02 | bwd_microstep: 274.86 | bwd_inner_microstep: 274.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 09:41:16,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1499.86 | bwd_inner_microstep: 1499.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 09:41:18,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.35 | bwd_microstep: 1530.65 | bwd_inner_microstep: 1530.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3692
[2024-06-10 09:41:21,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.87 | bwd_microstep: 1626.86 | bwd_inner_microstep: 1626.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 09:41:23,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.05 | bwd_microstep: 1537.28 | bwd_inner_microstep: 1537.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 09:41:25,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.70 | bwd_microstep: 1405.32 | bwd_inner_microstep: 1405.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 09:41:27,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-10 09:41:27,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.97 | bwd_microstep: 1554.38 | bwd_inner_microstep: 1546.66 | bwd_allreduce_microstep: 7.67 | step_microstep: 37.59
[2024-06-10 09:41:27,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15842.93 | bwd: 42524.09 | bwd_inner: 42515.44 | bwd_allreduce: 7.94 | step: 39.15
{'loss': 1.3284, 'learning_rate': 3.271142491712264e-05, 'epoch': 0.3}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462
[2024-06-10 09:41:29,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.64 | bwd_microstep: 1572.97 | bwd_inner_microstep: 1572.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 09:41:31,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.52 | bwd_microstep: 1377.70 | bwd_inner_microstep: 1377.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3861
[2024-06-10 09:41:33,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.72 | bwd_microstep: 1460.20 | bwd_inner_microstep: 1460.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 09:41:35,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1345.94 | bwd_inner_microstep: 1345.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 09:41:37,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.59 | bwd_microstep: 1289.52 | bwd_inner_microstep: 1289.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-10 09:41:39,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.97 | bwd_microstep: 1531.27 | bwd_inner_microstep: 1531.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 09:41:40,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.21 | bwd_microstep: 678.94 | bwd_inner_microstep: 678.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 09:41:42,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.62 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2090
[2024-06-10 09:41:43,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.89 | bwd_microstep: 917.72 | bwd_inner_microstep: 917.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 09:41:45,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1387.31 | bwd_inner_microstep: 1387.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 09:41:47,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1387.27 | bwd_inner_microstep: 1387.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 09:41:48,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.51 | bwd_microstep: 1248.13 | bwd_inner_microstep: 1248.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1978
[2024-06-10 09:41:50,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.82 | bwd_microstep: 858.82 | bwd_inner_microstep: 858.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 09:41:51,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1343.92 | bwd_inner_microstep: 1343.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3654
[2024-06-10 09:41:54,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.37 | bwd_microstep: 1719.26 | bwd_inner_microstep: 1719.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 09:41:56,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.54 | bwd_microstep: 1460.00 | bwd_inner_microstep: 1459.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450
[2024-06-10 09:41:58,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.49 | bwd_microstep: 1284.68 | bwd_inner_microstep: 1284.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 09:41:59,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.03 | bwd_microstep: 1180.96 | bwd_inner_microstep: 1180.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618
[2024-06-10 09:42:01,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.33 | bwd_microstep: 1310.74 | bwd_inner_microstep: 1310.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463
[2024-06-10 09:42:03,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.98 | bwd_microstep: 1312.84 | bwd_inner_microstep: 1312.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3887
[2024-06-10 09:42:05,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.82 | bwd_microstep: 1791.02 | bwd_inner_microstep: 1791.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 09:42:07,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1493.28 | bwd_inner_microstep: 1493.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 09:42:09,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.36 | bwd_microstep: 1491.99 | bwd_inner_microstep: 1491.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 09:42:12,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.97 | bwd_microstep: 1553.09 | bwd_inner_microstep: 1553.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3748
[2024-06-10 09:42:14,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.15 | bwd_microstep: 1706.35 | bwd_inner_microstep: 1706.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014
[2024-06-10 09:42:15,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.88 | bwd_microstep: 897.08 | bwd_inner_microstep: 897.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-10 09:42:17,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.57 | bwd_microstep: 1300.59 | bwd_inner_microstep: 1300.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 09:42:19,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.94 | bwd_microstep: 1654.54 | bwd_inner_microstep: 1654.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3560
[2024-06-10 09:42:21,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.94 | bwd_microstep: 1556.93 | bwd_inner_microstep: 1556.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 09:42:23,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.33 | bwd_microstep: 1399.79 | bwd_inner_microstep: 1399.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-10 09:42:25,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1359.13 | bwd_inner_microstep: 1359.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3572
[2024-06-10 09:42:30,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 09:42:30,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.21 | bwd_microstep: 4365.30 | bwd_inner_microstep: 1617.07 | bwd_allreduce_microstep: 2748.17 | step_microstep: 38.41
[2024-06-10 09:42:30,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16372.71 | bwd: 46622.16 | bwd_inner: 43873.08 | bwd_allreduce: 2748.40 | step: 39.96
{'loss': 1.2834, 'learning_rate': 3.268242474563502e-05, 'epoch': 0.3}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2393
[2024-06-10 09:42:32,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.31 | bwd_microstep: 988.04 | bwd_inner_microstep: 987.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4534
[2024-06-10 09:42:34,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 650.60 | bwd_microstep: 1741.84 | bwd_inner_microstep: 1741.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 09:42:36,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1344.00 | bwd_inner_microstep: 1343.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 09:42:38,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.01 | bwd_microstep: 1397.78 | bwd_inner_microstep: 1397.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 09:42:39,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.28 | bwd_microstep: 971.13 | bwd_inner_microstep: 971.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 09:42:41,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1248.49 | bwd_inner_microstep: 1248.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3397
[2024-06-10 09:42:43,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.39 | bwd_microstep: 1292.44 | bwd_inner_microstep: 1292.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 09:42:45,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.50 | bwd_microstep: 1483.49 | bwd_inner_microstep: 1483.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522
[2024-06-10 09:42:46,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.98 | bwd_microstep: 1197.18 | bwd_inner_microstep: 1197.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 09:42:47,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.31 | bwd_microstep: 797.18 | bwd_inner_microstep: 797.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 09:42:49,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1376.03 | bwd_inner_microstep: 1376.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 09:42:50,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.92 | bwd_microstep: 792.46 | bwd_inner_microstep: 792.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3706
[2024-06-10 09:42:52,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.22 | bwd_microstep: 1266.05 | bwd_inner_microstep: 1266.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 09:42:54,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.08 | bwd_microstep: 1496.29 | bwd_inner_microstep: 1496.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661
[2024-06-10 09:42:56,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.92 | bwd_microstep: 1565.33 | bwd_inner_microstep: 1565.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3654
[2024-06-10 09:42:59,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.59 | bwd_microstep: 1565.28 | bwd_inner_microstep: 1565.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 09:43:00,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1343.71 | bwd_inner_microstep: 1343.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1876
[2024-06-10 09:43:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.13 | bwd_microstep: 710.47 | bwd_inner_microstep: 710.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3438
[2024-06-10 09:43:03,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.67 | bwd_microstep: 1413.06 | bwd_inner_microstep: 1413.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 990
[2024-06-10 09:43:04,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 153.37 | bwd_microstep: 390.96 | bwd_inner_microstep: 390.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 09:43:05,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.97 | bwd_microstep: 696.31 | bwd_inner_microstep: 696.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2075
[2024-06-10 09:43:06,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.78 | bwd_microstep: 918.47 | bwd_inner_microstep: 918.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3828
[2024-06-10 09:43:08,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.29 | bwd_microstep: 1297.72 | bwd_inner_microstep: 1297.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3856
[2024-06-10 09:43:10,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1470.87 | bwd_inner_microstep: 1470.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 09:43:12,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.24 | bwd_microstep: 1419.29 | bwd_inner_microstep: 1419.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 09:43:14,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.16 | bwd_microstep: 1551.37 | bwd_inner_microstep: 1551.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 09:43:16,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.04 | bwd_microstep: 1652.76 | bwd_inner_microstep: 1652.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3570
[2024-06-10 09:43:18,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.76 | bwd_microstep: 1335.67 | bwd_inner_microstep: 1335.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 09:43:20,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.24 | bwd_microstep: 1641.91 | bwd_inner_microstep: 1641.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 09:43:23,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.52 | bwd_microstep: 1548.04 | bwd_inner_microstep: 1548.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 09:43:25,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.12 | bwd_microstep: 1650.29 | bwd_inner_microstep: 1650.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-10 09:43:31,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 09:43:31,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.50 | bwd_microstep: 5413.23 | bwd_inner_microstep: 1738.67 | bwd_allreduce_microstep: 3674.51 | step_microstep: 38.21
[2024-06-10 09:43:31,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15413.85 | bwd: 44977.19 | bwd_inner: 41301.74 | bwd_allreduce: 3674.75 | step: 39.87
{'loss': 1.2761, 'learning_rate': 3.265337990671337e-05, 'epoch': 0.3}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3387
[2024-06-10 09:43:33,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.64 | bwd_microstep: 1268.74 | bwd_inner_microstep: 1268.63 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2382
[2024-06-10 09:43:34,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.70 | bwd_microstep: 998.24 | bwd_inner_microstep: 998.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2332
[2024-06-10 09:43:35,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.40 | bwd_microstep: 984.20 | bwd_inner_microstep: 984.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2261
[2024-06-10 09:43:37,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.11 | bwd_microstep: 967.87 | bwd_inner_microstep: 967.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 09:43:39,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.94 | bwd_microstep: 1284.23 | bwd_inner_microstep: 1284.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 09:43:41,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.33 | bwd_microstep: 1652.05 | bwd_inner_microstep: 1652.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 09:43:43,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.58 | bwd_microstep: 1250.42 | bwd_inner_microstep: 1250.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 09:43:44,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1388.19 | bwd_inner_microstep: 1388.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 09:43:47,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.99 | bwd_microstep: 1528.16 | bwd_inner_microstep: 1528.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3482
[2024-06-10 09:43:49,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.32 | bwd_microstep: 1443.72 | bwd_inner_microstep: 1443.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 09:43:50,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.01 | bwd_microstep: 1346.46 | bwd_inner_microstep: 1346.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 09:43:52,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1344.95 | bwd_inner_microstep: 1344.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 09:43:54,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1479.66 | bwd_inner_microstep: 1479.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 09:43:56,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.48 | bwd_microstep: 1348.85 | bwd_inner_microstep: 1348.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3503
[2024-06-10 09:43:58,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.60 | bwd_microstep: 1253.54 | bwd_inner_microstep: 1253.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3597
[2024-06-10 09:44:00,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.33 | bwd_microstep: 1641.67 | bwd_inner_microstep: 1641.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 09:44:02,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.86 | bwd_microstep: 1352.14 | bwd_inner_microstep: 1352.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 09:44:04,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1512.02 | bwd_inner_microstep: 1511.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 09:44:06,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.09 | bwd_microstep: 1295.47 | bwd_inner_microstep: 1295.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 09:44:08,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.34 | bwd_microstep: 1180.58 | bwd_inner_microstep: 1180.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 09:44:10,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.17 | bwd_microstep: 1400.04 | bwd_inner_microstep: 1400.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 09:44:12,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.73 | bwd_microstep: 1611.91 | bwd_inner_microstep: 1611.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3518
[2024-06-10 09:44:14,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.56 | bwd_microstep: 1324.60 | bwd_inner_microstep: 1324.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 09:44:16,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.33 | bwd_microstep: 1446.00 | bwd_inner_microstep: 1445.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 09:44:18,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1405.21 | bwd_inner_microstep: 1405.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3682
[2024-06-10 09:44:19,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.61 | bwd_microstep: 1434.41 | bwd_inner_microstep: 1434.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 09:44:20,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.77 | bwd_microstep: 689.62 | bwd_inner_microstep: 689.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 09:44:23,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.69 | bwd_microstep: 1557.61 | bwd_inner_microstep: 1557.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462
[2024-06-10 09:44:24,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.48 | bwd_microstep: 1310.89 | bwd_inner_microstep: 1310.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1915
[2024-06-10 09:44:25,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.78 | bwd_microstep: 785.07 | bwd_inner_microstep: 785.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 09:44:27,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.87 | bwd_microstep: 1398.18 | bwd_inner_microstep: 1398.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397
[2024-06-10 09:44:33,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.28 | optimizer_step: 6.58
[2024-06-10 09:44:33,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.76 | bwd_microstep: 4559.18 | bwd_inner_microstep: 1595.77 | bwd_allreduce_microstep: 2963.35 | step_microstep: 38.55
[2024-06-10 09:44:33,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15858.31 | bwd: 45443.89 | bwd_inner: 42479.52 | bwd_allreduce: 2963.63 | step: 40.47
/1726 [9:01:03<20:29:47, 61.13s/it]
 30%|███       | 520/1726 [9:02:03<20:25:38, 60.98s/it]
 30%|███       | 521/1726 [9:03:05<20:27:35, 61.12s/it]
 30%|███       | 522/1726 [9:04:04<20:12:03, 60.40s/it]
 30%|███       | 523/1726 [9:05:07<20:28:42, 61.28s/it]
 30%|███       | 524/1726 [9:06:08<20:24:25, 61.12s/it]
 30%|███       | 525/1726 [9:07:09<20:26:46, 61.29s/it]
{'loss': 1.2959, 'learning_rate': 3.262429050265348e-05, 'epoch': 0.3}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 09:44:35,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.40 | bwd_microstep: 1442.07 | bwd_inner_microstep: 1441.95 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4475
[2024-06-10 09:44:37,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 677.72 | bwd_microstep: 1830.89 | bwd_inner_microstep: 1830.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 09:44:39,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1381.90 | bwd_inner_microstep: 1381.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 09:44:41,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1380.59 | bwd_inner_microstep: 1380.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 09:44:43,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.26 | bwd_microstep: 1279.72 | bwd_inner_microstep: 1279.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741
[2024-06-10 09:44:45,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.73 | bwd_microstep: 1535.42 | bwd_inner_microstep: 1535.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3401
[2024-06-10 09:44:47,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | bwd_microstep: 1306.07 | bwd_inner_microstep: 1306.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 09:44:48,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.62 | bwd_microstep: 1254.58 | bwd_inner_microstep: 1254.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911
[2024-06-10 09:44:49,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.22 | bwd_microstep: 782.09 | bwd_inner_microstep: 782.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 09:44:51,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.12 | bwd_microstep: 1345.91 | bwd_inner_microstep: 1345.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3648
[2024-06-10 09:44:53,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.46 | bwd_microstep: 1353.41 | bwd_inner_microstep: 1353.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3483
[2024-06-10 09:44:55,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.68 | bwd_microstep: 1434.29 | bwd_inner_microstep: 1434.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 09:44:57,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.32 | bwd_microstep: 1286.54 | bwd_inner_microstep: 1286.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3628
[2024-06-10 09:44:59,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.68 | bwd_microstep: 1461.95 | bwd_inner_microstep: 1461.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 09:45:01,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.28 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-10 09:45:03,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.25 | bwd_microstep: 1588.73 | bwd_inner_microstep: 1588.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.28
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499
[2024-06-10 09:45:05,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.92 | bwd_microstep: 1584.65 | bwd_inner_microstep: 1584.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3516
[2024-06-10 09:45:07,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.96 | bwd_microstep: 1517.14 | bwd_inner_microstep: 1517.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 09:45:09,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.58 | bwd_microstep: 1649.75 | bwd_inner_microstep: 1649.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 09:45:12,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.65 | bwd_microstep: 1661.30 | bwd_inner_microstep: 1661.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 09:45:14,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.69 | bwd_microstep: 1512.63 | bwd_inner_microstep: 1512.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 09:45:16,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1398.71 | bwd_inner_microstep: 1398.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3610
[2024-06-10 09:45:18,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 1440.99 | bwd_inner_microstep: 1440.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 09:45:20,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1396.25 | bwd_inner_microstep: 1396.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 09:45:22,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.19 | bwd_microstep: 1399.98 | bwd_inner_microstep: 1399.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 09:45:24,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1497.33 | bwd_inner_microstep: 1497.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 09:45:25,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.81 | bwd_microstep: 1285.22 | bwd_inner_microstep: 1285.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-10 09:45:28,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.37 | bwd_microstep: 1563.12 | bwd_inner_microstep: 1563.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 09:45:29,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.36 | bwd_microstep: 1311.53 | bwd_inner_microstep: 1311.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 09:45:31,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.34 | bwd_microstep: 1443.79 | bwd_inner_microstep: 1443.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3665
[2024-06-10 09:45:33,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.04 | bwd_microstep: 1448.41 | bwd_inner_microstep: 1448.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1864
[2024-06-10 09:45:34,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 09:45:34,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.85 | bwd_microstep: 752.98 | bwd_inner_microstep: 740.80 | bwd_allreduce_microstep: 12.14 | step_microstep: 37.71
[2024-06-10 09:45:34,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16719.31 | bwd: 44814.93 | bwd_inner: 44801.80 | bwd_allreduce: 12.41 | step: 40.72
{'loss': 1.3363, 'learning_rate': 3.259515663590805e-05, 'epoch': 0.3}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 09:45:36,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.35 | bwd_microstep: 1332.67 | bwd_inner_microstep: 1332.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2453
[2024-06-10 09:45:38,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.80 | bwd_microstep: 1014.73 | bwd_inner_microstep: 1014.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 09:45:40,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.38 | bwd_microstep: 1457.39 | bwd_inner_microstep: 1457.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 09:45:42,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.88 | bwd_microstep: 1651.36 | bwd_inner_microstep: 1651.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 09:45:44,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.30 | bwd_microstep: 1249.27 | bwd_inner_microstep: 1249.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3478
[2024-06-10 09:45:45,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1217.77 | bwd_inner_microstep: 1217.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4151
[2024-06-10 09:45:48,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.88 | bwd_microstep: 1645.88 | bwd_inner_microstep: 1645.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 09:45:49,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.51 | bwd_microstep: 796.62 | bwd_inner_microstep: 796.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-10 09:45:51,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.92 | bwd_microstep: 1532.17 | bwd_inner_microstep: 1532.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 09:45:53,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.99 | bwd_microstep: 1519.61 | bwd_inner_microstep: 1519.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3407
[2024-06-10 09:45:55,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.71 | bwd_microstep: 1308.05 | bwd_inner_microstep: 1308.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1959
[2024-06-10 09:45:56,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.84 | bwd_microstep: 856.90 | bwd_inner_microstep: 856.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 09:45:58,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.40 | bwd_microstep: 1392.20 | bwd_inner_microstep: 1392.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-10 09:45:59,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.23 | bwd_microstep: 691.40 | bwd_inner_microstep: 691.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 09:46:01,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.73 | bwd_microstep: 1402.93 | bwd_inner_microstep: 1402.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3455
[2024-06-10 09:46:03,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.09 | bwd_microstep: 1388.29 | bwd_inner_microstep: 1388.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3203
[2024-06-10 09:46:05,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.18 | bwd_microstep: 1363.81 | bwd_inner_microstep: 1363.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2686
[2024-06-10 09:46:06,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.53 | bwd_microstep: 1028.51 | bwd_inner_microstep: 1028.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 09:46:08,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1379.77 | bwd_inner_microstep: 1379.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1914
[2024-06-10 09:46:09,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.07 | bwd_microstep: 780.50 | bwd_inner_microstep: 780.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 09:46:11,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.37 | bwd_microstep: 1511.50 | bwd_inner_microstep: 1511.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 09:46:13,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.87 | bwd_microstep: 1501.08 | bwd_inner_microstep: 1501.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 09:46:15,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.26 | bwd_microstep: 1557.76 | bwd_inner_microstep: 1557.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3724
[2024-06-10 09:46:18,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.67 | bwd_microstep: 1732.14 | bwd_inner_microstep: 1732.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3604
[2024-06-10 09:46:20,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.33 | bwd_microstep: 1575.67 | bwd_inner_microstep: 1575.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3552
[2024-06-10 09:46:22,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.85 | bwd_microstep: 1327.88 | bwd_inner_microstep: 1327.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 09:46:24,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.79 | bwd_microstep: 1657.36 | bwd_inner_microstep: 1657.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3777
[2024-06-10 09:46:26,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1579.39 | bwd_inner_microstep: 1579.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 09:46:28,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.51 | bwd_microstep: 1255.94 | bwd_inner_microstep: 1255.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 09:46:30,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.16 | bwd_microstep: 1443.80 | bwd_inner_microstep: 1443.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775
[2024-06-10 09:46:32,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.59 | bwd_microstep: 1442.47 | bwd_inner_microstep: 1442.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794
[2024-06-10 09:46:37,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.21 | optimizer_step: 6.57
[2024-06-10 09:46:37,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.85 | bwd_microstep: 4502.64 | bwd_inner_microstep: 1754.82 | bwd_allreduce_microstep: 2747.77 | step_microstep: 38.15
[2024-06-10 09:46:37,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16121.31 | bwd: 46097.46 | bwd_inner: 43348.77 | bwd_allreduce: 2748.00 | step: 39.80
{'loss': 1.297, 'learning_rate': 3.256597840908643e-05, 'epoch': 0.31}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 09:46:39,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.69 | bwd_microstep: 1465.24 | bwd_inner_microstep: 1465.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 09:46:41,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1378.52 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2284
[2024-06-10 09:46:42,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.17 | bwd_microstep: 906.28 | bwd_inner_microstep: 906.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3869
[2024-06-10 09:46:45,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.31 | bwd_microstep: 1664.42 | bwd_inner_microstep: 1664.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 09:46:47,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.94 | bwd_microstep: 1451.06 | bwd_inner_microstep: 1451.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 09:46:48,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.66 | bwd_microstep: 1285.54 | bwd_inner_microstep: 1285.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 09:46:50,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1391.91 | bwd_inner_microstep: 1391.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 09:46:51,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.37 | bwd_microstep: 798.51 | bwd_inner_microstep: 798.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2475
[2024-06-10 09:46:53,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.13 | bwd_microstep: 955.71 | bwd_inner_microstep: 955.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1995
[2024-06-10 09:46:54,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.74 | bwd_microstep: 741.59 | bwd_inner_microstep: 741.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3547
[2024-06-10 09:46:56,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.12 | bwd_microstep: 1455.18 | bwd_inner_microstep: 1455.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3748
[2024-06-10 09:46:58,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.41 | bwd_microstep: 1683.13 | bwd_inner_microstep: 1683.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3670
[2024-06-10 09:47:00,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.51 | bwd_microstep: 1720.30 | bwd_inner_microstep: 1720.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968
[2024-06-10 09:47:01,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.48 | bwd_microstep: 703.95 | bwd_inner_microstep: 703.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 09:47:03,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1447.42 | bwd_inner_microstep: 1447.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 09:47:05,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1412.28 | bwd_inner_microstep: 1412.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 09:47:07,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.06 | bwd_microstep: 1493.75 | bwd_inner_microstep: 1493.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 09:47:09,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.40 | bwd_microstep: 1407.93 | bwd_inner_microstep: 1407.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 09:47:11,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.37 | bwd_microstep: 1431.43 | bwd_inner_microstep: 1431.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 09:47:12,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.26 | bwd_microstep: 802.48 | bwd_inner_microstep: 802.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3904
[2024-06-10 09:47:14,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.94 | bwd_microstep: 1492.84 | bwd_inner_microstep: 1492.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 09:47:16,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | bwd_microstep: 1377.65 | bwd_inner_microstep: 1377.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815
[2024-06-10 09:47:18,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.16 | bwd_microstep: 1517.58 | bwd_inner_microstep: 1517.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3615
[2024-06-10 09:47:20,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.86 | bwd_microstep: 1250.17 | bwd_inner_microstep: 1250.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 09:47:22,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.13 | bwd_microstep: 1426.60 | bwd_inner_microstep: 1426.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3817
[2024-06-10 09:47:24,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1538.96 | bwd_inner_microstep: 1538.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3728
[2024-06-10 09:47:26,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.94 | bwd_microstep: 1498.36 | bwd_inner_microstep: 1498.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3729
[2024-06-10 09:47:29,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.16 | bwd_microstep: 1730.05 | bwd_inner_microstep: 1730.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2058
[2024-06-10 09:47:30,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.22 | bwd_microstep: 1013.12 | bwd_inner_microstep: 1013.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3856
[2024-06-10 09:47:32,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.05 | bwd_microstep: 1571.39 | bwd_inner_microstep: 1571.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 09:47:34,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.06 | bwd_microstep: 1393.23 | bwd_inner_microstep: 1393.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819
[2024-06-10 09:47:38,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 09:47:38,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 2835.86 | bwd_inner_microstep: 1715.21 | bwd_allreduce_microstep: 1120.60 | step_microstep: 38.19
[2024-06-10 09:47:38,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16052.38 | bwd: 44242.47 | bwd_inner: 43120.95 | bwd_allreduce: 1120.83 | step: 40.07
{'loss': 1.2469, 'learning_rate': 3.2536755924954185e-05, 'epoch': 0.31}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424
[2024-06-10 09:47:40,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.23 | bwd_microstep: 1441.78 | bwd_inner_microstep: 1441.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-10 09:47:42,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.42 | bwd_microstep: 1308.94 | bwd_inner_microstep: 1308.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 09:47:43,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.80 | bwd_microstep: 1278.56 | bwd_inner_microstep: 1278.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 09:47:46,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.22 | bwd_microstep: 1654.14 | bwd_inner_microstep: 1654.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783
[2024-06-10 09:47:48,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.57 | bwd_microstep: 1456.06 | bwd_inner_microstep: 1456.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 09:47:49,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.96 | bwd_microstep: 1251.17 | bwd_inner_microstep: 1251.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 09:47:50,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 791.95 | bwd_inner_microstep: 791.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 09:47:52,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.26 | bwd_microstep: 803.29 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 09:47:53,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1341.50 | bwd_inner_microstep: 1341.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 09:47:55,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.48 | bwd_microstep: 1252.90 | bwd_inner_microstep: 1252.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3675
[2024-06-10 09:47:57,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.17 | bwd_microstep: 1658.26 | bwd_inner_microstep: 1658.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2165
[2024-06-10 09:47:59,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.16 | bwd_microstep: 978.11 | bwd_inner_microstep: 978.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3497
[2024-06-10 09:48:01,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.55 | bwd_microstep: 1446.85 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3649
[2024-06-10 09:48:03,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.60 | bwd_microstep: 1820.06 | bwd_inner_microstep: 1820.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482
[2024-06-10 09:48:05,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.53 | bwd_microstep: 1413.81 | bwd_inner_microstep: 1413.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2375
[2024-06-10 09:48:07,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.35 | bwd_microstep: 1026.80 | bwd_inner_microstep: 1026.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1410
[2024-06-10 09:48:07,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 188.29 | bwd_microstep: 475.67 | bwd_inner_microstep: 475.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 09:48:09,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.55 | bwd_microstep: 1514.08 | bwd_inner_microstep: 1514.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 09:48:11,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.53 | bwd_microstep: 1438.15 | bwd_inner_microstep: 1438.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 09:48:13,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.81 | bwd_microstep: 1285.34 | bwd_inner_microstep: 1285.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 09:48:15,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1495.44 | bwd_inner_microstep: 1495.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3818
[2024-06-10 09:48:17,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.84 | bwd_microstep: 1489.90 | bwd_inner_microstep: 1489.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 09:48:18,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.89 | bwd_microstep: 796.38 | bwd_inner_microstep: 796.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3538
[2024-06-10 09:48:20,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1327.94 | bwd_inner_microstep: 1327.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 09:48:21,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.63 | bwd_microstep: 701.62 | bwd_inner_microstep: 701.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3995
[2024-06-10 09:48:24,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.69 | bwd_microstep: 1709.35 | bwd_inner_microstep: 1709.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 09:48:26,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1556.83 | bwd_inner_microstep: 1556.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3554
[2024-06-10 09:48:28,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.58 | bwd_microstep: 1587.93 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 837
[2024-06-10 09:48:28,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.88 | bwd_microstep: 344.24 | bwd_inner_microstep: 344.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 09:48:30,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1381.47 | bwd_inner_microstep: 1381.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 09:48:32,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.22 | bwd_microstep: 1377.95 | bwd_inner_microstep: 1377.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3694
[2024-06-10 09:48:38,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.35 | optimizer_step: 6.62
[2024-06-10 09:48:38,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 5567.80 | bwd_inner_microstep: 1653.50 | bwd_allreduce_microstep: 3914.23 | step_microstep: 38.91
[2024-06-10 09:48:38,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15303.91 | bwd: 44974.29 | bwd_inner: 41059.13 | bwd_allreduce: 3914.48 | step: 40.45
{'loss': 1.319, 'learning_rate': 3.250748928643274e-05, 'epoch': 0.31}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3474
[2024-06-10 09:48:40,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1493.75 | bwd_inner_microstep: 1493.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2046
[2024-06-10 09:48:41,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.20 | bwd_microstep: 779.23 | bwd_inner_microstep: 779.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2372
[2024-06-10 09:48:43,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.64 | bwd_microstep: 995.58 | bwd_inner_microstep: 995.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3790
[2024-06-10 09:48:45,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1453.24 | bwd_inner_microstep: 1453.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3629
[2024-06-10 09:48:47,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.35 | bwd_microstep: 1375.37 | bwd_inner_microstep: 1375.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-10 09:48:49,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.76 | bwd_microstep: 1538.99 | bwd_inner_microstep: 1538.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 09:48:51,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1251.68 | bwd_inner_microstep: 1251.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3706
[2024-06-10 09:48:52,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.71 | bwd_microstep: 1362.28 | bwd_inner_microstep: 1362.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409
[2024-06-10 09:48:54,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.46 | bwd_microstep: 1306.57 | bwd_inner_microstep: 1306.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3755
[2024-06-10 09:48:57,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.51 | bwd_microstep: 1676.94 | bwd_inner_microstep: 1676.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 09:48:58,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.15 | bwd_microstep: 1320.12 | bwd_inner_microstep: 1320.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3515
[2024-06-10 09:49:00,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.29 | bwd_microstep: 1441.28 | bwd_inner_microstep: 1441.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 09:49:02,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.54 | bwd_microstep: 1489.32 | bwd_inner_microstep: 1489.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 09:49:05,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.63 | bwd_microstep: 1648.90 | bwd_inner_microstep: 1648.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 09:49:07,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 1499.96 | bwd_inner_microstep: 1499.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 09:49:09,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1484.52 | bwd_inner_microstep: 1484.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 09:49:11,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1348.21 | bwd_inner_microstep: 1348.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3700
[2024-06-10 09:49:13,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.09 | bwd_microstep: 1460.28 | bwd_inner_microstep: 1460.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3545
[2024-06-10 09:49:15,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1425.50 | bwd_inner_microstep: 1425.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 09:49:17,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.95 | bwd_microstep: 1346.58 | bwd_inner_microstep: 1346.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 09:49:19,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1401.48 | bwd_inner_microstep: 1401.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 09:49:20,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.03 | bwd_microstep: 1255.94 | bwd_inner_microstep: 1255.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 09:49:22,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.42 | bwd_microstep: 1503.04 | bwd_inner_microstep: 1503.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3657
[2024-06-10 09:49:24,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.17 | bwd_microstep: 1448.23 | bwd_inner_microstep: 1448.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3538
[2024-06-10 09:49:26,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.26 | bwd_microstep: 1343.57 | bwd_inner_microstep: 1343.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574
[2024-06-10 09:49:28,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.12 | bwd_microstep: 1503.05 | bwd_inner_microstep: 1503.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 09:49:30,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.25 | bwd_microstep: 1289.21 | bwd_inner_microstep: 1289.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3575
[2024-06-10 09:49:32,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1372.04 | bwd_inner_microstep: 1372.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 09:49:34,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1504.11 | bwd_inner_microstep: 1504.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3588
[2024-06-10 09:49:36,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.65 | bwd_microstep: 1315.84 | bwd_inner_microstep: 1315.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1921
[2024-06-10 09:49:37,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.39 | bwd_microstep: 727.89 | bwd_inner_microstep: 727.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 09:49:39,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.16 | optimizer_step: 6.63
[2024-06-10 09:49:39,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1549.11 | bwd_inner_microstep: 1541.37 | bwd_allreduce_microstep: 7.70 | step_microstep: 37.58
[2024-06-10 09:49:39,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16435.03 | bwd: 43911.85 | bwd_inner: 43903.23 | bwd_allreduce: 7.93 | step: 39.22
{'loss': 1.2495, 'learning_rate': 3.247817859659905e-05, 'epoch': 0.31}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 09:49:41,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1384.85 | bwd_inner_microstep: 1384.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 09:49:43,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.15 | bwd_microstep: 1152.67 | bwd_inner_microstep: 1152.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3864
[2024-06-10 09:49:45,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1461.28 | bwd_inner_microstep: 1461.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 09:49:46,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.32 | bwd_microstep: 1256.65 | bwd_inner_microstep: 1256.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4182
[2024-06-10 09:49:49,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.57 | bwd_microstep: 1654.12 | bwd_inner_microstep: 1654.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3485
[2024-06-10 09:49:50,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.42 | bwd_microstep: 1247.74 | bwd_inner_microstep: 1247.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1866
[2024-06-10 09:49:51,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.62 | bwd_microstep: 741.93 | bwd_inner_microstep: 741.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 09:49:53,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1252.43 | bwd_inner_microstep: 1252.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 09:49:55,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1410.16 | bwd_inner_microstep: 1410.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3952
[2024-06-10 09:49:57,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.74 | bwd_microstep: 1601.93 | bwd_inner_microstep: 1601.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692
[2024-06-10 09:49:59,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1330.07 | bwd_inner_microstep: 1330.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-10 09:50:01,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.73 | bwd_microstep: 1191.23 | bwd_inner_microstep: 1191.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 09:50:03,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.34 | bwd_microstep: 1486.01 | bwd_inner_microstep: 1485.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3432
[2024-06-10 09:50:05,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1328.65 | bwd_inner_microstep: 1328.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3528
[2024-06-10 09:50:07,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1451.18 | bwd_inner_microstep: 1451.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3461
[2024-06-10 09:50:08,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1230.88 | bwd_inner_microstep: 1230.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646
[2024-06-10 09:50:10,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.01 | bwd_microstep: 1412.37 | bwd_inner_microstep: 1412.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 09:50:12,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.63 | bwd_microstep: 1514.91 | bwd_inner_microstep: 1514.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2005
[2024-06-10 09:50:13,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.52 | bwd_microstep: 740.97 | bwd_inner_microstep: 740.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 09:50:15,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.67 | bwd_microstep: 1290.58 | bwd_inner_microstep: 1290.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 09:50:17,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.14 | bwd_microstep: 1293.52 | bwd_inner_microstep: 1293.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 09:50:19,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.41 | bwd_microstep: 1192.76 | bwd_inner_microstep: 1192.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 09:50:21,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.65 | bwd_microstep: 1503.22 | bwd_inner_microstep: 1503.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722
[2024-06-10 09:50:23,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1338.32 | bwd_inner_microstep: 1338.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2272
[2024-06-10 09:50:24,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.70 | bwd_microstep: 973.69 | bwd_inner_microstep: 973.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2230
[2024-06-10 09:50:25,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.00 | bwd_microstep: 868.26 | bwd_inner_microstep: 868.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3618
[2024-06-10 09:50:27,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 1543.39 | bwd_inner_microstep: 1543.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-10 09:50:29,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.65 | bwd_microstep: 1585.16 | bwd_inner_microstep: 1585.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469
[2024-06-10 09:50:31,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.51 | bwd_microstep: 1186.86 | bwd_inner_microstep: 1186.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 09:50:33,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.45 | bwd_microstep: 1358.00 | bwd_inner_microstep: 1357.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3479
[2024-06-10 09:50:35,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.85 | bwd_microstep: 1344.91 | bwd_inner_microstep: 1344.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3584
[2024-06-10 09:50:39,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-10 09:50:39,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.66 | bwd_microstep: 3060.11 | bwd_inner_microstep: 1924.49 | bwd_allreduce_microstep: 1135.58 | step_microstep: 37.77
[2024-06-10 09:50:39,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15844.26 | bwd: 43388.81 | bwd_inner: 42252.33 | bwd_allreduce: 1135.80 | step: 39.32


 30%|███       | 525/1726 [9:07:09<20:26:46, 61.29s/it]
 30%|███       | 526/1726 [9:08:11<20:29:23, 61.47s/it]
 31%|███       | 527/1726 [9:09:14<20:35:01, 61.80s/it]
 31%|███       | 528/1726 [9:10:14<20:27:09, 61.46s/it]
 31%|███       | 529/1726 [9:11:15<20:21:07, 61.21s/it]
 31%|███       | 530/1726 [9:12:16<20:17:00, 61.05s/it]
{'loss': 1.2601, 'learning_rate': 3.244882395868521e-05, 'epoch': 0.31}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 09:50:41,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1440.62 | bwd_inner_microstep: 1440.53 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 09:50:42,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.49 | bwd_microstep: 792.33 | bwd_inner_microstep: 792.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793
[2024-06-10 09:50:44,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.03 | bwd_microstep: 1647.97 | bwd_inner_microstep: 1647.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 09:50:46,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.14 | bwd_microstep: 1480.13 | bwd_inner_microstep: 1480.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 09:50:48,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.95 | bwd_microstep: 1527.00 | bwd_inner_microstep: 1526.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 09:50:50,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.97 | bwd_microstep: 1153.49 | bwd_inner_microstep: 1153.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924
[2024-06-10 09:50:51,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.13 | bwd_microstep: 696.89 | bwd_inner_microstep: 696.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3416
[2024-06-10 09:50:52,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.08 | bwd_microstep: 1199.19 | bwd_inner_microstep: 1199.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 09:50:54,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.79 | bwd_microstep: 1483.85 | bwd_inner_microstep: 1483.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 09:50:56,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1297.93 | bwd_inner_microstep: 1297.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3444
[2024-06-10 09:50:58,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1424.30 | bwd_inner_microstep: 1424.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3676
[2024-06-10 09:51:01,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.04 | bwd_microstep: 1825.18 | bwd_inner_microstep: 1825.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2413
[2024-06-10 09:51:02,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.88 | bwd_microstep: 1006.41 | bwd_inner_microstep: 1006.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1963
[2024-06-10 09:51:03,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.40 | bwd_microstep: 733.70 | bwd_inner_microstep: 733.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-10 09:51:04,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.11 | bwd_microstep: 697.92 | bwd_inner_microstep: 697.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3525
[2024-06-10 09:51:06,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.80 | bwd_microstep: 1555.68 | bwd_inner_microstep: 1555.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 09:51:08,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.45 | bwd_microstep: 1396.62 | bwd_inner_microstep: 1396.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 09:51:09,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.61 | bwd_microstep: 795.71 | bwd_inner_microstep: 795.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 09:51:11,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.15 | bwd_microstep: 1658.11 | bwd_inner_microstep: 1658.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3823
[2024-06-10 09:51:14,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.69 | bwd_microstep: 1487.43 | bwd_inner_microstep: 1487.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 09:51:15,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1277.14 | bwd_inner_microstep: 1277.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2288
[2024-06-10 09:51:16,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.21 | bwd_microstep: 829.26 | bwd_inner_microstep: 829.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591
[2024-06-10 09:51:18,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.36 | bwd_microstep: 1308.82 | bwd_inner_microstep: 1308.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 09:51:20,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.91 | bwd_microstep: 1432.16 | bwd_inner_microstep: 1432.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 09:51:22,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.99 | bwd_microstep: 1350.54 | bwd_inner_microstep: 1350.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 09:51:24,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.54 | bwd_microstep: 1357.99 | bwd_inner_microstep: 1357.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 09:51:26,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1346.30 | bwd_inner_microstep: 1346.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 09:51:28,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.42 | bwd_microstep: 1650.36 | bwd_inner_microstep: 1650.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3379
[2024-06-10 09:51:30,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.75 | bwd_microstep: 1403.34 | bwd_inner_microstep: 1403.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3649
[2024-06-10 09:51:32,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.24 | bwd_microstep: 1611.96 | bwd_inner_microstep: 1611.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 09:51:34,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.29 | bwd_microstep: 1484.92 | bwd_inner_microstep: 1484.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2059
[2024-06-10 09:51:41,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 09:51:41,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.18 | bwd_microstep: 6541.70 | bwd_inner_microstep: 1080.69 | bwd_allreduce_microstep: 5460.96 | step_microstep: 37.90
[2024-06-10 09:51:41,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15450.27 | bwd: 46894.97 | bwd_inner: 41433.02 | bwd_allreduce: 5461.24 | step: 39.58
{'loss': 1.2861, 'learning_rate': 3.24194254760781e-05, 'epoch': 0.31}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467
[2024-06-10 09:51:43,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.13 | bwd_microstep: 1393.92 | bwd_inner_microstep: 1393.81 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 09:51:45,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.68 | bwd_microstep: 1288.23 | bwd_inner_microstep: 1288.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 09:51:47,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.01 | bwd_microstep: 1279.63 | bwd_inner_microstep: 1279.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3481
[2024-06-10 09:51:49,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.40 | bwd_microstep: 1342.58 | bwd_inner_microstep: 1342.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 09:51:51,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1382.82 | bwd_inner_microstep: 1382.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 09:51:53,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.22 | bwd_microstep: 1521.47 | bwd_inner_microstep: 1521.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-10 09:51:55,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1527.84 | bwd_inner_microstep: 1527.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 09:51:56,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.30 | bwd_microstep: 1246.53 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2634
[2024-06-10 09:51:58,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.90 | bwd_microstep: 1052.43 | bwd_inner_microstep: 1052.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3543
[2024-06-10 09:52:00,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.06 | bwd_microstep: 1514.19 | bwd_inner_microstep: 1514.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1997
[2024-06-10 09:52:01,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.46 | bwd_microstep: 862.93 | bwd_inner_microstep: 862.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 09:52:03,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.75 | bwd_microstep: 1348.59 | bwd_inner_microstep: 1348.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 09:52:05,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.93 | bwd_microstep: 1519.98 | bwd_inner_microstep: 1519.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3652
[2024-06-10 09:52:07,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.88 | bwd_microstep: 1566.73 | bwd_inner_microstep: 1566.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2406
[2024-06-10 09:52:09,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.41 | bwd_microstep: 939.31 | bwd_inner_microstep: 939.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4038
[2024-06-10 09:52:11,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.73 | bwd_microstep: 1715.86 | bwd_inner_microstep: 1715.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526
[2024-06-10 09:52:13,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.45 | bwd_microstep: 1199.74 | bwd_inner_microstep: 1199.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 09:52:14,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.04 | bwd_microstep: 1286.27 | bwd_inner_microstep: 1286.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2087
[2024-06-10 09:52:15,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.54 | bwd_microstep: 758.77 | bwd_inner_microstep: 758.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1470
[2024-06-10 09:52:16,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 208.45 | bwd_microstep: 543.36 | bwd_inner_microstep: 543.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 09:52:17,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.54 | bwd_microstep: 695.93 | bwd_inner_microstep: 695.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2024
[2024-06-10 09:52:18,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.89 | bwd_microstep: 715.07 | bwd_inner_microstep: 715.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 09:52:20,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.25 | bwd_microstep: 1518.23 | bwd_inner_microstep: 1518.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3603
[2024-06-10 09:52:22,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.38 | bwd_microstep: 1376.07 | bwd_inner_microstep: 1376.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 09:52:24,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1492.42 | bwd_inner_microstep: 1492.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 09:52:26,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.94 | bwd_microstep: 1501.59 | bwd_inner_microstep: 1501.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 09:52:27,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.70 | bwd_microstep: 802.78 | bwd_inner_microstep: 802.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 09:52:29,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.35 | bwd_microstep: 1304.20 | bwd_inner_microstep: 1304.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3419
[2024-06-10 09:52:31,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.15 | bwd_microstep: 1409.42 | bwd_inner_microstep: 1409.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1927
[2024-06-10 09:52:32,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.26 | bwd_microstep: 726.85 | bwd_inner_microstep: 726.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3591
[2024-06-10 09:52:34,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 1564.85 | bwd_inner_microstep: 1564.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 09:52:41,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 09:52:41,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.56 | bwd_microstep: 5892.52 | bwd_inner_microstep: 1634.13 | bwd_allreduce_microstep: 4258.33 | step_microstep: 39.12
[2024-06-10 09:52:41,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14979.38 | bwd: 44291.13 | bwd_inner: 40031.80 | bwd_allreduce: 4258.62 | step: 40.68
{'loss': 1.2765, 'learning_rate': 3.2389983252319026e-05, 'epoch': 0.31}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478
[2024-06-10 09:52:43,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.52 | bwd_microstep: 1308.01 | bwd_inner_microstep: 1307.82 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3909
[2024-06-10 09:52:45,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.89 | bwd_microstep: 1682.34 | bwd_inner_microstep: 1682.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4277
[2024-06-10 09:52:47,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.82 | bwd_microstep: 1566.39 | bwd_inner_microstep: 1566.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 09:52:49,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1275.55 | bwd_inner_microstep: 1275.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 09:52:51,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.39 | bwd_microstep: 1551.46 | bwd_inner_microstep: 1551.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 09:52:53,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1482.65 | bwd_inner_microstep: 1482.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909
[2024-06-10 09:52:54,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.62 | bwd_microstep: 775.28 | bwd_inner_microstep: 775.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 09:52:56,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.64 | bwd_microstep: 1304.32 | bwd_inner_microstep: 1304.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3715
[2024-06-10 09:52:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1335.74 | bwd_inner_microstep: 1335.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 09:52:59,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.25 | bwd_microstep: 795.76 | bwd_inner_microstep: 795.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1867
[2024-06-10 09:53:00,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.54 | bwd_microstep: 741.57 | bwd_inner_microstep: 741.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3501
[2024-06-10 09:53:02,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.03 | bwd_microstep: 1253.35 | bwd_inner_microstep: 1253.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3591
[2024-06-10 09:53:04,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.55 | bwd_microstep: 1375.44 | bwd_inner_microstep: 1375.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-10 09:53:06,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1418.48 | bwd_inner_microstep: 1418.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3492
[2024-06-10 09:53:08,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.53 | bwd_microstep: 1545.22 | bwd_inner_microstep: 1545.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 09:53:10,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.01 | bwd_microstep: 1504.75 | bwd_inner_microstep: 1504.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 09:53:12,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1416.93 | bwd_inner_microstep: 1416.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 09:53:13,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1252.42 | bwd_inner_microstep: 1252.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3542
[2024-06-10 09:53:16,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.52 | bwd_microstep: 1575.65 | bwd_inner_microstep: 1575.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3629
[2024-06-10 09:53:18,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.86 | bwd_microstep: 1457.54 | bwd_inner_microstep: 1457.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3641
[2024-06-10 09:53:20,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.92 | bwd_microstep: 1765.42 | bwd_inner_microstep: 1765.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 09:53:22,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.63 | bwd_microstep: 1458.09 | bwd_inner_microstep: 1458.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 09:53:24,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1345.93 | bwd_inner_microstep: 1345.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629
[2024-06-10 09:53:26,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.55 | bwd_microstep: 1317.10 | bwd_inner_microstep: 1317.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 09:53:28,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1352.62 | bwd_inner_microstep: 1352.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3572
[2024-06-10 09:53:30,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.21 | bwd_microstep: 1529.13 | bwd_inner_microstep: 1529.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-10 09:53:31,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.41 | bwd_microstep: 1155.83 | bwd_inner_microstep: 1155.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 09:53:34,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.83 | bwd_microstep: 1557.82 | bwd_inner_microstep: 1557.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 09:53:35,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.24 | bwd_microstep: 1279.94 | bwd_inner_microstep: 1279.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3595
[2024-06-10 09:53:37,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.32 | bwd_microstep: 1244.90 | bwd_inner_microstep: 1244.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2943
[2024-06-10 09:53:39,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.87 | bwd_microstep: 1196.29 | bwd_inner_microstep: 1196.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3565
[2024-06-10 09:53:41,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 09:53:41,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.89 | bwd_microstep: 2190.48 | bwd_inner_microstep: 1774.76 | bwd_allreduce_microstep: 415.67 | step_microstep: 37.78
[2024-06-10 09:53:41,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16253.25 | bwd: 44012.42 | bwd_inner: 43595.70 | bwd_allreduce: 415.98 | step: 39.39
{'loss': 1.2993, 'learning_rate': 3.236049739110335e-05, 'epoch': 0.31}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 09:53:44,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1476.24 | bwd_inner_microstep: 1476.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3940
[2024-06-10 09:53:46,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.24 | bwd_microstep: 1497.54 | bwd_inner_microstep: 1497.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 09:53:48,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.62 | bwd_microstep: 1378.38 | bwd_inner_microstep: 1378.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3897
[2024-06-10 09:53:50,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.71 | bwd_microstep: 1482.34 | bwd_inner_microstep: 1482.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 09:53:51,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.32 | bwd_microstep: 1393.44 | bwd_inner_microstep: 1393.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 09:53:53,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.43 | bwd_microstep: 1181.97 | bwd_inner_microstep: 1181.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4099
[2024-06-10 09:53:56,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.60 | bwd_microstep: 1733.53 | bwd_inner_microstep: 1733.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1988
[2024-06-10 09:53:57,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.16 | bwd_microstep: 774.24 | bwd_inner_microstep: 774.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2975
[2024-06-10 09:53:58,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.45 | bwd_microstep: 1140.79 | bwd_inner_microstep: 1140.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 09:54:00,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.17 | bwd_microstep: 1484.56 | bwd_inner_microstep: 1484.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 09:54:02,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.63 | bwd_microstep: 1484.10 | bwd_inner_microstep: 1484.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3425
[2024-06-10 09:54:04,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1395.50 | bwd_inner_microstep: 1395.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397
[2024-06-10 09:54:06,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.12 | bwd_microstep: 1147.40 | bwd_inner_microstep: 1147.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3509
[2024-06-10 09:54:08,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1250.29 | bwd_inner_microstep: 1250.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 09:54:09,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | bwd_microstep: 1290.09 | bwd_inner_microstep: 1290.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3521
[2024-06-10 09:54:11,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.92 | bwd_microstep: 1326.88 | bwd_inner_microstep: 1326.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 09:54:13,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.40 | bwd_microstep: 1295.40 | bwd_inner_microstep: 1295.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 09:54:15,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.42 | bwd_microstep: 1311.69 | bwd_inner_microstep: 1311.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 09:54:17,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1376.42 | bwd_inner_microstep: 1376.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 09:54:18,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.30 | bwd_microstep: 808.79 | bwd_inner_microstep: 808.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 09:54:20,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1400.47 | bwd_inner_microstep: 1400.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 09:54:22,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.83 | bwd_microstep: 1300.66 | bwd_inner_microstep: 1300.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 09:54:23,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1401.45 | bwd_inner_microstep: 1401.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 09:54:26,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.91 | bwd_microstep: 1556.67 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.96
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2480
[2024-06-10 09:54:27,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.46 | bwd_microstep: 954.84 | bwd_inner_microstep: 954.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2195
[2024-06-10 09:54:28,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.00 | bwd_microstep: 860.30 | bwd_inner_microstep: 860.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 09:54:30,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.40 | bwd_microstep: 1657.59 | bwd_inner_microstep: 1657.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1342
[2024-06-10 09:54:31,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 198.50 | bwd_microstep: 518.56 | bwd_inner_microstep: 518.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 09:54:33,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.69 | bwd_microstep: 1401.87 | bwd_inner_microstep: 1401.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 09:54:35,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1551.49 | bwd_inner_microstep: 1551.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3387
[2024-06-10 09:54:37,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.50 | bwd_microstep: 1438.97 | bwd_inner_microstep: 1438.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 09:54:42,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 09:54:42,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.77 | bwd_microstep: 4294.16 | bwd_inner_microstep: 1571.77 | bwd_allreduce_microstep: 2722.34 | step_microstep: 38.09
[2024-06-10 09:54:42,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15673.03 | bwd: 44566.63 | bwd_inner: 41843.27 | bwd_allreduce: 2722.62 | step: 40.68
{'loss': 1.2707, 'learning_rate': 3.233096799628012e-05, 'epoch': 0.31}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3393
[2024-06-10 09:54:44,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.56 | bwd_microstep: 1366.44 | bwd_inner_microstep: 1366.33 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-10 09:54:45,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.93 | bwd_microstep: 791.59 | bwd_inner_microstep: 791.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4204
[2024-06-10 09:54:47,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.90 | bwd_microstep: 1656.67 | bwd_inner_microstep: 1656.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4189
[2024-06-10 09:54:50,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.53 | bwd_microstep: 1753.62 | bwd_inner_microstep: 1753.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 09:54:52,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.72 | bwd_microstep: 1386.77 | bwd_inner_microstep: 1386.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 09:54:53,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1247.25 | bwd_inner_microstep: 1247.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 750
[2024-06-10 09:54:54,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.70 | bwd_microstep: 300.48 | bwd_inner_microstep: 300.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 09:54:56,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1386.51 | bwd_inner_microstep: 1386.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 09:54:58,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1386.39 | bwd_inner_microstep: 1386.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 09:54:59,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.99 | bwd_microstep: 1292.43 | bwd_inner_microstep: 1292.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3542
[2024-06-10 09:55:01,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.94 | bwd_microstep: 1358.59 | bwd_inner_microstep: 1358.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2131
[2024-06-10 09:55:03,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.32 | bwd_microstep: 892.93 | bwd_inner_microstep: 892.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 09:55:05,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.60 | bwd_microstep: 1415.54 | bwd_inner_microstep: 1415.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1937
[2024-06-10 09:55:06,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.12 | bwd_microstep: 890.23 | bwd_inner_microstep: 890.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2681
[2024-06-10 09:55:07,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.74 | bwd_microstep: 1087.05 | bwd_inner_microstep: 1087.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3954
[2024-06-10 09:55:09,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.12 | bwd_microstep: 1506.38 | bwd_inner_microstep: 1506.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 09:55:11,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.00 | bwd_microstep: 1396.85 | bwd_inner_microstep: 1396.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3522
[2024-06-10 09:55:13,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.51 | bwd_microstep: 1230.25 | bwd_inner_microstep: 1230.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3805
[2024-06-10 09:55:15,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.77 | bwd_microstep: 1262.86 | bwd_inner_microstep: 1262.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3520
[2024-06-10 09:55:17,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.18 | bwd_microstep: 1436.83 | bwd_inner_microstep: 1436.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3817
[2024-06-10 09:55:19,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.92 | bwd_microstep: 1611.00 | bwd_inner_microstep: 1610.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 09:55:21,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.08 | bwd_microstep: 1357.45 | bwd_inner_microstep: 1357.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 09:55:23,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.84 | bwd_microstep: 1508.75 | bwd_inner_microstep: 1508.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 09:55:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.20 | bwd_microstep: 1525.83 | bwd_inner_microstep: 1525.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3671
[2024-06-10 09:55:27,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.46 | bwd_microstep: 1652.53 | bwd_inner_microstep: 1652.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 09:55:29,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1498.29 | bwd_inner_microstep: 1498.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3687
[2024-06-10 09:55:32,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.48 | bwd_microstep: 1726.17 | bwd_inner_microstep: 1726.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2008
[2024-06-10 09:55:33,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.71 | bwd_microstep: 900.09 | bwd_inner_microstep: 900.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 09:55:35,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1494.57 | bwd_inner_microstep: 1494.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657
[2024-06-10 09:55:37,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.52 | bwd_microstep: 1621.46 | bwd_inner_microstep: 1621.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 09:55:39,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.91 | bwd_microstep: 1432.86 | bwd_inner_microstep: 1432.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3787
[2024-06-10 09:55:45,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.35 | optimizer_step: 6.61
[2024-06-10 09:55:45,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.75 | bwd_microstep: 4689.61 | bwd_inner_microstep: 1918.42 | bwd_allreduce_microstep: 2771.12 | step_microstep: 38.91
[2024-06-10 09:55:45,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16048.02 | bwd: 46064.30 | bwd_inner: 43292.16 | bwd_allreduce: 2771.41 | step: 40.76
{'loss': 1.2271, 'learning_rate': 3.23013951718517e-05, 'epoch': 0.31}
 31%|███       | 531/1726 [9:13:15<20:07:05, 60.61s/it]
 31%|███       | 532/1726 [9:14:18<20:18:29, 61.23s/it]
 31%|███       | 533/1726 [9:15:18<20:07:47, 60.74s/it]
 31%|███       | 534/1726 [9:16:18<20:05:58, 60.70s/it]
 31%|███       | 535/1726 [9:17:19<20:04:14, 60.67s/it]
 31%|███       | 536/1726 [9:18:21<20:14:01, 61.21s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 09:55:46,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.08 | bwd_microstep: 1330.65 | bwd_inner_microstep: 1330.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4049
[2024-06-10 09:55:49,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.27 | bwd_microstep: 1614.46 | bwd_inner_microstep: 1614.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 09:55:50,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.93 | bwd_microstep: 1254.39 | bwd_inner_microstep: 1254.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 09:55:52,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.11 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 09:55:54,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1352.29 | bwd_inner_microstep: 1352.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1865
[2024-06-10 09:55:55,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.21 | bwd_microstep: 742.20 | bwd_inner_microstep: 742.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 09:55:57,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1249.67 | bwd_inner_microstep: 1249.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 09:55:58,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.50 | bwd_microstep: 1247.86 | bwd_inner_microstep: 1247.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2263
[2024-06-10 09:56:00,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.60 | bwd_microstep: 876.31 | bwd_inner_microstep: 876.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3733
[2024-06-10 09:56:02,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.23 | bwd_microstep: 1701.35 | bwd_inner_microstep: 1701.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 09:56:04,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1382.76 | bwd_inner_microstep: 1382.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 09:56:06,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1343.95 | bwd_inner_microstep: 1343.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 09:56:08,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.83 | bwd_microstep: 1616.71 | bwd_inner_microstep: 1616.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434
[2024-06-10 09:56:10,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.97 | bwd_microstep: 1379.17 | bwd_inner_microstep: 1379.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1978
[2024-06-10 09:56:11,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.20 | bwd_microstep: 896.69 | bwd_inner_microstep: 896.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3638
[2024-06-10 09:56:13,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.42 | bwd_microstep: 1542.52 | bwd_inner_microstep: 1542.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3624
[2024-06-10 09:56:16,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.22 | bwd_microstep: 1656.95 | bwd_inner_microstep: 1656.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3530
[2024-06-10 09:56:17,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1361.68 | bwd_inner_microstep: 1361.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3833
[2024-06-10 09:56:19,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.93 | bwd_microstep: 1460.81 | bwd_inner_microstep: 1460.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-10 09:56:21,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.31 | bwd_microstep: 882.33 | bwd_inner_microstep: 882.11 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3675
[2024-06-10 09:56:23,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1327.54 | bwd_inner_microstep: 1327.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2288
[2024-06-10 09:56:24,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.46 | bwd_microstep: 850.92 | bwd_inner_microstep: 850.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 09:56:26,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.60 | bwd_microstep: 1393.90 | bwd_inner_microstep: 1393.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3530
[2024-06-10 09:56:28,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.65 | bwd_microstep: 1561.88 | bwd_inner_microstep: 1561.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 09:56:30,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.15 | bwd_microstep: 1659.01 | bwd_inner_microstep: 1658.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076
[2024-06-10 09:56:31,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.57 | bwd_microstep: 921.54 | bwd_inner_microstep: 921.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549
[2024-06-10 09:56:33,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.93 | bwd_microstep: 1332.91 | bwd_inner_microstep: 1332.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 09:56:35,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.79 | bwd_microstep: 1442.21 | bwd_inner_microstep: 1442.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3553
[2024-06-10 09:56:37,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.85 | bwd_microstep: 1541.90 | bwd_inner_microstep: 1541.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 09:56:39,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1399.17 | bwd_inner_microstep: 1399.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2223
[2024-06-10 09:56:41,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.49 | bwd_microstep: 965.42 | bwd_inner_microstep: 965.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3589
[2024-06-10 09:56:47,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 09:56:47,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.34 | bwd_microstep: 5614.27 | bwd_inner_microstep: 1926.47 | bwd_allreduce_microstep: 3687.74 | step_microstep: 38.71
[2024-06-10 09:56:47,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15802.79 | bwd: 46184.60 | bwd_inner: 42495.78 | bwd_allreduce: 3688.05 | step: 40.34
{'loss': 1.2966, 'learning_rate': 3.227177902197344e-05, 'epoch': 0.31}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3388
[2024-06-10 09:56:49,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.42 | bwd_microstep: 1236.22 | bwd_inner_microstep: 1236.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3905
[2024-06-10 09:56:51,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.00 | bwd_microstep: 1483.63 | bwd_inner_microstep: 1483.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 09:56:53,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.43 | bwd_microstep: 1375.81 | bwd_inner_microstep: 1375.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 09:56:54,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.18 | bwd_microstep: 1341.54 | bwd_inner_microstep: 1341.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4214
[2024-06-10 09:56:57,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.47 | bwd_microstep: 1560.15 | bwd_inner_microstep: 1560.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764
[2024-06-10 09:56:59,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.79 | bwd_microstep: 1436.70 | bwd_inner_microstep: 1436.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 09:57:00,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.78 | bwd_microstep: 1283.77 | bwd_inner_microstep: 1283.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 09:57:02,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.54 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1481.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2199
[2024-06-10 09:57:04,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.21 | bwd_microstep: 956.39 | bwd_inner_microstep: 956.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 09:57:05,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.57 | bwd_microstep: 1286.85 | bwd_inner_microstep: 1286.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 09:57:07,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1380.37 | bwd_inner_microstep: 1380.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 09:57:09,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.24 | bwd_microstep: 1286.96 | bwd_inner_microstep: 1286.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3493
[2024-06-10 09:57:11,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1346.62 | bwd_inner_microstep: 1346.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 09:57:13,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1496.48 | bwd_inner_microstep: 1496.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3644
[2024-06-10 09:57:15,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.06 | bwd_microstep: 1314.80 | bwd_inner_microstep: 1314.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3513
[2024-06-10 09:57:17,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1420.06 | bwd_inner_microstep: 1420.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 09:57:19,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.33 | bwd_microstep: 1455.05 | bwd_inner_microstep: 1455.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2011
[2024-06-10 09:57:20,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.27 | bwd_microstep: 851.48 | bwd_inner_microstep: 851.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 09:57:22,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1293.16 | bwd_inner_microstep: 1293.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3521
[2024-06-10 09:57:24,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.59 | bwd_microstep: 1199.19 | bwd_inner_microstep: 1199.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3713
[2024-06-10 09:57:25,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.43 | bwd_microstep: 1367.80 | bwd_inner_microstep: 1367.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618
[2024-06-10 09:57:28,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.63 | bwd_microstep: 1614.08 | bwd_inner_microstep: 1614.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2426
[2024-06-10 09:57:29,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.14 | bwd_microstep: 845.70 | bwd_inner_microstep: 845.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755
[2024-06-10 09:57:31,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 1344.62 | bwd_inner_microstep: 1344.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3823
[2024-06-10 09:57:33,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.81 | bwd_microstep: 1693.86 | bwd_inner_microstep: 1693.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3598
[2024-06-10 09:57:35,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.89 | bwd_microstep: 1706.30 | bwd_inner_microstep: 1706.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 09:57:37,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1512.61 | bwd_inner_microstep: 1512.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 09:57:40,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.28 | bwd_microstep: 1538.43 | bwd_inner_microstep: 1538.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3727
[2024-06-10 09:57:42,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1399.31 | bwd_inner_microstep: 1399.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 09:57:44,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.87 | bwd_microstep: 1501.84 | bwd_inner_microstep: 1501.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3772
[2024-06-10 09:57:46,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.46 | bwd_microstep: 1574.20 | bwd_inner_microstep: 1574.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3650
[2024-06-10 09:57:48,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.19 | optimizer_step: 6.64
[2024-06-10 09:57:48,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.44 | bwd_microstep: 1825.60 | bwd_inner_microstep: 1817.83 | bwd_allreduce_microstep: 7.71 | step_microstep: 37.85
[2024-06-10 09:57:48,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16611.71 | bwd: 44410.63 | bwd_inner: 44402.00 | bwd_allreduce: 7.94 | step: 39.40
{'loss': 1.3197, 'learning_rate': 3.224211965095326e-05, 'epoch': 0.31}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 09:57:50,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1240.77 | bwd_inner_microstep: 1240.70 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 09:57:52,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.70 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 09:57:54,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.49 | bwd_microstep: 1657.53 | bwd_inner_microstep: 1657.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 09:57:56,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.56 | bwd_microstep: 1559.41 | bwd_inner_microstep: 1559.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 09:57:58,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.50 | bwd_microstep: 1255.40 | bwd_inner_microstep: 1255.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-10 09:58:00,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.26 | bwd_microstep: 1637.06 | bwd_inner_microstep: 1637.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 09:58:01,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.36 | bwd_microstep: 803.00 | bwd_inner_microstep: 802.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-10 09:58:03,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 09:58:05,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.51 | bwd_microstep: 1394.91 | bwd_inner_microstep: 1394.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 09:58:07,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1388.91 | bwd_inner_microstep: 1388.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-10 09:58:08,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.80 | bwd_microstep: 813.73 | bwd_inner_microstep: 813.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3496
[2024-06-10 09:58:10,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.45 | bwd_microstep: 1316.86 | bwd_inner_microstep: 1316.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140
[2024-06-10 09:58:11,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.28 | bwd_microstep: 834.55 | bwd_inner_microstep: 834.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433
[2024-06-10 09:58:13,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.31 | bwd_microstep: 1190.85 | bwd_inner_microstep: 1190.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1896
[2024-06-10 09:58:14,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.21 | bwd_microstep: 729.07 | bwd_inner_microstep: 729.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 09:58:16,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.57 | bwd_microstep: 1295.04 | bwd_inner_microstep: 1295.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 09:58:18,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.33 | bwd_microstep: 1499.71 | bwd_inner_microstep: 1499.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 09:58:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.89 | bwd_microstep: 1592.94 | bwd_inner_microstep: 1592.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3401
[2024-06-10 09:58:22,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.22 | bwd_microstep: 1505.98 | bwd_inner_microstep: 1505.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 09:58:24,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1491.05 | bwd_inner_microstep: 1491.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 09:58:26,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.36 | bwd_microstep: 1343.72 | bwd_inner_microstep: 1343.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 09:58:28,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.85 | bwd_microstep: 1481.44 | bwd_inner_microstep: 1481.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3662
[2024-06-10 09:58:30,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.03 | bwd_microstep: 1477.69 | bwd_inner_microstep: 1477.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2009
[2024-06-10 09:58:31,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.98 | bwd_microstep: 865.59 | bwd_inner_microstep: 865.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2036
[2024-06-10 09:58:32,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.15 | bwd_microstep: 811.35 | bwd_inner_microstep: 811.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605
[2024-06-10 09:58:34,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.35 | bwd_microstep: 1539.35 | bwd_inner_microstep: 1539.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 09:58:37,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.99 | bwd_microstep: 1542.31 | bwd_inner_microstep: 1542.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 09:58:39,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.64 | bwd_microstep: 1450.90 | bwd_inner_microstep: 1450.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 09:58:41,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.11 | bwd_microstep: 1445.45 | bwd_inner_microstep: 1445.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058
[2024-06-10 09:58:42,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.40 | bwd_microstep: 818.63 | bwd_inner_microstep: 818.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 09:58:44,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1559.41 | bwd_inner_microstep: 1559.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839
[2024-06-10 09:58:51,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.38 | optimizer_step: 6.60
[2024-06-10 09:58:51,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.74 | bwd_microstep: 6881.58 | bwd_inner_microstep: 1762.50 | bwd_allreduce_microstep: 5119.01 | step_microstep: 39.00
[2024-06-10 09:58:51,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15621.05 | bwd: 47092.66 | bwd_inner: 41972.66 | bwd_allreduce: 5119.28 | step: 40.70
{'loss': 1.2778, 'learning_rate': 3.221241716325131e-05, 'epoch': 0.31}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-10 09:58:54,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1578.18 | bwd_inner_microstep: 1577.98 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.18
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878
[2024-06-10 09:58:55,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.72 | bwd_microstep: 741.90 | bwd_inner_microstep: 740.92 | bwd_allreduce_microstep: 0.93 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3864
[2024-06-10 09:58:57,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.92 | bwd_microstep: 1462.17 | bwd_inner_microstep: 1462.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 09:58:59,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.02 | bwd_microstep: 1653.00 | bwd_inner_microstep: 1652.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 09:59:01,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.71 | bwd_microstep: 1245.17 | bwd_inner_microstep: 1245.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1888
[2024-06-10 09:59:02,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.42 | bwd_microstep: 773.74 | bwd_inner_microstep: 773.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 09:59:03,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.62 | bwd_microstep: 1279.90 | bwd_inner_microstep: 1279.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898
[2024-06-10 09:59:04,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.85 | bwd_microstep: 683.67 | bwd_inner_microstep: 683.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729
[2024-06-10 09:59:06,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.10 | bwd_microstep: 1430.64 | bwd_inner_microstep: 1430.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1865
[2024-06-10 09:59:07,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.27 | bwd_microstep: 741.81 | bwd_inner_microstep: 741.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 09:59:09,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.21 | bwd_microstep: 1481.91 | bwd_inner_microstep: 1481.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 09:59:11,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.70 | bwd_microstep: 1286.42 | bwd_inner_microstep: 1286.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 09:59:13,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.92 | bwd_microstep: 1486.58 | bwd_inner_microstep: 1486.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3445
[2024-06-10 09:59:15,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.98 | bwd_microstep: 1315.82 | bwd_inner_microstep: 1315.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 09:59:17,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.78 | bwd_microstep: 1484.85 | bwd_inner_microstep: 1484.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 09:59:19,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1511.63 | bwd_inner_microstep: 1511.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 09:59:21,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.16 | bwd_microstep: 1419.23 | bwd_inner_microstep: 1419.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3690
[2024-06-10 09:59:23,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.55 | bwd_microstep: 1661.70 | bwd_inner_microstep: 1661.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837
[2024-06-10 09:59:26,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.03 | bwd_microstep: 1660.28 | bwd_inner_microstep: 1660.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2084
[2024-06-10 09:59:27,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.51 | bwd_microstep: 820.38 | bwd_inner_microstep: 820.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 09:59:28,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.12 | bwd_microstep: 1157.07 | bwd_inner_microstep: 1157.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542
[2024-06-10 09:59:30,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.74 | bwd_microstep: 1326.32 | bwd_inner_microstep: 1326.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3616
[2024-06-10 09:59:32,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.05 | bwd_microstep: 1561.31 | bwd_inner_microstep: 1561.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3546
[2024-06-10 09:59:35,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.22 | bwd_microstep: 1521.47 | bwd_inner_microstep: 1521.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3811
[2024-06-10 09:59:37,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.30 | bwd_microstep: 1638.12 | bwd_inner_microstep: 1638.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3828
[2024-06-10 09:59:39,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.50 | bwd_microstep: 1752.08 | bwd_inner_microstep: 1752.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2018
[2024-06-10 09:59:40,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.20 | bwd_microstep: 865.47 | bwd_inner_microstep: 865.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 09:59:43,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1649.85 | bwd_inner_microstep: 1649.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 09:59:45,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1395.85 | bwd_inner_microstep: 1395.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-10 09:59:46,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.29 | bwd_microstep: 1307.39 | bwd_inner_microstep: 1307.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3777
[2024-06-10 09:59:48,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.73 | bwd_microstep: 1511.14 | bwd_inner_microstep: 1511.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2264
[2024-06-10 09:59:53,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 09:59:53,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.07 | bwd_microstep: 3723.85 | bwd_inner_microstep: 992.31 | bwd_allreduce_microstep: 2731.48 | step_microstep: 39.47
[2024-06-10 09:59:53,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15781.54 | bwd: 45128.92 | bwd_inner: 42395.41 | bwd_allreduce: 2732.73 | step: 41.36
{'loss': 1.288, 'learning_rate': 3.21826716634796e-05, 'epoch': 0.31}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 09:59:55,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1367.93 | bwd_inner_microstep: 1367.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 09:59:56,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1378.26 | bwd_inner_microstep: 1378.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2043
[2024-06-10 09:59:58,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.69 | bwd_microstep: 838.22 | bwd_inner_microstep: 838.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 09:59:59,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1383.46 | bwd_inner_microstep: 1383.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3413
[2024-06-10 10:00:01,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.69 | bwd_microstep: 1181.71 | bwd_inner_microstep: 1181.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 10:00:03,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.77 | bwd_microstep: 1546.47 | bwd_inner_microstep: 1546.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 10:00:04,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.45 | bwd_microstep: 790.35 | bwd_inner_microstep: 790.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 10:00:06,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1282.98 | bwd_inner_microstep: 1282.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3506
[2024-06-10 10:00:08,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.07 | bwd_microstep: 1221.95 | bwd_inner_microstep: 1221.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3713
[2024-06-10 10:00:10,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1333.60 | bwd_inner_microstep: 1333.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 10:00:12,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1386.39 | bwd_inner_microstep: 1386.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 10:00:14,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.83 | bwd_microstep: 1409.40 | bwd_inner_microstep: 1409.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734
[2024-06-10 10:00:16,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.25 | bwd_microstep: 1525.67 | bwd_inner_microstep: 1525.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 10:00:18,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1383.11 | bwd_inner_microstep: 1383.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3655
[2024-06-10 10:00:20,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.58 | bwd_microstep: 1768.20 | bwd_inner_microstep: 1768.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2002
[2024-06-10 10:00:21,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.91 | bwd_microstep: 832.04 | bwd_inner_microstep: 832.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 10:00:23,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1349.24 | bwd_inner_microstep: 1349.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-10 10:00:25,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1407.35 | bwd_inner_microstep: 1407.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426
[2024-06-10 10:00:27,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.44 | bwd_microstep: 1542.77 | bwd_inner_microstep: 1542.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-10 10:00:29,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.46 | bwd_microstep: 1528.50 | bwd_inner_microstep: 1528.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 10:00:31,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1289.43 | bwd_inner_microstep: 1289.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 10:00:33,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.51 | bwd_microstep: 1255.22 | bwd_inner_microstep: 1255.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3678
[2024-06-10 10:00:35,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.67 | bwd_microstep: 1374.71 | bwd_inner_microstep: 1374.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 10:00:37,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.66 | bwd_microstep: 1514.35 | bwd_inner_microstep: 1514.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 10:00:39,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.53 | bwd_microstep: 1384.86 | bwd_inner_microstep: 1384.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3567
[2024-06-10 10:00:41,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.19 | bwd_microstep: 1666.20 | bwd_inner_microstep: 1666.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3383
[2024-06-10 10:00:43,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.82 | bwd_microstep: 1437.41 | bwd_inner_microstep: 1437.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 10:00:45,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1396.40 | bwd_inner_microstep: 1396.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3746
[2024-06-10 10:00:47,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.23 | bwd_microstep: 1344.15 | bwd_inner_microstep: 1344.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 10:00:49,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1550.79 | bwd_inner_microstep: 1550.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-10 10:00:50,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.78 | bwd_microstep: 682.35 | bwd_inner_microstep: 682.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3575
[2024-06-10 10:00:52,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.17 | optimizer_step: 6.57
[2024-06-10 10:00:52,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.69 | bwd_microstep: 1951.55 | bwd_inner_microstep: 1573.36 | bwd_allreduce_microstep: 378.14 | step_microstep: 37.85
[2024-06-10 10:00:52,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16033.27 | bwd: 43305.03 | bwd_inner: 42925.99 | bwd_allreduce: 378.37 | step: 39.34
{'loss': 1.2859, 'learning_rate': 3.215288325640161e-05, 'epoch': 0.31}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 10:00:54,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1282.31 | bwd_inner_microstep: 1282.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4508
[2024-06-10 10:00:56,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.60 | bwd_microstep: 1743.46 | bwd_inner_microstep: 1743.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3845
[2024-06-10 10:00:58,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1362.62 | bwd_inner_microstep: 1362.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 10:01:00,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1248.13 | bwd_inner_microstep: 1248.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 10:01:02,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1344.28 | bwd_inner_microstep: 1344.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1873
[2024-06-10 10:01:03,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 709.03 | bwd_inner_microstep: 709.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1867
[2024-06-10 10:01:04,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.77 | bwd_microstep: 741.67 | bwd_inner_microstep: 741.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 10:01:06,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.76 | bwd_microstep: 1386.86 | bwd_inner_microstep: 1386.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 10:01:08,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.30 | bwd_microstep: 1491.23 | bwd_inner_microstep: 1491.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 10:01:10,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.45 | bwd_microstep: 1285.21 | bwd_inner_microstep: 1285.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3505
[2024-06-10 10:01:12,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 1529.87 | bwd_inner_microstep: 1529.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2059
[2024-06-10 10:01:13,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.76 | bwd_microstep: 972.12 | bwd_inner_microstep: 972.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3413
[2024-06-10 10:01:15,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.15 | bwd_microstep: 1488.11 | bwd_inner_microstep: 1488.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 10:01:17,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.44 | bwd_microstep: 1480.70 | bwd_inner_microstep: 1480.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 10:01:19,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.81 | bwd_microstep: 1432.17 | bwd_inner_microstep: 1432.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3631
[2024-06-10 10:01:21,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.11 | bwd_microstep: 1216.17 | bwd_inner_microstep: 1216.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 10:01:23,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.06 | bwd_microstep: 1487.35 | bwd_inner_microstep: 1487.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948
[2024-06-10 10:01:24,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.41 | bwd_microstep: 699.74 | bwd_inner_microstep: 699.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2428
[2024-06-10 10:01:25,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.92 | bwd_microstep: 942.88 | bwd_inner_microstep: 942.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 10:01:28,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.09 | bwd_microstep: 1656.89 | bwd_inner_microstep: 1656.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 10:01:29,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.38 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1967
[2024-06-10 10:01:30,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.00 | bwd_microstep: 701.44 | bwd_inner_microstep: 701.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 10:01:33,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.00 | bwd_microstep: 1662.58 | bwd_inner_microstep: 1662.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 10:01:35,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.46 | bwd_microstep: 1493.84 | bwd_inner_microstep: 1493.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 10:01:37,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.06 | bwd_microstep: 1393.09 | bwd_inner_microstep: 1393.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 10:01:39,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.59 | bwd_microstep: 1555.86 | bwd_inner_microstep: 1555.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2184
[2024-06-10 10:01:40,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.91 | bwd_microstep: 809.15 | bwd_inner_microstep: 809.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3815
[2024-06-10 10:01:42,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.21 | bwd_microstep: 1718.22 | bwd_inner_microstep: 1718.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 10:01:44,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1338.91 | bwd_inner_microstep: 1338.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566
[2024-06-10 10:01:46,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.47 | bwd_microstep: 1594.90 | bwd_inner_microstep: 1594.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555
[2024-06-10 10:01:48,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1424.09 | bwd_inner_microstep: 1424.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2697
[2024-06-10 10:01:54,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.33 | optimizer_step: 6.58
[2024-06-10 10:01:54,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.10 | bwd_microstep: 5565.10 | bwd_inner_microstep: 1393.27 | bwd_allreduce_microstep: 4171.75 | step_microstep: 38.55
[2024-06-10 10:01:54,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15638.62 | bwd: 46143.75 | bwd_inner: 41971.07 | bwd_allreduce: 4171.99 | step: 40.03
| 536/1726 [9:18:21<20:14:01, 61.21s/it]
 31%|███       | 537/1726 [9:19:24<20:19:42, 61.55s/it]
 31%|███       | 538/1726 [9:20:25<20:17:40, 61.50s/it]
 31%|███       | 539/1726 [9:21:28<20:25:56, 61.97s/it]
 31%|███▏      | 540/1726 [9:22:29<20:20:47, 61.76s/it]
 31%|███▏      | 541/1726 [9:23:29<20:07:26, 61.14s/it]
 31%|███▏      | 542/1726 [9:24:31<20:12:15, 61.43s/it]
{'loss': 1.2645, 'learning_rate': 3.212305204693198e-05, 'epoch': 0.31}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3605
[2024-06-10 10:01:56,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.43 | bwd_microstep: 1205.86 | bwd_inner_microstep: 1205.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2496
[2024-06-10 10:01:57,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.34 | bwd_microstep: 1019.04 | bwd_inner_microstep: 1019.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 10:02:00,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.79 | bwd_microstep: 1546.52 | bwd_inner_microstep: 1546.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1531
[2024-06-10 10:02:00,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 228.72 | bwd_microstep: 591.67 | bwd_inner_microstep: 591.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702
[2024-06-10 10:02:03,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.69 | bwd_microstep: 1625.97 | bwd_inner_microstep: 1625.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-10 10:02:04,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.61 | bwd_microstep: 1241.54 | bwd_inner_microstep: 1241.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1359
[2024-06-10 10:02:05,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 223.70 | bwd_microstep: 581.83 | bwd_inner_microstep: 581.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3626
[2024-06-10 10:02:07,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.05 | bwd_microstep: 1215.45 | bwd_inner_microstep: 1215.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-10 10:02:09,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.31 | bwd_microstep: 1628.10 | bwd_inner_microstep: 1628.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2077
[2024-06-10 10:02:10,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.15 | bwd_microstep: 789.60 | bwd_inner_microstep: 789.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3686
[2024-06-10 10:02:12,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.11 | bwd_microstep: 1549.95 | bwd_inner_microstep: 1549.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3517
[2024-06-10 10:02:14,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1418.35 | bwd_inner_microstep: 1418.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 10:02:16,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1288.79 | bwd_inner_microstep: 1288.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2479
[2024-06-10 10:02:18,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.35 | bwd_microstep: 1048.83 | bwd_inner_microstep: 1048.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657
[2024-06-10 10:02:20,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.03 | bwd_microstep: 1715.65 | bwd_inner_microstep: 1715.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3723
[2024-06-10 10:02:22,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.72 | bwd_microstep: 1338.80 | bwd_inner_microstep: 1338.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446
[2024-06-10 10:02:23,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.46 | bwd_microstep: 1190.22 | bwd_inner_microstep: 1190.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 10:02:25,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.39 | bwd_microstep: 809.88 | bwd_inner_microstep: 809.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 10:02:27,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.30 | bwd_microstep: 1659.48 | bwd_inner_microstep: 1659.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3477
[2024-06-10 10:02:29,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.25 | bwd_microstep: 1249.02 | bwd_inner_microstep: 1249.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-10 10:02:30,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.51 | bwd_microstep: 1311.07 | bwd_inner_microstep: 1311.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3725
[2024-06-10 10:02:32,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.48 | bwd_microstep: 1414.87 | bwd_inner_microstep: 1414.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 10:02:33,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.95 | bwd_microstep: 803.14 | bwd_inner_microstep: 803.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 10:02:36,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.58 | bwd_microstep: 1556.54 | bwd_inner_microstep: 1556.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3791
[2024-06-10 10:02:38,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.35 | bwd_microstep: 1478.29 | bwd_inner_microstep: 1478.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 10:02:40,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.87 | bwd_microstep: 1423.85 | bwd_inner_microstep: 1423.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3424
[2024-06-10 10:02:42,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1410.55 | bwd_inner_microstep: 1410.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3646
[2024-06-10 10:02:43,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.33 | bwd_microstep: 1347.52 | bwd_inner_microstep: 1347.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037
[2024-06-10 10:02:45,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.59 | bwd_microstep: 808.66 | bwd_inner_microstep: 808.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778
[2024-06-10 10:02:46,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.32 | bwd_microstep: 1350.80 | bwd_inner_microstep: 1350.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-10 10:02:48,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.25 | bwd_microstep: 804.16 | bwd_inner_microstep: 804.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3737
[2024-06-10 10:02:56,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.84 | optimizer_step: 6.58
[2024-06-10 10:02:56,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.50 | bwd_microstep: 7408.13 | bwd_inner_microstep: 1681.69 | bwd_allreduce_microstep: 5726.37 | step_microstep: 39.41
[2024-06-10 10:02:56,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14975.81 | bwd: 45832.17 | bwd_inner: 40104.84 | bwd_allreduce: 5726.62 | step: 41.07
{'loss': 1.3068, 'learning_rate': 3.2093178140136064e-05, 'epoch': 0.31}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 10:02:58,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1468.74 | bwd_inner_microstep: 1468.62 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 10:02:59,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.52 | bwd_microstep: 1280.56 | bwd_inner_microstep: 1280.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 10:03:00,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.95 | bwd_microstep: 677.85 | bwd_inner_microstep: 677.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 10:03:02,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.76 | bwd_microstep: 1534.03 | bwd_inner_microstep: 1534.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4257
[2024-06-10 10:03:05,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 650.57 | bwd_microstep: 1760.09 | bwd_inner_microstep: 1760.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 10:03:07,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.75 | bwd_microstep: 1385.63 | bwd_inner_microstep: 1385.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 10:03:08,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1245.46 | bwd_inner_microstep: 1245.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 10:03:10,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.69 | bwd_microstep: 1149.09 | bwd_inner_microstep: 1149.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1908
[2024-06-10 10:03:11,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.49 | bwd_microstep: 716.46 | bwd_inner_microstep: 716.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 10:03:13,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.27 | bwd_microstep: 1389.47 | bwd_inner_microstep: 1389.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3683
[2024-06-10 10:03:15,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.70 | bwd_microstep: 1551.20 | bwd_inner_microstep: 1551.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 10:03:17,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.38 | bwd_microstep: 1185.53 | bwd_inner_microstep: 1185.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 10:03:19,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1443.44 | bwd_inner_microstep: 1443.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3504
[2024-06-10 10:03:21,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.66 | bwd_microstep: 1550.81 | bwd_inner_microstep: 1550.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422
[2024-06-10 10:03:23,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.37 | bwd_microstep: 1367.93 | bwd_inner_microstep: 1367.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1914
[2024-06-10 10:03:24,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.28 | bwd_microstep: 746.79 | bwd_inner_microstep: 746.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3519
[2024-06-10 10:03:26,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.99 | bwd_microstep: 1682.33 | bwd_inner_microstep: 1682.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3570
[2024-06-10 10:03:28,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.18 | bwd_microstep: 1240.11 | bwd_inner_microstep: 1240.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3628
[2024-06-10 10:03:30,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.19 | bwd_microstep: 1808.26 | bwd_inner_microstep: 1808.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 10:03:32,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.89 | bwd_microstep: 1399.41 | bwd_inner_microstep: 1399.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 883
[2024-06-10 10:03:33,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 152.14 | bwd_microstep: 398.70 | bwd_inner_microstep: 398.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 10:03:35,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.62 | bwd_microstep: 1381.06 | bwd_inner_microstep: 1381.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 10:03:37,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.62 | bwd_microstep: 1626.42 | bwd_inner_microstep: 1626.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 10:03:39,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1596.02 | bwd_inner_microstep: 1595.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817
[2024-06-10 10:03:42,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.97 | bwd_microstep: 1750.63 | bwd_inner_microstep: 1750.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2128
[2024-06-10 10:03:43,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.11 | bwd_microstep: 1025.38 | bwd_inner_microstep: 1025.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 10:03:45,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.24 | bwd_microstep: 1392.64 | bwd_inner_microstep: 1392.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032
[2024-06-10 10:03:46,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.75 | bwd_microstep: 903.81 | bwd_inner_microstep: 903.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 10:03:48,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1253.64 | bwd_inner_microstep: 1253.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.25
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 10:03:50,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.44 | bwd_microstep: 1400.49 | bwd_inner_microstep: 1400.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 10:03:52,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.30 | bwd_microstep: 1627.26 | bwd_inner_microstep: 1627.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2200
[2024-06-10 10:03:59,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.59 | optimizer_gradients: 4.30 | optimizer_step: 6.62
[2024-06-10 10:03:59,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.13 | bwd_microstep: 6802.47 | bwd_inner_microstep: 976.83 | bwd_allreduce_microstep: 5825.58 | step_microstep: 38.75
[2024-06-10 10:03:59,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15600.32 | bwd: 47741.72 | bwd_inner: 41915.13 | bwd_allreduce: 5825.87 | step: 41.60
{'loss': 1.269, 'learning_rate': 3.20632616412296e-05, 'epoch': 0.32}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466
[2024-06-10 10:04:01,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.02 | bwd_microstep: 1554.62 | bwd_inner_microstep: 1554.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865
[2024-06-10 10:04:04,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.88 | bwd_microstep: 1557.89 | bwd_inner_microstep: 1557.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-10 10:04:05,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.64 | bwd_microstep: 1350.46 | bwd_inner_microstep: 1350.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-10 10:04:08,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.56 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 10:04:10,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.18 | bwd_microstep: 1552.70 | bwd_inner_microstep: 1552.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 10:04:12,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1343.98 | bwd_inner_microstep: 1343.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 10:04:13,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1243.97 | bwd_inner_microstep: 1243.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 10:04:15,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1284.14 | bwd_inner_microstep: 1284.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1883
[2024-06-10 10:04:16,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.15 | bwd_microstep: 681.08 | bwd_inner_microstep: 681.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592
[2024-06-10 10:04:18,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.00 | bwd_microstep: 1307.71 | bwd_inner_microstep: 1307.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 10:04:20,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.10 | bwd_microstep: 1285.90 | bwd_inner_microstep: 1285.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 10:04:22,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.26 | bwd_microstep: 1482.24 | bwd_inner_microstep: 1482.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3638
[2024-06-10 10:04:24,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.05 | bwd_microstep: 1812.01 | bwd_inner_microstep: 1811.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 10:04:26,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.75 | bwd_microstep: 1379.49 | bwd_inner_microstep: 1379.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3511
[2024-06-10 10:04:28,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1550.40 | bwd_inner_microstep: 1550.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 10:04:30,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.80 | bwd_microstep: 1615.05 | bwd_inner_microstep: 1615.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3495
[2024-06-10 10:04:32,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.47 | bwd_microstep: 1429.25 | bwd_inner_microstep: 1429.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 10:04:34,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.82 | bwd_microstep: 1514.10 | bwd_inner_microstep: 1514.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 10:04:36,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.85 | bwd_microstep: 1487.11 | bwd_inner_microstep: 1487.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2671
[2024-06-10 10:04:38,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.89 | bwd_microstep: 1155.10 | bwd_inner_microstep: 1155.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2683
[2024-06-10 10:04:40,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.75 | bwd_microstep: 1033.88 | bwd_inner_microstep: 1033.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 10:04:41,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1407.95 | bwd_inner_microstep: 1407.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838
[2024-06-10 10:04:43,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.93 | bwd_microstep: 1454.11 | bwd_inner_microstep: 1454.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3549
[2024-06-10 10:04:45,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.92 | bwd_microstep: 1456.56 | bwd_inner_microstep: 1456.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 10:04:48,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1511.87 | bwd_inner_microstep: 1511.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 10:04:49,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.09 | bwd_microstep: 1190.79 | bwd_inner_microstep: 1190.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 10:04:51,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.66 | bwd_microstep: 1251.54 | bwd_inner_microstep: 1251.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 10:04:53,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1407.68 | bwd_inner_microstep: 1407.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763
[2024-06-10 10:04:55,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.16 | bwd_microstep: 1634.68 | bwd_inner_microstep: 1634.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 10:04:56,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.44 | bwd_microstep: 793.43 | bwd_inner_microstep: 793.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3035
[2024-06-10 10:04:58,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.97 | bwd_microstep: 1329.21 | bwd_inner_microstep: 1329.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3805
[2024-06-10 10:05:01,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.17 | optimizer_step: 6.64
[2024-06-10 10:05:01,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.24 | bwd_microstep: 2444.81 | bwd_inner_microstep: 1733.56 | bwd_allreduce_microstep: 711.19 | step_microstep: 37.66
[2024-06-10 10:05:01,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16494.27 | bwd: 45062.71 | bwd_inner: 44350.60 | bwd_allreduce: 711.42 | step: 39.56
{'loss': 1.3437, 'learning_rate': 3.2033302655578343e-05, 'epoch': 0.32}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 10:05:03,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.12 | bwd_microstep: 1338.96 | bwd_inner_microstep: 1338.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 10:05:05,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1375.42 | bwd_inner_microstep: 1375.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 10:05:07,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1386.91 | bwd_inner_microstep: 1386.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 10:05:09,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.34 | bwd_microstep: 1555.68 | bwd_inner_microstep: 1555.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 10:05:11,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.72 | bwd_microstep: 1149.24 | bwd_inner_microstep: 1149.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-10 10:05:13,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1432.07 | bwd_inner_microstep: 1432.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 10:05:14,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.99 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 10:05:16,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1251.67 | bwd_inner_microstep: 1251.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 10:05:18,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.88 | bwd_microstep: 1246.14 | bwd_inner_microstep: 1246.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 10:05:20,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.66 | bwd_microstep: 1386.11 | bwd_inner_microstep: 1386.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3843
[2024-06-10 10:05:22,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.58 | bwd_microstep: 1695.71 | bwd_inner_microstep: 1695.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3408
[2024-06-10 10:05:24,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.29 | bwd_microstep: 1203.41 | bwd_inner_microstep: 1203.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1974
[2024-06-10 10:05:25,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.74 | bwd_microstep: 742.43 | bwd_inner_microstep: 742.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 10:05:27,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1384.44 | bwd_inner_microstep: 1384.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3386
[2024-06-10 10:05:29,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.82 | bwd_microstep: 1388.85 | bwd_inner_microstep: 1388.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-10 10:05:31,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.77 | bwd_microstep: 1417.97 | bwd_inner_microstep: 1417.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3511
[2024-06-10 10:05:33,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.22 | bwd_microstep: 1554.65 | bwd_inner_microstep: 1554.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 10:05:35,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.89 | bwd_microstep: 1660.23 | bwd_inner_microstep: 1660.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925
[2024-06-10 10:05:36,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 696.49 | bwd_inner_microstep: 696.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 10:05:38,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.20 | bwd_microstep: 1517.52 | bwd_inner_microstep: 1517.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3847
[2024-06-10 10:05:40,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.39 | bwd_microstep: 1670.27 | bwd_inner_microstep: 1670.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 10:05:43,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.09 | bwd_microstep: 1607.02 | bwd_inner_microstep: 1606.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830
[2024-06-10 10:05:45,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.57 | bwd_microstep: 1756.69 | bwd_inner_microstep: 1756.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2052
[2024-06-10 10:05:46,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.77 | bwd_microstep: 1009.62 | bwd_inner_microstep: 1009.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 10:05:48,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1350.69 | bwd_inner_microstep: 1350.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622
[2024-06-10 10:05:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.11 | bwd_microstep: 1604.40 | bwd_inner_microstep: 1604.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 10:05:52,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.25 | bwd_microstep: 1445.20 | bwd_inner_microstep: 1445.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561
[2024-06-10 10:05:54,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1327.16 | bwd_inner_microstep: 1327.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3426
[2024-06-10 10:05:56,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.10 | bwd_microstep: 1397.09 | bwd_inner_microstep: 1397.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447
[2024-06-10 10:05:58,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.81 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 10:06:00,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.97 | bwd_microstep: 1353.86 | bwd_inner_microstep: 1353.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3582
[2024-06-10 10:06:03,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.13 | optimizer_step: 6.59
[2024-06-10 10:06:03,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.14 | bwd_microstep: 2210.17 | bwd_inner_microstep: 1812.80 | bwd_allreduce_microstep: 397.33 | step_microstep: 37.56
[2024-06-10 10:06:03,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16446.56 | bwd: 44687.43 | bwd_inner: 44289.21 | bwd_allreduce: 397.55 | step: 39.17
{'loss': 1.2778, 'learning_rate': 3.20033012886977e-05, 'epoch': 0.32}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 10:06:05,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.13 | bwd_microstep: 1374.28 | bwd_inner_microstep: 1374.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4025
[2024-06-10 10:06:07,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.72 | bwd_microstep: 1713.26 | bwd_inner_microstep: 1713.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3479
[2024-06-10 10:06:09,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.64 | bwd_microstep: 1411.71 | bwd_inner_microstep: 1411.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 10:06:11,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.09 | bwd_microstep: 1270.41 | bwd_inner_microstep: 1270.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 10:06:12,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1245.75 | bwd_inner_microstep: 1245.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 10:06:14,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1282.52 | bwd_inner_microstep: 1282.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 10:06:15,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.41 | bwd_microstep: 789.63 | bwd_inner_microstep: 789.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 10:06:17,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.79 | bwd_microstep: 1248.39 | bwd_inner_microstep: 1248.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 10:06:19,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | bwd_microstep: 1289.29 | bwd_inner_microstep: 1289.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689
[2024-06-10 10:06:21,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.05 | bwd_microstep: 1528.03 | bwd_inner_microstep: 1528.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068
[2024-06-10 10:06:22,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.95 | bwd_microstep: 818.45 | bwd_inner_microstep: 818.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 10:06:24,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.42 | bwd_microstep: 1385.64 | bwd_inner_microstep: 1385.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-10 10:06:26,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.45 | bwd_microstep: 1624.73 | bwd_inner_microstep: 1624.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1932
[2024-06-10 10:06:27,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.83 | bwd_microstep: 851.18 | bwd_inner_microstep: 851.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892
[2024-06-10 10:06:29,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.23 | bwd_microstep: 1578.50 | bwd_inner_microstep: 1578.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 10:06:32,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.93 | bwd_microstep: 1482.91 | bwd_inner_microstep: 1482.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 10:06:34,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.56 | bwd_microstep: 1514.18 | bwd_inner_microstep: 1514.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3622
[2024-06-10 10:06:36,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.23 | bwd_microstep: 1649.52 | bwd_inner_microstep: 1649.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 10:06:38,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.84 | bwd_microstep: 1644.23 | bwd_inner_microstep: 1644.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642
[2024-06-10 10:06:40,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.13 | bwd_microstep: 1714.38 | bwd_inner_microstep: 1714.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 10:06:42,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.66 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 10:06:45,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.38 | bwd_microstep: 1648.05 | bwd_inner_microstep: 1648.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3441
[2024-06-10 10:06:46,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.53 | bwd_microstep: 1188.20 | bwd_inner_microstep: 1188.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730
[2024-06-10 10:06:48,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.55 | bwd_microstep: 1538.59 | bwd_inner_microstep: 1538.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 10:06:50,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.18 | bwd_microstep: 1411.21 | bwd_inner_microstep: 1411.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 10:06:52,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.95 | bwd_microstep: 1457.14 | bwd_inner_microstep: 1457.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3716
[2024-06-10 10:06:55,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.66 | bwd_microstep: 1566.78 | bwd_inner_microstep: 1566.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 10:06:56,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.82 | bwd_microstep: 1345.08 | bwd_inner_microstep: 1345.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 10:06:58,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1394.74 | bwd_inner_microstep: 1394.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2242
[2024-06-10 10:07:00,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.92 | bwd_microstep: 1000.31 | bwd_inner_microstep: 1000.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 10:07:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.01 | bwd_microstep: 1184.84 | bwd_inner_microstep: 1184.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 10:07:05,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.47 | optimizer_step: 6.61
[2024-06-10 10:07:05,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.20 | bwd_microstep: 2631.64 | bwd_inner_microstep: 1569.75 | bwd_allreduce_microstep: 1061.81 | step_microstep: 42.73
[2024-06-10 10:07:05,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16419.38 | bwd: 45167.47 | bwd_inner: 44104.71 | bwd_allreduce: 1062.05 | step: 44.38
{'loss': 1.2791, 'learning_rate': 3.19732576462523e-05, 'epoch': 0.32}


 31%|███▏      | 542/1726 [9:24:31<20:12:15, 61.43s/it]
 31%|███▏      | 543/1726 [9:25:32<20:09:30, 61.34s/it]
 32%|███▏      | 544/1726 [9:26:36<20:22:22, 62.05s/it]
 32%|███▏      | 545/1726 [9:27:38<20:20:37, 62.01s/it]
 32%|███▏      | 546/1726 [9:28:39<20:16:30, 61.86s/it]
 32%|███▏      | 547/1726 [9:29:41<20:15:59, 61.88s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 10:07:07,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.26 | bwd_microstep: 1490.28 | bwd_inner_microstep: 1490.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2355
[2024-06-10 10:07:08,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.44 | bwd_microstep: 988.19 | bwd_inner_microstep: 988.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 10:07:10,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1377.80 | bwd_inner_microstep: 1377.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 10:07:12,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.01 | bwd_microstep: 1476.06 | bwd_inner_microstep: 1476.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 10:07:14,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.71 | bwd_microstep: 1453.06 | bwd_inner_microstep: 1453.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842
[2024-06-10 10:07:16,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.04 | bwd_microstep: 1563.95 | bwd_inner_microstep: 1563.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419
[2024-06-10 10:07:18,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1281.05 | bwd_inner_microstep: 1281.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 10:07:20,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.24 | bwd_microstep: 1353.45 | bwd_inner_microstep: 1353.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 10:07:22,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.38 | bwd_microstep: 1510.98 | bwd_inner_microstep: 1510.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 10:07:24,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.98 | bwd_microstep: 1415.85 | bwd_inner_microstep: 1415.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 10:07:26,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.76 | bwd_microstep: 1618.39 | bwd_inner_microstep: 1618.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 10:07:28,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.27 | bwd_microstep: 1342.84 | bwd_inner_microstep: 1342.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3446
[2024-06-10 10:07:30,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1414.73 | bwd_inner_microstep: 1414.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507
[2024-06-10 10:07:32,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.79 | bwd_microstep: 1320.79 | bwd_inner_microstep: 1320.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3647
[2024-06-10 10:07:34,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.38 | bwd_microstep: 1647.54 | bwd_inner_microstep: 1647.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3689
[2024-06-10 10:07:36,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1331.34 | bwd_inner_microstep: 1331.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657
[2024-06-10 10:07:38,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.73 | bwd_microstep: 1624.18 | bwd_inner_microstep: 1624.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 10:07:40,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.99 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3869
[2024-06-10 10:07:42,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1467.89 | bwd_inner_microstep: 1467.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834
[2024-06-10 10:07:44,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.41 | bwd_microstep: 1658.51 | bwd_inner_microstep: 1658.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3715
[2024-06-10 10:07:46,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.13 | bwd_microstep: 1333.02 | bwd_inner_microstep: 1333.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 10:07:48,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.94 | bwd_microstep: 1415.52 | bwd_inner_microstep: 1415.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 10:07:50,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1296.85 | bwd_inner_microstep: 1296.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1993
[2024-06-10 10:07:51,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.97 | bwd_microstep: 710.63 | bwd_inner_microstep: 710.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3575
[2024-06-10 10:07:53,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.93 | bwd_microstep: 1235.59 | bwd_inner_microstep: 1235.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3584
[2024-06-10 10:07:55,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.00 | bwd_microstep: 1551.72 | bwd_inner_microstep: 1551.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1601
[2024-06-10 10:07:55,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 215.52 | bwd_microstep: 552.15 | bwd_inner_microstep: 552.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1917
[2024-06-10 10:07:57,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.54 | bwd_microstep: 763.69 | bwd_inner_microstep: 763.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3388
[2024-06-10 10:07:58,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.44 | bwd_microstep: 1337.38 | bwd_inner_microstep: 1337.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 10:08:01,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.54 | bwd_microstep: 1545.33 | bwd_inner_microstep: 1545.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-10 10:08:03,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.88 | bwd_microstep: 1533.70 | bwd_inner_microstep: 1533.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 10:08:05,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 10:08:05,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.51 | bwd_microstep: 2258.50 | bwd_inner_microstep: 1114.09 | bwd_allreduce_microstep: 1144.37 | step_microstep: 37.77
[2024-06-10 10:08:05,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16091.78 | bwd: 44270.22 | bwd_inner: 43124.91 | bwd_allreduce: 1144.60 | step: 39.59
{'loss': 1.2421, 'learning_rate': 3.194317183405573e-05, 'epoch': 0.32}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2406
[2024-06-10 10:08:07,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.38 | bwd_microstep: 1020.81 | bwd_inner_microstep: 1020.67 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3975
[2024-06-10 10:08:09,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.06 | bwd_microstep: 1699.58 | bwd_inner_microstep: 1699.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2360
[2024-06-10 10:08:10,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.33 | bwd_microstep: 988.94 | bwd_inner_microstep: 988.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3918
[2024-06-10 10:08:13,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.50 | bwd_microstep: 1590.39 | bwd_inner_microstep: 1590.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 10:08:14,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.69 | bwd_microstep: 1247.30 | bwd_inner_microstep: 1247.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1354
[2024-06-10 10:08:15,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 213.49 | bwd_microstep: 551.87 | bwd_inner_microstep: 551.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 10:08:17,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.46 | bwd_microstep: 1248.53 | bwd_inner_microstep: 1248.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3607
[2024-06-10 10:08:19,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.15 | bwd_microstep: 1275.56 | bwd_inner_microstep: 1275.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 10:08:20,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.50 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 10:08:22,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.90 | bwd_microstep: 1283.92 | bwd_inner_microstep: 1283.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2087
[2024-06-10 10:08:23,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.76 | bwd_microstep: 821.60 | bwd_inner_microstep: 821.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 10:08:25,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1383.18 | bwd_inner_microstep: 1383.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 10:08:27,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.58 | bwd_microstep: 1382.77 | bwd_inner_microstep: 1382.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 10:08:29,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1345.14 | bwd_inner_microstep: 1345.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 10:08:31,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.67 | bwd_microstep: 1418.78 | bwd_inner_microstep: 1418.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 10:08:33,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1280.42 | bwd_inner_microstep: 1280.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 10:08:35,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.45 | bwd_microstep: 1288.00 | bwd_inner_microstep: 1287.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292
[2024-06-10 10:08:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.65 | bwd_microstep: 877.66 | bwd_inner_microstep: 877.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894
[2024-06-10 10:08:37,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.19 | bwd_microstep: 683.76 | bwd_inner_microstep: 683.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 10:08:39,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.76 | bwd_microstep: 1520.38 | bwd_inner_microstep: 1520.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 10:08:41,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1384.68 | bwd_inner_microstep: 1384.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 10:08:43,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.02 | bwd_microstep: 1512.02 | bwd_inner_microstep: 1511.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3608
[2024-06-10 10:08:45,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.99 | bwd_microstep: 1369.22 | bwd_inner_microstep: 1369.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 10:08:47,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1398.18 | bwd_inner_microstep: 1398.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 10:08:49,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.06 | bwd_microstep: 1514.76 | bwd_inner_microstep: 1514.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 10:08:51,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.04 | bwd_microstep: 1510.42 | bwd_inner_microstep: 1510.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3556
[2024-06-10 10:08:53,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.19 | bwd_microstep: 1560.55 | bwd_inner_microstep: 1560.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002
[2024-06-10 10:08:54,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.31 | bwd_microstep: 895.10 | bwd_inner_microstep: 895.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2066
[2024-06-10 10:08:55,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.16 | bwd_microstep: 880.68 | bwd_inner_microstep: 880.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1946
[2024-06-10 10:08:57,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.21 | bwd_microstep: 822.03 | bwd_inner_microstep: 822.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 10:08:58,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.12 | bwd_microstep: 1354.22 | bwd_inner_microstep: 1354.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3536
[2024-06-10 10:09:06,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.33 | optimizer_step: 6.60
[2024-06-10 10:09:06,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.01 | bwd_microstep: 6533.74 | bwd_inner_microstep: 1795.39 | bwd_allreduce_microstep: 4738.28 | step_microstep: 38.76
[2024-06-10 10:09:06,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14989.36 | bwd: 44928.70 | bwd_inner: 40189.38 | bwd_allreduce: 4738.58 | step: 40.33
{'loss': 1.3505, 'learning_rate': 3.191304395807004e-05, 'epoch': 0.32}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1911
[2024-06-10 10:09:07,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.87 | bwd_microstep: 707.16 | bwd_inner_microstep: 707.02 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 10:09:08,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.69 | bwd_microstep: 1254.72 | bwd_inner_microstep: 1254.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799
[2024-06-10 10:09:10,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.18 | bwd_microstep: 1550.15 | bwd_inner_microstep: 1550.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763
[2024-06-10 10:09:13,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.55 | bwd_microstep: 1636.37 | bwd_inner_microstep: 1636.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3406
[2024-06-10 10:09:14,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.89 | bwd_microstep: 1209.61 | bwd_inner_microstep: 1209.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 10:09:16,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.90 | bwd_microstep: 1545.03 | bwd_inner_microstep: 1545.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 10:09:18,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.72 | bwd_microstep: 791.24 | bwd_inner_microstep: 791.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1885
[2024-06-10 10:09:19,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.47 | bwd_microstep: 680.74 | bwd_inner_microstep: 680.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2171
[2024-06-10 10:09:20,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.05 | bwd_microstep: 852.47 | bwd_inner_microstep: 852.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3689
[2024-06-10 10:09:22,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.79 | bwd_microstep: 1554.62 | bwd_inner_microstep: 1554.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3672
[2024-06-10 10:09:24,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.96 | bwd_microstep: 1545.84 | bwd_inner_microstep: 1545.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994
[2024-06-10 10:09:25,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.53 | bwd_microstep: 830.57 | bwd_inner_microstep: 830.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3660
[2024-06-10 10:09:27,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.68 | bwd_microstep: 1449.74 | bwd_inner_microstep: 1449.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3667
[2024-06-10 10:09:29,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.25 | bwd_microstep: 1445.63 | bwd_inner_microstep: 1445.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 10:09:31,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.77 | bwd_microstep: 1357.06 | bwd_inner_microstep: 1357.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3486
[2024-06-10 10:09:33,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.91 | bwd_microstep: 1247.30 | bwd_inner_microstep: 1247.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3421
[2024-06-10 10:09:35,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.57 | bwd_microstep: 1475.02 | bwd_inner_microstep: 1474.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 10:09:37,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1381.28 | bwd_inner_microstep: 1381.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3435
[2024-06-10 10:09:39,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.92 | bwd_microstep: 1396.20 | bwd_inner_microstep: 1396.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3430
[2024-06-10 10:09:41,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.79 | bwd_microstep: 1494.76 | bwd_inner_microstep: 1494.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3610
[2024-06-10 10:09:42,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.75 | bwd_microstep: 1311.54 | bwd_inner_microstep: 1311.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2284
[2024-06-10 10:09:44,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.62 | bwd_microstep: 941.97 | bwd_inner_microstep: 941.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 10:09:45,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.17 | bwd_microstep: 1188.57 | bwd_inner_microstep: 1188.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 10:09:48,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.32 | bwd_microstep: 1513.96 | bwd_inner_microstep: 1513.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 10:09:50,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.07 | bwd_microstep: 1496.13 | bwd_inner_microstep: 1496.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2196
[2024-06-10 10:09:51,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.86 | bwd_microstep: 863.85 | bwd_inner_microstep: 863.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3547
[2024-06-10 10:09:53,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.53 | bwd_microstep: 1561.41 | bwd_inner_microstep: 1561.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3589
[2024-06-10 10:09:55,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.04 | bwd_microstep: 1702.48 | bwd_inner_microstep: 1702.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1938
[2024-06-10 10:09:56,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.48 | bwd_microstep: 774.45 | bwd_inner_microstep: 774.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 10:09:58,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1496.08 | bwd_inner_microstep: 1496.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 10:10:00,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.92 | bwd_microstep: 1162.65 | bwd_inner_microstep: 1162.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723
[2024-06-10 10:10:07,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 10:10:07,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.22 | bwd_microstep: 6773.06 | bwd_inner_microstep: 1857.27 | bwd_allreduce_microstep: 4915.74 | step_microstep: 38.29
[2024-06-10 10:10:07,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15352.70 | bwd: 46191.69 | bwd_inner: 41274.93 | bwd_allreduce: 4916.02 | step: 39.89
{'loss': 1.2711, 'learning_rate': 3.188287412440546e-05, 'epoch': 0.32}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 10:10:09,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1348.59 | bwd_inner_microstep: 1348.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 10:10:11,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.04 | bwd_microstep: 1477.40 | bwd_inner_microstep: 1477.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3828
[2024-06-10 10:10:13,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.42 | bwd_microstep: 1414.38 | bwd_inner_microstep: 1414.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3948
[2024-06-10 10:10:16,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.05 | bwd_microstep: 1596.77 | bwd_inner_microstep: 1596.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4156
[2024-06-10 10:10:18,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.68 | bwd_microstep: 1738.99 | bwd_inner_microstep: 1738.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 10:10:20,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1345.83 | bwd_inner_microstep: 1345.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 10:10:22,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.83 | bwd_microstep: 1546.97 | bwd_inner_microstep: 1546.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2635
[2024-06-10 10:10:23,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.70 | bwd_microstep: 954.43 | bwd_inner_microstep: 954.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3469
[2024-06-10 10:10:25,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.99 | bwd_microstep: 1343.51 | bwd_inner_microstep: 1343.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2081
[2024-06-10 10:10:26,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.52 | bwd_microstep: 851.56 | bwd_inner_microstep: 851.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3635
[2024-06-10 10:10:28,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.07 | bwd_microstep: 1321.71 | bwd_inner_microstep: 1321.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 10:10:30,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.07 | bwd_microstep: 1381.02 | bwd_inner_microstep: 1380.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2029
[2024-06-10 10:10:31,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.79 | bwd_microstep: 902.30 | bwd_inner_microstep: 902.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 10:10:33,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.38 | bwd_microstep: 1341.68 | bwd_inner_microstep: 1341.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 10:10:35,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.47 | bwd_microstep: 1484.45 | bwd_inner_microstep: 1484.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 10:10:37,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1379.10 | bwd_inner_microstep: 1379.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3649
[2024-06-10 10:10:39,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.76 | bwd_microstep: 1444.02 | bwd_inner_microstep: 1443.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722
[2024-06-10 10:10:41,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1335.65 | bwd_inner_microstep: 1335.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-10 10:10:42,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.54 | bwd_microstep: 807.88 | bwd_inner_microstep: 807.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926
[2024-06-10 10:10:43,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.14 | bwd_microstep: 698.22 | bwd_inner_microstep: 698.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 10:10:45,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1292.52 | bwd_inner_microstep: 1292.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 10:10:47,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1282.61 | bwd_inner_microstep: 1282.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2313
[2024-06-10 10:10:48,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.02 | bwd_microstep: 887.78 | bwd_inner_microstep: 887.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 10:10:50,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.43 | bwd_microstep: 1447.77 | bwd_inner_microstep: 1447.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3501
[2024-06-10 10:10:52,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1348.20 | bwd_inner_microstep: 1348.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3450
[2024-06-10 10:10:53,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.57 | bwd_microstep: 1189.08 | bwd_inner_microstep: 1189.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584
[2024-06-10 10:10:55,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1426.04 | bwd_inner_microstep: 1426.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-10 10:10:57,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.78 | bwd_microstep: 1441.30 | bwd_inner_microstep: 1441.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3564
[2024-06-10 10:10:59,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.34 | bwd_microstep: 1244.87 | bwd_inner_microstep: 1244.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3534
[2024-06-10 10:11:01,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.91 | bwd_microstep: 1448.66 | bwd_inner_microstep: 1448.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3377
[2024-06-10 10:11:03,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.23 | bwd_microstep: 1270.13 | bwd_inner_microstep: 1270.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3593
[2024-06-10 10:11:08,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 10:11:08,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.61 | bwd_microstep: 4306.57 | bwd_inner_microstep: 1767.48 | bwd_allreduce_microstep: 2539.04 | step_microstep: 37.85
[2024-06-10 10:11:08,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15604.70 | bwd: 44300.02 | bwd_inner: 41760.06 | bwd_allreduce: 2539.27 | step: 39.34
{'loss': 1.3043, 'learning_rate': 3.185266243931998e-05, 'epoch': 0.32}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410
[2024-06-10 10:11:10,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.59 | bwd_microstep: 1436.82 | bwd_inner_microstep: 1436.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4001
[2024-06-10 10:11:12,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.38 | bwd_microstep: 1703.89 | bwd_inner_microstep: 1703.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896
[2024-06-10 10:11:14,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.70 | bwd_microstep: 1583.42 | bwd_inner_microstep: 1583.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3883
[2024-06-10 10:11:16,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.76 | bwd_microstep: 1481.92 | bwd_inner_microstep: 1481.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 10:11:18,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1343.30 | bwd_inner_microstep: 1343.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2888
[2024-06-10 10:11:20,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.59 | bwd_microstep: 1089.98 | bwd_inner_microstep: 1089.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-10 10:11:21,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.22 | bwd_microstep: 1279.20 | bwd_inner_microstep: 1279.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 10:11:23,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.88 | bwd_microstep: 1437.47 | bwd_inner_microstep: 1437.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 10:11:25,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.19 | bwd_microstep: 1456.78 | bwd_inner_microstep: 1456.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436
[2024-06-10 10:11:27,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.02 | bwd_microstep: 1155.93 | bwd_inner_microstep: 1155.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 10:11:29,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1416.20 | bwd_inner_microstep: 1416.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 10:11:31,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1390.25 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1424
[2024-06-10 10:11:32,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 223.91 | bwd_microstep: 565.86 | bwd_inner_microstep: 565.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3657
[2024-06-10 10:11:34,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.96 | bwd_microstep: 1521.36 | bwd_inner_microstep: 1521.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520
[2024-06-10 10:11:36,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.39 | bwd_microstep: 1587.18 | bwd_inner_microstep: 1587.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3647
[2024-06-10 10:11:38,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.37 | bwd_microstep: 1711.68 | bwd_inner_microstep: 1711.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482
[2024-06-10 10:11:40,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.13 | bwd_microstep: 1573.22 | bwd_inner_microstep: 1573.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 10:11:43,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1490.11 | bwd_inner_microstep: 1490.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 10:11:45,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.01 | bwd_microstep: 1478.48 | bwd_inner_microstep: 1478.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 10:11:47,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.61 | bwd_microstep: 1428.34 | bwd_inner_microstep: 1428.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 10:11:48,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.26 | bwd_microstep: 1187.15 | bwd_inner_microstep: 1187.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 10:11:50,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1353.50 | bwd_inner_microstep: 1353.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3831
[2024-06-10 10:11:52,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.67 | bwd_microstep: 1708.30 | bwd_inner_microstep: 1708.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3532
[2024-06-10 10:11:54,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1326.75 | bwd_inner_microstep: 1326.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 10:11:56,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.72 | bwd_microstep: 1302.92 | bwd_inner_microstep: 1302.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3584
[2024-06-10 10:11:58,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.13 | bwd_microstep: 1365.62 | bwd_inner_microstep: 1365.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3773
[2024-06-10 10:12:00,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.69 | bwd_microstep: 1409.08 | bwd_inner_microstep: 1409.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-10 10:12:02,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.39 | bwd_microstep: 1305.28 | bwd_inner_microstep: 1305.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3448
[2024-06-10 10:12:04,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.41 | bwd_microstep: 1511.02 | bwd_inner_microstep: 1510.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 10:12:06,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.56 | bwd_microstep: 1539.26 | bwd_inner_microstep: 1539.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3483
[2024-06-10 10:12:08,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1432.13 | bwd_inner_microstep: 1432.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 10:12:10,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.18 | optimizer_step: 6.67
[2024-06-10 10:12:10,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.99 | bwd_microstep: 1286.69 | bwd_inner_microstep: 1277.62 | bwd_allreduce_microstep: 9.02 | step_microstep: 37.82
[2024-06-10 10:12:10,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16790.08 | bwd: 44859.11 | bwd_inner: 44849.14 | bwd_allreduce: 9.27 | step: 39.35
{'loss': 1.2678, 'learning_rate': 3.182240900921901e-05, 'epoch': 0.32}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 10:12:12,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.88 | bwd_microstep: 1478.65 | bwd_inner_microstep: 1478.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3930
[2024-06-10 10:12:14,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.18 | bwd_microstep: 1398.94 | bwd_inner_microstep: 1398.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2269
[2024-06-10 10:12:15,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.58 | bwd_microstep: 871.91 | bwd_inner_microstep: 871.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 10:12:17,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.65 | bwd_microstep: 1247.53 | bwd_inner_microstep: 1247.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 10:12:18,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.56 | bwd_microstep: 792.55 | bwd_inner_microstep: 792.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3775
[2024-06-10 10:12:20,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.07 | bwd_microstep: 1377.81 | bwd_inner_microstep: 1377.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 10:12:21,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1260.09 | bwd_inner_microstep: 1260.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3518
[2024-06-10 10:12:23,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.96 | bwd_microstep: 1255.68 | bwd_inner_microstep: 1255.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732
[2024-06-10 10:12:25,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1434.72 | bwd_inner_microstep: 1434.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 10:12:27,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1278.79 | bwd_inner_microstep: 1278.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511
[2024-06-10 10:12:29,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.90 | bwd_microstep: 1445.16 | bwd_inner_microstep: 1445.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1964
[2024-06-10 10:12:30,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.08 | bwd_microstep: 891.11 | bwd_inner_microstep: 891.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3938
[2024-06-10 10:12:32,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.37 | bwd_microstep: 1429.56 | bwd_inner_microstep: 1429.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3659
[2024-06-10 10:12:34,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.59 | bwd_microstep: 1546.09 | bwd_inner_microstep: 1546.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3667
[2024-06-10 10:12:37,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.73 | bwd_microstep: 1721.76 | bwd_inner_microstep: 1721.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 10:12:38,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1388.41 | bwd_inner_microstep: 1388.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 10:12:41,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.44 | bwd_microstep: 1654.34 | bwd_inner_microstep: 1654.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 10:12:43,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.16 | bwd_microstep: 1501.74 | bwd_inner_microstep: 1501.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 10:12:45,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.20 | bwd_microstep: 1408.54 | bwd_inner_microstep: 1408.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938
[2024-06-10 10:12:46,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.04 | bwd_microstep: 697.73 | bwd_inner_microstep: 697.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463
[2024-06-10 10:12:48,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.20 | bwd_microstep: 1313.34 | bwd_inner_microstep: 1313.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461
[2024-06-10 10:12:49,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.53 | bwd_microstep: 1315.69 | bwd_inner_microstep: 1315.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3874
[2024-06-10 10:12:51,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1485.26 | bwd_inner_microstep: 1485.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 10:12:54,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.02 | bwd_microstep: 1547.70 | bwd_inner_microstep: 1547.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 10:12:55,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.07 | bwd_microstep: 1263.91 | bwd_inner_microstep: 1263.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 10:12:57,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1409.47 | bwd_inner_microstep: 1409.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3639
[2024-06-10 10:12:59,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.08 | bwd_microstep: 1436.97 | bwd_inner_microstep: 1436.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 10:13:01,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.69 | bwd_microstep: 1472.34 | bwd_inner_microstep: 1472.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2483
[2024-06-10 10:13:03,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.08 | bwd_microstep: 929.65 | bwd_inner_microstep: 929.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3805
[2024-06-10 10:13:05,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.74 | bwd_microstep: 1856.75 | bwd_inner_microstep: 1856.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3729
[2024-06-10 10:13:07,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.26 | bwd_microstep: 1667.32 | bwd_inner_microstep: 1667.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3810
[2024-06-10 10:13:11,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 10:13:11,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.65 | bwd_microstep: 3026.62 | bwd_inner_microstep: 1777.91 | bwd_allreduce_microstep: 1248.66 | step_microstep: 37.91
[2024-06-10 10:13:11,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16174.97 | bwd: 44806.18 | bwd_inner: 43556.60 | bwd_allreduce: 1248.89 | step: 39.52
 32%|███▏      | 548/1726 [9:30:42<20:08:04, 61.53s/it]
 32%|███▏      | 549/1726 [9:31:42<19:59:32, 61.15s/it]
 32%|███▏      | 550/1726 [9:32:44<20:02:51, 61.37s/it]
 32%|███▏      | 551/1726 [9:33:44<19:55:07, 61.03s/it]
 32%|███▏      | 552/1726 [9:34:46<19:59:48, 61.32s/it]
 32%|███▏      | 553/1726 [9:35:48<19:58:50, 61.32s/it]
{'loss': 1.2565, 'learning_rate': 3.1792113940654976e-05, 'epoch': 0.32}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 10:13:13,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.98 | bwd_microstep: 1178.18 | bwd_inner_microstep: 1178.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 10:13:14,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.69 | bwd_microstep: 1276.73 | bwd_inner_microstep: 1276.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 10:13:16,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.09 | bwd_microstep: 1180.85 | bwd_inner_microstep: 1180.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 10:13:18,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 1546.79 | bwd_inner_microstep: 1546.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 10:13:20,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.03 | bwd_microstep: 1550.65 | bwd_inner_microstep: 1550.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 10:13:22,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.23 | bwd_microstep: 1295.33 | bwd_inner_microstep: 1295.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 10:13:24,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.97 | bwd_microstep: 1284.23 | bwd_inner_microstep: 1284.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 10:13:26,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1245.02 | bwd_inner_microstep: 1244.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709
[2024-06-10 10:13:28,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.41 | bwd_microstep: 1625.79 | bwd_inner_microstep: 1625.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 10:13:30,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1410.22 | bwd_inner_microstep: 1410.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 10:13:32,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1250.18 | bwd_inner_microstep: 1250.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413
[2024-06-10 10:13:34,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.52 | bwd_microstep: 1434.97 | bwd_inner_microstep: 1434.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 10:13:36,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.75 | bwd_microstep: 1482.64 | bwd_inner_microstep: 1482.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1894
[2024-06-10 10:13:37,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.63 | bwd_microstep: 788.85 | bwd_inner_microstep: 788.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 10:13:38,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1247.56 | bwd_inner_microstep: 1247.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 10:13:41,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.09 | bwd_microstep: 1622.31 | bwd_inner_microstep: 1622.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3539
[2024-06-10 10:13:43,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.65 | bwd_microstep: 1456.84 | bwd_inner_microstep: 1456.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 10:13:44,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.79 | bwd_microstep: 1192.86 | bwd_inner_microstep: 1192.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 10:13:46,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.06 | bwd_microstep: 1282.18 | bwd_inner_microstep: 1282.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820
[2024-06-10 10:13:48,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.48 | bwd_microstep: 1390.64 | bwd_inner_microstep: 1390.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 10:13:50,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1557.28 | bwd_inner_microstep: 1557.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2435
[2024-06-10 10:13:51,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.62 | bwd_microstep: 949.51 | bwd_inner_microstep: 949.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 10:13:53,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1406.77 | bwd_inner_microstep: 1406.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 10:13:55,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.11 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 10:13:57,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1290.43 | bwd_inner_microstep: 1290.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 10:13:59,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.45 | bwd_microstep: 1545.04 | bwd_inner_microstep: 1545.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 10:14:01,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.96 | bwd_microstep: 1259.07 | bwd_inner_microstep: 1259.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 10:14:03,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.74 | bwd_microstep: 1458.66 | bwd_inner_microstep: 1458.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 10:14:05,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.75 | bwd_microstep: 1450.47 | bwd_inner_microstep: 1450.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433
[2024-06-10 10:14:07,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.73 | bwd_microstep: 1282.67 | bwd_inner_microstep: 1282.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 10:14:09,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.31 | bwd_microstep: 1648.49 | bwd_inner_microstep: 1648.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 10:14:11,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 10:14:11,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.21 | bwd_microstep: 1849.90 | bwd_inner_microstep: 1483.93 | bwd_allreduce_microstep: 365.93 | step_microstep: 38.92
[2024-06-10 10:14:11,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16253.09 | bwd: 43725.99 | bwd_inner: 43359.12 | bwd_allreduce: 366.17 | step: 40.67
{'loss': 1.2713, 'learning_rate': 3.176177734032693e-05, 'epoch': 0.32}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400
[2024-06-10 10:14:13,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.80 | bwd_microstep: 1437.32 | bwd_inner_microstep: 1437.22 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4000
[2024-06-10 10:14:15,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.28 | bwd_microstep: 1508.67 | bwd_inner_microstep: 1508.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 10:14:17,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.53 | bwd_microstep: 1373.22 | bwd_inner_microstep: 1373.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461
[2024-06-10 10:14:19,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.02 | bwd_microstep: 1212.90 | bwd_inner_microstep: 1212.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493
[2024-06-10 10:14:21,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1413.86 | bwd_inner_microstep: 1413.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 10:14:23,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.21 | bwd_microstep: 1249.75 | bwd_inner_microstep: 1249.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 10:14:24,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1277.36 | bwd_inner_microstep: 1277.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 10:14:26,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.11 | bwd_microstep: 1387.90 | bwd_inner_microstep: 1387.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 10:14:28,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.74 | bwd_microstep: 1249.08 | bwd_inner_microstep: 1249.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2772
[2024-06-10 10:14:30,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.26 | bwd_microstep: 1050.74 | bwd_inner_microstep: 1050.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 10:14:32,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.69 | bwd_microstep: 1439.03 | bwd_inner_microstep: 1439.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3760
[2024-06-10 10:14:34,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.73 | bwd_microstep: 1516.00 | bwd_inner_microstep: 1515.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385
[2024-06-10 10:14:35,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.02 | bwd_microstep: 1238.86 | bwd_inner_microstep: 1238.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3746
[2024-06-10 10:14:38,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.08 | bwd_microstep: 1633.18 | bwd_inner_microstep: 1633.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 10:14:40,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.46 | bwd_microstep: 1483.19 | bwd_inner_microstep: 1483.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1980
[2024-06-10 10:14:41,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.72 | bwd_microstep: 831.27 | bwd_inner_microstep: 831.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2164
[2024-06-10 10:14:42,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.45 | bwd_microstep: 951.20 | bwd_inner_microstep: 951.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648
[2024-06-10 10:14:44,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.39 | bwd_microstep: 1411.32 | bwd_inner_microstep: 1411.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3679
[2024-06-10 10:14:46,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.83 | bwd_microstep: 1325.39 | bwd_inner_microstep: 1325.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646
[2024-06-10 10:14:48,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.43 | bwd_microstep: 1611.89 | bwd_inner_microstep: 1611.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 10:14:50,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.88 | bwd_microstep: 1613.67 | bwd_inner_microstep: 1613.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 10:14:52,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.98 | bwd_microstep: 1536.11 | bwd_inner_microstep: 1536.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 10:14:54,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1375.63 | bwd_inner_microstep: 1375.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3552
[2024-06-10 10:14:56,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.44 | bwd_microstep: 1426.07 | bwd_inner_microstep: 1426.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 10:14:58,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.84 | bwd_microstep: 1408.86 | bwd_inner_microstep: 1408.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 10:14:59,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.94 | bwd_microstep: 697.13 | bwd_inner_microstep: 697.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 10:15:01,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.19 | bwd_microstep: 1353.03 | bwd_inner_microstep: 1353.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3570
[2024-06-10 10:15:03,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.68 | bwd_microstep: 1522.05 | bwd_inner_microstep: 1522.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 10:15:05,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.84 | bwd_microstep: 1300.01 | bwd_inner_microstep: 1299.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 10:15:07,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.77 | bwd_microstep: 1284.14 | bwd_inner_microstep: 1284.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3593
[2024-06-10 10:15:09,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.46 | bwd_microstep: 1270.33 | bwd_inner_microstep: 1270.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413
[2024-06-10 10:15:12,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 10:15:12,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.68 | bwd_microstep: 2711.22 | bwd_inner_microstep: 1633.68 | bwd_allreduce_microstep: 1077.47 | step_microstep: 38.48
[2024-06-10 10:15:12,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16068.30 | bwd: 44100.40 | bwd_inner: 43021.92 | bwd_allreduce: 1077.76 | step: 40.22
{'loss': 1.3112, 'learning_rate': 3.173139931508025e-05, 'epoch': 0.32}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 10:15:14,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1376.32 | bwd_inner_microstep: 1376.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3880
[2024-06-10 10:15:16,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.66 | bwd_microstep: 1546.10 | bwd_inner_microstep: 1546.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942
[2024-06-10 10:15:17,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.60 | bwd_microstep: 820.68 | bwd_inner_microstep: 820.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 10:15:19,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.73 | bwd_microstep: 1249.56 | bwd_inner_microstep: 1249.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009
[2024-06-10 10:15:20,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.47 | bwd_microstep: 739.16 | bwd_inner_microstep: 739.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 10:15:22,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.35 | bwd_microstep: 1531.78 | bwd_inner_microstep: 1531.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 10:15:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.84 | bwd_microstep: 794.54 | bwd_inner_microstep: 794.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 10:15:25,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1390.44 | bwd_inner_microstep: 1390.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433
[2024-06-10 10:15:27,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1282.87 | bwd_inner_microstep: 1282.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3715
[2024-06-10 10:15:29,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.51 | bwd_microstep: 1729.74 | bwd_inner_microstep: 1729.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3665
[2024-06-10 10:15:31,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.41 | bwd_microstep: 1479.90 | bwd_inner_microstep: 1479.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-10 10:15:33,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.96 | bwd_microstep: 1615.18 | bwd_inner_microstep: 1615.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3485
[2024-06-10 10:15:36,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.31 | bwd_microstep: 1580.33 | bwd_inner_microstep: 1580.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497
[2024-06-10 10:15:38,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.93 | bwd_microstep: 1583.66 | bwd_inner_microstep: 1583.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3519
[2024-06-10 10:15:40,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.39 | bwd_microstep: 1448.71 | bwd_inner_microstep: 1448.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 10:15:42,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1395.05 | bwd_inner_microstep: 1395.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3980
[2024-06-10 10:15:44,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.29 | bwd_microstep: 1682.20 | bwd_inner_microstep: 1682.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783
[2024-06-10 10:15:46,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.66 | bwd_microstep: 1656.79 | bwd_inner_microstep: 1656.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 10:15:48,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.55 | bwd_microstep: 1460.46 | bwd_inner_microstep: 1460.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 10:15:51,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.47 | bwd_microstep: 1653.63 | bwd_inner_microstep: 1653.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3623
[2024-06-10 10:15:53,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1444.48 | bwd_inner_microstep: 1444.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 10:15:54,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1391.77 | bwd_inner_microstep: 1391.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 10:15:56,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.41 | bwd_microstep: 1402.57 | bwd_inner_microstep: 1402.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1984
[2024-06-10 10:15:57,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.65 | bwd_microstep: 767.80 | bwd_inner_microstep: 767.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 10:15:59,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.26 | bwd_microstep: 1438.72 | bwd_inner_microstep: 1438.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 10:16:01,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1400.80 | bwd_inner_microstep: 1400.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3625
[2024-06-10 10:16:04,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.11 | bwd_microstep: 1657.24 | bwd_inner_microstep: 1657.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2554
[2024-06-10 10:16:05,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.51 | bwd_microstep: 1002.47 | bwd_inner_microstep: 1002.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 10:16:07,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.59 | bwd_microstep: 1515.76 | bwd_inner_microstep: 1515.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3587
[2024-06-10 10:16:09,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.31 | bwd_microstep: 1700.90 | bwd_inner_microstep: 1700.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 10:16:11,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1398.05 | bwd_inner_microstep: 1398.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 10:16:14,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 10:16:14,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 2358.05 | bwd_inner_microstep: 1516.25 | bwd_allreduce_microstep: 841.75 | step_microstep: 37.68
[2024-06-10 10:16:14,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16589.85 | bwd: 45495.72 | bwd_inner: 44653.07 | bwd_allreduce: 841.98 | step: 39.43
{'loss': 1.3083, 'learning_rate': 3.170097997190615e-05, 'epoch': 0.32}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 10:16:16,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.85 | bwd_microstep: 1470.45 | bwd_inner_microstep: 1470.26 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 10:16:18,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.14 | bwd_microstep: 1254.61 | bwd_inner_microstep: 1254.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 10:16:20,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.44 | bwd_microstep: 1382.80 | bwd_inner_microstep: 1382.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872
[2024-06-10 10:16:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.99 | bwd_microstep: 677.66 | bwd_inner_microstep: 677.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 10:16:23,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.45 | bwd_microstep: 1551.17 | bwd_inner_microstep: 1551.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 10:16:25,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.06 | bwd_microstep: 1278.93 | bwd_inner_microstep: 1278.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 10:16:26,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.91 | bwd_microstep: 796.18 | bwd_inner_microstep: 796.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2034
[2024-06-10 10:16:27,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.34 | bwd_microstep: 809.99 | bwd_inner_microstep: 809.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 10:16:29,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1285.52 | bwd_inner_microstep: 1285.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3512
[2024-06-10 10:16:31,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.52 | bwd_microstep: 1224.80 | bwd_inner_microstep: 1224.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2000
[2024-06-10 10:16:32,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.05 | bwd_microstep: 770.48 | bwd_inner_microstep: 770.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-10 10:16:33,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.43 | bwd_microstep: 895.56 | bwd_inner_microstep: 895.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2493
[2024-06-10 10:16:34,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.85 | bwd_microstep: 957.72 | bwd_inner_microstep: 957.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 10:16:36,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.08 | bwd_microstep: 1613.83 | bwd_inner_microstep: 1613.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 10:16:39,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.74 | bwd_microstep: 1615.77 | bwd_inner_microstep: 1615.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 10:16:41,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.05 | bwd_microstep: 1346.06 | bwd_inner_microstep: 1346.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 10:16:42,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1246.25 | bwd_inner_microstep: 1246.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891
[2024-06-10 10:16:43,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.85 | bwd_microstep: 715.39 | bwd_inner_microstep: 715.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 10:16:45,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 1493.52 | bwd_inner_microstep: 1493.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 10:16:47,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.04 | bwd_microstep: 1294.68 | bwd_inner_microstep: 1294.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 10:16:49,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.12 | bwd_microstep: 1511.51 | bwd_inner_microstep: 1511.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817
[2024-06-10 10:16:51,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1388.21 | bwd_inner_microstep: 1388.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 10:16:53,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.07 | bwd_microstep: 1542.49 | bwd_inner_microstep: 1542.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3601
[2024-06-10 10:16:55,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.73 | bwd_microstep: 1568.13 | bwd_inner_microstep: 1568.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 10:16:57,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1497.87 | bwd_inner_microstep: 1497.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 10:16:59,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 10:17:01,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1348.21 | bwd_inner_microstep: 1348.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 10:17:03,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.64 | bwd_microstep: 1528.64 | bwd_inner_microstep: 1528.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3632
[2024-06-10 10:17:06,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.57 | bwd_microstep: 1711.79 | bwd_inner_microstep: 1711.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3574
[2024-06-10 10:17:08,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1423.19 | bwd_inner_microstep: 1423.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2021
[2024-06-10 10:17:09,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.43 | bwd_microstep: 712.86 | bwd_inner_microstep: 712.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 10:17:16,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.35 | optimizer_step: 6.58
[2024-06-10 10:17:16,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 7189.51 | bwd_inner_microstep: 1679.28 | bwd_allreduce_microstep: 5510.15 | step_microstep: 38.93
[2024-06-10 10:17:16,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15258.14 | bwd: 46387.38 | bwd_inner: 40876.15 | bwd_allreduce: 5510.13 | step: 40.69
{'loss': 1.3073, 'learning_rate': 3.167051941794143e-05, 'epoch': 0.32}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484
[2024-06-10 10:17:18,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1567.03 | bwd_inner_microstep: 1567.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 10:17:20,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.24 | bwd_microstep: 796.29 | bwd_inner_microstep: 796.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3850
[2024-06-10 10:17:22,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.14 | bwd_microstep: 1660.57 | bwd_inner_microstep: 1660.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275
[2024-06-10 10:17:23,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.77 | bwd_microstep: 873.04 | bwd_inner_microstep: 873.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 10:17:25,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.27 | bwd_microstep: 1501.10 | bwd_inner_microstep: 1501.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-10 10:17:27,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.85 | bwd_microstep: 1650.28 | bwd_inner_microstep: 1650.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3782
[2024-06-10 10:17:29,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.20 | bwd_microstep: 1286.58 | bwd_inner_microstep: 1286.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 10:17:31,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.69 | bwd_microstep: 1147.94 | bwd_inner_microstep: 1147.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-10 10:17:33,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.14 | bwd_microstep: 1317.73 | bwd_inner_microstep: 1317.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3690
[2024-06-10 10:17:35,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.30 | bwd_microstep: 1360.57 | bwd_inner_microstep: 1360.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494
[2024-06-10 10:17:36,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1414.12 | bwd_inner_microstep: 1414.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2974
[2024-06-10 10:17:38,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.99 | bwd_microstep: 1168.35 | bwd_inner_microstep: 1168.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3666
[2024-06-10 10:17:41,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.42 | bwd_microstep: 1752.88 | bwd_inner_microstep: 1752.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3692
[2024-06-10 10:17:43,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.74 | bwd_microstep: 1725.40 | bwd_inner_microstep: 1725.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 10:17:45,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.74 | bwd_microstep: 1345.71 | bwd_inner_microstep: 1345.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2131
[2024-06-10 10:17:46,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.63 | bwd_microstep: 929.98 | bwd_inner_microstep: 929.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 10:17:48,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.53 | bwd_microstep: 1517.41 | bwd_inner_microstep: 1517.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 10:17:50,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.35 | bwd_microstep: 1408.65 | bwd_inner_microstep: 1408.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 10:17:52,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.63 | bwd_microstep: 1657.87 | bwd_inner_microstep: 1657.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3609
[2024-06-10 10:17:54,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.01 | bwd_microstep: 1444.21 | bwd_inner_microstep: 1444.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 10:17:56,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1510.36 | bwd_inner_microstep: 1510.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 10:17:59,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1557.04 | bwd_inner_microstep: 1557.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-10 10:18:01,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.46 | bwd_microstep: 1508.72 | bwd_inner_microstep: 1508.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 10:18:03,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.16 | bwd_microstep: 1397.06 | bwd_inner_microstep: 1397.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 10:18:05,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1559.48 | bwd_inner_microstep: 1559.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 10:18:07,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.57 | bwd_microstep: 1417.64 | bwd_inner_microstep: 1417.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2077
[2024-06-10 10:18:08,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.08 | bwd_microstep: 947.53 | bwd_inner_microstep: 947.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3570
[2024-06-10 10:18:10,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.43 | bwd_microstep: 1335.07 | bwd_inner_microstep: 1335.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1933
[2024-06-10 10:18:11,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.69 | bwd_microstep: 765.58 | bwd_inner_microstep: 765.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 10:18:13,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1447.96 | bwd_inner_microstep: 1447.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 10:18:15,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.77 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1282.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 10:18:17,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-10 10:18:17,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.49 | bwd_microstep: 1676.73 | bwd_inner_microstep: 1542.35 | bwd_allreduce_microstep: 134.34 | step_microstep: 37.62
[2024-06-10 10:18:17,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16328.30 | bwd: 43930.90 | bwd_inner: 43795.66 | bwd_allreduce: 134.56 | step: 39.34
{'loss': 1.3121, 'learning_rate': 3.1640017760467984e-05, 'epoch': 0.32}


 32%|███▏      | 553/1726 [9:35:48<19:58:50, 61.32s/it]
 32%|███▏      | 554/1726 [9:36:48<19:52:04, 61.03s/it]
 32%|███▏      | 555/1726 [9:37:49<19:48:05, 60.88s/it]
 32%|███▏      | 556/1726 [9:38:51<19:56:17, 61.35s/it]
 32%|███▏      | 557/1726 [9:39:53<19:59:04, 61.54s/it]
 32%|███▏      | 558/1726 [9:40:54<19:52:39, 61.27s/it]
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 10:18:19,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.95 | bwd_microstep: 1273.01 | bwd_inner_microstep: 1272.90 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3558
[2024-06-10 10:18:21,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.44 | bwd_microstep: 1571.97 | bwd_inner_microstep: 1571.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 10:18:23,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.96 | bwd_microstep: 1282.31 | bwd_inner_microstep: 1282.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 10:18:25,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.10 | bwd_microstep: 1652.30 | bwd_inner_microstep: 1652.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 10:18:27,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.28 | bwd_microstep: 1550.74 | bwd_inner_microstep: 1550.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3608
[2024-06-10 10:18:29,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.14 | bwd_microstep: 1314.51 | bwd_inner_microstep: 1314.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 10:18:31,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 1284.84 | bwd_inner_microstep: 1284.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230
[2024-06-10 10:18:32,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.84 | bwd_microstep: 960.09 | bwd_inner_microstep: 960.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 10:18:34,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.35 | bwd_microstep: 1255.96 | bwd_inner_microstep: 1255.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-10 10:18:36,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.83 | bwd_microstep: 1615.51 | bwd_inner_microstep: 1615.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453
[2024-06-10 10:18:38,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1286.58 | bwd_inner_microstep: 1286.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 10:18:40,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.29 | bwd_microstep: 1481.86 | bwd_inner_microstep: 1481.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3542
[2024-06-10 10:18:42,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.93 | bwd_microstep: 1380.68 | bwd_inner_microstep: 1380.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-10 10:18:44,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.35 | bwd_microstep: 1321.73 | bwd_inner_microstep: 1321.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 10:18:45,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.13 | bwd_microstep: 795.81 | bwd_inner_microstep: 795.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 10:18:46,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.09 | bwd_microstep: 1293.02 | bwd_inner_microstep: 1292.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 10:18:48,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.09 | bwd_microstep: 1293.15 | bwd_inner_microstep: 1293.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 10:18:51,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.11 | bwd_microstep: 1660.45 | bwd_inner_microstep: 1660.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 10:18:53,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.91 | bwd_microstep: 1459.38 | bwd_inner_microstep: 1459.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 10:18:54,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1397.59 | bwd_inner_microstep: 1397.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 10:18:57,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.08 | bwd_microstep: 1666.72 | bwd_inner_microstep: 1666.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 10:18:59,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1560.48 | bwd_inner_microstep: 1560.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 10:19:01,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.36 | bwd_microstep: 1613.96 | bwd_inner_microstep: 1613.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2080
[2024-06-10 10:19:02,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.28 | bwd_microstep: 882.99 | bwd_inner_microstep: 882.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2289
[2024-06-10 10:19:04,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.63 | bwd_microstep: 1013.07 | bwd_inner_microstep: 1012.92 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.25
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1933
[2024-06-10 10:19:05,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.91 | bwd_microstep: 762.65 | bwd_inner_microstep: 762.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 10:19:07,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1414.46 | bwd_inner_microstep: 1414.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3599
[2024-06-10 10:19:09,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.07 | bwd_microstep: 1649.02 | bwd_inner_microstep: 1648.78 | bwd_allreduce_microstep: 0.12 | step_microstep: 0.15
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3783
[2024-06-10 10:19:11,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.48 | bwd_microstep: 1618.66 | bwd_inner_microstep: 1618.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 10:19:13,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.15 | bwd_microstep: 1498.49 | bwd_inner_microstep: 1498.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2682
[2024-06-10 10:19:15,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.20 | bwd_microstep: 1121.86 | bwd_inner_microstep: 1121.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3764
[2024-06-10 10:19:19,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.25 | optimizer_step: 6.58
[2024-06-10 10:19:19,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.36 | bwd_microstep: 3011.32 | bwd_inner_microstep: 1977.93 | bwd_allreduce_microstep: 1033.33 | step_microstep: 38.36
[2024-06-10 10:19:19,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16303.08 | bwd: 44945.28 | bwd_inner: 43910.58 | bwd_allreduce: 1033.80 | step: 40.44
{'loss': 1.2698, 'learning_rate': 3.16094751069125e-05, 'epoch': 0.32}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1929
[2024-06-10 10:19:20,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.18 | bwd_microstep: 878.87 | bwd_inner_microstep: 878.72 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 10:19:22,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.59 | bwd_microstep: 1243.02 | bwd_inner_microstep: 1242.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 10:19:24,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.22 | bwd_microstep: 1553.95 | bwd_inner_microstep: 1553.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1888
[2024-06-10 10:19:25,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.54 | bwd_microstep: 682.50 | bwd_inner_microstep: 682.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 10:19:27,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.11 | bwd_microstep: 1650.94 | bwd_inner_microstep: 1650.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 10:19:29,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1381.91 | bwd_inner_microstep: 1381.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 10:19:31,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1387.18 | bwd_inner_microstep: 1387.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4186
[2024-06-10 10:19:33,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.33 | bwd_microstep: 1561.98 | bwd_inner_microstep: 1561.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 10:19:35,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1289.26 | bwd_inner_microstep: 1289.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 10:19:37,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1397.33 | bwd_inner_microstep: 1397.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3444
[2024-06-10 10:19:38,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1303.68 | bwd_inner_microstep: 1303.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230
[2024-06-10 10:19:40,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.51 | bwd_microstep: 966.98 | bwd_inner_microstep: 966.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398
[2024-06-10 10:19:42,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.45 | bwd_microstep: 1447.76 | bwd_inner_microstep: 1447.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 10:19:43,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.26 | bwd_microstep: 1150.48 | bwd_inner_microstep: 1150.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3617
[2024-06-10 10:19:46,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.30 | bwd_microstep: 1709.90 | bwd_inner_microstep: 1709.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1984
[2024-06-10 10:19:47,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.47 | bwd_microstep: 858.29 | bwd_inner_microstep: 858.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3531
[2024-06-10 10:19:49,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.72 | bwd_microstep: 1419.80 | bwd_inner_microstep: 1419.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3523
[2024-06-10 10:19:51,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.47 | bwd_microstep: 1522.33 | bwd_inner_microstep: 1522.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 10:19:52,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.53 | bwd_microstep: 803.87 | bwd_inner_microstep: 803.70 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.23
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 10:19:54,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.76 | bwd_microstep: 1376.31 | bwd_inner_microstep: 1376.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-10 10:19:56,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.80 | bwd_microstep: 1445.24 | bwd_inner_microstep: 1445.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 10:19:58,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.78 | bwd_microstep: 1506.65 | bwd_inner_microstep: 1506.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 10:20:00,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1410.75 | bwd_inner_microstep: 1410.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 10:20:02,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1398.16 | bwd_inner_microstep: 1398.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 10:20:04,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.93 | bwd_microstep: 1658.90 | bwd_inner_microstep: 1658.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-10 10:20:06,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.75 | bwd_microstep: 1631.15 | bwd_inner_microstep: 1631.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3552
[2024-06-10 10:20:09,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.57 | bwd_microstep: 1589.47 | bwd_inner_microstep: 1589.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2241
[2024-06-10 10:20:10,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.45 | bwd_microstep: 966.68 | bwd_inner_microstep: 966.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3593
[2024-06-10 10:20:12,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.48 | bwd_microstep: 1703.73 | bwd_inner_microstep: 1703.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3271
[2024-06-10 10:20:14,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.54 | bwd_microstep: 1417.27 | bwd_inner_microstep: 1417.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 10:20:17,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.75 | bwd_microstep: 1644.49 | bwd_inner_microstep: 1644.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3591
[2024-06-10 10:20:21,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.13 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 10:20:21,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.00 | bwd_microstep: 3279.34 | bwd_inner_microstep: 1928.31 | bwd_allreduce_microstep: 1350.98 | step_microstep: 38.61
[2024-06-10 10:20:21,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16304.03 | bwd: 45238.21 | bwd_inner: 43886.05 | bwd_allreduce: 1351.34 | step: 40.62
{'loss': 1.2306, 'learning_rate': 3.157889156484604e-05, 'epoch': 0.32}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 10:20:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.16 | bwd_microstep: 1339.45 | bwd_inner_microstep: 1339.37 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3899
[2024-06-10 10:20:25,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.62 | bwd_microstep: 1685.30 | bwd_inner_microstep: 1685.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3478
[2024-06-10 10:20:27,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.11 | bwd_microstep: 1451.32 | bwd_inner_microstep: 1451.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 10:20:29,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.50 | bwd_microstep: 1342.50 | bwd_inner_microstep: 1342.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 10:20:30,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.06 | bwd_microstep: 1377.82 | bwd_inner_microstep: 1377.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 10:20:31,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.69 | bwd_microstep: 702.41 | bwd_inner_microstep: 702.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 10:20:33,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.53 | bwd_microstep: 1152.15 | bwd_inner_microstep: 1152.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 10:20:35,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1387.05 | bwd_inner_microstep: 1387.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 10:20:37,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.67 | bwd_microstep: 1536.92 | bwd_inner_microstep: 1536.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3741
[2024-06-10 10:20:39,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.68 | bwd_microstep: 1561.54 | bwd_inner_microstep: 1561.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052
[2024-06-10 10:20:40,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.99 | bwd_microstep: 914.46 | bwd_inner_microstep: 914.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 10:20:42,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1384.38 | bwd_inner_microstep: 1384.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636
[2024-06-10 10:20:45,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.85 | bwd_microstep: 1605.03 | bwd_inner_microstep: 1605.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3975
[2024-06-10 10:20:47,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.18 | bwd_microstep: 1701.71 | bwd_inner_microstep: 1701.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 10:20:49,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1607.16 | bwd_inner_microstep: 1607.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3654
[2024-06-10 10:20:51,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.10 | bwd_microstep: 1716.74 | bwd_inner_microstep: 1716.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1985
[2024-06-10 10:20:53,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.63 | bwd_microstep: 849.28 | bwd_inner_microstep: 849.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 10:20:55,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1396.25 | bwd_inner_microstep: 1396.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3631
[2024-06-10 10:20:57,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.47 | bwd_microstep: 1711.64 | bwd_inner_microstep: 1711.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575
[2024-06-10 10:20:59,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.72 | bwd_microstep: 1503.77 | bwd_inner_microstep: 1503.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3523
[2024-06-10 10:21:01,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.49 | bwd_microstep: 1199.29 | bwd_inner_microstep: 1199.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 10:21:03,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.30 | bwd_microstep: 1661.45 | bwd_inner_microstep: 1661.21 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 10:21:05,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.32 | bwd_microstep: 1394.92 | bwd_inner_microstep: 1394.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 10:21:06,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.73 | bwd_microstep: 809.04 | bwd_inner_microstep: 809.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3822
[2024-06-10 10:21:08,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.62 | bwd_microstep: 1587.49 | bwd_inner_microstep: 1587.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 10:21:10,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1554.81 | bwd_inner_microstep: 1554.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710
[2024-06-10 10:21:12,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.43 | bwd_microstep: 1435.80 | bwd_inner_microstep: 1435.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-10 10:21:14,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1511.31 | bwd_inner_microstep: 1511.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3811
[2024-06-10 10:21:16,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.83 | bwd_microstep: 1308.10 | bwd_inner_microstep: 1307.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3818
[2024-06-10 10:21:19,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.86 | bwd_microstep: 1717.24 | bwd_inner_microstep: 1717.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2282
[2024-06-10 10:21:20,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.81 | bwd_microstep: 1009.18 | bwd_inner_microstep: 1009.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3549
[2024-06-10 10:21:22,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 10:21:22,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.88 | bwd_microstep: 1480.87 | bwd_inner_microstep: 1473.18 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.60
[2024-06-10 10:21:22,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16616.74 | bwd: 44596.45 | bwd_inner: 44586.92 | bwd_allreduce: 8.71 | step: 39.40
{'loss': 1.2873, 'learning_rate': 3.154826724198368e-05, 'epoch': 0.33}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3384
[2024-06-10 10:21:24,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.51 | bwd_microstep: 1272.42 | bwd_inner_microstep: 1272.27 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-10 10:21:26,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.99 | bwd_microstep: 1245.18 | bwd_inner_microstep: 1245.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3926
[2024-06-10 10:21:27,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.59 | bwd_microstep: 1300.03 | bwd_inner_microstep: 1300.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831
[2024-06-10 10:21:29,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.29 | bwd_microstep: 1387.19 | bwd_inner_microstep: 1387.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 10:21:31,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.35 | bwd_microstep: 1478.96 | bwd_inner_microstep: 1478.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3544
[2024-06-10 10:21:33,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.87 | bwd_microstep: 1262.19 | bwd_inner_microstep: 1262.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 10:21:35,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.73 | bwd_microstep: 1386.64 | bwd_inner_microstep: 1386.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3772
[2024-06-10 10:21:37,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1308.64 | bwd_inner_microstep: 1308.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.19
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2213
[2024-06-10 10:21:38,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.67 | bwd_microstep: 924.76 | bwd_inner_microstep: 924.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 10:21:40,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.79 | bwd_microstep: 1436.14 | bwd_inner_microstep: 1436.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3739
[2024-06-10 10:21:42,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.10 | bwd_microstep: 1636.83 | bwd_inner_microstep: 1636.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 10:21:44,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1250.47 | bwd_inner_microstep: 1250.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3428
[2024-06-10 10:21:46,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.68 | bwd_microstep: 1218.03 | bwd_inner_microstep: 1218.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1962
[2024-06-10 10:21:47,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.44 | bwd_microstep: 813.52 | bwd_inner_microstep: 813.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3727
[2024-06-10 10:21:49,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.87 | bwd_microstep: 1681.26 | bwd_inner_microstep: 1681.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 10:21:51,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.14 | bwd_microstep: 1477.47 | bwd_inner_microstep: 1477.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 10:21:53,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1493.38 | bwd_inner_microstep: 1493.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 10:21:55,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.23 | bwd_microstep: 1387.81 | bwd_inner_microstep: 1387.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2036
[2024-06-10 10:21:56,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.53 | bwd_microstep: 812.34 | bwd_inner_microstep: 812.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-10 10:21:58,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2078
[2024-06-10 10:21:59,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.97 | bwd_microstep: 818.02 | bwd_inner_microstep: 817.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 10:22:01,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1403.84 | bwd_inner_microstep: 1403.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1973
[2024-06-10 10:22:02,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.57 | bwd_microstep: 857.84 | bwd_inner_microstep: 857.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3701
[2024-06-10 10:22:05,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.14 | bwd_microstep: 1592.25 | bwd_inner_microstep: 1592.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 10:22:07,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1380.95 | bwd_inner_microstep: 1380.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3806
[2024-06-10 10:22:08,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1323.24 | bwd_inner_microstep: 1323.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 10:22:10,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1509.33 | bwd_inner_microstep: 1509.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3805
[2024-06-10 10:22:12,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.54 | bwd_microstep: 1358.77 | bwd_inner_microstep: 1358.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 10:22:14,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.27 | bwd_microstep: 1546.99 | bwd_inner_microstep: 1546.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 10:22:16,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.91 | bwd_microstep: 1440.89 | bwd_inner_microstep: 1440.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-10 10:22:19,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.34 | bwd_microstep: 1651.25 | bwd_inner_microstep: 1651.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 10:22:23,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.29 | optimizer_step: 6.62
[2024-06-10 10:22:23,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.33 | bwd_microstep: 3352.89 | bwd_inner_microstep: 1574.22 | bwd_allreduce_microstep: 1778.61 | step_microstep: 38.39
[2024-06-10 10:22:23,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15898.53 | bwd: 44295.77 | bwd_inner: 42516.13 | bwd_allreduce: 1778.89 | step: 42.45
{'loss': 1.3024, 'learning_rate': 3.151760224618413e-05, 'epoch': 0.33}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 10:22:25,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1466.15 | bwd_inner_microstep: 1465.99 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.15
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3914
[2024-06-10 10:22:27,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.01 | bwd_microstep: 1355.39 | bwd_inner_microstep: 1355.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 10:22:28,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.16 | bwd_microstep: 1195.16 | bwd_inner_microstep: 1195.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 10:22:30,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1285.24 | bwd_inner_microstep: 1285.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-10 10:22:32,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.36 | bwd_microstep: 1445.02 | bwd_inner_microstep: 1444.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2065
[2024-06-10 10:22:33,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.88 | bwd_microstep: 877.23 | bwd_inner_microstep: 877.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 10:22:35,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.77 | bwd_microstep: 1289.30 | bwd_inner_microstep: 1289.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1883
[2024-06-10 10:22:36,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.72 | bwd_microstep: 688.08 | bwd_inner_microstep: 688.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 10:22:38,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1388.72 | bwd_inner_microstep: 1388.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 10:22:40,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1251.35 | bwd_inner_microstep: 1251.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1912
[2024-06-10 10:22:41,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.01 | bwd_microstep: 713.83 | bwd_inner_microstep: 713.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3450
[2024-06-10 10:22:42,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.67 | bwd_microstep: 1195.15 | bwd_inner_microstep: 1195.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 10:22:44,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.08 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 10:22:46,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1405.31 | bwd_inner_microstep: 1405.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-10 10:22:48,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.84 | bwd_microstep: 1582.93 | bwd_inner_microstep: 1582.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 10:22:50,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.52 | bwd_microstep: 1480.76 | bwd_inner_microstep: 1480.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 10:22:52,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.01 | bwd_microstep: 1476.84 | bwd_inner_microstep: 1476.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-10 10:22:54,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1448.99 | bwd_inner_microstep: 1448.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3506
[2024-06-10 10:22:57,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.55 | bwd_microstep: 1685.17 | bwd_inner_microstep: 1685.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 10:22:59,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.85 | bwd_microstep: 1355.79 | bwd_inner_microstep: 1355.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3518
[2024-06-10 10:23:01,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.17 | bwd_microstep: 1553.35 | bwd_inner_microstep: 1553.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 10:23:02,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.06 | bwd_microstep: 1286.47 | bwd_inner_microstep: 1286.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3701
[2024-06-10 10:23:04,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1333.13 | bwd_inner_microstep: 1333.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2273
[2024-06-10 10:23:05,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.92 | bwd_microstep: 812.87 | bwd_inner_microstep: 812.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 10:23:08,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1565.98 | bwd_inner_microstep: 1565.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 10:23:09,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.50 | bwd_microstep: 1186.71 | bwd_inner_microstep: 1186.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 10:23:11,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.38 | bwd_microstep: 1536.71 | bwd_inner_microstep: 1536.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 10:23:14,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.04 | bwd_microstep: 1558.99 | bwd_inner_microstep: 1558.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3564
[2024-06-10 10:23:15,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.50 | bwd_microstep: 1205.38 | bwd_inner_microstep: 1205.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2042
[2024-06-10 10:23:17,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.84 | bwd_microstep: 1000.92 | bwd_inner_microstep: 1000.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2238
[2024-06-10 10:23:18,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.97 | bwd_microstep: 1060.15 | bwd_inner_microstep: 1060.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2035
[2024-06-10 10:23:26,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.35 | optimizer_step: 6.58
[2024-06-10 10:23:26,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.57 | bwd_microstep: 7724.95 | bwd_inner_microstep: 934.57 | bwd_allreduce_microstep: 6790.30 | step_microstep: 38.89
[2024-06-10 10:23:26,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15326.94 | bwd: 47809.57 | bwd_inner: 41018.21 | bwd_allreduce: 6790.61 | step: 40.52
{'loss': 1.2445, 'learning_rate': 3.1486896685449345e-05, 'epoch': 0.33}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2051
[2024-06-10 10:23:27,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.19 | bwd_microstep: 839.43 | bwd_inner_microstep: 839.28 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 10:23:29,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.01 | bwd_microstep: 1245.19 | bwd_inner_microstep: 1245.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3922
[2024-06-10 10:23:31,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.78 | bwd_microstep: 1696.35 | bwd_inner_microstep: 1696.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-10 10:23:34,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.55 | bwd_microstep: 1642.45 | bwd_inner_microstep: 1642.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2249
[2024-06-10 10:23:35,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.87 | bwd_microstep: 966.54 | bwd_inner_microstep: 966.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730
[2024-06-10 10:23:37,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.98 | bwd_microstep: 1530.16 | bwd_inner_microstep: 1530.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 10:23:39,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.41 | bwd_microstep: 1276.67 | bwd_inner_microstep: 1276.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 10:23:41,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.10 | bwd_microstep: 1385.04 | bwd_inner_microstep: 1385.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 10:23:43,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.38 | bwd_microstep: 1281.98 | bwd_inner_microstep: 1281.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058
[2024-06-10 10:23:44,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.69 | bwd_microstep: 817.63 | bwd_inner_microstep: 817.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3691
[2024-06-10 10:23:46,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1485.11 | bwd_inner_microstep: 1485.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3486
[2024-06-10 10:23:47,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.57 | bwd_microstep: 1224.95 | bwd_inner_microstep: 1224.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 10:23:49,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.70 | bwd_microstep: 1347.57 | bwd_inner_microstep: 1347.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 10:23:51,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.99 | bwd_microstep: 1514.45 | bwd_inner_microstep: 1514.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 10:23:53,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.12 | bwd_microstep: 1248.14 | bwd_inner_microstep: 1248.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 10:23:55,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1415.07 | bwd_inner_microstep: 1415.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 10:23:57,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.06 | bwd_microstep: 1283.29 | bwd_inner_microstep: 1283.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 10:23:59,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1416.10 | bwd_inner_microstep: 1416.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3509
[2024-06-10 10:24:01,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.18 | bwd_microstep: 1253.98 | bwd_inner_microstep: 1253.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2944
[2024-06-10 10:24:02,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.60 | bwd_microstep: 1099.27 | bwd_inner_microstep: 1099.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-10 10:24:03,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.17 | bwd_microstep: 702.74 | bwd_inner_microstep: 702.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 10:24:05,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.39 | bwd_microstep: 1557.23 | bwd_inner_microstep: 1557.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 10:24:07,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.34 | bwd_microstep: 1287.12 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-10 10:24:08,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.23 | bwd_microstep: 712.27 | bwd_inner_microstep: 712.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 10:24:10,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.02 | bwd_microstep: 1491.18 | bwd_inner_microstep: 1491.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2251
[2024-06-10 10:24:11,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.17 | bwd_microstep: 971.27 | bwd_inner_microstep: 971.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3541
[2024-06-10 10:24:13,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.63 | bwd_microstep: 1324.76 | bwd_inner_microstep: 1324.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3438
[2024-06-10 10:24:15,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.10 | bwd_microstep: 1371.58 | bwd_inner_microstep: 1371.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2087
[2024-06-10 10:24:16,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.10 | bwd_microstep: 977.23 | bwd_inner_microstep: 977.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-10 10:24:19,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1507.74 | bwd_inner_microstep: 1507.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 10:24:21,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.13 | bwd_microstep: 1544.14 | bwd_inner_microstep: 1544.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982
[2024-06-10 10:24:27,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.37 | optimizer_step: 6.59
[2024-06-10 10:24:27,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.38 | bwd_microstep: 5945.32 | bwd_inner_microstep: 937.89 | bwd_allreduce_microstep: 5007.36 | step_microstep: 39.09
[2024-06-10 10:24:27,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15115.09 | bwd: 45361.97 | bwd_inner: 40353.55 | bwd_allreduce: 5007.65 | step: 41.00
 32%|███▏      | 559/1726 [9:41:55<19:53:45, 61.38s/it]
 32%|███▏      | 560/1726 [9:42:57<19:55:54, 61.54s/it]
 33%|███▎      | 561/1726 [9:43:59<19:55:05, 61.55s/it]
 33%|███▎      | 562/1726 [9:44:59<19:48:15, 61.25s/it]
 33%|███▎      | 563/1726 [9:46:03<20:00:14, 61.92s/it]
 33%|███▎      | 564/1726 [9:47:04<19:52:57, 61.60s/it]
{'loss': 1.3033, 'learning_rate': 3.1456150667924146e-05, 'epoch': 0.33}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3420
[2024-06-10 10:24:29,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1403.47 | bwd_inner_microstep: 1403.41 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 10:24:31,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.34 | bwd_microstep: 1375.73 | bwd_inner_microstep: 1375.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 10:24:33,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1378.21 | bwd_inner_microstep: 1378.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3859
[2024-06-10 10:24:35,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.97 | bwd_microstep: 1661.30 | bwd_inner_microstep: 1661.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 10:24:37,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2909
[2024-06-10 10:24:38,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.28 | bwd_microstep: 999.36 | bwd_inner_microstep: 999.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040
[2024-06-10 10:24:39,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.64 | bwd_microstep: 813.60 | bwd_inner_microstep: 813.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904
[2024-06-10 10:24:40,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.72 | bwd_microstep: 684.13 | bwd_inner_microstep: 684.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 10:24:42,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1417.25 | bwd_inner_microstep: 1417.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 10:24:44,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.27 | bwd_microstep: 1419.93 | bwd_inner_microstep: 1419.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 10:24:46,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1353.18 | bwd_inner_microstep: 1353.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3680
[2024-06-10 10:24:48,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1491.06 | bwd_inner_microstep: 1491.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3516
[2024-06-10 10:24:50,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1521.66 | bwd_inner_microstep: 1521.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3399
[2024-06-10 10:24:52,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.92 | bwd_microstep: 1179.89 | bwd_inner_microstep: 1179.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2413
[2024-06-10 10:24:53,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.58 | bwd_microstep: 954.81 | bwd_inner_microstep: 954.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3641
[2024-06-10 10:24:55,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.80 | bwd_microstep: 1576.49 | bwd_inner_microstep: 1576.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 10:24:57,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1454.53 | bwd_inner_microstep: 1454.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-10 10:24:59,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.52 | bwd_microstep: 1606.63 | bwd_inner_microstep: 1606.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 10:25:02,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.36 | bwd_microstep: 1522.35 | bwd_inner_microstep: 1522.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2691
[2024-06-10 10:25:03,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.14 | bwd_microstep: 1097.24 | bwd_inner_microstep: 1097.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625
[2024-06-10 10:25:05,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.70 | bwd_microstep: 1312.24 | bwd_inner_microstep: 1312.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 10:25:07,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.60 | bwd_microstep: 1648.77 | bwd_inner_microstep: 1648.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3727
[2024-06-10 10:25:09,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.77 | bwd_microstep: 1402.87 | bwd_inner_microstep: 1402.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 10:25:11,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.86 | bwd_microstep: 1477.57 | bwd_inner_microstep: 1477.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629
[2024-06-10 10:25:14,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.97 | bwd_microstep: 1708.74 | bwd_inner_microstep: 1708.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 10:25:16,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.95 | bwd_microstep: 1534.87 | bwd_inner_microstep: 1534.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2019
[2024-06-10 10:25:17,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.20 | bwd_microstep: 717.56 | bwd_inner_microstep: 717.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 10:25:19,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.96 | bwd_microstep: 1662.75 | bwd_inner_microstep: 1662.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 10:25:21,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.52 | bwd_microstep: 1459.25 | bwd_inner_microstep: 1459.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 10:25:23,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.96 | bwd_microstep: 1559.11 | bwd_inner_microstep: 1559.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2055
[2024-06-10 10:25:24,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.84 | bwd_microstep: 724.67 | bwd_inner_microstep: 724.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 10:25:28,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-10 10:25:28,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.10 | bwd_microstep: 3181.05 | bwd_inner_microstep: 1421.94 | bwd_allreduce_microstep: 1759.05 | step_microstep: 38.03
[2024-06-10 10:25:28,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15912.75 | bwd: 44550.27 | bwd_inner: 42790.26 | bwd_allreduce: 1759.31 | step: 40.01
{'loss': 1.2629, 'learning_rate': 3.142536430189585e-05, 'epoch': 0.33}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 10:25:30,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1248.41 | bwd_inner_microstep: 1248.20 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 10:25:31,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.10 | bwd_microstep: 1256.08 | bwd_inner_microstep: 1256.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 10:25:33,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.37 | bwd_microstep: 1280.20 | bwd_inner_microstep: 1280.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 10:25:35,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.07 | bwd_microstep: 1481.25 | bwd_inner_microstep: 1481.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3780
[2024-06-10 10:25:37,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.33 | bwd_microstep: 1313.08 | bwd_inner_microstep: 1313.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 10:25:39,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.94 | bwd_microstep: 1410.10 | bwd_inner_microstep: 1410.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-10 10:25:41,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.93 | bwd_microstep: 1541.46 | bwd_inner_microstep: 1541.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1873
[2024-06-10 10:25:42,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.26 | bwd_microstep: 710.15 | bwd_inner_microstep: 710.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-10 10:25:44,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.46 | bwd_microstep: 1187.45 | bwd_inner_microstep: 1187.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2146
[2024-06-10 10:25:45,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.69 | bwd_microstep: 850.77 | bwd_inner_microstep: 850.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 10:25:47,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.94 | bwd_microstep: 1613.33 | bwd_inner_microstep: 1613.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-10 10:25:49,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.43 | bwd_microstep: 1613.00 | bwd_inner_microstep: 1612.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 10:25:51,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1409.44 | bwd_inner_microstep: 1409.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 10:25:53,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1346.54 | bwd_inner_microstep: 1346.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3569
[2024-06-10 10:25:55,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.23 | bwd_microstep: 1333.06 | bwd_inner_microstep: 1333.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 10:25:56,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.90 | bwd_microstep: 806.57 | bwd_inner_microstep: 806.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3836
[2024-06-10 10:25:58,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.35 | bwd_microstep: 1439.44 | bwd_inner_microstep: 1439.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635
[2024-06-10 10:26:00,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.01 | bwd_microstep: 1409.67 | bwd_inner_microstep: 1409.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 10:26:02,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.22 | bwd_microstep: 1462.54 | bwd_inner_microstep: 1462.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1993
[2024-06-10 10:26:03,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.36 | bwd_microstep: 739.74 | bwd_inner_microstep: 739.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664
[2024-06-10 10:26:05,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.34 | bwd_microstep: 1623.70 | bwd_inner_microstep: 1623.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 10:26:07,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.86 | bwd_microstep: 1254.26 | bwd_inner_microstep: 1254.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2073
[2024-06-10 10:26:08,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.34 | bwd_microstep: 976.14 | bwd_inner_microstep: 976.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2164
[2024-06-10 10:26:10,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.02 | bwd_microstep: 951.15 | bwd_inner_microstep: 951.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3718
[2024-06-10 10:26:11,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.95 | bwd_microstep: 1240.56 | bwd_inner_microstep: 1240.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-10 10:26:13,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.18 | bwd_microstep: 973.91 | bwd_inner_microstep: 973.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2153
[2024-06-10 10:26:14,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.56 | bwd_microstep: 805.17 | bwd_inner_microstep: 805.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 10:26:16,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1281.02 | bwd_inner_microstep: 1280.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3580
[2024-06-10 10:26:18,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.94 | bwd_microstep: 1528.24 | bwd_inner_microstep: 1528.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 10:26:20,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.06 | bwd_microstep: 1552.72 | bwd_inner_microstep: 1552.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599
[2024-06-10 10:26:22,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.25 | bwd_microstep: 1703.98 | bwd_inner_microstep: 1703.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589
[2024-06-10 10:26:29,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 19.55 | optimizer_gradients: 4.37 | optimizer_step: 6.63
[2024-06-10 10:26:29,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.90 | bwd_microstep: 6642.02 | bwd_inner_microstep: 1612.82 | bwd_allreduce_microstep: 5029.14 | step_microstep: 41.84
[2024-06-10 10:26:29,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15279.15 | bwd: 45985.18 | bwd_inner: 40954.96 | bwd_allreduce: 5029.46 | step: 43.58
{'loss': 1.2643, 'learning_rate': 3.139453769579387e-05, 'epoch': 0.33}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-10 10:26:31,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.12 | bwd_microstep: 1442.51 | bwd_inner_microstep: 1442.36 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3947
[2024-06-10 10:26:34,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.40 | bwd_microstep: 1694.01 | bwd_inner_microstep: 1693.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2010
[2024-06-10 10:26:35,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.37 | bwd_microstep: 739.56 | bwd_inner_microstep: 739.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3791
[2024-06-10 10:26:37,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.83 | bwd_microstep: 1348.46 | bwd_inner_microstep: 1348.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 10:26:38,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.89 | bwd_microstep: 679.32 | bwd_inner_microstep: 679.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 10:26:40,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1390.37 | bwd_inner_microstep: 1390.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 10:26:42,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.05 | bwd_microstep: 1534.22 | bwd_inner_microstep: 1534.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 10:26:44,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1386.28 | bwd_inner_microstep: 1386.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 10:26:45,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.76 | bwd_microstep: 1377.86 | bwd_inner_microstep: 1377.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 10:26:47,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.42 | bwd_microstep: 1388.04 | bwd_inner_microstep: 1388.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3610
[2024-06-10 10:26:49,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.81 | bwd_microstep: 1314.52 | bwd_inner_microstep: 1314.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 10:26:51,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.89 | bwd_microstep: 1301.61 | bwd_inner_microstep: 1301.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 10:26:53,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1348.87 | bwd_inner_microstep: 1348.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655
[2024-06-10 10:26:55,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1591.79 | bwd_inner_microstep: 1591.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3504
[2024-06-10 10:26:57,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.31 | bwd_microstep: 1516.11 | bwd_inner_microstep: 1516.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 10:26:58,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.31 | bwd_microstep: 790.11 | bwd_inner_microstep: 790.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3628
[2024-06-10 10:27:00,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.43 | bwd_microstep: 1472.28 | bwd_inner_microstep: 1472.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 10:27:02,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.04 | bwd_microstep: 1513.97 | bwd_inner_microstep: 1513.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 10:27:04,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.07 | bwd_microstep: 1353.17 | bwd_inner_microstep: 1353.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-10 10:27:06,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.54 | bwd_microstep: 1539.81 | bwd_inner_microstep: 1539.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 10:27:08,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.86 | bwd_microstep: 1351.18 | bwd_inner_microstep: 1351.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2761
[2024-06-10 10:27:10,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.80 | bwd_microstep: 1144.78 | bwd_inner_microstep: 1144.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 10:27:12,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.12 | bwd_microstep: 1514.60 | bwd_inner_microstep: 1514.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3591
[2024-06-10 10:27:14,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.27 | bwd_microstep: 1468.66 | bwd_inner_microstep: 1468.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 10:27:16,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.88 | bwd_microstep: 1376.55 | bwd_inner_microstep: 1376.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2010
[2024-06-10 10:27:17,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.68 | bwd_microstep: 863.64 | bwd_inner_microstep: 863.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3704
[2024-06-10 10:27:19,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.63 | bwd_microstep: 1625.77 | bwd_inner_microstep: 1625.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 10:27:21,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.83 | bwd_microstep: 1457.34 | bwd_inner_microstep: 1457.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530
[2024-06-10 10:27:23,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.75 | bwd_microstep: 1585.00 | bwd_inner_microstep: 1584.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 10:27:26,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.07 | bwd_microstep: 1551.17 | bwd_inner_microstep: 1551.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2596
[2024-06-10 10:27:27,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.84 | bwd_microstep: 1094.96 | bwd_inner_microstep: 1094.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 10:27:31,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.27 | optimizer_step: 6.60
[2024-06-10 10:27:31,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.16 | bwd_microstep: 3196.36 | bwd_inner_microstep: 1666.93 | bwd_allreduce_microstep: 1529.37 | step_microstep: 39.33
[2024-06-10 10:27:31,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16141.75 | bwd: 44952.92 | bwd_inner: 43422.52 | bwd_allreduce: 1529.66 | step: 41.18
{'loss': 1.265, 'learning_rate': 3.136367095818937e-05, 'epoch': 0.33}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 10:27:33,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1368.40 | bwd_inner_microstep: 1368.29 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 10:27:35,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1251.86 | bwd_inner_microstep: 1251.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 10:27:37,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.69 | bwd_microstep: 1650.29 | bwd_inner_microstep: 1650.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2228
[2024-06-10 10:27:38,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.28 | bwd_microstep: 864.91 | bwd_inner_microstep: 864.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 10:27:40,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.02 | bwd_microstep: 1286.20 | bwd_inner_microstep: 1286.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 10:27:42,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.45 | bwd_microstep: 1282.43 | bwd_inner_microstep: 1282.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 10:27:44,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.58 | bwd_microstep: 1635.04 | bwd_inner_microstep: 1635.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 10:27:45,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 793.54 | bwd_inner_microstep: 793.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2194
[2024-06-10 10:27:46,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.03 | bwd_microstep: 920.67 | bwd_inner_microstep: 920.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 10:27:48,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.89 | bwd_microstep: 1522.33 | bwd_inner_microstep: 1522.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-10 10:27:50,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.11 | bwd_microstep: 1520.24 | bwd_inner_microstep: 1520.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 10:27:52,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.02 | bwd_microstep: 1513.78 | bwd_inner_microstep: 1513.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-10 10:27:55,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.53 | bwd_microstep: 1627.45 | bwd_inner_microstep: 1627.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 10:27:57,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.82 | bwd_microstep: 1611.92 | bwd_inner_microstep: 1611.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-10 10:27:59,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.41 | bwd_microstep: 1313.34 | bwd_inner_microstep: 1313.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 10:28:01,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1515.70 | bwd_inner_microstep: 1515.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 10:28:02,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.94 | bwd_microstep: 794.47 | bwd_inner_microstep: 794.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 10:28:04,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1415.70 | bwd_inner_microstep: 1415.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3506
[2024-06-10 10:28:06,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.63 | bwd_microstep: 1319.22 | bwd_inner_microstep: 1319.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 10:28:07,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705
[2024-06-10 10:28:09,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.64 | bwd_microstep: 1433.34 | bwd_inner_microstep: 1433.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3828
[2024-06-10 10:28:12,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.83 | bwd_microstep: 1752.53 | bwd_inner_microstep: 1752.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2000
[2024-06-10 10:28:13,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.38 | bwd_microstep: 850.47 | bwd_inner_microstep: 850.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 10:28:15,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.01 | bwd_microstep: 1290.25 | bwd_inner_microstep: 1290.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3723
[2024-06-10 10:28:17,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.43 | bwd_microstep: 1339.62 | bwd_inner_microstep: 1339.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2425
[2024-06-10 10:28:18,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.19 | bwd_microstep: 1130.65 | bwd_inner_microstep: 1130.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 10:28:21,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.27 | bwd_microstep: 1649.33 | bwd_inner_microstep: 1649.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 10:28:23,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.31 | bwd_microstep: 1552.55 | bwd_inner_microstep: 1552.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 10:28:25,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.37 | bwd_microstep: 1628.61 | bwd_inner_microstep: 1628.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2634
[2024-06-10 10:28:26,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.03 | bwd_microstep: 1111.23 | bwd_inner_microstep: 1111.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3595
[2024-06-10 10:28:29,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.17 | bwd_microstep: 1565.91 | bwd_inner_microstep: 1565.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3109
[2024-06-10 10:28:35,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 10:28:35,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.98 | bwd_microstep: 5703.61 | bwd_inner_microstep: 1332.07 | bwd_allreduce_microstep: 4371.46 | step_microstep: 38.76
[2024-06-10 10:28:35,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16040.61 | bwd: 47499.71 | bwd_inner: 43127.22 | bwd_allreduce: 4371.77 | step: 40.35
{'loss': 1.3159, 'learning_rate': 3.1332764197794825e-05, 'epoch': 0.33}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-10 10:28:37,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.27 | bwd_microstep: 1333.05 | bwd_inner_microstep: 1333.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-10 10:28:38,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.15 | bwd_microstep: 795.26 | bwd_inner_microstep: 795.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904
[2024-06-10 10:28:40,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.70 | bwd_microstep: 1585.45 | bwd_inner_microstep: 1585.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 10:28:42,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.78 | bwd_microstep: 1385.96 | bwd_inner_microstep: 1385.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3403
[2024-06-10 10:28:44,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.79 | bwd_microstep: 1306.78 | bwd_inner_microstep: 1306.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2109
[2024-06-10 10:28:45,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.43 | bwd_microstep: 825.28 | bwd_inner_microstep: 825.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503
[2024-06-10 10:28:46,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.81 | bwd_microstep: 1189.30 | bwd_inner_microstep: 1189.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 10:28:48,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.21 | bwd_microstep: 1285.73 | bwd_inner_microstep: 1285.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 10:28:50,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.08 | bwd_microstep: 1286.36 | bwd_inner_microstep: 1286.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 10:28:52,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.71 | bwd_microstep: 1284.55 | bwd_inner_microstep: 1284.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3932
[2024-06-10 10:28:54,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.92 | bwd_microstep: 1494.32 | bwd_inner_microstep: 1494.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3684
[2024-06-10 10:28:56,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.85 | bwd_microstep: 1466.59 | bwd_inner_microstep: 1466.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 10:28:58,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.38 | bwd_microstep: 1484.25 | bwd_inner_microstep: 1484.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 10:29:00,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.46 | bwd_microstep: 1289.54 | bwd_inner_microstep: 1289.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-10 10:29:02,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.02 | bwd_microstep: 1539.95 | bwd_inner_microstep: 1539.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2130
[2024-06-10 10:29:03,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.85 | bwd_microstep: 893.56 | bwd_inner_microstep: 893.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2338
[2024-06-10 10:29:04,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.19 | bwd_microstep: 984.49 | bwd_inner_microstep: 984.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3649
[2024-06-10 10:29:07,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.15 | bwd_microstep: 1683.99 | bwd_inner_microstep: 1683.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 10:29:09,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.62 | bwd_microstep: 1282.83 | bwd_inner_microstep: 1282.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 10:29:10,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.49 | bwd_microstep: 1298.03 | bwd_inner_microstep: 1298.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3739
[2024-06-10 10:29:13,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.24 | bwd_microstep: 1597.16 | bwd_inner_microstep: 1597.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3668
[2024-06-10 10:29:15,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.76 | bwd_microstep: 1456.29 | bwd_inner_microstep: 1456.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 10:29:17,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.87 | bwd_microstep: 1665.18 | bwd_inner_microstep: 1665.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3584
[2024-06-10 10:29:18,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.28 | bwd_microstep: 1205.94 | bwd_inner_microstep: 1205.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 10:29:21,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.14 | bwd_microstep: 1559.39 | bwd_inner_microstep: 1559.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 10:29:23,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.50 | bwd_microstep: 1403.88 | bwd_inner_microstep: 1403.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2229
[2024-06-10 10:29:24,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.71 | bwd_microstep: 866.74 | bwd_inner_microstep: 866.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 10:29:26,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1409.22 | bwd_inner_microstep: 1409.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-10 10:29:28,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1412.85 | bwd_inner_microstep: 1412.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 10:29:30,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.77 | bwd_microstep: 1559.22 | bwd_inner_microstep: 1559.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3821
[2024-06-10 10:29:32,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1482.03 | bwd_inner_microstep: 1482.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 10:29:35,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.34 | optimizer_step: 6.61
[2024-06-10 10:29:35,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 2300.85 | bwd_inner_microstep: 1586.32 | bwd_allreduce_microstep: 714.47 | step_microstep: 39.74
[2024-06-10 10:29:35,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16012.81 | bwd: 43614.02 | bwd_inner: 42898.63 | bwd_allreduce: 714.71 | step: 41.45
{'loss': 1.2512, 'learning_rate': 3.130181752346369e-05, 'epoch': 0.33}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 10:29:37,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1471.40 | bwd_inner_microstep: 1471.24 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 10:29:39,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.42 | bwd_microstep: 1257.34 | bwd_inner_microstep: 1257.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2700
[2024-06-10 10:29:40,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.05 | bwd_microstep: 1128.99 | bwd_inner_microstep: 1128.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 10:29:42,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.27 | bwd_microstep: 1553.45 | bwd_inner_microstep: 1553.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 10:29:44,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1507.76 | bwd_inner_microstep: 1507.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 10:29:45,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.88 | bwd_microstep: 794.61 | bwd_inner_microstep: 794.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2335
[2024-06-10 10:29:47,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.03 | bwd_microstep: 985.43 | bwd_inner_microstep: 985.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3420
[2024-06-10 10:29:49,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.43 | bwd_microstep: 1310.07 | bwd_inner_microstep: 1310.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 10:29:51,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1398.04 | bwd_inner_microstep: 1398.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 10:29:52,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.79 | bwd_microstep: 1253.45 | bwd_inner_microstep: 1253.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 10:29:55,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.33 | bwd_microstep: 1628.90 | bwd_inner_microstep: 1628.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 10:29:56,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.38 | bwd_microstep: 1285.24 | bwd_inner_microstep: 1285.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3424
[2024-06-10 10:29:58,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.40 | bwd_microstep: 1403.88 | bwd_inner_microstep: 1403.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 10:30:00,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.84 | bwd_microstep: 1250.80 | bwd_inner_microstep: 1250.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 10:30:02,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.18 | bwd_microstep: 1244.52 | bwd_inner_microstep: 1244.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 10:30:04,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.36 | bwd_microstep: 1383.84 | bwd_inner_microstep: 1383.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 10:30:05,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.00 | bwd_microstep: 1285.78 | bwd_inner_microstep: 1285.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 10:30:07,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1410.12 | bwd_inner_microstep: 1410.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 10:30:09,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.83 | bwd_microstep: 1279.95 | bwd_inner_microstep: 1279.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839
[2024-06-10 10:30:11,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.01 | bwd_microstep: 1565.04 | bwd_inner_microstep: 1565.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759
[2024-06-10 10:30:14,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.63 | bwd_microstep: 1645.19 | bwd_inner_microstep: 1645.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-10 10:30:16,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1419.73 | bwd_inner_microstep: 1419.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3614
[2024-06-10 10:30:18,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.99 | bwd_microstep: 1470.65 | bwd_inner_microstep: 1470.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3834
[2024-06-10 10:30:20,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.06 | bwd_microstep: 1438.80 | bwd_inner_microstep: 1438.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 10:30:22,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.10 | bwd_microstep: 1513.24 | bwd_inner_microstep: 1513.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3544
[2024-06-10 10:30:24,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.05 | bwd_microstep: 1592.10 | bwd_inner_microstep: 1592.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2272
[2024-06-10 10:30:25,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.52 | bwd_microstep: 784.36 | bwd_inner_microstep: 784.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 10:30:27,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.07 | bwd_microstep: 1254.05 | bwd_inner_microstep: 1254.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3543
[2024-06-10 10:30:29,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.94 | bwd_microstep: 1461.55 | bwd_inner_microstep: 1461.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3579
[2024-06-10 10:30:31,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.09 | bwd_microstep: 1628.02 | bwd_inner_microstep: 1627.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 10:30:33,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.72 | bwd_microstep: 1550.89 | bwd_inner_microstep: 1550.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 10:30:36,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 10:30:36,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 2466.52 | bwd_inner_microstep: 1687.12 | bwd_allreduce_microstep: 779.35 | step_microstep: 37.83
[2024-06-10 10:30:36,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16351.64 | bwd: 44623.73 | bwd_inner: 43843.36 | bwd_allreduce: 779.64 | step: 39.86


 33%|███▎      | 564/1726 [9:47:04<19:52:57, 61.60s/it]
 33%|███▎      | 565/1726 [9:48:05<19:47:30, 61.37s/it]
 33%|███▎      | 566/1726 [9:49:06<19:47:56, 61.45s/it]
 33%|███▎      | 567/1726 [9:50:08<19:47:01, 61.45s/it]
 33%|███▎      | 568/1726 [9:51:12<20:00:07, 62.18s/it]
 33%|███▎      | 569/1726 [9:52:12<19:46:27, 61.53s/it]
{'loss': 1.2896, 'learning_rate': 3.127083104418999e-05, 'epoch': 0.33}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 10:30:38,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1492.55 | bwd_inner_microstep: 1492.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2396
[2024-06-10 10:30:39,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.18 | bwd_microstep: 843.68 | bwd_inner_microstep: 843.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 10:30:41,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.33 | bwd_microstep: 1386.76 | bwd_inner_microstep: 1386.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 10:30:43,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.71 | bwd_microstep: 1278.11 | bwd_inner_microstep: 1278.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 10:30:45,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.86 | bwd_microstep: 1402.14 | bwd_inner_microstep: 1402.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3415
[2024-06-10 10:30:47,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.67 | bwd_microstep: 1187.11 | bwd_inner_microstep: 1187.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2181
[2024-06-10 10:30:48,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.42 | bwd_microstep: 797.34 | bwd_inner_microstep: 797.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 10:30:49,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.02 | bwd_microstep: 799.81 | bwd_inner_microstep: 796.68 | bwd_allreduce_microstep: 3.01 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 10:30:50,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.61 | bwd_microstep: 791.98 | bwd_inner_microstep: 791.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 10:30:52,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.25 | bwd_microstep: 1390.27 | bwd_inner_microstep: 1390.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 10:30:54,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1348.28 | bwd_inner_microstep: 1348.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3498
[2024-06-10 10:30:55,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1239.29 | bwd_inner_microstep: 1238.41 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 10:30:57,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.83 | bwd_microstep: 1477.92 | bwd_inner_microstep: 1477.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-10 10:31:00,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.60 | bwd_microstep: 1522.06 | bwd_inner_microstep: 1522.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3081
[2024-06-10 10:31:01,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.30 | bwd_microstep: 1245.94 | bwd_inner_microstep: 1245.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3823
[2024-06-10 10:31:04,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.17 | bwd_microstep: 1819.17 | bwd_inner_microstep: 1819.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 10:31:06,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.72 | bwd_microstep: 1475.37 | bwd_inner_microstep: 1475.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 10:31:08,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.33 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3636
[2024-06-10 10:31:10,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.75 | bwd_microstep: 1708.28 | bwd_inner_microstep: 1708.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-10 10:31:12,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.14 | bwd_microstep: 1599.18 | bwd_inner_microstep: 1599.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3449
[2024-06-10 10:31:14,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1402.15 | bwd_inner_microstep: 1402.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3670
[2024-06-10 10:31:16,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.71 | bwd_microstep: 1460.99 | bwd_inner_microstep: 1460.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2983
[2024-06-10 10:31:18,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.02 | bwd_microstep: 1204.19 | bwd_inner_microstep: 1204.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3624
[2024-06-10 10:31:20,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.95 | bwd_microstep: 1474.96 | bwd_inner_microstep: 1474.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2235
[2024-06-10 10:31:21,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.34 | bwd_microstep: 869.81 | bwd_inner_microstep: 869.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 10:31:23,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.49 | bwd_microstep: 979.33 | bwd_inner_microstep: 979.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3746
[2024-06-10 10:31:24,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.10 | bwd_microstep: 1250.20 | bwd_inner_microstep: 1250.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-10 10:31:26,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.83 | bwd_microstep: 1303.32 | bwd_inner_microstep: 1303.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 10:31:28,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.40 | bwd_microstep: 1507.90 | bwd_inner_microstep: 1507.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3780
[2024-06-10 10:31:30,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.24 | bwd_microstep: 1352.22 | bwd_inner_microstep: 1352.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 10:31:32,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.60 | bwd_microstep: 1281.43 | bwd_inner_microstep: 1281.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 10:31:38,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.60
[2024-06-10 10:31:38,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.57 | bwd_microstep: 5381.51 | bwd_inner_microstep: 1636.06 | bwd_allreduce_microstep: 3745.39 | step_microstep: 38.77
[2024-06-10 10:31:38,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15678.00 | bwd: 45762.45 | bwd_inner: 42012.91 | bwd_allreduce: 3748.76 | step: 40.57
{'loss': 1.2838, 'learning_rate': 3.1239804869107943e-05, 'epoch': 0.33}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 10:31:40,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.05 | bwd_microstep: 1141.55 | bwd_inner_microstep: 1141.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3894
[2024-06-10 10:31:42,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.68 | bwd_microstep: 1580.96 | bwd_inner_microstep: 1580.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 10:31:43,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.45 | bwd_microstep: 1152.94 | bwd_inner_microstep: 1152.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 10:31:45,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1486.07 | bwd_inner_microstep: 1486.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 10:31:47,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.99 | bwd_microstep: 1249.57 | bwd_inner_microstep: 1249.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 10:31:49,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.46 | bwd_microstep: 1343.76 | bwd_inner_microstep: 1343.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3738
[2024-06-10 10:31:51,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.43 | bwd_microstep: 1632.59 | bwd_inner_microstep: 1632.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 10:31:53,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1343.46 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 10:31:55,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.18 | bwd_microstep: 1251.73 | bwd_inner_microstep: 1251.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 10:31:57,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1379.48 | bwd_inner_microstep: 1379.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 10:31:59,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.99 | bwd_microstep: 1520.27 | bwd_inner_microstep: 1520.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4117
[2024-06-10 10:32:01,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.77 | bwd_microstep: 1534.83 | bwd_inner_microstep: 1534.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 10:32:03,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.40 | bwd_microstep: 1252.68 | bwd_inner_microstep: 1252.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2147
[2024-06-10 10:32:04,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.92 | bwd_microstep: 909.10 | bwd_inner_microstep: 909.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3654
[2024-06-10 10:32:06,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1456.12 | bwd_inner_microstep: 1456.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825
[2024-06-10 10:32:08,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.30 | bwd_microstep: 1586.62 | bwd_inner_microstep: 1586.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3691
[2024-06-10 10:32:10,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.09 | bwd_microstep: 1726.64 | bwd_inner_microstep: 1726.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3630
[2024-06-10 10:32:12,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.52 | bwd_microstep: 1250.95 | bwd_inner_microstep: 1250.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 10:32:14,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.29 | bwd_microstep: 1284.73 | bwd_inner_microstep: 1284.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505
[2024-06-10 10:32:16,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.68 | bwd_microstep: 1320.19 | bwd_inner_microstep: 1320.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 10:32:18,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1408.87 | bwd_inner_microstep: 1408.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 10:32:20,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.18 | bwd_microstep: 1357.54 | bwd_inner_microstep: 1357.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2121
[2024-06-10 10:32:21,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.38 | bwd_microstep: 831.41 | bwd_inner_microstep: 831.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 10:32:23,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1380.11 | bwd_inner_microstep: 1380.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2750
[2024-06-10 10:32:24,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.38 | bwd_microstep: 1043.99 | bwd_inner_microstep: 1043.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 10:32:26,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1380.30 | bwd_inner_microstep: 1380.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 10:32:28,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.21 | bwd_microstep: 1604.38 | bwd_inner_microstep: 1604.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 10:32:30,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1393.73 | bwd_inner_microstep: 1393.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 10:32:32,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2054
[2024-06-10 10:32:33,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.23 | bwd_microstep: 881.17 | bwd_inner_microstep: 881.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 10:32:36,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.62 | bwd_microstep: 1604.02 | bwd_inner_microstep: 1603.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 10:32:40,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 10:32:40,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.32 | bwd_microstep: 3729.43 | bwd_inner_microstep: 1577.64 | bwd_allreduce_microstep: 2151.74 | step_microstep: 38.25
[2024-06-10 10:32:40,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16162.44 | bwd: 45409.31 | bwd_inner: 43256.67 | bwd_allreduce: 2151.97 | step: 39.87
{'loss': 1.2543, 'learning_rate': 3.1208739107491576e-05, 'epoch': 0.33}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 10:32:42,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.59 | bwd_microstep: 1332.88 | bwd_inner_microstep: 1332.75 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 10:32:43,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 1245.97 | bwd_inner_microstep: 1245.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3859
[2024-06-10 10:32:45,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.65 | bwd_microstep: 1458.37 | bwd_inner_microstep: 1458.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851
[2024-06-10 10:32:48,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.05 | bwd_microstep: 1560.57 | bwd_inner_microstep: 1560.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 10:32:49,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1278.48 | bwd_inner_microstep: 1278.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 10:32:51,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.15 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-10 10:32:53,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.98 | bwd_microstep: 1309.71 | bwd_inner_microstep: 1309.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 10:32:55,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.02 | bwd_microstep: 1287.29 | bwd_inner_microstep: 1287.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 10:32:57,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1288.22 | bwd_inner_microstep: 1288.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3684
[2024-06-10 10:32:58,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.49 | bwd_microstep: 1327.29 | bwd_inner_microstep: 1327.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-10 10:33:00,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.06 | bwd_microstep: 1522.29 | bwd_inner_microstep: 1522.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3680
[2024-06-10 10:33:02,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.30 | bwd_microstep: 1324.86 | bwd_inner_microstep: 1324.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2123
[2024-06-10 10:33:03,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.42 | bwd_microstep: 859.59 | bwd_inner_microstep: 859.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1887
[2024-06-10 10:33:04,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.67 | bwd_microstep: 684.26 | bwd_inner_microstep: 684.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 10:33:06,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.12 | bwd_microstep: 1285.22 | bwd_inner_microstep: 1285.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 10:33:08,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1382.75 | bwd_inner_microstep: 1382.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 10:33:10,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1398.03 | bwd_inner_microstep: 1398.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 10:33:12,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.83 | bwd_microstep: 1554.41 | bwd_inner_microstep: 1554.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 10:33:14,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.66 | bwd_microstep: 1455.51 | bwd_inner_microstep: 1455.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3834
[2024-06-10 10:33:16,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.47 | bwd_microstep: 1480.91 | bwd_inner_microstep: 1480.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004
[2024-06-10 10:33:17,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.39 | bwd_microstep: 711.57 | bwd_inner_microstep: 711.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 10:33:19,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 1493.07 | bwd_inner_microstep: 1493.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 10:33:20,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.21 | bwd_microstep: 803.24 | bwd_inner_microstep: 803.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2075
[2024-06-10 10:33:22,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.66 | bwd_microstep: 918.52 | bwd_inner_microstep: 918.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 10:33:24,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.44 | bwd_microstep: 1658.45 | bwd_inner_microstep: 1658.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1907
[2024-06-10 10:33:25,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.86 | bwd_microstep: 685.69 | bwd_inner_microstep: 685.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 10:33:26,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.74 | bwd_microstep: 881.34 | bwd_inner_microstep: 881.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 10:33:27,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.24 | bwd_microstep: 813.34 | bwd_inner_microstep: 813.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 10:33:29,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.91 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 10:33:31,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.61 | bwd_microstep: 1349.76 | bwd_inner_microstep: 1349.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 10:33:33,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1593.28 | bwd_inner_microstep: 1593.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3798
[2024-06-10 10:34:17,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-10 10:34:17,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.70 | bwd_microstep: 43192.42 | bwd_inner_microstep: 1873.57 | bwd_allreduce_microstep: 41318.78 | step_microstep: 39.09
[2024-06-10 10:34:17,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15089.02 | bwd: 81704.15 | bwd_inner: 40384.32 | bwd_allreduce: 41319.05 | step: 40.83
{'loss': 1.2753, 'learning_rate': 3.117763386875435e-05, 'epoch': 0.33}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3470
[2024-06-10 10:34:19,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.01 | bwd_microstep: 1496.88 | bwd_inner_microstep: 1496.80 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 10:34:21,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.63 | bwd_microstep: 1473.90 | bwd_inner_microstep: 1473.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3938
[2024-06-10 10:34:23,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.92 | bwd_microstep: 1484.09 | bwd_inner_microstep: 1484.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 10:34:25,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.91 | bwd_microstep: 1338.35 | bwd_inner_microstep: 1338.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 10:34:27,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.17 | bwd_microstep: 1338.29 | bwd_inner_microstep: 1338.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3785
[2024-06-10 10:34:29,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.31 | bwd_microstep: 1743.54 | bwd_inner_microstep: 1743.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 10:34:31,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.06 | bwd_microstep: 1278.89 | bwd_inner_microstep: 1278.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3389
[2024-06-10 10:34:33,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.33 | bwd_microstep: 1175.51 | bwd_inner_microstep: 1175.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 10:34:34,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1286.17 | bwd_inner_microstep: 1286.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1906
[2024-06-10 10:34:36,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.96 | bwd_microstep: 810.03 | bwd_inner_microstep: 810.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3713
[2024-06-10 10:34:38,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.47 | bwd_microstep: 1657.16 | bwd_inner_microstep: 1657.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3607
[2024-06-10 10:34:40,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.90 | bwd_microstep: 1454.23 | bwd_inner_microstep: 1454.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1954
[2024-06-10 10:34:41,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.40 | bwd_microstep: 826.16 | bwd_inner_microstep: 826.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2111
[2024-06-10 10:34:42,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.85 | bwd_microstep: 801.28 | bwd_inner_microstep: 801.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955
[2024-06-10 10:34:43,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.49 | bwd_microstep: 827.68 | bwd_inner_microstep: 827.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-10 10:34:44,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.64 | bwd_microstep: 797.57 | bwd_inner_microstep: 797.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3850
[2024-06-10 10:34:47,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 680.70 | bwd_microstep: 1864.86 | bwd_inner_microstep: 1864.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3376
[2024-06-10 10:34:49,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.83 | bwd_microstep: 1336.23 | bwd_inner_microstep: 1336.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3681
[2024-06-10 10:34:52,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.67 | bwd_microstep: 2543.48 | bwd_inner_microstep: 2543.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3659
[2024-06-10 10:34:54,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.19 | bwd_microstep: 1355.36 | bwd_inner_microstep: 1355.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519
[2024-06-10 10:34:56,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.04 | bwd_microstep: 1322.27 | bwd_inner_microstep: 1322.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791
[2024-06-10 10:34:58,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.95 | bwd_microstep: 1650.55 | bwd_inner_microstep: 1650.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 10:34:59,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.34 | bwd_microstep: 699.27 | bwd_inner_microstep: 699.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3547
[2024-06-10 10:35:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.13 | bwd_microstep: 1421.65 | bwd_inner_microstep: 1421.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 10:35:03,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.21 | bwd_microstep: 1651.79 | bwd_inner_microstep: 1651.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-10 10:35:05,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.49 | bwd_microstep: 1298.48 | bwd_inner_microstep: 1298.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 10:35:07,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1282.24 | bwd_inner_microstep: 1282.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3607
[2024-06-10 10:35:09,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.43 | bwd_microstep: 1429.84 | bwd_inner_microstep: 1429.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 10:35:10,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.00 | bwd_microstep: 1257.35 | bwd_inner_microstep: 1257.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3766
[2024-06-10 10:35:13,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.11 | bwd_microstep: 1608.55 | bwd_inner_microstep: 1608.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 10:35:15,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.59 | bwd_microstep: 1394.02 | bwd_inner_microstep: 1394.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 10:35:20,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.41 | optimizer_step: 6.61
[2024-06-10 10:35:20,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.97 | bwd_microstep: 4837.04 | bwd_inner_microstep: 1830.83 | bwd_allreduce_microstep: 3006.13 | step_microstep: 40.28
[2024-06-10 10:35:20,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15894.94 | bwd: 46742.78 | bwd_inner: 43735.54 | bwd_allreduce: 3006.44 | step: 42.12
{'loss': 1.3094, 'learning_rate': 3.114648926244873e-05, 'epoch': 0.33}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 10:35:22,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.37 | bwd_microstep: 1443.07 | bwd_inner_microstep: 1442.95 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 10:35:24,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.48 | bwd_microstep: 1257.51 | bwd_inner_microstep: 1257.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796
[2024-06-10 10:35:26,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.22 | bwd_microstep: 1644.55 | bwd_inner_microstep: 1644.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 10:35:28,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1399.30 | bwd_inner_microstep: 1399.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 10:35:30,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.37 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 10:35:32,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1381.12 | bwd_inner_microstep: 1381.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 10:35:33,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.63 | bwd_microstep: 789.62 | bwd_inner_microstep: 789.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750
[2024-06-10 10:35:35,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.39 | bwd_microstep: 1543.17 | bwd_inner_microstep: 1543.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1870
[2024-06-10 10:35:36,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.32 | bwd_microstep: 711.47 | bwd_inner_microstep: 711.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 10:35:38,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.46 | bwd_microstep: 1486.49 | bwd_inner_microstep: 1486.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3407
[2024-06-10 10:35:40,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.15 | bwd_microstep: 1402.18 | bwd_inner_microstep: 1402.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989
[2024-06-10 10:35:41,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.13 | bwd_microstep: 898.39 | bwd_inner_microstep: 898.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2163
[2024-06-10 10:35:42,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.39 | bwd_microstep: 856.61 | bwd_inner_microstep: 856.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-10 10:35:44,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1524.34 | bwd_inner_microstep: 1524.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 10:35:46,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.02 | bwd_microstep: 1298.70 | bwd_inner_microstep: 1298.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 10:35:48,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.91 | bwd_microstep: 1523.12 | bwd_inner_microstep: 1523.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 10:35:51,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1557.92 | bwd_inner_microstep: 1557.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835
[2024-06-10 10:35:53,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.53 | bwd_microstep: 1454.67 | bwd_inner_microstep: 1454.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 10:35:54,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1410.02 | bwd_inner_microstep: 1409.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 10:35:57,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1514.17 | bwd_inner_microstep: 1514.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-10 10:35:58,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.33 | bwd_microstep: 1302.76 | bwd_inner_microstep: 1302.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 10:36:01,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.14 | bwd_microstep: 1555.89 | bwd_inner_microstep: 1555.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2534
[2024-06-10 10:36:02,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.71 | bwd_microstep: 963.74 | bwd_inner_microstep: 963.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3811
[2024-06-10 10:36:04,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.78 | bwd_microstep: 1717.07 | bwd_inner_microstep: 1717.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 10:36:06,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1246.52 | bwd_inner_microstep: 1246.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 10:36:08,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.61 | bwd_microstep: 1647.74 | bwd_inner_microstep: 1647.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2064
[2024-06-10 10:36:10,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.08 | bwd_microstep: 1009.75 | bwd_inner_microstep: 1009.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 10:36:11,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.36 | bwd_microstep: 1302.52 | bwd_inner_microstep: 1302.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3769
[2024-06-10 10:36:13,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.90 | bwd_microstep: 1495.14 | bwd_inner_microstep: 1495.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 10:36:15,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1412.37 | bwd_inner_microstep: 1412.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 10:36:17,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.93 | bwd_microstep: 1285.62 | bwd_inner_microstep: 1285.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 10:36:20,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.30 | optimizer_step: 6.61
[2024-06-10 10:36:20,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.96 | bwd_microstep: 2600.01 | bwd_inner_microstep: 1842.36 | bwd_allreduce_microstep: 757.59 | step_microstep: 38.42
[2024-06-10 10:36:20,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16097.32 | bwd: 44017.62 | bwd_inner: 43259.00 | bwd_allreduce: 757.88 | step: 40.02
 33%|███▎      | 570/1726 [9:53:13<19:44:22, 61.47s/it]
 33%|███▎      | 571/1726 [9:54:15<19:45:13, 61.57s/it]
 33%|███▎      | 572/1726 [9:55:17<19:46:14, 61.68s/it]
 33%|███▎      | 573/1726 [9:56:54<23:09:41, 72.32s/it]
 33%|███▎      | 574/1726 [9:57:57<22:14:47, 69.52s/it]
 33%|███▎      | 575/1726 [9:58:57<21:21:36, 66.81s/it]
{'loss': 1.2294, 'learning_rate': 3.111530539826588e-05, 'epoch': 0.33}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-10 10:36:22,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.54 | bwd_microstep: 1444.10 | bwd_inner_microstep: 1444.01 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 10:36:24,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1282.98 | bwd_inner_microstep: 1282.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 10:36:26,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.51 | bwd_microstep: 1379.03 | bwd_inner_microstep: 1379.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 10:36:28,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.77 | bwd_microstep: 1552.22 | bwd_inner_microstep: 1552.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3788
[2024-06-10 10:36:30,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1382.59 | bwd_inner_microstep: 1382.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 10:36:32,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1287.11 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 10:36:34,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1384.76 | bwd_inner_microstep: 1384.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 10:36:36,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1246.93 | bwd_inner_microstep: 1246.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-10 10:36:38,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1441.21 | bwd_inner_microstep: 1441.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701
[2024-06-10 10:36:40,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.55 | bwd_microstep: 1628.81 | bwd_inner_microstep: 1628.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3495
[2024-06-10 10:36:42,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.91 | bwd_microstep: 1545.73 | bwd_inner_microstep: 1545.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3502
[2024-06-10 10:36:44,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.23 | bwd_microstep: 1679.95 | bwd_inner_microstep: 1679.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3481
[2024-06-10 10:36:46,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.40 | bwd_microstep: 1508.85 | bwd_inner_microstep: 1508.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1973
[2024-06-10 10:36:48,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.75 | bwd_microstep: 833.62 | bwd_inner_microstep: 833.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 10:36:50,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.85 | bwd_microstep: 1592.59 | bwd_inner_microstep: 1592.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-10 10:36:52,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.26 | bwd_microstep: 1424.05 | bwd_inner_microstep: 1424.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 10:36:54,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.99 | bwd_microstep: 1399.22 | bwd_inner_microstep: 1399.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701
[2024-06-10 10:36:56,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.08 | bwd_microstep: 1626.61 | bwd_inner_microstep: 1626.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 10:36:58,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.21 | bwd_microstep: 1526.53 | bwd_inner_microstep: 1526.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 10:37:00,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.79 | bwd_microstep: 1440.57 | bwd_inner_microstep: 1440.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 10:37:02,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.31 | bwd_microstep: 1560.69 | bwd_inner_microstep: 1560.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673
[2024-06-10 10:37:04,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.39 | bwd_microstep: 1326.54 | bwd_inner_microstep: 1326.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001
[2024-06-10 10:37:05,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.80 | bwd_microstep: 808.50 | bwd_inner_microstep: 808.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3534
[2024-06-10 10:37:07,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.33 | bwd_microstep: 1519.18 | bwd_inner_microstep: 1519.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 10:37:08,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.38 | bwd_microstep: 978.42 | bwd_inner_microstep: 978.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3553
[2024-06-10 10:37:10,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1368.34 | bwd_inner_microstep: 1368.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900
[2024-06-10 10:37:11,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.54 | bwd_microstep: 685.99 | bwd_inner_microstep: 685.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 10:37:13,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.35 | bwd_microstep: 1297.33 | bwd_inner_microstep: 1297.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 10:37:15,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.22 | bwd_microstep: 1511.16 | bwd_inner_microstep: 1511.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 10:37:18,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.50 | bwd_microstep: 1656.78 | bwd_inner_microstep: 1656.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 10:37:20,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.52 | bwd_microstep: 1504.50 | bwd_inner_microstep: 1504.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 10:37:23,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.29 | optimizer_step: 6.58
[2024-06-10 10:37:23,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 3273.04 | bwd_inner_microstep: 1473.17 | bwd_allreduce_microstep: 1799.81 | step_microstep: 39.07
[2024-06-10 10:37:23,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16497.46 | bwd: 46097.99 | bwd_inner: 44297.19 | bwd_allreduce: 1800.09 | step: 40.60
{'loss': 1.2599, 'learning_rate': 3.10840823860352e-05, 'epoch': 0.33}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4331
[2024-06-10 10:37:26,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.46 | bwd_microstep: 1684.86 | bwd_inner_microstep: 1684.71 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3877
[2024-06-10 10:37:28,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.61 | bwd_microstep: 1680.41 | bwd_inner_microstep: 1680.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 10:37:30,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.23 | bwd_microstep: 1249.65 | bwd_inner_microstep: 1249.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1901
[2024-06-10 10:37:31,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.68 | bwd_microstep: 780.02 | bwd_inner_microstep: 779.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3494
[2024-06-10 10:37:33,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1444.99 | bwd_inner_microstep: 1444.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 10:37:35,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.81 | bwd_microstep: 1385.44 | bwd_inner_microstep: 1385.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 10:37:37,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.58 | bwd_microstep: 1379.97 | bwd_inner_microstep: 1379.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 10:37:39,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.63 | bwd_microstep: 1531.51 | bwd_inner_microstep: 1531.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 10:37:41,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.47 | bwd_microstep: 1251.88 | bwd_inner_microstep: 1251.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3522
[2024-06-10 10:37:43,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.50 | bwd_microstep: 1442.35 | bwd_inner_microstep: 1442.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3504
[2024-06-10 10:37:45,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.35 | bwd_microstep: 1536.95 | bwd_inner_microstep: 1536.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 10:37:47,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.94 | bwd_microstep: 1602.13 | bwd_inner_microstep: 1602.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461
[2024-06-10 10:37:49,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.07 | bwd_microstep: 1570.93 | bwd_inner_microstep: 1570.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 10:37:51,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.58 | bwd_microstep: 1344.91 | bwd_inner_microstep: 1344.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 10:37:53,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.29 | bwd_microstep: 1373.86 | bwd_inner_microstep: 1373.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 10:37:55,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.82 | bwd_microstep: 1380.62 | bwd_inner_microstep: 1380.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 10:37:57,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.52 | bwd_microstep: 1393.05 | bwd_inner_microstep: 1393.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 10:37:59,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.70 | bwd_microstep: 1515.99 | bwd_inner_microstep: 1515.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 10:38:01,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.41 | bwd_microstep: 1557.35 | bwd_inner_microstep: 1557.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 10:38:03,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.09 | bwd_microstep: 1401.91 | bwd_inner_microstep: 1401.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1985
[2024-06-10 10:38:04,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.02 | bwd_microstep: 833.56 | bwd_inner_microstep: 833.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 10:38:06,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.90 | bwd_microstep: 1427.36 | bwd_inner_microstep: 1427.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 10:38:08,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1397.39 | bwd_inner_microstep: 1397.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 10:38:10,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.59 | bwd_microstep: 1255.39 | bwd_inner_microstep: 1255.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3437
[2024-06-10 10:38:11,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.84 | bwd_microstep: 1190.07 | bwd_inner_microstep: 1190.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 10:38:13,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.91 | bwd_microstep: 1304.65 | bwd_inner_microstep: 1304.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 10:38:15,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.23 | bwd_microstep: 1501.89 | bwd_inner_microstep: 1501.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 10:38:17,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.47 | bwd_microstep: 1496.01 | bwd_inner_microstep: 1495.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3552
[2024-06-10 10:38:20,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.62 | bwd_microstep: 1694.64 | bwd_inner_microstep: 1694.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 10:38:22,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1558.24 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 10:38:24,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1495.75 | bwd_inner_microstep: 1495.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-10 10:38:26,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.26 | optimizer_step: 6.66
[2024-06-10 10:38:26,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.16 | bwd_microstep: 1687.39 | bwd_inner_microstep: 1679.31 | bwd_allreduce_microstep: 8.03 | step_microstep: 38.77
[2024-06-10 10:38:26,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16946.71 | bwd: 45351.16 | bwd_inner: 45342.09 | bwd_allreduce: 8.33 | step: 40.61
{'loss': 1.272, 'learning_rate': 3.105282033572398e-05, 'epoch': 0.33}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486
[2024-06-10 10:38:28,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.52 | bwd_microstep: 1577.29 | bwd_inner_microstep: 1577.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 10:38:30,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.13 | bwd_microstep: 1412.19 | bwd_inner_microstep: 1412.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3880
[2024-06-10 10:38:32,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1385.20 | bwd_inner_microstep: 1385.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 10:38:34,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.99 | bwd_microstep: 1481.32 | bwd_inner_microstep: 1481.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 10:38:36,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.17 | bwd_microstep: 1386.58 | bwd_inner_microstep: 1386.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2256
[2024-06-10 10:38:37,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.11 | bwd_microstep: 871.98 | bwd_inner_microstep: 871.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-10 10:38:38,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.04 | bwd_microstep: 684.58 | bwd_inner_microstep: 684.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 10:38:40,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.48 | bwd_microstep: 1255.06 | bwd_inner_microstep: 1255.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951
[2024-06-10 10:38:41,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.88 | bwd_microstep: 728.69 | bwd_inner_microstep: 728.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3685
[2024-06-10 10:38:43,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.74 | bwd_microstep: 1457.81 | bwd_inner_microstep: 1457.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4026
[2024-06-10 10:38:45,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.89 | bwd_microstep: 1619.98 | bwd_inner_microstep: 1619.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 10:38:47,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.84 | bwd_microstep: 1441.35 | bwd_inner_microstep: 1441.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3536
[2024-06-10 10:38:49,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.64 | bwd_microstep: 1562.05 | bwd_inner_microstep: 1562.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2132
[2024-06-10 10:38:51,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.83 | bwd_microstep: 1027.36 | bwd_inner_microstep: 1027.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 10:38:53,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.78 | bwd_microstep: 1604.03 | bwd_inner_microstep: 1604.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-10 10:38:55,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.46 | bwd_microstep: 1588.38 | bwd_inner_microstep: 1588.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 10:38:57,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.49 | bwd_microstep: 1289.03 | bwd_inner_microstep: 1289.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 10:38:59,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1254.96 | bwd_inner_microstep: 1254.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 10:39:01,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.33 | bwd_microstep: 1661.31 | bwd_inner_microstep: 1661.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 10:39:03,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.21 | bwd_microstep: 1260.43 | bwd_inner_microstep: 1260.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3817
[2024-06-10 10:39:05,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.75 | bwd_microstep: 1510.66 | bwd_inner_microstep: 1510.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3159
[2024-06-10 10:39:07,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1265.36 | bwd_inner_microstep: 1265.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 10:39:09,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1408.43 | bwd_inner_microstep: 1408.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2021
[2024-06-10 10:39:10,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.22 | bwd_microstep: 905.37 | bwd_inner_microstep: 905.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3602
[2024-06-10 10:39:12,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.06 | bwd_microstep: 1456.58 | bwd_inner_microstep: 1456.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.26
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 10:39:14,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1508.36 | bwd_inner_microstep: 1508.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 10:39:16,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1397.01 | bwd_inner_microstep: 1396.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3730
[2024-06-10 10:39:18,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.69 | bwd_microstep: 1371.34 | bwd_inner_microstep: 1371.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-10 10:39:20,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.18 | bwd_microstep: 1433.56 | bwd_inner_microstep: 1433.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 10:39:22,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1376.52 | bwd_inner_microstep: 1376.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3728
[2024-06-10 10:39:23,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.20 | bwd_microstep: 1337.04 | bwd_inner_microstep: 1337.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 10:39:26,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-10 10:39:26,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.66 | bwd_microstep: 1972.60 | bwd_inner_microstep: 1622.75 | bwd_allreduce_microstep: 349.80 | step_microstep: 38.96
[2024-06-10 10:39:26,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16122.28 | bwd: 43492.43 | bwd_inner: 43141.71 | bwd_allreduce: 350.03 | step: 41.80
{'loss': 1.2183, 'learning_rate': 3.1021519357436994e-05, 'epoch': 0.33}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 10:39:28,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.21 | bwd_microstep: 1590.66 | bwd_inner_microstep: 1590.57 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3935
[2024-06-10 10:39:30,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.80 | bwd_microstep: 1594.95 | bwd_inner_microstep: 1594.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 10:39:32,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1243.90 | bwd_inner_microstep: 1243.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3857
[2024-06-10 10:39:34,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.76 | bwd_microstep: 1663.85 | bwd_inner_microstep: 1663.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-10 10:39:37,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.23 | bwd_microstep: 1638.47 | bwd_inner_microstep: 1638.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 10:39:39,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.32 | bwd_microstep: 1346.35 | bwd_inner_microstep: 1346.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3466
[2024-06-10 10:39:40,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.35 | bwd_microstep: 1342.78 | bwd_inner_microstep: 1342.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 10:39:42,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.94 | bwd_microstep: 1284.46 | bwd_inner_microstep: 1284.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1977
[2024-06-10 10:39:43,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.73 | bwd_microstep: 734.64 | bwd_inner_microstep: 734.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-10 10:39:44,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.64 | bwd_microstep: 688.28 | bwd_inner_microstep: 688.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436
[2024-06-10 10:39:46,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.40 | bwd_microstep: 1158.04 | bwd_inner_microstep: 1158.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3485
[2024-06-10 10:39:48,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.46 | bwd_microstep: 1220.12 | bwd_inner_microstep: 1220.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 10:39:49,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.84 | bwd_microstep: 1280.37 | bwd_inner_microstep: 1280.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 10:39:52,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.98 | bwd_microstep: 1616.54 | bwd_inner_microstep: 1616.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 10:39:53,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.55 | bwd_microstep: 1342.81 | bwd_inner_microstep: 1342.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3539
[2024-06-10 10:39:56,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.36 | bwd_microstep: 1657.25 | bwd_inner_microstep: 1657.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 10:39:57,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.76 | bwd_microstep: 800.78 | bwd_inner_microstep: 800.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 10:39:59,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.96 | bwd_microstep: 1296.54 | bwd_inner_microstep: 1296.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 10:40:00,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1384.41 | bwd_inner_microstep: 1384.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 10:40:02,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.64 | bwd_microstep: 974.07 | bwd_inner_microstep: 974.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 10:40:04,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.68 | bwd_microstep: 1452.33 | bwd_inner_microstep: 1452.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-10 10:40:05,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1156.59 | bwd_inner_microstep: 1156.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 10:40:07,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.91 | bwd_microstep: 1254.09 | bwd_inner_microstep: 1254.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 10:40:09,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1556.24 | bwd_inner_microstep: 1556.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 10:40:11,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1499.50 | bwd_inner_microstep: 1499.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2274
[2024-06-10 10:40:13,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.59 | bwd_microstep: 936.50 | bwd_inner_microstep: 936.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3612
[2024-06-10 10:40:15,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.57 | bwd_microstep: 1460.59 | bwd_inner_microstep: 1460.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4059
[2024-06-10 10:40:17,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.98 | bwd_microstep: 1629.32 | bwd_inner_microstep: 1629.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3541
[2024-06-10 10:40:19,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.07 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 10:40:21,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.39 | bwd_microstep: 1447.09 | bwd_inner_microstep: 1447.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 10:40:23,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1489.99 | bwd_inner_microstep: 1489.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3765
[2024-06-10 10:40:30,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-10 10:40:30,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.77 | bwd_microstep: 5706.16 | bwd_inner_microstep: 1990.69 | bwd_allreduce_microstep: 3715.42 | step_microstep: 38.59
[2024-06-10 10:40:30,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16129.64 | bwd: 47005.43 | bwd_inner: 43289.02 | bwd_allreduce: 3715.70 | step: 40.29
{'loss': 1.292, 'learning_rate': 3.0990179561416124e-05, 'epoch': 0.34}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 10:40:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.94 | bwd_microstep: 1464.94 | bwd_inner_microstep: 1464.78 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 10:40:33,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.47 | bwd_microstep: 1243.91 | bwd_inner_microstep: 1243.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2305
[2024-06-10 10:40:35,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.12 | bwd_microstep: 882.17 | bwd_inner_microstep: 882.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 10:40:36,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1375.43 | bwd_inner_microstep: 1375.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4162
[2024-06-10 10:40:39,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.43 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 10:40:40,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.81 | bwd_microstep: 1278.33 | bwd_inner_microstep: 1278.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721
[2024-06-10 10:40:42,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1331.17 | bwd_inner_microstep: 1331.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 10:40:43,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.10 | bwd_microstep: 677.90 | bwd_inner_microstep: 677.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 10:40:45,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.21 | bwd_microstep: 1382.53 | bwd_inner_microstep: 1382.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678
[2024-06-10 10:40:47,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.02 | bwd_microstep: 1446.09 | bwd_inner_microstep: 1446.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 10:40:49,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.05 | bwd_microstep: 1518.87 | bwd_inner_microstep: 1518.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2093
[2024-06-10 10:40:50,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.17 | bwd_microstep: 819.00 | bwd_inner_microstep: 818.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009
[2024-06-10 10:40:52,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.05 | bwd_microstep: 899.82 | bwd_inner_microstep: 899.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3646
[2024-06-10 10:40:54,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.11 | bwd_microstep: 1456.24 | bwd_inner_microstep: 1456.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 10:40:56,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1378.63 | bwd_inner_microstep: 1378.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825
[2024-06-10 10:40:58,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.55 | bwd_microstep: 1583.14 | bwd_inner_microstep: 1583.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 10:41:00,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.43 | bwd_microstep: 1411.09 | bwd_inner_microstep: 1411.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 10:41:02,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1504.35 | bwd_inner_microstep: 1504.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621
[2024-06-10 10:41:04,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1313.42 | bwd_inner_microstep: 1313.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 10:41:05,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.28 | bwd_microstep: 1357.12 | bwd_inner_microstep: 1357.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 10:41:07,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1400.54 | bwd_inner_microstep: 1400.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3582
[2024-06-10 10:41:09,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.14 | bwd_microstep: 1333.90 | bwd_inner_microstep: 1333.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3631
[2024-06-10 10:41:11,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.60 | bwd_microstep: 1449.08 | bwd_inner_microstep: 1449.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3643
[2024-06-10 10:41:13,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.44 | bwd_microstep: 1478.11 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2151
[2024-06-10 10:41:14,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.09 | bwd_microstep: 851.09 | bwd_inner_microstep: 851.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 10:41:17,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.86 | bwd_microstep: 1508.19 | bwd_inner_microstep: 1508.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3826
[2024-06-10 10:41:18,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.21 | bwd_microstep: 1261.31 | bwd_inner_microstep: 1261.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3819
[2024-06-10 10:41:20,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.32 | bwd_microstep: 1498.45 | bwd_inner_microstep: 1498.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 10:41:22,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.93 | bwd_microstep: 1466.85 | bwd_inner_microstep: 1466.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-10 10:41:24,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.07 | bwd_microstep: 1451.58 | bwd_inner_microstep: 1451.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809
[2024-06-10 10:41:27,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.05 | bwd_microstep: 1751.63 | bwd_inner_microstep: 1751.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3435
[2024-06-10 10:41:29,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-10 10:41:29,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.50 | bwd_microstep: 1922.36 | bwd_inner_microstep: 1467.99 | bwd_allreduce_microstep: 454.32 | step_microstep: 37.78
[2024-06-10 10:41:29,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16037.32 | bwd: 43343.94 | bwd_inner: 42888.60 | bwd_allreduce: 454.60 | step: 39.32
{'loss': 1.2924, 'learning_rate': 3.095880105803997e-05, 'epoch': 0.34}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 10:41:31,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.24 | bwd_microstep: 1444.72 | bwd_inner_microstep: 1444.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 10:41:33,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.03 | bwd_microstep: 1283.02 | bwd_inner_microstep: 1282.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 10:41:35,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.02 | bwd_microstep: 1650.35 | bwd_inner_microstep: 1650.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3847
[2024-06-10 10:41:38,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.55 | bwd_microstep: 1659.98 | bwd_inner_microstep: 1659.84 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3482
[2024-06-10 10:41:39,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.42 | bwd_microstep: 1347.47 | bwd_inner_microstep: 1347.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 10:41:41,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.91 | bwd_microstep: 1180.31 | bwd_inner_microstep: 1180.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2648
[2024-06-10 10:41:42,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.31 | bwd_microstep: 954.81 | bwd_inner_microstep: 954.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-10 10:41:45,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.40 | bwd_microstep: 1536.89 | bwd_inner_microstep: 1536.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 10:41:46,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.03 | bwd_microstep: 1287.83 | bwd_inner_microstep: 1287.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 10:41:48,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.06 | bwd_microstep: 1381.49 | bwd_inner_microstep: 1381.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3605
[2024-06-10 10:41:50,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.23 | bwd_microstep: 1355.56 | bwd_inner_microstep: 1355.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 10:41:52,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.49 | bwd_microstep: 1378.20 | bwd_inner_microstep: 1378.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3476
[2024-06-10 10:41:54,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.38 | bwd_microstep: 1573.35 | bwd_inner_microstep: 1573.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-10 10:41:56,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.03 | bwd_microstep: 1613.30 | bwd_inner_microstep: 1613.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655
[2024-06-10 10:41:59,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.18 | bwd_microstep: 1582.68 | bwd_inner_microstep: 1582.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 10:42:00,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1345.46 | bwd_inner_microstep: 1345.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3641
[2024-06-10 10:42:02,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.46 | bwd_microstep: 1434.63 | bwd_inner_microstep: 1434.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-10 10:42:04,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.04 | bwd_microstep: 801.57 | bwd_inner_microstep: 801.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 10:42:05,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.29 | bwd_microstep: 1397.70 | bwd_inner_microstep: 1397.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 10:42:07,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.43 | bwd_microstep: 1375.50 | bwd_inner_microstep: 1375.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1996
[2024-06-10 10:42:08,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.36 | bwd_microstep: 737.66 | bwd_inner_microstep: 737.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 10:42:10,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1378.51 | bwd_inner_microstep: 1378.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 10:42:12,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.19 | bwd_microstep: 1357.93 | bwd_inner_microstep: 1357.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 10:42:14,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.73 | bwd_microstep: 1512.39 | bwd_inner_microstep: 1512.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 10:42:16,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.50 | bwd_microstep: 1391.82 | bwd_inner_microstep: 1391.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3510
[2024-06-10 10:42:18,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.29 | bwd_microstep: 1192.82 | bwd_inner_microstep: 1192.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580
[2024-06-10 10:42:20,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.19 | bwd_microstep: 1308.92 | bwd_inner_microstep: 1308.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2242
[2024-06-10 10:42:21,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.80 | bwd_microstep: 866.80 | bwd_inner_microstep: 866.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-10 10:42:23,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.37 | bwd_microstep: 1449.34 | bwd_inner_microstep: 1449.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-10 10:42:25,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.00 | bwd_microstep: 1591.31 | bwd_inner_microstep: 1591.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-10 10:42:26,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.04 | bwd_microstep: 977.61 | bwd_inner_microstep: 977.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3470
[2024-06-10 10:42:28,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 10:42:28,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.05 | bwd_microstep: 1290.16 | bwd_inner_microstep: 1281.11 | bwd_allreduce_microstep: 9.00 | step_microstep: 38.49
[2024-06-10 10:42:28,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15997.71 | bwd: 42640.11 | bwd_inner: 42630.10 | bwd_allreduce: 9.27 | step: 40.25


 33%|███▎      | 575/1726 [9:58:57<21:21:36, 66.81s/it]
 33%|███▎      | 576/1726 [10:00:00<20:58:18, 65.65s/it]
 33%|███▎      | 577/1726 [10:01:03<20:40:04, 64.76s/it]
 33%|███▎      | 578/1726 [10:02:03<20:11:33, 63.32s/it]
 34%|███▎      | 579/1726 [10:03:06<20:11:29, 63.37s/it]
 34%|███▎      | 580/1726 [10:04:06<19:49:29, 62.28s/it]
{'loss': 1.2494, 'learning_rate': 3.0927383957823466e-05, 'epoch': 0.34}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1946
[2024-06-10 10:42:29,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.88 | bwd_microstep: 822.78 | bwd_inner_microstep: 822.69 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 10:42:31,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1376.21 | bwd_inner_microstep: 1376.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 10:42:33,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1274.82 | bwd_inner_microstep: 1274.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 10:42:35,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1275.47 | bwd_inner_microstep: 1275.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 10:42:37,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1449.30 | bwd_inner_microstep: 1449.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 10:42:39,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.85 | bwd_microstep: 1245.24 | bwd_inner_microstep: 1245.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 10:42:40,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1286.88 | bwd_inner_microstep: 1286.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 10:42:42,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1244.48 | bwd_inner_microstep: 1244.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 10:42:44,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.41 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 10:42:46,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.99 | bwd_microstep: 1387.63 | bwd_inner_microstep: 1387.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508
[2024-06-10 10:42:48,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1418.02 | bwd_inner_microstep: 1418.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3388
[2024-06-10 10:42:49,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1241.68 | bwd_inner_microstep: 1241.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-10 10:42:51,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.29 | bwd_microstep: 1448.23 | bwd_inner_microstep: 1448.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 10:42:53,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1346.65 | bwd_inner_microstep: 1346.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 10:42:55,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.59 | bwd_microstep: 1379.17 | bwd_inner_microstep: 1379.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 10:42:57,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.21 | bwd_microstep: 1521.63 | bwd_inner_microstep: 1521.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 10:42:59,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1398.82 | bwd_inner_microstep: 1398.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 10:43:01,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1561.49 | bwd_inner_microstep: 1561.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 10:43:03,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.40 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 10:43:05,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-10 10:43:07,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.99 | bwd_microstep: 1506.51 | bwd_inner_microstep: 1506.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 10:43:09,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1378.23 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 10:43:10,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.82 | bwd_microstep: 978.41 | bwd_inner_microstep: 978.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 10:43:12,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.75 | bwd_microstep: 1426.83 | bwd_inner_microstep: 1426.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 10:43:14,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.41 | bwd_microstep: 1292.33 | bwd_inner_microstep: 1292.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3578
[2024-06-10 10:43:16,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.04 | bwd_microstep: 1423.61 | bwd_inner_microstep: 1423.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3578
[2024-06-10 10:43:18,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1525.05 | bwd_inner_microstep: 1525.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3606
[2024-06-10 10:43:20,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.90 | bwd_microstep: 1372.30 | bwd_inner_microstep: 1372.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 10:43:22,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.21 | bwd_microstep: 1311.21 | bwd_inner_microstep: 1311.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565
[2024-06-10 10:43:24,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.68 | bwd_microstep: 1594.41 | bwd_inner_microstep: 1594.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276
[2024-06-10 10:43:26,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.03 | bwd_microstep: 1073.82 | bwd_inner_microstep: 1073.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2267
[2024-06-10 10:43:30,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.33 | optimizer_step: 6.60
[2024-06-10 10:43:30,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.44 | bwd_microstep: 3459.02 | bwd_inner_microstep: 1210.47 | bwd_allreduce_microstep: 2248.49 | step_microstep: 38.70
[2024-06-10 10:43:30,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15990.33 | bwd: 44994.77 | bwd_inner: 42745.28 | bwd_allreduce: 2248.77 | step: 40.25
{'loss': 1.2718, 'learning_rate': 3.089592837141746e-05, 'epoch': 0.34}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 10:43:31,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1241.33 | bwd_inner_microstep: 1241.25 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2356
[2024-06-10 10:43:33,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.34 | bwd_microstep: 890.45 | bwd_inner_microstep: 890.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 10:43:34,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 10:43:36,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1340.69 | bwd_inner_microstep: 1340.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483
[2024-06-10 10:43:38,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.82 | bwd_microstep: 1312.53 | bwd_inner_microstep: 1312.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 10:43:39,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.17 | bwd_microstep: 788.82 | bwd_inner_microstep: 788.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 10:43:41,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.45 | bwd_microstep: 1299.65 | bwd_inner_microstep: 1299.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-10 10:43:43,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.08 | bwd_microstep: 1635.79 | bwd_inner_microstep: 1635.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 10:43:45,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1409.31 | bwd_inner_microstep: 1409.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1981
[2024-06-10 10:43:46,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.44 | bwd_microstep: 771.03 | bwd_inner_microstep: 771.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 10:43:47,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.31 | bwd_microstep: 802.50 | bwd_inner_microstep: 802.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 10:43:49,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.23 | bwd_microstep: 1297.59 | bwd_inner_microstep: 1297.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 10:43:51,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1400.82 | bwd_inner_microstep: 1400.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3711
[2024-06-10 10:43:53,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1234.92 | bwd_inner_microstep: 1234.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2411
[2024-06-10 10:43:54,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.50 | bwd_microstep: 1037.72 | bwd_inner_microstep: 1037.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 10:43:56,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.60 | bwd_microstep: 1612.89 | bwd_inner_microstep: 1612.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-10 10:43:58,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.05 | bwd_microstep: 1341.30 | bwd_inner_microstep: 1341.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-10 10:44:00,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.03 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 10:44:02,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1410.34 | bwd_inner_microstep: 1410.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3638
[2024-06-10 10:44:04,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.50 | bwd_microstep: 1446.71 | bwd_inner_microstep: 1446.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3891
[2024-06-10 10:44:06,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.17 | bwd_microstep: 1595.00 | bwd_inner_microstep: 1594.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 10:44:08,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.06 | bwd_microstep: 1296.24 | bwd_inner_microstep: 1296.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 10:44:10,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1362.01 | bwd_inner_microstep: 1361.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814
[2024-06-10 10:44:12,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.17 | bwd_microstep: 1584.11 | bwd_inner_microstep: 1584.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 10:44:14,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.70 | bwd_microstep: 1184.06 | bwd_inner_microstep: 1184.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 10:44:16,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.19 | bwd_microstep: 1639.09 | bwd_inner_microstep: 1639.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 10:44:18,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1559.64 | bwd_inner_microstep: 1559.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3789
[2024-06-10 10:44:20,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.21 | bwd_microstep: 1292.10 | bwd_inner_microstep: 1292.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3753
[2024-06-10 10:44:22,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.58 | bwd_microstep: 1567.22 | bwd_inner_microstep: 1567.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3590
[2024-06-10 10:44:24,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.51 | bwd_microstep: 1535.25 | bwd_inner_microstep: 1535.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 10:44:26,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1245.51 | bwd_inner_microstep: 1245.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3536
[2024-06-10 10:44:31,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.40 | optimizer_step: 6.59
[2024-06-10 10:44:31,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.59 | bwd_microstep: 4581.08 | bwd_inner_microstep: 1913.22 | bwd_allreduce_microstep: 2667.79 | step_microstep: 39.76
[2024-06-10 10:44:31,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15926.58 | bwd: 45301.74 | bwd_inner: 42632.96 | bwd_allreduce: 2668.07 | step: 41.60
{'loss': 1.257, 'learning_rate': 3.086443440960838e-05, 'epoch': 0.34}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3500
[2024-06-10 10:44:33,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1450.00 | bwd_inner_microstep: 1449.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459
[2024-06-10 10:44:35,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.19 | bwd_microstep: 1440.19 | bwd_inner_microstep: 1440.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 10:44:37,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1485.96 | bwd_inner_microstep: 1485.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 10:44:38,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.22 | bwd_microstep: 793.98 | bwd_inner_microstep: 793.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 10:44:41,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.71 | bwd_microstep: 1643.11 | bwd_inner_microstep: 1643.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 10:44:42,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1249.13 | bwd_inner_microstep: 1249.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 10:44:44,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.74 | bwd_microstep: 1279.75 | bwd_inner_microstep: 1279.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 10:44:46,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1289.09 | bwd_inner_microstep: 1289.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 10:44:48,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.49 | bwd_microstep: 1250.90 | bwd_inner_microstep: 1250.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 10:44:50,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.18 | bwd_microstep: 1553.26 | bwd_inner_microstep: 1553.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 10:44:52,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.80 | bwd_microstep: 1390.67 | bwd_inner_microstep: 1390.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3407
[2024-06-10 10:44:53,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1291.08 | bwd_inner_microstep: 1291.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-10 10:44:55,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.29 | bwd_microstep: 892.58 | bwd_inner_microstep: 892.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 10:44:56,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.63 | bwd_microstep: 1285.29 | bwd_inner_microstep: 1285.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1912
[2024-06-10 10:44:58,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.57 | bwd_microstep: 843.12 | bwd_inner_microstep: 843.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3503
[2024-06-10 10:45:00,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.84 | bwd_microstep: 1686.41 | bwd_inner_microstep: 1686.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181
[2024-06-10 10:45:01,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.60 | bwd_microstep: 953.12 | bwd_inner_microstep: 953.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3644
[2024-06-10 10:45:03,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.91 | bwd_microstep: 1583.07 | bwd_inner_microstep: 1583.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957
[2024-06-10 10:45:04,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 701.78 | bwd_inner_microstep: 701.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3724
[2024-06-10 10:45:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.12 | bwd_microstep: 1566.09 | bwd_inner_microstep: 1566.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2061
[2024-06-10 10:45:08,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.08 | bwd_microstep: 755.27 | bwd_inner_microstep: 755.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 10:45:09,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1256.30 | bwd_inner_microstep: 1256.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 10:45:11,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1254.14 | bwd_inner_microstep: 1254.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 10:45:13,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1256.16 | bwd_inner_microstep: 1256.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 10:45:15,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1291.22 | bwd_inner_microstep: 1291.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2869
[2024-06-10 10:45:16,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.04 | bwd_microstep: 1262.44 | bwd_inner_microstep: 1262.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2054
[2024-06-10 10:45:18,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.23 | bwd_microstep: 1012.69 | bwd_inner_microstep: 1012.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 10:45:20,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.89 | bwd_microstep: 1514.37 | bwd_inner_microstep: 1514.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 10:45:22,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.28 | bwd_microstep: 1489.95 | bwd_inner_microstep: 1489.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3582
[2024-06-10 10:45:24,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.04 | bwd_microstep: 1364.67 | bwd_inner_microstep: 1364.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2966
[2024-06-10 10:45:26,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.06 | bwd_microstep: 1260.64 | bwd_inner_microstep: 1260.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3769
[2024-06-10 10:45:33,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 10:45:33,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.57 | bwd_microstep: 7309.19 | bwd_inner_microstep: 1820.97 | bwd_allreduce_microstep: 5488.16 | step_microstep: 38.54
[2024-06-10 10:45:33,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15294.03 | bwd: 46655.65 | bwd_inner: 41166.54 | bwd_allreduce: 5488.42 | step: 40.22
{'loss': 1.307, 'learning_rate': 3.0832902183317784e-05, 'epoch': 0.34}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 10:45:36,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.13 | bwd_microstep: 1466.75 | bwd_inner_microstep: 1466.56 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 549
[2024-06-10 10:45:36,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 98.08 | bwd_microstep: 249.80 | bwd_inner_microstep: 249.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279
[2024-06-10 10:45:37,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.57 | bwd_microstep: 969.41 | bwd_inner_microstep: 969.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 10:45:39,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.82 | bwd_microstep: 1551.62 | bwd_inner_microstep: 1551.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 10:45:41,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.26 | bwd_microstep: 1246.34 | bwd_inner_microstep: 1246.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-10 10:45:43,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.71 | bwd_microstep: 1639.30 | bwd_inner_microstep: 1639.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 10:45:45,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.83 | bwd_microstep: 1278.49 | bwd_inner_microstep: 1278.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 10:45:47,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.24 | bwd_microstep: 1284.73 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2923
[2024-06-10 10:45:48,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.48 | bwd_microstep: 1096.14 | bwd_inner_microstep: 1096.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-10 10:45:51,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.29 | bwd_microstep: 1549.86 | bwd_inner_microstep: 1549.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 10:45:53,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.05 | bwd_microstep: 1525.31 | bwd_inner_microstep: 1525.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1925
[2024-06-10 10:45:54,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.26 | bwd_microstep: 760.87 | bwd_inner_microstep: 760.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 10:45:55,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.81 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497
[2024-06-10 10:45:58,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.35 | bwd_microstep: 1549.88 | bwd_inner_microstep: 1549.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 10:45:59,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1374.60 | bwd_inner_microstep: 1374.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 10:46:01,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1345.00 | bwd_inner_microstep: 1344.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 10:46:03,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1489.71 | bwd_inner_microstep: 1489.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651
[2024-06-10 10:46:05,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.17 | bwd_microstep: 1514.21 | bwd_inner_microstep: 1514.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 10:46:08,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1474.59 | bwd_inner_microstep: 1474.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 10:46:09,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.60 | bwd_microstep: 1297.11 | bwd_inner_microstep: 1297.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2305
[2024-06-10 10:46:10,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.44 | bwd_microstep: 836.57 | bwd_inner_microstep: 836.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623
[2024-06-10 10:46:13,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.31 | bwd_microstep: 1612.01 | bwd_inner_microstep: 1611.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 10:46:15,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1393.67 | bwd_inner_microstep: 1393.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 10:46:16,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.87 | bwd_microstep: 1182.69 | bwd_inner_microstep: 1182.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1412
[2024-06-10 10:46:17,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 206.49 | bwd_microstep: 533.51 | bwd_inner_microstep: 533.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 10:46:19,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 10:46:21,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.59 | bwd_microstep: 1618.70 | bwd_inner_microstep: 1618.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2215
[2024-06-10 10:46:22,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.87 | bwd_microstep: 863.63 | bwd_inner_microstep: 863.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3575
[2024-06-10 10:46:24,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.72 | bwd_microstep: 1431.34 | bwd_inner_microstep: 1431.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3588
[2024-06-10 10:46:26,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1338.82 | bwd_inner_microstep: 1338.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 10:46:27,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.94 | bwd_microstep: 801.90 | bwd_inner_microstep: 801.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3580
[2024-06-10 10:46:36,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 10:46:36,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.53 | bwd_microstep: 8411.20 | bwd_inner_microstep: 1754.91 | bwd_allreduce_microstep: 6656.22 | step_microstep: 38.80
[2024-06-10 10:46:36,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15198.02 | bwd: 47331.55 | bwd_inner: 40674.27 | bwd_allreduce: 6656.53 | step: 40.69
{'loss': 1.3422, 'learning_rate': 3.0801331803602015e-05, 'epoch': 0.34}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 10:46:39,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.93 | bwd_microstep: 1587.35 | bwd_inner_microstep: 1587.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2547
[2024-06-10 10:46:40,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.02 | bwd_microstep: 1028.92 | bwd_inner_microstep: 1028.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 10:46:42,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.97 | bwd_microstep: 1475.53 | bwd_inner_microstep: 1475.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3855
[2024-06-10 10:46:44,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.28 | bwd_microstep: 1662.04 | bwd_inner_microstep: 1662.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3789
[2024-06-10 10:46:46,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.69 | bwd_microstep: 1348.01 | bwd_inner_microstep: 1347.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3756
[2024-06-10 10:46:48,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.53 | bwd_microstep: 1639.92 | bwd_inner_microstep: 1639.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743
[2024-06-10 10:46:51,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.92 | bwd_microstep: 1533.66 | bwd_inner_microstep: 1533.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055
[2024-06-10 10:46:52,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.02 | bwd_microstep: 819.82 | bwd_inner_microstep: 819.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960
[2024-06-10 10:46:53,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.18 | bwd_microstep: 828.90 | bwd_inner_microstep: 828.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-10 10:46:55,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.77 | bwd_microstep: 1531.04 | bwd_inner_microstep: 1531.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 10:46:57,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.32 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511
[2024-06-10 10:46:59,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.75 | bwd_microstep: 1319.86 | bwd_inner_microstep: 1319.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3665
[2024-06-10 10:47:01,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.76 | bwd_microstep: 1585.23 | bwd_inner_microstep: 1585.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2395
[2024-06-10 10:47:02,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.17 | bwd_microstep: 943.14 | bwd_inner_microstep: 943.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 10:47:04,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1283.55 | bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 10:47:06,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.82 | bwd_microstep: 1405.05 | bwd_inner_microstep: 1405.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3659
[2024-06-10 10:47:08,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.69 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3634
[2024-06-10 10:47:10,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.62 | bwd_microstep: 1814.05 | bwd_inner_microstep: 1814.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 10:47:12,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.06 | bwd_microstep: 1411.30 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 10:47:14,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.69 | bwd_microstep: 1183.98 | bwd_inner_microstep: 1183.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 10:47:16,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.77 | bwd_microstep: 1384.78 | bwd_inner_microstep: 1384.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3991
[2024-06-10 10:47:18,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.51 | bwd_microstep: 1708.55 | bwd_inner_microstep: 1708.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-10 10:47:20,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.08 | bwd_microstep: 1436.76 | bwd_inner_microstep: 1436.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484
[2024-06-10 10:47:22,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.06 | bwd_microstep: 1319.27 | bwd_inner_microstep: 1319.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2279
[2024-06-10 10:47:23,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.91 | bwd_microstep: 1006.59 | bwd_inner_microstep: 1006.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2333
[2024-06-10 10:47:25,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.61 | bwd_microstep: 985.69 | bwd_inner_microstep: 985.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 10:47:27,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1502.50 | bwd_inner_microstep: 1502.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-10 10:47:29,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.70 | bwd_microstep: 1535.00 | bwd_inner_microstep: 1534.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3641
[2024-06-10 10:47:31,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.79 | bwd_microstep: 1318.24 | bwd_inner_microstep: 1318.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3582
[2024-06-10 10:47:33,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.86 | bwd_microstep: 1700.66 | bwd_inner_microstep: 1700.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 10:47:35,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.72 | bwd_microstep: 1451.86 | bwd_inner_microstep: 1451.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 10:47:38,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 10:47:38,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.01 | bwd_microstep: 1737.25 | bwd_inner_microstep: 1571.00 | bwd_allreduce_microstep: 166.20 | step_microstep: 37.97
[2024-06-10 10:47:38,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16469.18 | bwd: 44356.43 | bwd_inner: 44189.32 | bwd_allreduce: 166.43 | step: 39.67
 34%|███▎      | 581/1726 [10:05:05<19:29:37, 61.29s/it]
 34%|███▎      | 582/1726 [10:06:06<19:28:51, 61.30s/it]
 34%|███▍      | 583/1726 [10:07:08<19:29:29, 61.39s/it]
 34%|███▍      | 584/1726 [10:08:10<19:33:40, 61.66s/it]
 34%|███▍      | 585/1726 [10:09:13<19:39:34, 62.03s/it]
 34%|███▍      | 586/1726 [10:10:14<19:33:46, 61.78s/it]
{'loss': 1.2241, 'learning_rate': 3.076972338165178e-05, 'epoch': 0.34}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 10:47:39,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.25 | bwd_microstep: 1249.59 | bwd_inner_microstep: 1249.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-10 10:47:40,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.88 | bwd_microstep: 807.61 | bwd_inner_microstep: 807.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3333
[2024-06-10 10:47:42,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.19 | bwd_microstep: 1300.30 | bwd_inner_microstep: 1300.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2759
[2024-06-10 10:47:44,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.96 | bwd_microstep: 974.05 | bwd_inner_microstep: 974.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3667
[2024-06-10 10:47:46,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.21 | bwd_microstep: 1590.82 | bwd_inner_microstep: 1590.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 10:47:47,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1248.62 | bwd_inner_microstep: 1248.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 10:47:49,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.33 | bwd_microstep: 1381.21 | bwd_inner_microstep: 1381.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 10:47:51,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1282.14 | bwd_inner_microstep: 1282.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 10:47:53,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.02 | bwd_microstep: 1282.58 | bwd_inner_microstep: 1282.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 10:47:55,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.51 | bwd_microstep: 1406.66 | bwd_inner_microstep: 1406.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 10:47:57,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.05 | bwd_microstep: 1153.52 | bwd_inner_microstep: 1153.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 10:47:58,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.06 | bwd_microstep: 1192.37 | bwd_inner_microstep: 1192.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2304
[2024-06-10 10:48:00,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.10 | bwd_microstep: 1071.99 | bwd_inner_microstep: 1071.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3649
[2024-06-10 10:48:02,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.45 | bwd_microstep: 1825.66 | bwd_inner_microstep: 1825.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 10:48:04,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.63 | bwd_microstep: 1253.26 | bwd_inner_microstep: 1253.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3458
[2024-06-10 10:48:06,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1500.09 | bwd_inner_microstep: 1500.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828
[2024-06-10 10:48:08,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.00 | bwd_microstep: 1587.72 | bwd_inner_microstep: 1587.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2306
[2024-06-10 10:48:10,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.20 | bwd_microstep: 984.53 | bwd_inner_microstep: 984.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-10 10:48:11,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1420.98 | bwd_inner_microstep: 1420.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3823
[2024-06-10 10:48:14,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1501.82 | bwd_inner_microstep: 1501.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 10:48:16,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.36 | bwd_microstep: 1661.42 | bwd_inner_microstep: 1661.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810
[2024-06-10 10:48:18,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.22 | bwd_microstep: 1754.25 | bwd_inner_microstep: 1754.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 10:48:20,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1504.51 | bwd_inner_microstep: 1504.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 10:48:22,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1384.28 | bwd_inner_microstep: 1384.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 10:48:24,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.31 | bwd_microstep: 1449.32 | bwd_inner_microstep: 1449.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3545
[2024-06-10 10:48:27,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.12 | bwd_microstep: 1694.96 | bwd_inner_microstep: 1694.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 10:48:28,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1397.84 | bwd_inner_microstep: 1397.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2290
[2024-06-10 10:48:30,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.47 | bwd_microstep: 880.30 | bwd_inner_microstep: 880.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1953
[2024-06-10 10:48:31,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.19 | bwd_microstep: 730.87 | bwd_inner_microstep: 730.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 10:48:33,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.32 | bwd_microstep: 1464.45 | bwd_inner_microstep: 1464.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3716
[2024-06-10 10:48:35,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.99 | bwd_microstep: 1400.42 | bwd_inner_microstep: 1400.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 10:48:39,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.43 | optimizer_step: 6.60
[2024-06-10 10:48:39,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.80 | bwd_microstep: 3801.45 | bwd_inner_microstep: 1310.86 | bwd_allreduce_microstep: 2490.52 | step_microstep: 40.12
[2024-06-10 10:48:39,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15918.35 | bwd: 45139.60 | bwd_inner: 42648.14 | bwd_allreduce: 2490.76 | step: 42.06
{'loss': 1.2404, 'learning_rate': 3.073807702879179e-05, 'epoch': 0.34}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 10:48:41,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.43 | bwd_microstep: 1444.92 | bwd_inner_microstep: 1444.81 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 10:48:43,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.77 | bwd_microstep: 1291.21 | bwd_inner_microstep: 1291.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 10:48:45,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.39 | bwd_microstep: 1347.56 | bwd_inner_microstep: 1347.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 10:48:47,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.52 | bwd_microstep: 1380.97 | bwd_inner_microstep: 1380.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 10:48:48,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1282.60 | bwd_inner_microstep: 1282.45 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 10:48:50,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.30 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764
[2024-06-10 10:48:52,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.91 | bwd_microstep: 1541.30 | bwd_inner_microstep: 1541.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3714
[2024-06-10 10:48:54,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.85 | bwd_microstep: 1588.67 | bwd_inner_microstep: 1588.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 10:48:56,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1255.92 | bwd_inner_microstep: 1255.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 10:48:58,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.61 | bwd_microstep: 1256.99 | bwd_inner_microstep: 1256.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461
[2024-06-10 10:49:00,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 1319.00 | bwd_inner_microstep: 1318.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1967
[2024-06-10 10:49:01,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.91 | bwd_microstep: 734.30 | bwd_inner_microstep: 734.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3521
[2024-06-10 10:49:03,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.83 | bwd_microstep: 1441.86 | bwd_inner_microstep: 1441.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-10 10:49:05,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.78 | bwd_microstep: 1591.88 | bwd_inner_microstep: 1591.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3660
[2024-06-10 10:49:07,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.31 | bwd_microstep: 1456.61 | bwd_inner_microstep: 1456.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2120
[2024-06-10 10:49:08,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.36 | bwd_microstep: 927.10 | bwd_inner_microstep: 927.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 10:49:10,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.65 | bwd_microstep: 1530.87 | bwd_inner_microstep: 1530.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3722
[2024-06-10 10:49:13,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.84 | bwd_microstep: 1733.80 | bwd_inner_microstep: 1733.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3434
[2024-06-10 10:49:15,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.63 | bwd_microstep: 1459.68 | bwd_inner_microstep: 1459.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 10:49:17,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.31 | bwd_microstep: 1345.95 | bwd_inner_microstep: 1345.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 10:49:19,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.26 | bwd_microstep: 1531.08 | bwd_inner_microstep: 1531.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 10:49:21,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.24 | bwd_microstep: 1450.75 | bwd_inner_microstep: 1450.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3612
[2024-06-10 10:49:23,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.49 | bwd_microstep: 1678.75 | bwd_inner_microstep: 1678.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 10:49:25,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.51 | bwd_microstep: 1649.20 | bwd_inner_microstep: 1649.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 10:49:27,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1499.81 | bwd_inner_microstep: 1499.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2232
[2024-06-10 10:49:28,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.74 | bwd_microstep: 803.14 | bwd_inner_microstep: 803.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 4287
[2024-06-10 10:49:31,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 722.46 | bwd_microstep: 1982.95 | bwd_inner_microstep: 1982.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 10:49:33,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.23 | bwd_microstep: 1388.22 | bwd_inner_microstep: 1388.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2112
[2024-06-10 10:49:34,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.11 | bwd_microstep: 734.50 | bwd_inner_microstep: 734.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2205
[2024-06-10 10:49:35,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.17 | bwd_microstep: 962.67 | bwd_inner_microstep: 962.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3768
[2024-06-10 10:49:38,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.86 | bwd_microstep: 1498.27 | bwd_inner_microstep: 1498.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 10:49:40,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.19 | optimizer_step: 6.58
[2024-06-10 10:49:40,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1508.25 | bwd_inner_microstep: 1337.45 | bwd_allreduce_microstep: 170.76 | step_microstep: 37.76
[2024-06-10 10:49:40,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16332.10 | bwd: 43903.30 | bwd_inner: 43731.41 | bwd_allreduce: 171.13 | step: 42.24
{'loss': 1.2363, 'learning_rate': 3.070639285648032e-05, 'epoch': 0.34}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 10:49:42,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1381.33 | bwd_inner_microstep: 1381.18 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3902
[2024-06-10 10:49:44,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.43 | bwd_microstep: 1493.80 | bwd_inner_microstep: 1493.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 10:49:45,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.73 | bwd_microstep: 1376.34 | bwd_inner_microstep: 1376.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 10:49:47,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.68 | bwd_microstep: 1348.36 | bwd_inner_microstep: 1348.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 10:49:49,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.12 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 10:49:51,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.19 | bwd_microstep: 1388.29 | bwd_inner_microstep: 1388.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3398
[2024-06-10 10:49:53,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.18 | bwd_microstep: 1152.41 | bwd_inner_microstep: 1152.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 10:49:54,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.46 | bwd_microstep: 701.84 | bwd_inner_microstep: 701.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 10:49:55,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.34 | bwd_microstep: 1251.83 | bwd_inner_microstep: 1251.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2889
[2024-06-10 10:49:57,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.91 | bwd_microstep: 1124.60 | bwd_inner_microstep: 1124.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689
[2024-06-10 10:49:59,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.87 | bwd_microstep: 1621.04 | bwd_inner_microstep: 1621.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3618
[2024-06-10 10:50:01,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.41 | bwd_microstep: 1459.78 | bwd_inner_microstep: 1459.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 10:50:03,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1249.80 | bwd_inner_microstep: 1249.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 10:50:05,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.46 | bwd_microstep: 1491.74 | bwd_inner_microstep: 1491.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969
[2024-06-10 10:50:06,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.68 | bwd_microstep: 889.42 | bwd_inner_microstep: 889.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3516
[2024-06-10 10:50:08,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.56 | bwd_microstep: 1586.81 | bwd_inner_microstep: 1586.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3636
[2024-06-10 10:50:11,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.40 | bwd_microstep: 1575.51 | bwd_inner_microstep: 1575.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1953
[2024-06-10 10:50:12,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.83 | bwd_microstep: 888.68 | bwd_inner_microstep: 888.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2004
[2024-06-10 10:50:13,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.76 | bwd_microstep: 841.64 | bwd_inner_microstep: 841.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1038
[2024-06-10 10:50:14,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 159.28 | bwd_microstep: 400.67 | bwd_inner_microstep: 400.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898
[2024-06-10 10:50:16,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.16 | bwd_microstep: 1592.26 | bwd_inner_microstep: 1592.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 10:50:18,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.51 | bwd_microstep: 1560.81 | bwd_inner_microstep: 1560.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.16
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3882
[2024-06-10 10:50:20,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1419.68 | bwd_inner_microstep: 1419.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-10 10:50:21,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.56 | bwd_microstep: 788.17 | bwd_inner_microstep: 788.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1009
[2024-06-10 10:50:21,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 152.49 | bwd_microstep: 394.96 | bwd_inner_microstep: 394.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3615
[2024-06-10 10:50:24,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.85 | bwd_microstep: 1672.82 | bwd_inner_microstep: 1672.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 10:50:26,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1375.81 | bwd_inner_microstep: 1375.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 10:50:28,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.41 | bwd_microstep: 1514.59 | bwd_inner_microstep: 1514.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3451
[2024-06-10 10:50:30,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.39 | bwd_microstep: 1316.34 | bwd_inner_microstep: 1316.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 10:50:31,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.24 | bwd_microstep: 1299.09 | bwd_inner_microstep: 1299.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 10:50:34,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.32 | bwd_microstep: 1545.93 | bwd_inner_microstep: 1545.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 10:50:41,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.33 | optimizer_step: 6.57
[2024-06-10 10:50:41,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.88 | bwd_microstep: 6930.38 | bwd_inner_microstep: 1644.33 | bwd_allreduce_microstep: 5285.97 | step_microstep: 38.72
[2024-06-10 10:50:41,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15167.29 | bwd: 45918.01 | bwd_inner: 40630.99 | bwd_allreduce: 5286.27 | step: 41.59
{'loss': 1.3125, 'learning_rate': 3.067467097630886e-05, 'epoch': 0.34}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3493
[2024-06-10 10:50:43,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.87 | bwd_microstep: 1567.28 | bwd_inner_microstep: 1567.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3938
[2024-06-10 10:50:46,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.01 | bwd_microstep: 1686.25 | bwd_inner_microstep: 1686.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 10:50:47,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.08 | bwd_microstep: 1340.28 | bwd_inner_microstep: 1340.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1898
[2024-06-10 10:50:48,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.60 | bwd_microstep: 710.74 | bwd_inner_microstep: 710.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 10:50:50,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.51 | bwd_microstep: 1247.17 | bwd_inner_microstep: 1247.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 10:50:52,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.56 | bwd_microstep: 1442.32 | bwd_inner_microstep: 1442.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 10:50:54,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.42 | bwd_microstep: 1244.06 | bwd_inner_microstep: 1244.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 10:50:56,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.47 | bwd_microstep: 1385.30 | bwd_inner_microstep: 1385.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3776
[2024-06-10 10:50:58,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.36 | bwd_microstep: 1572.30 | bwd_inner_microstep: 1572.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 10:50:59,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.94 | bwd_microstep: 797.48 | bwd_inner_microstep: 797.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3440
[2024-06-10 10:51:01,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.01 | bwd_microstep: 1312.15 | bwd_inner_microstep: 1312.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040
[2024-06-10 10:51:02,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.63 | bwd_microstep: 809.46 | bwd_inner_microstep: 809.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3411
[2024-06-10 10:51:04,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1509.61 | bwd_inner_microstep: 1509.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1986
[2024-06-10 10:51:05,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.63 | bwd_microstep: 831.25 | bwd_inner_microstep: 831.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 10:51:07,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.14 | bwd_microstep: 1254.27 | bwd_inner_microstep: 1254.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-10 10:51:09,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.27 | bwd_microstep: 1615.14 | bwd_inner_microstep: 1615.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2119
[2024-06-10 10:51:11,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.96 | bwd_microstep: 1021.04 | bwd_inner_microstep: 1021.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 10:51:13,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1450.81 | bwd_inner_microstep: 1450.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-10 10:51:14,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.34 | bwd_microstep: 1278.91 | bwd_inner_microstep: 1278.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3709
[2024-06-10 10:51:16,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1332.69 | bwd_inner_microstep: 1332.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 10:51:18,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.77 | bwd_microstep: 1661.71 | bwd_inner_microstep: 1661.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3531
[2024-06-10 10:51:20,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.73 | bwd_microstep: 1427.24 | bwd_inner_microstep: 1427.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3757
[2024-06-10 10:51:22,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.86 | bwd_microstep: 1346.75 | bwd_inner_microstep: 1346.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 10:51:24,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.40 | bwd_microstep: 1557.79 | bwd_inner_microstep: 1557.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585
[2024-06-10 10:51:26,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.25 | bwd_microstep: 1306.16 | bwd_inner_microstep: 1306.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451
[2024-06-10 10:51:28,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.24 | bwd_microstep: 1381.74 | bwd_inner_microstep: 1381.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 10:51:30,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.09 | bwd_microstep: 1493.81 | bwd_inner_microstep: 1493.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3754
[2024-06-10 10:51:32,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.57 | bwd_microstep: 1276.26 | bwd_inner_microstep: 1276.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 10:51:34,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1413.90 | bwd_inner_microstep: 1413.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3735
[2024-06-10 10:51:36,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.31 | bwd_microstep: 1335.86 | bwd_inner_microstep: 1335.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3770
[2024-06-10 10:51:38,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.84 | bwd_microstep: 1345.77 | bwd_inner_microstep: 1345.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 10:51:43,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.32 | optimizer_step: 6.58
[2024-06-10 10:51:43,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.99 | bwd_microstep: 4790.41 | bwd_inner_microstep: 1524.08 | bwd_allreduce_microstep: 3266.27 | step_microstep: 39.58
[2024-06-10 10:51:43,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15814.44 | bwd: 45745.95 | bwd_inner: 42478.64 | bwd_allreduce: 3266.57 | step: 41.08
{'loss': 1.2682, 'learning_rate': 3.064291150000173e-05, 'epoch': 0.34}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3468
[2024-06-10 10:51:45,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.13 | bwd_microstep: 1172.98 | bwd_inner_microstep: 1172.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-10 10:51:46,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.14 | bwd_microstep: 1292.24 | bwd_inner_microstep: 1292.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 10:51:48,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.15 | bwd_microstep: 1243.81 | bwd_inner_microstep: 1243.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 10:51:50,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.96 | bwd_microstep: 1376.58 | bwd_inner_microstep: 1376.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3909
[2024-06-10 10:51:52,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.91 | bwd_microstep: 1688.62 | bwd_inner_microstep: 1688.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2172
[2024-06-10 10:51:54,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.63 | bwd_microstep: 948.63 | bwd_inner_microstep: 948.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-10 10:51:55,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.82 | bwd_microstep: 1312.53 | bwd_inner_microstep: 1312.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 10:51:57,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1248.16 | bwd_inner_microstep: 1248.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 10:51:59,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.76 | bwd_microstep: 1284.34 | bwd_inner_microstep: 1284.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-10 10:52:01,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.42 | bwd_microstep: 1630.65 | bwd_inner_microstep: 1630.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 10:52:03,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.34 | bwd_microstep: 1389.86 | bwd_inner_microstep: 1389.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 10:52:05,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.95 | bwd_microstep: 1288.15 | bwd_inner_microstep: 1288.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3472
[2024-06-10 10:52:07,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.27 | bwd_microstep: 1215.77 | bwd_inner_microstep: 1215.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 10:52:08,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1344.53 | bwd_inner_microstep: 1344.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.42
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 10:52:10,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.01 | bwd_microstep: 1444.67 | bwd_inner_microstep: 1444.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390
[2024-06-10 10:52:12,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.07 | bwd_microstep: 1243.57 | bwd_inner_microstep: 1243.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 10:52:14,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.76 | bwd_microstep: 1290.25 | bwd_inner_microstep: 1290.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 10:52:16,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.31 | bwd_microstep: 1318.10 | bwd_inner_microstep: 1318.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526
[2024-06-10 10:52:17,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.97 | bwd_microstep: 1201.48 | bwd_inner_microstep: 1201.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 10:52:20,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.59 | bwd_microstep: 1556.84 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 10:52:21,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.67 | bwd_microstep: 881.10 | bwd_inner_microstep: 881.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668
[2024-06-10 10:52:23,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.59 | bwd_microstep: 1426.29 | bwd_inner_microstep: 1426.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 10:52:25,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.87 | bwd_microstep: 1391.66 | bwd_inner_microstep: 1391.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 10:52:27,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.47 | bwd_microstep: 1282.90 | bwd_inner_microstep: 1282.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3675
[2024-06-10 10:52:29,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.28 | bwd_microstep: 1552.57 | bwd_inner_microstep: 1552.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2286
[2024-06-10 10:52:30,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.41 | bwd_microstep: 937.67 | bwd_inner_microstep: 937.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3460
[2024-06-10 10:52:32,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.94 | bwd_microstep: 1570.05 | bwd_inner_microstep: 1570.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2627
[2024-06-10 10:52:34,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.63 | bwd_microstep: 1116.16 | bwd_inner_microstep: 1116.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 10:52:36,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1383.78 | bwd_inner_microstep: 1383.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 10:52:37,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1256.65 | bwd_inner_microstep: 1256.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768
[2024-06-10 10:52:39,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.77 | bwd_microstep: 1451.80 | bwd_inner_microstep: 1451.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 10:52:45,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.36 | optimizer_step: 6.62
[2024-06-10 10:52:45,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.90 | bwd_microstep: 5444.11 | bwd_inner_microstep: 1684.19 | bwd_allreduce_microstep: 3759.85 | step_microstep: 38.85
[2024-06-10 10:52:45,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15889.74 | bwd: 46186.52 | bwd_inner: 42425.57 | bwd_allreduce: 3760.21 | step: 42.01
{'loss': 1.2607, 'learning_rate': 3.061111453941561e-05, 'epoch': 0.34}


 34%|███▍      | 586/1726 [10:10:14<19:33:46, 61.78s/it]
 34%|███▍      | 587/1726 [10:11:16<19:30:46, 61.67s/it]
 34%|███▍      | 588/1726 [10:12:16<19:23:42, 61.36s/it]
 34%|███▍      | 589/1726 [10:13:18<19:23:10, 61.38s/it]
 34%|███▍      | 590/1726 [10:14:20<19:25:04, 61.54s/it]
 34%|███▍      | 591/1726 [10:15:22<19:29:08, 61.80s/it]


dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 10:52:47,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.44 | bwd_microstep: 1367.56 | bwd_inner_microstep: 1367.49 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 10:52:49,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1477.80 | bwd_inner_microstep: 1477.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844
[2024-06-10 10:52:51,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1465.40 | bwd_inner_microstep: 1465.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-10 10:52:52,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.29 | bwd_microstep: 680.42 | bwd_inner_microstep: 680.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 10:52:54,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.44 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 10:52:55,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.43 | bwd_microstep: 794.11 | bwd_inner_microstep: 794.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 10:52:57,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1248.13 | bwd_inner_microstep: 1248.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-10 10:52:59,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.86 | bwd_microstep: 1434.41 | bwd_inner_microstep: 1434.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 10:53:01,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.33 | bwd_microstep: 1288.07 | bwd_inner_microstep: 1288.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4083
[2024-06-10 10:53:03,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.58 | bwd_microstep: 1827.27 | bwd_inner_microstep: 1827.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 10:53:05,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.41 | bwd_microstep: 1352.69 | bwd_inner_microstep: 1352.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3544
[2024-06-10 10:53:07,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.44 | bwd_microstep: 1454.69 | bwd_inner_microstep: 1454.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3412
[2024-06-10 10:53:09,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.77 | bwd_microstep: 1184.05 | bwd_inner_microstep: 1184.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3512
[2024-06-10 10:53:11,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.18 | bwd_microstep: 1451.19 | bwd_inner_microstep: 1451.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3652
[2024-06-10 10:53:12,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.62 | bwd_microstep: 1227.44 | bwd_inner_microstep: 1227.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 10:53:14,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.30 | bwd_microstep: 823.54 | bwd_inner_microstep: 823.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 10:53:15,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1397.88 | bwd_inner_microstep: 1397.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 10:53:17,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1350.86 | bwd_inner_microstep: 1350.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2076
[2024-06-10 10:53:19,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.88 | bwd_microstep: 853.41 | bwd_inner_microstep: 853.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 10:53:20,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.22 | bwd_microstep: 970.32 | bwd_inner_microstep: 970.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4056
[2024-06-10 10:53:22,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.54 | bwd_microstep: 1827.18 | bwd_inner_microstep: 1827.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000
[2024-06-10 10:53:23,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.01 | bwd_microstep: 805.80 | bwd_inner_microstep: 805.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-10 10:53:26,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.50 | bwd_microstep: 1755.04 | bwd_inner_microstep: 1755.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 10:53:28,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1279.19 | bwd_inner_microstep: 1279.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 10:53:30,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.24 | bwd_microstep: 1499.71 | bwd_inner_microstep: 1499.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3445
[2024-06-10 10:53:32,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1410.72 | bwd_inner_microstep: 1410.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 10:53:34,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1400.87 | bwd_inner_microstep: 1400.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 10:53:36,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.68 | bwd_microstep: 1464.58 | bwd_inner_microstep: 1464.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 10:53:37,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.91 | bwd_microstep: 978.17 | bwd_inner_microstep: 978.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 10:53:39,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.96 | bwd_microstep: 1285.43 | bwd_inner_microstep: 1285.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 10:53:41,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.70 | bwd_microstep: 1646.65 | bwd_inner_microstep: 1646.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799
[2024-06-10 10:53:48,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 10:53:48,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.67 | bwd_microstep: 6101.01 | bwd_inner_microstep: 1753.40 | bwd_allreduce_microstep: 4347.56 | step_microstep: 38.08
[2024-06-10 10:53:48,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15670.92 | bwd: 46387.19 | bwd_inner: 42038.66 | bwd_allreduce: 4347.83 | step: 40.09
{'loss': 1.2982, 'learning_rate': 3.057928020653925e-05, 'epoch': 0.34}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-10 10:53:50,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.24 | bwd_microstep: 1297.46 | bwd_inner_microstep: 1297.28 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 10:53:52,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878
[2024-06-10 10:53:54,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.68 | bwd_microstep: 1583.12 | bwd_inner_microstep: 1583.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 10:53:56,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.66 | bwd_microstep: 1479.39 | bwd_inner_microstep: 1479.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 10:53:58,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1348.80 | bwd_inner_microstep: 1348.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 10:53:59,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.89 | bwd_microstep: 795.39 | bwd_inner_microstep: 795.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3671
[2024-06-10 10:54:01,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.34 | bwd_microstep: 1327.99 | bwd_inner_microstep: 1327.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3709
[2024-06-10 10:54:02,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.10 | bwd_microstep: 1332.10 | bwd_inner_microstep: 1332.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 10:54:05,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.46 | bwd_microstep: 1645.85 | bwd_inner_microstep: 1645.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 10:54:07,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.94 | bwd_microstep: 1443.72 | bwd_inner_microstep: 1443.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3701
[2024-06-10 10:54:09,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.81 | bwd_microstep: 1721.97 | bwd_inner_microstep: 1721.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3379
[2024-06-10 10:54:11,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.58 | bwd_microstep: 1269.79 | bwd_inner_microstep: 1269.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436
[2024-06-10 10:54:13,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.49 | bwd_microstep: 1316.23 | bwd_inner_microstep: 1316.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 10:54:15,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.71 | bwd_microstep: 1477.08 | bwd_inner_microstep: 1477.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3663
[2024-06-10 10:54:17,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.00 | bwd_microstep: 1445.94 | bwd_inner_microstep: 1445.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-10 10:54:19,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1431.25 | bwd_inner_microstep: 1431.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 10:54:20,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.61 | bwd_microstep: 1355.24 | bwd_inner_microstep: 1355.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 10:54:22,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1400.45 | bwd_inner_microstep: 1400.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3586
[2024-06-10 10:54:24,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.87 | bwd_microstep: 1211.92 | bwd_inner_microstep: 1211.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3831
[2024-06-10 10:54:26,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1491.94 | bwd_inner_microstep: 1491.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 10:54:28,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.97 | bwd_microstep: 1453.77 | bwd_inner_microstep: 1453.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 10:54:30,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.11 | bwd_microstep: 1296.33 | bwd_inner_microstep: 1296.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865
[2024-06-10 10:54:32,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.75 | bwd_microstep: 1568.99 | bwd_inner_microstep: 1568.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 10:54:34,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1376.11 | bwd_inner_microstep: 1376.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2706
[2024-06-10 10:54:35,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.81 | bwd_microstep: 1038.82 | bwd_inner_microstep: 1038.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3483
[2024-06-10 10:54:37,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.09 | bwd_microstep: 1345.33 | bwd_inner_microstep: 1345.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 10:54:39,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.72 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1314.85 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 10:54:41,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.05 | bwd_microstep: 1405.11 | bwd_inner_microstep: 1405.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2279
[2024-06-10 10:54:42,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.32 | bwd_microstep: 1005.32 | bwd_inner_microstep: 1005.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 10:54:45,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.73 | bwd_microstep: 1461.47 | bwd_inner_microstep: 1461.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2669
[2024-06-10 10:54:46,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.58 | bwd_microstep: 1118.31 | bwd_inner_microstep: 1118.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2395
[2024-06-10 10:54:48,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 10:54:48,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.83 | bwd_microstep: 1072.41 | bwd_inner_microstep: 1064.41 | bwd_allreduce_microstep: 7.95 | step_microstep: 37.79
[2024-06-10 10:54:48,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16201.32 | bwd: 43213.78 | bwd_inner: 43204.63 | bwd_allreduce: 8.35 | step: 39.78
{'loss': 1.3038, 'learning_rate': 3.0547408613493e-05, 'epoch': 0.34}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 10:54:49,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.24 | bwd_microstep: 796.99 | bwd_inner_microstep: 796.88 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 10:54:51,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1385.29 | bwd_inner_microstep: 1385.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3886
[2024-06-10 10:54:53,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.84 | bwd_microstep: 1688.29 | bwd_inner_microstep: 1688.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2487
[2024-06-10 10:54:54,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.05 | bwd_microstep: 1053.52 | bwd_inner_microstep: 1053.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 10:54:56,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1387.75 | bwd_inner_microstep: 1387.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 10:54:58,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2526
[2024-06-10 10:54:59,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.43 | bwd_microstep: 1033.83 | bwd_inner_microstep: 1033.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 10:55:01,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1251.13 | bwd_inner_microstep: 1251.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.23
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3503
[2024-06-10 10:55:03,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1317.69 | bwd_inner_microstep: 1317.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 10:55:05,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.64 | bwd_microstep: 1394.89 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 10:55:07,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.52 | bwd_microstep: 1498.66 | bwd_inner_microstep: 1498.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1964
[2024-06-10 10:55:08,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.24 | bwd_microstep: 891.66 | bwd_inner_microstep: 891.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1940
[2024-06-10 10:55:09,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.60 | bwd_microstep: 854.46 | bwd_inner_microstep: 854.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3393
[2024-06-10 10:55:11,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.68 | bwd_microstep: 1435.79 | bwd_inner_microstep: 1435.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 10:55:14,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1537.60 | bwd_inner_microstep: 1537.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3425
[2024-06-10 10:55:15,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.54 | bwd_microstep: 1375.94 | bwd_inner_microstep: 1375.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3682
[2024-06-10 10:55:17,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.25 | bwd_microstep: 1420.80 | bwd_inner_microstep: 1420.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 10:55:19,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.30 | bwd_microstep: 1472.96 | bwd_inner_microstep: 1472.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 10:55:21,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1347.91 | bwd_inner_microstep: 1347.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 10:55:23,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.28 | bwd_microstep: 1188.49 | bwd_inner_microstep: 1188.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 10:55:25,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.09 | bwd_microstep: 1300.33 | bwd_inner_microstep: 1300.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3621
[2024-06-10 10:55:27,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1444.01 | bwd_inner_microstep: 1443.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033
[2024-06-10 10:55:28,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.32 | bwd_microstep: 810.33 | bwd_inner_microstep: 810.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 10:55:30,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1402.08 | bwd_inner_microstep: 1402.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2024
[2024-06-10 10:55:31,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.29 | bwd_microstep: 744.03 | bwd_inner_microstep: 744.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 10:55:33,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.41 | bwd_microstep: 1253.62 | bwd_inner_microstep: 1253.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 10:55:35,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.21 | bwd_microstep: 1515.17 | bwd_inner_microstep: 1515.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2195
[2024-06-10 10:55:36,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.82 | bwd_microstep: 798.50 | bwd_inner_microstep: 798.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 10:55:38,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.34 | bwd_microstep: 1287.46 | bwd_inner_microstep: 1287.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 10:55:40,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.48 | bwd_microstep: 1510.41 | bwd_inner_microstep: 1510.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2960
[2024-06-10 10:55:41,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.39 | bwd_microstep: 1262.30 | bwd_inner_microstep: 1262.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 10:55:50,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.58
[2024-06-10 10:55:50,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.24 | bwd_microstep: 7901.51 | bwd_inner_microstep: 1439.98 | bwd_allreduce_microstep: 6461.46 | step_microstep: 38.89
[2024-06-10 10:55:50,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15086.48 | bwd: 46812.48 | bwd_inner: 40349.99 | bwd_allreduce: 6461.75 | step: 41.90
{'loss': 1.2817, 'learning_rate': 3.0515499872528446e-05, 'epoch': 0.34}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 10:55:52,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.34 | bwd_microstep: 1271.64 | bwd_inner_microstep: 1271.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 10:55:53,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.41 | bwd_microstep: 1243.87 | bwd_inner_microstep: 1243.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3863
[2024-06-10 10:55:56,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 678.43 | bwd_microstep: 1868.54 | bwd_inner_microstep: 1868.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 10:55:58,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.05 | bwd_microstep: 1478.18 | bwd_inner_microstep: 1478.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-10 10:55:59,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.38 | bwd_microstep: 970.75 | bwd_inner_microstep: 970.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2890
[2024-06-10 10:56:01,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.87 | bwd_microstep: 995.38 | bwd_inner_microstep: 995.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2638
[2024-06-10 10:56:02,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.12 | bwd_microstep: 1017.64 | bwd_inner_microstep: 1017.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 10:56:04,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.79 | bwd_microstep: 1245.37 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 10:56:06,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.16 | bwd_microstep: 1384.71 | bwd_inner_microstep: 1384.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3515
[2024-06-10 10:56:07,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.41 | bwd_microstep: 1228.11 | bwd_inner_microstep: 1228.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 10:56:09,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.77 | bwd_microstep: 1166.54 | bwd_inner_microstep: 1166.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 10:56:11,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1348.50 | bwd_inner_microstep: 1348.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3662
[2024-06-10 10:56:13,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.00 | bwd_microstep: 1545.35 | bwd_inner_microstep: 1545.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 10:56:15,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1345.84 | bwd_inner_microstep: 1345.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 10:56:17,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.79 | bwd_microstep: 1607.98 | bwd_inner_microstep: 1607.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 10:56:19,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.32 | bwd_microstep: 1384.47 | bwd_inner_microstep: 1384.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3530
[2024-06-10 10:56:21,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1344.28 | bwd_inner_microstep: 1344.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 10:56:23,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.67 | bwd_microstep: 1420.37 | bwd_inner_microstep: 1420.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3532
[2024-06-10 10:56:25,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.43 | bwd_microstep: 1581.85 | bwd_inner_microstep: 1581.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 10:56:27,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.34 | bwd_microstep: 1298.65 | bwd_inner_microstep: 1298.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-10 10:56:29,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1297.49 | bwd_inner_microstep: 1297.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 10:56:31,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.53 | bwd_microstep: 1383.04 | bwd_inner_microstep: 1383.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3601
[2024-06-10 10:56:33,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.15 | bwd_microstep: 1467.48 | bwd_inner_microstep: 1467.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 10:56:35,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.62 | bwd_microstep: 1659.84 | bwd_inner_microstep: 1659.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 10:56:36,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.79 | bwd_microstep: 808.77 | bwd_inner_microstep: 808.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 10:56:38,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.84 | bwd_microstep: 1296.04 | bwd_inner_microstep: 1296.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2023
[2024-06-10 10:56:39,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.42 | bwd_microstep: 717.39 | bwd_inner_microstep: 717.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813
[2024-06-10 10:56:41,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.03 | bwd_microstep: 1357.50 | bwd_inner_microstep: 1357.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 10:56:42,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.62 | bwd_microstep: 1284.71 | bwd_inner_microstep: 1284.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 10:56:44,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.81 | bwd_microstep: 1304.37 | bwd_inner_microstep: 1304.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 10:56:46,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.42 | bwd_microstep: 1511.12 | bwd_inner_microstep: 1511.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3808
[2024-06-10 10:56:51,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.32 | optimizer_step: 6.59
[2024-06-10 10:56:51,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.86 | bwd_microstep: 4112.54 | bwd_inner_microstep: 2060.46 | bwd_allreduce_microstep: 2052.01 | step_microstep: 38.69
[2024-06-10 10:56:51,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16032.10 | bwd: 44948.32 | bwd_inner: 42895.38 | bwd_allreduce: 2052.25 | step: 40.37
{'loss': 1.2267, 'learning_rate': 3.0483554096027998e-05, 'epoch': 0.34}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417
[2024-06-10 10:56:53,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.34 | bwd_microstep: 1366.82 | bwd_inner_microstep: 1366.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874
[2024-06-10 10:56:55,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.99 | bwd_microstep: 1582.22 | bwd_inner_microstep: 1582.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 10:56:57,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.65 | bwd_microstep: 1250.87 | bwd_inner_microstep: 1250.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 10:56:59,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.26 | bwd_microstep: 1638.90 | bwd_inner_microstep: 1638.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741
[2024-06-10 10:57:01,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.32 | bwd_microstep: 1533.37 | bwd_inner_microstep: 1533.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2238
[2024-06-10 10:57:03,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.08 | bwd_microstep: 898.77 | bwd_inner_microstep: 898.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 10:57:05,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.89 | bwd_microstep: 1532.13 | bwd_inner_microstep: 1532.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1887
[2024-06-10 10:57:06,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.96 | bwd_microstep: 682.79 | bwd_inner_microstep: 682.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 10:57:07,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.41 | bwd_microstep: 1249.33 | bwd_inner_microstep: 1249.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3685
[2024-06-10 10:57:09,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.94 | bwd_microstep: 1325.93 | bwd_inner_microstep: 1325.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3400
[2024-06-10 10:57:11,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.84 | bwd_microstep: 1291.01 | bwd_inner_microstep: 1290.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3489
[2024-06-10 10:57:13,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.53 | bwd_microstep: 1428.20 | bwd_inner_microstep: 1428.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2647
[2024-06-10 10:57:15,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.96 | bwd_microstep: 1116.39 | bwd_inner_microstep: 1116.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507
[2024-06-10 10:57:16,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1409.50 | bwd_inner_microstep: 1409.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3919
[2024-06-10 10:57:19,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.81 | bwd_microstep: 1789.26 | bwd_inner_microstep: 1789.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3640
[2024-06-10 10:57:21,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.52 | bwd_microstep: 1640.18 | bwd_inner_microstep: 1640.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 10:57:23,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1345.53 | bwd_inner_microstep: 1345.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 10:57:25,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1477.47 | bwd_inner_microstep: 1477.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 10:57:27,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.70 | bwd_microstep: 1604.37 | bwd_inner_microstep: 1604.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1875
[2024-06-10 10:57:28,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.82 | bwd_microstep: 806.66 | bwd_inner_microstep: 806.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294
[2024-06-10 10:57:30,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.29 | bwd_microstep: 1072.43 | bwd_inner_microstep: 1072.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2283
[2024-06-10 10:57:31,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.49 | bwd_microstep: 909.57 | bwd_inner_microstep: 909.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030
[2024-06-10 10:57:32,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.98 | bwd_microstep: 901.32 | bwd_inner_microstep: 901.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3538
[2024-06-10 10:57:34,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.45 | bwd_microstep: 1457.16 | bwd_inner_microstep: 1457.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 10:57:36,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1526.16 | bwd_inner_microstep: 1526.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2210
[2024-06-10 10:57:38,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.48 | bwd_microstep: 861.04 | bwd_inner_microstep: 861.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 10:57:40,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.85 | bwd_microstep: 1451.47 | bwd_inner_microstep: 1451.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 10:57:42,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1492.94 | bwd_inner_microstep: 1492.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 10:57:44,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.32 | bwd_microstep: 1317.13 | bwd_inner_microstep: 1317.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614
[2024-06-10 10:57:46,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.94 | bwd_microstep: 1616.19 | bwd_inner_microstep: 1616.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 10:57:48,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.61 | bwd_microstep: 1526.33 | bwd_inner_microstep: 1526.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3771
[2024-06-10 10:57:54,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.33 | optimizer_step: 6.60
[2024-06-10 10:57:54,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.05 | bwd_microstep: 5348.18 | bwd_inner_microstep: 1933.23 | bwd_allreduce_microstep: 3414.89 | step_microstep: 38.88
[2024-06-10 10:57:54,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15876.45 | bwd: 46449.64 | bwd_inner: 43033.79 | bwd_allreduce: 3415.15 | step: 40.46
{'loss': 1.2834, 'learning_rate': 3.0451571396504528e-05, 'epoch': 0.35}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 10:57:56,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.36 | bwd_microstep: 1268.58 | bwd_inner_microstep: 1268.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3470
[2024-06-10 10:57:58,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.28 | bwd_microstep: 1411.06 | bwd_inner_microstep: 1411.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859
[2024-06-10 10:58:00,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.74 | bwd_microstep: 1561.08 | bwd_inner_microstep: 1561.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 10:58:01,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.67 | bwd_microstep: 788.47 | bwd_inner_microstep: 788.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 10:58:03,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.20 | bwd_microstep: 1386.56 | bwd_inner_microstep: 1386.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 10:58:05,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.64 | bwd_microstep: 1278.50 | bwd_inner_microstep: 1278.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 10:58:07,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.02 | bwd_microstep: 1483.42 | bwd_inner_microstep: 1483.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 10:58:08,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1281.86 | bwd_inner_microstep: 1281.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 936
[2024-06-10 10:58:09,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.70 | bwd_microstep: 379.54 | bwd_inner_microstep: 379.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 10:58:11,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.78 | bwd_microstep: 1283.75 | bwd_inner_microstep: 1283.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 10:58:12,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1257.57 | bwd_inner_microstep: 1257.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2013
[2024-06-10 10:58:14,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.39 | bwd_microstep: 901.26 | bwd_inner_microstep: 901.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645
[2024-06-10 10:58:16,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.23 | bwd_microstep: 1712.94 | bwd_inner_microstep: 1712.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3667
[2024-06-10 10:58:18,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.68 | bwd_microstep: 1542.69 | bwd_inner_microstep: 1542.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 10:58:20,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.25 | bwd_microstep: 1289.31 | bwd_inner_microstep: 1289.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3506
[2024-06-10 10:58:22,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.61 | bwd_microstep: 1585.08 | bwd_inner_microstep: 1585.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3854
[2024-06-10 10:58:24,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.88 | bwd_microstep: 1700.86 | bwd_inner_microstep: 1700.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614
[2024-06-10 10:58:27,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.92 | bwd_microstep: 1612.98 | bwd_inner_microstep: 1612.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 10:58:29,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 1498.92 | bwd_inner_microstep: 1498.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706
[2024-06-10 10:58:31,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.09 | bwd_microstep: 1427.00 | bwd_inner_microstep: 1426.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 10:58:33,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.02 | bwd_microstep: 1488.16 | bwd_inner_microstep: 1488.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-10 10:58:35,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.25 | bwd_microstep: 1441.77 | bwd_inner_microstep: 1441.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 10:58:37,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1397.14 | bwd_inner_microstep: 1397.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945
[2024-06-10 10:58:38,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.29 | bwd_microstep: 702.88 | bwd_inner_microstep: 702.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 10:58:40,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1398.90 | bwd_inner_microstep: 1398.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-10 10:58:41,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.09 | bwd_microstep: 684.94 | bwd_inner_microstep: 684.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 10:58:43,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1433.64 | bwd_inner_microstep: 1433.39 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 10:58:44,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1348.04 | bwd_inner_microstep: 1348.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3565
[2024-06-10 10:58:46,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1454.28 | bwd_inner_microstep: 1454.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2047
[2024-06-10 10:58:48,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.76 | bwd_microstep: 905.12 | bwd_inner_microstep: 905.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3806
[2024-06-10 10:58:50,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 677.13 | bwd_microstep: 1864.50 | bwd_inner_microstep: 1864.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-10 10:58:54,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.30 | optimizer_step: 6.58
[2024-06-10 10:58:54,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.10 | bwd_microstep: 3164.09 | bwd_inner_microstep: 1990.45 | bwd_allreduce_microstep: 1173.59 | step_microstep: 38.37
[2024-06-10 10:58:54,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15885.43 | bwd: 43934.92 | bwd_inner: 42760.20 | bwd_allreduce: 1173.95 | step: 40.29
 34%|███▍      | 592/1726 [10:16:25<19:31:33, 61.99s/it]
 34%|███▍      | 593/1726 [10:17:24<19:18:00, 61.32s/it]
 34%|███▍      | 594/1726 [10:18:27<19:22:16, 61.60s/it]
 34%|███▍      | 595/1726 [10:19:28<19:19:46, 61.53s/it]
 35%|███▍      | 596/1726 [10:20:31<19:25:18, 61.87s/it]
 35%|███▍      | 597/1726 [10:21:31<19:14:45, 61.37s/it]
{'loss': 1.3164, 'learning_rate': 3.0419551886600935e-05, 'epoch': 0.35}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3475
[2024-06-10 10:58:56,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.88 | bwd_microstep: 1571.18 | bwd_inner_microstep: 1571.12 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3912
[2024-06-10 10:58:58,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.26 | bwd_microstep: 1588.75 | bwd_inner_microstep: 1588.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 10:59:01,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.40 | bwd_microstep: 1660.65 | bwd_inner_microstep: 1660.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503
[2024-06-10 10:59:02,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.84 | bwd_microstep: 1187.24 | bwd_inner_microstep: 1187.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2467
[2024-06-10 10:59:04,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.64 | bwd_microstep: 952.97 | bwd_inner_microstep: 952.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 10:59:05,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.11 | bwd_microstep: 1296.85 | bwd_inner_microstep: 1296.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2503
[2024-06-10 10:59:07,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.47 | bwd_microstep: 961.08 | bwd_inner_microstep: 961.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 10:59:09,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.50 | bwd_microstep: 1532.00 | bwd_inner_microstep: 1531.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 10:59:11,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.55 | bwd_microstep: 1285.56 | bwd_inner_microstep: 1285.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 10:59:12,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.50 | bwd_microstep: 1288.62 | bwd_inner_microstep: 1288.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 10:59:14,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1390.58 | bwd_inner_microstep: 1390.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 10:59:16,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.69 | bwd_microstep: 1379.90 | bwd_inner_microstep: 1379.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917
[2024-06-10 10:59:17,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.59 | bwd_microstep: 782.41 | bwd_inner_microstep: 782.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511
[2024-06-10 10:59:19,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1418.77 | bwd_inner_microstep: 1418.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3515
[2024-06-10 10:59:21,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.91 | bwd_microstep: 1223.32 | bwd_inner_microstep: 1223.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 10:59:23,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.64 | bwd_microstep: 1384.35 | bwd_inner_microstep: 1384.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1931
[2024-06-10 10:59:24,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.18 | bwd_microstep: 729.00 | bwd_inner_microstep: 728.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 10:59:26,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1402.23 | bwd_inner_microstep: 1402.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 10:59:28,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.50 | bwd_microstep: 1293.63 | bwd_inner_microstep: 1293.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3524
[2024-06-10 10:59:29,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.61 | bwd_microstep: 1200.61 | bwd_inner_microstep: 1200.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 10:59:31,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.49 | bwd_microstep: 1499.30 | bwd_inner_microstep: 1499.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 10:59:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.14 | bwd_microstep: 1506.87 | bwd_inner_microstep: 1506.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3584
[2024-06-10 10:59:36,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.61 | bwd_microstep: 1467.17 | bwd_inner_microstep: 1467.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 10:59:38,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.02 | bwd_microstep: 1493.34 | bwd_inner_microstep: 1493.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 10:59:40,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.69 | bwd_microstep: 1550.79 | bwd_inner_microstep: 1550.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3749
[2024-06-10 10:59:42,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.56 | bwd_microstep: 1566.82 | bwd_inner_microstep: 1566.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1932
[2024-06-10 10:59:43,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.98 | bwd_microstep: 731.36 | bwd_inner_microstep: 731.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 10:59:45,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.77 | bwd_microstep: 1343.20 | bwd_inner_microstep: 1343.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385
[2024-06-10 10:59:47,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.37 | bwd_microstep: 1439.40 | bwd_inner_microstep: 1439.27 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.25
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3729
[2024-06-10 10:59:49,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.17 | bwd_microstep: 1563.56 | bwd_inner_microstep: 1563.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 10:59:51,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.78 | bwd_microstep: 1351.49 | bwd_inner_microstep: 1351.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 10:59:55,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 10:59:55,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.47 | bwd_microstep: 3157.92 | bwd_inner_microstep: 1695.95 | bwd_allreduce_microstep: 1461.92 | step_microstep: 38.38
[2024-06-10 10:59:55,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15975.15 | bwd: 44200.98 | bwd_inner: 42737.98 | bwd_allreduce: 1462.21 | step: 40.67
{'loss': 1.2141, 'learning_rate': 3.0387495679089753e-05, 'epoch': 0.35}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389
[2024-06-10 10:59:56,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.84 | bwd_microstep: 1240.52 | bwd_inner_microstep: 1240.34 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3880
[2024-06-10 10:59:58,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.61 | bwd_microstep: 1481.87 | bwd_inner_microstep: 1481.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 11:00:01,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1548.58 | bwd_inner_microstep: 1548.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 11:00:02,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 11:00:04,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1249.95 | bwd_inner_microstep: 1249.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 11:00:06,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1397.66 | bwd_inner_microstep: 1397.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 11:00:08,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1384.28 | bwd_inner_microstep: 1384.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 11:00:10,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.19 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 11:00:10,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.00 | bwd_microstep: 681.46 | bwd_inner_microstep: 681.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-10 11:00:11,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.24 | bwd_microstep: 709.29 | bwd_inner_microstep: 709.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3409
[2024-06-10 11:00:13,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.26 | bwd_microstep: 1371.18 | bwd_inner_microstep: 1371.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3644
[2024-06-10 11:00:16,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.12 | bwd_microstep: 1814.00 | bwd_inner_microstep: 1813.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843
[2024-06-10 11:00:18,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.89 | bwd_microstep: 1663.49 | bwd_inner_microstep: 1663.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 11:00:20,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665
[2024-06-10 11:00:22,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.76 | bwd_microstep: 1427.27 | bwd_inner_microstep: 1427.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 11:00:24,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.50 | bwd_microstep: 1391.31 | bwd_inner_microstep: 1391.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 11:00:26,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.83 | bwd_microstep: 1395.25 | bwd_inner_microstep: 1395.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3530
[2024-06-10 11:00:27,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.53 | bwd_microstep: 1227.20 | bwd_inner_microstep: 1227.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 11:00:30,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.12 | bwd_microstep: 1607.71 | bwd_inner_microstep: 1607.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 11:00:31,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.34 | bwd_microstep: 1160.43 | bwd_inner_microstep: 1160.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526
[2024-06-10 11:00:33,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1329.40 | bwd_inner_microstep: 1329.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 11:00:35,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.45 | bwd_microstep: 1543.98 | bwd_inner_microstep: 1543.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 11:00:37,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1451.32 | bwd_inner_microstep: 1451.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 11:00:39,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.03 | bwd_microstep: 1464.06 | bwd_inner_microstep: 1464.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 11:00:42,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.36 | bwd_microstep: 1666.25 | bwd_inner_microstep: 1666.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3556
[2024-06-10 11:00:43,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.75 | bwd_microstep: 1282.35 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484
[2024-06-10 11:00:45,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.17 | bwd_microstep: 1216.92 | bwd_inner_microstep: 1216.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3723
[2024-06-10 11:00:47,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.20 | bwd_microstep: 1369.58 | bwd_inner_microstep: 1369.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 11:00:49,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1397.16 | bwd_inner_microstep: 1397.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 11:00:51,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1509.72 | bwd_inner_microstep: 1509.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 11:00:53,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.78 | bwd_microstep: 1547.16 | bwd_inner_microstep: 1547.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-10 11:00:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-10 11:00:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.27 | bwd_microstep: 3146.53 | bwd_inner_microstep: 1796.44 | bwd_allreduce_microstep: 1350.03 | step_microstep: 39.60
[2024-06-10 11:00:57,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16473.37 | bwd: 45455.34 | bwd_inner: 44104.26 | bwd_allreduce: 1350.33 | step: 41.26
{'loss': 1.276, 'learning_rate': 3.03554028868728e-05, 'epoch': 0.35}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 11:00:59,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.41 | bwd_microstep: 1479.36 | bwd_inner_microstep: 1479.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2386
[2024-06-10 11:01:00,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.31 | bwd_microstep: 996.41 | bwd_inner_microstep: 996.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3857
[2024-06-10 11:01:02,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1457.92 | bwd_inner_microstep: 1457.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827
[2024-06-10 11:01:04,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.34 | bwd_microstep: 1486.11 | bwd_inner_microstep: 1486.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2424
[2024-06-10 11:01:06,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.86 | bwd_microstep: 939.01 | bwd_inner_microstep: 938.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 11:01:08,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.90 | bwd_microstep: 1343.44 | bwd_inner_microstep: 1343.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2225
[2024-06-10 11:01:09,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.71 | bwd_microstep: 862.25 | bwd_inner_microstep: 862.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 11:01:11,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.86 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 11:01:12,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.31 | bwd_microstep: 1290.46 | bwd_inner_microstep: 1290.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3696
[2024-06-10 11:01:14,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.87 | bwd_microstep: 1424.38 | bwd_inner_microstep: 1424.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-10 11:01:17,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.25 | bwd_microstep: 1622.55 | bwd_inner_microstep: 1622.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 11:01:19,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.88 | bwd_microstep: 1379.67 | bwd_inner_microstep: 1379.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 11:01:20,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.79 | bwd_microstep: 1342.05 | bwd_inner_microstep: 1342.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3653
[2024-06-10 11:01:22,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1371.05 | bwd_inner_microstep: 1371.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 11:01:24,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.31 | bwd_microstep: 1521.78 | bwd_inner_microstep: 1521.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 11:01:26,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1285.06 | bwd_inner_microstep: 1285.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2296
[2024-06-10 11:01:27,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.82 | bwd_microstep: 879.20 | bwd_inner_microstep: 879.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 11:01:29,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.15 | bwd_microstep: 1375.97 | bwd_inner_microstep: 1375.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 647
[2024-06-10 11:01:30,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.16 | bwd_microstep: 278.27 | bwd_inner_microstep: 278.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446
[2024-06-10 11:01:31,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.52 | bwd_microstep: 1185.33 | bwd_inner_microstep: 1185.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443
[2024-06-10 11:01:33,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.05 | bwd_microstep: 1155.24 | bwd_inner_microstep: 1155.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 11:01:35,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.46 | bwd_microstep: 1559.44 | bwd_inner_microstep: 1559.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551
[2024-06-10 11:01:37,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1299.99 | bwd_inner_microstep: 1299.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 11:01:39,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1379.82 | bwd_inner_microstep: 1379.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533
[2024-06-10 11:01:41,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.30 | bwd_microstep: 1419.29 | bwd_inner_microstep: 1419.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3774
[2024-06-10 11:01:43,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1404.63 | bwd_inner_microstep: 1404.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3808
[2024-06-10 11:01:45,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.17 | bwd_microstep: 1705.94 | bwd_inner_microstep: 1705.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 11:01:47,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.69 | bwd_microstep: 1307.19 | bwd_inner_microstep: 1307.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3576
[2024-06-10 11:01:49,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.47 | bwd_microstep: 1555.65 | bwd_inner_microstep: 1555.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 11:01:51,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.05 | bwd_microstep: 1347.94 | bwd_inner_microstep: 1347.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2934
[2024-06-10 11:01:53,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.87 | bwd_microstep: 1191.77 | bwd_inner_microstep: 1191.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3597
[2024-06-10 11:02:00,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.42 | optimizer_step: 6.58
[2024-06-10 11:02:00,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.85 | bwd_microstep: 6768.14 | bwd_inner_microstep: 1463.15 | bwd_allreduce_microstep: 5304.92 | step_microstep: 39.99
[2024-06-10 11:02:00,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15602.24 | bwd: 46996.41 | bwd_inner: 41690.56 | bwd_allreduce: 5305.16 | step: 41.58
{'loss': 1.2324, 'learning_rate': 3.0323273622980706e-05, 'epoch': 0.35}


 35%|███▍      | 597/1726 [10:21:31<19:14:45, 61.37s/it]
 35%|███▍      | 598/1726 [10:22:31<19:09:08, 61.12s/it]
 35%|███▍      | 599/1726 [10:23:34<19:14:40, 61.47s/it]
 35%|███▍      | 600/1726 [10:24:37<19:21:55, 61.91s/it]
[INFO|trainer.py:2936] 2024-06-10 11:02:03,014 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600
[INFO|configuration_utils.py:473] 2024-06-10 11:02:03,018 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/config.json
[INFO|configuration_utils.py:594] 2024-06-10 11:02:03,020 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-10 11:02:11,553 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-10 11:02:11,564 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-10 11:02:11,566 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-10 11:02:11,567 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/added_tokens.json
[2024-06-10 11:02:11,786] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step600 is about to be saved!
[2024-06-10 11:02:11,798] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/mp_rank_00_model_states.pt
[2024-06-10 11:02:11,798] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/mp_rank_00_model_states.pt...
[2024-06-10 11:02:21,370] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/mp_rank_00_model_states.pt.
[2024-06-10 11:02:21,377] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-06-10 11:02:34,521] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-06-10 11:02:34,530] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-06-10 11:02:34,530] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step600 is ready now!
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468
[2024-06-10 11:02:36,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.05 | bwd_microstep: 1563.68 | bwd_inner_microstep: 1563.62 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3954
[2024-06-10 11:02:39,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.78 | bwd_microstep: 1588.42 | bwd_inner_microstep: 1588.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 11:02:41,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.58 | bwd_microstep: 1375.74 | bwd_inner_microstep: 1375.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3835
[2024-06-10 11:02:42,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1381.12 | bwd_inner_microstep: 1381.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 11:02:45,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.74 | bwd_microstep: 1536.59 | bwd_inner_microstep: 1536.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 11:02:47,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.67 | bwd_microstep: 1540.04 | bwd_inner_microstep: 1540.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 11:02:48,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.66 | bwd_microstep: 1244.45 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 11:02:50,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.89 | bwd_microstep: 1385.09 | bwd_inner_microstep: 1385.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 11:02:52,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.68 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4003
[2024-06-10 11:02:54,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.46 | bwd_microstep: 1508.84 | bwd_inner_microstep: 1508.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1913
[2024-06-10 11:02:55,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.45 | bwd_microstep: 839.12 | bwd_inner_microstep: 839.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1990
[2024-06-10 11:02:57,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.37 | bwd_microstep: 897.14 | bwd_inner_microstep: 897.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511
[2024-06-10 11:02:59,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1409.03 | bwd_inner_microstep: 1409.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 11:03:01,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.16 | bwd_microstep: 1405.46 | bwd_inner_microstep: 1405.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3674
[2024-06-10 11:03:03,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.05 | bwd_microstep: 1714.16 | bwd_inner_microstep: 1714.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 11:03:05,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.97 | bwd_microstep: 1288.75 | bwd_inner_microstep: 1288.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 11:03:06,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.12 | bwd_microstep: 1157.09 | bwd_inner_microstep: 1157.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 11:03:08,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1278.87 | bwd_inner_microstep: 1278.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 11:03:10,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1393.64 | bwd_inner_microstep: 1393.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3837
[2024-06-10 11:03:12,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.14 | bwd_microstep: 1358.51 | bwd_inner_microstep: 1358.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676
[2024-06-10 11:03:14,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1625.20 | bwd_inner_microstep: 1625.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 11:03:16,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1413.53 | bwd_inner_microstep: 1413.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2306
[2024-06-10 11:03:17,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.36 | bwd_microstep: 983.06 | bwd_inner_microstep: 983.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4171
[2024-06-10 11:03:20,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.33 | bwd_microstep: 1556.34 | bwd_inner_microstep: 1556.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 11:03:22,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.60 | bwd_microstep: 1462.54 | bwd_inner_microstep: 1462.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377
[2024-06-10 11:03:23,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.28 | bwd_microstep: 1241.69 | bwd_inner_microstep: 1241.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3584
[2024-06-10 11:03:26,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.29 | bwd_microstep: 1698.36 | bwd_inner_microstep: 1698.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2041
[2024-06-10 11:03:27,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.82 | bwd_microstep: 904.10 | bwd_inner_microstep: 904.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 11:03:29,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.29 | bwd_microstep: 1449.35 | bwd_inner_microstep: 1449.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 11:03:31,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.38 | bwd_microstep: 1400.85 | bwd_inner_microstep: 1400.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 11:03:33,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.00 | bwd_microstep: 1385.89 | bwd_inner_microstep: 1385.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 11:03:35,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 11:03:35,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.32 | bwd_microstep: 1624.41 | bwd_inner_microstep: 1472.14 | bwd_allreduce_microstep: 152.22 | step_microstep: 37.92
[2024-06-10 11:03:35,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16391.34 | bwd: 43996.48 | bwd_inner: 43843.30 | bwd_allreduce: 152.47 | step: 39.48
{'loss': 1.2789, 'learning_rate': 3.029110800057258e-05, 'epoch': 0.35}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 11:03:37,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1336.94 | bwd_inner_microstep: 1336.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4020
[2024-06-10 11:03:39,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.68 | bwd_microstep: 1545.41 | bwd_inner_microstep: 1545.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 11:03:41,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1380.17 | bwd_inner_microstep: 1380.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 11:03:43,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1249.34 | bwd_inner_microstep: 1249.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 11:03:44,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.48 | bwd_microstep: 1283.35 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 11:03:46,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1278.42 | bwd_inner_microstep: 1278.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 11:03:48,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.55 | bwd_microstep: 1532.28 | bwd_inner_microstep: 1532.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-10 11:03:50,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.78 | bwd_microstep: 1536.95 | bwd_inner_microstep: 1536.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1893
[2024-06-10 11:03:51,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.66 | bwd_microstep: 687.37 | bwd_inner_microstep: 687.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 11:03:53,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1390.22 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995
[2024-06-10 11:03:55,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.85 | bwd_microstep: 895.86 | bwd_inner_microstep: 895.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 11:03:56,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1345.92 | bwd_inner_microstep: 1345.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 11:03:58,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.22 | bwd_microstep: 1522.76 | bwd_inner_microstep: 1522.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2777
[2024-06-10 11:04:00,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.53 | bwd_microstep: 1146.52 | bwd_inner_microstep: 1146.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 11:04:02,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.72 | bwd_microstep: 1290.39 | bwd_inner_microstep: 1290.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 11:04:04,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.25 | bwd_microstep: 1447.23 | bwd_inner_microstep: 1447.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 11:04:06,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 11:04:07,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.04 | bwd_microstep: 807.27 | bwd_inner_microstep: 807.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3631
[2024-06-10 11:04:09,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.21 | bwd_microstep: 1552.07 | bwd_inner_microstep: 1552.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 11:04:11,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.82 | bwd_microstep: 1286.26 | bwd_inner_microstep: 1286.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.81
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697
[2024-06-10 11:04:13,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.35 | bwd_microstep: 1425.17 | bwd_inner_microstep: 1425.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 11:04:14,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.56 | bwd_microstep: 801.21 | bwd_inner_microstep: 801.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 11:04:16,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.14 | bwd_microstep: 1558.83 | bwd_inner_microstep: 1558.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 11:04:18,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 11:04:20,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.91 | bwd_microstep: 1455.15 | bwd_inner_microstep: 1455.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 11:04:22,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.67 | bwd_microstep: 1553.14 | bwd_inner_microstep: 1553.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 11:04:23,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.91 | bwd_microstep: 802.84 | bwd_inner_microstep: 802.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 11:04:25,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1509.38 | bwd_inner_microstep: 1509.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 11:04:27,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.97 | bwd_microstep: 1443.32 | bwd_inner_microstep: 1443.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3579
[2024-06-10 11:04:29,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.76 | bwd_microstep: 1333.34 | bwd_inner_microstep: 1333.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 11:04:31,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.67 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3787
[2024-06-10 11:04:35,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.64
[2024-06-10 11:04:35,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.56 | bwd_microstep: 3656.88 | bwd_inner_microstep: 1523.48 | bwd_allreduce_microstep: 2133.34 | step_microstep: 38.34
[2024-06-10 11:04:35,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15735.26 | bwd: 44236.27 | bwd_inner: 42102.01 | bwd_allreduce: 2133.57 | step: 41.84
{'loss': 1.2564, 'learning_rate': 3.025890613293557e-05, 'epoch': 0.35}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 11:04:37,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.57 | bwd_microstep: 1474.18 | bwd_inner_microstep: 1473.97 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3914
[2024-06-10 11:04:40,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1550.29 | bwd_inner_microstep: 1550.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2395
[2024-06-10 11:04:41,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.03 | bwd_microstep: 1000.68 | bwd_inner_microstep: 1000.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3762
[2024-06-10 11:04:43,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1371.81 | bwd_inner_microstep: 1371.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3777
[2024-06-10 11:04:45,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.08 | bwd_microstep: 1571.73 | bwd_inner_microstep: 1571.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3421
[2024-06-10 11:04:47,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.02 | bwd_microstep: 1216.44 | bwd_inner_microstep: 1216.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2087
[2024-06-10 11:04:48,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.50 | bwd_microstep: 727.46 | bwd_inner_microstep: 727.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 11:04:50,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.41 | bwd_microstep: 1388.14 | bwd_inner_microstep: 1388.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 11:04:51,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.86 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075
[2024-06-10 11:04:52,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.35 | bwd_microstep: 818.83 | bwd_inner_microstep: 818.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497
[2024-06-10 11:04:54,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1417.88 | bwd_inner_microstep: 1417.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 11:04:56,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.47 | bwd_microstep: 1290.33 | bwd_inner_microstep: 1290.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3495
[2024-06-10 11:04:58,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1345.85 | bwd_inner_microstep: 1345.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470
[2024-06-10 11:05:00,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.94 | bwd_microstep: 1344.31 | bwd_inner_microstep: 1344.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 11:05:02,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1495.04 | bwd_inner_microstep: 1495.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2760
[2024-06-10 11:05:04,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.86 | bwd_microstep: 1142.31 | bwd_inner_microstep: 1142.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3607
[2024-06-10 11:05:06,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.43 | bwd_microstep: 1703.31 | bwd_inner_microstep: 1703.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 11:05:08,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.39 | bwd_microstep: 1651.03 | bwd_inner_microstep: 1651.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 11:05:09,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.94 | bwd_microstep: 796.98 | bwd_inner_microstep: 796.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2070
[2024-06-10 11:05:10,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.59 | bwd_microstep: 724.56 | bwd_inner_microstep: 724.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3671
[2024-06-10 11:05:12,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1329.68 | bwd_inner_microstep: 1329.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 11:05:13,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.69 | bwd_microstep: 804.20 | bwd_inner_microstep: 804.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 11:05:15,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.45 | bwd_microstep: 1388.27 | bwd_inner_microstep: 1388.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 11:05:17,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 976.20 | bwd_inner_microstep: 976.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 11:05:19,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.13 | bwd_microstep: 1503.72 | bwd_inner_microstep: 1503.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 11:05:21,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.10 | bwd_microstep: 1438.40 | bwd_inner_microstep: 1438.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 11:05:22,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.62 | bwd_microstep: 1183.52 | bwd_inner_microstep: 1183.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 11:05:24,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.84 | bwd_microstep: 1557.53 | bwd_inner_microstep: 1557.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 11:05:27,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.49 | bwd_microstep: 1646.08 | bwd_inner_microstep: 1646.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2952
[2024-06-10 11:05:28,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.31 | bwd_microstep: 1100.82 | bwd_inner_microstep: 1100.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3441
[2024-06-10 11:05:30,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1384.13 | bwd_inner_microstep: 1384.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3463
[2024-06-10 11:05:39,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.47 | optimizer_step: 6.61
[2024-06-10 11:05:39,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.56 | bwd_microstep: 8152.17 | bwd_inner_microstep: 1777.13 | bwd_allreduce_microstep: 6374.97 | step_microstep: 40.20
[2024-06-10 11:05:39,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15431.67 | bwd: 47781.04 | bwd_inner: 41404.98 | bwd_allreduce: 6375.29 | step: 41.76
{'loss': 1.2539, 'learning_rate': 3.0226668133484494e-05, 'epoch': 0.35}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 11:05:41,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.34 | bwd_microstep: 1336.73 | bwd_inner_microstep: 1336.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2030
[2024-06-10 11:05:42,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.74 | bwd_microstep: 715.15 | bwd_inner_microstep: 715.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-10 11:05:44,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.88 | bwd_microstep: 1557.92 | bwd_inner_microstep: 1557.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3854
[2024-06-10 11:05:46,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.88 | bwd_microstep: 1658.71 | bwd_inner_microstep: 1658.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 11:05:48,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 1277.36 | bwd_inner_microstep: 1277.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 11:05:49,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.77 | bwd_microstep: 677.15 | bwd_inner_microstep: 677.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-10 11:05:51,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.02 | bwd_microstep: 1292.28 | bwd_inner_microstep: 1292.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 11:05:53,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1386.62 | bwd_inner_microstep: 1386.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 11:05:55,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.61 | bwd_microstep: 1477.33 | bwd_inner_microstep: 1477.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408
[2024-06-10 11:05:56,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.68 | bwd_microstep: 1278.75 | bwd_inner_microstep: 1278.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2948
[2024-06-10 11:05:58,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.62 | bwd_microstep: 1190.89 | bwd_inner_microstep: 1190.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3445
[2024-06-10 11:06:00,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.78 | bwd_microstep: 1376.24 | bwd_inner_microstep: 1376.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2151
[2024-06-10 11:06:01,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.15 | bwd_microstep: 878.05 | bwd_inner_microstep: 878.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-10 11:06:03,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1580.34 | bwd_inner_microstep: 1580.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-10 11:06:06,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.59 | bwd_microstep: 1562.25 | bwd_inner_microstep: 1562.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 11:06:07,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.08 | bwd_microstep: 1285.53 | bwd_inner_microstep: 1285.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 11:06:09,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.90 | bwd_microstep: 1254.72 | bwd_inner_microstep: 1254.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 11:06:10,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.99 | bwd_microstep: 973.76 | bwd_inner_microstep: 973.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3836
[2024-06-10 11:06:12,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1386.09 | bwd_inner_microstep: 1386.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1973
[2024-06-10 11:06:13,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.65 | bwd_microstep: 704.05 | bwd_inner_microstep: 704.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 11:06:15,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1346.94 | bwd_inner_microstep: 1346.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3530
[2024-06-10 11:06:17,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.62 | bwd_microstep: 1655.00 | bwd_inner_microstep: 1654.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3822
[2024-06-10 11:06:20,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.73 | bwd_microstep: 1749.33 | bwd_inner_microstep: 1749.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 11:06:22,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.30 | bwd_microstep: 1552.35 | bwd_inner_microstep: 1552.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042
[2024-06-10 11:06:23,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.46 | bwd_microstep: 811.60 | bwd_inner_microstep: 811.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447
[2024-06-10 11:06:25,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1445.42 | bwd_inner_microstep: 1445.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 11:06:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1480.49 | bwd_inner_microstep: 1480.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 11:06:29,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1495.98 | bwd_inner_microstep: 1495.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3760
[2024-06-10 11:06:31,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.70 | bwd_microstep: 1349.43 | bwd_inner_microstep: 1349.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 11:06:33,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.23 | bwd_microstep: 1285.73 | bwd_inner_microstep: 1285.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2190
[2024-06-10 11:06:34,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.43 | bwd_microstep: 858.96 | bwd_inner_microstep: 858.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2225
[2024-06-10 11:06:43,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.43 | optimizer_step: 6.61
[2024-06-10 11:06:43,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.50 | bwd_microstep: 8967.64 | bwd_inner_microstep: 1010.30 | bwd_allreduce_microstep: 7957.27 | step_microstep: 40.13
[2024-06-10 11:06:43,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15295.82 | bwd: 48848.81 | bwd_inner: 40890.60 | bwd_allreduce: 7957.52 | step: 41.86
{'loss': 1.2737, 'learning_rate': 3.0194394115761415e-05, 'epoch': 0.35}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451
[2024-06-10 11:06:45,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1442.45 | bwd_inner_microstep: 1442.32 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 11:06:47,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.06 | bwd_microstep: 1339.52 | bwd_inner_microstep: 1339.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2350
[2024-06-10 11:06:49,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.35 | bwd_microstep: 984.50 | bwd_inner_microstep: 984.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 11:06:50,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.83 | bwd_microstep: 1370.04 | bwd_inner_microstep: 1370.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3473
[2024-06-10 11:06:52,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.12 | bwd_microstep: 1438.44 | bwd_inner_microstep: 1438.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 11:06:54,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1374.54 | bwd_inner_microstep: 1374.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 11:06:56,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.87 | bwd_microstep: 1340.66 | bwd_inner_microstep: 1340.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 11:06:58,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1345.82 | bwd_inner_microstep: 1345.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-10 11:06:59,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.37 | bwd_microstep: 794.98 | bwd_inner_microstep: 794.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 11:07:00,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.48 | bwd_microstep: 795.67 | bwd_inner_microstep: 795.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 11:07:02,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.15 | bwd_microstep: 1380.56 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490
[2024-06-10 11:07:04,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.89 | bwd_microstep: 1219.86 | bwd_inner_microstep: 1219.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1907
[2024-06-10 11:07:05,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.43 | bwd_microstep: 718.03 | bwd_inner_microstep: 718.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 11:07:07,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.91 | bwd_microstep: 1379.01 | bwd_inner_microstep: 1378.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 11:07:09,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1446.29 | bwd_inner_microstep: 1446.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-10 11:07:11,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.75 | bwd_microstep: 1632.52 | bwd_inner_microstep: 1632.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 11:07:13,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.49 | bwd_microstep: 1457.79 | bwd_inner_microstep: 1457.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 11:07:15,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.55 | bwd_microstep: 1483.10 | bwd_inner_microstep: 1483.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3476
[2024-06-10 11:07:17,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.57 | bwd_microstep: 1249.33 | bwd_inner_microstep: 1249.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 11:07:19,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.08 | bwd_microstep: 1556.88 | bwd_inner_microstep: 1556.66 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 11:07:21,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1285.79 | bwd_inner_microstep: 1285.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3439
[2024-06-10 11:07:22,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.94 | bwd_microstep: 1156.26 | bwd_inner_microstep: 1156.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3528
[2024-06-10 11:07:24,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1326.04 | bwd_inner_microstep: 1326.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 11:07:26,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.29 | bwd_microstep: 1461.98 | bwd_inner_microstep: 1461.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548
[2024-06-10 11:07:28,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.22 | bwd_microstep: 1329.35 | bwd_inner_microstep: 1329.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 11:07:30,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.03 | bwd_microstep: 1556.93 | bwd_inner_microstep: 1556.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1891
[2024-06-10 11:07:31,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.84 | bwd_microstep: 794.04 | bwd_inner_microstep: 794.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467
[2024-06-10 11:07:33,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.59 | bwd_microstep: 1190.49 | bwd_inner_microstep: 1190.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3558
[2024-06-10 11:07:35,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.60 | bwd_microstep: 1662.47 | bwd_inner_microstep: 1662.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3811
[2024-06-10 11:07:38,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.34 | bwd_microstep: 1791.06 | bwd_inner_microstep: 1791.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3583
[2024-06-10 11:07:40,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.82 | bwd_microstep: 1696.31 | bwd_inner_microstep: 1696.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 11:07:46,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.34 | optimizer_step: 6.56
[2024-06-10 11:07:46,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.53 | bwd_microstep: 5471.69 | bwd_inner_microstep: 1451.93 | bwd_allreduce_microstep: 4019.68 | step_microstep: 38.82
[2024-06-10 11:07:46,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15871.01 | bwd: 46472.43 | bwd_inner: 42451.54 | bwd_allreduce: 4020.05 | step: 40.86
{'loss': 1.2413, 'learning_rate': 3.0162084193435257e-05, 'epoch': 0.35}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5043
[2024-06-10 11:07:49,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 727.55 | bwd_microstep: 1959.36 | bwd_inner_microstep: 1959.14 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3470
[2024-06-10 11:07:51,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.12 | bwd_microstep: 1439.41 | bwd_inner_microstep: 1439.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3903
[2024-06-10 11:07:53,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.58 | bwd_microstep: 1588.13 | bwd_inner_microstep: 1588.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 11:07:55,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.21 | bwd_microstep: 1242.89 | bwd_inner_microstep: 1242.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 11:07:56,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.73 | bwd_microstep: 1276.63 | bwd_inner_microstep: 1276.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3419
[2024-06-10 11:07:58,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.63 | bwd_microstep: 1314.03 | bwd_inner_microstep: 1314.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3432
[2024-06-10 11:08:00,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.24 | bwd_microstep: 1188.50 | bwd_inner_microstep: 1188.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 11:08:02,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1416.23 | bwd_inner_microstep: 1416.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-10 11:08:04,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1309.72 | bwd_inner_microstep: 1309.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 11:08:06,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.27 | bwd_microstep: 1419.30 | bwd_inner_microstep: 1419.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2466
[2024-06-10 11:08:07,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.15 | bwd_microstep: 929.48 | bwd_inner_microstep: 929.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504
[2024-06-10 11:08:09,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.32 | bwd_microstep: 1447.41 | bwd_inner_microstep: 1447.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2427
[2024-06-10 11:08:10,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.00 | bwd_microstep: 1040.30 | bwd_inner_microstep: 1040.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 11:08:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1384.49 | bwd_inner_microstep: 1384.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3633
[2024-06-10 11:08:14,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.61 | bwd_microstep: 1436.44 | bwd_inner_microstep: 1436.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 11:08:16,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.02 | bwd_microstep: 1362.92 | bwd_inner_microstep: 1362.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660
[2024-06-10 11:08:18,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.39 | bwd_microstep: 1421.53 | bwd_inner_microstep: 1421.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 11:08:20,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.88 | bwd_microstep: 1605.37 | bwd_inner_microstep: 1605.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 11:08:22,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1395.87 | bwd_inner_microstep: 1395.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 11:08:24,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.84 | bwd_microstep: 1499.02 | bwd_inner_microstep: 1499.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3536
[2024-06-10 11:08:26,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.83 | bwd_microstep: 1202.31 | bwd_inner_microstep: 1202.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 11:08:28,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.01 | bwd_microstep: 1290.89 | bwd_inner_microstep: 1290.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2239
[2024-06-10 11:08:29,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.06 | bwd_microstep: 899.91 | bwd_inner_microstep: 899.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.30
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3817
[2024-06-10 11:08:31,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.54 | bwd_microstep: 1628.41 | bwd_inner_microstep: 1628.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3718
[2024-06-10 11:08:33,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.79 | bwd_microstep: 1339.38 | bwd_inner_microstep: 1339.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 11:08:35,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1258.48 | bwd_inner_microstep: 1258.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 11:08:37,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.48 | bwd_microstep: 1567.32 | bwd_inner_microstep: 1567.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 11:08:38,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.44 | bwd_microstep: 731.02 | bwd_inner_microstep: 730.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551
[2024-06-10 11:08:40,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.08 | bwd_microstep: 1204.83 | bwd_inner_microstep: 1204.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400
[2024-06-10 11:08:42,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.66 | bwd_microstep: 1439.46 | bwd_inner_microstep: 1439.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2905
[2024-06-10 11:08:43,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.20 | bwd_microstep: 1094.95 | bwd_inner_microstep: 1094.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 11:08:49,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 11:08:49,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.75 | bwd_microstep: 4647.53 | bwd_inner_microstep: 1861.01 | bwd_allreduce_microstep: 2786.47 | step_microstep: 38.44
[2024-06-10 11:08:49,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16124.38 | bwd: 45981.58 | bwd_inner: 43194.02 | bwd_allreduce: 2786.79 | step: 41.54

 35%|███▍      | 601/1726 [10:26:12<22:27:55, 71.89s/it]
 35%|███▍      | 602/1726 [10:27:12<21:21:46, 68.42s/it]
 35%|███▍      | 603/1726 [10:28:16<20:53:21, 66.96s/it]
 35%|███▍      | 604/1726 [10:29:20<20:38:25, 66.23s/it]
 35%|███▌      | 605/1726 [10:30:23<20:17:37, 65.17s/it]
 35%|███▌      | 606/1726 [10:31:25<20:01:27, 64.36s/it]
{'loss': 1.2677, 'learning_rate': 3.0129738480301398e-05, 'epoch': 0.35}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 11:08:51,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.21 | bwd_microstep: 1444.91 | bwd_inner_microstep: 1444.71 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3903
[2024-06-10 11:08:53,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.43 | bwd_microstep: 1482.53 | bwd_inner_microstep: 1482.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3882
[2024-06-10 11:08:55,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.34 | bwd_microstep: 1479.15 | bwd_inner_microstep: 1479.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 11:08:57,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.32 | bwd_microstep: 1651.98 | bwd_inner_microstep: 1651.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 11:08:59,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1383.53 | bwd_inner_microstep: 1383.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 11:09:01,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1283.43 | bwd_inner_microstep: 1283.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1370
[2024-06-10 11:09:01,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.37 | bwd_microstep: 519.92 | bwd_inner_microstep: 519.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-10 11:09:03,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.74 | bwd_microstep: 1317.44 | bwd_inner_microstep: 1317.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3744
[2024-06-10 11:09:05,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1500.12 | bwd_inner_microstep: 1500.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 11:09:07,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.23 | bwd_microstep: 1283.71 | bwd_inner_microstep: 1283.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 11:09:08,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.16 | bwd_microstep: 797.55 | bwd_inner_microstep: 797.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3684
[2024-06-10 11:09:10,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.14 | bwd_microstep: 1572.06 | bwd_inner_microstep: 1572.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3676
[2024-06-10 11:09:12,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1407.17 | bwd_inner_microstep: 1407.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3652
[2024-06-10 11:09:14,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.29 | bwd_microstep: 1580.24 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2672
[2024-06-10 11:09:16,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.31 | bwd_microstep: 1025.09 | bwd_inner_microstep: 1025.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 11:09:18,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1380.13 | bwd_inner_microstep: 1380.02 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 11:09:19,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.66 | bwd_microstep: 819.69 | bwd_inner_microstep: 819.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3832
[2024-06-10 11:09:21,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.34 | bwd_microstep: 1588.08 | bwd_inner_microstep: 1588.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 11:09:22,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.92 | bwd_microstep: 881.36 | bwd_inner_microstep: 881.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2304
[2024-06-10 11:09:24,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.80 | bwd_microstep: 977.68 | bwd_inner_microstep: 977.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3601
[2024-06-10 11:09:26,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 1370.67 | bwd_inner_microstep: 1370.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3462
[2024-06-10 11:09:28,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.81 | bwd_microstep: 1424.39 | bwd_inner_microstep: 1424.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 11:09:29,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1384.12 | bwd_inner_microstep: 1384.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-10 11:09:31,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.29 | bwd_microstep: 977.50 | bwd_inner_microstep: 977.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3613
[2024-06-10 11:09:33,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.84 | bwd_microstep: 1575.56 | bwd_inner_microstep: 1575.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 11:09:35,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.07 | bwd_microstep: 1596.88 | bwd_inner_microstep: 1596.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 11:09:37,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.45 | bwd_microstep: 1605.27 | bwd_inner_microstep: 1603.87 | bwd_allreduce_microstep: 0.12 | step_microstep: 0.14
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529
[2024-06-10 11:09:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.57 | bwd_microstep: 1424.84 | bwd_inner_microstep: 1424.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 11:09:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.03 | bwd_microstep: 1612.61 | bwd_inner_microstep: 1612.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 11:09:43,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.36 | bwd_microstep: 1281.32 | bwd_inner_microstep: 1281.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2225
[2024-06-10 11:09:45,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.00 | bwd_microstep: 896.27 | bwd_inner_microstep: 896.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 11:09:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.37 | optimizer_step: 6.63
[2024-06-10 11:09:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.78 | bwd_microstep: 5121.25 | bwd_inner_microstep: 1658.01 | bwd_allreduce_microstep: 3463.17 | step_microstep: 38.99
[2024-06-10 11:09:50,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15731.36 | bwd: 45646.49 | bwd_inner: 42181.95 | bwd_allreduce: 3463.65 | step: 41.30
{'loss': 1.3236, 'learning_rate': 3.0097357090281267e-05, 'epoch': 0.35}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 11:09:52,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1337.58 | bwd_inner_microstep: 1337.32 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 11:09:53,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.93 | bwd_microstep: 778.71 | bwd_inner_microstep: 778.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 11:09:55,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 11:09:57,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.63 | bwd_microstep: 1481.45 | bwd_inner_microstep: 1481.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 11:09:59,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.78 | bwd_microstep: 1642.14 | bwd_inner_microstep: 1642.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 11:10:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.60 | bwd_microstep: 1652.92 | bwd_inner_microstep: 1652.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1895
[2024-06-10 11:10:03,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.60 | bwd_microstep: 684.26 | bwd_inner_microstep: 684.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 11:10:05,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.94 | bwd_microstep: 1383.78 | bwd_inner_microstep: 1383.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 11:10:07,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.45 | bwd_microstep: 1538.43 | bwd_inner_microstep: 1538.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3609
[2024-06-10 11:10:09,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1312.76 | bwd_inner_microstep: 1312.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878
[2024-06-10 11:10:10,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.10 | bwd_microstep: 682.58 | bwd_inner_microstep: 682.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 11:10:11,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.47 | bwd_microstep: 1296.06 | bwd_inner_microstep: 1296.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3014
[2024-06-10 11:10:13,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.96 | bwd_microstep: 1263.76 | bwd_inner_microstep: 1263.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 11:10:15,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.28 | bwd_microstep: 1382.27 | bwd_inner_microstep: 1382.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 11:10:17,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1390.36 | bwd_inner_microstep: 1390.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424
[2024-06-10 11:10:19,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.34 | bwd_microstep: 1541.27 | bwd_inner_microstep: 1541.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 11:10:21,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.32 | bwd_microstep: 1355.94 | bwd_inner_microstep: 1355.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3628
[2024-06-10 11:10:23,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1347.92 | bwd_inner_microstep: 1347.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 11:10:25,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1295.09 | bwd_inner_microstep: 1295.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 11:10:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1283.39 | bwd_inner_microstep: 1283.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3634
[2024-06-10 11:10:28,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.22 | bwd_microstep: 1316.51 | bwd_inner_microstep: 1316.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3812
[2024-06-10 11:10:30,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.34 | bwd_microstep: 1293.97 | bwd_inner_microstep: 1293.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 11:10:32,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.08 | bwd_microstep: 1187.74 | bwd_inner_microstep: 1187.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 11:10:34,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.78 | bwd_microstep: 1533.44 | bwd_inner_microstep: 1533.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1982
[2024-06-10 11:10:35,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.71 | bwd_microstep: 769.50 | bwd_inner_microstep: 769.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3485
[2024-06-10 11:10:37,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.39 | bwd_microstep: 1439.35 | bwd_inner_microstep: 1439.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3716
[2024-06-10 11:10:39,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 1558.88 | bwd_inner_microstep: 1558.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-10 11:10:41,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.34 | bwd_microstep: 1605.93 | bwd_inner_microstep: 1605.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 11:10:43,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.58 | bwd_microstep: 1648.47 | bwd_inner_microstep: 1648.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 11:10:45,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.23 | bwd_microstep: 1450.72 | bwd_inner_microstep: 1450.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 11:10:47,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1510.20 | bwd_inner_microstep: 1510.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3743
[2024-06-10 11:10:52,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 11:10:52,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.73 | bwd_microstep: 3528.52 | bwd_inner_microstep: 1971.90 | bwd_allreduce_microstep: 1556.56 | step_microstep: 38.07
[2024-06-10 11:10:52,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16134.88 | bwd: 44877.23 | bwd_inner: 43319.55 | bwd_allreduce: 1556.89 | step: 39.76
{'loss': 1.236, 'learning_rate': 3.006494013742196e-05, 'epoch': 0.35}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 11:10:54,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.57 | bwd_microstep: 1329.52 | bwd_inner_microstep: 1329.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4079
[2024-06-10 11:10:56,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.82 | bwd_microstep: 1453.93 | bwd_inner_microstep: 1453.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462
[2024-06-10 11:10:57,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.25 | bwd_microstep: 1308.66 | bwd_inner_microstep: 1308.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 11:10:59,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1378.39 | bwd_inner_microstep: 1378.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 11:11:02,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.12 | bwd_microstep: 1648.96 | bwd_inner_microstep: 1648.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 11:11:03,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1249.76 | bwd_inner_microstep: 1249.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3396
[2024-06-10 11:11:05,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.31 | bwd_microstep: 1211.61 | bwd_inner_microstep: 1211.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416
[2024-06-10 11:11:07,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.24 | bwd_microstep: 1314.32 | bwd_inner_microstep: 1314.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064
[2024-06-10 11:11:08,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.84 | bwd_microstep: 819.05 | bwd_inner_microstep: 819.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 11:11:10,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1385.31 | bwd_inner_microstep: 1385.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488
[2024-06-10 11:11:12,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.18 | bwd_microstep: 1346.18 | bwd_inner_microstep: 1346.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 11:11:14,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.78 | bwd_microstep: 1351.71 | bwd_inner_microstep: 1351.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 11:11:16,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.49 | bwd_microstep: 1475.28 | bwd_inner_microstep: 1475.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 11:11:17,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.77 | bwd_microstep: 677.95 | bwd_inner_microstep: 677.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1952
[2024-06-10 11:11:18,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.27 | bwd_microstep: 920.60 | bwd_inner_microstep: 920.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2497
[2024-06-10 11:11:19,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.63 | bwd_microstep: 986.63 | bwd_inner_microstep: 986.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991
[2024-06-10 11:11:20,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.94 | bwd_microstep: 804.39 | bwd_inner_microstep: 804.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 11:11:22,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.90 | bwd_microstep: 1460.02 | bwd_inner_microstep: 1459.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 11:11:25,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.73 | bwd_microstep: 1657.13 | bwd_inner_microstep: 1657.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1974
[2024-06-10 11:11:26,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.15 | bwd_microstep: 827.36 | bwd_inner_microstep: 827.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 11:11:28,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.66 | bwd_microstep: 1608.71 | bwd_inner_microstep: 1608.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 11:11:30,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.12 | bwd_microstep: 1533.51 | bwd_inner_microstep: 1533.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-10 11:11:31,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.90 | bwd_microstep: 680.46 | bwd_inner_microstep: 680.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 11:11:33,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1283.15 | bwd_inner_microstep: 1283.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574
[2024-06-10 11:11:35,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.04 | bwd_microstep: 1492.31 | bwd_inner_microstep: 1492.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278
[2024-06-10 11:11:36,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.56 | bwd_microstep: 908.39 | bwd_inner_microstep: 908.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2397
[2024-06-10 11:11:37,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.70 | bwd_microstep: 889.22 | bwd_inner_microstep: 889.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 11:11:39,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.57 | bwd_microstep: 1283.96 | bwd_inner_microstep: 1283.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 11:11:41,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1396.96 | bwd_inner_microstep: 1396.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816
[2024-06-10 11:11:43,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.82 | bwd_microstep: 1619.07 | bwd_inner_microstep: 1619.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-10 11:11:45,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.19 | bwd_microstep: 1440.40 | bwd_inner_microstep: 1440.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3749
[2024-06-10 11:11:52,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.58
[2024-06-10 11:11:52,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.90 | bwd_microstep: 5977.40 | bwd_inner_microstep: 1851.93 | bwd_allreduce_microstep: 4125.39 | step_microstep: 38.77
[2024-06-10 11:11:52,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15166.45 | bwd: 44720.32 | bwd_inner: 40594.00 | bwd_allreduce: 4125.64 | step: 40.58
{'loss': 1.314, 'learning_rate': 3.0032487735895803e-05, 'epoch': 0.35}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 11:11:54,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.24 | bwd_microstep: 1334.01 | bwd_inner_microstep: 1333.90 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 11:11:56,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1284.86 | bwd_inner_microstep: 1284.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3904
[2024-06-10 11:11:58,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.92 | bwd_microstep: 1687.69 | bwd_inner_microstep: 1687.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 11:12:00,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.53 | bwd_microstep: 1652.33 | bwd_inner_microstep: 1652.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3790
[2024-06-10 11:12:02,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.86 | bwd_microstep: 1576.04 | bwd_inner_microstep: 1576.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474
[2024-06-10 11:12:04,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.12 | bwd_microstep: 1249.02 | bwd_inner_microstep: 1248.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 11:12:06,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 1252.77 | bwd_inner_microstep: 1252.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 11:12:08,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1397.69 | bwd_inner_microstep: 1397.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2085
[2024-06-10 11:12:09,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.18 | bwd_microstep: 730.75 | bwd_inner_microstep: 730.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3493
[2024-06-10 11:12:11,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.62 | bwd_microstep: 1533.71 | bwd_inner_microstep: 1533.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3509
[2024-06-10 11:12:13,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.83 | bwd_microstep: 1559.94 | bwd_inner_microstep: 1559.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 11:12:15,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.28 | bwd_microstep: 1254.91 | bwd_inner_microstep: 1254.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 11:12:17,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1354.77 | bwd_inner_microstep: 1354.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2914
[2024-06-10 11:12:18,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.50 | bwd_microstep: 1094.41 | bwd_inner_microstep: 1094.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 11:12:19,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.59 | bwd_microstep: 795.47 | bwd_inner_microstep: 795.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 11:12:21,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1287.06 | bwd_inner_microstep: 1287.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 11:12:23,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.14 | bwd_microstep: 1286.13 | bwd_inner_microstep: 1286.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 11:12:25,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.55 | bwd_microstep: 1422.92 | bwd_inner_microstep: 1422.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 11:12:27,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.58 | bwd_microstep: 1287.01 | bwd_inner_microstep: 1286.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 11:12:28,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.61 | bwd_microstep: 1190.46 | bwd_inner_microstep: 1190.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 11:12:29,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.71 | bwd_microstep: 799.46 | bwd_inner_microstep: 799.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3835
[2024-06-10 11:12:31,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.33 | bwd_microstep: 1359.50 | bwd_inner_microstep: 1359.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2075
[2024-06-10 11:12:33,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.26 | bwd_microstep: 919.01 | bwd_inner_microstep: 918.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 11:12:34,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.78 | bwd_microstep: 1389.60 | bwd_inner_microstep: 1389.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2163
[2024-06-10 11:12:36,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.19 | bwd_microstep: 857.34 | bwd_inner_microstep: 857.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 11:12:37,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.95 | bwd_microstep: 878.26 | bwd_inner_microstep: 878.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 11:12:39,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.38 | bwd_microstep: 1666.20 | bwd_inner_microstep: 1666.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 11:12:41,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.43 | bwd_microstep: 1653.10 | bwd_inner_microstep: 1653.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3597
[2024-06-10 11:12:44,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.43 | bwd_microstep: 1570.50 | bwd_inner_microstep: 1570.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3778
[2024-06-10 11:12:46,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.69 | bwd_microstep: 1821.01 | bwd_inner_microstep: 1820.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580
[2024-06-10 11:12:48,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.04 | bwd_microstep: 1593.21 | bwd_inner_microstep: 1593.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3776
[2024-06-10 11:12:52,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.28 | optimizer_step: 6.57
[2024-06-10 11:12:52,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.65 | bwd_microstep: 3063.11 | bwd_inner_microstep: 1905.75 | bwd_allreduce_microstep: 1157.30 | step_microstep: 38.12
[2024-06-10 11:12:52,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15873.40 | bwd: 43802.25 | bwd_inner: 42643.95 | bwd_allreduce: 1157.59 | step: 40.23
{'loss': 1.2643, 'learning_rate': 3.0000000000000004e-05, 'epoch': 0.35}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 11:12:54,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.54 | bwd_microstep: 1384.00 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4210
[2024-06-10 11:12:56,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.34 | bwd_microstep: 1656.69 | bwd_inner_microstep: 1656.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 11:12:58,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.35 | bwd_microstep: 1349.50 | bwd_inner_microstep: 1349.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 11:13:00,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.15 | bwd_microstep: 1378.33 | bwd_inner_microstep: 1378.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 11:13:02,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.69 | bwd_microstep: 1384.37 | bwd_inner_microstep: 1384.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 11:13:04,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1245.67 | bwd_inner_microstep: 1245.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 11:13:06,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.27 | bwd_microstep: 1388.27 | bwd_inner_microstep: 1388.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 11:13:07,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1252.27 | bwd_inner_microstep: 1252.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 11:13:09,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1248.79 | bwd_inner_microstep: 1248.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 11:13:11,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1387.32 | bwd_inner_microstep: 1387.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 11:13:13,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.70 | bwd_microstep: 1393.33 | bwd_inner_microstep: 1393.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3483
[2024-06-10 11:13:15,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.46 | bwd_microstep: 1414.29 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-10 11:13:17,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.50 | bwd_microstep: 1527.69 | bwd_inner_microstep: 1527.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 11:13:19,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1386.24 | bwd_inner_microstep: 1386.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 11:13:21,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.02 | bwd_microstep: 1485.31 | bwd_inner_microstep: 1485.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 11:13:23,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1391.50 | bwd_inner_microstep: 1391.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 11:13:25,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.88 | bwd_microstep: 1620.43 | bwd_inner_microstep: 1620.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 11:13:27,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.69 | bwd_microstep: 1253.40 | bwd_inner_microstep: 1253.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412
[2024-06-10 11:13:29,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.05 | bwd_microstep: 1291.54 | bwd_inner_microstep: 1291.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 11:13:31,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.72 | bwd_microstep: 1495.79 | bwd_inner_microstep: 1495.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3630
[2024-06-10 11:13:33,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1540.39 | bwd_inner_microstep: 1540.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 11:13:35,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1417.48 | bwd_inner_microstep: 1417.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3624
[2024-06-10 11:13:37,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.72 | bwd_microstep: 1442.76 | bwd_inner_microstep: 1442.57 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 11:13:39,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.02 | bwd_microstep: 1645.70 | bwd_inner_microstep: 1645.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 11:13:41,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1557.59 | bwd_inner_microstep: 1557.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025
[2024-06-10 11:13:42,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.73 | bwd_microstep: 804.97 | bwd_inner_microstep: 804.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 11:13:44,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.66 | bwd_microstep: 1549.80 | bwd_inner_microstep: 1549.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648
[2024-06-10 11:13:46,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.53 | bwd_microstep: 1421.25 | bwd_inner_microstep: 1421.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-10 11:13:47,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.21 | bwd_microstep: 691.69 | bwd_inner_microstep: 691.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3437
[2024-06-10 11:13:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1372.98 | bwd_inner_microstep: 1372.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3773
[2024-06-10 11:13:51,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.32 | bwd_microstep: 1676.20 | bwd_inner_microstep: 1676.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 11:13:55,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.36 | optimizer_step: 6.62
[2024-06-10 11:13:55,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 2842.53 | bwd_inner_microstep: 1814.43 | bwd_allreduce_microstep: 1028.03 | step_microstep: 39.48
[2024-06-10 11:13:55,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16682.80 | bwd: 45898.12 | bwd_inner: 44868.89 | bwd_allreduce: 1028.42 | step: 41.70
{'loss': 1.2655, 'learning_rate': 2.9967477044156184e-05, 'epoch': 0.35}


 35%|███▌      | 606/1726 [10:31:25<20:01:27, 64.36s/it]
 35%|███▌      | 607/1726 [10:32:27<19:45:44, 63.58s/it]
 35%|███▌      | 608/1726 [10:33:28<19:32:17, 62.91s/it]
 35%|███▌      | 609/1726 [10:34:29<19:16:19, 62.11s/it]
 35%|███▌      | 610/1726 [10:35:29<19:03:45, 61.49s/it]
 35%|███▌      | 611/1726 [10:36:32<19:10:59, 61.94s/it]
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1888
[2024-06-10 11:13:56,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.54 | bwd_microstep: 804.61 | bwd_inner_microstep: 804.45 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 11:13:58,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.66 | bwd_microstep: 1277.79 | bwd_inner_microstep: 1277.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 11:14:00,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1287.18 | bwd_inner_microstep: 1287.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 856
[2024-06-10 11:14:00,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 135.14 | bwd_microstep: 345.86 | bwd_inner_microstep: 345.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 11:14:02,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.85 | bwd_microstep: 1386.92 | bwd_inner_microstep: 1386.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 11:14:04,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.49 | bwd_microstep: 1245.30 | bwd_inner_microstep: 1245.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 11:14:05,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.51 | bwd_microstep: 1186.54 | bwd_inner_microstep: 1186.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3577
[2024-06-10 11:14:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.62 | bwd_microstep: 1208.16 | bwd_inner_microstep: 1208.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2617
[2024-06-10 11:14:09,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 399.75 | bwd_microstep: 1065.42 | bwd_inner_microstep: 1065.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881
[2024-06-10 11:14:10,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.52 | bwd_microstep: 710.32 | bwd_inner_microstep: 710.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 11:14:11,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.64 | bwd_microstep: 1293.44 | bwd_inner_microstep: 1293.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3690
[2024-06-10 11:14:13,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.96 | bwd_microstep: 1488.81 | bwd_inner_microstep: 1488.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3656
[2024-06-10 11:14:15,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.58 | bwd_microstep: 1351.56 | bwd_inner_microstep: 1351.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 11:14:17,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.17 | bwd_microstep: 1420.38 | bwd_inner_microstep: 1420.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 11:14:19,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.64 | bwd_microstep: 1416.02 | bwd_inner_microstep: 1415.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3550
[2024-06-10 11:14:21,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.21 | bwd_microstep: 1563.81 | bwd_inner_microstep: 1563.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3491
[2024-06-10 11:14:24,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.27 | bwd_microstep: 1644.38 | bwd_inner_microstep: 1644.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3648
[2024-06-10 11:14:26,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.47 | bwd_microstep: 1814.85 | bwd_inner_microstep: 1814.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 11:14:28,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.56 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 11:14:30,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.14 | bwd_microstep: 1296.54 | bwd_inner_microstep: 1296.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 11:14:31,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.21 | bwd_microstep: 800.39 | bwd_inner_microstep: 800.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2304
[2024-06-10 11:14:32,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.72 | bwd_microstep: 983.12 | bwd_inner_microstep: 983.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3586
[2024-06-10 11:14:34,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.58 | bwd_microstep: 1273.49 | bwd_inner_microstep: 1273.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 11:14:36,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.25 | bwd_microstep: 1499.49 | bwd_inner_microstep: 1499.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 11:14:38,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1513.72 | bwd_inner_microstep: 1513.55 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.31
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3562
[2024-06-10 11:14:40,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1459.96 | bwd_inner_microstep: 1459.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381
[2024-06-10 11:14:42,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1240.80 | bwd_inner_microstep: 1240.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-10 11:14:44,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.59 | bwd_microstep: 1362.07 | bwd_inner_microstep: 1362.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3593
[2024-06-10 11:14:46,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.78 | bwd_microstep: 1768.93 | bwd_inner_microstep: 1768.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585
[2024-06-10 11:14:48,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.70 | bwd_microstep: 1305.39 | bwd_inner_microstep: 1305.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 11:14:50,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.70 | bwd_microstep: 1406.09 | bwd_inner_microstep: 1406.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2062
[2024-06-10 11:14:55,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.32 | optimizer_step: 6.61
[2024-06-10 11:14:55,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.26 | bwd_microstep: 5076.61 | bwd_inner_microstep: 1007.52 | bwd_allreduce_microstep: 4069.03 | step_microstep: 38.22
[2024-06-10 11:14:55,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15272.58 | bwd: 44878.48 | bwd_inner: 40808.26 | bwd_allreduce: 4069.43 | step: 40.07
{'loss': 1.2544, 'learning_rate': 2.9934918982910032e-05, 'epoch': 0.35}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5176
[2024-06-10 11:14:58,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 738.17 | bwd_microstep: 1987.58 | bwd_inner_microstep: 1987.39 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 11:15:00,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1393.52 | bwd_inner_microstep: 1393.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3941
[2024-06-10 11:15:02,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.56 | bwd_microstep: 1396.42 | bwd_inner_microstep: 1396.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3831
[2024-06-10 11:15:04,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.51 | bwd_microstep: 1486.06 | bwd_inner_microstep: 1486.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 11:15:06,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1355.42 | bwd_inner_microstep: 1355.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3565
[2024-06-10 11:15:08,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.70 | bwd_microstep: 1207.97 | bwd_inner_microstep: 1207.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938
[2024-06-10 11:15:09,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.62 | bwd_microstep: 697.62 | bwd_inner_microstep: 697.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 11:15:11,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.26 | bwd_microstep: 1396.03 | bwd_inner_microstep: 1396.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 11:15:13,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.70 | bwd_microstep: 1420.18 | bwd_inner_microstep: 1420.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 11:15:14,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 704.86 | bwd_inner_microstep: 704.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3437
[2024-06-10 11:15:15,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.72 | bwd_microstep: 1216.63 | bwd_inner_microstep: 1216.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3888
[2024-06-10 11:15:17,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.58 | bwd_microstep: 1610.79 | bwd_inner_microstep: 1610.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3760
[2024-06-10 11:15:20,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.56 | bwd_microstep: 1603.47 | bwd_inner_microstep: 1603.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3673
[2024-06-10 11:15:22,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.53 | bwd_microstep: 1721.24 | bwd_inner_microstep: 1721.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3649
[2024-06-10 11:15:24,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.79 | bwd_microstep: 1652.19 | bwd_inner_microstep: 1652.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-10 11:15:26,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.22 | bwd_microstep: 1294.20 | bwd_inner_microstep: 1294.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3611
[2024-06-10 11:15:28,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.04 | bwd_microstep: 1677.10 | bwd_inner_microstep: 1677.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638
[2024-06-10 11:15:30,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1414.25 | bwd_inner_microstep: 1414.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551
[2024-06-10 11:15:32,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.35 | bwd_microstep: 1298.00 | bwd_inner_microstep: 1297.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2100
[2024-06-10 11:15:33,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.94 | bwd_microstep: 825.69 | bwd_inner_microstep: 825.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 11:15:35,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.19 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 11:15:36,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.67 | bwd_microstep: 880.58 | bwd_inner_microstep: 880.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 11:15:37,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.15 | bwd_microstep: 697.94 | bwd_inner_microstep: 697.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2664
[2024-06-10 11:15:39,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.14 | bwd_microstep: 1118.52 | bwd_inner_microstep: 1118.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3721
[2024-06-10 11:15:41,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.79 | bwd_microstep: 1844.60 | bwd_inner_microstep: 1844.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 11:15:43,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1553.51 | bwd_inner_microstep: 1553.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466
[2024-06-10 11:15:45,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.14 | bwd_microstep: 1404.64 | bwd_inner_microstep: 1404.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2207
[2024-06-10 11:15:47,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.25 | bwd_microstep: 960.59 | bwd_inner_microstep: 960.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 11:15:49,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.48 | bwd_microstep: 1438.76 | bwd_inner_microstep: 1438.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 11:15:51,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.40 | bwd_microstep: 1345.08 | bwd_inner_microstep: 1345.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588
[2024-06-10 11:15:53,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.08 | bwd_microstep: 1702.39 | bwd_inner_microstep: 1702.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3799
[2024-06-10 11:15:55,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.24 | optimizer_step: 6.63
[2024-06-10 11:15:55,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.42 | bwd_microstep: 1755.89 | bwd_inner_microstep: 1747.87 | bwd_allreduce_microstep: 7.97 | step_microstep: 38.54
[2024-06-10 11:15:55,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16193.75 | bwd: 43346.25 | bwd_inner: 43336.95 | bwd_allreduce: 8.49 | step: 40.27
{'loss': 1.3129, 'learning_rate': 2.990232593093087e-05, 'epoch': 0.36}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 11:15:57,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.87 | bwd_microstep: 1476.25 | bwd_inner_microstep: 1476.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4018
[2024-06-10 11:16:00,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.11 | bwd_microstep: 1613.18 | bwd_inner_microstep: 1613.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472
[2024-06-10 11:16:01,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.30 | bwd_microstep: 1344.85 | bwd_inner_microstep: 1344.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 11:16:03,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.22 | bwd_microstep: 1285.79 | bwd_inner_microstep: 1285.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2321
[2024-06-10 11:16:05,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.59 | bwd_microstep: 884.45 | bwd_inner_microstep: 884.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1893
[2024-06-10 11:16:05,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.21 | bwd_microstep: 686.70 | bwd_inner_microstep: 686.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2376
[2024-06-10 11:16:07,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.06 | bwd_microstep: 932.73 | bwd_inner_microstep: 932.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 11:16:09,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.62 | bwd_microstep: 1387.07 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 11:16:10,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.53 | bwd_microstep: 698.35 | bwd_inner_microstep: 698.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3496
[2024-06-10 11:16:12,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.82 | bwd_microstep: 1447.92 | bwd_inner_microstep: 1447.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2200
[2024-06-10 11:16:13,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.42 | bwd_microstep: 1054.61 | bwd_inner_microstep: 1054.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503
[2024-06-10 11:16:15,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.67 | bwd_microstep: 1583.52 | bwd_inner_microstep: 1583.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 11:16:17,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.03 | bwd_microstep: 1502.21 | bwd_inner_microstep: 1502.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3570
[2024-06-10 11:16:19,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.54 | bwd_microstep: 1332.38 | bwd_inner_microstep: 1332.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3654
[2024-06-10 11:16:21,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.38 | bwd_microstep: 1588.56 | bwd_inner_microstep: 1588.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 11:16:23,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1255.34 | bwd_inner_microstep: 1255.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 11:16:25,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.58 | bwd_microstep: 1311.34 | bwd_inner_microstep: 1311.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 11:16:27,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.68 | bwd_microstep: 1392.89 | bwd_inner_microstep: 1392.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432
[2024-06-10 11:16:28,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.64 | bwd_microstep: 1152.39 | bwd_inner_microstep: 1152.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-10 11:16:30,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.04 | bwd_microstep: 1460.31 | bwd_inner_microstep: 1460.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2038
[2024-06-10 11:16:32,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.30 | bwd_microstep: 808.57 | bwd_inner_microstep: 808.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 11:16:33,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.85 | bwd_microstep: 1283.58 | bwd_inner_microstep: 1283.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 11:16:35,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1520.56 | bwd_inner_microstep: 1520.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 11:16:38,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.97 | bwd_microstep: 1655.42 | bwd_inner_microstep: 1655.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 868
[2024-06-10 11:16:38,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 153.08 | bwd_microstep: 397.85 | bwd_inner_microstep: 397.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436
[2024-06-10 11:16:40,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.61 | bwd_microstep: 1153.65 | bwd_inner_microstep: 1153.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 11:16:42,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.13 | bwd_microstep: 1313.13 | bwd_inner_microstep: 1313.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3787
[2024-06-10 11:16:44,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 1553.88 | bwd_inner_microstep: 1553.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 11:16:45,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.00 | bwd_microstep: 791.80 | bwd_inner_microstep: 791.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2643
[2024-06-10 11:16:47,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.28 | bwd_microstep: 1148.81 | bwd_inner_microstep: 1148.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3586
[2024-06-10 11:16:49,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.16 | bwd_microstep: 1704.40 | bwd_inner_microstep: 1704.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3698
[2024-06-10 11:16:54,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.36 | optimizer_step: 6.60
[2024-06-10 11:16:54,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.81 | bwd_microstep: 4647.55 | bwd_inner_microstep: 1498.35 | bwd_allreduce_microstep: 3149.14 | step_microstep: 39.44
[2024-06-10 11:16:54,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15032.51 | bwd: 43370.06 | bwd_inner: 40220.01 | bwd_allreduce: 3149.37 | step: 41.00
{'loss': 1.3037, 'learning_rate': 2.9869698003011254e-05, 'epoch': 0.36}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 11:16:56,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1337.71 | bwd_inner_microstep: 1337.54 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429
[2024-06-10 11:16:58,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.28 | bwd_microstep: 1154.24 | bwd_inner_microstep: 1154.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3031
[2024-06-10 11:16:59,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.70 | bwd_microstep: 1100.37 | bwd_inner_microstep: 1100.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 11:17:01,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.24 | bwd_microstep: 1502.22 | bwd_inner_microstep: 1502.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 11:17:03,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.50 | bwd_microstep: 1402.34 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 11:17:05,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.32 | bwd_microstep: 1541.03 | bwd_inner_microstep: 1541.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 11:17:07,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.82 | bwd_microstep: 1345.48 | bwd_inner_microstep: 1345.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3702
[2024-06-10 11:17:09,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1367.75 | bwd_inner_microstep: 1367.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3405
[2024-06-10 11:17:11,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.79 | bwd_microstep: 1374.99 | bwd_inner_microstep: 1374.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 11:17:12,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.89 | bwd_microstep: 805.55 | bwd_inner_microstep: 805.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3702
[2024-06-10 11:17:14,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.72 | bwd_microstep: 1694.82 | bwd_inner_microstep: 1694.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667
[2024-06-10 11:17:16,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.28 | bwd_microstep: 1323.52 | bwd_inner_microstep: 1323.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 11:17:18,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1593.97 | bwd_inner_microstep: 1593.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1887
[2024-06-10 11:17:19,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.59 | bwd_microstep: 783.69 | bwd_inner_microstep: 783.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 11:17:21,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.55 | bwd_microstep: 1291.57 | bwd_inner_microstep: 1291.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3838
[2024-06-10 11:17:23,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.99 | bwd_microstep: 1571.02 | bwd_inner_microstep: 1570.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 11:17:25,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1296.54 | bwd_inner_microstep: 1296.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 11:17:27,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.67 | bwd_microstep: 1298.03 | bwd_inner_microstep: 1298.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 11:17:29,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.78 | bwd_microstep: 1309.83 | bwd_inner_microstep: 1309.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454
[2024-06-10 11:17:30,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.91 | bwd_microstep: 1161.53 | bwd_inner_microstep: 1161.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-10 11:17:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.80 | bwd_microstep: 1161.35 | bwd_inner_microstep: 1161.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3701
[2024-06-10 11:17:34,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1435.48 | bwd_inner_microstep: 1435.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 11:17:36,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1490.06 | bwd_inner_microstep: 1490.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1982
[2024-06-10 11:17:37,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.79 | bwd_microstep: 707.20 | bwd_inner_microstep: 707.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 11:17:39,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.55 | bwd_microstep: 1382.65 | bwd_inner_microstep: 1382.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 11:17:41,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.06 | bwd_microstep: 1398.71 | bwd_inner_microstep: 1398.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3562
[2024-06-10 11:17:43,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.96 | bwd_microstep: 1459.25 | bwd_inner_microstep: 1459.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2930
[2024-06-10 11:17:45,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.65 | bwd_microstep: 1196.15 | bwd_inner_microstep: 1196.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3775
[2024-06-10 11:17:47,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.85 | bwd_microstep: 1573.13 | bwd_inner_microstep: 1573.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937
[2024-06-10 11:17:48,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.89 | bwd_microstep: 698.82 | bwd_inner_microstep: 698.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-10 11:17:50,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.10 | bwd_microstep: 1507.68 | bwd_inner_microstep: 1507.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 11:17:54,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 11:17:54,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.83 | bwd_microstep: 3554.53 | bwd_inner_microstep: 1529.54 | bwd_allreduce_microstep: 2024.94 | step_microstep: 38.26
[2024-06-10 11:17:54,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15656.88 | bwd: 43821.25 | bwd_inner: 41795.27 | bwd_allreduce: 2025.23 | step: 40.28
{'loss': 1.2986, 'learning_rate': 2.983703531406658e-05, 'epoch': 0.36}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1936
[2024-06-10 11:17:55,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.97 | bwd_microstep: 876.06 | bwd_inner_microstep: 875.90 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4015
[2024-06-10 11:17:58,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.62 | bwd_microstep: 1711.44 | bwd_inner_microstep: 1711.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 11:18:00,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.87 | bwd_microstep: 1483.43 | bwd_inner_microstep: 1483.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1315
[2024-06-10 11:18:00,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 187.97 | bwd_microstep: 482.52 | bwd_inner_microstep: 482.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795
[2024-06-10 11:18:02,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.32 | bwd_microstep: 1454.14 | bwd_inner_microstep: 1454.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 11:18:03,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.85 | bwd_microstep: 801.37 | bwd_inner_microstep: 801.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 11:18:05,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1385.35 | bwd_inner_microstep: 1385.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 11:18:07,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.56 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 11:18:09,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.31 | bwd_microstep: 1388.97 | bwd_inner_microstep: 1388.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906
[2024-06-10 11:18:10,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.21 | bwd_microstep: 778.78 | bwd_inner_microstep: 778.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3458
[2024-06-10 11:18:12,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1520.33 | bwd_inner_microstep: 1520.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001
[2024-06-10 11:18:13,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.04 | bwd_microstep: 895.28 | bwd_inner_microstep: 895.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 11:18:16,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.80 | bwd_microstep: 1649.91 | bwd_inner_microstep: 1649.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 11:18:18,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.50 | bwd_microstep: 1496.52 | bwd_inner_microstep: 1496.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 951
[2024-06-10 11:18:18,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 169.03 | bwd_microstep: 442.66 | bwd_inner_microstep: 442.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 11:18:19,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.20 | bwd_microstep: 794.42 | bwd_inner_microstep: 794.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 11:18:21,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.35 | bwd_microstep: 1285.68 | bwd_inner_microstep: 1285.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3839
[2024-06-10 11:18:23,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1390.98 | bwd_inner_microstep: 1390.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 11:18:25,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.99 | bwd_microstep: 1285.17 | bwd_inner_microstep: 1285.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 11:18:27,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.79 | bwd_microstep: 1298.66 | bwd_inner_microstep: 1298.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 11:18:29,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.95 | bwd_microstep: 1520.15 | bwd_inner_microstep: 1520.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 11:18:30,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.77 | bwd_microstep: 1161.23 | bwd_inner_microstep: 1161.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 11:18:32,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.32 | bwd_microstep: 1440.31 | bwd_inner_microstep: 1440.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850
[2024-06-10 11:18:34,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.01 | bwd_microstep: 1464.41 | bwd_inner_microstep: 1464.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 11:18:36,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1383.75 | bwd_inner_microstep: 1383.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604
[2024-06-10 11:18:39,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.95 | bwd_microstep: 1611.69 | bwd_inner_microstep: 1611.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3603
[2024-06-10 11:18:41,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.98 | bwd_microstep: 1704.82 | bwd_inner_microstep: 1704.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 11:18:43,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.10 | bwd_microstep: 1451.09 | bwd_inner_microstep: 1451.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487
[2024-06-10 11:18:45,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.63 | bwd_microstep: 1413.51 | bwd_inner_microstep: 1413.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3834
[2024-06-10 11:18:47,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.82 | bwd_microstep: 1729.06 | bwd_inner_microstep: 1729.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 11:18:50,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.77 | bwd_microstep: 1650.90 | bwd_inner_microstep: 1650.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3068
[2024-06-10 11:18:55,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.37 | optimizer_step: 6.63
[2024-06-10 11:18:55,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.29 | bwd_microstep: 5108.69 | bwd_inner_microstep: 1507.36 | bwd_allreduce_microstep: 3601.26 | step_microstep: 38.90
[2024-06-10 11:18:55,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15555.49 | bwd: 45347.83 | bwd_inner: 41745.52 | bwd_allreduce: 3601.56 | step: 40.75
{'loss': 1.2319, 'learning_rate': 2.980433797913467e-05, 'epoch': 0.36}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1934
[2024-06-10 11:18:56,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.84 | bwd_microstep: 813.15 | bwd_inner_microstep: 813.01 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 11:18:58,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.58 | bwd_microstep: 1475.23 | bwd_inner_microstep: 1475.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 11:19:01,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.80 | bwd_microstep: 1539.53 | bwd_inner_microstep: 1539.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 11:19:02,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.85 | bwd_microstep: 1278.19 | bwd_inner_microstep: 1278.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 11:19:04,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.73 | bwd_microstep: 1347.92 | bwd_inner_microstep: 1347.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 11:19:06,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1285.90 | bwd_inner_microstep: 1285.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 11:19:08,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.86 | bwd_microstep: 1254.52 | bwd_inner_microstep: 1254.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 11:19:10,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1413.75 | bwd_inner_microstep: 1413.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988
[2024-06-10 11:19:11,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.39 | bwd_microstep: 832.83 | bwd_inner_microstep: 832.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3410
[2024-06-10 11:19:13,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1310.12 | bwd_inner_microstep: 1310.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3671
[2024-06-10 11:19:15,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.42 | bwd_microstep: 1562.77 | bwd_inner_microstep: 1562.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 11:19:17,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.93 | bwd_microstep: 1385.42 | bwd_inner_microstep: 1385.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 11:19:18,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1341.10 | bwd_inner_microstep: 1341.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1874
[2024-06-10 11:19:20,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.87 | bwd_microstep: 772.36 | bwd_inner_microstep: 772.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 11:19:22,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.05 | bwd_microstep: 1515.96 | bwd_inner_microstep: 1515.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3828
[2024-06-10 11:19:24,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.11 | bwd_microstep: 1619.29 | bwd_inner_microstep: 1619.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 11:19:26,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1398.11 | bwd_inner_microstep: 1398.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628
[2024-06-10 11:19:28,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1441.74 | bwd_inner_microstep: 1441.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 11:19:30,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.34 | bwd_microstep: 1380.66 | bwd_inner_microstep: 1380.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 11:19:32,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.96 | bwd_microstep: 1499.73 | bwd_inner_microstep: 1499.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3681
[2024-06-10 11:19:34,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1261.91 | bwd_inner_microstep: 1261.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3831
[2024-06-10 11:19:36,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.82 | bwd_microstep: 1511.84 | bwd_inner_microstep: 1511.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3689
[2024-06-10 11:19:37,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1330.91 | bwd_inner_microstep: 1330.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701
[2024-06-10 11:19:40,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.97 | bwd_microstep: 1633.60 | bwd_inner_microstep: 1633.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2041
[2024-06-10 11:19:41,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.05 | bwd_microstep: 907.48 | bwd_inner_microstep: 907.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 11:19:43,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.49 | bwd_microstep: 1550.30 | bwd_inner_microstep: 1550.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2627
[2024-06-10 11:19:45,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.82 | bwd_microstep: 1019.47 | bwd_inner_microstep: 1019.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 11:19:46,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1288.97 | bwd_inner_microstep: 1288.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 11:19:48,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1257.12 | bwd_inner_microstep: 1257.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068
[2024-06-10 11:19:49,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.68 | bwd_microstep: 820.17 | bwd_inner_microstep: 820.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2944
[2024-06-10 11:19:51,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.14 | bwd_microstep: 1097.14 | bwd_inner_microstep: 1097.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 11:19:58,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.37 | optimizer_step: 6.60
[2024-06-10 11:19:58,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 6373.34 | bwd_inner_microstep: 1838.27 | bwd_allreduce_microstep: 4535.01 | step_microstep: 38.90
[2024-06-10 11:19:58,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15583.97 | bwd: 46520.56 | bwd_inner: 41984.52 | bwd_allreduce: 4535.30 | step: 40.54
19:10:59, 61.94s/it]
 35%|███▌      | 612/1726 [10:37:32<19:01:58, 61.51s/it]
 36%|███▌      | 613/1726 [10:38:32<18:52:02, 61.03s/it]
 36%|███▌      | 614/1726 [10:39:31<18:38:20, 60.34s/it]
 36%|███▌      | 615/1726 [10:40:31<18:34:30, 60.19s/it]
 36%|███▌      | 616/1726 [10:41:32<18:39:30, 60.51s/it]
 36%|███▌      | 617/1726 [10:42:34<18:49:17, 61.10s/it
{'loss': 1.2778, 'learning_rate': 2.9771606113375368e-05, 'epoch': 0.36}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 11:19:59,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.54 | bwd_microstep: 670.44 | bwd_inner_microstep: 670.30 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 11:20:00,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.98 | bwd_microstep: 1242.38 | bwd_inner_microstep: 1242.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015
[2024-06-10 11:20:01,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.40 | bwd_microstep: 807.21 | bwd_inner_microstep: 807.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 11:20:03,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.46 | bwd_microstep: 1277.47 | bwd_inner_microstep: 1277.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 11:20:05,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.79 | bwd_microstep: 1340.03 | bwd_inner_microstep: 1340.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 11:20:07,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-10 11:20:09,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.01 | bwd_microstep: 1530.61 | bwd_inner_microstep: 1530.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 11:20:11,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1244.96 | bwd_inner_microstep: 1244.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1888
[2024-06-10 11:20:12,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.91 | bwd_microstep: 712.12 | bwd_inner_microstep: 712.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738
[2024-06-10 11:20:14,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.49 | bwd_microstep: 1437.56 | bwd_inner_microstep: 1437.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 11:20:16,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.49 | bwd_microstep: 1249.93 | bwd_inner_microstep: 1249.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3432
[2024-06-10 11:20:17,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1422.40 | bwd_inner_microstep: 1422.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2148
[2024-06-10 11:20:19,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.99 | bwd_microstep: 943.29 | bwd_inner_microstep: 943.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
[2024-06-10 11:20:21,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.39 | bwd_microstep: 1335.65 | bwd_inner_microstep: 1335.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3696
[2024-06-10 11:20:23,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.34 | bwd_microstep: 1487.01 | bwd_inner_microstep: 1486.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-10 11:20:25,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.01 | bwd_microstep: 1520.41 | bwd_inner_microstep: 1520.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 11:20:26,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.86 | bwd_microstep: 1181.88 | bwd_inner_microstep: 1181.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 11:20:29,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.90 | bwd_microstep: 1659.34 | bwd_inner_microstep: 1659.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2939
[2024-06-10 11:20:30,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.18 | bwd_microstep: 1196.99 | bwd_inner_microstep: 1196.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 11:20:32,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1392.95 | bwd_inner_microstep: 1392.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 11:20:34,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1396.73 | bwd_inner_microstep: 1396.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 11:20:36,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1390.48 | bwd_inner_microstep: 1390.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 11:20:38,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.20 | bwd_microstep: 1495.50 | bwd_inner_microstep: 1495.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 11:20:40,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.51 | bwd_microstep: 1451.26 | bwd_inner_microstep: 1451.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-10 11:20:42,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.24 | bwd_microstep: 1611.88 | bwd_inner_microstep: 1611.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2078
[2024-06-10 11:20:44,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.90 | bwd_microstep: 917.92 | bwd_inner_microstep: 917.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 11:20:46,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.64 | bwd_microstep: 1510.91 | bwd_inner_microstep: 1510.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 11:20:48,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1449.73 | bwd_inner_microstep: 1449.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 11:20:50,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.23 | bwd_microstep: 1504.10 | bwd_inner_microstep: 1504.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-10 11:20:52,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.24 | bwd_microstep: 1338.61 | bwd_inner_microstep: 1338.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 11:20:54,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1507.10 | bwd_inner_microstep: 1507.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3583
[2024-06-10 11:20:58,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 11:20:58,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 4068.46 | bwd_inner_microstep: 1529.93 | bwd_allreduce_microstep: 2538.48 | step_microstep: 38.02
[2024-06-10 11:20:58,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15731.49 | bwd: 44673.45 | bwd_inner: 42133.96 | bwd_allreduce: 2538.76 | step: 39.71
{'loss': 1.3094, 'learning_rate': 2.9738839832070128e-05, 'epoch': 0.36}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3393
[2024-06-10 11:21:00,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.83 | bwd_microstep: 1232.55 | bwd_inner_microstep: 1232.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1864
[2024-06-10 11:21:01,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.97 | bwd_microstep: 736.76 | bwd_inner_microstep: 736.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3797
[2024-06-10 11:21:03,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.38 | bwd_microstep: 1408.19 | bwd_inner_microstep: 1408.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3398
[2024-06-10 11:21:05,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.95 | bwd_microstep: 1147.03 | bwd_inner_microstep: 1147.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 11:21:07,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.40 | bwd_microstep: 1383.42 | bwd_inner_microstep: 1383.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513
[2024-06-10 11:21:08,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.78 | bwd_microstep: 1190.24 | bwd_inner_microstep: 1190.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 11:21:10,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.06 | bwd_microstep: 1387.72 | bwd_inner_microstep: 1387.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3415
[2024-06-10 11:21:12,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.95 | bwd_microstep: 1216.71 | bwd_inner_microstep: 1216.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 11:21:14,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.20 | bwd_microstep: 1388.71 | bwd_inner_microstep: 1388.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3721
[2024-06-10 11:21:16,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.00 | bwd_microstep: 1598.47 | bwd_inner_microstep: 1598.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1885
[2024-06-10 11:21:17,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.04 | bwd_microstep: 712.27 | bwd_inner_microstep: 712.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 11:21:19,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.16 | bwd_microstep: 1481.75 | bwd_inner_microstep: 1481.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 11:21:21,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.77 | bwd_microstep: 1501.18 | bwd_inner_microstep: 1501.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 11:21:23,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.38 | bwd_microstep: 1290.16 | bwd_inner_microstep: 1290.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 11:21:25,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.36 | bwd_microstep: 1483.76 | bwd_inner_microstep: 1483.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3459
[2024-06-10 11:21:27,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1331.63 | bwd_inner_microstep: 1331.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 11:21:29,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.65 | bwd_microstep: 1421.80 | bwd_inner_microstep: 1421.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 11:21:31,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.49 | bwd_microstep: 1280.25 | bwd_inner_microstep: 1280.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 11:21:32,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1386.91 | bwd_inner_microstep: 1386.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3669
[2024-06-10 11:21:34,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.71 | bwd_microstep: 1326.09 | bwd_inner_microstep: 1326.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288
[2024-06-10 11:21:36,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.69 | bwd_microstep: 879.26 | bwd_inner_microstep: 879.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103
[2024-06-10 11:21:37,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.73 | bwd_microstep: 920.51 | bwd_inner_microstep: 920.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 11:21:39,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.87 | bwd_microstep: 1277.72 | bwd_inner_microstep: 1277.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 11:21:41,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.43 | bwd_microstep: 1456.29 | bwd_inner_microstep: 1456.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2106
[2024-06-10 11:21:42,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.41 | bwd_microstep: 823.96 | bwd_inner_microstep: 823.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060
[2024-06-10 11:21:43,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.71 | bwd_microstep: 817.82 | bwd_inner_microstep: 817.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 11:21:45,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.37 | bwd_microstep: 1418.44 | bwd_inner_microstep: 1418.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2675
[2024-06-10 11:21:46,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.48 | bwd_microstep: 930.34 | bwd_inner_microstep: 930.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584
[2024-06-10 11:21:48,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.88 | bwd_microstep: 1405.25 | bwd_inner_microstep: 1405.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3749
[2024-06-10 11:21:50,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.91 | bwd_microstep: 1472.37 | bwd_inner_microstep: 1472.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3609
[2024-06-10 11:21:52,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.93 | bwd_microstep: 1536.10 | bwd_inner_microstep: 1536.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 11:21:57,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 11:21:57,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.51 | bwd_microstep: 4139.98 | bwd_inner_microstep: 1634.17 | bwd_allreduce_microstep: 2505.76 | step_microstep: 38.23
[2024-06-10 11:21:57,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15167.21 | bwd: 42983.65 | bwd_inner: 40476.99 | bwd_allreduce: 2505.99 | step: 39.92
{'loss': 1.2375, 'learning_rate': 2.9706039250621626e-05, 'epoch': 0.36}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 11:21:59,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.38 | bwd_microstep: 1331.22 | bwd_inner_microstep: 1331.10 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 11:22:01,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.30 | bwd_microstep: 1515.97 | bwd_inner_microstep: 1515.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2303
[2024-06-10 11:22:02,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.98 | bwd_microstep: 815.10 | bwd_inner_microstep: 815.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 11:22:04,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.92 | bwd_microstep: 1350.28 | bwd_inner_microstep: 1350.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 11:22:06,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1385.78 | bwd_inner_microstep: 1385.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2231
[2024-06-10 11:22:07,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.96 | bwd_microstep: 963.02 | bwd_inner_microstep: 963.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 11:22:09,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.90 | bwd_microstep: 1399.19 | bwd_inner_microstep: 1399.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 11:22:11,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | bwd_microstep: 1284.51 | bwd_inner_microstep: 1284.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 11:22:13,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1249.46 | bwd_inner_microstep: 1249.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 11:22:14,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.58 | bwd_microstep: 802.11 | bwd_inner_microstep: 802.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2082
[2024-06-10 11:22:15,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.92 | bwd_microstep: 917.76 | bwd_inner_microstep: 917.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 11:22:17,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1347.50 | bwd_inner_microstep: 1347.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 11:22:19,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.95 | bwd_microstep: 1250.93 | bwd_inner_microstep: 1250.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1900
[2024-06-10 11:22:20,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.65 | bwd_microstep: 840.15 | bwd_inner_microstep: 840.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3670
[2024-06-10 11:22:22,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.60 | bwd_microstep: 1793.83 | bwd_inner_microstep: 1793.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3413
[2024-06-10 11:22:24,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.55 | bwd_microstep: 1394.68 | bwd_inner_microstep: 1394.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2137
[2024-06-10 11:22:25,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.79 | bwd_microstep: 929.34 | bwd_inner_microstep: 929.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 11:22:27,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.57 | bwd_microstep: 1491.86 | bwd_inner_microstep: 1491.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 11:22:29,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.31 | bwd_microstep: 1422.38 | bwd_inner_microstep: 1422.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3532
[2024-06-10 11:22:31,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.34 | bwd_microstep: 1199.52 | bwd_inner_microstep: 1199.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3718
[2024-06-10 11:22:33,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.66 | bwd_microstep: 1337.60 | bwd_inner_microstep: 1337.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-10 11:22:35,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.60 | bwd_microstep: 1314.04 | bwd_inner_microstep: 1314.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-10 11:22:36,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.14 | bwd_microstep: 879.27 | bwd_inner_microstep: 879.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3615
[2024-06-10 11:22:38,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.48 | bwd_microstep: 1643.38 | bwd_inner_microstep: 1643.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 11:22:40,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.87 | bwd_microstep: 1496.56 | bwd_inner_microstep: 1496.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 11:22:43,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.35 | bwd_microstep: 1656.52 | bwd_inner_microstep: 1656.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 590
[2024-06-10 11:22:43,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.56 | bwd_microstep: 257.37 | bwd_inner_microstep: 257.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 613
[2024-06-10 11:22:43,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 101.70 | bwd_microstep: 260.41 | bwd_inner_microstep: 260.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566
[2024-06-10 11:22:45,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.17 | bwd_microstep: 1205.06 | bwd_inner_microstep: 1205.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3433
[2024-06-10 11:22:47,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.86 | bwd_microstep: 1406.54 | bwd_inner_microstep: 1406.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 11:22:49,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1384.56 | bwd_inner_microstep: 1384.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 11:22:58,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.33 | optimizer_step: 6.61
[2024-06-10 11:22:58,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.83 | bwd_microstep: 8175.63 | bwd_inner_microstep: 1753.43 | bwd_allreduce_microstep: 6422.12 | step_microstep: 40.75
[2024-06-10 11:22:58,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14664.55 | bwd: 45701.58 | bwd_inner: 39278.41 | bwd_allreduce: 6422.42 | step: 44.52
{'loss': 1.2446, 'learning_rate': 2.967320448455334e-05, 'epoch': 0.36}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1860
[2024-06-10 11:22:59,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.32 | bwd_microstep: 670.97 | bwd_inner_microstep: 670.84 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3862
[2024-06-10 11:23:01,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.81 | bwd_microstep: 1656.74 | bwd_inner_microstep: 1656.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3465
[2024-06-10 11:23:03,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.29 | bwd_microstep: 1420.55 | bwd_inner_microstep: 1420.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-10 11:23:05,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.83 | bwd_microstep: 1535.69 | bwd_inner_microstep: 1535.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 11:23:07,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1281.84 | bwd_inner_microstep: 1281.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 11:23:09,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1377.93 | bwd_inner_microstep: 1377.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 11:23:11,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1411.86 | bwd_inner_microstep: 1411.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 11:23:12,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 1283.05 | bwd_inner_microstep: 1283.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906
[2024-06-10 11:23:13,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.65 | bwd_microstep: 780.52 | bwd_inner_microstep: 780.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 11:23:15,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 1247.44 | bwd_inner_microstep: 1247.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3491
[2024-06-10 11:23:17,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.97 | bwd_microstep: 1443.30 | bwd_inner_microstep: 1443.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 11:23:19,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.49 | bwd_microstep: 1486.72 | bwd_inner_microstep: 1486.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 11:23:21,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1346.33 | bwd_inner_microstep: 1346.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3408
[2024-06-10 11:23:23,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.47 | bwd_microstep: 1212.17 | bwd_inner_microstep: 1212.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3534
[2024-06-10 11:23:25,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.89 | bwd_microstep: 1657.00 | bwd_inner_microstep: 1656.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 11:23:27,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1497.58 | bwd_inner_microstep: 1497.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3500
[2024-06-10 11:23:29,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.04 | bwd_microstep: 1552.72 | bwd_inner_microstep: 1552.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2334
[2024-06-10 11:23:30,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.22 | bwd_microstep: 895.56 | bwd_inner_microstep: 895.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616
[2024-06-10 11:23:32,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.82 | bwd_microstep: 1316.67 | bwd_inner_microstep: 1316.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3517
[2024-06-10 11:23:34,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.24 | bwd_microstep: 1320.84 | bwd_inner_microstep: 1320.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3630
[2024-06-10 11:23:36,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.83 | bwd_microstep: 1710.73 | bwd_inner_microstep: 1710.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 11:23:39,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.56 | bwd_microstep: 1501.40 | bwd_inner_microstep: 1501.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 11:23:41,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.11 | bwd_microstep: 1516.13 | bwd_inner_microstep: 1516.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 11:23:42,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1251.49 | bwd_inner_microstep: 1251.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 11:23:44,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.99 | bwd_microstep: 1290.87 | bwd_inner_microstep: 1290.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 11:23:46,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.10 | bwd_microstep: 1658.57 | bwd_inner_microstep: 1658.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219
[2024-06-10 11:23:48,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.82 | bwd_microstep: 960.17 | bwd_inner_microstep: 960.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3721
[2024-06-10 11:23:50,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1368.77 | bwd_inner_microstep: 1368.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 11:23:52,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1497.66 | bwd_inner_microstep: 1497.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 11:23:54,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.77 | bwd_microstep: 1553.03 | bwd_inner_microstep: 1553.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 11:23:55,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.46 | bwd_microstep: 687.88 | bwd_inner_microstep: 687.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-10 11:23:58,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 11:23:58,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.87 | bwd_microstep: 2966.70 | bwd_inner_microstep: 1803.78 | bwd_allreduce_microstep: 1162.87 | step_microstep: 38.19
[2024-06-10 11:23:58,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16073.17 | bwd: 44358.91 | bwd_inner: 43195.02 | bwd_allreduce: 1163.15 | step: 39.84
{'loss': 1.2829, 'learning_rate': 2.9640335649509144e-05, 'epoch': 0.36}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1927
[2024-06-10 11:24:00,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.03 | bwd_microstep: 820.36 | bwd_inner_microstep: 820.27 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3917
[2024-06-10 11:24:02,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1550.82 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 11:24:04,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.31 | bwd_microstep: 1481.38 | bwd_inner_microstep: 1481.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 11:24:05,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.21 | bwd_microstep: 1190.11 | bwd_inner_microstep: 1190.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4104
[2024-06-10 11:24:08,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.69 | bwd_microstep: 1635.32 | bwd_inner_microstep: 1635.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421
[2024-06-10 11:24:09,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.21 | bwd_microstep: 1153.95 | bwd_inner_microstep: 1153.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522
[2024-06-10 11:24:11,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.56 | bwd_microstep: 1192.70 | bwd_inner_microstep: 1192.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3701
[2024-06-10 11:24:13,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.78 | bwd_microstep: 1458.25 | bwd_inner_microstep: 1458.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 11:24:15,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1392.08 | bwd_inner_microstep: 1392.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 11:24:17,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.26 | bwd_microstep: 1190.34 | bwd_inner_microstep: 1190.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3538
[2024-06-10 11:24:18,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.84 | bwd_microstep: 1327.82 | bwd_inner_microstep: 1327.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2186
[2024-06-10 11:24:20,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.08 | bwd_microstep: 1048.33 | bwd_inner_microstep: 1048.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2662
[2024-06-10 11:24:21,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.99 | bwd_microstep: 1154.51 | bwd_inner_microstep: 1154.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3631
[2024-06-10 11:24:23,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.98 | bwd_microstep: 1312.39 | bwd_inner_microstep: 1312.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 11:24:25,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1244.03 | bwd_inner_microstep: 1244.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1941
[2024-06-10 11:24:26,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.78 | bwd_microstep: 758.47 | bwd_inner_microstep: 758.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 11:24:28,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1281.21 | bwd_inner_microstep: 1281.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 11:24:30,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1507.98 | bwd_inner_microstep: 1507.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 11:24:32,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.07 | bwd_microstep: 1506.83 | bwd_inner_microstep: 1506.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 11:24:34,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1556.08 | bwd_inner_microstep: 1556.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 11:24:36,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1373.88 | bwd_inner_microstep: 1373.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 791
[2024-06-10 11:24:36,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.95 | bwd_microstep: 310.00 | bwd_inner_microstep: 309.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3522
[2024-06-10 11:24:38,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.73 | bwd_microstep: 1323.97 | bwd_inner_microstep: 1323.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3514
[2024-06-10 11:24:40,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.49 | bwd_microstep: 1551.06 | bwd_inner_microstep: 1551.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2169
[2024-06-10 11:24:42,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.03 | bwd_microstep: 954.88 | bwd_inner_microstep: 954.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 11:24:44,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1549.50 | bwd_inner_microstep: 1549.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 11:24:46,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1485.26 | bwd_inner_microstep: 1485.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279
[2024-06-10 11:24:47,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.84 | bwd_microstep: 876.21 | bwd_inner_microstep: 876.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 11:24:49,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1379.05 | bwd_inner_microstep: 1379.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3759
[2024-06-10 11:24:51,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.73 | bwd_microstep: 1403.00 | bwd_inner_microstep: 1402.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3577
[2024-06-10 11:24:53,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.00 | bwd_microstep: 1528.47 | bwd_inner_microstep: 1528.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2237
[2024-06-10 11:25:02,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 11:25:02,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.40 | bwd_microstep: 8718.83 | bwd_inner_microstep: 1153.54 | bwd_allreduce_microstep: 7565.24 | step_microstep: 38.00
[2024-06-10 11:25:02,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15216.04 | bwd: 48217.09 | bwd_inner: 40650.87 | bwd_allreduce: 7565.51 | step: 39.58
{'loss': 1.2655, 'learning_rate': 2.960743286125291e-05, 'epoch': 0.36}
 36%|███▌      | 617/1726 [10:42:34<18:49:17, 61.10s/it]
 36%|███▌      | 618/1726 [10:43:35<18:46:22, 60.99s/it]
 36%|███▌      | 619/1726 [10:44:34<18:31:28, 60.24s/it]
 36%|███▌      | 620/1726 [10:45:34<18:33:05, 60.38s/it]
 36%|███▌      | 621/1726 [10:46:35<18:34:17, 60.50s/it]
 36%|███▌      | 622/1726 [10:47:39<18:51:16, 61.48s/it]
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3478
[2024-06-10 11:25:04,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1496.14 | bwd_inner_microstep: 1496.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404
[2024-06-10 11:25:06,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.77 | bwd_microstep: 1180.17 | bwd_inner_microstep: 1180.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4313
[2024-06-10 11:25:08,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.78 | bwd_microstep: 1774.97 | bwd_inner_microstep: 1774.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 11:25:10,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.10 | bwd_microstep: 1545.12 | bwd_inner_microstep: 1545.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 11:25:12,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.66 | bwd_microstep: 1477.12 | bwd_inner_microstep: 1477.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 11:25:14,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1412.35 | bwd_inner_microstep: 1412.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 11:25:16,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.82 | bwd_microstep: 1282.75 | bwd_inner_microstep: 1282.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 11:25:18,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.04 | bwd_microstep: 1640.11 | bwd_inner_microstep: 1640.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 11:25:19,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 699.24 | bwd_inner_microstep: 699.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 11:25:21,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.73 | bwd_microstep: 1435.27 | bwd_inner_microstep: 1435.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2138
[2024-06-10 11:25:23,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.38 | bwd_microstep: 955.90 | bwd_inner_microstep: 955.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 11:25:25,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1380.66 | bwd_inner_microstep: 1380.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511
[2024-06-10 11:25:27,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1440.44 | bwd_inner_microstep: 1440.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 11:25:28,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.94 | bwd_microstep: 1340.58 | bwd_inner_microstep: 1340.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3593
[2024-06-10 11:25:31,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.06 | bwd_microstep: 1552.99 | bwd_inner_microstep: 1552.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 11:25:33,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.29 | bwd_microstep: 1512.47 | bwd_inner_microstep: 1512.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 11:25:35,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 1388.39 | bwd_inner_microstep: 1388.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3837
[2024-06-10 11:25:37,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.41 | bwd_microstep: 1685.94 | bwd_inner_microstep: 1685.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2295
[2024-06-10 11:25:38,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.03 | bwd_microstep: 1074.35 | bwd_inner_microstep: 1074.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 11:25:40,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.36 | bwd_microstep: 1284.91 | bwd_inner_microstep: 1284.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 11:25:42,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.00 | bwd_microstep: 1660.74 | bwd_inner_microstep: 1660.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2271
[2024-06-10 11:25:44,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.39 | bwd_microstep: 1070.33 | bwd_inner_microstep: 1070.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3822
[2024-06-10 11:25:46,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.13 | bwd_microstep: 1352.41 | bwd_inner_microstep: 1352.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 11:25:47,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.83 | bwd_microstep: 789.63 | bwd_inner_microstep: 789.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 11:25:49,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.16 | bwd_microstep: 1502.60 | bwd_inner_microstep: 1502.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 11:25:51,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1392.32 | bwd_inner_microstep: 1392.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 11:25:53,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.56 | bwd_microstep: 1442.64 | bwd_inner_microstep: 1442.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3768
[2024-06-10 11:25:55,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.05 | bwd_microstep: 1341.63 | bwd_inner_microstep: 1341.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 11:25:57,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 11:25:59,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1395.82 | bwd_inner_microstep: 1395.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 11:26:01,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.05 | bwd_microstep: 1402.49 | bwd_inner_microstep: 1402.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609
[2024-06-10 11:26:03,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 11:26:03,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.12 | bwd_microstep: 1666.32 | bwd_inner_microstep: 1658.30 | bwd_allreduce_microstep: 7.97 | step_microstep: 37.73
[2024-06-10 11:26:03,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16377.45 | bwd: 43962.36 | bwd_inner: 43953.50 | bwd_allreduce: 8.20 | step: 39.47
{'loss': 1.2583, 'learning_rate': 2.9574496235668078e-05, 'epoch': 0.36}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 11:26:05,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.59 | bwd_microstep: 1244.51 | bwd_inner_microstep: 1244.32 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3872
[2024-06-10 11:26:07,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.04 | bwd_microstep: 1665.87 | bwd_inner_microstep: 1665.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3881
[2024-06-10 11:26:09,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.20 | bwd_microstep: 1681.65 | bwd_inner_microstep: 1681.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 11:26:11,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1478.58 | bwd_inner_microstep: 1478.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 11:26:13,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1375.05 | bwd_inner_microstep: 1375.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3798
[2024-06-10 11:26:15,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.94 | bwd_microstep: 1649.93 | bwd_inner_microstep: 1649.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 11:26:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1244.67 | bwd_inner_microstep: 1244.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 11:26:19,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1384.08 | bwd_inner_microstep: 1384.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 11:26:21,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.83 | bwd_microstep: 1377.86 | bwd_inner_microstep: 1377.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1944
[2024-06-10 11:26:22,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.49 | bwd_microstep: 820.55 | bwd_inner_microstep: 820.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 11:26:24,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1311.00 | bwd_inner_microstep: 1310.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2118
[2024-06-10 11:26:25,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.58 | bwd_microstep: 925.01 | bwd_inner_microstep: 924.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 11:26:27,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 11:26:29,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.53 | bwd_microstep: 1485.38 | bwd_inner_microstep: 1485.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 11:26:31,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1397.43 | bwd_inner_microstep: 1397.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3420
[2024-06-10 11:26:33,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.20 | bwd_microstep: 1479.22 | bwd_inner_microstep: 1479.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3504
[2024-06-10 11:26:35,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.20 | bwd_microstep: 1191.68 | bwd_inner_microstep: 1191.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461
[2024-06-10 11:26:36,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.72 | bwd_microstep: 1212.54 | bwd_inner_microstep: 1212.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-10 11:26:38,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1298.55 | bwd_inner_microstep: 1298.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 11:26:40,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1282.07 | bwd_inner_microstep: 1282.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 11:26:42,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1522.22 | bwd_inner_microstep: 1522.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2069
[2024-06-10 11:26:43,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.98 | bwd_microstep: 916.73 | bwd_inner_microstep: 916.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554
[2024-06-10 11:26:45,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.97 | bwd_microstep: 1296.13 | bwd_inner_microstep: 1296.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-10 11:26:47,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.90 | bwd_microstep: 973.81 | bwd_inner_microstep: 973.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-10 11:26:49,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.70 | bwd_microstep: 1534.48 | bwd_inner_microstep: 1534.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 11:26:50,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.37 | bwd_microstep: 1311.71 | bwd_inner_microstep: 1311.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3825
[2024-06-10 11:26:53,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.05 | bwd_microstep: 1488.39 | bwd_inner_microstep: 1488.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4015
[2024-06-10 11:26:55,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 664.99 | bwd_microstep: 1813.58 | bwd_inner_microstep: 1813.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 11:26:56,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.79 | bwd_microstep: 813.76 | bwd_inner_microstep: 813.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815
[2024-06-10 11:26:59,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.54 | bwd_microstep: 1753.76 | bwd_inner_microstep: 1753.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3387
[2024-06-10 11:27:00,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.11 | bwd_microstep: 1240.29 | bwd_inner_microstep: 1240.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2047
[2024-06-10 11:27:03,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.59
[2024-06-10 11:27:03,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.12 | bwd_microstep: 2388.77 | bwd_inner_microstep: 1109.22 | bwd_allreduce_microstep: 1279.50 | step_microstep: 37.86
[2024-06-10 11:27:03,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15870.35 | bwd: 43938.34 | bwd_inner: 42657.78 | bwd_allreduce: 1279.81 | step: 39.57
{'loss': 1.2553, 'learning_rate': 2.9541525888757285e-05, 'epoch': 0.36}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3398
[2024-06-10 11:27:05,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.19 | bwd_microstep: 1177.82 | bwd_inner_microstep: 1177.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 11:27:07,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.74 | bwd_microstep: 1343.03 | bwd_inner_microstep: 1343.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3785
[2024-06-10 11:27:09,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.32 | bwd_microstep: 1452.32 | bwd_inner_microstep: 1452.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 11:27:10,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.67 | bwd_microstep: 1392.47 | bwd_inner_microstep: 1392.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 11:27:13,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.60 | bwd_microstep: 1644.50 | bwd_inner_microstep: 1644.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4112
[2024-06-10 11:27:15,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.75 | bwd_microstep: 1704.42 | bwd_inner_microstep: 1704.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 11:27:17,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.92 | bwd_microstep: 1192.93 | bwd_inner_microstep: 1192.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 11:27:19,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.64 | bwd_microstep: 1298.75 | bwd_inner_microstep: 1298.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2165
[2024-06-10 11:27:20,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.08 | bwd_microstep: 883.45 | bwd_inner_microstep: 883.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621
[2024-06-10 11:27:22,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.62 | bwd_microstep: 1315.23 | bwd_inner_microstep: 1315.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3496
[2024-06-10 11:27:23,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.66 | bwd_microstep: 1255.34 | bwd_inner_microstep: 1255.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 11:27:25,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1345.57 | bwd_inner_microstep: 1345.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2364
[2024-06-10 11:27:26,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.68 | bwd_microstep: 957.67 | bwd_inner_microstep: 957.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909
[2024-06-10 11:27:28,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.04 | bwd_microstep: 776.68 | bwd_inner_microstep: 776.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 11:27:29,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.82 | bwd_microstep: 1351.60 | bwd_inner_microstep: 1351.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3068
[2024-06-10 11:27:31,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.58 | bwd_microstep: 1397.28 | bwd_inner_microstep: 1397.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 11:27:33,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1479.60 | bwd_inner_microstep: 1479.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 11:27:35,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.71 | bwd_microstep: 1277.81 | bwd_inner_microstep: 1277.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3521
[2024-06-10 11:27:37,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.80 | bwd_microstep: 1247.52 | bwd_inner_microstep: 1247.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 11:27:39,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.12 | bwd_microstep: 1181.59 | bwd_inner_microstep: 1181.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2471
[2024-06-10 11:27:40,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.97 | bwd_microstep: 860.52 | bwd_inner_microstep: 860.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 11:27:42,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1558.26 | bwd_inner_microstep: 1558.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3516
[2024-06-10 11:27:44,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.87 | bwd_microstep: 1323.55 | bwd_inner_microstep: 1323.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3552
[2024-06-10 11:27:45,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.86 | bwd_microstep: 1233.12 | bwd_inner_microstep: 1233.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2183
[2024-06-10 11:27:47,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.35 | bwd_microstep: 765.11 | bwd_inner_microstep: 765.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 11:27:48,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.72 | bwd_microstep: 1187.90 | bwd_inner_microstep: 1187.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2045
[2024-06-10 11:27:49,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.04 | bwd_microstep: 786.48 | bwd_inner_microstep: 786.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3566
[2024-06-10 11:27:51,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.54 | bwd_microstep: 1433.71 | bwd_inner_microstep: 1433.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-10 11:27:52,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.94 | bwd_microstep: 802.70 | bwd_inner_microstep: 802.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559
[2024-06-10 11:27:55,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.48 | bwd_microstep: 1591.39 | bwd_inner_microstep: 1591.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764
[2024-06-10 11:27:57,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.95 | bwd_microstep: 1439.73 | bwd_inner_microstep: 1439.57 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.15
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3805
[2024-06-10 11:28:06,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.37 | optimizer_step: 6.57
[2024-06-10 11:28:06,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.21 | bwd_microstep: 9252.15 | bwd_inner_microstep: 2217.29 | bwd_allreduce_microstep: 7034.79 | step_microstep: 40.45
[2024-06-10 11:28:06,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15199.41 | bwd: 47910.22 | bwd_inner: 40874.34 | bwd_allreduce: 7035.13 | step: 42.13
{'loss': 1.2696, 'learning_rate': 2.9508521936641906e-05, 'epoch': 0.36}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 11:28:08,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.30 | bwd_microstep: 1237.89 | bwd_inner_microstep: 1237.77 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 11:28:10,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 11:28:12,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.46 | bwd_microstep: 1394.82 | bwd_inner_microstep: 1394.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3473
[2024-06-10 11:28:14,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.36 | bwd_microstep: 1244.51 | bwd_inner_microstep: 1244.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4248
[2024-06-10 11:28:16,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.10 | bwd_microstep: 1495.01 | bwd_inner_microstep: 1494.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 11:28:17,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.49 | bwd_microstep: 791.56 | bwd_inner_microstep: 791.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016
[2024-06-10 11:28:18,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.96 | bwd_microstep: 803.68 | bwd_inner_microstep: 803.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3786
[2024-06-10 11:28:20,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.69 | bwd_microstep: 1581.58 | bwd_inner_microstep: 1581.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 11:28:21,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.62 | bwd_microstep: 789.40 | bwd_inner_microstep: 789.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 11:28:23,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.45 | bwd_microstep: 1281.41 | bwd_inner_microstep: 1281.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 11:28:25,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.80 | bwd_microstep: 1285.76 | bwd_inner_microstep: 1285.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 11:28:27,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.28 | bwd_microstep: 1408.44 | bwd_inner_microstep: 1408.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2092
[2024-06-10 11:28:28,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.67 | bwd_microstep: 734.02 | bwd_inner_microstep: 733.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3633
[2024-06-10 11:28:30,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.32 | bwd_microstep: 1545.41 | bwd_inner_microstep: 1545.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3532
[2024-06-10 11:28:32,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.20 | bwd_microstep: 1455.50 | bwd_inner_microstep: 1455.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982
[2024-06-10 11:28:33,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.71 | bwd_microstep: 830.49 | bwd_inner_microstep: 830.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 11:28:35,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.06 | bwd_microstep: 1385.80 | bwd_inner_microstep: 1385.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3979
[2024-06-10 11:28:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.03 | bwd_microstep: 1702.39 | bwd_inner_microstep: 1702.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 11:28:39,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1373.50 | bwd_inner_microstep: 1373.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3657
[2024-06-10 11:28:42,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1665.60 | bwd_inner_microstep: 1665.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 11:28:44,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.10 | bwd_microstep: 1658.07 | bwd_inner_microstep: 1658.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 11:28:46,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1551.58 | bwd_inner_microstep: 1551.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 11:28:48,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.62 | bwd_microstep: 1398.77 | bwd_inner_microstep: 1398.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479
[2024-06-10 11:28:50,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 1315.96 | bwd_inner_microstep: 1315.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 11:28:52,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1379.14 | bwd_inner_microstep: 1379.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3618
[2024-06-10 11:28:53,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.18 | bwd_microstep: 1249.10 | bwd_inner_microstep: 1249.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 11:28:55,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1401.63 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4819
[2024-06-10 11:28:58,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 696.74 | bwd_microstep: 1836.17 | bwd_inner_microstep: 1836.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3590
[2024-06-10 11:29:00,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.74 | bwd_microstep: 1368.09 | bwd_inner_microstep: 1368.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 11:29:02,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.08 | bwd_microstep: 1296.92 | bwd_inner_microstep: 1296.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-10 11:29:04,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.63 | bwd_microstep: 1597.12 | bwd_inner_microstep: 1597.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 11:29:09,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.30 | optimizer_step: 6.55
[2024-06-10 11:29:09,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 4860.65 | bwd_inner_microstep: 1633.58 | bwd_allreduce_microstep: 3227.01 | step_microstep: 39.79
[2024-06-10 11:29:09,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16142.42 | bwd: 46302.51 | bwd_inner: 43074.49 | bwd_allreduce: 3227.30 | step: 41.79
{'loss': 1.26, 'learning_rate': 2.94754844955617e-05, 'epoch': 0.36}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514
[2024-06-10 11:29:11,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.97 | bwd_microstep: 1408.22 | bwd_inner_microstep: 1408.02 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1951
[2024-06-10 11:29:12,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.17 | bwd_microstep: 743.73 | bwd_inner_microstep: 743.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3874
[2024-06-10 11:29:15,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.16 | bwd_microstep: 1629.01 | bwd_inner_microstep: 1628.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 11:29:17,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.90 | bwd_microstep: 1548.43 | bwd_inner_microstep: 1548.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 11:29:18,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.94 | bwd_microstep: 1280.36 | bwd_inner_microstep: 1280.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3752
[2024-06-10 11:29:21,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.78 | bwd_microstep: 1639.45 | bwd_inner_microstep: 1639.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 11:29:22,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1284.30 | bwd_inner_microstep: 1284.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3764
[2024-06-10 11:29:24,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.89 | bwd_microstep: 1340.87 | bwd_inner_microstep: 1340.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3501
[2024-06-10 11:29:26,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1336.17 | bwd_inner_microstep: 1336.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 11:29:28,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.52 | bwd_microstep: 1356.13 | bwd_inner_microstep: 1356.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 11:29:30,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.67 | bwd_microstep: 1256.62 | bwd_inner_microstep: 1256.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 11:29:32,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1390.33 | bwd_inner_microstep: 1390.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970
[2024-06-10 11:29:33,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.34 | bwd_microstep: 889.78 | bwd_inner_microstep: 889.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509
[2024-06-10 11:29:35,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.66 | bwd_microstep: 1437.38 | bwd_inner_microstep: 1437.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3089
[2024-06-10 11:29:37,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.74 | bwd_microstep: 1243.50 | bwd_inner_microstep: 1243.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1964
[2024-06-10 11:29:38,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.39 | bwd_microstep: 898.33 | bwd_inner_microstep: 898.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3429
[2024-06-10 11:29:40,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1514.72 | bwd_inner_microstep: 1514.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 11:29:42,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1286.56 | bwd_inner_microstep: 1286.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3529
[2024-06-10 11:29:44,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1552.90 | bwd_inner_microstep: 1552.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623
[2024-06-10 11:29:46,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.81 | bwd_microstep: 1611.31 | bwd_inner_microstep: 1611.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143
[2024-06-10 11:29:47,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.19 | bwd_microstep: 930.71 | bwd_inner_microstep: 930.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 11:29:49,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.89 | bwd_microstep: 1495.16 | bwd_inner_microstep: 1495.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385
[2024-06-10 11:29:51,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.39 | bwd_microstep: 1437.78 | bwd_inner_microstep: 1437.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2625
[2024-06-10 11:29:53,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.86 | bwd_microstep: 1112.33 | bwd_inner_microstep: 1112.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-10 11:29:55,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1420.86 | bwd_inner_microstep: 1420.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416
[2024-06-10 11:29:57,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.85 | bwd_microstep: 1446.74 | bwd_inner_microstep: 1446.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 11:29:59,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1498.32 | bwd_inner_microstep: 1498.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807
[2024-06-10 11:30:01,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.94 | bwd_microstep: 1755.31 | bwd_inner_microstep: 1755.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2187
[2024-06-10 11:30:03,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.73 | bwd_microstep: 986.00 | bwd_inner_microstep: 985.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756
[2024-06-10 11:30:05,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.75 | bwd_microstep: 1449.93 | bwd_inner_microstep: 1449.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 11:30:07,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1354.42 | bwd_inner_microstep: 1354.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2228
[2024-06-10 11:30:12,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.34 | optimizer_step: 6.62
[2024-06-10 11:30:12,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.57 | bwd_microstep: 4766.83 | bwd_inner_microstep: 1016.93 | bwd_allreduce_microstep: 3749.83 | step_microstep: 38.79
[2024-06-10 11:30:12,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15838.35 | bwd: 46302.48 | bwd_inner: 42551.59 | bwd_allreduce: 3750.14 | step: 40.48
{'loss': 1.2747, 'learning_rate': 2.9442413681874357e-05, 'epoch': 0.36}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-10 11:30:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.67 | bwd_microstep: 1306.37 | bwd_inner_microstep: 1306.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4006
[2024-06-10 11:30:16,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.77 | bwd_microstep: 1603.76 | bwd_inner_microstep: 1603.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848
[2024-06-10 11:30:18,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.03 | bwd_microstep: 1557.14 | bwd_inner_microstep: 1557.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3905
[2024-06-10 11:30:20,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.21 | bwd_microstep: 1481.91 | bwd_inner_microstep: 1481.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 11:30:22,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.69 | bwd_microstep: 1637.89 | bwd_inner_microstep: 1637.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 11:30:24,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1380.63 | bwd_inner_microstep: 1380.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 11:30:26,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.60 | bwd_microstep: 1284.44 | bwd_inner_microstep: 1284.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3749
[2024-06-10 11:30:28,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.05 | bwd_microstep: 1499.94 | bwd_inner_microstep: 1499.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-10 11:30:29,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.46 | bwd_microstep: 702.63 | bwd_inner_microstep: 702.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 11:30:31,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1394.01 | bwd_inner_microstep: 1393.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 11:30:33,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1286.20 | bwd_inner_microstep: 1286.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-10 11:30:35,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.62 | bwd_microstep: 1415.23 | bwd_inner_microstep: 1415.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 11:30:37,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1381.60 | bwd_inner_microstep: 1381.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3530
[2024-06-10 11:30:39,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.32 | bwd_microstep: 1557.18 | bwd_inner_microstep: 1557.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 11:30:41,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.34 | bwd_microstep: 1476.33 | bwd_inner_microstep: 1476.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3717
[2024-06-10 11:30:43,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1556.29 | bwd_inner_microstep: 1556.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 11:30:45,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.57 | bwd_microstep: 1378.07 | bwd_inner_microstep: 1378.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-10 11:30:47,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.60 | bwd_microstep: 1557.27 | bwd_inner_microstep: 1557.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3871
[2024-06-10 11:30:49,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 682.44 | bwd_microstep: 1873.03 | bwd_inner_microstep: 1873.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 11:30:51,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1384.13 | bwd_inner_microstep: 1384.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3614
[2024-06-10 11:30:54,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.72 | bwd_microstep: 1811.87 | bwd_inner_microstep: 1811.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3378
[2024-06-10 11:30:56,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.83 | bwd_microstep: 1272.00 | bwd_inner_microstep: 1271.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-10 11:30:58,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.99 | bwd_microstep: 1522.92 | bwd_inner_microstep: 1522.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 11:31:00,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1405.88 | bwd_inner_microstep: 1405.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2282
[2024-06-10 11:31:01,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.79 | bwd_microstep: 1007.80 | bwd_inner_microstep: 1007.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 11:31:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1431.88 | bwd_inner_microstep: 1431.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619
[2024-06-10 11:31:05,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1311.45 | bwd_inner_microstep: 1311.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 11:31:07,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.82 | bwd_microstep: 1456.95 | bwd_inner_microstep: 1456.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 11:31:08,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.34 | bwd_microstep: 975.15 | bwd_inner_microstep: 975.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2671
[2024-06-10 11:31:10,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.50 | bwd_microstep: 1026.94 | bwd_inner_microstep: 1026.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3805
[2024-06-10 11:31:12,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.39 | bwd_microstep: 1599.69 | bwd_inner_microstep: 1599.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3766
[2024-06-10 11:31:14,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-10 11:31:14,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.43 | bwd_microstep: 1514.58 | bwd_inner_microstep: 1506.89 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.83
[2024-06-10 11:31:14,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16776.22 | bwd: 45051.20 | bwd_inner: 45042.65 | bwd_allreduce: 7.87 | step: 39.35
 622/1726 [10:47:39<18:51:16, 61.48s/it]
 36%|███▌      | 623/1726 [10:48:40<18:45:52, 61.24s/it]
 36%|███▌      | 624/1726 [10:49:40<18:38:53, 60.92s/it]
 36%|███▌      | 625/1726 [10:50:43<18:51:48, 61.68s/it]
 36%|███▋      | 626/1726 [10:51:46<18:57:01, 62.02s/it]
 36%|███▋      | 627/1726 [10:52:49<18:58:37, 62.16s/it]
 36%|███▋      | 628/1726 [10:53:51
{'loss': 1.2699, 'learning_rate': 2.9409309612055116e-05, 'epoch': 0.36}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 11:31:16,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.88 | bwd_microstep: 1343.79 | bwd_inner_microstep: 1343.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 11:31:18,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.29 | bwd_microstep: 1378.84 | bwd_inner_microstep: 1378.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 11:31:19,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.59 | bwd_microstep: 1246.13 | bwd_inner_microstep: 1246.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 11:31:21,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 1337.56 | bwd_inner_microstep: 1337.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 11:31:23,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.27 | bwd_microstep: 1342.61 | bwd_inner_microstep: 1342.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 11:31:24,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.35 | bwd_microstep: 695.77 | bwd_inner_microstep: 695.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691
[2024-06-10 11:31:26,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.60 | bwd_microstep: 1626.63 | bwd_inner_microstep: 1626.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-10 11:31:27,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.36 | bwd_microstep: 680.29 | bwd_inner_microstep: 680.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1382
[2024-06-10 11:31:28,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 213.34 | bwd_microstep: 557.39 | bwd_inner_microstep: 557.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075
[2024-06-10 11:31:29,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.78 | bwd_microstep: 822.98 | bwd_inner_microstep: 822.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 11:31:31,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.88 | bwd_microstep: 1313.13 | bwd_inner_microstep: 1313.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678
[2024-06-10 11:31:33,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.36 | bwd_microstep: 1616.17 | bwd_inner_microstep: 1616.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 11:31:35,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1485.86 | bwd_inner_microstep: 1485.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 11:31:37,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1412.71 | bwd_inner_microstep: 1412.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 11:31:39,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.01 | bwd_microstep: 1621.80 | bwd_inner_microstep: 1621.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1853
[2024-06-10 11:31:40,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.68 | bwd_microstep: 700.68 | bwd_inner_microstep: 700.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3526
[2024-06-10 11:31:43,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 1542.01 | bwd_inner_microstep: 1541.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3610
[2024-06-10 11:31:45,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.11 | bwd_microstep: 1566.35 | bwd_inner_microstep: 1566.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3712
[2024-06-10 11:31:47,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 11:31:49,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1559.17 | bwd_inner_microstep: 1559.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3584
[2024-06-10 11:31:50,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.29 | bwd_microstep: 1206.67 | bwd_inner_microstep: 1206.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 11:31:52,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.99 | bwd_microstep: 1290.16 | bwd_inner_microstep: 1290.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 11:31:54,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1407.62 | bwd_inner_microstep: 1407.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 11:31:55,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.65 | bwd_microstep: 809.62 | bwd_inner_microstep: 809.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-10 11:31:58,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.84 | bwd_microstep: 1631.48 | bwd_inner_microstep: 1631.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 11:31:59,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.77 | bwd_microstep: 1404.00 | bwd_inner_microstep: 1403.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 11:32:01,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1403.94 | bwd_inner_microstep: 1403.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655
[2024-06-10 11:32:04,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.87 | bwd_microstep: 1623.19 | bwd_inner_microstep: 1623.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574
[2024-06-10 11:32:06,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.72 | bwd_microstep: 1507.26 | bwd_inner_microstep: 1507.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2044
[2024-06-10 11:32:07,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.28 | bwd_microstep: 745.25 | bwd_inner_microstep: 745.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434
[2024-06-10 11:32:09,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.98 | bwd_microstep: 1378.03 | bwd_inner_microstep: 1378.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3801
[2024-06-10 11:32:15,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 11:32:15,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.02 | bwd_microstep: 5636.13 | bwd_inner_microstep: 1811.85 | bwd_allreduce_microstep: 3824.23 | step_microstep: 38.19
[2024-06-10 11:32:15,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15407.89 | bwd: 45231.04 | bwd_inner: 41405.89 | bwd_allreduce: 3824.46 | step: 39.78
{'loss': 1.2697, 'learning_rate': 2.937617240269633e-05, 'epoch': 0.36}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498
[2024-06-10 11:32:17,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.73 | bwd_microstep: 1571.23 | bwd_inner_microstep: 1571.03 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3955
[2024-06-10 11:32:19,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.53 | bwd_microstep: 1692.06 | bwd_inner_microstep: 1692.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2288
[2024-06-10 11:32:21,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.90 | bwd_microstep: 932.94 | bwd_inner_microstep: 932.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3842
[2024-06-10 11:32:23,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.77 | bwd_microstep: 1659.03 | bwd_inner_microstep: 1659.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-10 11:32:25,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.75 | bwd_microstep: 1276.89 | bwd_inner_microstep: 1276.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 11:32:27,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1384.03 | bwd_inner_microstep: 1384.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 734
[2024-06-10 11:32:27,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.84 | bwd_microstep: 294.72 | bwd_inner_microstep: 294.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 11:32:29,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.12 | bwd_microstep: 1276.98 | bwd_inner_microstep: 1276.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 11:32:31,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1415.78 | bwd_inner_microstep: 1415.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 11:32:33,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.78 | bwd_microstep: 1404.97 | bwd_inner_microstep: 1404.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-10 11:32:35,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.75 | bwd_microstep: 1583.92 | bwd_inner_microstep: 1583.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 11:32:37,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.59 | bwd_microstep: 1417.75 | bwd_inner_microstep: 1417.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 11:32:39,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.57 | bwd_microstep: 1262.26 | bwd_inner_microstep: 1262.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637
[2024-06-10 11:32:40,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.95 | bwd_microstep: 1316.03 | bwd_inner_microstep: 1316.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3675
[2024-06-10 11:32:42,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.67 | bwd_microstep: 1455.90 | bwd_inner_microstep: 1455.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 11:32:44,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1406.68 | bwd_inner_microstep: 1406.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3677
[2024-06-10 11:32:46,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.23 | bwd_microstep: 1357.89 | bwd_inner_microstep: 1357.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.28
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 650
[2024-06-10 11:32:47,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.65 | bwd_microstep: 277.64 | bwd_inner_microstep: 277.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 11:32:49,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.21 | bwd_microstep: 1452.40 | bwd_inner_microstep: 1452.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 11:32:51,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1498.63 | bwd_inner_microstep: 1498.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685
[2024-06-10 11:32:53,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.30 | bwd_microstep: 1530.71 | bwd_inner_microstep: 1530.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 11:32:55,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.55 | bwd_microstep: 1498.68 | bwd_inner_microstep: 1498.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3691
[2024-06-10 11:32:57,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.02 | bwd_microstep: 1333.48 | bwd_inner_microstep: 1333.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3647
[2024-06-10 11:32:59,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.36 | bwd_microstep: 1319.93 | bwd_inner_microstep: 1319.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 11:33:01,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.14 | bwd_microstep: 1406.87 | bwd_inner_microstep: 1406.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 889
[2024-06-10 11:33:01,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.80 | bwd_microstep: 368.45 | bwd_inner_microstep: 368.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 11:33:03,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1390.05 | bwd_inner_microstep: 1390.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2683
[2024-06-10 11:33:05,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.09 | bwd_microstep: 1154.99 | bwd_inner_microstep: 1154.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3752
[2024-06-10 11:33:07,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.39 | bwd_microstep: 1739.13 | bwd_inner_microstep: 1739.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3432
[2024-06-10 11:33:09,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1376.37 | bwd_inner_microstep: 1376.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 11:33:11,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.02 | bwd_microstep: 1492.50 | bwd_inner_microstep: 1492.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 11:33:14,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 11:33:14,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.31 | bwd_microstep: 2684.60 | bwd_inner_microstep: 1840.33 | bwd_allreduce_microstep: 844.22 | step_microstep: 38.27
[2024-06-10 11:33:14,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15766.43 | bwd: 43233.51 | bwd_inner: 42388.23 | bwd_allreduce: 844.53 | step: 41.20
{'loss': 1.2483, 'learning_rate': 2.9343002170507087e-05, 'epoch': 0.36}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 11:33:16,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.78 | bwd_microstep: 1468.14 | bwd_inner_microstep: 1468.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4393
[2024-06-10 11:33:19,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.05 | bwd_microstep: 1712.36 | bwd_inner_microstep: 1712.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 11:33:21,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.43 | bwd_microstep: 1476.00 | bwd_inner_microstep: 1475.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2428
[2024-06-10 11:33:22,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.81 | bwd_microstep: 845.81 | bwd_inner_microstep: 845.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 11:33:23,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.93 | bwd_microstep: 799.12 | bwd_inner_microstep: 799.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2591
[2024-06-10 11:33:24,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.97 | bwd_microstep: 950.82 | bwd_inner_microstep: 950.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 11:33:26,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.23 | bwd_microstep: 1252.39 | bwd_inner_microstep: 1252.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3709
[2024-06-10 11:33:28,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1332.21 | bwd_inner_microstep: 1332.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 11:33:30,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.21 | bwd_microstep: 1394.12 | bwd_inner_microstep: 1394.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-10 11:33:32,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.11 | bwd_microstep: 1526.97 | bwd_inner_microstep: 1526.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 11:33:34,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.62 | bwd_microstep: 1387.48 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 11:33:36,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1380.60 | bwd_inner_microstep: 1380.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 11:33:38,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.59 | bwd_microstep: 1411.77 | bwd_inner_microstep: 1411.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3914
[2024-06-10 11:33:40,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.37 | bwd_microstep: 1686.03 | bwd_inner_microstep: 1686.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 11:33:41,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.27 | bwd_microstep: 800.46 | bwd_inner_microstep: 800.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1854
[2024-06-10 11:33:42,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.06 | bwd_microstep: 672.63 | bwd_inner_microstep: 672.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 11:33:44,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.92 | bwd_microstep: 1482.96 | bwd_inner_microstep: 1482.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 11:33:46,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 1409.87 | bwd_inner_microstep: 1409.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3428
[2024-06-10 11:33:48,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.12 | bwd_microstep: 1409.92 | bwd_inner_microstep: 1409.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 11:33:50,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.82 | bwd_microstep: 1403.21 | bwd_inner_microstep: 1403.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3467
[2024-06-10 11:33:52,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.20 | bwd_microstep: 1340.48 | bwd_inner_microstep: 1340.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 11:33:54,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.81 | bwd_microstep: 1380.65 | bwd_inner_microstep: 1380.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3431
[2024-06-10 11:33:56,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.07 | bwd_microstep: 1542.51 | bwd_inner_microstep: 1542.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-10 11:33:58,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.79 | bwd_microstep: 1426.89 | bwd_inner_microstep: 1426.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2021
[2024-06-10 11:33:59,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.08 | bwd_microstep: 716.85 | bwd_inner_microstep: 716.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816
[2024-06-10 11:34:01,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.47 | bwd_microstep: 1515.46 | bwd_inner_microstep: 1515.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 11:34:03,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.95 | bwd_microstep: 1503.77 | bwd_inner_microstep: 1503.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 11:34:05,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.65 | bwd_microstep: 1547.86 | bwd_inner_microstep: 1547.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 11:34:07,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.96 | bwd_microstep: 1255.46 | bwd_inner_microstep: 1255.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 11:34:09,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1304.39 | bwd_inner_microstep: 1304.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 11:34:11,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.86 | bwd_microstep: 1538.86 | bwd_inner_microstep: 1538.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3081
[2024-06-10 11:34:17,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 11:34:17,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.70 | bwd_microstep: 6082.37 | bwd_inner_microstep: 1289.58 | bwd_allreduce_microstep: 4792.74 | step_microstep: 37.95
[2024-06-10 11:34:17,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15784.77 | bwd: 46958.45 | bwd_inner: 42164.76 | bwd_allreduce: 4792.98 | step: 39.73
{'loss': 1.2657, 'learning_rate': 2.9309799032312775e-05, 'epoch': 0.37}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385
[2024-06-10 11:34:19,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.89 | bwd_microstep: 1327.60 | bwd_inner_microstep: 1327.41 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3887
[2024-06-10 11:34:21,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.21 | bwd_microstep: 1579.12 | bwd_inner_microstep: 1579.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 11:34:24,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1551.90 | bwd_inner_microstep: 1551.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 11:34:25,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.00 | bwd_microstep: 1283.82 | bwd_inner_microstep: 1283.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3733
[2024-06-10 11:34:27,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.06 | bwd_microstep: 1363.87 | bwd_inner_microstep: 1363.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2233
[2024-06-10 11:34:28,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.29 | bwd_microstep: 862.97 | bwd_inner_microstep: 862.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2065
[2024-06-10 11:34:29,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.45 | bwd_microstep: 728.32 | bwd_inner_microstep: 728.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3713
[2024-06-10 11:34:31,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.85 | bwd_microstep: 1462.62 | bwd_inner_microstep: 1462.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 11:34:33,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.47 | bwd_microstep: 799.62 | bwd_inner_microstep: 799.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3657
[2024-06-10 11:34:34,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.98 | bwd_microstep: 1356.05 | bwd_inner_microstep: 1356.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661
[2024-06-10 11:34:37,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.39 | bwd_microstep: 1614.39 | bwd_inner_microstep: 1614.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 11:34:39,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3668
[2024-06-10 11:34:41,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.27 | bwd_microstep: 1615.88 | bwd_inner_microstep: 1615.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 11:34:43,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.38 | bwd_microstep: 1340.17 | bwd_inner_microstep: 1340.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 11:34:44,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.96 | bwd_microstep: 1348.37 | bwd_inner_microstep: 1348.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2636
[2024-06-10 11:34:46,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 425.89 | bwd_microstep: 1149.36 | bwd_inner_microstep: 1149.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3716
[2024-06-10 11:34:48,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.46 | bwd_microstep: 1700.23 | bwd_inner_microstep: 1700.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 11:34:50,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.81 | bwd_microstep: 1391.91 | bwd_inner_microstep: 1391.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 11:34:52,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.86 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 11:34:54,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.69 | bwd_microstep: 1406.30 | bwd_inner_microstep: 1406.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3888
[2024-06-10 11:34:56,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.34 | bwd_microstep: 1591.50 | bwd_inner_microstep: 1591.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301
[2024-06-10 11:34:58,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.48 | bwd_microstep: 983.20 | bwd_inner_microstep: 983.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554
[2024-06-10 11:35:00,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.87 | bwd_microstep: 1303.88 | bwd_inner_microstep: 1303.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 11:35:01,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.81 | bwd_microstep: 1353.94 | bwd_inner_microstep: 1353.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 11:35:04,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.53 | bwd_microstep: 1499.26 | bwd_inner_microstep: 1499.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3755
[2024-06-10 11:35:05,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.05 | bwd_microstep: 1253.01 | bwd_inner_microstep: 1252.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3813
[2024-06-10 11:35:07,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.20 | bwd_microstep: 1520.61 | bwd_inner_microstep: 1520.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2234
[2024-06-10 11:35:09,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.71 | bwd_microstep: 898.87 | bwd_inner_microstep: 898.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1930
[2024-06-10 11:35:10,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.97 | bwd_microstep: 776.11 | bwd_inner_microstep: 776.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 11:35:12,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1555.44 | bwd_inner_microstep: 1555.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3810
[2024-06-10 11:35:14,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.83 | bwd_microstep: 1857.17 | bwd_inner_microstep: 1857.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484
[2024-06-10 11:35:18,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 11:35:18,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.74 | bwd_microstep: 3382.87 | bwd_inner_microstep: 1511.82 | bwd_allreduce_microstep: 1871.00 | step_microstep: 38.32
[2024-06-10 11:35:18,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15927.75 | bwd: 44627.27 | bwd_inner: 42755.22 | bwd_allreduce: 1871.31 | step: 40.32
{'loss': 1.2696, 'learning_rate': 2.927656310505466e-05, 'epoch': 0.37}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 11:35:20,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.40 | bwd_microstep: 1472.97 | bwd_inner_microstep: 1472.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2834
[2024-06-10 11:35:22,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.29 | bwd_microstep: 1018.41 | bwd_inner_microstep: 1018.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 11:35:24,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.96 | bwd_microstep: 1557.38 | bwd_inner_microstep: 1557.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 11:35:26,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.33 | bwd_microstep: 1410.84 | bwd_inner_microstep: 1410.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 11:35:28,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1282.47 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 11:35:30,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.84 | bwd_microstep: 1480.42 | bwd_inner_microstep: 1480.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3077
[2024-06-10 11:35:31,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.64 | bwd_microstep: 1131.02 | bwd_inner_microstep: 1130.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3608
[2024-06-10 11:35:33,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.65 | bwd_microstep: 1439.66 | bwd_inner_microstep: 1439.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3687
[2024-06-10 11:35:35,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.07 | bwd_microstep: 1374.45 | bwd_inner_microstep: 1374.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 11:35:37,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.19 | bwd_microstep: 1477.18 | bwd_inner_microstep: 1477.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 11:35:39,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.69 | bwd_microstep: 1349.20 | bwd_inner_microstep: 1349.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505
[2024-06-10 11:35:41,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.02 | bwd_microstep: 1584.12 | bwd_inner_microstep: 1584.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 11:35:43,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1396.71 | bwd_inner_microstep: 1396.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1956
[2024-06-10 11:35:44,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.28 | bwd_microstep: 893.87 | bwd_inner_microstep: 893.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2407
[2024-06-10 11:35:46,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.03 | bwd_microstep: 1005.40 | bwd_inner_microstep: 1005.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465
[2024-06-10 11:35:47,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.94 | bwd_microstep: 1184.81 | bwd_inner_microstep: 1184.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 11:35:49,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1391.93 | bwd_inner_microstep: 1391.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 11:35:52,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.40 | bwd_microstep: 1626.86 | bwd_inner_microstep: 1626.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 11:35:53,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1377.48 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3541
[2024-06-10 11:35:55,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.39 | bwd_microstep: 1325.68 | bwd_inner_microstep: 1325.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 11:35:57,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.15 | bwd_microstep: 1296.73 | bwd_inner_microstep: 1296.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 11:35:59,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1512.58 | bwd_inner_microstep: 1512.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3662
[2024-06-10 11:36:01,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.55 | bwd_microstep: 1453.83 | bwd_inner_microstep: 1453.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-10 11:36:03,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.70 | bwd_microstep: 1317.26 | bwd_inner_microstep: 1317.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3614
[2024-06-10 11:36:05,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.48 | bwd_microstep: 1707.77 | bwd_inner_microstep: 1707.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2012
[2024-06-10 11:36:06,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.58 | bwd_microstep: 777.34 | bwd_inner_microstep: 777.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1908
[2024-06-10 11:36:08,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.44 | bwd_microstep: 766.41 | bwd_inner_microstep: 766.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 11:36:09,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.26 | bwd_microstep: 1375.20 | bwd_inner_microstep: 1375.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3425
[2024-06-10 11:36:12,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.37 | bwd_microstep: 1540.43 | bwd_inner_microstep: 1540.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3625
[2024-06-10 11:36:13,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.58 | bwd_microstep: 1350.92 | bwd_inner_microstep: 1350.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2714
[2024-06-10 11:36:15,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 424.19 | bwd_microstep: 1137.16 | bwd_inner_microstep: 1137.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.23
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 11:36:20,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.33 | optimizer_step: 6.61
[2024-06-10 11:36:20,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 4709.55 | bwd_inner_microstep: 1565.84 | bwd_allreduce_microstep: 3143.65 | step_microstep: 38.57
[2024-06-10 11:36:20,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15895.37 | bwd: 45726.04 | bwd_inner: 42581.45 | bwd_allreduce: 3143.88 | step: 40.61
{'loss': 1.291, 'learning_rate': 2.9243294505789514e-05, 'epoch': 0.37}
 36%|███▋      | 628/1726 [10:53:51<18:57:37, 62.17s/it]
 36%|███▋      | 629/1726 [10:54:52<18:50:04, 61.81s/it]
 37%|███▋      | 630/1726 [10:55:51<18:35:35, 61.07s/it]
 37%|███▋      | 631/1726 [10:56:54<18:45:39, 61.68s/it]
 37%|███▋      | 632/1726 [10:57:55<18:40:27, 61.45s/it]
 37%|███▋      | 633/1726 [10:58:57<18:42:21, 61.61s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 11:36:22,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.62 | bwd_microstep: 1381.19 | bwd_inner_microstep: 1381.10 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 11:36:24,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1353.21 | bwd_inner_microstep: 1353.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4351
[2024-06-10 11:36:26,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.18 | bwd_microstep: 1572.97 | bwd_inner_microstep: 1572.74 | bwd_allreduce_microstep: 0.15 | step_microstep: 0.23
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 11:36:28,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.39 | bwd_microstep: 1385.27 | bwd_inner_microstep: 1385.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 11:36:30,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1398.27 | bwd_inner_microstep: 1398.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 11:36:32,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.21 | bwd_microstep: 1387.97 | bwd_inner_microstep: 1387.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 11:36:34,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.98 | bwd_microstep: 1148.65 | bwd_inner_microstep: 1148.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2189
[2024-06-10 11:36:35,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.77 | bwd_microstep: 953.41 | bwd_inner_microstep: 953.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582
[2024-06-10 11:36:37,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.29 | bwd_microstep: 1304.74 | bwd_inner_microstep: 1304.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777
[2024-06-10 11:36:39,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1649.95 | bwd_inner_microstep: 1649.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 11:36:41,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1531.77 | bwd_inner_microstep: 1531.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 11:36:43,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1311.61 | bwd_inner_microstep: 1311.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418
[2024-06-10 11:36:45,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.12 | bwd_microstep: 1189.44 | bwd_inner_microstep: 1189.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3948
[2024-06-10 11:36:47,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.80 | bwd_microstep: 1625.19 | bwd_inner_microstep: 1625.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3438
[2024-06-10 11:36:49,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.54 | bwd_microstep: 1477.05 | bwd_inner_microstep: 1477.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3637
[2024-06-10 11:36:51,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.59 | bwd_microstep: 1579.08 | bwd_inner_microstep: 1579.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 11:36:53,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.16 | bwd_microstep: 1343.55 | bwd_inner_microstep: 1343.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 11:36:55,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1405.66 | bwd_inner_microstep: 1405.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 11:36:57,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.58 | bwd_microstep: 1557.08 | bwd_inner_microstep: 1557.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2933
[2024-06-10 11:36:59,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.00 | bwd_microstep: 1099.10 | bwd_inner_microstep: 1099.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 11:37:00,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.90 | bwd_microstep: 1278.89 | bwd_inner_microstep: 1278.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 11:37:02,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1559.23 | bwd_inner_microstep: 1559.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623
[2024-06-10 11:37:04,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.86 | bwd_microstep: 1315.77 | bwd_inner_microstep: 1315.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 11:37:06,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.49 | bwd_microstep: 1499.14 | bwd_inner_microstep: 1499.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 11:37:09,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.10 | bwd_microstep: 1611.87 | bwd_inner_microstep: 1611.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3832
[2024-06-10 11:37:11,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.34 | bwd_microstep: 1520.24 | bwd_inner_microstep: 1520.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4182
[2024-06-10 11:37:13,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.07 | bwd_microstep: 1688.84 | bwd_inner_microstep: 1688.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3586
[2024-06-10 11:37:15,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.50 | bwd_microstep: 1641.62 | bwd_inner_microstep: 1641.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 11:37:17,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.90 | bwd_microstep: 1501.07 | bwd_inner_microstep: 1501.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 11:37:19,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1248.15 | bwd_inner_microstep: 1248.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3764
[2024-06-10 11:37:21,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.34 | bwd_microstep: 1741.61 | bwd_inner_microstep: 1741.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 11:37:24,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.16 | optimizer_step: 6.64
[2024-06-10 11:37:24,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.63 | bwd_microstep: 1571.17 | bwd_inner_microstep: 1563.40 | bwd_allreduce_microstep: 7.73 | step_microstep: 37.74
[2024-06-10 11:37:24,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17110.45 | bwd: 45832.81 | bwd_inner: 45823.89 | bwd_allreduce: 8.15 | step: 39.75
{'loss': 1.2623, 'learning_rate': 2.920999335168917e-05, 'epoch': 0.37}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 11:37:26,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.08 | bwd_microstep: 1621.60 | bwd_inner_microstep: 1621.40 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 11:37:27,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.84 | bwd_microstep: 792.23 | bwd_inner_microstep: 792.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 11:37:29,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.31 | bwd_microstep: 1652.66 | bwd_inner_microstep: 1652.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 11:37:31,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.55 | bwd_microstep: 1551.71 | bwd_inner_microstep: 1551.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 11:37:33,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1252.30 | bwd_inner_microstep: 1252.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3407
[2024-06-10 11:37:35,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.78 | bwd_microstep: 1188.69 | bwd_inner_microstep: 1188.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3646
[2024-06-10 11:37:37,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1381.50 | bwd_inner_microstep: 1381.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2652
[2024-06-10 11:37:38,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.92 | bwd_microstep: 1114.14 | bwd_inner_microstep: 1114.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2210
[2024-06-10 11:37:39,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.13 | bwd_microstep: 864.56 | bwd_inner_microstep: 864.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3527
[2024-06-10 11:37:41,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.15 | bwd_microstep: 1440.77 | bwd_inner_microstep: 1440.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 11:37:43,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.29 | bwd_microstep: 1380.37 | bwd_inner_microstep: 1380.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 11:37:45,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1380.19 | bwd_inner_microstep: 1380.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 11:37:47,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.41 | bwd_microstep: 1150.82 | bwd_inner_microstep: 1150.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-10 11:37:49,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.45 | bwd_microstep: 1524.74 | bwd_inner_microstep: 1524.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 11:37:51,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1296.59 | bwd_inner_microstep: 1296.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 11:37:53,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.21 | bwd_microstep: 1347.48 | bwd_inner_microstep: 1347.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 11:37:54,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.71 | bwd_microstep: 1385.40 | bwd_inner_microstep: 1385.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3941
[2024-06-10 11:37:56,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.56 | bwd_microstep: 1433.46 | bwd_inner_microstep: 1433.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3541
[2024-06-10 11:37:58,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.69 | bwd_microstep: 1410.10 | bwd_inner_microstep: 1410.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 11:38:00,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.46 | bwd_microstep: 1296.32 | bwd_inner_microstep: 1296.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 11:38:02,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 1451.28 | bwd_inner_microstep: 1451.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 11:38:04,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.55 | bwd_microstep: 1258.64 | bwd_inner_microstep: 1258.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924
[2024-06-10 11:38:05,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.14 | bwd_microstep: 697.38 | bwd_inner_microstep: 697.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 11:38:07,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3214
[2024-06-10 11:38:09,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.60 | bwd_microstep: 1271.92 | bwd_inner_microstep: 1271.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 11:38:11,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.10 | bwd_microstep: 1439.79 | bwd_inner_microstep: 1439.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3552
[2024-06-10 11:38:12,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.12 | bwd_microstep: 1236.95 | bwd_inner_microstep: 1236.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3815
[2024-06-10 11:38:14,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1501.05 | bwd_inner_microstep: 1501.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 11:38:16,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.76 | bwd_microstep: 1381.13 | bwd_inner_microstep: 1381.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 11:38:17,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 793.21 | bwd_inner_microstep: 793.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 11:38:19,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.52 | bwd_microstep: 1159.87 | bwd_inner_microstep: 1159.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 11:38:25,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.19 | optimizer_step: 6.56
[2024-06-10 11:38:25,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.21 | bwd_microstep: 5759.62 | bwd_inner_microstep: 1759.38 | bwd_allreduce_microstep: 4000.18 | step_microstep: 38.09
[2024-06-10 11:38:25,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15651.16 | bwd: 45794.74 | bwd_inner: 41793.49 | bwd_allreduce: 4000.49 | step: 39.74
{'loss': 1.2144, 'learning_rate': 2.9176659760040125e-05, 'epoch': 0.37}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 11:38:27,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1365.16 | bwd_inner_microstep: 1364.98 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 11:38:29,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.24 | bwd_microstep: 1276.20 | bwd_inner_microstep: 1276.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3886
[2024-06-10 11:38:31,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.29 | bwd_microstep: 1686.30 | bwd_inner_microstep: 1686.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-10 11:38:33,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.42 | bwd_microstep: 805.24 | bwd_inner_microstep: 805.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 11:38:34,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.00 | bwd_microstep: 1247.49 | bwd_inner_microstep: 1247.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 11:38:37,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.78 | bwd_microstep: 1654.06 | bwd_inner_microstep: 1654.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 11:38:38,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.00 | bwd_microstep: 1385.46 | bwd_inner_microstep: 1385.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3606
[2024-06-10 11:38:40,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.88 | bwd_microstep: 1434.29 | bwd_inner_microstep: 1434.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 11:38:42,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.87 | bwd_microstep: 1153.33 | bwd_inner_microstep: 1153.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 11:38:44,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1389.94 | bwd_inner_microstep: 1389.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 11:38:45,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.25 | bwd_microstep: 789.37 | bwd_inner_microstep: 789.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 11:38:47,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1372.89 | bwd_inner_microstep: 1372.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 11:38:49,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.25 | bwd_microstep: 1288.27 | bwd_inner_microstep: 1288.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 11:38:51,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.44 | bwd_microstep: 1451.08 | bwd_inner_microstep: 1451.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 11:38:53,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.77 | bwd_microstep: 1388.46 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.45
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-10 11:38:54,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.83 | bwd_microstep: 1196.05 | bwd_inner_microstep: 1196.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 11:38:56,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1289.58 | bwd_inner_microstep: 1289.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 11:38:58,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1256.14 | bwd_inner_microstep: 1256.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1967
[2024-06-10 11:38:59,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.23 | bwd_microstep: 703.53 | bwd_inner_microstep: 703.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699
[2024-06-10 11:39:01,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1335.08 | bwd_inner_microstep: 1335.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3568
[2024-06-10 11:39:03,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.17 | bwd_microstep: 1360.14 | bwd_inner_microstep: 1360.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 11:39:04,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.22 | bwd_microstep: 1255.97 | bwd_inner_microstep: 1255.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 11:39:06,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.80 | bwd_microstep: 1399.04 | bwd_inner_microstep: 1399.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3804
[2024-06-10 11:39:08,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.24 | bwd_microstep: 1484.91 | bwd_inner_microstep: 1484.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 11:39:10,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1390.73 | bwd_inner_microstep: 1390.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385
[2024-06-10 11:39:12,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.56 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 11:39:14,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1512.93 | bwd_inner_microstep: 1512.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2644
[2024-06-10 11:39:16,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.10 | bwd_microstep: 1054.65 | bwd_inner_microstep: 1054.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-10 11:39:17,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.37 | bwd_microstep: 1281.02 | bwd_inner_microstep: 1280.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3472
[2024-06-10 11:39:19,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.72 | bwd_microstep: 1217.57 | bwd_inner_microstep: 1217.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-10 11:39:21,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1596.84 | bwd_inner_microstep: 1596.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-10 11:39:27,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.29 | optimizer_step: 6.58
[2024-06-10 11:39:27,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.37 | bwd_microstep: 5164.32 | bwd_inner_microstep: 1311.37 | bwd_allreduce_microstep: 3852.89 | step_microstep: 38.70
[2024-06-10 11:39:27,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15621.83 | bwd: 45529.55 | bwd_inner: 41675.60 | bwd_allreduce: 3853.21 | step: 40.98
{'loss': 1.2276, 'learning_rate': 2.9143293848243103e-05, 'epoch': 0.37}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 11:39:29,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.59 | bwd_microstep: 1280.81 | bwd_inner_microstep: 1280.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 11:39:30,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.01 | bwd_microstep: 1246.40 | bwd_inner_microstep: 1246.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-10 11:39:31,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.99 | bwd_microstep: 679.27 | bwd_inner_microstep: 679.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400
[2024-06-10 11:39:33,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.33 | bwd_microstep: 1368.79 | bwd_inner_microstep: 1368.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3761
[2024-06-10 11:39:36,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.77 | bwd_microstep: 1737.00 | bwd_inner_microstep: 1736.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 11:39:38,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.69 | bwd_microstep: 1648.69 | bwd_inner_microstep: 1648.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 11:39:40,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.89 | bwd_microstep: 1478.34 | bwd_inner_microstep: 1478.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4067
[2024-06-10 11:39:42,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.99 | bwd_microstep: 1620.68 | bwd_inner_microstep: 1620.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 11:39:44,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1387.02 | bwd_inner_microstep: 1386.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 11:39:46,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.93 | bwd_microstep: 1287.63 | bwd_inner_microstep: 1287.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3736
[2024-06-10 11:39:48,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.67 | bwd_microstep: 1490.43 | bwd_inner_microstep: 1490.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 11:39:50,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1251.21 | bwd_inner_microstep: 1251.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3380
[2024-06-10 11:39:51,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.74 | bwd_microstep: 1276.60 | bwd_inner_microstep: 1276.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2123
[2024-06-10 11:39:53,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.43 | bwd_microstep: 925.79 | bwd_inner_microstep: 925.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 11:39:55,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.58 | bwd_microstep: 1628.19 | bwd_inner_microstep: 1628.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-10 11:39:56,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.53 | bwd_microstep: 892.18 | bwd_inner_microstep: 892.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686
[2024-06-10 11:39:58,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1429.98 | bwd_inner_microstep: 1429.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 11:40:00,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.44 | bwd_microstep: 1529.12 | bwd_inner_microstep: 1529.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 11:40:02,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.48 | bwd_microstep: 1312.16 | bwd_inner_microstep: 1312.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 11:40:04,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1491.47 | bwd_inner_microstep: 1491.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1940
[2024-06-10 11:40:05,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.23 | bwd_microstep: 759.87 | bwd_inner_microstep: 759.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 11:40:07,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.38 | bwd_microstep: 1283.47 | bwd_inner_microstep: 1283.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3544
[2024-06-10 11:40:09,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.03 | bwd_microstep: 1199.12 | bwd_inner_microstep: 1199.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 11:40:11,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1558.40 | bwd_inner_microstep: 1558.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3550
[2024-06-10 11:40:13,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.86 | bwd_microstep: 1231.94 | bwd_inner_microstep: 1231.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3549
[2024-06-10 11:40:14,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.91 | bwd_microstep: 1419.08 | bwd_inner_microstep: 1419.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1905
[2024-06-10 11:40:16,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.13 | bwd_microstep: 778.02 | bwd_inner_microstep: 778.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 11:40:17,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 1379.25 | bwd_inner_microstep: 1379.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3579
[2024-06-10 11:40:20,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.02 | bwd_microstep: 1698.00 | bwd_inner_microstep: 1697.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3424
[2024-06-10 11:40:22,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.43 | bwd_microstep: 1490.63 | bwd_inner_microstep: 1490.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3816
[2024-06-10 11:40:24,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.42 | bwd_microstep: 1725.44 | bwd_inner_microstep: 1725.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3578
[2024-06-10 11:40:29,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 11:40:29,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.55 | bwd_microstep: 4325.67 | bwd_inner_microstep: 1920.66 | bwd_allreduce_microstep: 2404.96 | step_microstep: 72.36
[2024-06-10 11:40:29,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16165.52 | bwd: 45810.68 | bwd_inner: 43404.80 | bwd_allreduce: 2405.19 | step: 74.26
{'loss': 1.2861, 'learning_rate': 2.910989573381268e-05, 'epoch': 0.37}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 11:40:31,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.20 | bwd_microstep: 1379.15 | bwd_inner_microstep: 1379.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4672
[2024-06-10 11:40:33,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.43 | bwd_microstep: 1578.21 | bwd_inner_microstep: 1578.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 11:40:35,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 11:40:37,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.51 | bwd_microstep: 1148.07 | bwd_inner_microstep: 1148.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-10 11:40:39,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.06 | bwd_microstep: 1540.06 | bwd_inner_microstep: 1540.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 11:40:41,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951
[2024-06-10 11:40:42,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.99 | bwd_microstep: 729.00 | bwd_inner_microstep: 728.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 11:40:43,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.80 | bwd_microstep: 699.04 | bwd_inner_microstep: 699.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 11:40:45,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1299.21 | bwd_inner_microstep: 1299.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2129
[2024-06-10 11:40:46,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.43 | bwd_microstep: 954.76 | bwd_inner_microstep: 954.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 823
[2024-06-10 11:40:46,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 123.55 | bwd_microstep: 321.94 | bwd_inner_microstep: 321.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3448
[2024-06-10 11:40:48,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.49 | bwd_microstep: 1300.22 | bwd_inner_microstep: 1300.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 11:40:50,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1480.56 | bwd_inner_microstep: 1480.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3600
[2024-06-10 11:40:53,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.78 | bwd_microstep: 1806.92 | bwd_inner_microstep: 1806.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 11:40:55,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 1511.09 | bwd_inner_microstep: 1511.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3877
[2024-06-10 11:40:57,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.29 | bwd_microstep: 1689.31 | bwd_inner_microstep: 1689.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 11:40:59,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.57 | bwd_microstep: 1508.34 | bwd_inner_microstep: 1508.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 11:41:01,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1285.28 | bwd_inner_microstep: 1285.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 11:41:03,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.35 | bwd_microstep: 1508.86 | bwd_inner_microstep: 1508.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3820
[2024-06-10 11:41:05,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.17 | bwd_microstep: 1291.81 | bwd_inner_microstep: 1291.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 609
[2024-06-10 11:41:05,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.90 | bwd_microstep: 260.83 | bwd_inner_microstep: 260.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377
[2024-06-10 11:41:07,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.58 | bwd_microstep: 1240.53 | bwd_inner_microstep: 1240.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 11:41:09,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.15 | bwd_microstep: 1353.32 | bwd_inner_microstep: 1353.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 11:41:11,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.82 | bwd_microstep: 1480.74 | bwd_inner_microstep: 1480.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278
[2024-06-10 11:41:12,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.68 | bwd_microstep: 908.95 | bwd_inner_microstep: 908.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-10 11:41:14,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.58 | bwd_microstep: 1642.87 | bwd_inner_microstep: 1642.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 11:41:16,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.77 | bwd_microstep: 1530.18 | bwd_inner_microstep: 1530.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812
[2024-06-10 11:41:19,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.26 | bwd_microstep: 1512.58 | bwd_inner_microstep: 1512.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-10 11:41:21,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.55 | bwd_microstep: 1598.73 | bwd_inner_microstep: 1598.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 11:41:23,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1500.35 | bwd_inner_microstep: 1500.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2044
[2024-06-10 11:41:24,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.20 | bwd_microstep: 841.59 | bwd_inner_microstep: 841.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 11:41:33,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.40 | optimizer_step: 6.60
[2024-06-10 11:41:33,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.95 | bwd_microstep: 8010.83 | bwd_inner_microstep: 1680.82 | bwd_allreduce_microstep: 6329.94 | step_microstep: 39.83
[2024-06-10 11:41:33,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15379.91 | bwd: 47576.73 | bwd_inner: 41245.87 | bwd_allreduce: 6330.18 | step: 41.38
{'loss': 1.2814, 'learning_rate': 2.9076465534376847e-05, 'epoch': 0.37}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 11:41:35,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.85 | bwd_microstep: 1415.45 | bwd_inner_microstep: 1415.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2764
[2024-06-10 11:41:36,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.06 | bwd_microstep: 1097.41 | bwd_inner_microstep: 1097.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3900
[2024-06-10 11:41:38,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.37 | bwd_microstep: 1583.01 | bwd_inner_microstep: 1582.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3863
[2024-06-10 11:41:40,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1393.69 | bwd_inner_microstep: 1393.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 11:41:42,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.67 | bwd_microstep: 1552.30 | bwd_inner_microstep: 1552.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3752
[2024-06-10 11:41:44,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.79 | bwd_microstep: 1433.89 | bwd_inner_microstep: 1433.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 11:41:46,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.31 | bwd_microstep: 1380.13 | bwd_inner_microstep: 1380.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 11:41:48,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.46 | bwd_microstep: 1281.65 | bwd_inner_microstep: 1281.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-10 11:41:50,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.75 | bwd_microstep: 1315.48 | bwd_inner_microstep: 1315.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3678
[2024-06-10 11:41:52,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.09 | bwd_microstep: 1667.26 | bwd_inner_microstep: 1667.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-10 11:41:54,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.36 | bwd_microstep: 1522.93 | bwd_inner_microstep: 1522.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3495
[2024-06-10 11:41:56,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.78 | bwd_microstep: 1679.95 | bwd_inner_microstep: 1679.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2101
[2024-06-10 11:41:58,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.04 | bwd_microstep: 855.41 | bwd_inner_microstep: 855.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 11:42:00,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.57 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3578
[2024-06-10 11:42:02,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.07 | bwd_microstep: 1461.63 | bwd_inner_microstep: 1461.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 11:42:03,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1246.38 | bwd_inner_microstep: 1246.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 11:42:05,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.77 | bwd_microstep: 1282.83 | bwd_inner_microstep: 1282.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 11:42:07,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.11 | bwd_microstep: 1493.58 | bwd_inner_microstep: 1493.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2016
[2024-06-10 11:42:08,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.41 | bwd_microstep: 833.52 | bwd_inner_microstep: 833.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 11:42:10,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.93 | bwd_microstep: 1287.83 | bwd_inner_microstep: 1287.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.38
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3627
[2024-06-10 11:42:12,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.24 | bwd_microstep: 1219.82 | bwd_inner_microstep: 1219.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 11:42:14,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.34 | bwd_microstep: 1439.40 | bwd_inner_microstep: 1439.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 11:42:16,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 1510.05 | bwd_inner_microstep: 1510.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 645
[2024-06-10 11:42:16,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 111.01 | bwd_microstep: 274.74 | bwd_inner_microstep: 274.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2718
[2024-06-10 11:42:18,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.35 | bwd_microstep: 947.10 | bwd_inner_microstep: 947.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666
[2024-06-10 11:42:19,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1326.84 | bwd_inner_microstep: 1326.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 11:42:21,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.13 | bwd_microstep: 1285.52 | bwd_inner_microstep: 1285.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-10 11:42:23,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.88 | bwd_microstep: 1632.78 | bwd_inner_microstep: 1632.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 11:42:25,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.12 | bwd_microstep: 1308.81 | bwd_inner_microstep: 1308.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3813
[2024-06-10 11:42:27,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.33 | bwd_microstep: 1580.66 | bwd_inner_microstep: 1580.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3476
[2024-06-10 11:42:29,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.10 | bwd_microstep: 1410.98 | bwd_inner_microstep: 1410.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 11:42:33,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 11:42:33,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.25 | bwd_microstep: 3399.34 | bwd_inner_microstep: 1104.86 | bwd_allreduce_microstep: 2294.42 | step_microstep: 39.09
[2024-06-10 11:42:33,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15792.42 | bwd: 44501.17 | bwd_inner: 42205.79 | bwd_allreduce: 2294.67 | step: 42.41
 37%|███▋      | 633/1726 [10:58:57<18:42:21, 61.61s/it]
 37%|███▋      | 634/1726 [11:00:00<18:50:38, 62.12s/it]
 37%|███▋      | 635/1726 [11:01:02<18:47:48, 62.02s/it]
 37%|███▋      | 636/1726 [11:02:04<18:43:54, 61.87s/it]
 37%|███▋      | 637/1726 [11:03:06<18:45:38, 62.02s/it]
 37%|███▋      | 638/1726 [11:04:09<18:51:34, 62.40s/it]
{'loss': 1.2251, 'learning_rate': 2.9043003367676577e-05, 'epoch': 0.37}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3464
[2024-06-10 11:42:35,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.62 | bwd_microstep: 1560.27 | bwd_inner_microstep: 1560.11 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.17
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 11:42:37,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1313.59 | bwd_inner_microstep: 1313.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 11:42:39,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.69 | bwd_microstep: 975.51 | bwd_inner_microstep: 975.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-10 11:42:40,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.92 | bwd_microstep: 1342.61 | bwd_inner_microstep: 1342.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 777
[2024-06-10 11:42:41,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.88 | bwd_microstep: 305.87 | bwd_inner_microstep: 305.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 11:42:43,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 1385.24 | bwd_inner_microstep: 1385.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405
[2024-06-10 11:42:44,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.06 | bwd_microstep: 1187.10 | bwd_inner_microstep: 1187.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2213
[2024-06-10 11:42:46,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.08 | bwd_microstep: 959.21 | bwd_inner_microstep: 959.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 11:42:48,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.24 | bwd_microstep: 1391.49 | bwd_inner_microstep: 1391.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 11:42:49,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.34 | bwd_microstep: 1278.62 | bwd_inner_microstep: 1278.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3403
[2024-06-10 11:42:51,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.18 | bwd_microstep: 1308.05 | bwd_inner_microstep: 1308.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3498
[2024-06-10 11:42:53,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.61 | bwd_microstep: 1563.56 | bwd_inner_microstep: 1563.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 11:42:55,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.47 | bwd_microstep: 1483.24 | bwd_inner_microstep: 1483.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3501
[2024-06-10 11:42:58,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.43 | bwd_microstep: 1553.30 | bwd_inner_microstep: 1553.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3450
[2024-06-10 11:43:00,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.74 | bwd_microstep: 1514.90 | bwd_inner_microstep: 1514.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583
[2024-06-10 11:43:02,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1459.17 | bwd_inner_microstep: 1459.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 11:43:04,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1348.55 | bwd_inner_microstep: 1348.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 11:43:06,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 1493.79 | bwd_inner_microstep: 1493.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 11:43:08,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 1451.14 | bwd_inner_microstep: 1451.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 11:43:10,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1399.20 | bwd_inner_microstep: 1399.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 11:43:11,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.63 | bwd_microstep: 1384.96 | bwd_inner_microstep: 1384.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3524
[2024-06-10 11:43:13,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.23 | bwd_microstep: 1244.03 | bwd_inner_microstep: 1244.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 11:43:15,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.75 | bwd_microstep: 1499.16 | bwd_inner_microstep: 1499.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 11:43:17,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.82 | bwd_microstep: 1295.17 | bwd_inner_microstep: 1295.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 11:43:19,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.53 | bwd_microstep: 1652.41 | bwd_inner_microstep: 1652.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052
[2024-06-10 11:43:21,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.29 | bwd_microstep: 916.71 | bwd_inner_microstep: 916.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 11:43:22,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1352.47 | bwd_inner_microstep: 1352.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553
[2024-06-10 11:43:25,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.25 | bwd_microstep: 1602.44 | bwd_inner_microstep: 1602.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3814
[2024-06-10 11:43:27,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.55 | bwd_microstep: 1623.96 | bwd_inner_microstep: 1623.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 11:43:29,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.09 | bwd_microstep: 1446.16 | bwd_inner_microstep: 1446.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 11:43:31,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.45 | bwd_microstep: 1303.91 | bwd_inner_microstep: 1303.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 11:43:35,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.27 | optimizer_step: 6.63
[2024-06-10 11:43:35,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.86 | bwd_microstep: 3200.74 | bwd_inner_microstep: 1551.92 | bwd_allreduce_microstep: 1648.76 | step_microstep: 38.26
[2024-06-10 11:43:35,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16104.91 | bwd: 44796.57 | bwd_inner: 43146.75 | bwd_allreduce: 1649.07 | step: 40.17
{'loss': 1.3072, 'learning_rate': 2.9009509351565462e-05, 'epoch': 0.37}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3544
[2024-06-10 11:43:36,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.38 | bwd_microstep: 1230.97 | bwd_inner_microstep: 1230.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 11:43:38,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1395.97 | bwd_inner_microstep: 1395.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860
[2024-06-10 11:43:40,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.48 | bwd_microstep: 1563.84 | bwd_inner_microstep: 1563.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 11:43:42,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.83 | bwd_microstep: 1480.82 | bwd_inner_microstep: 1480.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040
[2024-06-10 11:43:43,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.22 | bwd_microstep: 806.16 | bwd_inner_microstep: 806.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3749
[2024-06-10 11:43:46,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.30 | bwd_microstep: 1587.44 | bwd_inner_microstep: 1587.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 11:43:48,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.30 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 11:43:49,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.19 | bwd_microstep: 799.61 | bwd_inner_microstep: 799.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-10 11:43:50,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.31 | bwd_microstep: 1194.53 | bwd_inner_microstep: 1194.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 11:43:52,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1255.04 | bwd_inner_microstep: 1255.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2946
[2024-06-10 11:43:54,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.67 | bwd_microstep: 1198.02 | bwd_inner_microstep: 1197.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510
[2024-06-10 11:43:56,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.43 | bwd_microstep: 1348.27 | bwd_inner_microstep: 1348.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 11:43:57,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.03 | bwd_microstep: 793.80 | bwd_inner_microstep: 793.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2152
[2024-06-10 11:43:58,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.01 | bwd_microstep: 820.43 | bwd_inner_microstep: 820.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 11:44:00,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.08 | bwd_microstep: 1379.70 | bwd_inner_microstep: 1379.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 11:44:02,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.36 | bwd_microstep: 1342.23 | bwd_inner_microstep: 1342.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3523
[2024-06-10 11:44:03,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.49 | bwd_microstep: 1342.20 | bwd_inner_microstep: 1342.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3584
[2024-06-10 11:44:05,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.80 | bwd_microstep: 1238.62 | bwd_inner_microstep: 1238.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 11:44:07,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1430.29 | bwd_inner_microstep: 1430.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 11:44:09,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.82 | bwd_microstep: 1284.84 | bwd_inner_microstep: 1284.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2114
[2024-06-10 11:44:10,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.41 | bwd_microstep: 924.90 | bwd_inner_microstep: 924.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 11:44:11,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.65 | bwd_microstep: 808.59 | bwd_inner_microstep: 808.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 11:44:14,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.85 | bwd_microstep: 1554.50 | bwd_inner_microstep: 1554.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 11:44:15,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1258.35 | bwd_inner_microstep: 1258.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3672
[2024-06-10 11:44:18,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.73 | bwd_microstep: 1822.87 | bwd_inner_microstep: 1822.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 11:44:20,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.55 | bwd_microstep: 1629.73 | bwd_inner_microstep: 1629.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559
[2024-06-10 11:44:22,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.27 | bwd_microstep: 1596.11 | bwd_inner_microstep: 1595.85 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-10 11:44:24,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.78 | bwd_microstep: 1637.27 | bwd_inner_microstep: 1637.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 11:44:27,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1550.63 | bwd_inner_microstep: 1550.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2092
[2024-06-10 11:44:28,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.69 | bwd_microstep: 823.53 | bwd_inner_microstep: 823.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 11:44:30,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.39 | bwd_microstep: 1535.84 | bwd_inner_microstep: 1535.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.01
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3442
[2024-06-10 11:44:36,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.32 | optimizer_step: 6.61
[2024-06-10 11:44:36,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.82 | bwd_microstep: 5235.10 | bwd_inner_microstep: 1565.50 | bwd_allreduce_microstep: 3669.52 | step_microstep: 38.61
[2024-06-10 11:44:36,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15505.43 | bwd: 45251.67 | bwd_inner: 41581.03 | bwd_allreduce: 3669.88 | step: 41.46
{'loss': 1.2391, 'learning_rate': 2.897598360400925e-05, 'epoch': 0.37}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 11:44:38,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.32 | bwd_microstep: 1443.31 | bwd_inner_microstep: 1443.16 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2459
[2024-06-10 11:44:39,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.65 | bwd_microstep: 1012.75 | bwd_inner_microstep: 1012.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939
[2024-06-10 11:44:40,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.93 | bwd_microstep: 820.16 | bwd_inner_microstep: 820.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 11:44:42,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1416.77 | bwd_inner_microstep: 1416.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732
[2024-06-10 11:44:44,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.23 | bwd_microstep: 1631.20 | bwd_inner_microstep: 1631.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1874
[2024-06-10 11:44:45,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.96 | bwd_microstep: 680.16 | bwd_inner_microstep: 680.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 11:44:47,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.76 | bwd_microstep: 1384.22 | bwd_inner_microstep: 1384.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 11:44:49,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.19 | bwd_microstep: 1252.27 | bwd_inner_microstep: 1252.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 11:44:51,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.65 | bwd_microstep: 1155.99 | bwd_inner_microstep: 1155.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 11:44:52,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.22 | bwd_microstep: 1356.15 | bwd_inner_microstep: 1356.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 11:44:54,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.47 | bwd_microstep: 1395.55 | bwd_inner_microstep: 1395.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1968
[2024-06-10 11:44:56,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.66 | bwd_microstep: 858.80 | bwd_inner_microstep: 858.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3681
[2024-06-10 11:44:58,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.84 | bwd_microstep: 1724.09 | bwd_inner_microstep: 1724.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3515
[2024-06-10 11:45:00,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.87 | bwd_microstep: 1367.41 | bwd_inner_microstep: 1367.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3645
[2024-06-10 11:45:02,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.19 | bwd_microstep: 1746.37 | bwd_inner_microstep: 1746.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979
[2024-06-10 11:45:03,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.26 | bwd_microstep: 893.80 | bwd_inner_microstep: 893.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3524
[2024-06-10 11:45:06,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.11 | bwd_microstep: 1524.70 | bwd_inner_microstep: 1524.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3625
[2024-06-10 11:45:08,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.87 | bwd_microstep: 1433.59 | bwd_inner_microstep: 1433.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 11:45:09,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.18 | bwd_microstep: 1412.69 | bwd_inner_microstep: 1412.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-10 11:45:12,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.27 | bwd_microstep: 1506.21 | bwd_inner_microstep: 1506.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 11:45:14,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1487.37 | bwd_inner_microstep: 1487.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1977
[2024-06-10 11:45:15,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.97 | bwd_microstep: 893.65 | bwd_inner_microstep: 893.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 11:45:17,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1280.47 | bwd_inner_microstep: 1280.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 11:45:19,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.51 | bwd_microstep: 1657.40 | bwd_inner_microstep: 1657.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-10 11:45:21,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.40 | bwd_microstep: 1564.90 | bwd_inner_microstep: 1564.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 11:45:23,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.62 | bwd_microstep: 1404.95 | bwd_inner_microstep: 1404.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 11:45:25,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1376.48 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 11:45:27,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1352.74 | bwd_inner_microstep: 1352.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 11:45:29,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 1390.24 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 11:45:31,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1494.87 | bwd_inner_microstep: 1494.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 11:45:33,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.77 | bwd_microstep: 1289.36 | bwd_inner_microstep: 1289.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 11:45:35,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-10 11:45:35,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.21 | bwd_microstep: 1892.91 | bwd_inner_microstep: 1442.10 | bwd_allreduce_microstep: 450.76 | step_microstep: 37.89
[2024-06-10 11:45:35,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15891.67 | bwd: 43101.57 | bwd_inner: 42649.79 | bwd_allreduce: 451.05 | step: 39.59
{'loss': 1.2723, 'learning_rate': 2.894242624308544e-05, 'epoch': 0.37}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 11:45:37,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.35 | bwd_microstep: 1477.64 | bwd_inner_microstep: 1477.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2362
[2024-06-10 11:45:38,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.65 | bwd_microstep: 989.40 | bwd_inner_microstep: 989.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 11:45:41,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.91 | bwd_microstep: 1555.34 | bwd_inner_microstep: 1555.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 11:45:42,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1250.71 | bwd_inner_microstep: 1250.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741
[2024-06-10 11:45:45,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.57 | bwd_microstep: 1634.32 | bwd_inner_microstep: 1634.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 11:45:46,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.21 | bwd_microstep: 1383.15 | bwd_inner_microstep: 1383.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 11:45:48,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.20 | bwd_microstep: 1403.92 | bwd_inner_microstep: 1403.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2046
[2024-06-10 11:45:49,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.21 | bwd_microstep: 718.64 | bwd_inner_microstep: 718.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 11:45:51,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 1254.74 | bwd_inner_microstep: 1254.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2487
[2024-06-10 11:45:53,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.78 | bwd_microstep: 1054.92 | bwd_inner_microstep: 1054.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3503
[2024-06-10 11:45:54,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1336.21 | bwd_inner_microstep: 1336.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3507
[2024-06-10 11:45:57,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1629.71 | bwd_inner_microstep: 1629.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3461
[2024-06-10 11:45:58,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1324.33 | bwd_inner_microstep: 1324.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 11:46:01,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.65 | bwd_microstep: 1475.89 | bwd_inner_microstep: 1475.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1966
[2024-06-10 11:46:02,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.35 | bwd_microstep: 893.11 | bwd_inner_microstep: 893.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3504
[2024-06-10 11:46:04,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.48 | bwd_microstep: 1586.32 | bwd_inner_microstep: 1586.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3623
[2024-06-10 11:46:06,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1437.52 | bwd_inner_microstep: 1437.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3761
[2024-06-10 11:46:08,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1280.03 | bwd_inner_microstep: 1280.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 11:46:10,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.60 | bwd_microstep: 1297.91 | bwd_inner_microstep: 1297.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105
[2024-06-10 11:46:11,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.19 | bwd_microstep: 922.67 | bwd_inner_microstep: 922.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080
[2024-06-10 11:46:12,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.65 | bwd_microstep: 917.38 | bwd_inner_microstep: 917.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-10 11:46:14,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.33 | bwd_microstep: 1650.90 | bwd_inner_microstep: 1650.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 11:46:16,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.42 | bwd_microstep: 1456.17 | bwd_inner_microstep: 1456.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 11:46:18,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1405.94 | bwd_inner_microstep: 1405.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653
[2024-06-10 11:46:20,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.45 | bwd_microstep: 1420.59 | bwd_inner_microstep: 1420.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 11:46:22,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.01 | bwd_microstep: 1291.59 | bwd_inner_microstep: 1291.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3469
[2024-06-10 11:46:24,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.53 | bwd_microstep: 1267.06 | bwd_inner_microstep: 1267.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1386
[2024-06-10 11:46:25,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 214.28 | bwd_microstep: 557.47 | bwd_inner_microstep: 557.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3558
[2024-06-10 11:46:26,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.61 | bwd_microstep: 1345.45 | bwd_inner_microstep: 1345.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3087
[2024-06-10 11:46:28,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.59 | bwd_microstep: 1196.10 | bwd_inner_microstep: 1196.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 11:46:30,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1493.80 | bwd_inner_microstep: 1493.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3603
[2024-06-10 11:46:37,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.29 | optimizer_step: 6.57
[2024-06-10 11:46:37,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.71 | bwd_microstep: 6616.78 | bwd_inner_microstep: 1930.43 | bwd_allreduce_microstep: 4686.29 | step_microstep: 39.47
[2024-06-10 11:46:37,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15613.37 | bwd: 46525.74 | bwd_inner: 41838.51 | bwd_allreduce: 4686.53 | step: 41.37
{'loss': 1.2489, 'learning_rate': 2.890883738698289e-05, 'epoch': 0.37}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3529
[2024-06-10 11:46:40,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.07 | bwd_microstep: 1590.95 | bwd_inner_microstep: 1590.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3884
[2024-06-10 11:46:42,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.15 | bwd_microstep: 1580.30 | bwd_inner_microstep: 1580.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865
[2024-06-10 11:46:44,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.13 | bwd_microstep: 1559.52 | bwd_inner_microstep: 1559.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 11:46:46,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 11:46:48,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.97 | bwd_microstep: 1552.96 | bwd_inner_microstep: 1552.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746
[2024-06-10 11:46:50,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.30 | bwd_microstep: 1440.51 | bwd_inner_microstep: 1440.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 11:46:52,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.83 | bwd_microstep: 1383.12 | bwd_inner_microstep: 1383.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 11:46:53,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.12 | bwd_microstep: 698.89 | bwd_inner_microstep: 698.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1478
[2024-06-10 11:46:54,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 221.10 | bwd_microstep: 583.55 | bwd_inner_microstep: 583.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428
[2024-06-10 11:46:55,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.03 | bwd_microstep: 1282.59 | bwd_inner_microstep: 1282.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 11:46:57,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.91 | bwd_microstep: 1483.38 | bwd_inner_microstep: 1483.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3378
[2024-06-10 11:46:59,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1338.69 | bwd_inner_microstep: 1338.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3641
[2024-06-10 11:47:01,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.19 | bwd_microstep: 1545.05 | bwd_inner_microstep: 1545.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3495
[2024-06-10 11:47:04,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.93 | bwd_microstep: 1679.74 | bwd_inner_microstep: 1679.53 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 11:47:05,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.57 | bwd_microstep: 806.48 | bwd_inner_microstep: 806.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2102
[2024-06-10 11:47:06,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.79 | bwd_microstep: 918.11 | bwd_inner_microstep: 918.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 11:47:08,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1691.66 | bwd_inner_microstep: 1691.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 11:47:10,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.89 | bwd_microstep: 1509.58 | bwd_inner_microstep: 1509.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 11:47:13,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.98 | bwd_microstep: 1563.79 | bwd_inner_microstep: 1563.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 11:47:15,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.31 | bwd_microstep: 1511.45 | bwd_inner_microstep: 1511.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 11:47:17,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.45 | bwd_microstep: 1404.75 | bwd_inner_microstep: 1404.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901
[2024-06-10 11:47:19,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.74 | bwd_microstep: 1600.64 | bwd_inner_microstep: 1600.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3580
[2024-06-10 11:47:21,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.35 | bwd_microstep: 1452.42 | bwd_inner_microstep: 1452.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822
[2024-06-10 11:47:23,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.96 | bwd_microstep: 1393.68 | bwd_inner_microstep: 1393.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 11:47:25,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.44 | bwd_microstep: 1411.06 | bwd_inner_microstep: 1411.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2282
[2024-06-10 11:47:26,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.25 | bwd_microstep: 1040.50 | bwd_inner_microstep: 1040.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1943
[2024-06-10 11:47:27,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.91 | bwd_microstep: 735.84 | bwd_inner_microstep: 735.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 11:47:29,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.12 | bwd_microstep: 1402.92 | bwd_inner_microstep: 1402.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065
[2024-06-10 11:47:30,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.05 | bwd_microstep: 822.07 | bwd_inner_microstep: 822.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613
[2024-06-10 11:47:32,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.05 | bwd_microstep: 1253.09 | bwd_inner_microstep: 1253.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3735
[2024-06-10 11:47:34,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.78 | bwd_microstep: 1350.33 | bwd_inner_microstep: 1350.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465
[2024-06-10 11:47:40,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.78 | optimizer_step: 6.60
[2024-06-10 11:47:40,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.54 | bwd_microstep: 5947.10 | bwd_inner_microstep: 1783.55 | bwd_allreduce_microstep: 4163.45 | step_microstep: 42.22
[2024-06-10 11:47:40,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15864.22 | bwd: 46786.88 | bwd_inner: 42622.17 | bwd_allreduce: 4163.88 | step: 44.42
 37%|███▋      | 639/1726 [11:05:10<18:41:03, 61.88s/it]
 37%|███▋      | 640/1726 [11:06:11<18:36:43, 61.70s/it]
 37%|███▋      | 641/1726 [11:07:12<18:32:33, 61.52s/it]
 37%|███▋      | 642/1726 [11:08:12<18:19:46, 60.87s/it]
 37%|███▋      | 643/1726 [11:09:14<18:27:33, 61.36s/it]
 37%|███▋      | 644/1726 [11:10:17<18:35:33, 61.86s/it]
{'loss': 1.2283, 'learning_rate': 2.887521715400137e-05, 'epoch': 0.37}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412
[2024-06-10 11:47:42,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 1275.71 | bwd_inner_microstep: 1275.57 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 11:47:44,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.65 | bwd_microstep: 1276.68 | bwd_inner_microstep: 1276.26 | bwd_allreduce_microstep: 0.20 | step_microstep: 0.39
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3912
[2024-06-10 11:47:46,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.76 | bwd_microstep: 1690.64 | bwd_inner_microstep: 1690.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 11:47:48,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.32 | bwd_microstep: 1400.39 | bwd_inner_microstep: 1400.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 11:47:50,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.76 | bwd_microstep: 1344.87 | bwd_inner_microstep: 1344.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 11:47:52,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.93 | bwd_microstep: 1259.40 | bwd_inner_microstep: 1259.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.33
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929
[2024-06-10 11:47:53,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.19 | bwd_microstep: 772.81 | bwd_inner_microstep: 772.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4022
[2024-06-10 11:47:55,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.29 | bwd_microstep: 1630.49 | bwd_inner_microstep: 1630.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 11:47:57,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.07 | bwd_microstep: 1160.94 | bwd_inner_microstep: 1160.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 11:47:58,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.89 | bwd_microstep: 819.67 | bwd_inner_microstep: 819.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3922
[2024-06-10 11:48:00,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.48 | bwd_microstep: 1499.64 | bwd_inner_microstep: 1499.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-10 11:48:02,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.61 | bwd_microstep: 1316.05 | bwd_inner_microstep: 1316.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465
[2024-06-10 11:48:04,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1329.90 | bwd_inner_microstep: 1329.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 11:48:06,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.16 | bwd_microstep: 1480.42 | bwd_inner_microstep: 1480.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 11:48:08,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1409.03 | bwd_inner_microstep: 1409.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 11:48:10,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.89 | bwd_microstep: 1251.84 | bwd_inner_microstep: 1251.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3821
[2024-06-10 11:48:12,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.86 | bwd_microstep: 1830.69 | bwd_inner_microstep: 1830.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 11:48:14,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.43 | bwd_microstep: 1478.92 | bwd_inner_microstep: 1478.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 11:48:16,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.70 | bwd_microstep: 1567.59 | bwd_inner_microstep: 1567.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 11:48:18,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.62 | bwd_microstep: 1420.74 | bwd_inner_microstep: 1420.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972
[2024-06-10 11:48:19,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.38 | bwd_microstep: 703.51 | bwd_inner_microstep: 703.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 11:48:21,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1352.85 | bwd_inner_microstep: 1352.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 11:48:23,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.33 | bwd_microstep: 1398.62 | bwd_inner_microstep: 1398.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-10 11:48:25,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.45 | bwd_microstep: 1318.63 | bwd_inner_microstep: 1318.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-10 11:48:27,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.64 | bwd_microstep: 1424.95 | bwd_inner_microstep: 1424.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3814
[2024-06-10 11:48:29,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1504.84 | bwd_inner_microstep: 1504.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3805
[2024-06-10 11:48:31,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.84 | bwd_microstep: 1720.50 | bwd_inner_microstep: 1720.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-10 11:48:33,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.18 | bwd_microstep: 974.44 | bwd_inner_microstep: 974.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 11:48:35,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.30 | bwd_microstep: 1546.57 | bwd_inner_microstep: 1546.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2187
[2024-06-10 11:48:36,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.83 | bwd_microstep: 1053.04 | bwd_inner_microstep: 1053.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 11:48:38,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.22 | bwd_microstep: 1542.69 | bwd_inner_microstep: 1542.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3546
[2024-06-10 11:48:44,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.30 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 11:48:44,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.29 | bwd_microstep: 5161.40 | bwd_inner_microstep: 1376.16 | bwd_allreduce_microstep: 3785.19 | step_microstep: 39.73
[2024-06-10 11:48:44,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16181.76 | bwd: 46918.51 | bwd_inner: 43131.97 | bwd_allreduce: 3785.67 | step: 42.07
{'loss': 1.2724, 'learning_rate': 2.8841565662551187e-05, 'epoch': 0.37}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 11:48:46,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.44 | bwd_microstep: 1477.51 | bwd_inner_microstep: 1477.33 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 11:48:48,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.86 | bwd_microstep: 1479.31 | bwd_inner_microstep: 1479.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3852
[2024-06-10 11:48:50,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.88 | bwd_microstep: 1561.03 | bwd_inner_microstep: 1561.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 11:48:51,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 790.29 | bwd_inner_microstep: 790.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 11:48:53,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 11:48:55,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1253.57 | bwd_inner_microstep: 1253.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.91
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 11:48:57,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.37 | bwd_microstep: 1386.40 | bwd_inner_microstep: 1386.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2194
[2024-06-10 11:48:58,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.87 | bwd_microstep: 955.78 | bwd_inner_microstep: 955.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 11:49:00,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.54 | bwd_microstep: 1536.63 | bwd_inner_microstep: 1536.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 11:49:02,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1253.04 | bwd_inner_microstep: 1253.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 11:49:04,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.61 | bwd_microstep: 1287.49 | bwd_inner_microstep: 1287.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 11:49:06,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.51 | bwd_microstep: 1315.28 | bwd_inner_microstep: 1315.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1967
[2024-06-10 11:49:07,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.93 | bwd_microstep: 841.15 | bwd_inner_microstep: 841.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 11:49:08,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.22 | bwd_microstep: 1258.49 | bwd_inner_microstep: 1258.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3488
[2024-06-10 11:49:11,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.79 | bwd_microstep: 1547.07 | bwd_inner_microstep: 1547.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 11:49:13,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.43 | bwd_microstep: 1451.19 | bwd_inner_microstep: 1451.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 11:49:14,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.81 | bwd_microstep: 1381.59 | bwd_inner_microstep: 1381.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 11:49:17,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.03 | bwd_microstep: 1475.99 | bwd_inner_microstep: 1475.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 11:49:19,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.05 | bwd_microstep: 1644.42 | bwd_inner_microstep: 1644.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 11:49:21,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.65 | bwd_microstep: 1477.78 | bwd_inner_microstep: 1477.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 11:49:23,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.51 | bwd_microstep: 1514.71 | bwd_inner_microstep: 1514.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 11:49:25,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1415.72 | bwd_inner_microstep: 1415.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-10 11:49:26,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.39 | bwd_microstep: 978.33 | bwd_inner_microstep: 978.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 11:49:28,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1382.84 | bwd_inner_microstep: 1382.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706
[2024-06-10 11:49:30,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.91 | bwd_microstep: 1436.23 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 11:49:32,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.24 | bwd_microstep: 1354.70 | bwd_inner_microstep: 1354.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 11:49:34,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.75 | bwd_microstep: 1469.46 | bwd_inner_microstep: 1469.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3763
[2024-06-10 11:49:36,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.91 | bwd_microstep: 1749.42 | bwd_inner_microstep: 1749.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3476
[2024-06-10 11:49:38,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.69 | bwd_microstep: 1461.22 | bwd_inner_microstep: 1461.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3736
[2024-06-10 11:49:40,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.21 | bwd_microstep: 1463.74 | bwd_inner_microstep: 1463.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2069
[2024-06-10 11:49:42,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.51 | bwd_microstep: 757.67 | bwd_inner_microstep: 757.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 11:49:45,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.38 | optimizer_step: 6.62
[2024-06-10 11:49:45,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.40 | bwd_microstep: 2994.90 | bwd_inner_microstep: 1681.62 | bwd_allreduce_microstep: 1313.20 | step_microstep: 39.22
[2024-06-10 11:49:45,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16145.68 | bwd: 44635.36 | bwd_inner: 43321.09 | bwd_allreduce: 1313.51 | step: 42.78
{'loss': 1.2589, 'learning_rate': 2.880788303115269e-05, 'epoch': 0.37}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 11:49:47,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.04 | bwd_microstep: 1473.28 | bwd_inner_microstep: 1473.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4375
[2024-06-10 11:49:50,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.84 | bwd_microstep: 1707.69 | bwd_inner_microstep: 1707.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 11:49:52,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1480.62 | bwd_inner_microstep: 1480.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 11:49:53,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1249.72 | bwd_inner_microstep: 1249.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 11:49:55,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.36 | bwd_microstep: 1387.27 | bwd_inner_microstep: 1387.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 11:49:57,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.52 | bwd_microstep: 1382.44 | bwd_inner_microstep: 1382.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 11:49:59,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.15 | bwd_microstep: 1253.84 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2054
[2024-06-10 11:50:00,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.40 | bwd_microstep: 819.12 | bwd_inner_microstep: 819.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 11:50:02,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.89 | bwd_microstep: 1253.04 | bwd_inner_microstep: 1253.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 45, images per sample: 11.25, dynamic token length: 3672
[2024-06-10 11:50:04,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.86 | bwd_microstep: 1707.05 | bwd_inner_microstep: 1707.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3412
[2024-06-10 11:50:06,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.72 | bwd_microstep: 1373.65 | bwd_inner_microstep: 1373.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.13
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3487
[2024-06-10 11:50:08,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.71 | bwd_microstep: 1440.35 | bwd_inner_microstep: 1440.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653
[2024-06-10 11:50:10,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.28 | bwd_microstep: 1611.74 | bwd_inner_microstep: 1611.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 11:50:12,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.79 | bwd_microstep: 1285.40 | bwd_inner_microstep: 1285.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3535
[2024-06-10 11:50:14,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.39 | bwd_microstep: 1591.85 | bwd_inner_microstep: 1591.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1981
[2024-06-10 11:50:15,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.86 | bwd_microstep: 892.35 | bwd_inner_microstep: 892.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3867
[2024-06-10 11:50:18,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.17 | bwd_microstep: 1768.05 | bwd_inner_microstep: 1768.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 982
[2024-06-10 11:50:18,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 166.00 | bwd_microstep: 423.83 | bwd_inner_microstep: 423.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 11:50:20,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.83 | bwd_microstep: 1432.68 | bwd_inner_microstep: 1432.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 11:50:22,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.04 | bwd_microstep: 1377.53 | bwd_inner_microstep: 1377.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 11:50:24,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2361
[2024-06-10 11:50:26,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.61 | bwd_microstep: 1089.42 | bwd_inner_microstep: 1089.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2577
[2024-06-10 11:50:27,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.80 | bwd_microstep: 1195.25 | bwd_inner_microstep: 1195.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400
[2024-06-10 11:50:29,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 1441.63 | bwd_inner_microstep: 1441.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1852
[2024-06-10 11:50:30,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.15 | bwd_microstep: 672.97 | bwd_inner_microstep: 672.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 11:50:33,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.27 | bwd_microstep: 1653.47 | bwd_inner_microstep: 1653.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2275
[2024-06-10 11:50:34,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.63 | bwd_microstep: 1070.49 | bwd_inner_microstep: 1070.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 11:50:35,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.92 | bwd_microstep: 973.55 | bwd_inner_microstep: 973.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-10 11:50:38,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.04 | bwd_microstep: 1601.76 | bwd_inner_microstep: 1601.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 11:50:39,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.69 | bwd_microstep: 979.17 | bwd_inner_microstep: 979.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 11:50:41,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.57 | bwd_microstep: 1403.14 | bwd_inner_microstep: 1403.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 11:50:47,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.63
[2024-06-10 11:50:47,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.70 | bwd_microstep: 5248.53 | bwd_inner_microstep: 1686.13 | bwd_allreduce_microstep: 3562.35 | step_microstep: 38.69
[2024-06-10 11:50:47,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15663.31 | bwd: 45638.25 | bwd_inner: 42074.88 | bwd_allreduce: 3562.63 | step: 42.65
{'loss': 1.2226, 'learning_rate': 2.877416937843595e-05, 'epoch': 0.37}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4070
[2024-06-10 11:50:49,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.60 | bwd_microstep: 1698.71 | bwd_inner_microstep: 1698.50 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 11:50:51,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.23 | bwd_microstep: 1379.19 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3883
[2024-06-10 11:50:53,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.44 | bwd_microstep: 1543.15 | bwd_inner_microstep: 1543.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 11:50:55,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.05 | bwd_microstep: 1344.70 | bwd_inner_microstep: 1344.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3745
[2024-06-10 11:50:57,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.14 | bwd_microstep: 1444.90 | bwd_inner_microstep: 1444.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 876
[2024-06-10 11:50:58,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.66 | bwd_microstep: 367.73 | bwd_inner_microstep: 367.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 11:51:00,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.73 | bwd_microstep: 1481.74 | bwd_inner_microstep: 1481.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 11:51:02,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1436.98 | bwd_inner_microstep: 1436.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-10 11:51:03,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.90 | bwd_microstep: 1194.06 | bwd_inner_microstep: 1194.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 11:51:05,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1296.27 | bwd_inner_microstep: 1296.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 11:51:07,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.55 | bwd_microstep: 1287.07 | bwd_inner_microstep: 1287.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 11:51:09,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.66 | bwd_microstep: 1286.30 | bwd_inner_microstep: 1286.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 11:51:11,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1376.60 | bwd_inner_microstep: 1376.47 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.22
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493
[2024-06-10 11:51:12,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 1316.44 | bwd_inner_microstep: 1316.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3508
[2024-06-10 11:51:15,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.60 | bwd_microstep: 1651.47 | bwd_inner_microstep: 1651.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 11:51:16,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1248.99 | bwd_inner_microstep: 1248.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 11:51:18,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 11:51:20,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.45 | bwd_microstep: 1411.70 | bwd_inner_microstep: 1411.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3492
[2024-06-10 11:51:22,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.44 | bwd_microstep: 1429.56 | bwd_inner_microstep: 1429.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937
[2024-06-10 11:51:23,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.82 | bwd_microstep: 702.63 | bwd_inner_microstep: 702.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3593
[2024-06-10 11:51:25,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.59 | bwd_microstep: 1806.21 | bwd_inner_microstep: 1806.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 11:51:28,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.83 | bwd_microstep: 1658.51 | bwd_inner_microstep: 1658.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 11:51:30,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.34 | bwd_microstep: 1553.05 | bwd_inner_microstep: 1553.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 11:51:32,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1400.65 | bwd_inner_microstep: 1400.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519
[2024-06-10 11:51:34,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.33 | bwd_microstep: 1585.94 | bwd_inner_microstep: 1585.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-10 11:51:36,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1310.29 | bwd_inner_microstep: 1310.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037
[2024-06-10 11:51:37,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.84 | bwd_microstep: 808.45 | bwd_inner_microstep: 808.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2917
[2024-06-10 11:51:39,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.04 | bwd_microstep: 1193.37 | bwd_inner_microstep: 1193.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-10 11:51:41,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 1509.98 | bwd_inner_microstep: 1509.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 11:51:42,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1283.54 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 11:51:45,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.91 | bwd_microstep: 1559.67 | bwd_inner_microstep: 1559.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 11:51:51,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.36 | optimizer_step: 6.58
[2024-06-10 11:51:51,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.42 | bwd_microstep: 6117.59 | bwd_inner_microstep: 2161.68 | bwd_allreduce_microstep: 3955.84 | step_microstep: 39.00
[2024-06-10 11:51:51,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16261.58 | bwd: 47930.71 | bwd_inner: 43973.56 | bwd_allreduce: 3956.30 | step: 41.27
{'loss': 1.3081, 'learning_rate': 2.8740424823140268e-05, 'epoch': 0.38}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 11:51:53,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1348.37 | bwd_inner_microstep: 1348.28 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 11:51:55,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.85 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2372
[2024-06-10 11:51:56,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.56 | bwd_microstep: 960.53 | bwd_inner_microstep: 960.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-10 11:51:57,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.17 | bwd_microstep: 680.22 | bwd_inner_microstep: 680.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 11:51:59,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1384.42 | bwd_inner_microstep: 1384.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3483
[2024-06-10 11:52:01,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.53 | bwd_microstep: 1243.71 | bwd_inner_microstep: 1243.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4118
[2024-06-10 11:52:03,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.87 | bwd_microstep: 1569.09 | bwd_inner_microstep: 1569.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3591
[2024-06-10 11:52:05,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.09 | bwd_microstep: 1356.28 | bwd_inner_microstep: 1356.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 11:52:07,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.60 | bwd_microstep: 1249.32 | bwd_inner_microstep: 1249.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 11:52:09,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1386.31 | bwd_inner_microstep: 1386.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 11:52:11,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1540.58 | bwd_inner_microstep: 1540.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425
[2024-06-10 11:52:12,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.40 | bwd_microstep: 1155.53 | bwd_inner_microstep: 1155.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 11:52:14,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1282.67 | bwd_inner_microstep: 1282.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3661
[2024-06-10 11:52:17,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.92 | bwd_microstep: 1716.92 | bwd_inner_microstep: 1716.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-10 11:52:19,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.77 | bwd_microstep: 1586.11 | bwd_inner_microstep: 1586.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 11:52:21,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.22 | bwd_microstep: 1599.88 | bwd_inner_microstep: 1599.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3497
[2024-06-10 11:52:23,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.39 | bwd_microstep: 1649.64 | bwd_inner_microstep: 1649.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3643
[2024-06-10 11:52:25,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.35 | bwd_microstep: 1603.94 | bwd_inner_microstep: 1603.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2391
[2024-06-10 11:52:27,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.12 | bwd_microstep: 1030.41 | bwd_inner_microstep: 1030.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 11:52:29,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1483.81 | bwd_inner_microstep: 1483.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 11:52:31,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.00 | bwd_microstep: 1352.54 | bwd_inner_microstep: 1352.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 11:52:32,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.96 | bwd_microstep: 975.99 | bwd_inner_microstep: 975.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3832
[2024-06-10 11:52:34,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.87 | bwd_microstep: 1622.17 | bwd_inner_microstep: 1622.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 11:52:37,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.46 | bwd_microstep: 1627.22 | bwd_inner_microstep: 1627.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 11:52:38,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.02 | bwd_microstep: 1350.57 | bwd_inner_microstep: 1350.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914
[2024-06-10 11:52:39,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.27 | bwd_microstep: 684.02 | bwd_inner_microstep: 683.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 11:52:41,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.76 | bwd_microstep: 808.93 | bwd_inner_microstep: 808.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2056
[2024-06-10 11:52:42,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.43 | bwd_microstep: 850.14 | bwd_inner_microstep: 850.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3786
[2024-06-10 11:52:44,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.53 | bwd_microstep: 1446.66 | bwd_inner_microstep: 1446.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 11:52:46,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.40 | bwd_microstep: 1450.02 | bwd_inner_microstep: 1450.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742
[2024-06-10 11:52:48,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1632.32 | bwd_inner_microstep: 1632.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3749
[2024-06-10 11:52:51,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.27 | optimizer_step: 6.63
[2024-06-10 11:52:51,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.98 | bwd_microstep: 2604.75 | bwd_inner_microstep: 1861.33 | bwd_allreduce_microstep: 743.37 | step_microstep: 38.58
[2024-06-10 11:52:51,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15922.92 | bwd: 43618.49 | bwd_inner: 42874.10 | bwd_allreduce: 743.66 | step: 40.37
{'loss': 1.2421, 'learning_rate': 2.87066494841138e-05, 'epoch': 0.38}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1864
[2024-06-10 11:52:52,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.45 | bwd_microstep: 763.17 | bwd_inner_microstep: 763.04 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3956
[2024-06-10 11:52:55,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.13 | bwd_microstep: 1598.22 | bwd_inner_microstep: 1598.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2365
[2024-06-10 11:52:56,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.34 | bwd_microstep: 925.57 | bwd_inner_microstep: 925.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-10 11:52:58,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.18 | bwd_microstep: 1449.10 | bwd_inner_microstep: 1449.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3801
[2024-06-10 11:53:00,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.03 | bwd_microstep: 1579.13 | bwd_inner_microstep: 1578.94 | bwd_allreduce_microstep: 0.14 | step_microstep: 0.33
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 11:53:02,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 1251.89 | bwd_inner_microstep: 1251.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 11:53:03,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.12 | bwd_microstep: 1184.47 | bwd_inner_microstep: 1184.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 11:53:05,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1390.31 | bwd_inner_microstep: 1390.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 11:53:07,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.69 | bwd_microstep: 1288.85 | bwd_inner_microstep: 1288.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2175
[2024-06-10 11:53:08,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.46 | bwd_microstep: 948.21 | bwd_inner_microstep: 948.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3395
[2024-06-10 11:53:10,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.72 | bwd_microstep: 1312.52 | bwd_inner_microstep: 1312.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3681
[2024-06-10 11:53:12,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.34 | bwd_microstep: 1495.45 | bwd_inner_microstep: 1495.10 | bwd_allreduce_microstep: 0.20 | step_microstep: 0.33
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3462
[2024-06-10 11:53:14,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.82 | bwd_microstep: 1328.25 | bwd_inner_microstep: 1328.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 11:53:16,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1535.91 | bwd_inner_microstep: 1535.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-10 11:53:18,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.69 | bwd_microstep: 1522.83 | bwd_inner_microstep: 1522.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 11:53:20,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.07 | bwd_microstep: 1287.59 | bwd_inner_microstep: 1287.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 11:53:22,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1284.29 | bwd_inner_microstep: 1284.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2304
[2024-06-10 11:53:23,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.22 | bwd_microstep: 981.69 | bwd_inner_microstep: 981.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-10 11:53:24,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.92 | bwd_microstep: 695.31 | bwd_inner_microstep: 695.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 11:53:26,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.86 | bwd_microstep: 1185.87 | bwd_inner_microstep: 1185.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-10 11:53:28,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.73 | bwd_microstep: 1763.97 | bwd_inner_microstep: 1763.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 11:53:30,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1253.05 | bwd_inner_microstep: 1253.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 11:53:31,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.50 | bwd_microstep: 684.93 | bwd_inner_microstep: 684.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 11:53:33,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1560.22 | bwd_inner_microstep: 1560.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 11:53:35,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.19 | bwd_microstep: 1256.48 | bwd_inner_microstep: 1256.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2231
[2024-06-10 11:53:36,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.85 | bwd_microstep: 960.73 | bwd_inner_microstep: 960.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3601
[2024-06-10 11:53:38,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.14 | bwd_microstep: 1344.84 | bwd_inner_microstep: 1344.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3802
[2024-06-10 11:53:40,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.93 | bwd_microstep: 1579.35 | bwd_inner_microstep: 1579.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3778
[2024-06-10 11:53:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1403.19 | bwd_inner_microstep: 1403.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3576
[2024-06-10 11:53:44,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1546.58 | bwd_inner_microstep: 1546.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-10 11:53:45,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.92 | bwd_microstep: 973.36 | bwd_inner_microstep: 973.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3806
[2024-06-10 11:53:51,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.32 | optimizer_step: 6.62
[2024-06-10 11:53:51,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.24 | bwd_microstep: 5256.24 | bwd_inner_microstep: 1826.00 | bwd_allreduce_microstep: 3430.18 | step_microstep: 38.96
[2024-06-10 11:53:51,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15032.85 | bwd: 44591.58 | bwd_inner: 41159.91 | bwd_allreduce: 3430.81 | step: 41.36
 37%|███▋      | 644/1726 [11:10:17<18:35:33, 61.86s/it]
 37%|███▋      | 645/1726 [11:11:21<18:43:12, 62.34s/it]
 37%|███▋      | 646/1726 [11:12:22<18:35:41, 61.98s/it]
 37%|███▋      | 647/1726 [11:13:24<18:32:58, 61.89s/it]
 38%|███▊      | 648/1726 [11:14:28<18:46:22, 62.69s/it]
 38%|███▊      | 649/1726 [11:15:28<18:30:18, 61.86s/it]
{'loss': 1.2657, 'learning_rate': 2.8672843480313108e-05, 'epoch': 0.38}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 11:53:53,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.76 | bwd_microstep: 1605.08 | bwd_inner_microstep: 1605.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3598
[2024-06-10 11:53:55,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.18 | bwd_microstep: 1268.34 | bwd_inner_microstep: 1268.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474
[2024-06-10 11:53:57,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.97 | bwd_microstep: 1408.78 | bwd_inner_microstep: 1408.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3918
[2024-06-10 11:53:59,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 1495.39 | bwd_inner_microstep: 1495.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1886
[2024-06-10 11:54:00,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.92 | bwd_microstep: 711.46 | bwd_inner_microstep: 711.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-10 11:54:01,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.37 | bwd_microstep: 684.38 | bwd_inner_microstep: 684.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490
[2024-06-10 11:54:03,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.76 | bwd_microstep: 1222.60 | bwd_inner_microstep: 1222.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3658
[2024-06-10 11:54:05,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.16 | bwd_microstep: 1231.31 | bwd_inner_microstep: 1231.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 11:54:06,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.88 | bwd_microstep: 804.36 | bwd_inner_microstep: 804.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 11:54:08,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.31 | bwd_microstep: 1480.89 | bwd_inner_microstep: 1480.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 11:54:10,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.59 | bwd_microstep: 1380.31 | bwd_inner_microstep: 1380.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3454
[2024-06-10 11:54:12,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.80 | bwd_microstep: 1368.45 | bwd_inner_microstep: 1368.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 11:54:14,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.17 | bwd_microstep: 1598.06 | bwd_inner_microstep: 1598.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 11:54:16,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1388.31 | bwd_inner_microstep: 1388.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 11:54:17,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.81 | bwd_microstep: 1242.62 | bwd_inner_microstep: 1242.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 11:54:19,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.64 | bwd_microstep: 1250.35 | bwd_inner_microstep: 1250.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3438
[2024-06-10 11:54:21,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.57 | bwd_microstep: 1217.79 | bwd_inner_microstep: 1217.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 11:54:23,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1489.20 | bwd_inner_microstep: 1489.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1921
[2024-06-10 11:54:24,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.56 | bwd_microstep: 725.35 | bwd_inner_microstep: 725.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 11:54:26,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.79 | bwd_microstep: 1459.41 | bwd_inner_microstep: 1459.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 11:54:27,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.08 | bwd_microstep: 803.82 | bwd_inner_microstep: 803.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 11:54:29,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.71 | bwd_microstep: 1520.46 | bwd_inner_microstep: 1520.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 11:54:31,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.21 | bwd_microstep: 1295.21 | bwd_inner_microstep: 1295.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 11:54:33,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1257.76 | bwd_inner_microstep: 1257.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2019
[2024-06-10 11:54:34,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.65 | bwd_microstep: 745.03 | bwd_inner_microstep: 745.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3543
[2024-06-10 11:54:36,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.97 | bwd_microstep: 1424.02 | bwd_inner_microstep: 1423.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3601
[2024-06-10 11:54:38,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1433.73 | bwd_inner_microstep: 1433.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 11:54:40,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.95 | bwd_microstep: 1478.28 | bwd_inner_microstep: 1478.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 11:54:42,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1342.55 | bwd_inner_microstep: 1342.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3809
[2024-06-10 11:54:44,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.09 | bwd_microstep: 1854.47 | bwd_inner_microstep: 1854.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 11:54:46,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 1554.57 | bwd_inner_microstep: 1554.39 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 11:54:52,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.36 | optimizer_step: 6.60
[2024-06-10 11:54:52,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 5192.12 | bwd_inner_microstep: 1876.22 | bwd_allreduce_microstep: 3315.83 | step_microstep: 38.92
[2024-06-10 11:54:52,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15471.38 | bwd: 44934.53 | bwd_inner: 41617.63 | bwd_allreduce: 3316.12 | step: 40.77
{'loss': 1.2324, 'learning_rate': 2.8639006930802762e-05, 'epoch': 0.38}
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1335
[2024-06-10 11:54:53,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 196.97 | bwd_microstep: 511.12 | bwd_inner_microstep: 510.97 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 11:54:54,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1243.36 | bwd_inner_microstep: 1243.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3960
[2024-06-10 11:54:56,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.20 | bwd_microstep: 1428.50 | bwd_inner_microstep: 1428.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3403
[2024-06-10 11:54:58,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.60 | bwd_microstep: 1372.84 | bwd_inner_microstep: 1372.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 11:55:00,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.09 | bwd_microstep: 1379.16 | bwd_inner_microstep: 1379.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2485
[2024-06-10 11:55:02,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.73 | bwd_microstep: 1005.16 | bwd_inner_microstep: 1005.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3408
[2024-06-10 11:55:03,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.25 | bwd_microstep: 1312.91 | bwd_inner_microstep: 1312.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697
[2024-06-10 11:55:05,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.78 | bwd_microstep: 1429.99 | bwd_inner_microstep: 1429.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 11:55:07,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1244.03 | bwd_inner_microstep: 1244.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662
[2024-06-10 11:55:09,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.53 | bwd_microstep: 1426.85 | bwd_inner_microstep: 1426.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 11:55:11,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.66 | bwd_microstep: 1539.02 | bwd_inner_microstep: 1539.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1917
[2024-06-10 11:55:12,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.34 | bwd_microstep: 749.77 | bwd_inner_microstep: 749.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3506
[2024-06-10 11:55:14,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.56 | bwd_microstep: 1513.40 | bwd_inner_microstep: 1513.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 11:55:17,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.83 | bwd_microstep: 1618.51 | bwd_inner_microstep: 1618.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2419
[2024-06-10 11:55:18,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.32 | bwd_microstep: 1010.85 | bwd_inner_microstep: 1010.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3047
[2024-06-10 11:55:20,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.92 | bwd_microstep: 1265.18 | bwd_inner_microstep: 1265.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 11:55:22,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1411.78 | bwd_inner_microstep: 1411.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 11:55:24,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1348.24 | bwd_inner_microstep: 1348.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 11:55:26,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.58 | bwd_microstep: 1602.23 | bwd_inner_microstep: 1602.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3425
[2024-06-10 11:55:28,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.70 | bwd_microstep: 1294.94 | bwd_inner_microstep: 1294.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 11:55:30,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.94 | bwd_microstep: 1556.99 | bwd_inner_microstep: 1556.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3824
[2024-06-10 11:55:32,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.21 | bwd_microstep: 1717.61 | bwd_inner_microstep: 1717.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706
[2024-06-10 11:55:34,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.24 | bwd_microstep: 1632.72 | bwd_inner_microstep: 1632.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2003
[2024-06-10 11:55:35,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.61 | bwd_microstep: 710.96 | bwd_inner_microstep: 710.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793
[2024-06-10 11:55:37,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.86 | bwd_microstep: 1456.47 | bwd_inner_microstep: 1456.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2288
[2024-06-10 11:55:39,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.77 | bwd_microstep: 910.26 | bwd_inner_microstep: 910.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 11:55:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.15 | bwd_microstep: 1539.48 | bwd_inner_microstep: 1539.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 11:55:43,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.87 | bwd_microstep: 1558.32 | bwd_inner_microstep: 1558.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563
[2024-06-10 11:55:45,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.42 | bwd_microstep: 1440.96 | bwd_inner_microstep: 1440.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 11:55:47,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1376.69 | bwd_inner_microstep: 1376.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2231
[2024-06-10 11:55:48,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.48 | bwd_microstep: 771.98 | bwd_inner_microstep: 771.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 11:55:53,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.10 | optimizer_gradients: 4.44 | optimizer_step: 6.59
[2024-06-10 11:55:53,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 4895.23 | bwd_inner_microstep: 1562.44 | bwd_allreduce_microstep: 3332.71 | step_microstep: 39.88
[2024-06-10 11:55:53,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15625.35 | bwd: 45275.52 | bwd_inner: 41941.77 | bwd_allreduce: 3333.00 | step: 41.69
{'loss': 1.2461, 'learning_rate': 2.8605139954754923e-05, 'epoch': 0.38}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 11:55:55,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.34 | bwd_microstep: 1328.46 | bwd_inner_microstep: 1328.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 11:55:56,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.95 | bwd_microstep: 787.94 | bwd_inner_microstep: 787.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867
[2024-06-10 11:55:58,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.13 | bwd_microstep: 1660.68 | bwd_inner_microstep: 1660.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3920
[2024-06-10 11:56:01,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.57 | bwd_microstep: 1694.71 | bwd_inner_microstep: 1694.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 11:56:03,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1250.68 | bwd_inner_microstep: 1250.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-10 11:56:05,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.35 | bwd_microstep: 1640.69 | bwd_inner_microstep: 1640.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3404
[2024-06-10 11:56:07,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.66 | bwd_microstep: 1330.50 | bwd_inner_microstep: 1330.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2513
[2024-06-10 11:56:08,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.42 | bwd_microstep: 864.65 | bwd_inner_microstep: 864.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 11:56:10,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1281.57 | bwd_inner_microstep: 1281.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 11:56:11,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.82 | bwd_microstep: 1283.47 | bwd_inner_microstep: 1283.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 11:56:13,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1392.11 | bwd_inner_microstep: 1392.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995
[2024-06-10 11:56:15,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.10 | bwd_microstep: 897.87 | bwd_inner_microstep: 897.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 11:56:16,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.89 | bwd_microstep: 1345.91 | bwd_inner_microstep: 1345.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1991
[2024-06-10 11:56:18,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.56 | bwd_microstep: 924.61 | bwd_inner_microstep: 924.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 11:56:20,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.65 | bwd_microstep: 1337.01 | bwd_inner_microstep: 1336.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 11:56:21,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1353.27 | bwd_inner_microstep: 1353.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521
[2024-06-10 11:56:23,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.16 | bwd_microstep: 1414.10 | bwd_inner_microstep: 1414.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 11:56:25,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.08 | bwd_microstep: 1559.81 | bwd_inner_microstep: 1559.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3639
[2024-06-10 11:56:28,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.25 | bwd_microstep: 1644.13 | bwd_inner_microstep: 1644.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473
[2024-06-10 11:56:30,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.28 | bwd_microstep: 1409.31 | bwd_inner_microstep: 1409.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555
[2024-06-10 11:56:32,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1420.73 | bwd_inner_microstep: 1420.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 11:56:34,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.92 | bwd_microstep: 1453.63 | bwd_inner_microstep: 1453.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3547
[2024-06-10 11:56:35,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.01 | bwd_microstep: 1200.41 | bwd_inner_microstep: 1200.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 11:56:37,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.19 | bwd_microstep: 1159.40 | bwd_inner_microstep: 1159.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 11:56:39,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.07 | bwd_microstep: 1395.04 | bwd_inner_microstep: 1395.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3629
[2024-06-10 11:56:41,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.23 | bwd_microstep: 1471.34 | bwd_inner_microstep: 1471.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 11:56:43,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | bwd_microstep: 1554.11 | bwd_inner_microstep: 1554.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 11:56:45,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.85 | bwd_microstep: 1278.57 | bwd_inner_microstep: 1278.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 11:56:47,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.19 | bwd_microstep: 1650.77 | bwd_inner_microstep: 1650.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 11:56:49,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.23 | bwd_microstep: 1559.02 | bwd_inner_microstep: 1558.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 11:56:51,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1511.91 | bwd_inner_microstep: 1511.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 11:56:53,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 11:56:53,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.58 | bwd_microstep: 1540.02 | bwd_inner_microstep: 1531.98 | bwd_allreduce_microstep: 8.00 | step_microstep: 38.67
[2024-06-10 11:56:53,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16300.33 | bwd: 43596.45 | bwd_inner: 43587.55 | bwd_allreduce: 8.23 | step: 40.22
{'loss': 1.2745, 'learning_rate': 2.8571242671448892e-05, 'epoch': 0.38}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3458
[2024-06-10 11:56:56,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1503.42 | bwd_inner_microstep: 1503.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2458
[2024-06-10 11:56:57,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.56 | bwd_microstep: 979.24 | bwd_inner_microstep: 979.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 11:56:59,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4147
[2024-06-10 11:57:01,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.82 | bwd_microstep: 1639.82 | bwd_inner_microstep: 1639.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 11:57:03,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.48 | bwd_microstep: 1649.05 | bwd_inner_microstep: 1649.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 11:57:05,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.12 | bwd_microstep: 1283.22 | bwd_inner_microstep: 1283.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442
[2024-06-10 11:57:07,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.71 | bwd_microstep: 1152.65 | bwd_inner_microstep: 1152.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 11:57:08,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1248.33 | bwd_inner_microstep: 1248.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 11:57:10,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.22 | bwd_microstep: 1386.72 | bwd_inner_microstep: 1386.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 11:57:12,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1282.03 | bwd_inner_microstep: 1282.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 11:57:14,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.08 | bwd_microstep: 1486.88 | bwd_inner_microstep: 1486.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3470
[2024-06-10 11:57:16,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.11 | bwd_microstep: 1411.66 | bwd_inner_microstep: 1411.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3656
[2024-06-10 11:57:18,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.05 | bwd_microstep: 1351.27 | bwd_inner_microstep: 1351.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3473
[2024-06-10 11:57:20,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.39 | bwd_microstep: 1504.93 | bwd_inner_microstep: 1504.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3679
[2024-06-10 11:57:22,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.73 | bwd_microstep: 1822.95 | bwd_inner_microstep: 1822.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 11:57:24,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1245.79 | bwd_inner_microstep: 1245.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 11:57:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.74 | bwd_microstep: 1483.71 | bwd_inner_microstep: 1483.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504
[2024-06-10 11:57:28,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1449.28 | bwd_inner_microstep: 1449.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828
[2024-06-10 11:57:30,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1480.97 | bwd_inner_microstep: 1480.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 11:57:32,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1421.47 | bwd_inner_microstep: 1421.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3629
[2024-06-10 11:57:34,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.83 | bwd_microstep: 1375.22 | bwd_inner_microstep: 1375.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2185
[2024-06-10 11:57:35,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.34 | bwd_microstep: 953.66 | bwd_inner_microstep: 953.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3575
[2024-06-10 11:57:37,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.51 | bwd_microstep: 1335.59 | bwd_inner_microstep: 1335.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3668
[2024-06-10 11:57:39,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.99 | bwd_microstep: 1326.55 | bwd_inner_microstep: 1326.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 11:57:41,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.10 | bwd_microstep: 1628.13 | bwd_inner_microstep: 1628.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 11:57:43,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.95 | bwd_microstep: 1538.87 | bwd_inner_microstep: 1538.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3067
[2024-06-10 11:57:45,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.94 | bwd_microstep: 1111.22 | bwd_inner_microstep: 1111.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 11:57:47,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1399.92 | bwd_inner_microstep: 1399.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 11:57:49,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1498.67 | bwd_inner_microstep: 1498.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 11:57:51,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.08 | bwd_microstep: 1287.98 | bwd_inner_microstep: 1287.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-10 11:57:53,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.87 | bwd_microstep: 1448.63 | bwd_inner_microstep: 1448.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769
[2024-06-10 11:57:55,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-10 11:57:55,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.23 | bwd_microstep: 1491.02 | bwd_inner_microstep: 1483.33 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.56
[2024-06-10 11:57:55,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16633.32 | bwd: 44427.39 | bwd_inner: 44418.84 | bwd_allreduce: 7.87 | step: 39.12
{'loss': 1.3091, 'learning_rate': 2.8537315200270735e-05, 'epoch': 0.38}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 11:57:57,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1333.15 | bwd_inner_microstep: 1333.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 11:57:59,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.80 | bwd_microstep: 1479.45 | bwd_inner_microstep: 1479.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3477
[2024-06-10 11:58:01,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1428.15 | bwd_inner_microstep: 1428.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2303
[2024-06-10 11:58:02,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.66 | bwd_microstep: 975.41 | bwd_inner_microstep: 975.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3749
[2024-06-10 11:58:04,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.97 | bwd_microstep: 1370.70 | bwd_inner_microstep: 1370.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3491
[2024-06-10 11:58:06,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.05 | bwd_microstep: 1348.99 | bwd_inner_microstep: 1348.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 11:58:08,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.33 | bwd_microstep: 1548.82 | bwd_inner_microstep: 1548.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3668
[2024-06-10 11:58:10,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.33 | bwd_microstep: 1354.92 | bwd_inner_microstep: 1354.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 11:58:12,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.23 | bwd_microstep: 1254.78 | bwd_inner_microstep: 1254.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 11:58:13,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.84 | bwd_microstep: 1318.47 | bwd_inner_microstep: 1318.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 11:58:15,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1347.23 | bwd_inner_microstep: 1347.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3431
[2024-06-10 11:58:17,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.20 | bwd_microstep: 1542.98 | bwd_inner_microstep: 1542.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519
[2024-06-10 11:58:20,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.47 | bwd_microstep: 1585.24 | bwd_inner_microstep: 1585.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 11:58:22,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.28 | bwd_microstep: 1482.73 | bwd_inner_microstep: 1482.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 11:58:24,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.91 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-10 11:58:26,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.94 | bwd_microstep: 1525.62 | bwd_inner_microstep: 1525.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446
[2024-06-10 11:58:28,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1382.73 | bwd_inner_microstep: 1382.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 11:58:29,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1386.78 | bwd_inner_microstep: 1386.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2363
[2024-06-10 11:58:31,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.71 | bwd_microstep: 929.19 | bwd_inner_microstep: 929.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2290
[2024-06-10 11:58:32,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.83 | bwd_microstep: 1071.86 | bwd_inner_microstep: 1071.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-10 11:58:34,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.03 | bwd_microstep: 1415.52 | bwd_inner_microstep: 1415.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3824
[2024-06-10 11:58:36,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1392.52 | bwd_inner_microstep: 1392.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 11:58:38,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.31 | bwd_microstep: 1506.55 | bwd_inner_microstep: 1506.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3682
[2024-06-10 11:58:40,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.97 | bwd_microstep: 1557.00 | bwd_inner_microstep: 1556.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-10 11:58:43,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.90 | bwd_microstep: 1582.53 | bwd_inner_microstep: 1582.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3501
[2024-06-10 11:58:44,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.83 | bwd_microstep: 1353.90 | bwd_inner_microstep: 1353.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 11:58:46,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.12 | bwd_microstep: 1282.97 | bwd_inner_microstep: 1282.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3616
[2024-06-10 11:58:48,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.45 | bwd_microstep: 1454.26 | bwd_inner_microstep: 1454.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 11:58:50,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.73 | bwd_microstep: 1302.24 | bwd_inner_microstep: 1302.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 11:58:52,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.39 | bwd_microstep: 1460.50 | bwd_inner_microstep: 1460.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 11:58:54,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.41 | bwd_microstep: 1503.18 | bwd_inner_microstep: 1503.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 11:58:56,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 11:58:56,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.38 | bwd_microstep: 1584.60 | bwd_inner_microstep: 1576.47 | bwd_allreduce_microstep: 8.08 | step_microstep: 38.81
[2024-06-10 11:58:56,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16599.80 | bwd: 44449.38 | bwd_inner: 44440.36 | bwd_allreduce: 8.32 | step: 40.33
 38%|███▊      | 650/1726 [11:16:28<18:19:12, 61.29s/it]
 38%|███▊      | 651/1726 [11:17:29<18:15:17, 61.13s/it]
 38%|███▊      | 652/1726 [11:18:30<18:14:53, 61.17s/it]
 38%|███▊      | 653/1726 [11:19:30<18:08:53, 60.89s/it]
 38%|███▊      | 654/1726 [11:20:32<18:10:39, 61.04s/it]
 38%|███▊      | 655/1726 [11:21:33<18:11:31, 61.15s/it]
{'loss': 1.2638, 'learning_rate': 2.8503357660712815e-05, 'epoch': 0.38}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 11:58:58,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.91 | bwd_microstep: 1375.31 | bwd_inner_microstep: 1375.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-10 11:59:00,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.07 | bwd_microstep: 1312.59 | bwd_inner_microstep: 1312.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849
[2024-06-10 11:59:02,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.21 | bwd_microstep: 1562.22 | bwd_inner_microstep: 1562.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 11:59:04,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.96 | bwd_microstep: 1402.91 | bwd_inner_microstep: 1402.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 11:59:06,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1348.12 | bwd_inner_microstep: 1348.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 11:59:08,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.98 | bwd_microstep: 1286.10 | bwd_inner_microstep: 1286.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 11:59:09,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.34 | bwd_microstep: 1191.34 | bwd_inner_microstep: 1191.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 11:59:11,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1347.62 | bwd_inner_microstep: 1347.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 752
[2024-06-10 11:59:12,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.05 | bwd_microstep: 299.25 | bwd_inner_microstep: 299.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 11:59:13,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.83 | bwd_microstep: 1300.23 | bwd_inner_microstep: 1300.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 11:59:15,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.29 | bwd_microstep: 1278.87 | bwd_inner_microstep: 1278.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 11:59:17,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.88 | bwd_microstep: 1489.74 | bwd_inner_microstep: 1489.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3536
[2024-06-10 11:59:20,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.84 | bwd_microstep: 1592.66 | bwd_inner_microstep: 1592.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3493
[2024-06-10 11:59:22,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.96 | bwd_microstep: 1679.77 | bwd_inner_microstep: 1679.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 11:59:24,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.22 | bwd_microstep: 1290.26 | bwd_inner_microstep: 1290.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 11:59:26,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1378.75 | bwd_inner_microstep: 1378.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 11:59:27,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1346.97 | bwd_inner_microstep: 1346.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 11:59:29,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.81 | bwd_microstep: 1295.81 | bwd_inner_microstep: 1295.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 649
[2024-06-10 11:59:30,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.74 | bwd_microstep: 274.60 | bwd_inner_microstep: 274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 11:59:32,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.17 | bwd_microstep: 1556.11 | bwd_inner_microstep: 1556.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3613
[2024-06-10 11:59:34,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.61 | bwd_microstep: 1311.62 | bwd_inner_microstep: 1311.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3686
[2024-06-10 11:59:35,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1232.71 | bwd_inner_microstep: 1232.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 11:59:37,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.97 | bwd_microstep: 1352.29 | bwd_inner_microstep: 1352.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278
[2024-06-10 11:59:38,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.55 | bwd_microstep: 909.09 | bwd_inner_microstep: 909.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3713
[2024-06-10 11:59:40,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1335.78 | bwd_inner_microstep: 1335.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 11:59:42,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.50 | bwd_microstep: 1423.33 | bwd_inner_microstep: 1423.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 11:59:44,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.65 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 11:59:46,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.25 | bwd_microstep: 1311.11 | bwd_inner_microstep: 1311.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 11:59:48,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1492.56 | bwd_inner_microstep: 1492.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-10 11:59:50,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.36 | bwd_microstep: 1579.47 | bwd_inner_microstep: 1579.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 11:59:52,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.86 | bwd_microstep: 1600.71 | bwd_inner_microstep: 1600.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 11:59:56,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.35 | optimizer_step: 6.59
[2024-06-10 11:59:56,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.94 | bwd_microstep: 3352.14 | bwd_inner_microstep: 1416.63 | bwd_allreduce_microstep: 1935.44 | step_microstep: 38.61
[2024-06-10 11:59:56,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15700.39 | bwd: 43888.97 | bwd_inner: 41952.61 | bwd_allreduce: 1935.68 | step: 40.13
{'loss': 1.2496, 'learning_rate': 2.846937017237343e-05, 'epoch': 0.38}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-10 11:59:58,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.86 | bwd_microstep: 1329.52 | bwd_inner_microstep: 1329.46 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 12:00:00,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.74 | bwd_microstep: 1383.57 | bwd_inner_microstep: 1383.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3843
[2024-06-10 12:00:02,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.67 | bwd_microstep: 1362.24 | bwd_inner_microstep: 1362.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 12:00:04,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.54 | bwd_microstep: 1384.78 | bwd_inner_microstep: 1384.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2452
[2024-06-10 12:00:05,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.58 | bwd_microstep: 944.60 | bwd_inner_microstep: 944.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-10 12:00:07,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1432.54 | bwd_inner_microstep: 1432.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 12:00:09,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.57 | bwd_microstep: 1286.07 | bwd_inner_microstep: 1286.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 12:00:11,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.13 | bwd_microstep: 1490.00 | bwd_inner_microstep: 1489.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3906
[2024-06-10 12:00:13,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.78 | bwd_microstep: 1525.68 | bwd_inner_microstep: 1525.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 12:00:15,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1494.09 | bwd_inner_microstep: 1494.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3503
[2024-06-10 12:00:17,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.03 | bwd_microstep: 1220.43 | bwd_inner_microstep: 1220.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482
[2024-06-10 12:00:19,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.48 | bwd_microstep: 1574.09 | bwd_inner_microstep: 1574.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 12:00:21,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1485.99 | bwd_inner_microstep: 1485.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 12:00:23,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1392.40 | bwd_inner_microstep: 1392.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925
[2024-06-10 12:00:24,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.98 | bwd_microstep: 728.00 | bwd_inner_microstep: 727.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 12:00:26,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.03 | bwd_microstep: 1426.34 | bwd_inner_microstep: 1426.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 12:00:28,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.81 | bwd_microstep: 1288.07 | bwd_inner_microstep: 1288.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 12:00:30,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.47 | bwd_microstep: 1556.71 | bwd_inner_microstep: 1556.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 12:00:32,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1429.09 | bwd_inner_microstep: 1429.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673
[2024-06-10 12:00:34,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1326.53 | bwd_inner_microstep: 1326.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 12:00:36,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.43 | bwd_microstep: 1449.81 | bwd_inner_microstep: 1449.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665
[2024-06-10 12:00:38,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.98 | bwd_microstep: 1422.47 | bwd_inner_microstep: 1422.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 12:00:40,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1399.65 | bwd_inner_microstep: 1399.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-10 12:00:41,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.41 | bwd_microstep: 1311.95 | bwd_inner_microstep: 1311.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-10 12:00:43,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.94 | bwd_microstep: 876.14 | bwd_inner_microstep: 876.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3834
[2024-06-10 12:00:44,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.48 | bwd_microstep: 1360.92 | bwd_inner_microstep: 1360.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-10 12:00:46,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.97 | bwd_microstep: 809.46 | bwd_inner_microstep: 809.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 4062
[2024-06-10 12:00:47,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.89 | bwd_microstep: 1331.51 | bwd_inner_microstep: 1331.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 12:00:49,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.65 | bwd_microstep: 789.98 | bwd_inner_microstep: 789.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3087
[2024-06-10 12:00:50,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1245.68 | bwd_inner_microstep: 1245.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3570
[2024-06-10 12:00:52,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.54 | bwd_microstep: 1456.96 | bwd_inner_microstep: 1456.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 12:00:58,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 12:00:58,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 4884.88 | bwd_inner_microstep: 2019.57 | bwd_allreduce_microstep: 2865.25 | step_microstep: 37.93
[2024-06-10 12:00:58,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15806.89 | bwd: 45400.19 | bwd_inner: 42533.99 | bwd_allreduce: 2865.50 | step: 39.64
{'loss': 1.2707, 'learning_rate': 2.8435352854956315e-05, 'epoch': 0.38}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 12:01:00,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.37 | bwd_microstep: 1469.04 | bwd_inner_microstep: 1469.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3903
[2024-06-10 12:01:02,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.71 | bwd_microstep: 1487.52 | bwd_inner_microstep: 1487.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 12:01:04,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1391.52 | bwd_inner_microstep: 1391.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4201
[2024-06-10 12:01:06,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.71 | bwd_microstep: 1679.60 | bwd_inner_microstep: 1679.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 12:01:08,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.94 | bwd_microstep: 1287.63 | bwd_inner_microstep: 1287.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951
[2024-06-10 12:01:09,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.03 | bwd_microstep: 728.61 | bwd_inner_microstep: 728.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 12:01:11,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1341.22 | bwd_inner_microstep: 1341.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 12:01:13,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.51 | bwd_microstep: 1530.28 | bwd_inner_microstep: 1530.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-10 12:01:15,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.00 | bwd_microstep: 1626.21 | bwd_inner_microstep: 1626.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2106
[2024-06-10 12:01:16,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.09 | bwd_microstep: 825.95 | bwd_inner_microstep: 825.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 12:01:18,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.59 | bwd_microstep: 1288.90 | bwd_inner_microstep: 1288.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3512
[2024-06-10 12:01:20,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1450.67 | bwd_inner_microstep: 1450.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2177
[2024-06-10 12:01:21,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.19 | bwd_microstep: 1046.21 | bwd_inner_microstep: 1046.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2054
[2024-06-10 12:01:23,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.62 | bwd_microstep: 818.08 | bwd_inner_microstep: 818.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 12:01:25,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1382.28 | bwd_inner_microstep: 1382.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2945
[2024-06-10 12:01:26,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.20 | bwd_microstep: 1035.39 | bwd_inner_microstep: 1035.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-10 12:01:28,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.08 | bwd_microstep: 1614.06 | bwd_inner_microstep: 1614.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3826
[2024-06-10 12:01:30,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.40 | bwd_microstep: 1620.69 | bwd_inner_microstep: 1620.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2129
[2024-06-10 12:01:32,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.79 | bwd_microstep: 863.26 | bwd_inner_microstep: 863.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 12:01:33,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1352.07 | bwd_inner_microstep: 1352.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817
[2024-06-10 12:01:36,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.48 | bwd_microstep: 1751.14 | bwd_inner_microstep: 1751.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3845
[2024-06-10 12:01:38,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.55 | bwd_microstep: 1525.58 | bwd_inner_microstep: 1525.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1992
[2024-06-10 12:01:39,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.69 | bwd_microstep: 781.46 | bwd_inner_microstep: 781.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 12:01:41,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.95 | bwd_microstep: 1548.77 | bwd_inner_microstep: 1548.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1935
[2024-06-10 12:01:42,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.49 | bwd_microstep: 819.74 | bwd_inner_microstep: 819.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3605
[2024-06-10 12:01:45,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.88 | bwd_microstep: 1704.40 | bwd_inner_microstep: 1704.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 12:01:47,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.30 | bwd_microstep: 1554.10 | bwd_inner_microstep: 1554.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 12:01:49,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.65 | bwd_microstep: 1285.23 | bwd_inner_microstep: 1285.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 12:01:51,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1505.44 | bwd_inner_microstep: 1505.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 12:01:53,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3415
[2024-06-10 12:01:54,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.26 | bwd_microstep: 1403.84 | bwd_inner_microstep: 1403.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-10 12:01:58,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 12:01:58,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.31 | bwd_microstep: 2433.18 | bwd_inner_microstep: 1827.19 | bwd_allreduce_microstep: 605.93 | step_microstep: 37.72
[2024-06-10 12:01:58,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15943.16 | bwd: 43497.77 | bwd_inner: 42890.94 | bwd_allreduce: 606.15 | step: 39.18
{'loss': 1.2676, 'learning_rate': 2.8401305828270302e-05, 'epoch': 0.38}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2915
[2024-06-10 12:01:59,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.98 | bwd_microstep: 1189.19 | bwd_inner_microstep: 1189.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3969
[2024-06-10 12:02:01,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.87 | bwd_microstep: 1602.82 | bwd_inner_microstep: 1602.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3853
[2024-06-10 12:02:04,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.76 | bwd_microstep: 1662.11 | bwd_inner_microstep: 1662.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2303
[2024-06-10 12:02:05,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.42 | bwd_microstep: 879.16 | bwd_inner_microstep: 879.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 12:02:07,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.15 | bwd_microstep: 1376.54 | bwd_inner_microstep: 1376.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 12:02:09,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.68 | bwd_microstep: 1447.79 | bwd_inner_microstep: 1447.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 12:02:11,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 12:02:13,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.11 | bwd_microstep: 1483.44 | bwd_inner_microstep: 1483.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 12:02:15,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.26 | bwd_microstep: 1484.06 | bwd_inner_microstep: 1484.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 12:02:17,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.29 | bwd_microstep: 1315.07 | bwd_inner_microstep: 1315.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 12:02:18,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.79 | bwd_microstep: 1250.63 | bwd_inner_microstep: 1250.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893
[2024-06-10 12:02:19,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.30 | bwd_microstep: 714.23 | bwd_inner_microstep: 714.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434
[2024-06-10 12:02:21,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.26 | bwd_microstep: 1310.94 | bwd_inner_microstep: 1310.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 12:02:23,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1487.89 | bwd_inner_microstep: 1487.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-10 12:02:25,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 1510.48 | bwd_inner_microstep: 1510.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465
[2024-06-10 12:02:27,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.50 | bwd_microstep: 1570.16 | bwd_inner_microstep: 1570.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 12:02:29,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.47 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3618
[2024-06-10 12:02:32,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.25 | bwd_microstep: 1706.51 | bwd_inner_microstep: 1706.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 12:02:34,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1492.46 | bwd_inner_microstep: 1492.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3526
[2024-06-10 12:02:36,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.97 | bwd_microstep: 1340.43 | bwd_inner_microstep: 1340.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2297
[2024-06-10 12:02:37,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.88 | bwd_microstep: 941.26 | bwd_inner_microstep: 941.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 12:02:39,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.61 | bwd_microstep: 1616.89 | bwd_inner_microstep: 1616.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044
[2024-06-10 12:02:41,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.91 | bwd_microstep: 907.32 | bwd_inner_microstep: 907.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 12:02:42,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.24 | bwd_microstep: 1288.51 | bwd_inner_microstep: 1288.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-10 12:02:44,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1561.62 | bwd_inner_microstep: 1561.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 12:02:46,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1419.92 | bwd_inner_microstep: 1419.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3556
[2024-06-10 12:02:48,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 1424.89 | bwd_inner_microstep: 1424.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 12:02:50,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1256.47 | bwd_inner_microstep: 1256.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 12:02:52,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.06 | bwd_microstep: 1355.82 | bwd_inner_microstep: 1355.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3788
[2024-06-10 12:02:54,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.62 | bwd_microstep: 1352.81 | bwd_inner_microstep: 1352.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 12:02:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.50 | bwd_microstep: 1302.47 | bwd_inner_microstep: 1302.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 12:02:59,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.18 | optimizer_step: 6.59
[2024-06-10 12:02:59,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.98 | bwd_microstep: 2646.79 | bwd_inner_microstep: 1451.35 | bwd_allreduce_microstep: 1195.39 | step_microstep: 37.75
[2024-06-10 12:02:59,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16251.08 | bwd: 44764.78 | bwd_inner: 43568.48 | bwd_allreduce: 1195.62 | step: 39.21
{'loss': 1.2801, 'learning_rate': 2.836722921222883e-05, 'epoch': 0.38}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 12:03:01,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.96 | bwd_microstep: 1272.20 | bwd_inner_microstep: 1272.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 12:03:02,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.83 | bwd_microstep: 1246.58 | bwd_inner_microstep: 1246.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 12:03:04,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.12 | bwd_microstep: 1390.92 | bwd_inner_microstep: 1390.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860
[2024-06-10 12:03:06,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.51 | bwd_microstep: 1562.39 | bwd_inner_microstep: 1562.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 12:03:08,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1351.75 | bwd_inner_microstep: 1351.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 12:03:10,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.29 | bwd_microstep: 1385.54 | bwd_inner_microstep: 1385.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 12:03:12,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.77 | bwd_microstep: 1384.86 | bwd_inner_microstep: 1384.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-10 12:03:13,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.25 | bwd_microstep: 676.92 | bwd_inner_microstep: 676.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484
[2024-06-10 12:03:15,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.76 | bwd_microstep: 1315.32 | bwd_inner_microstep: 1315.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 12:03:17,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.53 | bwd_microstep: 1220.89 | bwd_inner_microstep: 1220.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 12:03:18,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1253.22 | bwd_inner_microstep: 1253.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3681
[2024-06-10 12:03:20,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 1446.47 | bwd_inner_microstep: 1446.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 12:03:22,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.58 | bwd_microstep: 1381.01 | bwd_inner_microstep: 1380.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 12:03:24,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1481.71 | bwd_inner_microstep: 1481.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 12:03:26,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.38 | bwd_microstep: 1499.81 | bwd_inner_microstep: 1499.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 12:03:28,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.99 | bwd_microstep: 1280.69 | bwd_inner_microstep: 1280.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3886
[2024-06-10 12:03:30,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.81 | bwd_microstep: 1687.73 | bwd_inner_microstep: 1687.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3686
[2024-06-10 12:03:32,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.03 | bwd_microstep: 1327.85 | bwd_inner_microstep: 1327.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 12:03:35,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.67 | bwd_microstep: 1660.26 | bwd_inner_microstep: 1660.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3713
[2024-06-10 12:03:36,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.43 | bwd_microstep: 1334.26 | bwd_inner_microstep: 1334.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 12:03:38,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1406.88 | bwd_inner_microstep: 1406.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3834
[2024-06-10 12:03:41,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.50 | bwd_microstep: 1586.40 | bwd_inner_microstep: 1586.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 12:03:42,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.47 | bwd_microstep: 1282.49 | bwd_inner_microstep: 1282.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 12:03:44,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 1253.73 | bwd_inner_microstep: 1253.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814
[2024-06-10 12:03:46,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.77 | bwd_microstep: 1582.17 | bwd_inner_microstep: 1582.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 12:03:48,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.83 | bwd_microstep: 1561.64 | bwd_inner_microstep: 1561.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 12:03:51,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.36 | bwd_microstep: 1557.51 | bwd_inner_microstep: 1557.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 12:03:53,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.96 | bwd_microstep: 1498.63 | bwd_inner_microstep: 1498.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 12:03:54,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.45 | bwd_microstep: 1258.68 | bwd_inner_microstep: 1258.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 12:03:56,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.59 | bwd_microstep: 1453.42 | bwd_inner_microstep: 1453.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
[2024-06-10 12:03:58,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.95 | bwd_microstep: 1338.67 | bwd_inner_microstep: 1338.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-10 12:04:01,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.90 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-10 12:04:01,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.57 | bwd_microstep: 1684.21 | bwd_inner_microstep: 1676.51 | bwd_allreduce_microstep: 7.65 | step_microstep: 38.76
[2024-06-10 12:04:01,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16715.23 | bwd: 44624.82 | bwd_inner: 44616.27 | bwd_allreduce: 7.87 | step: 40.23
{'loss': 1.2476, 'learning_rate': 2.8333123126849575e-05, 'epoch': 0.38}


 38%|███▊      | 655/1726 [11:21:33<18:11:31, 61.15s/it]
 38%|███▊      | 656/1726 [11:22:33<18:03:57, 60.78s/it]
 38%|███▊      | 657/1726 [11:23:34<18:07:00, 61.01s/it]
 38%|███▊      | 658/1726 [11:24:34<17:59:23, 60.64s/it]
 38%|███▊      | 659/1726 [11:25:36<18:02:09, 60.85s/it]
 38%|███▊      | 660/1726 [11:26:37<18:05:31, 61.10s/it]
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3438
[2024-06-10 12:04:02,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.21 | bwd_microstep: 1375.78 | bwd_inner_microstep: 1375.60 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3904
[2024-06-10 12:04:05,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.67 | bwd_microstep: 1689.92 | bwd_inner_microstep: 1689.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 12:04:07,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.46 | bwd_microstep: 1374.04 | bwd_inner_microstep: 1374.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3829
[2024-06-10 12:04:09,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.20 | bwd_microstep: 1417.95 | bwd_inner_microstep: 1417.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3509
[2024-06-10 12:04:10,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.63 | bwd_microstep: 1220.47 | bwd_inner_microstep: 1220.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 12:04:12,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1353.60 | bwd_inner_microstep: 1353.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 12:04:13,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.38 | bwd_microstep: 792.74 | bwd_inner_microstep: 792.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 12:04:15,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1354.35 | bwd_inner_microstep: 1354.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 12:04:16,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.32 | bwd_microstep: 793.01 | bwd_inner_microstep: 792.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 12:04:18,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.67 | bwd_microstep: 1481.85 | bwd_inner_microstep: 1481.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3678
[2024-06-10 12:04:21,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.55 | bwd_microstep: 1823.12 | bwd_inner_microstep: 1823.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-10 12:04:23,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1403.46 | bwd_inner_microstep: 1403.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 12:04:25,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.59 | bwd_microstep: 1255.08 | bwd_inner_microstep: 1255.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 12:04:27,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1478.52 | bwd_inner_microstep: 1478.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651
[2024-06-10 12:04:29,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.82 | bwd_microstep: 1612.88 | bwd_inner_microstep: 1612.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 12:04:31,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.37 | bwd_microstep: 1425.45 | bwd_inner_microstep: 1425.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 12:04:33,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1377.88 | bwd_inner_microstep: 1377.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 646
[2024-06-10 12:04:33,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.33 | bwd_microstep: 275.02 | bwd_inner_microstep: 274.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 12:04:34,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.18 | bwd_microstep: 794.66 | bwd_inner_microstep: 794.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 12:04:36,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.00 | bwd_microstep: 1401.38 | bwd_inner_microstep: 1401.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 646
[2024-06-10 12:04:36,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.39 | bwd_microstep: 275.12 | bwd_inner_microstep: 275.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 12:04:38,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.58 | bwd_microstep: 1350.51 | bwd_inner_microstep: 1350.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 12:04:39,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.24 | bwd_microstep: 802.92 | bwd_inner_microstep: 802.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3539
[2024-06-10 12:04:41,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.46 | bwd_microstep: 1448.42 | bwd_inner_microstep: 1448.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3441
[2024-06-10 12:04:43,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1411.66 | bwd_inner_microstep: 1411.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3418
[2024-06-10 12:04:45,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1407.51 | bwd_inner_microstep: 1407.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3546
[2024-06-10 12:04:47,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.58 | bwd_microstep: 1418.71 | bwd_inner_microstep: 1418.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 12:04:49,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1276.76 | bwd_inner_microstep: 1276.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 12:04:51,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1308.89 | bwd_inner_microstep: 1308.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-10 12:04:52,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.62 | bwd_microstep: 1153.63 | bwd_inner_microstep: 1153.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 12:04:55,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.02 | bwd_microstep: 1554.24 | bwd_inner_microstep: 1554.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 12:05:04,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.36 | optimizer_step: 6.55
[2024-06-10 12:05:04,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 8828.70 | bwd_inner_microstep: 1437.88 | bwd_allreduce_microstep: 7390.74 | step_microstep: 39.22
[2024-06-10 12:05:04,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15171.54 | bwd: 47938.28 | bwd_inner: 40546.49 | bwd_allreduce: 7391.06 | step: 40.86
{'loss': 1.2642, 'learning_rate': 2.829898769225399e-05, 'epoch': 0.38}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 12:05:06,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.03 | bwd_microstep: 1462.04 | bwd_inner_microstep: 1462.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2448
[2024-06-10 12:05:07,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.01 | bwd_microstep: 947.19 | bwd_inner_microstep: 947.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862
[2024-06-10 12:05:09,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.66 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474
[2024-06-10 12:05:11,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1409.38 | bwd_inner_microstep: 1409.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 12:05:13,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.61 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 12:05:15,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.83 | bwd_microstep: 1529.93 | bwd_inner_microstep: 1529.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3560
[2024-06-10 12:05:17,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.43 | bwd_microstep: 1201.86 | bwd_inner_microstep: 1201.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421
[2024-06-10 12:05:19,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.13 | bwd_microstep: 1150.44 | bwd_inner_microstep: 1150.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 12:05:20,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1390.03 | bwd_inner_microstep: 1390.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 12:05:22,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1388.43 | bwd_inner_microstep: 1388.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 12:05:24,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.28 | bwd_microstep: 1299.66 | bwd_inner_microstep: 1299.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3499
[2024-06-10 12:05:26,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.72 | bwd_microstep: 1497.35 | bwd_inner_microstep: 1497.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 12:05:28,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1338.59 | bwd_inner_microstep: 1338.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1931
[2024-06-10 12:05:29,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.75 | bwd_microstep: 818.67 | bwd_inner_microstep: 818.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3637
[2024-06-10 12:05:31,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.15 | bwd_microstep: 1218.68 | bwd_inner_microstep: 1218.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1989
[2024-06-10 12:05:32,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.84 | bwd_microstep: 737.53 | bwd_inner_microstep: 737.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 12:05:34,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.34 | bwd_microstep: 1276.62 | bwd_inner_microstep: 1276.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 12:05:36,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.84 | bwd_microstep: 1525.00 | bwd_inner_microstep: 1524.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 12:05:37,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.63 | bwd_microstep: 796.41 | bwd_inner_microstep: 796.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-10 12:05:39,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.30 | bwd_microstep: 1527.87 | bwd_inner_microstep: 1527.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509
[2024-06-10 12:05:41,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.53 | bwd_microstep: 1320.19 | bwd_inner_microstep: 1320.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2281
[2024-06-10 12:05:42,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.87 | bwd_microstep: 813.04 | bwd_inner_microstep: 813.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-10 12:05:44,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1549.73 | bwd_inner_microstep: 1549.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 12:05:46,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.26 | bwd_microstep: 1379.67 | bwd_inner_microstep: 1379.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3599
[2024-06-10 12:05:48,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.53 | bwd_microstep: 1533.70 | bwd_inner_microstep: 1533.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 12:05:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.69 | bwd_microstep: 1288.03 | bwd_inner_microstep: 1288.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2277
[2024-06-10 12:05:51,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.39 | bwd_microstep: 1070.10 | bwd_inner_microstep: 1070.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-10 12:05:53,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.82 | bwd_microstep: 971.19 | bwd_inner_microstep: 971.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 12:05:55,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.85 | bwd_microstep: 1313.13 | bwd_inner_microstep: 1313.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755
[2024-06-10 12:05:56,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.25 | bwd_microstep: 1343.68 | bwd_inner_microstep: 1343.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3768
[2024-06-10 12:05:58,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.30 | bwd_microstep: 1403.68 | bwd_inner_microstep: 1403.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3423
[2024-06-10 12:06:05,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 12:06:05,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.22 | bwd_microstep: 5629.83 | bwd_inner_microstep: 1756.45 | bwd_allreduce_microstep: 3873.33 | step_microstep: 37.91
[2024-06-10 12:06:05,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15338.78 | bwd: 44934.90 | bwd_inner: 41060.65 | bwd_allreduce: 3873.56 | step: 39.43
{'loss': 1.2893, 'learning_rate': 2.826482302866689e-05, 'epoch': 0.38}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 12:06:06,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.69 | bwd_microstep: 1267.36 | bwd_inner_microstep: 1267.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3947
[2024-06-10 12:06:08,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.34 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 12:06:10,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1354.84 | bwd_inner_microstep: 1354.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 12:06:12,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.55 | bwd_microstep: 1146.66 | bwd_inner_microstep: 1146.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789
[2024-06-10 12:06:14,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.42 | bwd_microstep: 1443.32 | bwd_inner_microstep: 1443.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 12:06:16,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.68 | bwd_microstep: 1245.81 | bwd_inner_microstep: 1245.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 12:06:17,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.07 | bwd_microstep: 676.78 | bwd_inner_microstep: 676.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-10 12:06:18,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.46 | bwd_microstep: 797.23 | bwd_inner_microstep: 797.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 12:06:19,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.18 | bwd_microstep: 1294.22 | bwd_inner_microstep: 1294.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 12:06:21,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1391.06 | bwd_inner_microstep: 1391.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1961
[2024-06-10 12:06:22,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.01 | bwd_microstep: 765.37 | bwd_inner_microstep: 765.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 12:06:24,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.07 | bwd_microstep: 1483.39 | bwd_inner_microstep: 1483.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3525
[2024-06-10 12:06:26,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.04 | bwd_microstep: 1450.15 | bwd_inner_microstep: 1450.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-10 12:06:29,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.23 | bwd_microstep: 1601.99 | bwd_inner_microstep: 1601.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451
[2024-06-10 12:06:31,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.72 | bwd_microstep: 1411.47 | bwd_inner_microstep: 1411.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 12:06:32,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1248.40 | bwd_inner_microstep: 1248.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-10 12:06:35,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.95 | bwd_microstep: 1580.15 | bwd_inner_microstep: 1580.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 12:06:37,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.11 | bwd_microstep: 1508.44 | bwd_inner_microstep: 1508.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3633
[2024-06-10 12:06:39,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.62 | bwd_microstep: 1392.46 | bwd_inner_microstep: 1392.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053
[2024-06-10 12:06:40,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.38 | bwd_microstep: 909.29 | bwd_inner_microstep: 909.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2153
[2024-06-10 12:06:41,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.59 | bwd_microstep: 852.88 | bwd_inner_microstep: 852.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2187
[2024-06-10 12:06:42,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.12 | bwd_microstep: 795.43 | bwd_inner_microstep: 795.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2059
[2024-06-10 12:06:43,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.99 | bwd_microstep: 799.31 | bwd_inner_microstep: 799.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 12:06:45,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.99 | bwd_microstep: 1451.61 | bwd_inner_microstep: 1451.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914
[2024-06-10 12:06:46,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.59 | bwd_microstep: 684.78 | bwd_inner_microstep: 684.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-10 12:06:48,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.52 | bwd_microstep: 1477.74 | bwd_inner_microstep: 1477.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 12:06:50,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.65 | bwd_microstep: 1357.90 | bwd_inner_microstep: 1357.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575
[2024-06-10 12:06:52,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.96 | bwd_microstep: 1504.45 | bwd_inner_microstep: 1504.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2556
[2024-06-10 12:06:54,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.03 | bwd_microstep: 1065.61 | bwd_inner_microstep: 1065.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2063
[2024-06-10 12:06:55,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.12 | bwd_microstep: 750.54 | bwd_inner_microstep: 750.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 12:06:57,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.17 | bwd_microstep: 1532.37 | bwd_inner_microstep: 1532.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 12:07:08,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.32 | optimizer_step: 6.61
[2024-06-10 12:07:08,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.76 | bwd_microstep: 10661.13 | bwd_inner_microstep: 1313.83 | bwd_allreduce_microstep: 9347.24 | step_microstep: 39.16
[2024-06-10 12:07:08,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14598.31 | bwd: 48391.27 | bwd_inner: 39043.11 | bwd_allreduce: 9347.47 | step: 40.60
{'loss': 1.2344, 'learning_rate': 2.8230629256416046e-05, 'epoch': 0.38}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416
[2024-06-10 12:07:10,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.74 | bwd_microstep: 1430.25 | bwd_inner_microstep: 1430.05 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 12:07:12,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.40 | bwd_microstep: 1473.62 | bwd_inner_microstep: 1473.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 12:07:14,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.07 | bwd_microstep: 1542.14 | bwd_inner_microstep: 1542.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 12:07:16,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.90 | bwd_microstep: 1644.17 | bwd_inner_microstep: 1644.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 12:07:19,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.88 | bwd_microstep: 1644.33 | bwd_inner_microstep: 1644.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 12:07:20,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.14 | bwd_microstep: 1341.57 | bwd_inner_microstep: 1341.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-10 12:07:21,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.89 | bwd_microstep: 685.77 | bwd_inner_microstep: 685.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 12:07:22,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.48 | bwd_microstep: 785.26 | bwd_inner_microstep: 785.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 12:07:24,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.84 | bwd_microstep: 1245.21 | bwd_inner_microstep: 1245.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3524
[2024-06-10 12:07:26,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.92 | bwd_microstep: 1436.54 | bwd_inner_microstep: 1436.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3428
[2024-06-10 12:07:28,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.84 | bwd_microstep: 1212.74 | bwd_inner_microstep: 1212.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2899
[2024-06-10 12:07:29,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.12 | bwd_microstep: 1086.58 | bwd_inner_microstep: 1086.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 12:07:31,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.65 | bwd_microstep: 1511.84 | bwd_inner_microstep: 1511.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3694
[2024-06-10 12:07:34,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.38 | bwd_microstep: 1717.03 | bwd_inner_microstep: 1717.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 12:07:36,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1484.71 | bwd_inner_microstep: 1484.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2447
[2024-06-10 12:07:37,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.40 | bwd_microstep: 946.52 | bwd_inner_microstep: 946.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473
[2024-06-10 12:07:39,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1402.77 | bwd_inner_microstep: 1402.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2296
[2024-06-10 12:07:40,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.68 | bwd_microstep: 975.77 | bwd_inner_microstep: 975.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2052
[2024-06-10 12:07:41,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.86 | bwd_microstep: 722.94 | bwd_inner_microstep: 722.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 12:07:43,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.07 | bwd_microstep: 1277.14 | bwd_inner_microstep: 1277.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2194
[2024-06-10 12:07:45,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.10 | bwd_microstep: 957.40 | bwd_inner_microstep: 957.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3792
[2024-06-10 12:07:46,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.68 | bwd_microstep: 1382.17 | bwd_inner_microstep: 1382.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 12:07:48,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1399.51 | bwd_inner_microstep: 1399.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 12:07:50,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.67 | bwd_microstep: 1457.61 | bwd_inner_microstep: 1457.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 12:07:52,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.59 | bwd_microstep: 1414.86 | bwd_inner_microstep: 1414.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3791
[2024-06-10 12:07:54,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1382.21 | bwd_inner_microstep: 1382.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3755
[2024-06-10 12:07:56,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.61 | bwd_microstep: 1445.11 | bwd_inner_microstep: 1445.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-10 12:07:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.79 | bwd_microstep: 1626.60 | bwd_inner_microstep: 1626.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3616
[2024-06-10 12:08:00,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.02 | bwd_microstep: 1358.93 | bwd_inner_microstep: 1358.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752
[2024-06-10 12:08:03,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.07 | bwd_microstep: 1539.95 | bwd_inner_microstep: 1539.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 12:08:05,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.04 | bwd_microstep: 1650.04 | bwd_inner_microstep: 1650.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3807
[2024-06-10 12:08:10,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.35 | optimizer_step: 6.60
[2024-06-10 12:08:10,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.76 | bwd_microstep: 4112.44 | bwd_inner_microstep: 1722.67 | bwd_allreduce_microstep: 2389.71 | step_microstep: 38.64
[2024-06-10 12:08:10,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15975.10 | bwd: 45293.80 | bwd_inner: 42903.01 | bwd_allreduce: 2390.02 | step: 40.10
{'loss': 1.2643, 'learning_rate': 2.8196406495931753e-05, 'epoch': 0.38}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 12:08:11,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1366.94 | bwd_inner_microstep: 1366.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4112
[2024-06-10 12:08:14,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.03 | bwd_microstep: 1533.52 | bwd_inner_microstep: 1533.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 12:08:15,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1395.86 | bwd_inner_microstep: 1395.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805
[2024-06-10 12:08:17,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1454.08 | bwd_inner_microstep: 1454.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 12:08:19,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.75 | bwd_microstep: 793.94 | bwd_inner_microstep: 793.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2440
[2024-06-10 12:08:20,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.31 | bwd_microstep: 946.11 | bwd_inner_microstep: 946.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3728
[2024-06-10 12:08:22,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.05 | bwd_microstep: 1583.36 | bwd_inner_microstep: 1583.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 925
[2024-06-10 12:08:23,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 156.28 | bwd_microstep: 408.33 | bwd_inner_microstep: 408.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 12:08:24,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.27 | bwd_microstep: 678.63 | bwd_inner_microstep: 678.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 12:08:25,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 12:08:26,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.67 | bwd_microstep: 804.38 | bwd_inner_microstep: 804.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2123
[2024-06-10 12:08:28,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.53 | bwd_microstep: 891.37 | bwd_inner_microstep: 891.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2004
[2024-06-10 12:08:29,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.97 | bwd_microstep: 863.90 | bwd_inner_microstep: 863.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2018
[2024-06-10 12:08:30,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.24 | bwd_microstep: 904.44 | bwd_inner_microstep: 904.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2144
[2024-06-10 12:08:32,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.91 | bwd_microstep: 1027.13 | bwd_inner_microstep: 1027.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 12:08:33,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.29 | bwd_microstep: 794.18 | bwd_inner_microstep: 794.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3597
[2024-06-10 12:08:35,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.68 | bwd_microstep: 1752.18 | bwd_inner_microstep: 1752.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 12:08:36,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.62 | bwd_microstep: 801.70 | bwd_inner_microstep: 801.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 12:08:38,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 1499.67 | bwd_inner_microstep: 1499.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 12:08:40,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.87 | bwd_microstep: 1288.79 | bwd_inner_microstep: 1288.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3545
[2024-06-10 12:08:42,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.90 | bwd_microstep: 1523.69 | bwd_inner_microstep: 1523.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3609
[2024-06-10 12:08:44,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.91 | bwd_microstep: 1657.08 | bwd_inner_microstep: 1657.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2283
[2024-06-10 12:08:46,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.77 | bwd_microstep: 1071.77 | bwd_inner_microstep: 1071.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 12:08:48,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.81 | bwd_microstep: 1401.01 | bwd_inner_microstep: 1400.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276
[2024-06-10 12:08:49,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.29 | bwd_microstep: 877.44 | bwd_inner_microstep: 877.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-10 12:08:51,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.74 | bwd_microstep: 1310.70 | bwd_inner_microstep: 1310.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 12:08:53,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.69 | bwd_microstep: 1560.51 | bwd_inner_microstep: 1560.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 12:08:55,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1442.66 | bwd_inner_microstep: 1442.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 12:08:57,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 12:08:59,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.38 | bwd_microstep: 1604.52 | bwd_inner_microstep: 1604.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3472
[2024-06-10 12:09:01,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.12 | bwd_microstep: 1576.52 | bwd_inner_microstep: 1576.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3782
[2024-06-10 12:09:11,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.31 | optimizer_step: 6.59
[2024-06-10 12:09:11,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.99 | bwd_microstep: 9133.87 | bwd_inner_microstep: 1985.72 | bwd_allreduce_microstep: 7148.10 | step_microstep: 38.37
[2024-06-10 12:09:11,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14609.91 | bwd: 46484.51 | bwd_inner: 39335.49 | bwd_allreduce: 7148.34 | step: 39.87
{'loss': 1.3137, 'learning_rate': 2.8162154867746386e-05, 'epoch': 0.39}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 12:09:13,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.18 | bwd_microstep: 1466.56 | bwd_inner_microstep: 1466.46 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3914
[2024-06-10 12:09:15,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.56 | bwd_microstep: 1684.36 | bwd_inner_microstep: 1684.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-10 12:09:17,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.13 | bwd_microstep: 1543.30 | bwd_inner_microstep: 1543.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-10 12:09:20,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.04 | bwd_microstep: 1546.87 | bwd_inner_microstep: 1546.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 12:09:21,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1383.67 | bwd_inner_microstep: 1383.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 12:09:23,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.15 | bwd_microstep: 1382.80 | bwd_inner_microstep: 1382.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894
[2024-06-10 12:09:24,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.12 | bwd_microstep: 684.84 | bwd_inner_microstep: 684.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739
[2024-06-10 12:09:26,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.23 | bwd_microstep: 1429.24 | bwd_inner_microstep: 1429.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 12:09:28,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1402.14 | bwd_inner_microstep: 1402.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 12:09:30,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1254.29 | bwd_inner_microstep: 1254.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2163
[2024-06-10 12:09:31,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.28 | bwd_microstep: 979.47 | bwd_inner_microstep: 979.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1992
[2024-06-10 12:09:33,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.48 | bwd_microstep: 866.78 | bwd_inner_microstep: 866.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-10 12:09:35,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.58 | bwd_microstep: 1657.59 | bwd_inner_microstep: 1657.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 12:09:37,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.25 | bwd_microstep: 1604.52 | bwd_inner_microstep: 1604.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3516
[2024-06-10 12:09:39,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.39 | bwd_microstep: 1685.62 | bwd_inner_microstep: 1685.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 12:09:41,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1395.71 | bwd_inner_microstep: 1395.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-10 12:09:43,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.99 | bwd_microstep: 1485.68 | bwd_inner_microstep: 1485.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3002
[2024-06-10 12:09:45,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.04 | bwd_microstep: 1045.81 | bwd_inner_microstep: 1045.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 12:09:46,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.96 | bwd_microstep: 801.74 | bwd_inner_microstep: 801.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 12:09:48,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.16 | bwd_microstep: 1396.27 | bwd_inner_microstep: 1396.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2949
[2024-06-10 12:09:49,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.53 | bwd_microstep: 1198.97 | bwd_inner_microstep: 1198.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3437
[2024-06-10 12:09:51,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.88 | bwd_microstep: 1380.64 | bwd_inner_microstep: 1380.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3526
[2024-06-10 12:09:53,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.79 | bwd_microstep: 1258.09 | bwd_inner_microstep: 1258.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 12:09:55,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1414.81 | bwd_inner_microstep: 1414.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 12:09:57,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.80 | bwd_microstep: 1307.90 | bwd_inner_microstep: 1307.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 12:09:59,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.38 | bwd_microstep: 1496.44 | bwd_inner_microstep: 1496.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-10 12:10:00,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.98 | bwd_microstep: 809.13 | bwd_inner_microstep: 809.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-10 12:10:02,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.01 | bwd_microstep: 1649.07 | bwd_inner_microstep: 1649.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 12:10:04,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1347.34 | bwd_inner_microstep: 1347.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384
[2024-06-10 12:10:06,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.20 | bwd_microstep: 1435.64 | bwd_inner_microstep: 1435.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3572
[2024-06-10 12:10:08,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.05 | bwd_microstep: 1426.67 | bwd_inner_microstep: 1426.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 12:10:12,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-10 12:10:12,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.38 | bwd_microstep: 3154.74 | bwd_inner_microstep: 1697.12 | bwd_allreduce_microstep: 1457.57 | step_microstep: 37.68
[2024-06-10 12:10:12,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16023.93 | bwd: 44576.76 | bwd_inner: 43118.20 | bwd_allreduce: 1457.86 | step: 39.24
 38%|███▊      | 661/1726 [11:27:41<18:17:02, 61.81s/it]
 38%|███▊      | 662/1726 [11:28:41<18:09:35, 61.44s/it]
 38%|███▊      | 663/1726 [11:29:45<18:18:27, 62.00s/it]
 38%|███▊      | 664/1726 [11:30:46<18:15:20, 61.88s/it]
 39%|███▊      | 665/1726 [11:31:48<18:11:52, 61.75s/it]
 39%|███▊      | 666/1726 [11:32:49<18:06:34, 61.50s/it]
{'loss': 1.2778, 'learning_rate': 2.8127874492494013e-05, 'epoch': 0.39}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 12:10:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.20 | bwd_microstep: 1247.94 | bwd_inner_microstep: 1247.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4028
[2024-06-10 12:10:16,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.66 | bwd_microstep: 1609.21 | bwd_inner_microstep: 1609.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 12:10:18,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.42 | bwd_microstep: 1252.76 | bwd_inner_microstep: 1252.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 12:10:19,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.99 | bwd_microstep: 1383.16 | bwd_inner_microstep: 1383.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 12:10:22,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.44 | bwd_microstep: 1538.96 | bwd_inner_microstep: 1538.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3748
[2024-06-10 12:10:24,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1439.96 | bwd_inner_microstep: 1439.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 12:10:26,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.71 | bwd_microstep: 1631.58 | bwd_inner_microstep: 1631.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 12:10:27,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.39 | bwd_microstep: 789.62 | bwd_inner_microstep: 789.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3482
[2024-06-10 12:10:29,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.47 | bwd_microstep: 1215.70 | bwd_inner_microstep: 1215.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 12:10:31,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1383.26 | bwd_inner_microstep: 1383.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 12:10:32,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1249.56 | bwd_inner_microstep: 1249.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-10 12:10:34,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.91 | bwd_microstep: 1305.49 | bwd_inner_microstep: 1305.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 12:10:36,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1245.90 | bwd_inner_microstep: 1245.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 12:10:38,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.34 | bwd_microstep: 1476.60 | bwd_inner_microstep: 1476.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-10 12:10:40,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1524.30 | bwd_inner_microstep: 1524.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2432
[2024-06-10 12:10:41,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.56 | bwd_microstep: 988.81 | bwd_inner_microstep: 988.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 12:10:43,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.73 | bwd_microstep: 1299.66 | bwd_inner_microstep: 1299.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828
[2024-06-10 12:10:45,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.99 | bwd_microstep: 1388.03 | bwd_inner_microstep: 1388.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3659
[2024-06-10 12:10:47,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.91 | bwd_microstep: 1322.60 | bwd_inner_microstep: 1322.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1974
[2024-06-10 12:10:48,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.49 | bwd_microstep: 702.99 | bwd_inner_microstep: 702.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 12:10:50,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.93 | bwd_microstep: 1453.54 | bwd_inner_microstep: 1453.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 12:10:52,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.89 | bwd_microstep: 1653.43 | bwd_inner_microstep: 1653.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3535
[2024-06-10 12:10:54,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.67 | bwd_microstep: 1243.09 | bwd_inner_microstep: 1243.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517
[2024-06-10 12:10:56,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1577.44 | bwd_inner_microstep: 1577.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 12:10:58,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.96 | bwd_microstep: 1439.67 | bwd_inner_microstep: 1439.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 12:11:00,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1489.80 | bwd_inner_microstep: 1489.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3671
[2024-06-10 12:11:02,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1502.70 | bwd_inner_microstep: 1502.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3590
[2024-06-10 12:11:04,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.46 | bwd_microstep: 1240.90 | bwd_inner_microstep: 1240.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3456
[2024-06-10 12:11:06,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.19 | bwd_microstep: 1374.90 | bwd_inner_microstep: 1374.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3584
[2024-06-10 12:11:08,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.99 | bwd_microstep: 1693.34 | bwd_inner_microstep: 1693.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 12:11:10,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.75 | bwd_microstep: 1502.87 | bwd_inner_microstep: 1502.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 12:11:12,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.00 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 12:11:12,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.94 | bwd_microstep: 1587.12 | bwd_inner_microstep: 1579.47 | bwd_allreduce_microstep: 7.60 | step_microstep: 37.74
[2024-06-10 12:11:12,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16391.69 | bwd: 43754.88 | bwd_inner: 43746.39 | bwd_allreduce: 7.83 | step: 39.24
{'loss': 1.2925, 'learning_rate': 2.809356549090992e-05, 'epoch': 0.39}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3455
[2024-06-10 12:11:14,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.33 | bwd_microstep: 1547.90 | bwd_inner_microstep: 1547.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 12:11:16,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1346.40 | bwd_inner_microstep: 1346.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 12:11:18,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1245.65 | bwd_inner_microstep: 1245.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 12:11:20,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.02 | bwd_microstep: 1378.28 | bwd_inner_microstep: 1378.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 12:11:22,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.65 | bwd_microstep: 1644.08 | bwd_inner_microstep: 1644.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 12:11:24,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 12:11:26,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1284.68 | bwd_inner_microstep: 1284.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702
[2024-06-10 12:11:28,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.22 | bwd_microstep: 1625.29 | bwd_inner_microstep: 1625.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3425
[2024-06-10 12:11:30,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.79 | bwd_microstep: 1185.18 | bwd_inner_microstep: 1185.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 12:11:32,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.31 | bwd_microstep: 1420.83 | bwd_inner_microstep: 1420.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3673
[2024-06-10 12:11:34,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.79 | bwd_microstep: 1585.29 | bwd_inner_microstep: 1585.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3516
[2024-06-10 12:11:36,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 1443.70 | bwd_inner_microstep: 1443.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3394
[2024-06-10 12:11:38,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.60 | bwd_microstep: 1274.78 | bwd_inner_microstep: 1274.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 12:11:40,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.72 | bwd_microstep: 1443.58 | bwd_inner_microstep: 1443.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512
[2024-06-10 12:11:42,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.58 | bwd_microstep: 1431.25 | bwd_inner_microstep: 1431.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435
[2024-06-10 12:11:43,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.87 | bwd_microstep: 1153.09 | bwd_inner_microstep: 1153.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3650
[2024-06-10 12:11:45,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1321.41 | bwd_inner_microstep: 1321.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 12:11:47,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1284.62 | bwd_inner_microstep: 1284.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 12:11:49,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.42 | bwd_microstep: 1252.76 | bwd_inner_microstep: 1252.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-10 12:11:50,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.10 | bwd_microstep: 798.97 | bwd_inner_microstep: 798.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 12:11:52,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1457.02 | bwd_inner_microstep: 1456.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 12:11:53,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.78 | bwd_microstep: 696.76 | bwd_inner_microstep: 696.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 12:11:55,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.58 | bwd_microstep: 1521.05 | bwd_inner_microstep: 1521.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1899
[2024-06-10 12:11:56,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.63 | bwd_microstep: 747.29 | bwd_inner_microstep: 747.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 12:11:58,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1309.52 | bwd_inner_microstep: 1309.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-10 12:12:00,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.78 | bwd_microstep: 1356.26 | bwd_inner_microstep: 1356.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585
[2024-06-10 12:12:01,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.40 | bwd_microstep: 1303.61 | bwd_inner_microstep: 1303.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2278
[2024-06-10 12:12:03,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.63 | bwd_microstep: 940.47 | bwd_inner_microstep: 940.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 12:12:05,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.19 | bwd_microstep: 1655.05 | bwd_inner_microstep: 1655.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 12:12:07,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1556.64 | bwd_inner_microstep: 1556.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 12:12:09,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1399.75 | bwd_inner_microstep: 1399.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577
[2024-06-10 12:12:15,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 12:12:15,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.41 | bwd_microstep: 5314.92 | bwd_inner_microstep: 1606.36 | bwd_allreduce_microstep: 3708.51 | step_microstep: 37.89
[2024-06-10 12:12:15,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15886.69 | bwd: 46314.03 | bwd_inner: 42604.61 | bwd_allreduce: 3708.74 | step: 39.37
{'loss': 1.2752, 'learning_rate': 2.805922798383025e-05, 'epoch': 0.39}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4290
[2024-06-10 12:12:17,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.88 | bwd_microstep: 1663.07 | bwd_inner_microstep: 1663.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 12:12:19,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.18 | bwd_microstep: 1241.86 | bwd_inner_microstep: 1241.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 12:12:21,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.68 | bwd_microstep: 1337.18 | bwd_inner_microstep: 1337.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 12:12:22,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1241.24 | bwd_inner_microstep: 1241.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 12:12:24,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.59 | bwd_microstep: 1279.47 | bwd_inner_microstep: 1279.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3480
[2024-06-10 12:12:26,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.12 | bwd_microstep: 1218.66 | bwd_inner_microstep: 1218.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 12:12:28,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.62 | bwd_microstep: 1280.80 | bwd_inner_microstep: 1280.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 12:12:30,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.41 | bwd_microstep: 1282.91 | bwd_inner_microstep: 1282.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493
[2024-06-10 12:12:31,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.24 | bwd_microstep: 1187.65 | bwd_inner_microstep: 1187.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 874
[2024-06-10 12:12:32,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.15 | bwd_microstep: 363.92 | bwd_inner_microstep: 363.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 12:12:33,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.40 | bwd_microstep: 1279.55 | bwd_inner_microstep: 1279.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3529
[2024-06-10 12:12:35,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.46 | bwd_microstep: 1447.41 | bwd_inner_microstep: 1447.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 12:12:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1338.17 | bwd_inner_microstep: 1338.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2741
[2024-06-10 12:12:39,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.49 | bwd_microstep: 1136.62 | bwd_inner_microstep: 1136.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 12:12:41,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.81 | bwd_microstep: 1487.80 | bwd_inner_microstep: 1487.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 12:12:43,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.31 | bwd_microstep: 1390.92 | bwd_inner_microstep: 1390.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526
[2024-06-10 12:12:45,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.97 | bwd_microstep: 1325.79 | bwd_inner_microstep: 1325.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3610
[2024-06-10 12:12:47,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.56 | bwd_microstep: 1535.92 | bwd_inner_microstep: 1535.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 12:12:49,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.14 | bwd_microstep: 1291.20 | bwd_inner_microstep: 1291.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3503
[2024-06-10 12:12:51,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.78 | bwd_microstep: 1532.41 | bwd_inner_microstep: 1532.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 12:12:53,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.52 | bwd_microstep: 1457.65 | bwd_inner_microstep: 1457.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2272
[2024-06-10 12:12:54,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.35 | bwd_microstep: 973.19 | bwd_inner_microstep: 973.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3611
[2024-06-10 12:12:56,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.56 | bwd_microstep: 1654.89 | bwd_inner_microstep: 1654.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 12:12:58,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.90 | bwd_microstep: 1293.44 | bwd_inner_microstep: 1293.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2038
[2024-06-10 12:12:59,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.29 | bwd_microstep: 904.60 | bwd_inner_microstep: 904.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 12:13:01,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1508.94 | bwd_inner_microstep: 1508.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3761
[2024-06-10 12:13:03,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.98 | bwd_microstep: 1344.12 | bwd_inner_microstep: 1344.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 12:13:05,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1561.37 | bwd_inner_microstep: 1561.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2243
[2024-06-10 12:13:07,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.87 | bwd_microstep: 1064.91 | bwd_inner_microstep: 1064.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 12:13:09,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.51 | bwd_microstep: 1494.83 | bwd_inner_microstep: 1494.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544
[2024-06-10 12:13:11,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.68 | bwd_microstep: 1420.44 | bwd_inner_microstep: 1420.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3730
[2024-06-10 12:13:17,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.23 | optimizer_step: 6.63
[2024-06-10 12:13:17,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.55 | bwd_microstep: 5910.76 | bwd_inner_microstep: 1673.24 | bwd_allreduce_microstep: 4237.47 | step_microstep: 37.98
[2024-06-10 12:13:17,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15770.93 | bwd: 46451.71 | bwd_inner: 42213.35 | bwd_allreduce: 4237.70 | step: 39.36
{'loss': 1.2599, 'learning_rate': 2.8024862092191516e-05, 'epoch': 0.39}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 12:13:19,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1374.33 | bwd_inner_microstep: 1374.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1865
[2024-06-10 12:13:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.14 | bwd_microstep: 704.16 | bwd_inner_microstep: 704.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3969
[2024-06-10 12:13:23,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.18 | bwd_microstep: 1596.85 | bwd_inner_microstep: 1596.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 12:13:25,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.55 | bwd_microstep: 1487.47 | bwd_inner_microstep: 1487.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 12:13:27,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.05 | bwd_microstep: 1480.06 | bwd_inner_microstep: 1480.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 12:13:29,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1382.00 | bwd_inner_microstep: 1381.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 921
[2024-06-10 12:13:29,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.09 | bwd_microstep: 374.12 | bwd_inner_microstep: 374.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 12:13:31,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.15 | bwd_microstep: 1245.15 | bwd_inner_microstep: 1245.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 12:13:33,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.86 | bwd_microstep: 1487.29 | bwd_inner_microstep: 1487.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240
[2024-06-10 12:13:34,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.12 | bwd_microstep: 961.52 | bwd_inner_microstep: 961.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1910
[2024-06-10 12:13:35,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.83 | bwd_microstep: 716.02 | bwd_inner_microstep: 715.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 12:13:37,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1388.06 | bwd_inner_microstep: 1388.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2709
[2024-06-10 12:13:38,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.26 | bwd_microstep: 1001.80 | bwd_inner_microstep: 1001.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 12:13:40,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.21 | bwd_microstep: 1388.39 | bwd_inner_microstep: 1388.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2346
[2024-06-10 12:13:42,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.99 | bwd_microstep: 923.38 | bwd_inner_microstep: 923.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 12:13:44,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.02 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2442
[2024-06-10 12:13:45,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.84 | bwd_microstep: 947.67 | bwd_inner_microstep: 947.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3651
[2024-06-10 12:13:47,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1461.91 | bwd_inner_microstep: 1461.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3830
[2024-06-10 12:13:49,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.75 | bwd_microstep: 1581.32 | bwd_inner_microstep: 1581.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 12:13:51,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1343.02 | bwd_inner_microstep: 1342.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 12:13:53,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.01 | bwd_microstep: 1644.76 | bwd_inner_microstep: 1644.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715
[2024-06-10 12:13:55,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.91 | bwd_microstep: 1435.07 | bwd_inner_microstep: 1435.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3611
[2024-06-10 12:13:57,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.05 | bwd_microstep: 1440.44 | bwd_inner_microstep: 1440.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 12:13:59,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1555.59 | bwd_inner_microstep: 1555.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 12:14:01,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1397.64 | bwd_inner_microstep: 1397.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3593
[2024-06-10 12:14:04,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.20 | bwd_microstep: 1703.54 | bwd_inner_microstep: 1703.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-10 12:14:06,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.36 | bwd_microstep: 1611.88 | bwd_inner_microstep: 1611.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 12:14:08,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.32 | bwd_microstep: 1529.93 | bwd_inner_microstep: 1529.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 12:14:10,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.91 | bwd_microstep: 1351.49 | bwd_inner_microstep: 1351.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 12:14:12,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.55 | bwd_microstep: 1595.64 | bwd_inner_microstep: 1595.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 12:14:14,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.13 | bwd_microstep: 1548.14 | bwd_inner_microstep: 1548.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602
[2024-06-10 12:14:18,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 12:14:18,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.43 | bwd_microstep: 2745.10 | bwd_inner_microstep: 1640.09 | bwd_allreduce_microstep: 1104.97 | step_microstep: 37.76
[2024-06-10 12:14:18,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15870.68 | bwd: 43892.90 | bwd_inner: 42787.03 | bwd_allreduce: 1105.19 | step: 39.32
{'loss': 1.2378, 'learning_rate': 2.799046793703021e-05, 'epoch': 0.39}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3412
[2024-06-10 12:14:19,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.77 | bwd_microstep: 1209.28 | bwd_inner_microstep: 1209.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 12:14:21,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1509.65 | bwd_inner_microstep: 1509.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3853
[2024-06-10 12:14:23,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.64 | bwd_microstep: 1485.75 | bwd_inner_microstep: 1485.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 12:14:25,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.01 | bwd_microstep: 1149.62 | bwd_inner_microstep: 1149.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4274
[2024-06-10 12:14:27,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1468.20 | bwd_inner_microstep: 1468.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4138
[2024-06-10 12:14:29,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1541.25 | bwd_inner_microstep: 1541.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 12:14:31,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1295.79 | bwd_inner_microstep: 1295.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 12:14:33,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.19 | bwd_microstep: 1529.32 | bwd_inner_microstep: 1529.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2341
[2024-06-10 12:14:34,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.25 | bwd_microstep: 892.38 | bwd_inner_microstep: 892.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2899
[2024-06-10 12:14:36,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.01 | bwd_microstep: 996.71 | bwd_inner_microstep: 996.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2117
[2024-06-10 12:14:37,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.00 | bwd_microstep: 926.52 | bwd_inner_microstep: 926.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-10 12:14:39,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.28 | bwd_microstep: 1409.16 | bwd_inner_microstep: 1409.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 12:14:41,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1376.18 | bwd_inner_microstep: 1376.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3528
[2024-06-10 12:14:43,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1361.11 | bwd_inner_microstep: 1361.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3427
[2024-06-10 12:14:44,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.27 | bwd_microstep: 1313.24 | bwd_inner_microstep: 1313.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3443
[2024-06-10 12:14:46,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.67 | bwd_microstep: 1217.32 | bwd_inner_microstep: 1217.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3458
[2024-06-10 12:14:48,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1226.52 | bwd_inner_microstep: 1226.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 12:14:50,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.74 | bwd_microstep: 1656.52 | bwd_inner_microstep: 1656.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535
[2024-06-10 12:14:52,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1419.44 | bwd_inner_microstep: 1419.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2071
[2024-06-10 12:14:53,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.36 | bwd_microstep: 817.54 | bwd_inner_microstep: 817.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616
[2024-06-10 12:14:55,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.00 | bwd_microstep: 1343.24 | bwd_inner_microstep: 1343.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 12:14:57,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.80 | bwd_microstep: 1553.46 | bwd_inner_microstep: 1553.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 12:14:58,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.13 | bwd_microstep: 803.20 | bwd_inner_microstep: 803.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 12:15:00,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1398.35 | bwd_inner_microstep: 1398.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 589
[2024-06-10 12:15:01,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 101.73 | bwd_microstep: 257.55 | bwd_inner_microstep: 257.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3614
[2024-06-10 12:15:03,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1344.36 | bwd_inner_microstep: 1344.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 12:15:04,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.31 | bwd_microstep: 973.64 | bwd_inner_microstep: 973.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3545
[2024-06-10 12:15:06,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.28 | bwd_microstep: 1522.95 | bwd_inner_microstep: 1522.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3843
[2024-06-10 12:15:08,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.27 | bwd_microstep: 1523.00 | bwd_inner_microstep: 1522.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 12:15:10,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.87 | bwd_microstep: 1497.73 | bwd_inner_microstep: 1497.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 12:15:12,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.65 | bwd_microstep: 1551.57 | bwd_inner_microstep: 1551.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 12:15:17,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.65
[2024-06-10 12:15:17,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 4546.94 | bwd_inner_microstep: 1523.95 | bwd_allreduce_microstep: 3022.94 | step_microstep: 38.03
[2024-06-10 12:15:17,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15393.57 | bwd: 44117.48 | bwd_inner: 41093.64 | bwd_allreduce: 3023.17 | step: 39.51
{'loss': 1.2813, 'learning_rate': 2.7956045639482365e-05, 'epoch': 0.39}


 39%|███▊      | 666/1726 [11:32:49<18:06:34, 61.50s/it]
 39%|███▊      | 667/1726 [11:33:49<18:00:06, 61.20s/it]
 39%|███▊      | 668/1726 [11:34:52<18:06:09, 61.60s/it]
 39%|███▉      | 669/1726 [11:35:54<18:10:09, 61.88s/it]
 39%|███▉      | 670/1726 [11:36:54<17:59:43, 61.35s/it]
 39%|███▉      | 671/1726 [11:37:54<17:50:43, 60.89s/it]
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1861
[2024-06-10 12:15:18,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.19 | bwd_microstep: 697.78 | bwd_inner_microstep: 697.68 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3402
[2024-06-10 12:15:20,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.03 | bwd_microstep: 1176.50 | bwd_inner_microstep: 1176.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 12:15:22,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.74 | bwd_microstep: 1472.79 | bwd_inner_microstep: 1472.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 12:15:24,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.04 | bwd_microstep: 1496.92 | bwd_inner_microstep: 1496.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4227
[2024-06-10 12:15:26,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.27 | bwd_microstep: 1658.00 | bwd_inner_microstep: 1657.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2376
[2024-06-10 12:15:28,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.68 | bwd_microstep: 927.79 | bwd_inner_microstep: 927.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 12:15:29,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.35 | bwd_microstep: 1250.74 | bwd_inner_microstep: 1250.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 12:15:31,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.31 | bwd_microstep: 1289.48 | bwd_inner_microstep: 1289.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 12:15:32,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.94 | bwd_microstep: 787.65 | bwd_inner_microstep: 787.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-10 12:15:34,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.55 | bwd_microstep: 1151.70 | bwd_inner_microstep: 1151.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690
[2024-06-10 12:15:36,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.71 | bwd_microstep: 1325.21 | bwd_inner_microstep: 1325.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3497
[2024-06-10 12:15:37,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1245.76 | bwd_inner_microstep: 1245.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 12:15:39,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.25 | bwd_microstep: 1473.58 | bwd_inner_microstep: 1473.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3664
[2024-06-10 12:15:42,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.92 | bwd_microstep: 1816.16 | bwd_inner_microstep: 1816.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 12:15:44,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1384.91 | bwd_inner_microstep: 1384.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2988
[2024-06-10 12:15:45,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.44 | bwd_microstep: 1014.77 | bwd_inner_microstep: 1014.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3603
[2024-06-10 12:15:47,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.26 | bwd_microstep: 1429.37 | bwd_inner_microstep: 1429.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2690
[2024-06-10 12:15:49,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.44 | bwd_microstep: 935.93 | bwd_inner_microstep: 935.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3606
[2024-06-10 12:15:51,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.07 | bwd_microstep: 1704.86 | bwd_inner_microstep: 1704.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624
[2024-06-10 12:15:53,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.29 | bwd_microstep: 1537.14 | bwd_inner_microstep: 1537.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2282
[2024-06-10 12:15:54,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.31 | bwd_microstep: 1033.38 | bwd_inner_microstep: 1033.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2119
[2024-06-10 12:15:56,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.88 | bwd_microstep: 827.79 | bwd_inner_microstep: 827.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 12:15:58,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1556.33 | bwd_inner_microstep: 1556.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3530
[2024-06-10 12:15:59,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.06 | bwd_microstep: 1257.57 | bwd_inner_microstep: 1257.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2296
[2024-06-10 12:16:01,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.03 | bwd_microstep: 910.57 | bwd_inner_microstep: 910.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 12:16:03,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.06 | bwd_microstep: 1352.03 | bwd_inner_microstep: 1352.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3748
[2024-06-10 12:16:05,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.57 | bwd_microstep: 1443.13 | bwd_inner_microstep: 1443.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3716
[2024-06-10 12:16:06,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.40 | bwd_microstep: 1336.74 | bwd_inner_microstep: 1336.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1144
[2024-06-10 12:16:07,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 176.95 | bwd_microstep: 460.69 | bwd_inner_microstep: 460.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2234
[2024-06-10 12:16:09,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.40 | bwd_microstep: 1060.08 | bwd_inner_microstep: 1060.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803
[2024-06-10 12:16:11,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.32 | bwd_microstep: 1457.73 | bwd_inner_microstep: 1457.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 12:16:19,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.45 | optimizer_step: 6.59
[2024-06-10 12:16:19,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 8238.54 | bwd_inner_microstep: 1855.06 | bwd_allreduce_microstep: 6383.41 | step_microstep: 40.42
[2024-06-10 12:16:19,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15034.07 | bwd: 46711.66 | bwd_inner: 40327.23 | bwd_allreduce: 6383.69 | step: 41.90
{'loss': 1.2295, 'learning_rate': 2.792159532078314e-05, 'epoch': 0.39}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 12:16:21,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.34 | bwd_microstep: 1364.01 | bwd_inner_microstep: 1363.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3936
[2024-06-10 12:16:24,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.96 | bwd_microstep: 1688.87 | bwd_inner_microstep: 1688.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 12:16:26,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1393.90 | bwd_inner_microstep: 1393.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 12:16:28,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.24 | bwd_microstep: 1500.43 | bwd_inner_microstep: 1500.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3408
[2024-06-10 12:16:29,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.97 | bwd_microstep: 1294.80 | bwd_inner_microstep: 1294.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-10 12:16:31,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1308.55 | bwd_inner_microstep: 1308.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 12:16:33,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.31 | bwd_microstep: 1286.86 | bwd_inner_microstep: 1286.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1890
[2024-06-10 12:16:34,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.23 | bwd_microstep: 712.66 | bwd_inner_microstep: 712.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3636
[2024-06-10 12:16:36,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.77 | bwd_microstep: 1316.12 | bwd_inner_microstep: 1316.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-10 12:16:38,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1317.86 | bwd_inner_microstep: 1317.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 12:16:40,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1339.49 | bwd_inner_microstep: 1339.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 12:16:41,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.74 | bwd_microstep: 1339.11 | bwd_inner_microstep: 1339.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2043
[2024-06-10 12:16:42,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.87 | bwd_microstep: 717.26 | bwd_inner_microstep: 717.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499
[2024-06-10 12:16:45,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1578.56 | bwd_inner_microstep: 1578.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3657
[2024-06-10 12:16:46,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.51 | bwd_microstep: 1315.73 | bwd_inner_microstep: 1315.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 12:16:48,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.58 | bwd_microstep: 1343.09 | bwd_inner_microstep: 1343.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3645
[2024-06-10 12:16:50,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.90 | bwd_microstep: 1486.93 | bwd_inner_microstep: 1486.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3504
[2024-06-10 12:16:52,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.90 | bwd_microstep: 1551.44 | bwd_inner_microstep: 1551.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 12:16:54,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1507.76 | bwd_inner_microstep: 1507.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665
[2024-06-10 12:16:56,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.95 | bwd_microstep: 1324.19 | bwd_inner_microstep: 1324.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 12:16:58,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.46 | bwd_microstep: 1395.10 | bwd_inner_microstep: 1395.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 12:17:00,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1396.43 | bwd_inner_microstep: 1396.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-10 12:17:02,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.00 | bwd_microstep: 1556.11 | bwd_inner_microstep: 1556.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 12:17:04,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1507.09 | bwd_inner_microstep: 1507.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 12:17:06,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.69 | bwd_microstep: 1500.17 | bwd_inner_microstep: 1500.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2033
[2024-06-10 12:17:08,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.69 | bwd_microstep: 745.31 | bwd_inner_microstep: 745.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3623
[2024-06-10 12:17:10,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.57 | bwd_microstep: 1468.92 | bwd_inner_microstep: 1468.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2286
[2024-06-10 12:17:11,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.87 | bwd_microstep: 1071.20 | bwd_inner_microstep: 1071.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 12:17:13,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1376.80 | bwd_inner_microstep: 1376.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-10 12:17:15,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1446.55 | bwd_inner_microstep: 1446.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3759
[2024-06-10 12:17:17,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1404.75 | bwd_inner_microstep: 1404.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 12:17:22,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.24 | optimizer_step: 6.55
[2024-06-10 12:17:22,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.28 | bwd_microstep: 4083.45 | bwd_inner_microstep: 1809.42 | bwd_allreduce_microstep: 2273.96 | step_microstep: 38.34
[2024-06-10 12:17:22,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16160.02 | bwd: 45639.54 | bwd_inner: 43364.65 | bwd_allreduce: 2274.21 | step: 39.87
{'loss': 1.2368, 'learning_rate': 2.7887117102266373e-05, 'epoch': 0.39}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 12:17:23,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1360.38 | bwd_inner_microstep: 1360.29 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3907
[2024-06-10 12:17:26,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.01 | bwd_microstep: 1688.16 | bwd_inner_microstep: 1688.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 12:17:28,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.01 | bwd_microstep: 1244.71 | bwd_inner_microstep: 1244.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-10 12:17:30,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.40 | bwd_microstep: 1444.74 | bwd_inner_microstep: 1444.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 12:17:31,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.27 | bwd_microstep: 1276.54 | bwd_inner_microstep: 1276.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-10 12:17:33,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.73 | bwd_microstep: 1546.42 | bwd_inner_microstep: 1546.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3799
[2024-06-10 12:17:35,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.58 | bwd_microstep: 1508.20 | bwd_inner_microstep: 1508.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 12:17:37,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1382.95 | bwd_inner_microstep: 1382.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 12:17:39,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.95 | bwd_microstep: 1277.66 | bwd_inner_microstep: 1277.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065
[2024-06-10 12:17:40,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.89 | bwd_microstep: 816.68 | bwd_inner_microstep: 816.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484
[2024-06-10 12:17:42,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.30 | bwd_microstep: 1340.54 | bwd_inner_microstep: 1340.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3657
[2024-06-10 12:17:44,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.80 | bwd_microstep: 1449.20 | bwd_inner_microstep: 1449.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2947
[2024-06-10 12:17:46,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.48 | bwd_microstep: 1097.43 | bwd_inner_microstep: 1097.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3406
[2024-06-10 12:17:47,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.48 | bwd_microstep: 1210.94 | bwd_inner_microstep: 1210.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 12:17:49,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1487.13 | bwd_inner_microstep: 1487.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3636
[2024-06-10 12:17:52,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.71 | bwd_microstep: 1705.52 | bwd_inner_microstep: 1705.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 12:17:54,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.14 | bwd_microstep: 1641.25 | bwd_inner_microstep: 1641.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3831
[2024-06-10 12:17:56,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.28 | bwd_microstep: 1614.81 | bwd_inner_microstep: 1614.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954
[2024-06-10 12:17:57,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.87 | bwd_microstep: 730.33 | bwd_inner_microstep: 730.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 12:17:59,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.91 | bwd_microstep: 1277.82 | bwd_inner_microstep: 1277.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3880
[2024-06-10 12:18:01,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.42 | bwd_microstep: 1713.97 | bwd_inner_microstep: 1713.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 615
[2024-06-10 12:18:02,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.32 | bwd_microstep: 260.75 | bwd_inner_microstep: 260.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 12:18:04,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.57 | bwd_microstep: 1456.35 | bwd_inner_microstep: 1456.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1982
[2024-06-10 12:18:05,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.77 | bwd_microstep: 734.44 | bwd_inner_microstep: 734.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 12:18:07,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.33 | bwd_microstep: 1648.16 | bwd_inner_microstep: 1648.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3586
[2024-06-10 12:18:09,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.63 | bwd_microstep: 1667.20 | bwd_inner_microstep: 1667.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3581
[2024-06-10 12:18:11,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.71 | bwd_microstep: 1544.69 | bwd_inner_microstep: 1544.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2096
[2024-06-10 12:18:13,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.77 | bwd_microstep: 917.65 | bwd_inner_microstep: 917.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3766
[2024-06-10 12:18:15,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.47 | bwd_microstep: 1474.73 | bwd_inner_microstep: 1474.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3560
[2024-06-10 12:18:17,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.32 | bwd_microstep: 1331.19 | bwd_inner_microstep: 1331.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580
[2024-06-10 12:18:18,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.20 | bwd_microstep: 1307.06 | bwd_inner_microstep: 1307.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3566
[2024-06-10 12:18:24,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.21 | optimizer_step: 6.57
[2024-06-10 12:18:24,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.43 | bwd_microstep: 5019.53 | bwd_inner_microstep: 1859.77 | bwd_allreduce_microstep: 3159.71 | step_microstep: 40.51
[2024-06-10 12:18:24,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15949.51 | bwd: 46177.15 | bwd_inner: 43016.46 | bwd_allreduce: 3159.99 | step: 42.12
{'loss': 1.3422, 'learning_rate': 2.7852611105364175e-05, 'epoch': 0.39}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459
[2024-06-10 12:18:26,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.06 | bwd_microstep: 1568.90 | bwd_inner_microstep: 1568.83 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2397
[2024-06-10 12:18:28,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.12 | bwd_microstep: 1000.25 | bwd_inner_microstep: 1000.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 866
[2024-06-10 12:18:28,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.18 | bwd_microstep: 362.65 | bwd_inner_microstep: 362.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 12:18:30,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1375.23 | bwd_inner_microstep: 1375.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866
[2024-06-10 12:18:31,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.28 | bwd_microstep: 674.22 | bwd_inner_microstep: 674.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 12:18:33,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.90 | bwd_microstep: 1283.20 | bwd_inner_microstep: 1283.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 12:18:34,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1249.70 | bwd_inner_microstep: 1249.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 12:18:36,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1246.07 | bwd_inner_microstep: 1246.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 12:18:38,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1382.72 | bwd_inner_microstep: 1382.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 12:18:39,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.18 | bwd_microstep: 795.64 | bwd_inner_microstep: 795.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3724
[2024-06-10 12:18:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.93 | bwd_microstep: 1728.62 | bwd_inner_microstep: 1728.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3387
[2024-06-10 12:18:43,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.24 | bwd_microstep: 1335.95 | bwd_inner_microstep: 1335.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2124
[2024-06-10 12:18:45,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.99 | bwd_microstep: 926.52 | bwd_inner_microstep: 926.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1987
[2024-06-10 12:18:46,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.03 | bwd_microstep: 827.95 | bwd_inner_microstep: 827.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 12:18:47,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.20 | bwd_microstep: 799.05 | bwd_inner_microstep: 799.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3840
[2024-06-10 12:18:49,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.49 | bwd_microstep: 1758.61 | bwd_inner_microstep: 1758.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3652
[2024-06-10 12:18:51,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.09 | bwd_microstep: 1451.96 | bwd_inner_microstep: 1451.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452
[2024-06-10 12:18:53,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.75 | bwd_microstep: 1191.54 | bwd_inner_microstep: 1191.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 12:18:55,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.91 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2091
[2024-06-10 12:18:56,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.32 | bwd_microstep: 852.06 | bwd_inner_microstep: 852.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666
[2024-06-10 12:18:58,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.22 | bwd_microstep: 1325.95 | bwd_inner_microstep: 1325.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 12:19:00,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1404.84 | bwd_inner_microstep: 1404.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 12:19:02,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.82 | bwd_microstep: 1308.05 | bwd_inner_microstep: 1308.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 12:19:04,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.86 | bwd_microstep: 1402.82 | bwd_inner_microstep: 1402.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 12:19:06,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1374.54 | bwd_inner_microstep: 1374.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3830
[2024-06-10 12:19:08,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.38 | bwd_microstep: 1480.04 | bwd_inner_microstep: 1480.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-10 12:19:10,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.42 | bwd_microstep: 1634.80 | bwd_inner_microstep: 1634.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535
[2024-06-10 12:19:12,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.30 | bwd_microstep: 1415.85 | bwd_inner_microstep: 1415.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3594
[2024-06-10 12:19:14,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.12 | bwd_microstep: 1463.45 | bwd_inner_microstep: 1463.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3811
[2024-06-10 12:19:16,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.23 | bwd_microstep: 1722.20 | bwd_inner_microstep: 1722.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1998
[2024-06-10 12:19:17,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.25 | bwd_microstep: 896.99 | bwd_inner_microstep: 896.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3606
[2024-06-10 12:19:26,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 12:19:26,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.56 | bwd_microstep: 7800.40 | bwd_inner_microstep: 2044.92 | bwd_allreduce_microstep: 5755.42 | step_microstep: 38.00
[2024-06-10 12:19:26,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15120.90 | bwd: 46428.54 | bwd_inner: 40672.15 | bwd_allreduce: 5755.69 | step: 39.53
{'loss': 1.3037, 'learning_rate': 2.7818077451606486e-05, 'epoch': 0.39}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 12:19:27,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.15 | bwd_microstep: 790.14 | bwd_inner_microstep: 790.07 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3893
[2024-06-10 12:19:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.28 | bwd_microstep: 1579.32 | bwd_inner_microstep: 1579.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3833
[2024-06-10 12:19:31,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.38 | bwd_microstep: 1316.15 | bwd_inner_microstep: 1316.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-10 12:19:33,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.35 | bwd_microstep: 1182.76 | bwd_inner_microstep: 1182.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2284
[2024-06-10 12:19:34,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.76 | bwd_microstep: 872.84 | bwd_inner_microstep: 872.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 12:19:35,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.88 | bwd_microstep: 1151.25 | bwd_inner_microstep: 1151.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3704
[2024-06-10 12:19:38,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.65 | bwd_microstep: 1623.81 | bwd_inner_microstep: 1623.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1955
[2024-06-10 12:19:39,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.50 | bwd_microstep: 856.54 | bwd_inner_microstep: 856.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1903
[2024-06-10 12:19:40,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.90 | bwd_microstep: 712.84 | bwd_inner_microstep: 712.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-10 12:19:42,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.19 | bwd_microstep: 1617.53 | bwd_inner_microstep: 1617.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705
[2024-06-10 12:19:44,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.41 | bwd_microstep: 1624.69 | bwd_inner_microstep: 1624.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 12:19:46,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.20 | bwd_microstep: 1277.30 | bwd_inner_microstep: 1277.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 12:19:48,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.23 | bwd_microstep: 1390.19 | bwd_inner_microstep: 1390.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 12:19:50,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1392.13 | bwd_inner_microstep: 1392.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-10 12:19:52,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.69 | bwd_microstep: 1562.48 | bwd_inner_microstep: 1562.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 12:19:54,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1414.96 | bwd_inner_microstep: 1414.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-10 12:19:56,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.52 | bwd_microstep: 1610.52 | bwd_inner_microstep: 1610.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 12:19:58,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1510.70 | bwd_inner_microstep: 1510.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 12:20:01,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1554.43 | bwd_inner_microstep: 1554.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291
[2024-06-10 12:20:02,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.23 | bwd_microstep: 909.44 | bwd_inner_microstep: 909.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3827
[2024-06-10 12:20:04,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.21 | bwd_microstep: 1580.84 | bwd_inner_microstep: 1580.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3714
[2024-06-10 12:20:06,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.85 | bwd_microstep: 1363.68 | bwd_inner_microstep: 1363.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3685
[2024-06-10 12:20:08,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1489.35 | bwd_inner_microstep: 1489.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2013
[2024-06-10 12:20:09,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.93 | bwd_microstep: 867.45 | bwd_inner_microstep: 867.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 12:20:10,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.91 | bwd_microstep: 788.72 | bwd_inner_microstep: 788.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3685
[2024-06-10 12:20:12,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.31 | bwd_microstep: 1495.15 | bwd_inner_microstep: 1495.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 12:20:14,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.96 | bwd_microstep: 1450.02 | bwd_inner_microstep: 1449.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3432
[2024-06-10 12:20:16,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.62 | bwd_microstep: 1396.98 | bwd_inner_microstep: 1396.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3825
[2024-06-10 12:20:19,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.10 | bwd_microstep: 1754.70 | bwd_inner_microstep: 1754.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-10 12:20:21,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.37 | bwd_microstep: 1696.15 | bwd_inner_microstep: 1696.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 12:20:23,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.42 | bwd_microstep: 1481.34 | bwd_inner_microstep: 1481.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3556
[2024-06-10 12:20:26,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.18 | optimizer_step: 6.57
[2024-06-10 12:20:26,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.54 | bwd_microstep: 2911.22 | bwd_inner_microstep: 1795.09 | bwd_allreduce_microstep: 1116.08 | step_microstep: 37.82
[2024-06-10 12:20:26,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16003.14 | bwd: 44225.62 | bwd_inner: 43108.59 | bwd_allreduce: 1116.34 | step: 39.29
{'loss': 1.208, 'learning_rate': 2.7783516262620657e-05, 'epoch': 0.39}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 12:20:28,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.40 | bwd_microstep: 1368.80 | bwd_inner_microstep: 1368.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 12:20:30,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1384.52 | bwd_inner_microstep: 1384.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3879
[2024-06-10 12:20:33,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.04 | bwd_microstep: 1681.70 | bwd_inner_microstep: 1681.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2459
[2024-06-10 12:20:34,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.90 | bwd_microstep: 952.33 | bwd_inner_microstep: 952.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 12:20:36,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.22 | bwd_microstep: 1151.81 | bwd_inner_microstep: 1151.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 12:20:37,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.50 | bwd_microstep: 790.87 | bwd_inner_microstep: 790.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 12:20:39,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 12:20:41,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.52 | bwd_microstep: 1633.47 | bwd_inner_microstep: 1633.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-10 12:20:43,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.17 | bwd_microstep: 1526.88 | bwd_inner_microstep: 1526.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 12:20:45,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.47 | bwd_microstep: 1387.52 | bwd_inner_microstep: 1387.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 12:20:47,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.12 | bwd_microstep: 1288.17 | bwd_inner_microstep: 1288.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 12:20:48,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1252.70 | bwd_inner_microstep: 1252.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.62
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4026
[2024-06-10 12:20:50,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.97 | bwd_microstep: 1542.28 | bwd_inner_microstep: 1542.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963
[2024-06-10 12:20:52,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.51 | bwd_microstep: 822.00 | bwd_inner_microstep: 821.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2097
[2024-06-10 12:20:53,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.72 | bwd_microstep: 918.06 | bwd_inner_microstep: 918.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 12:20:55,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.85 | bwd_microstep: 1479.95 | bwd_inner_microstep: 1479.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636
[2024-06-10 12:20:57,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.70 | bwd_microstep: 1606.27 | bwd_inner_microstep: 1606.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 12:20:59,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1510.36 | bwd_inner_microstep: 1510.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 12:21:01,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.55 | bwd_microstep: 1555.52 | bwd_inner_microstep: 1555.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 12:21:03,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.93 | bwd_microstep: 1257.92 | bwd_inner_microstep: 1257.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 12:21:05,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1397.80 | bwd_inner_microstep: 1397.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 12:21:06,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.01 | bwd_microstep: 975.38 | bwd_inner_microstep: 975.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2922
[2024-06-10 12:21:08,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.00 | bwd_microstep: 1283.01 | bwd_inner_microstep: 1282.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 12:21:10,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.55 | bwd_microstep: 1254.81 | bwd_inner_microstep: 1254.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 12:21:12,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.52 | bwd_microstep: 1488.92 | bwd_inner_microstep: 1488.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 12:21:13,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.68 | bwd_microstep: 794.99 | bwd_inner_microstep: 794.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3612
[2024-06-10 12:21:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.73 | bwd_microstep: 1709.14 | bwd_inner_microstep: 1709.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 12:21:17,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.50 | bwd_microstep: 1403.36 | bwd_inner_microstep: 1403.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2673
[2024-06-10 12:21:19,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.18 | bwd_microstep: 959.43 | bwd_inner_microstep: 959.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 12:21:21,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1396.53 | bwd_inner_microstep: 1396.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2227
[2024-06-10 12:21:22,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.52 | bwd_microstep: 865.76 | bwd_inner_microstep: 865.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 12:21:27,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 12:21:27,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.09 | bwd_microstep: 4449.68 | bwd_inner_microstep: 1523.54 | bwd_allreduce_microstep: 2926.09 | step_microstep: 38.00
[2024-06-10 12:21:27,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15477.63 | bwd: 44476.78 | bwd_inner: 41549.78 | bwd_allreduce: 2926.32 | step: 41.09
/1726 [11:37:54<17:50:43, 60.89s/it]
 39%|███▉      | 672/1726 [11:38:56<17:55:53, 61.25s/it]
 39%|███▉      | 673/1726 [11:39:58<17:59:33, 61.51s/it]
 39%|███▉      | 674/1726 [11:41:01<18:03:31, 61.80s/it]
 39%|███▉      | 675/1726 [11:42:03<18:02:58, 61.83s/it]
 39%|███▉      | 676/1726 [11:43:03<17:55:17, 61.45s/it]
 39%|███▉      | 677/1726 [11:44:04<17:
{'loss': 1.2973, 'learning_rate': 2.7748927660131006e-05, 'epoch': 0.39}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457
[2024-06-10 12:21:29,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.44 | bwd_microstep: 1561.37 | bwd_inner_microstep: 1561.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 12:21:31,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.27 | bwd_microstep: 1339.08 | bwd_inner_microstep: 1339.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3882
[2024-06-10 12:21:33,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.57 | bwd_microstep: 1577.91 | bwd_inner_microstep: 1577.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793
[2024-06-10 12:21:35,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.92 | bwd_microstep: 1443.98 | bwd_inner_microstep: 1443.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 12:21:37,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1382.92 | bwd_inner_microstep: 1382.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 12:21:39,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.03 | bwd_microstep: 1282.58 | bwd_inner_microstep: 1282.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 12:21:40,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1282.46 | bwd_inner_microstep: 1282.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 12:21:42,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.57 | bwd_microstep: 794.83 | bwd_inner_microstep: 794.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2060
[2024-06-10 12:21:43,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.77 | bwd_microstep: 786.22 | bwd_inner_microstep: 786.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3560
[2024-06-10 12:21:44,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.64 | bwd_microstep: 1205.11 | bwd_inner_microstep: 1205.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2067
[2024-06-10 12:21:45,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 785.15 | bwd_inner_microstep: 785.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 12:21:47,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1381.35 | bwd_inner_microstep: 1381.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 12:21:49,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1512.98 | bwd_inner_microstep: 1512.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450
[2024-06-10 12:21:51,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1410.10 | bwd_inner_microstep: 1410.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 12:21:53,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.55 | bwd_microstep: 1406.24 | bwd_inner_microstep: 1406.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3501
[2024-06-10 12:21:55,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.16 | bwd_microstep: 1445.42 | bwd_inner_microstep: 1445.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 12:21:57,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.63 | bwd_microstep: 1487.99 | bwd_inner_microstep: 1487.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 12:21:59,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1509.75 | bwd_inner_microstep: 1509.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2406
[2024-06-10 12:22:01,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.41 | bwd_microstep: 963.27 | bwd_inner_microstep: 963.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 12:22:03,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.31 | bwd_microstep: 1380.21 | bwd_inner_microstep: 1380.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 12:22:05,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1395.58 | bwd_inner_microstep: 1395.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1912
[2024-06-10 12:22:06,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.01 | bwd_microstep: 718.12 | bwd_inner_microstep: 718.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1982
[2024-06-10 12:22:07,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.42 | bwd_microstep: 765.99 | bwd_inner_microstep: 765.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 12:22:08,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.75 | bwd_microstep: 1298.58 | bwd_inner_microstep: 1298.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-10 12:22:11,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.24 | bwd_microstep: 1576.98 | bwd_inner_microstep: 1576.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832
[2024-06-10 12:22:13,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.03 | bwd_microstep: 1388.41 | bwd_inner_microstep: 1388.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 12:22:14,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.67 | bwd_microstep: 1401.03 | bwd_inner_microstep: 1401.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 12:22:16,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.71 | bwd_microstep: 1256.54 | bwd_inner_microstep: 1256.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2245
[2024-06-10 12:22:18,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.56 | bwd_microstep: 997.11 | bwd_inner_microstep: 997.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 12:22:20,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1489.19 | bwd_inner_microstep: 1489.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 12:22:22,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1549.72 | bwd_inner_microstep: 1549.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576
[2024-06-10 12:22:29,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.35 | optimizer_step: 6.59
[2024-06-10 12:22:29,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.98 | bwd_microstep: 6240.06 | bwd_inner_microstep: 1499.67 | bwd_allreduce_microstep: 4740.32 | step_microstep: 39.01
[2024-06-10 12:22:29,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15459.24 | bwd: 46016.22 | bwd_inner: 41274.98 | bwd_allreduce: 4740.57 | step: 40.46
{'loss': 1.2812, 'learning_rate': 2.771431176595843e-05, 'epoch': 0.39}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 12:22:30,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.40 | bwd_microstep: 1245.00 | bwd_inner_microstep: 1244.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 12:22:32,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.85 | bwd_microstep: 1479.83 | bwd_inner_microstep: 1479.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 12:22:35,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1657.64 | bwd_inner_microstep: 1657.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904
[2024-06-10 12:22:36,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.55 | bwd_microstep: 684.82 | bwd_inner_microstep: 684.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 12:22:37,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.32 | bwd_microstep: 1243.36 | bwd_inner_microstep: 1243.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 12:22:39,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.73 | bwd_microstep: 1549.46 | bwd_inner_microstep: 1549.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-10 12:22:40,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.16 | bwd_microstep: 684.95 | bwd_inner_microstep: 684.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 12:22:42,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.89 | bwd_microstep: 1386.57 | bwd_inner_microstep: 1386.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 12:22:44,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1409.44 | bwd_inner_microstep: 1409.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975
[2024-06-10 12:22:45,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.54 | bwd_microstep: 829.58 | bwd_inner_microstep: 829.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 12:22:47,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.80 | bwd_microstep: 1381.02 | bwd_inner_microstep: 1380.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 12:22:49,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.23 | bwd_microstep: 1373.42 | bwd_inner_microstep: 1373.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3422
[2024-06-10 12:22:51,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1407.22 | bwd_inner_microstep: 1407.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660
[2024-06-10 12:22:54,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.73 | bwd_microstep: 1717.29 | bwd_inner_microstep: 1717.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3428
[2024-06-10 12:22:56,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.05 | bwd_microstep: 1475.74 | bwd_inner_microstep: 1475.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1985
[2024-06-10 12:22:57,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.06 | bwd_microstep: 833.69 | bwd_inner_microstep: 833.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 12:22:59,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.17 | bwd_microstep: 1388.26 | bwd_inner_microstep: 1388.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 635
[2024-06-10 12:22:59,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.05 | bwd_microstep: 265.65 | bwd_inner_microstep: 265.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2413
[2024-06-10 12:23:00,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.06 | bwd_microstep: 938.33 | bwd_inner_microstep: 938.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 12:23:01,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.49 | bwd_microstep: 813.99 | bwd_inner_microstep: 813.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 12:23:03,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1392.28 | bwd_inner_microstep: 1392.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182
[2024-06-10 12:23:05,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.48 | bwd_microstep: 857.29 | bwd_inner_microstep: 857.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 12:23:07,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1507.94 | bwd_inner_microstep: 1507.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 12:23:08,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.64 | bwd_microstep: 1300.52 | bwd_inner_microstep: 1300.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 12:23:10,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1375.64 | bwd_inner_microstep: 1375.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 12:23:12,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1507.35 | bwd_inner_microstep: 1507.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3822
[2024-06-10 12:23:15,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.88 | bwd_microstep: 1721.59 | bwd_inner_microstep: 1721.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 12:23:17,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1395.74 | bwd_inner_microstep: 1395.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2087
[2024-06-10 12:23:18,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.68 | bwd_microstep: 819.16 | bwd_inner_microstep: 819.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 12:23:20,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.94 | bwd_microstep: 1601.14 | bwd_inner_microstep: 1601.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 12:23:22,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1446.25 | bwd_inner_microstep: 1446.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2404
[2024-06-10 12:23:30,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.31 | optimizer_step: 6.62
[2024-06-10 12:23:30,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.25 | bwd_microstep: 8010.94 | bwd_inner_microstep: 1173.62 | bwd_allreduce_microstep: 6837.26 | step_microstep: 38.46
[2024-06-10 12:23:30,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14852.93 | bwd: 46701.10 | bwd_inner: 39862.93 | bwd_allreduce: 6837.49 | step: 39.88
{'loss': 1.2693, 'learning_rate': 2.767966870201991e-05, 'epoch': 0.39}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 12:23:32,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.21 | bwd_microstep: 1336.57 | bwd_inner_microstep: 1336.50 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851
[2024-06-10 12:23:35,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.10 | bwd_microstep: 1654.97 | bwd_inner_microstep: 1654.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2804
[2024-06-10 12:23:36,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.48 | bwd_microstep: 1106.16 | bwd_inner_microstep: 1106.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 12:23:38,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.04 | bwd_microstep: 1244.33 | bwd_inner_microstep: 1244.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279
[2024-06-10 12:23:39,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.03 | bwd_microstep: 872.28 | bwd_inner_microstep: 872.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 12:23:41,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.35 | bwd_microstep: 1274.85 | bwd_inner_microstep: 1274.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 12:23:43,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.12 | bwd_microstep: 1381.35 | bwd_inner_microstep: 1381.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736
[2024-06-10 12:23:45,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.11 | bwd_microstep: 1633.03 | bwd_inner_microstep: 1633.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3684
[2024-06-10 12:23:47,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.23 | bwd_microstep: 1581.07 | bwd_inner_microstep: 1581.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-10 12:23:48,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 713.18 | bwd_inner_microstep: 713.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705
[2024-06-10 12:23:50,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.78 | bwd_microstep: 1615.50 | bwd_inner_microstep: 1615.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542
[2024-06-10 12:23:52,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.24 | bwd_microstep: 1448.26 | bwd_inner_microstep: 1448.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3428
[2024-06-10 12:23:54,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1313.20 | bwd_inner_microstep: 1313.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3670
[2024-06-10 12:23:56,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.02 | bwd_microstep: 1689.26 | bwd_inner_microstep: 1689.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 12:23:58,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.76 | bwd_microstep: 1249.07 | bwd_inner_microstep: 1249.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2923
[2024-06-10 12:24:00,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.77 | bwd_microstep: 1186.90 | bwd_inner_microstep: 1186.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 12:24:02,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.00 | bwd_microstep: 1470.91 | bwd_inner_microstep: 1470.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 12:24:04,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1373.02 | bwd_inner_microstep: 1372.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626
[2024-06-10 12:24:06,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.37 | bwd_microstep: 1608.30 | bwd_inner_microstep: 1608.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 12:24:08,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1288.74 | bwd_inner_microstep: 1288.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 12:24:10,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.57 | bwd_microstep: 1416.36 | bwd_inner_microstep: 1416.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1978
[2024-06-10 12:24:11,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.69 | bwd_microstep: 736.85 | bwd_inner_microstep: 736.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 12:24:13,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.27 | bwd_microstep: 1348.57 | bwd_inner_microstep: 1348.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622
[2024-06-10 12:24:14,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.20 | bwd_microstep: 1344.24 | bwd_inner_microstep: 1344.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3679
[2024-06-10 12:24:16,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.86 | bwd_microstep: 1326.29 | bwd_inner_microstep: 1326.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 12:24:18,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.83 | bwd_microstep: 1403.92 | bwd_inner_microstep: 1403.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 12:24:20,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.70 | bwd_microstep: 976.13 | bwd_inner_microstep: 976.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3420
[2024-06-10 12:24:22,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1422.71 | bwd_inner_microstep: 1422.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-10 12:24:23,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.33 | bwd_microstep: 685.72 | bwd_inner_microstep: 685.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 12:24:25,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1514.12 | bwd_inner_microstep: 1514.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043
[2024-06-10 12:24:26,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.68 | bwd_microstep: 811.09 | bwd_inner_microstep: 811.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 12:24:31,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 12:24:31,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 4999.74 | bwd_inner_microstep: 1529.26 | bwd_allreduce_microstep: 3470.43 | step_microstep: 38.11
[2024-06-10 12:24:31,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15475.05 | bwd: 45026.72 | bwd_inner: 41555.34 | bwd_allreduce: 3470.68 | step: 39.61
{'loss': 1.2521, 'learning_rate': 2.7644998590328145e-05, 'epoch': 0.39}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 12:24:33,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.06 | bwd_microstep: 1378.99 | bwd_inner_microstep: 1378.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 12:24:35,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.19 | bwd_microstep: 1340.93 | bwd_inner_microstep: 1340.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867
[2024-06-10 12:24:37,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.84 | bwd_microstep: 1664.61 | bwd_inner_microstep: 1664.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4168
[2024-06-10 12:24:40,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.77 | bwd_microstep: 1578.39 | bwd_inner_microstep: 1578.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3988
[2024-06-10 12:24:42,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1503.11 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2465
[2024-06-10 12:24:43,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 857.32 | bwd_inner_microstep: 857.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938
[2024-06-10 12:24:44,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.59 | bwd_microstep: 697.72 | bwd_inner_microstep: 697.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 12:24:46,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.36 | bwd_microstep: 1425.90 | bwd_inner_microstep: 1425.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2645
[2024-06-10 12:24:47,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.51 | bwd_microstep: 1114.19 | bwd_inner_microstep: 1114.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534
[2024-06-10 12:24:49,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.30 | bwd_microstep: 1226.87 | bwd_inner_microstep: 1226.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 12:24:51,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.63 | bwd_microstep: 1482.10 | bwd_inner_microstep: 1482.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3150
[2024-06-10 12:24:53,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.46 | bwd_microstep: 1253.61 | bwd_inner_microstep: 1253.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3410
[2024-06-10 12:24:55,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.21 | bwd_microstep: 1503.15 | bwd_inner_microstep: 1503.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 12:24:57,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1243.57 | bwd_inner_microstep: 1243.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3400
[2024-06-10 12:24:58,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1294.81 | bwd_inner_microstep: 1294.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653
[2024-06-10 12:25:01,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.26 | bwd_microstep: 1610.00 | bwd_inner_microstep: 1609.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3539
[2024-06-10 12:25:03,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1447.37 | bwd_inner_microstep: 1447.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2232
[2024-06-10 12:25:04,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.76 | bwd_microstep: 898.31 | bwd_inner_microstep: 898.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3457
[2024-06-10 12:25:06,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.72 | bwd_microstep: 1241.67 | bwd_inner_microstep: 1241.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 12:25:08,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1490.09 | bwd_inner_microstep: 1490.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859
[2024-06-10 12:25:10,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.30 | bwd_microstep: 1563.73 | bwd_inner_microstep: 1563.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1973
[2024-06-10 12:25:11,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.89 | bwd_microstep: 828.34 | bwd_inner_microstep: 828.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1477
[2024-06-10 12:25:12,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 221.21 | bwd_microstep: 578.57 | bwd_inner_microstep: 578.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000
[2024-06-10 12:25:13,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.62 | bwd_microstep: 800.47 | bwd_inner_microstep: 800.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 12:25:15,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.44 | bwd_microstep: 1274.29 | bwd_inner_microstep: 1274.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 12:25:17,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1495.79 | bwd_inner_microstep: 1495.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490
[2024-06-10 12:25:19,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.85 | bwd_microstep: 1348.70 | bwd_inner_microstep: 1348.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3440
[2024-06-10 12:25:20,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.82 | bwd_microstep: 1185.90 | bwd_inner_microstep: 1185.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 12:25:22,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1400.65 | bwd_inner_microstep: 1400.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 12:25:24,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.12 | bwd_microstep: 1591.82 | bwd_inner_microstep: 1591.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3590
[2024-06-10 12:25:27,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.52 | bwd_microstep: 1675.81 | bwd_inner_microstep: 1675.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2230
[2024-06-10 12:25:33,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 12:25:33,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.79 | bwd_microstep: 6180.97 | bwd_inner_microstep: 1200.60 | bwd_allreduce_microstep: 4980.31 | step_microstep: 38.03
[2024-06-10 12:25:33,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15402.15 | bwd: 46177.77 | bwd_inner: 41196.54 | bwd_allreduce: 4980.55 | step: 39.48
{'loss': 1.254, 'learning_rate': 2.7610301552991092e-05, 'epoch': 0.39}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3412
[2024-06-10 12:25:35,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1396.70 | bwd_inner_microstep: 1396.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3954
[2024-06-10 12:25:37,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.55 | bwd_microstep: 1694.51 | bwd_inner_microstep: 1694.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1894
[2024-06-10 12:25:39,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.16 | bwd_microstep: 775.94 | bwd_inner_microstep: 775.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 12:25:40,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1376.11 | bwd_inner_microstep: 1376.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3417
[2024-06-10 12:25:42,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.40 | bwd_microstep: 1199.91 | bwd_inner_microstep: 1199.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 12:25:44,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.04 | bwd_microstep: 1292.64 | bwd_inner_microstep: 1292.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 12:25:46,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.74 | bwd_microstep: 1384.96 | bwd_inner_microstep: 1384.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 12:25:48,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.01 | bwd_microstep: 1388.90 | bwd_inner_microstep: 1388.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 12:25:50,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1411.68 | bwd_inner_microstep: 1411.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 12:25:51,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.65 | bwd_microstep: 1252.28 | bwd_inner_microstep: 1252.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-10 12:25:54,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.29 | bwd_microstep: 1614.14 | bwd_inner_microstep: 1614.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-10 12:25:56,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.46 | bwd_microstep: 1615.39 | bwd_inner_microstep: 1615.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-10 12:25:58,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.71 | bwd_microstep: 1585.03 | bwd_inner_microstep: 1585.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3038
[2024-06-10 12:26:00,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.43 | bwd_microstep: 1134.31 | bwd_inner_microstep: 1134.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2422
[2024-06-10 12:26:01,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.59 | bwd_microstep: 940.13 | bwd_inner_microstep: 940.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657
[2024-06-10 12:26:03,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.36 | bwd_microstep: 1417.18 | bwd_inner_microstep: 1417.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3840
[2024-06-10 12:26:05,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1557.49 | bwd_inner_microstep: 1557.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2693
[2024-06-10 12:26:06,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.45 | bwd_microstep: 1032.68 | bwd_inner_microstep: 1032.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 12:26:09,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.45 | bwd_microstep: 1553.03 | bwd_inner_microstep: 1553.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 12:26:10,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1279.02 | bwd_inner_microstep: 1279.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 12:26:12,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1377.21 | bwd_inner_microstep: 1377.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 12:26:14,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1495.26 | bwd_inner_microstep: 1495.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2017
[2024-06-10 12:26:15,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.74 | bwd_microstep: 834.65 | bwd_inner_microstep: 834.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 12:26:17,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.45 | bwd_microstep: 1296.22 | bwd_inner_microstep: 1296.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2031
[2024-06-10 12:26:18,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.59 | bwd_microstep: 841.20 | bwd_inner_microstep: 841.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822
[2024-06-10 12:26:20,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.34 | bwd_microstep: 1482.07 | bwd_inner_microstep: 1482.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-10 12:26:22,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.22 | bwd_microstep: 805.73 | bwd_inner_microstep: 805.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 12:26:24,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.87 | bwd_microstep: 1650.34 | bwd_inner_microstep: 1650.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 12:26:26,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.06 | bwd_microstep: 1537.44 | bwd_inner_microstep: 1537.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2043
[2024-06-10 12:26:27,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.80 | bwd_microstep: 904.39 | bwd_inner_microstep: 904.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 12:26:29,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1349.23 | bwd_inner_microstep: 1349.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 12:26:34,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 12:26:34,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 4641.25 | bwd_inner_microstep: 1584.55 | bwd_allreduce_microstep: 3056.65 | step_microstep: 38.22
[2024-06-10 12:26:34,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15673.76 | bwd: 45117.05 | bwd_inner: 42059.47 | bwd_allreduce: 3056.89 | step: 39.73
{'loss': 1.2589, 'learning_rate': 2.7575577712211524e-05, 'epoch': 0.4}
 39%|███▉      | 677/1726 [11:44:04<17:48:10, 61.10s/it]
 39%|███▉      | 678/1726 [11:45:05<17:50:50, 61.31s/it]
 39%|███▉      | 679/1726 [11:46:07<17:52:49, 61.48s/it]
 39%|███▉      | 680/1726 [11:47:08<17:48:24, 61.29s/it]
 39%|███▉      | 681/1726 [11:48:10<17:50:37, 61.47s/it]
 40%|███▉      | 682/1726 [11:49:11<17:47:47, 61.37s/it]
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3522
[2024-06-10 12:26:36,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.05 | bwd_microstep: 1429.62 | bwd_inner_microstep: 1429.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3963
[2024-06-10 12:26:38,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.24 | bwd_microstep: 1592.10 | bwd_inner_microstep: 1592.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2314
[2024-06-10 12:26:40,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.72 | bwd_microstep: 976.69 | bwd_inner_microstep: 976.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 12:26:42,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.22 | bwd_microstep: 1276.42 | bwd_inner_microstep: 1276.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3862
[2024-06-10 12:26:44,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.60 | bwd_microstep: 1662.19 | bwd_inner_microstep: 1662.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3868
[2024-06-10 12:26:46,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1365.60 | bwd_inner_microstep: 1365.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 12:26:48,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1255.40 | bwd_inner_microstep: 1255.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 12:26:50,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.21 | bwd_microstep: 1531.91 | bwd_inner_microstep: 1531.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3761
[2024-06-10 12:26:52,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1501.23 | bwd_inner_microstep: 1501.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1882
[2024-06-10 12:26:53,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.21 | bwd_microstep: 743.67 | bwd_inner_microstep: 743.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1898
[2024-06-10 12:26:54,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.84 | bwd_microstep: 775.58 | bwd_inner_microstep: 775.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3471
[2024-06-10 12:26:56,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.37 | bwd_microstep: 1230.86 | bwd_inner_microstep: 1230.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 12:26:57,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1394.57 | bwd_inner_microstep: 1394.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 12:27:00,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.14 | bwd_microstep: 1500.06 | bwd_inner_microstep: 1500.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 12:27:01,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.42 | bwd_microstep: 1248.43 | bwd_inner_microstep: 1248.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3649
[2024-06-10 12:27:03,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.74 | bwd_microstep: 1615.59 | bwd_inner_microstep: 1615.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1987
[2024-06-10 12:27:05,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.37 | bwd_microstep: 844.91 | bwd_inner_microstep: 844.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-10 12:27:07,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.55 | bwd_microstep: 1625.53 | bwd_inner_microstep: 1625.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 12:27:09,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.39 | bwd_microstep: 1382.83 | bwd_inner_microstep: 1382.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 12:27:11,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1493.50 | bwd_inner_microstep: 1493.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3652
[2024-06-10 12:27:13,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.03 | bwd_microstep: 1323.85 | bwd_inner_microstep: 1323.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2079
[2024-06-10 12:27:14,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.36 | bwd_microstep: 819.10 | bwd_inner_microstep: 819.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 12:27:16,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.98 | bwd_microstep: 1528.93 | bwd_inner_microstep: 1528.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3537
[2024-06-10 12:27:18,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.66 | bwd_microstep: 1441.51 | bwd_inner_microstep: 1441.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 12:27:20,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.24 | bwd_microstep: 1252.72 | bwd_inner_microstep: 1252.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432
[2024-06-10 12:27:21,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.70 | bwd_microstep: 1155.05 | bwd_inner_microstep: 1155.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-10 12:27:23,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.38 | bwd_microstep: 1311.46 | bwd_inner_microstep: 1311.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3819
[2024-06-10 12:27:25,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.89 | bwd_microstep: 1578.23 | bwd_inner_microstep: 1578.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3812
[2024-06-10 12:27:27,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.06 | bwd_microstep: 1599.97 | bwd_inner_microstep: 1599.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2954
[2024-06-10 12:27:29,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.91 | bwd_microstep: 1101.89 | bwd_inner_microstep: 1101.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219
[2024-06-10 12:27:30,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.35 | bwd_microstep: 957.05 | bwd_inner_microstep: 957.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604
[2024-06-10 12:27:34,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 12:27:34,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.43 | bwd_microstep: 2932.15 | bwd_inner_microstep: 2107.17 | bwd_allreduce_microstep: 824.93 | step_microstep: 37.66
[2024-06-10 12:27:34,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15776.78 | bwd: 43448.62 | bwd_inner: 42622.79 | bwd_allreduce: 825.16 | step: 39.10
{'loss': 1.2886, 'learning_rate': 2.754082719028664e-05, 'epoch': 0.4}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2012
[2024-06-10 12:27:35,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.82 | bwd_microstep: 891.43 | bwd_inner_microstep: 891.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 12:27:37,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.10 | bwd_microstep: 1341.99 | bwd_inner_microstep: 1341.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 12:27:39,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1550.35 | bwd_inner_microstep: 1550.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3843
[2024-06-10 12:27:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.68 | bwd_microstep: 1586.99 | bwd_inner_microstep: 1586.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 12:27:43,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1381.32 | bwd_inner_microstep: 1381.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2277
[2024-06-10 12:27:44,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.23 | bwd_microstep: 935.46 | bwd_inner_microstep: 935.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2212
[2024-06-10 12:27:46,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.48 | bwd_microstep: 809.43 | bwd_inner_microstep: 809.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 12:27:47,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.48 | bwd_microstep: 1150.01 | bwd_inner_microstep: 1149.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 12:27:49,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.76 | bwd_microstep: 1388.80 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3647
[2024-06-10 12:27:51,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.97 | bwd_microstep: 1219.76 | bwd_inner_microstep: 1219.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 12:27:53,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.25 | bwd_microstep: 1388.43 | bwd_inner_microstep: 1388.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3672
[2024-06-10 12:27:55,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.72 | bwd_microstep: 1546.59 | bwd_inner_microstep: 1546.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1923
[2024-06-10 12:27:56,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.98 | bwd_microstep: 726.59 | bwd_inner_microstep: 726.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-10 12:27:58,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.32 | bwd_microstep: 1579.09 | bwd_inner_microstep: 1579.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651
[2024-06-10 12:28:00,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1605.92 | bwd_inner_microstep: 1605.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2528
[2024-06-10 12:28:02,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.99 | bwd_microstep: 1151.01 | bwd_inner_microstep: 1150.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3432
[2024-06-10 12:28:04,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1375.88 | bwd_inner_microstep: 1375.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 12:28:06,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 1281.43 | bwd_inner_microstep: 1281.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3830
[2024-06-10 12:28:07,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.44 | bwd_microstep: 1417.11 | bwd_inner_microstep: 1417.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 12:28:10,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.19 | bwd_microstep: 1530.27 | bwd_inner_microstep: 1530.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 987
[2024-06-10 12:28:10,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 151.40 | bwd_microstep: 390.51 | bwd_inner_microstep: 390.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-10 12:28:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.28 | bwd_microstep: 1605.61 | bwd_inner_microstep: 1605.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 12:28:14,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.38 | bwd_microstep: 1404.50 | bwd_inner_microstep: 1404.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 12:28:16,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.07 | bwd_microstep: 1377.87 | bwd_inner_microstep: 1377.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602
[2024-06-10 12:28:18,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1438.26 | bwd_inner_microstep: 1438.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-10 12:28:20,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.35 | bwd_microstep: 1624.40 | bwd_inner_microstep: 1624.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 12:28:22,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.97 | bwd_microstep: 1254.45 | bwd_inner_microstep: 1254.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3816
[2024-06-10 12:28:24,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.63 | bwd_microstep: 1597.37 | bwd_inner_microstep: 1597.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-10 12:28:26,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.08 | bwd_microstep: 1504.89 | bwd_inner_microstep: 1504.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 12:28:28,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.29 | bwd_microstep: 808.09 | bwd_inner_microstep: 808.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723
[2024-06-10 12:28:30,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.20 | bwd_microstep: 1622.75 | bwd_inner_microstep: 1622.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3558
[2024-06-10 12:28:36,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.18 | optimizer_step: 6.58
[2024-06-10 12:28:36,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.18 | bwd_microstep: 5543.82 | bwd_inner_microstep: 1755.10 | bwd_allreduce_microstep: 3788.67 | step_microstep: 37.87
[2024-06-10 12:28:36,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15696.99 | bwd: 46030.41 | bwd_inner: 42240.82 | bwd_allreduce: 3788.90 | step: 39.33
{'loss': 1.3169, 'learning_rate': 2.750605010960759e-05, 'epoch': 0.4}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 12:28:38,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.77 | bwd_microstep: 1434.71 | bwd_inner_microstep: 1434.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4007
[2024-06-10 12:28:40,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.38 | bwd_microstep: 1505.15 | bwd_inner_microstep: 1505.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 12:28:42,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.21 | bwd_microstep: 1655.96 | bwd_inner_microstep: 1655.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3407
[2024-06-10 12:28:44,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.15 | bwd_microstep: 1440.69 | bwd_inner_microstep: 1440.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 12:28:46,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.52 | bwd_microstep: 1149.19 | bwd_inner_microstep: 1149.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2217
[2024-06-10 12:28:47,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.70 | bwd_microstep: 956.00 | bwd_inner_microstep: 955.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 12:28:48,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.67 | bwd_microstep: 787.07 | bwd_inner_microstep: 787.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 12:28:50,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1385.07 | bwd_inner_microstep: 1385.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-10 12:28:52,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1527.43 | bwd_inner_microstep: 1527.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2902
[2024-06-10 12:28:54,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.38 | bwd_microstep: 1090.95 | bwd_inner_microstep: 1090.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 12:28:56,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.74 | bwd_microstep: 1628.08 | bwd_inner_microstep: 1628.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3502
[2024-06-10 12:28:58,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1347.20 | bwd_inner_microstep: 1347.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2097
[2024-06-10 12:28:59,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.16 | bwd_microstep: 858.53 | bwd_inner_microstep: 858.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3917
[2024-06-10 12:29:01,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.90 | bwd_microstep: 1684.89 | bwd_inner_microstep: 1684.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 12:29:03,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1451.01 | bwd_inner_microstep: 1450.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2605
[2024-06-10 12:29:05,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.68 | bwd_microstep: 1124.22 | bwd_inner_microstep: 1124.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 12:29:07,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.72 | bwd_microstep: 1159.74 | bwd_inner_microstep: 1159.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 12:29:09,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.31 | bwd_microstep: 1493.26 | bwd_inner_microstep: 1493.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 12:29:10,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.27 | bwd_microstep: 1282.79 | bwd_inner_microstep: 1282.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1939
[2024-06-10 12:29:12,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.17 | bwd_microstep: 886.65 | bwd_inner_microstep: 886.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-10 12:29:14,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.51 | bwd_microstep: 1514.54 | bwd_inner_microstep: 1514.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 12:29:16,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.16 | bwd_microstep: 1459.79 | bwd_inner_microstep: 1459.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 12:29:18,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.03 | bwd_microstep: 1539.62 | bwd_inner_microstep: 1539.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 12:29:20,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1394.81 | bwd_inner_microstep: 1394.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3671
[2024-06-10 12:29:22,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.92 | bwd_microstep: 1551.08 | bwd_inner_microstep: 1551.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-10 12:29:24,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.91 | bwd_microstep: 1433.38 | bwd_inner_microstep: 1433.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2395
[2024-06-10 12:29:25,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.92 | bwd_microstep: 1005.48 | bwd_inner_microstep: 1005.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510
[2024-06-10 12:29:27,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.71 | bwd_microstep: 1418.02 | bwd_inner_microstep: 1417.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3604
[2024-06-10 12:29:29,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.62 | bwd_microstep: 1451.68 | bwd_inner_microstep: 1451.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2229
[2024-06-10 12:29:30,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.54 | bwd_microstep: 769.57 | bwd_inner_microstep: 769.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568
[2024-06-10 12:29:33,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.05 | bwd_microstep: 1597.55 | bwd_inner_microstep: 1597.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 12:29:36,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 12:29:36,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 3081.85 | bwd_inner_microstep: 1528.02 | bwd_allreduce_microstep: 1553.78 | step_microstep: 39.77
[2024-06-10 12:29:36,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15815.83 | bwd: 44066.00 | bwd_inner: 42511.32 | bwd_allreduce: 1554.01 | step: 41.25
{'loss': 1.2293, 'learning_rate': 2.7471246592659075e-05, 'epoch': 0.4}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3463
[2024-06-10 12:29:38,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.28 | bwd_microstep: 1562.45 | bwd_inner_microstep: 1562.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3888
[2024-06-10 12:29:41,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.60 | bwd_microstep: 1721.02 | bwd_inner_microstep: 1720.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 12:29:43,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.60 | bwd_microstep: 1655.64 | bwd_inner_microstep: 1655.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 12:29:45,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.60 | bwd_microstep: 1549.61 | bwd_inner_microstep: 1549.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3762
[2024-06-10 12:29:47,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.09 | bwd_microstep: 1339.82 | bwd_inner_microstep: 1339.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 12:29:49,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1247.95 | bwd_inner_microstep: 1247.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 12:29:50,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.12 | bwd_microstep: 1150.62 | bwd_inner_microstep: 1150.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 12:29:52,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.31 | bwd_microstep: 1383.82 | bwd_inner_microstep: 1383.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 12:29:54,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.37 | bwd_microstep: 1152.00 | bwd_inner_microstep: 1151.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 12:29:56,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.83 | bwd_microstep: 1299.17 | bwd_inner_microstep: 1299.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 12:29:57,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.73 | bwd_microstep: 1252.80 | bwd_inner_microstep: 1252.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 12:30:00,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.09 | bwd_microstep: 1618.77 | bwd_inner_microstep: 1618.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 12:30:01,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1383.16 | bwd_inner_microstep: 1383.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 12:30:04,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.67 | bwd_microstep: 1500.32 | bwd_inner_microstep: 1500.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 12:30:05,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1374.33 | bwd_inner_microstep: 1374.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 12:30:08,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.83 | bwd_microstep: 1609.65 | bwd_inner_microstep: 1609.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 12:30:10,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.13 | bwd_microstep: 1647.35 | bwd_inner_microstep: 1647.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520
[2024-06-10 12:30:12,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1412.89 | bwd_inner_microstep: 1412.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3837
[2024-06-10 12:30:14,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.25 | bwd_microstep: 1689.67 | bwd_inner_microstep: 1689.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 12:30:16,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.30 | bwd_microstep: 1656.49 | bwd_inner_microstep: 1656.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 12:30:18,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1289.71 | bwd_inner_microstep: 1289.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-10 12:30:20,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1608.76 | bwd_inner_microstep: 1608.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 12:30:22,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.37 | bwd_microstep: 1457.17 | bwd_inner_microstep: 1457.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 12:30:24,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.74 | bwd_microstep: 1298.45 | bwd_inner_microstep: 1298.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-10 12:30:26,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.36 | bwd_microstep: 1300.01 | bwd_inner_microstep: 1299.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3648
[2024-06-10 12:30:28,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.56 | bwd_microstep: 1445.12 | bwd_inner_microstep: 1445.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 12:30:30,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1282.53 | bwd_inner_microstep: 1282.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2281
[2024-06-10 12:30:31,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.98 | bwd_microstep: 791.69 | bwd_inner_microstep: 791.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032
[2024-06-10 12:30:32,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.73 | bwd_microstep: 813.19 | bwd_inner_microstep: 813.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 12:30:34,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1396.00 | bwd_inner_microstep: 1395.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 12:30:36,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1411.35 | bwd_inner_microstep: 1411.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3578
[2024-06-10 12:30:38,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.16 | optimizer_step: 6.64
[2024-06-10 12:30:38,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.56 | bwd_microstep: 1405.72 | bwd_inner_microstep: 1398.11 | bwd_allreduce_microstep: 7.57 | step_microstep: 37.66
[2024-06-10 12:30:38,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16717.23 | bwd: 44707.21 | bwd_inner: 44698.74 | bwd_allreduce: 7.79 | step: 39.18
{'loss': 1.2716, 'learning_rate': 2.74364167620189e-05, 'epoch': 0.4}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2382
[2024-06-10 12:30:39,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.49 | bwd_microstep: 1027.10 | bwd_inner_microstep: 1027.00 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1946
[2024-06-10 12:30:40,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.72 | bwd_microstep: 824.89 | bwd_inner_microstep: 824.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 12:30:42,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.54 | bwd_microstep: 1356.18 | bwd_inner_microstep: 1356.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 12:30:45,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.96 | bwd_microstep: 1648.83 | bwd_inner_microstep: 1648.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872
[2024-06-10 12:30:46,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.16 | bwd_microstep: 678.86 | bwd_inner_microstep: 678.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 12:30:47,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1391.58 | bwd_inner_microstep: 1391.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 12:30:50,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.97 | bwd_microstep: 1541.03 | bwd_inner_microstep: 1541.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-10 12:30:51,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.37 | bwd_microstep: 685.27 | bwd_inner_microstep: 685.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 12:30:52,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.17 | bwd_microstep: 804.84 | bwd_inner_microstep: 804.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 12:30:53,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1289.78 | bwd_inner_microstep: 1289.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3894
[2024-06-10 12:30:56,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.98 | bwd_microstep: 1787.81 | bwd_inner_microstep: 1787.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 12:30:58,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1384.86 | bwd_inner_microstep: 1384.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3717
[2024-06-10 12:31:00,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.45 | bwd_microstep: 1394.24 | bwd_inner_microstep: 1394.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3490
[2024-06-10 12:31:02,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.01 | bwd_microstep: 1548.49 | bwd_inner_microstep: 1548.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 12:31:04,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1407.77 | bwd_inner_microstep: 1407.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 12:31:06,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1402.64 | bwd_inner_microstep: 1402.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3002
[2024-06-10 12:31:07,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.75 | bwd_microstep: 1204.81 | bwd_inner_microstep: 1204.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466
[2024-06-10 12:31:09,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.47 | bwd_microstep: 1406.73 | bwd_inner_microstep: 1406.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662
[2024-06-10 12:31:11,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.09 | bwd_microstep: 1418.73 | bwd_inner_microstep: 1418.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 12:31:14,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.80 | bwd_microstep: 1648.96 | bwd_inner_microstep: 1648.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611
[2024-06-10 12:31:16,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1602.54 | bwd_inner_microstep: 1602.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 12:31:18,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1391.22 | bwd_inner_microstep: 1391.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 12:31:20,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.87 | bwd_microstep: 1657.82 | bwd_inner_microstep: 1657.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3717
[2024-06-10 12:31:22,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.42 | bwd_microstep: 1337.95 | bwd_inner_microstep: 1337.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 12:31:24,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1258.46 | bwd_inner_microstep: 1258.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2143
[2024-06-10 12:31:25,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.98 | bwd_microstep: 865.82 | bwd_inner_microstep: 865.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 12:31:27,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 1557.73 | bwd_inner_microstep: 1557.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2278
[2024-06-10 12:31:28,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.89 | bwd_microstep: 937.63 | bwd_inner_microstep: 937.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3632
[2024-06-10 12:31:30,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.69 | bwd_microstep: 1474.27 | bwd_inner_microstep: 1474.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235
[2024-06-10 12:31:32,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.35 | bwd_microstep: 962.79 | bwd_inner_microstep: 962.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3576
[2024-06-10 12:31:34,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.95 | bwd_microstep: 1550.19 | bwd_inner_microstep: 1550.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804
[2024-06-10 12:31:39,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.30 | optimizer_step: 6.62
[2024-06-10 12:31:39,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.64 | bwd_microstep: 4793.76 | bwd_inner_microstep: 1523.46 | bwd_allreduce_microstep: 3270.23 | step_microstep: 39.05
[2024-06-10 12:31:39,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15612.05 | bwd: 45243.61 | bwd_inner: 41972.38 | bwd_allreduce: 3270.51 | step: 40.54
{'loss': 1.249, 'learning_rate': 2.7401560740357546e-05, 'epoch': 0.4}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 12:31:41,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.22 | bwd_microstep: 1430.48 | bwd_inner_microstep: 1430.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 12:31:43,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.61 | bwd_microstep: 1412.42 | bwd_inner_microstep: 1412.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 12:31:45,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.42 | bwd_microstep: 1547.60 | bwd_inner_microstep: 1547.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477
[2024-06-10 12:31:47,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.71 | bwd_microstep: 1341.67 | bwd_inner_microstep: 1341.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 12:31:49,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1385.08 | bwd_inner_microstep: 1385.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 12:31:51,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1384.37 | bwd_inner_microstep: 1384.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 12:31:53,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.63 | bwd_microstep: 1383.67 | bwd_inner_microstep: 1383.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 12:31:54,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.88 | bwd_microstep: 796.77 | bwd_inner_microstep: 796.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 12:31:55,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.19 | bwd_microstep: 1155.12 | bwd_inner_microstep: 1155.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 12:31:57,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1384.81 | bwd_inner_microstep: 1384.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-10 12:31:59,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1311.10 | bwd_inner_microstep: 1311.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3537
[2024-06-10 12:32:01,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.08 | bwd_microstep: 1256.40 | bwd_inner_microstep: 1256.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 12:32:03,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.98 | bwd_microstep: 1477.43 | bwd_inner_microstep: 1477.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1985
[2024-06-10 12:32:04,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.33 | bwd_microstep: 827.51 | bwd_inner_microstep: 827.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 12:32:06,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.60 | bwd_microstep: 1419.82 | bwd_inner_microstep: 1419.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3514
[2024-06-10 12:32:08,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.14 | bwd_microstep: 1651.08 | bwd_inner_microstep: 1651.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 12:32:10,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.57 | bwd_microstep: 1161.00 | bwd_inner_microstep: 1160.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1969
[2024-06-10 12:32:11,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.76 | bwd_microstep: 747.91 | bwd_inner_microstep: 747.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 12:32:13,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.45 | bwd_microstep: 1488.77 | bwd_inner_microstep: 1488.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 12:32:14,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.65 | bwd_microstep: 810.67 | bwd_inner_microstep: 810.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3304
[2024-06-10 12:32:16,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.75 | bwd_microstep: 1194.53 | bwd_inner_microstep: 1194.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 12:32:18,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1374.75 | bwd_inner_microstep: 1374.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2003
[2024-06-10 12:32:19,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.10 | bwd_microstep: 839.17 | bwd_inner_microstep: 839.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3541
[2024-06-10 12:32:21,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.51 | bwd_microstep: 1520.99 | bwd_inner_microstep: 1520.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646
[2024-06-10 12:32:23,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.17 | bwd_microstep: 1617.41 | bwd_inner_microstep: 1617.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 12:32:25,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.57 | bwd_microstep: 1404.27 | bwd_inner_microstep: 1404.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3526
[2024-06-10 12:32:27,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1453.61 | bwd_inner_microstep: 1453.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3594
[2024-06-10 12:32:30,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.56 | bwd_microstep: 1769.82 | bwd_inner_microstep: 1769.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 12:32:32,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.75 | bwd_microstep: 1484.77 | bwd_inner_microstep: 1484.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3311
[2024-06-10 12:32:34,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.46 | bwd_microstep: 1386.76 | bwd_inner_microstep: 1386.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3582
[2024-06-10 12:32:36,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.40 | bwd_microstep: 1524.99 | bwd_inner_microstep: 1524.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570
[2024-06-10 12:32:38,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.67
[2024-06-10 12:32:38,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1746.12 | bwd_inner_microstep: 1738.44 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.72
[2024-06-10 12:32:38,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15928.58 | bwd: 42690.87 | bwd_inner: 42682.34 | bwd_allreduce: 7.86 | step: 39.19
██▉      | 682/1726 [11:49:11<17:47:47, 61.37s/it]
 40%|███▉      | 683/1726 [11:50:11<17:37:18, 60.82s/it]
 40%|███▉      | 684/1726 [11:51:13<17:42:41, 61.19s/it]
 40%|███▉      | 685/1726 [11:52:13<17:36:36, 60.90s/it]
 40%|███▉      | 686/1726 [11:53:15<17:40:04, 61.16s/it]
 40%|███▉      | 687/1726 [11:54:16<17:39:12, 61.17s/it]
{'loss': 1.2463, 'learning_rate': 2.736667865043775e-05, 'epoch': 0.4}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4878
[2024-06-10 12:32:41,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 720.49 | bwd_microstep: 1925.27 | bwd_inner_microstep: 1925.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 12:32:42,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.45 | bwd_microstep: 790.04 | bwd_inner_microstep: 790.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 12:32:44,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.72 | bwd_microstep: 1550.24 | bwd_inner_microstep: 1550.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-10 12:32:46,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.98 | bwd_microstep: 1546.80 | bwd_inner_microstep: 1546.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1969
[2024-06-10 12:32:47,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.91 | bwd_microstep: 764.46 | bwd_inner_microstep: 764.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 12:32:49,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.06 | bwd_microstep: 1478.03 | bwd_inner_microstep: 1478.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2093
[2024-06-10 12:32:50,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.98 | bwd_microstep: 791.95 | bwd_inner_microstep: 791.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 12:32:52,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.15 | bwd_microstep: 1285.24 | bwd_inner_microstep: 1285.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 12:32:54,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.66 | bwd_microstep: 1286.85 | bwd_inner_microstep: 1286.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 12:32:56,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.15 | bwd_microstep: 1388.22 | bwd_inner_microstep: 1388.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2462
[2024-06-10 12:32:57,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.18 | bwd_microstep: 951.90 | bwd_inner_microstep: 951.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 12:32:58,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.68 | bwd_microstep: 795.38 | bwd_inner_microstep: 795.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 12:33:00,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.06 | bwd_microstep: 1292.49 | bwd_inner_microstep: 1292.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1959
[2024-06-10 12:33:01,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.09 | bwd_microstep: 892.76 | bwd_inner_microstep: 892.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3628
[2024-06-10 12:33:03,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.37 | bwd_microstep: 1543.25 | bwd_inner_microstep: 1543.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3467
[2024-06-10 12:33:05,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.56 | bwd_microstep: 1548.18 | bwd_inner_microstep: 1548.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 12:33:07,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.94 | bwd_microstep: 788.58 | bwd_inner_microstep: 788.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 12:33:08,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.49 | bwd_microstep: 1388.11 | bwd_inner_microstep: 1388.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 12:33:10,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.33 | bwd_microstep: 1449.54 | bwd_inner_microstep: 1449.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 12:33:12,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1283.53 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3524
[2024-06-10 12:33:14,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1230.69 | bwd_inner_microstep: 1230.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 12:33:15,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.62 | bwd_microstep: 800.15 | bwd_inner_microstep: 800.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 12:33:17,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1399.54 | bwd_inner_microstep: 1399.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3387
[2024-06-10 12:33:19,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.12 | bwd_microstep: 1177.15 | bwd_inner_microstep: 1177.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2279
[2024-06-10 12:33:20,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.54 | bwd_microstep: 923.71 | bwd_inner_microstep: 923.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754
[2024-06-10 12:33:22,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.08 | bwd_microstep: 1534.44 | bwd_inner_microstep: 1534.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 12:33:24,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.51 | bwd_microstep: 1498.74 | bwd_inner_microstep: 1498.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815
[2024-06-10 12:33:27,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.17 | bwd_microstep: 1752.07 | bwd_inner_microstep: 1752.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596
[2024-06-10 12:33:28,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.44 | bwd_microstep: 1432.49 | bwd_inner_microstep: 1432.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 12:33:31,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.01 | bwd_microstep: 1644.61 | bwd_inner_microstep: 1644.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 12:33:33,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.75 | bwd_microstep: 1407.68 | bwd_inner_microstep: 1407.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 12:33:38,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 12:33:38,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 4373.97 | bwd_inner_microstep: 1698.63 | bwd_allreduce_microstep: 2675.29 | step_microstep: 37.97
[2024-06-10 12:33:38,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15380.58 | bwd: 43916.05 | bwd_inner: 41239.86 | bwd_allreduce: 2675.52 | step: 39.44
{'loss': 1.2941, 'learning_rate': 2.7331770615114038e-05, 'epoch': 0.4}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420
[2024-06-10 12:33:40,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.17 | bwd_microstep: 1435.66 | bwd_inner_microstep: 1435.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4477
[2024-06-10 12:33:42,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.84 | bwd_microstep: 1625.62 | bwd_inner_microstep: 1625.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 12:33:44,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.33 | bwd_microstep: 1483.82 | bwd_inner_microstep: 1483.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1880
[2024-06-10 12:33:45,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.02 | bwd_microstep: 709.78 | bwd_inner_microstep: 709.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898
[2024-06-10 12:33:46,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.73 | bwd_microstep: 683.56 | bwd_inner_microstep: 683.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 12:33:48,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1283.23 | bwd_inner_microstep: 1283.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-10 12:33:50,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.56 | bwd_microstep: 1630.29 | bwd_inner_microstep: 1630.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3757
[2024-06-10 12:33:52,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.79 | bwd_microstep: 1402.18 | bwd_inner_microstep: 1402.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 12:33:54,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1254.05 | bwd_inner_microstep: 1254.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 12:33:56,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.64 | bwd_microstep: 1410.92 | bwd_inner_microstep: 1410.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2976
[2024-06-10 12:33:57,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.36 | bwd_microstep: 1166.67 | bwd_inner_microstep: 1166.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3656
[2024-06-10 12:34:00,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.87 | bwd_microstep: 1717.63 | bwd_inner_microstep: 1717.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377
[2024-06-10 12:34:01,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.68 | bwd_microstep: 1242.31 | bwd_inner_microstep: 1242.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 12:34:03,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.71 | bwd_microstep: 1279.27 | bwd_inner_microstep: 1279.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-10 12:34:05,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.85 | bwd_microstep: 1578.73 | bwd_inner_microstep: 1578.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 12:34:07,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 1392.33 | bwd_inner_microstep: 1392.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 12:34:09,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1348.40 | bwd_inner_microstep: 1348.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2081
[2024-06-10 12:34:10,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.66 | bwd_microstep: 852.69 | bwd_inner_microstep: 852.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1980
[2024-06-10 12:34:11,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.20 | bwd_microstep: 736.35 | bwd_inner_microstep: 736.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454
[2024-06-10 12:34:13,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.59 | bwd_microstep: 1162.27 | bwd_inner_microstep: 1162.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 12:34:15,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.43 | bwd_microstep: 1398.21 | bwd_inner_microstep: 1398.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3469
[2024-06-10 12:34:16,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.14 | bwd_microstep: 1235.39 | bwd_inner_microstep: 1235.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3543
[2024-06-10 12:34:19,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.19 | bwd_microstep: 1642.36 | bwd_inner_microstep: 1642.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507
[2024-06-10 12:34:21,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1415.87 | bwd_inner_microstep: 1415.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 12:34:23,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.66 | bwd_microstep: 1545.31 | bwd_inner_microstep: 1545.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 12:34:25,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1446.66 | bwd_inner_microstep: 1446.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557
[2024-06-10 12:34:27,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.84 | bwd_microstep: 1419.93 | bwd_inner_microstep: 1419.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 12:34:29,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.08 | bwd_microstep: 1651.42 | bwd_inner_microstep: 1651.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 12:34:31,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.93 | bwd_microstep: 1637.48 | bwd_inner_microstep: 1637.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052
[2024-06-10 12:34:32,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.38 | bwd_microstep: 819.41 | bwd_inner_microstep: 819.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812
[2024-06-10 12:34:35,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.43 | bwd_microstep: 1716.81 | bwd_inner_microstep: 1716.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429
[2024-06-10 12:34:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 12:34:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 3640.22 | bwd_inner_microstep: 1637.17 | bwd_allreduce_microstep: 2003.00 | step_microstep: 38.10
[2024-06-10 12:34:39,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15972.79 | bwd: 44964.85 | bwd_inner: 42960.95 | bwd_allreduce: 2003.22 | step: 39.58
{'loss': 1.2444, 'learning_rate': 2.729683675733233e-05, 'epoch': 0.4}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3451
[2024-06-10 12:34:41,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.03 | bwd_microstep: 1507.60 | bwd_inner_microstep: 1507.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4131
[2024-06-10 12:34:43,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.01 | bwd_microstep: 1566.41 | bwd_inner_microstep: 1566.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1850
[2024-06-10 12:34:44,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.11 | bwd_microstep: 733.03 | bwd_inner_microstep: 733.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 12:34:46,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1390.46 | bwd_inner_microstep: 1390.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 12:34:48,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1246.26 | bwd_inner_microstep: 1246.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 12:34:50,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1377.88 | bwd_inner_microstep: 1377.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2638
[2024-06-10 12:34:51,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.29 | bwd_microstep: 1017.65 | bwd_inner_microstep: 1017.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 12:34:52,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.14 | bwd_microstep: 797.76 | bwd_inner_microstep: 797.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466
[2024-06-10 12:34:54,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1400.62 | bwd_inner_microstep: 1400.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 12:34:56,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.21 | bwd_microstep: 1613.52 | bwd_inner_microstep: 1613.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2721
[2024-06-10 12:34:58,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.32 | bwd_microstep: 1037.93 | bwd_inner_microstep: 1037.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3492
[2024-06-10 12:35:00,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1245.59 | bwd_inner_microstep: 1245.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3476
[2024-06-10 12:35:01,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.54 | bwd_microstep: 1214.61 | bwd_inner_microstep: 1214.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3640
[2024-06-10 12:35:03,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.23 | bwd_microstep: 1444.30 | bwd_inner_microstep: 1444.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2504
[2024-06-10 12:35:05,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.59 | bwd_microstep: 958.89 | bwd_inner_microstep: 958.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1978
[2024-06-10 12:35:06,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.56 | bwd_microstep: 734.25 | bwd_inner_microstep: 734.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 12:35:08,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.51 | bwd_microstep: 1393.93 | bwd_inner_microstep: 1393.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-10 12:35:09,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.05 | bwd_microstep: 975.57 | bwd_inner_microstep: 975.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 12:35:11,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.33 | bwd_microstep: 1458.58 | bwd_inner_microstep: 1458.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 12:35:13,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1384.37 | bwd_inner_microstep: 1384.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2583
[2024-06-10 12:35:14,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.35 | bwd_microstep: 1070.17 | bwd_inner_microstep: 1070.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-10 12:35:16,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.47 | bwd_microstep: 877.41 | bwd_inner_microstep: 877.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1965
[2024-06-10 12:35:17,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.07 | bwd_microstep: 701.82 | bwd_inner_microstep: 701.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 12:35:19,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.22 | bwd_microstep: 1504.92 | bwd_inner_microstep: 1504.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3542
[2024-06-10 12:35:20,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.41 | bwd_microstep: 1196.86 | bwd_inner_microstep: 1196.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 12:35:23,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.96 | bwd_microstep: 1660.01 | bwd_inner_microstep: 1659.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3690
[2024-06-10 12:35:25,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.36 | bwd_microstep: 1450.98 | bwd_inner_microstep: 1450.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3416
[2024-06-10 12:35:26,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.39 | bwd_microstep: 1197.80 | bwd_inner_microstep: 1197.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559
[2024-06-10 12:35:28,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.59 | bwd_microstep: 1588.50 | bwd_inner_microstep: 1588.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3401
[2024-06-10 12:35:30,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.55 | bwd_microstep: 1387.16 | bwd_inner_microstep: 1387.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807
[2024-06-10 12:35:33,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.67 | bwd_microstep: 1749.85 | bwd_inner_microstep: 1749.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276
[2024-06-10 12:35:38,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.19 | optimizer_step: 6.64
[2024-06-10 12:35:38,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.46 | bwd_microstep: 5150.09 | bwd_inner_microstep: 1214.10 | bwd_allreduce_microstep: 3935.93 | step_microstep: 37.84
[2024-06-10 12:35:38,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14983.56 | bwd: 44034.80 | bwd_inner: 40097.97 | bwd_allreduce: 3936.16 | step: 39.31
{'loss': 1.2767, 'learning_rate': 2.7261877200129495e-05, 'epoch': 0.4}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 12:35:40,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.30 | bwd_microstep: 1468.99 | bwd_inner_microstep: 1468.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 12:35:41,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.99 | bwd_microstep: 790.96 | bwd_inner_microstep: 790.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3822
[2024-06-10 12:35:44,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.20 | bwd_microstep: 1612.20 | bwd_inner_microstep: 1612.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 12:35:46,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1482.27 | bwd_inner_microstep: 1482.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876
[2024-06-10 12:35:47,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.86 | bwd_microstep: 679.12 | bwd_inner_microstep: 679.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016
[2024-06-10 12:35:48,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.17 | bwd_microstep: 800.87 | bwd_inner_microstep: 800.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 12:35:49,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.35 | bwd_microstep: 1241.56 | bwd_inner_microstep: 1241.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 12:35:51,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.37 | bwd_microstep: 1157.27 | bwd_inner_microstep: 1157.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1898
[2024-06-10 12:35:52,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.41 | bwd_microstep: 746.55 | bwd_inner_microstep: 746.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 12:35:54,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.33 | bwd_microstep: 1483.60 | bwd_inner_microstep: 1483.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3682
[2024-06-10 12:35:56,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.61 | bwd_microstep: 1335.42 | bwd_inner_microstep: 1335.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 12:35:58,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.41 | bwd_microstep: 1345.41 | bwd_inner_microstep: 1345.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2516
[2024-06-10 12:35:59,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.10 | bwd_microstep: 964.51 | bwd_inner_microstep: 964.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 12:36:01,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.48 | bwd_microstep: 1346.81 | bwd_inner_microstep: 1346.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2017
[2024-06-10 12:36:02,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.01 | bwd_microstep: 900.22 | bwd_inner_microstep: 900.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2892
[2024-06-10 12:36:04,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.64 | bwd_microstep: 1184.70 | bwd_inner_microstep: 1184.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2923
[2024-06-10 12:36:06,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.70 | bwd_microstep: 1140.73 | bwd_inner_microstep: 1140.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 12:36:08,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.12 | bwd_microstep: 1522.13 | bwd_inner_microstep: 1522.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3689
[2024-06-10 12:36:10,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.73 | bwd_microstep: 1487.10 | bwd_inner_microstep: 1487.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2043
[2024-06-10 12:36:11,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.74 | bwd_microstep: 747.95 | bwd_inner_microstep: 747.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009
[2024-06-10 12:36:12,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.76 | bwd_microstep: 739.52 | bwd_inner_microstep: 739.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 12:36:14,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.15 | bwd_microstep: 1389.64 | bwd_inner_microstep: 1389.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 12:36:16,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1556.45 | bwd_inner_microstep: 1556.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-10 12:36:17,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.70 | bwd_microstep: 811.14 | bwd_inner_microstep: 811.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 12:36:19,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1252.99 | bwd_inner_microstep: 1252.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 12:36:21,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.77 | bwd_microstep: 1500.85 | bwd_inner_microstep: 1500.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 12:36:23,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.33 | bwd_microstep: 1288.45 | bwd_inner_microstep: 1288.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 12:36:25,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.74 | bwd_microstep: 1545.92 | bwd_inner_microstep: 1545.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 12:36:27,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.61 | bwd_microstep: 1500.05 | bwd_inner_microstep: 1500.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3393
[2024-06-10 12:36:28,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.52 | bwd_microstep: 1274.91 | bwd_inner_microstep: 1274.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 12:36:30,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1409.67 | bwd_inner_microstep: 1409.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3380
[2024-06-10 12:36:40,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.21 | optimizer_step: 6.56
[2024-06-10 12:36:40,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.57 | bwd_microstep: 8801.31 | bwd_inner_microstep: 1519.42 | bwd_allreduce_microstep: 7281.84 | step_microstep: 38.00
[2024-06-10 12:36:40,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14646.76 | bwd: 46509.27 | bwd_inner: 39226.52 | bwd_allreduce: 7282.08 | step: 39.47
{'loss': 1.2471, 'learning_rate': 2.722689206663291e-05, 'epoch': 0.4}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 12:36:42,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1371.63 | bwd_inner_microstep: 1371.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 12:36:44,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.09 | bwd_microstep: 1341.33 | bwd_inner_microstep: 1341.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2432
[2024-06-10 12:36:45,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.71 | bwd_microstep: 938.33 | bwd_inner_microstep: 938.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 12:36:47,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.10 | bwd_microstep: 1545.53 | bwd_inner_microstep: 1545.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 12:36:49,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1398.89 | bwd_inner_microstep: 1398.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 12:36:51,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1379.43 | bwd_inner_microstep: 1379.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-10 12:36:53,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.37 | bwd_microstep: 1525.44 | bwd_inner_microstep: 1525.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3602
[2024-06-10 12:36:55,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.19 | bwd_microstep: 1212.02 | bwd_inner_microstep: 1212.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 12:36:56,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1250.49 | bwd_inner_microstep: 1250.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 12:36:58,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.55 | bwd_microstep: 1476.54 | bwd_inner_microstep: 1476.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 12:37:00,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1347.56 | bwd_inner_microstep: 1347.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3417
[2024-06-10 12:37:02,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.17 | bwd_microstep: 1388.66 | bwd_inner_microstep: 1388.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 12:37:04,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.17 | bwd_microstep: 1480.58 | bwd_inner_microstep: 1480.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 12:37:06,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.37 | bwd_microstep: 1438.11 | bwd_inner_microstep: 1438.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 12:37:08,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.80 | bwd_microstep: 1287.46 | bwd_inner_microstep: 1287.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 12:37:09,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.49 | bwd_microstep: 914.06 | bwd_inner_microstep: 914.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3661
[2024-06-10 12:37:11,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.34 | bwd_microstep: 1323.00 | bwd_inner_microstep: 1322.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3829
[2024-06-10 12:37:13,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.63 | bwd_microstep: 1389.37 | bwd_inner_microstep: 1389.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 12:37:15,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1296.92 | bwd_inner_microstep: 1296.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 12:37:17,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1348.44 | bwd_inner_microstep: 1348.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 12:37:18,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.03 | bwd_microstep: 1283.22 | bwd_inner_microstep: 1283.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 12:37:20,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1396.09 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 12:37:22,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1391.46 | bwd_inner_microstep: 1391.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2146
[2024-06-10 12:37:24,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.50 | bwd_microstep: 946.72 | bwd_inner_microstep: 946.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 12:37:26,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.03 | bwd_microstep: 1451.83 | bwd_inner_microstep: 1451.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3603
[2024-06-10 12:37:28,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1437.47 | bwd_inner_microstep: 1437.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 12:37:30,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.71 | bwd_microstep: 1653.47 | bwd_inner_microstep: 1653.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-10 12:37:32,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.45 | bwd_microstep: 1411.67 | bwd_inner_microstep: 1411.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 12:37:34,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.06 | bwd_microstep: 1485.69 | bwd_inner_microstep: 1485.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3590
[2024-06-10 12:37:36,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.73 | bwd_microstep: 1463.92 | bwd_inner_microstep: 1463.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 12:37:38,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.05 | bwd_microstep: 1540.60 | bwd_inner_microstep: 1540.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 12:37:40,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.17 | optimizer_step: 6.64
[2024-06-10 12:37:40,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.87 | bwd_microstep: 1501.16 | bwd_inner_microstep: 1493.39 | bwd_allreduce_microstep: 7.72 | step_microstep: 37.74
[2024-06-10 12:37:40,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16330.62 | bwd: 43617.12 | bwd_inner: 43608.50 | bwd_allreduce: 7.94 | step: 39.27
 40%|███▉      | 688/1726 [11:55:15<17:26:42, 60.50s/it]
 40%|███▉      | 689/1726 [11:56:14<17:21:08, 60.24s/it]
 40%|███▉      | 690/1726 [11:57:16<17:25:31, 60.55s/it]
 40%|████      | 691/1726 [11:58:15<17:18:14, 60.19s/it]
 40%|████      | 692/1726 [11:59:17<17:23:53, 60.57s/it]
 40%|████      | 693/1726 [12:00:17<17:21:24, 60.49s/it]
{'loss': 1.2422, 'learning_rate': 2.7191881480060044e-05, 'epoch': 0.4}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3459
[2024-06-10 12:37:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.88 | bwd_microstep: 1330.76 | bwd_inner_microstep: 1330.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4547
[2024-06-10 12:37:44,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 685.26 | bwd_microstep: 1846.03 | bwd_inner_microstep: 1846.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3925
[2024-06-10 12:37:46,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1487.62 | bwd_inner_microstep: 1487.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 12:37:49,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.44 | bwd_microstep: 1653.40 | bwd_inner_microstep: 1653.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 12:37:51,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.49 | bwd_microstep: 1548.22 | bwd_inner_microstep: 1548.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 12:37:53,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.71 | bwd_microstep: 1647.27 | bwd_inner_microstep: 1647.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3464
[2024-06-10 12:37:55,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.47 | bwd_microstep: 1211.77 | bwd_inner_microstep: 1211.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1864
[2024-06-10 12:37:56,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.54 | bwd_microstep: 706.73 | bwd_inner_microstep: 706.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484
[2024-06-10 12:37:58,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.74 | bwd_microstep: 1215.26 | bwd_inner_microstep: 1215.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 12:37:59,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1281.45 | bwd_inner_microstep: 1281.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 12:38:01,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.82 | bwd_microstep: 1279.85 | bwd_inner_microstep: 1279.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 12:38:03,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.74 | bwd_microstep: 1373.92 | bwd_inner_microstep: 1373.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 12:38:05,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1485.32 | bwd_inner_microstep: 1485.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 12:38:07,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 1480.43 | bwd_inner_microstep: 1480.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3465
[2024-06-10 12:38:09,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1421.26 | bwd_inner_microstep: 1421.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 12:38:11,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1297.69 | bwd_inner_microstep: 1297.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 12:38:13,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.45 | bwd_microstep: 1251.26 | bwd_inner_microstep: 1251.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3450
[2024-06-10 12:38:14,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.56 | bwd_microstep: 1312.80 | bwd_inner_microstep: 1312.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839
[2024-06-10 12:38:16,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 1459.39 | bwd_inner_microstep: 1459.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 12:38:19,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.43 | bwd_microstep: 1551.88 | bwd_inner_microstep: 1551.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3455
[2024-06-10 12:38:20,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.40 | bwd_microstep: 1219.03 | bwd_inner_microstep: 1219.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 12:38:21,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.07 | bwd_microstep: 797.93 | bwd_inner_microstep: 797.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 12:38:23,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.40 | bwd_microstep: 1286.92 | bwd_inner_microstep: 1286.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 945
[2024-06-10 12:38:24,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.55 | bwd_microstep: 379.53 | bwd_inner_microstep: 379.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 12:38:26,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.89 | bwd_microstep: 1458.40 | bwd_inner_microstep: 1458.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2828
[2024-06-10 12:38:27,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.47 | bwd_microstep: 1188.38 | bwd_inner_microstep: 1188.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2237
[2024-06-10 12:38:29,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.03 | bwd_microstep: 962.90 | bwd_inner_microstep: 962.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3777
[2024-06-10 12:38:31,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.49 | bwd_microstep: 1743.51 | bwd_inner_microstep: 1743.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 12:38:33,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1394.73 | bwd_inner_microstep: 1394.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 12:38:35,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.03 | bwd_microstep: 1335.51 | bwd_inner_microstep: 1335.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3552
[2024-06-10 12:38:37,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.31 | bwd_microstep: 1520.29 | bwd_inner_microstep: 1520.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 12:38:41,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 12:38:41,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.72 | bwd_microstep: 3844.70 | bwd_inner_microstep: 1659.62 | bwd_allreduce_microstep: 2185.03 | step_microstep: 38.28
[2024-06-10 12:38:41,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15979.52 | bwd: 44974.18 | bwd_inner: 42788.25 | bwd_allreduce: 2185.25 | step: 39.77
{'loss': 1.2767, 'learning_rate': 2.7156845563717987e-05, 'epoch': 0.4}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 12:38:43,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.87 | bwd_microstep: 1326.50 | bwd_inner_microstep: 1326.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893
[2024-06-10 12:38:45,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.90 | bwd_microstep: 1684.15 | bwd_inner_microstep: 1684.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-10 12:38:48,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.04 | bwd_microstep: 1645.53 | bwd_inner_microstep: 1645.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 12:38:49,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.05 | bwd_microstep: 1192.04 | bwd_inner_microstep: 1192.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 12:38:51,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.32 | bwd_microstep: 1246.06 | bwd_inner_microstep: 1246.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3542
[2024-06-10 12:38:53,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.24 | bwd_microstep: 1354.72 | bwd_inner_microstep: 1354.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734
[2024-06-10 12:38:55,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.05 | bwd_microstep: 1532.46 | bwd_inner_microstep: 1532.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 12:38:57,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1250.60 | bwd_inner_microstep: 1250.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 12:38:59,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.00 | bwd_microstep: 1247.31 | bwd_inner_microstep: 1247.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381
[2024-06-10 12:39:00,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.35 | bwd_microstep: 1240.78 | bwd_inner_microstep: 1240.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3489
[2024-06-10 12:39:02,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1434.79 | bwd_inner_microstep: 1434.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 12:39:04,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.25 | bwd_microstep: 1525.23 | bwd_inner_microstep: 1525.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 12:39:06,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.20 | bwd_microstep: 1385.83 | bwd_inner_microstep: 1385.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3671
[2024-06-10 12:39:08,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.51 | bwd_microstep: 1552.61 | bwd_inner_microstep: 1552.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1975
[2024-06-10 12:39:09,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 703.35 | bwd_inner_microstep: 703.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 12:39:11,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.34 | bwd_microstep: 792.08 | bwd_inner_microstep: 792.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 12:39:12,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.72 | bwd_microstep: 1285.38 | bwd_inner_microstep: 1285.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 12:39:14,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1508.87 | bwd_inner_microstep: 1508.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2125
[2024-06-10 12:39:16,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.83 | bwd_microstep: 930.17 | bwd_inner_microstep: 930.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 12:39:18,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.03 | bwd_microstep: 1391.12 | bwd_inner_microstep: 1391.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605
[2024-06-10 12:39:20,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1537.67 | bwd_inner_microstep: 1537.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-10 12:39:22,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.60 | bwd_microstep: 1604.54 | bwd_inner_microstep: 1604.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 12:39:24,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.22 | bwd_microstep: 1281.88 | bwd_inner_microstep: 1281.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3666
[2024-06-10 12:39:26,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.03 | bwd_microstep: 1386.05 | bwd_inner_microstep: 1386.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2908
[2024-06-10 12:39:27,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.40 | bwd_microstep: 1190.75 | bwd_inner_microstep: 1190.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3722
[2024-06-10 12:39:30,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.78 | bwd_microstep: 1701.23 | bwd_inner_microstep: 1701.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 12:39:32,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.63 | bwd_microstep: 1590.12 | bwd_inner_microstep: 1590.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 12:39:34,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.12 | bwd_microstep: 1399.57 | bwd_inner_microstep: 1399.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-10 12:39:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.94 | bwd_microstep: 1591.96 | bwd_inner_microstep: 1591.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3712
[2024-06-10 12:39:38,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.40 | bwd_microstep: 1728.01 | bwd_inner_microstep: 1727.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-10 12:39:40,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1442.68 | bwd_inner_microstep: 1442.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2019
[2024-06-10 12:39:41,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.15 | optimizer_step: 6.57
[2024-06-10 12:39:41,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.21 | bwd_microstep: 880.06 | bwd_inner_microstep: 870.70 | bwd_allreduce_microstep: 9.31 | step_microstep: 37.69
[2024-06-10 12:39:41,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16243.62 | bwd: 43564.11 | bwd_inner: 43553.90 | bwd_allreduce: 9.53 | step: 39.14
{'loss': 1.2746, 'learning_rate': 2.7121784441003064e-05, 'epoch': 0.4}
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1850
[2024-06-10 12:39:43,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.67 | bwd_microstep: 734.53 | bwd_inner_microstep: 734.44 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4065
[2024-06-10 12:39:44,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.97 | bwd_microstep: 1423.30 | bwd_inner_microstep: 1423.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 12:39:46,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.04 | bwd_microstep: 1383.42 | bwd_inner_microstep: 1383.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 12:39:49,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.56 | bwd_microstep: 1652.24 | bwd_inner_microstep: 1652.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-10 12:39:51,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.02 | bwd_microstep: 1539.28 | bwd_inner_microstep: 1539.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2302
[2024-06-10 12:39:52,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.74 | bwd_microstep: 977.46 | bwd_inner_microstep: 977.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4170
[2024-06-10 12:39:54,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.13 | bwd_microstep: 1551.65 | bwd_inner_microstep: 1551.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2386
[2024-06-10 12:39:56,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.98 | bwd_microstep: 933.70 | bwd_inner_microstep: 933.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2156
[2024-06-10 12:39:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.38 | bwd_microstep: 945.69 | bwd_inner_microstep: 945.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077
[2024-06-10 12:39:58,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.39 | bwd_microstep: 819.87 | bwd_inner_microstep: 819.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3572
[2024-06-10 12:40:00,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.22 | bwd_microstep: 1235.13 | bwd_inner_microstep: 1235.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3681
[2024-06-10 12:40:02,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.23 | bwd_microstep: 1359.65 | bwd_inner_microstep: 1359.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3503
[2024-06-10 12:40:03,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.19 | bwd_microstep: 1251.33 | bwd_inner_microstep: 1251.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3664
[2024-06-10 12:40:06,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.33 | bwd_microstep: 1717.21 | bwd_inner_microstep: 1717.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2446
[2024-06-10 12:40:07,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.53 | bwd_microstep: 948.30 | bwd_inner_microstep: 948.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 12:40:09,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1485.68 | bwd_inner_microstep: 1485.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 12:40:11,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.29 | bwd_microstep: 1426.71 | bwd_inner_microstep: 1426.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-10 12:40:13,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.21 | bwd_microstep: 1193.32 | bwd_inner_microstep: 1193.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695
[2024-06-10 12:40:15,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.52 | bwd_microstep: 1461.07 | bwd_inner_microstep: 1461.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 12:40:17,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.05 | bwd_microstep: 1294.09 | bwd_inner_microstep: 1294.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-10 12:40:18,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.46 | bwd_microstep: 1184.28 | bwd_inner_microstep: 1184.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1943
[2024-06-10 12:40:19,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.91 | bwd_microstep: 746.13 | bwd_inner_microstep: 746.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3822
[2024-06-10 12:40:21,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.16 | bwd_microstep: 1617.21 | bwd_inner_microstep: 1617.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2091
[2024-06-10 12:40:23,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.62 | bwd_microstep: 788.03 | bwd_inner_microstep: 788.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3632
[2024-06-10 12:40:25,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.81 | bwd_microstep: 1567.62 | bwd_inner_microstep: 1567.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3837
[2024-06-10 12:40:27,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.59 | bwd_microstep: 1520.61 | bwd_inner_microstep: 1520.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 12:40:28,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.63 | bwd_microstep: 876.40 | bwd_inner_microstep: 876.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 12:40:30,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.19 | bwd_microstep: 1254.00 | bwd_inner_microstep: 1253.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3754
[2024-06-10 12:40:32,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.08 | bwd_microstep: 1372.92 | bwd_inner_microstep: 1372.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3766
[2024-06-10 12:40:34,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.46 | bwd_microstep: 1376.13 | bwd_inner_microstep: 1376.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 12:40:36,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.96 | bwd_microstep: 1450.11 | bwd_inner_microstep: 1450.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3435
[2024-06-10 12:40:43,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 12:40:43,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.16 | bwd_microstep: 6609.65 | bwd_inner_microstep: 1358.10 | bwd_allreduce_microstep: 5251.50 | step_microstep: 37.92
[2024-06-10 12:40:43,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15149.43 | bwd: 45696.74 | bwd_inner: 40444.26 | bwd_allreduce: 5251.77 | step: 39.42
{'loss': 1.2382, 'learning_rate': 2.7086698235400353e-05, 'epoch': 0.4}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 12:40:45,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.27 | bwd_microstep: 1467.02 | bwd_inner_microstep: 1466.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2442
[2024-06-10 12:40:46,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.25 | bwd_microstep: 973.62 | bwd_inner_microstep: 973.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3484
[2024-06-10 12:40:48,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.33 | bwd_microstep: 1447.13 | bwd_inner_microstep: 1447.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 12:40:50,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1387.99 | bwd_inner_microstep: 1387.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 12:40:52,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.29 | bwd_microstep: 1539.87 | bwd_inner_microstep: 1539.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911
[2024-06-10 12:40:53,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.74 | bwd_microstep: 777.61 | bwd_inner_microstep: 777.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1962
[2024-06-10 12:40:54,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.93 | bwd_microstep: 762.90 | bwd_inner_microstep: 762.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 12:40:55,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.00 | bwd_microstep: 677.84 | bwd_inner_microstep: 677.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 12:40:56,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 791.04 | bwd_inner_microstep: 791.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1958
[2024-06-10 12:40:57,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.00 | bwd_microstep: 731.25 | bwd_inner_microstep: 731.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 12:40:59,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.21 | bwd_microstep: 1385.87 | bwd_inner_microstep: 1385.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 12:41:01,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.94 | bwd_microstep: 1341.35 | bwd_inner_microstep: 1341.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 12:41:03,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.05 | bwd_microstep: 1493.55 | bwd_inner_microstep: 1493.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3481
[2024-06-10 12:41:05,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.03 | bwd_microstep: 1612.15 | bwd_inner_microstep: 1612.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3498
[2024-06-10 12:41:08,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.19 | bwd_microstep: 1611.25 | bwd_inner_microstep: 1611.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2485
[2024-06-10 12:41:09,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.35 | bwd_microstep: 959.91 | bwd_inner_microstep: 959.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-10 12:41:10,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.47 | bwd_microstep: 794.75 | bwd_inner_microstep: 794.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 12:41:12,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1562.98 | bwd_inner_microstep: 1562.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295
[2024-06-10 12:41:13,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.59 | bwd_microstep: 880.79 | bwd_inner_microstep: 880.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 12:41:15,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1354.59 | bwd_inner_microstep: 1354.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 12:41:16,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.44 | bwd_microstep: 686.43 | bwd_inner_microstep: 686.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2280
[2024-06-10 12:41:17,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.20 | bwd_microstep: 909.45 | bwd_inner_microstep: 909.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 12:41:19,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1286.98 | bwd_inner_microstep: 1286.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3821
[2024-06-10 12:41:22,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.43 | bwd_microstep: 1725.99 | bwd_inner_microstep: 1725.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3600
[2024-06-10 12:41:23,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1342.93 | bwd_inner_microstep: 1342.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 12:41:25,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.94 | bwd_microstep: 1282.12 | bwd_inner_microstep: 1282.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 12:41:27,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.83 | bwd_microstep: 1528.24 | bwd_inner_microstep: 1528.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3787
[2024-06-10 12:41:29,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 1411.79 | bwd_inner_microstep: 1411.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 12:41:31,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1407.07 | bwd_inner_microstep: 1407.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 12:41:33,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1499.61 | bwd_inner_microstep: 1499.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3734
[2024-06-10 12:41:35,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1366.16 | bwd_inner_microstep: 1366.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-10 12:41:45,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.35 | optimizer_step: 6.61
[2024-06-10 12:41:45,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.45 | bwd_microstep: 9686.06 | bwd_inner_microstep: 2159.28 | bwd_allreduce_microstep: 7526.71 | step_microstep: 38.80
[2024-06-10 12:41:46,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14848.58 | bwd: 47686.32 | bwd_inner: 40158.68 | bwd_allreduce: 7526.95 | step: 40.38
{'loss': 1.2689, 'learning_rate': 2.7051587070483307e-05, 'epoch': 0.4}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 12:41:47,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.83 | bwd_microstep: 1240.66 | bwd_inner_microstep: 1240.58 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 12:41:49,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.46 | bwd_microstep: 1339.91 | bwd_inner_microstep: 1339.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3841
[2024-06-10 12:41:51,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.17 | bwd_microstep: 1356.86 | bwd_inner_microstep: 1356.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 12:41:53,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.78 | bwd_microstep: 1548.95 | bwd_inner_microstep: 1548.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740
[2024-06-10 12:41:55,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.30 | bwd_microstep: 1629.38 | bwd_inner_microstep: 1629.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 12:41:56,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.21 | bwd_microstep: 787.45 | bwd_inner_microstep: 787.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 12:41:58,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1247.69 | bwd_inner_microstep: 1247.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 12:42:00,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.49 | bwd_microstep: 1249.01 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 12:42:02,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1246.60 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2175
[2024-06-10 12:42:03,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.52 | bwd_microstep: 854.50 | bwd_inner_microstep: 854.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-10 12:42:05,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.64 | bwd_microstep: 1417.46 | bwd_inner_microstep: 1417.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 12:42:07,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.76 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 12:42:09,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1518.41 | bwd_inner_microstep: 1518.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484
[2024-06-10 12:42:11,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.34 | bwd_microstep: 1344.57 | bwd_inner_microstep: 1344.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085
[2024-06-10 12:42:12,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.63 | bwd_microstep: 853.16 | bwd_inner_microstep: 853.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3660
[2024-06-10 12:42:14,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.27 | bwd_microstep: 1355.25 | bwd_inner_microstep: 1355.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 12:42:16,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.27 | bwd_microstep: 1523.88 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3627
[2024-06-10 12:42:18,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.58 | bwd_microstep: 1776.61 | bwd_inner_microstep: 1776.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 12:42:20,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1489.52 | bwd_inner_microstep: 1489.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657
[2024-06-10 12:42:23,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.90 | bwd_microstep: 1716.82 | bwd_inner_microstep: 1716.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 12:42:25,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.35 | bwd_microstep: 1646.39 | bwd_inner_microstep: 1646.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 12:42:26,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 974.27 | bwd_inner_microstep: 974.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 12:42:28,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.47 | bwd_microstep: 1187.29 | bwd_inner_microstep: 1187.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 12:42:30,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1286.07 | bwd_inner_microstep: 1286.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 12:42:32,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.74 | bwd_microstep: 1496.26 | bwd_inner_microstep: 1496.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809
[2024-06-10 12:42:34,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.72 | bwd_microstep: 1477.34 | bwd_inner_microstep: 1477.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3623
[2024-06-10 12:42:35,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.88 | bwd_microstep: 1216.79 | bwd_inner_microstep: 1216.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 12:42:37,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.58 | bwd_microstep: 1281.75 | bwd_inner_microstep: 1281.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3513
[2024-06-10 12:42:39,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.54 | bwd_microstep: 1547.83 | bwd_inner_microstep: 1547.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 12:42:41,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.78 | bwd_microstep: 1379.61 | bwd_inner_microstep: 1379.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 12:42:43,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.75 | bwd_microstep: 1387.89 | bwd_inner_microstep: 1387.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269
[2024-06-10 12:42:46,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.09 | optimizer_gradients: 4.12 | optimizer_step: 6.57
[2024-06-10 12:42:46,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.91 | bwd_microstep: 2312.24 | bwd_inner_microstep: 1101.85 | bwd_allreduce_microstep: 1210.35 | step_microstep: 37.82
[2024-06-10 12:42:46,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15997.05 | bwd: 44076.81 | bwd_inner: 42865.51 | bwd_allreduce: 1210.61 | step: 39.38
{'loss': 1.2572, 'learning_rate': 2.701645106991325e-05, 'epoch': 0.4}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 12:42:48,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.60 | bwd_microstep: 1236.69 | bwd_inner_microstep: 1236.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3939
[2024-06-10 12:42:50,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.51 | bwd_microstep: 1593.36 | bwd_inner_microstep: 1593.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2885
[2024-06-10 12:42:51,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.72 | bwd_microstep: 1084.04 | bwd_inner_microstep: 1084.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792
[2024-06-10 12:42:54,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.63 | bwd_microstep: 1646.12 | bwd_inner_microstep: 1646.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 12:42:55,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1273.21 | bwd_inner_microstep: 1273.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 12:42:57,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1280.59 | bwd_inner_microstep: 1280.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3557
[2024-06-10 12:42:59,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.18 | bwd_microstep: 1202.30 | bwd_inner_microstep: 1202.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 12:43:01,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.99 | bwd_microstep: 1530.28 | bwd_inner_microstep: 1530.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-10 12:43:02,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.51 | bwd_microstep: 699.83 | bwd_inner_microstep: 699.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3403
[2024-06-10 12:43:04,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1304.08 | bwd_inner_microstep: 1304.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3749
[2024-06-10 12:43:06,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.14 | bwd_microstep: 1735.22 | bwd_inner_microstep: 1735.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050
[2024-06-10 12:43:07,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.48 | bwd_microstep: 912.00 | bwd_inner_microstep: 911.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 12:43:09,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.53 | bwd_microstep: 1484.64 | bwd_inner_microstep: 1484.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520
[2024-06-10 12:43:11,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.67 | bwd_microstep: 1446.93 | bwd_inner_microstep: 1446.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 12:43:14,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.60 | bwd_microstep: 1547.78 | bwd_inner_microstep: 1547.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 12:43:16,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.04 | bwd_microstep: 1513.81 | bwd_inner_microstep: 1513.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1979
[2024-06-10 12:43:17,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.32 | bwd_microstep: 735.75 | bwd_inner_microstep: 735.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 12:43:19,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.23 | bwd_microstep: 1457.82 | bwd_inner_microstep: 1457.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3623
[2024-06-10 12:43:20,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1342.24 | bwd_inner_microstep: 1342.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-10 12:43:23,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.78 | bwd_microstep: 1603.26 | bwd_inner_microstep: 1603.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3632
[2024-06-10 12:43:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.69 | bwd_microstep: 1709.09 | bwd_inner_microstep: 1709.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3671
[2024-06-10 12:43:27,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.54 | bwd_microstep: 1621.81 | bwd_inner_microstep: 1621.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 12:43:29,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1349.98 | bwd_inner_microstep: 1349.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 12:43:31,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.18 | bwd_microstep: 1500.43 | bwd_inner_microstep: 1500.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 12:43:33,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.94 | bwd_microstep: 1354.23 | bwd_inner_microstep: 1354.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3578
[2024-06-10 12:43:35,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.70 | bwd_microstep: 1431.20 | bwd_inner_microstep: 1431.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445
[2024-06-10 12:43:37,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.34 | bwd_microstep: 1158.27 | bwd_inner_microstep: 1158.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 12:43:39,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1500.66 | bwd_inner_microstep: 1500.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 12:43:41,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1511.24 | bwd_inner_microstep: 1511.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3780
[2024-06-10 12:43:43,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.41 | bwd_microstep: 1412.47 | bwd_inner_microstep: 1412.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 12:43:45,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.27 | bwd_microstep: 1503.20 | bwd_inner_microstep: 1503.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3765
[2024-06-10 12:43:47,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.39 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 12:43:47,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.88 | bwd_microstep: 1321.23 | bwd_inner_microstep: 1311.04 | bwd_allreduce_microstep: 10.14 | step_microstep: 39.16
[2024-06-10 12:43:47,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16429.42 | bwd: 44003.75 | bwd_inner: 43992.71 | bwd_allreduce: 10.37 | step: 40.58


 40%|████      | 693/1726 [12:00:17<17:21:24, 60.49s/it]
 40%|████      | 694/1726 [12:01:18<17:24:30, 60.73s/it]
 40%|████      | 695/1726 [12:02:18<17:20:28, 60.55s/it]
 40%|████      | 696/1726 [12:03:19<17:22:38, 60.74s/it]
 40%|████      | 697/1726 [12:04:22<17:32:32, 61.37s/it]
 40%|████      | 698/1726 [12:05:23<17:26:34, 61.08s/it]
{'loss': 1.261, 'learning_rate': 2.6981290357439004e-05, 'epoch': 0.4}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3476
[2024-06-10 12:43:49,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.31 | bwd_microstep: 1576.47 | bwd_inner_microstep: 1576.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 12:43:51,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 1281.96 | bwd_inner_microstep: 1281.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 12:43:53,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.04 | bwd_microstep: 1482.05 | bwd_inner_microstep: 1482.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3608
[2024-06-10 12:43:54,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.60 | bwd_microstep: 1310.64 | bwd_inner_microstep: 1310.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 12:43:56,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1251.29 | bwd_inner_microstep: 1251.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 12:43:58,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.59 | bwd_microstep: 1250.12 | bwd_inner_microstep: 1250.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 12:44:00,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.07 | bwd_microstep: 1254.81 | bwd_inner_microstep: 1254.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 12:44:01,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1244.81 | bwd_inner_microstep: 1244.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 12:44:03,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.14 | bwd_microstep: 1281.56 | bwd_inner_microstep: 1281.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 12:44:05,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.20 | bwd_microstep: 1283.54 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3640
[2024-06-10 12:44:07,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.96 | bwd_microstep: 1346.14 | bwd_inner_microstep: 1346.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3507
[2024-06-10 12:44:09,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.49 | bwd_microstep: 1534.33 | bwd_inner_microstep: 1534.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3763
[2024-06-10 12:44:11,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.03 | bwd_microstep: 1771.33 | bwd_inner_microstep: 1771.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963
[2024-06-10 12:44:13,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.92 | bwd_microstep: 825.70 | bwd_inner_microstep: 825.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 12:44:15,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.61 | bwd_microstep: 1476.92 | bwd_inner_microstep: 1476.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 12:44:16,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.25 | bwd_microstep: 1293.21 | bwd_inner_microstep: 1293.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 12:44:18,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.27 | bwd_microstep: 1279.31 | bwd_inner_microstep: 1279.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3613
[2024-06-10 12:44:20,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.83 | bwd_microstep: 1431.47 | bwd_inner_microstep: 1431.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 12:44:22,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.94 | bwd_microstep: 1613.89 | bwd_inner_microstep: 1613.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 12:44:24,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.73 | bwd_microstep: 1507.20 | bwd_inner_microstep: 1507.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 12:44:26,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.55 | bwd_microstep: 1152.95 | bwd_inner_microstep: 1152.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 12:44:28,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.31 | bwd_microstep: 1452.51 | bwd_inner_microstep: 1452.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 12:44:30,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1412.89 | bwd_inner_microstep: 1412.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-10 12:44:32,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.21 | bwd_microstep: 1182.72 | bwd_inner_microstep: 1182.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3554
[2024-06-10 12:44:34,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.82 | bwd_microstep: 1454.49 | bwd_inner_microstep: 1454.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1917
[2024-06-10 12:44:35,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.05 | bwd_microstep: 716.41 | bwd_inner_microstep: 716.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 12:44:36,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.29 | bwd_microstep: 1278.12 | bwd_inner_microstep: 1278.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2286
[2024-06-10 12:44:38,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.08 | bwd_microstep: 1006.47 | bwd_inner_microstep: 1006.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 12:44:40,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.49 | bwd_microstep: 1501.91 | bwd_inner_microstep: 1501.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 12:44:42,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.73 | bwd_microstep: 1539.77 | bwd_inner_microstep: 1539.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3871
[2024-06-10 12:44:44,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.66 | bwd_microstep: 1765.87 | bwd_inner_microstep: 1765.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3302
[2024-06-10 12:44:49,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 12:44:49,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.17 | bwd_microstep: 3662.48 | bwd_inner_microstep: 1666.35 | bwd_allreduce_microstep: 1996.08 | step_microstep: 37.82
[2024-06-10 12:44:49,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16207.66 | bwd: 45423.34 | bwd_inner: 43426.36 | bwd_allreduce: 1996.30 | step: 39.28
{'loss': 1.2608, 'learning_rate': 2.6946105056896406e-05, 'epoch': 0.41}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 12:44:51,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.42 | bwd_microstep: 1497.40 | bwd_inner_microstep: 1497.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3990
[2024-06-10 12:44:53,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.78 | bwd_microstep: 1601.55 | bwd_inner_microstep: 1601.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3918
[2024-06-10 12:44:55,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1520.69 | bwd_inner_microstep: 1520.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1863
[2024-06-10 12:44:56,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.97 | bwd_microstep: 704.81 | bwd_inner_microstep: 704.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 12:44:58,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.60 | bwd_microstep: 1488.30 | bwd_inner_microstep: 1488.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 12:45:00,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1248.86 | bwd_inner_microstep: 1248.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945
[2024-06-10 12:45:01,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.21 | bwd_microstep: 729.23 | bwd_inner_microstep: 729.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3598
[2024-06-10 12:45:03,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.08 | bwd_microstep: 1244.69 | bwd_inner_microstep: 1244.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2653
[2024-06-10 12:45:04,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.96 | bwd_microstep: 1116.00 | bwd_inner_microstep: 1115.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3564
[2024-06-10 12:45:06,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.50 | bwd_microstep: 1458.94 | bwd_inner_microstep: 1458.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 12:45:08,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.18 | bwd_microstep: 1624.41 | bwd_inner_microstep: 1624.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 12:45:10,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1246.75 | bwd_inner_microstep: 1246.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3387
[2024-06-10 12:45:12,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.37 | bwd_microstep: 1273.82 | bwd_inner_microstep: 1273.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 12:45:14,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1251.47 | bwd_inner_microstep: 1251.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 12:45:16,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.56 | bwd_microstep: 1482.76 | bwd_inner_microstep: 1482.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4107
[2024-06-10 12:45:18,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.35 | bwd_microstep: 1441.28 | bwd_inner_microstep: 1441.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519
[2024-06-10 12:45:20,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1411.54 | bwd_inner_microstep: 1411.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3525
[2024-06-10 12:45:21,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.70 | bwd_microstep: 1353.66 | bwd_inner_microstep: 1353.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3528
[2024-06-10 12:45:23,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1261.57 | bwd_inner_microstep: 1261.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3523
[2024-06-10 12:45:25,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.85 | bwd_microstep: 1231.38 | bwd_inner_microstep: 1231.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 12:45:27,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.62 | bwd_microstep: 1460.32 | bwd_inner_microstep: 1460.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3617
[2024-06-10 12:45:29,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.42 | bwd_microstep: 1215.81 | bwd_inner_microstep: 1215.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 12:45:30,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1380.12 | bwd_inner_microstep: 1380.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 12:45:32,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.96 | bwd_microstep: 1284.44 | bwd_inner_microstep: 1284.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 12:45:34,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1397.43 | bwd_inner_microstep: 1397.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 12:45:36,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1349.28 | bwd_inner_microstep: 1349.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 12:45:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.99 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3715
[2024-06-10 12:45:40,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.16 | bwd_microstep: 1394.83 | bwd_inner_microstep: 1394.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 12:45:42,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.39 | bwd_microstep: 1451.65 | bwd_inner_microstep: 1451.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3590
[2024-06-10 12:45:44,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.49 | bwd_microstep: 1634.75 | bwd_inner_microstep: 1634.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2226
[2024-06-10 12:45:45,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.73 | bwd_microstep: 959.37 | bwd_inner_microstep: 959.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 12:45:48,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.99 | optimizer_gradients: 4.12 | optimizer_step: 6.59
[2024-06-10 12:45:48,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.09 | bwd_microstep: 1766.63 | bwd_inner_microstep: 1582.83 | bwd_allreduce_microstep: 183.75 | step_microstep: 37.78
[2024-06-10 12:45:48,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15997.43 | bwd: 42766.09 | bwd_inner: 42581.44 | bwd_allreduce: 183.99 | step: 39.32
{'loss': 1.244, 'learning_rate': 2.6910895292207918e-05, 'epoch': 0.41}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 12:45:50,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.42 | bwd_microstep: 1375.77 | bwd_inner_microstep: 1375.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3903
[2024-06-10 12:45:52,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.99 | bwd_microstep: 1483.37 | bwd_inner_microstep: 1483.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878
[2024-06-10 12:45:54,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.20 | bwd_microstep: 1581.46 | bwd_inner_microstep: 1581.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415
[2024-06-10 12:45:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.97 | bwd_microstep: 1308.11 | bwd_inner_microstep: 1308.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3737
[2024-06-10 12:45:58,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1380.72 | bwd_inner_microstep: 1380.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 12:46:00,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.01 | bwd_microstep: 1532.94 | bwd_inner_microstep: 1532.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3583
[2024-06-10 12:46:02,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1364.29 | bwd_inner_microstep: 1364.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 12:46:03,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.81 | bwd_microstep: 1285.71 | bwd_inner_microstep: 1285.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2471
[2024-06-10 12:46:05,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.73 | bwd_microstep: 954.54 | bwd_inner_microstep: 954.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 12:46:07,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.26 | bwd_microstep: 1394.48 | bwd_inner_microstep: 1394.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3440
[2024-06-10 12:46:08,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.20 | bwd_microstep: 1216.61 | bwd_inner_microstep: 1216.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2191
[2024-06-10 12:46:10,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.76 | bwd_microstep: 855.85 | bwd_inner_microstep: 855.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3424
[2024-06-10 12:46:11,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.12 | bwd_microstep: 1308.62 | bwd_inner_microstep: 1308.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 12:46:13,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.96 | bwd_microstep: 1473.40 | bwd_inner_microstep: 1473.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3434
[2024-06-10 12:46:15,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.75 | bwd_microstep: 1300.45 | bwd_inner_microstep: 1300.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 12:46:17,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.13 | bwd_microstep: 1291.87 | bwd_inner_microstep: 1291.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 12:46:19,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.91 | bwd_microstep: 1482.44 | bwd_inner_microstep: 1482.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 12:46:21,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1285.67 | bwd_inner_microstep: 1285.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 12:46:22,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.22 | bwd_microstep: 1159.14 | bwd_inner_microstep: 1159.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1974
[2024-06-10 12:46:24,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.41 | bwd_microstep: 862.22 | bwd_inner_microstep: 862.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 12:46:25,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.24 | bwd_microstep: 1288.17 | bwd_inner_microstep: 1288.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664
[2024-06-10 12:46:28,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.12 | bwd_microstep: 1622.60 | bwd_inner_microstep: 1622.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 12:46:29,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.21 | bwd_microstep: 975.53 | bwd_inner_microstep: 975.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 12:46:31,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1395.50 | bwd_inner_microstep: 1395.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3599
[2024-06-10 12:46:33,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.82 | bwd_microstep: 1376.63 | bwd_inner_microstep: 1376.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 12:46:35,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.00 | bwd_microstep: 1305.89 | bwd_inner_microstep: 1305.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 12:46:37,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.08 | bwd_microstep: 1636.13 | bwd_inner_microstep: 1636.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3726
[2024-06-10 12:46:39,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.01 | bwd_microstep: 1370.69 | bwd_inner_microstep: 1370.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 12:46:41,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.65 | bwd_microstep: 1452.55 | bwd_inner_microstep: 1452.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3815
[2024-06-10 12:46:43,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.48 | bwd_microstep: 1719.50 | bwd_inner_microstep: 1719.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 12:46:45,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1499.77 | bwd_inner_microstep: 1499.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-10 12:46:49,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.18 | optimizer_step: 6.64
[2024-06-10 12:46:49,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.04 | bwd_microstep: 3728.55 | bwd_inner_microstep: 1734.24 | bwd_allreduce_microstep: 1994.27 | step_microstep: 37.93
[2024-06-10 12:46:50,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16147.86 | bwd: 45269.18 | bwd_inner: 43274.02 | bwd_allreduce: 1994.49 | step: 39.52
{'loss': 1.2504, 'learning_rate': 2.6875661187382127e-05, 'epoch': 0.41}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3398
[2024-06-10 12:46:51,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.74 | bwd_microstep: 1141.36 | bwd_inner_microstep: 1141.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 12:46:53,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.34 | bwd_microstep: 1242.48 | bwd_inner_microstep: 1242.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1940
[2024-06-10 12:46:54,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.09 | bwd_microstep: 756.77 | bwd_inner_microstep: 756.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4205
[2024-06-10 12:46:56,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.29 | bwd_microstep: 1754.86 | bwd_inner_microstep: 1754.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 12:46:58,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.07 | bwd_microstep: 1382.44 | bwd_inner_microstep: 1382.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 12:47:00,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.38 | bwd_microstep: 1275.75 | bwd_inner_microstep: 1275.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465
[2024-06-10 12:47:02,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.79 | bwd_microstep: 1213.55 | bwd_inner_microstep: 1213.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 12:47:04,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1383.35 | bwd_inner_microstep: 1383.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 12:47:05,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.15 | bwd_microstep: 1283.14 | bwd_inner_microstep: 1283.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3699
[2024-06-10 12:47:08,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.52 | bwd_microstep: 1692.51 | bwd_inner_microstep: 1692.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 12:47:10,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1376.39 | bwd_inner_microstep: 1376.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 12:47:12,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.73 | bwd_microstep: 1402.54 | bwd_inner_microstep: 1402.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 12:47:13,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1342.97 | bwd_inner_microstep: 1342.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 12:47:15,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.79 | bwd_microstep: 974.20 | bwd_inner_microstep: 974.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 12:47:16,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1256.41 | bwd_inner_microstep: 1256.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 12:47:18,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.41 | bwd_microstep: 1427.92 | bwd_inner_microstep: 1427.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 12:47:20,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.63 | bwd_microstep: 1186.27 | bwd_inner_microstep: 1186.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 12:47:22,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1282.39 | bwd_inner_microstep: 1282.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 12:47:23,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.51 | bwd_microstep: 799.35 | bwd_inner_microstep: 799.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616
[2024-06-10 12:47:25,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1339.64 | bwd_inner_microstep: 1339.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 12:47:27,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.38 | bwd_microstep: 1655.13 | bwd_inner_microstep: 1655.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 12:47:29,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1249.26 | bwd_inner_microstep: 1249.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 12:47:31,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1606.00 | bwd_inner_microstep: 1605.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1909
[2024-06-10 12:47:32,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.92 | bwd_microstep: 749.16 | bwd_inner_microstep: 749.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 12:47:34,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.43 | bwd_microstep: 1495.96 | bwd_inner_microstep: 1495.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1886
[2024-06-10 12:47:35,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.86 | bwd_microstep: 680.50 | bwd_inner_microstep: 680.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812
[2024-06-10 12:47:37,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.64 | bwd_microstep: 1620.90 | bwd_inner_microstep: 1620.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562
[2024-06-10 12:47:40,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.35 | bwd_microstep: 1595.38 | bwd_inner_microstep: 1595.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 12:47:42,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.37 | bwd_microstep: 1532.47 | bwd_inner_microstep: 1532.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3804
[2024-06-10 12:47:44,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.71 | bwd_microstep: 1849.90 | bwd_inner_microstep: 1849.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 12:47:46,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.42 | bwd_microstep: 1540.75 | bwd_inner_microstep: 1540.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2421
[2024-06-10 12:47:49,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-10 12:47:49,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.97 | bwd_microstep: 2671.28 | bwd_inner_microstep: 992.72 | bwd_allreduce_microstep: 1678.51 | step_microstep: 37.67
[2024-06-10 12:47:49,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15752.07 | bwd: 43760.99 | bwd_inner: 42081.58 | bwd_allreduce: 1678.74 | step: 39.18
{'loss': 1.2471, 'learning_rate': 2.684040286651338e-05, 'epoch': 0.41}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514
[2024-06-10 12:47:51,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.34 | bwd_microstep: 1436.17 | bwd_inner_microstep: 1436.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3893
[2024-06-10 12:47:54,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.97 | bwd_microstep: 1581.75 | bwd_inner_microstep: 1581.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3411
[2024-06-10 12:47:55,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.75 | bwd_microstep: 1370.74 | bwd_inner_microstep: 1370.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2070
[2024-06-10 12:47:57,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.88 | bwd_microstep: 848.99 | bwd_inner_microstep: 848.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 12:47:58,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.47 | bwd_microstep: 1341.96 | bwd_inner_microstep: 1341.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 12:48:00,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.10 | bwd_microstep: 1478.54 | bwd_inner_microstep: 1478.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-10 12:48:03,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.79 | bwd_microstep: 1536.83 | bwd_inner_microstep: 1536.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 12:48:04,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.77 | bwd_microstep: 1297.76 | bwd_inner_microstep: 1297.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2254
[2024-06-10 12:48:06,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.94 | bwd_microstep: 870.58 | bwd_inner_microstep: 870.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 12:48:07,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.05 | bwd_microstep: 1253.15 | bwd_inner_microstep: 1253.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431
[2024-06-10 12:48:09,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.85 | bwd_microstep: 1309.42 | bwd_inner_microstep: 1309.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-10 12:48:11,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.25 | bwd_microstep: 1288.57 | bwd_inner_microstep: 1288.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 12:48:13,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.89 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2881
[2024-06-10 12:48:15,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.13 | bwd_microstep: 1179.33 | bwd_inner_microstep: 1179.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 12:48:17,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.40 | bwd_microstep: 1392.99 | bwd_inner_microstep: 1392.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3645
[2024-06-10 12:48:18,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1380.39 | bwd_inner_microstep: 1380.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3390
[2024-06-10 12:48:20,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.23 | bwd_microstep: 1432.72 | bwd_inner_microstep: 1432.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-10 12:48:23,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.40 | bwd_microstep: 1614.74 | bwd_inner_microstep: 1614.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3512
[2024-06-10 12:48:25,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.32 | bwd_microstep: 1512.44 | bwd_inner_microstep: 1512.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 12:48:27,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1379.21 | bwd_inner_microstep: 1379.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 12:48:29,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1374.93 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082
[2024-06-10 12:48:30,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.15 | bwd_microstep: 819.85 | bwd_inner_microstep: 819.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628
[2024-06-10 12:48:32,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1414.38 | bwd_inner_microstep: 1414.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 12:48:34,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.51 | bwd_microstep: 1457.11 | bwd_inner_microstep: 1457.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 12:48:35,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1295.49 | bwd_inner_microstep: 1295.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 12:48:37,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.76 | bwd_microstep: 1184.46 | bwd_inner_microstep: 1184.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 12:48:39,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1282.01 | bwd_inner_microstep: 1281.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 12:48:41,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.18 | bwd_microstep: 1283.06 | bwd_inner_microstep: 1283.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3403
[2024-06-10 12:48:43,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.38 | bwd_microstep: 1372.86 | bwd_inner_microstep: 1372.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3735
[2024-06-10 12:48:44,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.59 | bwd_microstep: 1430.19 | bwd_inner_microstep: 1430.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3817
[2024-06-10 12:48:46,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1371.56 | bwd_inner_microstep: 1371.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 12:48:49,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-10 12:48:49,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.71 | bwd_microstep: 1681.97 | bwd_inner_microstep: 1674.21 | bwd_allreduce_microstep: 7.71 | step_microstep: 37.71
[2024-06-10 12:48:49,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16091.71 | bwd: 42956.73 | bwd_inner: 42948.12 | bwd_allreduce: 7.93 | step: 39.17
 40%|████      | 699/1726 [12:06:23<17:23:55, 60.99s/it]
 41%|████      | 700/1726 [12:07:25<17:27:54, 61.28s/it]
 41%|████      | 701/1726 [12:08:24<17:15:43, 60.63s/it]
 41%|████      | 702/1726 [12:09:26<17:20:29, 60.97s/it]
 41%|████      | 703/1726 [12:10:26<17:13:43, 60.63s/it]
 41%|████      | 704/1726 [12:11:25<17:06:18, 60.25s/it]
{'loss': 1.2739, 'learning_rate': 2.6805120453781296e-05, 'epoch': 0.41}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 12:48:51,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.06 | bwd_microstep: 1303.22 | bwd_inner_microstep: 1303.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3857
[2024-06-10 12:48:53,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.95 | bwd_microstep: 1457.98 | bwd_inner_microstep: 1457.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 12:48:55,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.44 | bwd_microstep: 1550.35 | bwd_inner_microstep: 1550.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 12:48:57,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1478.23 | bwd_inner_microstep: 1478.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 12:48:59,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.30 | bwd_microstep: 1538.06 | bwd_inner_microstep: 1538.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 12:49:01,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1638.34 | bwd_inner_microstep: 1638.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 12:49:02,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.36 | bwd_microstep: 789.47 | bwd_inner_microstep: 789.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3611
[2024-06-10 12:49:04,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.31 | bwd_microstep: 1468.08 | bwd_inner_microstep: 1468.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-10 12:49:05,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.06 | bwd_microstep: 699.60 | bwd_inner_microstep: 699.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3582
[2024-06-10 12:49:07,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.23 | bwd_microstep: 1238.07 | bwd_inner_microstep: 1238.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 12:49:09,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1248.10 | bwd_inner_microstep: 1248.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3623
[2024-06-10 12:49:10,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.40 | bwd_microstep: 1275.97 | bwd_inner_microstep: 1275.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3449
[2024-06-10 12:49:12,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.24 | bwd_microstep: 1203.12 | bwd_inner_microstep: 1203.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3542
[2024-06-10 12:49:14,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.04 | bwd_microstep: 1585.18 | bwd_inner_microstep: 1585.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422
[2024-06-10 12:49:16,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.16 | bwd_microstep: 1180.59 | bwd_inner_microstep: 1180.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 12:49:18,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.04 | bwd_microstep: 1609.41 | bwd_inner_microstep: 1609.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 12:49:20,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.46 | bwd_microstep: 1488.60 | bwd_inner_microstep: 1488.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 12:49:22,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.33 | bwd_microstep: 1401.32 | bwd_inner_microstep: 1401.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3830
[2024-06-10 12:49:24,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.64 | bwd_microstep: 1596.19 | bwd_inner_microstep: 1596.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 12:49:26,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1348.30 | bwd_inner_microstep: 1348.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3585
[2024-06-10 12:49:28,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.29 | bwd_microstep: 1701.08 | bwd_inner_microstep: 1701.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-10 12:49:30,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.76 | bwd_microstep: 1428.35 | bwd_inner_microstep: 1428.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-10 12:49:32,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.85 | bwd_microstep: 1428.29 | bwd_inner_microstep: 1428.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 12:49:34,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 800.62 | bwd_inner_microstep: 800.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3818
[2024-06-10 12:49:36,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.89 | bwd_microstep: 1482.44 | bwd_inner_microstep: 1482.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3762
[2024-06-10 12:49:38,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.18 | bwd_microstep: 1569.70 | bwd_inner_microstep: 1569.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2426
[2024-06-10 12:49:39,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.89 | bwd_microstep: 1037.53 | bwd_inner_microstep: 1037.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2231
[2024-06-10 12:49:40,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.26 | bwd_microstep: 865.20 | bwd_inner_microstep: 865.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2423
[2024-06-10 12:49:42,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.65 | bwd_microstep: 844.35 | bwd_inner_microstep: 844.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2049
[2024-06-10 12:49:43,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.14 | bwd_microstep: 845.72 | bwd_inner_microstep: 845.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948
[2024-06-10 12:49:44,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.94 | bwd_microstep: 698.94 | bwd_inner_microstep: 698.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063
[2024-06-10 12:49:50,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.37 | optimizer_step: 6.58
[2024-06-10 12:49:50,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.98 | bwd_microstep: 5891.91 | bwd_inner_microstep: 937.02 | bwd_allreduce_microstep: 4954.83 | step_microstep: 38.70
[2024-06-10 12:49:50,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15229.08 | bwd: 45692.35 | bwd_inner: 40736.60 | bwd_allreduce: 4955.06 | step: 40.19
{'loss': 1.2834, 'learning_rate': 2.6769814073450348e-05, 'epoch': 0.41}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418
[2024-06-10 12:49:52,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.22 | bwd_microstep: 1273.15 | bwd_inner_microstep: 1273.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 12:49:53,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.87 | bwd_microstep: 1240.78 | bwd_inner_microstep: 1240.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3903
[2024-06-10 12:49:56,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.47 | bwd_microstep: 1483.65 | bwd_inner_microstep: 1483.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875
[2024-06-10 12:49:58,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.08 | bwd_microstep: 1581.21 | bwd_inner_microstep: 1581.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 12:50:00,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.81 | bwd_microstep: 1437.78 | bwd_inner_microstep: 1437.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 12:50:01,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1280.00 | bwd_inner_microstep: 1279.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2247
[2024-06-10 12:50:03,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.05 | bwd_microstep: 778.34 | bwd_inner_microstep: 778.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3766
[2024-06-10 12:50:05,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.57 | bwd_microstep: 1471.10 | bwd_inner_microstep: 1471.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 12:50:06,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.41 | bwd_microstep: 1282.96 | bwd_inner_microstep: 1282.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1445
[2024-06-10 12:50:07,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 217.76 | bwd_microstep: 570.80 | bwd_inner_microstep: 570.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3708
[2024-06-10 12:50:09,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.32 | bwd_microstep: 1451.27 | bwd_inner_microstep: 1451.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 12:50:11,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1253.98 | bwd_inner_microstep: 1253.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 12:50:13,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 1416.54 | bwd_inner_microstep: 1416.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 12:50:15,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.82 | bwd_microstep: 1481.15 | bwd_inner_microstep: 1481.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 12:50:17,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.44 | bwd_microstep: 1284.96 | bwd_inner_microstep: 1284.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646
[2024-06-10 12:50:19,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.68 | bwd_microstep: 1613.12 | bwd_inner_microstep: 1613.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514
[2024-06-10 12:50:21,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.17 | bwd_microstep: 1191.20 | bwd_inner_microstep: 1191.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3658
[2024-06-10 12:50:22,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.60 | bwd_microstep: 1416.03 | bwd_inner_microstep: 1416.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 12:50:25,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1496.40 | bwd_inner_microstep: 1496.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1438
[2024-06-10 12:50:25,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 238.32 | bwd_microstep: 632.16 | bwd_inner_microstep: 632.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 12:50:28,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.02 | bwd_microstep: 1558.64 | bwd_inner_microstep: 1558.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2074
[2024-06-10 12:50:29,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.57 | bwd_microstep: 786.90 | bwd_inner_microstep: 786.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 12:50:30,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1252.26 | bwd_inner_microstep: 1252.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2169
[2024-06-10 12:50:32,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.10 | bwd_microstep: 853.61 | bwd_inner_microstep: 853.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 12:50:34,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.69 | bwd_microstep: 1512.31 | bwd_inner_microstep: 1512.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2298
[2024-06-10 12:50:35,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.62 | bwd_microstep: 1012.01 | bwd_inner_microstep: 1011.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3740
[2024-06-10 12:50:37,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.72 | bwd_microstep: 1398.65 | bwd_inner_microstep: 1398.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 12:50:39,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.63 | bwd_microstep: 1474.76 | bwd_inner_microstep: 1474.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 12:50:41,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 1409.57 | bwd_inner_microstep: 1409.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3579
[2024-06-10 12:50:43,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.01 | bwd_microstep: 1629.43 | bwd_inner_microstep: 1629.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 12:50:45,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1397.10 | bwd_inner_microstep: 1397.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3550
[2024-06-10 12:50:52,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 12:50:52,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.79 | bwd_microstep: 6385.10 | bwd_inner_microstep: 1915.13 | bwd_allreduce_microstep: 4469.91 | step_microstep: 38.12
[2024-06-10 12:50:52,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15577.94 | bwd: 46306.90 | bwd_inner: 41836.09 | bwd_allreduce: 4470.14 | step: 39.62
{'loss': 1.278, 'learning_rate': 2.673448384986943e-05, 'epoch': 0.41}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-10 12:50:54,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.82 | bwd_microstep: 1440.75 | bwd_inner_microstep: 1440.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 12:50:56,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.84 | bwd_microstep: 1534.66 | bwd_inner_microstep: 1534.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3916
[2024-06-10 12:50:58,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.43 | bwd_microstep: 1494.89 | bwd_inner_microstep: 1494.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 12:51:00,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1377.52 | bwd_inner_microstep: 1377.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3779
[2024-06-10 12:51:02,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.92 | bwd_microstep: 1505.75 | bwd_inner_microstep: 1505.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 12:51:04,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1392.12 | bwd_inner_microstep: 1392.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 12:51:06,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1444.62 | bwd_inner_microstep: 1444.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 12:51:08,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1246.64 | bwd_inner_microstep: 1246.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-10 12:51:10,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1576.51 | bwd_inner_microstep: 1576.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3401
[2024-06-10 12:51:12,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.32 | bwd_microstep: 1403.16 | bwd_inner_microstep: 1403.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2095
[2024-06-10 12:51:13,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.59 | bwd_microstep: 726.78 | bwd_inner_microstep: 726.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 12:51:15,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.08 | bwd_microstep: 1286.70 | bwd_inner_microstep: 1286.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3549
[2024-06-10 12:51:17,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1594.67 | bwd_inner_microstep: 1594.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 12:51:19,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.03 | bwd_microstep: 1524.74 | bwd_inner_microstep: 1524.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3996
[2024-06-10 12:51:21,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.47 | bwd_microstep: 1505.36 | bwd_inner_microstep: 1505.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2698
[2024-06-10 12:51:23,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.43 | bwd_microstep: 1128.72 | bwd_inner_microstep: 1128.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445
[2024-06-10 12:51:24,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.73 | bwd_microstep: 1164.97 | bwd_inner_microstep: 1164.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546
[2024-06-10 12:51:27,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.65 | bwd_microstep: 1586.25 | bwd_inner_microstep: 1586.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 12:51:28,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.19 | bwd_microstep: 798.32 | bwd_inner_microstep: 798.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3644
[2024-06-10 12:51:30,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.59 | bwd_microstep: 1517.12 | bwd_inner_microstep: 1517.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2036
[2024-06-10 12:51:31,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.65 | bwd_microstep: 871.19 | bwd_inner_microstep: 871.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471
[2024-06-10 12:51:33,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.45 | bwd_microstep: 1215.33 | bwd_inner_microstep: 1215.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567
[2024-06-10 12:51:35,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.69 | bwd_microstep: 1297.62 | bwd_inner_microstep: 1297.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 12:51:37,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.34 | bwd_microstep: 1453.72 | bwd_inner_microstep: 1453.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2024
[2024-06-10 12:51:38,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.19 | bwd_microstep: 713.75 | bwd_inner_microstep: 713.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811
[2024-06-10 12:51:39,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.22 | bwd_microstep: 1355.46 | bwd_inner_microstep: 1355.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 12:51:41,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1255.14 | bwd_inner_microstep: 1255.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 12:51:43,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1511.12 | bwd_inner_microstep: 1511.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3575
[2024-06-10 12:51:45,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.13 | bwd_microstep: 1432.33 | bwd_inner_microstep: 1432.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 12:51:47,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.13 | bwd_microstep: 1300.65 | bwd_inner_microstep: 1300.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 12:51:49,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.45 | bwd_microstep: 1549.17 | bwd_inner_microstep: 1549.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3812
[2024-06-10 12:51:54,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 12:51:54,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.52 | bwd_microstep: 4449.80 | bwd_inner_microstep: 1562.09 | bwd_allreduce_microstep: 2887.66 | step_microstep: 38.16
[2024-06-10 12:51:54,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15981.85 | bwd: 45655.55 | bwd_inner: 42766.99 | bwd_allreduce: 2887.89 | step: 39.69
{'loss': 1.2184, 'learning_rate': 2.66991299074714e-05, 'epoch': 0.41}
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3480
[2024-06-10 12:51:56,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1420.47 | bwd_inner_microstep: 1420.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 4586
[2024-06-10 12:51:58,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.96 | bwd_microstep: 1640.50 | bwd_inner_microstep: 1640.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 12:52:00,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1386.00 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 12:52:02,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.76 | bwd_microstep: 1293.52 | bwd_inner_microstep: 1293.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3791
[2024-06-10 12:52:04,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.12 | bwd_microstep: 1579.46 | bwd_inner_microstep: 1579.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 12:52:06,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.75 | bwd_microstep: 1280.76 | bwd_inner_microstep: 1280.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 12:52:08,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.52 | bwd_microstep: 1480.77 | bwd_inner_microstep: 1480.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921
[2024-06-10 12:52:09,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.90 | bwd_microstep: 788.64 | bwd_inner_microstep: 788.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657
[2024-06-10 12:52:11,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.84 | bwd_microstep: 1425.10 | bwd_inner_microstep: 1425.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076
[2024-06-10 12:52:12,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.98 | bwd_microstep: 820.87 | bwd_inner_microstep: 820.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 12:52:14,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1245.91 | bwd_inner_microstep: 1245.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623
[2024-06-10 12:52:16,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.41 | bwd_microstep: 1313.98 | bwd_inner_microstep: 1313.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3684
[2024-06-10 12:52:18,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.21 | bwd_microstep: 1385.78 | bwd_inner_microstep: 1385.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3492
[2024-06-10 12:52:20,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.77 | bwd_microstep: 1347.38 | bwd_inner_microstep: 1347.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 12:52:22,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.69 | bwd_microstep: 1379.41 | bwd_inner_microstep: 1379.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3653
[2024-06-10 12:52:23,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.76 | bwd_microstep: 1321.14 | bwd_inner_microstep: 1321.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-10 12:52:26,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.11 | bwd_microstep: 1602.32 | bwd_inner_microstep: 1602.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487
[2024-06-10 12:52:28,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.06 | bwd_microstep: 1572.84 | bwd_inner_microstep: 1572.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 12:52:30,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.56 | bwd_microstep: 1664.96 | bwd_inner_microstep: 1664.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 12:52:32,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.47 | bwd_microstep: 1289.50 | bwd_inner_microstep: 1289.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733
[2024-06-10 12:52:34,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1430.54 | bwd_inner_microstep: 1430.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3624
[2024-06-10 12:52:36,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.74 | bwd_microstep: 1637.18 | bwd_inner_microstep: 1637.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 12:52:38,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.46 | bwd_microstep: 1459.87 | bwd_inner_microstep: 1459.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3448
[2024-06-10 12:52:40,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.81 | bwd_microstep: 1225.25 | bwd_inner_microstep: 1225.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 12:52:42,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.27 | bwd_microstep: 1506.52 | bwd_inner_microstep: 1506.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 12:52:44,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1403.01 | bwd_inner_microstep: 1402.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077
[2024-06-10 12:52:45,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.09 | bwd_microstep: 914.63 | bwd_inner_microstep: 914.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 12:52:46,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.10 | bwd_microstep: 873.84 | bwd_inner_microstep: 873.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 12:52:49,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.42 | bwd_microstep: 1748.78 | bwd_inner_microstep: 1748.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 12:52:51,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1499.49 | bwd_inner_microstep: 1499.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 12:52:53,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1378.64 | bwd_inner_microstep: 1378.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033
[2024-06-10 12:52:54,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.94 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-10 12:52:54,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.94 | bwd_microstep: 1404.28 | bwd_inner_microstep: 918.87 | bwd_allreduce_microstep: 485.36 | step_microstep: 37.70
[2024-06-10 12:52:54,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16156.57 | bwd: 43721.36 | bwd_inner: 43235.11 | bwd_allreduce: 485.59 | step: 39.21
{'loss': 1.2286, 'learning_rate': 2.6663752370772663e-05, 'epoch': 0.41}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524
[2024-06-10 12:52:57,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.32 | bwd_microstep: 1582.53 | bwd_inner_microstep: 1582.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 12:52:58,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1377.05 | bwd_inner_microstep: 1377.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4311
[2024-06-10 12:53:01,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.66 | bwd_microstep: 1778.43 | bwd_inner_microstep: 1778.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-10 12:53:03,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.97 | bwd_microstep: 1644.86 | bwd_inner_microstep: 1644.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2929
[2024-06-10 12:53:05,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.66 | bwd_microstep: 1188.47 | bwd_inner_microstep: 1188.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 12:53:07,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1385.45 | bwd_inner_microstep: 1385.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 12:53:08,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1242.92 | bwd_inner_microstep: 1242.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3480
[2024-06-10 12:53:10,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.84 | bwd_microstep: 1245.94 | bwd_inner_microstep: 1245.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480
[2024-06-10 12:53:12,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.67 | bwd_microstep: 1340.18 | bwd_inner_microstep: 1340.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3745
[2024-06-10 12:53:14,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.60 | bwd_microstep: 1479.94 | bwd_inner_microstep: 1479.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3709
[2024-06-10 12:53:16,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.77 | bwd_microstep: 1562.18 | bwd_inner_microstep: 1562.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3725
[2024-06-10 12:53:19,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.07 | bwd_microstep: 1726.47 | bwd_inner_microstep: 1726.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-10 12:53:21,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.07 | bwd_microstep: 1514.63 | bwd_inner_microstep: 1514.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3452
[2024-06-10 12:53:23,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1519.68 | bwd_inner_microstep: 1519.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654
[2024-06-10 12:53:25,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1412.24 | bwd_inner_microstep: 1412.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3832
[2024-06-10 12:53:27,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.51 | bwd_microstep: 1751.31 | bwd_inner_microstep: 1751.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1967
[2024-06-10 12:53:28,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.70 | bwd_microstep: 763.57 | bwd_inner_microstep: 763.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2027
[2024-06-10 12:53:29,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.16 | bwd_microstep: 713.97 | bwd_inner_microstep: 713.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 12:53:31,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.10 | bwd_microstep: 1392.57 | bwd_inner_microstep: 1392.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623
[2024-06-10 12:53:33,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.29 | bwd_microstep: 1311.72 | bwd_inner_microstep: 1311.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-10 12:53:35,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.09 | bwd_microstep: 1709.29 | bwd_inner_microstep: 1709.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154
[2024-06-10 12:53:36,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.35 | bwd_microstep: 851.82 | bwd_inner_microstep: 851.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3832
[2024-06-10 12:53:39,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.48 | bwd_microstep: 1752.14 | bwd_inner_microstep: 1752.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613
[2024-06-10 12:53:41,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.58 | bwd_microstep: 1602.06 | bwd_inner_microstep: 1602.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 12:53:43,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.78 | bwd_microstep: 1451.10 | bwd_inner_microstep: 1451.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 12:53:45,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1496.10 | bwd_inner_microstep: 1496.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3781
[2024-06-10 12:53:47,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.28 | bwd_microstep: 1711.16 | bwd_inner_microstep: 1711.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2268
[2024-06-10 12:53:49,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.05 | bwd_microstep: 1033.50 | bwd_inner_microstep: 1033.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 12:53:51,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.34 | bwd_microstep: 1503.91 | bwd_inner_microstep: 1503.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 12:53:53,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.82 | bwd_microstep: 1502.78 | bwd_inner_microstep: 1502.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3613
[2024-06-10 12:53:55,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.49 | bwd_microstep: 1212.47 | bwd_inner_microstep: 1212.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3564
[2024-06-10 12:53:58,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.31 | optimizer_step: 6.61
[2024-06-10 12:53:58,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.24 | bwd_microstep: 3139.39 | bwd_inner_microstep: 1528.67 | bwd_allreduce_microstep: 1610.67 | step_microstep: 39.52
[2024-06-10 12:53:58,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16829.09 | bwd: 46899.86 | bwd_inner: 45288.29 | bwd_allreduce: 1610.90 | step: 40.99
{'loss': 1.263, 'learning_rate': 2.6628351364372717e-05, 'epoch': 0.41}


 41%|████      | 704/1726 [12:11:25<17:06:18, 60.25s/it]
 41%|████      | 705/1726 [12:12:27<17:10:25, 60.55s/it]
 41%|████      | 706/1726 [12:13:29<17:17:52, 61.05s/it]
 41%|████      | 707/1726 [12:14:31<17:21:31, 61.33s/it]
 41%|████      | 708/1726 [12:15:31<17:14:51, 60.99s/it]
 41%|████      | 709/1726 [12:16:35<17:29:30, 61.92s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 12:54:00,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.28 | bwd_microstep: 1436.09 | bwd_inner_microstep: 1436.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4344
[2024-06-10 12:54:03,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.40 | bwd_microstep: 1598.87 | bwd_inner_microstep: 1598.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 12:54:04,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1244.92 | bwd_inner_microstep: 1244.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4071
[2024-06-10 12:54:06,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.68 | bwd_microstep: 1424.16 | bwd_inner_microstep: 1424.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 12:54:08,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.95 | bwd_microstep: 1544.68 | bwd_inner_microstep: 1544.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 995
[2024-06-10 12:54:09,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 151.44 | bwd_microstep: 391.77 | bwd_inner_microstep: 391.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 12:54:11,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.32 | bwd_microstep: 1388.27 | bwd_inner_microstep: 1388.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 12:54:12,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.61 | bwd_microstep: 793.26 | bwd_inner_microstep: 793.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1132
[2024-06-10 12:54:13,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.78 | bwd_microstep: 459.71 | bwd_inner_microstep: 459.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 12:54:14,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.34 | bwd_microstep: 793.34 | bwd_inner_microstep: 793.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 12:54:15,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.04 | bwd_microstep: 1150.69 | bwd_inner_microstep: 1150.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 12:54:16,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 793.81 | bwd_inner_microstep: 793.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2219
[2024-06-10 12:54:18,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.17 | bwd_microstep: 863.60 | bwd_inner_microstep: 863.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3699
[2024-06-10 12:54:20,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.11 | bwd_microstep: 1452.31 | bwd_inner_microstep: 1452.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 12:54:22,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.28 | bwd_microstep: 1483.45 | bwd_inner_microstep: 1483.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 12:54:24,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.33 | bwd_microstep: 1343.82 | bwd_inner_microstep: 1343.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3642
[2024-06-10 12:54:26,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.92 | bwd_microstep: 1535.74 | bwd_inner_microstep: 1535.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-10 12:54:28,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.54 | bwd_microstep: 1522.55 | bwd_inner_microstep: 1522.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 12:54:29,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.56 | bwd_microstep: 1156.09 | bwd_inner_microstep: 1156.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 12:54:31,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.06 | bwd_microstep: 1507.11 | bwd_inner_microstep: 1507.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2942
[2024-06-10 12:54:33,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.28 | bwd_microstep: 1094.19 | bwd_inner_microstep: 1094.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3468
[2024-06-10 12:54:35,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1399.18 | bwd_inner_microstep: 1399.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-10 12:54:37,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.44 | bwd_microstep: 1160.06 | bwd_inner_microstep: 1160.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3617
[2024-06-10 12:54:38,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.65 | bwd_microstep: 1212.67 | bwd_inner_microstep: 1212.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444
[2024-06-10 12:54:40,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.52 | bwd_microstep: 1216.97 | bwd_inner_microstep: 1216.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 12:54:42,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.36 | bwd_microstep: 1555.27 | bwd_inner_microstep: 1555.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 12:54:44,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.92 | bwd_microstep: 1538.26 | bwd_inner_microstep: 1538.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3552
[2024-06-10 12:54:46,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.88 | bwd_microstep: 1455.03 | bwd_inner_microstep: 1455.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3738
[2024-06-10 12:54:48,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.00 | bwd_microstep: 1582.58 | bwd_inner_microstep: 1582.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 12:54:50,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.11 | bwd_microstep: 1408.06 | bwd_inner_microstep: 1408.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 12:54:52,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.72 | bwd_microstep: 1473.55 | bwd_inner_microstep: 1473.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3602
[2024-06-10 12:55:00,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.94 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 12:55:00,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.18 | bwd_microstep: 7436.54 | bwd_inner_microstep: 1884.17 | bwd_allreduce_microstep: 5552.32 | step_microstep: 38.15
[2024-06-10 12:55:00,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15236.05 | bwd: 46416.59 | bwd_inner: 40863.38 | bwd_allreduce: 5552.54 | step: 39.66
{'loss': 1.3373, 'learning_rate': 2.6592927012953724e-05, 'epoch': 0.41}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 12:55:02,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.43 | bwd_microstep: 1269.91 | bwd_inner_microstep: 1269.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 12:55:04,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1408.99 | bwd_inner_microstep: 1408.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 12:55:06,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.22 | bwd_microstep: 1340.77 | bwd_inner_microstep: 1340.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 12:55:08,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.18 | bwd_microstep: 1149.44 | bwd_inner_microstep: 1149.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 12:55:10,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.17 | bwd_microstep: 1481.03 | bwd_inner_microstep: 1481.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 12:55:11,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.33 | bwd_microstep: 1242.92 | bwd_inner_microstep: 1242.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2210
[2024-06-10 12:55:13,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.70 | bwd_microstep: 860.19 | bwd_inner_microstep: 860.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 12:55:14,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.44 | bwd_microstep: 793.32 | bwd_inner_microstep: 793.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 12:55:15,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.05 | bwd_microstep: 1280.09 | bwd_inner_microstep: 1280.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-10 12:55:17,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.54 | bwd_microstep: 1189.33 | bwd_inner_microstep: 1189.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 12:55:19,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.60 | bwd_microstep: 1517.24 | bwd_inner_microstep: 1517.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 12:55:21,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.00 | bwd_microstep: 1185.63 | bwd_inner_microstep: 1185.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1963
[2024-06-10 12:55:22,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.64 | bwd_microstep: 854.06 | bwd_inner_microstep: 854.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2009
[2024-06-10 12:55:23,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.70 | bwd_microstep: 930.46 | bwd_inner_microstep: 930.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 12:55:25,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.94 | bwd_microstep: 1439.12 | bwd_inner_microstep: 1439.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3441
[2024-06-10 12:55:27,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.89 | bwd_microstep: 1302.08 | bwd_inner_microstep: 1302.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 12:55:29,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.62 | bwd_microstep: 1589.90 | bwd_inner_microstep: 1589.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 12:55:31,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.56 | bwd_microstep: 1525.21 | bwd_inner_microstep: 1525.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 12:55:33,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1492.93 | bwd_inner_microstep: 1492.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 12:55:36,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.72 | bwd_microstep: 1555.72 | bwd_inner_microstep: 1555.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 12:55:38,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1504.01 | bwd_inner_microstep: 1503.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 12:55:40,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1402.36 | bwd_inner_microstep: 1402.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 12:55:41,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.40 | bwd_microstep: 976.30 | bwd_inner_microstep: 976.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 12:55:43,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1379.17 | bwd_inner_microstep: 1379.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 12:55:44,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.56 | bwd_microstep: 699.19 | bwd_inner_microstep: 699.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 12:55:46,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.23 | bwd_microstep: 1499.30 | bwd_inner_microstep: 1499.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2045
[2024-06-10 12:55:47,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.39 | bwd_microstep: 1002.81 | bwd_inner_microstep: 1002.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3680
[2024-06-10 12:55:49,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.19 | bwd_microstep: 1357.22 | bwd_inner_microstep: 1357.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 12:55:51,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.43 | bwd_microstep: 1443.67 | bwd_inner_microstep: 1443.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3594
[2024-06-10 12:55:53,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.49 | bwd_microstep: 1707.72 | bwd_inner_microstep: 1707.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3430
[2024-06-10 12:55:55,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1408.90 | bwd_inner_microstep: 1408.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2939
[2024-06-10 12:56:02,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 12:56:02,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.79 | bwd_microstep: 5597.84 | bwd_inner_microstep: 1349.39 | bwd_allreduce_microstep: 4248.40 | step_microstep: 38.26
[2024-06-10 12:56:02,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15365.13 | bwd: 45386.84 | bwd_inner: 41137.54 | bwd_allreduce: 4248.62 | step: 39.78
{'loss': 1.2307, 'learning_rate': 2.6557479441280066e-05, 'epoch': 0.41}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1846
[2024-06-10 12:56:02,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.10 | bwd_microstep: 667.91 | bwd_inner_microstep: 667.78 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3964
[2024-06-10 12:56:05,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.72 | bwd_microstep: 1698.94 | bwd_inner_microstep: 1698.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 12:56:06,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.62 | bwd_microstep: 1240.79 | bwd_inner_microstep: 1240.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3843
[2024-06-10 12:56:08,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1389.19 | bwd_inner_microstep: 1389.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2434
[2024-06-10 12:56:10,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.38 | bwd_microstep: 879.89 | bwd_inner_microstep: 879.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 12:56:11,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1245.85 | bwd_inner_microstep: 1245.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 12:56:13,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.85 | bwd_microstep: 1482.80 | bwd_inner_microstep: 1482.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 12:56:15,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.79 | bwd_microstep: 1475.66 | bwd_inner_microstep: 1475.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 12:56:17,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1383.44 | bwd_inner_microstep: 1383.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408
[2024-06-10 12:56:19,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.89 | bwd_microstep: 1182.24 | bwd_inner_microstep: 1182.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3670
[2024-06-10 12:56:21,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.52 | bwd_microstep: 1352.97 | bwd_inner_microstep: 1352.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4008
[2024-06-10 12:56:23,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.81 | bwd_microstep: 1708.31 | bwd_inner_microstep: 1708.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 12:56:25,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.98 | bwd_microstep: 1335.33 | bwd_inner_microstep: 1335.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 12:56:27,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.16 | bwd_microstep: 1485.57 | bwd_inner_microstep: 1485.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 12:56:29,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.77 | bwd_microstep: 1599.65 | bwd_inner_microstep: 1599.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3645
[2024-06-10 12:56:31,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.61 | bwd_microstep: 1511.67 | bwd_inner_microstep: 1511.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3519
[2024-06-10 12:56:33,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.24 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 12:56:35,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.40 | bwd_microstep: 1494.55 | bwd_inner_microstep: 1494.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 12:56:38,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.71 | bwd_microstep: 1524.14 | bwd_inner_microstep: 1524.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 12:56:39,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1387.85 | bwd_inner_microstep: 1387.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 12:56:41,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.23 | bwd_microstep: 1404.91 | bwd_inner_microstep: 1404.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3505
[2024-06-10 12:56:43,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.38 | bwd_microstep: 1221.61 | bwd_inner_microstep: 1221.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 12:56:45,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.53 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617
[2024-06-10 12:56:47,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.46 | bwd_microstep: 1308.98 | bwd_inner_microstep: 1308.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 12:56:49,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.38 | bwd_microstep: 1471.20 | bwd_inner_microstep: 1471.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 12:56:51,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1380.59 | bwd_inner_microstep: 1380.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2203
[2024-06-10 12:56:52,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.31 | bwd_microstep: 959.85 | bwd_inner_microstep: 959.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2877
[2024-06-10 12:56:53,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 403.60 | bwd_microstep: 1069.20 | bwd_inner_microstep: 1069.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3819
[2024-06-10 12:56:56,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.71 | bwd_microstep: 1616.31 | bwd_inner_microstep: 1616.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-10 12:56:57,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.48 | bwd_microstep: 807.33 | bwd_inner_microstep: 807.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 12:56:58,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.90 | bwd_microstep: 1253.95 | bwd_inner_microstep: 1253.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 12:57:02,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-10 12:57:02,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.50 | bwd_microstep: 2717.42 | bwd_inner_microstep: 1811.67 | bwd_allreduce_microstep: 905.71 | step_microstep: 37.58
[2024-06-10 12:57:02,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16037.62 | bwd: 43949.54 | bwd_inner: 43042.84 | bwd_allreduce: 905.98 | step: 39.09
{'loss': 1.2637, 'learning_rate': 2.6522008774197902e-05, 'epoch': 0.41}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 12:57:04,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 1282.62 | bwd_inner_microstep: 1282.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3904
[2024-06-10 12:57:06,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.93 | bwd_microstep: 1684.65 | bwd_inner_microstep: 1684.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 12:57:08,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1550.96 | bwd_inner_microstep: 1550.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2306
[2024-06-10 12:57:09,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.66 | bwd_microstep: 943.90 | bwd_inner_microstep: 943.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 12:57:11,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.57 | bwd_microstep: 1379.47 | bwd_inner_microstep: 1379.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1890
[2024-06-10 12:57:12,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.30 | bwd_microstep: 682.41 | bwd_inner_microstep: 682.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 12:57:14,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1413.26 | bwd_inner_microstep: 1413.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 12:57:16,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1245.90 | bwd_inner_microstep: 1245.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 12:57:17,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.32 | bwd_microstep: 802.26 | bwd_inner_microstep: 802.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 12:57:19,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1488.99 | bwd_inner_microstep: 1488.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 12:57:21,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.14 | bwd_microstep: 1288.32 | bwd_inner_microstep: 1288.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3663
[2024-06-10 12:57:23,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.78 | bwd_microstep: 1455.37 | bwd_inner_microstep: 1455.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3422
[2024-06-10 12:57:25,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.51 | bwd_microstep: 1308.20 | bwd_inner_microstep: 1308.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2082
[2024-06-10 12:57:26,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.93 | bwd_microstep: 852.41 | bwd_inner_microstep: 852.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 12:57:28,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.25 | bwd_microstep: 1355.98 | bwd_inner_microstep: 1355.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 12:57:30,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.94 | bwd_microstep: 1478.03 | bwd_inner_microstep: 1478.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399
[2024-06-10 12:57:32,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.65 | bwd_microstep: 1368.78 | bwd_inner_microstep: 1368.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 12:57:34,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1380.03 | bwd_inner_microstep: 1380.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-10 12:57:36,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1422.00 | bwd_inner_microstep: 1421.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3656
[2024-06-10 12:57:37,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1286.09 | bwd_inner_microstep: 1286.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3433
[2024-06-10 12:57:39,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.68 | bwd_microstep: 1442.83 | bwd_inner_microstep: 1442.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 12:57:41,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1395.89 | bwd_inner_microstep: 1395.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3576
[2024-06-10 12:57:43,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.72 | bwd_microstep: 1236.12 | bwd_inner_microstep: 1236.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473
[2024-06-10 12:57:45,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1327.81 | bwd_inner_microstep: 1327.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 12:57:47,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.74 | bwd_microstep: 1403.84 | bwd_inner_microstep: 1403.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 12:57:48,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.98 | bwd_microstep: 1184.74 | bwd_inner_microstep: 1184.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-10 12:57:50,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.34 | bwd_microstep: 1182.15 | bwd_inner_microstep: 1182.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849
[2024-06-10 12:57:52,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.03 | bwd_microstep: 1565.36 | bwd_inner_microstep: 1565.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 12:57:54,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.64 | bwd_microstep: 1284.22 | bwd_inner_microstep: 1284.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3564
[2024-06-10 12:57:56,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.61 | bwd_microstep: 1232.58 | bwd_inner_microstep: 1232.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814
[2024-06-10 12:57:58,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.61 | bwd_microstep: 1749.88 | bwd_inner_microstep: 1749.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 12:58:02,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.57
[2024-06-10 12:58:02,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.13 | bwd_microstep: 3601.03 | bwd_inner_microstep: 1741.57 | bwd_allreduce_microstep: 1859.42 | step_microstep: 37.82
[2024-06-10 12:58:02,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15873.65 | bwd: 44276.10 | bwd_inner: 42415.78 | bwd_allreduce: 1859.64 | step: 39.27
{'loss': 1.2195, 'learning_rate': 2.6486515136634738e-05, 'epoch': 0.41}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3455
[2024-06-10 12:58:04,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1547.74 | bwd_inner_microstep: 1547.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 12:58:06,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.33 | bwd_microstep: 1271.35 | bwd_inner_microstep: 1271.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 12:58:08,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.94 | bwd_microstep: 1374.29 | bwd_inner_microstep: 1374.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 12:58:10,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.49 | bwd_microstep: 1280.44 | bwd_inner_microstep: 1280.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 12:58:12,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 12:58:13,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.82 | bwd_microstep: 1147.79 | bwd_inner_microstep: 1147.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-10 12:58:16,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.46 | bwd_microstep: 1629.31 | bwd_inner_microstep: 1629.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 12:58:18,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.02 | bwd_microstep: 1385.53 | bwd_inner_microstep: 1385.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 12:58:19,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.68 | bwd_microstep: 1245.13 | bwd_inner_microstep: 1245.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2468
[2024-06-10 12:58:21,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.06 | bwd_microstep: 957.15 | bwd_inner_microstep: 957.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3478
[2024-06-10 12:58:22,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.31 | bwd_microstep: 1212.42 | bwd_inner_microstep: 1212.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3383
[2024-06-10 12:58:24,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.54 | bwd_microstep: 1428.86 | bwd_inner_microstep: 1428.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 12:58:26,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.18 | bwd_microstep: 1191.36 | bwd_inner_microstep: 1191.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3446
[2024-06-10 12:58:28,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1395.14 | bwd_inner_microstep: 1395.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066
[2024-06-10 12:58:29,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.92 | bwd_microstep: 910.64 | bwd_inner_microstep: 910.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 12:58:31,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.77 | bwd_microstep: 1501.48 | bwd_inner_microstep: 1501.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 12:58:33,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.54 | bwd_microstep: 1507.94 | bwd_inner_microstep: 1507.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 12:58:35,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 1406.62 | bwd_inner_microstep: 1406.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 12:58:37,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1553.96 | bwd_inner_microstep: 1553.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 12:58:39,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1509.05 | bwd_inner_microstep: 1509.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842
[2024-06-10 12:58:41,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.16 | bwd_microstep: 1457.48 | bwd_inner_microstep: 1457.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 12:58:43,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.24 | bwd_microstep: 1379.30 | bwd_inner_microstep: 1379.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 12:58:45,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.80 | bwd_microstep: 1253.98 | bwd_inner_microstep: 1253.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 12:58:47,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.93 | bwd_microstep: 1457.82 | bwd_inner_microstep: 1457.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 12:58:49,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.50 | bwd_microstep: 1253.59 | bwd_inner_microstep: 1253.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2085
[2024-06-10 12:58:50,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.00 | bwd_microstep: 817.96 | bwd_inner_microstep: 817.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2050
[2024-06-10 12:58:51,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.05 | bwd_microstep: 988.00 | bwd_inner_microstep: 987.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759
[2024-06-10 12:58:54,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.42 | bwd_microstep: 1629.39 | bwd_inner_microstep: 1629.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3390
[2024-06-10 12:58:55,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.39 | bwd_microstep: 1273.16 | bwd_inner_microstep: 1273.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3578
[2024-06-10 12:58:57,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.71 | bwd_microstep: 1329.92 | bwd_inner_microstep: 1329.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446
[2024-06-10 12:58:59,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.43 | bwd_microstep: 1281.25 | bwd_inner_microstep: 1281.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 12:59:04,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-10 12:59:04,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.05 | bwd_microstep: 4109.84 | bwd_inner_microstep: 1582.90 | bwd_allreduce_microstep: 2526.88 | step_microstep: 37.87
[2024-06-10 12:59:04,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15880.16 | bwd: 45068.95 | bwd_inner: 42541.16 | bwd_allreduce: 2527.11 | step: 39.34
{'loss': 1.238, 'learning_rate': 2.645099865359899e-05, 'epoch': 0.41}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-10 12:59:05,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.40 | bwd_microstep: 1144.88 | bwd_inner_microstep: 1144.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 12:59:06,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 792.93 | bwd_inner_microstep: 792.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3884
[2024-06-10 12:59:09,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.22 | bwd_microstep: 1683.93 | bwd_inner_microstep: 1683.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 12:59:10,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1246.75 | bwd_inner_microstep: 1246.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 12:59:12,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1280.90 | bwd_inner_microstep: 1280.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 12:59:14,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.48 | bwd_microstep: 1278.24 | bwd_inner_microstep: 1278.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 12:59:16,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1407.08 | bwd_inner_microstep: 1407.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3747
[2024-06-10 12:59:18,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.59 | bwd_microstep: 1339.01 | bwd_inner_microstep: 1338.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-10 12:59:20,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.20 | bwd_microstep: 1434.88 | bwd_inner_microstep: 1434.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 12:59:21,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.00 | bwd_microstep: 1148.58 | bwd_inner_microstep: 1148.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3442
[2024-06-10 12:59:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1324.20 | bwd_inner_microstep: 1324.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3505
[2024-06-10 12:59:25,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.66 | bwd_microstep: 1443.45 | bwd_inner_microstep: 1443.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-10 12:59:27,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.53 | bwd_microstep: 1518.78 | bwd_inner_microstep: 1518.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-10 12:59:28,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.11 | bwd_microstep: 789.45 | bwd_inner_microstep: 789.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 12:59:30,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.31 | bwd_microstep: 1612.35 | bwd_inner_microstep: 1612.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3675
[2024-06-10 12:59:32,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.75 | bwd_microstep: 1454.98 | bwd_inner_microstep: 1454.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 12:59:34,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.03 | bwd_microstep: 1391.02 | bwd_inner_microstep: 1390.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 12:59:35,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 696.39 | bwd_inner_microstep: 696.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3623
[2024-06-10 12:59:38,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.83 | bwd_microstep: 1552.65 | bwd_inner_microstep: 1552.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 12:59:39,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1350.98 | bwd_inner_microstep: 1350.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 12:59:41,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.65 | bwd_microstep: 1287.07 | bwd_inner_microstep: 1287.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 12:59:43,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1613.66 | bwd_inner_microstep: 1613.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 12:59:45,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1405.63 | bwd_inner_microstep: 1405.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 12:59:47,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1555.78 | bwd_inner_microstep: 1555.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 12:59:49,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-10 12:59:52,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.73 | bwd_microstep: 1543.14 | bwd_inner_microstep: 1543.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3680
[2024-06-10 12:59:53,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.57 | bwd_microstep: 1259.31 | bwd_inner_microstep: 1259.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3588
[2024-06-10 12:59:55,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.32 | bwd_microstep: 1530.61 | bwd_inner_microstep: 1530.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 12:59:57,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1311.90 | bwd_inner_microstep: 1311.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 12:59:59,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.03 | bwd_microstep: 1497.87 | bwd_inner_microstep: 1497.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 13:00:01,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.62 | bwd_microstep: 1607.39 | bwd_inner_microstep: 1607.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 13:00:05,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.24 | optimizer_step: 6.56
[2024-06-10 13:00:05,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 2570.10 | bwd_inner_microstep: 1417.64 | bwd_allreduce_microstep: 1152.41 | step_microstep: 38.99
[2024-06-10 13:00:05,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16162.59 | bwd: 44487.78 | bwd_inner: 43334.47 | bwd_allreduce: 1152.64 | step: 40.46
, 61.92s/it]
 41%|████      | 710/1726 [12:17:37<17:28:47, 61.94s/it]
 41%|████      | 711/1726 [12:18:38<17:23:26, 61.68s/it]
 41%|████▏     | 712/1726 [12:19:39<17:15:29, 61.27s/it]
 41%|████▏     | 713/1726 [12:20:39<17:10:25, 61.03s/it]
 41%|████▏     | 714/1726 [12:21:40<17:10:39, 61.11s/it]
 41%|████▏     | 715/1726 [12:22:41<17:09:02, 61.
{'loss': 1.2902, 'learning_rate': 2.6415459450179515e-05, 'epoch': 0.41}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 13:00:07,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1394.78 | bwd_inner_microstep: 1394.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 13:00:08,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.59 | bwd_microstep: 1275.96 | bwd_inner_microstep: 1275.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3495
[2024-06-10 13:00:10,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.85 | bwd_microstep: 1215.79 | bwd_inner_microstep: 1215.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 13:00:12,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.90 | bwd_microstep: 1650.25 | bwd_inner_microstep: 1650.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3833
[2024-06-10 13:00:14,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.75 | bwd_microstep: 1484.28 | bwd_inner_microstep: 1484.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775
[2024-06-10 13:00:16,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.51 | bwd_microstep: 1438.06 | bwd_inner_microstep: 1438.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 13:00:18,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.12 | bwd_microstep: 1251.03 | bwd_inner_microstep: 1251.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 13:00:20,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.75 | bwd_microstep: 1145.41 | bwd_inner_microstep: 1145.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-10 13:00:22,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1411.30 | bwd_inner_microstep: 1411.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 13:00:23,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.30 | bwd_microstep: 793.77 | bwd_inner_microstep: 793.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3685
[2024-06-10 13:00:25,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.98 | bwd_microstep: 1546.75 | bwd_inner_microstep: 1546.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 13:00:27,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.57 | bwd_microstep: 1481.24 | bwd_inner_microstep: 1481.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 13:00:29,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.05 | bwd_microstep: 1335.48 | bwd_inner_microstep: 1335.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3688
[2024-06-10 13:00:31,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.96 | bwd_microstep: 1687.08 | bwd_inner_microstep: 1687.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665
[2024-06-10 13:00:33,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.42 | bwd_microstep: 1418.10 | bwd_inner_microstep: 1418.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3518
[2024-06-10 13:00:35,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.10 | bwd_microstep: 1317.47 | bwd_inner_microstep: 1317.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 13:00:37,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1292.30 | bwd_inner_microstep: 1292.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3637
[2024-06-10 13:00:38,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.09 | bwd_microstep: 1347.85 | bwd_inner_microstep: 1347.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 13:00:40,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.13 | bwd_microstep: 1405.23 | bwd_inner_microstep: 1405.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510
[2024-06-10 13:00:42,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.71 | bwd_microstep: 1407.77 | bwd_inner_microstep: 1407.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3837
[2024-06-10 13:00:44,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.67 | bwd_microstep: 1485.09 | bwd_inner_microstep: 1485.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 13:00:46,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1257.98 | bwd_inner_microstep: 1257.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3811
[2024-06-10 13:00:48,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.10 | bwd_microstep: 1617.22 | bwd_inner_microstep: 1617.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-10 13:00:50,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.63 | bwd_microstep: 1434.72 | bwd_inner_microstep: 1434.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 13:00:52,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.30 | bwd_microstep: 1606.75 | bwd_inner_microstep: 1606.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 13:00:54,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.30 | bwd_microstep: 1456.31 | bwd_inner_microstep: 1456.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 13:00:56,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.46 | bwd_microstep: 1412.79 | bwd_inner_microstep: 1412.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3817
[2024-06-10 13:00:59,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.90 | bwd_microstep: 1594.69 | bwd_inner_microstep: 1594.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2206
[2024-06-10 13:01:00,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.82 | bwd_microstep: 956.25 | bwd_inner_microstep: 956.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-10 13:01:02,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.40 | bwd_microstep: 1447.34 | bwd_inner_microstep: 1447.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576
[2024-06-10 13:01:04,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.59 | bwd_microstep: 1591.03 | bwd_inner_microstep: 1591.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 13:01:07,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.31 | optimizer_step: 6.58
[2024-06-10 13:01:07,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.10 | bwd_microstep: 2283.81 | bwd_inner_microstep: 1833.23 | bwd_allreduce_microstep: 450.52 | step_microstep: 39.40
[2024-06-10 13:01:07,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16723.22 | bwd: 45443.86 | bwd_inner: 44992.43 | bwd_allreduce: 450.75 | step: 41.01
{'loss': 1.2278, 'learning_rate': 2.637989765154521e-05, 'epoch': 0.41}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 13:01:09,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1341.34 | bwd_inner_microstep: 1341.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922
[2024-06-10 13:01:10,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.15 | bwd_microstep: 694.51 | bwd_inner_microstep: 694.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 13:01:12,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.43 | bwd_microstep: 1342.93 | bwd_inner_microstep: 1342.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3927
[2024-06-10 13:01:14,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.17 | bwd_microstep: 1592.44 | bwd_inner_microstep: 1592.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 13:01:16,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.20 | bwd_microstep: 1650.85 | bwd_inner_microstep: 1650.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 13:01:18,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3685
[2024-06-10 13:01:20,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.80 | bwd_microstep: 1457.75 | bwd_inner_microstep: 1457.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1890
[2024-06-10 13:01:21,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.25 | bwd_microstep: 684.83 | bwd_inner_microstep: 684.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2133
[2024-06-10 13:01:22,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.54 | bwd_microstep: 893.85 | bwd_inner_microstep: 893.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498
[2024-06-10 13:01:24,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.52 | bwd_microstep: 1317.90 | bwd_inner_microstep: 1317.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 13:01:26,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.02 | bwd_microstep: 1286.58 | bwd_inner_microstep: 1286.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3508
[2024-06-10 13:01:28,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1343.99 | bwd_inner_microstep: 1343.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3418
[2024-06-10 13:01:30,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.35 | bwd_microstep: 1503.24 | bwd_inner_microstep: 1503.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3959
[2024-06-10 13:01:32,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.51 | bwd_microstep: 1494.49 | bwd_inner_microstep: 1494.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 13:01:34,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.69 | bwd_microstep: 1373.36 | bwd_inner_microstep: 1373.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3694
[2024-06-10 13:01:36,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.09 | bwd_microstep: 1721.51 | bwd_inner_microstep: 1721.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3687
[2024-06-10 13:01:39,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.60 | bwd_microstep: 1724.61 | bwd_inner_microstep: 1724.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3670
[2024-06-10 13:01:41,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.81 | bwd_microstep: 1570.64 | bwd_inner_microstep: 1570.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2095
[2024-06-10 13:01:42,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.56 | bwd_microstep: 915.01 | bwd_inner_microstep: 914.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3828
[2024-06-10 13:01:44,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.07 | bwd_microstep: 1726.75 | bwd_inner_microstep: 1726.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-10 13:01:46,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1437.66 | bwd_inner_microstep: 1437.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 13:01:47,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.66 | bwd_microstep: 699.03 | bwd_inner_microstep: 699.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 13:01:49,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1405.65 | bwd_inner_microstep: 1405.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3587
[2024-06-10 13:01:51,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.50 | bwd_microstep: 1467.34 | bwd_inner_microstep: 1467.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3595
[2024-06-10 13:01:53,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.28 | bwd_microstep: 1209.74 | bwd_inner_microstep: 1209.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 13:01:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1437.49 | bwd_inner_microstep: 1437.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3722
[2024-06-10 13:01:57,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.03 | bwd_microstep: 1458.09 | bwd_inner_microstep: 1458.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3452
[2024-06-10 13:01:59,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.22 | bwd_microstep: 1547.69 | bwd_inner_microstep: 1547.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651
[2024-06-10 13:02:01,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.87 | bwd_microstep: 1612.05 | bwd_inner_microstep: 1612.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379
[2024-06-10 13:02:03,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.14 | bwd_microstep: 1435.12 | bwd_inner_microstep: 1435.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 13:02:05,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.68 | bwd_microstep: 1146.01 | bwd_inner_microstep: 1145.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3631
[2024-06-10 13:02:09,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.18 | optimizer_step: 6.61
[2024-06-10 13:02:09,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 3257.42 | bwd_inner_microstep: 1772.13 | bwd_allreduce_microstep: 1485.24 | step_microstep: 38.00
[2024-06-10 13:02:09,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16197.44 | bwd: 45134.24 | bwd_inner: 43648.10 | bwd_allreduce: 1485.47 | step: 39.43
{'loss': 1.2383, 'learning_rate': 2.6344313382944537e-05, 'epoch': 0.42}
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1929
[2024-06-10 13:02:10,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.07 | bwd_microstep: 771.02 | bwd_inner_microstep: 770.95 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3398
[2024-06-10 13:02:11,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.09 | bwd_microstep: 1180.24 | bwd_inner_microstep: 1180.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 13:02:13,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.88 | bwd_microstep: 1286.25 | bwd_inner_microstep: 1286.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2319
[2024-06-10 13:02:14,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.56 | bwd_microstep: 851.86 | bwd_inner_microstep: 851.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 13:02:16,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1341.99 | bwd_inner_microstep: 1341.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3766
[2024-06-10 13:02:18,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.15 | bwd_microstep: 1374.15 | bwd_inner_microstep: 1374.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-10 13:02:20,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.84 | bwd_microstep: 1531.33 | bwd_inner_microstep: 1531.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-10 13:02:21,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.14 | bwd_microstep: 819.26 | bwd_inner_microstep: 819.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2027
[2024-06-10 13:02:23,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.26 | bwd_microstep: 810.12 | bwd_inner_microstep: 810.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3442
[2024-06-10 13:02:24,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.59 | bwd_microstep: 1189.40 | bwd_inner_microstep: 1189.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487
[2024-06-10 13:02:26,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1413.01 | bwd_inner_microstep: 1412.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504
[2024-06-10 13:02:28,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.05 | bwd_microstep: 1317.61 | bwd_inner_microstep: 1317.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2459
[2024-06-10 13:02:29,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.78 | bwd_microstep: 1013.89 | bwd_inner_microstep: 1013.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 13:02:31,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1345.45 | bwd_inner_microstep: 1345.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3674
[2024-06-10 13:02:33,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1557.66 | bwd_inner_microstep: 1557.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 13:02:34,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.60 | bwd_microstep: 789.77 | bwd_inner_microstep: 789.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 13:02:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1244.74 | bwd_inner_microstep: 1244.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3835
[2024-06-10 13:02:39,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.01 | bwd_microstep: 1752.44 | bwd_inner_microstep: 1752.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 13:02:41,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1492.98 | bwd_inner_microstep: 1492.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-10 13:02:43,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.73 | bwd_microstep: 1341.19 | bwd_inner_microstep: 1341.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 13:02:45,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.57 | bwd_microstep: 1531.24 | bwd_inner_microstep: 1531.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 13:02:47,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.50 | bwd_microstep: 1648.66 | bwd_inner_microstep: 1648.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 13:02:49,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.71 | bwd_microstep: 1425.79 | bwd_inner_microstep: 1425.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3456
[2024-06-10 13:02:51,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.69 | bwd_microstep: 1188.40 | bwd_inner_microstep: 1188.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 13:02:53,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.76 | bwd_microstep: 1609.89 | bwd_inner_microstep: 1609.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3528
[2024-06-10 13:02:55,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.30 | bwd_microstep: 1420.54 | bwd_inner_microstep: 1420.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 13:02:57,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1392.64 | bwd_inner_microstep: 1392.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 13:02:59,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.30 | bwd_microstep: 1459.01 | bwd_inner_microstep: 1458.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3605
[2024-06-10 13:03:01,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.76 | bwd_microstep: 1643.37 | bwd_inner_microstep: 1643.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732
[2024-06-10 13:03:03,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.18 | bwd_microstep: 1637.20 | bwd_inner_microstep: 1637.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3778
[2024-06-10 13:03:05,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.06 | bwd_microstep: 1363.91 | bwd_inner_microstep: 1363.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 13:03:11,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.59
[2024-06-10 13:03:11,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.64 | bwd_microstep: 5359.07 | bwd_inner_microstep: 1821.91 | bwd_allreduce_microstep: 3537.10 | step_microstep: 38.62
[2024-06-10 13:03:11,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15782.14 | bwd: 46104.12 | bwd_inner: 42566.07 | bwd_allreduce: 3537.36 | step: 40.10
{'loss': 1.2281, 'learning_rate': 2.6308706769705118e-05, 'epoch': 0.42}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 13:03:13,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.32 | bwd_microstep: 1240.20 | bwd_inner_microstep: 1240.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3964
[2024-06-10 13:03:15,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.40 | bwd_microstep: 1490.93 | bwd_inner_microstep: 1490.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-10 13:03:16,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.76 | bwd_microstep: 725.63 | bwd_inner_microstep: 725.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2311
[2024-06-10 13:03:17,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.28 | bwd_microstep: 881.63 | bwd_inner_microstep: 881.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 13:03:19,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.26 | bwd_microstep: 1245.77 | bwd_inner_microstep: 1245.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 13:03:20,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.23 | bwd_microstep: 677.24 | bwd_inner_microstep: 677.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 13:03:22,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1429.46 | bwd_inner_microstep: 1429.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-10 13:03:24,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.31 | bwd_microstep: 1428.61 | bwd_inner_microstep: 1428.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 13:03:25,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.59 | bwd_microstep: 1283.74 | bwd_inner_microstep: 1283.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2100
[2024-06-10 13:03:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.03 | bwd_microstep: 855.53 | bwd_inner_microstep: 855.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 13:03:28,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1388.41 | bwd_inner_microstep: 1388.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-10 13:03:30,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.75 | bwd_microstep: 1414.85 | bwd_inner_microstep: 1414.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2148
[2024-06-10 13:03:32,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.05 | bwd_microstep: 948.75 | bwd_inner_microstep: 948.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 13:03:34,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.73 | bwd_microstep: 1384.53 | bwd_inner_microstep: 1384.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 13:03:36,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.68 | bwd_microstep: 1435.42 | bwd_inner_microstep: 1435.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2148
[2024-06-10 13:03:37,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.55 | bwd_microstep: 787.97 | bwd_inner_microstep: 787.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1935
[2024-06-10 13:03:38,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.05 | bwd_microstep: 819.45 | bwd_inner_microstep: 819.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-10 13:03:40,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.54 | bwd_microstep: 1424.33 | bwd_inner_microstep: 1424.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2504
[2024-06-10 13:03:41,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.32 | bwd_microstep: 960.78 | bwd_inner_microstep: 960.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 13:03:43,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.45 | bwd_microstep: 1183.13 | bwd_inner_microstep: 1183.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126
[2024-06-10 13:03:44,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.44 | bwd_microstep: 927.13 | bwd_inner_microstep: 927.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 13:03:46,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 1555.38 | bwd_inner_microstep: 1555.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 13:03:48,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.85 | bwd_microstep: 1289.75 | bwd_inner_microstep: 1289.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 13:03:50,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1554.77 | bwd_inner_microstep: 1554.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 13:03:52,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.43 | bwd_microstep: 1457.39 | bwd_inner_microstep: 1457.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569
[2024-06-10 13:03:54,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.95 | bwd_microstep: 1204.81 | bwd_inner_microstep: 1204.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3829
[2024-06-10 13:03:56,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.65 | bwd_microstep: 1388.38 | bwd_inner_microstep: 1388.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 13:03:58,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.36 | bwd_microstep: 1305.85 | bwd_inner_microstep: 1305.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3584
[2024-06-10 13:03:59,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.79 | bwd_microstep: 1206.45 | bwd_inner_microstep: 1206.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 13:04:01,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1336.14 | bwd_inner_microstep: 1336.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 13:04:03,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1378.80 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3589
[2024-06-10 13:04:12,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 13:04:12,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 8030.96 | bwd_inner_microstep: 1771.94 | bwd_allreduce_microstep: 6258.97 | step_microstep: 38.31
[2024-06-10 13:04:12,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14735.40 | bwd: 45642.20 | bwd_inner: 39382.29 | bwd_allreduce: 6259.20 | step: 39.80
{'loss': 1.2516, 'learning_rate': 2.6273077937233243e-05, 'epoch': 0.42}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4103
[2024-06-10 13:04:14,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.64 | bwd_microstep: 1710.07 | bwd_inner_microstep: 1710.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 13:04:16,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.19 | bwd_microstep: 1277.64 | bwd_inner_microstep: 1277.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3857
[2024-06-10 13:04:18,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1491.77 | bwd_inner_microstep: 1491.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 13:04:20,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.51 | bwd_microstep: 1377.30 | bwd_inner_microstep: 1377.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 13:04:22,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.10 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3406
[2024-06-10 13:04:23,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.85 | bwd_microstep: 1309.32 | bwd_inner_microstep: 1309.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 13:04:25,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.57 | bwd_microstep: 1148.96 | bwd_inner_microstep: 1148.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3407
[2024-06-10 13:04:27,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.09 | bwd_microstep: 1311.40 | bwd_inner_microstep: 1311.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 13:04:29,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1251.27 | bwd_inner_microstep: 1251.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3424
[2024-06-10 13:04:30,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.30 | bwd_microstep: 1184.27 | bwd_inner_microstep: 1184.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3630
[2024-06-10 13:04:33,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.49 | bwd_microstep: 1705.52 | bwd_inner_microstep: 1705.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-10 13:04:35,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.96 | bwd_microstep: 1519.38 | bwd_inner_microstep: 1519.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3651
[2024-06-10 13:04:37,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.01 | bwd_microstep: 1714.27 | bwd_inner_microstep: 1714.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 13:04:39,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.79 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-10 13:04:41,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.36 | bwd_microstep: 1520.99 | bwd_inner_microstep: 1520.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445
[2024-06-10 13:04:43,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.91 | bwd_microstep: 1157.13 | bwd_inner_microstep: 1157.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 13:04:45,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.01 | bwd_microstep: 1657.88 | bwd_inner_microstep: 1657.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 13:04:47,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1284.51 | bwd_inner_microstep: 1284.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 13:04:49,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.35 | bwd_microstep: 1390.95 | bwd_inner_microstep: 1390.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3620
[2024-06-10 13:04:51,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.59 | bwd_microstep: 1492.53 | bwd_inner_microstep: 1492.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 13:04:52,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.84 | bwd_microstep: 1279.53 | bwd_inner_microstep: 1279.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3527
[2024-06-10 13:04:54,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.27 | bwd_microstep: 1196.11 | bwd_inner_microstep: 1196.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 13:04:56,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.51 | bwd_microstep: 1531.25 | bwd_inner_microstep: 1531.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 13:04:59,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.48 | bwd_microstep: 1656.61 | bwd_inner_microstep: 1656.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3783
[2024-06-10 13:05:01,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.17 | bwd_microstep: 1574.34 | bwd_inner_microstep: 1574.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3623
[2024-06-10 13:05:03,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.06 | bwd_microstep: 1345.49 | bwd_inner_microstep: 1345.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 13:05:04,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1378.40 | bwd_inner_microstep: 1378.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729
[2024-06-10 13:05:07,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.84 | bwd_microstep: 1627.83 | bwd_inner_microstep: 1627.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3864
[2024-06-10 13:05:09,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1370.27 | bwd_inner_microstep: 1370.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 13:05:11,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1404.65 | bwd_inner_microstep: 1404.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 13:05:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1543.75 | bwd_inner_microstep: 1543.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3574
[2024-06-10 13:05:15,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.34 | optimizer_step: 6.65
[2024-06-10 13:05:15,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.67 | bwd_microstep: 1602.11 | bwd_inner_microstep: 1594.45 | bwd_allreduce_microstep: 7.62 | step_microstep: 39.15
[2024-06-10 13:05:15,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17115.71 | bwd: 45784.27 | bwd_inner: 45775.76 | bwd_allreduce: 7.84 | step: 40.61
 41%|████▏     | 715/1726 [12:22:41<17:09:02, 61.07s/it]
 41%|████▏     | 716/1726 [12:23:44<17:15:18, 61.50s/it]
 42%|████▏     | 717/1726 [12:24:45<17:15:07, 61.55s/it]
 42%|████▏     | 718/1726 [12:25:48<17:17:27, 61.75s/it]
 42%|████▏     | 719/1726 [12:26:48<17:11:07, 61.44s/it]
 42%|████▏     | 720/1726 [12:27:52<17:19:09, 61.98s/it]
{'loss': 1.2551, 'learning_rate': 2.6237427011013486e-05, 'epoch': 0.42}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 13:05:17,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1483.21 | bwd_inner_microstep: 1483.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 13:05:19,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1380.34 | bwd_inner_microstep: 1380.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3474
[2024-06-10 13:05:21,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1425.28 | bwd_inner_microstep: 1425.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3877
[2024-06-10 13:05:23,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1583.07 | bwd_inner_microstep: 1583.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 13:05:25,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1412.65 | bwd_inner_microstep: 1412.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3482
[2024-06-10 13:05:27,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.42 | bwd_microstep: 1246.09 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2445
[2024-06-10 13:05:28,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.31 | bwd_microstep: 947.37 | bwd_inner_microstep: 947.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3472
[2024-06-10 13:05:30,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.89 | bwd_microstep: 1230.71 | bwd_inner_microstep: 1230.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 13:05:32,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.04 | bwd_microstep: 1386.52 | bwd_inner_microstep: 1386.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 13:05:33,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.46 | bwd_microstep: 795.82 | bwd_inner_microstep: 795.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 13:05:34,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1253.45 | bwd_inner_microstep: 1253.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2168
[2024-06-10 13:05:36,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.13 | bwd_microstep: 948.34 | bwd_inner_microstep: 948.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 13:05:38,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1348.11 | bwd_inner_microstep: 1348.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 13:05:40,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.00 | bwd_microstep: 1522.01 | bwd_inner_microstep: 1521.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676
[2024-06-10 13:05:42,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.46 | bwd_microstep: 1616.68 | bwd_inner_microstep: 1616.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3651
[2024-06-10 13:05:44,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.66 | bwd_microstep: 1567.28 | bwd_inner_microstep: 1567.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 13:05:46,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1290.53 | bwd_inner_microstep: 1290.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 13:05:48,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.29 | bwd_microstep: 1657.63 | bwd_inner_microstep: 1657.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 13:05:50,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1396.46 | bwd_inner_microstep: 1396.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-10 13:05:51,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.54 | bwd_microstep: 698.63 | bwd_inner_microstep: 698.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3833
[2024-06-10 13:05:53,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.65 | bwd_microstep: 1387.69 | bwd_inner_microstep: 1387.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2010
[2024-06-10 13:05:54,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.33 | bwd_microstep: 841.28 | bwd_inner_microstep: 841.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3541
[2024-06-10 13:05:56,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1231.56 | bwd_inner_microstep: 1231.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555
[2024-06-10 13:05:58,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1328.54 | bwd_inner_microstep: 1328.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-10 13:06:00,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.93 | bwd_microstep: 1585.65 | bwd_inner_microstep: 1585.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2053
[2024-06-10 13:06:01,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.47 | bwd_microstep: 944.74 | bwd_inner_microstep: 944.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2273
[2024-06-10 13:06:03,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.45 | bwd_microstep: 1009.56 | bwd_inner_microstep: 1009.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 13:06:05,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.45 | bwd_microstep: 1646.37 | bwd_inner_microstep: 1646.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3731
[2024-06-10 13:06:07,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.06 | bwd_microstep: 1731.25 | bwd_inner_microstep: 1731.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3827
[2024-06-10 13:06:10,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.19 | bwd_microstep: 1822.68 | bwd_inner_microstep: 1822.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2053
[2024-06-10 13:06:11,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.23 | bwd_microstep: 815.21 | bwd_inner_microstep: 815.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 13:06:16,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 13:06:16,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.79 | bwd_microstep: 4396.46 | bwd_inner_microstep: 1694.81 | bwd_allreduce_microstep: 2701.59 | step_microstep: 38.29
[2024-06-10 13:06:16,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15730.63 | bwd: 44931.18 | bwd_inner: 42228.68 | bwd_allreduce: 2701.83 | step: 39.82
{'loss': 1.2607, 'learning_rate': 2.6201754116608222e-05, 'epoch': 0.42}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3545
[2024-06-10 13:06:18,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1585.68 | bwd_inner_microstep: 1585.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 13:06:20,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.92 | bwd_microstep: 1185.49 | bwd_inner_microstep: 1185.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 13:06:22,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.73 | bwd_microstep: 1279.52 | bwd_inner_microstep: 1279.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 13:06:23,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.31 | bwd_microstep: 1339.32 | bwd_inner_microstep: 1339.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 13:06:25,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.16 | bwd_microstep: 1484.26 | bwd_inner_microstep: 1484.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301
[2024-06-10 13:06:27,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.25 | bwd_microstep: 975.04 | bwd_inner_microstep: 975.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 13:06:28,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.77 | bwd_microstep: 676.81 | bwd_inner_microstep: 676.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2255
[2024-06-10 13:06:29,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.99 | bwd_microstep: 965.57 | bwd_inner_microstep: 965.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705
[2024-06-10 13:06:31,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.23 | bwd_microstep: 1425.08 | bwd_inner_microstep: 1425.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 13:06:33,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1348.24 | bwd_inner_microstep: 1348.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000
[2024-06-10 13:06:34,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.00 | bwd_microstep: 802.03 | bwd_inner_microstep: 802.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-10 13:06:36,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.09 | bwd_microstep: 1644.93 | bwd_inner_microstep: 1644.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 13:06:38,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1348.28 | bwd_inner_microstep: 1348.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-10 13:06:39,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.42 | bwd_microstep: 799.01 | bwd_inner_microstep: 798.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2199
[2024-06-10 13:06:41,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.76 | bwd_microstep: 954.93 | bwd_inner_microstep: 954.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2117
[2024-06-10 13:06:42,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.00 | bwd_microstep: 778.63 | bwd_inner_microstep: 778.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 13:06:44,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.20 | bwd_microstep: 1390.90 | bwd_inner_microstep: 1390.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 13:06:46,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.55 | bwd_microstep: 1486.57 | bwd_inner_microstep: 1486.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2914
[2024-06-10 13:06:47,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.11 | bwd_microstep: 1156.93 | bwd_inner_microstep: 1156.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2310
[2024-06-10 13:06:48,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.62 | bwd_microstep: 948.02 | bwd_inner_microstep: 947.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3454
[2024-06-10 13:06:50,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.90 | bwd_microstep: 1190.68 | bwd_inner_microstep: 1190.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 13:06:52,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.77 | bwd_microstep: 1400.21 | bwd_inner_microstep: 1400.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 13:06:54,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1510.57 | bwd_inner_microstep: 1510.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 13:06:56,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3610
[2024-06-10 13:06:58,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.73 | bwd_microstep: 1472.58 | bwd_inner_microstep: 1472.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3552
[2024-06-10 13:07:00,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.65 | bwd_microstep: 1330.97 | bwd_inner_microstep: 1330.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 13:07:02,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.23 | bwd_microstep: 1254.73 | bwd_inner_microstep: 1254.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 13:07:04,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.74 | bwd_microstep: 1511.78 | bwd_inner_microstep: 1511.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2625
[2024-06-10 13:07:05,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.05 | bwd_microstep: 1210.19 | bwd_inner_microstep: 1210.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2277
[2024-06-10 13:07:07,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.44 | bwd_microstep: 1005.90 | bwd_inner_microstep: 1005.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 13:07:09,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1396.09 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 13:07:18,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 13:07:18,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.05 | bwd_microstep: 8991.71 | bwd_inner_microstep: 1873.50 | bwd_allreduce_microstep: 7118.15 | step_microstep: 39.27
[2024-06-10 13:07:18,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14950.16 | bwd: 47245.83 | bwd_inner: 40126.77 | bwd_allreduce: 7118.38 | step: 40.72
{'loss': 1.2905, 'learning_rate': 2.6166059379657197e-05, 'epoch': 0.42}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3464
[2024-06-10 13:07:21,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.54 | bwd_microstep: 1564.64 | bwd_inner_microstep: 1564.47 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 13:07:23,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.44 | bwd_microstep: 1479.20 | bwd_inner_microstep: 1479.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-10 13:07:25,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.55 | bwd_microstep: 1556.54 | bwd_inner_microstep: 1556.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 13:07:27,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1243.66 | bwd_inner_microstep: 1243.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 13:07:28,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1375.25 | bwd_inner_microstep: 1375.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 13:07:30,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1243.50 | bwd_inner_microstep: 1243.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 13:07:32,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.92 | bwd_microstep: 1547.09 | bwd_inner_microstep: 1547.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1955
[2024-06-10 13:07:33,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.32 | bwd_microstep: 731.66 | bwd_inner_microstep: 731.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042
[2024-06-10 13:07:34,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.07 | bwd_microstep: 806.75 | bwd_inner_microstep: 806.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 13:07:36,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.60 | bwd_microstep: 1346.44 | bwd_inner_microstep: 1346.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 13:07:38,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.14 | bwd_microstep: 1405.58 | bwd_inner_microstep: 1405.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 13:07:40,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.01 | bwd_microstep: 1313.63 | bwd_inner_microstep: 1313.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3444
[2024-06-10 13:07:42,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.20 | bwd_microstep: 1312.41 | bwd_inner_microstep: 1312.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 13:07:44,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1486.51 | bwd_inner_microstep: 1486.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3696
[2024-06-10 13:07:46,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.50 | bwd_microstep: 1620.04 | bwd_inner_microstep: 1620.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1956
[2024-06-10 13:07:47,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.51 | bwd_microstep: 888.24 | bwd_inner_microstep: 888.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3661
[2024-06-10 13:07:50,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.85 | bwd_microstep: 1584.54 | bwd_inner_microstep: 1584.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3934
[2024-06-10 13:07:51,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.20 | bwd_microstep: 1301.91 | bwd_inner_microstep: 1301.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 13:07:53,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.12 | bwd_microstep: 1518.79 | bwd_inner_microstep: 1518.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 13:07:55,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.30 | bwd_microstep: 1430.93 | bwd_inner_microstep: 1430.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2058
[2024-06-10 13:07:57,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.66 | bwd_microstep: 911.50 | bwd_inner_microstep: 911.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011
[2024-06-10 13:07:58,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.22 | bwd_microstep: 802.97 | bwd_inner_microstep: 802.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 13:08:00,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.57 | bwd_microstep: 1360.08 | bwd_inner_microstep: 1360.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1940
[2024-06-10 13:08:01,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.15 | bwd_microstep: 760.30 | bwd_inner_microstep: 760.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 13:08:03,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1512.47 | bwd_inner_microstep: 1512.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 13:08:05,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.15 | bwd_microstep: 1440.32 | bwd_inner_microstep: 1440.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3601
[2024-06-10 13:08:07,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.12 | bwd_microstep: 1437.02 | bwd_inner_microstep: 1436.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 13:08:09,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.84 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 13:08:11,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1592.95 | bwd_inner_microstep: 1592.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3573
[2024-06-10 13:08:13,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.15 | bwd_microstep: 1422.05 | bwd_inner_microstep: 1422.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 13:08:15,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.00 | bwd_microstep: 1514.74 | bwd_inner_microstep: 1514.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 13:08:19,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 13:08:19,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.82 | bwd_microstep: 3032.51 | bwd_inner_microstep: 1679.10 | bwd_allreduce_microstep: 1353.36 | step_microstep: 37.88
[2024-06-10 13:08:19,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15897.04 | bwd: 43993.59 | bwd_inner: 42639.21 | bwd_allreduce: 1353.64 | step: 39.43
{'loss': 1.2666, 'learning_rate': 2.6130342925877096e-05, 'epoch': 0.42}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 13:08:20,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.99 | bwd_microstep: 1239.36 | bwd_inner_microstep: 1239.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3984
[2024-06-10 13:08:22,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.78 | bwd_microstep: 1406.73 | bwd_inner_microstep: 1406.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 13:08:24,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.68 | bwd_microstep: 1477.91 | bwd_inner_microstep: 1477.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428
[2024-06-10 13:08:26,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.87 | bwd_microstep: 1284.32 | bwd_inner_microstep: 1284.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 13:08:27,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.38 | bwd_microstep: 796.38 | bwd_inner_microstep: 796.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1874
[2024-06-10 13:08:28,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.30 | bwd_microstep: 676.68 | bwd_inner_microstep: 676.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-10 13:08:30,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.18 | bwd_microstep: 1185.89 | bwd_inner_microstep: 1185.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 13:08:32,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.01 | bwd_microstep: 1277.95 | bwd_inner_microstep: 1277.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 13:08:34,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.67 | bwd_microstep: 1387.13 | bwd_inner_microstep: 1387.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 13:08:35,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.71 | bwd_microstep: 1339.06 | bwd_inner_microstep: 1339.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3664
[2024-06-10 13:08:38,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.95 | bwd_microstep: 1820.13 | bwd_inner_microstep: 1820.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 13:08:40,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.46 | bwd_microstep: 1482.14 | bwd_inner_microstep: 1482.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3028
[2024-06-10 13:08:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.24 | bwd_microstep: 1264.32 | bwd_inner_microstep: 1264.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 13:08:44,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1482.00 | bwd_inner_microstep: 1481.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 13:08:46,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.37 | bwd_microstep: 1515.13 | bwd_inner_microstep: 1515.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3946
[2024-06-10 13:08:48,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1401.67 | bwd_inner_microstep: 1401.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 13:08:50,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1396.94 | bwd_inner_microstep: 1396.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3658
[2024-06-10 13:08:52,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.19 | bwd_microstep: 1427.52 | bwd_inner_microstep: 1427.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498
[2024-06-10 13:08:54,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1445.82 | bwd_inner_microstep: 1445.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3873
[2024-06-10 13:08:56,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.50 | bwd_microstep: 1686.96 | bwd_inner_microstep: 1686.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 13:08:58,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.15 | bwd_microstep: 1351.38 | bwd_inner_microstep: 1351.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3577
[2024-06-10 13:09:00,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.56 | bwd_microstep: 1331.34 | bwd_inner_microstep: 1331.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851
[2024-06-10 13:09:02,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.19 | bwd_microstep: 1659.45 | bwd_inner_microstep: 1659.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 13:09:04,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.99 | bwd_microstep: 1435.74 | bwd_inner_microstep: 1435.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937
[2024-06-10 13:09:05,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.98 | bwd_microstep: 698.19 | bwd_inner_microstep: 698.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2081
[2024-06-10 13:09:06,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.72 | bwd_microstep: 863.53 | bwd_inner_microstep: 863.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 13:09:08,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.96 | bwd_microstep: 1305.96 | bwd_inner_microstep: 1305.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 13:09:09,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.32 | bwd_microstep: 971.99 | bwd_inner_microstep: 971.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 13:09:12,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.73 | bwd_microstep: 1649.17 | bwd_inner_microstep: 1649.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3584
[2024-06-10 13:09:13,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.86 | bwd_microstep: 1330.21 | bwd_inner_microstep: 1330.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 13:09:16,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1639.28 | bwd_inner_microstep: 1639.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 13:09:19,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-10 13:09:19,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 2788.36 | bwd_inner_microstep: 1539.94 | bwd_allreduce_microstep: 1248.37 | step_microstep: 37.70
[2024-06-10 13:09:19,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15950.50 | bwd: 44018.66 | bwd_inner: 42769.39 | bwd_allreduce: 1248.60 | step: 39.13
{'loss': 1.219, 'learning_rate': 2.6094604881061076e-05, 'epoch': 0.42}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3398
[2024-06-10 13:09:21,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1295.34 | bwd_inner_microstep: 1295.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4589
[2024-06-10 13:09:23,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 687.22 | bwd_microstep: 1854.31 | bwd_inner_microstep: 1854.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 13:09:25,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1477.76 | bwd_inner_microstep: 1477.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3864
[2024-06-10 13:09:27,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.48 | bwd_microstep: 1425.69 | bwd_inner_microstep: 1425.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 13:09:29,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.14 | bwd_microstep: 1547.47 | bwd_inner_microstep: 1547.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 13:09:31,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.16 | bwd_microstep: 788.72 | bwd_inner_microstep: 788.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 13:09:32,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.82 | bwd_microstep: 1278.52 | bwd_inner_microstep: 1278.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-10 13:09:34,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.03 | bwd_microstep: 1278.31 | bwd_inner_microstep: 1278.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 13:09:36,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.37 | bwd_microstep: 1249.79 | bwd_inner_microstep: 1249.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872
[2024-06-10 13:09:37,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.78 | bwd_microstep: 678.42 | bwd_inner_microstep: 678.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3992
[2024-06-10 13:09:39,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.32 | bwd_microstep: 1606.43 | bwd_inner_microstep: 1606.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3671
[2024-06-10 13:09:41,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.30 | bwd_microstep: 1620.27 | bwd_inner_microstep: 1620.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 13:09:42,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.36 | bwd_microstep: 788.90 | bwd_inner_microstep: 788.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958
[2024-06-10 13:09:44,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.97 | bwd_microstep: 889.86 | bwd_inner_microstep: 889.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 13:09:45,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1345.70 | bwd_inner_microstep: 1345.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496
[2024-06-10 13:09:48,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1578.35 | bwd_inner_microstep: 1578.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 13:09:49,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 788.23 | bwd_inner_microstep: 788.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3516
[2024-06-10 13:09:51,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.58 | bwd_microstep: 1683.74 | bwd_inner_microstep: 1683.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 13:09:53,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1489.16 | bwd_inner_microstep: 1489.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 13:09:54,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.69 | bwd_microstep: 794.37 | bwd_inner_microstep: 794.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 13:09:56,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.93 | bwd_microstep: 1485.76 | bwd_inner_microstep: 1485.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-10 13:09:58,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.91 | bwd_microstep: 1422.44 | bwd_inner_microstep: 1422.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 13:10:00,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.85 | bwd_microstep: 1289.76 | bwd_inner_microstep: 1289.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2058
[2024-06-10 13:10:01,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.69 | bwd_microstep: 844.48 | bwd_inner_microstep: 844.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 13:10:03,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.63 | bwd_microstep: 1406.99 | bwd_inner_microstep: 1406.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2440
[2024-06-10 13:10:05,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 409.69 | bwd_microstep: 1091.73 | bwd_inner_microstep: 1091.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 13:10:06,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1350.85 | bwd_inner_microstep: 1350.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-10 13:10:08,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.05 | bwd_microstep: 1435.99 | bwd_inner_microstep: 1435.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692
[2024-06-10 13:10:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1328.78 | bwd_inner_microstep: 1328.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680
[2024-06-10 13:10:12,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.19 | bwd_microstep: 1617.29 | bwd_inner_microstep: 1617.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451
[2024-06-10 13:10:14,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.04 | bwd_microstep: 1451.60 | bwd_inner_microstep: 1451.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060
[2024-06-10 13:10:21,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 13:10:21,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.67 | bwd_microstep: 6489.63 | bwd_inner_microstep: 935.18 | bwd_allreduce_microstep: 5554.40 | step_microstep: 38.02
[2024-06-10 13:10:21,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15335.28 | bwd: 46674.66 | bwd_inner: 41119.36 | bwd_allreduce: 5554.62 | step: 39.54
{'loss': 1.2533, 'learning_rate': 2.6058845371078353e-05, 'epoch': 0.42}


 42%|████▏     | 720/1726 [12:27:52<17:19:09, 61.98s/it]
 42%|████▏     | 721/1726 [12:28:53<17:13:11, 61.68s/it]
 42%|████▏     | 722/1726 [12:29:55<17:16:23, 61.94s/it]
 42%|████▏     | 723/1726 [12:30:55<17:06:48, 61.42s/it]
 42%|████▏     | 724/1726 [12:31:56<17:00:09, 61.09s/it]
 42%|████▏     | 725/1726 [12:32:58<17:05:23, 61.46s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 13:10:23,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.77 | bwd_microstep: 1268.00 | bwd_inner_microstep: 1267.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3466
[2024-06-10 13:10:25,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.45 | bwd_microstep: 1502.24 | bwd_inner_microstep: 1502.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 13:10:27,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.43 | bwd_microstep: 1389.60 | bwd_inner_microstep: 1389.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-10 13:10:29,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.62 | bwd_microstep: 1645.47 | bwd_inner_microstep: 1645.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-10 13:10:31,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1535.25 | bwd_inner_microstep: 1535.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4065
[2024-06-10 13:10:33,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1490.65 | bwd_inner_microstep: 1490.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 13:10:35,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.37 | bwd_microstep: 1281.40 | bwd_inner_microstep: 1281.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-10 13:10:38,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.98 | bwd_microstep: 1629.32 | bwd_inner_microstep: 1629.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971
[2024-06-10 13:10:39,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.94 | bwd_microstep: 890.37 | bwd_inner_microstep: 890.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 13:10:41,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.03 | bwd_microstep: 1622.93 | bwd_inner_microstep: 1622.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 13:10:43,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1347.19 | bwd_inner_microstep: 1347.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 13:10:45,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.62 | bwd_microstep: 1440.02 | bwd_inner_microstep: 1439.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678
[2024-06-10 13:10:47,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.92 | bwd_microstep: 1616.23 | bwd_inner_microstep: 1616.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2171
[2024-06-10 13:10:48,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.69 | bwd_microstep: 1048.60 | bwd_inner_microstep: 1048.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1975
[2024-06-10 13:10:50,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.42 | bwd_microstep: 853.94 | bwd_inner_microstep: 853.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-10 13:10:51,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.73 | bwd_microstep: 1311.65 | bwd_inner_microstep: 1311.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 13:10:53,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.68 | bwd_microstep: 1293.78 | bwd_inner_microstep: 1293.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3552
[2024-06-10 13:10:55,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 1425.64 | bwd_inner_microstep: 1425.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966
[2024-06-10 13:10:56,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.25 | bwd_microstep: 702.42 | bwd_inner_microstep: 702.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3825
[2024-06-10 13:10:58,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.85 | bwd_microstep: 1423.59 | bwd_inner_microstep: 1423.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 13:11:00,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.82 | bwd_microstep: 1280.92 | bwd_inner_microstep: 1280.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3819
[2024-06-10 13:11:02,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.49 | bwd_microstep: 1415.93 | bwd_inner_microstep: 1415.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865
[2024-06-10 13:11:04,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.38 | bwd_microstep: 1568.98 | bwd_inner_microstep: 1568.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 13:11:06,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.65 | bwd_microstep: 1355.94 | bwd_inner_microstep: 1355.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 13:11:08,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.16 | bwd_microstep: 1283.41 | bwd_inner_microstep: 1283.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3475
[2024-06-10 13:11:10,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.96 | bwd_microstep: 1328.22 | bwd_inner_microstep: 1328.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2021
[2024-06-10 13:11:11,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.96 | bwd_microstep: 744.03 | bwd_inner_microstep: 744.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2271
[2024-06-10 13:11:12,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.33 | bwd_microstep: 908.09 | bwd_inner_microstep: 908.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 13:11:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.49 | bwd_microstep: 2351.66 | bwd_inner_microstep: 2351.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3596
[2024-06-10 13:11:17,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.34 | bwd_microstep: 1637.83 | bwd_inner_microstep: 1637.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3431
[2024-06-10 13:11:19,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.93 | bwd_microstep: 1535.81 | bwd_inner_microstep: 1535.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 13:11:22,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 13:11:22,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.60 | bwd_microstep: 2539.50 | bwd_inner_microstep: 1642.15 | bwd_allreduce_microstep: 897.31 | step_microstep: 37.53
[2024-06-10 13:11:22,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15985.88 | bwd: 44668.63 | bwd_inner: 43770.42 | bwd_allreduce: 897.53 | step: 39.08
{'loss': 1.2258, 'learning_rate': 2.6023064521873726e-05, 'epoch': 0.42}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 13:11:24,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.13 | bwd_microstep: 1520.77 | bwd_inner_microstep: 1520.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 13:11:26,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.35 | bwd_microstep: 1240.35 | bwd_inner_microstep: 1240.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3398
[2024-06-10 13:11:28,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.94 | bwd_microstep: 1208.19 | bwd_inner_microstep: 1208.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4260
[2024-06-10 13:11:30,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.46 | bwd_microstep: 1570.35 | bwd_inner_microstep: 1570.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 13:11:32,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1384.04 | bwd_inner_microstep: 1384.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3421
[2024-06-10 13:11:34,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.76 | bwd_microstep: 1298.17 | bwd_inner_microstep: 1298.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2922
[2024-06-10 13:11:35,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.84 | bwd_microstep: 1091.91 | bwd_inner_microstep: 1091.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 13:11:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.40 | bwd_microstep: 1524.79 | bwd_inner_microstep: 1524.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 13:11:39,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.72 | bwd_microstep: 1534.79 | bwd_inner_microstep: 1534.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3743
[2024-06-10 13:11:42,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.43 | bwd_microstep: 1594.57 | bwd_inner_microstep: 1594.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 13:11:43,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.25 | bwd_microstep: 1253.73 | bwd_inner_microstep: 1253.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 13:11:44,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.46 | bwd_microstep: 796.49 | bwd_inner_microstep: 796.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3508
[2024-06-10 13:11:47,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.64 | bwd_microstep: 1680.59 | bwd_inner_microstep: 1680.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1881
[2024-06-10 13:11:48,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.79 | bwd_microstep: 805.12 | bwd_inner_microstep: 805.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 13:11:50,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1445.66 | bwd_inner_microstep: 1445.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3488
[2024-06-10 13:11:52,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.27 | bwd_microstep: 1219.55 | bwd_inner_microstep: 1219.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085
[2024-06-10 13:11:53,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.17 | bwd_microstep: 852.62 | bwd_inner_microstep: 852.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3627
[2024-06-10 13:11:55,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.03 | bwd_microstep: 1435.06 | bwd_inner_microstep: 1435.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 13:11:57,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.45 | bwd_microstep: 1306.71 | bwd_inner_microstep: 1306.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 13:11:59,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.76 | bwd_microstep: 1555.39 | bwd_inner_microstep: 1555.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 13:12:01,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1392.91 | bwd_inner_microstep: 1392.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829
[2024-06-10 13:12:03,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.12 | bwd_microstep: 1491.28 | bwd_inner_microstep: 1491.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 13:12:04,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.71 | bwd_microstep: 1180.60 | bwd_inner_microstep: 1180.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 13:12:06,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1397.59 | bwd_inner_microstep: 1397.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 13:12:08,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1401.55 | bwd_inner_microstep: 1401.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 13:12:10,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.53 | bwd_microstep: 1552.50 | bwd_inner_microstep: 1552.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 13:12:12,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.12 | bwd_microstep: 1280.41 | bwd_inner_microstep: 1280.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 13:12:14,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1479.37 | bwd_inner_microstep: 1479.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2280
[2024-06-10 13:12:16,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.24 | bwd_microstep: 1069.59 | bwd_inner_microstep: 1069.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559
[2024-06-10 13:12:18,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.51 | bwd_microstep: 1421.55 | bwd_inner_microstep: 1421.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3574
[2024-06-10 13:12:20,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.14 | bwd_microstep: 1446.34 | bwd_inner_microstep: 1446.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2225
[2024-06-10 13:12:23,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 13:12:23,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.69 | bwd_microstep: 3196.79 | bwd_inner_microstep: 1053.30 | bwd_allreduce_microstep: 2143.43 | step_microstep: 38.47
[2024-06-10 13:12:23,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15891.09 | bwd: 44629.34 | bwd_inner: 42485.00 | bwd_allreduce: 2143.66 | step: 39.98
{'loss': 1.2795, 'learning_rate': 2.5987262459467168e-05, 'epoch': 0.42}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 13:12:25,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.21 | bwd_microstep: 1329.08 | bwd_inner_microstep: 1329.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4712
[2024-06-10 13:12:27,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.06 | bwd_microstep: 1782.25 | bwd_inner_microstep: 1782.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 13:12:29,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1377.02 | bwd_inner_microstep: 1377.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 13:12:31,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.41 | bwd_microstep: 1345.21 | bwd_inner_microstep: 1345.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-10 13:12:33,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.67 | bwd_microstep: 1645.89 | bwd_inner_microstep: 1645.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 13:12:35,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1345.81 | bwd_inner_microstep: 1345.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616
[2024-06-10 13:12:37,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.21 | bwd_microstep: 1313.31 | bwd_inner_microstep: 1313.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897
[2024-06-10 13:12:38,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.46 | bwd_microstep: 681.38 | bwd_inner_microstep: 681.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 13:12:40,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1254.59 | bwd_inner_microstep: 1254.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 13:12:42,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.93 | bwd_microstep: 1280.96 | bwd_inner_microstep: 1280.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 13:12:43,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.72 | bwd_microstep: 1286.20 | bwd_inner_microstep: 1286.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3671
[2024-06-10 13:12:46,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.24 | bwd_microstep: 1548.09 | bwd_inner_microstep: 1548.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1929
[2024-06-10 13:12:47,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.96 | bwd_microstep: 770.58 | bwd_inner_microstep: 770.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3474
[2024-06-10 13:12:49,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.56 | bwd_microstep: 1426.57 | bwd_inner_microstep: 1426.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126
[2024-06-10 13:12:50,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.18 | bwd_microstep: 921.68 | bwd_inner_microstep: 921.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-10 13:12:52,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.14 | bwd_microstep: 1610.11 | bwd_inner_microstep: 1610.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3459
[2024-06-10 13:12:54,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.27 | bwd_microstep: 1336.33 | bwd_inner_microstep: 1336.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 13:12:56,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.62 | bwd_microstep: 1413.57 | bwd_inner_microstep: 1413.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2023
[2024-06-10 13:12:57,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.34 | bwd_microstep: 867.37 | bwd_inner_microstep: 867.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3596
[2024-06-10 13:12:59,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.27 | bwd_microstep: 1208.99 | bwd_inner_microstep: 1208.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3623
[2024-06-10 13:13:01,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.43 | bwd_microstep: 1654.68 | bwd_inner_microstep: 1654.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3541
[2024-06-10 13:13:03,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.55 | bwd_microstep: 1591.50 | bwd_inner_microstep: 1591.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 13:13:05,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.71 | bwd_microstep: 1389.14 | bwd_inner_microstep: 1389.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447
[2024-06-10 13:13:07,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.66 | bwd_microstep: 1285.28 | bwd_inner_microstep: 1285.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275
[2024-06-10 13:13:08,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.16 | bwd_microstep: 910.41 | bwd_inner_microstep: 910.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3554
[2024-06-10 13:13:10,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.75 | bwd_microstep: 1325.13 | bwd_inner_microstep: 1325.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279
[2024-06-10 13:13:11,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.96 | bwd_microstep: 906.95 | bwd_inner_microstep: 906.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-10 13:13:13,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.39 | bwd_microstep: 1418.75 | bwd_inner_microstep: 1418.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3453
[2024-06-10 13:13:15,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.83 | bwd_microstep: 1221.63 | bwd_inner_microstep: 1221.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 13:13:17,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.30 | bwd_microstep: 1279.49 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 13:13:19,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1381.58 | bwd_inner_microstep: 1381.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-10 13:13:24,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-10 13:13:24,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.04 | bwd_microstep: 5291.27 | bwd_inner_microstep: 1455.47 | bwd_allreduce_microstep: 3835.75 | step_microstep: 38.14
[2024-06-10 13:13:24,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15554.67 | bwd: 45400.83 | bwd_inner: 41564.14 | bwd_allreduce: 3835.98 | step: 39.64
{'loss': 1.2508, 'learning_rate': 2.5951439309953347e-05, 'epoch': 0.42}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1841
[2024-06-10 13:13:25,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.11 | bwd_microstep: 695.92 | bwd_inner_microstep: 695.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 13:13:27,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.28 | bwd_microstep: 1272.52 | bwd_inner_microstep: 1272.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921
[2024-06-10 13:13:28,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.27 | bwd_microstep: 785.66 | bwd_inner_microstep: 785.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 13:13:30,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1245.76 | bwd_inner_microstep: 1245.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 13:13:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1381.85 | bwd_inner_microstep: 1381.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3588
[2024-06-10 13:13:34,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.53 | bwd_microstep: 1372.00 | bwd_inner_microstep: 1371.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 13:13:36,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.17 | bwd_microstep: 1340.20 | bwd_inner_microstep: 1340.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 13:13:38,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.71 | bwd_microstep: 1531.09 | bwd_inner_microstep: 1531.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 13:13:39,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.94 | bwd_microstep: 1180.72 | bwd_inner_microstep: 1180.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 13:13:41,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.50 | bwd_microstep: 1284.81 | bwd_inner_microstep: 1284.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 13:13:43,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.16 | bwd_microstep: 1285.07 | bwd_inner_microstep: 1285.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 13:13:44,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 795.50 | bwd_inner_microstep: 795.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3674
[2024-06-10 13:13:46,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.72 | bwd_microstep: 1324.94 | bwd_inner_microstep: 1324.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3669
[2024-06-10 13:13:48,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.91 | bwd_microstep: 1682.82 | bwd_inner_microstep: 1682.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498
[2024-06-10 13:13:50,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.16 | bwd_microstep: 1580.18 | bwd_inner_microstep: 1580.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 13:13:52,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.47 | bwd_microstep: 1383.41 | bwd_inner_microstep: 1383.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 13:13:54,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1491.95 | bwd_inner_microstep: 1491.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 13:13:56,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.69 | bwd_microstep: 1454.34 | bwd_inner_microstep: 1454.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2176
[2024-06-10 13:13:57,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.92 | bwd_microstep: 791.90 | bwd_inner_microstep: 791.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 13:13:59,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.08 | bwd_microstep: 1285.11 | bwd_inner_microstep: 1285.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 13:14:01,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.19 | bwd_microstep: 1427.75 | bwd_inner_microstep: 1427.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 13:14:03,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.17 | bwd_microstep: 1311.69 | bwd_inner_microstep: 1311.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3624
[2024-06-10 13:14:05,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.69 | bwd_microstep: 1577.36 | bwd_inner_microstep: 1577.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 13:14:07,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1390.90 | bwd_inner_microstep: 1390.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 13:14:09,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1555.13 | bwd_inner_microstep: 1555.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2068
[2024-06-10 13:14:10,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.67 | bwd_microstep: 754.50 | bwd_inner_microstep: 754.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 13:14:12,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.07 | bwd_microstep: 1529.11 | bwd_inner_microstep: 1529.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3742
[2024-06-10 13:14:14,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.85 | bwd_microstep: 1339.66 | bwd_inner_microstep: 1339.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597
[2024-06-10 13:14:16,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.90 | bwd_microstep: 1438.07 | bwd_inner_microstep: 1438.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3823
[2024-06-10 13:14:18,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1394.83 | bwd_inner_microstep: 1394.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3560
[2024-06-10 13:14:21,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.17 | bwd_microstep: 1693.97 | bwd_inner_microstep: 1693.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746
[2024-06-10 13:14:23,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 13:14:23,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.90 | bwd_microstep: 1490.80 | bwd_inner_microstep: 1471.73 | bwd_allreduce_microstep: 19.02 | step_microstep: 37.57
[2024-06-10 13:14:23,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15763.39 | bwd: 42069.57 | bwd_inner: 42049.61 | bwd_allreduce: 19.26 | step: 39.11
{'loss': 1.2333, 'learning_rate': 2.5915595199501212e-05, 'epoch': 0.42}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 13:14:25,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.84 | bwd_microstep: 1380.45 | bwd_inner_microstep: 1380.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 13:14:26,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 1381.32 | bwd_inner_microstep: 1381.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 13:14:28,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1291.20 | bwd_inner_microstep: 1291.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 13:14:30,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.30 | bwd_microstep: 1340.80 | bwd_inner_microstep: 1340.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763
[2024-06-10 13:14:32,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.19 | bwd_microstep: 1639.90 | bwd_inner_microstep: 1639.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3611
[2024-06-10 13:14:34,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1342.90 | bwd_inner_microstep: 1342.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 13:14:36,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1342.48 | bwd_inner_microstep: 1342.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3871
[2024-06-10 13:14:38,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.41 | bwd_microstep: 1467.13 | bwd_inner_microstep: 1467.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 13:14:40,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1388.13 | bwd_inner_microstep: 1388.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 13:14:42,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.88 | bwd_microstep: 1282.70 | bwd_inner_microstep: 1282.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1897
[2024-06-10 13:14:43,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.59 | bwd_microstep: 713.86 | bwd_inner_microstep: 713.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 13:14:45,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1483.24 | bwd_inner_microstep: 1483.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 13:14:47,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1389.55 | bwd_inner_microstep: 1389.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2000
[2024-06-10 13:14:48,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.73 | bwd_microstep: 832.75 | bwd_inner_microstep: 832.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507
[2024-06-10 13:14:50,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1319.08 | bwd_inner_microstep: 1319.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655
[2024-06-10 13:14:52,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.77 | bwd_microstep: 1621.83 | bwd_inner_microstep: 1621.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3645
[2024-06-10 13:14:54,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.49 | bwd_microstep: 1512.51 | bwd_inner_microstep: 1512.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3537
[2024-06-10 13:14:56,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.05 | bwd_microstep: 1583.41 | bwd_inner_microstep: 1583.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 13:14:58,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.51 | bwd_microstep: 1487.35 | bwd_inner_microstep: 1487.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505
[2024-06-10 13:15:00,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.60 | bwd_microstep: 1583.31 | bwd_inner_microstep: 1583.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-10 13:15:03,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.90 | bwd_microstep: 1602.88 | bwd_inner_microstep: 1602.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3825
[2024-06-10 13:15:05,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.56 | bwd_microstep: 1418.42 | bwd_inner_microstep: 1418.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 13:15:07,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.55 | bwd_microstep: 1624.83 | bwd_inner_microstep: 1624.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3623
[2024-06-10 13:15:09,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1245.75 | bwd_inner_microstep: 1245.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 13:15:10,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1397.54 | bwd_inner_microstep: 1397.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 13:15:12,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1281.80 | bwd_inner_microstep: 1281.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 13:15:14,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.49 | bwd_microstep: 1252.96 | bwd_inner_microstep: 1252.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 13:15:16,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.87 | bwd_microstep: 1347.99 | bwd_inner_microstep: 1347.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3459
[2024-06-10 13:15:18,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.91 | bwd_microstep: 1430.67 | bwd_inner_microstep: 1430.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 13:15:20,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.05 | bwd_microstep: 1377.49 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 13:15:22,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1402.15 | bwd_inner_microstep: 1402.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-10 13:15:24,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.16 | optimizer_step: 6.64
[2024-06-10 13:15:24,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.21 | bwd_microstep: 1357.18 | bwd_inner_microstep: 1349.31 | bwd_allreduce_microstep: 7.82 | step_microstep: 37.78
[2024-06-10 13:15:24,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16487.83 | bwd: 44123.59 | bwd_inner: 44114.88 | bwd_allreduce: 8.04 | step: 39.26
{'loss': 1.2666, 'learning_rate': 2.5879730254353543e-05, 'epoch': 0.42}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 13:15:25,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1392.62 | bwd_inner_microstep: 1392.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 13:15:27,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1353.25 | bwd_inner_microstep: 1353.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 13:15:29,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.42 | bwd_microstep: 1477.50 | bwd_inner_microstep: 1477.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2236
[2024-06-10 13:15:31,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.54 | bwd_microstep: 958.90 | bwd_inner_microstep: 958.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 13:15:33,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.36 | bwd_microstep: 1340.77 | bwd_inner_microstep: 1340.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 13:15:35,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.92 | bwd_microstep: 1451.75 | bwd_inner_microstep: 1451.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 13:15:36,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.51 | bwd_microstep: 1297.82 | bwd_inner_microstep: 1297.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 13:15:38,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1250.53 | bwd_inner_microstep: 1250.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 13:15:40,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1384.01 | bwd_inner_microstep: 1383.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 13:15:42,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1285.34 | bwd_inner_microstep: 1285.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2179
[2024-06-10 13:15:43,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.48 | bwd_microstep: 856.78 | bwd_inner_microstep: 856.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522
[2024-06-10 13:15:45,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.45 | bwd_microstep: 1195.76 | bwd_inner_microstep: 1195.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 13:15:47,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1491.80 | bwd_inner_microstep: 1491.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 13:15:49,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1380.22 | bwd_inner_microstep: 1380.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3419
[2024-06-10 13:15:51,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1408.02 | bwd_inner_microstep: 1408.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3415
[2024-06-10 13:15:52,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.51 | bwd_microstep: 1393.18 | bwd_inner_microstep: 1393.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518
[2024-06-10 13:15:55,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.40 | bwd_microstep: 1583.23 | bwd_inner_microstep: 1583.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 13:15:57,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.70 | bwd_microstep: 1392.36 | bwd_inner_microstep: 1392.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3671
[2024-06-10 13:15:58,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.21 | bwd_microstep: 1261.52 | bwd_inner_microstep: 1261.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623
[2024-06-10 13:16:01,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.26 | bwd_microstep: 1614.46 | bwd_inner_microstep: 1614.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3664
[2024-06-10 13:16:02,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.62 | bwd_microstep: 1323.03 | bwd_inner_microstep: 1323.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 13:16:04,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.01 | bwd_microstep: 1508.38 | bwd_inner_microstep: 1508.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 13:16:06,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.98 | bwd_microstep: 1397.61 | bwd_inner_microstep: 1397.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-10 13:16:07,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.04 | bwd_microstep: 696.64 | bwd_inner_microstep: 696.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 13:16:09,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.29 | bwd_microstep: 1484.29 | bwd_inner_microstep: 1484.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 13:16:11,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1394.27 | bwd_inner_microstep: 1394.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3770
[2024-06-10 13:16:14,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.64 | bwd_microstep: 1746.29 | bwd_inner_microstep: 1746.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3771
[2024-06-10 13:16:16,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 667.77 | bwd_microstep: 1840.22 | bwd_inner_microstep: 1840.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2038
[2024-06-10 13:16:17,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.32 | bwd_microstep: 905.24 | bwd_inner_microstep: 905.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-10 13:16:19,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.11 | bwd_microstep: 973.13 | bwd_inner_microstep: 973.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2012
[2024-06-10 13:16:20,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.09 | bwd_microstep: 709.10 | bwd_inner_microstep: 709.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 13:16:27,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 13:16:27,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.06 | bwd_microstep: 6059.17 | bwd_inner_microstep: 2017.82 | bwd_allreduce_microstep: 4041.28 | step_microstep: 38.46
[2024-06-10 13:16:27,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15835.08 | bwd: 46807.21 | bwd_inner: 42765.01 | bwd_allreduce: 4041.52 | step: 39.87
 42%|████▏     | 726/1726 [12:33:59<17:02:01, 61.32s/it]
 42%|████▏     | 727/1726 [12:35:00<16:58:41, 61.18s/it]
 42%|████▏     | 728/1726 [12:36:01<16:58:10, 61.21s/it]
 42%|████▏     | 729/1726 [12:36:59<16:41:56, 60.30s/it]
 42%|████▏     | 730/1726 [12:38:00<16:44:08, 60.49s/it]
 42%|████▏     | 731/1726 [12:39:03<16:55:29, 61.
{'loss': 1.3183, 'learning_rate': 2.584384460082648e-05, 'epoch': 0.42}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 13:16:28,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.07 | bwd_microstep: 781.78 | bwd_inner_microstep: 781.65 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2882
[2024-06-10 13:16:29,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.86 | bwd_microstep: 1084.40 | bwd_inner_microstep: 1084.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2381
[2024-06-10 13:16:30,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.21 | bwd_microstep: 996.08 | bwd_inner_microstep: 996.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 13:16:32,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.02 | bwd_microstep: 1275.71 | bwd_inner_microstep: 1275.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1919
[2024-06-10 13:16:33,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.11 | bwd_microstep: 779.42 | bwd_inner_microstep: 779.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754
[2024-06-10 13:16:35,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.60 | bwd_microstep: 1534.88 | bwd_inner_microstep: 1534.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 13:16:37,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1247.40 | bwd_inner_microstep: 1247.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 13:16:39,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.65 | bwd_microstep: 1240.35 | bwd_inner_microstep: 1240.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 13:16:41,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.49 | bwd_microstep: 1250.40 | bwd_inner_microstep: 1250.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 13:16:43,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.71 | bwd_microstep: 1532.32 | bwd_inner_microstep: 1532.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 13:16:44,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.84 | bwd_microstep: 788.68 | bwd_inner_microstep: 788.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2385
[2024-06-10 13:16:45,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.39 | bwd_microstep: 960.36 | bwd_inner_microstep: 960.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487
[2024-06-10 13:16:47,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.53 | bwd_microstep: 1580.24 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1918
[2024-06-10 13:16:49,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.95 | bwd_microstep: 873.14 | bwd_inner_microstep: 873.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3455
[2024-06-10 13:16:50,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1397.99 | bwd_inner_microstep: 1397.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 13:16:52,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.35 | bwd_microstep: 1410.39 | bwd_inner_microstep: 1410.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 13:16:54,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.15 | bwd_microstep: 1291.05 | bwd_inner_microstep: 1291.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2151
[2024-06-10 13:16:56,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.71 | bwd_microstep: 948.53 | bwd_inner_microstep: 948.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3428
[2024-06-10 13:16:57,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.18 | bwd_microstep: 1410.10 | bwd_inner_microstep: 1410.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3556
[2024-06-10 13:17:00,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.54 | bwd_microstep: 1526.51 | bwd_inner_microstep: 1526.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 13:17:01,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.03 | bwd_microstep: 917.22 | bwd_inner_microstep: 917.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2713
[2024-06-10 13:17:02,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.33 | bwd_microstep: 1080.41 | bwd_inner_microstep: 1080.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 13:17:05,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.15 | bwd_microstep: 1583.77 | bwd_inner_microstep: 1583.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4920
[2024-06-10 13:17:07,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.08 | bwd_microstep: 1751.58 | bwd_inner_microstep: 1751.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 13:17:09,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.47 | bwd_microstep: 1544.75 | bwd_inner_microstep: 1544.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 13:17:11,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.40 | bwd_microstep: 1513.40 | bwd_inner_microstep: 1513.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 953
[2024-06-10 13:17:12,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.70 | bwd_microstep: 381.37 | bwd_inner_microstep: 381.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 13:17:14,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.93 | bwd_microstep: 1645.47 | bwd_inner_microstep: 1645.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3451
[2024-06-10 13:17:16,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.38 | bwd_microstep: 1318.46 | bwd_inner_microstep: 1318.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 13:17:18,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.70 | bwd_microstep: 1642.75 | bwd_inner_microstep: 1642.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3739
[2024-06-10 13:17:20,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.35 | bwd_microstep: 1734.37 | bwd_inner_microstep: 1734.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 13:17:28,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 13:17:28,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.61 | bwd_microstep: 7123.59 | bwd_inner_microstep: 1695.85 | bwd_allreduce_microstep: 5427.68 | step_microstep: 38.21
[2024-06-10 13:17:28,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15138.82 | bwd: 46146.90 | bwd_inner: 40718.21 | bwd_allreduce: 5427.96 | step: 39.69
{'loss': 1.216, 'learning_rate': 2.5807938365309113e-05, 'epoch': 0.42}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 13:17:30,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.23 | bwd_microstep: 1371.00 | bwd_inner_microstep: 1370.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4004
[2024-06-10 13:17:32,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.73 | bwd_microstep: 1703.74 | bwd_inner_microstep: 1703.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 13:17:35,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.90 | bwd_microstep: 1645.27 | bwd_inner_microstep: 1645.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 13:17:37,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.93 | bwd_microstep: 1400.46 | bwd_inner_microstep: 1400.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3789
[2024-06-10 13:17:38,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.43 | bwd_microstep: 1345.42 | bwd_inner_microstep: 1345.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 13:17:40,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.81 | bwd_microstep: 1282.30 | bwd_inner_microstep: 1282.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3614
[2024-06-10 13:17:42,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.71 | bwd_microstep: 1308.53 | bwd_inner_microstep: 1308.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 13:17:44,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1381.86 | bwd_inner_microstep: 1381.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 13:17:46,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.31 | bwd_microstep: 1384.40 | bwd_inner_microstep: 1384.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 13:17:48,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.83 | bwd_microstep: 1242.33 | bwd_inner_microstep: 1242.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437
[2024-06-10 13:17:49,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.81 | bwd_microstep: 1275.14 | bwd_inner_microstep: 1275.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2087
[2024-06-10 13:17:51,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.06 | bwd_microstep: 914.81 | bwd_inner_microstep: 914.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3505
[2024-06-10 13:17:53,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.02 | bwd_microstep: 1430.38 | bwd_inner_microstep: 1430.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-10 13:17:54,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.41 | bwd_microstep: 1282.31 | bwd_inner_microstep: 1282.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3518
[2024-06-10 13:17:56,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.21 | bwd_microstep: 1432.27 | bwd_inner_microstep: 1432.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 5006
[2024-06-10 13:17:59,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.26 | bwd_microstep: 1676.93 | bwd_inner_microstep: 1676.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1976
[2024-06-10 13:18:00,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.17 | bwd_microstep: 736.12 | bwd_inner_microstep: 736.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 13:18:01,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.39 | bwd_microstep: 797.43 | bwd_inner_microstep: 797.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-10 13:18:03,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.69 | bwd_microstep: 1518.59 | bwd_inner_microstep: 1518.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 13:18:05,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.44 | bwd_microstep: 1451.34 | bwd_inner_microstep: 1451.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3826
[2024-06-10 13:18:07,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.31 | bwd_microstep: 1603.96 | bwd_inner_microstep: 1603.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 13:18:09,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1392.74 | bwd_inner_microstep: 1392.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 13:18:11,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.24 | bwd_microstep: 1350.12 | bwd_inner_microstep: 1350.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 13:18:13,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.70 | bwd_microstep: 1492.69 | bwd_inner_microstep: 1492.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3534
[2024-06-10 13:18:15,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.88 | bwd_microstep: 1454.28 | bwd_inner_microstep: 1454.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 13:18:17,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1253.61 | bwd_inner_microstep: 1253.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3562
[2024-06-10 13:18:18,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.37 | bwd_microstep: 1260.66 | bwd_inner_microstep: 1260.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3615
[2024-06-10 13:18:21,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.06 | bwd_microstep: 1648.25 | bwd_inner_microstep: 1648.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3417
[2024-06-10 13:18:23,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.47 | bwd_microstep: 1511.07 | bwd_inner_microstep: 1511.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 13:18:25,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.53 | bwd_microstep: 1499.79 | bwd_inner_microstep: 1499.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 13:18:27,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.60 | bwd_microstep: 1344.71 | bwd_inner_microstep: 1344.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2241
[2024-06-10 13:18:28,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 13:18:28,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.43 | bwd_microstep: 971.73 | bwd_inner_microstep: 962.67 | bwd_allreduce_microstep: 9.01 | step_microstep: 37.90
[2024-06-10 13:18:28,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16243.43 | bwd: 43364.22 | bwd_inner: 43354.31 | bwd_allreduce: 9.24 | step: 39.37
{'loss': 1.2946, 'learning_rate': 2.5772011674263017e-05, 'epoch': 0.42}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 13:18:30,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.11 | bwd_microstep: 1499.83 | bwd_inner_microstep: 1499.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-10 13:18:31,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.98 | bwd_microstep: 679.72 | bwd_inner_microstep: 679.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 13:18:33,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1479.47 | bwd_inner_microstep: 1479.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 13:18:35,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1378.33 | bwd_inner_microstep: 1378.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-10 13:18:37,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.40 | bwd_microstep: 1644.61 | bwd_inner_microstep: 1644.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 13:18:39,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1395.76 | bwd_inner_microstep: 1395.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 13:18:41,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1388.05 | bwd_inner_microstep: 1388.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 790
[2024-06-10 13:18:42,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 122.06 | bwd_microstep: 311.97 | bwd_inner_microstep: 311.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 13:18:43,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.96 | bwd_microstep: 1153.93 | bwd_inner_microstep: 1153.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 13:18:45,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.59 | bwd_microstep: 1393.73 | bwd_inner_microstep: 1393.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3533
[2024-06-10 13:18:47,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.43 | bwd_microstep: 1344.73 | bwd_inner_microstep: 1344.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3545
[2024-06-10 13:18:49,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.60 | bwd_microstep: 1425.24 | bwd_inner_microstep: 1425.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3684
[2024-06-10 13:18:51,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.74 | bwd_microstep: 1358.95 | bwd_inner_microstep: 1358.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2148
[2024-06-10 13:18:52,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.09 | bwd_microstep: 1008.90 | bwd_inner_microstep: 1008.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2164
[2024-06-10 13:18:54,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.52 | bwd_microstep: 953.65 | bwd_inner_microstep: 953.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 13:18:56,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.67 | bwd_microstep: 1525.63 | bwd_inner_microstep: 1525.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632
[2024-06-10 13:18:58,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.37 | bwd_microstep: 1441.22 | bwd_inner_microstep: 1441.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 13:19:00,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1416.40 | bwd_inner_microstep: 1416.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 13:19:01,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.69 | bwd_microstep: 806.21 | bwd_inner_microstep: 806.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 13:19:03,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.23 | bwd_microstep: 1500.52 | bwd_inner_microstep: 1500.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 13:19:05,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.08 | bwd_microstep: 1416.46 | bwd_inner_microstep: 1416.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 13:19:07,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.75 | bwd_microstep: 1407.55 | bwd_inner_microstep: 1407.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-10 13:19:09,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.91 | bwd_microstep: 1435.49 | bwd_inner_microstep: 1435.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 13:19:10,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.66 | bwd_microstep: 1336.74 | bwd_inner_microstep: 1336.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 13:19:13,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.84 | bwd_microstep: 1649.25 | bwd_inner_microstep: 1649.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 13:19:15,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.96 | bwd_microstep: 1528.56 | bwd_inner_microstep: 1528.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 13:19:17,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.40 | bwd_microstep: 1518.52 | bwd_inner_microstep: 1518.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1975
[2024-06-10 13:19:18,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.28 | bwd_microstep: 705.44 | bwd_inner_microstep: 705.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 13:19:20,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.92 | bwd_microstep: 1605.66 | bwd_inner_microstep: 1605.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3574
[2024-06-10 13:19:22,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.10 | bwd_microstep: 1446.20 | bwd_inner_microstep: 1446.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2954
[2024-06-10 13:19:24,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.79 | bwd_microstep: 1103.81 | bwd_inner_microstep: 1103.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3594
[2024-06-10 13:19:31,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.31 | optimizer_step: 6.59
[2024-06-10 13:19:31,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.68 | bwd_microstep: 6115.09 | bwd_inner_microstep: 2043.21 | bwd_allreduce_microstep: 4071.82 | step_microstep: 39.52
[2024-06-10 13:19:31,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15717.34 | bwd: 46375.64 | bwd_inner: 42302.90 | bwd_allreduce: 4072.05 | step: 41.01
{'loss': 1.2544, 'learning_rate': 2.5736064654221808e-05, 'epoch': 0.43}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411
[2024-06-10 13:19:32,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.33 | bwd_microstep: 1298.84 | bwd_inner_microstep: 1298.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4024
[2024-06-10 13:19:35,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.50 | bwd_microstep: 1609.07 | bwd_inner_microstep: 1609.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3857
[2024-06-10 13:19:37,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.50 | bwd_microstep: 1561.65 | bwd_inner_microstep: 1561.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 13:19:38,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1252.62 | bwd_inner_microstep: 1252.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 13:19:40,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1382.48 | bwd_inner_microstep: 1382.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3481
[2024-06-10 13:19:42,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.98 | bwd_microstep: 1409.88 | bwd_inner_microstep: 1409.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422
[2024-06-10 13:19:44,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.46 | bwd_microstep: 1186.16 | bwd_inner_microstep: 1186.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 13:19:46,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.88 | bwd_microstep: 1287.13 | bwd_inner_microstep: 1287.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 13:19:47,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.33 | bwd_microstep: 1185.29 | bwd_inner_microstep: 1185.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 13:19:49,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.20 | bwd_microstep: 1289.53 | bwd_inner_microstep: 1289.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3683
[2024-06-10 13:19:51,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1467.64 | bwd_inner_microstep: 1467.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 13:19:53,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.11 | bwd_microstep: 1345.33 | bwd_inner_microstep: 1345.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2172
[2024-06-10 13:19:54,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.50 | bwd_microstep: 949.28 | bwd_inner_microstep: 949.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-10 13:19:56,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.95 | bwd_microstep: 1416.33 | bwd_inner_microstep: 1416.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 13:19:58,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.08 | bwd_microstep: 1337.59 | bwd_inner_microstep: 1337.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1898
[2024-06-10 13:19:59,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.90 | bwd_microstep: 777.32 | bwd_inner_microstep: 777.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 13:20:01,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.58 | bwd_microstep: 1456.03 | bwd_inner_microstep: 1456.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 13:20:03,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.89 | bwd_microstep: 1515.98 | bwd_inner_microstep: 1515.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3541
[2024-06-10 13:20:05,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.82 | bwd_microstep: 1326.33 | bwd_inner_microstep: 1326.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651
[2024-06-10 13:20:07,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.11 | bwd_microstep: 1419.99 | bwd_inner_microstep: 1419.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 13:20:09,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.18 | bwd_microstep: 1553.99 | bwd_inner_microstep: 1553.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 13:20:11,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.34 | bwd_microstep: 1182.31 | bwd_inner_microstep: 1182.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2427
[2024-06-10 13:20:12,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.86 | bwd_microstep: 1034.88 | bwd_inner_microstep: 1034.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3553
[2024-06-10 13:20:14,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.44 | bwd_microstep: 1424.02 | bwd_inner_microstep: 1423.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 13:20:16,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1531.82 | bwd_inner_microstep: 1531.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 13:20:18,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.04 | bwd_microstep: 1158.62 | bwd_inner_microstep: 1158.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 13:20:20,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1507.66 | bwd_inner_microstep: 1507.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558
[2024-06-10 13:20:22,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1423.07 | bwd_inner_microstep: 1423.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3523
[2024-06-10 13:20:24,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.91 | bwd_microstep: 1519.02 | bwd_inner_microstep: 1518.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764
[2024-06-10 13:20:26,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.85 | bwd_microstep: 1548.39 | bwd_inner_microstep: 1548.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3582
[2024-06-10 13:20:28,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.84 | bwd_microstep: 1525.87 | bwd_inner_microstep: 1525.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3807
[2024-06-10 13:20:31,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 13:20:31,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.99 | bwd_microstep: 1761.19 | bwd_inner_microstep: 1405.20 | bwd_allreduce_microstep: 355.94 | step_microstep: 37.78
[2024-06-10 13:20:31,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16223.69 | bwd: 43645.32 | bwd_inner: 43288.48 | bwd_allreduce: 356.17 | step: 39.28
{'loss': 1.2433, 'learning_rate': 2.5700097431790708e-05, 'epoch': 0.43}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 13:20:32,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1256.26 | bwd_inner_microstep: 1256.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3900
[2024-06-10 13:20:35,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.60 | bwd_microstep: 1587.40 | bwd_inner_microstep: 1587.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 13:20:36,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.64 | bwd_microstep: 1344.74 | bwd_inner_microstep: 1344.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909
[2024-06-10 13:20:38,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.21 | bwd_microstep: 776.48 | bwd_inner_microstep: 776.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2976
[2024-06-10 13:20:39,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.45 | bwd_microstep: 1098.31 | bwd_inner_microstep: 1098.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2307
[2024-06-10 13:20:40,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.74 | bwd_microstep: 883.38 | bwd_inner_microstep: 883.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1890
[2024-06-10 13:20:41,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.32 | bwd_microstep: 776.11 | bwd_inner_microstep: 776.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 13:20:43,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.85 | bwd_microstep: 1482.74 | bwd_inner_microstep: 1482.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3687
[2024-06-10 13:20:46,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.48 | bwd_microstep: 1552.28 | bwd_inner_microstep: 1552.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 13:20:47,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.83 | bwd_microstep: 1387.80 | bwd_inner_microstep: 1387.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 13:20:49,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.60 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 13:20:51,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1287.39 | bwd_inner_microstep: 1287.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410
[2024-06-10 13:20:53,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.57 | bwd_microstep: 1374.44 | bwd_inner_microstep: 1374.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662
[2024-06-10 13:20:55,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.57 | bwd_microstep: 1422.64 | bwd_inner_microstep: 1422.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3657
[2024-06-10 13:20:57,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.46 | bwd_microstep: 1540.42 | bwd_inner_microstep: 1540.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 13:20:59,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.53 | bwd_microstep: 1390.08 | bwd_inner_microstep: 1390.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 13:21:01,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1289.70 | bwd_inner_microstep: 1289.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3523
[2024-06-10 13:21:03,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.66 | bwd_microstep: 1451.34 | bwd_inner_microstep: 1451.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 13:21:05,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.03 | bwd_microstep: 1507.59 | bwd_inner_microstep: 1507.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1988
[2024-06-10 13:21:06,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.01 | bwd_microstep: 707.65 | bwd_inner_microstep: 707.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 13:21:08,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.60 | bwd_microstep: 1387.18 | bwd_inner_microstep: 1387.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 13:21:10,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.35 | bwd_microstep: 1555.79 | bwd_inner_microstep: 1555.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999
[2024-06-10 13:21:11,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.89 | bwd_microstep: 708.13 | bwd_inner_microstep: 708.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 13:21:13,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.64 | bwd_microstep: 1467.89 | bwd_inner_microstep: 1467.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3822
[2024-06-10 13:21:15,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1498.70 | bwd_inner_microstep: 1498.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-10 13:21:17,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.50 | bwd_microstep: 1307.94 | bwd_inner_microstep: 1307.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3586
[2024-06-10 13:21:19,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.83 | bwd_microstep: 1511.69 | bwd_inner_microstep: 1511.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828
[2024-06-10 13:21:21,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3591
[2024-06-10 13:21:23,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.51 | bwd_microstep: 1532.90 | bwd_inner_microstep: 1532.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 13:21:25,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1346.37 | bwd_inner_microstep: 1346.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 13:21:27,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.76 | bwd_microstep: 1634.64 | bwd_inner_microstep: 1634.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3421
[2024-06-10 13:21:34,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 13:21:34,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.01 | bwd_microstep: 6364.75 | bwd_inner_microstep: 1562.07 | bwd_allreduce_microstep: 4802.62 | step_microstep: 38.20
[2024-06-10 13:21:34,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15774.52 | bwd: 47205.41 | bwd_inner: 42401.87 | bwd_allreduce: 4802.86 | step: 39.71
 42%|████▏     | 731/1726 [12:39:03<16:55:29, 61.24s/it]
 42%|████▏     | 732/1726 [12:40:05<16:56:21, 61.35s/it]
 42%|████▏     | 733/1726 [12:41:05<16:48:23, 60.93s/it]
 43%|████▎     | 734/1726 [12:42:07<16:54:48, 61.38s/it]
 43%|████▎     | 735/1726 [12:43:07<16:47:57, 61.03s/it]
 43%|████▎     | 736/1726 [12:44:11<16:58:13, 61.71s/it]
{'loss': 1.3043, 'learning_rate': 2.566411013364608e-05, 'epoch': 0.43}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477
[2024-06-10 13:21:36,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1400.46 | bwd_inner_microstep: 1400.26 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 13:21:38,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.66 | bwd_microstep: 1446.45 | bwd_inner_microstep: 1446.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-10 13:21:40,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.18 | bwd_microstep: 1312.29 | bwd_inner_microstep: 1312.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 13:21:42,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.17 | bwd_microstep: 1645.54 | bwd_inner_microstep: 1645.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 13:21:44,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.55 | bwd_microstep: 1405.46 | bwd_inner_microstep: 1405.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 13:21:46,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.49 | bwd_microstep: 1276.37 | bwd_inner_microstep: 1276.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 13:21:48,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1400.79 | bwd_inner_microstep: 1400.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 13:21:50,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 1533.86 | bwd_inner_microstep: 1533.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3881
[2024-06-10 13:21:52,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.61 | bwd_microstep: 1582.87 | bwd_inner_microstep: 1582.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 13:21:54,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1388.19 | bwd_inner_microstep: 1388.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498
[2024-06-10 13:21:56,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.11 | bwd_microstep: 1316.08 | bwd_inner_microstep: 1316.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-10 13:21:58,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.54 | bwd_microstep: 1317.72 | bwd_inner_microstep: 1317.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 13:22:00,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.93 | bwd_microstep: 1484.06 | bwd_inner_microstep: 1484.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1892
[2024-06-10 13:22:01,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.42 | bwd_microstep: 727.14 | bwd_inner_microstep: 727.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1972
[2024-06-10 13:22:02,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.12 | bwd_microstep: 893.84 | bwd_inner_microstep: 893.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715
[2024-06-10 13:22:04,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.01 | bwd_microstep: 1439.68 | bwd_inner_microstep: 1439.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 13:22:06,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.47 | bwd_microstep: 1526.19 | bwd_inner_microstep: 1526.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-10 13:22:07,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.98 | bwd_microstep: 803.36 | bwd_inner_microstep: 803.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 13:22:09,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1308.71 | bwd_inner_microstep: 1308.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 13:22:11,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.74 | bwd_microstep: 1492.73 | bwd_inner_microstep: 1492.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 13:22:13,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.52 | bwd_microstep: 1291.62 | bwd_inner_microstep: 1291.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 13:22:15,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.58 | bwd_microstep: 1404.33 | bwd_inner_microstep: 1404.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3684
[2024-06-10 13:22:17,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.29 | bwd_microstep: 1425.69 | bwd_inner_microstep: 1425.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2436
[2024-06-10 13:22:18,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.58 | bwd_microstep: 1047.88 | bwd_inner_microstep: 1047.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451
[2024-06-10 13:22:20,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.59 | bwd_microstep: 1382.34 | bwd_inner_microstep: 1382.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3751
[2024-06-10 13:22:22,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.35 | bwd_microstep: 1607.42 | bwd_inner_microstep: 1607.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-10 13:22:24,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.67 | bwd_microstep: 1312.11 | bwd_inner_microstep: 1312.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2243
[2024-06-10 13:22:25,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.37 | bwd_microstep: 1065.97 | bwd_inner_microstep: 1065.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891
[2024-06-10 13:22:27,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.88 | bwd_microstep: 775.79 | bwd_inner_microstep: 775.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 13:22:29,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.41 | bwd_microstep: 1501.51 | bwd_inner_microstep: 1501.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 13:22:31,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.17 | bwd_microstep: 1649.32 | bwd_inner_microstep: 1649.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3786
[2024-06-10 13:22:36,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.19 | optimizer_step: 6.58
[2024-06-10 13:22:36,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.74 | bwd_microstep: 4010.96 | bwd_inner_microstep: 2054.78 | bwd_allreduce_microstep: 1956.13 | step_microstep: 38.02
[2024-06-10 13:22:36,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15993.28 | bwd: 45176.78 | bwd_inner: 43219.60 | bwd_allreduce: 1956.44 | step: 39.65
{'loss': 1.2896, 'learning_rate': 2.562810288653501e-05, 'epoch': 0.43}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 13:22:37,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.88 | bwd_microstep: 782.39 | bwd_inner_microstep: 782.33 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4372
[2024-06-10 13:22:39,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 667.04 | bwd_microstep: 1806.48 | bwd_inner_microstep: 1806.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 13:22:40,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 790.59 | bwd_inner_microstep: 790.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 13:22:42,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.10 | bwd_microstep: 1374.17 | bwd_inner_microstep: 1374.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1938
[2024-06-10 13:22:43,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.58 | bwd_microstep: 821.83 | bwd_inner_microstep: 821.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794
[2024-06-10 13:22:45,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.74 | bwd_microstep: 1452.83 | bwd_inner_microstep: 1452.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 13:22:46,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.70 | bwd_microstep: 790.27 | bwd_inner_microstep: 790.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3470
[2024-06-10 13:22:48,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.18 | bwd_microstep: 1331.43 | bwd_inner_microstep: 1331.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 13:22:50,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.33 | bwd_microstep: 1631.87 | bwd_inner_microstep: 1631.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900
[2024-06-10 13:22:51,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.12 | bwd_microstep: 684.35 | bwd_inner_microstep: 684.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-10 13:22:54,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1574.21 | bwd_inner_microstep: 1574.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643
[2024-06-10 13:22:55,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.28 | bwd_microstep: 1315.33 | bwd_inner_microstep: 1315.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 13:22:57,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1349.95 | bwd_inner_microstep: 1349.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2120
[2024-06-10 13:22:59,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.84 | bwd_microstep: 925.12 | bwd_inner_microstep: 925.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 13:23:01,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.94 | bwd_microstep: 1484.86 | bwd_inner_microstep: 1484.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933
[2024-06-10 13:23:02,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.63 | bwd_microstep: 722.40 | bwd_inner_microstep: 722.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 13:23:04,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.48 | bwd_microstep: 1486.14 | bwd_inner_microstep: 1486.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2005
[2024-06-10 13:23:05,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.24 | bwd_microstep: 738.46 | bwd_inner_microstep: 738.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615
[2024-06-10 13:23:07,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.17 | bwd_microstep: 1440.26 | bwd_inner_microstep: 1440.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 13:23:09,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.78 | bwd_microstep: 1655.15 | bwd_inner_microstep: 1655.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 13:23:10,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.56 | bwd_microstep: 801.23 | bwd_inner_microstep: 801.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 13:23:12,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1293.01 | bwd_inner_microstep: 1292.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 13:23:14,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.41 | bwd_microstep: 1552.50 | bwd_inner_microstep: 1552.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 13:23:16,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1414.57 | bwd_inner_microstep: 1414.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559
[2024-06-10 13:23:18,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.56 | bwd_microstep: 1330.37 | bwd_inner_microstep: 1330.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-10 13:23:20,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.57 | bwd_microstep: 1585.42 | bwd_inner_microstep: 1585.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462
[2024-06-10 13:23:22,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.77 | bwd_microstep: 1338.82 | bwd_inner_microstep: 1338.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-10 13:23:23,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.17 | bwd_microstep: 817.13 | bwd_inner_microstep: 817.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3449
[2024-06-10 13:23:25,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.99 | bwd_microstep: 1404.66 | bwd_inner_microstep: 1404.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3569
[2024-06-10 13:23:27,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.57 | bwd_microstep: 1456.87 | bwd_inner_microstep: 1456.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3808
[2024-06-10 13:23:29,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.16 | bwd_microstep: 1748.29 | bwd_inner_microstep: 1748.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3775
[2024-06-10 13:23:37,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 13:23:37,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.92 | bwd_microstep: 6739.19 | bwd_inner_microstep: 2093.86 | bwd_allreduce_microstep: 4645.27 | step_microstep: 38.19
[2024-06-10 13:23:37,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15224.72 | bwd: 45640.17 | bwd_inner: 40993.95 | bwd_allreduce: 4645.52 | step: 39.65
{'loss': 1.271, 'learning_rate': 2.559207581727484e-05, 'epoch': 0.43}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 13:23:39,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1381.20 | bwd_inner_microstep: 1381.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 13:23:40,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.78 | bwd_microstep: 1281.71 | bwd_inner_microstep: 1281.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 13:23:43,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.74 | bwd_microstep: 1552.86 | bwd_inner_microstep: 1552.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 13:23:44,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.99 | bwd_microstep: 1276.09 | bwd_inner_microstep: 1276.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 13:23:46,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.00 | bwd_microstep: 1447.48 | bwd_inner_microstep: 1447.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4253
[2024-06-10 13:23:49,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.80 | bwd_microstep: 1665.06 | bwd_inner_microstep: 1665.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 13:23:50,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.69 | bwd_microstep: 1245.17 | bwd_inner_microstep: 1245.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3653
[2024-06-10 13:23:52,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.17 | bwd_microstep: 1320.77 | bwd_inner_microstep: 1320.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 13:23:54,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1282.64 | bwd_inner_microstep: 1282.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 13:23:56,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.94 | bwd_microstep: 1290.43 | bwd_inner_microstep: 1290.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 13:23:57,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1254.45 | bwd_inner_microstep: 1254.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-10 13:24:00,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.45 | bwd_microstep: 1522.47 | bwd_inner_microstep: 1522.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3692
[2024-06-10 13:24:02,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.32 | bwd_microstep: 1553.26 | bwd_inner_microstep: 1553.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954
[2024-06-10 13:24:03,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.20 | bwd_microstep: 896.95 | bwd_inner_microstep: 896.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-10 13:24:05,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.81 | bwd_microstep: 1274.29 | bwd_inner_microstep: 1274.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3524
[2024-06-10 13:24:07,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.58 | bwd_microstep: 1686.69 | bwd_inner_microstep: 1686.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 13:24:09,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1247.70 | bwd_inner_microstep: 1247.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 13:24:11,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1391.39 | bwd_inner_microstep: 1391.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3478
[2024-06-10 13:24:13,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.01 | bwd_microstep: 1428.10 | bwd_inner_microstep: 1428.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-10 13:24:14,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1293.99 | bwd_inner_microstep: 1293.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-10 13:24:16,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.53 | bwd_microstep: 1184.34 | bwd_inner_microstep: 1184.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 13:24:17,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.19 | bwd_microstep: 802.07 | bwd_inner_microstep: 802.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 13:24:19,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.65 | bwd_microstep: 1404.46 | bwd_inner_microstep: 1404.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242
[2024-06-10 13:24:21,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.01 | bwd_microstep: 971.83 | bwd_inner_microstep: 971.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434
[2024-06-10 13:24:22,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.71 | bwd_microstep: 1184.76 | bwd_inner_microstep: 1184.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 13:24:24,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.39 | bwd_microstep: 1281.03 | bwd_inner_microstep: 1281.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 13:24:26,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.72 | bwd_microstep: 1414.43 | bwd_inner_microstep: 1414.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3564
[2024-06-10 13:24:28,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.14 | bwd_microstep: 1233.78 | bwd_inner_microstep: 1233.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564
[2024-06-10 13:24:30,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.17 | bwd_microstep: 1592.09 | bwd_inner_microstep: 1592.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 13:24:32,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.29 | bwd_microstep: 1398.48 | bwd_inner_microstep: 1398.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 13:24:34,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1392.38 | bwd_inner_microstep: 1392.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3763
[2024-06-10 13:24:39,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 13:24:39,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.18 | bwd_microstep: 4468.22 | bwd_inner_microstep: 1661.64 | bwd_allreduce_microstep: 2806.52 | step_microstep: 37.87
[2024-06-10 13:24:39,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16048.15 | bwd: 45620.57 | bwd_inner: 42813.14 | bwd_allreduce: 2806.75 | step: 39.32
{'loss': 1.2371, 'learning_rate': 2.5556029052752704e-05, 'epoch': 0.43}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 13:24:41,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1376.08 | bwd_inner_microstep: 1376.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3919
[2024-06-10 13:24:43,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.36 | bwd_microstep: 1688.78 | bwd_inner_microstep: 1688.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 13:24:45,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1376.54 | bwd_inner_microstep: 1376.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3791
[2024-06-10 13:24:47,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.70 | bwd_microstep: 1444.04 | bwd_inner_microstep: 1444.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 13:24:49,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.26 | bwd_microstep: 1378.97 | bwd_inner_microstep: 1378.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 13:24:51,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.34 | bwd_microstep: 1454.32 | bwd_inner_microstep: 1454.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 845
[2024-06-10 13:24:51,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.77 | bwd_microstep: 345.58 | bwd_inner_microstep: 345.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 13:24:53,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1251.17 | bwd_inner_microstep: 1251.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-10 13:24:55,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.40 | bwd_microstep: 1633.82 | bwd_inner_microstep: 1633.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 13:24:57,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.56 | bwd_microstep: 1422.65 | bwd_inner_microstep: 1422.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 13:24:59,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1415.69 | bwd_inner_microstep: 1415.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 13:25:01,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.45 | bwd_microstep: 1245.97 | bwd_inner_microstep: 1245.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 13:25:03,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.11 | bwd_microstep: 1386.53 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 13:25:05,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.86 | bwd_microstep: 1412.31 | bwd_inner_microstep: 1412.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3648
[2024-06-10 13:25:07,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.85 | bwd_microstep: 1708.03 | bwd_inner_microstep: 1708.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 13:25:09,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1486.02 | bwd_inner_microstep: 1485.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2669
[2024-06-10 13:25:11,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.53 | bwd_microstep: 1119.77 | bwd_inner_microstep: 1119.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3432
[2024-06-10 13:25:13,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.23 | bwd_microstep: 1540.17 | bwd_inner_microstep: 1540.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3826
[2024-06-10 13:25:15,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.99 | bwd_microstep: 1752.74 | bwd_inner_microstep: 1752.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-10 13:25:17,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.90 | bwd_microstep: 1525.09 | bwd_inner_microstep: 1525.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 13:25:19,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1280.27 | bwd_inner_microstep: 1280.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928
[2024-06-10 13:25:20,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.11 | bwd_microstep: 696.35 | bwd_inner_microstep: 696.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278
[2024-06-10 13:25:21,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.29 | bwd_microstep: 907.99 | bwd_inner_microstep: 907.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3823
[2024-06-10 13:25:23,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.72 | bwd_microstep: 1478.79 | bwd_inner_microstep: 1478.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3720
[2024-06-10 13:25:25,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.72 | bwd_microstep: 1561.70 | bwd_inner_microstep: 1561.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 13:25:27,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1376.35 | bwd_inner_microstep: 1376.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 13:25:29,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1407.51 | bwd_inner_microstep: 1407.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 13:25:31,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.46 | bwd_microstep: 1548.24 | bwd_inner_microstep: 1548.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 13:25:34,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1500.26 | bwd_inner_microstep: 1500.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 13:25:36,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 1451.27 | bwd_inner_microstep: 1451.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3444
[2024-06-10 13:25:37,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1411.38 | bwd_inner_microstep: 1411.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 13:25:41,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.25 | optimizer_step: 6.58
[2024-06-10 13:25:41,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 2463.52 | bwd_inner_microstep: 1613.13 | bwd_allreduce_microstep: 850.33 | step_microstep: 39.17
[2024-06-10 13:25:41,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16451.85 | bwd: 45047.92 | bwd_inner: 44196.67 | bwd_allreduce: 850.56 | step: 40.67
{'loss': 1.2448, 'learning_rate': 2.5519962719925122e-05, 'epoch': 0.43}
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1963
[2024-06-10 13:25:42,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.98 | bwd_microstep: 842.48 | bwd_inner_microstep: 842.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 13:25:43,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.16 | bwd_microstep: 877.01 | bwd_inner_microstep: 876.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 13:25:45,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.86 | bwd_microstep: 1448.32 | bwd_inner_microstep: 1448.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 13:25:47,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1245.62 | bwd_inner_microstep: 1245.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768
[2024-06-10 13:25:49,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.86 | bwd_microstep: 1444.87 | bwd_inner_microstep: 1444.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 13:25:51,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1381.37 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 13:25:52,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1281.18 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 13:25:54,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.24 | bwd_microstep: 1290.68 | bwd_inner_microstep: 1290.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 13:25:56,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.72 | bwd_microstep: 1526.72 | bwd_inner_microstep: 1526.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 13:25:57,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.52 | bwd_microstep: 802.68 | bwd_inner_microstep: 802.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 13:25:58,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.05 | bwd_microstep: 793.80 | bwd_inner_microstep: 793.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1884
[2024-06-10 13:25:59,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.90 | bwd_microstep: 744.22 | bwd_inner_microstep: 744.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3547
[2024-06-10 13:26:01,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.60 | bwd_microstep: 1409.52 | bwd_inner_microstep: 1409.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 13:26:04,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1611.02 | bwd_inner_microstep: 1610.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 13:26:06,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.53 | bwd_microstep: 1604.61 | bwd_inner_microstep: 1604.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3512
[2024-06-10 13:26:08,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.35 | bwd_microstep: 1268.43 | bwd_inner_microstep: 1268.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3532
[2024-06-10 13:26:10,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.43 | bwd_microstep: 1434.90 | bwd_inner_microstep: 1434.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3533
[2024-06-10 13:26:12,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1451.67 | bwd_inner_microstep: 1451.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 13:26:14,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.60 | bwd_microstep: 1614.65 | bwd_inner_microstep: 1614.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 13:26:16,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1431.24 | bwd_inner_microstep: 1431.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 13:26:17,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.23 | bwd_microstep: 698.40 | bwd_inner_microstep: 698.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 13:26:19,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1401.93 | bwd_inner_microstep: 1401.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822
[2024-06-10 13:26:21,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.04 | bwd_microstep: 1689.09 | bwd_inner_microstep: 1689.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3711
[2024-06-10 13:26:23,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.20 | bwd_microstep: 1435.10 | bwd_inner_microstep: 1435.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 13:26:25,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.10 | bwd_microstep: 1559.99 | bwd_inner_microstep: 1559.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 13:26:27,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.05 | bwd_microstep: 1534.42 | bwd_inner_microstep: 1534.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3737
[2024-06-10 13:26:29,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.21 | bwd_microstep: 1438.86 | bwd_inner_microstep: 1438.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3576
[2024-06-10 13:26:31,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1530.98 | bwd_inner_microstep: 1530.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3613
[2024-06-10 13:26:33,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.32 | bwd_microstep: 1536.11 | bwd_inner_microstep: 1536.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3721
[2024-06-10 13:26:36,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.86 | bwd_microstep: 1728.56 | bwd_inner_microstep: 1728.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3802
[2024-06-10 13:26:38,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.39 | bwd_microstep: 1750.87 | bwd_inner_microstep: 1750.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3622
[2024-06-10 13:26:42,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.10 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-10 13:26:42,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.28 | bwd_microstep: 3576.37 | bwd_inner_microstep: 1868.75 | bwd_allreduce_microstep: 1707.57 | step_microstep: 37.99
[2024-06-10 13:26:42,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16132.23 | bwd: 45385.68 | bwd_inner: 43677.17 | bwd_allreduce: 1707.81 | step: 39.48
{'loss': 1.2585, 'learning_rate': 2.5483876945817544e-05, 'epoch': 0.43}


 43%|████▎     | 736/1726 [12:44:11<16:58:13, 61.71s/it]
 43%|████▎     | 737/1726 [12:45:12<16:56:11, 61.65s/it]
 43%|████▎     | 738/1726 [12:46:13<16:52:54, 61.51s/it]
 43%|████▎     | 739/1726 [12:47:15<16:54:16, 61.66s/it]
 43%|████▎     | 740/1726 [12:48:17<16:54:09, 61.71s/it]
 43%|████▎     | 741/1726 [12:49:19<16:53:49, 61.76s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 13:26:44,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1281.45 | bwd_inner_microstep: 1281.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 13:26:46,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1390.96 | bwd_inner_microstep: 1390.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 13:26:48,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1447.39 | bwd_inner_microstep: 1447.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 13:26:50,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.90 | bwd_microstep: 1455.59 | bwd_inner_microstep: 1455.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2293
[2024-06-10 13:26:51,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.61 | bwd_microstep: 938.57 | bwd_inner_microstep: 938.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 13:26:53,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.01 | bwd_microstep: 1190.85 | bwd_inner_microstep: 1190.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 13:26:55,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.95 | bwd_microstep: 1527.57 | bwd_inner_microstep: 1527.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502
[2024-06-10 13:26:57,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.81 | bwd_microstep: 1191.43 | bwd_inner_microstep: 1191.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1991
[2024-06-10 13:26:58,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.76 | bwd_microstep: 741.44 | bwd_inner_microstep: 741.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3431
[2024-06-10 13:27:00,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.90 | bwd_microstep: 1295.67 | bwd_inner_microstep: 1295.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2479
[2024-06-10 13:27:01,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.05 | bwd_microstep: 1055.72 | bwd_inner_microstep: 1055.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 13:27:03,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.35 | bwd_microstep: 1341.54 | bwd_inner_microstep: 1341.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 13:27:05,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.27 | bwd_microstep: 1490.86 | bwd_inner_microstep: 1490.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 13:27:07,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.02 | bwd_microstep: 1590.83 | bwd_inner_microstep: 1590.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629
[2024-06-10 13:27:10,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.67 | bwd_microstep: 1707.99 | bwd_inner_microstep: 1707.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 13:27:12,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.81 | bwd_microstep: 1613.11 | bwd_inner_microstep: 1613.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-10 13:27:14,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.99 | bwd_microstep: 1613.53 | bwd_inner_microstep: 1613.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 648
[2024-06-10 13:27:14,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.46 | bwd_microstep: 276.20 | bwd_inner_microstep: 276.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 13:27:17,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.55 | bwd_microstep: 1657.63 | bwd_inner_microstep: 1657.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 13:27:18,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1282.74 | bwd_inner_microstep: 1282.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 13:27:20,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.38 | bwd_microstep: 1280.25 | bwd_inner_microstep: 1280.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069
[2024-06-10 13:27:21,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.63 | bwd_microstep: 818.30 | bwd_inner_microstep: 818.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 13:27:23,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1257.30 | bwd_inner_microstep: 1257.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-10 13:27:25,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.81 | bwd_microstep: 1611.66 | bwd_inner_microstep: 1611.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 13:27:27,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1506.50 | bwd_inner_microstep: 1506.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2258
[2024-06-10 13:27:29,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.17 | bwd_microstep: 922.06 | bwd_inner_microstep: 922.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-10 13:27:31,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.75 | bwd_microstep: 1607.42 | bwd_inner_microstep: 1607.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2287
[2024-06-10 13:27:32,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.28 | bwd_microstep: 907.53 | bwd_inner_microstep: 907.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2247
[2024-06-10 13:27:33,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 811.17 | bwd_inner_microstep: 811.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 13:27:36,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.70 | bwd_microstep: 1649.73 | bwd_inner_microstep: 1649.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3578
[2024-06-10 13:27:38,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.59 | bwd_microstep: 1527.50 | bwd_inner_microstep: 1527.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481
[2024-06-10 13:27:43,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.29 | optimizer_step: 6.58
[2024-06-10 13:27:43,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.64 | bwd_microstep: 4521.12 | bwd_inner_microstep: 1781.01 | bwd_allreduce_microstep: 2740.05 | step_microstep: 38.33
[2024-06-10 13:27:43,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15530.01 | bwd: 44511.62 | bwd_inner: 41770.67 | bwd_allreduce: 2740.29 | step: 39.81
{'loss': 1.216, 'learning_rate': 2.5447771857523868e-05, 'epoch': 0.43}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416
[2024-06-10 13:27:44,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.42 | bwd_microstep: 1139.81 | bwd_inner_microstep: 1139.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4017
[2024-06-10 13:27:47,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.93 | bwd_microstep: 1607.97 | bwd_inner_microstep: 1607.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 13:27:48,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1349.12 | bwd_inner_microstep: 1349.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-10 13:27:50,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.18 | bwd_microstep: 1430.92 | bwd_inner_microstep: 1430.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777
[2024-06-10 13:27:53,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.63 | bwd_microstep: 1542.50 | bwd_inner_microstep: 1542.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2727
[2024-06-10 13:27:54,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.77 | bwd_microstep: 1035.89 | bwd_inner_microstep: 1035.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 13:27:55,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.02 | bwd_microstep: 788.59 | bwd_inner_microstep: 788.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621
[2024-06-10 13:27:57,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.42 | bwd_microstep: 1311.34 | bwd_inner_microstep: 1311.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 13:27:58,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.50 | bwd_microstep: 810.14 | bwd_inner_microstep: 810.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563
[2024-06-10 13:28:00,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1428.03 | bwd_inner_microstep: 1428.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3656
[2024-06-10 13:28:02,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.67 | bwd_microstep: 1547.87 | bwd_inner_microstep: 1547.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 13:28:04,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1477.39 | bwd_inner_microstep: 1477.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661
[2024-06-10 13:28:06,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.14 | bwd_microstep: 1610.24 | bwd_inner_microstep: 1610.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 13:28:08,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.58 | bwd_microstep: 1479.25 | bwd_inner_microstep: 1479.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 13:28:11,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.59 | bwd_microstep: 1610.90 | bwd_inner_microstep: 1610.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2136
[2024-06-10 13:28:12,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.46 | bwd_microstep: 1025.81 | bwd_inner_microstep: 1025.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 13:28:14,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1376.27 | bwd_inner_microstep: 1376.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3532
[2024-06-10 13:28:16,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.84 | bwd_microstep: 1458.45 | bwd_inner_microstep: 1458.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 13:28:18,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1401.33 | bwd_inner_microstep: 1401.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004
[2024-06-10 13:28:19,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.60 | bwd_microstep: 710.08 | bwd_inner_microstep: 710.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3387
[2024-06-10 13:28:21,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.81 | bwd_microstep: 1433.06 | bwd_inner_microstep: 1433.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 13:28:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.70 | bwd_microstep: 1509.67 | bwd_inner_microstep: 1509.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 13:28:25,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1413.37 | bwd_inner_microstep: 1413.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 13:28:27,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.30 | bwd_microstep: 1402.16 | bwd_inner_microstep: 1402.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 13:28:28,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.93 | bwd_microstep: 810.56 | bwd_inner_microstep: 810.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-10 13:28:30,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.60 | bwd_microstep: 1445.09 | bwd_inner_microstep: 1445.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 13:28:32,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.75 | bwd_microstep: 1554.89 | bwd_inner_microstep: 1554.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 13:28:34,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1297.87 | bwd_inner_microstep: 1297.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 13:28:36,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.89 | bwd_microstep: 1536.85 | bwd_inner_microstep: 1536.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762
[2024-06-10 13:28:38,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.17 | bwd_microstep: 1477.14 | bwd_inner_microstep: 1477.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 5453
[2024-06-10 13:28:41,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 813.00 | bwd_microstep: 2191.92 | bwd_inner_microstep: 2191.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2263
[2024-06-10 13:28:44,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 5.21 | optimizer_step: 6.61
[2024-06-10 13:28:44,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.66 | bwd_microstep: 2582.17 | bwd_inner_microstep: 1139.76 | bwd_allreduce_microstep: 1442.36 | step_microstep: 38.60
[2024-06-10 13:28:44,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16166.81 | bwd: 44796.65 | bwd_inner: 43353.39 | bwd_allreduce: 1442.59 | step: 40.12
{'loss': 1.1939, 'learning_rate': 2.541164758220603e-05, 'epoch': 0.43}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 13:28:46,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.08 | bwd_microstep: 1361.60 | bwd_inner_microstep: 1361.44 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 13:28:48,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1242.07 | bwd_inner_microstep: 1242.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 13:28:49,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.08 | bwd_microstep: 790.85 | bwd_inner_microstep: 790.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3847
[2024-06-10 13:28:51,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.35 | bwd_microstep: 1658.57 | bwd_inner_microstep: 1658.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 13:28:53,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.11 | bwd_microstep: 1545.20 | bwd_inner_microstep: 1545.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 13:28:55,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.72 | bwd_microstep: 1148.82 | bwd_inner_microstep: 1148.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 13:28:56,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.62 | bwd_microstep: 686.71 | bwd_inner_microstep: 686.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 13:28:58,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 1531.07 | bwd_inner_microstep: 1531.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 13:29:00,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1350.75 | bwd_inner_microstep: 1350.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 13:29:01,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.22 | bwd_microstep: 1194.33 | bwd_inner_microstep: 1194.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 13:29:04,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.41 | bwd_microstep: 1629.06 | bwd_inner_microstep: 1629.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-10 13:29:05,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.23 | bwd_microstep: 1284.09 | bwd_inner_microstep: 1284.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 13:29:07,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.77 | bwd_microstep: 1382.32 | bwd_inner_microstep: 1382.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3626
[2024-06-10 13:29:10,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.41 | bwd_microstep: 1675.46 | bwd_inner_microstep: 1675.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 13:29:12,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.31 | bwd_microstep: 1521.10 | bwd_inner_microstep: 1521.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-10 13:29:14,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.94 | bwd_microstep: 1448.92 | bwd_inner_microstep: 1448.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3534
[2024-06-10 13:29:16,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.82 | bwd_microstep: 1448.20 | bwd_inner_microstep: 1448.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2011
[2024-06-10 13:29:17,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.47 | bwd_microstep: 771.96 | bwd_inner_microstep: 771.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 13:29:18,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.19 | bwd_microstep: 801.00 | bwd_inner_microstep: 800.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2244
[2024-06-10 13:29:19,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.95 | bwd_microstep: 873.14 | bwd_inner_microstep: 873.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 887
[2024-06-10 13:29:20,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.58 | bwd_microstep: 367.90 | bwd_inner_microstep: 367.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2300
[2024-06-10 13:29:21,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.62 | bwd_microstep: 979.72 | bwd_inner_microstep: 979.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 13:29:23,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.87 | bwd_microstep: 1355.85 | bwd_inner_microstep: 1355.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 13:29:25,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1495.98 | bwd_inner_microstep: 1495.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 13:29:27,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1553.50 | bwd_inner_microstep: 1553.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3818
[2024-06-10 13:29:29,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.90 | bwd_microstep: 1516.67 | bwd_inner_microstep: 1516.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 13:29:30,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.25 | bwd_microstep: 972.54 | bwd_inner_microstep: 972.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3756
[2024-06-10 13:29:33,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.69 | bwd_microstep: 1737.17 | bwd_inner_microstep: 1737.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 13:29:35,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.11 | bwd_microstep: 1548.29 | bwd_inner_microstep: 1548.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3474
[2024-06-10 13:29:37,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.31 | bwd_microstep: 1439.96 | bwd_inner_microstep: 1439.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2224
[2024-06-10 13:29:38,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.61 | bwd_microstep: 861.51 | bwd_inner_microstep: 861.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 13:29:45,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 13:29:45,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 6264.05 | bwd_inner_microstep: 2021.41 | bwd_allreduce_microstep: 4242.59 | step_microstep: 37.93
[2024-06-10 13:29:45,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15218.57 | bwd: 45438.38 | bwd_inner: 41194.78 | bwd_allreduce: 4242.87 | step: 39.52
{'loss': 1.2208, 'learning_rate': 2.537550424709354e-05, 'epoch': 0.43}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1981
[2024-06-10 13:29:46,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.04 | bwd_microstep: 889.19 | bwd_inner_microstep: 889.10 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2918
[2024-06-10 13:29:48,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.75 | bwd_microstep: 1122.05 | bwd_inner_microstep: 1122.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3876
[2024-06-10 13:29:50,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.34 | bwd_microstep: 1676.89 | bwd_inner_microstep: 1676.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 13:29:52,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.97 | bwd_microstep: 1351.24 | bwd_inner_microstep: 1351.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 13:29:54,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1536.62 | bwd_inner_microstep: 1536.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 13:29:56,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.92 | bwd_microstep: 1381.95 | bwd_inner_microstep: 1381.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3785
[2024-06-10 13:29:58,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.71 | bwd_microstep: 1347.18 | bwd_inner_microstep: 1347.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 13:29:59,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.82 | bwd_microstep: 788.03 | bwd_inner_microstep: 788.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 13:30:01,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1255.05 | bwd_inner_microstep: 1255.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3447
[2024-06-10 13:30:03,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.83 | bwd_microstep: 1299.62 | bwd_inner_microstep: 1299.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1879
[2024-06-10 13:30:04,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.39 | bwd_microstep: 710.04 | bwd_inner_microstep: 710.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3486
[2024-06-10 13:30:05,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1404.84 | bwd_inner_microstep: 1404.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3482
[2024-06-10 13:30:07,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.93 | bwd_microstep: 1430.72 | bwd_inner_microstep: 1430.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 13:30:09,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.26 | bwd_microstep: 808.82 | bwd_inner_microstep: 808.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 13:30:11,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.35 | bwd_microstep: 1602.30 | bwd_inner_microstep: 1602.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 13:30:13,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1491.42 | bwd_inner_microstep: 1491.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3829
[2024-06-10 13:30:15,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.43 | bwd_microstep: 1617.13 | bwd_inner_microstep: 1617.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3857
[2024-06-10 13:30:17,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.52 | bwd_microstep: 1562.98 | bwd_inner_microstep: 1562.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2105
[2024-06-10 13:30:18,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.65 | bwd_microstep: 790.41 | bwd_inner_microstep: 790.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 13:30:20,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1394.86 | bwd_inner_microstep: 1394.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815
[2024-06-10 13:30:22,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1385.06 | bwd_inner_microstep: 1385.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 13:30:24,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.55 | bwd_microstep: 1537.69 | bwd_inner_microstep: 1537.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 13:30:26,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1402.75 | bwd_inner_microstep: 1402.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 13:30:28,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1396.98 | bwd_inner_microstep: 1396.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 13:30:29,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.49 | bwd_microstep: 802.13 | bwd_inner_microstep: 802.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 13:30:31,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.28 | bwd_microstep: 1454.29 | bwd_inner_microstep: 1454.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 13:30:33,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.54 | bwd_microstep: 1553.99 | bwd_inner_microstep: 1553.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 13:30:35,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.34 | bwd_microstep: 1379.96 | bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 13:30:37,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1488.82 | bwd_inner_microstep: 1488.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 13:30:40,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.64 | bwd_microstep: 1602.48 | bwd_inner_microstep: 1602.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 13:30:42,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.64 | bwd_microstep: 1650.27 | bwd_inner_microstep: 1650.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 13:30:44,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 13:30:44,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.75 | bwd_microstep: 1449.90 | bwd_inner_microstep: 1382.24 | bwd_allreduce_microstep: 67.61 | step_microstep: 37.73
[2024-06-10 13:30:44,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15883.98 | bwd: 42565.68 | bwd_inner: 42497.10 | bwd_allreduce: 67.88 | step: 39.25
{'loss': 1.2658, 'learning_rate': 2.5339341979483037e-05, 'epoch': 0.43}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 13:30:46,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.65 | bwd_microstep: 1379.73 | bwd_inner_microstep: 1379.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 13:30:48,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.64 | bwd_microstep: 1285.06 | bwd_inner_microstep: 1285.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3954
[2024-06-10 13:30:50,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.21 | bwd_microstep: 1599.21 | bwd_inner_microstep: 1599.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 13:30:52,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1494.29 | bwd_inner_microstep: 1494.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907
[2024-06-10 13:30:54,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.20 | bwd_microstep: 1591.29 | bwd_inner_microstep: 1591.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874
[2024-06-10 13:30:56,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.65 | bwd_microstep: 1583.00 | bwd_inner_microstep: 1582.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1901
[2024-06-10 13:30:57,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.51 | bwd_microstep: 684.23 | bwd_inner_microstep: 684.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 13:30:59,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1401.74 | bwd_inner_microstep: 1401.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3569
[2024-06-10 13:31:01,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.56 | bwd_microstep: 1334.47 | bwd_inner_microstep: 1334.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4024
[2024-06-10 13:31:03,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.32 | bwd_microstep: 1714.24 | bwd_inner_microstep: 1714.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503
[2024-06-10 13:31:05,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.25 | bwd_microstep: 1413.07 | bwd_inner_microstep: 1413.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 13:31:07,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.85 | bwd_microstep: 1345.04 | bwd_inner_microstep: 1345.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3524
[2024-06-10 13:31:09,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.22 | bwd_microstep: 1447.22 | bwd_inner_microstep: 1447.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3652
[2024-06-10 13:31:11,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.27 | bwd_microstep: 1545.98 | bwd_inner_microstep: 1545.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 13:31:13,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.39 | bwd_microstep: 1346.31 | bwd_inner_microstep: 1346.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 13:31:15,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.15 | bwd_microstep: 1485.97 | bwd_inner_microstep: 1485.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3514
[2024-06-10 13:31:17,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1251.08 | bwd_inner_microstep: 1251.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 13:31:19,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.41 | bwd_microstep: 1520.71 | bwd_inner_microstep: 1520.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2285
[2024-06-10 13:31:20,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.19 | bwd_microstep: 1071.76 | bwd_inner_microstep: 1071.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 13:31:23,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.48 | bwd_microstep: 1503.48 | bwd_inner_microstep: 1503.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2172
[2024-06-10 13:31:24,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.55 | bwd_microstep: 884.46 | bwd_inner_microstep: 884.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526
[2024-06-10 13:31:26,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1322.00 | bwd_inner_microstep: 1321.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3694
[2024-06-10 13:31:27,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.55 | bwd_microstep: 1359.99 | bwd_inner_microstep: 1359.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3527
[2024-06-10 13:31:30,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.22 | bwd_microstep: 1588.24 | bwd_inner_microstep: 1588.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 13:31:32,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.65 | bwd_microstep: 1406.23 | bwd_inner_microstep: 1406.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 13:31:34,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.06 | bwd_microstep: 1608.40 | bwd_inner_microstep: 1608.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3811
[2024-06-10 13:31:36,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.47 | bwd_microstep: 1509.91 | bwd_inner_microstep: 1509.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 13:31:38,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.48 | bwd_microstep: 1372.50 | bwd_inner_microstep: 1372.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 13:31:40,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.72 | bwd_microstep: 1542.77 | bwd_inner_microstep: 1542.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 13:31:41,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.37 | bwd_microstep: 976.72 | bwd_inner_microstep: 976.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3598
[2024-06-10 13:31:43,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.20 | bwd_microstep: 1212.78 | bwd_inner_microstep: 1212.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 13:31:48,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.37 | optimizer_gradients: 4.26 | optimizer_step: 6.63
[2024-06-10 13:31:48,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.77 | bwd_microstep: 4181.25 | bwd_inner_microstep: 1629.92 | bwd_allreduce_microstep: 2551.27 | step_microstep: 41.04
[2024-06-10 13:31:48,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16550.31 | bwd: 46963.14 | bwd_inner: 44410.95 | bwd_allreduce: 2551.51 | step: 42.56
{'loss': 1.1871, 'learning_rate': 2.530316090673784e-05, 'epoch': 0.43}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 13:31:49,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.38 | bwd_microstep: 781.09 | bwd_inner_microstep: 781.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4560
[2024-06-10 13:31:51,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 684.79 | bwd_microstep: 1849.26 | bwd_inner_microstep: 1849.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 13:31:53,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.25 | bwd_microstep: 1380.05 | bwd_inner_microstep: 1380.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4310
[2024-06-10 13:31:56,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.51 | bwd_microstep: 1781.51 | bwd_inner_microstep: 1781.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 13:31:58,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1344.65 | bwd_inner_microstep: 1344.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 13:32:00,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1412.59 | bwd_inner_microstep: 1412.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077
[2024-06-10 13:32:01,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.27 | bwd_microstep: 821.16 | bwd_inner_microstep: 821.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 13:32:03,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.15 | bwd_microstep: 1353.98 | bwd_inner_microstep: 1353.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 13:32:04,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1251.50 | bwd_inner_microstep: 1251.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-10 13:32:07,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.78 | bwd_microstep: 1632.83 | bwd_inner_microstep: 1632.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1890
[2024-06-10 13:32:07,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.43 | bwd_microstep: 686.14 | bwd_inner_microstep: 686.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3545
[2024-06-10 13:32:09,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1457.57 | bwd_inner_microstep: 1457.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960
[2024-06-10 13:32:11,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.78 | bwd_microstep: 828.86 | bwd_inner_microstep: 828.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3418
[2024-06-10 13:32:13,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1407.36 | bwd_inner_microstep: 1407.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496
[2024-06-10 13:32:15,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1576.97 | bwd_inner_microstep: 1576.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3386
[2024-06-10 13:32:17,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.92 | bwd_microstep: 1437.93 | bwd_inner_microstep: 1437.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 13:32:19,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1375.63 | bwd_inner_microstep: 1375.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750
[2024-06-10 13:32:21,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.92 | bwd_microstep: 1540.69 | bwd_inner_microstep: 1540.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 13:32:23,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.76 | bwd_microstep: 1380.62 | bwd_inner_microstep: 1380.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 13:32:24,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.26 | bwd_microstep: 1280.50 | bwd_inner_microstep: 1280.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3830
[2024-06-10 13:32:26,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.91 | bwd_microstep: 1470.64 | bwd_inner_microstep: 1470.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2289
[2024-06-10 13:32:28,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.03 | bwd_microstep: 1072.87 | bwd_inner_microstep: 1072.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3609
[2024-06-10 13:32:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.48 | bwd_microstep: 1470.15 | bwd_inner_microstep: 1470.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-10 13:32:32,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.35 | bwd_microstep: 1625.94 | bwd_inner_microstep: 1625.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-10 13:32:34,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.67 | bwd_microstep: 1510.92 | bwd_inner_microstep: 1510.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 13:32:36,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1392.63 | bwd_inner_microstep: 1392.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 13:32:38,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1492.41 | bwd_inner_microstep: 1492.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3707
[2024-06-10 13:32:40,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.36 | bwd_microstep: 1328.81 | bwd_inner_microstep: 1328.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 13:32:42,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1349.75 | bwd_inner_microstep: 1349.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 13:32:44,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.86 | bwd_microstep: 1546.99 | bwd_inner_microstep: 1546.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 13:32:46,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1409.02 | bwd_inner_microstep: 1409.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 13:32:48,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.42 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-10 13:32:48,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.20 | bwd_microstep: 1594.14 | bwd_inner_microstep: 1534.26 | bwd_allreduce_microstep: 59.84 | step_microstep: 39.07
[2024-06-10 13:32:48,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16337.46 | bwd: 43845.20 | bwd_inner: 43784.45 | bwd_allreduce: 60.06 | step: 40.56
 43%|████▎     | 742/1726 [12:50:20<16:46:00, 61.34s/it]
 43%|████▎     | 743/1726 [12:51:21<16:44:47, 61.33s/it]
 43%|████▎     | 744/1726 [12:52:22<16:42:07, 61.23s/it]
 43%|████▎     | 745/1726 [12:53:21<16:29:05, 60.49s/it]
 43%|████▎     | 746/1726 [12:54:24<16:44:32, 61.50s/it]
 43%|████▎     | 747/1726 [12:55:25<16:38:43, 61.
{'loss': 1.1888, 'learning_rate': 2.5266961156287493e-05, 'epoch': 0.43}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1852
[2024-06-10 13:32:49,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.39 | bwd_microstep: 698.73 | bwd_inner_microstep: 698.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3890
[2024-06-10 13:32:51,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.75 | bwd_microstep: 1581.39 | bwd_inner_microstep: 1581.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 13:32:53,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1244.85 | bwd_inner_microstep: 1244.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 13:32:55,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1346.52 | bwd_inner_microstep: 1346.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 13:32:57,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1400.22 | bwd_inner_microstep: 1400.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 13:32:59,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.45 | bwd_microstep: 1249.10 | bwd_inner_microstep: 1249.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 13:33:00,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.41 | bwd_microstep: 1243.19 | bwd_inner_microstep: 1243.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 13:33:02,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.02 | bwd_microstep: 1275.44 | bwd_inner_microstep: 1275.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 13:33:04,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1382.95 | bwd_inner_microstep: 1382.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1390
[2024-06-10 13:33:05,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.14 | bwd_microstep: 525.77 | bwd_inner_microstep: 525.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 13:33:07,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1377.82 | bwd_inner_microstep: 1377.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645
[2024-06-10 13:33:09,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.02 | bwd_microstep: 1425.76 | bwd_inner_microstep: 1425.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424
[2024-06-10 13:33:10,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.46 | bwd_microstep: 1277.63 | bwd_inner_microstep: 1277.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3431
[2024-06-10 13:33:12,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1489.18 | bwd_inner_microstep: 1489.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3842
[2024-06-10 13:33:15,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.44 | bwd_microstep: 1621.15 | bwd_inner_microstep: 1621.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2472
[2024-06-10 13:33:16,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.27 | bwd_microstep: 989.83 | bwd_inner_microstep: 989.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 13:33:18,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1294.68 | bwd_inner_microstep: 1294.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-10 13:33:19,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.14 | bwd_microstep: 801.20 | bwd_inner_microstep: 801.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 13:33:21,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.41 | bwd_microstep: 1373.02 | bwd_inner_microstep: 1372.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 13:33:23,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.37 | bwd_microstep: 1490.15 | bwd_inner_microstep: 1490.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 13:33:25,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.29 | bwd_microstep: 1642.09 | bwd_inner_microstep: 1642.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2040
[2024-06-10 13:33:26,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.92 | bwd_microstep: 778.95 | bwd_inner_microstep: 778.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815
[2024-06-10 13:33:28,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.08 | bwd_microstep: 1512.60 | bwd_inner_microstep: 1512.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 13:33:30,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.42 | bwd_microstep: 1506.63 | bwd_inner_microstep: 1506.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2945
[2024-06-10 13:33:32,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.52 | bwd_microstep: 1198.07 | bwd_inner_microstep: 1198.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 13:33:34,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.91 | bwd_microstep: 1530.60 | bwd_inner_microstep: 1530.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558
[2024-06-10 13:33:36,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.22 | bwd_microstep: 1426.49 | bwd_inner_microstep: 1426.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 13:33:38,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.57 | bwd_microstep: 1302.02 | bwd_inner_microstep: 1302.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3738
[2024-06-10 13:33:40,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.28 | bwd_microstep: 1672.69 | bwd_inner_microstep: 1672.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3469
[2024-06-10 13:33:42,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.01 | bwd_microstep: 1455.41 | bwd_inner_microstep: 1455.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3570
[2024-06-10 13:33:44,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.66 | bwd_microstep: 1449.08 | bwd_inner_microstep: 1449.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 13:33:49,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 13:33:49,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.11 | bwd_microstep: 4110.86 | bwd_inner_microstep: 1530.08 | bwd_allreduce_microstep: 2580.72 | step_microstep: 37.93
[2024-06-10 13:33:49,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15727.83 | bwd: 44674.11 | bwd_inner: 42092.43 | bwd_allreduce: 2580.97 | step: 39.42
{'loss': 1.2863, 'learning_rate': 2.523074285562734e-05, 'epoch': 0.43}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518
[2024-06-10 13:33:51,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1578.59 | bwd_inner_microstep: 1578.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3917
[2024-06-10 13:33:53,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.61 | bwd_microstep: 1387.61 | bwd_inner_microstep: 1387.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3469
[2024-06-10 13:33:55,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.62 | bwd_microstep: 1542.90 | bwd_inner_microstep: 1542.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-10 13:33:57,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.32 | bwd_microstep: 1337.03 | bwd_inner_microstep: 1337.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 13:33:59,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.75 | bwd_microstep: 1374.41 | bwd_inner_microstep: 1374.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 13:34:01,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1394.52 | bwd_inner_microstep: 1394.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 13:34:03,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1393.59 | bwd_inner_microstep: 1393.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 13:34:04,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.46 | bwd_microstep: 678.11 | bwd_inner_microstep: 678.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 13:34:05,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.01 | bwd_microstep: 808.47 | bwd_inner_microstep: 808.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3433
[2024-06-10 13:34:07,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.45 | bwd_microstep: 1408.26 | bwd_inner_microstep: 1408.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-10 13:34:08,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.39 | bwd_microstep: 797.37 | bwd_inner_microstep: 797.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3621
[2024-06-10 13:34:10,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.31 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 13:34:12,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1393.36 | bwd_inner_microstep: 1393.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 13:34:14,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.97 | bwd_microstep: 1497.53 | bwd_inner_microstep: 1497.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-10 13:34:15,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.84 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3517
[2024-06-10 13:34:18,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.94 | bwd_microstep: 1682.08 | bwd_inner_microstep: 1682.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3640
[2024-06-10 13:34:20,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.37 | bwd_microstep: 1679.03 | bwd_inner_microstep: 1679.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2023
[2024-06-10 13:34:21,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.90 | bwd_microstep: 841.86 | bwd_inner_microstep: 841.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3708
[2024-06-10 13:34:23,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.78 | bwd_microstep: 1459.00 | bwd_inner_microstep: 1458.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 13:34:25,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1487.72 | bwd_inner_microstep: 1487.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 13:34:27,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.19 | bwd_microstep: 1255.86 | bwd_inner_microstep: 1255.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3830
[2024-06-10 13:34:29,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.58 | bwd_microstep: 1690.69 | bwd_inner_microstep: 1690.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 13:34:31,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.76 | bwd_microstep: 1489.64 | bwd_inner_microstep: 1489.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3826
[2024-06-10 13:34:33,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.87 | bwd_microstep: 1510.88 | bwd_inner_microstep: 1510.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562
[2024-06-10 13:34:36,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.46 | bwd_microstep: 1590.67 | bwd_inner_microstep: 1590.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 13:34:38,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1508.90 | bwd_inner_microstep: 1508.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 13:34:40,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1558.25 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3699
[2024-06-10 13:34:42,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1431.59 | bwd_inner_microstep: 1431.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 13:34:44,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.29 | bwd_microstep: 1404.12 | bwd_inner_microstep: 1404.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 13:34:46,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.31 | bwd_microstep: 1397.67 | bwd_inner_microstep: 1397.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3624
[2024-06-10 13:34:48,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.49 | bwd_microstep: 1311.86 | bwd_inner_microstep: 1311.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 13:34:50,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.17 | optimizer_step: 6.62
[2024-06-10 13:34:50,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.37 | bwd_microstep: 2314.14 | bwd_inner_microstep: 1520.75 | bwd_allreduce_microstep: 793.35 | step_microstep: 37.84
[2024-06-10 13:34:50,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16359.49 | bwd: 44737.94 | bwd_inner: 43943.68 | bwd_allreduce: 793.57 | step: 39.46
{'loss': 1.2821, 'learning_rate': 2.5194506132318033e-05, 'epoch': 0.43}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 13:34:52,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1371.90 | bwd_inner_microstep: 1371.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 13:34:54,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.12 | bwd_microstep: 1276.82 | bwd_inner_microstep: 1276.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 13:34:56,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.75 | bwd_microstep: 1342.93 | bwd_inner_microstep: 1342.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 13:34:58,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.60 | bwd_microstep: 1384.17 | bwd_inner_microstep: 1384.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-10 13:34:59,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.37 | bwd_microstep: 1148.59 | bwd_inner_microstep: 1148.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 13:35:02,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.47 | bwd_microstep: 1536.56 | bwd_inner_microstep: 1536.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 13:35:04,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.08 | bwd_microstep: 1528.90 | bwd_inner_microstep: 1528.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2660
[2024-06-10 13:35:05,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.71 | bwd_microstep: 1021.80 | bwd_inner_microstep: 1021.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3696
[2024-06-10 13:35:07,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.50 | bwd_microstep: 1456.12 | bwd_inner_microstep: 1456.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 13:35:09,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.39 | bwd_microstep: 1629.08 | bwd_inner_microstep: 1629.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3707
[2024-06-10 13:35:11,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.47 | bwd_microstep: 1487.02 | bwd_inner_microstep: 1486.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3724
[2024-06-10 13:35:14,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.48 | bwd_microstep: 1664.16 | bwd_inner_microstep: 1664.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3678
[2024-06-10 13:35:16,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.46 | bwd_microstep: 1681.67 | bwd_inner_microstep: 1681.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513
[2024-06-10 13:35:18,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1431.23 | bwd_inner_microstep: 1431.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1899
[2024-06-10 13:35:19,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.03 | bwd_microstep: 745.39 | bwd_inner_microstep: 745.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 13:35:21,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.61 | bwd_microstep: 1185.16 | bwd_inner_microstep: 1185.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 13:35:23,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.17 | bwd_microstep: 1389.28 | bwd_inner_microstep: 1389.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 13:35:25,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.50 | bwd_microstep: 1507.19 | bwd_inner_microstep: 1507.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 13:35:27,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.27 | bwd_microstep: 1391.62 | bwd_inner_microstep: 1391.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 13:35:29,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1393.23 | bwd_inner_microstep: 1393.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3613
[2024-06-10 13:35:30,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.72 | bwd_microstep: 1273.47 | bwd_inner_microstep: 1273.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 13:35:33,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.37 | bwd_microstep: 1649.69 | bwd_inner_microstep: 1649.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 13:35:34,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1413.53 | bwd_inner_microstep: 1413.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 13:35:37,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.80 | bwd_microstep: 1476.77 | bwd_inner_microstep: 1476.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3809
[2024-06-10 13:35:38,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.51 | bwd_microstep: 1411.78 | bwd_inner_microstep: 1411.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 13:35:41,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.32 | bwd_microstep: 1638.46 | bwd_inner_microstep: 1638.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279
[2024-06-10 13:35:42,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.33 | bwd_microstep: 974.90 | bwd_inner_microstep: 974.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 13:35:44,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1404.82 | bwd_inner_microstep: 1404.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 13:35:46,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.73 | bwd_microstep: 1341.82 | bwd_inner_microstep: 1341.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 13:35:48,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 1552.67 | bwd_inner_microstep: 1552.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 13:35:50,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.24 | bwd_microstep: 1545.61 | bwd_inner_microstep: 1545.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570
[2024-06-10 13:35:53,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 13:35:53,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 2429.59 | bwd_inner_microstep: 1794.27 | bwd_allreduce_microstep: 635.26 | step_microstep: 37.72
[2024-06-10 13:35:53,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16755.05 | bwd: 45685.95 | bwd_inner: 45049.79 | bwd_allreduce: 635.49 | step: 39.20
{'loss': 1.2571, 'learning_rate': 2.515825111398514e-05, 'epoch': 0.43}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 13:35:55,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1374.93 | bwd_inner_microstep: 1374.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4213
[2024-06-10 13:35:57,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.75 | bwd_microstep: 1654.15 | bwd_inner_microstep: 1654.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 13:35:58,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.79 | bwd_microstep: 787.28 | bwd_inner_microstep: 787.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 13:36:00,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.26 | bwd_microstep: 1276.59 | bwd_inner_microstep: 1276.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 13:36:02,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.65 | bwd_microstep: 1340.21 | bwd_inner_microstep: 1340.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 13:36:04,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.40 | bwd_microstep: 1248.53 | bwd_inner_microstep: 1248.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2205
[2024-06-10 13:36:05,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.50 | bwd_microstep: 954.28 | bwd_inner_microstep: 954.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 13:36:07,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.13 | bwd_microstep: 1279.25 | bwd_inner_microstep: 1279.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404
[2024-06-10 13:36:09,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.19 | bwd_microstep: 1179.06 | bwd_inner_microstep: 1179.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 13:36:11,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.32 | bwd_microstep: 1529.38 | bwd_inner_microstep: 1529.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 779
[2024-06-10 13:36:11,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.51 | bwd_microstep: 306.86 | bwd_inner_microstep: 306.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3501
[2024-06-10 13:36:13,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.79 | bwd_microstep: 1347.32 | bwd_inner_microstep: 1347.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 13:36:15,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.30 | bwd_microstep: 1385.75 | bwd_inner_microstep: 1385.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 13:36:17,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.75 | bwd_microstep: 1415.78 | bwd_inner_microstep: 1415.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3572
[2024-06-10 13:36:19,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.12 | bwd_microstep: 1447.88 | bwd_inner_microstep: 1447.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-10 13:36:20,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.32 | bwd_microstep: 886.70 | bwd_inner_microstep: 886.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 13:36:21,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.35 | bwd_microstep: 794.87 | bwd_inner_microstep: 794.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3664
[2024-06-10 13:36:23,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1557.45 | bwd_inner_microstep: 1557.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3510
[2024-06-10 13:36:25,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.44 | bwd_microstep: 1509.48 | bwd_inner_microstep: 1509.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 13:36:27,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1489.07 | bwd_inner_microstep: 1489.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3667
[2024-06-10 13:36:29,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.75 | bwd_microstep: 1354.56 | bwd_inner_microstep: 1354.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 13:36:31,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.13 | bwd_microstep: 1156.99 | bwd_inner_microstep: 1156.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-10 13:36:33,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.37 | bwd_microstep: 1580.69 | bwd_inner_microstep: 1580.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 13:36:35,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.32 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 13:36:37,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.94 | bwd_microstep: 1179.99 | bwd_inner_microstep: 1179.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-10 13:36:39,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.59 | bwd_microstep: 1547.42 | bwd_inner_microstep: 1547.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2551
[2024-06-10 13:36:40,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.74 | bwd_microstep: 973.20 | bwd_inner_microstep: 973.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798
[2024-06-10 13:36:42,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.73 | bwd_microstep: 1453.84 | bwd_inner_microstep: 1453.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 13:36:44,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1379.07 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3774
[2024-06-10 13:36:47,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.90 | bwd_microstep: 1846.58 | bwd_inner_microstep: 1846.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 13:36:48,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 1379.13 | bwd_inner_microstep: 1379.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 13:36:56,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 13:36:56,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 7397.26 | bwd_inner_microstep: 1587.60 | bwd_allreduce_microstep: 5809.61 | step_microstep: 38.03
[2024-06-10 13:36:56,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15537.47 | bwd: 47395.00 | bwd_inner: 41584.48 | bwd_allreduce: 5809.84 | step: 39.46
{'loss': 1.2482, 'learning_rate': 2.5121977928318638e-05, 'epoch': 0.44}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 13:36:58,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.51 | bwd_microstep: 1330.18 | bwd_inner_microstep: 1330.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 13:37:00,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.57 | bwd_microstep: 1295.74 | bwd_inner_microstep: 1295.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 13:37:02,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1405.68 | bwd_inner_microstep: 1405.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3821
[2024-06-10 13:37:04,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.83 | bwd_microstep: 1479.86 | bwd_inner_microstep: 1479.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2041
[2024-06-10 13:37:05,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.45 | bwd_microstep: 776.82 | bwd_inner_microstep: 776.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 13:37:07,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.93 | bwd_microstep: 1450.64 | bwd_inner_microstep: 1450.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1954
[2024-06-10 13:37:08,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.39 | bwd_microstep: 774.89 | bwd_inner_microstep: 774.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 13:37:10,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.30 | bwd_microstep: 1285.68 | bwd_inner_microstep: 1285.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2096
[2024-06-10 13:37:11,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.66 | bwd_microstep: 821.13 | bwd_inner_microstep: 821.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-10 13:37:12,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.29 | bwd_microstep: 806.50 | bwd_inner_microstep: 806.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2125
[2024-06-10 13:37:13,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.68 | bwd_microstep: 831.69 | bwd_inner_microstep: 831.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 13:37:15,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.77 | bwd_microstep: 798.62 | bwd_inner_microstep: 798.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3707
[2024-06-10 13:37:17,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.31 | bwd_microstep: 1561.64 | bwd_inner_microstep: 1561.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 13:37:19,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1490.46 | bwd_inner_microstep: 1490.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977
[2024-06-10 13:37:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.00 | bwd_microstep: 831.33 | bwd_inner_microstep: 831.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 13:37:22,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.32 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 13:37:24,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1447.75 | bwd_inner_microstep: 1447.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 13:37:26,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.62 | bwd_microstep: 1472.94 | bwd_inner_microstep: 1472.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 13:37:28,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.67 | bwd_microstep: 1388.87 | bwd_inner_microstep: 1388.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3441
[2024-06-10 13:37:30,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.45 | bwd_microstep: 1408.46 | bwd_inner_microstep: 1408.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435
[2024-06-10 13:37:31,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.74 | bwd_microstep: 1157.11 | bwd_inner_microstep: 1157.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 13:37:33,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.01 | bwd_microstep: 1560.77 | bwd_inner_microstep: 1560.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 13:37:35,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.25 | bwd_microstep: 1299.85 | bwd_inner_microstep: 1299.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 13:37:37,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1556.47 | bwd_inner_microstep: 1556.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 13:37:39,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.29 | bwd_microstep: 1185.14 | bwd_inner_microstep: 1185.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3557
[2024-06-10 13:37:41,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.61 | bwd_microstep: 1202.46 | bwd_inner_microstep: 1202.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 13:37:42,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.10 | bwd_microstep: 697.11 | bwd_inner_microstep: 697.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2246
[2024-06-10 13:37:43,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.37 | bwd_microstep: 901.78 | bwd_inner_microstep: 901.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3484
[2024-06-10 13:37:45,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1330.37 | bwd_inner_microstep: 1330.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3445
[2024-06-10 13:37:47,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.11 | bwd_microstep: 1310.75 | bwd_inner_microstep: 1310.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 13:37:48,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1346.95 | bwd_inner_microstep: 1346.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 13:37:58,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 13:37:58,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 8913.01 | bwd_inner_microstep: 1745.50 | bwd_allreduce_microstep: 7167.46 | step_microstep: 38.17
[2024-06-10 13:37:58,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14721.80 | bwd: 46503.04 | bwd_inner: 39334.68 | bwd_allreduce: 7167.68 | step: 39.61
 43%|████▎     | 747/1726 [12:55:25<16:38:43, 61.21s/it]
 43%|████▎     | 748/1726 [12:56:26<16:35:24, 61.07s/it]
 43%|████▎     | 749/1726 [12:57:27<16:36:13, 61.18s/it]
 43%|████▎     | 750/1726 [12:58:30<16:42:59, 61.66s/it]
 44%|████▎     | 751/1726 [12:59:33<16:49:46, 62.14s/it]
 44%|████▎     | 752/1726 [13:00:35<16:45:50, 61.96s/it]
{'loss': 1.1944, 'learning_rate': 2.5085686703072492e-05, 'epoch': 0.44}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 13:38:00,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.48 | bwd_microstep: 1586.05 | bwd_inner_microstep: 1586.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419
[2024-06-10 13:38:02,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.32 | bwd_microstep: 1276.97 | bwd_inner_microstep: 1276.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4462
[2024-06-10 13:38:04,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 676.17 | bwd_microstep: 1824.18 | bwd_inner_microstep: 1824.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3833
[2024-06-10 13:38:06,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1383.83 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942
[2024-06-10 13:38:07,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.74 | bwd_microstep: 698.54 | bwd_inner_microstep: 698.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3426
[2024-06-10 13:38:09,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.39 | bwd_microstep: 1183.53 | bwd_inner_microstep: 1183.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 13:38:10,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.64 | bwd_microstep: 791.98 | bwd_inner_microstep: 791.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3560
[2024-06-10 13:38:12,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.61 | bwd_microstep: 1201.98 | bwd_inner_microstep: 1201.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2390
[2024-06-10 13:38:13,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.64 | bwd_microstep: 932.18 | bwd_inner_microstep: 932.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-10 13:38:15,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.52 | bwd_microstep: 1423.61 | bwd_inner_microstep: 1423.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3515
[2024-06-10 13:38:17,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.59 | bwd_microstep: 1250.71 | bwd_inner_microstep: 1250.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1908
[2024-06-10 13:38:18,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.86 | bwd_microstep: 720.78 | bwd_inner_microstep: 720.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 13:38:20,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.63 | bwd_microstep: 1448.37 | bwd_inner_microstep: 1448.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 13:38:22,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.04 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2115
[2024-06-10 13:38:23,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.37 | bwd_microstep: 926.46 | bwd_inner_microstep: 926.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520
[2024-06-10 13:38:25,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1446.69 | bwd_inner_microstep: 1446.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 13:38:27,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1486.00 | bwd_inner_microstep: 1485.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-10 13:38:29,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.92 | bwd_microstep: 1319.93 | bwd_inner_microstep: 1319.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3636
[2024-06-10 13:38:31,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.14 | bwd_microstep: 1522.27 | bwd_inner_microstep: 1522.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 979
[2024-06-10 13:38:31,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 150.87 | bwd_microstep: 387.90 | bwd_inner_microstep: 387.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 13:38:33,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.49 | bwd_microstep: 1287.20 | bwd_inner_microstep: 1287.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3961
[2024-06-10 13:38:36,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.84 | bwd_microstep: 1697.19 | bwd_inner_microstep: 1697.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 13:38:37,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.75 | bwd_microstep: 1281.08 | bwd_inner_microstep: 1281.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3898
[2024-06-10 13:38:40,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.61 | bwd_microstep: 1688.90 | bwd_inner_microstep: 1688.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 13:38:42,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.39 | bwd_microstep: 1654.12 | bwd_inner_microstep: 1654.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 13:38:44,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1351.28 | bwd_inner_microstep: 1351.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 13:38:46,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.41 | bwd_microstep: 1355.25 | bwd_inner_microstep: 1355.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 13:38:48,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.72 | bwd_microstep: 1442.76 | bwd_inner_microstep: 1442.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269
[2024-06-10 13:38:49,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.55 | bwd_microstep: 972.36 | bwd_inner_microstep: 972.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2065
[2024-06-10 13:38:50,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.83 | bwd_microstep: 849.80 | bwd_inner_microstep: 849.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2087
[2024-06-10 13:38:51,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.87 | bwd_microstep: 916.05 | bwd_inner_microstep: 916.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 13:38:59,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 13:38:59,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 6743.96 | bwd_inner_microstep: 1698.78 | bwd_allreduce_microstep: 5045.13 | step_microstep: 38.25
[2024-06-10 13:38:59,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15054.68 | bwd: 45433.97 | bwd_inner: 40387.94 | bwd_allreduce: 5045.35 | step: 39.73
{'loss': 1.2414, 'learning_rate': 2.5049377566064226e-05, 'epoch': 0.44}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 13:39:01,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.21 | bwd_microstep: 1437.68 | bwd_inner_microstep: 1437.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 13:39:03,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.67 | bwd_microstep: 1276.61 | bwd_inner_microstep: 1276.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3883
[2024-06-10 13:39:05,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.30 | bwd_microstep: 1581.25 | bwd_inner_microstep: 1581.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 13:39:07,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1501.05 | bwd_inner_microstep: 1501.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2898
[2024-06-10 13:39:08,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.02 | bwd_microstep: 1085.92 | bwd_inner_microstep: 1085.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 13:39:10,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.50 | bwd_microstep: 1246.33 | bwd_inner_microstep: 1246.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728
[2024-06-10 13:39:12,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.68 | bwd_microstep: 1634.22 | bwd_inner_microstep: 1634.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 13:39:14,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1396.97 | bwd_inner_microstep: 1396.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 13:39:16,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.92 | bwd_microstep: 1248.11 | bwd_inner_microstep: 1248.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 13:39:18,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.85 | bwd_microstep: 1286.39 | bwd_inner_microstep: 1286.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654
[2024-06-10 13:39:20,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.69 | bwd_microstep: 1417.83 | bwd_inner_microstep: 1417.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 13:39:21,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1249.75 | bwd_inner_microstep: 1249.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3481
[2024-06-10 13:39:23,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.79 | bwd_microstep: 1439.48 | bwd_inner_microstep: 1439.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957
[2024-06-10 13:39:24,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.19 | bwd_microstep: 699.24 | bwd_inner_microstep: 699.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 13:39:26,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.90 | bwd_microstep: 1522.06 | bwd_inner_microstep: 1522.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 13:39:28,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1336.39 | bwd_inner_microstep: 1336.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 13:39:31,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.59 | bwd_microstep: 1598.65 | bwd_inner_microstep: 1598.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 13:39:33,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.28 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 13:39:35,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1381.23 | bwd_inner_microstep: 1381.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 13:39:36,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.90 | bwd_microstep: 1292.99 | bwd_inner_microstep: 1292.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3554
[2024-06-10 13:39:38,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.67 | bwd_microstep: 1423.37 | bwd_inner_microstep: 1423.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 13:39:40,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.35 | bwd_microstep: 1380.84 | bwd_inner_microstep: 1380.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 13:39:42,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.49 | bwd_microstep: 1392.93 | bwd_inner_microstep: 1392.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-10 13:39:44,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.84 | bwd_microstep: 1306.44 | bwd_inner_microstep: 1306.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 13:39:46,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.53 | bwd_microstep: 1451.11 | bwd_inner_microstep: 1451.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 13:39:48,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.26 | bwd_microstep: 1398.75 | bwd_inner_microstep: 1398.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3699
[2024-06-10 13:39:50,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.69 | bwd_microstep: 1433.36 | bwd_inner_microstep: 1433.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 13:39:52,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.24 | bwd_microstep: 1410.41 | bwd_inner_microstep: 1410.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 13:39:54,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.91 | bwd_microstep: 1448.99 | bwd_inner_microstep: 1448.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2054
[2024-06-10 13:39:55,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.69 | bwd_microstep: 842.64 | bwd_inner_microstep: 842.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3613
[2024-06-10 13:39:57,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.63 | bwd_microstep: 1702.17 | bwd_inner_microstep: 1702.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3776
[2024-06-10 13:40:02,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 13:40:02,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.90 | bwd_microstep: 3535.26 | bwd_inner_microstep: 1931.99 | bwd_allreduce_microstep: 1603.21 | step_microstep: 38.06
[2024-06-10 13:40:02,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16507.86 | bwd: 45914.29 | bwd_inner: 44310.13 | bwd_allreduce: 1603.46 | step: 39.54
{'loss': 1.1912, 'learning_rate': 2.5013050645174414e-05, 'epoch': 0.44}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 13:40:03,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.90 | bwd_microstep: 1365.67 | bwd_inner_microstep: 1365.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 13:40:05,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1243.55 | bwd_inner_microstep: 1243.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904
[2024-06-10 13:40:07,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.70 | bwd_microstep: 1586.54 | bwd_inner_microstep: 1586.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 13:40:09,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.06 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 13:40:11,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.47 | bwd_microstep: 1541.50 | bwd_inner_microstep: 1541.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-10 13:40:12,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.67 | bwd_microstep: 679.56 | bwd_inner_microstep: 679.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 13:40:14,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1383.26 | bwd_inner_microstep: 1383.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 13:40:16,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.85 | bwd_microstep: 1245.49 | bwd_inner_microstep: 1245.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 13:40:18,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.78 | bwd_microstep: 1151.96 | bwd_inner_microstep: 1151.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502
[2024-06-10 13:40:19,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1316.08 | bwd_inner_microstep: 1316.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3502
[2024-06-10 13:40:21,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1431.12 | bwd_inner_microstep: 1431.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 13:40:23,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.63 | bwd_microstep: 1475.73 | bwd_inner_microstep: 1475.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3714
[2024-06-10 13:40:26,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.11 | bwd_microstep: 1673.56 | bwd_inner_microstep: 1673.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3402
[2024-06-10 13:40:28,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.79 | bwd_microstep: 1368.39 | bwd_inner_microstep: 1368.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 13:40:30,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.39 | bwd_microstep: 1498.23 | bwd_inner_microstep: 1498.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-10 13:40:32,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.38 | bwd_microstep: 1586.81 | bwd_inner_microstep: 1586.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3645
[2024-06-10 13:40:34,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.95 | bwd_microstep: 1574.09 | bwd_inner_microstep: 1574.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 13:40:36,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.76 | bwd_microstep: 1391.86 | bwd_inner_microstep: 1391.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694
[2024-06-10 13:40:38,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.24 | bwd_microstep: 1619.16 | bwd_inner_microstep: 1619.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 13:40:40,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.51 | bwd_microstep: 1257.49 | bwd_inner_microstep: 1257.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 13:40:42,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.54 | bwd_microstep: 1457.78 | bwd_inner_microstep: 1457.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3838
[2024-06-10 13:40:44,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.40 | bwd_microstep: 1586.93 | bwd_inner_microstep: 1586.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 13:40:45,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.04 | bwd_microstep: 800.14 | bwd_inner_microstep: 800.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 13:40:47,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.13 | bwd_microstep: 1458.64 | bwd_inner_microstep: 1458.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3561
[2024-06-10 13:40:49,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.78 | bwd_microstep: 1429.72 | bwd_inner_microstep: 1429.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 13:40:51,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.66 | bwd_microstep: 1511.78 | bwd_inner_microstep: 1511.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 13:40:53,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1553.27 | bwd_inner_microstep: 1553.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 13:40:56,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1507.12 | bwd_inner_microstep: 1507.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3461
[2024-06-10 13:40:57,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1425.67 | bwd_inner_microstep: 1425.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650
[2024-06-10 13:41:00,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.94 | bwd_microstep: 1616.95 | bwd_inner_microstep: 1616.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 13:41:02,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.67 | bwd_microstep: 1544.43 | bwd_inner_microstep: 1544.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-10 13:41:05,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.05 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-10 13:41:05,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.75 | bwd_microstep: 2827.84 | bwd_inner_microstep: 1867.35 | bwd_allreduce_microstep: 960.43 | step_microstep: 37.98
[2024-06-10 13:41:05,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16905.63 | bwd: 46491.78 | bwd_inner: 45530.44 | bwd_allreduce: 960.66 | step: 39.49
{'loss': 1.2396, 'learning_rate': 2.4976706068346293e-05, 'epoch': 0.44}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462
[2024-06-10 13:41:07,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.46 | bwd_microstep: 1562.66 | bwd_inner_microstep: 1562.54 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 13:41:10,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1494.68 | bwd_inner_microstep: 1494.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 13:41:11,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.05 | bwd_microstep: 792.24 | bwd_inner_microstep: 792.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837
[2024-06-10 13:41:13,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1451.24 | bwd_inner_microstep: 1451.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2052
[2024-06-10 13:41:14,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.65 | bwd_microstep: 721.79 | bwd_inner_microstep: 721.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 13:41:15,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.78 | bwd_microstep: 1280.16 | bwd_inner_microstep: 1280.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 13:41:17,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 787.72 | bwd_inner_microstep: 787.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 13:41:19,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.39 | bwd_microstep: 1505.80 | bwd_inner_microstep: 1505.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719
[2024-06-10 13:41:21,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.93 | bwd_microstep: 1633.48 | bwd_inner_microstep: 1633.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 13:41:23,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1394.39 | bwd_inner_microstep: 1394.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 13:41:24,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.12 | bwd_microstep: 1250.26 | bwd_inner_microstep: 1250.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 13:41:26,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.53 | bwd_microstep: 1247.28 | bwd_inner_microstep: 1247.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3590
[2024-06-10 13:41:28,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.65 | bwd_microstep: 1366.25 | bwd_inner_microstep: 1366.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 13:41:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.06 | bwd_microstep: 800.05 | bwd_inner_microstep: 800.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1904
[2024-06-10 13:41:30,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.34 | bwd_microstep: 836.66 | bwd_inner_microstep: 836.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 13:41:31,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.89 | bwd_microstep: 793.47 | bwd_inner_microstep: 793.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 13:41:33,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.02 | bwd_microstep: 1184.88 | bwd_inner_microstep: 1184.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 13:41:34,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.80 | bwd_microstep: 975.30 | bwd_inner_microstep: 975.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2306
[2024-06-10 13:41:36,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.31 | bwd_microstep: 980.85 | bwd_inner_microstep: 980.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 13:41:38,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1286.62 | bwd_inner_microstep: 1286.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 13:41:39,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1353.15 | bwd_inner_microstep: 1353.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3608
[2024-06-10 13:41:41,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.72 | bwd_microstep: 1437.06 | bwd_inner_microstep: 1437.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3532
[2024-06-10 13:41:43,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.58 | bwd_microstep: 1323.47 | bwd_inner_microstep: 1323.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4019
[2024-06-10 13:41:46,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1615.94 | bwd_inner_microstep: 1615.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810
[2024-06-10 13:41:48,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.12 | bwd_microstep: 1748.65 | bwd_inner_microstep: 1748.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 13:41:50,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.00 | bwd_microstep: 1586.40 | bwd_inner_microstep: 1586.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3383
[2024-06-10 13:41:52,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.72 | bwd_microstep: 1366.65 | bwd_inner_microstep: 1366.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2065
[2024-06-10 13:41:53,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.06 | bwd_microstep: 915.05 | bwd_inner_microstep: 915.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3673
[2024-06-10 13:41:56,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.11 | bwd_microstep: 1718.58 | bwd_inner_microstep: 1718.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2056
[2024-06-10 13:41:57,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.03 | bwd_microstep: 815.58 | bwd_inner_microstep: 815.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3383
[2024-06-10 13:41:59,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.54 | bwd_microstep: 1434.50 | bwd_inner_microstep: 1434.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3567
[2024-06-10 13:42:04,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 13:42:04,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.25 | bwd_microstep: 5092.94 | bwd_inner_microstep: 1762.98 | bwd_allreduce_microstep: 3329.91 | step_microstep: 38.08
[2024-06-10 13:42:04,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15045.51 | bwd: 43753.77 | bwd_inner: 40422.85 | bwd_allreduce: 3330.20 | step: 39.59
{'loss': 1.274, 'learning_rate': 2.4940343963585267e-05, 'epoch': 0.44}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 13:42:06,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.77 | bwd_microstep: 1233.22 | bwd_inner_microstep: 1233.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3946
[2024-06-10 13:42:08,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.17 | bwd_microstep: 1595.10 | bwd_inner_microstep: 1595.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 13:42:10,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.82 | bwd_microstep: 1241.56 | bwd_inner_microstep: 1241.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3793
[2024-06-10 13:42:12,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.58 | bwd_microstep: 1476.90 | bwd_inner_microstep: 1476.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 13:42:14,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.37 | bwd_microstep: 1479.25 | bwd_inner_microstep: 1479.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3701
[2024-06-10 13:42:16,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.29 | bwd_microstep: 1421.13 | bwd_inner_microstep: 1421.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 13:42:18,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1351.20 | bwd_inner_microstep: 1351.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 13:42:20,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1384.68 | bwd_inner_microstep: 1384.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 13:42:22,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1391.13 | bwd_inner_microstep: 1391.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 13:42:24,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.09 | bwd_microstep: 1343.30 | bwd_inner_microstep: 1343.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 729
[2024-06-10 13:42:24,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.75 | bwd_microstep: 293.28 | bwd_inner_microstep: 293.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454
[2024-06-10 13:42:26,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.75 | bwd_microstep: 1161.16 | bwd_inner_microstep: 1161.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 13:42:27,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.03 | bwd_microstep: 789.02 | bwd_inner_microstep: 789.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 13:42:29,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.46 | bwd_microstep: 1491.39 | bwd_inner_microstep: 1491.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 13:42:31,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 1380.53 | bwd_inner_microstep: 1380.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3423
[2024-06-10 13:42:33,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.72 | bwd_microstep: 1474.63 | bwd_inner_microstep: 1474.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 13:42:35,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1384.41 | bwd_inner_microstep: 1384.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 13:42:37,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.04 | bwd_microstep: 1474.04 | bwd_inner_microstep: 1474.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-10 13:42:38,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.11 | bwd_microstep: 694.95 | bwd_inner_microstep: 694.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1939
[2024-06-10 13:42:39,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.66 | bwd_microstep: 764.75 | bwd_inner_microstep: 764.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 13:42:41,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.07 | bwd_microstep: 1294.35 | bwd_inner_microstep: 1294.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2214
[2024-06-10 13:42:42,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.65 | bwd_microstep: 896.07 | bwd_inner_microstep: 896.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 13:42:44,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.70 | bwd_microstep: 1287.52 | bwd_inner_microstep: 1287.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2287
[2024-06-10 13:42:45,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.50 | bwd_microstep: 910.04 | bwd_inner_microstep: 910.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3803
[2024-06-10 13:42:47,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.26 | bwd_microstep: 1286.72 | bwd_inner_microstep: 1286.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3749
[2024-06-10 13:42:49,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1443.00 | bwd_inner_microstep: 1442.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3472
[2024-06-10 13:42:50,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.51 | bwd_microstep: 1329.23 | bwd_inner_microstep: 1329.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3811
[2024-06-10 13:42:53,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.21 | bwd_microstep: 1584.03 | bwd_inner_microstep: 1584.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-10 13:42:55,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.56 | bwd_microstep: 1436.15 | bwd_inner_microstep: 1436.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 13:42:57,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.37 | bwd_microstep: 1549.40 | bwd_inner_microstep: 1549.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822
[2024-06-10 13:42:59,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.53 | bwd_microstep: 1688.57 | bwd_inner_microstep: 1688.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751
[2024-06-10 13:43:05,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 13:43:05,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.43 | bwd_microstep: 5261.49 | bwd_inner_microstep: 1850.37 | bwd_allreduce_microstep: 3411.07 | step_microstep: 38.13
[2024-06-10 13:43:05,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15450.86 | bwd: 44792.23 | bwd_inner: 41380.25 | bwd_allreduce: 3411.30 | step: 39.62
{'loss': 1.2172, 'learning_rate': 2.490396445895849e-05, 'epoch': 0.44}


 44%|████▎     | 752/1726 [13:00:35<16:45:50, 61.96s/it]
 44%|████▎     | 753/1726 [13:01:36<16:39:11, 61.62s/it]
 44%|████▎     | 754/1726 [13:02:38<16:43:42, 61.96s/it]
 44%|████▎     | 755/1726 [13:03:42<16:51:18, 62.49s/it]
 44%|████▍     | 756/1726 [13:04:41<16:33:57, 61.48s/it]
 44%|████▍     | 757/1726 [13:05:42<16:28:30, 61.21s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 13:43:07,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.93 | bwd_microstep: 1366.33 | bwd_inner_microstep: 1366.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 13:43:08,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.34 | bwd_microstep: 676.17 | bwd_inner_microstep: 676.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 13:43:10,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.16 | bwd_microstep: 1240.92 | bwd_inner_microstep: 1240.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 13:43:11,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.52 | bwd_microstep: 1187.55 | bwd_inner_microstep: 1187.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 13:43:13,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.44 | bwd_microstep: 1277.98 | bwd_inner_microstep: 1277.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729
[2024-06-10 13:43:15,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.61 | bwd_microstep: 1463.85 | bwd_inner_microstep: 1463.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 13:43:17,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.06 | bwd_microstep: 1537.59 | bwd_inner_microstep: 1537.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 13:43:19,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.25 | bwd_microstep: 1281.98 | bwd_inner_microstep: 1281.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 13:43:21,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.03 | bwd_microstep: 1385.29 | bwd_inner_microstep: 1385.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 958
[2024-06-10 13:43:21,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 157.96 | bwd_microstep: 413.09 | bwd_inner_microstep: 413.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 13:43:23,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1247.30 | bwd_inner_microstep: 1247.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 13:43:25,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1248.08 | bwd_inner_microstep: 1248.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 13:43:27,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.13 | bwd_microstep: 1390.19 | bwd_inner_microstep: 1390.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2881
[2024-06-10 13:43:28,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.85 | bwd_microstep: 1212.04 | bwd_inner_microstep: 1212.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2143
[2024-06-10 13:43:30,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.93 | bwd_microstep: 1024.79 | bwd_inner_microstep: 1024.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988
[2024-06-10 13:43:31,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.94 | bwd_microstep: 831.00 | bwd_inner_microstep: 830.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 13:43:33,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1484.65 | bwd_inner_microstep: 1484.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 13:43:35,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.66 | bwd_microstep: 1406.00 | bwd_inner_microstep: 1405.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 13:43:37,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.69 | bwd_microstep: 1649.03 | bwd_inner_microstep: 1649.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 13:43:39,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1386.97 | bwd_inner_microstep: 1386.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 13:43:41,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.34 | bwd_microstep: 1553.52 | bwd_inner_microstep: 1553.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769
[2024-06-10 13:43:43,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.14 | bwd_microstep: 1437.40 | bwd_inner_microstep: 1437.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 13:43:45,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1382.26 | bwd_inner_microstep: 1382.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465
[2024-06-10 13:43:47,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.97 | bwd_microstep: 1179.50 | bwd_inner_microstep: 1179.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 13:43:49,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.03 | bwd_microstep: 1508.28 | bwd_inner_microstep: 1508.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511
[2024-06-10 13:43:51,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.98 | bwd_microstep: 1320.98 | bwd_inner_microstep: 1320.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 13:43:53,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1372.19 | bwd_inner_microstep: 1372.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050
[2024-06-10 13:43:54,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.11 | bwd_microstep: 908.41 | bwd_inner_microstep: 908.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 13:43:56,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.74 | bwd_microstep: 1504.43 | bwd_inner_microstep: 1504.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 13:43:58,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.59 | bwd_microstep: 1643.25 | bwd_inner_microstep: 1643.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2028
[2024-06-10 13:43:59,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.06 | bwd_microstep: 905.54 | bwd_inner_microstep: 905.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3771
[2024-06-10 13:44:05,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 13:44:05,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.17 | bwd_microstep: 4989.04 | bwd_inner_microstep: 1908.63 | bwd_allreduce_microstep: 3080.35 | step_microstep: 38.25
[2024-06-10 13:44:05,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15407.61 | bwd: 44415.64 | bwd_inner: 41334.38 | bwd_allreduce: 3080.59 | step: 39.72
{'loss': 1.2921, 'learning_rate': 2.4867567682594374e-05, 'epoch': 0.44}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3472
[2024-06-10 13:44:07,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.59 | bwd_microstep: 1428.92 | bwd_inner_microstep: 1428.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3931
[2024-06-10 13:44:09,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.92 | bwd_microstep: 1589.24 | bwd_inner_microstep: 1589.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 13:44:11,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1345.02 | bwd_inner_microstep: 1345.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-10 13:44:13,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.65 | bwd_microstep: 1535.73 | bwd_inner_microstep: 1535.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2205
[2024-06-10 13:44:14,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.61 | bwd_microstep: 856.10 | bwd_inner_microstep: 856.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 13:44:16,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.00 | bwd_microstep: 790.17 | bwd_inner_microstep: 790.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108
[2024-06-10 13:44:17,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.36 | bwd_microstep: 825.02 | bwd_inner_microstep: 824.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 13:44:18,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1247.35 | bwd_inner_microstep: 1247.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 13:44:20,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 1386.72 | bwd_inner_microstep: 1386.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3676
[2024-06-10 13:44:22,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.79 | bwd_microstep: 1452.17 | bwd_inner_microstep: 1452.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3543
[2024-06-10 13:44:24,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1422.82 | bwd_inner_microstep: 1422.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3508
[2024-06-10 13:44:26,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1445.19 | bwd_inner_microstep: 1445.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 13:44:28,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.77 | bwd_microstep: 1340.48 | bwd_inner_microstep: 1340.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 13:44:30,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.72 | bwd_microstep: 1347.61 | bwd_inner_microstep: 1347.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2130
[2024-06-10 13:44:31,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.98 | bwd_microstep: 926.55 | bwd_inner_microstep: 926.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3433
[2024-06-10 13:44:33,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.36 | bwd_microstep: 1399.78 | bwd_inner_microstep: 1399.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837
[2024-06-10 13:44:36,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.57 | bwd_microstep: 1659.32 | bwd_inner_microstep: 1659.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 13:44:37,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1255.52 | bwd_inner_microstep: 1255.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-10 13:44:39,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.17 | bwd_microstep: 1183.37 | bwd_inner_microstep: 1183.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2282
[2024-06-10 13:44:40,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.61 | bwd_microstep: 812.43 | bwd_inner_microstep: 812.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 962
[2024-06-10 13:44:41,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 160.35 | bwd_microstep: 416.05 | bwd_inner_microstep: 416.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 13:44:43,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.94 | bwd_microstep: 1431.67 | bwd_inner_microstep: 1431.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3480
[2024-06-10 13:44:44,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.15 | bwd_microstep: 1327.64 | bwd_inner_microstep: 1327.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 13:44:47,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.04 | bwd_microstep: 1511.36 | bwd_inner_microstep: 1511.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 13:44:48,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.01 | bwd_microstep: 1182.12 | bwd_inner_microstep: 1182.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276
[2024-06-10 13:44:50,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.14 | bwd_microstep: 1071.29 | bwd_inner_microstep: 1071.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 13:44:51,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.15 | bwd_microstep: 969.07 | bwd_inner_microstep: 969.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-10 13:44:53,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.26 | bwd_microstep: 1631.42 | bwd_inner_microstep: 1631.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2190
[2024-06-10 13:44:54,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.96 | bwd_microstep: 855.35 | bwd_inner_microstep: 855.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 13:44:56,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.69 | bwd_microstep: 1444.76 | bwd_inner_microstep: 1444.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3777
[2024-06-10 13:44:59,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.79 | bwd_microstep: 1678.06 | bwd_inner_microstep: 1678.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-10 13:45:07,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.34 | optimizer_step: 6.61
[2024-06-10 13:45:07,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.50 | bwd_microstep: 7743.36 | bwd_inner_microstep: 1807.64 | bwd_allreduce_microstep: 5935.66 | step_microstep: 38.91
[2024-06-10 13:45:07,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15108.00 | bwd: 46511.71 | bwd_inner: 40575.14 | bwd_allreduce: 5935.90 | step: 40.35
{'loss': 1.2285, 'learning_rate': 2.4831153762682186e-05, 'epoch': 0.44}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3470
[2024-06-10 13:45:09,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.01 | bwd_microstep: 1559.49 | bwd_inner_microstep: 1559.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3886
[2024-06-10 13:45:11,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.25 | bwd_microstep: 1580.76 | bwd_inner_microstep: 1580.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 13:45:13,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1381.27 | bwd_inner_microstep: 1381.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 13:45:16,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.02 | bwd_microstep: 1634.23 | bwd_inner_microstep: 1634.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 777
[2024-06-10 13:45:16,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.35 | bwd_microstep: 305.98 | bwd_inner_microstep: 305.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 13:45:18,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.58 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 13:45:20,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.43 | bwd_microstep: 1393.30 | bwd_inner_microstep: 1393.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2083
[2024-06-10 13:45:21,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.82 | bwd_microstep: 821.85 | bwd_inner_microstep: 821.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2079
[2024-06-10 13:45:22,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.68 | bwd_microstep: 821.39 | bwd_inner_microstep: 821.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1904
[2024-06-10 13:45:23,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.16 | bwd_microstep: 714.06 | bwd_inner_microstep: 714.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3735
[2024-06-10 13:45:25,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.74 | bwd_microstep: 1483.96 | bwd_inner_microstep: 1483.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493
[2024-06-10 13:45:27,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1405.74 | bwd_inner_microstep: 1405.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960
[2024-06-10 13:45:28,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.86 | bwd_microstep: 824.93 | bwd_inner_microstep: 824.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663
[2024-06-10 13:45:31,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.43 | bwd_microstep: 1714.00 | bwd_inner_microstep: 1713.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3494
[2024-06-10 13:45:33,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.52 | bwd_microstep: 1547.53 | bwd_inner_microstep: 1547.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526
[2024-06-10 13:45:34,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.54 | bwd_microstep: 1194.47 | bwd_inner_microstep: 1194.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640
[2024-06-10 13:45:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.98 | bwd_microstep: 1313.07 | bwd_inner_microstep: 1313.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 13:45:38,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.59 | bwd_microstep: 1527.89 | bwd_inner_microstep: 1527.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700
[2024-06-10 13:45:40,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.08 | bwd_microstep: 1431.16 | bwd_inner_microstep: 1431.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 13:45:42,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1394.94 | bwd_inner_microstep: 1394.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2298
[2024-06-10 13:45:43,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.01 | bwd_microstep: 908.61 | bwd_inner_microstep: 908.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289
[2024-06-10 13:45:45,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.50 | bwd_microstep: 875.06 | bwd_inner_microstep: 875.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-10 13:45:46,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.78 | bwd_microstep: 808.72 | bwd_inner_microstep: 808.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1887
[2024-06-10 13:45:47,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.08 | bwd_microstep: 681.97 | bwd_inner_microstep: 681.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692
[2024-06-10 13:45:49,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1454.52 | bwd_inner_microstep: 1454.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716
[2024-06-10 13:45:51,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.32 | bwd_microstep: 1465.65 | bwd_inner_microstep: 1465.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3854
[2024-06-10 13:45:53,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.57 | bwd_microstep: 1555.32 | bwd_inner_microstep: 1555.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 13:45:55,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.80 | bwd_microstep: 1647.10 | bwd_inner_microstep: 1647.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2439
[2024-06-10 13:45:57,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.30 | bwd_microstep: 1044.07 | bwd_inner_microstep: 1044.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661
[2024-06-10 13:45:59,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.20 | bwd_microstep: 1618.88 | bwd_inner_microstep: 1618.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3583
[2024-06-10 13:46:01,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.73 | bwd_microstep: 1559.91 | bwd_inner_microstep: 1559.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815
[2024-06-10 13:46:10,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.56 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 13:46:10,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.51 | bwd_microstep: 7945.97 | bwd_inner_microstep: 1987.57 | bwd_allreduce_microstep: 5958.35 | step_microstep: 38.76
[2024-06-10 13:46:10,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15240.75 | bwd: 46996.61 | bwd_inner: 41037.34 | bwd_allreduce: 5958.57 | step: 40.26
{'loss': 1.2265, 'learning_rate': 2.479472282747157e-05, 'epoch': 0.44}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 13:46:12,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.06 | bwd_microstep: 1331.48 | bwd_inner_microstep: 1331.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4140
[2024-06-10 13:46:14,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.35 | bwd_microstep: 1635.25 | bwd_inner_microstep: 1635.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3946
[2024-06-10 13:46:16,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.61 | bwd_microstep: 1494.75 | bwd_inner_microstep: 1494.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 13:46:18,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.12 | bwd_microstep: 1546.77 | bwd_inner_microstep: 1546.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3766
[2024-06-10 13:46:20,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.92 | bwd_microstep: 1602.46 | bwd_inner_microstep: 1602.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 13:46:22,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1287.19 | bwd_inner_microstep: 1287.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2456
[2024-06-10 13:46:23,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.11 | bwd_microstep: 950.31 | bwd_inner_microstep: 950.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-10 13:46:24,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.04 | bwd_microstep: 699.38 | bwd_inner_microstep: 699.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3953
[2024-06-10 13:46:26,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.32 | bwd_microstep: 1598.36 | bwd_inner_microstep: 1598.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3449
[2024-06-10 13:46:29,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.32 | bwd_microstep: 1542.73 | bwd_inner_microstep: 1542.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 13:46:31,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1507.44 | bwd_inner_microstep: 1507.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-10 13:46:32,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.47 | bwd_microstep: 886.12 | bwd_inner_microstep: 886.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517
[2024-06-10 13:46:34,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.87 | bwd_microstep: 1583.89 | bwd_inner_microstep: 1583.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 13:46:36,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1483.46 | bwd_inner_microstep: 1483.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3504
[2024-06-10 13:46:38,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.28 | bwd_microstep: 1436.86 | bwd_inner_microstep: 1436.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 13:46:40,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.75 | bwd_microstep: 1514.49 | bwd_inner_microstep: 1514.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 13:46:42,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1491.75 | bwd_inner_microstep: 1491.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 13:46:44,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1398.31 | bwd_inner_microstep: 1398.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297
[2024-06-10 13:46:45,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.12 | bwd_microstep: 877.20 | bwd_inner_microstep: 877.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 13:46:47,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.50 | bwd_microstep: 1552.81 | bwd_inner_microstep: 1552.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 13:46:49,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.02 | bwd_microstep: 1256.16 | bwd_inner_microstep: 1256.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 13:46:52,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.21 | bwd_microstep: 1656.14 | bwd_inner_microstep: 1656.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3810
[2024-06-10 13:46:54,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.47 | bwd_microstep: 1679.71 | bwd_inner_microstep: 1679.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3687
[2024-06-10 13:46:56,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.82 | bwd_microstep: 1458.45 | bwd_inner_microstep: 1458.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736
[2024-06-10 13:46:58,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.93 | bwd_microstep: 1631.21 | bwd_inner_microstep: 1631.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3569
[2024-06-10 13:47:00,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.64 | bwd_microstep: 1526.71 | bwd_inner_microstep: 1526.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3718
[2024-06-10 13:47:02,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 1563.16 | bwd_inner_microstep: 1563.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3546
[2024-06-10 13:47:04,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.52 | bwd_microstep: 1559.25 | bwd_inner_microstep: 1559.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 13:47:07,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1498.24 | bwd_inner_microstep: 1498.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459
[2024-06-10 13:47:09,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.14 | bwd_microstep: 1435.12 | bwd_inner_microstep: 1435.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799
[2024-06-10 13:47:11,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.77 | bwd_microstep: 1450.29 | bwd_inner_microstep: 1450.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 13:47:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.16 | optimizer_step: 6.63
[2024-06-10 13:47:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.25 | bwd_microstep: 1536.95 | bwd_inner_microstep: 1529.25 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.71
[2024-06-10 13:47:13,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16992.06 | bwd: 45672.42 | bwd_inner: 45663.88 | bwd_allreduce: 7.87 | step: 39.18
{'loss': 1.1662, 'learning_rate': 2.4758275005272073e-05, 'epoch': 0.44}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 13:47:15,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.55 | bwd_microstep: 1474.60 | bwd_inner_microstep: 1474.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1864
[2024-06-10 13:47:16,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.69 | bwd_microstep: 737.93 | bwd_inner_microstep: 737.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 13:47:17,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.13 | bwd_microstep: 1253.51 | bwd_inner_microstep: 1253.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3784
[2024-06-10 13:47:20,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.70 | bwd_microstep: 1492.96 | bwd_inner_microstep: 1492.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 13:47:21,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.62 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2314
[2024-06-10 13:47:23,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.19 | bwd_microstep: 980.88 | bwd_inner_microstep: 980.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 13:47:24,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.46 | bwd_microstep: 1246.26 | bwd_inner_microstep: 1246.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 13:47:26,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1386.09 | bwd_inner_microstep: 1386.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 13:47:28,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.10 | bwd_microstep: 1394.57 | bwd_inner_microstep: 1394.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 13:47:30,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.25 | bwd_microstep: 1254.18 | bwd_inner_microstep: 1254.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 13:47:32,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.27 | bwd_microstep: 1253.48 | bwd_inner_microstep: 1253.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3443
[2024-06-10 13:47:34,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.54 | bwd_microstep: 1397.09 | bwd_inner_microstep: 1397.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 13:47:35,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.55 | bwd_microstep: 1340.72 | bwd_inner_microstep: 1340.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424
[2024-06-10 13:47:38,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.05 | bwd_microstep: 1537.33 | bwd_inner_microstep: 1537.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3507
[2024-06-10 13:47:40,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.16 | bwd_microstep: 1436.94 | bwd_inner_microstep: 1436.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3625
[2024-06-10 13:47:42,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.32 | bwd_microstep: 1577.42 | bwd_inner_microstep: 1577.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 13:47:43,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1278.87 | bwd_inner_microstep: 1278.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 13:47:45,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1350.05 | bwd_inner_microstep: 1350.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 13:47:47,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.33 | bwd_microstep: 1509.90 | bwd_inner_microstep: 1509.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 13:47:49,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1286.70 | bwd_inner_microstep: 1286.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 13:47:51,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.40 | bwd_microstep: 1376.36 | bwd_inner_microstep: 1376.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 13:47:53,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.15 | bwd_microstep: 1157.02 | bwd_inner_microstep: 1157.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449
[2024-06-10 13:47:54,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.54 | bwd_microstep: 1186.68 | bwd_inner_microstep: 1186.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3544
[2024-06-10 13:47:57,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.05 | bwd_microstep: 1592.29 | bwd_inner_microstep: 1592.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3810
[2024-06-10 13:47:59,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.41 | bwd_microstep: 1668.68 | bwd_inner_microstep: 1668.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3825
[2024-06-10 13:48:01,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.19 | bwd_microstep: 1415.22 | bwd_inner_microstep: 1415.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 45, images per sample: 11.25, dynamic token length: 3813
[2024-06-10 13:48:03,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.07 | bwd_microstep: 1734.73 | bwd_inner_microstep: 1734.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-10 13:48:05,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.25 | bwd_microstep: 1589.79 | bwd_inner_microstep: 1589.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 13:48:08,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.89 | bwd_microstep: 1597.59 | bwd_inner_microstep: 1597.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3818
[2024-06-10 13:48:10,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.09 | bwd_microstep: 1505.81 | bwd_inner_microstep: 1505.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413
[2024-06-10 13:48:12,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.51 | bwd_microstep: 1442.08 | bwd_inner_microstep: 1442.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3580
[2024-06-10 13:48:16,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.14 | optimizer_step: 6.61
[2024-06-10 13:48:16,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.18 | bwd_microstep: 3213.51 | bwd_inner_microstep: 1921.68 | bwd_allreduce_microstep: 1291.78 | step_microstep: 37.55
[2024-06-10 13:48:16,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16591.85 | bwd: 45916.71 | bwd_inner: 44624.02 | bwd_allreduce: 1292.01 | step: 39.08
{'loss': 1.2488, 'learning_rate': 2.472181042445274e-05, 'epoch': 0.44}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 13:48:17,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1332.02 | bwd_inner_microstep: 1332.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2364
[2024-06-10 13:48:19,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.12 | bwd_microstep: 987.57 | bwd_inner_microstep: 987.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-10 13:48:20,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.04 | bwd_microstep: 785.44 | bwd_inner_microstep: 785.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 13:48:22,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.63 | bwd_microstep: 1479.68 | bwd_inner_microstep: 1479.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2672
[2024-06-10 13:48:23,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.42 | bwd_microstep: 976.96 | bwd_inner_microstep: 976.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1931
[2024-06-10 13:48:24,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.68 | bwd_microstep: 727.27 | bwd_inner_microstep: 727.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-10 13:48:25,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.45 | bwd_microstep: 703.99 | bwd_inner_microstep: 703.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750
[2024-06-10 13:48:27,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.81 | bwd_microstep: 1441.97 | bwd_inner_microstep: 1441.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-10 13:48:29,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.27 | bwd_microstep: 1446.33 | bwd_inner_microstep: 1446.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1862
[2024-06-10 13:48:30,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.10 | bwd_microstep: 705.76 | bwd_inner_microstep: 705.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 13:48:32,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1249.95 | bwd_inner_microstep: 1249.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500
[2024-06-10 13:48:34,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1582.63 | bwd_inner_microstep: 1582.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906
[2024-06-10 13:48:35,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.58 | bwd_microstep: 775.28 | bwd_inner_microstep: 775.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1919
[2024-06-10 13:48:36,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.63 | bwd_microstep: 778.56 | bwd_inner_microstep: 778.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3911
[2024-06-10 13:48:38,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.11 | bwd_microstep: 1597.68 | bwd_inner_microstep: 1597.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 13:48:41,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1554.88 | bwd_inner_microstep: 1554.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 13:48:42,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.87 | bwd_microstep: 791.93 | bwd_inner_microstep: 791.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 13:48:44,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.17 | bwd_microstep: 1491.31 | bwd_inner_microstep: 1491.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816
[2024-06-10 13:48:46,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.18 | bwd_microstep: 1517.33 | bwd_inner_microstep: 1517.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 13:48:48,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.05 | bwd_microstep: 1314.22 | bwd_inner_microstep: 1314.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3839
[2024-06-10 13:48:50,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.56 | bwd_microstep: 1360.56 | bwd_inner_microstep: 1360.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 13:48:52,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.94 | bwd_microstep: 1510.09 | bwd_inner_microstep: 1510.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614
[2024-06-10 13:48:54,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.76 | bwd_microstep: 1612.80 | bwd_inner_microstep: 1612.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2271
[2024-06-10 13:48:55,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.30 | bwd_microstep: 905.79 | bwd_inner_microstep: 905.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 13:48:57,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.40 | bwd_microstep: 1295.13 | bwd_inner_microstep: 1295.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493
[2024-06-10 13:48:59,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1317.62 | bwd_inner_microstep: 1317.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3639
[2024-06-10 13:49:01,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.68 | bwd_microstep: 1437.00 | bwd_inner_microstep: 1436.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 13:49:03,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.05 | bwd_microstep: 1489.99 | bwd_inner_microstep: 1489.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3438
[2024-06-10 13:49:05,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.16 | bwd_microstep: 1403.96 | bwd_inner_microstep: 1403.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1889
[2024-06-10 13:49:06,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.87 | bwd_microstep: 712.42 | bwd_inner_microstep: 712.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-10 13:49:08,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.26 | bwd_microstep: 1604.63 | bwd_inner_microstep: 1604.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-10 13:49:18,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 13:49:18,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.27 | bwd_microstep: 9239.56 | bwd_inner_microstep: 1993.17 | bwd_allreduce_microstep: 7246.34 | step_microstep: 38.24
[2024-06-10 13:49:18,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14827.26 | bwd: 47130.33 | bwd_inner: 39883.09 | bwd_allreduce: 7246.57 | step: 39.70
 44%|████▍     | 758/1726 [13:06:42<16:22:21, 60.89s/it]
 44%|████▍     | 759/1726 [13:07:44<16:26:26, 61.21s/it]
 44%|████▍     | 760/1726 [13:08:46<16:32:00, 61.62s/it]
 44%|████▍     | 761/1726 [13:09:49<16:37:43, 62.03s/it]
 44%|████▍     | 762/1726 [13:10:52<16:40:36, 62.28s/it]
 44%|████▍     | 763/1726 [13:11:55<16:39:35, 62.
{'loss': 1.1918, 'learning_rate': 2.4685329213441645e-05, 'epoch': 0.44}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 13:49:20,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1467.84 | bwd_inner_microstep: 1467.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3941
[2024-06-10 13:49:22,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1494.77 | bwd_inner_microstep: 1494.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3863
[2024-06-10 13:49:24,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1392.86 | bwd_inner_microstep: 1392.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3776
[2024-06-10 13:49:26,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.61 | bwd_microstep: 1400.87 | bwd_inner_microstep: 1400.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 13:49:27,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.92 | bwd_microstep: 971.45 | bwd_inner_microstep: 971.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 13:49:29,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1383.92 | bwd_inner_microstep: 1383.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 13:49:31,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1379.93 | bwd_inner_microstep: 1379.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 13:49:33,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1386.66 | bwd_inner_microstep: 1386.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 13:49:35,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.97 | bwd_microstep: 1438.33 | bwd_inner_microstep: 1438.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3631
[2024-06-10 13:49:37,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1345.52 | bwd_inner_microstep: 1345.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-10 13:49:39,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.29 | bwd_microstep: 1333.87 | bwd_inner_microstep: 1333.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2388
[2024-06-10 13:49:40,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.99 | bwd_microstep: 1028.25 | bwd_inner_microstep: 1028.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3658
[2024-06-10 13:49:42,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.03 | bwd_microstep: 1545.28 | bwd_inner_microstep: 1545.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 13:49:44,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.63 | bwd_microstep: 1486.94 | bwd_inner_microstep: 1486.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3055
[2024-06-10 13:49:46,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.36 | bwd_microstep: 1137.75 | bwd_inner_microstep: 1137.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3629
[2024-06-10 13:49:48,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.98 | bwd_microstep: 1485.54 | bwd_inner_microstep: 1485.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1996
[2024-06-10 13:49:49,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.44 | bwd_microstep: 899.71 | bwd_inner_microstep: 899.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3525
[2024-06-10 13:49:51,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.15 | bwd_microstep: 1195.91 | bwd_inner_microstep: 1195.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 13:49:53,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.83 | bwd_microstep: 1430.13 | bwd_inner_microstep: 1430.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 13:49:55,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.26 | bwd_microstep: 1616.60 | bwd_inner_microstep: 1616.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2140
[2024-06-10 13:49:56,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.33 | bwd_microstep: 931.04 | bwd_inner_microstep: 931.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461
[2024-06-10 13:49:58,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1311.12 | bwd_inner_microstep: 1311.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3806
[2024-06-10 13:50:00,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.64 | bwd_microstep: 1685.88 | bwd_inner_microstep: 1685.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 13:50:02,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.06 | bwd_microstep: 1451.61 | bwd_inner_microstep: 1451.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808
[2024-06-10 13:50:04,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.37 | bwd_microstep: 1351.70 | bwd_inner_microstep: 1351.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072
[2024-06-10 13:50:05,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.91 | bwd_microstep: 915.90 | bwd_inner_microstep: 915.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 13:50:07,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1437.48 | bwd_inner_microstep: 1437.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3683
[2024-06-10 13:50:09,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.41 | bwd_microstep: 1279.29 | bwd_inner_microstep: 1279.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2067
[2024-06-10 13:50:10,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.62 | bwd_microstep: 917.32 | bwd_inner_microstep: 917.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 13:50:12,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.25 | bwd_microstep: 1356.97 | bwd_inner_microstep: 1356.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 13:50:13,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.71 | bwd_microstep: 814.43 | bwd_inner_microstep: 814.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3576
[2024-06-10 13:50:19,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.19 | optimizer_step: 6.58
[2024-06-10 13:50:19,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.80 | bwd_microstep: 5371.96 | bwd_inner_microstep: 1605.85 | bwd_allreduce_microstep: 3766.05 | step_microstep: 37.90
[2024-06-10 13:50:19,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15615.19 | bwd: 45646.86 | bwd_inner: 41879.91 | bwd_allreduce: 3766.28 | step: 39.43
{'loss': 1.2621, 'learning_rate': 2.4648831500725416e-05, 'epoch': 0.44}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 13:50:21,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.67 | bwd_microstep: 1481.06 | bwd_inner_microstep: 1481.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 13:50:24,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.32 | bwd_microstep: 1494.98 | bwd_inner_microstep: 1494.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 13:50:26,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.93 | bwd_microstep: 1640.59 | bwd_inner_microstep: 1640.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 13:50:28,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.71 | bwd_microstep: 1640.67 | bwd_inner_microstep: 1640.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 13:50:29,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.57 | bwd_microstep: 799.07 | bwd_inner_microstep: 799.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3598
[2024-06-10 13:50:31,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.22 | bwd_microstep: 1309.41 | bwd_inner_microstep: 1309.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 13:50:33,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.01 | bwd_microstep: 1282.62 | bwd_inner_microstep: 1282.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3703
[2024-06-10 13:50:35,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.63 | bwd_microstep: 1450.68 | bwd_inner_microstep: 1450.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2176
[2024-06-10 13:50:36,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.04 | bwd_microstep: 984.87 | bwd_inner_microstep: 984.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 13:50:38,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1477.76 | bwd_inner_microstep: 1477.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3545
[2024-06-10 13:50:40,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.81 | bwd_microstep: 1462.15 | bwd_inner_microstep: 1462.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3517
[2024-06-10 13:50:42,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.53 | bwd_microstep: 1556.44 | bwd_inner_microstep: 1556.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 13:50:44,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.93 | bwd_microstep: 1494.35 | bwd_inner_microstep: 1494.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 13:50:45,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.67 | bwd_microstep: 792.77 | bwd_inner_microstep: 792.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 13:50:48,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1494.72 | bwd_inner_microstep: 1494.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 13:50:50,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.37 | bwd_microstep: 1614.72 | bwd_inner_microstep: 1614.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655
[2024-06-10 13:50:52,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.52 | bwd_microstep: 1425.45 | bwd_inner_microstep: 1425.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 13:50:53,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.29 | bwd_microstep: 798.65 | bwd_inner_microstep: 798.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2087
[2024-06-10 13:50:54,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.42 | bwd_microstep: 756.08 | bwd_inner_microstep: 756.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2192
[2024-06-10 13:50:55,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.74 | bwd_microstep: 765.11 | bwd_inner_microstep: 765.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 13:50:57,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.67 | bwd_microstep: 1510.74 | bwd_inner_microstep: 1510.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2161
[2024-06-10 13:50:58,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.39 | bwd_microstep: 850.51 | bwd_inner_microstep: 850.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 13:51:00,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.09 | bwd_microstep: 1656.04 | bwd_inner_microstep: 1656.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 13:51:02,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.94 | bwd_microstep: 1300.24 | bwd_inner_microstep: 1300.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 13:51:04,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 1556.25 | bwd_inner_microstep: 1556.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 13:51:06,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1409.56 | bwd_inner_microstep: 1409.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2489
[2024-06-10 13:51:08,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.33 | bwd_microstep: 1026.61 | bwd_inner_microstep: 1026.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2065
[2024-06-10 13:51:09,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.32 | bwd_microstep: 753.78 | bwd_inner_microstep: 753.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3813
[2024-06-10 13:51:11,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.04 | bwd_microstep: 1476.23 | bwd_inner_microstep: 1476.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3572
[2024-06-10 13:51:13,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.32 | bwd_microstep: 1455.20 | bwd_inner_microstep: 1455.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3380
[2024-06-10 13:51:15,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.71 | bwd_microstep: 1242.34 | bwd_inner_microstep: 1242.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1897
[2024-06-10 13:51:23,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-10 13:51:23,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.58 | bwd_microstep: 7815.33 | bwd_inner_microstep: 851.08 | bwd_allreduce_microstep: 6964.19 | step_microstep: 38.24
[2024-06-10 13:51:23,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15225.18 | bwd: 47775.02 | bwd_inner: 40809.91 | bwd_allreduce: 6964.43 | step: 39.92
{'loss': 1.2521, 'learning_rate': 2.4612317414848804e-05, 'epoch': 0.44}
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3396
[2024-06-10 13:51:25,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.38 | bwd_microstep: 1310.37 | bwd_inner_microstep: 1310.21 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3920
[2024-06-10 13:51:27,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 1586.16 | bwd_inner_microstep: 1586.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411
[2024-06-10 13:51:28,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.30 | bwd_microstep: 1179.52 | bwd_inner_microstep: 1179.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2258
[2024-06-10 13:51:30,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.79 | bwd_microstep: 964.11 | bwd_inner_microstep: 964.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 13:51:32,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1384.17 | bwd_inner_microstep: 1384.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 13:51:33,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.88 | bwd_microstep: 1242.99 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 13:51:35,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.40 | bwd_microstep: 1242.90 | bwd_inner_microstep: 1242.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2703
[2024-06-10 13:51:37,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.27 | bwd_microstep: 1031.50 | bwd_inner_microstep: 1031.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3502
[2024-06-10 13:51:38,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1335.29 | bwd_inner_microstep: 1335.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 13:51:40,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.65 | bwd_microstep: 1280.15 | bwd_inner_microstep: 1280.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-10 13:51:42,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.99 | bwd_microstep: 1183.88 | bwd_inner_microstep: 1183.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1873
[2024-06-10 13:51:43,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.10 | bwd_microstep: 708.01 | bwd_inner_microstep: 707.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3456
[2024-06-10 13:51:44,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.63 | bwd_microstep: 1186.40 | bwd_inner_microstep: 1186.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 13:51:46,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.04 | bwd_microstep: 1436.04 | bwd_inner_microstep: 1436.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3651
[2024-06-10 13:51:49,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.82 | bwd_microstep: 1710.26 | bwd_inner_microstep: 1710.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3642
[2024-06-10 13:51:51,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.88 | bwd_microstep: 1535.84 | bwd_inner_microstep: 1535.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 13:51:53,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.84 | bwd_microstep: 1481.59 | bwd_inner_microstep: 1481.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3647
[2024-06-10 13:51:55,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.52 | bwd_microstep: 1574.02 | bwd_inner_microstep: 1573.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294
[2024-06-10 13:51:57,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.16 | bwd_microstep: 1071.63 | bwd_inner_microstep: 1071.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1901
[2024-06-10 13:51:58,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.28 | bwd_microstep: 773.79 | bwd_inner_microstep: 773.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 13:52:00,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 1448.98 | bwd_inner_microstep: 1448.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3820
[2024-06-10 13:52:01,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.36 | bwd_microstep: 1261.83 | bwd_inner_microstep: 1261.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 13:52:03,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.32 | bwd_microstep: 1404.21 | bwd_inner_microstep: 1404.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-10 13:52:06,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.79 | bwd_microstep: 1637.16 | bwd_inner_microstep: 1637.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2248
[2024-06-10 13:52:07,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.27 | bwd_microstep: 968.46 | bwd_inner_microstep: 968.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 13:52:09,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1398.30 | bwd_inner_microstep: 1398.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 13:52:11,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1555.87 | bwd_inner_microstep: 1555.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 13:52:13,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.17 | bwd_microstep: 1389.82 | bwd_inner_microstep: 1389.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 13:52:15,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.97 | bwd_microstep: 1397.97 | bwd_inner_microstep: 1397.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 13:52:17,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1398.25 | bwd_inner_microstep: 1398.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 13:52:19,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.01 | bwd_microstep: 1401.75 | bwd_inner_microstep: 1401.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 13:52:25,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.56
[2024-06-10 13:52:25,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.92 | bwd_microstep: 5899.48 | bwd_inner_microstep: 1804.92 | bwd_allreduce_microstep: 4094.51 | step_microstep: 37.88
[2024-06-10 13:52:25,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15753.39 | bwd: 46380.73 | bwd_inner: 42285.19 | bwd_allreduce: 4094.81 | step: 39.35
{'loss': 1.2684, 'learning_rate': 2.4575787084414244e-05, 'epoch': 0.44}
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 4023
[2024-06-10 13:52:28,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.41 | bwd_microstep: 1725.60 | bwd_inner_microstep: 1725.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465
[2024-06-10 13:52:29,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.11 | bwd_microstep: 1178.26 | bwd_inner_microstep: 1178.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842
[2024-06-10 13:52:31,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.37 | bwd_microstep: 1557.86 | bwd_inner_microstep: 1557.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 13:52:33,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.26 | bwd_microstep: 970.15 | bwd_inner_microstep: 970.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3399
[2024-06-10 13:52:34,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.33 | bwd_microstep: 1148.25 | bwd_inner_microstep: 1148.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 13:52:36,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.35 | bwd_microstep: 1386.05 | bwd_inner_microstep: 1386.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 13:52:38,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.55 | bwd_microstep: 1384.87 | bwd_inner_microstep: 1384.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3519
[2024-06-10 13:52:40,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.74 | bwd_microstep: 1223.16 | bwd_inner_microstep: 1223.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3426
[2024-06-10 13:52:41,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.84 | bwd_microstep: 1185.63 | bwd_inner_microstep: 1185.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2655
[2024-06-10 13:52:43,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.24 | bwd_microstep: 1210.08 | bwd_inner_microstep: 1210.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488
[2024-06-10 13:52:45,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1408.60 | bwd_inner_microstep: 1408.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-10 13:52:47,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.00 | bwd_microstep: 1445.92 | bwd_inner_microstep: 1445.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2172
[2024-06-10 13:52:48,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.22 | bwd_microstep: 856.74 | bwd_inner_microstep: 856.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2111
[2024-06-10 13:52:50,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.02 | bwd_microstep: 921.60 | bwd_inner_microstep: 921.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3468
[2024-06-10 13:52:51,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.51 | bwd_microstep: 1210.89 | bwd_inner_microstep: 1210.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 13:52:53,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.82 | bwd_microstep: 1276.29 | bwd_inner_microstep: 1276.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 13:52:55,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.95 | bwd_microstep: 1487.44 | bwd_inner_microstep: 1487.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991
[2024-06-10 13:52:56,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.72 | bwd_microstep: 799.37 | bwd_inner_microstep: 799.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 13:52:58,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.44 | bwd_microstep: 1287.01 | bwd_inner_microstep: 1286.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2511
[2024-06-10 13:52:59,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.88 | bwd_microstep: 896.56 | bwd_inner_microstep: 896.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-10 13:53:01,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1611.95 | bwd_inner_microstep: 1611.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 13:53:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.39 | bwd_microstep: 1526.82 | bwd_inner_microstep: 1526.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 13:53:06,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.09 | bwd_microstep: 1659.06 | bwd_inner_microstep: 1659.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 13:53:08,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 1285.71 | bwd_inner_microstep: 1285.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596
[2024-06-10 13:53:10,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.45 | bwd_microstep: 1431.90 | bwd_inner_microstep: 1431.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 13:53:12,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.16 | bwd_microstep: 1555.65 | bwd_inner_microstep: 1555.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3812
[2024-06-10 13:53:14,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.03 | bwd_microstep: 1416.75 | bwd_inner_microstep: 1416.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 13:53:16,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.97 | bwd_microstep: 1450.34 | bwd_inner_microstep: 1450.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 13:53:18,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.77 | bwd_microstep: 1554.81 | bwd_inner_microstep: 1554.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3740
[2024-06-10 13:53:20,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1560.18 | bwd_inner_microstep: 1560.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 13:53:21,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.69 | bwd_microstep: 975.03 | bwd_inner_microstep: 975.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 13:53:27,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 13:53:27,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.87 | bwd_microstep: 5214.98 | bwd_inner_microstep: 1869.36 | bwd_allreduce_microstep: 3345.56 | step_microstep: 38.11
[2024-06-10 13:53:27,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15797.79 | bwd: 45803.52 | bwd_inner: 42457.05 | bwd_allreduce: 3345.80 | step: 39.60
{'loss': 1.2557, 'learning_rate': 2.4539240638081347e-05, 'epoch': 0.44}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-10 13:53:29,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1571.36 | bwd_inner_microstep: 1571.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 13:53:31,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.35 | bwd_microstep: 1488.03 | bwd_inner_microstep: 1488.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3838
[2024-06-10 13:53:33,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.43 | bwd_microstep: 1357.03 | bwd_inner_microstep: 1357.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 13:53:35,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1341.20 | bwd_inner_microstep: 1341.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872
[2024-06-10 13:53:36,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.89 | bwd_microstep: 677.55 | bwd_inner_microstep: 677.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 13:53:38,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.86 | bwd_microstep: 1340.49 | bwd_inner_microstep: 1340.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 13:53:39,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.43 | bwd_microstep: 794.53 | bwd_inner_microstep: 794.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 13:53:40,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.57 | bwd_microstep: 701.21 | bwd_inner_microstep: 701.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1980
[2024-06-10 13:53:41,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.44 | bwd_microstep: 831.37 | bwd_inner_microstep: 831.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3510
[2024-06-10 13:53:43,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.03 | bwd_microstep: 1447.50 | bwd_inner_microstep: 1447.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3780
[2024-06-10 13:53:45,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.57 | bwd_microstep: 1413.16 | bwd_inner_microstep: 1413.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1979
[2024-06-10 13:53:46,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.29 | bwd_microstep: 855.41 | bwd_inner_microstep: 855.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 13:53:48,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.35 | bwd_microstep: 1393.90 | bwd_inner_microstep: 1393.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-10 13:53:50,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.41 | bwd_microstep: 1584.40 | bwd_inner_microstep: 1584.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385
[2024-06-10 13:53:52,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.81 | bwd_microstep: 1239.89 | bwd_inner_microstep: 1239.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 13:53:54,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.77 | bwd_microstep: 1621.52 | bwd_inner_microstep: 1621.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1997
[2024-06-10 13:53:56,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.83 | bwd_microstep: 893.33 | bwd_inner_microstep: 893.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3831
[2024-06-10 13:53:58,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.59 | bwd_microstep: 1614.40 | bwd_inner_microstep: 1614.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3615
[2024-06-10 13:54:00,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.18 | bwd_microstep: 1570.58 | bwd_inner_microstep: 1570.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 13:54:02,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.90 | bwd_microstep: 1189.83 | bwd_inner_microstep: 1189.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 13:54:03,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.66 | bwd_microstep: 1344.57 | bwd_inner_microstep: 1344.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592
[2024-06-10 13:54:05,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.62 | bwd_microstep: 1306.87 | bwd_inner_microstep: 1306.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 13:54:07,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.44 | bwd_microstep: 1556.12 | bwd_inner_microstep: 1556.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-10 13:54:09,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.26 | bwd_microstep: 1215.15 | bwd_inner_microstep: 1215.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 13:54:11,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.65 | bwd_microstep: 1657.83 | bwd_inner_microstep: 1657.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805
[2024-06-10 13:54:13,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.17 | bwd_microstep: 1452.01 | bwd_inner_microstep: 1451.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 13:54:15,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.89 | bwd_microstep: 1182.76 | bwd_inner_microstep: 1182.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 13:54:17,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1315.12 | bwd_inner_microstep: 1315.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 13:54:19,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1396.10 | bwd_inner_microstep: 1396.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 13:54:21,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 1378.21 | bwd_inner_microstep: 1378.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809
[2024-06-10 13:54:23,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.91 | bwd_microstep: 1480.44 | bwd_inner_microstep: 1480.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 13:54:27,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.57
[2024-06-10 13:54:27,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.04 | bwd_microstep: 3303.02 | bwd_inner_microstep: 1683.10 | bwd_allreduce_microstep: 1619.87 | step_microstep: 38.21
[2024-06-10 13:54:27,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15617.22 | bwd: 43514.93 | bwd_inner: 41894.15 | bwd_allreduce: 1620.10 | step: 39.68
 44%|████▍     | 763/1726 [13:11:55<16:39:35, 62.28s/it]
 44%|████▍     | 764/1726 [13:12:56<16:35:14, 62.07s/it]
 44%|████▍     | 765/1726 [13:13:59<16:40:20, 62.46s/it]
 44%|████▍     | 766/1726 [13:15:02<16:39:19, 62.46s/it]
 44%|████▍     | 767/1726 [13:16:04<16:35:44, 62.30s/it]
 44%|████▍     | 768/1726 [13:17:03<16:21:05, 61.45s/it]
{'loss': 1.181, 'learning_rate': 2.4502678204566522e-05, 'epoch': 0.44}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 13:54:28,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.58 | bwd_microstep: 1238.45 | bwd_inner_microstep: 1238.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416
[2024-06-10 13:54:30,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1443.09 | bwd_inner_microstep: 1443.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 13:54:32,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.68 | bwd_microstep: 1388.86 | bwd_inner_microstep: 1388.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 13:54:34,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 13:54:35,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.43 | bwd_microstep: 792.89 | bwd_inner_microstep: 792.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3748
[2024-06-10 13:54:37,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1434.07 | bwd_inner_microstep: 1434.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 13:54:39,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.51 | bwd_microstep: 1488.70 | bwd_inner_microstep: 1488.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 13:54:41,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.10 | bwd_microstep: 1528.42 | bwd_inner_microstep: 1528.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697
[2024-06-10 13:54:43,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.89 | bwd_microstep: 1428.99 | bwd_inner_microstep: 1428.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3691
[2024-06-10 13:54:45,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.22 | bwd_microstep: 1423.78 | bwd_inner_microstep: 1423.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-10 13:54:46,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.94 | bwd_microstep: 817.89 | bwd_inner_microstep: 817.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076
[2024-06-10 13:54:48,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.35 | bwd_microstep: 916.24 | bwd_inner_microstep: 916.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1987
[2024-06-10 13:54:49,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.72 | bwd_microstep: 895.42 | bwd_inner_microstep: 895.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 1958
[2024-06-10 13:54:50,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.64 | bwd_microstep: 952.76 | bwd_inner_microstep: 952.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 13:54:52,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.02 | bwd_microstep: 1369.45 | bwd_inner_microstep: 1369.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3638
[2024-06-10 13:54:55,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.70 | bwd_microstep: 1707.96 | bwd_inner_microstep: 1707.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2948
[2024-06-10 13:54:56,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.42 | bwd_microstep: 1099.44 | bwd_inner_microstep: 1099.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3435
[2024-06-10 13:54:58,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.06 | bwd_microstep: 1377.08 | bwd_inner_microstep: 1377.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 13:55:00,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1507.96 | bwd_inner_microstep: 1507.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3641
[2024-06-10 13:55:02,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.71 | bwd_microstep: 1348.59 | bwd_inner_microstep: 1348.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 13:55:04,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.37 | bwd_microstep: 1449.09 | bwd_inner_microstep: 1449.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 13:55:06,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.09 | bwd_microstep: 1286.44 | bwd_inner_microstep: 1286.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1995
[2024-06-10 13:55:07,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.89 | bwd_microstep: 772.95 | bwd_inner_microstep: 772.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3451
[2024-06-10 13:55:09,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.87 | bwd_microstep: 1300.70 | bwd_inner_microstep: 1300.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 13:55:11,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1557.88 | bwd_inner_microstep: 1557.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 13:55:13,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.56 | bwd_microstep: 1431.24 | bwd_inner_microstep: 1431.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2039
[2024-06-10 13:55:14,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.93 | bwd_microstep: 907.57 | bwd_inner_microstep: 907.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3575
[2024-06-10 13:55:16,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.31 | bwd_microstep: 1363.16 | bwd_inner_microstep: 1363.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2405
[2024-06-10 13:55:17,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.08 | bwd_microstep: 1031.44 | bwd_inner_microstep: 1031.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811
[2024-06-10 13:55:19,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1355.20 | bwd_inner_microstep: 1355.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2041
[2024-06-10 13:55:20,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.58 | bwd_microstep: 906.64 | bwd_inner_microstep: 906.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2463
[2024-06-10 13:55:28,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.29 | optimizer_step: 6.59
[2024-06-10 13:55:28,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.77 | bwd_microstep: 7734.00 | bwd_inner_microstep: 1081.18 | bwd_allreduce_microstep: 6652.76 | step_microstep: 39.08
[2024-06-10 13:55:28,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14917.03 | bwd: 46643.93 | bwd_inner: 39990.25 | bwd_allreduce: 6653.00 | step: 40.59
{'loss': 1.2006, 'learning_rate': 2.446609991264248e-05, 'epoch': 0.45}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 13:55:30,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.90 | bwd_microstep: 1268.18 | bwd_inner_microstep: 1268.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 13:55:32,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.72 | bwd_microstep: 1398.51 | bwd_inner_microstep: 1398.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3879
[2024-06-10 13:55:34,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.36 | bwd_microstep: 1579.83 | bwd_inner_microstep: 1579.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3875
[2024-06-10 13:55:36,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.87 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 13:55:38,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.07 | bwd_microstep: 1378.05 | bwd_inner_microstep: 1378.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401
[2024-06-10 13:55:40,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.30 | bwd_microstep: 1367.63 | bwd_inner_microstep: 1367.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262
[2024-06-10 13:55:41,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.45 | bwd_microstep: 965.87 | bwd_inner_microstep: 965.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 13:55:43,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.38 | bwd_microstep: 1248.18 | bwd_inner_microstep: 1248.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 13:55:45,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.96 | bwd_microstep: 1395.38 | bwd_inner_microstep: 1395.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2191
[2024-06-10 13:55:46,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.37 | bwd_microstep: 765.36 | bwd_inner_microstep: 765.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 13:55:48,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.85 | bwd_microstep: 1276.34 | bwd_inner_microstep: 1276.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 13:55:50,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1250.35 | bwd_inner_microstep: 1250.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958
[2024-06-10 13:55:51,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.16 | bwd_microstep: 702.70 | bwd_inner_microstep: 702.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3730
[2024-06-10 13:55:53,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 1557.83 | bwd_inner_microstep: 1557.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1883
[2024-06-10 13:55:54,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.79 | bwd_microstep: 790.90 | bwd_inner_microstep: 790.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 13:55:56,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.37 | bwd_microstep: 1382.43 | bwd_inner_microstep: 1382.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663
[2024-06-10 13:55:58,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.56 | bwd_microstep: 1713.96 | bwd_inner_microstep: 1713.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700
[2024-06-10 13:56:00,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.68 | bwd_microstep: 1425.80 | bwd_inner_microstep: 1425.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 13:56:02,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1395.10 | bwd_inner_microstep: 1395.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3511
[2024-06-10 13:56:04,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1252.61 | bwd_inner_microstep: 1252.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1901
[2024-06-10 13:56:05,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.10 | bwd_microstep: 680.88 | bwd_inner_microstep: 680.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 13:56:07,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.40 | bwd_microstep: 1552.31 | bwd_inner_microstep: 1552.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 13:56:09,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1256.06 | bwd_inner_microstep: 1256.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3826
[2024-06-10 13:56:10,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.40 | bwd_microstep: 1259.81 | bwd_inner_microstep: 1259.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3874
[2024-06-10 13:56:13,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.78 | bwd_microstep: 1610.17 | bwd_inner_microstep: 1610.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2291
[2024-06-10 13:56:14,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.58 | bwd_microstep: 939.85 | bwd_inner_microstep: 939.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3787
[2024-06-10 13:56:16,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.47 | bwd_microstep: 1550.13 | bwd_inner_microstep: 1550.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3809
[2024-06-10 13:56:18,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.52 | bwd_microstep: 1579.32 | bwd_inner_microstep: 1579.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650
[2024-06-10 13:56:20,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.63 | bwd_microstep: 1575.34 | bwd_inner_microstep: 1575.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2272
[2024-06-10 13:56:22,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.32 | bwd_microstep: 976.89 | bwd_inner_microstep: 976.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 13:56:23,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1246.15 | bwd_inner_microstep: 1246.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3811
[2024-06-10 13:56:28,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 13:56:28,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.92 | bwd_microstep: 4280.36 | bwd_inner_microstep: 2097.64 | bwd_allreduce_microstep: 2182.67 | step_microstep: 38.02
[2024-06-10 13:56:28,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15577.60 | bwd: 44003.37 | bwd_inner: 41819.80 | bwd_allreduce: 2182.90 | step: 39.53
{'loss': 1.2584, 'learning_rate': 2.442950589113775e-05, 'epoch': 0.45}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 13:56:30,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.54 | bwd_microstep: 1373.24 | bwd_inner_microstep: 1373.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3913
[2024-06-10 13:56:33,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.73 | bwd_microstep: 1685.09 | bwd_inner_microstep: 1685.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2354
[2024-06-10 13:56:34,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.31 | bwd_microstep: 985.52 | bwd_inner_microstep: 985.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 13:56:36,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1482.16 | bwd_inner_microstep: 1482.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3528
[2024-06-10 13:56:38,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.02 | bwd_microstep: 1228.08 | bwd_inner_microstep: 1228.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2878
[2024-06-10 13:56:39,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.91 | bwd_microstep: 1016.55 | bwd_inner_microstep: 1016.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 13:56:41,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.55 | bwd_microstep: 1286.07 | bwd_inner_microstep: 1286.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 13:56:43,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.58 | bwd_microstep: 1380.97 | bwd_inner_microstep: 1380.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4158
[2024-06-10 13:56:45,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.03 | bwd_microstep: 1447.30 | bwd_inner_microstep: 1447.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 13:56:47,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.66 | bwd_microstep: 1388.34 | bwd_inner_microstep: 1388.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-10 13:56:49,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.35 | bwd_microstep: 1316.23 | bwd_inner_microstep: 1316.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3695
[2024-06-10 13:56:51,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 650.50 | bwd_microstep: 1790.70 | bwd_inner_microstep: 1790.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 13:56:53,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1382.76 | bwd_inner_microstep: 1382.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505
[2024-06-10 13:56:55,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.91 | bwd_microstep: 1316.29 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3650
[2024-06-10 13:56:57,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.50 | bwd_microstep: 1652.16 | bwd_inner_microstep: 1652.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 933
[2024-06-10 13:56:58,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.35 | bwd_microstep: 377.71 | bwd_inner_microstep: 377.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 13:56:59,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1254.30 | bwd_inner_microstep: 1254.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 13:57:00,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.87 | bwd_microstep: 789.12 | bwd_inner_microstep: 789.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 13:57:03,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.49 | bwd_microstep: 1554.49 | bwd_inner_microstep: 1554.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2426
[2024-06-10 13:57:04,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.22 | bwd_microstep: 1034.88 | bwd_inner_microstep: 1034.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 13:57:06,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.17 | bwd_microstep: 1557.35 | bwd_inner_microstep: 1557.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 13:57:08,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1429.42 | bwd_inner_microstep: 1429.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-10 13:57:10,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.54 | bwd_microstep: 1185.42 | bwd_inner_microstep: 1185.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2192
[2024-06-10 13:57:11,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.07 | bwd_microstep: 858.10 | bwd_inner_microstep: 858.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 13:57:13,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1397.14 | bwd_inner_microstep: 1397.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623
[2024-06-10 13:57:15,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.33 | bwd_microstep: 1315.35 | bwd_inner_microstep: 1315.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-10 13:57:17,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1413.73 | bwd_inner_microstep: 1413.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 866
[2024-06-10 13:57:17,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.16 | bwd_microstep: 366.77 | bwd_inner_microstep: 366.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3818
[2024-06-10 13:57:19,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.31 | bwd_microstep: 1688.18 | bwd_inner_microstep: 1688.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3419
[2024-06-10 13:57:22,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.60 | bwd_microstep: 1509.79 | bwd_inner_microstep: 1509.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 13:57:23,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1413.11 | bwd_inner_microstep: 1413.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3585
[2024-06-10 13:57:30,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.28 | optimizer_step: 6.57
[2024-06-10 13:57:30,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.05 | bwd_microstep: 5964.86 | bwd_inner_microstep: 2251.87 | bwd_allreduce_microstep: 3712.93 | step_microstep: 39.20
[2024-06-10 13:57:30,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15540.06 | bwd: 45841.18 | bwd_inner: 42127.34 | bwd_allreduce: 3713.16 | step: 40.69
{'loss': 1.2452, 'learning_rate': 2.4392896268936305e-05, 'epoch': 0.45}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 13:57:32,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.40 | bwd_microstep: 1439.52 | bwd_inner_microstep: 1439.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 13:57:34,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.61 | bwd_microstep: 1339.58 | bwd_inner_microstep: 1339.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 13:57:36,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.00 | bwd_microstep: 1645.92 | bwd_inner_microstep: 1645.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2314
[2024-06-10 13:57:38,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.42 | bwd_microstep: 979.33 | bwd_inner_microstep: 979.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 13:57:39,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.86 | bwd_microstep: 1380.90 | bwd_inner_microstep: 1380.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 13:57:41,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.23 | bwd_microstep: 1383.81 | bwd_inner_microstep: 1383.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4064
[2024-06-10 13:57:44,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.13 | bwd_microstep: 1721.90 | bwd_inner_microstep: 1721.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 13:57:46,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.40 | bwd_microstep: 1388.79 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3692
[2024-06-10 13:57:48,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.02 | bwd_microstep: 1720.07 | bwd_inner_microstep: 1720.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 13:57:50,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.95 | bwd_microstep: 1384.60 | bwd_inner_microstep: 1384.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3711
[2024-06-10 13:57:52,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.12 | bwd_microstep: 1828.01 | bwd_inner_microstep: 1827.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2916
[2024-06-10 13:57:54,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.48 | bwd_microstep: 1189.65 | bwd_inner_microstep: 1189.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3485
[2024-06-10 13:57:56,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.24 | bwd_microstep: 1574.86 | bwd_inner_microstep: 1574.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 13:57:58,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.37 | bwd_microstep: 1483.46 | bwd_inner_microstep: 1483.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3772
[2024-06-10 13:58:00,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.46 | bwd_microstep: 1466.58 | bwd_inner_microstep: 1466.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-10 13:58:03,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.54 | bwd_microstep: 1625.93 | bwd_inner_microstep: 1625.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488
[2024-06-10 13:58:05,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.07 | bwd_microstep: 1411.24 | bwd_inner_microstep: 1411.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 13:58:06,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.93 | bwd_microstep: 1187.04 | bwd_inner_microstep: 1187.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2133
[2024-06-10 13:58:07,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.04 | bwd_microstep: 766.17 | bwd_inner_microstep: 766.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 13:58:09,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.67 | bwd_microstep: 1386.27 | bwd_inner_microstep: 1386.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 13:58:11,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1381.63 | bwd_inner_microstep: 1381.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-10 13:58:12,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.76 | bwd_microstep: 801.64 | bwd_inner_microstep: 801.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443
[2024-06-10 13:58:14,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.43 | bwd_microstep: 1157.69 | bwd_inner_microstep: 1157.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791
[2024-06-10 13:58:16,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1652.60 | bwd_inner_microstep: 1652.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3605
[2024-06-10 13:58:18,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.64 | bwd_microstep: 1640.87 | bwd_inner_microstep: 1640.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 13:58:20,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1252.46 | bwd_inner_microstep: 1252.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3696
[2024-06-10 13:58:22,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.77 | bwd_microstep: 1360.55 | bwd_inner_microstep: 1360.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3426
[2024-06-10 13:58:24,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.07 | bwd_microstep: 1280.79 | bwd_inner_microstep: 1280.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2249
[2024-06-10 13:58:25,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.64 | bwd_microstep: 966.63 | bwd_inner_microstep: 966.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 13:58:27,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1474.51 | bwd_inner_microstep: 1474.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2264
[2024-06-10 13:58:28,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.55 | bwd_microstep: 971.53 | bwd_inner_microstep: 971.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3550
[2024-06-10 13:58:30,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.40 | optimizer_step: 6.62
[2024-06-10 13:58:30,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.01 | bwd_microstep: 1382.16 | bwd_inner_microstep: 1374.43 | bwd_allreduce_microstep: 7.68 | step_microstep: 39.81
[2024-06-10 13:58:30,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16275.97 | bwd: 43626.72 | bwd_inner: 43618.14 | bwd_allreduce: 7.91 | step: 41.34
{'loss': 1.256, 'learning_rate': 2.435627117497703e-05, 'epoch': 0.45}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 13:58:32,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.08 | bwd_microstep: 1336.76 | bwd_inner_microstep: 1336.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 13:58:34,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.37 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1282.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846
[2024-06-10 13:58:36,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1465.19 | bwd_inner_microstep: 1465.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 13:58:38,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1378.32 | bwd_inner_microstep: 1378.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 13:58:40,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1288.42 | bwd_inner_microstep: 1288.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1952
[2024-06-10 13:58:41,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.42 | bwd_microstep: 699.70 | bwd_inner_microstep: 699.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3471
[2024-06-10 13:58:43,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.18 | bwd_microstep: 1550.49 | bwd_inner_microstep: 1550.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 13:58:45,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1342.48 | bwd_inner_microstep: 1342.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-10 13:58:47,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.21 | bwd_microstep: 1624.62 | bwd_inner_microstep: 1624.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3517
[2024-06-10 13:58:49,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1322.77 | bwd_inner_microstep: 1322.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3496
[2024-06-10 13:58:51,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.30 | bwd_microstep: 1647.59 | bwd_inner_microstep: 1647.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1965
[2024-06-10 13:58:52,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.42 | bwd_microstep: 765.83 | bwd_inner_microstep: 765.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2180
[2024-06-10 13:58:53,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.04 | bwd_microstep: 954.93 | bwd_inner_microstep: 954.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 13:58:54,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.70 | bwd_microstep: 790.59 | bwd_inner_microstep: 790.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2162
[2024-06-10 13:58:56,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.39 | bwd_microstep: 948.19 | bwd_inner_microstep: 948.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967
[2024-06-10 13:58:57,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.99 | bwd_microstep: 889.34 | bwd_inner_microstep: 889.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3600
[2024-06-10 13:58:59,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.02 | bwd_microstep: 1666.96 | bwd_inner_microstep: 1666.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 13:59:02,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.86 | bwd_microstep: 1646.21 | bwd_inner_microstep: 1646.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 13:59:04,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.45 | bwd_microstep: 1432.90 | bwd_inner_microstep: 1432.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 13:59:06,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.97 | bwd_microstep: 1462.84 | bwd_inner_microstep: 1462.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 13:59:08,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1396.62 | bwd_inner_microstep: 1396.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 13:59:10,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1493.15 | bwd_inner_microstep: 1493.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3719
[2024-06-10 13:59:11,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1366.80 | bwd_inner_microstep: 1366.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3811
[2024-06-10 13:59:14,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.59 | bwd_microstep: 1588.19 | bwd_inner_microstep: 1588.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563
[2024-06-10 13:59:16,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.23 | bwd_microstep: 1427.88 | bwd_inner_microstep: 1427.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 13:59:17,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1258.94 | bwd_inner_microstep: 1258.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 13:59:19,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1286.36 | bwd_inner_microstep: 1286.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3740
[2024-06-10 13:59:21,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.11 | bwd_microstep: 1469.18 | bwd_inner_microstep: 1469.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3566
[2024-06-10 13:59:23,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1360.28 | bwd_inner_microstep: 1360.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 13:59:25,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.37 | bwd_microstep: 1535.13 | bwd_inner_microstep: 1535.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809
[2024-06-10 13:59:28,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.14 | bwd_microstep: 1750.43 | bwd_inner_microstep: 1750.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3779
[2024-06-10 13:59:34,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 13:59:34,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.93 | bwd_microstep: 5486.83 | bwd_inner_microstep: 1821.43 | bwd_allreduce_microstep: 3665.36 | step_microstep: 37.83
[2024-06-10 13:59:34,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16080.52 | bwd: 46915.97 | bwd_inner: 43249.71 | bwd_allreduce: 3665.59 | step: 39.32
{'loss': 1.2512, 'learning_rate': 2.4319630738253333e-05, 'epoch': 0.45}


 44%|████▍     | 768/1726 [13:17:03<16:21:05, 61.45s/it]
 45%|████▍     | 769/1726 [13:18:05<16:22:13, 61.58s/it]
 45%|████▍     | 770/1726 [13:19:05<16:13:12, 61.08s/it]
 45%|████▍     | 771/1726 [13:20:07<16:15:14, 61.27s/it]
 45%|████▍     | 772/1726 [13:21:07<16:09:20, 60.97s/it]
 45%|████▍     | 773/1726 [13:22:10<16:19:37, 61.68s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917
[2024-06-10 13:59:35,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.49 | bwd_microstep: 774.37 | bwd_inner_microstep: 774.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 13:59:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.72 | bwd_microstep: 787.65 | bwd_inner_microstep: 787.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 13:59:38,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1384.47 | bwd_inner_microstep: 1384.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 13:59:40,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.95 | bwd_microstep: 1339.41 | bwd_inner_microstep: 1339.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 13:59:41,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.21 | bwd_microstep: 1339.53 | bwd_inner_microstep: 1339.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3757
[2024-06-10 13:59:44,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.18 | bwd_microstep: 1470.20 | bwd_inner_microstep: 1470.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3473
[2024-06-10 13:59:45,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.93 | bwd_microstep: 1243.20 | bwd_inner_microstep: 1243.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1969
[2024-06-10 13:59:46,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.23 | bwd_microstep: 702.94 | bwd_inner_microstep: 702.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 13:59:48,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1243.86 | bwd_inner_microstep: 1243.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420
[2024-06-10 13:59:50,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.28 | bwd_microstep: 1151.67 | bwd_inner_microstep: 1151.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 728
[2024-06-10 13:59:50,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.15 | bwd_microstep: 292.26 | bwd_inner_microstep: 292.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3742
[2024-06-10 13:59:52,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1386.49 | bwd_inner_microstep: 1386.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3440
[2024-06-10 13:59:54,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.57 | bwd_microstep: 1309.52 | bwd_inner_microstep: 1309.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3501
[2024-06-10 13:59:56,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.37 | bwd_microstep: 1649.90 | bwd_inner_microstep: 1649.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3493
[2024-06-10 13:59:58,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.63 | bwd_microstep: 1428.65 | bwd_inner_microstep: 1428.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2193
[2024-06-10 13:59:59,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.99 | bwd_microstep: 859.45 | bwd_inner_microstep: 859.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3421
[2024-06-10 14:00:01,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.21 | bwd_microstep: 1508.14 | bwd_inner_microstep: 1508.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 14:00:03,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.61 | bwd_microstep: 1486.53 | bwd_inner_microstep: 1486.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3601
[2024-06-10 14:00:05,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 1431.31 | bwd_inner_microstep: 1431.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 14:00:07,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1248.64 | bwd_inner_microstep: 1248.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2137
[2024-06-10 14:00:08,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.45 | bwd_microstep: 1028.28 | bwd_inner_microstep: 1028.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2110
[2024-06-10 14:00:10,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.06 | bwd_microstep: 1018.33 | bwd_inner_microstep: 1018.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3822
[2024-06-10 14:00:12,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.85 | bwd_microstep: 1355.05 | bwd_inner_microstep: 1355.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 14:00:14,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.47 | bwd_microstep: 1608.83 | bwd_inner_microstep: 1608.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2282
[2024-06-10 14:00:15,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.50 | bwd_microstep: 908.07 | bwd_inner_microstep: 908.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 14:00:17,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1347.50 | bwd_inner_microstep: 1347.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 14:00:19,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1900
[2024-06-10 14:00:20,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.42 | bwd_microstep: 745.50 | bwd_inner_microstep: 745.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-10 14:00:22,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.06 | bwd_microstep: 1603.10 | bwd_inner_microstep: 1603.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2233
[2024-06-10 14:00:23,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.98 | bwd_microstep: 964.10 | bwd_inner_microstep: 964.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 14:00:25,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1253.65 | bwd_inner_microstep: 1253.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 14:00:33,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.31 | optimizer_step: 6.58
[2024-06-10 14:00:33,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 7766.83 | bwd_inner_microstep: 1414.02 | bwd_allreduce_microstep: 6352.75 | step_microstep: 39.04
[2024-06-10 14:00:33,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14411.76 | bwd: 45030.56 | bwd_inner: 38676.88 | bwd_allreduce: 6352.99 | step: 40.61
{'loss': 1.3041, 'learning_rate': 2.4282975087812627e-05, 'epoch': 0.45}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 14:00:35,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.14 | bwd_microstep: 1237.21 | bwd_inner_microstep: 1237.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892
[2024-06-10 14:00:37,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.16 | bwd_microstep: 1580.02 | bwd_inner_microstep: 1579.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 14:00:39,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.86 | bwd_microstep: 1369.49 | bwd_inner_microstep: 1369.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783
[2024-06-10 14:00:42,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.67 | bwd_microstep: 1641.34 | bwd_inner_microstep: 1641.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-10 14:00:44,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.42 | bwd_microstep: 1640.78 | bwd_inner_microstep: 1640.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 14:00:46,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.39 | bwd_microstep: 1535.63 | bwd_inner_microstep: 1535.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4077
[2024-06-10 14:00:48,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.82 | bwd_microstep: 1587.56 | bwd_inner_microstep: 1587.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3524
[2024-06-10 14:00:50,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.26 | bwd_microstep: 1193.44 | bwd_inner_microstep: 1193.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 14:00:52,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1401.61 | bwd_inner_microstep: 1401.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 14:00:54,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1401.19 | bwd_inner_microstep: 1401.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 14:00:55,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 1249.55 | bwd_inner_microstep: 1249.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2106
[2024-06-10 14:00:57,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.30 | bwd_microstep: 917.71 | bwd_inner_microstep: 917.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3745
[2024-06-10 14:00:59,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.91 | bwd_microstep: 1645.86 | bwd_inner_microstep: 1645.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3625
[2024-06-10 14:01:01,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.57 | bwd_microstep: 1538.05 | bwd_inner_microstep: 1538.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3490
[2024-06-10 14:01:03,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.26 | bwd_microstep: 1440.88 | bwd_inner_microstep: 1440.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 14:01:05,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1506.76 | bwd_inner_microstep: 1506.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 14:01:06,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.90 | bwd_microstep: 795.68 | bwd_inner_microstep: 795.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 14:01:08,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1312.41 | bwd_inner_microstep: 1312.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 14:01:10,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1385.24 | bwd_inner_microstep: 1385.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 14:01:12,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1398.15 | bwd_inner_microstep: 1398.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 14:01:14,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1493.96 | bwd_inner_microstep: 1493.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 14:01:16,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1425.64 | bwd_inner_microstep: 1425.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3488
[2024-06-10 14:01:18,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.09 | bwd_microstep: 1217.71 | bwd_inner_microstep: 1217.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3597
[2024-06-10 14:01:20,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.89 | bwd_microstep: 1450.02 | bwd_inner_microstep: 1449.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3431
[2024-06-10 14:01:21,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.83 | bwd_microstep: 1152.14 | bwd_inner_microstep: 1152.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 14:01:23,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.04 | bwd_microstep: 1387.86 | bwd_inner_microstep: 1387.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3549
[2024-06-10 14:01:25,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.21 | bwd_microstep: 1458.86 | bwd_inner_microstep: 1458.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2942
[2024-06-10 14:01:27,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.85 | bwd_microstep: 1282.68 | bwd_inner_microstep: 1282.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3555
[2024-06-10 14:01:29,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1357.64 | bwd_inner_microstep: 1357.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805
[2024-06-10 14:01:31,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.31 | bwd_microstep: 1448.72 | bwd_inner_microstep: 1448.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 14:01:33,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.22 | bwd_microstep: 1456.23 | bwd_inner_microstep: 1456.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1995
[2024-06-10 14:01:34,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.13 | optimizer_step: 6.59
[2024-06-10 14:01:34,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.08 | bwd_microstep: 753.37 | bwd_inner_microstep: 739.36 | bwd_allreduce_microstep: 13.96 | step_microstep: 37.63
[2024-06-10 14:01:34,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16353.47 | bwd: 43663.42 | bwd_inner: 43648.54 | bwd_allreduce: 14.18 | step: 39.22
{'loss': 1.2934, 'learning_rate': 2.4246304352755924e-05, 'epoch': 0.45}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 14:01:36,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1444.93 | bwd_inner_microstep: 1444.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 14:01:38,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1484.24 | bwd_inner_microstep: 1484.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 14:01:40,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1477.27 | bwd_inner_microstep: 1477.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 14:01:41,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.75 | bwd_microstep: 794.78 | bwd_inner_microstep: 794.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 14:01:43,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.92 | bwd_microstep: 1545.54 | bwd_inner_microstep: 1545.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 14:01:45,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.08 | bwd_microstep: 1481.65 | bwd_inner_microstep: 1481.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768
[2024-06-10 14:01:47,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.75 | bwd_microstep: 1472.22 | bwd_inner_microstep: 1472.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 14:01:49,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.75 | bwd_microstep: 1384.87 | bwd_inner_microstep: 1384.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1991
[2024-06-10 14:01:50,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.79 | bwd_microstep: 707.24 | bwd_inner_microstep: 707.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 14:01:52,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.25 | bwd_microstep: 1148.86 | bwd_inner_microstep: 1148.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 14:01:54,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1379.55 | bwd_inner_microstep: 1379.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 14:01:56,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1388.99 | bwd_inner_microstep: 1388.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3500
[2024-06-10 14:01:57,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.68 | bwd_microstep: 1346.57 | bwd_inner_microstep: 1346.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970
[2024-06-10 14:01:59,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.44 | bwd_microstep: 887.81 | bwd_inner_microstep: 887.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454
[2024-06-10 14:02:01,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.93 | bwd_microstep: 1383.57 | bwd_inner_microstep: 1383.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 14:02:02,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.86 | bwd_microstep: 1285.11 | bwd_inner_microstep: 1285.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3631
[2024-06-10 14:02:04,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.32 | bwd_microstep: 1346.11 | bwd_inner_microstep: 1346.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 14:02:06,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.43 | bwd_microstep: 1256.18 | bwd_inner_microstep: 1256.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2010
[2024-06-10 14:02:07,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.07 | bwd_microstep: 740.16 | bwd_inner_microstep: 740.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 14:02:09,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.35 | bwd_microstep: 1391.32 | bwd_inner_microstep: 1391.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 14:02:10,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.33 | bwd_microstep: 877.61 | bwd_inner_microstep: 877.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 14:02:12,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.26 | bwd_microstep: 1658.13 | bwd_inner_microstep: 1658.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-10 14:02:13,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.82 | bwd_microstep: 696.76 | bwd_inner_microstep: 696.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001
[2024-06-10 14:02:14,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.49 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3645
[2024-06-10 14:02:17,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.81 | bwd_microstep: 1516.90 | bwd_inner_microstep: 1516.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3432
[2024-06-10 14:02:18,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1282.76 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424
[2024-06-10 14:02:20,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.41 | bwd_microstep: 1281.73 | bwd_inner_microstep: 1281.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767
[2024-06-10 14:02:22,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.34 | bwd_microstep: 1543.05 | bwd_inner_microstep: 1543.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 14:02:24,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1551.27 | bwd_inner_microstep: 1551.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2012
[2024-06-10 14:02:26,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.69 | bwd_microstep: 898.88 | bwd_inner_microstep: 898.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775
[2024-06-10 14:02:28,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.26 | bwd_microstep: 1741.20 | bwd_inner_microstep: 1741.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3731
[2024-06-10 14:02:34,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 14:02:34,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.89 | bwd_microstep: 5667.98 | bwd_inner_microstep: 1905.07 | bwd_allreduce_microstep: 3762.86 | step_microstep: 38.03
[2024-06-10 14:02:34,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15233.19 | bwd: 44864.01 | bwd_inner: 41100.24 | bwd_allreduce: 3763.09 | step: 39.53
{'loss': 1.3127, 'learning_rate': 2.4209618662237367e-05, 'epoch': 0.45}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 14:02:36,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.97 | bwd_microstep: 1474.45 | bwd_inner_microstep: 1474.37 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 14:02:38,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.30 | bwd_microstep: 1284.39 | bwd_inner_microstep: 1284.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3561
[2024-06-10 14:02:40,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.02 | bwd_microstep: 1433.13 | bwd_inner_microstep: 1433.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 14:02:42,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.45 | bwd_microstep: 1475.63 | bwd_inner_microstep: 1475.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3759
[2024-06-10 14:02:44,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.17 | bwd_microstep: 1467.15 | bwd_inner_microstep: 1467.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489
[2024-06-10 14:02:46,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1313.19 | bwd_inner_microstep: 1313.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 14:02:48,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1245.36 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 14:02:50,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.03 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 14:02:52,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.28 | bwd_microstep: 1630.77 | bwd_inner_microstep: 1630.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 14:02:54,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.72 | bwd_microstep: 1389.31 | bwd_inner_microstep: 1389.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 14:02:56,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.84 | bwd_microstep: 1276.99 | bwd_inner_microstep: 1276.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 14:02:58,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.69 | bwd_microstep: 1471.26 | bwd_inner_microstep: 1471.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 14:03:00,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.95 | bwd_microstep: 1469.03 | bwd_inner_microstep: 1469.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 14:03:01,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1383.65 | bwd_inner_microstep: 1383.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1985
[2024-06-10 14:03:03,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.96 | bwd_microstep: 734.95 | bwd_inner_microstep: 734.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3638
[2024-06-10 14:03:05,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.24 | bwd_microstep: 1709.52 | bwd_inner_microstep: 1709.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 4091
[2024-06-10 14:03:07,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.99 | bwd_microstep: 1760.41 | bwd_inner_microstep: 1760.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3524
[2024-06-10 14:03:09,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1552.39 | bwd_inner_microstep: 1552.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1994
[2024-06-10 14:03:10,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.34 | bwd_microstep: 738.50 | bwd_inner_microstep: 738.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 14:03:12,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.89 | bwd_microstep: 802.70 | bwd_inner_microstep: 802.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3520
[2024-06-10 14:03:13,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.87 | bwd_microstep: 1226.02 | bwd_inner_microstep: 1226.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3784
[2024-06-10 14:03:15,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1481.23 | bwd_inner_microstep: 1481.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 14:03:17,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1391.96 | bwd_inner_microstep: 1391.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 14:03:18,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.97 | bwd_microstep: 801.29 | bwd_inner_microstep: 801.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-10 14:03:20,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.41 | bwd_microstep: 1310.60 | bwd_inner_microstep: 1310.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3724
[2024-06-10 14:03:22,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1369.08 | bwd_inner_microstep: 1369.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 14:03:24,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.00 | bwd_microstep: 1299.88 | bwd_inner_microstep: 1299.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 14:03:26,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1375.70 | bwd_inner_microstep: 1375.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3439
[2024-06-10 14:03:28,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1394.70 | bwd_inner_microstep: 1394.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3575
[2024-06-10 14:03:30,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.98 | bwd_microstep: 1519.28 | bwd_inner_microstep: 1519.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3404
[2024-06-10 14:03:32,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.23 | bwd_microstep: 1485.16 | bwd_inner_microstep: 1485.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-10 14:03:36,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 14:03:36,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 3532.79 | bwd_inner_microstep: 1598.57 | bwd_allreduce_microstep: 1934.17 | step_microstep: 38.19
[2024-06-10 14:03:36,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16125.23 | bwd: 45187.34 | bwd_inner: 43252.20 | bwd_allreduce: 1934.43 | step: 39.76
{'loss': 1.2262, 'learning_rate': 2.4172918145463763e-05, 'epoch': 0.45}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500
[2024-06-10 14:03:38,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.92 | bwd_microstep: 1582.61 | bwd_inner_microstep: 1582.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 14:03:40,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.92 | bwd_microstep: 1380.03 | bwd_inner_microstep: 1380.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 14:03:42,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-10 14:03:43,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.84 | bwd_microstep: 815.93 | bwd_inner_microstep: 815.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3535
[2024-06-10 14:03:45,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.10 | bwd_microstep: 1198.22 | bwd_inner_microstep: 1198.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 14:03:46,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.35 | bwd_microstep: 788.54 | bwd_inner_microstep: 788.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 14:03:47,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.01 | bwd_microstep: 792.33 | bwd_inner_microstep: 792.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 14:03:49,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.99 | bwd_microstep: 1280.78 | bwd_inner_microstep: 1280.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 14:03:50,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 795.42 | bwd_inner_microstep: 795.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 14:03:52,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.14 | bwd_microstep: 1400.08 | bwd_inner_microstep: 1400.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3683
[2024-06-10 14:03:54,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.22 | bwd_microstep: 1654.71 | bwd_inner_microstep: 1654.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 14:03:56,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.64 | bwd_microstep: 1374.70 | bwd_inner_microstep: 1374.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 14:03:57,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.77 | bwd_microstep: 788.98 | bwd_inner_microstep: 788.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1958
[2024-06-10 14:03:58,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.20 | bwd_microstep: 918.24 | bwd_inner_microstep: 918.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3494
[2024-06-10 14:04:00,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.40 | bwd_microstep: 1531.95 | bwd_inner_microstep: 1531.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 14:04:02,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1384.79 | bwd_inner_microstep: 1384.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2747
[2024-06-10 14:04:04,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.20 | bwd_microstep: 1165.62 | bwd_inner_microstep: 1165.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 14:04:06,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.77 | bwd_microstep: 1481.69 | bwd_inner_microstep: 1481.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 14:04:08,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.11 | bwd_microstep: 1523.04 | bwd_inner_microstep: 1523.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 14:04:10,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.58 | bwd_microstep: 1479.28 | bwd_inner_microstep: 1479.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 14:04:12,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1245.01 | bwd_inner_microstep: 1244.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672
[2024-06-10 14:04:14,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.82 | bwd_microstep: 1718.62 | bwd_inner_microstep: 1718.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793
[2024-06-10 14:04:16,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1451.79 | bwd_inner_microstep: 1451.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3806
[2024-06-10 14:04:18,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.75 | bwd_microstep: 1383.74 | bwd_inner_microstep: 1383.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3708
[2024-06-10 14:04:20,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1460.43 | bwd_inner_microstep: 1460.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2225
[2024-06-10 14:04:21,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.39 | bwd_microstep: 958.12 | bwd_inner_microstep: 958.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3456
[2024-06-10 14:04:23,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1414.56 | bwd_inner_microstep: 1414.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 14:04:25,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.40 | bwd_microstep: 1451.09 | bwd_inner_microstep: 1451.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3854
[2024-06-10 14:04:28,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.94 | bwd_microstep: 1760.30 | bwd_inner_microstep: 1760.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 14:04:30,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.04 | bwd_microstep: 1548.73 | bwd_inner_microstep: 1548.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576
[2024-06-10 14:04:32,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.29 | bwd_microstep: 1596.00 | bwd_inner_microstep: 1595.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2285
[2024-06-10 14:04:35,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.14 | optimizer_step: 6.61
[2024-06-10 14:04:35,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.49 | bwd_microstep: 2226.66 | bwd_inner_microstep: 1215.93 | bwd_allreduce_microstep: 1010.69 | step_microstep: 37.47
[2024-06-10 14:04:35,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15616.17 | bwd: 42943.49 | bwd_inner: 41931.90 | bwd_allreduce: 1010.92 | step: 38.91
{'loss': 1.2102, 'learning_rate': 2.413620293169415e-05, 'epoch': 0.45}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 14:04:37,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.07 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1481.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1976
[2024-06-10 14:04:38,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.99 | bwd_microstep: 703.15 | bwd_inner_microstep: 703.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3019
[2024-06-10 14:04:39,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.58 | bwd_microstep: 1171.58 | bwd_inner_microstep: 1171.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3922
[2024-06-10 14:04:42,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.62 | bwd_microstep: 1487.50 | bwd_inner_microstep: 1487.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3399
[2024-06-10 14:04:43,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.45 | bwd_microstep: 1208.43 | bwd_inner_microstep: 1208.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3765
[2024-06-10 14:04:45,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.94 | bwd_microstep: 1495.48 | bwd_inner_microstep: 1495.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3584
[2024-06-10 14:04:47,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.88 | bwd_microstep: 1205.87 | bwd_inner_microstep: 1205.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3763
[2024-06-10 14:04:49,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.62 | bwd_microstep: 1401.73 | bwd_inner_microstep: 1401.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 14:04:51,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.37 | bwd_microstep: 1282.74 | bwd_inner_microstep: 1282.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2444
[2024-06-10 14:04:52,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.50 | bwd_microstep: 1014.52 | bwd_inner_microstep: 1014.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 14:04:54,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.48 | bwd_microstep: 1255.01 | bwd_inner_microstep: 1254.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 14:04:55,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.26 | bwd_microstep: 797.86 | bwd_inner_microstep: 797.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3857
[2024-06-10 14:04:57,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.94 | bwd_microstep: 1624.15 | bwd_inner_microstep: 1624.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 14:04:59,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 1444.92 | bwd_inner_microstep: 1444.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588
[2024-06-10 14:05:01,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.36 | bwd_microstep: 1699.02 | bwd_inner_microstep: 1698.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 14:05:03,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.81 | bwd_microstep: 1385.22 | bwd_inner_microstep: 1385.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2310
[2024-06-10 14:05:05,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.06 | bwd_microstep: 1077.29 | bwd_inner_microstep: 1077.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2021
[2024-06-10 14:05:06,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.12 | bwd_microstep: 714.06 | bwd_inner_microstep: 714.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 14:05:08,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.99 | bwd_microstep: 1508.93 | bwd_inner_microstep: 1508.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 14:05:10,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1407.57 | bwd_inner_microstep: 1407.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775
[2024-06-10 14:05:12,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.70 | bwd_microstep: 1448.55 | bwd_inner_microstep: 1448.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 14:05:14,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.47 | bwd_microstep: 1285.98 | bwd_inner_microstep: 1285.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2277
[2024-06-10 14:05:15,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.63 | bwd_microstep: 812.82 | bwd_inner_microstep: 812.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3555
[2024-06-10 14:05:17,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.44 | bwd_microstep: 1536.40 | bwd_inner_microstep: 1536.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3556
[2024-06-10 14:05:19,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.33 | bwd_microstep: 1359.60 | bwd_inner_microstep: 1359.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3810
[2024-06-10 14:05:21,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.23 | bwd_microstep: 1721.98 | bwd_inner_microstep: 1721.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 14:05:23,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1374.02 | bwd_inner_microstep: 1373.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 14:05:25,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.38 | bwd_microstep: 1298.59 | bwd_inner_microstep: 1298.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 14:05:27,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 1542.10 | bwd_inner_microstep: 1542.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3783
[2024-06-10 14:05:29,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.79 | bwd_microstep: 1579.59 | bwd_inner_microstep: 1579.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3563
[2024-06-10 14:05:31,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.97 | bwd_microstep: 1520.66 | bwd_inner_microstep: 1520.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2284
[2024-06-10 14:05:37,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.33 | optimizer_step: 6.62
[2024-06-10 14:05:37,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.44 | bwd_microstep: 5093.23 | bwd_inner_microstep: 1062.20 | bwd_allreduce_microstep: 4030.96 | step_microstep: 38.76
[2024-06-10 14:05:37,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15670.80 | bwd: 45939.59 | bwd_inner: 41907.71 | bwd_allreduce: 4031.20 | step: 40.25
 45%|████▍     | 774/1726 [13:23:10<16:09:32, 61.11s/it]
 45%|████▍     | 775/1726 [13:24:11<16:04:57, 60.88s/it]
 45%|████▍     | 776/1726 [13:25:11<16:01:46, 60.74s/it]
 45%|████▌     | 777/1726 [13:26:13<16:05:05, 61.02s/it]
 45%|████▌     | 778/1726 [13:27:12<15:53:59, 60.38s/it]
 45%|████▌     | 779/1726 [13:28:13<16:00:23, 60.
{'loss': 1.2722, 'learning_rate': 2.4099473150239306e-05, 'epoch': 0.45}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942
[2024-06-10 14:05:38,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.74 | bwd_microstep: 875.65 | bwd_inner_microstep: 875.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 14:05:40,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.37 | bwd_microstep: 1275.81 | bwd_inner_microstep: 1275.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4285
[2024-06-10 14:05:42,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.77 | bwd_microstep: 1667.55 | bwd_inner_microstep: 1667.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-10 14:05:44,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.66 | bwd_microstep: 1643.96 | bwd_inner_microstep: 1643.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 14:05:46,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.85 | bwd_microstep: 1281.37 | bwd_inner_microstep: 1281.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 14:05:47,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.99 | bwd_microstep: 790.38 | bwd_inner_microstep: 790.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 14:05:49,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.39 | bwd_microstep: 1183.97 | bwd_inner_microstep: 1183.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2902
[2024-06-10 14:05:50,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.55 | bwd_microstep: 1188.08 | bwd_inner_microstep: 1188.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 14:05:52,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.37 | bwd_microstep: 1247.09 | bwd_inner_microstep: 1247.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 14:05:54,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.12 | bwd_microstep: 1160.44 | bwd_inner_microstep: 1160.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 14:05:56,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.43 | bwd_microstep: 1518.98 | bwd_inner_microstep: 1518.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 14:05:58,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.22 | bwd_microstep: 1379.49 | bwd_inner_microstep: 1379.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3724
[2024-06-10 14:06:00,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.39 | bwd_microstep: 1657.56 | bwd_inner_microstep: 1657.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3656
[2024-06-10 14:06:02,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.36 | bwd_microstep: 1613.20 | bwd_inner_microstep: 1613.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 14:06:04,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.04 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845
[2024-06-10 14:06:06,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.46 | bwd_microstep: 1666.02 | bwd_inner_microstep: 1665.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3669
[2024-06-10 14:06:08,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1323.95 | bwd_inner_microstep: 1323.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3526
[2024-06-10 14:06:10,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.76 | bwd_microstep: 1452.88 | bwd_inner_microstep: 1452.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 14:06:12,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1554.71 | bwd_inner_microstep: 1554.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3621
[2024-06-10 14:06:14,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.93 | bwd_microstep: 1341.52 | bwd_inner_microstep: 1341.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819
[2024-06-10 14:06:16,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.14 | bwd_microstep: 1386.45 | bwd_inner_microstep: 1386.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 14:06:18,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.21 | bwd_microstep: 1505.17 | bwd_inner_microstep: 1505.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 14:06:20,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1282.52 | bwd_inner_microstep: 1282.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 14:06:22,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1436.87 | bwd_inner_microstep: 1436.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2183
[2024-06-10 14:06:23,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.20 | bwd_microstep: 891.00 | bwd_inner_microstep: 890.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 14:06:25,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.41 | bwd_microstep: 1448.82 | bwd_inner_microstep: 1448.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 14:06:27,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.36 | bwd_microstep: 1623.57 | bwd_inner_microstep: 1623.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 14:06:29,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1491.90 | bwd_inner_microstep: 1491.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3776
[2024-06-10 14:06:31,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.23 | bwd_microstep: 1342.49 | bwd_inner_microstep: 1342.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2053
[2024-06-10 14:06:32,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.37 | bwd_microstep: 751.04 | bwd_inner_microstep: 751.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3627
[2024-06-10 14:06:35,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.91 | bwd_microstep: 1709.62 | bwd_inner_microstep: 1709.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398
[2024-06-10 14:06:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-10 14:06:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.81 | bwd_microstep: 3485.25 | bwd_inner_microstep: 1553.34 | bwd_allreduce_microstep: 1931.86 | step_microstep: 37.89
[2024-06-10 14:06:39,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16231.05 | bwd: 45463.77 | bwd_inner: 43530.98 | bwd_allreduce: 1932.10 | step: 39.36
{'loss': 1.2515, 'learning_rate': 2.4062728930461345e-05, 'epoch': 0.45}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 14:06:41,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.27 | bwd_microstep: 1467.30 | bwd_inner_microstep: 1467.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 14:06:43,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.25 | bwd_microstep: 1341.54 | bwd_inner_microstep: 1341.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3393
[2024-06-10 14:06:44,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.43 | bwd_microstep: 1303.34 | bwd_inner_microstep: 1303.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 14:06:46,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1381.14 | bwd_inner_microstep: 1381.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 14:06:48,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1387.12 | bwd_inner_microstep: 1387.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 14:06:50,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.81 | bwd_microstep: 1291.53 | bwd_inner_microstep: 1291.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 14:06:52,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.76 | bwd_microstep: 1383.61 | bwd_inner_microstep: 1383.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 14:06:54,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1244.73 | bwd_inner_microstep: 1244.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2455
[2024-06-10 14:06:55,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.48 | bwd_microstep: 947.54 | bwd_inner_microstep: 947.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3419
[2024-06-10 14:06:57,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.77 | bwd_microstep: 1198.45 | bwd_inner_microstep: 1198.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 14:06:58,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1285.61 | bwd_inner_microstep: 1285.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488
[2024-06-10 14:07:00,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.13 | bwd_microstep: 1343.09 | bwd_inner_microstep: 1343.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 14:07:02,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.19 | bwd_microstep: 1481.67 | bwd_inner_microstep: 1481.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3697
[2024-06-10 14:07:05,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.61 | bwd_microstep: 1584.48 | bwd_inner_microstep: 1584.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456
[2024-06-10 14:07:07,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1544.32 | bwd_inner_microstep: 1544.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1924
[2024-06-10 14:07:08,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.73 | bwd_microstep: 819.35 | bwd_inner_microstep: 819.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1991
[2024-06-10 14:07:09,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.77 | bwd_microstep: 894.50 | bwd_inner_microstep: 894.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 14:07:11,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.93 | bwd_microstep: 1504.89 | bwd_inner_microstep: 1504.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1973
[2024-06-10 14:07:12,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.66 | bwd_microstep: 891.55 | bwd_inner_microstep: 891.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3432
[2024-06-10 14:07:14,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1185.74 | bwd_inner_microstep: 1185.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 14:07:16,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.28 | bwd_microstep: 1656.26 | bwd_inner_microstep: 1656.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-10 14:07:18,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1358.84 | bwd_inner_microstep: 1358.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 14:07:20,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.86 | bwd_microstep: 1390.83 | bwd_inner_microstep: 1390.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2922
[2024-06-10 14:07:22,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.12 | bwd_microstep: 1065.31 | bwd_inner_microstep: 1065.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2134
[2024-06-10 14:07:23,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.29 | bwd_microstep: 832.27 | bwd_inner_microstep: 832.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 14:07:25,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.44 | bwd_microstep: 1396.78 | bwd_inner_microstep: 1396.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 14:07:26,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.57 | bwd_microstep: 972.09 | bwd_inner_microstep: 972.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 14:07:28,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1557.46 | bwd_inner_microstep: 1557.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2217
[2024-06-10 14:07:29,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.66 | bwd_microstep: 862.58 | bwd_inner_microstep: 862.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433
[2024-06-10 14:07:31,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.15 | bwd_microstep: 1187.71 | bwd_inner_microstep: 1187.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 14:07:33,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.84 | bwd_microstep: 1602.48 | bwd_inner_microstep: 1602.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2097
[2024-06-10 14:07:39,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 14:07:39,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.72 | bwd_microstep: 5451.70 | bwd_inner_microstep: 1050.10 | bwd_allreduce_microstep: 4401.55 | step_microstep: 37.81
[2024-06-10 14:07:39,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15103.88 | bwd: 44815.82 | bwd_inner: 40413.37 | bwd_allreduce: 4401.78 | step: 39.25
{'loss': 1.2338, 'learning_rate': 2.4025970401773204e-05, 'epoch': 0.45}
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1925
[2024-06-10 14:07:40,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.74 | bwd_microstep: 746.88 | bwd_inner_microstep: 746.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3873
[2024-06-10 14:07:42,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.43 | bwd_microstep: 1675.74 | bwd_inner_microstep: 1675.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 14:07:44,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1447.46 | bwd_inner_microstep: 1447.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 14:07:46,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.32 | bwd_microstep: 1477.24 | bwd_inner_microstep: 1477.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 14:07:49,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.75 | bwd_microstep: 1547.23 | bwd_inner_microstep: 1547.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 14:07:50,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.91 | bwd_microstep: 1183.58 | bwd_inner_microstep: 1183.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 14:07:52,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.28 | bwd_microstep: 1529.46 | bwd_inner_microstep: 1529.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 14:07:55,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.08 | bwd_microstep: 1632.46 | bwd_inner_microstep: 1632.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2199
[2024-06-10 14:07:56,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.95 | bwd_microstep: 955.13 | bwd_inner_microstep: 955.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421
[2024-06-10 14:07:58,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.00 | bwd_microstep: 1186.93 | bwd_inner_microstep: 1186.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3684
[2024-06-10 14:08:00,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.31 | bwd_microstep: 1476.39 | bwd_inner_microstep: 1476.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2925
[2024-06-10 14:08:01,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.33 | bwd_microstep: 1158.25 | bwd_inner_microstep: 1158.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 14:08:03,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.60 | bwd_microstep: 1483.45 | bwd_inner_microstep: 1483.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 14:08:04,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.83 | bwd_microstep: 787.39 | bwd_inner_microstep: 787.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3674
[2024-06-10 14:08:06,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.88 | bwd_microstep: 1326.52 | bwd_inner_microstep: 1326.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3829
[2024-06-10 14:08:08,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.07 | bwd_microstep: 1519.16 | bwd_inner_microstep: 1519.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 14:08:10,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 1280.97 | bwd_inner_microstep: 1280.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3833
[2024-06-10 14:08:12,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.68 | bwd_microstep: 1464.45 | bwd_inner_microstep: 1464.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 14:08:14,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.58 | bwd_microstep: 1409.46 | bwd_inner_microstep: 1409.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 14:08:16,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1399.72 | bwd_inner_microstep: 1399.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942
[2024-06-10 14:08:17,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.42 | bwd_microstep: 700.06 | bwd_inner_microstep: 700.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3824
[2024-06-10 14:08:19,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.92 | bwd_microstep: 1490.42 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 14:08:21,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.48 | bwd_microstep: 1393.54 | bwd_inner_microstep: 1393.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 14:08:23,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1399.79 | bwd_inner_microstep: 1399.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-10 14:08:24,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.52 | bwd_microstep: 687.86 | bwd_inner_microstep: 687.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3599
[2024-06-10 14:08:26,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.63 | bwd_microstep: 1467.36 | bwd_inner_microstep: 1467.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2271
[2024-06-10 14:08:27,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.40 | bwd_microstep: 1000.80 | bwd_inner_microstep: 1000.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2272
[2024-06-10 14:08:28,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.98 | bwd_microstep: 908.86 | bwd_inner_microstep: 908.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2647
[2024-06-10 14:08:30,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.27 | bwd_microstep: 952.80 | bwd_inner_microstep: 952.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-10 14:08:32,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.72 | bwd_microstep: 1281.11 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 14:08:34,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1495.58 | bwd_inner_microstep: 1495.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 14:08:44,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 14:08:44,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 9923.87 | bwd_inner_microstep: 1567.50 | bwd_allreduce_microstep: 8356.31 | step_microstep: 38.00
[2024-06-10 14:08:44,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15337.76 | bwd: 49389.94 | bwd_inner: 41032.69 | bwd_allreduce: 8356.55 | step: 39.48
{'loss': 1.2427, 'learning_rate': 2.3989197693638237e-05, 'epoch': 0.45}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 14:08:46,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.05 | bwd_microstep: 1463.20 | bwd_inner_microstep: 1463.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3944
[2024-06-10 14:08:48,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.14 | bwd_microstep: 1591.81 | bwd_inner_microstep: 1591.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 14:08:50,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1441.30 | bwd_inner_microstep: 1441.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750
[2024-06-10 14:08:52,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.43 | bwd_microstep: 1439.76 | bwd_inner_microstep: 1439.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2361
[2024-06-10 14:08:54,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.67 | bwd_microstep: 1019.21 | bwd_inner_microstep: 1019.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-10 14:08:55,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.83 | bwd_microstep: 726.05 | bwd_inner_microstep: 726.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3424
[2024-06-10 14:08:56,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.35 | bwd_microstep: 1211.84 | bwd_inner_microstep: 1211.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1888
[2024-06-10 14:08:57,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.30 | bwd_microstep: 711.61 | bwd_inner_microstep: 711.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3412
[2024-06-10 14:08:59,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1292.38 | bwd_inner_microstep: 1292.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474
[2024-06-10 14:09:01,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.35 | bwd_microstep: 1410.18 | bwd_inner_microstep: 1410.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3698
[2024-06-10 14:09:03,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1327.67 | bwd_inner_microstep: 1327.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3410
[2024-06-10 14:09:05,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.55 | bwd_microstep: 1403.71 | bwd_inner_microstep: 1403.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 14:09:07,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.40 | bwd_microstep: 1343.83 | bwd_inner_microstep: 1343.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676
[2024-06-10 14:09:09,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.29 | bwd_microstep: 1716.85 | bwd_inner_microstep: 1716.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3644
[2024-06-10 14:09:11,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.86 | bwd_microstep: 1704.24 | bwd_inner_microstep: 1704.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 14:09:13,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.64 | bwd_microstep: 1486.00 | bwd_inner_microstep: 1485.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-10 14:09:16,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.76 | bwd_microstep: 1524.22 | bwd_inner_microstep: 1524.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 14:09:17,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1388.80 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 14:09:19,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.25 | bwd_microstep: 797.34 | bwd_inner_microstep: 797.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 14:09:20,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.05 | bwd_microstep: 1284.65 | bwd_inner_microstep: 1284.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 14:09:22,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.92 | bwd_microstep: 1279.27 | bwd_inner_microstep: 1279.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3737
[2024-06-10 14:09:24,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1430.37 | bwd_inner_microstep: 1430.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-10 14:09:26,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.08 | bwd_microstep: 1184.72 | bwd_inner_microstep: 1184.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 14:09:28,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1401.28 | bwd_inner_microstep: 1401.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176
[2024-06-10 14:09:29,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.11 | bwd_microstep: 856.46 | bwd_inner_microstep: 856.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3459
[2024-06-10 14:09:31,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.27 | bwd_microstep: 1243.57 | bwd_inner_microstep: 1243.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451
[2024-06-10 14:09:32,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.84 | bwd_microstep: 1191.78 | bwd_inner_microstep: 1191.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 14:09:34,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1495.35 | bwd_inner_microstep: 1495.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425
[2024-06-10 14:09:36,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1413.66 | bwd_inner_microstep: 1413.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 14:09:38,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.01 | bwd_microstep: 1445.43 | bwd_inner_microstep: 1445.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3579
[2024-06-10 14:09:40,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.54 | bwd_microstep: 1527.14 | bwd_inner_microstep: 1527.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 14:09:45,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 14:09:45,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 4583.76 | bwd_inner_microstep: 1405.69 | bwd_allreduce_microstep: 3178.01 | step_microstep: 38.40
[2024-06-10 14:09:46,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15744.99 | bwd: 45337.46 | bwd_inner: 42158.51 | bwd_allreduce: 3178.24 | step: 39.93
{'loss': 1.2765, 'learning_rate': 2.395241093556974e-05, 'epoch': 0.45}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3459
[2024-06-10 14:09:47,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1402.46 | bwd_inner_microstep: 1402.30 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3877
[2024-06-10 14:09:50,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.30 | bwd_microstep: 1678.97 | bwd_inner_microstep: 1678.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4277
[2024-06-10 14:09:52,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.13 | bwd_microstep: 1766.74 | bwd_inner_microstep: 1766.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3849
[2024-06-10 14:09:54,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.03 | bwd_microstep: 1463.11 | bwd_inner_microstep: 1463.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 14:09:56,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1244.11 | bwd_inner_microstep: 1244.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 14:09:58,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.23 | bwd_microstep: 1640.09 | bwd_inner_microstep: 1640.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 14:10:00,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.19 | bwd_microstep: 1637.84 | bwd_inner_microstep: 1637.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3756
[2024-06-10 14:10:02,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.82 | bwd_microstep: 1464.47 | bwd_inner_microstep: 1464.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 14:10:04,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1251.77 | bwd_inner_microstep: 1251.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3844
[2024-06-10 14:10:07,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 679.55 | bwd_microstep: 1866.67 | bwd_inner_microstep: 1866.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3432
[2024-06-10 14:10:09,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.92 | bwd_microstep: 1311.20 | bwd_inner_microstep: 1311.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1123
[2024-06-10 14:10:09,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 180.22 | bwd_microstep: 473.01 | bwd_inner_microstep: 472.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 14:10:11,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1343.54 | bwd_inner_microstep: 1343.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 14:10:13,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1389.95 | bwd_inner_microstep: 1389.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3699
[2024-06-10 14:10:15,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.14 | bwd_microstep: 1619.50 | bwd_inner_microstep: 1619.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3512
[2024-06-10 14:10:17,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.15 | bwd_microstep: 1555.60 | bwd_inner_microstep: 1555.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3640
[2024-06-10 14:10:19,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.72 | bwd_microstep: 1374.93 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 14:10:21,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.86 | bwd_microstep: 1436.61 | bwd_inner_microstep: 1436.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3606
[2024-06-10 14:10:23,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1341.12 | bwd_inner_microstep: 1341.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 14:10:25,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.15 | bwd_microstep: 1556.00 | bwd_inner_microstep: 1555.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2426
[2024-06-10 14:10:27,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.82 | bwd_microstep: 943.28 | bwd_inner_microstep: 943.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 14:10:28,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1414.29 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-10 14:10:30,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.09 | bwd_microstep: 813.62 | bwd_inner_microstep: 813.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441
[2024-06-10 14:10:31,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.38 | bwd_microstep: 1157.36 | bwd_inner_microstep: 1157.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 14:10:33,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.07 | bwd_microstep: 1437.76 | bwd_inner_microstep: 1437.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 14:10:35,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.88 | bwd_microstep: 1553.23 | bwd_inner_microstep: 1553.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-10 14:10:37,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.61 | bwd_microstep: 1447.23 | bwd_inner_microstep: 1447.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-10 14:10:39,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.07 | bwd_microstep: 1428.51 | bwd_inner_microstep: 1428.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3562
[2024-06-10 14:10:42,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.59 | bwd_microstep: 1642.09 | bwd_inner_microstep: 1642.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3580
[2024-06-10 14:10:44,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.01 | bwd_microstep: 1627.40 | bwd_inner_microstep: 1627.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805
[2024-06-10 14:10:46,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.79 | bwd_microstep: 1748.07 | bwd_inner_microstep: 1748.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3825
[2024-06-10 14:10:49,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.19 | optimizer_step: 6.63
[2024-06-10 14:10:49,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.62 | bwd_microstep: 1731.06 | bwd_inner_microstep: 1723.34 | bwd_allreduce_microstep: 7.67 | step_microstep: 37.63
[2024-06-10 14:10:49,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17018.91 | bwd: 45761.62 | bwd_inner: 45752.94 | bwd_allreduce: 7.96 | step: 39.11
 45%|████▌     | 779/1726 [13:28:13<16:00:23, 60.85s/it]
 45%|████▌     | 780/1726 [13:29:16<16:04:55, 61.20s/it]
 45%|████▌     | 781/1726 [13:30:16<15:59:24, 60.91s/it]
 45%|████▌     | 782/1726 [13:31:21<16:17:57, 62.16s/it]
 45%|████▌     | 783/1726 [13:32:22<16:13:27, 61.94s/it]
 45%|████▌     | 784/1726 [13:33:25<16:17:59, 62.29s/it]
{'loss': 1.2666, 'learning_rate': 2.3915610257130464e-05, 'epoch': 0.45}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4447
[2024-06-10 14:10:51,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.90 | bwd_microstep: 1584.08 | bwd_inner_microstep: 1584.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 14:10:53,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 14:10:54,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.06 | bwd_microstep: 1389.95 | bwd_inner_microstep: 1389.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 14:10:56,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.34 | bwd_microstep: 1447.99 | bwd_inner_microstep: 1447.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-10 14:10:59,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.14 | bwd_microstep: 1638.95 | bwd_inner_microstep: 1638.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 14:11:00,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 795.17 | bwd_inner_microstep: 795.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 14:11:02,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.80 | bwd_microstep: 1283.40 | bwd_inner_microstep: 1283.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-10 14:11:04,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.41 | bwd_microstep: 1427.20 | bwd_inner_microstep: 1427.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 14:11:05,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.43 | bwd_microstep: 793.24 | bwd_inner_microstep: 793.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 14:11:07,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.97 | bwd_microstep: 1372.37 | bwd_inner_microstep: 1372.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2135
[2024-06-10 14:11:08,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.30 | bwd_microstep: 837.65 | bwd_inner_microstep: 837.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441
[2024-06-10 14:11:09,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.99 | bwd_microstep: 1161.62 | bwd_inner_microstep: 1161.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 14:11:11,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.77 | bwd_microstep: 1474.97 | bwd_inner_microstep: 1474.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3380
[2024-06-10 14:11:13,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.06 | bwd_microstep: 1334.96 | bwd_inner_microstep: 1334.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2823
[2024-06-10 14:11:15,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.43 | bwd_microstep: 1062.47 | bwd_inner_microstep: 1062.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 14:11:16,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.28 | bwd_microstep: 1280.34 | bwd_inner_microstep: 1280.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3641
[2024-06-10 14:11:19,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.71 | bwd_microstep: 1611.25 | bwd_inner_microstep: 1611.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2135
[2024-06-10 14:11:20,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 958.29 | bwd_inner_microstep: 958.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3539
[2024-06-10 14:11:22,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1230.75 | bwd_inner_microstep: 1230.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 14:11:24,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.84 | bwd_microstep: 1658.36 | bwd_inner_microstep: 1658.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 14:11:26,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.06 | bwd_microstep: 1376.07 | bwd_inner_microstep: 1376.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3598
[2024-06-10 14:11:28,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.23 | bwd_microstep: 1343.47 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486
[2024-06-10 14:11:29,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.36 | bwd_microstep: 1187.48 | bwd_inner_microstep: 1187.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 14:11:32,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.03 | bwd_microstep: 1559.04 | bwd_inner_microstep: 1559.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 14:11:34,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.09 | bwd_microstep: 1535.68 | bwd_inner_microstep: 1535.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3568
[2024-06-10 14:11:36,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1360.27 | bwd_inner_microstep: 1360.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 14:11:37,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1345.84 | bwd_inner_microstep: 1345.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3573
[2024-06-10 14:11:39,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.50 | bwd_microstep: 1454.38 | bwd_inner_microstep: 1454.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 14:11:41,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1401.91 | bwd_inner_microstep: 1401.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 14:11:43,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.50 | bwd_microstep: 1310.78 | bwd_inner_microstep: 1310.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 14:11:45,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.40 | bwd_microstep: 1561.28 | bwd_inner_microstep: 1561.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3805
[2024-06-10 14:11:48,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 14:11:48,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.87 | bwd_microstep: 2206.49 | bwd_inner_microstep: 1833.89 | bwd_allreduce_microstep: 372.56 | step_microstep: 37.66
[2024-06-10 14:11:48,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16029.96 | bwd: 43231.96 | bwd_inner: 42858.49 | bwd_allreduce: 372.79 | step: 39.09
{'loss': 1.2458, 'learning_rate': 2.387879578793222e-05, 'epoch': 0.45}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 14:11:50,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.36 | bwd_microstep: 1491.53 | bwd_inner_microstep: 1491.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-10 14:11:52,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1312.02 | bwd_inner_microstep: 1311.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 14:11:54,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.21 | bwd_microstep: 1390.27 | bwd_inner_microstep: 1390.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804
[2024-06-10 14:11:56,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.75 | bwd_microstep: 1352.56 | bwd_inner_microstep: 1352.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 14:11:58,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1551.59 | bwd_inner_microstep: 1551.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3432
[2024-06-10 14:12:00,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.18 | bwd_microstep: 1186.30 | bwd_inner_microstep: 1186.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694
[2024-06-10 14:12:02,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.13 | bwd_microstep: 1625.91 | bwd_inner_microstep: 1625.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 14:12:04,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.03 | bwd_microstep: 1287.72 | bwd_inner_microstep: 1287.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1930
[2024-06-10 14:12:05,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.99 | bwd_microstep: 882.98 | bwd_inner_microstep: 882.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 14:12:07,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.55 | bwd_microstep: 1379.09 | bwd_inner_microstep: 1379.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 14:12:09,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.65 | bwd_microstep: 1347.62 | bwd_inner_microstep: 1347.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3494
[2024-06-10 14:12:11,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.83 | bwd_microstep: 1510.46 | bwd_inner_microstep: 1510.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520
[2024-06-10 14:12:13,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.73 | bwd_microstep: 1585.29 | bwd_inner_microstep: 1585.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 14:12:15,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.66 | bwd_microstep: 1484.82 | bwd_inner_microstep: 1484.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 14:12:17,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1345.26 | bwd_inner_microstep: 1345.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 14:12:19,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1515.48 | bwd_inner_microstep: 1515.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 14:12:21,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1397.49 | bwd_inner_microstep: 1397.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 14:12:23,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1397.37 | bwd_inner_microstep: 1397.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 14:12:25,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.23 | bwd_microstep: 1661.11 | bwd_inner_microstep: 1661.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2121
[2024-06-10 14:12:26,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.70 | bwd_microstep: 861.10 | bwd_inner_microstep: 861.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 14:12:27,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.77 | bwd_microstep: 697.36 | bwd_inner_microstep: 697.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 14:12:29,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.36 | bwd_microstep: 1402.53 | bwd_inner_microstep: 1402.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289
[2024-06-10 14:12:30,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.63 | bwd_microstep: 875.51 | bwd_inner_microstep: 875.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 14:12:33,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.08 | bwd_microstep: 1557.09 | bwd_inner_microstep: 1557.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 14:12:35,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.57 | bwd_microstep: 1609.84 | bwd_inner_microstep: 1609.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 14:12:36,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.37 | bwd_microstep: 1192.69 | bwd_inner_microstep: 1192.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 14:12:38,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.36 | bwd_microstep: 1350.58 | bwd_inner_microstep: 1350.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548
[2024-06-10 14:12:40,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.28 | bwd_microstep: 1326.74 | bwd_inner_microstep: 1326.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3521
[2024-06-10 14:12:42,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.30 | bwd_microstep: 1353.48 | bwd_inner_microstep: 1353.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3565
[2024-06-10 14:12:44,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.90 | bwd_microstep: 1444.56 | bwd_inner_microstep: 1444.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 14:12:46,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.85 | bwd_microstep: 1600.33 | bwd_inner_microstep: 1600.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3764
[2024-06-10 14:12:49,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.57
[2024-06-10 14:12:49,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.60 | bwd_microstep: 2095.80 | bwd_inner_microstep: 1576.14 | bwd_allreduce_microstep: 519.61 | step_microstep: 37.50
[2024-06-10 14:12:49,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16261.04 | bwd: 44072.51 | bwd_inner: 43552.01 | bwd_allreduce: 519.83 | step: 38.94
{'loss': 1.2548, 'learning_rate': 2.3841967657635384e-05, 'epoch': 0.46}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3455
[2024-06-10 14:12:51,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.07 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3928
[2024-06-10 14:12:53,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.00 | bwd_microstep: 1589.82 | bwd_inner_microstep: 1589.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963
[2024-06-10 14:12:54,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.92 | bwd_microstep: 824.55 | bwd_inner_microstep: 824.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 14:12:56,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.64 | bwd_microstep: 1395.10 | bwd_inner_microstep: 1395.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794
[2024-06-10 14:12:58,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.24 | bwd_microstep: 1445.58 | bwd_inner_microstep: 1445.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2906
[2024-06-10 14:13:00,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.38 | bwd_microstep: 1092.20 | bwd_inner_microstep: 1092.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 14:13:01,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1377.34 | bwd_inner_microstep: 1377.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2239
[2024-06-10 14:13:03,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.01 | bwd_microstep: 863.78 | bwd_inner_microstep: 863.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495
[2024-06-10 14:13:04,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.50 | bwd_microstep: 1188.41 | bwd_inner_microstep: 1188.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2470
[2024-06-10 14:13:06,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.69 | bwd_microstep: 1021.24 | bwd_inner_microstep: 1021.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1964
[2024-06-10 14:13:07,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.14 | bwd_microstep: 894.10 | bwd_inner_microstep: 894.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 14:13:09,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1483.13 | bwd_inner_microstep: 1483.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 14:13:11,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.24 | bwd_microstep: 1482.79 | bwd_inner_microstep: 1482.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521
[2024-06-10 14:13:13,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.14 | bwd_microstep: 1454.87 | bwd_inner_microstep: 1454.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 14:13:14,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.44 | bwd_microstep: 915.38 | bwd_inner_microstep: 915.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3639
[2024-06-10 14:13:16,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.49 | bwd_microstep: 1313.39 | bwd_inner_microstep: 1313.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654
[2024-06-10 14:13:18,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.59 | bwd_microstep: 1620.80 | bwd_inner_microstep: 1620.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 14:13:20,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1291.09 | bwd_inner_microstep: 1291.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-10 14:13:22,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.53 | bwd_microstep: 1313.58 | bwd_inner_microstep: 1313.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3636
[2024-06-10 14:13:24,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.53 | bwd_microstep: 1564.56 | bwd_inner_microstep: 1564.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 14:13:26,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.38 | bwd_microstep: 1292.12 | bwd_inner_microstep: 1292.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 14:13:28,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1259.85 | bwd_inner_microstep: 1259.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 14:13:30,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.79 | bwd_microstep: 1451.70 | bwd_inner_microstep: 1451.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 14:13:32,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.38 | bwd_microstep: 1316.02 | bwd_inner_microstep: 1316.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 14:13:34,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.81 | bwd_microstep: 1487.79 | bwd_inner_microstep: 1487.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916
[2024-06-10 14:13:35,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.79 | bwd_microstep: 779.09 | bwd_inner_microstep: 779.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 14:13:37,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.93 | bwd_microstep: 1380.36 | bwd_inner_microstep: 1380.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-10 14:13:38,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 703.65 | bwd_inner_microstep: 703.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080
[2024-06-10 14:13:39,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.49 | bwd_microstep: 910.93 | bwd_inner_microstep: 910.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 14:13:41,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1392.60 | bwd_inner_microstep: 1392.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 14:13:43,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1514.64 | bwd_inner_microstep: 1514.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3388
[2024-06-10 14:13:50,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 14:13:50,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.73 | bwd_microstep: 6649.31 | bwd_inner_microstep: 1557.09 | bwd_allreduce_microstep: 5092.17 | step_microstep: 38.07
[2024-06-10 14:13:50,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15153.83 | bwd: 45655.04 | bwd_inner: 40561.83 | bwd_allreduce: 5092.48 | step: 39.66
{'loss': 1.2866, 'learning_rate': 2.3805125995948422e-05, 'epoch': 0.46}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1870
[2024-06-10 14:13:51,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.61 | bwd_microstep: 704.10 | bwd_inner_microstep: 704.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 14:13:53,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1516.64 | bwd_inner_microstep: 1516.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 14:13:55,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.52 | bwd_microstep: 1481.78 | bwd_inner_microstep: 1481.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866
[2024-06-10 14:13:57,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.27 | bwd_microstep: 1560.61 | bwd_inner_microstep: 1560.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3853
[2024-06-10 14:14:00,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.33 | bwd_microstep: 1634.74 | bwd_inner_microstep: 1634.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831
[2024-06-10 14:14:01,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.67 | bwd_microstep: 1387.73 | bwd_inner_microstep: 1387.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3499
[2024-06-10 14:14:03,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1247.81 | bwd_inner_microstep: 1247.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 14:14:05,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.35 | bwd_microstep: 1437.43 | bwd_inner_microstep: 1437.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 14:14:07,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.11 | bwd_microstep: 1531.69 | bwd_inner_microstep: 1531.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2103
[2024-06-10 14:14:08,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.79 | bwd_microstep: 855.13 | bwd_inner_microstep: 855.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-10 14:14:09,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.71 | bwd_microstep: 703.02 | bwd_inner_microstep: 703.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3667
[2024-06-10 14:14:12,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.83 | bwd_microstep: 1566.86 | bwd_inner_microstep: 1566.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 14:14:14,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1486.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-10 14:14:16,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.48 | bwd_microstep: 1582.06 | bwd_inner_microstep: 1582.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 14:14:18,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1347.07 | bwd_inner_microstep: 1347.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2139
[2024-06-10 14:14:19,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.19 | bwd_microstep: 928.55 | bwd_inner_microstep: 928.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1940
[2024-06-10 14:14:20,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.94 | bwd_microstep: 729.80 | bwd_inner_microstep: 729.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-10 14:14:22,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.28 | bwd_microstep: 1313.04 | bwd_inner_microstep: 1313.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 14:14:23,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.18 | bwd_microstep: 812.98 | bwd_inner_microstep: 812.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3528
[2024-06-10 14:14:25,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.93 | bwd_microstep: 1323.91 | bwd_inner_microstep: 1323.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 14:14:27,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.95 | bwd_microstep: 1279.03 | bwd_inner_microstep: 1279.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2081
[2024-06-10 14:14:28,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.10 | bwd_microstep: 817.25 | bwd_inner_microstep: 817.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 14:14:29,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.15 | bwd_microstep: 1184.62 | bwd_inner_microstep: 1184.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-10 14:14:30,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.05 | bwd_microstep: 801.45 | bwd_inner_microstep: 801.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554
[2024-06-10 14:14:32,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.27 | bwd_microstep: 1300.50 | bwd_inner_microstep: 1300.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 14:14:34,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.86 | bwd_microstep: 1506.95 | bwd_inner_microstep: 1506.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3565
[2024-06-10 14:14:36,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.88 | bwd_microstep: 1427.80 | bwd_inner_microstep: 1427.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3765
[2024-06-10 14:14:38,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.27 | bwd_microstep: 1393.08 | bwd_inner_microstep: 1393.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 14:14:40,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.49 | bwd_microstep: 1470.71 | bwd_inner_microstep: 1470.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-10 14:14:41,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.02 | bwd_microstep: 877.17 | bwd_inner_microstep: 877.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3582
[2024-06-10 14:14:43,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.59 | bwd_microstep: 1206.12 | bwd_inner_microstep: 1206.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 14:14:52,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.35 | optimizer_step: 6.61
[2024-06-10 14:14:52,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.09 | bwd_microstep: 8434.68 | bwd_inner_microstep: 1581.27 | bwd_allreduce_microstep: 6853.35 | step_microstep: 38.65
[2024-06-10 14:14:52,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14962.41 | bwd: 46850.68 | bwd_inner: 39996.38 | bwd_allreduce: 6853.60 | step: 40.08
{'loss': 1.219, 'learning_rate': 2.3768270932627485e-05, 'epoch': 0.46}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 14:14:54,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.35 | bwd_microstep: 1333.07 | bwd_inner_microstep: 1333.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3973
[2024-06-10 14:14:56,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.87 | bwd_microstep: 1598.33 | bwd_inner_microstep: 1598.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 14:14:58,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.68 | bwd_microstep: 1272.63 | bwd_inner_microstep: 1272.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 14:15:00,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.68 | bwd_microstep: 1477.38 | bwd_inner_microstep: 1477.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 14:15:02,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1278.36 | bwd_inner_microstep: 1278.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 14:15:04,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.79 | bwd_microstep: 1248.95 | bwd_inner_microstep: 1248.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 14:15:05,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 1273.28 | bwd_inner_microstep: 1273.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1897
[2024-06-10 14:15:06,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.43 | bwd_microstep: 713.82 | bwd_inner_microstep: 713.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3560
[2024-06-10 14:15:08,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.94 | bwd_microstep: 1201.92 | bwd_inner_microstep: 1201.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 14:15:10,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.94 | bwd_microstep: 1285.44 | bwd_inner_microstep: 1285.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692
[2024-06-10 14:15:12,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.36 | bwd_microstep: 1525.31 | bwd_inner_microstep: 1525.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 14:15:14,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.97 | bwd_microstep: 1391.72 | bwd_inner_microstep: 1391.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 14:15:16,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1503.55 | bwd_inner_microstep: 1503.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1862
[2024-06-10 14:15:17,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.14 | bwd_microstep: 767.85 | bwd_inner_microstep: 767.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-10 14:15:19,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.64 | bwd_microstep: 1581.00 | bwd_inner_microstep: 1580.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 14:15:21,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.49 | bwd_microstep: 1516.48 | bwd_inner_microstep: 1516.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 14:15:23,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3470
[2024-06-10 14:15:25,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1409.70 | bwd_inner_microstep: 1409.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2106
[2024-06-10 14:15:26,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.16 | bwd_microstep: 820.25 | bwd_inner_microstep: 820.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2495
[2024-06-10 14:15:28,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.83 | bwd_microstep: 1056.18 | bwd_inner_microstep: 1056.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533
[2024-06-10 14:15:30,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.14 | bwd_microstep: 1328.14 | bwd_inner_microstep: 1328.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3610
[2024-06-10 14:15:32,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1534.97 | bwd_inner_microstep: 1534.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 14:15:34,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.87 | bwd_microstep: 1459.93 | bwd_inner_microstep: 1459.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2328
[2024-06-10 14:15:35,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.27 | bwd_microstep: 890.25 | bwd_inner_microstep: 890.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-10 14:15:37,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.14 | bwd_microstep: 1609.56 | bwd_inner_microstep: 1609.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 14:15:39,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.49 | bwd_microstep: 1499.64 | bwd_inner_microstep: 1499.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 14:15:41,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.40 | bwd_microstep: 1496.95 | bwd_inner_microstep: 1496.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3506
[2024-06-10 14:15:43,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.30 | bwd_microstep: 1190.49 | bwd_inner_microstep: 1190.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3578
[2024-06-10 14:15:45,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.44 | bwd_microstep: 1447.61 | bwd_inner_microstep: 1447.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 14:15:47,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.00 | bwd_microstep: 1278.12 | bwd_inner_microstep: 1278.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3609
[2024-06-10 14:15:49,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.57 | bwd_microstep: 1537.69 | bwd_inner_microstep: 1537.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3382
[2024-06-10 14:15:55,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.21 | optimizer_step: 6.57
[2024-06-10 14:15:55,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.04 | bwd_microstep: 5691.50 | bwd_inner_microstep: 1553.88 | bwd_allreduce_microstep: 4137.56 | step_microstep: 37.98
[2024-06-10 14:15:55,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15884.56 | bwd: 46701.46 | bwd_inner: 42562.99 | bwd_allreduce: 4137.79 | step: 39.44
{'loss': 1.2646, 'learning_rate': 2.3731402597475916e-05, 'epoch': 0.46}


 45%|████▌     | 784/1726 [13:33:25<16:17:59, 62.29s/it]
 45%|████▌     | 785/1726 [13:34:25<16:04:13, 61.48s/it]
 46%|████▌     | 786/1726 [13:35:26<15:59:20, 61.23s/it]
 46%|████▌     | 787/1726 [13:36:27<15:57:52, 61.21s/it]
 46%|████▌     | 788/1726 [13:37:29<16:01:14, 61.49s/it]
 46%|████▌     | 789/1726 [13:38:32<16:06:55, 61.92s/it]
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3396
[2024-06-10 14:15:57,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.70 | bwd_microstep: 1358.00 | bwd_inner_microstep: 1357.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 14:15:59,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.31 | bwd_microstep: 1275.25 | bwd_inner_microstep: 1275.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 14:16:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1501.53 | bwd_inner_microstep: 1501.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 14:16:03,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.52 | bwd_microstep: 1374.36 | bwd_inner_microstep: 1374.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 14:16:05,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.60 | bwd_microstep: 1477.38 | bwd_inner_microstep: 1477.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2580
[2024-06-10 14:16:06,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.08 | bwd_microstep: 1038.81 | bwd_inner_microstep: 1038.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 14:16:08,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.16 | bwd_microstep: 1636.46 | bwd_inner_microstep: 1636.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 14:16:10,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1246.33 | bwd_inner_microstep: 1246.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 14:16:12,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1389.22 | bwd_inner_microstep: 1389.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3430
[2024-06-10 14:16:14,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.64 | bwd_microstep: 1404.99 | bwd_inner_microstep: 1404.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3959
[2024-06-10 14:16:16,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.19 | bwd_microstep: 1790.18 | bwd_inner_microstep: 1790.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-10 14:16:19,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.45 | bwd_microstep: 1614.71 | bwd_inner_microstep: 1614.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3932
[2024-06-10 14:16:21,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.00 | bwd_microstep: 1725.13 | bwd_inner_microstep: 1725.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3682
[2024-06-10 14:16:23,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.83 | bwd_microstep: 1480.76 | bwd_inner_microstep: 1480.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 14:16:25,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.24 | bwd_microstep: 1615.43 | bwd_inner_microstep: 1615.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 14:16:27,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.61 | bwd_microstep: 1479.82 | bwd_inner_microstep: 1479.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 14:16:29,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1410.89 | bwd_inner_microstep: 1410.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 14:16:31,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.26 | bwd_microstep: 1552.77 | bwd_inner_microstep: 1552.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 14:16:33,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.84 | bwd_microstep: 1279.09 | bwd_inner_microstep: 1279.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-10 14:16:35,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.02 | bwd_microstep: 1153.59 | bwd_inner_microstep: 1153.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 14:16:37,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.40 | bwd_microstep: 1297.22 | bwd_inner_microstep: 1297.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 14:16:38,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1251.69 | bwd_inner_microstep: 1251.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851
[2024-06-10 14:16:40,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.90 | bwd_microstep: 1467.49 | bwd_inner_microstep: 1467.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 14:16:42,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1511.55 | bwd_inner_microstep: 1511.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3737
[2024-06-10 14:16:44,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.87 | bwd_microstep: 1483.54 | bwd_inner_microstep: 1483.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 14:16:46,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.05 | bwd_microstep: 1448.84 | bwd_inner_microstep: 1448.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 14:16:48,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1351.71 | bwd_inner_microstep: 1351.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1915
[2024-06-10 14:16:49,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.06 | bwd_microstep: 716.98 | bwd_inner_microstep: 716.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3593
[2024-06-10 14:16:51,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.88 | bwd_microstep: 1531.13 | bwd_inner_microstep: 1531.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3594
[2024-06-10 14:16:54,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.83 | bwd_microstep: 1528.82 | bwd_inner_microstep: 1528.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3804
[2024-06-10 14:16:56,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.24 | bwd_microstep: 1413.00 | bwd_inner_microstep: 1412.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 14:16:58,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-10 14:16:58,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1729.82 | bwd_inner_microstep: 1454.59 | bwd_allreduce_microstep: 275.17 | step_microstep: 37.73
[2024-06-10 14:16:58,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16872.44 | bwd: 45536.50 | bwd_inner: 45260.42 | bwd_allreduce: 275.40 | step: 39.24
{'loss': 1.2467, 'learning_rate': 2.369452112034379e-05, 'epoch': 0.46}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 14:17:00,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.16 | bwd_microstep: 1374.67 | bwd_inner_microstep: 1374.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-10 14:17:01,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.73 | bwd_microstep: 800.48 | bwd_inner_microstep: 800.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3385
[2024-06-10 14:17:03,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1354.03 | bwd_inner_microstep: 1354.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2262
[2024-06-10 14:17:04,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.05 | bwd_microstep: 902.11 | bwd_inner_microstep: 902.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2114
[2024-06-10 14:17:05,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.11 | bwd_microstep: 827.97 | bwd_inner_microstep: 827.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 14:17:07,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 14:17:09,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.20 | bwd_microstep: 1484.20 | bwd_inner_microstep: 1484.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4073
[2024-06-10 14:17:11,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.47 | bwd_microstep: 1629.85 | bwd_inner_microstep: 1629.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 14:17:13,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1496.34 | bwd_inner_microstep: 1496.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4033
[2024-06-10 14:17:16,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.08 | bwd_microstep: 1647.16 | bwd_inner_microstep: 1647.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 14:17:18,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.33 | bwd_microstep: 1483.75 | bwd_inner_microstep: 1483.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-10 14:17:20,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.63 | bwd_microstep: 1429.53 | bwd_inner_microstep: 1429.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3671
[2024-06-10 14:17:22,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.74 | bwd_microstep: 1596.21 | bwd_inner_microstep: 1596.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 14:17:24,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.77 | bwd_microstep: 1485.28 | bwd_inner_microstep: 1485.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3472
[2024-06-10 14:17:26,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.08 | bwd_microstep: 1641.60 | bwd_inner_microstep: 1641.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3742
[2024-06-10 14:17:28,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.34 | bwd_microstep: 1525.09 | bwd_inner_microstep: 1525.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3524
[2024-06-10 14:17:30,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.68 | bwd_microstep: 1451.92 | bwd_inner_microstep: 1451.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 14:17:32,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.38 | bwd_microstep: 1488.12 | bwd_inner_microstep: 1488.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 14:17:35,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.44 | bwd_microstep: 1656.35 | bwd_inner_microstep: 1656.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3671
[2024-06-10 14:17:37,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.20 | bwd_microstep: 1623.43 | bwd_inner_microstep: 1623.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2035
[2024-06-10 14:17:38,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.07 | bwd_microstep: 713.84 | bwd_inner_microstep: 713.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3712
[2024-06-10 14:17:40,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1331.40 | bwd_inner_microstep: 1331.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2049
[2024-06-10 14:17:41,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.60 | bwd_microstep: 849.61 | bwd_inner_microstep: 849.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3613
[2024-06-10 14:17:43,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.01 | bwd_microstep: 1363.84 | bwd_inner_microstep: 1363.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 14:17:45,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1498.86 | bwd_inner_microstep: 1498.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3438
[2024-06-10 14:17:47,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.22 | bwd_microstep: 1296.99 | bwd_inner_microstep: 1296.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3711
[2024-06-10 14:17:49,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.62 | bwd_microstep: 1596.79 | bwd_inner_microstep: 1596.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3589
[2024-06-10 14:17:51,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.65 | bwd_microstep: 1528.50 | bwd_inner_microstep: 1528.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3679
[2024-06-10 14:17:53,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1550.80 | bwd_inner_microstep: 1550.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3569
[2024-06-10 14:17:55,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.22 | bwd_microstep: 1797.43 | bwd_inner_microstep: 1797.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 14:17:58,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1500.47 | bwd_inner_microstep: 1500.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 14:18:00,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.14 | optimizer_step: 6.60
[2024-06-10 14:18:00,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.88 | bwd_microstep: 1754.93 | bwd_inner_microstep: 1443.72 | bwd_allreduce_microstep: 311.17 | step_microstep: 37.61
[2024-06-10 14:18:00,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16633.79 | bwd: 45067.10 | bwd_inner: 44755.03 | bwd_allreduce: 311.39 | step: 39.14
{'loss': 1.2151, 'learning_rate': 2.3657626631127484e-05, 'epoch': 0.46}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1943
[2024-06-10 14:18:01,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.85 | bwd_microstep: 885.33 | bwd_inner_microstep: 885.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 14:18:03,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.14 | bwd_microstep: 1246.03 | bwd_inner_microstep: 1246.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3402
[2024-06-10 14:18:05,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.90 | bwd_microstep: 1307.72 | bwd_inner_microstep: 1307.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3786
[2024-06-10 14:18:07,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.18 | bwd_microstep: 1448.61 | bwd_inner_microstep: 1448.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 14:18:09,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.73 | bwd_microstep: 1379.51 | bwd_inner_microstep: 1379.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 14:18:11,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.82 | bwd_microstep: 1647.29 | bwd_inner_microstep: 1647.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 14:18:12,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.20 | bwd_microstep: 787.92 | bwd_inner_microstep: 787.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 14:18:14,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.51 | bwd_microstep: 1403.66 | bwd_inner_microstep: 1403.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3717
[2024-06-10 14:18:16,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.33 | bwd_microstep: 1392.87 | bwd_inner_microstep: 1392.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3693
[2024-06-10 14:18:17,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.14 | bwd_microstep: 1231.76 | bwd_inner_microstep: 1231.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677
[2024-06-10 14:18:20,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.08 | bwd_microstep: 1614.59 | bwd_inner_microstep: 1614.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 14:18:21,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.29 | bwd_microstep: 1287.62 | bwd_inner_microstep: 1287.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 14:18:23,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.60 | bwd_microstep: 1335.36 | bwd_inner_microstep: 1335.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 14:18:25,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.79 | bwd_microstep: 1484.19 | bwd_inner_microstep: 1484.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 14:18:27,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1454.79 | bwd_inner_microstep: 1454.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 14:18:28,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.50 | bwd_microstep: 798.66 | bwd_inner_microstep: 798.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 14:18:31,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.76 | bwd_microstep: 1505.75 | bwd_inner_microstep: 1505.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-10 14:18:33,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.78 | bwd_microstep: 1523.86 | bwd_inner_microstep: 1523.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 14:18:35,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1505.96 | bwd_inner_microstep: 1505.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 14:18:37,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.42 | bwd_microstep: 1345.13 | bwd_inner_microstep: 1345.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 14:18:38,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.31 | bwd_microstep: 1254.16 | bwd_inner_microstep: 1254.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 14:18:40,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1384.02 | bwd_inner_microstep: 1383.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919
[2024-06-10 14:18:41,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.41 | bwd_microstep: 687.01 | bwd_inner_microstep: 686.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 14:18:43,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.08 | bwd_microstep: 1278.83 | bwd_inner_microstep: 1278.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-10 14:18:45,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1445.06 | bwd_inner_microstep: 1445.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3429
[2024-06-10 14:18:47,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.82 | bwd_microstep: 1281.92 | bwd_inner_microstep: 1281.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 14:18:49,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1412.80 | bwd_inner_microstep: 1412.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030
[2024-06-10 14:18:50,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 904.23 | bwd_inner_microstep: 904.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 14:18:52,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.01 | bwd_microstep: 1551.64 | bwd_inner_microstep: 1551.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 14:18:54,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1346.46 | bwd_inner_microstep: 1346.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3604
[2024-06-10 14:18:56,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.30 | bwd_microstep: 1635.14 | bwd_inner_microstep: 1635.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584
[2024-06-10 14:19:00,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.29 | optimizer_step: 6.61
[2024-06-10 14:19:00,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.70 | bwd_microstep: 3621.28 | bwd_inner_microstep: 1724.47 | bwd_allreduce_microstep: 1896.75 | step_microstep: 38.46
[2024-06-10 14:19:00,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15814.72 | bwd: 44389.17 | bwd_inner: 42491.48 | bwd_allreduce: 1896.98 | step: 40.03
{'loss': 1.2548, 'learning_rate': 2.3620719259769204e-05, 'epoch': 0.46}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3608
[2024-06-10 14:19:02,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1307.50 | bwd_inner_microstep: 1307.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 4263
[2024-06-10 14:19:04,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.28 | bwd_microstep: 1330.29 | bwd_inner_microstep: 1330.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3949
[2024-06-10 14:19:06,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1395.40 | bwd_inner_microstep: 1395.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 14:19:07,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.16 | bwd_microstep: 793.19 | bwd_inner_microstep: 793.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3877
[2024-06-10 14:19:09,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.01 | bwd_microstep: 1681.99 | bwd_inner_microstep: 1681.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3775
[2024-06-10 14:19:11,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1501.65 | bwd_inner_microstep: 1501.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 14:19:13,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1390.09 | bwd_inner_microstep: 1390.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 14:19:15,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1248.26 | bwd_inner_microstep: 1248.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 14:19:17,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1295.68 | bwd_inner_microstep: 1295.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4073
[2024-06-10 14:19:19,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.84 | bwd_microstep: 1554.64 | bwd_inner_microstep: 1554.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 14:19:21,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.36 | bwd_microstep: 1650.70 | bwd_inner_microstep: 1650.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 14:19:23,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.33 | bwd_microstep: 1417.35 | bwd_inner_microstep: 1417.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3709
[2024-06-10 14:19:26,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.83 | bwd_microstep: 1688.03 | bwd_inner_microstep: 1688.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 14:19:28,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.71 | bwd_microstep: 1519.21 | bwd_inner_microstep: 1519.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2069
[2024-06-10 14:19:29,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.48 | bwd_microstep: 1009.97 | bwd_inner_microstep: 1009.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3658
[2024-06-10 14:19:31,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.64 | bwd_microstep: 1714.18 | bwd_inner_microstep: 1714.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1937
[2024-06-10 14:19:33,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.41 | bwd_microstep: 818.31 | bwd_inner_microstep: 818.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692
[2024-06-10 14:19:35,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.33 | bwd_microstep: 1421.41 | bwd_inner_microstep: 1421.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 14:19:37,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.26 | bwd_microstep: 1513.32 | bwd_inner_microstep: 1513.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 14:19:38,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.85 | bwd_microstep: 1287.66 | bwd_inner_microstep: 1287.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3835
[2024-06-10 14:19:40,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.28 | bwd_microstep: 1478.92 | bwd_inner_microstep: 1478.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 14:19:42,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.35 | bwd_microstep: 801.05 | bwd_inner_microstep: 801.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2123
[2024-06-10 14:19:43,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.74 | bwd_microstep: 827.80 | bwd_inner_microstep: 827.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 14:19:45,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.01 | bwd_microstep: 1397.87 | bwd_inner_microstep: 1397.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 14:19:46,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1275.41 | bwd_inner_microstep: 1275.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 14:19:49,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.35 | bwd_microstep: 1633.89 | bwd_inner_microstep: 1633.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 14:19:51,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1407.34 | bwd_inner_microstep: 1407.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3824
[2024-06-10 14:19:53,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.29 | bwd_microstep: 1619.22 | bwd_inner_microstep: 1619.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-10 14:19:55,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.44 | bwd_microstep: 1442.69 | bwd_inner_microstep: 1442.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457
[2024-06-10 14:19:57,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.42 | bwd_microstep: 1347.16 | bwd_inner_microstep: 1347.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3821
[2024-06-10 14:19:59,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.83 | bwd_microstep: 1609.70 | bwd_inner_microstep: 1609.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3398
[2024-06-10 14:20:01,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.05 | optimizer_step: 6.65
[2024-06-10 14:20:01,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.18 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1535.15 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.49
[2024-06-10 14:20:01,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16399.32 | bwd: 43922.76 | bwd_inner: 43914.21 | bwd_allreduce: 7.87 | step: 38.92
{'loss': 1.1892, 'learning_rate': 2.3583799136256505e-05, 'epoch': 0.46}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 14:20:03,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.46 | bwd_microstep: 1364.77 | bwd_inner_microstep: 1364.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 14:20:05,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.81 | bwd_microstep: 1455.70 | bwd_inner_microstep: 1455.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3846
[2024-06-10 14:20:07,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.11 | bwd_microstep: 1361.75 | bwd_inner_microstep: 1361.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 14:20:09,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.99 | bwd_microstep: 1375.18 | bwd_inner_microstep: 1375.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 14:20:11,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.46 | bwd_microstep: 1291.66 | bwd_inner_microstep: 1291.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751
[2024-06-10 14:20:13,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.21 | bwd_microstep: 1636.22 | bwd_inner_microstep: 1636.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 14:20:15,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.09 | bwd_microstep: 1549.51 | bwd_inner_microstep: 1549.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 14:20:17,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.06 | bwd_microstep: 1245.11 | bwd_inner_microstep: 1245.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-10 14:20:19,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.82 | bwd_microstep: 1533.03 | bwd_inner_microstep: 1533.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 14:20:20,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1249.05 | bwd_inner_microstep: 1249.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 14:20:23,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.38 | bwd_microstep: 1638.69 | bwd_inner_microstep: 1638.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 14:20:25,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.01 | bwd_microstep: 1387.35 | bwd_inner_microstep: 1387.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 14:20:27,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.99 | bwd_microstep: 1520.43 | bwd_inner_microstep: 1520.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3444
[2024-06-10 14:20:29,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1311.87 | bwd_inner_microstep: 1311.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 14:20:30,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.31 | bwd_microstep: 1280.62 | bwd_inner_microstep: 1280.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 14:20:32,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1509.68 | bwd_inner_microstep: 1509.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 14:20:34,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.57 | bwd_microstep: 1418.37 | bwd_inner_microstep: 1418.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 14:20:36,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1393.10 | bwd_inner_microstep: 1393.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 14:20:38,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.02 | bwd_microstep: 974.67 | bwd_inner_microstep: 974.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 14:20:40,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.68 | bwd_microstep: 1406.73 | bwd_inner_microstep: 1406.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3517
[2024-06-10 14:20:42,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1417.84 | bwd_inner_microstep: 1417.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-10 14:20:44,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.67 | bwd_microstep: 1625.75 | bwd_inner_microstep: 1625.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2117
[2024-06-10 14:20:45,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.89 | bwd_microstep: 765.62 | bwd_inner_microstep: 765.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 14:20:47,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.23 | bwd_microstep: 1293.08 | bwd_inner_microstep: 1293.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2356
[2024-06-10 14:20:48,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.59 | bwd_microstep: 1023.60 | bwd_inner_microstep: 1023.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 14:20:50,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.20 | bwd_microstep: 1371.94 | bwd_inner_microstep: 1371.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 14:20:52,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.22 | bwd_microstep: 1299.60 | bwd_inner_microstep: 1299.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3724
[2024-06-10 14:20:54,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.68 | bwd_microstep: 1336.16 | bwd_inner_microstep: 1336.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3486
[2024-06-10 14:20:56,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.30 | bwd_microstep: 1430.58 | bwd_inner_microstep: 1430.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 14:20:58,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.99 | bwd_microstep: 1453.94 | bwd_inner_microstep: 1453.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3556
[2024-06-10 14:21:00,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.87 | bwd_microstep: 1428.16 | bwd_inner_microstep: 1428.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2040
[2024-06-10 14:21:02,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 14:21:02,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.95 | bwd_microstep: 2296.83 | bwd_inner_microstep: 995.95 | bwd_allreduce_microstep: 1300.83 | step_microstep: 37.90
[2024-06-10 14:21:02,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16203.19 | bwd: 44646.58 | bwd_inner: 43344.86 | bwd_allreduce: 1301.05 | step: 39.36
{'loss': 1.2478, 'learning_rate': 2.3546866390621888e-05, 'epoch': 0.46}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 14:21:04,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1445.13 | bwd_inner_microstep: 1445.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 14:21:06,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1375.21 | bwd_inner_microstep: 1375.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3897
[2024-06-10 14:21:08,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.21 | bwd_microstep: 1542.57 | bwd_inner_microstep: 1542.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775
[2024-06-10 14:21:11,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.57 | bwd_microstep: 1738.80 | bwd_inner_microstep: 1738.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4092
[2024-06-10 14:21:13,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.96 | bwd_microstep: 1628.27 | bwd_inner_microstep: 1628.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 14:21:15,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1249.36 | bwd_inner_microstep: 1249.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 14:21:16,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.26 | bwd_microstep: 1341.73 | bwd_inner_microstep: 1341.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 14:21:18,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.57 | bwd_microstep: 1251.08 | bwd_inner_microstep: 1251.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 14:21:20,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.58 | bwd_microstep: 1622.67 | bwd_inner_microstep: 1622.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3491
[2024-06-10 14:21:23,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.77 | bwd_microstep: 1577.62 | bwd_inner_microstep: 1577.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 14:21:25,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1445.17 | bwd_inner_microstep: 1445.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3656
[2024-06-10 14:21:27,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.64 | bwd_microstep: 1680.73 | bwd_inner_microstep: 1680.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3504
[2024-06-10 14:21:29,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.25 | bwd_microstep: 1536.48 | bwd_inner_microstep: 1536.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3519
[2024-06-10 14:21:31,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.74 | bwd_microstep: 1684.08 | bwd_inner_microstep: 1684.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 2699
[2024-06-10 14:21:33,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.10 | bwd_microstep: 1258.31 | bwd_inner_microstep: 1258.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 14:21:35,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1382.25 | bwd_inner_microstep: 1382.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2080
[2024-06-10 14:21:36,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.18 | bwd_microstep: 848.90 | bwd_inner_microstep: 848.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2283
[2024-06-10 14:21:37,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.47 | bwd_microstep: 939.10 | bwd_inner_microstep: 939.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 14:21:39,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.62 | bwd_microstep: 1289.85 | bwd_inner_microstep: 1289.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-10 14:21:41,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.06 | bwd_microstep: 1425.72 | bwd_inner_microstep: 1425.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 14:21:43,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.29 | bwd_microstep: 1318.06 | bwd_inner_microstep: 1318.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 14:21:44,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.44 | bwd_microstep: 804.24 | bwd_inner_microstep: 804.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289
[2024-06-10 14:21:45,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.71 | bwd_microstep: 883.51 | bwd_inner_microstep: 883.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 14:21:47,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1432.17 | bwd_inner_microstep: 1432.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 14:21:49,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1393.80 | bwd_inner_microstep: 1393.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4100
[2024-06-10 14:21:52,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 680.62 | bwd_microstep: 1840.92 | bwd_inner_microstep: 1840.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-10 14:21:54,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1591.50 | bwd_inner_microstep: 1591.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3593
[2024-06-10 14:21:56,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.09 | bwd_microstep: 1370.96 | bwd_inner_microstep: 1370.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1506
[2024-06-10 14:21:57,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 216.13 | bwd_microstep: 559.17 | bwd_inner_microstep: 559.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3564
[2024-06-10 14:21:59,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1423.59 | bwd_inner_microstep: 1423.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3569
[2024-06-10 14:22:01,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.76 | bwd_microstep: 1472.63 | bwd_inner_microstep: 1472.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3442
[2024-06-10 14:22:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.18 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 14:22:04,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 2199.51 | bwd_inner_microstep: 1760.41 | bwd_allreduce_microstep: 439.05 | step_microstep: 39.57
[2024-06-10 14:22:04,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16351.34 | bwd: 44553.13 | bwd_inner: 44113.13 | bwd_allreduce: 439.29 | step: 42.32
 46%|████▌     | 790/1726 [13:39:35<16:09:47, 62.17s/it]
 46%|████▌     | 791/1726 [13:40:37<16:08:10, 62.13s/it]
 46%|████▌     | 792/1726 [13:41:37<15:59:41, 61.65s/it]
 46%|████▌     | 793/1726 [13:42:38<15:54:00, 61.35s/it]
 46%|████▌     | 794/1726 [13:43:39<15:52:14, 61.30s/it]
 46%|████▌     | 795/1726 [13:44:40<15:51:02, 61.
{'loss': 1.2756, 'learning_rate': 2.3509921152942276e-05, 'epoch': 0.46}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1916
[2024-06-10 14:22:05,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.22 | bwd_microstep: 713.09 | bwd_inner_microstep: 712.96 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4105
[2024-06-10 14:22:07,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.94 | bwd_microstep: 1632.48 | bwd_inner_microstep: 1632.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 14:22:09,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.83 | bwd_microstep: 1289.16 | bwd_inner_microstep: 1289.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 14:22:10,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.53 | bwd_microstep: 1247.63 | bwd_inner_microstep: 1247.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3771
[2024-06-10 14:22:12,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.07 | bwd_microstep: 1591.79 | bwd_inner_microstep: 1591.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-10 14:22:14,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.36 | bwd_microstep: 1450.29 | bwd_inner_microstep: 1450.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 14:22:16,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.47 | bwd_microstep: 1288.27 | bwd_inner_microstep: 1288.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 14:22:18,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1251.44 | bwd_inner_microstep: 1251.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 14:22:20,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1286.29 | bwd_inner_microstep: 1286.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 14:22:22,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.16 | bwd_microstep: 1254.80 | bwd_inner_microstep: 1254.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 14:22:23,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1245.97 | bwd_inner_microstep: 1245.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3463
[2024-06-10 14:22:25,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.89 | bwd_microstep: 1340.90 | bwd_inner_microstep: 1340.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 873
[2024-06-10 14:22:26,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.38 | bwd_microstep: 365.38 | bwd_inner_microstep: 365.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 14:22:28,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.27 | bwd_microstep: 1525.57 | bwd_inner_microstep: 1525.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3518
[2024-06-10 14:22:30,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.61 | bwd_microstep: 1600.47 | bwd_inner_microstep: 1600.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 14:22:32,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1387.79 | bwd_inner_microstep: 1387.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472
[2024-06-10 14:22:34,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.61 | bwd_microstep: 1245.06 | bwd_inner_microstep: 1245.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3636
[2024-06-10 14:22:35,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1247.74 | bwd_inner_microstep: 1247.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 14:22:37,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.39 | bwd_microstep: 1559.35 | bwd_inner_microstep: 1559.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 14:22:39,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.91 | bwd_microstep: 1379.31 | bwd_inner_microstep: 1379.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 14:22:41,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.55 | bwd_microstep: 1403.44 | bwd_inner_microstep: 1403.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 627
[2024-06-10 14:22:42,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.71 | bwd_microstep: 261.65 | bwd_inner_microstep: 261.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2010
[2024-06-10 14:22:43,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.39 | bwd_microstep: 710.04 | bwd_inner_microstep: 710.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 14:22:45,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1417.72 | bwd_inner_microstep: 1417.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3587
[2024-06-10 14:22:47,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.85 | bwd_microstep: 1535.41 | bwd_inner_microstep: 1535.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 14:22:49,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.73 | bwd_microstep: 1658.32 | bwd_inner_microstep: 1658.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2224
[2024-06-10 14:22:50,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.96 | bwd_microstep: 770.94 | bwd_inner_microstep: 770.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3567
[2024-06-10 14:22:52,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 1433.88 | bwd_inner_microstep: 1433.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1086
[2024-06-10 14:22:53,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 162.32 | bwd_microstep: 417.38 | bwd_inner_microstep: 417.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2474
[2024-06-10 14:22:54,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.10 | bwd_microstep: 859.49 | bwd_inner_microstep: 859.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2266
[2024-06-10 14:22:55,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.88 | bwd_microstep: 1002.73 | bwd_inner_microstep: 1002.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3817
[2024-06-10 14:23:07,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.35 | optimizer_step: 6.63
[2024-06-10 14:23:07,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.82 | bwd_microstep: 11220.35 | bwd_inner_microstep: 1945.64 | bwd_allreduce_microstep: 9274.66 | step_microstep: 39.89
[2024-06-10 14:23:07,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14686.06 | bwd: 48594.17 | bwd_inner: 39318.51 | bwd_allreduce: 9274.93 | step: 41.53
{'loss': 1.178, 'learning_rate': 2.3472963553338614e-05, 'epoch': 0.46}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3487
[2024-06-10 14:23:09,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1328.82 | bwd_inner_microstep: 1328.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3478
[2024-06-10 14:23:11,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.98 | bwd_microstep: 1214.52 | bwd_inner_microstep: 1214.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2307
[2024-06-10 14:23:12,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.39 | bwd_microstep: 977.81 | bwd_inner_microstep: 977.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 14:23:13,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 677.48 | bwd_inner_microstep: 677.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 14:23:15,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1382.20 | bwd_inner_microstep: 1382.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 14:23:17,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.79 | bwd_microstep: 1346.55 | bwd_inner_microstep: 1346.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 14:23:18,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.71 | bwd_microstep: 788.78 | bwd_inner_microstep: 788.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 14:23:20,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.46 | bwd_microstep: 1278.14 | bwd_inner_microstep: 1278.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3597
[2024-06-10 14:23:21,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.28 | bwd_microstep: 1304.03 | bwd_inner_microstep: 1304.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-10 14:23:23,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.46 | bwd_microstep: 1311.83 | bwd_inner_microstep: 1311.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3662
[2024-06-10 14:23:25,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.11 | bwd_microstep: 1321.40 | bwd_inner_microstep: 1321.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 14:23:27,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.26 | bwd_microstep: 1253.63 | bwd_inner_microstep: 1253.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 14:23:29,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.50 | bwd_microstep: 1482.51 | bwd_inner_microstep: 1482.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3543
[2024-06-10 14:23:31,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.54 | bwd_microstep: 1534.66 | bwd_inner_microstep: 1534.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 14:23:33,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1464.45 | bwd_inner_microstep: 1464.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3537
[2024-06-10 14:23:35,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.75 | bwd_microstep: 1584.98 | bwd_inner_microstep: 1584.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 14:23:37,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1373.61 | bwd_inner_microstep: 1373.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 14:23:39,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.10 | bwd_microstep: 1156.58 | bwd_inner_microstep: 1156.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1016
[2024-06-10 14:23:39,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 152.63 | bwd_microstep: 396.17 | bwd_inner_microstep: 396.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 14:23:41,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.20 | bwd_microstep: 972.01 | bwd_inner_microstep: 971.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 14:23:43,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1458.56 | bwd_inner_microstep: 1458.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 14:23:44,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.29 | bwd_microstep: 1279.71 | bwd_inner_microstep: 1279.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843
[2024-06-10 14:23:47,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.17 | bwd_microstep: 1662.36 | bwd_inner_microstep: 1662.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 14:23:48,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.56 | bwd_microstep: 1294.01 | bwd_inner_microstep: 1293.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3235
[2024-06-10 14:23:50,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.39 | bwd_microstep: 1278.56 | bwd_inner_microstep: 1278.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 14:23:52,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1394.10 | bwd_inner_microstep: 1394.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3552
[2024-06-10 14:23:54,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1276.48 | bwd_inner_microstep: 1276.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2894
[2024-06-10 14:23:56,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.43 | bwd_microstep: 1180.29 | bwd_inner_microstep: 1180.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3780
[2024-06-10 14:23:58,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.19 | bwd_microstep: 1472.42 | bwd_inner_microstep: 1472.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3613
[2024-06-10 14:24:00,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.70 | bwd_microstep: 1432.52 | bwd_inner_microstep: 1432.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2215
[2024-06-10 14:24:01,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.33 | bwd_microstep: 862.50 | bwd_inner_microstep: 862.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719
[2024-06-10 14:24:09,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.29 | optimizer_step: 6.61
[2024-06-10 14:24:09,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.12 | bwd_microstep: 7284.38 | bwd_inner_microstep: 1856.87 | bwd_allreduce_microstep: 5427.45 | step_microstep: 39.33
[2024-06-10 14:24:09,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15196.25 | bwd: 46026.05 | bwd_inner: 40597.68 | bwd_allreduce: 5427.68 | step: 40.91
{'loss': 1.2632, 'learning_rate': 2.3435993721975365e-05, 'epoch': 0.46}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1884
[2024-06-10 14:24:10,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.11 | bwd_microstep: 765.26 | bwd_inner_microstep: 765.13 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 14:24:12,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.93 | bwd_microstep: 1273.42 | bwd_inner_microstep: 1273.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3487
[2024-06-10 14:24:13,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.98 | bwd_microstep: 1342.33 | bwd_inner_microstep: 1342.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 14:24:15,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.37 | bwd_microstep: 1146.16 | bwd_inner_microstep: 1146.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2263
[2024-06-10 14:24:16,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.44 | bwd_microstep: 969.83 | bwd_inner_microstep: 969.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3758
[2024-06-10 14:24:19,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.70 | bwd_microstep: 1637.84 | bwd_inner_microstep: 1637.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638
[2024-06-10 14:24:20,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.25 | bwd_microstep: 1313.72 | bwd_inner_microstep: 1313.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 14:24:22,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1382.28 | bwd_inner_microstep: 1382.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3754
[2024-06-10 14:24:24,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1370.78 | bwd_inner_microstep: 1370.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1875
[2024-06-10 14:24:25,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.02 | bwd_microstep: 710.39 | bwd_inner_microstep: 710.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 14:24:27,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.40 | bwd_microstep: 1396.30 | bwd_inner_microstep: 1396.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3532
[2024-06-10 14:24:29,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1324.86 | bwd_inner_microstep: 1324.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 14:24:31,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.94 | bwd_microstep: 1290.91 | bwd_inner_microstep: 1290.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3679
[2024-06-10 14:24:33,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.59 | bwd_microstep: 1718.61 | bwd_inner_microstep: 1718.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 14:24:35,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.49 | bwd_microstep: 1475.69 | bwd_inner_microstep: 1475.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3649
[2024-06-10 14:24:37,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.59 | bwd_microstep: 1352.35 | bwd_inner_microstep: 1352.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3531
[2024-06-10 14:24:39,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.59 | bwd_microstep: 1589.53 | bwd_inner_microstep: 1589.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 14:24:41,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.99 | bwd_microstep: 1341.58 | bwd_inner_microstep: 1341.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 14:24:43,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.10 | bwd_microstep: 1358.20 | bwd_inner_microstep: 1358.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 14:24:45,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.96 | bwd_microstep: 1511.76 | bwd_inner_microstep: 1511.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 14:24:47,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.46 | bwd_microstep: 1399.71 | bwd_inner_microstep: 1399.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3713
[2024-06-10 14:24:49,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.18 | bwd_microstep: 1597.04 | bwd_inner_microstep: 1597.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3660
[2024-06-10 14:24:51,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.87 | bwd_microstep: 1482.58 | bwd_inner_microstep: 1482.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3726
[2024-06-10 14:24:53,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.20 | bwd_microstep: 1565.50 | bwd_inner_microstep: 1565.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3716
[2024-06-10 14:24:55,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.77 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2461
[2024-06-10 14:24:56,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.10 | bwd_microstep: 888.62 | bwd_inner_microstep: 888.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 14:24:59,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.49 | bwd_microstep: 1504.12 | bwd_inner_microstep: 1504.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457
[2024-06-10 14:25:00,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.75 | bwd_microstep: 1343.69 | bwd_inner_microstep: 1343.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 14:25:03,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.18 | bwd_microstep: 1596.14 | bwd_inner_microstep: 1596.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 14:25:05,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.64 | bwd_microstep: 1645.71 | bwd_inner_microstep: 1645.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 14:25:07,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1508.29 | bwd_inner_microstep: 1508.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 14:25:10,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 14:25:10,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.58 | bwd_microstep: 2757.61 | bwd_inner_microstep: 1926.11 | bwd_allreduce_microstep: 831.46 | step_microstep: 40.07
[2024-06-10 14:25:10,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16351.11 | bwd: 44901.67 | bwd_inner: 44069.21 | bwd_allreduce: 831.73 | step: 41.61
{'loss': 1.1872, 'learning_rate': 2.3399011789060092e-05, 'epoch': 0.46}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 14:25:12,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.55 | bwd_microstep: 1370.71 | bwd_inner_microstep: 1370.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2932
[2024-06-10 14:25:14,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.13 | bwd_microstep: 1072.15 | bwd_inner_microstep: 1072.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 14:25:16,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.77 | bwd_microstep: 1648.59 | bwd_inner_microstep: 1648.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 14:25:18,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1280.97 | bwd_inner_microstep: 1280.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488
[2024-06-10 14:25:20,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.47 | bwd_microstep: 1412.54 | bwd_inner_microstep: 1412.32 | bwd_allreduce_microstep: 0.14 | step_microstep: 0.32
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3479
[2024-06-10 14:25:21,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.34 | bwd_microstep: 1243.09 | bwd_inner_microstep: 1243.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 14:25:23,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1278.30 | bwd_inner_microstep: 1278.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 14:25:25,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1381.88 | bwd_inner_microstep: 1381.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3487
[2024-06-10 14:25:27,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.54 | bwd_microstep: 1261.10 | bwd_inner_microstep: 1261.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 14:25:29,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.29 | bwd_microstep: 1341.83 | bwd_inner_microstep: 1341.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 14:25:31,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.82 | bwd_microstep: 1339.84 | bwd_inner_microstep: 1339.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3517
[2024-06-10 14:25:32,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.72 | bwd_microstep: 1369.38 | bwd_inner_microstep: 1369.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-10 14:25:34,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1452.91 | bwd_inner_microstep: 1452.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2423
[2024-06-10 14:25:36,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.36 | bwd_microstep: 1006.62 | bwd_inner_microstep: 1006.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995
[2024-06-10 14:25:37,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.62 | bwd_microstep: 895.57 | bwd_inner_microstep: 895.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 14:25:39,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.31 | bwd_microstep: 1478.33 | bwd_inner_microstep: 1478.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 14:25:41,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1395.35 | bwd_inner_microstep: 1395.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 14:25:42,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.81 | bwd_microstep: 788.71 | bwd_inner_microstep: 788.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 14:25:44,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.47 | bwd_microstep: 1449.66 | bwd_inner_microstep: 1449.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 14:25:46,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.43 | bwd_microstep: 1277.41 | bwd_inner_microstep: 1277.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 14:25:48,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.37 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3307
[2024-06-10 14:25:49,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.20 | bwd_microstep: 1197.03 | bwd_inner_microstep: 1197.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1978
[2024-06-10 14:25:50,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.91 | bwd_microstep: 703.11 | bwd_inner_microstep: 703.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 14:25:53,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.19 | bwd_microstep: 1659.06 | bwd_inner_microstep: 1659.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3679
[2024-06-10 14:25:55,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2541
[2024-06-10 14:25:56,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.00 | bwd_microstep: 1060.42 | bwd_inner_microstep: 1060.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1944
[2024-06-10 14:25:57,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.16 | bwd_microstep: 728.67 | bwd_inner_microstep: 728.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 14:25:59,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.98 | bwd_microstep: 1308.49 | bwd_inner_microstep: 1308.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3644
[2024-06-10 14:26:01,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.10 | bwd_microstep: 1347.63 | bwd_inner_microstep: 1347.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2332
[2024-06-10 14:26:02,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.60 | bwd_microstep: 949.81 | bwd_inner_microstep: 949.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473
[2024-06-10 14:26:04,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.91 | bwd_microstep: 1404.84 | bwd_inner_microstep: 1404.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2126
[2024-06-10 14:26:10,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.45 | optimizer_step: 6.59
[2024-06-10 14:26:10,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.80 | bwd_microstep: 5835.16 | bwd_inner_microstep: 977.09 | bwd_allreduce_microstep: 4858.00 | step_microstep: 40.31
[2024-06-10 14:26:10,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14933.09 | bwd: 44605.04 | bwd_inner: 39745.88 | bwd_allreduce: 4858.39 | step: 42.32
{'loss': 1.1484, 'learning_rate': 2.3362017884842967e-05, 'epoch': 0.46}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 14:26:12,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.38 | bwd_microstep: 1469.78 | bwd_inner_microstep: 1469.72 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3470
[2024-06-10 14:26:14,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.12 | bwd_microstep: 1239.11 | bwd_inner_microstep: 1239.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3822
[2024-06-10 14:26:16,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.20 | bwd_microstep: 1575.03 | bwd_inner_microstep: 1575.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 14:26:18,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1383.57 | bwd_inner_microstep: 1383.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 14:26:20,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.92 | bwd_microstep: 1381.04 | bwd_inner_microstep: 1381.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 14:26:22,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1289.12 | bwd_inner_microstep: 1289.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 14:26:23,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.03 | bwd_microstep: 794.65 | bwd_inner_microstep: 794.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 14:26:25,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.14 | bwd_microstep: 1531.68 | bwd_inner_microstep: 1531.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 14:26:26,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.68 | bwd_microstep: 797.42 | bwd_inner_microstep: 797.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3578
[2024-06-10 14:26:28,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.70 | bwd_microstep: 1364.36 | bwd_inner_microstep: 1364.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3683
[2024-06-10 14:26:30,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.48 | bwd_microstep: 1826.12 | bwd_inner_microstep: 1826.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-10 14:26:32,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1443.45 | bwd_inner_microstep: 1443.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456
[2024-06-10 14:26:35,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.38 | bwd_microstep: 1541.25 | bwd_inner_microstep: 1541.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 476
[2024-06-10 14:26:35,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 91.23 | bwd_microstep: 228.67 | bwd_inner_microstep: 228.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 14:26:37,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.69 | bwd_microstep: 1283.95 | bwd_inner_microstep: 1283.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 14:26:38,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.21 | bwd_microstep: 797.71 | bwd_inner_microstep: 797.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1999
[2024-06-10 14:26:39,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 898.31 | bwd_inner_microstep: 898.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3647
[2024-06-10 14:26:41,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.64 | bwd_microstep: 1478.75 | bwd_inner_microstep: 1478.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 14:26:43,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.26 | bwd_microstep: 1180.74 | bwd_inner_microstep: 1180.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-10 14:26:45,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.64 | bwd_microstep: 1610.12 | bwd_inner_microstep: 1610.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 14:26:47,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.08 | bwd_microstep: 1315.98 | bwd_inner_microstep: 1315.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294
[2024-06-10 14:26:48,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.66 | bwd_microstep: 880.39 | bwd_inner_microstep: 880.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3536
[2024-06-10 14:26:50,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.20 | bwd_microstep: 1416.81 | bwd_inner_microstep: 1416.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-10 14:26:51,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.42 | bwd_microstep: 702.44 | bwd_inner_microstep: 702.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 14:26:52,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.12 | bwd_microstep: 976.66 | bwd_inner_microstep: 976.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3466
[2024-06-10 14:26:54,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.24 | bwd_microstep: 1533.72 | bwd_inner_microstep: 1533.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 14:26:56,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.92 | bwd_microstep: 1408.49 | bwd_inner_microstep: 1408.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621
[2024-06-10 14:26:58,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.03 | bwd_microstep: 1322.89 | bwd_inner_microstep: 1322.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 14:27:00,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.40 | bwd_microstep: 1655.28 | bwd_inner_microstep: 1655.09 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 14:27:02,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.32 | bwd_microstep: 1469.98 | bwd_inner_microstep: 1469.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400
[2024-06-10 14:27:04,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.34 | bwd_microstep: 1438.24 | bwd_inner_microstep: 1438.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 14:27:13,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.44 | optimizer_step: 6.62
[2024-06-10 14:27:13,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 8281.43 | bwd_inner_microstep: 1522.27 | bwd_allreduce_microstep: 6759.09 | step_microstep: 40.21
[2024-06-10 14:27:13,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15183.49 | bwd: 47517.19 | bwd_inner: 40756.98 | bwd_allreduce: 6759.44 | step: 41.82
 46%|████▌     | 795/1726 [13:44:40<15:51:02, 61.29s/it]
 46%|████▌     | 796/1726 [13:45:44<16:00:49, 61.99s/it]
 46%|████▌     | 797/1726 [13:46:45<15:57:50, 61.86s/it]
 46%|████▌     | 798/1726 [13:47:47<15:55:37, 61.79s/it]
 46%|████▋     | 799/1726 [13:48:47<15:45:47, 61.22s/it]
 46%|████▋     | 800/1726 [13:49:50<15:53:13, 61.76s/it]
{'loss': 1.2246, 'learning_rate': 2.3325012139616333e-05, 'epoch': 0.46}
[INFO|trainer.py:2936] 2024-06-10 14:27:16,071 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800
[INFO|configuration_utils.py:473] 2024-06-10 14:27:16,093 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/config.json
[INFO|configuration_utils.py:594] 2024-06-10 14:27:16,131 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-10 14:27:23,894 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-10 14:27:23,920 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-10 14:27:23,928 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-10 14:27:23,930 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/added_tokens.json
[2024-06-10 14:27:24,192] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step800 is about to be saved!
[2024-06-10 14:27:24,204] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/mp_rank_00_model_states.pt
[2024-06-10 14:27:24,204] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/mp_rank_00_model_states.pt...
[2024-06-10 14:27:32,653] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/mp_rank_00_model_states.pt.
[2024-06-10 14:27:32,687] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-06-10 14:27:44,856] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-06-10 14:27:44,953] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-06-10 14:27:44,954] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step800 is ready now!
[INFO|trainer.py:3028] 2024-06-10 14:27:45,176 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-200] due to args.save_total_limit
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1270
[2024-06-10 14:27:46,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 205.54 | bwd_microstep: 453.65 | bwd_inner_microstep: 453.51 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 14:27:48,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.15 | bwd_microstep: 1470.00 | bwd_inner_microstep: 1469.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 14:27:50,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.84 | bwd_microstep: 1485.80 | bwd_inner_microstep: 1485.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2296
[2024-06-10 14:27:51,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.78 | bwd_microstep: 967.75 | bwd_inner_microstep: 967.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 14:27:53,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1406.48 | bwd_inner_microstep: 1406.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 14:27:55,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.92 | bwd_microstep: 1277.69 | bwd_inner_microstep: 1277.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 14:27:57,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.09 | bwd_microstep: 1283.05 | bwd_inner_microstep: 1283.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3507
[2024-06-10 14:27:59,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.24 | bwd_microstep: 1190.70 | bwd_inner_microstep: 1190.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2457
[2024-06-10 14:28:00,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.83 | bwd_microstep: 1022.84 | bwd_inner_microstep: 1022.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4006
[2024-06-10 14:28:02,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.72 | bwd_microstep: 1635.82 | bwd_inner_microstep: 1635.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2471
[2024-06-10 14:28:04,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.38 | bwd_microstep: 979.96 | bwd_inner_microstep: 979.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 14:28:05,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1344.56 | bwd_inner_microstep: 1344.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 14:28:08,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.50 | bwd_microstep: 1491.91 | bwd_inner_microstep: 1491.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-10 14:28:10,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.54 | bwd_microstep: 1605.52 | bwd_inner_microstep: 1605.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3519
[2024-06-10 14:28:12,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.16 | bwd_microstep: 1432.13 | bwd_inner_microstep: 1432.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 14:28:14,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1371.56 | bwd_inner_microstep: 1371.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100
[2024-06-10 14:28:15,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.89 | bwd_microstep: 914.80 | bwd_inner_microstep: 914.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3513
[2024-06-10 14:28:17,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.81 | bwd_microstep: 1409.09 | bwd_inner_microstep: 1409.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-10 14:28:19,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1431.26 | bwd_inner_microstep: 1431.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913
[2024-06-10 14:28:20,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.57 | bwd_microstep: 686.95 | bwd_inner_microstep: 686.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2038
[2024-06-10 14:28:21,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.51 | bwd_microstep: 868.31 | bwd_inner_microstep: 868.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 14:28:23,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.86 | bwd_microstep: 1603.33 | bwd_inner_microstep: 1603.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3505
[2024-06-10 14:28:25,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1247.74 | bwd_inner_microstep: 1247.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 14:28:27,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1411.08 | bwd_inner_microstep: 1411.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-10 14:28:28,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.91 | bwd_microstep: 1180.20 | bwd_inner_microstep: 1180.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 14:28:30,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.34 | bwd_microstep: 1357.05 | bwd_inner_microstep: 1357.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 14:28:32,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.30 | bwd_microstep: 1374.44 | bwd_inner_microstep: 1374.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 14:28:34,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1292.21 | bwd_inner_microstep: 1292.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-10 14:28:36,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.55 | bwd_microstep: 1641.72 | bwd_inner_microstep: 1641.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3761
[2024-06-10 14:28:38,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.50 | bwd_microstep: 1469.37 | bwd_inner_microstep: 1469.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3770
[2024-06-10 14:28:40,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1566.32 | bwd_inner_microstep: 1566.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3757
[2024-06-10 14:28:46,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 14:28:46,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.95 | bwd_microstep: 5030.98 | bwd_inner_microstep: 1979.37 | bwd_allreduce_microstep: 3051.56 | step_microstep: 38.09
[2024-06-10 14:28:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15542.64 | bwd: 44904.33 | bwd_inner: 41851.77 | bwd_allreduce: 3051.84 | step: 39.79
{'loss': 1.2194, 'learning_rate': 2.3287994683714222e-05, 'epoch': 0.46}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 14:28:48,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.82 | bwd_microstep: 1371.48 | bwd_inner_microstep: 1371.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 14:28:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.65 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 14:28:52,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1379.21 | bwd_inner_microstep: 1379.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907
[2024-06-10 14:28:53,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.51 | bwd_microstep: 773.87 | bwd_inner_microstep: 773.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797
[2024-06-10 14:28:55,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.46 | bwd_microstep: 1643.40 | bwd_inner_microstep: 1643.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 14:28:57,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1249.39 | bwd_inner_microstep: 1249.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 14:28:59,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1277.40 | bwd_inner_microstep: 1277.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2638
[2024-06-10 14:29:00,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.91 | bwd_microstep: 919.29 | bwd_inner_microstep: 919.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 14:29:02,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1386.02 | bwd_inner_microstep: 1385.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 14:29:04,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1286.00 | bwd_inner_microstep: 1285.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 14:29:06,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1383.34 | bwd_inner_microstep: 1383.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-10 14:29:07,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1335.98 | bwd_inner_microstep: 1335.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3660
[2024-06-10 14:29:09,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.90 | bwd_microstep: 1320.91 | bwd_inner_microstep: 1320.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 14:29:10,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.81 | bwd_microstep: 795.02 | bwd_inner_microstep: 794.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2014
[2024-06-10 14:29:11,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.82 | bwd_microstep: 835.79 | bwd_inner_microstep: 835.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-10 14:29:13,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.67 | bwd_microstep: 1423.46 | bwd_inner_microstep: 1423.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 14:29:15,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1383.68 | bwd_inner_microstep: 1383.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 14:29:17,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.15 | bwd_microstep: 1388.79 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3830
[2024-06-10 14:29:19,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.99 | bwd_inner_microstep: 1385.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 14:29:21,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1407.98 | bwd_inner_microstep: 1407.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 14:29:23,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.59 | bwd_microstep: 1609.78 | bwd_inner_microstep: 1609.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3563
[2024-06-10 14:29:25,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.01 | bwd_microstep: 1232.37 | bwd_inner_microstep: 1232.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920
[2024-06-10 14:29:26,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.89 | bwd_microstep: 686.34 | bwd_inner_microstep: 686.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 14:29:28,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.42 | bwd_microstep: 1475.15 | bwd_inner_microstep: 1475.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 14:29:30,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1553.61 | bwd_inner_microstep: 1553.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2235
[2024-06-10 14:29:31,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.73 | bwd_microstep: 864.81 | bwd_inner_microstep: 864.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1081
[2024-06-10 14:29:32,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 161.53 | bwd_microstep: 420.81 | bwd_inner_microstep: 420.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3439
[2024-06-10 14:29:34,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1310.96 | bwd_inner_microstep: 1310.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 14:29:36,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1344.99 | bwd_inner_microstep: 1344.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 14:29:38,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1380.89 | bwd_inner_microstep: 1380.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 14:29:40,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1395.36 | bwd_inner_microstep: 1395.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3819
[2024-06-10 14:29:47,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.38 | optimizer_step: 6.59
[2024-06-10 14:29:47,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.15 | bwd_microstep: 6816.39 | bwd_inner_microstep: 2042.53 | bwd_allreduce_microstep: 4773.81 | step_microstep: 38.88
[2024-06-10 14:29:47,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15175.15 | bwd: 45417.19 | bwd_inner: 40642.47 | bwd_allreduce: 4774.04 | step: 40.32
{'loss': 1.2191, 'learning_rate': 2.325096564751193e-05, 'epoch': 0.46}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-10 14:29:48,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.20 | bwd_microstep: 790.56 | bwd_inner_microstep: 790.49 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 14:29:50,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.87 | bwd_microstep: 1342.52 | bwd_inner_microstep: 1342.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 14:29:52,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.55 | bwd_microstep: 1476.94 | bwd_inner_microstep: 1476.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 14:29:54,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1479.43 | bwd_inner_microstep: 1479.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058
[2024-06-10 14:29:55,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.64 | bwd_microstep: 815.97 | bwd_inner_microstep: 815.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 14:29:57,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1295.16 | bwd_inner_microstep: 1295.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 14:29:59,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.27 | bwd_microstep: 1251.20 | bwd_inner_microstep: 1251.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 14:30:00,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.89 | bwd_microstep: 791.33 | bwd_inner_microstep: 791.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 14:30:02,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1342.39 | bwd_inner_microstep: 1342.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 14:30:04,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1397.56 | bwd_inner_microstep: 1397.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 14:30:06,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1402.33 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 14:30:07,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.84 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 14:30:09,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.71 | bwd_microstep: 1407.20 | bwd_inner_microstep: 1407.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 14:30:11,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1478.65 | bwd_inner_microstep: 1478.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3628
[2024-06-10 14:30:13,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.64 | bwd_microstep: 1398.84 | bwd_inner_microstep: 1398.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3689
[2024-06-10 14:30:15,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.74 | bwd_microstep: 1493.52 | bwd_inner_microstep: 1493.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 14:30:18,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1615.41 | bwd_inner_microstep: 1615.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 14:30:20,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.29 | bwd_microstep: 1656.78 | bwd_inner_microstep: 1656.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 14:30:22,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.53 | bwd_microstep: 1461.23 | bwd_inner_microstep: 1461.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 14:30:24,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.55 | bwd_microstep: 1458.66 | bwd_inner_microstep: 1458.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3773
[2024-06-10 14:30:26,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.27 | bwd_microstep: 1345.04 | bwd_inner_microstep: 1345.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3836
[2024-06-10 14:30:28,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.74 | bwd_microstep: 1262.47 | bwd_inner_microstep: 1262.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825
[2024-06-10 14:30:30,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.18 | bwd_microstep: 1585.29 | bwd_inner_microstep: 1585.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 14:30:32,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1511.86 | bwd_inner_microstep: 1511.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816
[2024-06-10 14:30:34,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.39 | bwd_microstep: 1615.67 | bwd_inner_microstep: 1615.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 14:30:37,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.81 | bwd_microstep: 2411.76 | bwd_inner_microstep: 2411.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 14:30:39,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.87 | bwd_microstep: 1593.38 | bwd_inner_microstep: 1593.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818
[2024-06-10 14:30:41,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.11 | bwd_microstep: 1587.52 | bwd_inner_microstep: 1587.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 14:30:44,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.15 | bwd_microstep: 1647.82 | bwd_inner_microstep: 1647.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810
[2024-06-10 14:30:46,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.82 | bwd_microstep: 1752.52 | bwd_inner_microstep: 1752.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471
[2024-06-10 14:30:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.10 | bwd_microstep: 1434.61 | bwd_inner_microstep: 1434.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-10 14:30:49,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 14:30:49,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.25 | bwd_microstep: 723.25 | bwd_inner_microstep: 709.38 | bwd_allreduce_microstep: 13.82 | step_microstep: 38.87
[2024-06-10 14:30:49,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16574.75 | bwd: 45208.33 | bwd_inner: 45193.58 | bwd_allreduce: 14.07 | step: 40.34
{'loss': 1.199, 'learning_rate': 2.3213925161425533e-05, 'epoch': 0.47}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457
[2024-06-10 14:30:51,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.56 | bwd_microstep: 1565.60 | bwd_inner_microstep: 1565.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2366
[2024-06-10 14:30:53,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.97 | bwd_microstep: 987.62 | bwd_inner_microstep: 987.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 14:30:55,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.83 | bwd_microstep: 1550.89 | bwd_inner_microstep: 1550.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-10 14:30:56,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.13 | bwd_microstep: 803.82 | bwd_inner_microstep: 803.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 14:30:58,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.35 | bwd_microstep: 1245.32 | bwd_inner_microstep: 1245.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 14:30:59,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.90 | bwd_microstep: 790.14 | bwd_inner_microstep: 790.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1883
[2024-06-10 14:31:00,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.09 | bwd_microstep: 680.81 | bwd_inner_microstep: 680.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 14:31:02,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.85 | bwd_microstep: 1386.23 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 712
[2024-06-10 14:31:02,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 115.98 | bwd_microstep: 290.04 | bwd_inner_microstep: 290.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3470
[2024-06-10 14:31:04,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.45 | bwd_microstep: 1432.00 | bwd_inner_microstep: 1431.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3508
[2024-06-10 14:31:06,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.21 | bwd_microstep: 1549.01 | bwd_inner_microstep: 1548.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3921
[2024-06-10 14:31:09,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.51 | bwd_microstep: 1740.01 | bwd_inner_microstep: 1739.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3622
[2024-06-10 14:31:11,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.94 | bwd_microstep: 1452.85 | bwd_inner_microstep: 1452.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408
[2024-06-10 14:31:13,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.16 | bwd_microstep: 1437.34 | bwd_inner_microstep: 1437.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 14:31:15,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.58 | bwd_microstep: 1487.97 | bwd_inner_microstep: 1487.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 14:31:16,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1288.38 | bwd_inner_microstep: 1288.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2296
[2024-06-10 14:31:18,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.27 | bwd_microstep: 975.38 | bwd_inner_microstep: 975.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2428
[2024-06-10 14:31:19,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.79 | bwd_microstep: 1038.75 | bwd_inner_microstep: 1038.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3618
[2024-06-10 14:31:21,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.42 | bwd_microstep: 1536.17 | bwd_inner_microstep: 1536.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-10 14:31:22,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.92 | bwd_microstep: 798.09 | bwd_inner_microstep: 798.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 14:31:24,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1556.39 | bwd_inner_microstep: 1556.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 14:31:26,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1405.48 | bwd_inner_microstep: 1405.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 14:31:28,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1280.61 | bwd_inner_microstep: 1280.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2005
[2024-06-10 14:31:29,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.80 | bwd_microstep: 861.64 | bwd_inner_microstep: 861.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 14:31:31,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1347.63 | bwd_inner_microstep: 1347.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 14:31:34,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.23 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466
[2024-06-10 14:31:35,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.63 | bwd_microstep: 1398.48 | bwd_inner_microstep: 1398.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417
[2024-06-10 14:31:37,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.49 | bwd_microstep: 1372.71 | bwd_inner_microstep: 1372.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3461
[2024-06-10 14:31:39,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.85 | bwd_microstep: 1520.90 | bwd_inner_microstep: 1520.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753
[2024-06-10 14:31:42,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.16 | bwd_microstep: 1635.09 | bwd_inner_microstep: 1635.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3487
[2024-06-10 14:31:44,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.64 | bwd_microstep: 1442.81 | bwd_inner_microstep: 1442.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3773
[2024-06-10 14:31:52,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.33 | optimizer_step: 6.62
[2024-06-10 14:31:52,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.04 | bwd_microstep: 7282.92 | bwd_inner_microstep: 2293.44 | bwd_allreduce_microstep: 4989.41 | step_microstep: 38.88
[2024-06-10 14:31:52,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15382.96 | bwd: 46787.77 | bwd_inner: 41797.43 | bwd_allreduce: 4989.64 | step: 40.39
{'loss': 1.2867, 'learning_rate': 2.3176873355911414e-05, 'epoch': 0.47}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 14:31:53,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.73 | bwd_microstep: 1235.14 | bwd_inner_microstep: 1235.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 14:31:54,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.75 | bwd_microstep: 777.39 | bwd_inner_microstep: 777.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 14:31:56,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.43 | bwd_microstep: 1378.16 | bwd_inner_microstep: 1378.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 14:31:58,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.13 | bwd_microstep: 1545.61 | bwd_inner_microstep: 1545.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 14:32:01,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.61 | bwd_microstep: 1489.57 | bwd_inner_microstep: 1489.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 14:32:02,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 1383.14 | bwd_inner_microstep: 1383.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 841
[2024-06-10 14:32:03,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.01 | bwd_microstep: 344.78 | bwd_inner_microstep: 344.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681
[2024-06-10 14:32:05,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.94 | bwd_microstep: 1326.38 | bwd_inner_microstep: 1326.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-10 14:32:06,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.13 | bwd_microstep: 794.25 | bwd_inner_microstep: 794.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 14:32:07,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.36 | bwd_microstep: 792.98 | bwd_inner_microstep: 792.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1917
[2024-06-10 14:32:08,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.39 | bwd_microstep: 716.64 | bwd_inner_microstep: 716.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 14:32:10,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1389.27 | bwd_inner_microstep: 1389.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-10 14:32:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.82 | bwd_microstep: 1513.89 | bwd_inner_microstep: 1513.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507
[2024-06-10 14:32:14,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.95 | bwd_microstep: 1581.94 | bwd_inner_microstep: 1581.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 14:32:16,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.88 | bwd_microstep: 1480.91 | bwd_inner_microstep: 1480.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 14:32:18,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.38 | bwd_microstep: 1352.57 | bwd_inner_microstep: 1352.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840
[2024-06-10 14:32:20,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.29 | bwd_microstep: 1654.61 | bwd_inner_microstep: 1654.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3533
[2024-06-10 14:32:22,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.19 | bwd_microstep: 1196.43 | bwd_inner_microstep: 1196.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 14:32:24,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1394.64 | bwd_inner_microstep: 1394.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-10 14:32:26,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.41 | bwd_microstep: 1357.44 | bwd_inner_microstep: 1357.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 14:32:28,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1411.79 | bwd_inner_microstep: 1411.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583
[2024-06-10 14:32:29,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.84 | bwd_microstep: 1236.40 | bwd_inner_microstep: 1236.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3657
[2024-06-10 14:32:32,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.27 | bwd_microstep: 1482.83 | bwd_inner_microstep: 1482.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 14:32:33,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.64 | bwd_microstep: 1285.81 | bwd_inner_microstep: 1285.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3816
[2024-06-10 14:32:36,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1630.28 | bwd_inner_microstep: 1630.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 14:32:38,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.40 | bwd_microstep: 1467.01 | bwd_inner_microstep: 1466.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 14:32:40,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.48 | bwd_microstep: 1655.21 | bwd_inner_microstep: 1655.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1963
[2024-06-10 14:32:41,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.41 | bwd_microstep: 704.82 | bwd_inner_microstep: 704.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3026
[2024-06-10 14:32:43,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.06 | bwd_microstep: 1230.49 | bwd_inner_microstep: 1230.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3792
[2024-06-10 14:32:45,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.48 | bwd_microstep: 1549.42 | bwd_inner_microstep: 1549.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3756
[2024-06-10 14:32:47,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.73 | bwd_microstep: 1391.60 | bwd_inner_microstep: 1391.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390
[2024-06-10 14:32:52,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.22 | optimizer_step: 6.64
[2024-06-10 14:32:52,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.85 | bwd_microstep: 4514.54 | bwd_inner_microstep: 1406.99 | bwd_allreduce_microstep: 3107.49 | step_microstep: 37.91
[2024-06-10 14:32:52,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15375.53 | bwd: 44265.97 | bwd_inner: 41157.56 | bwd_allreduce: 3107.72 | step: 39.38
{'loss': 1.2487, 'learning_rate': 2.3139810361465854e-05, 'epoch': 0.47}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 14:32:54,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.95 | bwd_microstep: 1365.79 | bwd_inner_microstep: 1365.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2363
[2024-06-10 14:32:55,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.25 | bwd_microstep: 987.43 | bwd_inner_microstep: 987.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 14:32:57,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1383.30 | bwd_inner_microstep: 1383.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-10 14:32:59,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.45 | bwd_microstep: 1639.90 | bwd_inner_microstep: 1639.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-10 14:33:01,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.76 | bwd_microstep: 1643.58 | bwd_inner_microstep: 1643.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 14:33:03,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.81 | bwd_microstep: 1146.44 | bwd_inner_microstep: 1146.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4089
[2024-06-10 14:33:05,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.89 | bwd_microstep: 1529.84 | bwd_inner_microstep: 1529.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 14:33:07,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3910
[2024-06-10 14:33:09,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.83 | bwd_microstep: 1591.78 | bwd_inner_microstep: 1591.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689
[2024-06-10 14:33:11,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.30 | bwd_microstep: 1535.06 | bwd_inner_microstep: 1535.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3502
[2024-06-10 14:33:13,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.66 | bwd_microstep: 1347.25 | bwd_inner_microstep: 1347.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 14:33:15,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1349.13 | bwd_inner_microstep: 1349.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1944
[2024-06-10 14:33:16,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.98 | bwd_microstep: 886.22 | bwd_inner_microstep: 886.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416
[2024-06-10 14:33:18,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.34 | bwd_microstep: 1313.21 | bwd_inner_microstep: 1313.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 14:33:19,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.52 | bwd_microstep: 790.51 | bwd_inner_microstep: 790.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3770
[2024-06-10 14:33:21,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.46 | bwd_microstep: 1495.40 | bwd_inner_microstep: 1495.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 14:33:23,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.63 | bwd_microstep: 1474.91 | bwd_inner_microstep: 1474.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 14:33:25,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 1587.86 | bwd_inner_microstep: 1587.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 14:33:27,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1495.13 | bwd_inner_microstep: 1495.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 14:33:29,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.55 | bwd_microstep: 973.96 | bwd_inner_microstep: 973.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898
[2024-06-10 14:33:31,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.05 | bwd_microstep: 1578.43 | bwd_inner_microstep: 1578.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 14:33:33,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1450.96 | bwd_inner_microstep: 1450.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3818
[2024-06-10 14:33:35,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.01 | bwd_microstep: 1719.76 | bwd_inner_microstep: 1719.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 14:33:37,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.68 | bwd_microstep: 1390.22 | bwd_inner_microstep: 1390.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 14:33:39,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.77 | bwd_microstep: 1405.55 | bwd_inner_microstep: 1405.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3452
[2024-06-10 14:33:41,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.78 | bwd_microstep: 1218.82 | bwd_inner_microstep: 1218.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594
[2024-06-10 14:33:42,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1308.25 | bwd_inner_microstep: 1308.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 14:33:45,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.55 | bwd_microstep: 1502.24 | bwd_inner_microstep: 1502.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 14:33:46,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1386.12 | bwd_inner_microstep: 1386.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262
[2024-06-10 14:33:48,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.65 | bwd_microstep: 972.27 | bwd_inner_microstep: 972.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 14:33:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.60 | bwd_microstep: 1639.79 | bwd_inner_microstep: 1639.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 14:33:55,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 14:33:55,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 3911.89 | bwd_inner_microstep: 1529.19 | bwd_allreduce_microstep: 2382.64 | step_microstep: 38.14
[2024-06-10 14:33:55,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16304.06 | bwd: 46270.15 | bwd_inner: 43886.59 | bwd_allreduce: 2382.88 | step: 39.56

 46%|████▋     | 801/1726 [13:51:23<18:16:09, 71.10s/it]
 46%|████▋     | 802/1726 [13:52:24<17:27:56, 68.05s/it]
 47%|████▋     | 803/1726 [13:53:26<16:59:28, 66.27s/it]
 47%|████▋     | 804/1726 [13:54:28<16:41:00, 65.14s/it]
 47%|████▋     | 805/1726 [13:55:28<16:16:04, 63.59s/it]
 47%|████▋     | 806/1726 [13:56:31<16:11:52, 63.38s/
{'loss': 1.2709, 'learning_rate': 2.310273630862453e-05, 'epoch': 0.47}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 14:33:56,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1385.84 | bwd_inner_microstep: 1385.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1858
[2024-06-10 14:33:57,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.67 | bwd_microstep: 674.01 | bwd_inner_microstep: 673.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842
[2024-06-10 14:34:00,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.72 | bwd_microstep: 1554.24 | bwd_inner_microstep: 1554.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837
[2024-06-10 14:34:02,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.95 | bwd_microstep: 1655.45 | bwd_inner_microstep: 1655.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 14:34:04,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.33 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 14:34:06,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1352.36 | bwd_inner_microstep: 1352.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 14:34:08,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1381.24 | bwd_inner_microstep: 1381.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 14:34:09,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.57 | bwd_microstep: 794.03 | bwd_inner_microstep: 794.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2151
[2024-06-10 14:34:10,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.01 | bwd_microstep: 927.93 | bwd_inner_microstep: 927.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411
[2024-06-10 14:34:12,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1310.83 | bwd_inner_microstep: 1310.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 14:34:14,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.01 | bwd_microstep: 1381.85 | bwd_inner_microstep: 1381.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 14:34:16,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 14:34:18,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.69 | bwd_microstep: 1381.08 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2175
[2024-06-10 14:34:19,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.66 | bwd_microstep: 1045.50 | bwd_inner_microstep: 1045.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3380
[2024-06-10 14:34:21,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.84 | bwd_microstep: 1303.04 | bwd_inner_microstep: 1303.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 14:34:23,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.64 | bwd_microstep: 1339.38 | bwd_inner_microstep: 1339.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 14:34:24,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1348.00 | bwd_inner_microstep: 1347.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 14:34:27,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.08 | bwd_microstep: 1531.08 | bwd_inner_microstep: 1531.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 14:34:29,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1517.75 | bwd_inner_microstep: 1517.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3661
[2024-06-10 14:34:31,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.12 | bwd_microstep: 1553.21 | bwd_inner_microstep: 1553.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 14:34:33,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1255.71 | bwd_inner_microstep: 1255.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 14:34:34,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1277.30 | bwd_inner_microstep: 1277.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 14:34:37,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.57 | bwd_microstep: 1657.80 | bwd_inner_microstep: 1657.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 14:34:39,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.95 | bwd_microstep: 1513.46 | bwd_inner_microstep: 1513.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1889
[2024-06-10 14:34:40,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.48 | bwd_microstep: 713.60 | bwd_inner_microstep: 713.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 14:34:41,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.46 | bwd_microstep: 1292.91 | bwd_inner_microstep: 1292.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-10 14:34:43,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.09 | bwd_microstep: 1300.56 | bwd_inner_microstep: 1300.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 14:34:45,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.87 | bwd_microstep: 1555.80 | bwd_inner_microstep: 1555.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 14:34:47,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1251.30 | bwd_inner_microstep: 1251.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023
[2024-06-10 14:34:48,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.77 | bwd_microstep: 807.93 | bwd_inner_microstep: 807.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3794
[2024-06-10 14:34:50,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.42 | bwd_microstep: 1516.77 | bwd_inner_microstep: 1516.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3573
[2024-06-10 14:34:55,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 14:34:55,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 3512.24 | bwd_inner_microstep: 1723.03 | bwd_allreduce_microstep: 1789.16 | step_microstep: 37.95
[2024-06-10 14:34:55,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15692.38 | bwd: 43950.03 | bwd_inner: 42159.96 | bwd_allreduce: 1789.39 | step: 39.42
{'loss': 1.2709, 'learning_rate': 2.3065651327962054e-05, 'epoch': 0.47}
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4557
[2024-06-10 14:34:57,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 714.26 | bwd_microstep: 1940.87 | bwd_inner_microstep: 1940.66 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 14:34:59,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.90 | bwd_microstep: 1379.43 | bwd_inner_microstep: 1379.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 14:35:01,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1375.15 | bwd_inner_microstep: 1375.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 14:35:03,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.82 | bwd_microstep: 1492.10 | bwd_inner_microstep: 1492.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 14:35:05,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1343.05 | bwd_inner_microstep: 1343.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3806
[2024-06-10 14:35:07,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.38 | bwd_microstep: 1417.04 | bwd_inner_microstep: 1417.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 14:35:09,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.86 | bwd_microstep: 1379.52 | bwd_inner_microstep: 1379.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 14:35:11,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1481.76 | bwd_inner_microstep: 1481.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582
[2024-06-10 14:35:13,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 1302.06 | bwd_inner_microstep: 1302.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 14:35:14,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1244.55 | bwd_inner_microstep: 1244.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 14:35:16,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.33 | bwd_microstep: 1414.10 | bwd_inner_microstep: 1414.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899
[2024-06-10 14:35:17,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.89 | bwd_microstep: 714.96 | bwd_inner_microstep: 714.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493
[2024-06-10 14:35:19,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.53 | bwd_microstep: 1315.16 | bwd_inner_microstep: 1315.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 14:35:21,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1345.38 | bwd_inner_microstep: 1345.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 14:35:23,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1483.17 | bwd_inner_microstep: 1483.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 14:35:25,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.18 | bwd_microstep: 1337.24 | bwd_inner_microstep: 1337.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 14:35:26,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.45 | bwd_microstep: 790.22 | bwd_inner_microstep: 790.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 14:35:28,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.08 | bwd_microstep: 1389.83 | bwd_inner_microstep: 1389.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 14:35:30,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.46 | bwd_microstep: 1387.23 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 14:35:32,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1295.41 | bwd_inner_microstep: 1295.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937
[2024-06-10 14:35:33,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.94 | bwd_microstep: 696.30 | bwd_inner_microstep: 696.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3078
[2024-06-10 14:35:34,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.97 | bwd_microstep: 1053.05 | bwd_inner_microstep: 1053.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 14:35:36,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1488.48 | bwd_inner_microstep: 1488.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 14:35:38,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.15 | bwd_microstep: 1401.95 | bwd_inner_microstep: 1401.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3664
[2024-06-10 14:35:40,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.41 | bwd_microstep: 1454.17 | bwd_inner_microstep: 1454.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547
[2024-06-10 14:35:42,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1325.46 | bwd_inner_microstep: 1325.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3378
[2024-06-10 14:35:44,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.05 | bwd_microstep: 1432.28 | bwd_inner_microstep: 1432.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 14:35:46,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1258.40 | bwd_inner_microstep: 1258.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 14:35:47,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.81 | bwd_microstep: 1321.52 | bwd_inner_microstep: 1321.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 14:35:49,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.48 | bwd_microstep: 1443.83 | bwd_inner_microstep: 1443.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 14:35:51,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1394.50 | bwd_inner_microstep: 1394.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 14:35:56,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 14:35:56,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 4459.66 | bwd_inner_microstep: 1946.25 | bwd_allreduce_microstep: 2513.35 | step_microstep: 37.96
[2024-06-10 14:35:56,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15984.17 | bwd: 45557.88 | bwd_inner: 43043.46 | bwd_allreduce: 2513.66 | step: 39.51
{'loss': 1.2509, 'learning_rate': 2.3028555550091536e-05, 'epoch': 0.47}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1929
[2024-06-10 14:35:58,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.04 | bwd_microstep: 836.93 | bwd_inner_microstep: 836.83 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 14:36:00,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.41 | bwd_microstep: 1547.41 | bwd_inner_microstep: 1547.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814
[2024-06-10 14:36:02,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.51 | bwd_microstep: 1578.36 | bwd_inner_microstep: 1578.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 14:36:04,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.42 | bwd_microstep: 1381.46 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 14:36:06,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.35 | bwd_microstep: 1336.04 | bwd_inner_microstep: 1336.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 14:36:07,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.56 | bwd_microstep: 1294.91 | bwd_inner_microstep: 1294.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 14:36:09,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1385.42 | bwd_inner_microstep: 1385.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 14:36:10,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.97 | bwd_microstep: 792.50 | bwd_inner_microstep: 792.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1953
[2024-06-10 14:36:11,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.64 | bwd_microstep: 729.51 | bwd_inner_microstep: 729.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 14:36:13,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.79 | bwd_microstep: 1243.49 | bwd_inner_microstep: 1243.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 14:36:15,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.38 | bwd_microstep: 1276.89 | bwd_inner_microstep: 1276.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2014
[2024-06-10 14:36:16,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.60 | bwd_microstep: 740.13 | bwd_inner_microstep: 740.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693
[2024-06-10 14:36:18,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.65 | bwd_microstep: 1721.88 | bwd_inner_microstep: 1721.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2193
[2024-06-10 14:36:20,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 950.81 | bwd_inner_microstep: 950.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3899
[2024-06-10 14:36:22,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1455.55 | bwd_inner_microstep: 1455.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 14:36:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1286.80 | bwd_inner_microstep: 1286.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 14:36:25,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1348.53 | bwd_inner_microstep: 1348.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2105
[2024-06-10 14:36:26,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.63 | bwd_microstep: 821.84 | bwd_inner_microstep: 821.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 14:36:28,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.70 | bwd_microstep: 1448.88 | bwd_inner_microstep: 1448.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829
[2024-06-10 14:36:30,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.15 | bwd_microstep: 1485.18 | bwd_inner_microstep: 1485.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449
[2024-06-10 14:36:32,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.31 | bwd_microstep: 1190.22 | bwd_inner_microstep: 1190.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 14:36:34,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.98 | bwd_microstep: 1353.04 | bwd_inner_microstep: 1353.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291
[2024-06-10 14:36:35,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.88 | bwd_microstep: 909.31 | bwd_inner_microstep: 909.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776
[2024-06-10 14:36:37,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1448.72 | bwd_inner_microstep: 1448.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3616
[2024-06-10 14:36:39,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.31 | bwd_microstep: 1638.86 | bwd_inner_microstep: 1638.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3599
[2024-06-10 14:36:42,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.65 | bwd_microstep: 1559.96 | bwd_inner_microstep: 1559.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379
[2024-06-10 14:36:43,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.25 | bwd_microstep: 1240.45 | bwd_inner_microstep: 1240.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3588
[2024-06-10 14:36:46,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.61 | bwd_microstep: 1767.87 | bwd_inner_microstep: 1767.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3420
[2024-06-10 14:36:48,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.81 | bwd_microstep: 1534.71 | bwd_inner_microstep: 1534.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 14:36:50,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.40 | bwd_microstep: 1487.97 | bwd_inner_microstep: 1487.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3650
[2024-06-10 14:36:52,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.77 | bwd_microstep: 1535.95 | bwd_inner_microstep: 1535.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 14:36:58,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 14:36:58,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 4986.86 | bwd_inner_microstep: 1521.34 | bwd_allreduce_microstep: 3465.47 | step_microstep: 38.04
[2024-06-10 14:36:58,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15566.70 | bwd: 45316.46 | bwd_inner: 41850.01 | bwd_allreduce: 3465.73 | step: 39.53
{'loss': 1.2591, 'learning_rate': 2.2991449105664113e-05, 'epoch': 0.47}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-10 14:37:00,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.63 | bwd_microstep: 1433.41 | bwd_inner_microstep: 1433.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 14:37:01,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.96 | bwd_microstep: 1238.78 | bwd_inner_microstep: 1238.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014
[2024-06-10 14:37:02,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.69 | bwd_microstep: 800.57 | bwd_inner_microstep: 800.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3530
[2024-06-10 14:37:04,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1355.28 | bwd_inner_microstep: 1355.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878
[2024-06-10 14:37:05,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.80 | bwd_microstep: 679.79 | bwd_inner_microstep: 679.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 957
[2024-06-10 14:37:06,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.83 | bwd_microstep: 381.22 | bwd_inner_microstep: 381.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 14:37:08,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1382.45 | bwd_inner_microstep: 1382.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3236
[2024-06-10 14:37:09,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.04 | bwd_microstep: 1178.62 | bwd_inner_microstep: 1178.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2215
[2024-06-10 14:37:11,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.10 | bwd_microstep: 939.67 | bwd_inner_microstep: 939.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 14:37:13,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1436.54 | bwd_inner_microstep: 1436.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-10 14:37:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 710.98 | bwd_inner_microstep: 710.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 14:37:15,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1279.09 | bwd_inner_microstep: 1279.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3667
[2024-06-10 14:37:18,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.92 | bwd_microstep: 1654.10 | bwd_inner_microstep: 1654.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 14:37:19,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1346.83 | bwd_inner_microstep: 1346.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3667
[2024-06-10 14:37:22,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.09 | bwd_microstep: 1716.68 | bwd_inner_microstep: 1716.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2886
[2024-06-10 14:37:23,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.60 | bwd_microstep: 1085.91 | bwd_inner_microstep: 1085.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 14:37:25,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.77 | bwd_microstep: 1245.37 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294
[2024-06-10 14:37:27,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.88 | bwd_microstep: 1069.78 | bwd_inner_microstep: 1069.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3464
[2024-06-10 14:37:28,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.24 | bwd_microstep: 1403.16 | bwd_inner_microstep: 1403.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007
[2024-06-10 14:37:30,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.20 | bwd_microstep: 901.03 | bwd_inner_microstep: 901.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2290
[2024-06-10 14:37:31,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.31 | bwd_microstep: 882.08 | bwd_inner_microstep: 882.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 14:37:33,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1509.09 | bwd_inner_microstep: 1509.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-10 14:37:35,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.84 | bwd_microstep: 1186.83 | bwd_inner_microstep: 1186.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 14:37:37,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.29 | bwd_microstep: 1558.76 | bwd_inner_microstep: 1558.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 14:37:39,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.49 | bwd_microstep: 1438.26 | bwd_inner_microstep: 1438.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 14:37:41,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.61 | bwd_microstep: 1355.53 | bwd_inner_microstep: 1355.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 14:37:43,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1393.48 | bwd_inner_microstep: 1393.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278
[2024-06-10 14:37:44,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.58 | bwd_microstep: 877.06 | bwd_inner_microstep: 877.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426
[2024-06-10 14:37:46,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.40 | bwd_microstep: 1538.74 | bwd_inner_microstep: 1538.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3569
[2024-06-10 14:37:48,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.60 | bwd_microstep: 1450.08 | bwd_inner_microstep: 1450.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 14:37:50,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.91 | bwd_microstep: 1313.81 | bwd_inner_microstep: 1313.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3987
[2024-06-10 14:37:57,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 14:37:57,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 664.26 | bwd_microstep: 6716.32 | bwd_inner_microstep: 2041.96 | bwd_allreduce_microstep: 4674.31 | step_microstep: 37.85
[2024-06-10 14:37:57,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14834.40 | bwd: 44459.33 | bwd_inner: 39784.12 | bwd_allreduce: 4674.54 | step: 39.41
{'loss': 1.2518, 'learning_rate': 2.295433212536849e-05, 'epoch': 0.47}
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 4678
[2024-06-10 14:38:00,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 755.98 | bwd_microstep: 2064.52 | bwd_inner_microstep: 2064.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3929
[2024-06-10 14:38:02,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1496.44 | bwd_inner_microstep: 1496.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839
[2024-06-10 14:38:04,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.22 | bwd_microstep: 1455.53 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 14:38:05,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.65 | bwd_microstep: 875.70 | bwd_inner_microstep: 875.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-10 14:38:06,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.87 | bwd_microstep: 679.68 | bwd_inner_microstep: 679.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 14:38:08,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1379.23 | bwd_inner_microstep: 1379.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723
[2024-06-10 14:38:10,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1428.25 | bwd_inner_microstep: 1428.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 14:38:12,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1247.69 | bwd_inner_microstep: 1247.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 14:38:14,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1244.41 | bwd_inner_microstep: 1244.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1371
[2024-06-10 14:38:14,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.89 | bwd_microstep: 522.04 | bwd_inner_microstep: 522.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3984
[2024-06-10 14:38:16,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1488.00 | bwd_inner_microstep: 1487.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651
[2024-06-10 14:38:18,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.31 | bwd_microstep: 1316.70 | bwd_inner_microstep: 1316.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 14:38:20,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.49 | bwd_microstep: 1405.16 | bwd_inner_microstep: 1405.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2187
[2024-06-10 14:38:21,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.69 | bwd_microstep: 857.82 | bwd_inner_microstep: 857.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 14:38:23,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.31 | bwd_microstep: 1473.48 | bwd_inner_microstep: 1473.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 14:38:25,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.53 | bwd_microstep: 1384.21 | bwd_inner_microstep: 1384.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3474
[2024-06-10 14:38:27,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1326.26 | bwd_inner_microstep: 1326.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 14:38:29,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1281.02 | bwd_inner_microstep: 1281.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 14:38:31,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1289.91 | bwd_inner_microstep: 1289.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2276
[2024-06-10 14:38:32,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.10 | bwd_microstep: 782.86 | bwd_inner_microstep: 782.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 14:38:34,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.52 | bwd_microstep: 1250.79 | bwd_inner_microstep: 1250.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 14:38:35,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.64 | bwd_microstep: 1394.00 | bwd_inner_microstep: 1393.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167
[2024-06-10 14:38:37,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.15 | bwd_microstep: 854.83 | bwd_inner_microstep: 854.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3539
[2024-06-10 14:38:39,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.26 | bwd_microstep: 1358.60 | bwd_inner_microstep: 1358.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4896
[2024-06-10 14:38:41,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 756.08 | bwd_microstep: 2050.02 | bwd_inner_microstep: 2050.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3427
[2024-06-10 14:38:43,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.47 | bwd_microstep: 1394.59 | bwd_inner_microstep: 1394.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 14:38:45,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1403.00 | bwd_inner_microstep: 1402.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3594
[2024-06-10 14:38:47,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.74 | bwd_microstep: 1337.18 | bwd_inner_microstep: 1337.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-10 14:38:48,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.58 | bwd_microstep: 708.70 | bwd_inner_microstep: 708.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 14:38:50,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1259.16 | bwd_inner_microstep: 1259.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-10 14:38:52,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.66 | bwd_microstep: 1591.93 | bwd_inner_microstep: 1591.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3009
[2024-06-10 14:38:58,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 14:38:58,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.69 | bwd_microstep: 5518.26 | bwd_inner_microstep: 1435.39 | bwd_allreduce_microstep: 4082.81 | step_microstep: 38.15
[2024-06-10 14:38:58,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15371.15 | bwd: 45119.98 | bwd_inner: 41036.27 | bwd_allreduce: 4083.04 | step: 39.63
{'loss': 1.2639, 'learning_rate': 2.291720473993049e-05, 'epoch': 0.47}
 47%|████▋     | 806/1726 [13:56:31<16:11:52, 63.38s/it]
 47%|████▋     | 807/1726 [13:57:31<15:55:09, 62.36s/it]
 47%|████▋     | 808/1726 [13:58:33<15:51:52, 62.21s/it]
 47%|████▋     | 809/1726 [13:59:34<15:46:17, 61.92s/it]
 47%|████▋     | 810/1726 [14:00:34<15:34:44, 61.23s/it]
 47%|████▋     | 811/1726 [14:01:35<15:31:50, 61.10s/it]
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1874
[2024-06-10 14:38:59,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.16 | bwd_microstep: 670.28 | bwd_inner_microstep: 670.21 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3907
[2024-06-10 14:39:01,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.58 | bwd_microstep: 1813.34 | bwd_inner_microstep: 1813.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 14:39:03,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.98 | bwd_microstep: 1442.44 | bwd_inner_microstep: 1442.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 14:39:06,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.97 | bwd_microstep: 1544.46 | bwd_inner_microstep: 1544.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 14:39:08,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.85 | bwd_microstep: 1473.20 | bwd_inner_microstep: 1473.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-10 14:39:09,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.60 | bwd_microstep: 787.63 | bwd_inner_microstep: 787.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2253
[2024-06-10 14:39:10,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.71 | bwd_microstep: 964.36 | bwd_inner_microstep: 964.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 14:39:12,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.11 | bwd_microstep: 1338.89 | bwd_inner_microstep: 1338.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3722
[2024-06-10 14:39:14,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1461.24 | bwd_inner_microstep: 1461.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1878
[2024-06-10 14:39:15,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.00 | bwd_microstep: 709.29 | bwd_inner_microstep: 709.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3542
[2024-06-10 14:39:17,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1230.43 | bwd_inner_microstep: 1230.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438
[2024-06-10 14:39:18,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.78 | bwd_microstep: 1157.92 | bwd_inner_microstep: 1157.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 14:39:20,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.55 | bwd_microstep: 1377.58 | bwd_inner_microstep: 1377.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3666
[2024-06-10 14:39:22,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.40 | bwd_microstep: 1599.10 | bwd_inner_microstep: 1599.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3678
[2024-06-10 14:39:25,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.71 | bwd_microstep: 1657.81 | bwd_inner_microstep: 1657.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2397
[2024-06-10 14:39:26,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.26 | bwd_microstep: 1003.89 | bwd_inner_microstep: 1003.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-10 14:39:28,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.33 | bwd_microstep: 1609.42 | bwd_inner_microstep: 1609.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 14:39:30,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.20 | bwd_microstep: 1289.44 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3821
[2024-06-10 14:39:32,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.26 | bwd_microstep: 1336.76 | bwd_inner_microstep: 1336.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 14:39:34,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1394.72 | bwd_inner_microstep: 1394.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 14:39:36,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.70 | bwd_microstep: 1553.87 | bwd_inner_microstep: 1553.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1990
[2024-06-10 14:39:37,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.43 | bwd_microstep: 708.81 | bwd_inner_microstep: 708.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3480
[2024-06-10 14:39:39,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.67 | bwd_microstep: 1247.24 | bwd_inner_microstep: 1247.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 14:39:41,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.80 | bwd_microstep: 1355.65 | bwd_inner_microstep: 1355.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 14:39:42,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.30 | bwd_microstep: 1407.32 | bwd_inner_microstep: 1407.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472
[2024-06-10 14:39:44,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.79 | bwd_microstep: 1341.45 | bwd_inner_microstep: 1341.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 14:39:46,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1372.19 | bwd_inner_microstep: 1372.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3462
[2024-06-10 14:39:48,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.47 | bwd_microstep: 1520.00 | bwd_inner_microstep: 1519.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 14:39:50,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1482.67 | bwd_inner_microstep: 1482.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3557
[2024-06-10 14:39:53,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.91 | bwd_microstep: 1585.69 | bwd_inner_microstep: 1585.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 14:39:54,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1243.07 | bwd_inner_microstep: 1243.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 14:39:59,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.63
[2024-06-10 14:39:59,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.01 | bwd_microstep: 4159.41 | bwd_inner_microstep: 1649.26 | bwd_allreduce_microstep: 2510.10 | step_microstep: 38.43
[2024-06-10 14:39:59,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15806.80 | bwd: 44839.60 | bwd_inner: 42328.55 | bwd_allreduce: 2510.35 | step: 39.89
{'loss': 1.2812, 'learning_rate': 2.2880067080112553e-05, 'epoch': 0.47}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 14:40:00,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.00 | bwd_microstep: 770.54 | bwd_inner_microstep: 770.45 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 14:40:02,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1339.58 | bwd_inner_microstep: 1339.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2231
[2024-06-10 14:40:03,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.43 | bwd_microstep: 958.09 | bwd_inner_microstep: 958.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 14:40:05,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1296.72 | bwd_inner_microstep: 1296.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 14:40:07,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1338.99 | bwd_inner_microstep: 1338.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-10 14:40:09,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1530.88 | bwd_inner_microstep: 1530.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 14:40:11,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1248.08 | bwd_inner_microstep: 1248.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1943
[2024-06-10 14:40:12,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.42 | bwd_microstep: 727.77 | bwd_inner_microstep: 727.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3481
[2024-06-10 14:40:13,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.01 | bwd_microstep: 1214.86 | bwd_inner_microstep: 1214.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 14:40:15,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.55 | bwd_microstep: 1337.98 | bwd_inner_microstep: 1337.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2891
[2024-06-10 14:40:17,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 425.99 | bwd_microstep: 1119.36 | bwd_inner_microstep: 1119.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 14:40:19,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1378.53 | bwd_inner_microstep: 1378.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 14:40:21,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.40 | bwd_microstep: 1444.38 | bwd_inner_microstep: 1444.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3631
[2024-06-10 14:40:23,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.00 | bwd_microstep: 1805.49 | bwd_inner_microstep: 1805.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 14:40:25,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1386.15 | bwd_inner_microstep: 1386.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3654
[2024-06-10 14:40:27,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.05 | bwd_microstep: 1387.28 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512
[2024-06-10 14:40:29,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.87 | bwd_microstep: 1320.27 | bwd_inner_microstep: 1320.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 14:40:31,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1288.83 | bwd_inner_microstep: 1288.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 14:40:33,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.43 | bwd_microstep: 1553.00 | bwd_inner_microstep: 1552.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1987
[2024-06-10 14:40:34,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.15 | bwd_microstep: 770.61 | bwd_inner_microstep: 770.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 14:40:35,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.80 | bwd_microstep: 1156.69 | bwd_inner_microstep: 1156.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 14:40:37,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.54 | bwd_microstep: 1456.89 | bwd_inner_microstep: 1456.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3533
[2024-06-10 14:40:39,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.76 | bwd_microstep: 1197.22 | bwd_inner_microstep: 1197.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1917
[2024-06-10 14:40:40,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.89 | bwd_microstep: 687.53 | bwd_inner_microstep: 687.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004
[2024-06-10 14:40:41,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.26 | bwd_microstep: 713.01 | bwd_inner_microstep: 712.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 14:40:43,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.90 | bwd_microstep: 1313.30 | bwd_inner_microstep: 1313.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2026
[2024-06-10 14:40:44,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.92 | bwd_microstep: 933.54 | bwd_inner_microstep: 933.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2054
[2024-06-10 14:40:46,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.93 | bwd_microstep: 957.50 | bwd_inner_microstep: 957.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3576
[2024-06-10 14:40:48,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.85 | bwd_microstep: 1529.40 | bwd_inner_microstep: 1529.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 14:40:49,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1253.04 | bwd_inner_microstep: 1253.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 14:40:51,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.95 | bwd_microstep: 1503.50 | bwd_inner_microstep: 1503.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451
[2024-06-10 14:41:02,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.27 | optimizer_step: 6.60
[2024-06-10 14:41:02,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.42 | bwd_microstep: 9596.16 | bwd_inner_microstep: 1647.12 | bwd_allreduce_microstep: 7948.96 | step_microstep: 38.53
[2024-06-10 14:41:02,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14781.30 | bwd: 47515.19 | bwd_inner: 39565.24 | bwd_allreduce: 7949.25 | step: 39.99
{'loss': 1.2868, 'learning_rate': 2.2842919276713334e-05, 'epoch': 0.47}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3419
[2024-06-10 14:41:04,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.46 | bwd_microstep: 1359.15 | bwd_inner_microstep: 1359.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898
[2024-06-10 14:41:06,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.04 | bwd_microstep: 1580.70 | bwd_inner_microstep: 1580.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1949
[2024-06-10 14:41:07,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.88 | bwd_microstep: 825.72 | bwd_inner_microstep: 825.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 14:41:09,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1382.26 | bwd_inner_microstep: 1382.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803
[2024-06-10 14:41:11,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.03 | bwd_microstep: 1449.76 | bwd_inner_microstep: 1449.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 14:41:13,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.19 | bwd_microstep: 1630.01 | bwd_inner_microstep: 1629.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 14:41:15,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1252.80 | bwd_inner_microstep: 1252.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 14:41:16,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.66 | bwd_microstep: 800.04 | bwd_inner_microstep: 800.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100
[2024-06-10 14:41:17,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.62 | bwd_microstep: 918.43 | bwd_inner_microstep: 918.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719
[2024-06-10 14:41:19,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.14 | bwd_microstep: 1621.73 | bwd_inner_microstep: 1621.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3492
[2024-06-10 14:41:21,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.40 | bwd_microstep: 1328.50 | bwd_inner_microstep: 1328.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 14:41:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1387.55 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2218
[2024-06-10 14:41:24,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.54 | bwd_microstep: 959.42 | bwd_inner_microstep: 959.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-10 14:41:26,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.63 | bwd_microstep: 1444.72 | bwd_inner_microstep: 1444.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2963
[2024-06-10 14:41:28,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.92 | bwd_microstep: 1196.88 | bwd_inner_microstep: 1196.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-10 14:41:30,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.55 | bwd_microstep: 1610.01 | bwd_inner_microstep: 1609.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677
[2024-06-10 14:41:32,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.34 | bwd_microstep: 1621.95 | bwd_inner_microstep: 1621.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 14:41:34,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1351.61 | bwd_inner_microstep: 1351.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3656
[2024-06-10 14:41:37,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.99 | bwd_microstep: 1620.43 | bwd_inner_microstep: 1620.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 14:41:39,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.38 | bwd_microstep: 1475.49 | bwd_inner_microstep: 1475.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 14:41:41,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.91 | bwd_microstep: 1479.62 | bwd_inner_microstep: 1479.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3738
[2024-06-10 14:41:43,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1372.31 | bwd_inner_microstep: 1372.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3593
[2024-06-10 14:41:45,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.02 | bwd_microstep: 1432.45 | bwd_inner_microstep: 1432.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 14:41:46,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1246.94 | bwd_inner_microstep: 1246.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3516
[2024-06-10 14:41:48,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.94 | bwd_microstep: 1509.96 | bwd_inner_microstep: 1509.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 14:41:50,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1404.31 | bwd_inner_microstep: 1404.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3813
[2024-06-10 14:41:52,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.03 | bwd_microstep: 1256.75 | bwd_inner_microstep: 1256.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 14:41:54,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.89 | bwd_microstep: 1294.73 | bwd_inner_microstep: 1294.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271
[2024-06-10 14:41:55,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.22 | bwd_microstep: 876.74 | bwd_inner_microstep: 876.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 14:41:56,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.27 | bwd_microstep: 875.75 | bwd_inner_microstep: 875.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3778
[2024-06-10 14:41:58,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.60 | bwd_microstep: 1607.71 | bwd_inner_microstep: 1607.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 14:42:03,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.17 | optimizer_step: 6.58
[2024-06-10 14:42:03,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.43 | bwd_microstep: 3609.01 | bwd_inner_microstep: 1577.34 | bwd_allreduce_microstep: 2031.62 | step_microstep: 37.72
[2024-06-10 14:42:03,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15902.87 | bwd: 44783.46 | bwd_inner: 42750.94 | bwd_allreduce: 2031.85 | step: 39.18
{'loss': 1.2105, 'learning_rate': 2.2805761460567197e-05, 'epoch': 0.47}
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2635
[2024-06-10 14:42:04,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.44 | bwd_microstep: 1008.18 | bwd_inner_microstep: 1008.10 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 14:42:06,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1379.18 | bwd_inner_microstep: 1379.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 14:42:08,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1378.69 | bwd_inner_microstep: 1378.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 14:42:10,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.07 | bwd_microstep: 1299.65 | bwd_inner_microstep: 1299.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 14:42:12,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.60 | bwd_microstep: 1385.41 | bwd_inner_microstep: 1385.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3478
[2024-06-10 14:42:14,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.64 | bwd_microstep: 1409.89 | bwd_inner_microstep: 1409.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749
[2024-06-10 14:42:16,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.22 | bwd_microstep: 1536.49 | bwd_inner_microstep: 1536.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 14:42:18,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.98 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 14:42:19,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.14 | bwd_microstep: 1298.00 | bwd_inner_microstep: 1297.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 14:42:20,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.43 | bwd_microstep: 793.26 | bwd_inner_microstep: 793.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2227
[2024-06-10 14:42:22,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.61 | bwd_microstep: 960.25 | bwd_inner_microstep: 960.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 14:42:24,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.85 | bwd_microstep: 1289.33 | bwd_inner_microstep: 1289.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3719
[2024-06-10 14:42:26,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1559.85 | bwd_inner_microstep: 1559.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 14:42:28,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.91 | bwd_microstep: 1443.43 | bwd_inner_microstep: 1443.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 14:42:30,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1502.95 | bwd_inner_microstep: 1502.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 14:42:32,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.10 | bwd_microstep: 1345.25 | bwd_inner_microstep: 1345.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3828
[2024-06-10 14:42:34,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.83 | bwd_microstep: 1858.25 | bwd_inner_microstep: 1858.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 14:42:36,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.71 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 14:42:38,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1255.30 | bwd_inner_microstep: 1255.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 14:42:40,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.72 | bwd_microstep: 1654.97 | bwd_inner_microstep: 1654.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-10 14:42:42,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.07 | bwd_microstep: 1301.74 | bwd_inner_microstep: 1301.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 14:42:44,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.09 | bwd_microstep: 1293.00 | bwd_inner_microstep: 1292.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 14:42:45,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.47 | bwd_microstep: 1296.15 | bwd_inner_microstep: 1296.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 14:42:48,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.15 | bwd_microstep: 1656.86 | bwd_inner_microstep: 1656.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 889
[2024-06-10 14:42:48,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.80 | bwd_microstep: 370.44 | bwd_inner_microstep: 370.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 14:42:50,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.95 | bwd_microstep: 1277.71 | bwd_inner_microstep: 1277.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 14:42:52,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1351.05 | bwd_inner_microstep: 1351.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3468
[2024-06-10 14:42:54,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.40 | bwd_microstep: 1434.96 | bwd_inner_microstep: 1434.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3806
[2024-06-10 14:42:56,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.50 | bwd_microstep: 1696.13 | bwd_inner_microstep: 1696.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2284
[2024-06-10 14:42:57,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.10 | bwd_microstep: 782.23 | bwd_inner_microstep: 782.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 14:42:59,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.01 | bwd_microstep: 1553.31 | bwd_inner_microstep: 1553.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-10 14:43:07,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 14:43:07,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.15 | bwd_microstep: 6454.24 | bwd_inner_microstep: 1808.64 | bwd_allreduce_microstep: 4645.55 | step_microstep: 38.39
[2024-06-10 14:43:07,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16008.29 | bwd: 47551.36 | bwd_inner: 42904.85 | bwd_allreduce: 4645.81 | step: 39.90
{'loss': 1.2262, 'learning_rate': 2.2768593762543784e-05, 'epoch': 0.47}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 14:43:08,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.87 | bwd_microstep: 1399.25 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 14:43:10,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.46 | bwd_microstep: 1344.63 | bwd_inner_microstep: 1344.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3879
[2024-06-10 14:43:13,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.60 | bwd_microstep: 1582.17 | bwd_inner_microstep: 1582.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791
[2024-06-10 14:43:15,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.87 | bwd_microstep: 1646.33 | bwd_inner_microstep: 1646.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798
[2024-06-10 14:43:17,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.65 | bwd_microstep: 1547.91 | bwd_inner_microstep: 1547.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 14:43:18,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.42 | bwd_microstep: 790.05 | bwd_inner_microstep: 790.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941
[2024-06-10 14:43:19,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.93 | bwd_microstep: 727.55 | bwd_inner_microstep: 727.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3433
[2024-06-10 14:43:21,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.72 | bwd_microstep: 1214.54 | bwd_inner_microstep: 1214.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 14:43:23,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1386.28 | bwd_inner_microstep: 1386.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 14:43:25,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.33 | bwd_microstep: 1531.04 | bwd_inner_microstep: 1531.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3443
[2024-06-10 14:43:27,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.55 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 14:43:29,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.13 | bwd_microstep: 1478.57 | bwd_inner_microstep: 1478.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3660
[2024-06-10 14:43:31,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.41 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 14:43:33,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.13 | bwd_microstep: 1406.85 | bwd_inner_microstep: 1406.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-10 14:43:35,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.85 | bwd_microstep: 1585.03 | bwd_inner_microstep: 1585.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 14:43:37,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.87 | bwd_microstep: 1326.08 | bwd_inner_microstep: 1326.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3522
[2024-06-10 14:43:38,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1322.87 | bwd_inner_microstep: 1322.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 14:43:40,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1395.29 | bwd_inner_microstep: 1395.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 14:43:42,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.92 | bwd_microstep: 1391.61 | bwd_inner_microstep: 1391.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-10 14:43:44,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.04 | bwd_microstep: 1213.32 | bwd_inner_microstep: 1213.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154
[2024-06-10 14:43:45,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.55 | bwd_microstep: 851.76 | bwd_inner_microstep: 851.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 14:43:47,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1291.26 | bwd_inner_microstep: 1291.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-10 14:43:49,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2407
[2024-06-10 14:43:50,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.62 | bwd_microstep: 842.07 | bwd_inner_microstep: 842.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 14:43:52,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.34 | bwd_microstep: 1554.39 | bwd_inner_microstep: 1554.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2269
[2024-06-10 14:43:53,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.68 | bwd_microstep: 907.15 | bwd_inner_microstep: 907.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1935
[2024-06-10 14:43:55,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.06 | bwd_microstep: 760.50 | bwd_inner_microstep: 760.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2216
[2024-06-10 14:43:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.89 | bwd_microstep: 831.86 | bwd_inner_microstep: 831.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636
[2024-06-10 14:43:58,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.37 | bwd_microstep: 1602.45 | bwd_inner_microstep: 1602.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3670
[2024-06-10 14:44:00,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.18 | bwd_microstep: 1723.53 | bwd_inner_microstep: 1723.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584
[2024-06-10 14:44:02,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.44 | bwd_microstep: 1526.58 | bwd_inner_microstep: 1526.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-10 14:44:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.18 | optimizer_step: 6.61
[2024-06-10 14:44:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.17 | bwd_microstep: 3759.90 | bwd_inner_microstep: 1918.86 | bwd_allreduce_microstep: 1840.99 | step_microstep: 37.72
[2024-06-10 14:44:07,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15742.97 | bwd: 44178.29 | bwd_inner: 42336.40 | bwd_allreduce: 1841.22 | step: 39.17
{'loss': 1.2647, 'learning_rate': 2.273141631354753e-05, 'epoch': 0.47}
 47%|████▋     | 811/1726 [14:01:35<15:31:50, 61.10s/it]
 47%|████▋     | 812/1726 [14:02:36<15:30:13, 61.07s/it]
 47%|████▋     | 813/1726 [14:03:38<15:36:17, 61.53s/it]
 47%|████▋     | 814/1726 [14:04:39<15:32:57, 61.38s/it]
 47%|████▋     | 815/1726 [14:05:43<15:43:22, 62.13s/it]
 47%|████▋     | 816/1726 [14:06:44<15:33:45, 61.57s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 14:44:09,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.47 | bwd_microstep: 1236.10 | bwd_inner_microstep: 1236.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816
[2024-06-10 14:44:10,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.55 | bwd_microstep: 1354.28 | bwd_inner_microstep: 1354.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495
[2024-06-10 14:44:12,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.03 | bwd_microstep: 1314.59 | bwd_inner_microstep: 1314.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 14:44:14,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.48 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739
[2024-06-10 14:44:16,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.24 | bwd_microstep: 1439.79 | bwd_inner_microstep: 1439.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-10 14:44:18,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.42 | bwd_microstep: 1532.00 | bwd_inner_microstep: 1531.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 14:44:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1252.83 | bwd_inner_microstep: 1252.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 14:44:22,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.88 | bwd_microstep: 1274.58 | bwd_inner_microstep: 1274.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954
[2024-06-10 14:44:23,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.50 | bwd_microstep: 889.59 | bwd_inner_microstep: 889.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 14:44:25,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.14 | bwd_microstep: 1246.11 | bwd_inner_microstep: 1246.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 14:44:27,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.56 | bwd_microstep: 1749.96 | bwd_inner_microstep: 1749.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3678
[2024-06-10 14:44:29,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.06 | bwd_microstep: 1362.25 | bwd_inner_microstep: 1362.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 14:44:31,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.42 | bwd_microstep: 1275.90 | bwd_inner_microstep: 1275.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3619
[2024-06-10 14:44:33,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.97 | bwd_microstep: 1708.54 | bwd_inner_microstep: 1708.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3398
[2024-06-10 14:44:35,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.37 | bwd_microstep: 1305.18 | bwd_inner_microstep: 1305.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 14:44:37,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.52 | bwd_microstep: 1458.06 | bwd_inner_microstep: 1458.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2303
[2024-06-10 14:44:38,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.40 | bwd_microstep: 881.41 | bwd_inner_microstep: 881.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2312
[2024-06-10 14:44:39,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.37 | bwd_microstep: 822.56 | bwd_inner_microstep: 822.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099
[2024-06-10 14:44:40,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.64 | bwd_microstep: 922.35 | bwd_inner_microstep: 922.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 14:44:42,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.26 | bwd_microstep: 1281.02 | bwd_inner_microstep: 1280.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3617
[2024-06-10 14:44:44,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.21 | bwd_microstep: 1556.62 | bwd_inner_microstep: 1556.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 14:44:46,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1280.43 | bwd_inner_microstep: 1280.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2080
[2024-06-10 14:44:47,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.20 | bwd_microstep: 753.75 | bwd_inner_microstep: 753.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 14:44:49,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1498.43 | bwd_inner_microstep: 1498.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 14:44:51,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1375.54 | bwd_inner_microstep: 1375.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 14:44:53,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1509.82 | bwd_inner_microstep: 1509.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 14:44:55,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.90 | bwd_microstep: 1453.61 | bwd_inner_microstep: 1453.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465
[2024-06-10 14:44:57,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1341.47 | bwd_inner_microstep: 1341.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760
[2024-06-10 14:44:59,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1438.16 | bwd_inner_microstep: 1438.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 14:45:01,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.45 | bwd_microstep: 1304.09 | bwd_inner_microstep: 1304.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 14:45:02,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.57 | bwd_microstep: 973.90 | bwd_inner_microstep: 973.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-10 14:45:06,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 14:45:06,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 3136.89 | bwd_inner_microstep: 2106.79 | bwd_allreduce_microstep: 1030.05 | step_microstep: 37.77
[2024-06-10 14:45:06,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15646.61 | bwd: 43273.77 | bwd_inner: 42242.82 | bwd_allreduce: 1030.28 | step: 39.24
{'loss': 1.2263, 'learning_rate': 2.2694229244517226e-05, 'epoch': 0.47}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 14:45:08,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.33 | bwd_microstep: 1383.57 | bwd_inner_microstep: 1383.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4027
[2024-06-10 14:45:10,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.52 | bwd_microstep: 1712.46 | bwd_inner_microstep: 1712.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-10 14:45:12,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.49 | bwd_microstep: 1184.53 | bwd_inner_microstep: 1184.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 14:45:14,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.80 | bwd_microstep: 1646.39 | bwd_inner_microstep: 1646.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 14:45:16,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1389.40 | bwd_inner_microstep: 1389.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 14:45:18,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1534.22 | bwd_inner_microstep: 1534.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 14:45:20,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1282.54 | bwd_inner_microstep: 1282.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2629
[2024-06-10 14:45:21,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.16 | bwd_microstep: 1017.21 | bwd_inner_microstep: 1017.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064
[2024-06-10 14:45:23,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.75 | bwd_microstep: 815.22 | bwd_inner_microstep: 815.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 14:45:25,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.93 | bwd_microstep: 1446.75 | bwd_inner_microstep: 1446.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413
[2024-06-10 14:45:27,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.03 | bwd_microstep: 1446.39 | bwd_inner_microstep: 1446.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 14:45:28,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.61 | bwd_microstep: 1279.96 | bwd_inner_microstep: 1279.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498
[2024-06-10 14:45:31,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1579.93 | bwd_inner_microstep: 1579.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 14:45:32,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1384.62 | bwd_inner_microstep: 1384.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014
[2024-06-10 14:45:34,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.05 | bwd_microstep: 899.72 | bwd_inner_microstep: 899.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3463
[2024-06-10 14:45:36,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.35 | bwd_microstep: 1500.13 | bwd_inner_microstep: 1500.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3640
[2024-06-10 14:45:38,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1572.25 | bwd_inner_microstep: 1572.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637
[2024-06-10 14:45:40,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.22 | bwd_microstep: 1319.71 | bwd_inner_microstep: 1319.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2406
[2024-06-10 14:45:41,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.15 | bwd_microstep: 940.29 | bwd_inner_microstep: 940.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-10 14:45:43,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.00 | bwd_microstep: 1418.41 | bwd_inner_microstep: 1418.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3625
[2024-06-10 14:45:45,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.14 | bwd_microstep: 1370.43 | bwd_inner_microstep: 1370.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 14:45:47,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.45 | bwd_microstep: 1578.94 | bwd_inner_microstep: 1578.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3670
[2024-06-10 14:45:49,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.14 | bwd_microstep: 1325.63 | bwd_inner_microstep: 1325.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 14:45:51,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1494.19 | bwd_inner_microstep: 1494.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 14:45:53,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.78 | bwd_microstep: 1301.36 | bwd_inner_microstep: 1301.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 14:45:54,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.69 | bwd_microstep: 697.36 | bwd_inner_microstep: 697.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 14:45:56,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1508.70 | bwd_inner_microstep: 1508.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 14:45:58,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.65 | bwd_microstep: 1297.17 | bwd_inner_microstep: 1297.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 14:46:00,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.61 | bwd_microstep: 1450.88 | bwd_inner_microstep: 1450.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685
[2024-06-10 14:46:02,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.22 | bwd_microstep: 1527.33 | bwd_inner_microstep: 1527.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3586
[2024-06-10 14:46:04,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.37 | bwd_microstep: 1531.50 | bwd_inner_microstep: 1531.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2244
[2024-06-10 14:46:09,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 14:46:09,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.32 | bwd_microstep: 4262.52 | bwd_inner_microstep: 1100.85 | bwd_allreduce_microstep: 3161.62 | step_microstep: 37.98
[2024-06-10 14:46:09,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16018.63 | bwd: 46099.74 | bwd_inner: 42937.21 | bwd_allreduce: 3161.84 | step: 39.51
{'loss': 1.2571, 'learning_rate': 2.2657032686425517e-05, 'epoch': 0.47}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 14:46:10,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1345.65 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 14:46:11,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.35 | bwd_microstep: 787.28 | bwd_inner_microstep: 787.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 14:46:14,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.66 | bwd_microstep: 1550.90 | bwd_inner_microstep: 1550.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482
[2024-06-10 14:46:15,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.70 | bwd_microstep: 1314.17 | bwd_inner_microstep: 1314.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 14:46:18,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.03 | bwd_microstep: 1547.72 | bwd_inner_microstep: 1547.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 14:46:19,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1245.58 | bwd_inner_microstep: 1245.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 14:46:21,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3717
[2024-06-10 14:46:23,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.58 | bwd_microstep: 1364.21 | bwd_inner_microstep: 1364.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 14:46:24,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.48 | bwd_microstep: 702.77 | bwd_inner_microstep: 702.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4017
[2024-06-10 14:46:26,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.04 | bwd_microstep: 1716.69 | bwd_inner_microstep: 1716.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3750
[2024-06-10 14:46:29,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.17 | bwd_microstep: 1636.17 | bwd_inner_microstep: 1636.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 14:46:31,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1346.05 | bwd_inner_microstep: 1346.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 14:46:33,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1446.64 | bwd_inner_microstep: 1446.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 14:46:35,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.97 | bwd_microstep: 1486.43 | bwd_inner_microstep: 1486.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2023
[2024-06-10 14:46:36,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.93 | bwd_microstep: 838.44 | bwd_inner_microstep: 838.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3467
[2024-06-10 14:46:37,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.03 | bwd_microstep: 1246.54 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3639
[2024-06-10 14:46:39,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.08 | bwd_microstep: 1317.83 | bwd_inner_microstep: 1317.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 14:46:41,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.86 | bwd_microstep: 1459.59 | bwd_inner_microstep: 1459.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 639
[2024-06-10 14:46:42,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.28 | bwd_microstep: 264.82 | bwd_inner_microstep: 264.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 14:46:43,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 1296.62 | bwd_inner_microstep: 1296.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 14:46:45,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.88 | bwd_microstep: 1384.40 | bwd_inner_microstep: 1384.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966
[2024-06-10 14:46:46,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.38 | bwd_microstep: 701.30 | bwd_inner_microstep: 701.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 14:46:48,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.43 | bwd_microstep: 1358.80 | bwd_inner_microstep: 1358.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 14:46:50,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1508.68 | bwd_inner_microstep: 1508.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 14:46:53,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.20 | bwd_microstep: 1651.40 | bwd_inner_microstep: 1651.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3603
[2024-06-10 14:46:54,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.44 | bwd_microstep: 1213.38 | bwd_inner_microstep: 1213.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011
[2024-06-10 14:46:55,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.09 | bwd_microstep: 805.65 | bwd_inner_microstep: 805.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2285
[2024-06-10 14:46:57,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.64 | bwd_microstep: 915.15 | bwd_inner_microstep: 915.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3580
[2024-06-10 14:46:59,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.38 | bwd_microstep: 1519.86 | bwd_inner_microstep: 1519.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 14:47:01,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1394.59 | bwd_inner_microstep: 1394.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3815
[2024-06-10 14:47:03,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.23 | bwd_microstep: 1433.28 | bwd_inner_microstep: 1433.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 14:47:09,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 14:47:09,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 6153.67 | bwd_inner_microstep: 1434.07 | bwd_allreduce_microstep: 4719.54 | step_microstep: 37.87
[2024-06-10 14:47:09,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15194.27 | bwd: 45337.93 | bwd_inner: 40617.48 | bwd_allreduce: 4719.77 | step: 39.43
{'loss': 1.2723, 'learning_rate': 2.261982677027851e-05, 'epoch': 0.47}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 14:47:10,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.31 | bwd_microstep: 670.32 | bwd_inner_microstep: 670.23 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 14:47:12,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1475.60 | bwd_inner_microstep: 1475.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 14:47:14,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1476.45 | bwd_inner_microstep: 1476.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-10 14:47:17,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.61 | bwd_microstep: 1542.86 | bwd_inner_microstep: 1542.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 14:47:19,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.02 | bwd_microstep: 1549.36 | bwd_inner_microstep: 1549.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 14:47:20,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1249.20 | bwd_inner_microstep: 1249.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 14:47:22,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1294.50 | bwd_inner_microstep: 1294.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 14:47:24,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.37 | bwd_microstep: 1641.29 | bwd_inner_microstep: 1641.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 14:47:26,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.21 | bwd_microstep: 1378.81 | bwd_inner_microstep: 1378.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692
[2024-06-10 14:47:28,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.82 | bwd_microstep: 1521.80 | bwd_inner_microstep: 1521.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3679
[2024-06-10 14:47:31,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.37 | bwd_microstep: 1549.92 | bwd_inner_microstep: 1549.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701
[2024-06-10 14:47:33,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.49 | bwd_microstep: 1617.04 | bwd_inner_microstep: 1617.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3661
[2024-06-10 14:47:35,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.12 | bwd_microstep: 1655.42 | bwd_inner_microstep: 1655.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3512
[2024-06-10 14:47:37,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.12 | bwd_microstep: 1680.38 | bwd_inner_microstep: 1680.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 14:47:40,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 1551.37 | bwd_inner_microstep: 1551.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 14:47:42,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.16 | bwd_microstep: 1603.83 | bwd_inner_microstep: 1603.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 14:47:44,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.81 | bwd_microstep: 1645.82 | bwd_inner_microstep: 1645.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 14:47:46,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1340.04 | bwd_inner_microstep: 1340.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 14:47:48,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1412.53 | bwd_inner_microstep: 1412.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077
[2024-06-10 14:47:49,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.67 | bwd_microstep: 917.14 | bwd_inner_microstep: 917.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-10 14:47:51,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.85 | bwd_microstep: 1580.86 | bwd_inner_microstep: 1580.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3518
[2024-06-10 14:47:53,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1420.01 | bwd_inner_microstep: 1419.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3821
[2024-06-10 14:47:56,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.66 | bwd_microstep: 1753.23 | bwd_inner_microstep: 1753.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 14:47:58,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.40 | bwd_microstep: 1461.11 | bwd_inner_microstep: 1461.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 14:48:00,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.55 | bwd_microstep: 1548.57 | bwd_inner_microstep: 1548.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447
[2024-06-10 14:48:02,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.93 | bwd_microstep: 1377.33 | bwd_inner_microstep: 1377.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 14:48:04,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.09 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-10 14:48:05,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.75 | bwd_microstep: 1401.89 | bwd_inner_microstep: 1401.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3581
[2024-06-10 14:48:07,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1349.73 | bwd_inner_microstep: 1349.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 14:48:09,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.79 | bwd_microstep: 1401.66 | bwd_inner_microstep: 1401.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3766
[2024-06-10 14:48:11,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.79 | bwd_microstep: 1373.59 | bwd_inner_microstep: 1373.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 14:48:13,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.18 | optimizer_step: 6.67
[2024-06-10 14:48:13,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.63 | bwd_microstep: 1441.61 | bwd_inner_microstep: 1433.85 | bwd_allreduce_microstep: 7.72 | step_microstep: 37.75
[2024-06-10 14:48:13,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17223.09 | bwd: 46280.65 | bwd_inner: 46271.96 | bwd_allreduce: 7.99 | step: 39.22
{'loss': 1.3042, 'learning_rate': 2.258261162711523e-05, 'epoch': 0.48}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 14:48:15,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1488.02 | bwd_inner_microstep: 1487.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 14:48:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1340.72 | bwd_inner_microstep: 1340.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866
[2024-06-10 14:48:18,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 676.84 | bwd_inner_microstep: 676.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 14:48:20,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1378.36 | bwd_inner_microstep: 1378.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753
[2024-06-10 14:48:22,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1637.45 | bwd_inner_microstep: 1637.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421
[2024-06-10 14:48:24,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.66 | bwd_microstep: 1185.04 | bwd_inner_microstep: 1185.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1925
[2024-06-10 14:48:25,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.65 | bwd_microstep: 758.71 | bwd_inner_microstep: 758.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 14:48:27,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.95 | bwd_microstep: 1385.74 | bwd_inner_microstep: 1385.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694
[2024-06-10 14:48:29,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.23 | bwd_microstep: 1624.32 | bwd_inner_microstep: 1624.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 14:48:31,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1387.89 | bwd_inner_microstep: 1387.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3677
[2024-06-10 14:48:33,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.73 | bwd_microstep: 1570.17 | bwd_inner_microstep: 1570.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3517
[2024-06-10 14:48:35,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1513.82 | bwd_inner_microstep: 1513.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 14:48:37,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.26 | bwd_microstep: 1386.15 | bwd_inner_microstep: 1386.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1949
[2024-06-10 14:48:38,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.13 | bwd_microstep: 821.76 | bwd_inner_microstep: 821.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3645
[2024-06-10 14:48:41,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.11 | bwd_microstep: 1758.40 | bwd_inner_microstep: 1758.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3646
[2024-06-10 14:48:43,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.02 | bwd_microstep: 1641.65 | bwd_inner_microstep: 1641.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3632
[2024-06-10 14:48:45,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.40 | bwd_microstep: 1248.20 | bwd_inner_microstep: 1248.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 14:48:47,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1375.03 | bwd_inner_microstep: 1375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 14:48:48,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1281.49 | bwd_inner_microstep: 1281.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1991
[2024-06-10 14:48:50,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.33 | bwd_microstep: 832.13 | bwd_inner_microstep: 832.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 14:48:52,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1554.28 | bwd_inner_microstep: 1554.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 14:48:53,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1256.72 | bwd_inner_microstep: 1256.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 14:48:55,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.11 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 14:48:57,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.45 | bwd_microstep: 804.25 | bwd_inner_microstep: 804.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585
[2024-06-10 14:48:59,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.87 | bwd_microstep: 1533.29 | bwd_inner_microstep: 1533.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2367
[2024-06-10 14:49:00,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 409.75 | bwd_microstep: 1119.00 | bwd_inner_microstep: 1118.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1927
[2024-06-10 14:49:01,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.72 | bwd_microstep: 759.19 | bwd_inner_microstep: 759.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 14:49:03,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.76 | bwd_microstep: 1604.18 | bwd_inner_microstep: 1604.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1934
[2024-06-10 14:49:04,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.69 | bwd_microstep: 756.68 | bwd_inner_microstep: 756.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3586
[2024-06-10 14:49:06,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1338.44 | bwd_inner_microstep: 1338.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 14:49:08,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.65 | bwd_microstep: 1546.04 | bwd_inner_microstep: 1546.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898
[2024-06-10 14:49:13,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.15 | optimizer_gradients: 4.30 | optimizer_step: 6.62
[2024-06-10 14:49:13,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.59 | bwd_microstep: 4100.15 | bwd_inner_microstep: 777.93 | bwd_allreduce_microstep: 3322.15 | step_microstep: 41.17
[2024-06-10 14:49:13,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15210.39 | bwd: 44113.85 | bwd_inner: 40790.79 | bwd_allreduce: 3322.39 | step: 42.63
{'loss': 1.3127, 'learning_rate': 2.2545387388007227e-05, 'epoch': 0.48}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 14:49:15,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1475.35 | bwd_inner_microstep: 1475.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 14:49:17,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1373.18 | bwd_inner_microstep: 1373.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 14:49:19,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1375.09 | bwd_inner_microstep: 1375.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3788
[2024-06-10 14:49:21,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.23 | bwd_microstep: 1510.31 | bwd_inner_microstep: 1510.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 14:49:23,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.67 | bwd_microstep: 1341.00 | bwd_inner_microstep: 1340.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 14:49:25,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.89 | bwd_microstep: 1534.71 | bwd_inner_microstep: 1534.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 14:49:27,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1281.35 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 14:49:28,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1246.73 | bwd_inner_microstep: 1246.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 14:49:30,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.80 | bwd_microstep: 1391.88 | bwd_inner_microstep: 1391.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 14:49:31,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.21 | bwd_microstep: 790.32 | bwd_inner_microstep: 790.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 14:49:33,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.03 | bwd_microstep: 1495.28 | bwd_inner_microstep: 1495.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 14:49:35,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.25 | bwd_microstep: 1381.41 | bwd_inner_microstep: 1381.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958
[2024-06-10 14:49:36,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.96 | bwd_microstep: 822.73 | bwd_inner_microstep: 822.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 14:49:38,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1486.13 | bwd_inner_microstep: 1486.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437
[2024-06-10 14:49:40,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.76 | bwd_microstep: 1284.12 | bwd_inner_microstep: 1284.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2127
[2024-06-10 14:49:41,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.65 | bwd_microstep: 829.46 | bwd_inner_microstep: 829.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 14:49:43,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1414.68 | bwd_inner_microstep: 1414.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-10 14:49:46,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.13 | bwd_microstep: 1611.92 | bwd_inner_microstep: 1611.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616
[2024-06-10 14:49:47,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.16 | bwd_microstep: 1310.62 | bwd_inner_microstep: 1310.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3519
[2024-06-10 14:49:50,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.50 | bwd_microstep: 1634.26 | bwd_inner_microstep: 1634.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842
[2024-06-10 14:49:52,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.36 | bwd_microstep: 1458.06 | bwd_inner_microstep: 1458.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3823
[2024-06-10 14:49:54,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.38 | bwd_microstep: 1752.21 | bwd_inner_microstep: 1752.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3710
[2024-06-10 14:49:56,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.77 | bwd_microstep: 1487.25 | bwd_inner_microstep: 1487.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3518
[2024-06-10 14:49:58,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.40 | bwd_microstep: 1512.19 | bwd_inner_microstep: 1512.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3808
[2024-06-10 14:50:00,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1419.44 | bwd_inner_microstep: 1419.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815
[2024-06-10 14:50:02,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.11 | bwd_microstep: 1387.73 | bwd_inner_microstep: 1387.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 14:50:04,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.13 | bwd_microstep: 1405.17 | bwd_inner_microstep: 1405.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3518
[2024-06-10 14:50:06,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.85 | bwd_microstep: 1227.47 | bwd_inner_microstep: 1227.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3778
[2024-06-10 14:50:08,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.81 | bwd_microstep: 1580.82 | bwd_inner_microstep: 1580.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 14:50:09,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.40 | bwd_microstep: 1185.34 | bwd_inner_microstep: 1185.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 14:50:12,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.43 | bwd_microstep: 1456.93 | bwd_inner_microstep: 1456.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3830
[2024-06-10 14:50:15,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 14:50:15,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.46 | bwd_microstep: 3205.73 | bwd_inner_microstep: 1938.12 | bwd_allreduce_microstep: 1267.56 | step_microstep: 37.64
[2024-06-10 14:50:15,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16505.75 | bwd: 45668.90 | bwd_inner: 44400.43 | bwd_allreduce: 1267.79 | step: 39.09

 47%|████▋     | 817/1726 [14:07:43<15:22:13, 60.87s/it]
 47%|████▋     | 818/1726 [14:08:45<15:28:26, 61.35s/it]
 47%|████▋     | 819/1726 [14:09:46<15:25:11, 61.20s/it]
 48%|████▊     | 820/1726 [14:10:50<15:36:08, 62.00s/it]
 48%|████▊     | 821/1726 [14:11:50<15:24:30, 61.29s/it]
 48%|████▊     | 822/1726 [14:12:52<15:28:58, 61.66s/it]
{'loss': 1.2302, 'learning_rate': 2.2508154184058077e-05, 'epoch': 0.48}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929
[2024-06-10 14:50:16,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.36 | bwd_microstep: 717.18 | bwd_inner_microstep: 717.05 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4415
[2024-06-10 14:50:19,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.01 | bwd_microstep: 1618.78 | bwd_inner_microstep: 1618.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2417
[2024-06-10 14:50:20,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.68 | bwd_microstep: 938.55 | bwd_inner_microstep: 938.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3768
[2024-06-10 14:50:22,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.09 | bwd_microstep: 1369.08 | bwd_inner_microstep: 1369.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2470
[2024-06-10 14:50:23,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.35 | bwd_microstep: 951.25 | bwd_inner_microstep: 951.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 14:50:25,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.02 | bwd_microstep: 1310.25 | bwd_inner_microstep: 1310.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 14:50:27,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1280.84 | bwd_inner_microstep: 1280.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-10 14:50:28,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.78 | bwd_microstep: 1154.98 | bwd_inner_microstep: 1154.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 14:50:30,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.80 | bwd_microstep: 1525.06 | bwd_inner_microstep: 1525.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 14:50:33,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.52 | bwd_microstep: 1654.97 | bwd_inner_microstep: 1654.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3674
[2024-06-10 14:50:35,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.30 | bwd_microstep: 1666.03 | bwd_inner_microstep: 1666.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 14:50:37,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.39 | bwd_microstep: 1479.86 | bwd_inner_microstep: 1479.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 14:50:39,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1440.69 | bwd_inner_microstep: 1440.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3561
[2024-06-10 14:50:41,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.32 | bwd_microstep: 1693.72 | bwd_inner_microstep: 1693.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 14:50:43,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1394.31 | bwd_inner_microstep: 1394.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3840
[2024-06-10 14:50:45,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.37 | bwd_microstep: 1557.77 | bwd_inner_microstep: 1557.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 14:50:47,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1385.07 | bwd_inner_microstep: 1385.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452
[2024-06-10 14:50:49,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.44 | bwd_microstep: 1189.57 | bwd_inner_microstep: 1189.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 14:50:51,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1395.29 | bwd_inner_microstep: 1395.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3820
[2024-06-10 14:50:53,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1600.65 | bwd_inner_microstep: 1600.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 14:50:55,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.00 | bwd_microstep: 1485.85 | bwd_inner_microstep: 1485.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 14:50:57,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1496.27 | bwd_inner_microstep: 1496.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476
[2024-06-10 14:50:59,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.04 | bwd_microstep: 1345.54 | bwd_inner_microstep: 1345.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 14:51:01,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1555.06 | bwd_inner_microstep: 1555.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 14:51:03,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1508.17 | bwd_inner_microstep: 1508.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 14:51:05,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.36 | bwd_microstep: 1158.61 | bwd_inner_microstep: 1158.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 14:51:07,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1432.68 | bwd_inner_microstep: 1432.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 14:51:09,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.36 | bwd_microstep: 1400.69 | bwd_inner_microstep: 1400.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2034
[2024-06-10 14:51:10,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.81 | bwd_microstep: 905.67 | bwd_inner_microstep: 905.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 14:51:12,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.03 | bwd_microstep: 1342.50 | bwd_inner_microstep: 1342.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3771
[2024-06-10 14:51:14,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.13 | bwd_microstep: 1604.08 | bwd_inner_microstep: 1604.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 14:51:17,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.20 | optimizer_step: 6.57
[2024-06-10 14:51:17,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.24 | bwd_microstep: 2686.27 | bwd_inner_microstep: 1830.32 | bwd_allreduce_microstep: 855.91 | step_microstep: 38.19
[2024-06-10 14:51:17,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16516.48 | bwd: 45245.31 | bwd_inner: 44388.41 | bwd_allreduce: 856.18 | step: 39.84
{'loss': 1.2389, 'learning_rate': 2.2470912146402935e-05, 'epoch': 0.48}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 14:51:19,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1400.65 | bwd_inner_microstep: 1400.46 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3530
[2024-06-10 14:51:21,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.55 | bwd_microstep: 1195.07 | bwd_inner_microstep: 1195.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2414
[2024-06-10 14:51:22,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.19 | bwd_microstep: 1001.69 | bwd_inner_microstep: 1001.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 14:51:25,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.06 | bwd_microstep: 1639.85 | bwd_inner_microstep: 1639.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 14:51:27,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.95 | bwd_microstep: 1542.60 | bwd_inner_microstep: 1542.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 14:51:29,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 14:51:31,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.80 | bwd_microstep: 1341.26 | bwd_inner_microstep: 1341.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3543
[2024-06-10 14:51:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.98 | bwd_microstep: 1197.14 | bwd_inner_microstep: 1197.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2235
[2024-06-10 14:51:33,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.16 | bwd_microstep: 863.84 | bwd_inner_microstep: 863.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-10 14:51:35,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.54 | bwd_microstep: 1189.05 | bwd_inner_microstep: 1189.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3410
[2024-06-10 14:51:37,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.66 | bwd_microstep: 1387.68 | bwd_inner_microstep: 1387.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2091
[2024-06-10 14:51:38,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.42 | bwd_microstep: 793.45 | bwd_inner_microstep: 793.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 14:51:40,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.17 | bwd_microstep: 1386.21 | bwd_inner_microstep: 1386.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 14:51:42,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.14 | bwd_microstep: 1517.91 | bwd_inner_microstep: 1517.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2887
[2024-06-10 14:51:44,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.04 | bwd_microstep: 1182.37 | bwd_inner_microstep: 1182.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 14:51:46,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1281.00 | bwd_inner_microstep: 1280.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3831
[2024-06-10 14:51:47,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.22 | bwd_microstep: 1358.28 | bwd_inner_microstep: 1358.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2150
[2024-06-10 14:51:49,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.78 | bwd_microstep: 950.35 | bwd_inner_microstep: 950.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3687
[2024-06-10 14:51:51,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.94 | bwd_microstep: 1553.19 | bwd_inner_microstep: 1553.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 14:51:53,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.65 | bwd_microstep: 1659.03 | bwd_inner_microstep: 1659.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 14:51:55,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1253.08 | bwd_inner_microstep: 1253.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 14:51:57,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.19 | bwd_microstep: 1409.15 | bwd_inner_microstep: 1409.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 14:51:58,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 800.36 | bwd_inner_microstep: 800.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 14:52:00,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1407.22 | bwd_inner_microstep: 1407.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 14:52:02,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1380.47 | bwd_inner_microstep: 1380.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3828
[2024-06-10 14:52:04,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.06 | bwd_microstep: 1723.55 | bwd_inner_microstep: 1723.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-10 14:52:06,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.81 | bwd_microstep: 973.35 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061
[2024-06-10 14:52:07,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.84 | bwd_microstep: 910.88 | bwd_inner_microstep: 910.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486
[2024-06-10 14:52:09,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.31 | bwd_microstep: 1574.83 | bwd_inner_microstep: 1574.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3767
[2024-06-10 14:52:11,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.00 | bwd_microstep: 1607.10 | bwd_inner_microstep: 1607.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400
[2024-06-10 14:52:13,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.05 | bwd_microstep: 1439.06 | bwd_inner_microstep: 1439.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2053
[2024-06-10 14:52:19,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 14:52:19,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.66 | bwd_microstep: 5666.15 | bwd_inner_microstep: 931.12 | bwd_allreduce_microstep: 4734.98 | step_microstep: 37.97
[2024-06-10 14:52:19,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15413.52 | bwd: 45965.81 | bwd_inner: 41229.78 | bwd_allreduce: 4735.29 | step: 39.55
{'loss': 1.2243, 'learning_rate': 2.2433661406208055e-05, 'epoch': 0.48}
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2704
[2024-06-10 14:52:21,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.28 | bwd_microstep: 1026.28 | bwd_inner_microstep: 1026.13 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 14:52:22,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1339.68 | bwd_inner_microstep: 1339.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 14:52:24,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.58 | bwd_microstep: 786.84 | bwd_inner_microstep: 786.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1957
[2024-06-10 14:52:25,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.57 | bwd_microstep: 730.78 | bwd_inner_microstep: 730.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3760
[2024-06-10 14:52:27,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1488.06 | bwd_inner_microstep: 1488.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 14:52:28,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.84 | bwd_microstep: 1287.88 | bwd_inner_microstep: 1287.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 14:52:30,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1387.20 | bwd_inner_microstep: 1387.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769
[2024-06-10 14:52:32,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.52 | bwd_microstep: 1444.71 | bwd_inner_microstep: 1444.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 14:52:34,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.40 | bwd_microstep: 1286.10 | bwd_inner_microstep: 1286.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 14:52:36,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.28 | bwd_microstep: 1151.24 | bwd_inner_microstep: 1151.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3877
[2024-06-10 14:52:38,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.32 | bwd_microstep: 1718.76 | bwd_inner_microstep: 1718.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3669
[2024-06-10 14:52:40,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1386.46 | bwd_inner_microstep: 1386.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 14:52:42,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 14:52:44,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.85 | bwd_microstep: 1338.43 | bwd_inner_microstep: 1338.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 14:52:46,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.40 | bwd_microstep: 1604.45 | bwd_inner_microstep: 1604.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3543
[2024-06-10 14:52:48,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.87 | bwd_microstep: 1589.09 | bwd_inner_microstep: 1589.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3503
[2024-06-10 14:52:50,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1316.31 | bwd_inner_microstep: 1316.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 14:52:52,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.64 | bwd_microstep: 1352.46 | bwd_inner_microstep: 1352.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2078
[2024-06-10 14:52:53,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.03 | bwd_microstep: 848.58 | bwd_inner_microstep: 848.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-10 14:52:55,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.61 | bwd_microstep: 1710.79 | bwd_inner_microstep: 1710.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2214
[2024-06-10 14:52:57,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.61 | bwd_microstep: 958.56 | bwd_inner_microstep: 958.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1973
[2024-06-10 14:52:57,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.26 | bwd_microstep: 704.65 | bwd_inner_microstep: 704.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960
[2024-06-10 14:52:58,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.32 | bwd_microstep: 701.90 | bwd_inner_microstep: 701.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622
[2024-06-10 14:53:00,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.77 | bwd_microstep: 1340.97 | bwd_inner_microstep: 1340.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 14:53:02,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.81 | bwd_microstep: 1398.96 | bwd_inner_microstep: 1398.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 14:53:04,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 1497.72 | bwd_inner_microstep: 1497.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 14:53:07,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.17 | bwd_microstep: 1607.09 | bwd_inner_microstep: 1607.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 14:53:09,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 1553.63 | bwd_inner_microstep: 1553.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3724
[2024-06-10 14:53:11,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.08 | bwd_microstep: 1669.26 | bwd_inner_microstep: 1669.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2273
[2024-06-10 14:53:12,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.59 | bwd_microstep: 1072.01 | bwd_inner_microstep: 1071.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 14:53:14,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1254.88 | bwd_inner_microstep: 1254.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 14:53:21,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 14:53:21,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.26 | bwd_microstep: 6331.68 | bwd_inner_microstep: 1686.32 | bwd_allreduce_microstep: 4645.31 | step_microstep: 38.06
[2024-06-10 14:53:21,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15440.05 | bwd: 46131.69 | bwd_inner: 41485.36 | bwd_allreduce: 4645.59 | step: 39.59
{'loss': 1.234, 'learning_rate': 2.2396402094670345e-05, 'epoch': 0.48}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 14:53:23,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.76 | bwd_microstep: 1327.62 | bwd_inner_microstep: 1327.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 14:53:25,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.45 | bwd_microstep: 1274.49 | bwd_inner_microstep: 1274.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-10 14:53:26,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.74 | bwd_microstep: 1279.02 | bwd_inner_microstep: 1278.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 14:53:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.52 | bwd_microstep: 1272.71 | bwd_inner_microstep: 1272.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3629
[2024-06-10 14:53:30,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.76 | bwd_microstep: 1457.84 | bwd_inner_microstep: 1457.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 14:53:32,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.41 | bwd_microstep: 1279.97 | bwd_inner_microstep: 1279.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043
[2024-06-10 14:53:33,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.71 | bwd_microstep: 806.90 | bwd_inner_microstep: 806.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 14:53:35,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1249.77 | bwd_inner_microstep: 1249.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 14:53:37,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1377.49 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3622
[2024-06-10 14:53:39,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.54 | bwd_microstep: 1312.67 | bwd_inner_microstep: 1312.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 14:53:40,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 1343.74 | bwd_inner_microstep: 1343.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903
[2024-06-10 14:53:41,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.83 | bwd_microstep: 687.31 | bwd_inner_microstep: 687.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2202
[2024-06-10 14:53:43,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.61 | bwd_microstep: 893.45 | bwd_inner_microstep: 893.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3386
[2024-06-10 14:53:44,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.89 | bwd_microstep: 1240.10 | bwd_inner_microstep: 1240.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3505
[2024-06-10 14:53:46,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 1428.04 | bwd_inner_microstep: 1428.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 14:53:49,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.81 | bwd_microstep: 1610.21 | bwd_inner_microstep: 1610.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1967
[2024-06-10 14:53:50,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.50 | bwd_microstep: 747.28 | bwd_inner_microstep: 747.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 14:53:52,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.95 | bwd_microstep: 1519.86 | bwd_inner_microstep: 1519.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 14:53:53,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1255.99 | bwd_inner_microstep: 1255.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 14:53:55,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.48 | bwd_microstep: 1453.66 | bwd_inner_microstep: 1453.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2139
[2024-06-10 14:53:56,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.84 | bwd_microstep: 736.80 | bwd_inner_microstep: 736.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3536
[2024-06-10 14:53:58,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.00 | bwd_microstep: 1419.90 | bwd_inner_microstep: 1419.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3587
[2024-06-10 14:54:01,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.36 | bwd_microstep: 1566.82 | bwd_inner_microstep: 1566.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 14:54:03,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.01 | bwd_microstep: 1552.83 | bwd_inner_microstep: 1552.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 14:54:05,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.95 | bwd_microstep: 1374.13 | bwd_inner_microstep: 1374.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 14:54:07,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.26 | bwd_microstep: 1658.32 | bwd_inner_microstep: 1658.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 14:54:09,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1404.98 | bwd_inner_microstep: 1404.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 14:54:10,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.01 | bwd_microstep: 1185.55 | bwd_inner_microstep: 1185.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 14:54:13,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.77 | bwd_microstep: 1655.53 | bwd_inner_microstep: 1655.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3753
[2024-06-10 14:54:15,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.58 | bwd_microstep: 1804.12 | bwd_inner_microstep: 1804.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3548
[2024-06-10 14:54:17,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.11 | bwd_microstep: 1534.14 | bwd_inner_microstep: 1534.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3456
[2024-06-10 14:54:23,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.37 | optimizer_step: 6.62
[2024-06-10 14:54:23,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.44 | bwd_microstep: 4845.57 | bwd_inner_microstep: 1566.47 | bwd_allreduce_microstep: 3279.03 | step_microstep: 39.41
[2024-06-10 14:54:23,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15764.93 | bwd: 45556.87 | bwd_inner: 42276.92 | bwd_allreduce: 3279.27 | step: 40.91
{'loss': 1.2255, 'learning_rate': 2.2359134343016926e-05, 'epoch': 0.48}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 14:54:25,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.51 | bwd_microstep: 1274.06 | bwd_inner_microstep: 1274.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1943
[2024-06-10 14:54:26,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.28 | bwd_microstep: 758.16 | bwd_inner_microstep: 758.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 14:54:27,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1343.02 | bwd_inner_microstep: 1343.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 14:54:30,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.98 | bwd_microstep: 1554.53 | bwd_inner_microstep: 1554.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 14:54:32,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.97 | bwd_microstep: 1538.38 | bwd_inner_microstep: 1538.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3407
[2024-06-10 14:54:33,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.98 | bwd_microstep: 1178.87 | bwd_inner_microstep: 1178.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 14:54:35,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.62 | bwd_microstep: 1244.90 | bwd_inner_microstep: 1244.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 14:54:37,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.21 | bwd_microstep: 1278.21 | bwd_inner_microstep: 1278.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3692
[2024-06-10 14:54:39,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.83 | bwd_microstep: 1604.78 | bwd_inner_microstep: 1604.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 14:54:41,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.65 | bwd_microstep: 1485.41 | bwd_inner_microstep: 1485.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3676
[2024-06-10 14:54:43,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.98 | bwd_microstep: 1447.09 | bwd_inner_microstep: 1447.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384
[2024-06-10 14:54:45,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.70 | bwd_microstep: 1433.31 | bwd_inner_microstep: 1433.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 14:54:47,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.46 | bwd_microstep: 1479.34 | bwd_inner_microstep: 1479.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 14:54:49,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1245.50 | bwd_inner_microstep: 1245.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 14:54:51,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.90 | bwd_microstep: 1274.39 | bwd_inner_microstep: 1274.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 14:54:52,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1350.41 | bwd_inner_microstep: 1350.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003
[2024-06-10 14:54:54,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.82 | bwd_microstep: 898.59 | bwd_inner_microstep: 898.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3676
[2024-06-10 14:54:56,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.20 | bwd_microstep: 1358.24 | bwd_inner_microstep: 1358.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 14:54:58,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1395.06 | bwd_inner_microstep: 1395.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 14:55:00,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.65 | bwd_microstep: 1557.09 | bwd_inner_microstep: 1557.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 14:55:01,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.15 | bwd_microstep: 974.96 | bwd_inner_microstep: 974.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 14:55:03,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.70 | bwd_microstep: 1295.75 | bwd_inner_microstep: 1295.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 14:55:04,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.85 | bwd_microstep: 797.43 | bwd_inner_microstep: 797.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-10 14:55:05,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.74 | bwd_microstep: 698.02 | bwd_inner_microstep: 697.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 14:55:07,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1510.28 | bwd_inner_microstep: 1510.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3777
[2024-06-10 14:55:09,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.18 | bwd_microstep: 1348.13 | bwd_inner_microstep: 1348.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2432
[2024-06-10 14:55:10,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.07 | bwd_microstep: 876.25 | bwd_inner_microstep: 876.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 14:55:12,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1509.29 | bwd_inner_microstep: 1509.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 14:55:14,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.72 | bwd_microstep: 1280.83 | bwd_inner_microstep: 1280.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-10 14:55:16,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.26 | bwd_microstep: 1481.30 | bwd_inner_microstep: 1481.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3588
[2024-06-10 14:55:18,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.40 | bwd_microstep: 1366.93 | bwd_inner_microstep: 1366.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 14:55:26,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 14:55:26,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.31 | bwd_microstep: 7377.35 | bwd_inner_microstep: 1521.18 | bwd_allreduce_microstep: 5856.12 | step_microstep: 38.28
[2024-06-10 14:55:26,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15467.27 | bwd: 47215.90 | bwd_inner: 41358.88 | bwd_allreduce: 5856.34 | step: 39.71
{'loss': 1.2273, 'learning_rate': 2.2321858282504606e-05, 'epoch': 0.48}
 48%|████▊     | 822/1726 [14:12:52<15:28:58, 61.66s/it]
 48%|████▊     | 823/1726 [14:13:54<15:29:59, 61.79s/it]
 48%|████▊     | 824/1726 [14:14:56<15:28:36, 61.77s/it]
 48%|████▊     | 825/1726 [14:15:58<15:28:12, 61.81s/it]
 48%|████▊     | 826/1726 [14:17:00<15:26:28, 61.76s/it]
 48%|████▊     | 827/1726 [14:18:03<15:31:01, 62.14s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 14:55:28,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.51 | bwd_microstep: 1467.92 | bwd_inner_microstep: 1467.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3985
[2024-06-10 14:55:30,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.44 | bwd_microstep: 1702.59 | bwd_inner_microstep: 1702.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1856
[2024-06-10 14:55:31,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.65 | bwd_microstep: 670.20 | bwd_inner_microstep: 670.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 14:55:33,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.02 | bwd_microstep: 1374.28 | bwd_inner_microstep: 1374.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3800
[2024-06-10 14:55:35,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.51 | bwd_microstep: 1412.28 | bwd_inner_microstep: 1412.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749
[2024-06-10 14:55:37,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.12 | bwd_microstep: 1534.94 | bwd_inner_microstep: 1534.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-10 14:55:39,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.91 | bwd_microstep: 1435.68 | bwd_inner_microstep: 1435.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495
[2024-06-10 14:55:41,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.19 | bwd_microstep: 1189.05 | bwd_inner_microstep: 1189.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627
[2024-06-10 14:55:43,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1310.91 | bwd_inner_microstep: 1310.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1351
[2024-06-10 14:55:43,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 212.94 | bwd_microstep: 550.81 | bwd_inner_microstep: 550.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4274
[2024-06-10 14:55:46,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 686.77 | bwd_microstep: 1874.79 | bwd_inner_microstep: 1874.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1989
[2024-06-10 14:55:47,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.71 | bwd_microstep: 735.00 | bwd_inner_microstep: 734.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2946
[2024-06-10 14:55:48,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.30 | bwd_microstep: 1014.74 | bwd_inner_microstep: 1014.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 14:55:50,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.52 | bwd_microstep: 1321.39 | bwd_inner_microstep: 1321.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 14:55:52,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.19 | bwd_microstep: 1481.62 | bwd_inner_microstep: 1481.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 14:55:54,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.33 | bwd_microstep: 1488.20 | bwd_inner_microstep: 1488.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518
[2024-06-10 14:55:56,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.31 | bwd_microstep: 1586.68 | bwd_inner_microstep: 1586.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3390
[2024-06-10 14:55:58,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.00 | bwd_microstep: 1274.45 | bwd_inner_microstep: 1274.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 14:56:00,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.65 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3553
[2024-06-10 14:56:02,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.81 | bwd_microstep: 1557.65 | bwd_inner_microstep: 1557.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2919
[2024-06-10 14:56:04,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.29 | bwd_microstep: 1187.40 | bwd_inner_microstep: 1187.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 14:56:06,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.21 | bwd_microstep: 1394.90 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 14:56:08,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1389.17 | bwd_inner_microstep: 1389.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3611
[2024-06-10 14:56:09,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1340.64 | bwd_inner_microstep: 1340.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 14:56:12,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.95 | bwd_microstep: 1608.88 | bwd_inner_microstep: 1608.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3813
[2024-06-10 14:56:14,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.07 | bwd_microstep: 1478.93 | bwd_inner_microstep: 1478.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3720
[2024-06-10 14:56:16,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.84 | bwd_microstep: 1732.54 | bwd_inner_microstep: 1732.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3813
[2024-06-10 14:56:18,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.05 | bwd_microstep: 1704.97 | bwd_inner_microstep: 1704.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3770
[2024-06-10 14:56:20,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.12 | bwd_microstep: 1502.80 | bwd_inner_microstep: 1502.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 14:56:22,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.13 | bwd_microstep: 1258.70 | bwd_inner_microstep: 1258.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 14:56:24,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1495.44 | bwd_inner_microstep: 1495.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3809
[2024-06-10 14:56:27,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-10 14:56:27,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.71 | bwd_microstep: 1858.66 | bwd_inner_microstep: 1778.21 | bwd_allreduce_microstep: 80.41 | step_microstep: 37.61
[2024-06-10 14:56:27,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16419.00 | bwd: 44186.21 | bwd_inner: 44104.90 | bwd_allreduce: 80.63 | step: 39.12
{'loss': 1.2363, 'learning_rate': 2.228457404441949e-05, 'epoch': 0.48}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 14:56:29,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1370.35 | bwd_inner_microstep: 1370.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 14:56:30,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1244.83 | bwd_inner_microstep: 1244.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 14:56:32,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.31 | bwd_microstep: 1552.87 | bwd_inner_microstep: 1552.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 14:56:35,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.58 | bwd_microstep: 1549.84 | bwd_inner_microstep: 1549.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2438
[2024-06-10 14:56:36,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.95 | bwd_microstep: 913.85 | bwd_inner_microstep: 913.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4134
[2024-06-10 14:56:38,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.64 | bwd_microstep: 1640.03 | bwd_inner_microstep: 1640.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2884
[2024-06-10 14:56:40,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.78 | bwd_microstep: 1057.01 | bwd_inner_microstep: 1056.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 14:56:42,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.06 | bwd_microstep: 1382.53 | bwd_inner_microstep: 1382.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 844
[2024-06-10 14:56:42,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.22 | bwd_microstep: 345.14 | bwd_inner_microstep: 345.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 14:56:44,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.67 | bwd_microstep: 1623.18 | bwd_inner_microstep: 1623.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 14:56:46,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.98 | bwd_microstep: 1624.27 | bwd_inner_microstep: 1624.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422
[2024-06-10 14:56:48,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.08 | bwd_microstep: 1185.05 | bwd_inner_microstep: 1185.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-10 14:56:50,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1411.24 | bwd_inner_microstep: 1411.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 14:56:52,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.32 | bwd_microstep: 1341.05 | bwd_inner_microstep: 1341.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3487
[2024-06-10 14:56:54,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.04 | bwd_microstep: 1455.81 | bwd_inner_microstep: 1455.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 14:56:56,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.71 | bwd_microstep: 1275.30 | bwd_inner_microstep: 1275.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3827
[2024-06-10 14:56:58,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.33 | bwd_microstep: 1580.07 | bwd_inner_microstep: 1580.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012
[2024-06-10 14:56:59,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.92 | bwd_microstep: 810.30 | bwd_inner_microstep: 810.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513
[2024-06-10 14:57:01,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.59 | bwd_microstep: 1189.92 | bwd_inner_microstep: 1189.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 14:57:02,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1253.73 | bwd_inner_microstep: 1253.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 14:57:04,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1380.06 | bwd_inner_microstep: 1380.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 14:57:06,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.26 | bwd_microstep: 1288.47 | bwd_inner_microstep: 1288.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 14:57:08,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.71 | bwd_microstep: 1626.66 | bwd_inner_microstep: 1626.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2274
[2024-06-10 14:57:09,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.94 | bwd_microstep: 781.18 | bwd_inner_microstep: 781.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 14:57:11,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.06 | bwd_microstep: 1356.65 | bwd_inner_microstep: 1356.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 14:57:13,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.72 | bwd_microstep: 1394.39 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 14:57:15,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.06 | bwd_microstep: 1453.72 | bwd_inner_microstep: 1453.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3771
[2024-06-10 14:57:18,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.30 | bwd_microstep: 1674.79 | bwd_inner_microstep: 1674.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 14:57:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.49 | bwd_microstep: 1377.93 | bwd_inner_microstep: 1377.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3616
[2024-06-10 14:57:22,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.04 | bwd_microstep: 1556.36 | bwd_inner_microstep: 1556.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 14:57:24,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.14 | bwd_microstep: 1548.73 | bwd_inner_microstep: 1548.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2058
[2024-06-10 14:57:27,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 14:57:27,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.10 | bwd_microstep: 3056.31 | bwd_inner_microstep: 1043.65 | bwd_allreduce_microstep: 2012.61 | step_microstep: 37.94
[2024-06-10 14:57:27,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15790.80 | bwd: 44301.66 | bwd_inner: 42288.15 | bwd_allreduce: 2012.83 | step: 39.44
{'loss': 1.2594, 'learning_rate': 2.2247281760076468e-05, 'epoch': 0.48}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 14:57:29,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.43 | bwd_microstep: 1472.67 | bwd_inner_microstep: 1472.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3473
[2024-06-10 14:57:31,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.06 | bwd_microstep: 1237.05 | bwd_inner_microstep: 1237.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 14:57:33,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.20 | bwd_microstep: 1278.32 | bwd_inner_microstep: 1278.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 14:57:35,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.10 | bwd_microstep: 1448.15 | bwd_inner_microstep: 1448.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798
[2024-06-10 14:57:37,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.26 | bwd_microstep: 1546.98 | bwd_inner_microstep: 1546.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 14:57:38,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.69 | bwd_microstep: 678.34 | bwd_inner_microstep: 678.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792
[2024-06-10 14:57:40,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.27 | bwd_microstep: 1643.86 | bwd_inner_microstep: 1643.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-10 14:57:42,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.11 | bwd_microstep: 1302.81 | bwd_inner_microstep: 1302.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 14:57:44,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.78 | bwd_microstep: 1526.90 | bwd_inner_microstep: 1526.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 14:57:45,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 798.69 | bwd_inner_microstep: 798.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 14:57:47,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.00 | bwd_microstep: 1190.20 | bwd_inner_microstep: 1190.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526
[2024-06-10 14:57:49,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.64 | bwd_microstep: 1322.67 | bwd_inner_microstep: 1322.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 14:57:50,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.77 | bwd_microstep: 1317.80 | bwd_inner_microstep: 1317.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3506
[2024-06-10 14:57:52,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.99 | bwd_microstep: 1251.81 | bwd_inner_microstep: 1251.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 14:57:54,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.04 | bwd_microstep: 1386.50 | bwd_inner_microstep: 1386.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 14:57:56,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.68 | bwd_microstep: 1474.38 | bwd_inner_microstep: 1474.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2203
[2024-06-10 14:57:57,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.20 | bwd_microstep: 988.33 | bwd_inner_microstep: 988.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 14:57:59,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1405.89 | bwd_inner_microstep: 1405.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2082
[2024-06-10 14:58:01,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.26 | bwd_microstep: 851.03 | bwd_inner_microstep: 851.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1862
[2024-06-10 14:58:01,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.62 | bwd_microstep: 677.26 | bwd_inner_microstep: 677.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3548
[2024-06-10 14:58:04,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.64 | bwd_microstep: 1522.74 | bwd_inner_microstep: 1522.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 14:58:05,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.10 | bwd_microstep: 1187.25 | bwd_inner_microstep: 1187.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 14:58:07,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.08 | bwd_microstep: 1503.55 | bwd_inner_microstep: 1503.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546
[2024-06-10 14:58:09,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.85 | bwd_microstep: 1199.98 | bwd_inner_microstep: 1199.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3467
[2024-06-10 14:58:11,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.77 | bwd_microstep: 1214.22 | bwd_inner_microstep: 1214.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 14:58:12,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1286.91 | bwd_inner_microstep: 1286.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-10 14:58:14,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.50 | bwd_microstep: 1192.62 | bwd_inner_microstep: 1192.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 5317
[2024-06-10 14:58:17,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 737.35 | bwd_microstep: 1965.92 | bwd_inner_microstep: 1965.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 14:58:19,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1591.65 | bwd_inner_microstep: 1591.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 14:58:21,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1350.49 | bwd_inner_microstep: 1350.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 14:58:23,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1497.44 | bwd_inner_microstep: 1497.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2282
[2024-06-10 14:58:27,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 14:58:27,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.71 | bwd_microstep: 3627.17 | bwd_inner_microstep: 1214.52 | bwd_allreduce_microstep: 2412.60 | step_microstep: 38.08
[2024-06-10 14:58:27,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15601.87 | bwd: 43939.58 | bwd_inner: 41526.08 | bwd_allreduce: 2412.82 | step: 39.58
{'loss': 1.2062, 'learning_rate': 2.2209981560818763e-05, 'epoch': 0.48}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 14:58:29,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1332.20 | bwd_inner_microstep: 1332.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-10 14:58:31,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.39 | bwd_microstep: 1561.31 | bwd_inner_microstep: 1561.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 14:58:33,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.47 | bwd_microstep: 1652.03 | bwd_inner_microstep: 1652.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 14:58:35,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1245.51 | bwd_inner_microstep: 1245.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 14:58:37,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.19 | bwd_microstep: 1543.54 | bwd_inner_microstep: 1543.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-10 14:58:39,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.35 | bwd_microstep: 1629.25 | bwd_inner_microstep: 1629.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 14:58:41,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.27 | bwd_microstep: 1147.02 | bwd_inner_microstep: 1146.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3538
[2024-06-10 14:58:43,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.83 | bwd_microstep: 1230.96 | bwd_inner_microstep: 1230.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 14:58:45,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1389.43 | bwd_inner_microstep: 1389.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1887
[2024-06-10 14:58:46,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.91 | bwd_microstep: 744.77 | bwd_inner_microstep: 744.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 14:58:47,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.71 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-10 14:58:49,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.18 | bwd_microstep: 804.58 | bwd_inner_microstep: 804.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2906
[2024-06-10 14:58:50,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.16 | bwd_microstep: 1186.61 | bwd_inner_microstep: 1186.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512
[2024-06-10 14:58:52,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.43 | bwd_microstep: 1587.24 | bwd_inner_microstep: 1587.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971
[2024-06-10 14:58:54,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.02 | bwd_microstep: 887.78 | bwd_inner_microstep: 887.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2405
[2024-06-10 14:58:55,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.47 | bwd_microstep: 942.85 | bwd_inner_microstep: 942.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3529
[2024-06-10 14:58:57,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.79 | bwd_microstep: 1555.79 | bwd_inner_microstep: 1555.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 14:58:58,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 705.94 | bwd_inner_microstep: 705.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686
[2024-06-10 14:59:00,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.33 | bwd_microstep: 1434.41 | bwd_inner_microstep: 1434.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 14:59:02,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1294.56 | bwd_inner_microstep: 1294.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 14:59:04,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.50 | bwd_microstep: 1287.49 | bwd_inner_microstep: 1287.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011
[2024-06-10 14:59:05,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.15 | bwd_microstep: 801.84 | bwd_inner_microstep: 801.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607
[2024-06-10 14:59:06,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1307.99 | bwd_inner_microstep: 1307.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-10 14:59:09,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.50 | bwd_microstep: 1542.78 | bwd_inner_microstep: 1542.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016
[2024-06-10 14:59:10,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.08 | bwd_microstep: 711.86 | bwd_inner_microstep: 711.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3652
[2024-06-10 14:59:12,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.81 | bwd_microstep: 1482.42 | bwd_inner_microstep: 1482.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 14:59:13,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.59 | bwd_microstep: 1304.11 | bwd_inner_microstep: 1304.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803
[2024-06-10 14:59:15,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 1455.99 | bwd_inner_microstep: 1455.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 14:59:18,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.53 | bwd_microstep: 1650.30 | bwd_inner_microstep: 1650.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2927
[2024-06-10 14:59:19,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.95 | bwd_microstep: 1031.28 | bwd_inner_microstep: 1031.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-10 14:59:21,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1591.97 | bwd_inner_microstep: 1591.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463
[2024-06-10 14:59:28,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 14:59:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.62 | bwd_microstep: 5733.30 | bwd_inner_microstep: 1625.78 | bwd_allreduce_microstep: 4107.47 | step_microstep: 38.22
[2024-06-10 14:59:28,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15292.86 | bwd: 45063.65 | bwd_inner: 40955.27 | bwd_allreduce: 4107.70 | step: 39.73
{'loss': 1.2794, 'learning_rate': 2.2172673578017497e-05, 'epoch': 0.48}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 14:59:30,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.57 | bwd_microstep: 1467.97 | bwd_inner_microstep: 1467.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 14:59:31,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.84 | bwd_microstep: 1239.14 | bwd_inner_microstep: 1239.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 14:59:34,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.30 | bwd_microstep: 1546.30 | bwd_inner_microstep: 1546.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1860
[2024-06-10 14:59:35,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.06 | bwd_microstep: 676.65 | bwd_inner_microstep: 676.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 14:59:36,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1389.03 | bwd_inner_microstep: 1389.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 14:59:38,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.60 | bwd_microstep: 1181.19 | bwd_inner_microstep: 1181.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 14:59:40,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1384.25 | bwd_inner_microstep: 1384.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1037
[2024-06-10 14:59:41,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 154.75 | bwd_microstep: 399.93 | bwd_inner_microstep: 399.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 14:59:43,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.57 | bwd_microstep: 1520.07 | bwd_inner_microstep: 1520.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 14:59:44,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1246.70 | bwd_inner_microstep: 1246.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 14:59:46,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.20 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 14:59:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.58 | bwd_microstep: 1340.44 | bwd_inner_microstep: 1340.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3646
[2024-06-10 14:59:50,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1312.74 | bwd_inner_microstep: 1312.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 14:59:52,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1377.34 | bwd_inner_microstep: 1377.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1885
[2024-06-10 14:59:53,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.60 | bwd_microstep: 773.63 | bwd_inner_microstep: 773.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3631
[2024-06-10 14:59:55,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 1541.32 | bwd_inner_microstep: 1541.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1958
[2024-06-10 14:59:56,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.21 | bwd_microstep: 852.77 | bwd_inner_microstep: 852.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3842
[2024-06-10 14:59:59,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.33 | bwd_microstep: 1725.04 | bwd_inner_microstep: 1725.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 15:00:01,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1507.19 | bwd_inner_microstep: 1507.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 15:00:02,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.19 | bwd_microstep: 1281.12 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 15:00:05,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1507.01 | bwd_inner_microstep: 1506.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4018
[2024-06-10 15:00:07,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1616.03 | bwd_inner_microstep: 1616.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2279
[2024-06-10 15:00:08,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.69 | bwd_microstep: 933.08 | bwd_inner_microstep: 933.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1895
[2024-06-10 15:00:09,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.41 | bwd_microstep: 715.83 | bwd_inner_microstep: 715.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 15:00:11,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.88 | bwd_microstep: 1647.72 | bwd_inner_microstep: 1647.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767
[2024-06-10 15:00:13,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.30 | bwd_microstep: 1546.67 | bwd_inner_microstep: 1546.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695
[2024-06-10 15:00:15,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.21 | bwd_microstep: 1460.58 | bwd_inner_microstep: 1460.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2074
[2024-06-10 15:00:17,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.84 | bwd_microstep: 912.45 | bwd_inner_microstep: 912.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2262
[2024-06-10 15:00:18,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.62 | bwd_microstep: 884.00 | bwd_inner_microstep: 883.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3478
[2024-06-10 15:00:20,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.01 | bwd_microstep: 1341.07 | bwd_inner_microstep: 1341.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2041
[2024-06-10 15:00:21,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.08 | bwd_microstep: 842.66 | bwd_inner_microstep: 842.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 15:00:27,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 15:00:27,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.86 | bwd_microstep: 5440.39 | bwd_inner_microstep: 1732.50 | bwd_allreduce_microstep: 3707.83 | step_microstep: 38.19
[2024-06-10 15:00:27,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14991.09 | bwd: 43998.82 | bwd_inner: 40290.08 | bwd_allreduce: 3708.06 | step: 39.66
{'loss': 1.275, 'learning_rate': 2.213535794307118e-05, 'epoch': 0.48}
 48%|████▊     | 827/1726 [14:18:03<15:31:01, 62.14s/it]
 48%|████▊     | 828/1726 [14:19:03<15:24:37, 61.78s/it]
 48%|████▊     | 829/1726 [14:20:04<15:17:29, 61.37s/it]
 48%|████▊     | 830/1726 [14:21:04<15:09:44, 60.92s/it]
 48%|████▊     | 831/1726 [14:22:04<15:07:40, 60.85s/it]
 48%|████▊     | 832/1726 [14:23:04<14:59:47, 60.39s/it]
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3401
[2024-06-10 15:00:29,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.59 | bwd_microstep: 1383.16 | bwd_inner_microstep: 1383.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397
[2024-06-10 15:00:31,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.15 | bwd_microstep: 1145.71 | bwd_inner_microstep: 1145.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 15:00:32,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1307.24 | bwd_inner_microstep: 1307.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-10 15:00:34,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.52 | bwd_microstep: 1543.33 | bwd_inner_microstep: 1543.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 15:00:36,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1278.23 | bwd_inner_microstep: 1278.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 15:00:38,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1387.24 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 15:00:40,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1382.16 | bwd_inner_microstep: 1382.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3782
[2024-06-10 15:00:42,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.57 | bwd_microstep: 1413.19 | bwd_inner_microstep: 1413.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 15:00:44,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1256.03 | bwd_inner_microstep: 1256.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 15:00:46,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1410.31 | bwd_inner_microstep: 1410.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 15:00:47,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.02 | bwd_microstep: 1157.90 | bwd_inner_microstep: 1157.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 15:00:49,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1501.32 | bwd_inner_microstep: 1501.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-10 15:00:51,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1436.69 | bwd_inner_microstep: 1436.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 15:00:53,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.72 | bwd_microstep: 1389.36 | bwd_inner_microstep: 1389.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3489
[2024-06-10 15:00:55,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.38 | bwd_microstep: 1429.18 | bwd_inner_microstep: 1429.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 15:00:57,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.37 | bwd_microstep: 1483.71 | bwd_inner_microstep: 1483.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3631
[2024-06-10 15:01:00,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.61 | bwd_microstep: 1811.40 | bwd_inner_microstep: 1811.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 15:01:02,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.79 | bwd_microstep: 1602.52 | bwd_inner_microstep: 1602.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821
[2024-06-10 15:01:04,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.01 | bwd_microstep: 1579.49 | bwd_inner_microstep: 1579.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3528
[2024-06-10 15:01:06,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1324.56 | bwd_inner_microstep: 1324.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3869
[2024-06-10 15:01:08,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1498.17 | bwd_inner_microstep: 1498.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 15:01:10,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.15 | bwd_microstep: 1157.82 | bwd_inner_microstep: 1157.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1375
[2024-06-10 15:01:10,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.73 | bwd_microstep: 522.22 | bwd_inner_microstep: 522.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-10 15:01:12,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.95 | bwd_microstep: 1534.16 | bwd_inner_microstep: 1534.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2133
[2024-06-10 15:01:14,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.79 | bwd_microstep: 803.26 | bwd_inner_microstep: 803.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812
[2024-06-10 15:01:15,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.00 | bwd_microstep: 1357.49 | bwd_inner_microstep: 1357.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-10 15:01:17,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.32 | bwd_microstep: 1306.95 | bwd_inner_microstep: 1306.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 15:01:19,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.03 | bwd_microstep: 1451.86 | bwd_inner_microstep: 1451.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3819
[2024-06-10 15:01:22,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.23 | bwd_microstep: 1853.69 | bwd_inner_microstep: 1853.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 15:01:24,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.77 | bwd_microstep: 1505.38 | bwd_inner_microstep: 1505.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229
[2024-06-10 15:01:25,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.97 | bwd_microstep: 963.71 | bwd_inner_microstep: 963.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 15:01:30,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.30 | optimizer_step: 6.59
[2024-06-10 15:01:30,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 4091.40 | bwd_inner_microstep: 1945.31 | bwd_allreduce_microstep: 2146.04 | step_microstep: 38.26
[2024-06-10 15:01:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16345.64 | bwd: 46268.85 | bwd_inner: 44121.91 | bwd_allreduce: 2146.27 | step: 39.74
{'loss': 1.2421, 'learning_rate': 2.2098034787405288e-05, 'epoch': 0.48}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413
[2024-06-10 15:01:32,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.81 | bwd_microstep: 1362.46 | bwd_inner_microstep: 1362.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3400
[2024-06-10 15:01:34,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.34 | bwd_microstep: 1196.28 | bwd_inner_microstep: 1196.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-10 15:01:36,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.02 | bwd_microstep: 1506.84 | bwd_inner_microstep: 1506.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 15:01:38,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.71 | bwd_microstep: 1488.02 | bwd_inner_microstep: 1487.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 15:01:40,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.46 | bwd_microstep: 1647.63 | bwd_inner_microstep: 1647.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763
[2024-06-10 15:01:42,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.46 | bwd_microstep: 1639.62 | bwd_inner_microstep: 1639.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4107
[2024-06-10 15:01:44,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.23 | bwd_microstep: 1532.90 | bwd_inner_microstep: 1532.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 840
[2024-06-10 15:01:45,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 132.64 | bwd_microstep: 345.95 | bwd_inner_microstep: 345.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 15:01:46,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1246.69 | bwd_inner_microstep: 1246.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918
[2024-06-10 15:01:47,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.53 | bwd_microstep: 689.03 | bwd_inner_microstep: 689.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 15:01:50,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1558.21 | bwd_inner_microstep: 1558.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676
[2024-06-10 15:01:52,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.01 | bwd_microstep: 1719.18 | bwd_inner_microstep: 1719.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1976
[2024-06-10 15:01:53,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.24 | bwd_microstep: 889.63 | bwd_inner_microstep: 889.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 15:01:55,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1384.32 | bwd_inner_microstep: 1384.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 15:01:57,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.66 | bwd_microstep: 1604.09 | bwd_inner_microstep: 1604.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463
[2024-06-10 15:01:59,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.35 | bwd_microstep: 1438.15 | bwd_inner_microstep: 1438.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 15:02:01,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.70 | bwd_microstep: 1416.44 | bwd_inner_microstep: 1416.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2036
[2024-06-10 15:02:02,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 811.97 | bwd_inner_microstep: 811.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 15:02:04,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.19 | bwd_microstep: 1489.66 | bwd_inner_microstep: 1489.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542
[2024-06-10 15:02:06,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1327.75 | bwd_inner_microstep: 1327.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 15:02:08,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1492.11 | bwd_inner_microstep: 1492.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 15:02:10,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.08 | bwd_microstep: 1391.45 | bwd_inner_microstep: 1391.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3754
[2024-06-10 15:02:12,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.35 | bwd_microstep: 1440.89 | bwd_inner_microstep: 1440.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2931
[2024-06-10 15:02:14,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.87 | bwd_microstep: 1166.79 | bwd_inner_microstep: 1166.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 15:02:16,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1393.44 | bwd_inner_microstep: 1393.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3611
[2024-06-10 15:02:18,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.05 | bwd_microstep: 1534.36 | bwd_inner_microstep: 1534.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507
[2024-06-10 15:02:20,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1317.95 | bwd_inner_microstep: 1317.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3728
[2024-06-10 15:02:22,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1335.47 | bwd_inner_microstep: 1335.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240
[2024-06-10 15:02:23,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.65 | bwd_microstep: 965.53 | bwd_inner_microstep: 965.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798
[2024-06-10 15:02:25,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.07 | bwd_microstep: 1546.38 | bwd_inner_microstep: 1546.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3575
[2024-06-10 15:02:27,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.26 | bwd_microstep: 1643.11 | bwd_inner_microstep: 1643.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 15:02:33,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 15:02:33,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.66 | bwd_microstep: 5469.74 | bwd_inner_microstep: 1434.26 | bwd_allreduce_microstep: 4035.43 | step_microstep: 37.94
[2024-06-10 15:02:33,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15985.93 | bwd: 46992.06 | bwd_inner: 42955.69 | bwd_allreduce: 4035.67 | step: 39.46
{'loss': 1.2224, 'learning_rate': 2.206070424247178e-05, 'epoch': 0.48}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3452
[2024-06-10 15:02:35,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.85 | bwd_microstep: 1369.71 | bwd_inner_microstep: 1369.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 15:02:37,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1380.90 | bwd_inner_microstep: 1380.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 15:02:39,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.82 | bwd_microstep: 1350.32 | bwd_inner_microstep: 1350.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3796
[2024-06-10 15:02:41,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.11 | bwd_microstep: 1575.75 | bwd_inner_microstep: 1575.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 15:02:43,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.06 | bwd_microstep: 1538.68 | bwd_inner_microstep: 1538.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750
[2024-06-10 15:02:45,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.83 | bwd_microstep: 1445.63 | bwd_inner_microstep: 1445.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 15:02:47,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 1386.00 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-10 15:02:49,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.85 | bwd_microstep: 1628.92 | bwd_inner_microstep: 1628.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 15:02:51,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1244.48 | bwd_inner_microstep: 1244.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 15:02:53,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1281.35 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 15:02:55,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 15:02:57,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1387.27 | bwd_inner_microstep: 1387.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 15:02:58,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1389.76 | bwd_inner_microstep: 1389.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 15:03:00,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.24 | bwd_microstep: 1473.74 | bwd_inner_microstep: 1473.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3493
[2024-06-10 15:03:03,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.38 | bwd_microstep: 1680.41 | bwd_inner_microstep: 1680.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3440
[2024-06-10 15:03:05,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.70 | bwd_microstep: 1380.88 | bwd_inner_microstep: 1380.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 15:03:07,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1375.64 | bwd_inner_microstep: 1375.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3526
[2024-06-10 15:03:09,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.81 | bwd_microstep: 1403.45 | bwd_inner_microstep: 1403.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 15:03:11,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1557.14 | bwd_inner_microstep: 1557.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 15:03:13,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.10 | bwd_microstep: 1557.97 | bwd_inner_microstep: 1557.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 15:03:14,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.07 | bwd_microstep: 1185.24 | bwd_inner_microstep: 1185.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 15:03:17,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.32 | bwd_microstep: 1655.62 | bwd_inner_microstep: 1655.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3504
[2024-06-10 15:03:18,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1255.93 | bwd_inner_microstep: 1255.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062
[2024-06-10 15:03:20,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.73 | bwd_microstep: 818.67 | bwd_inner_microstep: 818.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3521
[2024-06-10 15:03:21,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.18 | bwd_microstep: 1195.23 | bwd_inner_microstep: 1195.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 15:03:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.03 | bwd_microstep: 1300.88 | bwd_inner_microstep: 1300.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2280
[2024-06-10 15:03:25,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.82 | bwd_microstep: 1069.06 | bwd_inner_microstep: 1069.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 15:03:26,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1285.46 | bwd_inner_microstep: 1285.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-10 15:03:27,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 805.28 | bwd_inner_microstep: 805.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 15:03:30,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.58 | bwd_microstep: 1588.74 | bwd_inner_microstep: 1588.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918
[2024-06-10 15:03:31,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.82 | bwd_microstep: 686.79 | bwd_inner_microstep: 686.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2928
[2024-06-10 15:03:35,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-10 15:03:35,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.29 | bwd_microstep: 3888.09 | bwd_inner_microstep: 1385.24 | bwd_allreduce_microstep: 2502.79 | step_microstep: 38.00
[2024-06-10 15:03:35,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16004.26 | bwd: 45389.20 | bwd_inner: 42885.50 | bwd_allreduce: 2503.02 | step: 39.42
{'loss': 1.2109, 'learning_rate': 2.2023366439748647e-05, 'epoch': 0.48}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 15:03:37,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.65 | bwd_microstep: 1328.74 | bwd_inner_microstep: 1328.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4576
[2024-06-10 15:03:39,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.98 | bwd_microstep: 1750.22 | bwd_inner_microstep: 1750.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 15:03:41,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1381.92 | bwd_inner_microstep: 1381.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 15:03:43,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1397.02 | bwd_inner_microstep: 1397.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 15:03:45,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 1379.50 | bwd_inner_microstep: 1379.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-10 15:03:46,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.28 | bwd_microstep: 805.10 | bwd_inner_microstep: 805.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 15:03:47,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 789.96 | bwd_inner_microstep: 789.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 15:03:49,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.22 | bwd_microstep: 1315.57 | bwd_inner_microstep: 1315.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 15:03:51,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1387.23 | bwd_inner_microstep: 1387.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 15:03:52,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.76 | bwd_microstep: 788.67 | bwd_inner_microstep: 788.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 15:03:54,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.89 | bwd_microstep: 1627.26 | bwd_inner_microstep: 1627.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3424
[2024-06-10 15:03:56,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.84 | bwd_microstep: 1154.69 | bwd_inner_microstep: 1154.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-10 15:03:58,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.03 | bwd_microstep: 1444.57 | bwd_inner_microstep: 1444.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 15:04:00,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.76 | bwd_microstep: 1476.89 | bwd_inner_microstep: 1476.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 15:04:02,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.29 | bwd_microstep: 1617.20 | bwd_inner_microstep: 1617.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-10 15:04:04,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1417.66 | bwd_inner_microstep: 1417.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3487
[2024-06-10 15:04:06,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.16 | bwd_microstep: 1217.90 | bwd_inner_microstep: 1217.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832
[2024-06-10 15:04:08,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.01 | bwd_microstep: 1476.92 | bwd_inner_microstep: 1476.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 15:04:10,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.16 | bwd_microstep: 1555.43 | bwd_inner_microstep: 1555.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 15:04:12,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1480.93 | bwd_inner_microstep: 1480.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-10 15:04:14,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.86 | bwd_microstep: 1604.46 | bwd_inner_microstep: 1604.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3532
[2024-06-10 15:04:16,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.73 | bwd_microstep: 1592.05 | bwd_inner_microstep: 1592.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 15:04:18,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.06 | bwd_microstep: 1461.10 | bwd_inner_microstep: 1461.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1978
[2024-06-10 15:04:19,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.61 | bwd_microstep: 735.64 | bwd_inner_microstep: 735.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 15:04:21,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1372.57 | bwd_inner_microstep: 1372.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 15:04:23,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1253.43 | bwd_inner_microstep: 1253.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 15:04:25,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.95 | bwd_microstep: 1403.28 | bwd_inner_microstep: 1403.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451
[2024-06-10 15:04:27,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1413.74 | bwd_inner_microstep: 1413.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 15:04:29,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.28 | bwd_microstep: 1490.12 | bwd_inner_microstep: 1490.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795
[2024-06-10 15:04:31,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3777
[2024-06-10 15:04:33,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.09 | bwd_microstep: 1285.19 | bwd_inner_microstep: 1285.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3808
[2024-06-10 15:04:36,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 15:04:36,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.09 | bwd_microstep: 2158.75 | bwd_inner_microstep: 1627.38 | bwd_allreduce_microstep: 531.33 | step_microstep: 37.75
[2024-06-10 15:04:36,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16244.36 | bwd: 44120.02 | bwd_inner: 43587.79 | bwd_allreduce: 531.55 | step: 39.29
{'loss': 1.2274, 'learning_rate': 2.198602151073943e-05, 'epoch': 0.48}
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1870
[2024-06-10 15:04:37,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.37 | bwd_microstep: 732.73 | bwd_inner_microstep: 732.63 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3402
[2024-06-10 15:04:39,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.73 | bwd_microstep: 1367.46 | bwd_inner_microstep: 1367.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2785
[2024-06-10 15:04:40,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.49 | bwd_microstep: 1121.07 | bwd_inner_microstep: 1121.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792
[2024-06-10 15:04:42,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.82 | bwd_microstep: 1646.56 | bwd_inner_microstep: 1646.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 15:04:45,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.03 | bwd_microstep: 1535.90 | bwd_inner_microstep: 1535.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 15:04:46,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1282.62 | bwd_inner_microstep: 1282.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 15:04:48,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1282.82 | bwd_inner_microstep: 1282.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 15:04:50,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1247.27 | bwd_inner_microstep: 1247.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 15:04:52,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.67 | bwd_microstep: 1528.86 | bwd_inner_microstep: 1528.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 15:04:54,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1289.09 | bwd_inner_microstep: 1289.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3650
[2024-06-10 15:04:56,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1389.37 | bwd_inner_microstep: 1389.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3673
[2024-06-10 15:04:58,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.95 | bwd_microstep: 1571.82 | bwd_inner_microstep: 1571.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3693
[2024-06-10 15:05:00,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.11 | bwd_microstep: 1473.95 | bwd_inner_microstep: 1473.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517
[2024-06-10 15:05:02,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.73 | bwd_microstep: 1579.43 | bwd_inner_microstep: 1579.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3583
[2024-06-10 15:05:04,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.17 | bwd_microstep: 1333.35 | bwd_inner_microstep: 1333.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-10 15:05:06,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1578.32 | bwd_inner_microstep: 1578.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1937
[2024-06-10 15:05:07,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.06 | bwd_microstep: 819.67 | bwd_inner_microstep: 819.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-10 15:05:08,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.55 | bwd_microstep: 888.51 | bwd_inner_microstep: 888.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3826
[2024-06-10 15:05:11,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.18 | bwd_microstep: 1723.26 | bwd_inner_microstep: 1723.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 15:05:12,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.28 | bwd_microstep: 975.21 | bwd_inner_microstep: 975.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613
[2024-06-10 15:05:14,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1244.46 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 15:05:16,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.39 | bwd_microstep: 1379.76 | bwd_inner_microstep: 1379.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 15:05:18,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.27 | bwd_microstep: 1657.40 | bwd_inner_microstep: 1657.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 15:05:20,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.77 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-10 15:05:22,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.48 | bwd_microstep: 1408.03 | bwd_inner_microstep: 1408.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3721
[2024-06-10 15:05:24,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1562.29 | bwd_inner_microstep: 1562.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 15:05:26,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.63 | bwd_microstep: 1658.27 | bwd_inner_microstep: 1658.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3600
[2024-06-10 15:05:28,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.28 | bwd_microstep: 1430.88 | bwd_inner_microstep: 1430.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782
[2024-06-10 15:05:30,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.32 | bwd_microstep: 1444.54 | bwd_inner_microstep: 1444.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3715
[2024-06-10 15:05:32,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.56 | bwd_microstep: 1733.66 | bwd_inner_microstep: 1733.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398
[2024-06-10 15:05:34,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.51 | bwd_microstep: 1369.65 | bwd_inner_microstep: 1369.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 15:05:38,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-10 15:05:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.56 | bwd_microstep: 2845.78 | bwd_inner_microstep: 1740.53 | bwd_allreduce_microstep: 1105.20 | step_microstep: 37.88
[2024-06-10 15:05:38,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16435.47 | bwd: 45386.51 | bwd_inner: 44280.34 | bwd_allreduce: 1105.47 | step: 39.36
{'loss': 1.2077, 'learning_rate': 2.1948669586972776e-05, 'epoch': 0.48}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 15:05:40,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1339.87 | bwd_inner_microstep: 1339.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3478
[2024-06-10 15:05:42,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.07 | bwd_microstep: 1439.34 | bwd_inner_microstep: 1439.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 15:05:44,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1483.19 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 15:05:46,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.76 | bwd_microstep: 1651.32 | bwd_inner_microstep: 1651.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 15:05:48,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.62 | bwd_microstep: 1276.40 | bwd_inner_microstep: 1276.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 15:05:50,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.89 | bwd_microstep: 1393.00 | bwd_inner_microstep: 1392.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 15:05:52,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490
[2024-06-10 15:05:53,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.17 | bwd_microstep: 1218.08 | bwd_inner_microstep: 1218.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 15:05:55,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.22 | bwd_microstep: 1383.13 | bwd_inner_microstep: 1383.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 15:05:57,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1376.37 | bwd_inner_microstep: 1376.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 15:05:59,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1386.87 | bwd_inner_microstep: 1386.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 15:06:01,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.67 | bwd_microstep: 1185.49 | bwd_inner_microstep: 1185.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3502
[2024-06-10 15:06:02,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.27 | bwd_microstep: 1219.56 | bwd_inner_microstep: 1219.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-10 15:06:05,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.11 | bwd_microstep: 1617.02 | bwd_inner_microstep: 1617.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 15:06:07,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1393.21 | bwd_inner_microstep: 1393.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3550
[2024-06-10 15:06:09,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.95 | bwd_microstep: 1592.63 | bwd_inner_microstep: 1592.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-10 15:06:11,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.51 | bwd_microstep: 1612.91 | bwd_inner_microstep: 1612.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2085
[2024-06-10 15:06:12,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.96 | bwd_microstep: 884.73 | bwd_inner_microstep: 884.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 15:06:14,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.12 | bwd_microstep: 1558.46 | bwd_inner_microstep: 1558.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 15:06:16,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1252.20 | bwd_inner_microstep: 1252.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2017
[2024-06-10 15:06:17,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.05 | bwd_microstep: 712.84 | bwd_inner_microstep: 712.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2289
[2024-06-10 15:06:18,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.65 | bwd_microstep: 784.51 | bwd_inner_microstep: 784.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 15:06:20,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1544.45 | bwd_inner_microstep: 1544.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-10 15:06:22,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.46 | bwd_microstep: 1434.25 | bwd_inner_microstep: 1434.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3601
[2024-06-10 15:06:24,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1468.00 | bwd_inner_microstep: 1467.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 15:06:26,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.48 | bwd_microstep: 1290.76 | bwd_inner_microstep: 1290.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 15:06:28,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1377.74 | bwd_inner_microstep: 1377.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 15:06:30,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.72 | bwd_microstep: 1390.15 | bwd_inner_microstep: 1390.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3427
[2024-06-10 15:06:32,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.66 | bwd_microstep: 1511.15 | bwd_inner_microstep: 1511.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 15:06:34,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.33 | bwd_microstep: 1516.09 | bwd_inner_microstep: 1516.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2043
[2024-06-10 15:06:35,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.88 | bwd_microstep: 908.95 | bwd_inner_microstep: 908.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 15:06:38,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-10 15:06:38,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.38 | bwd_microstep: 1779.18 | bwd_inner_microstep: 1474.57 | bwd_allreduce_microstep: 304.57 | step_microstep: 37.66
[2024-06-10 15:06:38,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16116.41 | bwd: 43360.76 | bwd_inner: 43055.30 | bwd_allreduce: 304.79 | step: 39.14

 48%|████▊     | 833/1726 [14:24:07<15:10:13, 61.16s/it]
 48%|████▊     | 834/1726 [14:25:10<15:18:48, 61.80s/it]
 48%|████▊     | 835/1726 [14:26:12<15:17:26, 61.78s/it]
 48%|████▊     | 836/1726 [14:27:12<15:11:35, 61.46s/it]
 48%|████▊     | 837/1726 [14:28:15<15:13:42, 61.67s/it]
 49%|████▊     | 838/1726 [14:29:14<15:04:24, 61.11s/
{'loss': 1.1835, 'learning_rate': 2.1911310800001967e-05, 'epoch': 0.49}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 15:06:40,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1349.02 | bwd_inner_microstep: 1349.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3952
[2024-06-10 15:06:42,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.83 | bwd_microstep: 1694.80 | bwd_inner_microstep: 1694.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3883
[2024-06-10 15:06:44,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.01 | bwd_microstep: 1685.51 | bwd_inner_microstep: 1685.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 15:06:46,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1340.58 | bwd_inner_microstep: 1340.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 15:06:48,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.13 | bwd_microstep: 1244.78 | bwd_inner_microstep: 1244.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 15:06:49,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 787.56 | bwd_inner_microstep: 787.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 15:06:51,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1281.62 | bwd_inner_microstep: 1281.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3586
[2024-06-10 15:06:52,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.25 | bwd_microstep: 1212.26 | bwd_inner_microstep: 1212.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706
[2024-06-10 15:06:55,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.51 | bwd_microstep: 1627.22 | bwd_inner_microstep: 1627.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 15:06:56,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1342.22 | bwd_inner_microstep: 1342.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411
[2024-06-10 15:06:58,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.69 | bwd_microstep: 1181.97 | bwd_inner_microstep: 1181.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3484
[2024-06-10 15:07:00,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.78 | bwd_microstep: 1248.10 | bwd_inner_microstep: 1248.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512
[2024-06-10 15:07:02,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.04 | bwd_microstep: 1430.90 | bwd_inner_microstep: 1430.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569
[2024-06-10 15:07:04,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1365.45 | bwd_inner_microstep: 1365.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3570
[2024-06-10 15:07:06,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.66 | bwd_microstep: 1629.51 | bwd_inner_microstep: 1629.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423
[2024-06-10 15:07:08,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.41 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 15:07:09,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.98 | bwd_microstep: 797.87 | bwd_inner_microstep: 797.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 15:07:11,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1376.93 | bwd_inner_microstep: 1376.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 15:07:13,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1486.57 | bwd_inner_microstep: 1486.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1995
[2024-06-10 15:07:14,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.59 | bwd_microstep: 706.69 | bwd_inner_microstep: 706.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 15:07:16,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.79 | bwd_microstep: 1344.69 | bwd_inner_microstep: 1344.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465
[2024-06-10 15:07:17,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1333.24 | bwd_inner_microstep: 1333.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 15:07:19,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1392.66 | bwd_inner_microstep: 1392.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3612
[2024-06-10 15:07:21,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.87 | bwd_microstep: 1368.80 | bwd_inner_microstep: 1368.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715
[2024-06-10 15:07:23,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.51 | bwd_microstep: 1430.20 | bwd_inner_microstep: 1430.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-10 15:07:25,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1506.76 | bwd_inner_microstep: 1506.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3721
[2024-06-10 15:07:28,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.02 | bwd_microstep: 1697.52 | bwd_inner_microstep: 1697.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899
[2024-06-10 15:07:29,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.04 | bwd_microstep: 714.22 | bwd_inner_microstep: 714.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 15:07:30,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.15 | bwd_microstep: 697.72 | bwd_inner_microstep: 697.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 15:07:32,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.87 | bwd_microstep: 1404.53 | bwd_inner_microstep: 1404.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-10 15:07:34,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.92 | bwd_microstep: 1637.87 | bwd_inner_microstep: 1637.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3573
[2024-06-10 15:07:37,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-10 15:07:37,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.37 | bwd_microstep: 3078.12 | bwd_inner_microstep: 1728.78 | bwd_allreduce_microstep: 1349.29 | step_microstep: 37.94
[2024-06-10 15:07:37,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15786.35 | bwd: 43706.06 | bwd_inner: 42355.87 | bwd_allreduce: 1349.52 | step: 39.45
{'loss': 1.2662, 'learning_rate': 2.187394528140445e-05, 'epoch': 0.49}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414
[2024-06-10 15:07:39,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.83 | bwd_microstep: 1295.62 | bwd_inner_microstep: 1295.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3960
[2024-06-10 15:07:41,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.87 | bwd_microstep: 1424.69 | bwd_inner_microstep: 1424.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 15:07:43,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1248.98 | bwd_inner_microstep: 1248.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 15:07:45,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.45 | bwd_microstep: 1375.16 | bwd_inner_microstep: 1375.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 15:07:47,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.62 | bwd_microstep: 1547.17 | bwd_inner_microstep: 1547.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 15:07:48,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.22 | bwd_microstep: 792.67 | bwd_inner_microstep: 792.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 15:07:50,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1246.00 | bwd_inner_microstep: 1245.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 15:07:52,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.75 | bwd_microstep: 1530.81 | bwd_inner_microstep: 1530.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 15:07:54,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.99 | bwd_microstep: 1150.27 | bwd_inner_microstep: 1150.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 15:07:55,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.08 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1885
[2024-06-10 15:07:57,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.93 | bwd_microstep: 761.22 | bwd_inner_microstep: 761.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3542
[2024-06-10 15:07:59,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 1442.37 | bwd_inner_microstep: 1442.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2622
[2024-06-10 15:08:00,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.89 | bwd_microstep: 1107.66 | bwd_inner_microstep: 1107.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497
[2024-06-10 15:08:02,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 1579.96 | bwd_inner_microstep: 1579.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486
[2024-06-10 15:08:04,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.48 | bwd_microstep: 1578.91 | bwd_inner_microstep: 1578.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 15:08:06,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.76 | bwd_microstep: 1484.99 | bwd_inner_microstep: 1484.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-10 15:08:08,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1444.47 | bwd_inner_microstep: 1444.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3411
[2024-06-10 15:08:10,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.56 | bwd_microstep: 1438.87 | bwd_inner_microstep: 1438.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427
[2024-06-10 15:08:12,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.39 | bwd_microstep: 1440.10 | bwd_inner_microstep: 1440.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 15:08:14,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.79 | bwd_microstep: 1292.46 | bwd_inner_microstep: 1292.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 15:08:16,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.77 | bwd_microstep: 1292.47 | bwd_inner_microstep: 1292.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-10 15:08:18,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.38 | bwd_microstep: 1586.08 | bwd_inner_microstep: 1586.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 15:08:20,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.39 | bwd_microstep: 1583.01 | bwd_inner_microstep: 1582.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551
[2024-06-10 15:08:22,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.21 | bwd_microstep: 1426.07 | bwd_inner_microstep: 1426.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3447
[2024-06-10 15:08:24,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 1398.34 | bwd_inner_microstep: 1398.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 15:08:26,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.87 | bwd_microstep: 1487.94 | bwd_inner_microstep: 1487.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3722
[2024-06-10 15:08:29,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.72 | bwd_microstep: 1612.45 | bwd_inner_microstep: 1612.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 15:08:30,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1391.33 | bwd_inner_microstep: 1391.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 15:08:33,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1537.55 | bwd_inner_microstep: 1537.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 15:08:34,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.74 | bwd_microstep: 1308.25 | bwd_inner_microstep: 1308.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 15:08:36,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.66 | bwd_microstep: 1355.27 | bwd_inner_microstep: 1355.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 15:08:38,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.16 | optimizer_step: 6.63
[2024-06-10 15:08:38,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.70 | bwd_microstep: 1336.77 | bwd_inner_microstep: 1328.53 | bwd_allreduce_microstep: 8.20 | step_microstep: 37.65
[2024-06-10 15:08:38,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16420.19 | bwd: 43883.11 | bwd_inner: 43874.01 | bwd_allreduce: 8.42 | step: 39.19
{'loss': 1.2786, 'learning_rate': 2.1836573162781406e-05, 'epoch': 0.49}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-10 15:08:40,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.08 | bwd_microstep: 1312.98 | bwd_inner_microstep: 1312.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 15:08:42,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.46 | bwd_microstep: 1283.93 | bwd_inner_microstep: 1283.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 15:08:43,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1254.74 | bwd_inner_microstep: 1254.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3883
[2024-06-10 15:08:46,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.95 | bwd_microstep: 1583.09 | bwd_inner_microstep: 1583.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 15:08:48,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1377.98 | bwd_inner_microstep: 1377.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2312
[2024-06-10 15:08:49,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.63 | bwd_microstep: 915.99 | bwd_inner_microstep: 915.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778
[2024-06-10 15:08:51,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.37 | bwd_microstep: 1443.45 | bwd_inner_microstep: 1443.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 15:08:53,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.93 | bwd_microstep: 1287.92 | bwd_inner_microstep: 1287.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 15:08:54,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1246.22 | bwd_inner_microstep: 1246.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 15:08:56,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1250.59 | bwd_inner_microstep: 1250.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 15:08:58,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.82 | bwd_microstep: 1356.54 | bwd_inner_microstep: 1356.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-10 15:09:00,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.63 | bwd_microstep: 1311.87 | bwd_inner_microstep: 1311.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629
[2024-06-10 15:09:02,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1314.92 | bwd_inner_microstep: 1314.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-10 15:09:03,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.68 | bwd_microstep: 1315.14 | bwd_inner_microstep: 1315.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665
[2024-06-10 15:09:05,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1334.85 | bwd_inner_microstep: 1334.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 15:09:07,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2110
[2024-06-10 15:09:08,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.76 | bwd_microstep: 923.97 | bwd_inner_microstep: 923.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 15:09:11,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.26 | bwd_microstep: 1590.11 | bwd_inner_microstep: 1590.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 15:09:13,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.31 | bwd_microstep: 1476.47 | bwd_inner_microstep: 1476.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 15:09:15,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.93 | bwd_microstep: 1383.23 | bwd_inner_microstep: 1383.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 15:09:17,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1491.26 | bwd_inner_microstep: 1491.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-10 15:09:19,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.46 | bwd_microstep: 1596.88 | bwd_inner_microstep: 1596.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2243
[2024-06-10 15:09:20,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.37 | bwd_microstep: 1026.42 | bwd_inner_microstep: 1026.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 15:09:22,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.84 | bwd_microstep: 1355.42 | bwd_inner_microstep: 1355.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015
[2024-06-10 15:09:23,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.51 | bwd_microstep: 810.16 | bwd_inner_microstep: 810.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3557
[2024-06-10 15:09:25,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.04 | bwd_microstep: 1359.10 | bwd_inner_microstep: 1359.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 15:09:27,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1416.46 | bwd_inner_microstep: 1416.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3561
[2024-06-10 15:09:29,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.51 | bwd_microstep: 1568.37 | bwd_inner_microstep: 1568.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 15:09:31,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1357.38 | bwd_inner_microstep: 1357.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 15:09:33,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.89 | bwd_microstep: 1373.27 | bwd_inner_microstep: 1373.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 15:09:35,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.65 | bwd_microstep: 1340.19 | bwd_inner_microstep: 1340.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1845
[2024-06-10 15:09:39,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.64
[2024-06-10 15:09:39,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 258.42 | bwd_microstep: 3458.19 | bwd_inner_microstep: 766.02 | bwd_allreduce_microstep: 2692.12 | step_microstep: 38.03
[2024-06-10 15:09:39,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15653.14 | bwd: 44497.09 | bwd_inner: 41804.06 | bwd_allreduce: 2692.35 | step: 39.45
{'loss': 1.2411, 'learning_rate': 2.179919457575722e-05, 'epoch': 0.49}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 15:09:40,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.76 | bwd_microstep: 779.46 | bwd_inner_microstep: 779.37 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3995
[2024-06-10 15:09:42,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.31 | bwd_microstep: 1600.85 | bwd_inner_microstep: 1600.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2321
[2024-06-10 15:09:43,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.02 | bwd_microstep: 981.71 | bwd_inner_microstep: 981.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3478
[2024-06-10 15:09:45,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1229.62 | bwd_inner_microstep: 1229.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 15:09:47,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1547.23 | bwd_inner_microstep: 1547.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 15:09:49,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.29 | bwd_microstep: 1149.16 | bwd_inner_microstep: 1149.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 15:09:50,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1275.08 | bwd_inner_microstep: 1275.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 15:09:52,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1390.05 | bwd_inner_microstep: 1390.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 15:09:54,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900
[2024-06-10 15:09:55,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.43 | bwd_microstep: 683.64 | bwd_inner_microstep: 683.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605
[2024-06-10 15:09:57,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.88 | bwd_microstep: 1440.63 | bwd_inner_microstep: 1440.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2220
[2024-06-10 15:09:58,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.00 | bwd_microstep: 893.24 | bwd_inner_microstep: 893.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498
[2024-06-10 15:10:00,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.72 | bwd_microstep: 1316.57 | bwd_inner_microstep: 1316.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 15:10:02,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.21 | bwd_microstep: 1509.35 | bwd_inner_microstep: 1509.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397
[2024-06-10 15:10:04,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1402.61 | bwd_inner_microstep: 1402.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691
[2024-06-10 15:10:06,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.50 | bwd_microstep: 1617.86 | bwd_inner_microstep: 1617.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449
[2024-06-10 15:10:08,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.68 | bwd_microstep: 1378.13 | bwd_inner_microstep: 1378.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 15:10:10,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.37 | bwd_microstep: 1523.87 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 15:10:12,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1509.74 | bwd_inner_microstep: 1509.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 15:10:14,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.63 | bwd_microstep: 1256.22 | bwd_inner_microstep: 1256.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 15:10:16,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.58 | bwd_microstep: 1426.02 | bwd_inner_microstep: 1425.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 15:10:18,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1390.60 | bwd_inner_microstep: 1390.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 15:10:20,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.30 | bwd_microstep: 1657.81 | bwd_inner_microstep: 1657.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 15:10:23,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.35 | bwd_microstep: 1658.14 | bwd_inner_microstep: 1658.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 15:10:25,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1497.81 | bwd_inner_microstep: 1497.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1904
[2024-06-10 15:10:26,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.69 | bwd_microstep: 763.93 | bwd_inner_microstep: 763.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3718
[2024-06-10 15:10:28,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.47 | bwd_microstep: 1422.08 | bwd_inner_microstep: 1422.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-10 15:10:30,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1475.01 | bwd_inner_microstep: 1474.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3566
[2024-06-10 15:10:32,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.59 | bwd_microstep: 1346.30 | bwd_inner_microstep: 1346.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 15:10:34,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1478.69 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3596
[2024-06-10 15:10:36,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.00 | bwd_microstep: 1636.89 | bwd_inner_microstep: 1636.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-10 15:10:39,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 15:10:39,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.67 | bwd_microstep: 2697.21 | bwd_inner_microstep: 1809.69 | bwd_allreduce_microstep: 887.47 | step_microstep: 38.03
[2024-06-10 15:10:39,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16122.42 | bwd: 44187.66 | bwd_inner: 43299.22 | bwd_allreduce: 887.74 | step: 39.55
{'loss': 1.2561, 'learning_rate': 2.1761809651979098e-05, 'epoch': 0.49}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-10 15:10:41,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.50 | bwd_microstep: 1302.45 | bwd_inner_microstep: 1302.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 15:10:43,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.79 | bwd_microstep: 1345.15 | bwd_inner_microstep: 1345.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-10 15:10:45,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.46 | bwd_microstep: 1560.91 | bwd_inner_microstep: 1560.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 15:10:47,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1480.13 | bwd_inner_microstep: 1480.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3837
[2024-06-10 15:10:49,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.28 | bwd_microstep: 1383.09 | bwd_inner_microstep: 1383.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2237
[2024-06-10 15:10:50,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.39 | bwd_microstep: 864.95 | bwd_inner_microstep: 864.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 15:10:52,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1381.82 | bwd_inner_microstep: 1381.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418
[2024-06-10 15:10:54,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.40 | bwd_microstep: 1181.25 | bwd_inner_microstep: 1181.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 15:10:55,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.40 | bwd_microstep: 1149.89 | bwd_inner_microstep: 1149.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433
[2024-06-10 15:10:57,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.92 | bwd_microstep: 1283.18 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488
[2024-06-10 15:10:59,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.66 | bwd_microstep: 1347.49 | bwd_inner_microstep: 1347.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2128
[2024-06-10 15:11:00,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.86 | bwd_microstep: 924.58 | bwd_inner_microstep: 924.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 15:11:02,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1377.95 | bwd_inner_microstep: 1377.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 15:11:04,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1492.16 | bwd_inner_microstep: 1492.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 15:11:06,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1480.24 | bwd_inner_microstep: 1480.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3675
[2024-06-10 15:11:09,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.72 | bwd_microstep: 1715.40 | bwd_inner_microstep: 1715.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3521
[2024-06-10 15:11:10,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.70 | bwd_microstep: 1191.93 | bwd_inner_microstep: 1191.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3520
[2024-06-10 15:11:12,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.53 | bwd_microstep: 1433.93 | bwd_inner_microstep: 1433.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629
[2024-06-10 15:11:15,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.48 | bwd_microstep: 1711.06 | bwd_inner_microstep: 1711.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3731
[2024-06-10 15:11:16,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.74 | bwd_microstep: 1334.25 | bwd_inner_microstep: 1334.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3512
[2024-06-10 15:11:18,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1347.50 | bwd_inner_microstep: 1347.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 15:11:20,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.49 | bwd_microstep: 1187.05 | bwd_inner_microstep: 1187.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3723
[2024-06-10 15:11:22,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1331.53 | bwd_inner_microstep: 1331.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913
[2024-06-10 15:11:23,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.85 | bwd_microstep: 686.43 | bwd_inner_microstep: 686.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 15:11:25,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.93 | bwd_microstep: 1313.36 | bwd_inner_microstep: 1313.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 15:11:27,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1376.30 | bwd_inner_microstep: 1376.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1611
[2024-06-10 15:11:27,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 244.58 | bwd_microstep: 644.20 | bwd_inner_microstep: 644.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007
[2024-06-10 15:11:29,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.32 | bwd_microstep: 899.91 | bwd_inner_microstep: 899.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3818
[2024-06-10 15:11:31,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.84 | bwd_microstep: 1752.32 | bwd_inner_microstep: 1752.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 15:11:33,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1244.47 | bwd_inner_microstep: 1244.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-10 15:11:35,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.93 | bwd_microstep: 1590.20 | bwd_inner_microstep: 1590.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239
[2024-06-10 15:11:40,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.29 | optimizer_step: 6.61
[2024-06-10 15:11:40,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.27 | bwd_microstep: 4648.00 | bwd_inner_microstep: 1089.34 | bwd_allreduce_microstep: 3558.59 | step_microstep: 39.05
[2024-06-10 15:11:40,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15490.30 | bwd: 44963.08 | bwd_inner: 41403.57 | bwd_allreduce: 3558.84 | step: 40.61
{'loss': 1.2329, 'learning_rate': 2.1724418523116534e-05, 'epoch': 0.49}
 49%|████▊     | 838/1726 [14:29:14<15:04:24, 61.11s/it]
 49%|████▊     | 839/1726 [14:30:14<14:57:42, 60.72s/it]
 49%|████▊     | 840/1726 [14:31:15<14:56:19, 60.70s/it]
 49%|████▊     | 841/1726 [14:32:15<14:54:18, 60.63s/it]
 49%|████▉     | 842/1726 [14:33:16<14:53:24, 60.64s/it]
 49%|████▉     | 843/1726 [14:34:17<14:53:03, 60.68s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 15:11:42,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.27 | bwd_microstep: 1385.45 | bwd_inner_microstep: 1385.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3969
[2024-06-10 15:11:44,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.40 | bwd_microstep: 1497.94 | bwd_inner_microstep: 1497.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 15:11:46,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.16 | bwd_microstep: 1547.78 | bwd_inner_microstep: 1547.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-10 15:11:47,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.38 | bwd_microstep: 787.75 | bwd_inner_microstep: 787.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 15:11:49,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1479.77 | bwd_inner_microstep: 1479.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475
[2024-06-10 15:11:51,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1407.99 | bwd_inner_microstep: 1407.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 15:11:52,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.31 | bwd_microstep: 808.24 | bwd_inner_microstep: 808.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 15:11:54,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1402.02 | bwd_inner_microstep: 1401.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2482
[2024-06-10 15:11:56,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.62 | bwd_microstep: 926.02 | bwd_inner_microstep: 926.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2394
[2024-06-10 15:11:57,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.42 | bwd_microstep: 1119.80 | bwd_inner_microstep: 1119.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1977
[2024-06-10 15:11:58,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.97 | bwd_microstep: 894.54 | bwd_inner_microstep: 894.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3839
[2024-06-10 15:12:01,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.87 | bwd_microstep: 1715.01 | bwd_inner_microstep: 1714.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3443
[2024-06-10 15:12:03,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.78 | bwd_microstep: 1405.28 | bwd_inner_microstep: 1405.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3506
[2024-06-10 15:12:05,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1335.08 | bwd_inner_microstep: 1335.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667
[2024-06-10 15:12:06,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1414.69 | bwd_inner_microstep: 1414.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 15:12:08,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1386.55 | bwd_inner_microstep: 1386.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645
[2024-06-10 15:12:10,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.63 | bwd_microstep: 1417.78 | bwd_inner_microstep: 1417.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525
[2024-06-10 15:12:12,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.72 | bwd_microstep: 1320.29 | bwd_inner_microstep: 1320.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 15:12:14,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.83 | bwd_microstep: 1354.62 | bwd_inner_microstep: 1354.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 15:12:16,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.25 | bwd_microstep: 1609.91 | bwd_inner_microstep: 1609.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289
[2024-06-10 15:12:17,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.71 | bwd_microstep: 876.34 | bwd_inner_microstep: 876.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 15:12:19,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.26 | bwd_microstep: 1451.14 | bwd_inner_microstep: 1451.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3805
[2024-06-10 15:12:22,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.54 | bwd_microstep: 1686.93 | bwd_inner_microstep: 1686.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.00
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 15:12:24,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1553.94 | bwd_inner_microstep: 1553.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2192
[2024-06-10 15:12:25,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.70 | bwd_microstep: 1051.26 | bwd_inner_microstep: 1051.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 15:12:27,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.33 | bwd_microstep: 1295.50 | bwd_inner_microstep: 1295.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 15:12:29,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.45 | bwd_microstep: 1284.98 | bwd_inner_microstep: 1284.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3775
[2024-06-10 15:12:31,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.52 | bwd_microstep: 1482.73 | bwd_inner_microstep: 1482.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 15:12:33,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.62 | bwd_microstep: 1653.50 | bwd_inner_microstep: 1653.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 15:12:35,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.10 | bwd_microstep: 1499.25 | bwd_inner_microstep: 1499.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 15:12:37,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.21 | bwd_microstep: 1275.31 | bwd_inner_microstep: 1275.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 15:12:40,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-10 15:12:40,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.43 | bwd_microstep: 1846.87 | bwd_inner_microstep: 1573.45 | bwd_allreduce_microstep: 273.37 | step_microstep: 37.69
[2024-06-10 15:12:40,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16013.91 | bwd: 43174.26 | bwd_inner: 42899.99 | bwd_allreduce: 273.60 | step: 41.16
{'loss': 1.2126, 'learning_rate': 2.1687021320860893e-05, 'epoch': 0.49}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-10 15:12:41,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.96 | bwd_microstep: 1267.99 | bwd_inner_microstep: 1267.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3895
[2024-06-10 15:12:43,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1385.42 | bwd_inner_microstep: 1385.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3890
[2024-06-10 15:12:45,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1384.93 | bwd_inner_microstep: 1384.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 15:12:47,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.79 | bwd_microstep: 1349.06 | bwd_inner_microstep: 1349.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 15:12:48,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.07 | bwd_microstep: 873.41 | bwd_inner_microstep: 873.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411
[2024-06-10 15:12:50,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.71 | bwd_microstep: 1180.33 | bwd_inner_microstep: 1180.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2464
[2024-06-10 15:12:51,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.80 | bwd_microstep: 948.70 | bwd_inner_microstep: 948.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1901
[2024-06-10 15:12:52,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.10 | bwd_microstep: 712.14 | bwd_inner_microstep: 712.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 15:12:54,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.19 | bwd_microstep: 1181.07 | bwd_inner_microstep: 1181.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3718
[2024-06-10 15:12:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1332.25 | bwd_inner_microstep: 1332.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 15:12:57,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.01 | bwd_microstep: 812.50 | bwd_inner_microstep: 812.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 15:12:59,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.77 | bwd_microstep: 1339.66 | bwd_inner_microstep: 1339.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 15:13:00,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1292.40 | bwd_inner_microstep: 1292.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521
[2024-06-10 15:13:02,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1415.36 | bwd_inner_microstep: 1415.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 15:13:04,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.78 | bwd_microstep: 1482.13 | bwd_inner_microstep: 1482.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 15:13:06,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1245.86 | bwd_inner_microstep: 1245.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 15:13:08,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.63 | bwd_microstep: 1247.20 | bwd_inner_microstep: 1247.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000
[2024-06-10 15:13:09,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.17 | bwd_microstep: 801.94 | bwd_inner_microstep: 801.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 15:13:11,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1252.21 | bwd_inner_microstep: 1252.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 15:13:13,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.35 | bwd_microstep: 1358.03 | bwd_inner_microstep: 1358.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 15:13:15,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.71 | bwd_microstep: 1452.31 | bwd_inner_microstep: 1452.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940
[2024-06-10 15:13:16,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.71 | bwd_microstep: 696.42 | bwd_inner_microstep: 696.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 15:13:18,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.83 | bwd_microstep: 1482.89 | bwd_inner_microstep: 1482.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3545
[2024-06-10 15:13:20,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1374.93 | bwd_inner_microstep: 1374.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 15:13:21,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1296.68 | bwd_inner_microstep: 1296.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 15:13:24,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1651.41 | bwd_inner_microstep: 1651.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 15:13:25,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.00 | bwd_microstep: 1181.71 | bwd_inner_microstep: 1181.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844
[2024-06-10 15:13:27,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.09 | bwd_microstep: 1468.42 | bwd_inner_microstep: 1468.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3725
[2024-06-10 15:13:29,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.31 | bwd_microstep: 1416.61 | bwd_inner_microstep: 1416.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 15:13:31,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.41 | bwd_microstep: 1450.64 | bwd_inner_microstep: 1450.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 15:13:33,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1510.33 | bwd_inner_microstep: 1510.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2066
[2024-06-10 15:13:40,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.64 | optimizer_gradients: 4.29 | optimizer_step: 6.61
[2024-06-10 15:13:40,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.07 | bwd_microstep: 5940.13 | bwd_inner_microstep: 938.30 | bwd_allreduce_microstep: 5001.76 | step_microstep: 40.21
[2024-06-10 15:13:40,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14967.09 | bwd: 44785.07 | bwd_inner: 39782.40 | bwd_allreduce: 5002.00 | step: 41.68
{'loss': 1.2146, 'learning_rate': 2.164961817692494e-05, 'epoch': 0.49}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 15:13:41,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.35 | bwd_microstep: 1273.43 | bwd_inner_microstep: 1273.35 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3943
[2024-06-10 15:13:44,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.58 | bwd_microstep: 1592.52 | bwd_inner_microstep: 1592.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2323
[2024-06-10 15:13:45,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.33 | bwd_microstep: 981.26 | bwd_inner_microstep: 981.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797
[2024-06-10 15:13:47,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.91 | bwd_microstep: 1544.96 | bwd_inner_microstep: 1544.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2047
[2024-06-10 15:13:48,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.10 | bwd_microstep: 780.90 | bwd_inner_microstep: 780.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 15:13:50,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.16 | bwd_microstep: 1376.33 | bwd_inner_microstep: 1376.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 15:13:52,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.14 | bwd_microstep: 1218.37 | bwd_inner_microstep: 1218.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 15:13:54,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1279.65 | bwd_inner_microstep: 1279.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 15:13:55,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1353.53 | bwd_inner_microstep: 1353.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3986
[2024-06-10 15:13:58,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.03 | bwd_microstep: 1536.87 | bwd_inner_microstep: 1536.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3440
[2024-06-10 15:13:59,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.26 | bwd_microstep: 1311.86 | bwd_inner_microstep: 1311.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3488
[2024-06-10 15:14:01,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.07 | bwd_microstep: 1434.67 | bwd_inner_microstep: 1434.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3633
[2024-06-10 15:14:04,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 1653.56 | bwd_inner_microstep: 1653.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3818
[2024-06-10 15:14:06,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.54 | bwd_microstep: 1856.24 | bwd_inner_microstep: 1856.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-10 15:14:08,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.67 | bwd_microstep: 1522.10 | bwd_inner_microstep: 1522.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 15:14:10,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.90 | bwd_microstep: 1514.09 | bwd_inner_microstep: 1514.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 15:14:13,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.95 | bwd_microstep: 1624.22 | bwd_inner_microstep: 1624.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 15:14:15,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1509.86 | bwd_inner_microstep: 1509.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 15:14:16,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.96 | bwd_microstep: 801.73 | bwd_inner_microstep: 801.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 15:14:18,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.19 | bwd_microstep: 1493.72 | bwd_inner_microstep: 1493.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2282
[2024-06-10 15:14:19,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.27 | bwd_microstep: 812.85 | bwd_inner_microstep: 812.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 15:14:21,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.15 | bwd_microstep: 1494.96 | bwd_inner_microstep: 1494.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 15:14:23,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1355.10 | bwd_inner_microstep: 1355.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609
[2024-06-10 15:14:25,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.86 | bwd_microstep: 1606.73 | bwd_inner_microstep: 1606.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 15:14:26,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.45 | bwd_microstep: 697.15 | bwd_inner_microstep: 697.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 15:14:28,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1553.35 | bwd_inner_microstep: 1553.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3587
[2024-06-10 15:14:30,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1240.68 | bwd_inner_microstep: 1240.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 15:14:32,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1505.49 | bwd_inner_microstep: 1505.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3716
[2024-06-10 15:14:34,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.60 | bwd_microstep: 1732.54 | bwd_inner_microstep: 1732.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 15:14:36,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.43 | bwd_microstep: 1533.11 | bwd_inner_microstep: 1533.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3384
[2024-06-10 15:14:38,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.70 | bwd_microstep: 1366.96 | bwd_inner_microstep: 1366.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 15:14:42,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 15:14:42,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 2833.61 | bwd_inner_microstep: 1527.30 | bwd_allreduce_microstep: 1306.26 | step_microstep: 37.81
[2024-06-10 15:14:42,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16372.80 | bwd: 45392.42 | bwd_inner: 44085.20 | bwd_allreduce: 1306.53 | step: 39.35
{'loss': 1.2301, 'learning_rate': 2.1612209223042346e-05, 'epoch': 0.49}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 15:14:44,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1450.60 | bwd_inner_microstep: 1450.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904
[2024-06-10 15:14:46,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.09 | bwd_microstep: 1585.99 | bwd_inner_microstep: 1585.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 15:14:48,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.58 | bwd_microstep: 1443.23 | bwd_inner_microstep: 1443.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 15:14:50,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.90 | bwd_microstep: 1280.50 | bwd_inner_microstep: 1280.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3424
[2024-06-10 15:14:51,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.10 | bwd_microstep: 1150.57 | bwd_inner_microstep: 1150.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 15:14:53,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.15 | bwd_microstep: 1302.78 | bwd_inner_microstep: 1302.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 15:14:55,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.02 | bwd_microstep: 1284.19 | bwd_inner_microstep: 1284.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2672
[2024-06-10 15:14:56,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.56 | bwd_microstep: 1118.95 | bwd_inner_microstep: 1118.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 15:14:58,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1443.72 | bwd_inner_microstep: 1443.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2315
[2024-06-10 15:15:00,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.10 | bwd_microstep: 917.00 | bwd_inner_microstep: 916.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2993
[2024-06-10 15:15:01,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.00 | bwd_microstep: 1201.14 | bwd_inner_microstep: 1201.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 15:15:03,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.87 | bwd_microstep: 1341.92 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509
[2024-06-10 15:15:05,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1313.56 | bwd_inner_microstep: 1313.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1980
[2024-06-10 15:15:06,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.93 | bwd_microstep: 831.30 | bwd_inner_microstep: 831.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2902
[2024-06-10 15:15:08,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.96 | bwd_microstep: 1154.74 | bwd_inner_microstep: 1154.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2102
[2024-06-10 15:15:09,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.38 | bwd_microstep: 1018.13 | bwd_inner_microstep: 1018.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513
[2024-06-10 15:15:11,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.14 | bwd_microstep: 1317.60 | bwd_inner_microstep: 1317.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-10 15:15:13,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.81 | bwd_microstep: 1423.34 | bwd_inner_microstep: 1423.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 15:15:15,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1486.49 | bwd_inner_microstep: 1486.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 15:15:17,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.18 | bwd_microstep: 1395.70 | bwd_inner_microstep: 1395.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 15:15:19,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1248.17 | bwd_inner_microstep: 1248.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 15:15:21,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1379.91 | bwd_inner_microstep: 1379.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2214
[2024-06-10 15:15:22,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.43 | bwd_microstep: 860.70 | bwd_inner_microstep: 860.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 15:15:24,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1398.58 | bwd_inner_microstep: 1398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 15:15:25,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1281.45 | bwd_inner_microstep: 1281.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4000
[2024-06-10 15:15:28,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.40 | bwd_microstep: 1642.23 | bwd_inner_microstep: 1642.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 15:15:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1608.39 | bwd_inner_microstep: 1608.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1400
[2024-06-10 15:15:31,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.48 | bwd_microstep: 525.83 | bwd_inner_microstep: 525.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602
[2024-06-10 15:15:33,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.25 | bwd_microstep: 1438.31 | bwd_inner_microstep: 1438.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279
[2024-06-10 15:15:34,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.42 | bwd_microstep: 977.33 | bwd_inner_microstep: 977.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 15:15:36,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.41 | bwd_microstep: 1644.60 | bwd_inner_microstep: 1644.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3604
[2024-06-10 15:15:41,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 15:15:41,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.11 | bwd_microstep: 4570.84 | bwd_inner_microstep: 1923.80 | bwd_allreduce_microstep: 2646.99 | step_microstep: 37.84
[2024-06-10 15:15:41,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15335.01 | bwd: 44037.82 | bwd_inner: 41389.92 | bwd_allreduce: 2647.22 | step: 39.45
{'loss': 1.2385, 'learning_rate': 2.157479459096724e-05, 'epoch': 0.49}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-10 15:15:44,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.48 | bwd_microstep: 1579.64 | bwd_inner_microstep: 1579.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3933
[2024-06-10 15:15:46,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.33 | bwd_microstep: 1455.93 | bwd_inner_microstep: 1455.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3486
[2024-06-10 15:15:48,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.03 | bwd_microstep: 1410.48 | bwd_inner_microstep: 1410.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-10 15:15:50,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.84 | bwd_microstep: 1546.83 | bwd_inner_microstep: 1546.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2020
[2024-06-10 15:15:51,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.79 | bwd_microstep: 741.33 | bwd_inner_microstep: 741.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4121
[2024-06-10 15:15:53,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.60 | bwd_microstep: 1737.89 | bwd_inner_microstep: 1737.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 15:15:55,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.31 | bwd_microstep: 1281.93 | bwd_inner_microstep: 1281.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3496
[2024-06-10 15:15:57,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.33 | bwd_microstep: 1219.44 | bwd_inner_microstep: 1219.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-10 15:15:58,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.05 | bwd_microstep: 1309.67 | bwd_inner_microstep: 1309.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4068
[2024-06-10 15:16:01,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.71 | bwd_microstep: 1726.00 | bwd_inner_microstep: 1725.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2635
[2024-06-10 15:16:02,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.17 | bwd_microstep: 1048.72 | bwd_inner_microstep: 1048.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 15:16:04,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.56 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 15:16:06,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.17 | bwd_microstep: 1482.06 | bwd_inner_microstep: 1482.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 15:16:08,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1351.49 | bwd_inner_microstep: 1351.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 15:16:10,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1377.42 | bwd_inner_microstep: 1377.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 15:16:12,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1297.39 | bwd_inner_microstep: 1297.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 15:16:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.56 | bwd_microstep: 1451.80 | bwd_inner_microstep: 1451.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 15:16:16,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1397.56 | bwd_inner_microstep: 1397.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 15:16:18,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1511.09 | bwd_inner_microstep: 1511.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 15:16:20,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.49 | bwd_microstep: 1656.32 | bwd_inner_microstep: 1656.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 15:16:22,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1555.31 | bwd_inner_microstep: 1555.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2676
[2024-06-10 15:16:23,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.54 | bwd_microstep: 1027.23 | bwd_inner_microstep: 1027.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 15:16:25,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1433.55 | bwd_inner_microstep: 1433.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3687
[2024-06-10 15:16:27,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.04 | bwd_microstep: 1477.94 | bwd_inner_microstep: 1477.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071
[2024-06-10 15:16:29,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.02 | bwd_microstep: 914.66 | bwd_inner_microstep: 914.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2677
[2024-06-10 15:16:30,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.65 | bwd_microstep: 1122.98 | bwd_inner_microstep: 1122.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 15:16:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1318.28 | bwd_inner_microstep: 1318.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473
[2024-06-10 15:16:34,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.42 | bwd_microstep: 1325.46 | bwd_inner_microstep: 1325.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442
[2024-06-10 15:16:36,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1447.94 | bwd_inner_microstep: 1447.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 15:16:38,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588
[2024-06-10 15:16:40,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.80 | bwd_microstep: 1603.09 | bwd_inner_microstep: 1603.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3731
[2024-06-10 15:16:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.13 | optimizer_step: 6.60
[2024-06-10 15:16:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1653.04 | bwd_inner_microstep: 1406.98 | bwd_allreduce_microstep: 246.01 | step_microstep: 37.77
[2024-06-10 15:16:42,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16402.18 | bwd: 44203.45 | bwd_inner: 43956.55 | bwd_allreduce: 246.24 | step: 39.21
{'loss': 1.2467, 'learning_rate': 2.1537374412473773e-05, 'epoch': 0.49}
 49%|████▉     | 843/1726 [14:34:17<14:53:03, 60.68s/it]
 49%|████▉     | 844/1726 [14:35:16<14:46:56, 60.34s/it]
 49%|████▉     | 845/1726 [14:36:16<14:44:48, 60.26s/it]
 49%|████▉     | 846/1726 [14:37:18<14:51:54, 60.81s/it]
 49%|████▉     | 847/1726 [14:38:18<14:45:59, 60.48s/it]
 49%|████▉     | 848/1726 [14:39:19<14:46:59, 60.61s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3473
[2024-06-10 15:16:45,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.27 | bwd_microstep: 1575.96 | bwd_inner_microstep: 1575.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 15:16:46,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1378.09 | bwd_inner_microstep: 1378.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-10 15:16:49,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.66 | bwd_microstep: 1552.88 | bwd_inner_microstep: 1552.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 15:16:51,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1407.44 | bwd_inner_microstep: 1407.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 15:16:52,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1349.71 | bwd_inner_microstep: 1349.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 15:16:54,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1439.78 | bwd_inner_microstep: 1439.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 15:16:56,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.78 | bwd_microstep: 1283.89 | bwd_inner_microstep: 1283.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3705
[2024-06-10 15:16:58,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.73 | bwd_microstep: 1460.23 | bwd_inner_microstep: 1460.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 15:16:59,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 790.52 | bwd_inner_microstep: 790.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2982
[2024-06-10 15:17:01,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.67 | bwd_microstep: 1104.82 | bwd_inner_microstep: 1104.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 15:17:03,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1345.31 | bwd_inner_microstep: 1345.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 15:17:05,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.65 | bwd_microstep: 1477.30 | bwd_inner_microstep: 1477.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-10 15:17:07,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.21 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856
[2024-06-10 15:17:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.75 | bwd_microstep: 1666.07 | bwd_inner_microstep: 1666.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638
[2024-06-10 15:17:11,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.49 | bwd_microstep: 1610.82 | bwd_inner_microstep: 1610.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454
[2024-06-10 15:17:13,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.02 | bwd_microstep: 1156.93 | bwd_inner_microstep: 1156.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629
[2024-06-10 15:17:14,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1315.17 | bwd_inner_microstep: 1315.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 15:17:16,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1418.88 | bwd_inner_microstep: 1418.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469
[2024-06-10 15:17:18,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.28 | bwd_microstep: 1437.36 | bwd_inner_microstep: 1437.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 15:17:20,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1414.09 | bwd_inner_microstep: 1414.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 15:17:22,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.76 | bwd_microstep: 1278.86 | bwd_inner_microstep: 1278.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 15:17:24,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 1450.88 | bwd_inner_microstep: 1450.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 15:17:26,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1495.63 | bwd_inner_microstep: 1495.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 15:17:28,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.57 | bwd_microstep: 1350.55 | bwd_inner_microstep: 1350.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1931
[2024-06-10 15:17:29,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.97 | bwd_microstep: 728.32 | bwd_inner_microstep: 728.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2293
[2024-06-10 15:17:30,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.36 | bwd_microstep: 846.88 | bwd_inner_microstep: 846.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 15:17:32,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.95 | bwd_microstep: 1246.50 | bwd_inner_microstep: 1246.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 15:17:34,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.52 | bwd_microstep: 1256.07 | bwd_inner_microstep: 1256.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1897
[2024-06-10 15:17:35,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.79 | bwd_microstep: 715.07 | bwd_inner_microstep: 715.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 15:17:37,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.74 | bwd_microstep: 1551.34 | bwd_inner_microstep: 1551.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 15:17:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1497.18 | bwd_inner_microstep: 1497.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 15:17:43,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.57 | optimizer_gradients: 4.29 | optimizer_step: 6.61
[2024-06-10 15:17:43,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.02 | bwd_microstep: 4061.53 | bwd_inner_microstep: 895.58 | bwd_allreduce_microstep: 3165.90 | step_microstep: 38.09
[2024-06-10 15:17:43,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15634.96 | bwd: 45008.04 | bwd_inner: 41841.23 | bwd_allreduce: 3166.13 | step: 39.56
{'loss': 1.2339, 'learning_rate': 2.1499948819355626e-05, 'epoch': 0.49}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 15:17:45,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.00 | bwd_microstep: 1474.96 | bwd_inner_microstep: 1474.81 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898
[2024-06-10 15:17:48,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.10 | bwd_microstep: 1581.79 | bwd_inner_microstep: 1581.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2367
[2024-06-10 15:17:49,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.98 | bwd_microstep: 892.69 | bwd_inner_microstep: 892.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3797
[2024-06-10 15:17:51,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.90 | bwd_microstep: 1510.18 | bwd_inner_microstep: 1510.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 15:17:53,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.74 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 15:17:54,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1245.48 | bwd_inner_microstep: 1245.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 15:17:56,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.78 | bwd_microstep: 1386.68 | bwd_inner_microstep: 1386.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 15:17:58,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 15:18:00,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.77 | bwd_microstep: 1283.11 | bwd_inner_microstep: 1283.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 15:18:02,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.20 | bwd_microstep: 1502.48 | bwd_inner_microstep: 1502.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3718
[2024-06-10 15:18:04,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1236.77 | bwd_inner_microstep: 1236.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 15:18:05,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.66 | bwd_microstep: 798.38 | bwd_inner_microstep: 798.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 15:18:07,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1530.54 | bwd_inner_microstep: 1530.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 15:18:09,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.93 | bwd_microstep: 1340.51 | bwd_inner_microstep: 1340.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2448
[2024-06-10 15:18:10,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.24 | bwd_microstep: 1014.80 | bwd_inner_microstep: 1014.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 15:18:12,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.55 | bwd_microstep: 1343.49 | bwd_inner_microstep: 1343.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 15:18:14,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1370.22 | bwd_inner_microstep: 1370.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2023
[2024-06-10 15:18:15,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.70 | bwd_microstep: 963.86 | bwd_inner_microstep: 963.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 15:18:17,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1452.32 | bwd_inner_microstep: 1452.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3675
[2024-06-10 15:18:19,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.24 | bwd_microstep: 1324.71 | bwd_inner_microstep: 1324.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-10 15:18:20,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.35 | bwd_microstep: 685.38 | bwd_inner_microstep: 685.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082
[2024-06-10 15:18:21,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.54 | bwd_microstep: 822.72 | bwd_inner_microstep: 822.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-10 15:18:23,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.80 | bwd_microstep: 1301.39 | bwd_inner_microstep: 1301.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 15:18:25,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1533.40 | bwd_inner_microstep: 1533.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 15:18:27,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1295.32 | bwd_inner_microstep: 1295.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3593
[2024-06-10 15:18:29,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.44 | bwd_microstep: 1533.59 | bwd_inner_microstep: 1533.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-10 15:18:31,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.36 | bwd_microstep: 1575.68 | bwd_inner_microstep: 1575.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3571
[2024-06-10 15:18:33,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.05 | bwd_microstep: 1265.36 | bwd_inner_microstep: 1265.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3777
[2024-06-10 15:18:35,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.49 | bwd_microstep: 1794.88 | bwd_inner_microstep: 1794.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2724
[2024-06-10 15:18:37,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.48 | bwd_microstep: 1149.35 | bwd_inner_microstep: 1149.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 15:18:39,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.44 | bwd_microstep: 1590.99 | bwd_inner_microstep: 1590.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 15:18:44,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 15:18:44,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 3831.53 | bwd_inner_microstep: 1564.54 | bwd_allreduce_microstep: 2266.93 | step_microstep: 37.95
[2024-06-10 15:18:44,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15658.22 | bwd: 44200.29 | bwd_inner: 41932.33 | bwd_allreduce: 2267.23 | step: 39.48
{'loss': 1.2289, 'learning_rate': 2.1462517943425523e-05, 'epoch': 0.49}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1862
[2024-06-10 15:18:44,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.80 | bwd_microstep: 668.71 | bwd_inner_microstep: 668.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 15:18:46,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.97 | bwd_microstep: 1272.78 | bwd_inner_microstep: 1272.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 15:18:48,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.34 | bwd_microstep: 1148.71 | bwd_inner_microstep: 1148.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 15:18:50,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1382.33 | bwd_inner_microstep: 1382.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 15:18:52,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1293.52 | bwd_inner_microstep: 1293.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2638
[2024-06-10 15:18:53,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.68 | bwd_microstep: 1017.71 | bwd_inner_microstep: 1017.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 15:18:55,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1245.59 | bwd_inner_microstep: 1245.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3720
[2024-06-10 15:18:57,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1332.63 | bwd_inner_microstep: 1332.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 15:18:58,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1248.52 | bwd_inner_microstep: 1248.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 15:19:00,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.39 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 15:19:02,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.23 | bwd_microstep: 1487.54 | bwd_inner_microstep: 1487.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3496
[2024-06-10 15:19:04,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1514.24 | bwd_inner_microstep: 1514.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384
[2024-06-10 15:19:06,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.54 | bwd_microstep: 1237.48 | bwd_inner_microstep: 1237.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-10 15:19:08,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.64 | bwd_microstep: 1582.27 | bwd_inner_microstep: 1582.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 15:19:10,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.59 | bwd_microstep: 1486.49 | bwd_inner_microstep: 1486.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 15:19:12,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.31 | bwd_microstep: 1473.43 | bwd_inner_microstep: 1473.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 15:19:14,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1254.32 | bwd_inner_microstep: 1254.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977
[2024-06-10 15:19:15,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.20 | bwd_microstep: 826.17 | bwd_inner_microstep: 826.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 15:19:17,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.34 | bwd_microstep: 1287.06 | bwd_inner_microstep: 1287.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-10 15:19:19,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.83 | bwd_microstep: 1521.94 | bwd_inner_microstep: 1521.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-10 15:19:21,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1548.72 | bwd_inner_microstep: 1548.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-10 15:19:23,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1384.56 | bwd_inner_microstep: 1384.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 15:19:24,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.76 | bwd_microstep: 802.45 | bwd_inner_microstep: 802.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 15:19:26,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1376.72 | bwd_inner_microstep: 1376.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812
[2024-06-10 15:19:28,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.64 | bwd_microstep: 1354.98 | bwd_inner_microstep: 1354.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3539
[2024-06-10 15:19:30,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.88 | bwd_microstep: 1360.33 | bwd_inner_microstep: 1360.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 15:19:32,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.16 | bwd_microstep: 1463.57 | bwd_inner_microstep: 1463.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 15:19:34,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1395.00 | bwd_inner_microstep: 1394.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3775
[2024-06-10 15:19:36,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.68 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3425
[2024-06-10 15:19:37,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.27 | bwd_microstep: 1200.64 | bwd_inner_microstep: 1200.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3766
[2024-06-10 15:19:40,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.64 | bwd_microstep: 1609.51 | bwd_inner_microstep: 1609.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 15:19:46,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.62
[2024-06-10 15:19:46,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.83 | bwd_microstep: 5740.56 | bwd_inner_microstep: 2176.29 | bwd_allreduce_microstep: 3564.22 | step_microstep: 37.91
[2024-06-10 15:19:46,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15841.48 | bwd: 46285.12 | bwd_inner: 42719.97 | bwd_allreduce: 3564.46 | step: 39.44
{'loss': 1.2425, 'learning_rate': 2.1425081916514827e-05, 'epoch': 0.49}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505
[2024-06-10 15:19:48,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.48 | bwd_microstep: 1338.10 | bwd_inner_microstep: 1338.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 15:19:50,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.17 | bwd_microstep: 1242.14 | bwd_inner_microstep: 1242.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 15:19:51,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1390.18 | bwd_inner_microstep: 1390.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 15:19:54,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.05 | bwd_microstep: 1653.30 | bwd_inner_microstep: 1653.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783
[2024-06-10 15:19:56,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.13 | bwd_microstep: 1646.30 | bwd_inner_microstep: 1646.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868
[2024-06-10 15:19:57,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 709.09 | bwd_inner_microstep: 709.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3483
[2024-06-10 15:19:59,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.35 | bwd_microstep: 1244.87 | bwd_inner_microstep: 1244.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-10 15:20:00,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.11 | bwd_microstep: 809.72 | bwd_inner_microstep: 809.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 15:20:02,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1249.13 | bwd_inner_microstep: 1249.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749
[2024-06-10 15:20:04,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.69 | bwd_microstep: 1542.15 | bwd_inner_microstep: 1542.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 15:20:05,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 794.42 | bwd_inner_microstep: 794.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 15:20:07,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1343.91 | bwd_inner_microstep: 1343.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3991
[2024-06-10 15:20:09,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.17 | bwd_microstep: 1543.64 | bwd_inner_microstep: 1543.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 15:20:11,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.26 | bwd_microstep: 1287.92 | bwd_inner_microstep: 1287.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3428
[2024-06-10 15:20:12,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.31 | bwd_microstep: 1187.20 | bwd_inner_microstep: 1187.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-10 15:20:14,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.47 | bwd_microstep: 1300.41 | bwd_inner_microstep: 1300.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 15:20:16,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1292.23 | bwd_inner_microstep: 1292.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 15:20:17,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.04 | bwd_microstep: 820.11 | bwd_inner_microstep: 820.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 15:20:19,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.70 | bwd_microstep: 1389.28 | bwd_inner_microstep: 1389.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 15:20:21,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1289.58 | bwd_inner_microstep: 1289.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546
[2024-06-10 15:20:22,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.24 | bwd_microstep: 1202.28 | bwd_inner_microstep: 1202.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3624
[2024-06-10 15:20:24,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.84 | bwd_microstep: 1215.95 | bwd_inner_microstep: 1215.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 15:20:26,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 1293.13 | bwd_inner_microstep: 1293.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3572
[2024-06-10 15:20:28,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.17 | bwd_microstep: 1235.87 | bwd_inner_microstep: 1235.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 15:20:29,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.11 | bwd_microstep: 1288.41 | bwd_inner_microstep: 1288.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413
[2024-06-10 15:20:31,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.31 | bwd_microstep: 1373.08 | bwd_inner_microstep: 1373.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2067
[2024-06-10 15:20:32,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.32 | bwd_microstep: 910.83 | bwd_inner_microstep: 910.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 15:20:34,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1344.06 | bwd_inner_microstep: 1344.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 15:20:36,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1373.84 | bwd_inner_microstep: 1373.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 15:20:38,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1500.15 | bwd_inner_microstep: 1500.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3628
[2024-06-10 15:20:41,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.65 | bwd_microstep: 1708.08 | bwd_inner_microstep: 1708.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2460
[2024-06-10 15:20:46,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.31 | optimizer_step: 6.59
[2024-06-10 15:20:46,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.15 | bwd_microstep: 4615.78 | bwd_inner_microstep: 1190.23 | bwd_allreduce_microstep: 3425.50 | step_microstep: 38.32
[2024-06-10 15:20:46,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15264.93 | bwd: 44135.17 | bwd_inner: 40708.77 | bwd_allreduce: 3425.72 | step: 39.82
{'loss': 1.2225, 'learning_rate': 2.1387640870473033e-05, 'epoch': 0.49}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 15:20:47,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.01 | bwd_microstep: 1239.11 | bwd_inner_microstep: 1239.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3983
[2024-06-10 15:20:49,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1370.00 | bwd_inner_microstep: 1369.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3864
[2024-06-10 15:20:52,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.34 | bwd_microstep: 1590.84 | bwd_inner_microstep: 1590.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-10 15:20:53,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.68 | bwd_microstep: 970.74 | bwd_inner_microstep: 970.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733
[2024-06-10 15:20:55,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.11 | bwd_microstep: 1629.38 | bwd_inner_microstep: 1629.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 15:20:57,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1250.29 | bwd_inner_microstep: 1250.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1887
[2024-06-10 15:20:58,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.56 | bwd_microstep: 712.20 | bwd_inner_microstep: 712.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3725
[2024-06-10 15:21:00,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1268.20 | bwd_inner_microstep: 1268.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 15:21:02,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1390.71 | bwd_inner_microstep: 1390.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 15:21:03,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.52 | bwd_microstep: 1380.17 | bwd_inner_microstep: 1380.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416
[2024-06-10 15:21:05,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.55 | bwd_microstep: 1307.45 | bwd_inner_microstep: 1307.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 15:21:07,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1406.78 | bwd_inner_microstep: 1406.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 15:21:09,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.26 | bwd_microstep: 1625.15 | bwd_inner_microstep: 1625.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3483
[2024-06-10 15:21:11,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1365.32 | bwd_inner_microstep: 1365.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3449
[2024-06-10 15:21:13,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.84 | bwd_microstep: 1498.07 | bwd_inner_microstep: 1498.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 15:21:15,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.60 | bwd_microstep: 1483.67 | bwd_inner_microstep: 1483.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3668
[2024-06-10 15:21:17,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.61 | bwd_microstep: 1456.33 | bwd_inner_microstep: 1456.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637
[2024-06-10 15:21:20,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.00 | bwd_microstep: 1613.88 | bwd_inner_microstep: 1613.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 15:21:22,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1510.39 | bwd_inner_microstep: 1510.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 15:21:24,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1395.21 | bwd_inner_microstep: 1395.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 15:21:26,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.92 | bwd_microstep: 1478.51 | bwd_inner_microstep: 1478.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015
[2024-06-10 15:21:27,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.72 | bwd_microstep: 806.60 | bwd_inner_microstep: 806.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 15:21:29,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.69 | bwd_microstep: 1554.89 | bwd_inner_microstep: 1554.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 15:21:31,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1284.64 | bwd_inner_microstep: 1284.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2078
[2024-06-10 15:21:32,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.78 | bwd_microstep: 818.73 | bwd_inner_microstep: 818.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2050
[2024-06-10 15:21:33,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.30 | bwd_microstep: 942.34 | bwd_inner_microstep: 942.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2227
[2024-06-10 15:21:35,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.05 | bwd_microstep: 992.34 | bwd_inner_microstep: 992.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3606
[2024-06-10 15:21:36,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.78 | bwd_microstep: 1371.19 | bwd_inner_microstep: 1371.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-10 15:21:38,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1423.07 | bwd_inner_microstep: 1423.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032
[2024-06-10 15:21:40,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.07 | bwd_microstep: 901.46 | bwd_inner_microstep: 901.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-10 15:21:42,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.53 | bwd_microstep: 1643.87 | bwd_inner_microstep: 1643.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-10 15:21:47,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 15:21:47,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.61 | bwd_microstep: 4165.19 | bwd_inner_microstep: 1866.12 | bwd_allreduce_microstep: 2299.02 | step_microstep: 38.01
[2024-06-10 15:21:47,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15800.67 | bwd: 44846.72 | bwd_inner: 42546.80 | bwd_allreduce: 2299.25 | step: 39.49
{'loss': 1.2137, 'learning_rate': 2.1350194937167307e-05, 'epoch': 0.49}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 15:21:49,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.80 | bwd_microstep: 1474.07 | bwd_inner_microstep: 1474.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3966
[2024-06-10 15:21:51,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.89 | bwd_microstep: 1597.22 | bwd_inner_microstep: 1597.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 15:21:52,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.96 | bwd_microstep: 792.12 | bwd_inner_microstep: 792.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3843
[2024-06-10 15:21:54,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1493.15 | bwd_inner_microstep: 1493.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 15:21:56,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.65 | bwd_microstep: 1247.55 | bwd_inner_microstep: 1247.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 15:21:58,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.66 | bwd_microstep: 1351.64 | bwd_inner_microstep: 1351.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 15:22:00,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1349.43 | bwd_inner_microstep: 1349.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 15:22:01,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1254.75 | bwd_inner_microstep: 1254.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 15:22:03,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1395.64 | bwd_inner_microstep: 1395.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 15:22:05,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1253.59 | bwd_inner_microstep: 1253.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3495
[2024-06-10 15:22:07,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.81 | bwd_microstep: 1217.62 | bwd_inner_microstep: 1217.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 15:22:08,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 796.02 | bwd_inner_microstep: 795.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495
[2024-06-10 15:22:10,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1575.77 | bwd_inner_microstep: 1575.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3713
[2024-06-10 15:22:12,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.57 | bwd_microstep: 1727.62 | bwd_inner_microstep: 1727.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1574
[2024-06-10 15:22:13,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 220.65 | bwd_microstep: 574.33 | bwd_inner_microstep: 574.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2017
[2024-06-10 15:22:14,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.37 | bwd_microstep: 900.80 | bwd_inner_microstep: 900.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683
[2024-06-10 15:22:17,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.43 | bwd_microstep: 1722.82 | bwd_inner_microstep: 1722.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637
[2024-06-10 15:22:19,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1407.26 | bwd_inner_microstep: 1407.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3595
[2024-06-10 15:22:20,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1246.06 | bwd_inner_microstep: 1246.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 15:22:22,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1511.72 | bwd_inner_microstep: 1511.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622
[2024-06-10 15:22:25,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.28 | bwd_microstep: 1613.16 | bwd_inner_microstep: 1613.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 15:22:27,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.69 | bwd_microstep: 1542.20 | bwd_inner_microstep: 1542.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2019
[2024-06-10 15:22:28,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.15 | bwd_microstep: 839.33 | bwd_inner_microstep: 839.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998
[2024-06-10 15:22:29,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.98 | bwd_microstep: 709.04 | bwd_inner_microstep: 709.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693
[2024-06-10 15:22:31,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.30 | bwd_microstep: 1727.87 | bwd_inner_microstep: 1727.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3811
[2024-06-10 15:22:34,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.83 | bwd_microstep: 1685.49 | bwd_inner_microstep: 1685.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 15:22:35,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1299.48 | bwd_inner_microstep: 1299.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3601
[2024-06-10 15:22:38,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.51 | bwd_microstep: 1805.10 | bwd_inner_microstep: 1805.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-10 15:22:40,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.30 | bwd_microstep: 1748.65 | bwd_inner_microstep: 1748.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2246
[2024-06-10 15:22:42,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.62 | bwd_microstep: 932.32 | bwd_inner_microstep: 932.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 15:22:44,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.29 | bwd_microstep: 1504.24 | bwd_inner_microstep: 1504.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-10 15:22:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.44 | optimizer_step: 6.63
[2024-06-10 15:22:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1643.79 | bwd_inner_microstep: 1635.70 | bwd_allreduce_microstep: 8.03 | step_microstep: 42.77
[2024-06-10 15:22:46,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15961.43 | bwd: 42939.88 | bwd_inner: 42930.92 | bwd_allreduce: 8.26 | step: 44.20

 49%|████▉     | 849/1726 [14:40:20<14:47:32, 60.72s/it]
 49%|████▉     | 850/1726 [14:41:20<14:44:09, 60.56s/it]
 49%|████▉     | 851/1726 [14:42:23<14:51:28, 61.13s/it]
 49%|████▉     | 852/1726 [14:43:22<14:44:19, 60.71s/it]
 49%|████▉     | 853/1726 [14:44:23<14:44:28, 60.79s/it]
 49%|████▉     | 854/1726 [14:45:23<14:36:43, 60.33s/
{'loss': 1.2028, 'learning_rate': 2.1312744248482035e-05, 'epoch': 0.49}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 15:22:48,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1284.64 | bwd_inner_microstep: 1284.48 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3928
[2024-06-10 15:22:50,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.63 | bwd_microstep: 1692.51 | bwd_inner_microstep: 1692.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892
[2024-06-10 15:22:52,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.35 | bwd_microstep: 1586.20 | bwd_inner_microstep: 1586.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787
[2024-06-10 15:22:54,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.02 | bwd_microstep: 1443.46 | bwd_inner_microstep: 1443.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768
[2024-06-10 15:22:56,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1473.11 | bwd_inner_microstep: 1473.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 15:22:58,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.03 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 15:23:00,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1281.50 | bwd_inner_microstep: 1281.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3936
[2024-06-10 15:23:02,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.84 | bwd_microstep: 1591.82 | bwd_inner_microstep: 1591.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716
[2024-06-10 15:23:04,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1463.54 | bwd_inner_microstep: 1463.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3704
[2024-06-10 15:23:06,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.38 | bwd_microstep: 1626.33 | bwd_inner_microstep: 1626.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445
[2024-06-10 15:23:08,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.67 | bwd_microstep: 1158.08 | bwd_inner_microstep: 1158.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 15:23:10,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.21 | bwd_microstep: 1386.75 | bwd_inner_microstep: 1386.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 15:23:12,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.05 | bwd_microstep: 1378.94 | bwd_inner_microstep: 1378.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3537
[2024-06-10 15:23:14,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.51 | bwd_microstep: 1520.52 | bwd_inner_microstep: 1520.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3679
[2024-06-10 15:23:16,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1486.66 | bwd_inner_microstep: 1486.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1925
[2024-06-10 15:23:17,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.99 | bwd_microstep: 819.16 | bwd_inner_microstep: 819.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3641
[2024-06-10 15:23:19,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.97 | bwd_microstep: 1579.67 | bwd_inner_microstep: 1579.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 15:23:20,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.68 | bwd_microstep: 796.59 | bwd_inner_microstep: 796.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 15:23:22,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1283.16 | bwd_inner_microstep: 1283.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2091
[2024-06-10 15:23:23,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.36 | bwd_microstep: 880.81 | bwd_inner_microstep: 880.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2688
[2024-06-10 15:23:25,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.99 | bwd_microstep: 1026.63 | bwd_inner_microstep: 1026.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827
[2024-06-10 15:23:27,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.42 | bwd_microstep: 1389.11 | bwd_inner_microstep: 1389.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 15:23:28,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.52 | bwd_microstep: 1294.77 | bwd_inner_microstep: 1294.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3800
[2024-06-10 15:23:31,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.50 | bwd_microstep: 1578.97 | bwd_inner_microstep: 1578.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1985
[2024-06-10 15:23:32,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.73 | bwd_microstep: 891.33 | bwd_inner_microstep: 891.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3396
[2024-06-10 15:23:34,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.43 | bwd_microstep: 1276.01 | bwd_inner_microstep: 1275.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-10 15:23:36,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.23 | bwd_microstep: 1449.04 | bwd_inner_microstep: 1449.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 15:23:38,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1497.73 | bwd_inner_microstep: 1497.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 15:23:40,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.53 | bwd_microstep: 1659.47 | bwd_inner_microstep: 1659.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1150
[2024-06-10 15:23:41,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 173.31 | bwd_microstep: 448.40 | bwd_inner_microstep: 448.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-10 15:23:43,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.56 | bwd_microstep: 1539.65 | bwd_inner_microstep: 1539.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009
[2024-06-10 15:23:47,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 15:23:47,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.06 | bwd_microstep: 3780.42 | bwd_inner_microstep: 839.47 | bwd_allreduce_microstep: 2940.89 | step_microstep: 37.96
[2024-06-10 15:23:47,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15639.94 | bwd: 44844.86 | bwd_inner: 41902.92 | bwd_allreduce: 2941.20 | step: 39.50
{'loss': 1.2404, 'learning_rate': 2.1275288936318334e-05, 'epoch': 0.5}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 15:23:49,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1466.45 | bwd_inner_microstep: 1466.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3911
[2024-06-10 15:23:51,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.05 | bwd_microstep: 1583.85 | bwd_inner_microstep: 1583.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 15:23:52,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 676.06 | bwd_inner_microstep: 676.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3840
[2024-06-10 15:23:54,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.96 | bwd_microstep: 1457.65 | bwd_inner_microstep: 1457.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 15:23:56,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.13 | bwd_microstep: 1639.50 | bwd_inner_microstep: 1639.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 15:23:58,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.67 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-10 15:24:00,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.56 | bwd_microstep: 1186.93 | bwd_inner_microstep: 1186.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958
[2024-06-10 15:24:01,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.43 | bwd_microstep: 824.90 | bwd_inner_microstep: 824.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 902
[2024-06-10 15:24:01,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 154.83 | bwd_microstep: 403.62 | bwd_inner_microstep: 403.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3379
[2024-06-10 15:24:03,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.60 | bwd_microstep: 1302.64 | bwd_inner_microstep: 1302.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2188
[2024-06-10 15:24:05,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.82 | bwd_microstep: 955.50 | bwd_inner_microstep: 955.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1882
[2024-06-10 15:24:06,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.61 | bwd_microstep: 773.21 | bwd_inner_microstep: 773.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 15:24:07,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.61 | bwd_microstep: 1288.76 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 15:24:09,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.63 | bwd_microstep: 1387.86 | bwd_inner_microstep: 1387.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 15:24:11,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1345.58 | bwd_inner_microstep: 1345.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2147
[2024-06-10 15:24:13,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.05 | bwd_microstep: 1054.96 | bwd_inner_microstep: 1054.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 15:24:15,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1338.90 | bwd_inner_microstep: 1338.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3496
[2024-06-10 15:24:16,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1345.48 | bwd_inner_microstep: 1345.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 15:24:18,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.45 | bwd_microstep: 1342.01 | bwd_inner_microstep: 1341.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 15:24:20,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1250.30 | bwd_inner_microstep: 1250.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3683
[2024-06-10 15:24:22,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1329.52 | bwd_inner_microstep: 1329.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 15:24:24,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.19 | bwd_microstep: 1298.20 | bwd_inner_microstep: 1298.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 15:24:26,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1390.13 | bwd_inner_microstep: 1390.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 15:24:28,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.22 | bwd_microstep: 1460.15 | bwd_inner_microstep: 1460.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 15:24:29,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.61 | bwd_microstep: 1299.78 | bwd_inner_microstep: 1299.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2300
[2024-06-10 15:24:31,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.58 | bwd_microstep: 979.55 | bwd_inner_microstep: 979.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 15:24:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.86 | bwd_microstep: 976.15 | bwd_inner_microstep: 976.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 15:24:34,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1279.33 | bwd_inner_microstep: 1279.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 15:24:36,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.45 | bwd_microstep: 1254.25 | bwd_inner_microstep: 1254.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2268
[2024-06-10 15:24:37,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.72 | bwd_microstep: 876.20 | bwd_inner_microstep: 876.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-10 15:24:39,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.72 | bwd_microstep: 1356.85 | bwd_inner_microstep: 1356.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585
[2024-06-10 15:24:50,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.38 | optimizer_step: 6.61
[2024-06-10 15:24:50,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.79 | bwd_microstep: 10947.31 | bwd_inner_microstep: 1725.22 | bwd_allreduce_microstep: 9222.02 | step_microstep: 39.01
[2024-06-10 15:24:50,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14621.34 | bwd: 48454.89 | bwd_inner: 39231.94 | bwd_allreduce: 9222.26 | step: 40.52
{'loss': 1.2363, 'learning_rate': 2.123782913259364e-05, 'epoch': 0.5}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 15:24:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.96 | bwd_microstep: 1370.02 | bwd_inner_microstep: 1369.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 15:24:54,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.57 | bwd_microstep: 1268.45 | bwd_inner_microstep: 1268.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 15:24:56,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1287.68 | bwd_inner_microstep: 1287.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2432
[2024-06-10 15:24:57,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.07 | bwd_microstep: 1031.78 | bwd_inner_microstep: 1031.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1866
[2024-06-10 15:24:58,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.32 | bwd_microstep: 706.26 | bwd_inner_microstep: 706.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 15:25:00,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1390.69 | bwd_inner_microstep: 1390.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219
[2024-06-10 15:25:01,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.86 | bwd_microstep: 955.92 | bwd_inner_microstep: 955.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 15:25:03,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.14 | bwd_microstep: 1340.62 | bwd_inner_microstep: 1340.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 15:25:05,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1381.26 | bwd_inner_microstep: 1381.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2728
[2024-06-10 15:25:07,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.60 | bwd_microstep: 1132.30 | bwd_inner_microstep: 1132.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2158
[2024-06-10 15:25:08,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.60 | bwd_microstep: 946.44 | bwd_inner_microstep: 946.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428
[2024-06-10 15:25:10,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.34 | bwd_microstep: 1280.04 | bwd_inner_microstep: 1280.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957
[2024-06-10 15:25:11,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.79 | bwd_microstep: 890.68 | bwd_inner_microstep: 890.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3461
[2024-06-10 15:25:13,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.56 | bwd_microstep: 1497.45 | bwd_inner_microstep: 1497.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 15:25:15,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1242.68 | bwd_inner_microstep: 1242.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 15:25:17,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1347.93 | bwd_inner_microstep: 1347.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 15:25:19,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.36 | bwd_microstep: 1477.23 | bwd_inner_microstep: 1477.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510
[2024-06-10 15:25:21,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1410.41 | bwd_inner_microstep: 1410.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3569
[2024-06-10 15:25:23,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.71 | bwd_microstep: 1455.47 | bwd_inner_microstep: 1455.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2664
[2024-06-10 15:25:24,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.06 | bwd_microstep: 1023.26 | bwd_inner_microstep: 1023.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3603
[2024-06-10 15:25:26,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.16 | bwd_microstep: 1439.91 | bwd_inner_microstep: 1439.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 15:25:28,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.20 | bwd_microstep: 1294.03 | bwd_inner_microstep: 1294.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 15:25:30,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1352.66 | bwd_inner_microstep: 1352.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-10 15:25:31,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.67 | bwd_microstep: 1280.33 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 15:25:33,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.34 | bwd_microstep: 1353.06 | bwd_inner_microstep: 1353.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2047
[2024-06-10 15:25:34,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.39 | bwd_microstep: 868.55 | bwd_inner_microstep: 868.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2064
[2024-06-10 15:25:35,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.60 | bwd_microstep: 723.83 | bwd_inner_microstep: 723.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3916
[2024-06-10 15:25:38,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 688.99 | bwd_microstep: 1897.50 | bwd_inner_microstep: 1897.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678
[2024-06-10 15:25:40,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.85 | bwd_microstep: 1613.75 | bwd_inner_microstep: 1613.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2039
[2024-06-10 15:25:41,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.36 | bwd_microstep: 720.22 | bwd_inner_microstep: 720.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 15:25:43,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.97 | bwd_inner_microstep: 1508.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 15:25:50,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 15:25:50,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 5698.73 | bwd_inner_microstep: 1676.02 | bwd_allreduce_microstep: 4022.65 | step_microstep: 38.01
[2024-06-10 15:25:50,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14986.55 | bwd: 44188.12 | bwd_inner: 40164.57 | bwd_allreduce: 4022.88 | step: 39.46
{'loss': 1.3007, 'learning_rate': 2.120036496924117e-05, 'epoch': 0.5}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3418
[2024-06-10 15:25:52,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.44 | bwd_microstep: 1365.44 | bwd_inner_microstep: 1365.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3991
[2024-06-10 15:25:54,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.33 | bwd_microstep: 1501.77 | bwd_inner_microstep: 1501.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 15:25:56,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1379.66 | bwd_inner_microstep: 1379.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 831
[2024-06-10 15:25:56,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 123.68 | bwd_microstep: 314.82 | bwd_inner_microstep: 314.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 15:25:58,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1386.35 | bwd_inner_microstep: 1386.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2758
[2024-06-10 15:25:59,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.20 | bwd_microstep: 1079.36 | bwd_inner_microstep: 1079.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 15:26:02,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.82 | bwd_microstep: 1548.20 | bwd_inner_microstep: 1548.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2220
[2024-06-10 15:26:03,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.62 | bwd_microstep: 893.05 | bwd_inner_microstep: 893.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 15:26:05,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.55 | bwd_microstep: 1288.13 | bwd_inner_microstep: 1288.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 15:26:06,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.69 | bwd_microstep: 789.52 | bwd_inner_microstep: 789.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 15:26:07,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.79 | bwd_microstep: 789.46 | bwd_inner_microstep: 789.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2456
[2024-06-10 15:26:08,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.40 | bwd_microstep: 977.66 | bwd_inner_microstep: 977.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3501
[2024-06-10 15:26:10,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.42 | bwd_microstep: 1532.88 | bwd_inner_microstep: 1532.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-10 15:26:12,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.76 | bwd_microstep: 1516.49 | bwd_inner_microstep: 1516.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 15:26:14,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1406.07 | bwd_inner_microstep: 1406.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 15:26:16,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1474.89 | bwd_inner_microstep: 1474.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3486
[2024-06-10 15:26:18,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.14 | bwd_microstep: 1225.92 | bwd_inner_microstep: 1225.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 15:26:20,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.49 | bwd_microstep: 1540.18 | bwd_inner_microstep: 1540.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 15:26:22,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1287.12 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 15:26:24,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.49 | bwd_microstep: 1393.37 | bwd_inner_microstep: 1393.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 15:26:26,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.41 | bwd_microstep: 1291.71 | bwd_inner_microstep: 1291.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 15:26:27,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.87 | bwd_microstep: 1255.92 | bwd_inner_microstep: 1255.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3547
[2024-06-10 15:26:29,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.91 | bwd_microstep: 1454.93 | bwd_inner_microstep: 1454.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 15:26:31,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1508.08 | bwd_inner_microstep: 1508.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667
[2024-06-10 15:26:33,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.68 | bwd_microstep: 1325.04 | bwd_inner_microstep: 1325.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3555
[2024-06-10 15:26:35,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.95 | bwd_microstep: 1561.95 | bwd_inner_microstep: 1561.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 15:26:37,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.02 | bwd_microstep: 1297.09 | bwd_inner_microstep: 1297.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3470
[2024-06-10 15:26:39,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.72 | bwd_microstep: 1574.65 | bwd_inner_microstep: 1574.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3815
[2024-06-10 15:26:42,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.35 | bwd_microstep: 1690.47 | bwd_inner_microstep: 1690.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 15:26:44,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.12 | bwd_microstep: 1655.41 | bwd_inner_microstep: 1655.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597
[2024-06-10 15:26:46,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.18 | bwd_microstep: 1432.10 | bwd_inner_microstep: 1432.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 15:26:52,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 15:26:52,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 5817.45 | bwd_inner_microstep: 1685.75 | bwd_allreduce_microstep: 4131.64 | step_microstep: 38.45
[2024-06-10 15:26:52,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15814.41 | bwd: 46555.11 | bwd_inner: 42422.56 | bwd_allreduce: 4131.87 | step: 39.93
{'loss': 1.2305, 'learning_rate': 2.1162896578209517e-05, 'epoch': 0.5}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2023
[2024-06-10 15:26:54,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.51 | bwd_microstep: 892.53 | bwd_inner_microstep: 892.39 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4510
[2024-06-10 15:26:56,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.97 | bwd_microstep: 1738.65 | bwd_inner_microstep: 1738.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 15:26:58,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.59 | bwd_microstep: 1547.44 | bwd_inner_microstep: 1547.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 15:27:00,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.60 | bwd_microstep: 1277.48 | bwd_inner_microstep: 1277.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 15:27:02,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.44 | bwd_microstep: 1276.73 | bwd_inner_microstep: 1276.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 15:27:03,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.50 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 15:27:05,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.94 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 15:27:07,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1248.80 | bwd_inner_microstep: 1248.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3766
[2024-06-10 15:27:09,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.72 | bwd_microstep: 1344.29 | bwd_inner_microstep: 1344.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3419
[2024-06-10 15:27:11,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.99 | bwd_microstep: 1307.69 | bwd_inner_microstep: 1307.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548
[2024-06-10 15:27:12,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.41 | bwd_microstep: 1326.78 | bwd_inner_microstep: 1326.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 15:27:15,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.66 | bwd_microstep: 1481.77 | bwd_inner_microstep: 1481.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 15:27:17,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.79 | bwd_microstep: 1521.64 | bwd_inner_microstep: 1521.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971
[2024-06-10 15:27:18,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.06 | bwd_microstep: 890.06 | bwd_inner_microstep: 890.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 15:27:20,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.36 | bwd_microstep: 1583.92 | bwd_inner_microstep: 1583.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2501
[2024-06-10 15:27:22,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.76 | bwd_microstep: 1083.10 | bwd_inner_microstep: 1083.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982
[2024-06-10 15:27:23,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.70 | bwd_microstep: 826.76 | bwd_inner_microstep: 826.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3530
[2024-06-10 15:27:24,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.01 | bwd_microstep: 1196.48 | bwd_inner_microstep: 1196.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 15:27:26,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1397.46 | bwd_inner_microstep: 1397.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 637
[2024-06-10 15:27:27,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.50 | bwd_microstep: 264.01 | bwd_inner_microstep: 263.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2469
[2024-06-10 15:27:28,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.96 | bwd_microstep: 955.58 | bwd_inner_microstep: 955.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2619
[2024-06-10 15:27:30,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 424.53 | bwd_microstep: 1141.21 | bwd_inner_microstep: 1141.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 15:27:31,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.20 | bwd_microstep: 1285.45 | bwd_inner_microstep: 1285.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 15:27:33,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 15:27:35,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.44 | bwd_microstep: 1497.58 | bwd_inner_microstep: 1497.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902
[2024-06-10 15:27:36,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.13 | bwd_microstep: 684.87 | bwd_inner_microstep: 684.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3557
[2024-06-10 15:27:38,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.18 | bwd_microstep: 1330.76 | bwd_inner_microstep: 1330.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3597
[2024-06-10 15:27:40,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.10 | bwd_microstep: 1304.37 | bwd_inner_microstep: 1304.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3734
[2024-06-10 15:27:42,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.03 | bwd_microstep: 1242.54 | bwd_inner_microstep: 1242.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576
[2024-06-10 15:27:44,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.40 | bwd_microstep: 1334.15 | bwd_inner_microstep: 1334.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 15:27:46,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.80 | bwd_microstep: 1598.19 | bwd_inner_microstep: 1598.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3571
[2024-06-10 15:27:52,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 15:27:52,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.62 | bwd_microstep: 5247.16 | bwd_inner_microstep: 1920.22 | bwd_allreduce_microstep: 3326.88 | step_microstep: 38.10
[2024-06-10 15:27:52,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15152.64 | bwd: 43791.71 | bwd_inner: 40463.82 | bwd_allreduce: 3327.16 | step: 39.64
{'loss': 1.251, 'learning_rate': 2.112542409146217e-05, 'epoch': 0.5}
 49%|████▉     | 854/1726 [14:45:23<14:36:43, 60.33s/it]
 50%|████▉     | 855/1726 [14:46:23<14:37:51, 60.47s/it]
 50%|████▉     | 856/1726 [14:47:27<14:49:34, 61.35s/it]
 50%|████▉     | 857/1726 [14:48:26<14:40:30, 60.79s/it]
 50%|████▉     | 858/1726 [14:49:29<14:47:46, 61.37s/it]
 50%|████▉     | 859/1726 [14:50:28<14:37:41, 60.74s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 15:27:54,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1385.28 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3972
[2024-06-10 15:27:56,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.44 | bwd_microstep: 1597.61 | bwd_inner_microstep: 1597.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3895
[2024-06-10 15:27:58,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.84 | bwd_microstep: 1581.90 | bwd_inner_microstep: 1581.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896
[2024-06-10 15:28:00,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.18 | bwd_microstep: 1583.10 | bwd_inner_microstep: 1583.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 15:28:02,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.96 | bwd_microstep: 1452.31 | bwd_inner_microstep: 1452.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783
[2024-06-10 15:28:04,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.00 | bwd_microstep: 1645.57 | bwd_inner_microstep: 1645.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3521
[2024-06-10 15:28:06,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.09 | bwd_microstep: 1255.17 | bwd_inner_microstep: 1255.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 15:28:07,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.66 | bwd_microstep: 787.76 | bwd_inner_microstep: 787.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 15:28:09,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.93 | bwd_microstep: 1382.70 | bwd_inner_microstep: 1382.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 15:28:11,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.96 | bwd_microstep: 1249.61 | bwd_inner_microstep: 1249.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2929
[2024-06-10 15:28:12,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.22 | bwd_microstep: 1093.71 | bwd_inner_microstep: 1093.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 15:28:14,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.15 | bwd_microstep: 1282.14 | bwd_inner_microstep: 1282.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 15:28:16,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.72 | bwd_microstep: 1488.32 | bwd_inner_microstep: 1488.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 5458
[2024-06-10 15:28:19,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 716.48 | bwd_microstep: 1891.43 | bwd_inner_microstep: 1891.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3387
[2024-06-10 15:28:21,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.08 | bwd_microstep: 1368.65 | bwd_inner_microstep: 1368.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2441
[2024-06-10 15:28:22,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.38 | bwd_microstep: 945.79 | bwd_inner_microstep: 945.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 15:28:23,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.37 | bwd_microstep: 795.20 | bwd_inner_microstep: 795.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 15:28:25,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.71 | bwd_microstep: 1382.14 | bwd_inner_microstep: 1382.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-10 15:28:27,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.04 | bwd_microstep: 1608.81 | bwd_inner_microstep: 1608.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 15:28:29,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 15:28:31,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1408.04 | bwd_inner_microstep: 1408.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 15:28:32,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.45 | bwd_microstep: 800.04 | bwd_inner_microstep: 800.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3673
[2024-06-10 15:28:34,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.63 | bwd_microstep: 1590.73 | bwd_inner_microstep: 1590.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3588
[2024-06-10 15:28:36,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.82 | bwd_microstep: 1432.70 | bwd_inner_microstep: 1432.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 15:28:37,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.55 | bwd_microstep: 791.55 | bwd_inner_microstep: 791.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753
[2024-06-10 15:28:39,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1436.64 | bwd_inner_microstep: 1436.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 15:28:42,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.85 | bwd_microstep: 1645.29 | bwd_inner_microstep: 1645.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3570
[2024-06-10 15:28:44,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.53 | bwd_microstep: 1457.43 | bwd_inner_microstep: 1457.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 15:28:46,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.35 | bwd_microstep: 1444.25 | bwd_inner_microstep: 1444.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3561
[2024-06-10 15:28:48,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.37 | bwd_microstep: 1562.53 | bwd_inner_microstep: 1562.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3405
[2024-06-10 15:28:50,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.96 | bwd_microstep: 1371.72 | bwd_inner_microstep: 1371.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3598
[2024-06-10 15:28:52,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 15:28:52,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.47 | bwd_microstep: 1689.79 | bwd_inner_microstep: 1681.70 | bwd_allreduce_microstep: 8.03 | step_microstep: 38.84
[2024-06-10 15:28:52,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16323.68 | bwd: 43691.18 | bwd_inner: 43682.21 | bwd_allreduce: 8.27 | step: 40.33
{'loss': 1.228, 'learning_rate': 2.1087947640977015e-05, 'epoch': 0.5}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-10 15:28:54,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.31 | bwd_microstep: 1580.82 | bwd_inner_microstep: 1580.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 15:28:56,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.51 | bwd_microstep: 1478.43 | bwd_inner_microstep: 1478.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 15:28:58,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1396.34 | bwd_inner_microstep: 1396.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812
[2024-06-10 15:29:00,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.95 | bwd_microstep: 1353.07 | bwd_inner_microstep: 1353.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-10 15:29:02,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.15 | bwd_microstep: 1291.02 | bwd_inner_microstep: 1290.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 15:29:04,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.58 | bwd_microstep: 1281.28 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 15:29:05,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1284.02 | bwd_inner_microstep: 1283.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 934
[2024-06-10 15:29:06,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 156.76 | bwd_microstep: 412.35 | bwd_inner_microstep: 412.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3701
[2024-06-10 15:29:08,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.41 | bwd_microstep: 1480.25 | bwd_inner_microstep: 1480.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 15:29:10,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.94 | bwd_microstep: 1378.66 | bwd_inner_microstep: 1378.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 15:29:12,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.47 | bwd_microstep: 1243.58 | bwd_inner_microstep: 1243.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 15:29:14,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1485.41 | bwd_inner_microstep: 1485.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3640
[2024-06-10 15:29:16,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.96 | bwd_microstep: 1486.28 | bwd_inner_microstep: 1486.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-10 15:29:18,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1408.96 | bwd_inner_microstep: 1408.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 15:29:20,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1395.83 | bwd_inner_microstep: 1395.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3519
[2024-06-10 15:29:21,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1348.50 | bwd_inner_microstep: 1348.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 15:29:24,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.35 | bwd_microstep: 1554.62 | bwd_inner_microstep: 1554.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 15:29:26,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1510.96 | bwd_inner_microstep: 1510.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2084
[2024-06-10 15:29:27,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.02 | bwd_microstep: 849.91 | bwd_inner_microstep: 849.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 15:29:29,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.61 | bwd_microstep: 1554.77 | bwd_inner_microstep: 1554.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 15:29:31,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1394.37 | bwd_inner_microstep: 1394.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 15:29:33,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.55 | bwd_microstep: 1462.69 | bwd_inner_microstep: 1462.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 15:29:35,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.12 | bwd_microstep: 1299.09 | bwd_inner_microstep: 1299.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 15:29:37,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1490.46 | bwd_inner_microstep: 1490.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 15:29:39,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.58 | bwd_microstep: 1457.67 | bwd_inner_microstep: 1457.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3706
[2024-06-10 15:29:41,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1236.05 | bwd_inner_microstep: 1236.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3846
[2024-06-10 15:29:43,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.34 | bwd_microstep: 1662.00 | bwd_inner_microstep: 1661.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549
[2024-06-10 15:29:45,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1329.42 | bwd_inner_microstep: 1329.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-10 15:29:47,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.20 | bwd_microstep: 1542.35 | bwd_inner_microstep: 1542.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 15:29:49,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.41 | bwd_microstep: 1548.96 | bwd_inner_microstep: 1548.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3455
[2024-06-10 15:29:51,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.75 | bwd_microstep: 1381.26 | bwd_inner_microstep: 1381.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 15:29:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.01 | optimizer_gradients: 4.15 | optimizer_step: 6.58
[2024-06-10 15:29:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.49 | bwd_microstep: 2149.66 | bwd_inner_microstep: 1574.78 | bwd_allreduce_microstep: 574.83 | step_microstep: 37.78
[2024-06-10 15:29:54,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16488.74 | bwd: 44729.04 | bwd_inner: 44153.31 | bwd_allreduce: 575.06 | step: 39.25
{'loss': 1.2358, 'learning_rate': 2.105046735874592e-05, 'epoch': 0.5}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466
[2024-06-10 15:29:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.21 | bwd_microstep: 1562.33 | bwd_inner_microstep: 1562.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3968
[2024-06-10 15:29:58,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.09 | bwd_microstep: 1598.72 | bwd_inner_microstep: 1598.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3865
[2024-06-10 15:30:00,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.43 | bwd_microstep: 1661.28 | bwd_inner_microstep: 1661.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792
[2024-06-10 15:30:02,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.50 | bwd_microstep: 1646.23 | bwd_inner_microstep: 1646.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 15:30:04,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.88 | bwd_microstep: 1278.31 | bwd_inner_microstep: 1278.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 15:30:06,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1245.03 | bwd_inner_microstep: 1245.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 15:30:08,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1257.30 | bwd_inner_microstep: 1257.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 15:30:10,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.44 | bwd_microstep: 1435.09 | bwd_inner_microstep: 1435.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 15:30:12,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.51 | bwd_microstep: 1484.69 | bwd_inner_microstep: 1484.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3781
[2024-06-10 15:30:14,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.54 | bwd_microstep: 1607.06 | bwd_inner_microstep: 1607.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3745
[2024-06-10 15:30:16,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.12 | bwd_microstep: 1669.96 | bwd_inner_microstep: 1669.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 15:30:18,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1354.29 | bwd_inner_microstep: 1354.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3380
[2024-06-10 15:30:20,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.26 | bwd_microstep: 1335.37 | bwd_inner_microstep: 1335.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 15:30:22,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1339.06 | bwd_inner_microstep: 1339.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 15:30:24,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1506.76 | bwd_inner_microstep: 1506.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 15:30:26,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.06 | bwd_microstep: 1417.78 | bwd_inner_microstep: 1417.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-10 15:30:28,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.59 | bwd_microstep: 1584.45 | bwd_inner_microstep: 1584.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3631
[2024-06-10 15:30:30,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.71 | bwd_microstep: 1708.59 | bwd_inner_microstep: 1708.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 15:30:32,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 1417.04 | bwd_inner_microstep: 1417.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 15:30:34,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.69 | bwd_microstep: 1387.77 | bwd_inner_microstep: 1387.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 15:30:36,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1395.55 | bwd_inner_microstep: 1395.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-10 15:30:38,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 1307.81 | bwd_inner_microstep: 1307.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1931
[2024-06-10 15:30:39,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.61 | bwd_microstep: 819.87 | bwd_inner_microstep: 819.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3967
[2024-06-10 15:30:41,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.25 | bwd_microstep: 1605.31 | bwd_inner_microstep: 1605.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 15:30:43,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.47 | bwd_microstep: 1514.04 | bwd_inner_microstep: 1514.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3543
[2024-06-10 15:30:45,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.46 | bwd_microstep: 1229.84 | bwd_inner_microstep: 1229.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 15:30:47,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1287.53 | bwd_inner_microstep: 1287.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2516
[2024-06-10 15:30:48,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.47 | bwd_microstep: 1059.40 | bwd_inner_microstep: 1059.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433
[2024-06-10 15:30:50,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.66 | bwd_microstep: 1188.10 | bwd_inner_microstep: 1188.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3811
[2024-06-10 15:30:52,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.38 | bwd_microstep: 1413.98 | bwd_inner_microstep: 1413.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-10 15:30:54,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.40 | bwd_microstep: 1600.39 | bwd_inner_microstep: 1600.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2211
[2024-06-10 15:30:57,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.64
[2024-06-10 15:30:57,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.70 | bwd_microstep: 2029.33 | bwd_inner_microstep: 1084.92 | bwd_allreduce_microstep: 944.37 | step_microstep: 38.05
[2024-06-10 15:30:57,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16736.24 | bwd: 45948.26 | bwd_inner: 45002.99 | bwd_allreduce: 944.59 | step: 39.54
{'loss': 1.2098, 'learning_rate': 2.1012983376774255e-05, 'epoch': 0.5}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 15:30:58,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.62 | bwd_microstep: 785.75 | bwd_inner_microstep: 785.62 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414
[2024-06-10 15:30:59,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.37 | bwd_microstep: 1307.17 | bwd_inner_microstep: 1307.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 15:31:01,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1379.50 | bwd_inner_microstep: 1379.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 15:31:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.99 | bwd_microstep: 1551.46 | bwd_inner_microstep: 1551.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2705
[2024-06-10 15:31:05,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.80 | bwd_microstep: 1084.65 | bwd_inner_microstep: 1084.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 15:31:07,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.62 | bwd_microstep: 1148.36 | bwd_inner_microstep: 1148.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3406
[2024-06-10 15:31:09,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.34 | bwd_microstep: 1405.28 | bwd_inner_microstep: 1405.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3571
[2024-06-10 15:31:10,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.60 | bwd_microstep: 1333.35 | bwd_inner_microstep: 1333.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 15:31:12,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.13 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3683
[2024-06-10 15:31:14,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1385.73 | bwd_inner_microstep: 1385.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3511
[2024-06-10 15:31:16,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.51 | bwd_microstep: 1511.22 | bwd_inner_microstep: 1511.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 15:31:18,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.52 | bwd_microstep: 1489.82 | bwd_inner_microstep: 1489.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 15:31:20,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.88 | bwd_microstep: 1616.65 | bwd_inner_microstep: 1616.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 15:31:21,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.16 | bwd_microstep: 799.42 | bwd_inner_microstep: 799.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-10 15:31:23,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1422.85 | bwd_inner_microstep: 1422.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3541
[2024-06-10 15:31:26,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.11 | bwd_microstep: 1541.87 | bwd_inner_microstep: 1541.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 15:31:27,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.84 | bwd_microstep: 801.06 | bwd_inner_microstep: 801.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3416
[2024-06-10 15:31:28,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.31 | bwd_microstep: 1180.33 | bwd_inner_microstep: 1180.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 15:31:30,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.97 | bwd_microstep: 1410.94 | bwd_inner_microstep: 1410.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2288
[2024-06-10 15:31:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.45 | bwd_microstep: 939.35 | bwd_inner_microstep: 939.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 15:31:34,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.78 | bwd_microstep: 1626.93 | bwd_inner_microstep: 1626.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 15:31:36,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.88 | bwd_microstep: 1359.93 | bwd_inner_microstep: 1359.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2234
[2024-06-10 15:31:37,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.18 | bwd_microstep: 807.37 | bwd_inner_microstep: 807.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 15:31:39,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.88 | bwd_microstep: 1657.16 | bwd_inner_microstep: 1657.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 15:31:41,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.82 | bwd_microstep: 1184.32 | bwd_inner_microstep: 1184.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 15:31:43,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1281.33 | bwd_inner_microstep: 1281.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3813
[2024-06-10 15:31:45,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.92 | bwd_microstep: 1509.36 | bwd_inner_microstep: 1509.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 15:31:47,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1507.69 | bwd_inner_microstep: 1507.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 15:31:49,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1396.17 | bwd_inner_microstep: 1396.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 15:31:51,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.20 | bwd_microstep: 1645.29 | bwd_inner_microstep: 1645.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 15:31:53,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.42 | bwd_microstep: 1542.72 | bwd_inner_microstep: 1542.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3596
[2024-06-10 15:31:59,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.30 | optimizer_step: 6.59
[2024-06-10 15:31:59,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.89 | bwd_microstep: 5303.34 | bwd_inner_microstep: 1930.81 | bwd_allreduce_microstep: 3372.46 | step_microstep: 38.40
[2024-06-10 15:31:59,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15925.38 | bwd: 46161.61 | bwd_inner: 42788.14 | bwd_allreduce: 3372.74 | step: 39.90
{'loss': 1.2033, 'learning_rate': 2.0975495827080404e-05, 'epoch': 0.5}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3403
[2024-06-10 15:32:01,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.86 | bwd_microstep: 1299.65 | bwd_inner_microstep: 1299.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 15:32:02,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.15 | bwd_microstep: 1240.70 | bwd_inner_microstep: 1240.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 15:32:04,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.50 | bwd_microstep: 1348.97 | bwd_inner_microstep: 1348.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2890
[2024-06-10 15:32:06,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.12 | bwd_microstep: 1181.34 | bwd_inner_microstep: 1181.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 15:32:08,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1245.37 | bwd_inner_microstep: 1245.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 15:32:09,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1280.23 | bwd_inner_microstep: 1280.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 15:32:11,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1245.99 | bwd_inner_microstep: 1245.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3717
[2024-06-10 15:32:13,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.58 | bwd_microstep: 1461.35 | bwd_inner_microstep: 1461.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 15:32:14,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.85 | bwd_microstep: 790.92 | bwd_inner_microstep: 790.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2660
[2024-06-10 15:32:16,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.08 | bwd_microstep: 1025.56 | bwd_inner_microstep: 1025.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 15:32:18,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.11 | bwd_microstep: 1411.52 | bwd_inner_microstep: 1411.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 15:32:20,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1647.71 | bwd_inner_microstep: 1647.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 15:32:22,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1348.61 | bwd_inner_microstep: 1348.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3976
[2024-06-10 15:32:24,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.18 | bwd_microstep: 1802.37 | bwd_inner_microstep: 1802.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 15:32:26,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.07 | bwd_microstep: 1286.19 | bwd_inner_microstep: 1286.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3539
[2024-06-10 15:32:28,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.14 | bwd_microstep: 1199.27 | bwd_inner_microstep: 1199.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 15:32:30,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.39 | bwd_microstep: 1612.13 | bwd_inner_microstep: 1612.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 15:32:32,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1377.64 | bwd_inner_microstep: 1377.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 15:32:34,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1393.95 | bwd_inner_microstep: 1393.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 15:32:36,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.03 | bwd_microstep: 1524.36 | bwd_inner_microstep: 1524.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-10 15:32:38,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1508.32 | bwd_inner_microstep: 1508.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3848
[2024-06-10 15:32:40,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.81 | bwd_microstep: 1396.91 | bwd_inner_microstep: 1396.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3870
[2024-06-10 15:32:42,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.05 | bwd_microstep: 1672.03 | bwd_inner_microstep: 1672.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4157
[2024-06-10 15:32:45,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 677.04 | bwd_microstep: 1847.74 | bwd_inner_microstep: 1847.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507
[2024-06-10 15:32:47,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1318.57 | bwd_inner_microstep: 1318.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3550
[2024-06-10 15:32:49,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.63 | bwd_microstep: 1535.39 | bwd_inner_microstep: 1535.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 15:32:51,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.98 | bwd_microstep: 1477.45 | bwd_inner_microstep: 1477.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 15:32:53,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.04 | bwd_microstep: 1476.12 | bwd_inner_microstep: 1476.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 15:32:55,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.82 | bwd_microstep: 1644.22 | bwd_inner_microstep: 1644.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2094
[2024-06-10 15:32:56,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.69 | bwd_microstep: 917.87 | bwd_inner_microstep: 917.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 15:32:58,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1376.63 | bwd_inner_microstep: 1376.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2962
[2024-06-10 15:33:00,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 15:33:00,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.41 | bwd_microstep: 1223.27 | bwd_inner_microstep: 1214.41 | bwd_allreduce_microstep: 8.81 | step_microstep: 37.60
[2024-06-10 15:33:00,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16490.72 | bwd: 44118.36 | bwd_inner: 44108.65 | bwd_allreduce: 9.03 | step: 39.11
{'loss': 1.1835, 'learning_rate': 2.093800484169532e-05, 'epoch': 0.5}
 50%|████▉     | 859/1726 [14:50:28<14:37:41, 60.74s/it]
 50%|████▉     | 860/1726 [14:51:29<14:34:59, 60.62s/it]
 50%|████▉     | 861/1726 [14:52:30<14:37:58, 60.90s/it]
 50%|████▉     | 862/1726 [14:53:33<14:46:07, 61.54s/it]
 50%|█████     | 863/1726 [14:54:36<14:48:54, 61.80s/it]
 50%|█████     | 864/1726 [14:55:37<14:44:11, 61.54s/it]
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1896
[2024-06-10 15:33:01,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.91 | bwd_microstep: 804.24 | bwd_inner_microstep: 804.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867
[2024-06-10 15:33:03,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.28 | bwd_microstep: 1563.76 | bwd_inner_microstep: 1563.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 15:33:05,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.54 | bwd_microstep: 1455.24 | bwd_inner_microstep: 1455.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798
[2024-06-10 15:33:07,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.68 | bwd_microstep: 1550.03 | bwd_inner_microstep: 1550.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 15:33:09,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.37 | bwd_microstep: 1185.73 | bwd_inner_microstep: 1185.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 15:33:11,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.38 | bwd_microstep: 1245.17 | bwd_inner_microstep: 1245.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 15:33:12,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.32 | bwd_microstep: 1287.55 | bwd_inner_microstep: 1287.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2134
[2024-06-10 15:33:14,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.13 | bwd_microstep: 928.35 | bwd_inner_microstep: 928.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428
[2024-06-10 15:33:15,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.74 | bwd_microstep: 1157.28 | bwd_inner_microstep: 1157.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3950
[2024-06-10 15:33:18,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.84 | bwd_microstep: 1699.82 | bwd_inner_microstep: 1699.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-10 15:33:20,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.37 | bwd_microstep: 1614.73 | bwd_inner_microstep: 1614.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3495
[2024-06-10 15:33:22,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.35 | bwd_microstep: 1430.11 | bwd_inner_microstep: 1430.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 15:33:24,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.01 | bwd_microstep: 1499.07 | bwd_inner_microstep: 1499.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 15:33:26,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.82 | bwd_inner_microstep: 1508.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 15:33:28,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.99 | bwd_microstep: 1376.93 | bwd_inner_microstep: 1376.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2898
[2024-06-10 15:33:29,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.08 | bwd_microstep: 1088.46 | bwd_inner_microstep: 1088.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948
[2024-06-10 15:33:31,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.09 | bwd_microstep: 821.73 | bwd_inner_microstep: 821.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814
[2024-06-10 15:33:33,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.55 | bwd_microstep: 1753.81 | bwd_inner_microstep: 1753.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 15:33:34,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.44 | bwd_microstep: 977.72 | bwd_inner_microstep: 977.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 15:33:37,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.56 | bwd_microstep: 1602.52 | bwd_inner_microstep: 1602.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489
[2024-06-10 15:33:38,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1318.09 | bwd_inner_microstep: 1318.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 15:33:41,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.02 | bwd_microstep: 1659.57 | bwd_inner_microstep: 1659.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3398
[2024-06-10 15:33:43,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1467.78 | bwd_inner_microstep: 1467.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2068
[2024-06-10 15:33:44,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.82 | bwd_microstep: 756.44 | bwd_inner_microstep: 756.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2061
[2024-06-10 15:33:45,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.62 | bwd_microstep: 817.14 | bwd_inner_microstep: 817.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086
[2024-06-10 15:33:46,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.54 | bwd_microstep: 823.01 | bwd_inner_microstep: 822.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 15:33:48,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1397.61 | bwd_inner_microstep: 1397.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-10 15:33:49,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.40 | bwd_microstep: 879.64 | bwd_inner_microstep: 879.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 15:33:51,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1295.36 | bwd_inner_microstep: 1295.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 15:33:53,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1283.59 | bwd_inner_microstep: 1283.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2167
[2024-06-10 15:33:54,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.08 | bwd_microstep: 951.64 | bwd_inner_microstep: 951.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 15:34:03,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.33 | optimizer_step: 6.58
[2024-06-10 15:34:03,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 8556.02 | bwd_inner_microstep: 1544.77 | bwd_allreduce_microstep: 7011.18 | step_microstep: 38.47
[2024-06-10 15:34:03,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15179.79 | bwd: 47756.98 | bwd_inner: 40744.83 | bwd_allreduce: 7011.45 | step: 39.98
{'loss': 1.2617, 'learning_rate': 2.0900510552662057e-05, 'epoch': 0.5}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3460
[2024-06-10 15:34:05,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.17 | bwd_microstep: 1564.78 | bwd_inner_microstep: 1564.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542
[2024-06-10 15:34:07,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1322.15 | bwd_inner_microstep: 1322.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2376
[2024-06-10 15:34:08,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.40 | bwd_microstep: 928.63 | bwd_inner_microstep: 928.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 15:34:10,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1345.93 | bwd_inner_microstep: 1345.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3842
[2024-06-10 15:34:13,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.09 | bwd_microstep: 1683.80 | bwd_inner_microstep: 1683.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 15:34:14,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.31 | bwd_microstep: 1337.31 | bwd_inner_microstep: 1337.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 15:34:16,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 1383.50 | bwd_inner_microstep: 1383.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3484
[2024-06-10 15:34:18,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.55 | bwd_microstep: 1504.45 | bwd_inner_microstep: 1504.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3936
[2024-06-10 15:34:21,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.86 | bwd_microstep: 1583.53 | bwd_inner_microstep: 1583.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 15:34:23,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.50 | bwd_microstep: 1515.39 | bwd_inner_microstep: 1515.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3663
[2024-06-10 15:34:25,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.49 | bwd_microstep: 1814.75 | bwd_inner_microstep: 1814.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484
[2024-06-10 15:34:27,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.56 | bwd_microstep: 1573.53 | bwd_inner_microstep: 1573.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3651
[2024-06-10 15:34:30,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.75 | bwd_microstep: 1709.93 | bwd_inner_microstep: 1709.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3636
[2024-06-10 15:34:32,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.73 | bwd_microstep: 1779.47 | bwd_inner_microstep: 1779.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508
[2024-06-10 15:34:34,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1411.13 | bwd_inner_microstep: 1411.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 15:34:36,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.31 | bwd_microstep: 1285.15 | bwd_inner_microstep: 1285.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 15:34:38,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 1426.29 | bwd_inner_microstep: 1426.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3920
[2024-06-10 15:34:40,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1393.42 | bwd_inner_microstep: 1393.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 15:34:41,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.55 | bwd_microstep: 796.72 | bwd_inner_microstep: 796.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 15:34:43,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1378.85 | bwd_inner_microstep: 1378.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 15:34:45,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1345.11 | bwd_inner_microstep: 1345.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3688
[2024-06-10 15:34:47,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.09 | bwd_microstep: 1328.12 | bwd_inner_microstep: 1328.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 15:34:49,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.33 | bwd_microstep: 1552.29 | bwd_inner_microstep: 1552.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2272
[2024-06-10 15:34:50,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.16 | bwd_microstep: 1032.78 | bwd_inner_microstep: 1032.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3809
[2024-06-10 15:34:52,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.90 | bwd_microstep: 1594.25 | bwd_inner_microstep: 1594.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3819
[2024-06-10 15:34:55,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.87 | bwd_microstep: 1803.33 | bwd_inner_microstep: 1803.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3526
[2024-06-10 15:34:57,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.05 | bwd_microstep: 1534.35 | bwd_inner_microstep: 1534.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815
[2024-06-10 15:34:59,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.13 | bwd_microstep: 1592.24 | bwd_inner_microstep: 1592.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768
[2024-06-10 15:35:01,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1473.93 | bwd_inner_microstep: 1473.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2023
[2024-06-10 15:35:02,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.37 | bwd_microstep: 868.86 | bwd_inner_microstep: 868.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 15:35:04,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.24 | bwd_microstep: 914.48 | bwd_inner_microstep: 914.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2227
[2024-06-10 15:35:05,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.99 | optimizer_gradients: 4.16 | optimizer_step: 6.63
[2024-06-10 15:35:05,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.43 | bwd_microstep: 999.97 | bwd_inner_microstep: 991.73 | bwd_allreduce_microstep: 8.19 | step_microstep: 38.00
[2024-06-10 15:35:05,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16643.50 | bwd: 44778.42 | bwd_inner: 44769.33 | bwd_allreduce: 8.41 | step: 39.46
{'loss': 1.2107, 'learning_rate': 2.0863013092035312e-05, 'epoch': 0.5}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 15:35:07,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.44 | bwd_microstep: 1474.29 | bwd_inner_microstep: 1474.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3904
[2024-06-10 15:35:09,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.87 | bwd_microstep: 1482.63 | bwd_inner_microstep: 1482.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3889
[2024-06-10 15:35:11,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.97 | bwd_microstep: 1580.93 | bwd_inner_microstep: 1580.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 15:35:13,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.22 | bwd_microstep: 1647.56 | bwd_inner_microstep: 1647.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 15:35:15,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1243.30 | bwd_inner_microstep: 1243.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 15:35:17,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.82 | bwd_microstep: 1311.84 | bwd_inner_microstep: 1311.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 15:35:19,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1243.57 | bwd_inner_microstep: 1243.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 15:35:20,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.92 | bwd_microstep: 791.39 | bwd_inner_microstep: 791.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3573
[2024-06-10 15:35:22,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.38 | bwd_microstep: 1263.28 | bwd_inner_microstep: 1263.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3483
[2024-06-10 15:35:23,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1290.45 | bwd_inner_microstep: 1290.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 15:35:25,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1283.09 | bwd_inner_microstep: 1283.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 15:35:27,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.41 | bwd_microstep: 1421.54 | bwd_inner_microstep: 1421.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3683
[2024-06-10 15:35:29,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.72 | bwd_microstep: 1445.64 | bwd_inner_microstep: 1445.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3653
[2024-06-10 15:35:31,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.20 | bwd_microstep: 1677.58 | bwd_inner_microstep: 1677.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 15:35:33,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1381.93 | bwd_inner_microstep: 1381.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 15:35:35,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.35 | bwd_microstep: 1455.81 | bwd_inner_microstep: 1455.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3526
[2024-06-10 15:35:38,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1588.04 | bwd_inner_microstep: 1588.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843
[2024-06-10 15:35:40,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.78 | bwd_microstep: 1455.46 | bwd_inner_microstep: 1455.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3607
[2024-06-10 15:35:42,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.79 | bwd_microstep: 1635.49 | bwd_inner_microstep: 1635.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 15:35:44,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1374.16 | bwd_inner_microstep: 1374.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072
[2024-06-10 15:35:45,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.56 | bwd_microstep: 912.72 | bwd_inner_microstep: 912.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 15:35:47,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.73 | bwd_microstep: 1652.30 | bwd_inner_microstep: 1652.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622
[2024-06-10 15:35:49,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1341.96 | bwd_inner_microstep: 1341.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2202
[2024-06-10 15:35:50,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.83 | bwd_microstep: 907.78 | bwd_inner_microstep: 907.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 15:35:52,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.79 | bwd_microstep: 1407.86 | bwd_inner_microstep: 1407.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-10 15:35:54,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.15 | bwd_microstep: 972.26 | bwd_inner_microstep: 972.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 15:35:56,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.45 | bwd_microstep: 1611.76 | bwd_inner_microstep: 1611.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 15:35:58,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.00 | bwd_microstep: 1255.37 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562
[2024-06-10 15:36:00,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.47 | bwd_microstep: 1591.19 | bwd_inner_microstep: 1591.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568
[2024-06-10 15:36:02,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.27 | bwd_microstep: 1585.39 | bwd_inner_microstep: 1585.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1431
[2024-06-10 15:36:03,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 218.33 | bwd_microstep: 565.69 | bwd_inner_microstep: 565.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 15:36:07,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 15:36:07,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.90 | bwd_microstep: 3217.98 | bwd_inner_microstep: 1786.51 | bwd_allreduce_microstep: 1431.42 | step_microstep: 37.93
[2024-06-10 15:36:07,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16252.53 | bwd: 45070.29 | bwd_inner: 43637.98 | bwd_allreduce: 1431.64 | step: 39.40
{'loss': 1.2016, 'learning_rate': 2.082551259188094e-05, 'epoch': 0.5}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 15:36:09,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.15 | bwd_microstep: 1467.65 | bwd_inner_microstep: 1467.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975
[2024-06-10 15:36:10,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.78 | bwd_microstep: 828.44 | bwd_inner_microstep: 828.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 15:36:12,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1551.68 | bwd_inner_microstep: 1551.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1945
[2024-06-10 15:36:13,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 820.46 | bwd_inner_microstep: 820.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 15:36:15,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.04 | bwd_microstep: 1242.27 | bwd_inner_microstep: 1242.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 15:36:17,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-10 15:36:19,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.15 | bwd_microstep: 1527.44 | bwd_inner_microstep: 1527.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 15:36:21,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.84 | bwd_microstep: 1384.82 | bwd_inner_microstep: 1384.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3659
[2024-06-10 15:36:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.10 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2149
[2024-06-10 15:36:24,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.90 | bwd_microstep: 880.03 | bwd_inner_microstep: 880.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2096
[2024-06-10 15:36:25,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.95 | bwd_microstep: 917.66 | bwd_inner_microstep: 917.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 15:36:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.03 | bwd_microstep: 1474.39 | bwd_inner_microstep: 1474.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703
[2024-06-10 15:36:29,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.08 | bwd_microstep: 1619.91 | bwd_inner_microstep: 1619.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3222
[2024-06-10 15:36:31,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.12 | bwd_microstep: 1208.90 | bwd_inner_microstep: 1208.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3647
[2024-06-10 15:36:33,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.21 | bwd_microstep: 1575.75 | bwd_inner_microstep: 1575.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3543
[2024-06-10 15:36:35,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.90 | bwd_microstep: 1590.34 | bwd_inner_microstep: 1590.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3537
[2024-06-10 15:36:37,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.83 | bwd_microstep: 1526.09 | bwd_inner_microstep: 1526.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3624
[2024-06-10 15:36:39,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.02 | bwd_microstep: 1432.72 | bwd_inner_microstep: 1432.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1995
[2024-06-10 15:36:40,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.85 | bwd_microstep: 832.30 | bwd_inner_microstep: 832.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 639
[2024-06-10 15:36:41,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.03 | bwd_microstep: 265.28 | bwd_inner_microstep: 265.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 15:36:43,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.66 | bwd_microstep: 1512.14 | bwd_inner_microstep: 1512.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 15:36:45,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1557.86 | bwd_inner_microstep: 1557.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 15:36:47,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.62 | bwd_microstep: 1548.36 | bwd_inner_microstep: 1548.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3716
[2024-06-10 15:36:49,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.68 | bwd_microstep: 1731.26 | bwd_inner_microstep: 1731.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604
[2024-06-10 15:36:52,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.22 | bwd_microstep: 1600.51 | bwd_inner_microstep: 1600.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 15:36:53,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.87 | bwd_microstep: 1255.59 | bwd_inner_microstep: 1255.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 15:36:56,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1507.42 | bwd_inner_microstep: 1507.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2056
[2024-06-10 15:36:57,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.96 | bwd_microstep: 722.32 | bwd_inner_microstep: 722.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3811
[2024-06-10 15:36:59,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.21 | bwd_microstep: 1603.48 | bwd_inner_microstep: 1603.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2270
[2024-06-10 15:37:00,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.92 | bwd_microstep: 934.62 | bwd_inner_microstep: 934.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3591
[2024-06-10 15:37:02,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.91 | bwd_microstep: 1210.13 | bwd_inner_microstep: 1210.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 15:37:12,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 15:37:12,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.92 | bwd_microstep: 9535.99 | bwd_inner_microstep: 1548.85 | bwd_allreduce_microstep: 7987.08 | step_microstep: 38.94
[2024-06-10 15:37:12,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15422.31 | bwd: 49430.18 | bwd_inner: 41442.19 | bwd_allreduce: 7987.32 | step: 40.46
{'loss': 1.2657, 'learning_rate': 2.0788009184275514e-05, 'epoch': 0.5}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 15:37:14,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.15 | bwd_microstep: 1266.72 | bwd_inner_microstep: 1266.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 15:37:16,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.68 | bwd_microstep: 1473.98 | bwd_inner_microstep: 1473.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3856
[2024-06-10 15:37:18,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.54 | bwd_microstep: 1557.87 | bwd_inner_microstep: 1557.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 15:37:20,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.86 | bwd_microstep: 1646.97 | bwd_inner_microstep: 1646.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2303
[2024-06-10 15:37:21,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.96 | bwd_microstep: 972.38 | bwd_inner_microstep: 972.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502
[2024-06-10 15:37:23,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.09 | bwd_microstep: 1188.98 | bwd_inner_microstep: 1188.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 15:37:25,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.69 | bwd_microstep: 1529.78 | bwd_inner_microstep: 1529.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502
[2024-06-10 15:37:27,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.87 | bwd_microstep: 1188.15 | bwd_inner_microstep: 1188.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 15:37:29,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1279.37 | bwd_inner_microstep: 1279.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 15:37:30,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.79 | bwd_microstep: 1246.59 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3455
[2024-06-10 15:37:32,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.53 | bwd_microstep: 1211.85 | bwd_inner_microstep: 1211.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963
[2024-06-10 15:37:33,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.72 | bwd_microstep: 827.38 | bwd_inner_microstep: 827.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 15:37:35,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.31 | bwd_microstep: 1379.52 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3644
[2024-06-10 15:37:37,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1469.91 | bwd_inner_microstep: 1469.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3643
[2024-06-10 15:37:39,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.94 | bwd_microstep: 1654.90 | bwd_inner_microstep: 1654.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 15:37:41,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1342.62 | bwd_inner_microstep: 1342.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 15:37:43,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1392.21 | bwd_inner_microstep: 1392.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3020
[2024-06-10 15:37:45,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 431.34 | bwd_microstep: 1130.31 | bwd_inner_microstep: 1130.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 15:37:47,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.73 | bwd_microstep: 1480.01 | bwd_inner_microstep: 1479.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 15:37:49,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1460.41 | bwd_inner_microstep: 1460.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 15:37:51,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.53 | bwd_microstep: 1653.35 | bwd_inner_microstep: 1653.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3548
[2024-06-10 15:37:53,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1230.02 | bwd_inner_microstep: 1229.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 15:37:55,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 1532.48 | bwd_inner_microstep: 1532.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 15:37:57,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1438.49 | bwd_inner_microstep: 1438.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3502
[2024-06-10 15:37:58,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.23 | bwd_microstep: 1219.57 | bwd_inner_microstep: 1219.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 15:38:01,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1595.48 | bwd_inner_microstep: 1595.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 15:38:03,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.26 | bwd_microstep: 1605.66 | bwd_inner_microstep: 1605.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 15:38:05,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.29 | bwd_microstep: 1275.21 | bwd_inner_microstep: 1275.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3754
[2024-06-10 15:38:07,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.02 | bwd_microstep: 1672.05 | bwd_inner_microstep: 1672.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 15:38:09,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.66 | bwd_microstep: 1585.65 | bwd_inner_microstep: 1585.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-10 15:38:11,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1600.74 | bwd_inner_microstep: 1600.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3745
[2024-06-10 15:38:14,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-10 15:38:14,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.15 | bwd_microstep: 1770.66 | bwd_inner_microstep: 1762.94 | bwd_allreduce_microstep: 7.68 | step_microstep: 37.69
[2024-06-10 15:38:14,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16755.96 | bwd: 44879.28 | bwd_inner: 44870.72 | bwd_allreduce: 7.91 | step: 39.15
{'loss': 1.2419, 'learning_rate': 2.0750503001305832e-05, 'epoch': 0.5}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 15:38:16,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1388.19 | bwd_inner_microstep: 1388.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794
[2024-06-10 15:38:18,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.68 | bwd_microstep: 1444.79 | bwd_inner_microstep: 1444.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 15:38:20,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1482.60 | bwd_inner_microstep: 1482.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 15:38:22,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1538.88 | bwd_inner_microstep: 1538.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 840
[2024-06-10 15:38:22,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.19 | bwd_microstep: 344.92 | bwd_inner_microstep: 344.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-10 15:38:24,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.77 | bwd_microstep: 1422.87 | bwd_inner_microstep: 1422.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3409
[2024-06-10 15:38:26,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.71 | bwd_microstep: 1281.25 | bwd_inner_microstep: 1281.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1956
[2024-06-10 15:38:27,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.96 | bwd_microstep: 760.81 | bwd_inner_microstep: 760.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 15:38:29,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.79 | bwd_microstep: 1485.73 | bwd_inner_microstep: 1485.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3426
[2024-06-10 15:38:31,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1394.39 | bwd_inner_microstep: 1394.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410
[2024-06-10 15:38:33,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1444.13 | bwd_inner_microstep: 1444.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 15:38:35,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1317.68 | bwd_inner_microstep: 1317.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 15:38:37,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.40 | bwd_microstep: 1506.63 | bwd_inner_microstep: 1506.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503
[2024-06-10 15:38:39,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.63 | bwd_microstep: 1576.91 | bwd_inner_microstep: 1576.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-10 15:38:40,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.94 | bwd_microstep: 698.63 | bwd_inner_microstep: 698.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3873
[2024-06-10 15:38:42,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.61 | bwd_microstep: 1580.91 | bwd_inner_microstep: 1580.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 15:38:44,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.30 | bwd_microstep: 1280.04 | bwd_inner_microstep: 1280.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3613
[2024-06-10 15:38:46,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1340.54 | bwd_inner_microstep: 1340.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3869
[2024-06-10 15:38:48,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.89 | bwd_microstep: 1366.81 | bwd_inner_microstep: 1366.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140
[2024-06-10 15:38:49,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.30 | bwd_microstep: 832.09 | bwd_inner_microstep: 832.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3720
[2024-06-10 15:38:51,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1237.72 | bwd_inner_microstep: 1237.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 15:38:53,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1348.62 | bwd_inner_microstep: 1348.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 15:38:54,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.55 | bwd_microstep: 1255.73 | bwd_inner_microstep: 1255.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 15:38:56,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.12 | bwd_microstep: 919.63 | bwd_inner_microstep: 919.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2194
[2024-06-10 15:38:57,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.86 | bwd_microstep: 829.54 | bwd_inner_microstep: 829.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1905
[2024-06-10 15:38:58,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.20 | bwd_microstep: 689.14 | bwd_inner_microstep: 689.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2080
[2024-06-10 15:38:59,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.45 | bwd_microstep: 896.13 | bwd_inner_microstep: 896.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454
[2024-06-10 15:39:01,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.92 | bwd_microstep: 1314.33 | bwd_inner_microstep: 1314.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-10 15:39:03,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.28 | bwd_microstep: 1299.77 | bwd_inner_microstep: 1299.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3554
[2024-06-10 15:39:05,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.75 | bwd_microstep: 1660.68 | bwd_inner_microstep: 1660.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 15:39:07,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1493.73 | bwd_inner_microstep: 1493.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-10 15:39:16,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.22 | optimizer_step: 6.57
[2024-06-10 15:39:16,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.17 | bwd_microstep: 8023.32 | bwd_inner_microstep: 1802.08 | bwd_allreduce_microstep: 6221.19 | step_microstep: 38.00
[2024-06-10 15:39:16,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15003.39 | bwd: 46457.13 | bwd_inner: 40235.04 | bwd_allreduce: 6221.41 | step: 39.41

 50%|█████     | 865/1726 [14:56:40<14:50:36, 62.06s/it]
 50%|█████     | 866/1726 [14:57:42<14:48:17, 61.97s/it]
 50%|█████     | 867/1726 [14:58:43<14:45:54, 61.88s/it]
 50%|█████     | 868/1726 [14:59:49<14:59:04, 62.87s/it]
 50%|█████     | 869/1726 [15:00:51<14:54:10, 62.60s/it]
 50%|█████     | 870/1726 [15:01:52<14:49:36, 62.36s/it]
{'loss': 1.2339, 'learning_rate': 2.071299417506849e-05, 'epoch': 0.5}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1932
[2024-06-10 15:39:17,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.80 | bwd_microstep: 844.04 | bwd_inner_microstep: 843.94 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3956
[2024-06-10 15:39:19,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1496.99 | bwd_inner_microstep: 1496.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 15:39:20,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 792.60 | bwd_inner_microstep: 792.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 15:39:22,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.48 | bwd_microstep: 1341.27 | bwd_inner_microstep: 1341.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-10 15:39:24,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1439.72 | bwd_inner_microstep: 1439.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 15:39:26,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.27 | bwd_microstep: 1384.13 | bwd_inner_microstep: 1384.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 15:39:28,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1386.29 | bwd_inner_microstep: 1386.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 15:39:29,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.69 | bwd_microstep: 797.02 | bwd_inner_microstep: 796.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 15:39:31,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.59 | bwd_microstep: 1520.03 | bwd_inner_microstep: 1520.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-10 15:39:33,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1422.03 | bwd_inner_microstep: 1422.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 15:39:34,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.07 | bwd_microstep: 797.40 | bwd_inner_microstep: 797.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3439
[2024-06-10 15:39:36,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1393.20 | bwd_inner_microstep: 1393.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2105
[2024-06-10 15:39:37,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.07 | bwd_microstep: 1012.02 | bwd_inner_microstep: 1011.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3216
[2024-06-10 15:39:39,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.94 | bwd_microstep: 1400.24 | bwd_inner_microstep: 1400.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 15:39:41,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.99 | bwd_microstep: 1293.70 | bwd_inner_microstep: 1293.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1972
[2024-06-10 15:39:42,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.20 | bwd_microstep: 733.04 | bwd_inner_microstep: 733.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3517
[2024-06-10 15:39:44,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1317.84 | bwd_inner_microstep: 1317.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 15:39:46,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1391.09 | bwd_inner_microstep: 1391.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2404
[2024-06-10 15:39:47,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.58 | bwd_microstep: 899.67 | bwd_inner_microstep: 899.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2183
[2024-06-10 15:39:48,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.04 | bwd_microstep: 858.71 | bwd_inner_microstep: 858.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3830
[2024-06-10 15:39:50,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.61 | bwd_microstep: 1585.94 | bwd_inner_microstep: 1585.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 15:39:52,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3539
[2024-06-10 15:39:55,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.28 | bwd_microstep: 1590.34 | bwd_inner_microstep: 1590.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 15:39:56,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.75 | bwd_microstep: 1256.02 | bwd_inner_microstep: 1255.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 15:39:59,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.80 | bwd_microstep: 1652.96 | bwd_inner_microstep: 1652.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 15:40:01,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.62 | bwd_microstep: 1458.78 | bwd_inner_microstep: 1458.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 15:40:03,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.08 | bwd_microstep: 1628.22 | bwd_inner_microstep: 1628.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584
[2024-06-10 15:40:05,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1399.58 | bwd_inner_microstep: 1399.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 15:40:07,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.63 | bwd_microstep: 1543.67 | bwd_inner_microstep: 1543.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 15:40:09,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.40 | bwd_microstep: 1487.12 | bwd_inner_microstep: 1487.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561
[2024-06-10 15:40:11,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1328.84 | bwd_inner_microstep: 1328.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 15:40:17,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.30 | optimizer_step: 6.58
[2024-06-10 15:40:17,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 5408.65 | bwd_inner_microstep: 1532.09 | bwd_allreduce_microstep: 3876.51 | step_microstep: 38.15
[2024-06-10 15:40:17,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15471.40 | bwd: 45418.87 | bwd_inner: 41541.38 | bwd_allreduce: 3876.78 | step: 39.62
{'loss': 1.2369, 'learning_rate': 2.0675482837669367e-05, 'epoch': 0.5}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 15:40:19,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1467.58 | bwd_inner_microstep: 1467.42 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 15:40:21,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.87 | bwd_microstep: 1488.88 | bwd_inner_microstep: 1488.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3838
[2024-06-10 15:40:23,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.27 | bwd_microstep: 1290.61 | bwd_inner_microstep: 1290.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3860
[2024-06-10 15:40:25,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.87 | bwd_microstep: 1464.44 | bwd_inner_microstep: 1464.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4148
[2024-06-10 15:40:27,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.96 | bwd_microstep: 1479.19 | bwd_inner_microstep: 1479.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-10 15:40:28,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.94 | bwd_microstep: 701.82 | bwd_inner_microstep: 701.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 15:40:29,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.67 | bwd_microstep: 1248.76 | bwd_inner_microstep: 1248.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 15:40:31,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 15:40:32,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.81 | bwd_microstep: 793.04 | bwd_inner_microstep: 793.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2022
[2024-06-10 15:40:34,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.44 | bwd_microstep: 838.62 | bwd_inner_microstep: 838.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 15:40:36,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1485.66 | bwd_inner_microstep: 1485.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 15:40:38,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.56 | bwd_microstep: 1341.47 | bwd_inner_microstep: 1341.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2092
[2024-06-10 15:40:39,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.93 | bwd_microstep: 919.65 | bwd_inner_microstep: 919.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3417
[2024-06-10 15:40:41,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.67 | bwd_microstep: 1537.94 | bwd_inner_microstep: 1537.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3645
[2024-06-10 15:40:43,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.97 | bwd_microstep: 1573.30 | bwd_inner_microstep: 1573.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 15:40:45,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.05 | bwd_microstep: 1613.65 | bwd_inner_microstep: 1613.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 15:40:47,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1407.61 | bwd_inner_microstep: 1407.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 15:40:49,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1254.88 | bwd_inner_microstep: 1254.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972
[2024-06-10 15:40:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.28 | bwd_microstep: 703.55 | bwd_inner_microstep: 703.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 15:40:52,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-10 15:40:54,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1429.99 | bwd_inner_microstep: 1429.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 15:40:56,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1310.38 | bwd_inner_microstep: 1310.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-10 15:40:57,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.71 | bwd_microstep: 820.80 | bwd_inner_microstep: 820.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 15:40:59,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1411.40 | bwd_inner_microstep: 1411.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 15:41:01,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.38 | bwd_microstep: 1292.99 | bwd_inner_microstep: 1292.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 15:41:03,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.09 | bwd_microstep: 1506.53 | bwd_inner_microstep: 1506.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3598
[2024-06-10 15:41:04,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.83 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3559
[2024-06-10 15:41:07,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.95 | bwd_microstep: 1527.94 | bwd_inner_microstep: 1527.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-10 15:41:08,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1292.33 | bwd_inner_microstep: 1292.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588
[2024-06-10 15:41:11,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.52 | bwd_microstep: 1597.16 | bwd_inner_microstep: 1597.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 15:41:12,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1405.61 | bwd_inner_microstep: 1405.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 15:41:18,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.36 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 15:41:18,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.09 | bwd_microstep: 5231.42 | bwd_inner_microstep: 1869.69 | bwd_allreduce_microstep: 3361.67 | step_microstep: 38.90
[2024-06-10 15:41:18,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15689.82 | bwd: 45530.52 | bwd_inner: 42167.82 | bwd_allreduce: 3361.96 | step: 40.39
{'loss': 1.2485, 'learning_rate': 2.06379691212232e-05, 'epoch': 0.51}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410
[2024-06-10 15:41:20,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.78 | bwd_microstep: 1436.70 | bwd_inner_microstep: 1436.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2396
[2024-06-10 15:41:22,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.38 | bwd_microstep: 997.51 | bwd_inner_microstep: 997.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 15:41:23,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.19 | bwd_microstep: 1275.47 | bwd_inner_microstep: 1275.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4273
[2024-06-10 15:41:26,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.16 | bwd_microstep: 1665.57 | bwd_inner_microstep: 1665.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 15:41:27,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.99 | bwd_microstep: 798.88 | bwd_inner_microstep: 798.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 15:41:29,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 1422.42 | bwd_inner_microstep: 1422.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431
[2024-06-10 15:41:30,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.33 | bwd_microstep: 1183.53 | bwd_inner_microstep: 1183.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 15:41:32,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | bwd_microstep: 1279.22 | bwd_inner_microstep: 1279.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 15:41:34,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1389.69 | bwd_inner_microstep: 1389.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974
[2024-06-10 15:41:35,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.98 | bwd_microstep: 765.01 | bwd_inner_microstep: 764.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154
[2024-06-10 15:41:36,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.08 | bwd_microstep: 853.03 | bwd_inner_microstep: 853.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461
[2024-06-10 15:41:38,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.34 | bwd_microstep: 1212.34 | bwd_inner_microstep: 1212.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1986
[2024-06-10 15:41:39,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.20 | bwd_microstep: 831.84 | bwd_inner_microstep: 831.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3525
[2024-06-10 15:41:41,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1349.83 | bwd_inner_microstep: 1349.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 15:41:43,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.24 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1486.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 15:41:45,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3686
[2024-06-10 15:41:48,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.52 | bwd_microstep: 1760.40 | bwd_inner_microstep: 1760.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-10 15:41:50,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.81 | bwd_microstep: 1598.14 | bwd_inner_microstep: 1598.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 15:41:52,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1393.49 | bwd_inner_microstep: 1393.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 15:41:54,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.67 | bwd_microstep: 1489.41 | bwd_inner_microstep: 1489.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3455
[2024-06-10 15:41:55,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.65 | bwd_microstep: 1190.30 | bwd_inner_microstep: 1190.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 15:41:57,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.17 | bwd_microstep: 1504.85 | bwd_inner_microstep: 1504.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 15:41:59,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1398.55 | bwd_inner_microstep: 1398.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-10 15:42:00,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.52 | bwd_microstep: 818.00 | bwd_inner_microstep: 817.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566
[2024-06-10 15:42:02,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.78 | bwd_microstep: 1202.77 | bwd_inner_microstep: 1202.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 15:42:03,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.59 | bwd_microstep: 801.43 | bwd_inner_microstep: 801.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493
[2024-06-10 15:42:05,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.39 | bwd_microstep: 1313.56 | bwd_inner_microstep: 1313.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 15:42:07,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1413.26 | bwd_inner_microstep: 1413.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 15:42:08,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.90 | bwd_microstep: 698.19 | bwd_inner_microstep: 698.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462
[2024-06-10 15:42:10,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.55 | bwd_microstep: 1568.88 | bwd_inner_microstep: 1568.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 15:42:11,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.57 | bwd_microstep: 790.34 | bwd_inner_microstep: 790.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 15:42:19,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 15:42:19,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.88 | bwd_microstep: 7349.15 | bwd_inner_microstep: 1817.51 | bwd_allreduce_microstep: 5531.59 | step_microstep: 37.92
[2024-06-10 15:42:19,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14979.66 | bwd: 45625.55 | bwd_inner: 40093.02 | bwd_allreduce: 5531.83 | step: 39.43
{'loss': 1.2182, 'learning_rate': 2.0600453157853103e-05, 'epoch': 0.51}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 15:42:21,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.11 | bwd_microstep: 1476.29 | bwd_inner_microstep: 1476.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3872
[2024-06-10 15:42:23,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.36 | bwd_microstep: 1556.90 | bwd_inner_microstep: 1556.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 15:42:25,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1243.15 | bwd_inner_microstep: 1243.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-10 15:42:26,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.98 | bwd_microstep: 723.35 | bwd_inner_microstep: 723.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-10 15:42:27,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.95 | bwd_microstep: 786.83 | bwd_inner_microstep: 786.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 15:42:29,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1335.13 | bwd_inner_microstep: 1335.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 15:42:31,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.69 | bwd_microstep: 1243.50 | bwd_inner_microstep: 1243.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 15:42:33,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1380.12 | bwd_inner_microstep: 1380.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 15:42:34,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1247.90 | bwd_inner_microstep: 1247.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 15:42:36,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.82 | bwd_microstep: 1278.35 | bwd_inner_microstep: 1278.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1969
[2024-06-10 15:42:37,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 823.59 | bwd_inner_microstep: 823.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3500
[2024-06-10 15:42:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.12 | bwd_microstep: 1524.40 | bwd_inner_microstep: 1524.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3477
[2024-06-10 15:42:42,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.69 | bwd_microstep: 1568.81 | bwd_inner_microstep: 1568.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 15:42:44,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.65 | bwd_microstep: 1445.42 | bwd_inner_microstep: 1445.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3652
[2024-06-10 15:42:46,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.93 | bwd_microstep: 1713.08 | bwd_inner_microstep: 1713.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3688
[2024-06-10 15:42:48,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1384.73 | bwd_inner_microstep: 1384.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 15:42:50,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1556.11 | bwd_inner_microstep: 1556.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105
[2024-06-10 15:42:51,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.60 | bwd_microstep: 918.89 | bwd_inner_microstep: 918.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654
[2024-06-10 15:42:53,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.77 | bwd_microstep: 1426.04 | bwd_inner_microstep: 1426.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-10 15:42:54,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.35 | bwd_microstep: 686.31 | bwd_inner_microstep: 686.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688
[2024-06-10 15:42:56,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.82 | bwd_microstep: 1521.60 | bwd_inner_microstep: 1521.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 15:42:58,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 1438.30 | bwd_inner_microstep: 1438.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2159
[2024-06-10 15:43:00,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.18 | bwd_microstep: 949.62 | bwd_inner_microstep: 949.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999
[2024-06-10 15:43:01,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.60 | bwd_microstep: 739.87 | bwd_inner_microstep: 739.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 15:43:03,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.27 | bwd_microstep: 1606.68 | bwd_inner_microstep: 1606.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-10 15:43:05,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.17 | bwd_microstep: 1198.96 | bwd_inner_microstep: 1198.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 15:43:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1391.88 | bwd_inner_microstep: 1391.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 15:43:09,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.11 | bwd_microstep: 1609.70 | bwd_inner_microstep: 1609.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 15:43:11,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1494.73 | bwd_inner_microstep: 1494.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3400
[2024-06-10 15:43:13,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1469.65 | bwd_inner_microstep: 1469.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 15:43:15,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.08 | bwd_microstep: 1539.48 | bwd_inner_microstep: 1539.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3779
[2024-06-10 15:43:20,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 15:43:20,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.49 | bwd_microstep: 4263.55 | bwd_inner_microstep: 1979.03 | bwd_allreduce_microstep: 2284.47 | step_microstep: 37.98
[2024-06-10 15:43:20,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15693.69 | bwd: 44542.95 | bwd_inner: 42257.58 | bwd_allreduce: 2284.70 | step: 39.44
{'loss': 1.2741, 'learning_rate': 2.05629350796901e-05, 'epoch': 0.51}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 15:43:22,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1393.65 | bwd_inner_microstep: 1393.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 15:43:24,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.96 | bwd_microstep: 1346.15 | bwd_inner_microstep: 1346.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 15:43:26,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.51 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 15:43:27,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.21 | bwd_microstep: 1249.61 | bwd_inner_microstep: 1249.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 15:43:29,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1477.93 | bwd_inner_microstep: 1477.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740
[2024-06-10 15:43:32,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1633.33 | bwd_inner_microstep: 1633.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897
[2024-06-10 15:43:33,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.87 | bwd_microstep: 682.37 | bwd_inner_microstep: 682.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189
[2024-06-10 15:43:34,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.71 | bwd_microstep: 856.59 | bwd_inner_microstep: 856.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706
[2024-06-10 15:43:36,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.55 | bwd_microstep: 1627.56 | bwd_inner_microstep: 1627.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 15:43:38,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.97 | bwd_microstep: 1287.11 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 15:43:39,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.13 | bwd_microstep: 1289.58 | bwd_inner_microstep: 1289.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2004
[2024-06-10 15:43:41,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.63 | bwd_microstep: 835.36 | bwd_inner_microstep: 835.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278
[2024-06-10 15:43:42,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.52 | bwd_microstep: 876.53 | bwd_inner_microstep: 876.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520
[2024-06-10 15:43:44,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.89 | bwd_microstep: 1447.63 | bwd_inner_microstep: 1447.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3495
[2024-06-10 15:43:46,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.48 | bwd_microstep: 1331.24 | bwd_inner_microstep: 1331.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 15:43:48,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.03 | bwd_microstep: 1660.06 | bwd_inner_microstep: 1660.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2144
[2024-06-10 15:43:49,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.14 | bwd_microstep: 1027.24 | bwd_inner_microstep: 1027.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3530
[2024-06-10 15:43:52,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.09 | bwd_microstep: 1536.49 | bwd_inner_microstep: 1536.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 15:43:54,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.92 | bwd_microstep: 1647.82 | bwd_inner_microstep: 1647.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 15:43:55,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.34 | bwd_microstep: 800.62 | bwd_inner_microstep: 800.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 15:43:56,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.81 | bwd_microstep: 797.10 | bwd_inner_microstep: 797.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2280
[2024-06-10 15:43:57,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.73 | bwd_microstep: 811.08 | bwd_inner_microstep: 811.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 15:43:59,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1256.45 | bwd_inner_microstep: 1256.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 15:44:01,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.28 | bwd_microstep: 1352.00 | bwd_inner_microstep: 1351.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 15:44:03,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1506.82 | bwd_inner_microstep: 1506.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 15:44:05,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1354.15 | bwd_inner_microstep: 1354.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 15:44:07,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.66 | bwd_microstep: 1544.69 | bwd_inner_microstep: 1544.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2285
[2024-06-10 15:44:08,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.68 | bwd_microstep: 942.09 | bwd_inner_microstep: 942.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3799
[2024-06-10 15:44:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.78 | bwd_microstep: 1640.53 | bwd_inner_microstep: 1640.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3735
[2024-06-10 15:44:12,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1486.54 | bwd_inner_microstep: 1486.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2043
[2024-06-10 15:44:14,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.66 | bwd_microstep: 905.13 | bwd_inner_microstep: 905.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 15:44:20,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.37 | optimizer_step: 6.56
[2024-06-10 15:44:20,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.68 | bwd_microstep: 5858.52 | bwd_inner_microstep: 1804.02 | bwd_allreduce_microstep: 4054.44 | step_microstep: 38.22
[2024-06-10 15:44:20,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15152.03 | bwd: 44846.87 | bwd_inner: 40791.52 | bwd_allreduce: 4054.68 | step: 39.70
{'loss': 1.2496, 'learning_rate': 2.0525415018872686e-05, 'epoch': 0.51}
 50%|█████     | 870/1726 [15:01:52<14:49:36, 62.36s/it]
 50%|█████     | 871/1726 [15:02:54<14:43:42, 62.01s/it]
 51%|█████     | 872/1726 [15:03:55<14:40:43, 61.88s/it]
 51%|█████     | 873/1726 [15:04:56<14:35:37, 61.59s/it]
 51%|█████     | 874/1726 [15:05:57<14:30:13, 61.28s/it]
 51%|█████     | 875/1726 [15:06:57<14:25:09, 61.00s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 15:44:22,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.02 | bwd_microstep: 1471.10 | bwd_inner_microstep: 1471.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 15:44:24,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1377.70 | bwd_inner_microstep: 1377.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 15:44:26,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1275.89 | bwd_inner_microstep: 1275.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 15:44:28,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1482.44 | bwd_inner_microstep: 1482.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4179
[2024-06-10 15:44:30,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.21 | bwd_microstep: 1648.88 | bwd_inner_microstep: 1648.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 846
[2024-06-10 15:44:31,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.42 | bwd_microstep: 345.75 | bwd_inner_microstep: 345.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 15:44:33,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.17 | bwd_microstep: 1485.45 | bwd_inner_microstep: 1485.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 15:44:34,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.50 | bwd_microstep: 794.20 | bwd_inner_microstep: 794.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 15:44:36,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.24 | bwd_microstep: 1390.01 | bwd_inner_microstep: 1389.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 15:44:37,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.84 | bwd_microstep: 1151.12 | bwd_inner_microstep: 1151.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2118
[2024-06-10 15:44:38,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.50 | bwd_microstep: 829.66 | bwd_inner_microstep: 829.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-10 15:44:40,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1444.67 | bwd_inner_microstep: 1444.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 15:44:43,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.69 | bwd_inner_microstep: 1508.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 15:44:44,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1245.05 | bwd_inner_microstep: 1245.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 15:44:47,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.46 | bwd_microstep: 1617.20 | bwd_inner_microstep: 1617.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 15:44:48,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1391.27 | bwd_inner_microstep: 1391.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 15:44:49,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.59 | bwd_microstep: 698.70 | bwd_inner_microstep: 698.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-10 15:44:52,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.20 | bwd_microstep: 1518.83 | bwd_inner_microstep: 1518.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2171
[2024-06-10 15:44:53,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.20 | bwd_microstep: 855.43 | bwd_inner_microstep: 855.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2090
[2024-06-10 15:44:54,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.82 | bwd_microstep: 917.00 | bwd_inner_microstep: 916.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 15:44:56,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.25 | bwd_microstep: 1621.58 | bwd_inner_microstep: 1621.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-10 15:44:57,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.60 | bwd_microstep: 695.56 | bwd_inner_microstep: 695.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 15:44:59,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1281.28 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548
[2024-06-10 15:45:01,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.85 | bwd_microstep: 1420.58 | bwd_inner_microstep: 1420.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 15:45:03,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.41 | bwd_microstep: 1425.76 | bwd_inner_microstep: 1425.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 15:45:05,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.11 | bwd_microstep: 1647.35 | bwd_inner_microstep: 1647.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2282
[2024-06-10 15:45:07,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.14 | bwd_microstep: 1004.69 | bwd_inner_microstep: 1004.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3832
[2024-06-10 15:45:09,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.76 | bwd_microstep: 1621.86 | bwd_inner_microstep: 1621.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618
[2024-06-10 15:45:11,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.63 | bwd_microstep: 1599.37 | bwd_inner_microstep: 1599.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733
[2024-06-10 15:45:13,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.49 | bwd_microstep: 1441.56 | bwd_inner_microstep: 1441.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810
[2024-06-10 15:45:15,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.67 | bwd_microstep: 1751.64 | bwd_inner_microstep: 1751.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3812
[2024-06-10 15:45:21,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.39 | optimizer_step: 6.59
[2024-06-10 15:45:21,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.29 | bwd_microstep: 4562.40 | bwd_inner_microstep: 2099.95 | bwd_allreduce_microstep: 2462.39 | step_microstep: 39.58
[2024-06-10 15:45:21,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15609.89 | bwd: 44522.70 | bwd_inner: 42059.40 | bwd_allreduce: 2462.62 | step: 41.04
{'loss': 1.1974, 'learning_rate': 2.0487893107546298e-05, 'epoch': 0.51}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 15:45:22,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.90 | bwd_microstep: 1298.24 | bwd_inner_microstep: 1298.07 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 15:45:24,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1278.60 | bwd_inner_microstep: 1278.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3586
[2024-06-10 15:45:26,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1338.89 | bwd_inner_microstep: 1338.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1734
[2024-06-10 15:45:27,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.94 | bwd_microstep: 680.42 | bwd_inner_microstep: 680.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-10 15:45:29,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.14 | bwd_microstep: 1537.90 | bwd_inner_microstep: 1537.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3483
[2024-06-10 15:45:31,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1330.38 | bwd_inner_microstep: 1330.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729
[2024-06-10 15:45:33,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.85 | bwd_microstep: 1428.16 | bwd_inner_microstep: 1428.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3718
[2024-06-10 15:45:35,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1462.37 | bwd_inner_microstep: 1462.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 15:45:37,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1395.70 | bwd_inner_microstep: 1395.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 15:45:39,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1295.05 | bwd_inner_microstep: 1295.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1970
[2024-06-10 15:45:40,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.91 | bwd_microstep: 828.89 | bwd_inner_microstep: 828.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 15:45:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.63 | bwd_microstep: 1489.12 | bwd_inner_microstep: 1489.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-10 15:45:44,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.75 | bwd_microstep: 1584.55 | bwd_inner_microstep: 1584.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 15:45:46,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.86 | bwd_microstep: 1380.05 | bwd_inner_microstep: 1380.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-10 15:45:47,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.02 | bwd_microstep: 686.61 | bwd_inner_microstep: 686.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555
[2024-06-10 15:45:49,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.05 | bwd_microstep: 1294.41 | bwd_inner_microstep: 1294.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3745
[2024-06-10 15:45:51,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.62 | bwd_microstep: 1442.61 | bwd_inner_microstep: 1442.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143
[2024-06-10 15:45:52,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.72 | bwd_microstep: 830.74 | bwd_inner_microstep: 830.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519
[2024-06-10 15:45:54,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1318.80 | bwd_inner_microstep: 1318.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 15:45:56,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1391.76 | bwd_inner_microstep: 1391.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2325
[2024-06-10 15:45:57,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.06 | bwd_microstep: 918.85 | bwd_inner_microstep: 918.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2185
[2024-06-10 15:45:58,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.53 | bwd_microstep: 903.42 | bwd_inner_microstep: 903.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 15:46:00,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1399.97 | bwd_inner_microstep: 1399.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 15:46:02,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 15:46:03,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.98 | bwd_microstep: 971.73 | bwd_inner_microstep: 971.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750
[2024-06-10 15:46:05,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.12 | bwd_microstep: 1341.19 | bwd_inner_microstep: 1341.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3601
[2024-06-10 15:46:07,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.05 | bwd_microstep: 1741.04 | bwd_inner_microstep: 1741.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-10 15:46:09,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1441.07 | bwd_inner_microstep: 1441.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 15:46:11,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1380.41 | bwd_inner_microstep: 1380.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 15:46:14,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.98 | bwd_microstep: 1646.69 | bwd_inner_microstep: 1646.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462
[2024-06-10 15:46:15,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.22 | bwd_microstep: 1341.38 | bwd_inner_microstep: 1341.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 15:46:21,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.22 | optimizer_step: 6.62
[2024-06-10 15:46:21,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.13 | bwd_microstep: 4818.44 | bwd_inner_microstep: 1866.34 | bwd_allreduce_microstep: 2952.05 | step_microstep: 37.94
[2024-06-10 15:46:21,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15505.27 | bwd: 44484.40 | bwd_inner: 41531.31 | bwd_allreduce: 2952.35 | step: 39.49
{'loss': 1.2136, 'learning_rate': 2.0450369477862922e-05, 'epoch': 0.51}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 15:46:23,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1329.91 | bwd_inner_microstep: 1329.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 15:46:24,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.65 | bwd_microstep: 1149.11 | bwd_inner_microstep: 1149.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 15:46:26,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.97 | bwd_microstep: 1294.54 | bwd_inner_microstep: 1294.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3873
[2024-06-10 15:46:28,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1514.12 | bwd_inner_microstep: 1514.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489
[2024-06-10 15:46:30,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.78 | bwd_microstep: 1315.96 | bwd_inner_microstep: 1315.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2258
[2024-06-10 15:46:31,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.71 | bwd_microstep: 966.39 | bwd_inner_microstep: 966.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3779
[2024-06-10 15:46:33,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.85 | bwd_microstep: 1332.19 | bwd_inner_microstep: 1332.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-10 15:46:34,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.64 | bwd_microstep: 683.24 | bwd_inner_microstep: 683.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-10 15:46:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.15 | bwd_microstep: 1187.00 | bwd_inner_microstep: 1186.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 15:46:38,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1249.78 | bwd_inner_microstep: 1249.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 15:46:39,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.81 | bwd_microstep: 1290.24 | bwd_inner_microstep: 1290.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3699
[2024-06-10 15:46:42,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.45 | bwd_microstep: 1560.40 | bwd_inner_microstep: 1560.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 15:46:44,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1487.00 | bwd_inner_microstep: 1486.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 15:46:45,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.06 | bwd_microstep: 1280.96 | bwd_inner_microstep: 1280.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 15:46:47,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1388.91 | bwd_inner_microstep: 1388.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3719
[2024-06-10 15:46:50,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.08 | bwd_microstep: 1780.96 | bwd_inner_microstep: 1780.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2158
[2024-06-10 15:46:51,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.43 | bwd_microstep: 1048.97 | bwd_inner_microstep: 1048.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3537
[2024-06-10 15:46:53,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.37 | bwd_microstep: 1583.75 | bwd_inner_microstep: 1583.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 15:46:55,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1509.11 | bwd_inner_microstep: 1509.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 15:46:57,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.18 | bwd_microstep: 1397.14 | bwd_inner_microstep: 1397.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 15:46:59,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.71 | bwd_microstep: 1397.94 | bwd_inner_microstep: 1397.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3623
[2024-06-10 15:47:01,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.13 | bwd_microstep: 1342.12 | bwd_inner_microstep: 1342.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 15:47:03,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1411.21 | bwd_inner_microstep: 1411.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 15:47:05,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.65 | bwd_microstep: 1659.46 | bwd_inner_microstep: 1659.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3639
[2024-06-10 15:47:07,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.02 | bwd_microstep: 1476.22 | bwd_inner_microstep: 1476.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799
[2024-06-10 15:47:20,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.94 | bwd_microstep: 1451.88 | bwd_inner_microstep: 1451.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 15:47:22,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1393.24 | bwd_inner_microstep: 1393.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3776
[2024-06-10 15:47:24,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.67 | bwd_microstep: 1469.94 | bwd_inner_microstep: 1469.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803
[2024-06-10 15:47:26,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.47 | bwd_microstep: 1448.10 | bwd_inner_microstep: 1448.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 15:47:28,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1347.54 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 15:47:30,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.33 | bwd_microstep: 1282.88 | bwd_inner_microstep: 1282.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3454
[2024-06-10 15:47:34,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 15:47:34,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.26 | bwd_microstep: 3114.18 | bwd_inner_microstep: 1761.50 | bwd_allreduce_microstep: 1352.64 | step_microstep: 37.77
[2024-06-10 15:47:34,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16336.48 | bwd: 45144.43 | bwd_inner: 43790.89 | bwd_allreduce: 1352.86 | step: 39.30
{'loss': 1.2547, 'learning_rate': 2.0412844261980588e-05, 'epoch': 0.51}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457
[2024-06-10 15:47:36,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.01 | bwd_microstep: 1563.08 | bwd_inner_microstep: 1563.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 15:47:38,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.09 | bwd_microstep: 1494.95 | bwd_inner_microstep: 1494.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 15:47:40,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.24 | bwd_microstep: 1551.46 | bwd_inner_microstep: 1551.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461
[2024-06-10 15:47:42,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.07 | bwd_microstep: 1214.88 | bwd_inner_microstep: 1214.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3759
[2024-06-10 15:47:44,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.81 | bwd_microstep: 1342.52 | bwd_inner_microstep: 1342.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-10 15:47:45,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.64 | bwd_microstep: 679.15 | bwd_inner_microstep: 679.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3404
[2024-06-10 15:47:47,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.57 | bwd_microstep: 1371.00 | bwd_inner_microstep: 1370.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 15:47:48,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1283.94 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 15:47:50,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1250.96 | bwd_inner_microstep: 1250.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 15:47:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.80 | bwd_microstep: 1442.18 | bwd_inner_microstep: 1442.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661
[2024-06-10 15:47:54,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.29 | bwd_microstep: 1612.57 | bwd_inner_microstep: 1612.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3620
[2024-06-10 15:47:56,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.10 | bwd_microstep: 1578.66 | bwd_inner_microstep: 1578.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3517
[2024-06-10 15:47:59,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.85 | bwd_microstep: 1689.07 | bwd_inner_microstep: 1689.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494
[2024-06-10 15:48:01,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1414.67 | bwd_inner_microstep: 1414.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3517
[2024-06-10 15:48:03,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.69 | bwd_microstep: 1607.50 | bwd_inner_microstep: 1607.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 15:48:05,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1485.26 | bwd_inner_microstep: 1485.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645
[2024-06-10 15:48:07,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.01 | bwd_microstep: 1712.60 | bwd_inner_microstep: 1712.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2071
[2024-06-10 15:48:08,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.94 | bwd_microstep: 866.96 | bwd_inner_microstep: 866.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 15:48:10,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1382.25 | bwd_inner_microstep: 1382.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3443
[2024-06-10 15:48:12,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.23 | bwd_microstep: 1378.80 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427
[2024-06-10 15:48:14,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.85 | bwd_microstep: 1280.60 | bwd_inner_microstep: 1280.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3606
[2024-06-10 15:48:16,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2291
[2024-06-10 15:48:17,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.72 | bwd_microstep: 942.48 | bwd_inner_microstep: 942.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 15:48:19,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1491.18 | bwd_inner_microstep: 1491.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3784
[2024-06-10 15:48:21,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.70 | bwd_microstep: 1351.53 | bwd_inner_microstep: 1351.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 15:48:23,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.44 | bwd_microstep: 1487.78 | bwd_inner_microstep: 1487.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629
[2024-06-10 15:48:25,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.23 | bwd_microstep: 1311.83 | bwd_inner_microstep: 1311.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 15:48:27,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.19 | bwd_microstep: 1488.96 | bwd_inner_microstep: 1488.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004
[2024-06-10 15:48:28,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.11 | bwd_microstep: 896.13 | bwd_inner_microstep: 896.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 15:48:30,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1394.67 | bwd_inner_microstep: 1394.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 15:48:32,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.56 | bwd_microstep: 1542.47 | bwd_inner_microstep: 1542.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 15:48:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 15:48:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.37 | bwd_microstep: 2110.84 | bwd_inner_microstep: 1790.88 | bwd_allreduce_microstep: 319.91 | step_microstep: 37.61
[2024-06-10 15:48:35,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16425.75 | bwd: 44563.74 | bwd_inner: 44242.93 | bwd_allreduce: 320.13 | step: 39.04
{'loss': 1.2721, 'learning_rate': 2.0375317592062912e-05, 'epoch': 0.51}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 15:48:36,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.49 | bwd_microstep: 791.24 | bwd_inner_microstep: 791.13 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 15:48:38,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.25 | bwd_microstep: 1276.05 | bwd_inner_microstep: 1276.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2453
[2024-06-10 15:48:39,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.32 | bwd_microstep: 1015.89 | bwd_inner_microstep: 1015.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278
[2024-06-10 15:48:41,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.52 | bwd_microstep: 875.63 | bwd_inner_microstep: 875.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 15:48:43,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 1450.47 | bwd_inner_microstep: 1450.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 15:48:44,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.80 | bwd_microstep: 1148.47 | bwd_inner_microstep: 1148.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 15:48:46,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.34 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 15:48:49,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.38 | bwd_microstep: 2014.88 | bwd_inner_microstep: 2014.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 15:48:50,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1387.29 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400
[2024-06-10 15:48:52,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.57 | bwd_microstep: 1304.50 | bwd_inner_microstep: 1304.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 15:48:53,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.97 | bwd_microstep: 795.02 | bwd_inner_microstep: 794.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 15:48:55,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.86 | bwd_microstep: 1478.58 | bwd_inner_microstep: 1478.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 15:48:57,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.88 | bwd_microstep: 1478.70 | bwd_inner_microstep: 1478.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001
[2024-06-10 15:48:59,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.08 | bwd_microstep: 901.02 | bwd_inner_microstep: 901.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 15:49:01,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.43 | bwd_microstep: 1379.02 | bwd_inner_microstep: 1379.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3947
[2024-06-10 15:49:03,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.61 | bwd_microstep: 1602.28 | bwd_inner_microstep: 1602.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 15:49:05,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.40 | bwd_microstep: 1276.46 | bwd_inner_microstep: 1276.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2084
[2024-06-10 15:49:06,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.49 | bwd_microstep: 822.28 | bwd_inner_microstep: 822.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 15:49:08,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.56 | bwd_microstep: 1657.38 | bwd_inner_microstep: 1657.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 15:49:10,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.12 | bwd_microstep: 1186.15 | bwd_inner_microstep: 1186.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 15:49:12,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.56 | bwd_microstep: 1512.59 | bwd_inner_microstep: 1512.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 15:49:13,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1285.32 | bwd_inner_microstep: 1285.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 15:49:15,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1397.28 | bwd_inner_microstep: 1397.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 15:49:17,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.76 | bwd_microstep: 1218.17 | bwd_inner_microstep: 1218.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 15:49:19,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.74 | bwd_microstep: 1562.09 | bwd_inner_microstep: 1562.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061
[2024-06-10 15:49:21,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.06 | bwd_microstep: 914.83 | bwd_inner_microstep: 914.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2252
[2024-06-10 15:49:22,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.30 | bwd_microstep: 939.59 | bwd_inner_microstep: 939.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3485
[2024-06-10 15:49:24,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1334.06 | bwd_inner_microstep: 1334.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799
[2024-06-10 15:49:26,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.19 | bwd_microstep: 1457.28 | bwd_inner_microstep: 1457.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3740
[2024-06-10 15:49:28,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.94 | bwd_microstep: 1346.90 | bwd_inner_microstep: 1346.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-10 15:49:30,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1446.11 | bwd_inner_microstep: 1446.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 15:49:37,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.35 | optimizer_step: 6.63
[2024-06-10 15:49:37,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 7402.60 | bwd_inner_microstep: 1411.21 | bwd_allreduce_microstep: 5991.31 | step_microstep: 39.10
[2024-06-10 15:49:37,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15069.33 | bwd: 46939.69 | bwd_inner: 40947.35 | bwd_allreduce: 5991.60 | step: 40.60
{'loss': 1.2537, 'learning_rate': 2.0337789600278623e-05, 'epoch': 0.51}
 51%|█████     | 875/1726 [15:06:57<14:25:09, 61.00s/it]
 51%|█████     | 876/1726 [15:07:57<14:21:53, 60.84s/it]
 51%|█████     | 877/1726 [15:08:58<14:18:41, 60.69s/it]
 51%|█████     | 878/1726 [15:10:11<15:09:15, 64.33s/it]
 51%|█████     | 879/1726 [15:11:12<14:55:28, 63.43s/it]
 51%|█████     | 880/1726 [15:12:14<14:49:45, 63.10s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 15:49:39,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1334.97 | bwd_inner_microstep: 1334.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 15:49:41,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1335.98 | bwd_inner_microstep: 1335.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3831
[2024-06-10 15:49:43,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1318.74 | bwd_inner_microstep: 1318.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2355
[2024-06-10 15:49:44,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.80 | bwd_microstep: 891.76 | bwd_inner_microstep: 891.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-10 15:49:46,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.60 | bwd_microstep: 1562.41 | bwd_inner_microstep: 1562.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898
[2024-06-10 15:49:47,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.95 | bwd_microstep: 683.78 | bwd_inner_microstep: 683.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 15:49:49,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.52 | bwd_microstep: 1279.59 | bwd_inner_microstep: 1279.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 15:49:51,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.65 | bwd_microstep: 1546.39 | bwd_inner_microstep: 1546.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555
[2024-06-10 15:49:53,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1297.59 | bwd_inner_microstep: 1297.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 15:49:55,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.09 | bwd_microstep: 1383.01 | bwd_inner_microstep: 1382.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 15:49:57,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1252.75 | bwd_inner_microstep: 1252.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126
[2024-06-10 15:49:58,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.96 | bwd_microstep: 927.21 | bwd_inner_microstep: 927.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 15:50:00,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1396.44 | bwd_inner_microstep: 1396.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438
[2024-06-10 15:50:02,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.23 | bwd_microstep: 1153.56 | bwd_inner_microstep: 1153.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3610
[2024-06-10 15:50:04,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.42 | bwd_microstep: 1555.75 | bwd_inner_microstep: 1555.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3076
[2024-06-10 15:50:05,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1308.54 | bwd_inner_microstep: 1308.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 15:50:07,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1317.91 | bwd_inner_microstep: 1317.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 15:50:09,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1291.67 | bwd_inner_microstep: 1291.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533
[2024-06-10 15:50:11,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1426.69 | bwd_inner_microstep: 1426.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 15:50:13,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.77 | bwd_microstep: 1157.17 | bwd_inner_microstep: 1157.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831
[2024-06-10 15:50:15,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1388.60 | bwd_inner_microstep: 1388.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-10 15:50:16,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.91 | bwd_microstep: 810.71 | bwd_inner_microstep: 810.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3816
[2024-06-10 15:50:18,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 1579.45 | bwd_inner_microstep: 1579.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 15:50:20,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1454.84 | bwd_inner_microstep: 1454.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3808
[2024-06-10 15:50:22,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.00 | bwd_microstep: 1613.84 | bwd_inner_microstep: 1613.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3552
[2024-06-10 15:50:24,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1588.08 | bwd_inner_microstep: 1588.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 15:50:26,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1448.38 | bwd_inner_microstep: 1448.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3389
[2024-06-10 15:50:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.59 | bwd_microstep: 1436.49 | bwd_inner_microstep: 1436.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3574
[2024-06-10 15:50:30,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.30 | bwd_microstep: 1591.76 | bwd_inner_microstep: 1591.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804
[2024-06-10 15:50:32,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.92 | bwd_microstep: 1354.43 | bwd_inner_microstep: 1354.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 15:50:34,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1374.54 | bwd_inner_microstep: 1374.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 15:50:39,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 15:50:39,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 3781.01 | bwd_inner_microstep: 1638.32 | bwd_allreduce_microstep: 2142.65 | step_microstep: 38.09
[2024-06-10 15:50:39,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15963.62 | bwd: 44844.04 | bwd_inner: 42700.49 | bwd_allreduce: 2142.87 | step: 39.56
{'loss': 1.2584, 'learning_rate': 2.0300260418801123e-05, 'epoch': 0.51}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 15:50:41,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.20 | bwd_microstep: 1490.69 | bwd_inner_microstep: 1490.51 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 15:50:42,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.93 | bwd_microstep: 1271.25 | bwd_inner_microstep: 1271.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 15:50:44,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.57 | bwd_microstep: 1401.58 | bwd_inner_microstep: 1401.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 15:50:46,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.27 | bwd_microstep: 1479.06 | bwd_inner_microstep: 1479.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3752
[2024-06-10 15:50:49,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.20 | bwd_microstep: 1636.12 | bwd_inner_microstep: 1636.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 15:50:50,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 791.01 | bwd_inner_microstep: 790.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 15:50:51,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1246.23 | bwd_inner_microstep: 1246.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 15:50:54,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1480.43 | bwd_inner_microstep: 1480.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 15:50:56,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1506.71 | bwd_inner_microstep: 1506.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420
[2024-06-10 15:50:58,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.13 | bwd_microstep: 1442.44 | bwd_inner_microstep: 1442.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3663
[2024-06-10 15:51:00,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1654.74 | bwd_inner_microstep: 1654.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3645
[2024-06-10 15:51:02,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.93 | bwd_microstep: 1813.88 | bwd_inner_microstep: 1813.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3486
[2024-06-10 15:51:04,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.89 | bwd_microstep: 1435.39 | bwd_inner_microstep: 1435.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 15:51:06,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1283.92 | bwd_inner_microstep: 1283.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3484
[2024-06-10 15:51:08,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.12 | bwd_microstep: 1335.66 | bwd_inner_microstep: 1335.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 15:51:10,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.94 | bwd_microstep: 1316.51 | bwd_inner_microstep: 1316.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3856
[2024-06-10 15:51:12,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.40 | bwd_microstep: 1366.77 | bwd_inner_microstep: 1366.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 647
[2024-06-10 15:51:12,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.09 | bwd_microstep: 275.17 | bwd_inner_microstep: 275.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3782
[2024-06-10 15:51:14,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.44 | bwd_microstep: 1578.10 | bwd_inner_microstep: 1578.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2023
[2024-06-10 15:51:15,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.62 | bwd_microstep: 715.85 | bwd_inner_microstep: 715.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2162
[2024-06-10 15:51:17,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.77 | bwd_microstep: 952.08 | bwd_inner_microstep: 952.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3824
[2024-06-10 15:51:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.82 | bwd_microstep: 1757.53 | bwd_inner_microstep: 1757.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 15:51:21,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1387.04 | bwd_inner_microstep: 1387.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-10 15:51:23,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1486.73 | bwd_inner_microstep: 1486.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462
[2024-06-10 15:51:25,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.23 | bwd_microstep: 1311.96 | bwd_inner_microstep: 1311.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3782
[2024-06-10 15:51:27,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.02 | bwd_microstep: 1578.75 | bwd_inner_microstep: 1578.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 15:51:29,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.13 | bwd_microstep: 1596.02 | bwd_inner_microstep: 1595.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-10 15:51:30,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.64 | bwd_microstep: 969.22 | bwd_inner_microstep: 969.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 15:51:32,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.66 | bwd_microstep: 1392.70 | bwd_inner_microstep: 1392.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2271
[2024-06-10 15:51:34,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.85 | bwd_microstep: 1003.18 | bwd_inner_microstep: 1003.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3377
[2024-06-10 15:51:36,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1432.31 | bwd_inner_microstep: 1432.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3579
[2024-06-10 15:51:39,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 15:51:39,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.50 | bwd_microstep: 2842.54 | bwd_inner_microstep: 2032.13 | bwd_allreduce_microstep: 810.37 | step_microstep: 37.73
[2024-06-10 15:51:39,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16124.74 | bwd: 44231.60 | bwd_inner: 43420.19 | bwd_allreduce: 810.67 | step: 39.25
{'loss': 1.23, 'learning_rate': 2.026273017980798e-05, 'epoch': 0.51}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 15:51:41,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.83 | bwd_microstep: 1431.96 | bwd_inner_microstep: 1431.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915
[2024-06-10 15:51:43,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1588.72 | bwd_inner_microstep: 1588.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 15:51:45,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1390.45 | bwd_inner_microstep: 1390.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 15:51:48,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.04 | bwd_microstep: 1645.76 | bwd_inner_microstep: 1645.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2274
[2024-06-10 15:51:49,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.29 | bwd_microstep: 873.65 | bwd_inner_microstep: 873.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3801
[2024-06-10 15:51:51,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.44 | bwd_microstep: 1598.81 | bwd_inner_microstep: 1598.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 15:51:53,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.71 | bwd_microstep: 1282.53 | bwd_inner_microstep: 1282.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3712
[2024-06-10 15:51:55,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.70 | bwd_microstep: 1359.25 | bwd_inner_microstep: 1359.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-10 15:51:57,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.96 | bwd_microstep: 1411.46 | bwd_inner_microstep: 1411.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 15:51:59,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 1507.45 | bwd_inner_microstep: 1507.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 15:52:01,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.54 | bwd_microstep: 1290.49 | bwd_inner_microstep: 1290.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 15:52:02,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.08 | bwd_microstep: 1382.18 | bwd_inner_microstep: 1382.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 15:52:04,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.41 | bwd_microstep: 1375.57 | bwd_inner_microstep: 1375.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3520
[2024-06-10 15:52:06,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.49 | bwd_microstep: 1559.40 | bwd_inner_microstep: 1559.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 15:52:08,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.00 | bwd_microstep: 1244.05 | bwd_inner_microstep: 1244.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 15:52:10,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.04 | bwd_microstep: 1386.34 | bwd_inner_microstep: 1386.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447
[2024-06-10 15:52:12,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1283.21 | bwd_inner_microstep: 1283.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-10 15:52:14,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.01 | bwd_microstep: 1192.52 | bwd_inner_microstep: 1192.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3965
[2024-06-10 15:52:16,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.01 | bwd_microstep: 1665.40 | bwd_inner_microstep: 1665.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 15:52:18,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1407.20 | bwd_inner_microstep: 1407.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082
[2024-06-10 15:52:19,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.43 | bwd_microstep: 818.96 | bwd_inner_microstep: 818.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 15:52:21,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.85 | bwd_microstep: 1379.94 | bwd_inner_microstep: 1379.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 15:52:23,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.05 | bwd_microstep: 1279.31 | bwd_inner_microstep: 1279.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3808
[2024-06-10 15:52:25,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.28 | bwd_microstep: 1513.96 | bwd_inner_microstep: 1513.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 15:52:27,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.32 | bwd_microstep: 1460.19 | bwd_inner_microstep: 1460.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-10 15:52:29,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.62 | bwd_microstep: 1301.19 | bwd_inner_microstep: 1301.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-10 15:52:31,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.98 | bwd_microstep: 1562.96 | bwd_inner_microstep: 1562.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 15:52:32,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.73 | bwd_microstep: 1255.42 | bwd_inner_microstep: 1255.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 15:52:34,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.36 | bwd_microstep: 1373.93 | bwd_inner_microstep: 1373.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 15:52:36,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1399.06 | bwd_inner_microstep: 1399.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 15:52:38,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.39 | bwd_microstep: 975.43 | bwd_inner_microstep: 975.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 15:52:41,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-10 15:52:41,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 2937.22 | bwd_inner_microstep: 1640.85 | bwd_allreduce_microstep: 1296.32 | step_microstep: 37.73
[2024-06-10 15:52:41,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16351.30 | bwd: 45133.99 | bwd_inner: 43836.77 | bwd_allreduce: 1296.55 | step: 39.19
{'loss': 1.2172, 'learning_rate': 2.0225199015480518e-05, 'epoch': 0.51}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 15:52:43,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.80 | bwd_microstep: 1495.30 | bwd_inner_microstep: 1495.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 15:52:45,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1892
[2024-06-10 15:52:46,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.69 | bwd_microstep: 747.00 | bwd_inner_microstep: 746.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3776
[2024-06-10 15:52:48,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1372.14 | bwd_inner_microstep: 1372.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 15:52:50,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.17 | bwd_microstep: 1244.38 | bwd_inner_microstep: 1244.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 15:52:51,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.15 | bwd_microstep: 1247.54 | bwd_inner_microstep: 1247.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 15:52:53,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1412.02 | bwd_inner_microstep: 1411.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-10 15:52:55,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.97 | bwd_microstep: 1157.75 | bwd_inner_microstep: 1157.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732
[2024-06-10 15:52:57,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.70 | bwd_microstep: 1633.03 | bwd_inner_microstep: 1633.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 15:52:58,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.06 | bwd_microstep: 804.66 | bwd_inner_microstep: 804.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 15:53:00,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1343.59 | bwd_inner_microstep: 1343.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2249
[2024-06-10 15:53:01,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.21 | bwd_microstep: 842.05 | bwd_inner_microstep: 842.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 15:53:02,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.26 | bwd_microstep: 809.61 | bwd_inner_microstep: 809.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 15:53:04,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.59 | bwd_microstep: 1482.06 | bwd_inner_microstep: 1482.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 15:53:06,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.42 | bwd_microstep: 1476.17 | bwd_inner_microstep: 1476.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3641
[2024-06-10 15:53:09,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.98 | bwd_microstep: 1710.10 | bwd_inner_microstep: 1710.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061
[2024-06-10 15:53:10,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.06 | bwd_microstep: 910.82 | bwd_inner_microstep: 910.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3543
[2024-06-10 15:53:12,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1450.21 | bwd_inner_microstep: 1450.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 15:53:14,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.72 | bwd_microstep: 1301.49 | bwd_inner_microstep: 1301.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 15:53:16,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.06 | bwd_microstep: 1392.63 | bwd_inner_microstep: 1392.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3616
[2024-06-10 15:53:18,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.33 | bwd_microstep: 1537.90 | bwd_inner_microstep: 1537.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 15:53:20,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1374.95 | bwd_inner_microstep: 1374.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 15:53:22,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1411.30 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2542
[2024-06-10 15:53:23,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.99 | bwd_microstep: 969.50 | bwd_inner_microstep: 969.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-10 15:53:25,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1324.60 | bwd_inner_microstep: 1324.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519
[2024-06-10 15:53:27,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1414.67 | bwd_inner_microstep: 1414.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2715
[2024-06-10 15:53:28,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.81 | bwd_microstep: 1130.95 | bwd_inner_microstep: 1130.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812
[2024-06-10 15:53:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.20 | bwd_microstep: 1719.42 | bwd_inner_microstep: 1719.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-10 15:53:33,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.75 | bwd_microstep: 1545.98 | bwd_inner_microstep: 1545.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3561
[2024-06-10 15:53:35,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.99 | bwd_microstep: 1210.11 | bwd_inner_microstep: 1210.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3743
[2024-06-10 15:53:37,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.88 | bwd_microstep: 1602.41 | bwd_inner_microstep: 1602.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064
[2024-06-10 15:53:42,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 15:53:42,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.60 | bwd_microstep: 5112.08 | bwd_inner_microstep: 1042.96 | bwd_allreduce_microstep: 4069.05 | step_microstep: 38.85
[2024-06-10 15:53:42,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15432.77 | bwd: 45434.40 | bwd_inner: 41364.42 | bwd_allreduce: 4069.29 | step: 40.27
{'loss': 1.2303, 'learning_rate': 2.0187667058003298e-05, 'epoch': 0.51}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447
[2024-06-10 15:53:44,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.28 | bwd_microstep: 1442.12 | bwd_inner_microstep: 1442.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2414
[2024-06-10 15:53:46,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.84 | bwd_microstep: 1000.33 | bwd_inner_microstep: 1000.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 15:53:47,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1249.66 | bwd_inner_microstep: 1249.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-10 15:53:50,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.83 | bwd_microstep: 1640.00 | bwd_inner_microstep: 1639.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3738
[2024-06-10 15:53:52,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1336.24 | bwd_inner_microstep: 1336.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 15:53:53,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.74 | bwd_microstep: 1186.54 | bwd_inner_microstep: 1186.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 15:53:55,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1275.19 | bwd_inner_microstep: 1275.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3717
[2024-06-10 15:53:57,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.16 | bwd_microstep: 1361.22 | bwd_inner_microstep: 1361.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3450
[2024-06-10 15:53:59,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1395.05 | bwd_inner_microstep: 1395.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426
[2024-06-10 15:54:01,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.00 | bwd_microstep: 1537.79 | bwd_inner_microstep: 1537.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-10 15:54:03,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.18 | bwd_microstep: 1715.60 | bwd_inner_microstep: 1715.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 15:54:05,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.56 | bwd_microstep: 1479.45 | bwd_inner_microstep: 1479.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377
[2024-06-10 15:54:07,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.09 | bwd_microstep: 1239.74 | bwd_inner_microstep: 1239.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3686
[2024-06-10 15:54:09,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.71 | bwd_microstep: 1822.21 | bwd_inner_microstep: 1822.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 15:54:11,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.77 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1483.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 15:54:13,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1393.56 | bwd_inner_microstep: 1393.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 15:54:16,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1509.47 | bwd_inner_microstep: 1509.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3564
[2024-06-10 15:54:17,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.29 | bwd_microstep: 1204.22 | bwd_inner_microstep: 1204.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 15:54:19,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1377.39 | bwd_inner_microstep: 1377.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 15:54:21,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.52 | bwd_microstep: 1254.61 | bwd_inner_microstep: 1254.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2060
[2024-06-10 15:54:22,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.20 | bwd_microstep: 752.34 | bwd_inner_microstep: 752.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 15:54:24,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.11 | bwd_microstep: 1555.03 | bwd_inner_microstep: 1555.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 15:54:26,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1430.58 | bwd_inner_microstep: 1430.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 15:54:28,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.43 | bwd_microstep: 1459.92 | bwd_inner_microstep: 1459.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-10 15:54:30,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1530.72 | bwd_inner_microstep: 1530.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 15:54:31,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.43 | bwd_microstep: 978.86 | bwd_inner_microstep: 978.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3777
[2024-06-10 15:54:34,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1498.67 | bwd_inner_microstep: 1498.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053
[2024-06-10 15:54:35,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.50 | bwd_microstep: 913.39 | bwd_inner_microstep: 913.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 15:54:37,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1348.67 | bwd_inner_microstep: 1348.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3539
[2024-06-10 15:54:39,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.22 | bwd_microstep: 1518.81 | bwd_inner_microstep: 1518.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 15:54:41,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1497.51 | bwd_inner_microstep: 1497.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033
[2024-06-10 15:54:43,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.17 | optimizer_step: 6.58
[2024-06-10 15:54:43,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.34 | bwd_microstep: 1666.29 | bwd_inner_microstep: 908.21 | bwd_allreduce_microstep: 758.03 | step_microstep: 37.74
[2024-06-10 15:54:43,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16155.70 | bwd: 44054.26 | bwd_inner: 43295.33 | bwd_allreduce: 758.25 | step: 39.24
{'loss': 1.2396, 'learning_rate': 2.0150134439563667e-05, 'epoch': 0.51}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-10 15:54:45,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1442.91 | bwd_inner_microstep: 1442.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 15:54:47,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1374.64 | bwd_inner_microstep: 1374.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 15:54:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1447.34 | bwd_inner_microstep: 1447.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 15:54:51,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.99 | bwd_microstep: 1275.54 | bwd_inner_microstep: 1275.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-10 15:54:53,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.14 | bwd_microstep: 1534.80 | bwd_inner_microstep: 1534.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3491
[2024-06-10 15:54:54,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.25 | bwd_microstep: 1233.96 | bwd_inner_microstep: 1233.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191
[2024-06-10 15:54:56,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.73 | bwd_microstep: 950.58 | bwd_inner_microstep: 950.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2640
[2024-06-10 15:54:57,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.34 | bwd_microstep: 1051.23 | bwd_inner_microstep: 1051.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3481
[2024-06-10 15:54:59,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.25 | bwd_microstep: 1344.67 | bwd_inner_microstep: 1344.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 15:55:01,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 1385.34 | bwd_inner_microstep: 1385.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 15:55:03,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.01 | bwd_microstep: 1484.17 | bwd_inner_microstep: 1484.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3473
[2024-06-10 15:55:05,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.99 | bwd_microstep: 1574.12 | bwd_inner_microstep: 1574.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3712
[2024-06-10 15:55:07,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.80 | bwd_microstep: 1690.03 | bwd_inner_microstep: 1690.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3460
[2024-06-10 15:55:10,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.36 | bwd_microstep: 1516.04 | bwd_inner_microstep: 1516.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3480
[2024-06-10 15:55:11,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1433.13 | bwd_inner_microstep: 1433.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-10 15:55:13,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.18 | bwd_microstep: 1429.89 | bwd_inner_microstep: 1429.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3894
[2024-06-10 15:55:16,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.55 | bwd_microstep: 1685.34 | bwd_inner_microstep: 1685.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 15:55:17,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.61 | bwd_microstep: 799.02 | bwd_inner_microstep: 798.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2945
[2024-06-10 15:55:18,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.61 | bwd_microstep: 1007.58 | bwd_inner_microstep: 1007.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 15:55:21,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.54 | bwd_microstep: 1656.51 | bwd_inner_microstep: 1656.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 15:55:23,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294
[2024-06-10 15:55:24,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.64 | bwd_microstep: 878.26 | bwd_inner_microstep: 878.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 15:55:26,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1637.71 | bwd_inner_microstep: 1637.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-10 15:55:28,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.85 | bwd_microstep: 1461.60 | bwd_inner_microstep: 1461.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815
[2024-06-10 15:55:30,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.91 | bwd_microstep: 1608.71 | bwd_inner_microstep: 1608.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2297
[2024-06-10 15:55:31,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.10 | bwd_microstep: 848.61 | bwd_inner_microstep: 848.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-10 15:55:33,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.94 | bwd_microstep: 1308.19 | bwd_inner_microstep: 1308.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 15:55:35,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.77 | bwd_microstep: 1405.36 | bwd_inner_microstep: 1405.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 15:55:37,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1499.33 | bwd_inner_microstep: 1499.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 15:55:39,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.05 | bwd_microstep: 1491.82 | bwd_inner_microstep: 1491.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048
[2024-06-10 15:55:40,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.58 | bwd_microstep: 809.46 | bwd_inner_microstep: 809.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3768
[2024-06-10 15:55:45,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.36 | optimizer_step: 6.61
[2024-06-10 15:55:45,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.94 | bwd_microstep: 4072.52 | bwd_inner_microstep: 1815.54 | bwd_allreduce_microstep: 2256.91 | step_microstep: 38.81
[2024-06-10 15:55:45,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16171.86 | bwd: 45751.18 | bwd_inner: 43493.35 | bwd_allreduce: 2257.15 | step: 40.25

 51%|█████     | 881/1726 [15:13:15<14:40:23, 62.51s/it]
 51%|█████     | 882/1726 [15:14:16<14:31:40, 61.97s/it]
 51%|█████     | 883/1726 [15:15:18<14:29:59, 61.92s/it]
 51%|█████     | 884/1726 [15:16:19<14:25:55, 61.70s/it]
 51%|█████▏    | 885/1726 [15:17:20<14:20:02, 61.36s/it]
 51%|█████▏    | 886/1726 [15:18:22<14:22:47, 6
{'loss': 1.1994, 'learning_rate': 2.0112601292351322e-05, 'epoch': 0.51}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1852
[2024-06-10 15:55:46,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 258.55 | bwd_microstep: 665.95 | bwd_inner_microstep: 665.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 15:55:48,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1281.22 | bwd_inner_microstep: 1281.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1934
[2024-06-10 15:55:49,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.24 | bwd_microstep: 851.23 | bwd_inner_microstep: 851.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 15:55:51,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.65 | bwd_microstep: 1376.25 | bwd_inner_microstep: 1376.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 15:55:53,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.11 | bwd_microstep: 1538.81 | bwd_inner_microstep: 1538.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 15:55:55,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.12 | bwd_microstep: 1376.48 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 15:55:57,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1377.21 | bwd_inner_microstep: 1377.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 15:55:59,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.43 | bwd_microstep: 1526.96 | bwd_inner_microstep: 1526.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 15:56:01,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 1341.42 | bwd_inner_microstep: 1341.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 15:56:03,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1293.93 | bwd_inner_microstep: 1293.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 15:56:04,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2140
[2024-06-10 15:56:06,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.99 | bwd_microstep: 864.54 | bwd_inner_microstep: 864.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-10 15:56:08,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.36 | bwd_microstep: 1618.86 | bwd_inner_microstep: 1618.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3452
[2024-06-10 15:56:10,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1400.33 | bwd_inner_microstep: 1400.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2100
[2024-06-10 15:56:11,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.89 | bwd_microstep: 789.48 | bwd_inner_microstep: 789.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 15:56:13,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1253.77 | bwd_inner_microstep: 1253.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3649
[2024-06-10 15:56:14,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.70 | bwd_microstep: 1357.78 | bwd_inner_microstep: 1357.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3803
[2024-06-10 15:56:16,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.15 | bwd_microstep: 1484.14 | bwd_inner_microstep: 1484.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 15:56:18,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.66 | bwd_microstep: 1253.59 | bwd_inner_microstep: 1253.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 15:56:20,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.08 | bwd_microstep: 1374.17 | bwd_inner_microstep: 1374.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 15:56:22,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.79 | bwd_microstep: 1499.73 | bwd_inner_microstep: 1499.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 15:56:24,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1412.27 | bwd_inner_microstep: 1412.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 15:56:26,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.33 | bwd_microstep: 1557.82 | bwd_inner_microstep: 1557.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3603
[2024-06-10 15:56:28,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.71 | bwd_microstep: 1244.98 | bwd_inner_microstep: 1244.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721
[2024-06-10 15:56:30,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1335.65 | bwd_inner_microstep: 1335.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 15:56:31,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.74 | bwd_microstep: 802.77 | bwd_inner_microstep: 802.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 15:56:33,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1390.12 | bwd_inner_microstep: 1390.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 15:56:35,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1393.30 | bwd_inner_microstep: 1393.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 15:56:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.40 | bwd_microstep: 1645.11 | bwd_inner_microstep: 1645.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 15:56:39,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 1550.41 | bwd_inner_microstep: 1550.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2000
[2024-06-10 15:56:40,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.12 | bwd_microstep: 861.12 | bwd_inner_microstep: 861.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2116
[2024-06-10 15:56:47,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 15:56:47,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.45 | bwd_microstep: 6622.95 | bwd_inner_microstep: 976.51 | bwd_allreduce_microstep: 5646.37 | step_microstep: 38.73
[2024-06-10 15:56:47,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15335.14 | bwd: 46589.84 | bwd_inner: 40942.51 | bwd_allreduce: 5646.63 | step: 40.23
{'loss': 1.3055, 'learning_rate': 2.0075067748557808e-05, 'epoch': 0.51}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3444
[2024-06-10 15:56:49,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.34 | bwd_microstep: 1541.54 | bwd_inner_microstep: 1541.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4012
[2024-06-10 15:56:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.75 | bwd_microstep: 1513.35 | bwd_inner_microstep: 1513.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4263
[2024-06-10 15:56:54,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.19 | bwd_microstep: 1463.29 | bwd_inner_microstep: 1463.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 15:56:56,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.98 | bwd_microstep: 1450.89 | bwd_inner_microstep: 1450.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1884
[2024-06-10 15:56:57,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.97 | bwd_microstep: 723.94 | bwd_inner_microstep: 723.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750
[2024-06-10 15:56:59,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.94 | bwd_microstep: 1435.07 | bwd_inner_microstep: 1435.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 15:57:00,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1279.44 | bwd_inner_microstep: 1279.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-10 15:57:01,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.45 | bwd_microstep: 805.99 | bwd_inner_microstep: 805.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 15:57:03,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1434.47 | bwd_inner_microstep: 1434.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 15:57:05,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1385.87 | bwd_inner_microstep: 1385.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1913
[2024-06-10 15:57:06,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.84 | bwd_microstep: 716.02 | bwd_inner_microstep: 716.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 15:57:08,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.08 | bwd_microstep: 1313.16 | bwd_inner_microstep: 1313.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 15:57:10,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1381.81 | bwd_inner_microstep: 1381.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3526
[2024-06-10 15:57:12,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.05 | bwd_microstep: 1687.14 | bwd_inner_microstep: 1687.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3421
[2024-06-10 15:57:14,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.67 | bwd_microstep: 1372.13 | bwd_inner_microstep: 1372.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 15:57:16,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.70 | bwd_microstep: 1417.78 | bwd_inner_microstep: 1417.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2001
[2024-06-10 15:57:17,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.41 | bwd_microstep: 709.44 | bwd_inner_microstep: 709.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-10 15:57:19,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.75 | bwd_microstep: 1307.54 | bwd_inner_microstep: 1307.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 15:57:21,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.72 | bwd_microstep: 1425.73 | bwd_inner_microstep: 1425.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3530
[2024-06-10 15:57:23,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.46 | bwd_microstep: 1453.01 | bwd_inner_microstep: 1452.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626
[2024-06-10 15:57:25,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.80 | bwd_microstep: 1610.11 | bwd_inner_microstep: 1610.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 15:57:27,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1392.01 | bwd_inner_microstep: 1391.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1990
[2024-06-10 15:57:28,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.95 | bwd_microstep: 708.03 | bwd_inner_microstep: 708.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 15:57:30,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1493.40 | bwd_inner_microstep: 1493.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937
[2024-06-10 15:57:31,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.95 | bwd_microstep: 761.65 | bwd_inner_microstep: 761.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941
[2024-06-10 15:57:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.00 | bwd_microstep: 699.19 | bwd_inner_microstep: 699.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 15:57:34,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.03 | bwd_microstep: 1347.32 | bwd_inner_microstep: 1347.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-10 15:57:36,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.12 | bwd_microstep: 1543.76 | bwd_inner_microstep: 1543.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2234
[2024-06-10 15:57:38,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.34 | bwd_microstep: 1060.34 | bwd_inner_microstep: 1060.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3425
[2024-06-10 15:57:40,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.44 | bwd_microstep: 1494.25 | bwd_inner_microstep: 1494.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3398
[2024-06-10 15:57:42,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.31 | bwd_microstep: 1490.42 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 15:57:49,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 15:57:49,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.07 | bwd_microstep: 6570.94 | bwd_inner_microstep: 2018.65 | bwd_allreduce_microstep: 4552.24 | step_microstep: 38.17
[2024-06-10 15:57:49,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15353.76 | bwd: 45989.06 | bwd_inner: 41435.92 | bwd_allreduce: 4552.46 | step: 39.76
{'loss': 1.2174, 'learning_rate': 2.0037533940376083e-05, 'epoch': 0.51}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 15:57:51,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.94 | bwd_microstep: 1332.49 | bwd_inner_microstep: 1332.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 15:57:53,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.28 | bwd_microstep: 1279.68 | bwd_inner_microstep: 1279.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2334
[2024-06-10 15:57:54,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.61 | bwd_microstep: 982.44 | bwd_inner_microstep: 982.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 15:57:55,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.89 | bwd_microstep: 971.55 | bwd_inner_microstep: 971.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 15:57:57,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 1246.81 | bwd_inner_microstep: 1246.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 15:57:58,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.24 | bwd_microstep: 803.51 | bwd_inner_microstep: 803.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1894
[2024-06-10 15:57:59,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.06 | bwd_microstep: 712.45 | bwd_inner_microstep: 712.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 15:58:01,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.85 | bwd_microstep: 1279.93 | bwd_inner_microstep: 1279.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2455
[2024-06-10 15:58:02,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.41 | bwd_microstep: 977.97 | bwd_inner_microstep: 977.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2994
[2024-06-10 15:58:04,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.45 | bwd_microstep: 1190.00 | bwd_inner_microstep: 1189.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 15:58:06,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.63 | bwd_microstep: 1477.77 | bwd_inner_microstep: 1477.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3447
[2024-06-10 15:58:08,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1578.68 | bwd_inner_microstep: 1578.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 15:58:10,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.50 | bwd_microstep: 1349.82 | bwd_inner_microstep: 1349.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3499
[2024-06-10 15:58:12,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.77 | bwd_microstep: 1679.38 | bwd_inner_microstep: 1679.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 15:58:14,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.10 | bwd_microstep: 1391.35 | bwd_inner_microstep: 1391.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 15:58:16,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1380.56 | bwd_inner_microstep: 1380.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-10 15:58:17,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.61 | bwd_microstep: 709.66 | bwd_inner_microstep: 709.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 15:58:19,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1412.32 | bwd_inner_microstep: 1412.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3627
[2024-06-10 15:58:21,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1444.96 | bwd_inner_microstep: 1444.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 15:58:22,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.77 | bwd_microstep: 798.44 | bwd_inner_microstep: 798.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466
[2024-06-10 15:58:24,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.75 | bwd_microstep: 1211.78 | bwd_inner_microstep: 1211.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441
[2024-06-10 15:58:25,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.48 | bwd_microstep: 1156.84 | bwd_inner_microstep: 1156.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 15:58:28,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1508.93 | bwd_inner_microstep: 1508.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1893
[2024-06-10 15:58:29,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.71 | bwd_microstep: 682.82 | bwd_inner_microstep: 682.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1918
[2024-06-10 15:58:30,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.75 | bwd_microstep: 781.76 | bwd_inner_microstep: 781.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 15:58:32,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1411.90 | bwd_inner_microstep: 1411.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 15:58:33,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.34 | bwd_microstep: 1185.70 | bwd_inner_microstep: 1185.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3723
[2024-06-10 15:58:35,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1397.60 | bwd_inner_microstep: 1397.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 15:58:37,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1556.59 | bwd_inner_microstep: 1556.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077
[2024-06-10 15:58:39,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.84 | bwd_microstep: 916.30 | bwd_inner_microstep: 916.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2909
[2024-06-10 15:58:40,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.43 | bwd_microstep: 1200.94 | bwd_inner_microstep: 1200.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 15:58:51,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 15:58:51,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.92 | bwd_microstep: 9954.44 | bwd_inner_microstep: 1746.04 | bwd_allreduce_microstep: 8208.33 | step_microstep: 38.41
[2024-06-10 15:58:51,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14462.10 | bwd: 46965.39 | bwd_inner: 38756.15 | bwd_allreduce: 8208.57 | step: 39.83
{'loss': 1.2236, 'learning_rate': 2e-05, 'epoch': 0.52}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416
[2024-06-10 15:58:53,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.86 | bwd_microstep: 1435.34 | bwd_inner_microstep: 1435.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 15:58:55,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.51 | bwd_microstep: 1469.34 | bwd_inner_microstep: 1469.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874
[2024-06-10 15:58:57,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.61 | bwd_microstep: 1577.45 | bwd_inner_microstep: 1577.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 15:58:59,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.61 | bwd_microstep: 1478.95 | bwd_inner_microstep: 1478.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 15:59:01,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.23 | bwd_microstep: 1451.03 | bwd_inner_microstep: 1451.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 15:59:03,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1279.66 | bwd_inner_microstep: 1279.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 15:59:05,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.03 | bwd_microstep: 1383.09 | bwd_inner_microstep: 1383.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 15:59:07,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.34 | bwd_microstep: 1293.18 | bwd_inner_microstep: 1293.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 15:59:08,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1282.60 | bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 15:59:10,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.35 | bwd_microstep: 1423.85 | bwd_inner_microstep: 1423.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 15:59:11,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.90 | bwd_microstep: 795.55 | bwd_inner_microstep: 795.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 15:59:13,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.77 | bwd_microstep: 1284.42 | bwd_inner_microstep: 1284.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 15:59:15,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.80 | bwd_microstep: 1338.84 | bwd_inner_microstep: 1338.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 15:59:17,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1387.29 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 15:59:19,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1514.38 | bwd_inner_microstep: 1514.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 15:59:20,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.27 | bwd_microstep: 793.36 | bwd_inner_microstep: 793.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 15:59:22,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1390.87 | bwd_inner_microstep: 1390.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 15:59:24,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1523.52 | bwd_inner_microstep: 1523.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 15:59:26,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.15 | bwd_microstep: 1558.20 | bwd_inner_microstep: 1558.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2283
[2024-06-10 15:59:28,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.48 | bwd_microstep: 927.55 | bwd_inner_microstep: 927.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3932
[2024-06-10 15:59:29,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.84 | bwd_microstep: 1401.55 | bwd_inner_microstep: 1401.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 15:59:31,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1376.79 | bwd_inner_microstep: 1376.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822
[2024-06-10 15:59:34,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.30 | bwd_microstep: 1686.26 | bwd_inner_microstep: 1686.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 15:59:36,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1395.30 | bwd_inner_microstep: 1395.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2010
[2024-06-10 15:59:37,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.74 | bwd_microstep: 740.79 | bwd_inner_microstep: 740.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 15:59:39,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.44 | bwd_microstep: 1547.41 | bwd_inner_microstep: 1547.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3721
[2024-06-10 15:59:41,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1381.41 | bwd_inner_microstep: 1381.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 15:59:43,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.62 | bwd_microstep: 1472.55 | bwd_inner_microstep: 1472.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3438
[2024-06-10 15:59:45,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1415.58 | bwd_inner_microstep: 1415.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3554
[2024-06-10 15:59:47,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.73 | bwd_microstep: 1562.29 | bwd_inner_microstep: 1562.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 15:59:49,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1339.71 | bwd_inner_microstep: 1339.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3411
[2024-06-10 15:59:51,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-10 15:59:51,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.93 | bwd_microstep: 1820.68 | bwd_inner_microstep: 1503.34 | bwd_allreduce_microstep: 317.30 | step_microstep: 37.56
[2024-06-10 15:59:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16219.34 | bwd: 43728.80 | bwd_inner: 43410.61 | bwd_allreduce: 317.52 | step: 39.06
{'loss': 1.2677, 'learning_rate': 1.9962466059623928e-05, 'epoch': 0.52}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1978
[2024-06-10 15:59:52,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.77 | bwd_microstep: 823.41 | bwd_inner_microstep: 823.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3899
[2024-06-10 15:59:54,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1390.31 | bwd_inner_microstep: 1390.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 15:59:56,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.92 | bwd_microstep: 1150.13 | bwd_inner_microstep: 1150.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 15:59:58,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.15 | bwd_microstep: 1552.58 | bwd_inner_microstep: 1552.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 16:00:00,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.47 | bwd_microstep: 1251.47 | bwd_inner_microstep: 1251.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 16:00:02,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.56 | bwd_microstep: 1387.53 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 16:00:04,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.41 | bwd_microstep: 1625.79 | bwd_inner_microstep: 1625.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-10 16:00:06,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.97 | bwd_microstep: 1425.65 | bwd_inner_microstep: 1425.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 16:00:08,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.93 | bwd_microstep: 1406.07 | bwd_inner_microstep: 1406.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 16:00:09,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1280.62 | bwd_inner_microstep: 1280.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 16:00:12,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1492.99 | bwd_inner_microstep: 1492.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 16:00:13,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1378.92 | bwd_inner_microstep: 1378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1960
[2024-06-10 16:00:14,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.76 | bwd_microstep: 732.21 | bwd_inner_microstep: 732.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523
[2024-06-10 16:00:17,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.36 | bwd_microstep: 1583.05 | bwd_inner_microstep: 1583.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527
[2024-06-10 16:00:18,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.15 | bwd_microstep: 1322.94 | bwd_inner_microstep: 1322.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 16:00:20,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.55 | bwd_microstep: 1477.21 | bwd_inner_microstep: 1477.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 16:00:23,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1481.55 | bwd_inner_microstep: 1481.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675
[2024-06-10 16:00:25,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.11 | bwd_microstep: 1514.81 | bwd_inner_microstep: 1514.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 16:00:27,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1411.34 | bwd_inner_microstep: 1411.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449
[2024-06-10 16:00:28,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.46 | bwd_microstep: 1192.62 | bwd_inner_microstep: 1192.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 16:00:30,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1555.86 | bwd_inner_microstep: 1555.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2163
[2024-06-10 16:00:32,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.91 | bwd_microstep: 950.57 | bwd_inner_microstep: 950.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3551
[2024-06-10 16:00:33,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.83 | bwd_microstep: 1232.05 | bwd_inner_microstep: 1232.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 16:00:35,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.46 | bwd_microstep: 1292.86 | bwd_inner_microstep: 1292.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534
[2024-06-10 16:00:37,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.12 | bwd_microstep: 1229.32 | bwd_inner_microstep: 1229.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484
[2024-06-10 16:00:39,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.25 | bwd_microstep: 1218.21 | bwd_inner_microstep: 1218.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 16:00:41,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.33 | bwd_microstep: 1559.78 | bwd_inner_microstep: 1559.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 16:00:43,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1491.30 | bwd_inner_microstep: 1491.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 16:00:45,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1405.21 | bwd_inner_microstep: 1405.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3723
[2024-06-10 16:00:47,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.82 | bwd_microstep: 1335.20 | bwd_inner_microstep: 1335.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 16:00:48,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.12 | bwd_microstep: 1291.64 | bwd_inner_microstep: 1291.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 16:00:55,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 16:00:55,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.51 | bwd_microstep: 6294.29 | bwd_inner_microstep: 1663.98 | bwd_allreduce_microstep: 4630.27 | step_microstep: 38.01
[2024-06-10 16:00:55,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16098.25 | bwd: 47737.50 | bwd_inner: 43106.30 | bwd_allreduce: 4630.50 | step: 39.51
 51%|█████▏    | 886/1726 [15:18:22<14:22:47, 61.63s/it]
 51%|█████▏    | 887/1726 [15:19:24<14:24:23, 61.82s/it]
 51%|█████▏    | 888/1726 [15:20:26<14:22:46, 61.77s/it]
 52%|█████▏    | 889/1726 [15:21:28<14:21:37, 61.77s/it]
 52%|█████▏    | 890/1726 [15:22:28<14:14:23, 61.32s/it]
 52%|█████▏    | 891/1726 [15:23:32<14:25:13, 62.17s/it]
{'loss': 1.1821, 'learning_rate': 1.992493225144219e-05, 'epoch': 0.52}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3520
[2024-06-10 16:00:57,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.67 | bwd_microstep: 1336.21 | bwd_inner_microstep: 1336.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3921
[2024-06-10 16:00:59,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.39 | bwd_microstep: 1484.64 | bwd_inner_microstep: 1484.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 16:01:01,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.45 | bwd_microstep: 1546.68 | bwd_inner_microstep: 1546.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291
[2024-06-10 16:01:02,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.00 | bwd_microstep: 874.60 | bwd_inner_microstep: 874.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 16:01:04,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.13 | bwd_microstep: 1278.87 | bwd_inner_microstep: 1278.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 16:01:06,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-10 16:01:08,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.65 | bwd_microstep: 1180.53 | bwd_inner_microstep: 1180.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073
[2024-06-10 16:01:09,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.98 | bwd_microstep: 816.54 | bwd_inner_microstep: 816.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 16:01:11,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1243.27 | bwd_inner_microstep: 1243.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 16:01:12,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1246.61 | bwd_inner_microstep: 1246.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 16:01:14,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1479.99 | bwd_inner_microstep: 1479.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416
[2024-06-10 16:01:16,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.11 | bwd_microstep: 1438.75 | bwd_inner_microstep: 1438.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 16:01:18,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.58 | bwd_microstep: 1478.48 | bwd_inner_microstep: 1478.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3437
[2024-06-10 16:01:21,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1506.20 | bwd_inner_microstep: 1506.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 16:01:22,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.49 | bwd_microstep: 1314.63 | bwd_inner_microstep: 1314.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-10 16:01:24,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.26 | bwd_microstep: 1314.29 | bwd_inner_microstep: 1314.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 16:01:26,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1496.00 | bwd_inner_microstep: 1495.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2300
[2024-06-10 16:01:28,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.85 | bwd_microstep: 1004.87 | bwd_inner_microstep: 1004.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3813
[2024-06-10 16:01:30,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.34 | bwd_microstep: 1682.44 | bwd_inner_microstep: 1682.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614
[2024-06-10 16:01:32,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.03 | bwd_microstep: 1607.61 | bwd_inner_microstep: 1607.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 16:01:33,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.40 | bwd_microstep: 973.53 | bwd_inner_microstep: 973.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 16:01:35,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.36 | bwd_microstep: 1276.19 | bwd_inner_microstep: 1276.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 16:01:37,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.69 | bwd_microstep: 1553.78 | bwd_inner_microstep: 1553.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 16:01:39,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.97 | bwd_microstep: 1294.79 | bwd_inner_microstep: 1294.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 16:01:41,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.05 | bwd_microstep: 1458.08 | bwd_inner_microstep: 1458.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 16:01:43,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1345.34 | bwd_inner_microstep: 1345.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 16:01:45,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.32 | bwd_microstep: 1342.87 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3586
[2024-06-10 16:01:47,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.75 | bwd_microstep: 1704.87 | bwd_inner_microstep: 1704.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3605
[2024-06-10 16:01:50,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.34 | bwd_microstep: 1703.53 | bwd_inner_microstep: 1703.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3422
[2024-06-10 16:01:51,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.93 | bwd_microstep: 1208.43 | bwd_inner_microstep: 1208.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3421
[2024-06-10 16:01:53,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.40 | bwd_microstep: 1325.11 | bwd_inner_microstep: 1325.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 16:01:57,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.21 | optimizer_step: 6.63
[2024-06-10 16:01:57,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 2994.56 | bwd_inner_microstep: 1527.25 | bwd_allreduce_microstep: 1467.25 | step_microstep: 38.19
[2024-06-10 16:01:57,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16175.93 | bwd: 44897.51 | bwd_inner: 43429.35 | bwd_allreduce: 1467.48 | step: 39.68
{'loss': 1.2872, 'learning_rate': 1.988739870764869e-05, 'epoch': 0.52}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569
[2024-06-10 16:01:59,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1354.62 | bwd_inner_microstep: 1354.55 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 16:02:00,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.23 | bwd_microstep: 1274.64 | bwd_inner_microstep: 1274.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3871
[2024-06-10 16:02:02,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.85 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 16:02:04,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.68 | bwd_microstep: 1447.72 | bwd_inner_microstep: 1447.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 16:02:06,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1382.27 | bwd_inner_microstep: 1382.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495
[2024-06-10 16:02:08,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.95 | bwd_microstep: 1187.53 | bwd_inner_microstep: 1187.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1983
[2024-06-10 16:02:09,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.43 | bwd_microstep: 734.11 | bwd_inner_microstep: 734.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 16:02:11,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1384.01 | bwd_inner_microstep: 1383.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 16:02:13,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1252.51 | bwd_inner_microstep: 1252.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493
[2024-06-10 16:02:14,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1316.19 | bwd_inner_microstep: 1316.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955
[2024-06-10 16:02:16,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.94 | bwd_microstep: 820.44 | bwd_inner_microstep: 820.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3567
[2024-06-10 16:02:18,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.48 | bwd_microstep: 1558.14 | bwd_inner_microstep: 1558.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-10 16:02:20,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.02 | bwd_microstep: 1579.00 | bwd_inner_microstep: 1578.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774
[2024-06-10 16:02:22,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.83 | bwd_microstep: 1735.38 | bwd_inner_microstep: 1735.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812
[2024-06-10 16:02:25,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.41 | bwd_microstep: 1711.51 | bwd_inner_microstep: 1711.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 16:02:27,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.58 | bwd_microstep: 1484.50 | bwd_inner_microstep: 1484.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-10 16:02:29,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1442.25 | bwd_inner_microstep: 1442.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3637
[2024-06-10 16:02:31,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.24 | bwd_microstep: 1532.67 | bwd_inner_microstep: 1532.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3510
[2024-06-10 16:02:33,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1511.60 | bwd_inner_microstep: 1511.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1202
[2024-06-10 16:02:34,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 178.54 | bwd_microstep: 462.84 | bwd_inner_microstep: 462.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3392
[2024-06-10 16:02:35,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1391.63 | bwd_inner_microstep: 1391.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 16:02:38,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.02 | bwd_microstep: 1522.07 | bwd_inner_microstep: 1522.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3604
[2024-06-10 16:02:40,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.48 | bwd_microstep: 1584.72 | bwd_inner_microstep: 1584.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540
[2024-06-10 16:02:42,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1322.90 | bwd_inner_microstep: 1322.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 16:02:43,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.05 | bwd_microstep: 971.00 | bwd_inner_microstep: 970.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 16:02:44,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.75 | bwd_microstep: 976.23 | bwd_inner_microstep: 976.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551
[2024-06-10 16:02:46,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.21 | bwd_microstep: 1197.96 | bwd_inner_microstep: 1197.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2108
[2024-06-10 16:02:47,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.72 | bwd_microstep: 854.23 | bwd_inner_microstep: 854.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3539
[2024-06-10 16:02:49,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.85 | bwd_microstep: 1323.09 | bwd_inner_microstep: 1323.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 16:02:51,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1499.11 | bwd_inner_microstep: 1499.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 16:02:53,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3579
[2024-06-10 16:02:57,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.38 | optimizer_step: 6.61
[2024-06-10 16:02:57,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.29 | bwd_microstep: 3991.33 | bwd_inner_microstep: 1491.70 | bwd_allreduce_microstep: 2499.57 | step_microstep: 39.33
[2024-06-10 16:02:57,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15713.01 | bwd: 44617.33 | bwd_inner: 42116.80 | bwd_allreduce: 2499.82 | step: 40.83
{'loss': 1.1885, 'learning_rate': 1.984986556043634e-05, 'epoch': 0.52}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 16:02:59,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.30 | bwd_microstep: 1329.60 | bwd_inner_microstep: 1329.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3883
[2024-06-10 16:03:01,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.12 | bwd_microstep: 1350.07 | bwd_inner_microstep: 1350.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 16:03:03,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1271.29 | bwd_inner_microstep: 1271.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783
[2024-06-10 16:03:05,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.40 | bwd_microstep: 1640.79 | bwd_inner_microstep: 1640.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 16:03:07,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1492.22 | bwd_inner_microstep: 1492.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 16:03:09,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.10 | bwd_microstep: 1281.43 | bwd_inner_microstep: 1281.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 16:03:11,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1395.65 | bwd_inner_microstep: 1395.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 778
[2024-06-10 16:03:11,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.23 | bwd_microstep: 306.41 | bwd_inner_microstep: 306.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 16:03:13,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.78 | bwd_microstep: 1386.62 | bwd_inner_microstep: 1386.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 16:03:15,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.03 | bwd_microstep: 1285.65 | bwd_inner_microstep: 1285.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 16:03:17,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1420.36 | bwd_inner_microstep: 1420.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 16:03:18,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.36 | bwd_microstep: 802.59 | bwd_inner_microstep: 802.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 16:03:20,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.84 | bwd_microstep: 1245.97 | bwd_inner_microstep: 1245.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3609
[2024-06-10 16:03:22,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.63 | bwd_microstep: 1314.96 | bwd_inner_microstep: 1314.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3515
[2024-06-10 16:03:23,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1352.94 | bwd_inner_microstep: 1352.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3423
[2024-06-10 16:03:25,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1407.36 | bwd_inner_microstep: 1407.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 16:03:28,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.66 | bwd_microstep: 1600.46 | bwd_inner_microstep: 1600.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434
[2024-06-10 16:03:29,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.43 | bwd_microstep: 1408.00 | bwd_inner_microstep: 1407.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 648
[2024-06-10 16:03:30,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.31 | bwd_microstep: 273.92 | bwd_inner_microstep: 273.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-10 16:03:32,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.82 | bwd_microstep: 1654.40 | bwd_inner_microstep: 1654.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2287
[2024-06-10 16:03:34,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.07 | bwd_microstep: 1068.79 | bwd_inner_microstep: 1068.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839
[2024-06-10 16:03:36,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.41 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628
[2024-06-10 16:03:38,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1412.87 | bwd_inner_microstep: 1412.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-10 16:03:40,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1535.71 | bwd_inner_microstep: 1535.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781
[2024-06-10 16:03:42,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1653.82 | bwd_inner_microstep: 1653.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 16:03:44,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.78 | bwd_microstep: 1535.98 | bwd_inner_microstep: 1535.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 16:03:46,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1383.49 | bwd_inner_microstep: 1383.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 879
[2024-06-10 16:03:47,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.09 | bwd_microstep: 366.98 | bwd_inner_microstep: 366.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-10 16:03:49,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.05 | bwd_microstep: 1503.92 | bwd_inner_microstep: 1503.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3791
[2024-06-10 16:03:51,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.55 | bwd_microstep: 1681.32 | bwd_inner_microstep: 1681.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 16:03:53,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.71 | bwd_microstep: 1634.72 | bwd_inner_microstep: 1634.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2416
[2024-06-10 16:03:57,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 16:03:57,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.12 | bwd_microstep: 2937.51 | bwd_inner_microstep: 1225.02 | bwd_allreduce_microstep: 1712.43 | step_microstep: 37.95
[2024-06-10 16:03:57,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15504.04 | bwd: 43494.81 | bwd_inner: 41781.45 | bwd_allreduce: 1712.66 | step: 39.44
{'loss': 1.2182, 'learning_rate': 1.981233294199671e-05, 'epoch': 0.52}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 16:03:59,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.98 | bwd_microstep: 1370.68 | bwd_inner_microstep: 1370.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1353
[2024-06-10 16:03:59,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.44 | bwd_microstep: 516.01 | bwd_inner_microstep: 515.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 16:04:01,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1443.12 | bwd_inner_microstep: 1443.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 16:04:03,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1474.86 | bwd_inner_microstep: 1474.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 16:04:05,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.35 | bwd_microstep: 1379.36 | bwd_inner_microstep: 1379.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 16:04:07,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.89 | bwd_microstep: 1544.28 | bwd_inner_microstep: 1544.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 16:04:09,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1275.76 | bwd_inner_microstep: 1275.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710
[2024-06-10 16:04:11,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1428.49 | bwd_inner_microstep: 1428.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3485
[2024-06-10 16:04:13,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 1408.58 | bwd_inner_microstep: 1408.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2903
[2024-06-10 16:04:15,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.10 | bwd_microstep: 1088.12 | bwd_inner_microstep: 1088.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1892
[2024-06-10 16:04:15,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.06 | bwd_microstep: 683.21 | bwd_inner_microstep: 683.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 16:04:18,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.94 | bwd_microstep: 1480.49 | bwd_inner_microstep: 1480.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 16:04:20,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1480.86 | bwd_inner_microstep: 1480.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3535
[2024-06-10 16:04:21,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.36 | bwd_microstep: 1321.16 | bwd_inner_microstep: 1321.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3493
[2024-06-10 16:04:24,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1581.51 | bwd_inner_microstep: 1581.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507
[2024-06-10 16:04:26,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.39 | bwd_microstep: 1577.74 | bwd_inner_microstep: 1577.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 16:04:27,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.90 | bwd_microstep: 701.08 | bwd_inner_microstep: 701.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098
[2024-06-10 16:04:28,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.16 | bwd_microstep: 918.54 | bwd_inner_microstep: 918.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 16:04:30,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1354.61 | bwd_inner_microstep: 1354.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 16:04:32,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1414.82 | bwd_inner_microstep: 1414.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 16:04:34,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.93 | bwd_microstep: 1657.44 | bwd_inner_microstep: 1657.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 16:04:35,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.60 | bwd_microstep: 974.40 | bwd_inner_microstep: 974.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 16:04:37,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1412.89 | bwd_inner_microstep: 1412.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 16:04:39,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1300.16 | bwd_inner_microstep: 1300.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 16:04:41,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1448.11 | bwd_inner_microstep: 1448.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3817
[2024-06-10 16:04:43,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1504.40 | bwd_inner_microstep: 1504.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553
[2024-06-10 16:04:45,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1590.93 | bwd_inner_microstep: 1590.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 16:04:48,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.68 | bwd_microstep: 1595.33 | bwd_inner_microstep: 1595.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-10 16:04:50,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1535.72 | bwd_inner_microstep: 1535.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 16:04:52,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1397.83 | bwd_inner_microstep: 1397.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042
[2024-06-10 16:04:53,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.38 | bwd_microstep: 904.29 | bwd_inner_microstep: 904.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3562
[2024-06-10 16:04:59,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.26 | optimizer_step: 6.58
[2024-06-10 16:04:59,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.94 | bwd_microstep: 5515.20 | bwd_inner_microstep: 1775.85 | bwd_allreduce_microstep: 3739.29 | step_microstep: 38.40
[2024-06-10 16:04:59,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15826.08 | bwd: 46280.01 | bwd_inner: 42539.81 | bwd_allreduce: 3739.53 | step: 39.89
{'loss': 1.2076, 'learning_rate': 1.9774800984519485e-05, 'epoch': 0.52}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3597
[2024-06-10 16:05:01,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.56 | bwd_microstep: 1458.97 | bwd_inner_microstep: 1458.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2434
[2024-06-10 16:05:02,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.94 | bwd_microstep: 1008.77 | bwd_inner_microstep: 1008.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 16:05:04,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.22 | bwd_microstep: 1245.74 | bwd_inner_microstep: 1245.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3912
[2024-06-10 16:05:06,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.15 | bwd_microstep: 1488.51 | bwd_inner_microstep: 1488.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 16:05:08,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.96 | bwd_microstep: 1150.93 | bwd_inner_microstep: 1150.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3862
[2024-06-10 16:05:10,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.23 | bwd_microstep: 1665.96 | bwd_inner_microstep: 1665.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 16:05:12,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1348.31 | bwd_inner_microstep: 1348.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2311
[2024-06-10 16:05:13,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.95 | bwd_microstep: 787.11 | bwd_inner_microstep: 787.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 16:05:15,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.26 | bwd_microstep: 1342.76 | bwd_inner_microstep: 1342.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 16:05:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.14 | bwd_microstep: 797.68 | bwd_inner_microstep: 797.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 16:05:18,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1390.63 | bwd_inner_microstep: 1390.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 16:05:20,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.69 | bwd_microstep: 1391.56 | bwd_inner_microstep: 1391.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3417
[2024-06-10 16:05:22,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1421.53 | bwd_inner_microstep: 1421.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2875
[2024-06-10 16:05:24,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.26 | bwd_microstep: 1176.93 | bwd_inner_microstep: 1176.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 16:05:26,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.63 | bwd_microstep: 1511.41 | bwd_inner_microstep: 1511.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2175
[2024-06-10 16:05:27,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.92 | bwd_microstep: 948.22 | bwd_inner_microstep: 948.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 16:05:29,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.47 | bwd_microstep: 1491.24 | bwd_inner_microstep: 1491.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 16:05:31,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.78 | bwd_microstep: 1626.17 | bwd_inner_microstep: 1626.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3692
[2024-06-10 16:05:33,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.88 | bwd_microstep: 1358.62 | bwd_inner_microstep: 1358.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-10 16:05:35,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1512.57 | bwd_inner_microstep: 1512.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 16:05:37,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1502.27 | bwd_inner_microstep: 1502.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 16:05:39,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1556.21 | bwd_inner_microstep: 1556.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3548
[2024-06-10 16:05:41,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.54 | bwd_microstep: 1230.14 | bwd_inner_microstep: 1230.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3778
[2024-06-10 16:05:43,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.33 | bwd_microstep: 1411.00 | bwd_inner_microstep: 1410.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2432
[2024-06-10 16:05:44,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.23 | bwd_microstep: 941.09 | bwd_inner_microstep: 941.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 633
[2024-06-10 16:05:45,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 108.64 | bwd_microstep: 263.37 | bwd_inner_microstep: 263.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.26
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760
[2024-06-10 16:05:47,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.71 | bwd_microstep: 1549.51 | bwd_inner_microstep: 1549.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 16:05:49,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.77 | bwd_microstep: 1653.51 | bwd_inner_microstep: 1653.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3594
[2024-06-10 16:05:51,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.48 | bwd_microstep: 1553.44 | bwd_inner_microstep: 1553.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427
[2024-06-10 16:05:53,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.81 | bwd_microstep: 1447.99 | bwd_inner_microstep: 1447.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3491
[2024-06-10 16:05:55,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.89 | bwd_microstep: 1333.54 | bwd_inner_microstep: 1333.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 16:05:59,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 16:05:59,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 2991.42 | bwd_inner_microstep: 1753.43 | bwd_allreduce_microstep: 1237.94 | step_microstep: 37.83
[2024-06-10 16:05:59,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15745.93 | bwd: 43557.15 | bwd_inner: 42318.30 | bwd_allreduce: 1238.17 | step: 39.57
{'loss': 1.211, 'learning_rate': 1.973726982019202e-05, 'epoch': 0.52}


 52%|█████▏    | 891/1726 [15:23:32<14:25:13, 62.17s/it]
 52%|█████▏    | 892/1726 [15:24:33<14:21:00, 61.94s/it]
 52%|█████▏    | 893/1726 [15:25:34<14:14:37, 61.56s/it]
 52%|█████▏    | 894/1726 [15:26:33<14:04:20, 60.89s/it]
 52%|█████▏    | 895/1726 [15:27:36<14:09:48, 61.36s/it]
 52%|█████▏    | 896/1726 [15:28:35<14:01:40, 60.84s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553
[2024-06-10 16:06:01,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.50 | bwd_microstep: 1596.72 | bwd_inner_microstep: 1596.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 16:06:03,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1275.10 | bwd_inner_microstep: 1275.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 16:06:04,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813
[2024-06-10 16:06:06,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.99 | bwd_microstep: 1356.89 | bwd_inner_microstep: 1356.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 16:06:08,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.21 | bwd_microstep: 1381.26 | bwd_inner_microstep: 1381.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4081
[2024-06-10 16:06:10,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.81 | bwd_microstep: 1587.02 | bwd_inner_microstep: 1586.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 16:06:12,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.92 | bwd_microstep: 1252.55 | bwd_inner_microstep: 1252.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 16:06:14,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.59 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 16:06:15,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.27 | bwd_microstep: 797.57 | bwd_inner_microstep: 797.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-10 16:06:17,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.03 | bwd_microstep: 1305.30 | bwd_inner_microstep: 1305.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 16:06:19,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.80 | bwd_microstep: 1250.75 | bwd_inner_microstep: 1250.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 16:06:21,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.35 | bwd_microstep: 1529.88 | bwd_inner_microstep: 1529.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-10 16:06:22,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.88 | bwd_microstep: 805.70 | bwd_inner_microstep: 805.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2157
[2024-06-10 16:06:23,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.03 | bwd_microstep: 828.35 | bwd_inner_microstep: 828.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1857
[2024-06-10 16:06:24,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.52 | bwd_microstep: 707.09 | bwd_inner_microstep: 707.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3677
[2024-06-10 16:06:26,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.15 | bwd_microstep: 1551.69 | bwd_inner_microstep: 1551.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-10 16:06:28,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.09 | bwd_microstep: 1277.09 | bwd_inner_microstep: 1277.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 16:06:30,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.23 | bwd_microstep: 1247.10 | bwd_inner_microstep: 1247.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 16:06:32,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1493.27 | bwd_inner_microstep: 1493.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 16:06:34,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.94 | bwd_microstep: 1521.64 | bwd_inner_microstep: 1521.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2132
[2024-06-10 16:06:35,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.00 | bwd_microstep: 867.00 | bwd_inner_microstep: 866.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3604
[2024-06-10 16:06:37,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.50 | bwd_microstep: 1217.46 | bwd_inner_microstep: 1217.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 16:06:39,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1558.93 | bwd_inner_microstep: 1558.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 16:06:41,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1281.29 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729
[2024-06-10 16:06:43,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.02 | bwd_microstep: 1638.23 | bwd_inner_microstep: 1638.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 16:06:45,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1559.16 | bwd_inner_microstep: 1559.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3728
[2024-06-10 16:06:47,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.11 | bwd_microstep: 1466.84 | bwd_inner_microstep: 1466.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3602
[2024-06-10 16:06:49,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.41 | bwd_microstep: 1537.59 | bwd_inner_microstep: 1537.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3799
[2024-06-10 16:06:51,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.47 | bwd_microstep: 1644.99 | bwd_inner_microstep: 1644.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3802
[2024-06-10 16:06:53,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1513.54 | bwd_inner_microstep: 1513.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 16:06:55,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.56 | bwd_microstep: 1284.20 | bwd_inner_microstep: 1284.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 16:07:02,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 16:07:02,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 5931.34 | bwd_inner_microstep: 1803.09 | bwd_allreduce_microstep: 4128.20 | step_microstep: 37.87
[2024-06-10 16:07:02,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15873.67 | bwd: 46833.49 | bwd_inner: 42704.38 | bwd_allreduce: 4128.43 | step: 39.45
{'loss': 1.2167, 'learning_rate': 1.9699739581198888e-05, 'epoch': 0.52}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 16:07:04,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1466.38 | bwd_inner_microstep: 1466.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401
[2024-06-10 16:07:05,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.34 | bwd_microstep: 1207.94 | bwd_inner_microstep: 1207.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 16:07:07,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.30 | bwd_microstep: 1377.17 | bwd_inner_microstep: 1377.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860
[2024-06-10 16:07:10,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.18 | bwd_microstep: 1566.49 | bwd_inner_microstep: 1566.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3482
[2024-06-10 16:07:11,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.79 | bwd_microstep: 1215.81 | bwd_inner_microstep: 1215.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 16:07:13,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1381.41 | bwd_inner_microstep: 1381.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4121
[2024-06-10 16:07:15,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.25 | bwd_microstep: 1635.77 | bwd_inner_microstep: 1635.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-10 16:07:17,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.64 | bwd_microstep: 1416.18 | bwd_inner_microstep: 1416.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 16:07:18,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.21 | bwd_microstep: 798.50 | bwd_inner_microstep: 798.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2660
[2024-06-10 16:07:20,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.16 | bwd_microstep: 1024.84 | bwd_inner_microstep: 1024.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3530
[2024-06-10 16:07:22,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.08 | bwd_microstep: 1454.48 | bwd_inner_microstep: 1454.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672
[2024-06-10 16:07:24,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.18 | bwd_microstep: 1615.15 | bwd_inner_microstep: 1615.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 16:07:26,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.32 | bwd_microstep: 1519.77 | bwd_inner_microstep: 1519.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2105
[2024-06-10 16:07:27,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.12 | bwd_microstep: 855.35 | bwd_inner_microstep: 855.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 16:07:29,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1382.38 | bwd_inner_microstep: 1382.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533
[2024-06-10 16:07:31,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1585.14 | bwd_inner_microstep: 1585.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 16:07:33,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1254.28 | bwd_inner_microstep: 1254.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 16:07:34,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.56 | bwd_microstep: 807.26 | bwd_inner_microstep: 807.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 16:07:35,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.88 | bwd_microstep: 810.52 | bwd_inner_microstep: 810.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2446
[2024-06-10 16:07:37,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.83 | bwd_microstep: 948.55 | bwd_inner_microstep: 948.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469
[2024-06-10 16:07:38,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.37 | bwd_microstep: 1216.44 | bwd_inner_microstep: 1216.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 16:07:40,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.19 | bwd_microstep: 1280.07 | bwd_inner_microstep: 1280.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664
[2024-06-10 16:07:42,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.81 | bwd_microstep: 1627.45 | bwd_inner_microstep: 1627.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2048
[2024-06-10 16:07:44,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.46 | bwd_microstep: 907.98 | bwd_inner_microstep: 907.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-10 16:07:45,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.61 | bwd_microstep: 1190.14 | bwd_inner_microstep: 1190.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 16:07:47,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.49 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 16:07:49,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1403.71 | bwd_inner_microstep: 1403.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3814
[2024-06-10 16:07:51,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.05 | bwd_microstep: 1722.81 | bwd_inner_microstep: 1722.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-10 16:07:54,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1507.13 | bwd_inner_microstep: 1507.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 16:07:56,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.40 | bwd_microstep: 1644.78 | bwd_inner_microstep: 1644.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 16:07:58,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.65 | bwd_microstep: 1599.96 | bwd_inner_microstep: 1599.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3604
[2024-06-10 16:08:02,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-10 16:08:02,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.36 | bwd_microstep: 3605.41 | bwd_inner_microstep: 1930.27 | bwd_allreduce_microstep: 1675.09 | step_microstep: 37.75
[2024-06-10 16:08:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15872.83 | bwd: 44312.37 | bwd_inner: 42636.38 | bwd_allreduce: 1675.32 | step: 39.19
{'loss': 1.2287, 'learning_rate': 1.966221039972138e-05, 'epoch': 0.52}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 16:08:04,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.25 | bwd_microstep: 1468.91 | bwd_inner_microstep: 1468.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 16:08:06,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.72 | bwd_microstep: 1276.34 | bwd_inner_microstep: 1276.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-10 16:08:08,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1337.77 | bwd_inner_microstep: 1337.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-10 16:08:10,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.72 | bwd_microstep: 1546.79 | bwd_inner_microstep: 1546.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1880
[2024-06-10 16:08:11,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.90 | bwd_microstep: 742.03 | bwd_inner_microstep: 742.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841
[2024-06-10 16:08:13,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.99 | bwd_microstep: 1462.97 | bwd_inner_microstep: 1462.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 16:08:15,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.19 | bwd_microstep: 1285.71 | bwd_inner_microstep: 1285.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 16:08:17,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.00 | bwd_microstep: 1286.42 | bwd_inner_microstep: 1286.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 16:08:18,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1247.91 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 16:08:20,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1249.38 | bwd_inner_microstep: 1249.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 16:08:21,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.62 | bwd_microstep: 801.44 | bwd_inner_microstep: 801.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434
[2024-06-10 16:08:23,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.12 | bwd_microstep: 1191.00 | bwd_inner_microstep: 1190.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2457
[2024-06-10 16:08:24,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.42 | bwd_microstep: 1014.63 | bwd_inner_microstep: 1014.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3416
[2024-06-10 16:08:26,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.33 | bwd_microstep: 1330.44 | bwd_inner_microstep: 1330.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3694
[2024-06-10 16:08:28,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.40 | bwd_microstep: 1234.57 | bwd_inner_microstep: 1234.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 16:08:30,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.17 | bwd_microstep: 1287.38 | bwd_inner_microstep: 1287.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3686
[2024-06-10 16:08:31,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 1331.11 | bwd_inner_microstep: 1331.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540
[2024-06-10 16:08:33,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1326.99 | bwd_inner_microstep: 1326.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3681
[2024-06-10 16:08:35,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.47 | bwd_microstep: 1291.82 | bwd_inner_microstep: 1291.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 16:08:37,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.13 | bwd_microstep: 1391.58 | bwd_inner_microstep: 1391.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 16:08:39,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.91 | bwd_microstep: 1292.71 | bwd_inner_microstep: 1292.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 16:08:40,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.99 | bwd_microstep: 1183.74 | bwd_inner_microstep: 1183.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 16:08:43,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1608.55 | bwd_inner_microstep: 1608.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934
[2024-06-10 16:08:44,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.77 | bwd_microstep: 727.81 | bwd_inner_microstep: 727.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 16:08:46,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.47 | bwd_microstep: 1606.60 | bwd_inner_microstep: 1606.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3568
[2024-06-10 16:08:48,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.53 | bwd_microstep: 1346.88 | bwd_inner_microstep: 1346.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3813
[2024-06-10 16:08:50,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1624.67 | bwd_inner_microstep: 1624.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815
[2024-06-10 16:08:52,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.66 | bwd_microstep: 1754.91 | bwd_inner_microstep: 1754.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3607
[2024-06-10 16:08:55,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1566.26 | bwd_inner_microstep: 1566.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2264
[2024-06-10 16:08:56,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.23 | bwd_microstep: 875.80 | bwd_inner_microstep: 875.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806
[2024-06-10 16:08:58,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.40 | bwd_microstep: 1751.79 | bwd_inner_microstep: 1751.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 16:09:03,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.17 | optimizer_step: 6.62
[2024-06-10 16:09:03,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.38 | bwd_microstep: 4040.04 | bwd_inner_microstep: 1808.37 | bwd_allreduce_microstep: 2231.62 | step_microstep: 37.87
[2024-06-10 16:09:03,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15771.08 | bwd: 44484.98 | bwd_inner: 42252.46 | bwd_allreduce: 2231.85 | step: 39.33
{'loss': 1.2305, 'learning_rate': 1.962468240793709e-05, 'epoch': 0.52}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2026
[2024-06-10 16:09:04,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.22 | bwd_microstep: 903.41 | bwd_inner_microstep: 903.30 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3392
[2024-06-10 16:09:06,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.56 | bwd_microstep: 1141.97 | bwd_inner_microstep: 1141.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4508
[2024-06-10 16:09:08,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.55 | bwd_microstep: 1742.33 | bwd_inner_microstep: 1742.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3922
[2024-06-10 16:09:10,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.07 | bwd_microstep: 1485.76 | bwd_inner_microstep: 1485.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482
[2024-06-10 16:09:12,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1409.77 | bwd_inner_microstep: 1409.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4174
[2024-06-10 16:09:15,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.02 | bwd_microstep: 1747.84 | bwd_inner_microstep: 1747.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2217
[2024-06-10 16:09:16,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.19 | bwd_microstep: 956.30 | bwd_inner_microstep: 956.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2666
[2024-06-10 16:09:17,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.96 | bwd_microstep: 1024.12 | bwd_inner_microstep: 1024.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 16:09:19,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 16:09:21,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1519.45 | bwd_inner_microstep: 1519.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1976
[2024-06-10 16:09:22,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.64 | bwd_microstep: 892.36 | bwd_inner_microstep: 892.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 16:09:24,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1350.01 | bwd_inner_microstep: 1349.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1905
[2024-06-10 16:09:25,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.35 | bwd_microstep: 775.73 | bwd_inner_microstep: 775.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1930
[2024-06-10 16:09:26,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.12 | bwd_microstep: 726.36 | bwd_inner_microstep: 726.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2979
[2024-06-10 16:09:28,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.39 | bwd_microstep: 1138.09 | bwd_inner_microstep: 1138.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3515
[2024-06-10 16:09:30,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.57 | bwd_microstep: 1616.12 | bwd_inner_microstep: 1616.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3648
[2024-06-10 16:09:32,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.63 | bwd_microstep: 1706.43 | bwd_inner_microstep: 1706.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2023
[2024-06-10 16:09:34,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.80 | bwd_microstep: 903.85 | bwd_inner_microstep: 903.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 16:09:35,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.50 | bwd_microstep: 1157.98 | bwd_inner_microstep: 1157.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025
[2024-06-10 16:09:36,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.76 | bwd_microstep: 901.20 | bwd_inner_microstep: 901.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 16:09:39,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.16 | bwd_microstep: 1494.01 | bwd_inner_microstep: 1493.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 16:09:41,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.50 | bwd_microstep: 1461.36 | bwd_inner_microstep: 1461.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068
[2024-06-10 16:09:42,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.52 | bwd_microstep: 818.11 | bwd_inner_microstep: 818.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1437
[2024-06-10 16:09:42,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.30 | bwd_microstep: 535.29 | bwd_inner_microstep: 535.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3568
[2024-06-10 16:09:44,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1362.05 | bwd_inner_microstep: 1362.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3608
[2024-06-10 16:09:46,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1341.17 | bwd_inner_microstep: 1341.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 16:09:48,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1411.93 | bwd_inner_microstep: 1411.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 16:09:50,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1450.10 | bwd_inner_microstep: 1450.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 16:09:52,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.48 | bwd_microstep: 1287.56 | bwd_inner_microstep: 1287.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594
[2024-06-10 16:09:54,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.18 | bwd_microstep: 1311.62 | bwd_inner_microstep: 1311.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 16:09:56,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.53 | bwd_microstep: 1446.84 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 16:10:05,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 16:10:05,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.55 | bwd_microstep: 8471.44 | bwd_inner_microstep: 1862.60 | bwd_allreduce_microstep: 6608.79 | step_microstep: 39.07
[2024-06-10 16:10:05,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14943.80 | bwd: 46736.91 | bwd_inner: 40127.13 | bwd_allreduce: 6609.06 | step: 40.67
{'loss': 1.2033, 'learning_rate': 1.9587155738019412e-05, 'epoch': 0.52}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 16:10:07,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.85 | bwd_microstep: 1360.92 | bwd_inner_microstep: 1360.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4577
[2024-06-10 16:10:09,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.23 | bwd_microstep: 1780.46 | bwd_inner_microstep: 1780.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 16:10:11,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 1485.16 | bwd_inner_microstep: 1485.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 16:10:13,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.00 | bwd_microstep: 1146.80 | bwd_inner_microstep: 1146.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 16:10:15,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1379.98 | bwd_inner_microstep: 1379.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 16:10:17,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.69 | bwd_microstep: 1528.32 | bwd_inner_microstep: 1528.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 16:10:19,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 1283.27 | bwd_inner_microstep: 1283.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 16:10:20,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1250.60 | bwd_inner_microstep: 1250.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 16:10:22,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.63 | bwd_microstep: 1298.28 | bwd_inner_microstep: 1298.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2371
[2024-06-10 16:10:23,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.35 | bwd_microstep: 933.52 | bwd_inner_microstep: 933.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 16:10:25,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1252.71 | bwd_inner_microstep: 1252.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3433
[2024-06-10 16:10:27,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.78 | bwd_microstep: 1536.54 | bwd_inner_microstep: 1536.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661
[2024-06-10 16:10:30,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.03 | bwd_microstep: 1617.03 | bwd_inner_microstep: 1617.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511
[2024-06-10 16:10:31,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1345.52 | bwd_inner_microstep: 1345.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2047
[2024-06-10 16:10:33,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.88 | bwd_microstep: 904.92 | bwd_inner_microstep: 904.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3538
[2024-06-10 16:10:34,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.79 | bwd_microstep: 1196.34 | bwd_inner_microstep: 1196.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 16:10:36,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.88 | bwd_microstep: 1278.47 | bwd_inner_microstep: 1278.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 16:10:37,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.35 | bwd_microstep: 788.99 | bwd_inner_microstep: 788.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 16:10:39,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.58 | bwd_microstep: 1652.34 | bwd_inner_microstep: 1652.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 16:10:41,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.77 | bwd_microstep: 1293.78 | bwd_inner_microstep: 1293.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 16:10:43,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.69 | bwd_microstep: 1356.68 | bwd_inner_microstep: 1356.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 16:10:45,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1254.14 | bwd_inner_microstep: 1254.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000
[2024-06-10 16:10:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.06 | bwd_microstep: 800.59 | bwd_inner_microstep: 800.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 16:10:48,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1190.84 | bwd_inner_microstep: 1190.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093
[2024-06-10 16:10:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.71 | bwd_microstep: 928.03 | bwd_inner_microstep: 928.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 16:10:51,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.96 | bwd_microstep: 1474.23 | bwd_inner_microstep: 1474.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3550
[2024-06-10 16:10:53,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.96 | bwd_microstep: 1421.90 | bwd_inner_microstep: 1421.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-10 16:10:55,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.67 | bwd_microstep: 1694.25 | bwd_inner_microstep: 1694.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 16:10:57,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1531.67 | bwd_inner_microstep: 1531.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604
[2024-06-10 16:11:00,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.14 | bwd_microstep: 1595.68 | bwd_inner_microstep: 1595.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 16:11:01,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.82 | bwd_microstep: 972.98 | bwd_inner_microstep: 972.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 16:11:05,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.19 | optimizer_step: 6.62
[2024-06-10 16:11:05,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.81 | bwd_microstep: 3853.31 | bwd_inner_microstep: 1852.67 | bwd_allreduce_microstep: 2000.59 | step_microstep: 38.04
[2024-06-10 16:11:05,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15804.29 | bwd: 44388.26 | bwd_inner: 42386.77 | bwd_allreduce: 2000.82 | step: 39.50
{'loss': 1.2348, 'learning_rate': 1.9549630522137084e-05, 'epoch': 0.52}
 52%|█████▏    | 896/1726 [15:28:35<14:01:40, 60.84s/it]
 52%|█████▏    | 897/1726 [15:29:38<14:09:44, 61.50s/it]
 52%|█████▏    | 898/1726 [15:30:39<14:04:38, 61.21s/it]
 52%|█████▏    | 899/1726 [15:31:40<14:01:01, 61.02s/it]
 52%|█████▏    | 900/1726 [15:32:42<14:04:09, 61.32s/it]
 52%|█████▏    | 901/1726 [15:33:42<13:59:50, 61.08s/it]
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2622
[2024-06-10 16:11:07,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.22 | bwd_microstep: 1103.52 | bwd_inner_microstep: 1103.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3888
[2024-06-10 16:11:09,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1581.16 | bwd_inner_microstep: 1581.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3853
[2024-06-10 16:11:11,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1361.39 | bwd_inner_microstep: 1361.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788
[2024-06-10 16:11:13,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.91 | bwd_microstep: 1449.25 | bwd_inner_microstep: 1449.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3761
[2024-06-10 16:11:15,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.34 | bwd_microstep: 1339.49 | bwd_inner_microstep: 1339.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477
[2024-06-10 16:11:17,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.73 | bwd_microstep: 1341.30 | bwd_inner_microstep: 1341.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 16:11:19,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.26 | bwd_microstep: 1381.92 | bwd_inner_microstep: 1381.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 16:11:21,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1483.47 | bwd_inner_microstep: 1483.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 16:11:22,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1249.76 | bwd_inner_microstep: 1249.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3727
[2024-06-10 16:11:24,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.50 | bwd_microstep: 1483.41 | bwd_inner_microstep: 1483.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 16:11:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.64 | bwd_microstep: 1370.43 | bwd_inner_microstep: 1370.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 16:11:28,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1240.15 | bwd_inner_microstep: 1240.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 16:11:30,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.92 | bwd_microstep: 1474.46 | bwd_inner_microstep: 1474.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3658
[2024-06-10 16:11:32,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.76 | bwd_microstep: 1461.15 | bwd_inner_microstep: 1461.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 16:11:34,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.43 | bwd_microstep: 1310.30 | bwd_inner_microstep: 1310.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2156
[2024-06-10 16:11:35,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.57 | bwd_microstep: 946.83 | bwd_inner_microstep: 946.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638
[2024-06-10 16:11:37,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1415.22 | bwd_inner_microstep: 1415.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2123
[2024-06-10 16:11:38,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.09 | bwd_microstep: 828.68 | bwd_inner_microstep: 828.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3685
[2024-06-10 16:11:40,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.74 | bwd_microstep: 1324.91 | bwd_inner_microstep: 1324.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 16:11:42,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1506.91 | bwd_inner_microstep: 1506.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441
[2024-06-10 16:11:44,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.54 | bwd_microstep: 1153.72 | bwd_inner_microstep: 1153.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-10 16:11:46,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.09 | bwd_microstep: 1437.25 | bwd_inner_microstep: 1437.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 16:11:48,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.63 | bwd_microstep: 1502.90 | bwd_inner_microstep: 1502.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 16:11:50,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1407.05 | bwd_inner_microstep: 1407.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2273
[2024-06-10 16:11:51,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.56 | bwd_microstep: 936.62 | bwd_inner_microstep: 936.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3813
[2024-06-10 16:11:53,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1386.85 | bwd_inner_microstep: 1386.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564
[2024-06-10 16:11:55,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.40 | bwd_microstep: 1330.20 | bwd_inner_microstep: 1330.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591
[2024-06-10 16:11:57,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.48 | bwd_microstep: 1597.15 | bwd_inner_microstep: 1597.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3395
[2024-06-10 16:11:59,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.02 | bwd_microstep: 1439.39 | bwd_inner_microstep: 1439.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 16:12:01,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.77 | bwd_microstep: 1644.15 | bwd_inner_microstep: 1644.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815
[2024-06-10 16:12:04,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.08 | bwd_microstep: 1751.16 | bwd_inner_microstep: 1751.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 16:12:08,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 16:12:08,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.78 | bwd_microstep: 3500.06 | bwd_inner_microstep: 1710.70 | bwd_allreduce_microstep: 1789.31 | step_microstep: 37.99
[2024-06-10 16:12:08,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16374.37 | bwd: 45740.27 | bwd_inner: 43950.03 | bwd_allreduce: 1789.55 | step: 39.44
{'loss': 1.2557, 'learning_rate': 1.951210689245371e-05, 'epoch': 0.52}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 16:12:10,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.90 | bwd_microstep: 1327.46 | bwd_inner_microstep: 1327.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3998
[2024-06-10 16:12:12,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.23 | bwd_microstep: 1606.27 | bwd_inner_microstep: 1606.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 16:12:14,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.80 | bwd_microstep: 1479.83 | bwd_inner_microstep: 1479.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794
[2024-06-10 16:12:16,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.81 | bwd_microstep: 1545.20 | bwd_inner_microstep: 1545.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2212
[2024-06-10 16:12:17,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.83 | bwd_microstep: 956.25 | bwd_inner_microstep: 956.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 16:12:20,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1539.43 | bwd_inner_microstep: 1539.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1870
[2024-06-10 16:12:20,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.37 | bwd_microstep: 708.20 | bwd_inner_microstep: 708.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881
[2024-06-10 16:12:21,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.88 | bwd_microstep: 710.26 | bwd_inner_microstep: 710.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 16:12:23,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1250.98 | bwd_inner_microstep: 1250.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404
[2024-06-10 16:12:25,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.78 | bwd_microstep: 1307.88 | bwd_inner_microstep: 1307.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500
[2024-06-10 16:12:27,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.42 | bwd_microstep: 1580.24 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3017
[2024-06-10 16:12:29,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.26 | bwd_microstep: 1227.79 | bwd_inner_microstep: 1227.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3982
[2024-06-10 16:12:31,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.60 | bwd_microstep: 1605.05 | bwd_inner_microstep: 1605.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 16:12:33,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.30 | bwd_microstep: 1371.12 | bwd_inner_microstep: 1371.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2118
[2024-06-10 16:12:34,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.38 | bwd_microstep: 826.98 | bwd_inner_microstep: 826.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3645
[2024-06-10 16:12:36,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1437.21 | bwd_inner_microstep: 1437.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 16:12:38,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1417.42 | bwd_inner_microstep: 1417.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3636
[2024-06-10 16:12:40,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.05 | bwd_microstep: 1248.70 | bwd_inner_microstep: 1248.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941
[2024-06-10 16:12:41,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.72 | bwd_microstep: 697.70 | bwd_inner_microstep: 697.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 16:12:42,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.95 | bwd_microstep: 800.45 | bwd_inner_microstep: 800.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 16:12:44,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1402.21 | bwd_inner_microstep: 1402.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-10 16:12:46,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.46 | bwd_microstep: 1585.89 | bwd_inner_microstep: 1585.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-10 16:12:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1524.25 | bwd_inner_microstep: 1524.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 16:12:50,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.57 | bwd_microstep: 1422.93 | bwd_inner_microstep: 1422.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 16:12:52,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1278.45 | bwd_inner_microstep: 1278.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 16:12:54,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.85 | bwd_microstep: 1457.03 | bwd_inner_microstep: 1457.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3865
[2024-06-10 16:12:56,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.19 | bwd_microstep: 1667.71 | bwd_inner_microstep: 1667.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723
[2024-06-10 16:12:58,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.33 | bwd_microstep: 1431.19 | bwd_inner_microstep: 1431.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 16:13:00,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.69 | bwd_microstep: 1633.57 | bwd_inner_microstep: 1633.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 16:13:02,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.47 | bwd_microstep: 819.82 | bwd_inner_microstep: 819.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276
[2024-06-10 16:13:03,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.80 | bwd_microstep: 1070.52 | bwd_inner_microstep: 1070.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2281
[2024-06-10 16:13:09,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 16:13:09,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.76 | bwd_microstep: 5320.04 | bwd_inner_microstep: 1137.93 | bwd_allreduce_microstep: 4182.05 | step_microstep: 39.09
[2024-06-10 16:13:09,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15315.05 | bwd: 45258.04 | bwd_inner: 41075.07 | bwd_allreduce: 4182.28 | step: 40.66
{'loss': 1.2451, 'learning_rate': 1.947458498112732e-05, 'epoch': 0.52}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-10 16:13:11,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1397.13 | bwd_inner_microstep: 1397.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 16:13:12,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.86 | bwd_microstep: 1240.00 | bwd_inner_microstep: 1239.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4311
[2024-06-10 16:13:15,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.77 | bwd_microstep: 1680.94 | bwd_inner_microstep: 1680.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3866
[2024-06-10 16:13:17,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.50 | bwd_microstep: 1661.94 | bwd_inner_microstep: 1661.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701
[2024-06-10 16:13:19,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.34 | bwd_microstep: 1624.01 | bwd_inner_microstep: 1623.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 16:13:21,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.19 | bwd_microstep: 1239.41 | bwd_inner_microstep: 1239.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3512
[2024-06-10 16:13:23,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1248.78 | bwd_inner_microstep: 1248.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 16:13:25,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.30 | bwd_microstep: 1483.02 | bwd_inner_microstep: 1482.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2461
[2024-06-10 16:13:26,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.26 | bwd_microstep: 949.48 | bwd_inner_microstep: 949.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4018
[2024-06-10 16:13:28,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.19 | bwd_microstep: 1617.01 | bwd_inner_microstep: 1616.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 16:13:30,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.29 | bwd_microstep: 1524.76 | bwd_inner_microstep: 1524.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3496
[2024-06-10 16:13:32,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.17 | bwd_microstep: 1330.21 | bwd_inner_microstep: 1330.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845
[2024-06-10 16:13:34,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.45 | bwd_microstep: 1655.89 | bwd_inner_microstep: 1655.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 16:13:36,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1347.00 | bwd_inner_microstep: 1346.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708
[2024-06-10 16:13:39,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.77 | bwd_microstep: 1629.03 | bwd_inner_microstep: 1629.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3460
[2024-06-10 16:13:40,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.04 | bwd_microstep: 1211.79 | bwd_inner_microstep: 1211.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 16:13:42,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.98 | bwd_microstep: 1556.43 | bwd_inner_microstep: 1556.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 903
[2024-06-10 16:13:43,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.42 | bwd_microstep: 372.11 | bwd_inner_microstep: 372.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 16:13:45,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.05 | bwd_microstep: 1355.62 | bwd_inner_microstep: 1355.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463
[2024-06-10 16:13:47,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1405.24 | bwd_inner_microstep: 1405.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 16:13:49,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.03 | bwd_microstep: 1397.52 | bwd_inner_microstep: 1397.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-10 16:13:51,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.30 | bwd_microstep: 1428.58 | bwd_inner_microstep: 1428.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3857
[2024-06-10 16:13:53,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.19 | bwd_microstep: 1762.56 | bwd_inner_microstep: 1762.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548
[2024-06-10 16:13:55,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1428.59 | bwd_inner_microstep: 1428.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830
[2024-06-10 16:13:57,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.03 | bwd_microstep: 1355.98 | bwd_inner_microstep: 1355.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809
[2024-06-10 16:13:59,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.61 | bwd_microstep: 1750.84 | bwd_inner_microstep: 1750.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2112
[2024-06-10 16:14:01,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.39 | bwd_microstep: 921.39 | bwd_inner_microstep: 921.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-10 16:14:03,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.01 | bwd_microstep: 1576.74 | bwd_inner_microstep: 1576.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567
[2024-06-10 16:14:05,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.50 | bwd_microstep: 1300.27 | bwd_inner_microstep: 1300.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 16:14:07,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.99 | bwd_microstep: 1477.35 | bwd_inner_microstep: 1477.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 16:14:08,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1284.58 | bwd_inner_microstep: 1284.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3581
[2024-06-10 16:14:10,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.30 | optimizer_step: 6.63
[2024-06-10 16:14:10,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.66 | bwd_microstep: 1470.00 | bwd_inner_microstep: 1462.04 | bwd_allreduce_microstep: 7.90 | step_microstep: 39.23
[2024-06-10 16:14:10,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16693.75 | bwd: 44684.24 | bwd_inner: 44675.43 | bwd_allreduce: 8.13 | step: 40.70
{'loss': 1.2348, 'learning_rate': 1.9437064920309895e-05, 'epoch': 0.52}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486
[2024-06-10 16:14:12,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1310.70 | bwd_inner_microstep: 1310.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4087
[2024-06-10 16:14:14,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.04 | bwd_microstep: 1522.04 | bwd_inner_microstep: 1522.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 16:14:16,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1244.57 | bwd_inner_microstep: 1244.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 16:14:18,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.75 | bwd_microstep: 1649.82 | bwd_inner_microstep: 1649.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 16:14:21,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.90 | bwd_microstep: 1652.71 | bwd_inner_microstep: 1652.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415
[2024-06-10 16:14:22,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1312.66 | bwd_inner_microstep: 1312.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954
[2024-06-10 16:14:23,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.98 | bwd_microstep: 731.34 | bwd_inner_microstep: 731.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898
[2024-06-10 16:14:24,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.75 | bwd_microstep: 682.32 | bwd_inner_microstep: 682.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 16:14:26,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.58 | bwd_microstep: 1283.85 | bwd_inner_microstep: 1283.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3621
[2024-06-10 16:14:28,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.21 | bwd_microstep: 1538.46 | bwd_inner_microstep: 1538.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3683
[2024-06-10 16:14:30,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1327.14 | bwd_inner_microstep: 1327.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 16:14:32,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1347.82 | bwd_inner_microstep: 1347.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 16:14:34,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3641
[2024-06-10 16:14:36,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.36 | bwd_microstep: 1607.34 | bwd_inner_microstep: 1607.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 16:14:38,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1374.42 | bwd_inner_microstep: 1374.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3678
[2024-06-10 16:14:40,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.29 | bwd_microstep: 1554.49 | bwd_inner_microstep: 1554.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2103
[2024-06-10 16:14:41,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.56 | bwd_microstep: 856.37 | bwd_inner_microstep: 856.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 16:14:43,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.55 | bwd_microstep: 1253.84 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 16:14:44,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.20 | bwd_microstep: 802.45 | bwd_inner_microstep: 802.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3833
[2024-06-10 16:14:46,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.48 | bwd_microstep: 1607.33 | bwd_inner_microstep: 1607.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3677
[2024-06-10 16:14:48,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.62 | bwd_microstep: 1458.95 | bwd_inner_microstep: 1458.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 16:14:50,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.95 | bwd_microstep: 1294.35 | bwd_inner_microstep: 1294.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866
[2024-06-10 16:14:52,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.32 | bwd_microstep: 1567.25 | bwd_inner_microstep: 1567.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 16:14:54,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 16:14:56,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.55 | bwd_microstep: 1399.46 | bwd_inner_microstep: 1399.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 16:14:58,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.59 | bwd_microstep: 1544.97 | bwd_inner_microstep: 1544.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2233
[2024-06-10 16:15:00,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.05 | bwd_microstep: 925.77 | bwd_inner_microstep: 925.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-10 16:15:02,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.40 | bwd_microstep: 1751.85 | bwd_inner_microstep: 1751.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 16:15:04,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.64 | bwd_microstep: 1500.70 | bwd_inner_microstep: 1500.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3637
[2024-06-10 16:15:06,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.02 | bwd_microstep: 1579.82 | bwd_inner_microstep: 1579.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781
[2024-06-10 16:15:08,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.30 | bwd_microstep: 1649.08 | bwd_inner_microstep: 1649.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400
[2024-06-10 16:15:12,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.26 | optimizer_step: 6.62
[2024-06-10 16:15:12,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.64 | bwd_microstep: 3197.56 | bwd_inner_microstep: 1635.46 | bwd_allreduce_microstep: 1562.04 | step_microstep: 38.41
[2024-06-10 16:15:12,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16225.75 | bwd: 45190.28 | bwd_inner: 43627.32 | bwd_allreduce: 1562.28 | step: 39.92
{'loss': 1.2312, 'learning_rate': 1.93995468421469e-05, 'epoch': 0.52}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 16:15:14,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1368.80 | bwd_inner_microstep: 1368.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 16:15:16,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1348.06 | bwd_inner_microstep: 1348.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 16:15:18,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.68 | bwd_microstep: 1472.17 | bwd_inner_microstep: 1472.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025
[2024-06-10 16:15:19,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.09 | bwd_microstep: 809.32 | bwd_inner_microstep: 809.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 16:15:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.99 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 16:15:23,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.56 | bwd_microstep: 1381.01 | bwd_inner_microstep: 1380.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592
[2024-06-10 16:15:25,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.90 | bwd_microstep: 1306.94 | bwd_inner_microstep: 1306.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 16:15:26,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.44 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 16:15:28,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.09 | bwd_microstep: 1388.16 | bwd_inner_microstep: 1388.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 16:15:30,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1344.43 | bwd_inner_microstep: 1344.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3985
[2024-06-10 16:15:32,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.22 | bwd_microstep: 1603.91 | bwd_inner_microstep: 1603.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-10 16:15:33,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.28 | bwd_microstep: 789.19 | bwd_inner_microstep: 789.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408
[2024-06-10 16:15:35,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.35 | bwd_microstep: 1442.91 | bwd_inner_microstep: 1442.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385
[2024-06-10 16:15:37,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1433.29 | bwd_inner_microstep: 1433.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3622
[2024-06-10 16:15:39,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.22 | bwd_microstep: 1469.07 | bwd_inner_microstep: 1469.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3860
[2024-06-10 16:15:42,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.46 | bwd_microstep: 1593.04 | bwd_inner_microstep: 1593.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 16:15:44,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.89 | bwd_microstep: 1403.09 | bwd_inner_microstep: 1403.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 16:15:45,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.64 | bwd_microstep: 1322.41 | bwd_inner_microstep: 1322.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 16:15:48,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.05 | bwd_microstep: 1655.87 | bwd_inner_microstep: 1655.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 16:15:50,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.52 | bwd_microstep: 1555.99 | bwd_inner_microstep: 1555.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 16:15:52,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1453.58 | bwd_inner_microstep: 1453.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 16:15:54,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1397.93 | bwd_inner_microstep: 1397.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 16:15:56,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.52 | bwd_microstep: 1555.60 | bwd_inner_microstep: 1555.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 16:15:58,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.72 | bwd_microstep: 1359.17 | bwd_inner_microstep: 1359.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1927
[2024-06-10 16:15:59,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.15 | bwd_microstep: 806.78 | bwd_inner_microstep: 806.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2056
[2024-06-10 16:16:00,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.23 | bwd_microstep: 860.58 | bwd_inner_microstep: 860.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 16:16:02,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1301.18 | bwd_inner_microstep: 1301.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 16:16:04,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.51 | bwd_microstep: 1190.13 | bwd_inner_microstep: 1190.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 16:16:06,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.65 | bwd_microstep: 1470.97 | bwd_inner_microstep: 1470.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 16:16:08,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.57 | bwd_microstep: 1445.53 | bwd_inner_microstep: 1445.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3543
[2024-06-10 16:16:10,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1451.16 | bwd_inner_microstep: 1451.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3557
[2024-06-10 16:16:14,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 16:16:14,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.53 | bwd_microstep: 3269.79 | bwd_inner_microstep: 1741.60 | bwd_allreduce_microstep: 1528.15 | step_microstep: 38.43
[2024-06-10 16:16:14,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16155.20 | bwd: 44812.73 | bwd_inner: 43283.69 | bwd_allreduce: 1528.37 | step: 39.90
{'loss': 1.2117, 'learning_rate': 1.936203087877681e-05, 'epoch': 0.52}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 16:16:15,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1363.19 | bwd_inner_microstep: 1363.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 16:16:17,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1375.97 | bwd_inner_microstep: 1375.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 16:16:19,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.32 | bwd_microstep: 1340.35 | bwd_inner_microstep: 1340.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3417
[2024-06-10 16:16:21,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.91 | bwd_microstep: 1308.28 | bwd_inner_microstep: 1308.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 16:16:23,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1246.43 | bwd_inner_microstep: 1246.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3776
[2024-06-10 16:16:25,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.31 | bwd_microstep: 1343.48 | bwd_inner_microstep: 1343.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 16:16:26,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 1380.52 | bwd_inner_microstep: 1380.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 16:16:28,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1245.55 | bwd_inner_microstep: 1245.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-10 16:16:30,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.53 | bwd_microstep: 1185.99 | bwd_inner_microstep: 1185.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1924
[2024-06-10 16:16:31,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.27 | bwd_microstep: 848.52 | bwd_inner_microstep: 848.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 16:16:33,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1481.89 | bwd_inner_microstep: 1481.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3470
[2024-06-10 16:16:35,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.32 | bwd_microstep: 1531.62 | bwd_inner_microstep: 1531.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939
[2024-06-10 16:16:37,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.46 | bwd_microstep: 1684.62 | bwd_inner_microstep: 1684.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3483
[2024-06-10 16:16:39,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1431.23 | bwd_inner_microstep: 1431.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 16:16:41,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 1251.09 | bwd_inner_microstep: 1251.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 16:16:43,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.24 | bwd_microstep: 1392.22 | bwd_inner_microstep: 1392.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 16:16:45,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.41 | bwd_microstep: 1178.72 | bwd_inner_microstep: 1178.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 16:16:47,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.19 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 16:16:48,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.74 | bwd_microstep: 973.36 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843
[2024-06-10 16:16:50,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1464.34 | bwd_inner_microstep: 1464.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 16:16:52,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1491.89 | bwd_inner_microstep: 1491.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743
[2024-06-10 16:16:54,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.85 | bwd_microstep: 1440.49 | bwd_inner_microstep: 1440.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 16:16:56,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1395.78 | bwd_inner_microstep: 1395.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3547
[2024-06-10 16:16:58,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.16 | bwd_microstep: 1520.58 | bwd_inner_microstep: 1520.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 16:17:00,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1397.96 | bwd_inner_microstep: 1397.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-10 16:17:02,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.95 | bwd_microstep: 1429.07 | bwd_inner_microstep: 1429.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1930
[2024-06-10 16:17:03,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.63 | bwd_microstep: 763.15 | bwd_inner_microstep: 763.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2193
[2024-06-10 16:17:04,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.19 | bwd_microstep: 955.61 | bwd_inner_microstep: 955.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3445
[2024-06-10 16:17:06,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.50 | bwd_microstep: 1376.73 | bwd_inner_microstep: 1376.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801
[2024-06-10 16:17:09,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.05 | bwd_microstep: 1748.14 | bwd_inner_microstep: 1748.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 16:17:11,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.81 | bwd_microstep: 1456.31 | bwd_inner_microstep: 1456.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3776
[2024-06-10 16:17:14,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.63
[2024-06-10 16:17:14,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.02 | bwd_microstep: 2751.10 | bwd_inner_microstep: 1784.49 | bwd_allreduce_microstep: 966.55 | step_microstep: 37.48
[2024-06-10 16:17:14,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16128.62 | bwd: 44150.84 | bwd_inner: 43183.38 | bwd_allreduce: 966.78 | step: 38.99
 52%|█████▏    | 902/1726 [15:34:45<14:04:26, 61.49s/it]
 52%|█████▏    | 903/1726 [15:35:45<14:01:00, 61.31s/it]
 52%|█████▏    | 904/1726 [15:36:47<14:01:40, 61.44s/it]
 52%|█████▏    | 905/1726 [15:37:49<14:01:58, 61.53s/it]
 52%|█████▏    | 906/1726 [15:38:50<14:00:00, 61.46s/it]
 53%|█████▎    | 907/1726 [
{'loss': 1.2343, 'learning_rate': 1.932451716233064e-05, 'epoch': 0.53}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 16:17:16,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1310.05 | bwd_inner_microstep: 1310.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 16:17:18,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1377.77 | bwd_inner_microstep: 1377.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3899
[2024-06-10 16:17:20,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.54 | bwd_microstep: 1482.06 | bwd_inner_microstep: 1482.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016
[2024-06-10 16:17:21,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.86 | bwd_microstep: 800.08 | bwd_inner_microstep: 800.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 16:17:23,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.82 | bwd_microstep: 1288.52 | bwd_inner_microstep: 1288.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3505
[2024-06-10 16:17:25,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.53 | bwd_microstep: 1249.62 | bwd_inner_microstep: 1249.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 16:17:26,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.82 | bwd_microstep: 1276.64 | bwd_inner_microstep: 1276.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706
[2024-06-10 16:17:28,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.29 | bwd_microstep: 1421.01 | bwd_inner_microstep: 1420.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 16:17:30,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.41 | bwd_microstep: 1539.72 | bwd_inner_microstep: 1539.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-10 16:17:33,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1627.59 | bwd_inner_microstep: 1627.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 16:17:35,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.28 | bwd_microstep: 1487.71 | bwd_inner_microstep: 1487.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3518
[2024-06-10 16:17:36,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.89 | bwd_microstep: 1227.48 | bwd_inner_microstep: 1227.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1930
[2024-06-10 16:17:37,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.24 | bwd_microstep: 761.12 | bwd_inner_microstep: 761.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447
[2024-06-10 16:17:39,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.34 | bwd_microstep: 1446.53 | bwd_inner_microstep: 1446.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-10 16:17:42,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.40 | bwd_microstep: 1717.05 | bwd_inner_microstep: 1717.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3501
[2024-06-10 16:17:44,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.55 | bwd_microstep: 1612.64 | bwd_inner_microstep: 1612.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 16:17:46,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.41 | bwd_microstep: 1299.81 | bwd_inner_microstep: 1299.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 16:17:48,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1390.21 | bwd_inner_microstep: 1390.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 16:17:49,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.66 | bwd_microstep: 801.01 | bwd_inner_microstep: 800.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 16:17:51,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-10 16:17:52,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.46 | bwd_microstep: 796.52 | bwd_inner_microstep: 796.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 16:17:54,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1485.88 | bwd_inner_microstep: 1485.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920
[2024-06-10 16:17:55,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.88 | bwd_microstep: 687.31 | bwd_inner_microstep: 687.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 16:17:57,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1380.54 | bwd_inner_microstep: 1380.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 16:17:59,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1412.13 | bwd_inner_microstep: 1412.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3552
[2024-06-10 16:18:01,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.63 | bwd_microstep: 1526.05 | bwd_inner_microstep: 1526.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 16:18:03,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.29 | bwd_microstep: 1257.13 | bwd_inner_microstep: 1257.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3757
[2024-06-10 16:18:05,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.19 | bwd_microstep: 1738.70 | bwd_inner_microstep: 1738.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 16:18:07,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.94 | bwd_microstep: 1597.79 | bwd_inner_microstep: 1597.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 16:18:09,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.95 | bwd_microstep: 1548.51 | bwd_inner_microstep: 1548.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 16:18:11,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.41 | bwd_microstep: 1357.26 | bwd_inner_microstep: 1357.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3572
[2024-06-10 16:18:15,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.62
[2024-06-10 16:18:15,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.74 | bwd_microstep: 3263.75 | bwd_inner_microstep: 1878.84 | bwd_allreduce_microstep: 1384.86 | step_microstep: 39.40
[2024-06-10 16:18:15,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16059.88 | bwd: 44582.07 | bwd_inner: 43196.31 | bwd_allreduce: 1385.08 | step: 40.84
{'loss': 1.2875, 'learning_rate': 1.9287005824931514e-05, 'epoch': 0.53}
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878
[2024-06-10 16:18:16,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.00 | bwd_microstep: 738.12 | bwd_inner_microstep: 737.98 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 16:18:18,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1390.32 | bwd_inner_microstep: 1390.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 16:18:20,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 1283.07 | bwd_inner_microstep: 1283.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3868
[2024-06-10 16:18:22,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.72 | bwd_microstep: 1663.85 | bwd_inner_microstep: 1663.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 16:18:24,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.28 | bwd_microstep: 1482.19 | bwd_inner_microstep: 1482.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 16:18:26,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.04 | bwd_microstep: 1531.87 | bwd_inner_microstep: 1531.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3738
[2024-06-10 16:18:29,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1633.73 | bwd_inner_microstep: 1633.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2047
[2024-06-10 16:18:30,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.28 | bwd_microstep: 719.19 | bwd_inner_microstep: 719.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1890
[2024-06-10 16:18:30,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.28 | bwd_microstep: 683.07 | bwd_inner_microstep: 683.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1925
[2024-06-10 16:18:32,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.55 | bwd_microstep: 817.98 | bwd_inner_microstep: 817.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2397
[2024-06-10 16:18:33,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.55 | bwd_microstep: 958.40 | bwd_inner_microstep: 958.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 16:18:35,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.35 | bwd_microstep: 1617.43 | bwd_inner_microstep: 1617.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629
[2024-06-10 16:18:38,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.44 | bwd_microstep: 1706.16 | bwd_inner_microstep: 1706.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3639
[2024-06-10 16:18:40,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.57 | bwd_microstep: 1709.94 | bwd_inner_microstep: 1709.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-10 16:18:41,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.55 | bwd_microstep: 1186.25 | bwd_inner_microstep: 1186.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-10 16:18:43,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.76 | bwd_microstep: 1316.69 | bwd_inner_microstep: 1316.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 16:18:45,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.55 | bwd_microstep: 1486.79 | bwd_inner_microstep: 1486.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 16:18:47,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.63 | bwd_microstep: 1276.61 | bwd_inner_microstep: 1276.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 16:18:48,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.13 | bwd_microstep: 698.75 | bwd_inner_microstep: 698.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 16:18:50,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.60 | bwd_microstep: 1551.80 | bwd_inner_microstep: 1551.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 16:18:51,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.03 | bwd_microstep: 799.13 | bwd_inner_microstep: 799.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3608
[2024-06-10 16:18:53,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1461.60 | bwd_inner_microstep: 1461.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 16:18:55,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.10 | bwd_microstep: 1499.16 | bwd_inner_microstep: 1499.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 16:18:58,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.63 | bwd_microstep: 1497.35 | bwd_inner_microstep: 1497.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3506
[2024-06-10 16:19:00,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1549.35 | bwd_inner_microstep: 1549.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 16:19:02,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.85 | bwd_microstep: 1752.25 | bwd_inner_microstep: 1752.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3589
[2024-06-10 16:19:04,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.36 | bwd_microstep: 1671.29 | bwd_inner_microstep: 1671.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1936
[2024-06-10 16:19:05,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.10 | bwd_microstep: 758.95 | bwd_inner_microstep: 758.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3860
[2024-06-10 16:19:08,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.39 | bwd_microstep: 1594.47 | bwd_inner_microstep: 1594.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580
[2024-06-10 16:19:10,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.33 | bwd_microstep: 1455.29 | bwd_inner_microstep: 1455.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 16:19:12,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.83 | bwd_microstep: 1544.39 | bwd_inner_microstep: 1544.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3416
[2024-06-10 16:19:18,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 16:19:18,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.87 | bwd_microstep: 5656.98 | bwd_inner_microstep: 1562.56 | bwd_allreduce_microstep: 4094.37 | step_microstep: 38.42
[2024-06-10 16:19:18,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15797.60 | bwd: 46692.45 | bwd_inner: 42597.08 | bwd_allreduce: 4094.64 | step: 39.88
{'loss': 1.24, 'learning_rate': 1.9249496998694168e-05, 'epoch': 0.53}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 16:19:20,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.94 | bwd_microstep: 1366.89 | bwd_inner_microstep: 1366.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-10 16:19:22,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.19 | bwd_microstep: 1447.50 | bwd_inner_microstep: 1447.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3479
[2024-06-10 16:19:24,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.43 | bwd_microstep: 1442.49 | bwd_inner_microstep: 1442.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-10 16:19:26,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.29 | bwd_microstep: 1276.00 | bwd_inner_microstep: 1275.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 16:19:27,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.18 | bwd_microstep: 1248.03 | bwd_inner_microstep: 1248.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 16:19:28,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.70 | bwd_microstep: 698.25 | bwd_inner_microstep: 698.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 16:19:30,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.46 | bwd_microstep: 1451.29 | bwd_inner_microstep: 1451.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 16:19:32,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.27 | bwd_microstep: 1245.09 | bwd_inner_microstep: 1245.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 16:19:34,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.94 | bwd_microstep: 1384.39 | bwd_inner_microstep: 1384.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 16:19:36,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1254.52 | bwd_inner_microstep: 1254.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 16:19:38,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.47 | bwd_microstep: 1486.02 | bwd_inner_microstep: 1485.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3573
[2024-06-10 16:19:40,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.19 | bwd_microstep: 1432.46 | bwd_inner_microstep: 1432.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 16:19:42,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1483.26 | bwd_inner_microstep: 1483.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 16:19:44,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.27 | bwd_microstep: 1335.62 | bwd_inner_microstep: 1335.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108
[2024-06-10 16:19:45,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.02 | bwd_microstep: 817.78 | bwd_inner_microstep: 817.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3518
[2024-06-10 16:19:47,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.50 | bwd_microstep: 1334.45 | bwd_inner_microstep: 1334.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 16:19:48,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.01 | bwd_microstep: 1254.95 | bwd_inner_microstep: 1254.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 16:19:50,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1395.79 | bwd_inner_microstep: 1395.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-10 16:19:52,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.81 | bwd_microstep: 1530.13 | bwd_inner_microstep: 1530.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462
[2024-06-10 16:19:54,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.13 | bwd_microstep: 1180.68 | bwd_inner_microstep: 1180.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621
[2024-06-10 16:19:56,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.78 | bwd_microstep: 1312.29 | bwd_inner_microstep: 1312.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 16:19:58,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1398.42 | bwd_inner_microstep: 1398.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 16:20:00,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 1397.14 | bwd_inner_microstep: 1397.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 16:20:02,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.40 | bwd_microstep: 1385.29 | bwd_inner_microstep: 1385.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2179
[2024-06-10 16:20:03,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.99 | bwd_microstep: 917.68 | bwd_inner_microstep: 917.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3717
[2024-06-10 16:20:05,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.80 | bwd_microstep: 1343.82 | bwd_inner_microstep: 1343.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-10 16:20:07,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.82 | bwd_microstep: 1602.33 | bwd_inner_microstep: 1602.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 16:20:09,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.45 | bwd_microstep: 1444.45 | bwd_inner_microstep: 1444.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 16:20:11,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1407.57 | bwd_inner_microstep: 1407.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 16:20:13,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.69 | bwd_microstep: 1645.85 | bwd_inner_microstep: 1645.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588
[2024-06-10 16:20:15,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.87 | bwd_microstep: 1702.45 | bwd_inner_microstep: 1702.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 16:20:20,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 16:20:20,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.66 | bwd_microstep: 4283.37 | bwd_inner_microstep: 1817.99 | bwd_allreduce_microstep: 2465.33 | step_microstep: 37.94
[2024-06-10 16:20:20,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16122.46 | bwd: 45906.26 | bwd_inner: 43440.02 | bwd_allreduce: 2465.56 | step: 39.36
{'loss': 1.2881, 'learning_rate': 1.9211990815724496e-05, 'epoch': 0.53}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3560
[2024-06-10 16:20:22,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.79 | bwd_microstep: 1513.73 | bwd_inner_microstep: 1513.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3396
[2024-06-10 16:20:24,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.36 | bwd_microstep: 1144.48 | bwd_inner_microstep: 1144.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4343
[2024-06-10 16:20:26,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.08 | bwd_microstep: 1698.65 | bwd_inner_microstep: 1698.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 16:20:28,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.14 | bwd_microstep: 1478.20 | bwd_inner_microstep: 1478.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 16:20:30,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.26 | bwd_microstep: 1186.27 | bwd_inner_microstep: 1186.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 16:20:31,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.23 | bwd_microstep: 790.91 | bwd_inner_microstep: 790.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 16:20:33,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.16 | bwd_microstep: 1291.84 | bwd_inner_microstep: 1291.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 16:20:35,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.98 | bwd_microstep: 1529.53 | bwd_inner_microstep: 1529.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-10 16:20:37,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.95 | bwd_microstep: 1525.54 | bwd_inner_microstep: 1525.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3627
[2024-06-10 16:20:39,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.97 | bwd_microstep: 1246.93 | bwd_inner_microstep: 1246.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3669
[2024-06-10 16:20:41,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.97 | bwd_microstep: 1482.52 | bwd_inner_microstep: 1482.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 16:20:43,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.15 | bwd_microstep: 1520.88 | bwd_inner_microstep: 1520.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3665
[2024-06-10 16:20:45,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1451.10 | bwd_inner_microstep: 1451.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3960
[2024-06-10 16:20:47,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.36 | bwd_microstep: 1793.30 | bwd_inner_microstep: 1793.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678
[2024-06-10 16:20:50,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.45 | bwd_microstep: 1615.68 | bwd_inner_microstep: 1615.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3717
[2024-06-10 16:20:52,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.38 | bwd_microstep: 1477.27 | bwd_inner_microstep: 1477.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-10 16:20:54,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.13 | bwd_microstep: 1507.09 | bwd_inner_microstep: 1507.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-10 16:20:56,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1401.21 | bwd_inner_microstep: 1401.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 16:20:58,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1337.18 | bwd_inner_microstep: 1337.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2130
[2024-06-10 16:20:59,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.50 | bwd_microstep: 832.13 | bwd_inner_microstep: 832.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-10 16:21:00,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.46 | bwd_microstep: 1188.51 | bwd_inner_microstep: 1188.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-10 16:21:02,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1433.45 | bwd_inner_microstep: 1433.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3689
[2024-06-10 16:21:04,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1431.58 | bwd_inner_microstep: 1431.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-10 16:21:05,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.78 | bwd_microstep: 806.75 | bwd_inner_microstep: 806.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-10 16:21:07,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.44 | bwd_microstep: 1301.57 | bwd_inner_microstep: 1301.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293
[2024-06-10 16:21:09,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.25 | bwd_microstep: 975.58 | bwd_inner_microstep: 975.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3573
[2024-06-10 16:21:10,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.03 | bwd_microstep: 1334.02 | bwd_inner_microstep: 1333.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-10 16:21:13,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.25 | bwd_microstep: 1600.44 | bwd_inner_microstep: 1600.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-10 16:21:15,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1419.22 | bwd_inner_microstep: 1419.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 16:21:17,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1559.02 | bwd_inner_microstep: 1558.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2448
[2024-06-10 16:21:18,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.09 | bwd_microstep: 853.77 | bwd_inner_microstep: 853.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 16:21:22,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.33 | optimizer_step: 6.58
[2024-06-10 16:21:22,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.63 | bwd_microstep: 4072.96 | bwd_inner_microstep: 1106.98 | bwd_allreduce_microstep: 2965.91 | step_microstep: 38.64
[2024-06-10 16:21:22,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16011.80 | bwd: 45801.30 | bwd_inner: 42834.48 | bwd_allreduce: 2966.15 | step: 40.15
{'loss': 1.2445, 'learning_rate': 1.9174487408119067e-05, 'epoch': 0.53}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 16:21:24,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.30 | bwd_microstep: 1237.35 | bwd_inner_microstep: 1237.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4299
[2024-06-10 16:21:26,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.60 | bwd_microstep: 1675.27 | bwd_inner_microstep: 1675.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 16:21:29,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.86 | bwd_microstep: 1478.60 | bwd_inner_microstep: 1478.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477
[2024-06-10 16:21:30,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.62 | bwd_microstep: 1216.24 | bwd_inner_microstep: 1216.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 16:21:32,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.85 | bwd_microstep: 1350.80 | bwd_inner_microstep: 1350.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 16:21:34,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1246.00 | bwd_inner_microstep: 1245.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480
[2024-06-10 16:21:35,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.32 | bwd_microstep: 1184.08 | bwd_inner_microstep: 1184.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 16:21:37,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.19 | bwd_microstep: 1251.86 | bwd_inner_microstep: 1251.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 16:21:39,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.27 | bwd_microstep: 1659.81 | bwd_inner_microstep: 1659.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 16:21:41,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.03 | bwd_microstep: 1342.86 | bwd_inner_microstep: 1342.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3450
[2024-06-10 16:21:43,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.55 | bwd_microstep: 1334.27 | bwd_inner_microstep: 1334.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1846
[2024-06-10 16:21:44,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.12 | bwd_microstep: 670.81 | bwd_inner_microstep: 670.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 16:21:46,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1249.61 | bwd_inner_microstep: 1249.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505
[2024-06-10 16:21:48,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1349.88 | bwd_inner_microstep: 1349.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2092
[2024-06-10 16:21:49,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.29 | bwd_microstep: 919.60 | bwd_inner_microstep: 919.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644
[2024-06-10 16:21:51,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1434.69 | bwd_inner_microstep: 1434.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 16:21:53,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.16 | bwd_microstep: 1287.52 | bwd_inner_microstep: 1287.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627
[2024-06-10 16:21:55,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1313.58 | bwd_inner_microstep: 1313.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 16:21:56,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.70 | bwd_microstep: 1383.38 | bwd_inner_microstep: 1383.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 16:21:58,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.30 | bwd_microstep: 1252.23 | bwd_inner_microstep: 1252.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 16:22:00,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.33 | bwd_microstep: 972.31 | bwd_inner_microstep: 972.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 16:22:01,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1394.54 | bwd_inner_microstep: 1394.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-10 16:22:03,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.51 | bwd_microstep: 972.77 | bwd_inner_microstep: 972.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 16:22:05,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.34 | bwd_microstep: 1564.66 | bwd_inner_microstep: 1564.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 16:22:07,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.74 | bwd_microstep: 1298.94 | bwd_inner_microstep: 1298.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3635
[2024-06-10 16:22:09,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.01 | bwd_microstep: 1346.14 | bwd_inner_microstep: 1346.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4067
[2024-06-10 16:22:11,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.69 | bwd_microstep: 1527.65 | bwd_inner_microstep: 1527.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 16:22:13,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.98 | bwd_microstep: 1349.44 | bwd_inner_microstep: 1349.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 16:22:14,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.48 | bwd_microstep: 1298.50 | bwd_inner_microstep: 1298.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3587
[2024-06-10 16:22:16,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.61 | bwd_microstep: 1356.98 | bwd_inner_microstep: 1356.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3467
[2024-06-10 16:22:18,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.26 | bwd_microstep: 1330.57 | bwd_inner_microstep: 1330.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3582
[2024-06-10 16:22:24,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 16:22:24,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.65 | bwd_microstep: 5227.69 | bwd_inner_microstep: 1648.58 | bwd_allreduce_microstep: 3579.05 | step_microstep: 38.29
[2024-06-10 16:22:24,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15694.69 | bwd: 45478.68 | bwd_inner: 41898.72 | bwd_allreduce: 3579.28 | step: 39.80
 53%|█████▎    | 907/1726 [15:39:51<13:55:28, 61.21s/it]
 53%|█████▎    | 908/1726 [15:40:52<13:53:30, 61.14s/it]
 53%|█████▎    | 909/1726 [15:41:55<13:59:24, 61.65s/it]
 53%|█████▎    | 910/1726 [15:42:57<14:01:17, 61.86s/it]
 53%|█████▎    | 911/1726 [15:43:59<14:01:25, 61.95s/it]
 53%|█████▎    | 912/1726 [15:45:01<13:58:34, 61.81s/it]
{'loss': 1.2131, 'learning_rate': 1.9136986907964694e-05, 'epoch': 0.53}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 16:22:26,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.73 | bwd_microstep: 1235.20 | bwd_inner_microstep: 1235.02 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2409
[2024-06-10 16:22:27,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.06 | bwd_microstep: 998.16 | bwd_inner_microstep: 998.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463
[2024-06-10 16:22:29,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1403.70 | bwd_inner_microstep: 1403.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3391
[2024-06-10 16:22:31,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.42 | bwd_microstep: 1144.39 | bwd_inner_microstep: 1144.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 16:22:32,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1254.96 | bwd_inner_microstep: 1254.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-10 16:22:34,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.92 | bwd_microstep: 1543.79 | bwd_inner_microstep: 1543.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-10 16:22:36,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.83 | bwd_microstep: 1179.20 | bwd_inner_microstep: 1179.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 16:22:38,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.62 | bwd_microstep: 1150.45 | bwd_inner_microstep: 1150.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431
[2024-06-10 16:22:39,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.30 | bwd_microstep: 1317.46 | bwd_inner_microstep: 1317.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 16:22:41,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1378.14 | bwd_inner_microstep: 1378.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3416
[2024-06-10 16:22:43,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.86 | bwd_microstep: 1211.41 | bwd_inner_microstep: 1211.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 16:22:44,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.35 | bwd_microstep: 889.24 | bwd_inner_microstep: 889.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 16:22:45,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 793.62 | bwd_inner_microstep: 793.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3886
[2024-06-10 16:22:48,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.44 | bwd_microstep: 1849.68 | bwd_inner_microstep: 1849.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 16:22:50,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 16:22:52,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.92 | bwd_microstep: 1478.69 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 16:22:54,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.83 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2094
[2024-06-10 16:22:55,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.92 | bwd_microstep: 822.13 | bwd_inner_microstep: 822.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 16:22:57,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1297.33 | bwd_inner_microstep: 1297.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3669
[2024-06-10 16:22:59,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1376.59 | bwd_inner_microstep: 1376.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 16:23:01,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.82 | bwd_microstep: 1485.86 | bwd_inner_microstep: 1485.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 591
[2024-06-10 16:23:01,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.31 | bwd_microstep: 256.74 | bwd_inner_microstep: 256.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 16:23:03,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.00 | bwd_microstep: 1491.53 | bwd_inner_microstep: 1491.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 16:23:05,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1529.37 | bwd_inner_microstep: 1529.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947
[2024-06-10 16:23:06,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.18 | bwd_microstep: 729.20 | bwd_inner_microstep: 729.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 16:23:08,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.91 | bwd_microstep: 1526.40 | bwd_inner_microstep: 1526.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 16:23:10,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.72 | bwd_microstep: 1469.39 | bwd_inner_microstep: 1469.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3555
[2024-06-10 16:23:13,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.96 | bwd_microstep: 1591.49 | bwd_inner_microstep: 1591.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3485
[2024-06-10 16:23:14,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.64 | bwd_microstep: 1335.69 | bwd_inner_microstep: 1335.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3772
[2024-06-10 16:23:16,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.24 | bwd_microstep: 1436.40 | bwd_inner_microstep: 1436.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1954
[2024-06-10 16:23:18,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.39 | bwd_microstep: 854.69 | bwd_inner_microstep: 854.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 16:23:24,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.28 | optimizer_step: 6.60
[2024-06-10 16:23:24,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 6279.98 | bwd_inner_microstep: 1756.79 | bwd_allreduce_microstep: 4523.13 | step_microstep: 39.07
[2024-06-10 16:23:24,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15129.87 | bwd: 45075.39 | bwd_inner: 40551.22 | bwd_allreduce: 4523.44 | step: 40.77
{'loss': 1.2771, 'learning_rate': 1.9099489447337946e-05, 'epoch': 0.53}
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1415
[2024-06-10 16:23:25,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 205.33 | bwd_microstep: 524.68 | bwd_inner_microstep: 524.60 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2435
[2024-06-10 16:23:27,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.02 | bwd_microstep: 1007.56 | bwd_inner_microstep: 1007.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 16:23:29,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1376.33 | bwd_inner_microstep: 1376.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269
[2024-06-10 16:23:30,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.26 | bwd_microstep: 967.42 | bwd_inner_microstep: 967.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3506
[2024-06-10 16:23:32,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.56 | bwd_microstep: 1188.43 | bwd_inner_microstep: 1188.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-10 16:23:33,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1412.33 | bwd_inner_microstep: 1412.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760
[2024-06-10 16:23:35,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 1434.41 | bwd_inner_microstep: 1434.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3429
[2024-06-10 16:23:37,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.05 | bwd_microstep: 1215.08 | bwd_inner_microstep: 1215.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 16:23:39,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1286.33 | bwd_inner_microstep: 1286.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 16:23:41,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.74 | bwd_microstep: 1248.22 | bwd_inner_microstep: 1248.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3759
[2024-06-10 16:23:43,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.83 | bwd_microstep: 1471.05 | bwd_inner_microstep: 1471.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-10 16:23:45,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.55 | bwd_microstep: 1421.86 | bwd_inner_microstep: 1421.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1985
[2024-06-10 16:23:46,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.22 | bwd_microstep: 860.77 | bwd_inner_microstep: 860.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-10 16:23:47,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.46 | bwd_microstep: 1163.51 | bwd_inner_microstep: 1163.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 16:23:49,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1393.54 | bwd_inner_microstep: 1393.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2057
[2024-06-10 16:23:51,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.79 | bwd_microstep: 849.23 | bwd_inner_microstep: 849.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2368
[2024-06-10 16:23:52,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.91 | bwd_microstep: 894.69 | bwd_inner_microstep: 894.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 16:23:54,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1341.81 | bwd_inner_microstep: 1341.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 16:23:56,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.75 | bwd_microstep: 1487.35 | bwd_inner_microstep: 1487.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 16:23:58,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1404.52 | bwd_inner_microstep: 1404.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 16:24:00,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.41 | bwd_microstep: 1484.37 | bwd_inner_microstep: 1484.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 16:24:02,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.49 | bwd_microstep: 1387.85 | bwd_inner_microstep: 1387.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 16:24:03,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1374.71 | bwd_inner_microstep: 1374.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 16:24:05,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.78 | bwd_microstep: 1296.33 | bwd_inner_microstep: 1296.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3020
[2024-06-10 16:24:07,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.83 | bwd_microstep: 1235.36 | bwd_inner_microstep: 1235.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3573
[2024-06-10 16:24:09,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.65 | bwd_microstep: 1633.90 | bwd_inner_microstep: 1633.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-10 16:24:10,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.14 | bwd_microstep: 803.85 | bwd_inner_microstep: 803.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565
[2024-06-10 16:24:12,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.56 | bwd_microstep: 1330.26 | bwd_inner_microstep: 1330.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 16:24:14,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.33 | bwd_microstep: 1544.54 | bwd_inner_microstep: 1544.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2958
[2024-06-10 16:24:16,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.55 | bwd_microstep: 1199.97 | bwd_inner_microstep: 1199.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 16:24:18,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1476.90 | bwd_inner_microstep: 1476.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795
[2024-06-10 16:24:25,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 16:24:25,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.50 | bwd_microstep: 6850.52 | bwd_inner_microstep: 1750.96 | bwd_allreduce_microstep: 5099.51 | step_microstep: 37.99
[2024-06-10 16:24:25,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15132.25 | bwd: 45567.72 | bwd_inner: 40467.24 | bwd_allreduce: 5099.78 | step: 39.48
{'loss': 1.1726, 'learning_rate': 1.9061995158304682e-05, 'epoch': 0.53}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3433
[2024-06-10 16:24:27,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.01 | bwd_microstep: 1299.94 | bwd_inner_microstep: 1299.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 16:24:29,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1350.57 | bwd_inner_microstep: 1350.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 16:24:31,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.09 | bwd_microstep: 1644.98 | bwd_inner_microstep: 1644.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 16:24:33,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.64 | bwd_microstep: 1242.93 | bwd_inner_microstep: 1242.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 16:24:35,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1478.76 | bwd_inner_microstep: 1478.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 16:24:37,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.84 | bwd_microstep: 1524.51 | bwd_inner_microstep: 1524.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 16:24:39,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1248.76 | bwd_inner_microstep: 1248.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 16:24:41,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.05 | bwd_microstep: 1481.76 | bwd_inner_microstep: 1481.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 16:24:43,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1345.55 | bwd_inner_microstep: 1345.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3675
[2024-06-10 16:24:45,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.60 | bwd_microstep: 1823.87 | bwd_inner_microstep: 1823.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 16:24:47,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1251.30 | bwd_inner_microstep: 1251.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3489
[2024-06-10 16:24:49,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.80 | bwd_microstep: 1577.62 | bwd_inner_microstep: 1577.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3677
[2024-06-10 16:24:52,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.24 | bwd_microstep: 1823.50 | bwd_inner_microstep: 1823.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3451
[2024-06-10 16:24:54,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.51 | bwd_microstep: 1333.48 | bwd_inner_microstep: 1333.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3609
[2024-06-10 16:24:55,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.60 | bwd_microstep: 1273.38 | bwd_inner_microstep: 1273.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 16:24:57,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1420.59 | bwd_inner_microstep: 1420.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830
[2024-06-10 16:24:59,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.03 | bwd_microstep: 1358.10 | bwd_inner_microstep: 1358.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 16:25:01,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.29 | bwd_microstep: 1183.81 | bwd_inner_microstep: 1183.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-10 16:25:03,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.10 | bwd_microstep: 1606.27 | bwd_inner_microstep: 1606.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 16:25:05,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1554.66 | bwd_inner_microstep: 1554.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3842
[2024-06-10 16:25:07,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.48 | bwd_microstep: 1266.44 | bwd_inner_microstep: 1266.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-10 16:25:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.71 | bwd_microstep: 1582.47 | bwd_inner_microstep: 1582.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 16:25:11,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1405.22 | bwd_inner_microstep: 1405.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3929
[2024-06-10 16:25:13,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.12 | bwd_microstep: 1497.22 | bwd_inner_microstep: 1497.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 16:25:14,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.68 | bwd_microstep: 804.04 | bwd_inner_microstep: 804.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431
[2024-06-10 16:25:16,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.34 | bwd_microstep: 1185.87 | bwd_inner_microstep: 1185.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 16:25:18,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.35 | bwd_microstep: 1659.40 | bwd_inner_microstep: 1659.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2396
[2024-06-10 16:25:20,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.74 | bwd_microstep: 1081.97 | bwd_inner_microstep: 1081.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777
[2024-06-10 16:25:22,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.14 | bwd_microstep: 1639.91 | bwd_inner_microstep: 1639.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 16:25:24,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.79 | bwd_microstep: 1308.34 | bwd_inner_microstep: 1308.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3742
[2024-06-10 16:25:26,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.71 | bwd_microstep: 1736.36 | bwd_inner_microstep: 1736.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3602
[2024-06-10 16:25:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.19 | optimizer_step: 6.65
[2024-06-10 16:25:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1501.11 | bwd_inner_microstep: 1493.36 | bwd_allreduce_microstep: 7.70 | step_microstep: 37.64
[2024-06-10 16:25:28,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16936.66 | bwd: 45492.71 | bwd_inner: 45484.10 | bwd_allreduce: 7.92 | step: 39.10
{'loss': 1.181, 'learning_rate': 1.9024504172919606e-05, 'epoch': 0.53}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 16:25:30,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.38 | bwd_microstep: 1481.10 | bwd_inner_microstep: 1481.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 16:25:32,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1376.70 | bwd_inner_microstep: 1376.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497
[2024-06-10 16:25:34,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.48 | bwd_microstep: 1553.60 | bwd_inner_microstep: 1553.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 16:25:36,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.37 | bwd_microstep: 1448.91 | bwd_inner_microstep: 1448.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-10 16:25:39,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.91 | bwd_microstep: 1644.97 | bwd_inner_microstep: 1644.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 16:25:41,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1383.82 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 16:25:42,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.97 | bwd_microstep: 1150.37 | bwd_inner_microstep: 1150.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 16:25:43,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.55 | bwd_microstep: 790.60 | bwd_inner_microstep: 790.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3500
[2024-06-10 16:25:45,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1350.55 | bwd_inner_microstep: 1350.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948
[2024-06-10 16:25:46,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.86 | bwd_microstep: 824.99 | bwd_inner_microstep: 824.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3681
[2024-06-10 16:25:48,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.73 | bwd_microstep: 1446.09 | bwd_inner_microstep: 1446.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 16:25:50,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1480.06 | bwd_inner_microstep: 1480.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504
[2024-06-10 16:25:52,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.96 | bwd_microstep: 1442.68 | bwd_inner_microstep: 1442.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1974
[2024-06-10 16:25:53,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.26 | bwd_microstep: 888.73 | bwd_inner_microstep: 888.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650
[2024-06-10 16:25:56,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.55 | bwd_microstep: 1615.73 | bwd_inner_microstep: 1615.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3638
[2024-06-10 16:25:58,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.31 | bwd_microstep: 1572.41 | bwd_inner_microstep: 1572.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 16:26:00,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1386.39 | bwd_inner_microstep: 1386.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3445
[2024-06-10 16:26:01,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.66 | bwd_microstep: 1185.57 | bwd_inner_microstep: 1185.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 16:26:04,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.72 | bwd_microstep: 1544.83 | bwd_inner_microstep: 1544.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 16:26:06,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.48 | bwd_microstep: 1455.06 | bwd_inner_microstep: 1455.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 16:26:08,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1458.81 | bwd_inner_microstep: 1458.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3628
[2024-06-10 16:26:09,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.49 | bwd_microstep: 1217.33 | bwd_inner_microstep: 1217.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 16:26:11,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.09 | bwd_microstep: 1186.60 | bwd_inner_microstep: 1186.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 16:26:13,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.97 | bwd_microstep: 1277.65 | bwd_inner_microstep: 1277.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 16:26:14,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1256.50 | bwd_inner_microstep: 1256.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 16:26:16,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.90 | bwd_microstep: 1455.58 | bwd_inner_microstep: 1455.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553
[2024-06-10 16:26:18,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.51 | bwd_microstep: 1328.45 | bwd_inner_microstep: 1328.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066
[2024-06-10 16:26:20,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.37 | bwd_microstep: 915.62 | bwd_inner_microstep: 915.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 16:26:22,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.96 | bwd_microstep: 1546.92 | bwd_inner_microstep: 1546.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3790
[2024-06-10 16:26:24,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.85 | bwd_microstep: 1583.47 | bwd_inner_microstep: 1583.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-10 16:26:26,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1590.36 | bwd_inner_microstep: 1590.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3765
[2024-06-10 16:26:31,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 16:26:31,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 4691.27 | bwd_inner_microstep: 1957.91 | bwd_allreduce_microstep: 2733.30 | step_microstep: 38.01
[2024-06-10 16:26:31,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16224.19 | bwd: 46531.70 | bwd_inner: 43797.49 | bwd_allreduce: 2733.53 | step: 39.48
{'loss': 1.2333, 'learning_rate': 1.8987016623225748e-05, 'epoch': 0.53}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 16:26:33,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.51 | bwd_microstep: 1335.01 | bwd_inner_microstep: 1334.85 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 870
[2024-06-10 16:26:34,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.11 | bwd_microstep: 363.83 | bwd_inner_microstep: 363.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 16:26:36,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1498.94 | bwd_inner_microstep: 1498.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3842
[2024-06-10 16:26:38,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1359.73 | bwd_inner_microstep: 1359.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 16:26:40,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.72 | bwd_microstep: 1479.37 | bwd_inner_microstep: 1479.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 16:26:42,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.36 | bwd_microstep: 1649.63 | bwd_inner_microstep: 1649.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 16:26:44,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.56 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 16:26:46,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.72 | bwd_microstep: 1244.27 | bwd_inner_microstep: 1244.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 16:26:48,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.15 | bwd_microstep: 1629.53 | bwd_inner_microstep: 1629.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3986
[2024-06-10 16:26:50,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.52 | bwd_microstep: 1633.89 | bwd_inner_microstep: 1633.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 16:26:52,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1346.87 | bwd_inner_microstep: 1346.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 16:26:54,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1285.56 | bwd_inner_microstep: 1285.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3495
[2024-06-10 16:26:56,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.63 | bwd_microstep: 1435.24 | bwd_inner_microstep: 1435.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507
[2024-06-10 16:26:58,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.39 | bwd_microstep: 1441.32 | bwd_inner_microstep: 1441.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676
[2024-06-10 16:27:00,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.37 | bwd_microstep: 1719.66 | bwd_inner_microstep: 1719.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 16:27:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1344.28 | bwd_inner_microstep: 1344.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 16:27:04,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 1384.46 | bwd_inner_microstep: 1384.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3638
[2024-06-10 16:27:06,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.21 | bwd_microstep: 1709.59 | bwd_inner_microstep: 1709.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2171
[2024-06-10 16:27:07,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.77 | bwd_microstep: 917.91 | bwd_inner_microstep: 917.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3624
[2024-06-10 16:27:09,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.75 | bwd_microstep: 1315.01 | bwd_inner_microstep: 1314.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 16:27:11,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1355.87 | bwd_inner_microstep: 1355.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 16:27:13,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.14 | bwd_microstep: 1287.39 | bwd_inner_microstep: 1287.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 16:27:14,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.65 | bwd_microstep: 974.18 | bwd_inner_microstep: 974.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2454
[2024-06-10 16:27:16,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.74 | bwd_microstep: 950.41 | bwd_inner_microstep: 950.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-10 16:27:17,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.57 | bwd_microstep: 1293.92 | bwd_inner_microstep: 1293.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561
[2024-06-10 16:27:19,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.49 | bwd_microstep: 1330.28 | bwd_inner_microstep: 1330.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 16:27:21,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.43 | bwd_microstep: 1388.57 | bwd_inner_microstep: 1388.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-10 16:27:23,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.73 | bwd_microstep: 1534.81 | bwd_inner_microstep: 1534.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 16:27:26,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.46 | bwd_microstep: 1644.18 | bwd_inner_microstep: 1644.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3467
[2024-06-10 16:27:28,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.88 | bwd_microstep: 1568.11 | bwd_inner_microstep: 1568.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 16:27:30,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.28 | bwd_microstep: 1450.48 | bwd_inner_microstep: 1450.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3805
[2024-06-10 16:27:36,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.03 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 16:27:36,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.78 | bwd_microstep: 5622.19 | bwd_inner_microstep: 1975.22 | bwd_allreduce_microstep: 3646.92 | step_microstep: 38.39
[2024-06-10 16:27:36,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16355.26 | bwd: 47880.74 | bwd_inner: 44232.80 | bwd_allreduce: 3647.21 | step: 39.94
{'loss': 1.234, 'learning_rate': 1.894953264125408e-05, 'epoch': 0.53}


 53%|█████▎    | 912/1726 [15:45:01<13:58:34, 61.81s/it]
 53%|█████▎    | 913/1726 [15:46:01<13:52:21, 61.43s/it]
 53%|█████▎    | 914/1726 [15:47:02<13:49:40, 61.31s/it]
 53%|█████▎    | 915/1726 [15:48:05<13:54:33, 61.74s/it]
 53%|█████▎    | 916/1726 [15:49:08<13:58:58, 62.15s/it]
 53%|█████▎    | 917/1726 [15:50:13<14:07:46, 62.88s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 16:27:38,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.95 | bwd_microstep: 1477.91 | bwd_inner_microstep: 1477.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 16:27:40,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1246.79 | bwd_inner_microstep: 1246.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3943
[2024-06-10 16:27:42,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1496.53 | bwd_inner_microstep: 1496.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 16:27:44,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.58 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906
[2024-06-10 16:27:45,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.03 | bwd_microstep: 776.84 | bwd_inner_microstep: 776.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 16:27:47,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1531.88 | bwd_inner_microstep: 1531.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3724
[2024-06-10 16:27:49,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.98 | bwd_microstep: 1733.29 | bwd_inner_microstep: 1733.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2529
[2024-06-10 16:27:51,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.43 | bwd_microstep: 1029.22 | bwd_inner_microstep: 1029.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3478
[2024-06-10 16:27:52,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.60 | bwd_microstep: 1243.48 | bwd_inner_microstep: 1243.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2357
[2024-06-10 16:27:54,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.72 | bwd_microstep: 892.43 | bwd_inner_microstep: 892.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 16:27:56,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.55 | bwd_microstep: 1393.33 | bwd_inner_microstep: 1393.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 16:27:57,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.94 | bwd_microstep: 1275.58 | bwd_inner_microstep: 1275.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3516
[2024-06-10 16:27:59,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1581.62 | bwd_inner_microstep: 1581.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510
[2024-06-10 16:28:01,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.30 | bwd_microstep: 1345.24 | bwd_inner_microstep: 1345.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512
[2024-06-10 16:28:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 1575.97 | bwd_inner_microstep: 1575.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3839
[2024-06-10 16:28:06,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.48 | bwd_microstep: 1605.06 | bwd_inner_microstep: 1605.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 16:28:08,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.39 | bwd_microstep: 1391.34 | bwd_inner_microstep: 1391.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 16:28:10,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.15 | bwd_microstep: 1656.16 | bwd_inner_microstep: 1656.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 16:28:12,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.52 | bwd_microstep: 1274.99 | bwd_inner_microstep: 1274.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797
[2024-06-10 16:28:14,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.92 | bwd_microstep: 1447.97 | bwd_inner_microstep: 1447.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-10 16:28:16,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1324.07 | bwd_inner_microstep: 1324.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3813
[2024-06-10 16:28:18,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.98 | bwd_microstep: 1691.33 | bwd_inner_microstep: 1691.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-10 16:28:20,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.30 | bwd_microstep: 1427.78 | bwd_inner_microstep: 1427.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3641
[2024-06-10 16:28:22,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1538.44 | bwd_inner_microstep: 1538.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 16:28:24,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1374.92 | bwd_inner_microstep: 1374.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3513
[2024-06-10 16:28:26,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1439.55 | bwd_inner_microstep: 1439.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599
[2024-06-10 16:28:28,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.17 | bwd_microstep: 1704.42 | bwd_inner_microstep: 1704.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 16:28:30,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.63 | bwd_microstep: 1597.59 | bwd_inner_microstep: 1597.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3554
[2024-06-10 16:28:33,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.73 | bwd_microstep: 1591.18 | bwd_inner_microstep: 1591.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3785
[2024-06-10 16:28:35,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.32 | bwd_microstep: 1613.88 | bwd_inner_microstep: 1613.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3384
[2024-06-10 16:28:36,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.57 | bwd_microstep: 1176.08 | bwd_inner_microstep: 1176.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-10 16:28:39,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-10 16:28:39,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.41 | bwd_microstep: 1631.88 | bwd_inner_microstep: 1624.17 | bwd_allreduce_microstep: 7.66 | step_microstep: 37.68
[2024-06-10 16:28:39,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16913.08 | bwd: 45468.20 | bwd_inner: 45459.65 | bwd_allreduce: 7.88 | step: 39.12
{'loss': 1.2289, 'learning_rate': 1.8912052359022995e-05, 'epoch': 0.53}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 16:28:41,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.36 | bwd_microstep: 1447.57 | bwd_inner_microstep: 1447.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 16:28:43,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.21 | bwd_microstep: 1381.90 | bwd_inner_microstep: 1381.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846
[2024-06-10 16:28:45,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.01 | bwd_microstep: 1462.97 | bwd_inner_microstep: 1462.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 16:28:47,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1492.19 | bwd_inner_microstep: 1492.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 16:28:49,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4061
[2024-06-10 16:28:51,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.62 | bwd_microstep: 1718.45 | bwd_inner_microstep: 1718.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 16:28:53,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.69 | bwd_microstep: 1280.39 | bwd_inner_microstep: 1280.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 16:28:55,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.82 | bwd_microstep: 1380.84 | bwd_inner_microstep: 1380.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3698
[2024-06-10 16:28:57,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.56 | bwd_microstep: 1389.00 | bwd_inner_microstep: 1388.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1903
[2024-06-10 16:28:58,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.57 | bwd_microstep: 745.66 | bwd_inner_microstep: 745.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3538
[2024-06-10 16:29:00,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.82 | bwd_microstep: 1438.65 | bwd_inner_microstep: 1438.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1966
[2024-06-10 16:29:01,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.12 | bwd_microstep: 887.34 | bwd_inner_microstep: 887.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468
[2024-06-10 16:29:03,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.92 | bwd_microstep: 1572.08 | bwd_inner_microstep: 1572.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3421
[2024-06-10 16:29:05,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1491.58 | bwd_inner_microstep: 1491.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3634
[2024-06-10 16:29:07,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.14 | bwd_microstep: 1810.60 | bwd_inner_microstep: 1810.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3423
[2024-06-10 16:29:09,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.39 | bwd_microstep: 1277.62 | bwd_inner_microstep: 1277.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 16:29:11,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.69 | bwd_microstep: 1505.92 | bwd_inner_microstep: 1505.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3523
[2024-06-10 16:29:13,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1320.72 | bwd_inner_microstep: 1320.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 16:29:15,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1376.62 | bwd_inner_microstep: 1376.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 16:29:17,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.80 | bwd_microstep: 1661.91 | bwd_inner_microstep: 1661.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 16:29:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.62 | bwd_microstep: 1278.29 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 16:29:21,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1252.73 | bwd_inner_microstep: 1252.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 16:29:23,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.44 | bwd_microstep: 1559.02 | bwd_inner_microstep: 1558.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 16:29:25,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1414.85 | bwd_inner_microstep: 1414.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2010
[2024-06-10 16:29:26,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.15 | bwd_microstep: 773.00 | bwd_inner_microstep: 772.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 16:29:28,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.53 | bwd_microstep: 1282.08 | bwd_inner_microstep: 1282.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 16:29:30,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1500.86 | bwd_inner_microstep: 1500.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3549
[2024-06-10 16:29:32,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1587.93 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 16:29:34,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1539.25 | bwd_inner_microstep: 1539.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 16:29:36,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1551.84 | bwd_inner_microstep: 1551.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 16:29:38,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 1555.22 | bwd_inner_microstep: 1555.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3579
[2024-06-10 16:29:41,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.14 | optimizer_step: 6.61
[2024-06-10 16:29:41,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.92 | bwd_microstep: 2002.04 | bwd_inner_microstep: 1611.36 | bwd_allreduce_microstep: 390.63 | step_microstep: 37.67
[2024-06-10 16:29:41,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16751.61 | bwd: 45351.91 | bwd_inner: 44960.38 | bwd_allreduce: 390.86 | step: 39.12
{'loss': 1.2081, 'learning_rate': 1.887457590853784e-05, 'epoch': 0.53}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1947
[2024-06-10 16:29:42,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.56 | bwd_microstep: 886.66 | bwd_inner_microstep: 886.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 16:29:44,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.02 | bwd_microstep: 1281.00 | bwd_inner_microstep: 1280.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 16:29:46,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.25 | bwd_microstep: 1270.50 | bwd_inner_microstep: 1270.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470
[2024-06-10 16:29:48,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1337.66 | bwd_inner_microstep: 1337.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 16:29:50,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1390.20 | bwd_inner_microstep: 1390.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 16:29:51,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.89 | bwd_microstep: 1340.77 | bwd_inner_microstep: 1340.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 16:29:54,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.49 | bwd_microstep: 1528.99 | bwd_inner_microstep: 1528.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939
[2024-06-10 16:29:55,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.74 | bwd_microstep: 821.96 | bwd_inner_microstep: 821.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1930
[2024-06-10 16:29:56,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.82 | bwd_microstep: 881.76 | bwd_inner_microstep: 881.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 16:29:58,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1480.87 | bwd_inner_microstep: 1480.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1935
[2024-06-10 16:29:59,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.35 | bwd_microstep: 885.16 | bwd_inner_microstep: 885.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2150
[2024-06-10 16:30:01,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.00 | bwd_microstep: 1041.93 | bwd_inner_microstep: 1041.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3484
[2024-06-10 16:30:03,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.14 | bwd_microstep: 1674.72 | bwd_inner_microstep: 1674.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3690
[2024-06-10 16:30:05,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.10 | bwd_microstep: 1824.31 | bwd_inner_microstep: 1824.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3508
[2024-06-10 16:30:08,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.01 | bwd_microstep: 1533.90 | bwd_inner_microstep: 1533.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 16:30:10,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.84 | bwd_microstep: 1513.49 | bwd_inner_microstep: 1513.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 646
[2024-06-10 16:30:10,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.22 | bwd_microstep: 274.65 | bwd_inner_microstep: 274.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3628
[2024-06-10 16:30:12,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.27 | bwd_microstep: 1535.13 | bwd_inner_microstep: 1535.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 16:30:14,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1511.40 | bwd_inner_microstep: 1511.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 16:30:16,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.12 | bwd_microstep: 1651.54 | bwd_inner_microstep: 1651.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 16:30:19,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1507.87 | bwd_inner_microstep: 1507.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467
[2024-06-10 16:30:21,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1404.58 | bwd_inner_microstep: 1404.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 16:30:23,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1554.91 | bwd_inner_microstep: 1554.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 16:30:25,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1393.53 | bwd_inner_microstep: 1393.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3792
[2024-06-10 16:30:27,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1553.83 | bwd_inner_microstep: 1553.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3807
[2024-06-10 16:30:29,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.02 | bwd_microstep: 1578.81 | bwd_inner_microstep: 1578.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2322
[2024-06-10 16:30:30,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.09 | bwd_microstep: 986.91 | bwd_inner_microstep: 986.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3544
[2024-06-10 16:30:32,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.21 | bwd_microstep: 1518.58 | bwd_inner_microstep: 1518.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 16:30:34,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1514.03 | bwd_inner_microstep: 1514.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431
[2024-06-10 16:30:36,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.00 | bwd_microstep: 1309.16 | bwd_inner_microstep: 1309.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3790
[2024-06-10 16:30:38,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.39 | bwd_microstep: 1444.93 | bwd_inner_microstep: 1444.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3773
[2024-06-10 16:30:41,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 16:30:41,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.63 | bwd_microstep: 2572.89 | bwd_inner_microstep: 1666.03 | bwd_allreduce_microstep: 906.82 | step_microstep: 37.67
[2024-06-10 16:30:41,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15973.11 | bwd: 44006.65 | bwd_inner: 43098.90 | bwd_allreduce: 907.05 | step: 39.16
{'loss': 1.3089, 'learning_rate': 1.8837103421790486e-05, 'epoch': 0.53}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5813
[2024-06-10 16:30:46,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 1809.83 | bwd_microstep: 2627.21 | bwd_inner_microstep: 2627.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4090
[2024-06-10 16:30:48,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.70 | bwd_microstep: 1624.11 | bwd_inner_microstep: 1624.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 16:30:50,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.36 | bwd_microstep: 1484.42 | bwd_inner_microstep: 1484.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2258
[2024-06-10 16:30:51,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.99 | bwd_microstep: 900.60 | bwd_inner_microstep: 900.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 16:30:53,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.70 | bwd_microstep: 1145.39 | bwd_inner_microstep: 1145.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397
[2024-06-10 16:30:55,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.81 | bwd_microstep: 1147.22 | bwd_inner_microstep: 1147.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 16:30:56,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1280.67 | bwd_inner_microstep: 1280.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 16:30:58,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 16:31:00,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.04 | bwd_microstep: 1376.95 | bwd_inner_microstep: 1376.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2658
[2024-06-10 16:31:02,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.90 | bwd_microstep: 956.07 | bwd_inner_microstep: 956.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892
[2024-06-10 16:31:04,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.66 | bwd_microstep: 1579.70 | bwd_inner_microstep: 1579.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486
[2024-06-10 16:31:06,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.62 | bwd_microstep: 1314.71 | bwd_inner_microstep: 1314.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3672
[2024-06-10 16:31:07,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.74 | bwd_microstep: 1370.92 | bwd_inner_microstep: 1370.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 16:31:09,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.37 | bwd_microstep: 1310.40 | bwd_inner_microstep: 1310.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 16:31:11,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.07 | bwd_microstep: 1381.83 | bwd_inner_microstep: 1381.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 16:31:13,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1385.61 | bwd_inner_microstep: 1385.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3651
[2024-06-10 16:31:15,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.35 | bwd_microstep: 1586.64 | bwd_inner_microstep: 1586.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-10 16:31:17,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.46 | bwd_microstep: 1619.94 | bwd_inner_microstep: 1619.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2017
[2024-06-10 16:31:19,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.30 | bwd_microstep: 898.33 | bwd_inner_microstep: 898.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702
[2024-06-10 16:31:21,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.79 | bwd_microstep: 1629.77 | bwd_inner_microstep: 1629.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3443
[2024-06-10 16:31:23,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.21 | bwd_microstep: 1313.04 | bwd_inner_microstep: 1313.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 16:31:25,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.05 | bwd_microstep: 1612.96 | bwd_inner_microstep: 1612.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 894
[2024-06-10 16:31:25,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.84 | bwd_microstep: 369.58 | bwd_inner_microstep: 369.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3545
[2024-06-10 16:31:28,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.84 | bwd_microstep: 1525.45 | bwd_inner_microstep: 1525.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 16:31:30,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.61 | bwd_microstep: 1531.36 | bwd_inner_microstep: 1531.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 16:31:32,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.63 | bwd_microstep: 1495.79 | bwd_inner_microstep: 1495.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2061
[2024-06-10 16:31:33,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.70 | bwd_microstep: 753.51 | bwd_inner_microstep: 753.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 16:31:35,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.10 | bwd_microstep: 1476.30 | bwd_inner_microstep: 1476.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-10 16:31:37,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.92 | bwd_microstep: 1637.67 | bwd_inner_microstep: 1637.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3736
[2024-06-10 16:31:39,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.29 | bwd_microstep: 1564.63 | bwd_inner_microstep: 1564.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3569
[2024-06-10 16:31:42,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.29 | bwd_microstep: 1643.86 | bwd_inner_microstep: 1643.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 16:31:44,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.19 | optimizer_step: 6.65
[2024-06-10 16:31:44,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.66 | bwd_microstep: 1510.59 | bwd_inner_microstep: 1502.89 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.52
[2024-06-10 16:31:44,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17431.95 | bwd: 44438.21 | bwd_inner: 44429.67 | bwd_allreduce: 7.86 | step: 39.02
{'loss': 1.1821, 'learning_rate': 1.8799635030758837e-05, 'epoch': 0.53}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 16:31:45,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1333.65 | bwd_inner_microstep: 1333.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 16:31:47,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1349.33 | bwd_inner_microstep: 1349.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 16:31:50,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.51 | bwd_microstep: 1653.64 | bwd_inner_microstep: 1653.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 16:31:52,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.03 | bwd_microstep: 1380.03 | bwd_inner_microstep: 1380.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 16:31:53,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.36 | bwd_microstep: 1402.03 | bwd_inner_microstep: 1402.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 16:31:55,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1381.73 | bwd_inner_microstep: 1381.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 715
[2024-06-10 16:31:56,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.51 | bwd_microstep: 290.55 | bwd_inner_microstep: 290.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705
[2024-06-10 16:31:58,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.99 | bwd_microstep: 1430.19 | bwd_inner_microstep: 1430.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 16:32:00,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.05 | bwd_inner_microstep: 1385.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1948
[2024-06-10 16:32:01,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.68 | bwd_microstep: 759.37 | bwd_inner_microstep: 759.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-10 16:32:03,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.24 | bwd_microstep: 1313.91 | bwd_inner_microstep: 1313.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 16:32:05,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1508.76 | bwd_inner_microstep: 1508.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-10 16:32:07,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1448.44 | bwd_inner_microstep: 1448.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651
[2024-06-10 16:32:09,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.19 | bwd_microstep: 1614.98 | bwd_inner_microstep: 1614.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 16:32:11,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1346.38 | bwd_inner_microstep: 1346.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3458
[2024-06-10 16:32:13,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.05 | bwd_microstep: 1566.16 | bwd_inner_microstep: 1566.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2285
[2024-06-10 16:32:14,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.69 | bwd_microstep: 906.17 | bwd_inner_microstep: 906.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3826
[2024-06-10 16:32:16,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 1416.13 | bwd_inner_microstep: 1416.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842
[2024-06-10 16:32:18,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.56 | bwd_microstep: 1461.17 | bwd_inner_microstep: 1461.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2048
[2024-06-10 16:32:19,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.59 | bwd_microstep: 842.42 | bwd_inner_microstep: 842.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1922
[2024-06-10 16:32:20,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.62 | bwd_microstep: 834.33 | bwd_inner_microstep: 834.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695
[2024-06-10 16:32:22,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1457.74 | bwd_inner_microstep: 1457.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 16:32:24,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.78 | bwd_microstep: 1467.24 | bwd_inner_microstep: 1467.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928
[2024-06-10 16:32:25,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.56 | bwd_microstep: 726.38 | bwd_inner_microstep: 726.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 16:32:27,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.27 | bwd_microstep: 878.38 | bwd_inner_microstep: 878.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 16:32:29,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.96 | bwd_microstep: 1453.03 | bwd_inner_microstep: 1453.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 16:32:31,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1386.65 | bwd_inner_microstep: 1386.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 16:32:32,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.57 | bwd_microstep: 1156.33 | bwd_inner_microstep: 1156.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 16:32:34,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1283.02 | bwd_inner_microstep: 1282.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 16:32:36,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.78 | bwd_microstep: 1456.89 | bwd_inner_microstep: 1456.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2219
[2024-06-10 16:32:37,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.20 | bwd_microstep: 864.27 | bwd_inner_microstep: 864.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 16:32:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.25 | optimizer_step: 6.59
[2024-06-10 16:32:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.81 | bwd_microstep: 6853.49 | bwd_inner_microstep: 1576.29 | bwd_allreduce_microstep: 5277.14 | step_microstep: 38.08
[2024-06-10 16:32:45,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15081.20 | bwd: 45607.86 | bwd_inner: 40329.81 | bwd_allreduce: 5277.37 | step: 39.57
{'loss': 1.264, 'learning_rate': 1.8762170867406366e-05, 'epoch': 0.53}
 53%|█████▎    | 917/1726 [15:50:13<14:07:46, 62.88s/it]
 53%|█████▎    | 918/1726 [15:51:15<14:06:06, 62.83s/it]
 53%|█████▎    | 919/1726 [15:52:18<14:03:30, 62.71s/it]
 53%|█████▎    | 920/1726 [15:53:18<13:52:48, 62.00s/it]
 53%|█████▎    | 921/1726 [15:54:20<13:52:36, 62.06s/it]
 53%|█████▎    | 922/1726 [15:55:21<13:47:22, 61.74s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1931
[2024-06-10 16:32:46,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.92 | bwd_microstep: 876.09 | bwd_inner_microstep: 876.02 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4007
[2024-06-10 16:32:48,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.77 | bwd_microstep: 1604.93 | bwd_inner_microstep: 1604.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2373
[2024-06-10 16:32:49,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.46 | bwd_microstep: 994.65 | bwd_inner_microstep: 994.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2969
[2024-06-10 16:32:51,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.26 | bwd_microstep: 1196.57 | bwd_inner_microstep: 1196.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794
[2024-06-10 16:32:53,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.90 | bwd_microstep: 1546.90 | bwd_inner_microstep: 1546.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3694
[2024-06-10 16:32:55,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.85 | bwd_microstep: 1289.53 | bwd_inner_microstep: 1289.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 16:32:57,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.75 | bwd_microstep: 1279.89 | bwd_inner_microstep: 1279.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 16:32:59,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.28 | bwd_microstep: 1384.40 | bwd_inner_microstep: 1384.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 16:33:01,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1383.06 | bwd_inner_microstep: 1383.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 16:33:02,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1255.23 | bwd_inner_microstep: 1255.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 16:33:04,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.83 | bwd_microstep: 1384.72 | bwd_inner_microstep: 1384.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 16:33:06,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.49 | bwd_microstep: 1484.91 | bwd_inner_microstep: 1484.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 16:33:08,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.31 | bwd_microstep: 1373.93 | bwd_inner_microstep: 1373.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 16:33:10,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.41 | bwd_microstep: 1516.41 | bwd_inner_microstep: 1516.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-10 16:33:12,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.44 | bwd_microstep: 1614.92 | bwd_inner_microstep: 1614.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3643
[2024-06-10 16:33:14,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.87 | bwd_microstep: 1363.96 | bwd_inner_microstep: 1363.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3473
[2024-06-10 16:33:16,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.69 | bwd_microstep: 1242.55 | bwd_inner_microstep: 1242.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 16:33:18,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.26 | bwd_microstep: 1294.61 | bwd_inner_microstep: 1294.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 16:33:19,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.15 | bwd_microstep: 815.91 | bwd_inner_microstep: 815.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 16:33:21,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1510.25 | bwd_inner_microstep: 1510.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3547
[2024-06-10 16:33:23,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1424.63 | bwd_inner_microstep: 1424.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 16:33:24,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.26 | bwd_microstep: 805.47 | bwd_inner_microstep: 805.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 16:33:26,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.56 | bwd_microstep: 1460.33 | bwd_inner_microstep: 1460.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-10 16:33:28,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.84 | bwd_microstep: 1183.44 | bwd_inner_microstep: 1183.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484
[2024-06-10 16:33:30,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.53 | bwd_microstep: 1343.30 | bwd_inner_microstep: 1343.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3518
[2024-06-10 16:33:32,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.17 | bwd_microstep: 1440.79 | bwd_inner_microstep: 1440.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3598
[2024-06-10 16:33:34,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.93 | bwd_microstep: 1705.39 | bwd_inner_microstep: 1705.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 16:33:36,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1279.59 | bwd_inner_microstep: 1279.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2249
[2024-06-10 16:33:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.76 | bwd_microstep: 1062.26 | bwd_inner_microstep: 1062.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3650
[2024-06-10 16:33:39,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.19 | bwd_microstep: 1348.77 | bwd_inner_microstep: 1348.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776
[2024-06-10 16:33:41,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.91 | bwd_microstep: 1441.60 | bwd_inner_microstep: 1441.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3776
[2024-06-10 16:33:46,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 16:33:46,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.18 | bwd_microstep: 3660.98 | bwd_inner_microstep: 2089.03 | bwd_allreduce_microstep: 1571.89 | step_microstep: 38.11
[2024-06-10 16:33:46,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16000.84 | bwd: 44570.01 | bwd_inner: 42997.17 | bwd_allreduce: 1572.14 | step: 39.60
{'loss': 1.1735, 'learning_rate': 1.8724711063681665e-05, 'epoch': 0.53}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 16:33:47,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1277.00 | bwd_inner_microstep: 1276.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 16:33:49,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.71 | bwd_microstep: 1486.76 | bwd_inner_microstep: 1486.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3869
[2024-06-10 16:33:51,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.06 | bwd_microstep: 1461.44 | bwd_inner_microstep: 1461.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 16:33:52,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.97 | bwd_microstep: 787.21 | bwd_inner_microstep: 787.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 16:33:54,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1382.35 | bwd_inner_microstep: 1382.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 16:33:56,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1282.62 | bwd_inner_microstep: 1282.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 16:33:58,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.07 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 16:34:00,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.65 | bwd_microstep: 1281.03 | bwd_inner_microstep: 1281.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 16:34:02,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 1390.27 | bwd_inner_microstep: 1390.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619
[2024-06-10 16:34:03,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.78 | bwd_microstep: 1312.19 | bwd_inner_microstep: 1312.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3712
[2024-06-10 16:34:06,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.77 | bwd_microstep: 1618.63 | bwd_inner_microstep: 1618.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 16:34:08,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.56 | bwd_microstep: 1523.56 | bwd_inner_microstep: 1523.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-10 16:34:10,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.26 | bwd_microstep: 1414.02 | bwd_inner_microstep: 1413.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 16:34:12,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.77 | bwd_microstep: 1478.01 | bwd_inner_microstep: 1477.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2127
[2024-06-10 16:34:13,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.62 | bwd_microstep: 925.78 | bwd_inner_microstep: 925.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 16:34:15,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1282.24 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 16:34:17,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1611.62 | bwd_inner_microstep: 1611.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 16:34:19,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1406.61 | bwd_inner_microstep: 1406.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3456
[2024-06-10 16:34:21,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1303.35 | bwd_inner_microstep: 1303.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 16:34:22,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.84 | bwd_microstep: 1157.60 | bwd_inner_microstep: 1157.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 16:34:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1388.31 | bwd_inner_microstep: 1388.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 16:34:26,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1505.94 | bwd_inner_microstep: 1505.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 924
[2024-06-10 16:34:27,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.43 | bwd_microstep: 376.44 | bwd_inner_microstep: 376.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618
[2024-06-10 16:34:29,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.84 | bwd_microstep: 1311.99 | bwd_inner_microstep: 1311.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562
[2024-06-10 16:34:31,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.44 | bwd_microstep: 1433.96 | bwd_inner_microstep: 1433.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 16:34:33,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1497.17 | bwd_inner_microstep: 1497.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465
[2024-06-10 16:34:35,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.72 | bwd_microstep: 1343.19 | bwd_inner_microstep: 1343.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 16:34:37,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.06 | bwd_microstep: 1493.51 | bwd_inner_microstep: 1493.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3760
[2024-06-10 16:34:39,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.20 | bwd_microstep: 1587.50 | bwd_inner_microstep: 1587.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 16:34:41,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.98 | bwd_microstep: 1372.93 | bwd_inner_microstep: 1372.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2134
[2024-06-10 16:34:42,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.53 | bwd_microstep: 861.16 | bwd_inner_microstep: 861.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808
[2024-06-10 16:34:46,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.28 | optimizer_step: 6.63
[2024-06-10 16:34:46,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.42 | bwd_microstep: 3259.94 | bwd_inner_microstep: 1790.99 | bwd_allreduce_microstep: 1468.89 | step_microstep: 38.95
[2024-06-10 16:34:46,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15900.82 | bwd: 44061.81 | bwd_inner: 42592.01 | bwd_allreduce: 1469.12 | step: 40.47
{'loss': 1.1719, 'learning_rate': 1.8687255751517975e-05, 'epoch': 0.54}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 16:34:48,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.13 | bwd_microstep: 1333.07 | bwd_inner_microstep: 1332.93 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3873
[2024-06-10 16:34:50,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.76 | bwd_microstep: 1540.24 | bwd_inner_microstep: 1540.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 16:34:52,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1375.46 | bwd_inner_microstep: 1375.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-10 16:34:54,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.82 | bwd_microstep: 1548.83 | bwd_inner_microstep: 1548.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 16:34:55,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 787.12 | bwd_inner_microstep: 787.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 16:34:57,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1382.01 | bwd_inner_microstep: 1381.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 16:34:59,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1281.64 | bwd_inner_microstep: 1281.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 16:35:00,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.46 | bwd_microstep: 793.70 | bwd_inner_microstep: 793.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3498
[2024-06-10 16:35:01,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.51 | bwd_microstep: 1222.05 | bwd_inner_microstep: 1222.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966
[2024-06-10 16:35:02,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.33 | bwd_microstep: 704.56 | bwd_inner_microstep: 704.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 16:35:04,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.46 | bwd_microstep: 1284.23 | bwd_inner_microstep: 1284.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 16:35:06,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1442.24 | bwd_inner_microstep: 1442.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-10 16:35:08,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1409.92 | bwd_inner_microstep: 1409.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 16:35:10,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.15 | bwd_microstep: 1342.09 | bwd_inner_microstep: 1342.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3486
[2024-06-10 16:35:12,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.41 | bwd_microstep: 1505.99 | bwd_inner_microstep: 1505.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2400
[2024-06-10 16:35:13,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.90 | bwd_microstep: 1034.26 | bwd_inner_microstep: 1034.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461
[2024-06-10 16:35:15,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1310.97 | bwd_inner_microstep: 1310.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638
[2024-06-10 16:35:17,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.07 | bwd_microstep: 1426.72 | bwd_inner_microstep: 1426.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819
[2024-06-10 16:35:19,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.24 | bwd_microstep: 1386.94 | bwd_inner_microstep: 1386.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 16:35:21,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1414.95 | bwd_inner_microstep: 1414.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839
[2024-06-10 16:35:23,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.22 | bwd_microstep: 1453.51 | bwd_inner_microstep: 1453.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2116
[2024-06-10 16:35:24,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.86 | bwd_microstep: 861.95 | bwd_inner_microstep: 861.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426
[2024-06-10 16:35:26,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.08 | bwd_microstep: 1542.55 | bwd_inner_microstep: 1542.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 16:35:29,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.75 | bwd_microstep: 1644.80 | bwd_inner_microstep: 1644.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 16:35:31,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.30 | bwd_microstep: 1409.52 | bwd_inner_microstep: 1409.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3565
[2024-06-10 16:35:33,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.83 | bwd_microstep: 1359.55 | bwd_inner_microstep: 1359.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 16:35:35,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1454.68 | bwd_inner_microstep: 1454.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3591
[2024-06-10 16:35:37,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.57 | bwd_microstep: 1463.18 | bwd_inner_microstep: 1463.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 16:35:39,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.07 | bwd_microstep: 1540.46 | bwd_inner_microstep: 1540.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 16:35:41,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1397.37 | bwd_inner_microstep: 1397.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045
[2024-06-10 16:35:42,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 903.95 | bwd_inner_microstep: 903.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 16:35:45,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 16:35:45,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.11 | bwd_microstep: 2143.89 | bwd_inner_microstep: 1677.53 | bwd_allreduce_microstep: 466.30 | step_microstep: 37.91
[2024-06-10 16:35:45,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15770.44 | bwd: 42702.44 | bwd_inner: 42235.13 | bwd_allreduce: 466.57 | step: 39.39
{'loss': 1.2107, 'learning_rate': 1.8649805062832697e-05, 'epoch': 0.54}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 16:35:47,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.56 | bwd_microstep: 1479.76 | bwd_inner_microstep: 1479.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935
[2024-06-10 16:35:48,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.56 | bwd_microstep: 725.13 | bwd_inner_microstep: 725.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 16:35:50,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.47 | bwd_microstep: 1556.09 | bwd_inner_microstep: 1556.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3778
[2024-06-10 16:35:52,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.09 | bwd_microstep: 1579.54 | bwd_inner_microstep: 1579.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014
[2024-06-10 16:35:53,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.70 | bwd_microstep: 802.98 | bwd_inner_microstep: 802.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1892
[2024-06-10 16:35:54,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.38 | bwd_microstep: 710.68 | bwd_inner_microstep: 710.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 16:35:55,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.41 | bwd_microstep: 696.75 | bwd_inner_microstep: 696.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3489
[2024-06-10 16:35:57,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.29 | bwd_microstep: 1432.39 | bwd_inner_microstep: 1432.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 16:35:59,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.08 | bwd_microstep: 1534.27 | bwd_inner_microstep: 1534.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192
[2024-06-10 16:36:01,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.18 | bwd_microstep: 953.72 | bwd_inner_microstep: 953.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-10 16:36:02,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.93 | bwd_microstep: 798.83 | bwd_inner_microstep: 798.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678
[2024-06-10 16:36:04,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.16 | bwd_microstep: 1447.52 | bwd_inner_microstep: 1447.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3528
[2024-06-10 16:36:06,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.89 | bwd_microstep: 1583.29 | bwd_inner_microstep: 1583.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 16:36:08,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1242.78 | bwd_inner_microstep: 1242.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 16:36:10,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1496.74 | bwd_inner_microstep: 1496.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2128
[2024-06-10 16:36:11,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.59 | bwd_microstep: 797.50 | bwd_inner_microstep: 797.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 16:36:13,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1496.51 | bwd_inner_microstep: 1496.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 16:36:14,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.53 | bwd_microstep: 802.20 | bwd_inner_microstep: 802.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623
[2024-06-10 16:36:16,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.11 | bwd_microstep: 1310.95 | bwd_inner_microstep: 1310.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3724
[2024-06-10 16:36:18,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1367.76 | bwd_inner_microstep: 1367.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 16:36:20,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3870
[2024-06-10 16:36:21,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.29 | bwd_microstep: 1402.40 | bwd_inner_microstep: 1402.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 16:36:24,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1555.26 | bwd_inner_microstep: 1555.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 16:36:26,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1397.29 | bwd_inner_microstep: 1397.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 747
[2024-06-10 16:36:26,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.98 | bwd_microstep: 300.58 | bwd_inner_microstep: 300.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 16:36:28,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.74 | bwd_microstep: 1501.92 | bwd_inner_microstep: 1501.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 16:36:30,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1356.35 | bwd_inner_microstep: 1356.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613
[2024-06-10 16:36:32,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3571
[2024-06-10 16:36:34,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.81 | bwd_microstep: 1557.21 | bwd_inner_microstep: 1557.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 16:36:36,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.07 | bwd_microstep: 1542.33 | bwd_inner_microstep: 1542.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2588
[2024-06-10 16:36:37,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.37 | bwd_microstep: 980.11 | bwd_inner_microstep: 980.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 16:36:42,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 16:36:42,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.66 | bwd_microstep: 4128.33 | bwd_inner_microstep: 1687.02 | bwd_allreduce_microstep: 2441.26 | step_microstep: 37.94
[2024-06-10 16:36:42,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14836.09 | bwd: 42189.37 | bwd_inner: 39747.21 | bwd_allreduce: 2441.49 | step: 39.35
{'loss': 1.1955, 'learning_rate': 1.861235912952697e-05, 'epoch': 0.54}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1891
[2024-06-10 16:36:43,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.27 | bwd_microstep: 861.49 | bwd_inner_microstep: 861.40 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 949
[2024-06-10 16:36:44,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.56 | bwd_microstep: 381.07 | bwd_inner_microstep: 381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-10 16:36:46,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1405.14 | bwd_inner_microstep: 1405.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3854
[2024-06-10 16:36:48,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.51 | bwd_microstep: 1397.25 | bwd_inner_microstep: 1397.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3742
[2024-06-10 16:36:49,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.41 | bwd_microstep: 1299.19 | bwd_inner_microstep: 1299.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2436
[2024-06-10 16:36:51,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.10 | bwd_microstep: 851.25 | bwd_inner_microstep: 851.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 16:36:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3610
[2024-06-10 16:36:54,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1360.60 | bwd_inner_microstep: 1360.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 16:36:55,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.45 | bwd_microstep: 796.99 | bwd_inner_microstep: 796.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 16:36:57,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.46 | bwd_microstep: 1288.13 | bwd_inner_microstep: 1288.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692
[2024-06-10 16:36:59,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.89 | bwd_microstep: 1458.15 | bwd_inner_microstep: 1458.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 16:37:01,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.81 | bwd_microstep: 1291.02 | bwd_inner_microstep: 1291.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518
[2024-06-10 16:37:03,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1582.13 | bwd_inner_microstep: 1582.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2492
[2024-06-10 16:37:05,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.64 | bwd_microstep: 1054.03 | bwd_inner_microstep: 1054.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2142
[2024-06-10 16:37:06,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.73 | bwd_microstep: 926.97 | bwd_inner_microstep: 926.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 16:37:08,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1481.89 | bwd_inner_microstep: 1481.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3555
[2024-06-10 16:37:10,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.19 | bwd_microstep: 1234.47 | bwd_inner_microstep: 1234.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3631
[2024-06-10 16:37:12,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.46 | bwd_microstep: 1441.74 | bwd_inner_microstep: 1441.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 16:37:14,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.19 | bwd_microstep: 1418.92 | bwd_inner_microstep: 1418.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294
[2024-06-10 16:37:15,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.15 | bwd_microstep: 1075.52 | bwd_inner_microstep: 1075.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3707
[2024-06-10 16:37:17,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1388.13 | bwd_inner_microstep: 1388.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514
[2024-06-10 16:37:19,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.40 | bwd_microstep: 1416.93 | bwd_inner_microstep: 1416.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 16:37:21,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 16:37:23,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1554.87 | bwd_inner_microstep: 1554.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 16:37:25,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1385.02 | bwd_inner_microstep: 1384.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678
[2024-06-10 16:37:27,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.19 | bwd_microstep: 1626.80 | bwd_inner_microstep: 1626.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3571
[2024-06-10 16:37:29,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.07 | bwd_microstep: 1443.37 | bwd_inner_microstep: 1443.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3762
[2024-06-10 16:37:31,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.68 | bwd_microstep: 1590.21 | bwd_inner_microstep: 1590.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-10 16:37:33,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1367.53 | bwd_inner_microstep: 1367.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3579
[2024-06-10 16:37:36,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.95 | bwd_microstep: 1612.57 | bwd_inner_microstep: 1612.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3432
[2024-06-10 16:37:37,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.72 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3807
[2024-06-10 16:37:44,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.36 | optimizer_step: 6.58
[2024-06-10 16:37:44,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.55 | bwd_microstep: 5497.73 | bwd_inner_microstep: 1446.33 | bwd_allreduce_microstep: 4051.33 | step_microstep: 38.81
[2024-06-10 16:37:44,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15519.88 | bwd: 45681.28 | bwd_inner: 41628.95 | bwd_allreduce: 4051.61 | step: 40.31
{'loss': 1.2165, 'learning_rate': 1.8574918083485173e-05, 'epoch': 0.54}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-10 16:37:45,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.80 | bwd_microstep: 1139.55 | bwd_inner_microstep: 1139.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 4015
[2024-06-10 16:37:47,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.46 | bwd_microstep: 1314.42 | bwd_inner_microstep: 1314.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3927
[2024-06-10 16:37:49,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.20 | bwd_microstep: 1591.04 | bwd_inner_microstep: 1591.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3902
[2024-06-10 16:37:51,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.67 | bwd_microstep: 1686.67 | bwd_inner_microstep: 1686.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794
[2024-06-10 16:37:53,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 1452.34 | bwd_inner_microstep: 1452.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3870
[2024-06-10 16:37:56,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.06 | bwd_microstep: 1664.44 | bwd_inner_microstep: 1664.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 16:37:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.51 | bwd_microstep: 1539.26 | bwd_inner_microstep: 1539.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 16:38:00,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.05 | bwd_microstep: 1548.51 | bwd_inner_microstep: 1548.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 16:38:02,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.36 | bwd_microstep: 1391.32 | bwd_inner_microstep: 1391.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 16:38:04,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.09 | bwd_microstep: 1244.49 | bwd_inner_microstep: 1244.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 16:38:05,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1252.54 | bwd_inner_microstep: 1252.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1970
[2024-06-10 16:38:07,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.65 | bwd_microstep: 826.94 | bwd_inner_microstep: 826.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-10 16:38:08,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.78 | bwd_microstep: 801.75 | bwd_inner_microstep: 801.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 16:38:09,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1254.60 | bwd_inner_microstep: 1254.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495
[2024-06-10 16:38:11,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1446.45 | bwd_inner_microstep: 1446.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 16:38:13,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.60 | bwd_microstep: 1488.17 | bwd_inner_microstep: 1488.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3669
[2024-06-10 16:38:16,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.70 | bwd_microstep: 1550.92 | bwd_inner_microstep: 1550.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058
[2024-06-10 16:38:17,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.88 | bwd_microstep: 818.70 | bwd_inner_microstep: 818.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 16:38:19,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.40 | bwd_microstep: 1311.12 | bwd_inner_microstep: 1311.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 16:38:21,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1494.64 | bwd_inner_microstep: 1494.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 16:38:23,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1478.89 | bwd_inner_microstep: 1478.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 16:38:24,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1346.61 | bwd_inner_microstep: 1346.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 16:38:27,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1498.20 | bwd_inner_microstep: 1498.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3479
[2024-06-10 16:38:29,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.75 | bwd_microstep: 1575.30 | bwd_inner_microstep: 1575.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 16:38:31,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1402.77 | bwd_inner_microstep: 1402.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009
[2024-06-10 16:38:32,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.26 | bwd_microstep: 896.38 | bwd_inner_microstep: 896.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2076
[2024-06-10 16:38:33,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.83 | bwd_microstep: 1014.58 | bwd_inner_microstep: 1014.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2278
[2024-06-10 16:38:35,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.04 | bwd_microstep: 1070.68 | bwd_inner_microstep: 1070.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 16:38:37,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1398.25 | bwd_inner_microstep: 1398.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-10 16:38:39,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1306.81 | bwd_inner_microstep: 1306.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2057
[2024-06-10 16:38:40,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.16 | bwd_microstep: 915.43 | bwd_inner_microstep: 915.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3586
[2024-06-10 16:38:47,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.22 | optimizer_step: 6.62
[2024-06-10 16:38:47,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.33 | bwd_microstep: 6672.00 | bwd_inner_microstep: 1537.66 | bwd_allreduce_microstep: 5134.29 | step_microstep: 37.87
[2024-06-10 16:38:47,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15741.90 | bwd: 47393.78 | bwd_inner: 42258.59 | bwd_allreduce: 5134.52 | step: 39.36
 53%|█████▎    | 923/1726 [15:56:22<13:42:58, 61.49s/it]
 54%|█████▎    | 924/1726 [15:57:23<13:37:11, 61.14s/it]
 54%|█████▎    | 925/1726 [15:58:21<13:26:50, 60.44s/it]
 54%|█████▎    | 926/1726 [15:59:19<13:13:29, 59.51s/it]
 54%|█████▎    | 927/1726 [16:00:20<13:20:35, 60.12s/it]
{'loss': 1.2519, 'learning_rate': 1.853748205657448e-05, 'epoch': 0.54}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3477
[2024-06-10 16:38:49,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1565.39 | bwd_inner_microstep: 1565.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3393
[2024-06-10 16:38:51,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.93 | bwd_microstep: 1205.51 | bwd_inner_microstep: 1205.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893
[2024-06-10 16:38:53,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.96 | bwd_microstep: 1679.55 | bwd_inner_microstep: 1679.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2649
[2024-06-10 16:38:55,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.03 | bwd_microstep: 1114.57 | bwd_inner_microstep: 1114.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2311
[2024-06-10 16:38:56,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.08 | bwd_microstep: 979.65 | bwd_inner_microstep: 979.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 16:38:58,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.74 | bwd_microstep: 1244.33 | bwd_inner_microstep: 1244.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 16:38:59,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.31 | bwd_microstep: 798.18 | bwd_inner_microstep: 798.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 16:39:00,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.18 | bwd_microstep: 792.51 | bwd_inner_microstep: 792.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 16:39:02,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.38 | bwd_microstep: 1312.24 | bwd_inner_microstep: 1312.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 16:39:04,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1248.29 | bwd_inner_microstep: 1248.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3682
[2024-06-10 16:39:06,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.40 | bwd_microstep: 1475.42 | bwd_inner_microstep: 1475.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3412
[2024-06-10 16:39:08,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.03 | bwd_microstep: 1472.87 | bwd_inner_microstep: 1472.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 16:39:10,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.48 | bwd_microstep: 1451.90 | bwd_inner_microstep: 1451.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3665
[2024-06-10 16:39:12,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.40 | bwd_microstep: 1672.27 | bwd_inner_microstep: 1672.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 16:39:14,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.27 | bwd_microstep: 1618.61 | bwd_inner_microstep: 1618.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 16:39:15,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.51 | bwd_microstep: 796.70 | bwd_inner_microstep: 796.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3632
[2024-06-10 16:39:17,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1314.15 | bwd_inner_microstep: 1314.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-10 16:39:19,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.99 | bwd_microstep: 1307.83 | bwd_inner_microstep: 1307.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 16:39:21,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1254.90 | bwd_inner_microstep: 1254.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 16:39:22,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.10 | bwd_microstep: 1385.01 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473
[2024-06-10 16:39:24,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1330.19 | bwd_inner_microstep: 1330.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 16:39:26,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1411.44 | bwd_inner_microstep: 1411.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903
[2024-06-10 16:39:27,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.67 | bwd_microstep: 685.89 | bwd_inner_microstep: 685.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 16:39:29,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.68 | bwd_microstep: 1562.41 | bwd_inner_microstep: 1562.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2082
[2024-06-10 16:39:31,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.32 | bwd_microstep: 920.44 | bwd_inner_microstep: 920.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2300
[2024-06-10 16:39:32,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.83 | bwd_microstep: 882.25 | bwd_inner_microstep: 882.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1904
[2024-06-10 16:39:33,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.27 | bwd_microstep: 716.30 | bwd_inner_microstep: 716.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2073
[2024-06-10 16:39:34,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.68 | bwd_microstep: 1012.66 | bwd_inner_microstep: 1012.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820
[2024-06-10 16:39:36,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1394.06 | bwd_inner_microstep: 1394.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-10 16:39:38,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.75 | bwd_microstep: 971.72 | bwd_inner_microstep: 971.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3589
[2024-06-10 16:39:40,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.19 | bwd_microstep: 1806.51 | bwd_inner_microstep: 1806.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3821
[2024-06-10 16:39:49,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-10 16:39:49,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.43 | bwd_microstep: 8695.23 | bwd_inner_microstep: 1716.51 | bwd_allreduce_microstep: 6978.68 | step_microstep: 38.07
[2024-06-10 16:39:49,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14898.95 | bwd: 47079.00 | bwd_inner: 40099.41 | bwd_allreduce: 6978.90 | step: 39.55
{'loss': 1.2251, 'learning_rate': 1.8500051180644388e-05, 'epoch': 0.54}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 16:39:51,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.63 | bwd_microstep: 1280.97 | bwd_inner_microstep: 1280.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4049
[2024-06-10 16:39:53,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.30 | bwd_microstep: 1511.11 | bwd_inner_microstep: 1511.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3900
[2024-06-10 16:39:55,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.29 | bwd_microstep: 1479.19 | bwd_inner_microstep: 1479.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009
[2024-06-10 16:39:56,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.09 | bwd_microstep: 799.64 | bwd_inner_microstep: 799.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 16:39:58,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.01 | bwd_microstep: 1537.72 | bwd_inner_microstep: 1537.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2238
[2024-06-10 16:40:00,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.58 | bwd_microstep: 861.87 | bwd_inner_microstep: 861.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 16:40:01,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.98 | bwd_microstep: 788.62 | bwd_inner_microstep: 788.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767
[2024-06-10 16:40:03,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.68 | bwd_microstep: 1536.56 | bwd_inner_microstep: 1536.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 16:40:04,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 796.40 | bwd_inner_microstep: 796.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3691
[2024-06-10 16:40:06,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.93 | bwd_microstep: 1325.77 | bwd_inner_microstep: 1325.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 16:40:08,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1421.76 | bwd_inner_microstep: 1421.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2479
[2024-06-10 16:40:09,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.54 | bwd_microstep: 1047.44 | bwd_inner_microstep: 1047.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085
[2024-06-10 16:40:10,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.47 | bwd_microstep: 853.28 | bwd_inner_microstep: 853.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400
[2024-06-10 16:40:12,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.60 | bwd_microstep: 1372.86 | bwd_inner_microstep: 1372.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3952
[2024-06-10 16:40:15,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 688.16 | bwd_microstep: 1895.27 | bwd_inner_microstep: 1895.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3898
[2024-06-10 16:40:17,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 1493.32 | bwd_inner_microstep: 1493.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 16:40:19,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.79 | bwd_microstep: 1385.49 | bwd_inner_microstep: 1385.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 16:40:21,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1511.70 | bwd_inner_microstep: 1511.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3534
[2024-06-10 16:40:23,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.31 | bwd_microstep: 1196.35 | bwd_inner_microstep: 1196.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 16:40:25,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.69 | bwd_microstep: 1515.72 | bwd_inner_microstep: 1515.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3836
[2024-06-10 16:40:27,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.06 | bwd_microstep: 1725.57 | bwd_inner_microstep: 1725.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 16:40:28,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.11 | bwd_microstep: 788.01 | bwd_inner_microstep: 787.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 16:40:30,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1348.32 | bwd_inner_microstep: 1348.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 16:40:32,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.89 | bwd_microstep: 1159.18 | bwd_inner_microstep: 1159.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 16:40:34,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.27 | bwd_microstep: 1439.68 | bwd_inner_microstep: 1439.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3539
[2024-06-10 16:40:35,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.59 | bwd_microstep: 1231.48 | bwd_inner_microstep: 1231.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 16:40:37,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1403.54 | bwd_inner_microstep: 1403.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 16:40:39,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.41 | bwd_microstep: 1443.78 | bwd_inner_microstep: 1443.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3800
[2024-06-10 16:40:41,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.69 | bwd_microstep: 1614.73 | bwd_inner_microstep: 1614.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3478
[2024-06-10 16:40:43,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1403.15 | bwd_inner_microstep: 1403.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230
[2024-06-10 16:40:45,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.77 | bwd_microstep: 961.57 | bwd_inner_microstep: 961.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3576
[2024-06-10 16:40:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.07 | optimizer_step: 6.62
[2024-06-10 16:40:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.04 | bwd_microstep: 4998.07 | bwd_inner_microstep: 1840.80 | bwd_allreduce_microstep: 3157.22 | step_microstep: 37.71
[2024-06-10 16:40:50,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15630.84 | bwd: 45128.14 | bwd_inner: 41970.02 | bwd_allreduce: 3157.45 | step: 39.17
{'loss': 1.2351, 'learning_rate': 1.846262558752623e-05, 'epoch': 0.54}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3472
[2024-06-10 16:40:53,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.70 | bwd_microstep: 1564.37 | bwd_inner_microstep: 1564.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2430
[2024-06-10 16:40:54,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.80 | bwd_microstep: 908.76 | bwd_inner_microstep: 908.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3854
[2024-06-10 16:40:56,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1361.55 | bwd_inner_microstep: 1361.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3895
[2024-06-10 16:40:58,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.97 | bwd_microstep: 1478.93 | bwd_inner_microstep: 1478.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271
[2024-06-10 16:40:59,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.63 | bwd_microstep: 874.68 | bwd_inner_microstep: 874.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3479
[2024-06-10 16:41:01,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.96 | bwd_microstep: 1405.41 | bwd_inner_microstep: 1405.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 16:41:03,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.39 | bwd_microstep: 1153.54 | bwd_inner_microstep: 1153.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-10 16:41:05,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.49 | bwd_microstep: 1524.51 | bwd_inner_microstep: 1524.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3608
[2024-06-10 16:41:06,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.55 | bwd_microstep: 1214.40 | bwd_inner_microstep: 1214.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 16:41:08,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.00 | bwd_microstep: 1390.18 | bwd_inner_microstep: 1390.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 16:41:09,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.02 | bwd_microstep: 795.11 | bwd_inner_microstep: 795.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3458
[2024-06-10 16:41:11,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.20 | bwd_microstep: 1339.76 | bwd_inner_microstep: 1339.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925
[2024-06-10 16:41:12,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.71 | bwd_microstep: 726.85 | bwd_inner_microstep: 726.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3690
[2024-06-10 16:41:14,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.44 | bwd_microstep: 1586.55 | bwd_inner_microstep: 1586.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 16:41:16,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.12 | bwd_microstep: 1383.61 | bwd_inner_microstep: 1383.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3462
[2024-06-10 16:41:18,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1421.21 | bwd_inner_microstep: 1421.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2643
[2024-06-10 16:41:20,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.04 | bwd_microstep: 1113.08 | bwd_inner_microstep: 1113.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 16:41:22,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1387.42 | bwd_inner_microstep: 1387.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-10 16:41:24,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.92 | bwd_microstep: 1311.03 | bwd_inner_microstep: 1311.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 16:41:26,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.26 | bwd_microstep: 1488.36 | bwd_inner_microstep: 1488.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-10 16:41:27,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1325.15 | bwd_inner_microstep: 1325.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 16:41:29,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.84 | bwd_microstep: 1491.63 | bwd_inner_microstep: 1491.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 16:41:31,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1391.60 | bwd_inner_microstep: 1391.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 16:41:33,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1257.27 | bwd_inner_microstep: 1257.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 16:41:35,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.83 | bwd_microstep: 1494.18 | bwd_inner_microstep: 1494.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 16:41:37,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.60 | bwd_microstep: 1555.14 | bwd_inner_microstep: 1555.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 16:41:39,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.89 | bwd_microstep: 1293.61 | bwd_inner_microstep: 1293.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 16:41:41,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1399.38 | bwd_inner_microstep: 1399.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 16:41:43,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1389.84 | bwd_inner_microstep: 1389.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 16:41:45,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1400.47 | bwd_inner_microstep: 1400.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3807
[2024-06-10 16:41:47,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.85 | bwd_microstep: 1517.85 | bwd_inner_microstep: 1517.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2029
[2024-06-10 16:41:52,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-10 16:41:52,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.42 | bwd_microstep: 4192.71 | bwd_inner_microstep: 1032.19 | bwd_allreduce_microstep: 3160.46 | step_microstep: 37.97
[2024-06-10 16:41:52,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15739.00 | bwd: 45138.16 | bwd_inner: 41976.79 | bwd_allreduce: 3160.69 | step: 39.46
{'loss': 1.2234, 'learning_rate': 1.8425205409032767e-05, 'epoch': 0.54}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1923
[2024-06-10 16:41:53,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.40 | bwd_microstep: 817.90 | bwd_inner_microstep: 817.80 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 16:41:55,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.41 | bwd_microstep: 1477.46 | bwd_inner_microstep: 1477.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 16:41:57,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.03 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3582
[2024-06-10 16:41:58,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1329.21 | bwd_inner_microstep: 1329.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4165
[2024-06-10 16:42:01,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.87 | bwd_microstep: 1646.89 | bwd_inner_microstep: 1646.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3773
[2024-06-10 16:42:03,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1465.92 | bwd_inner_microstep: 1465.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 16:42:04,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1278.82 | bwd_inner_microstep: 1278.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477
[2024-06-10 16:42:06,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.20 | bwd_microstep: 1213.77 | bwd_inner_microstep: 1213.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3708
[2024-06-10 16:42:08,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.03 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-10 16:42:10,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.27 | bwd_microstep: 1626.09 | bwd_inner_microstep: 1626.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3854
[2024-06-10 16:42:13,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.59 | bwd_microstep: 1767.02 | bwd_inner_microstep: 1767.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3674
[2024-06-10 16:42:15,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.79 | bwd_microstep: 1355.85 | bwd_inner_microstep: 1355.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 16:42:16,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1350.21 | bwd_inner_microstep: 1350.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3657
[2024-06-10 16:42:19,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.27 | bwd_microstep: 1682.31 | bwd_inner_microstep: 1682.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3719
[2024-06-10 16:42:21,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.51 | bwd_microstep: 1470.36 | bwd_inner_microstep: 1470.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666
[2024-06-10 16:42:23,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.19 | bwd_microstep: 1325.59 | bwd_inner_microstep: 1325.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 16:42:25,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1396.95 | bwd_inner_microstep: 1396.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 16:42:26,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.16 | bwd_microstep: 1181.95 | bwd_inner_microstep: 1181.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3953
[2024-06-10 16:42:28,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.52 | bwd_microstep: 1626.97 | bwd_inner_microstep: 1626.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 892
[2024-06-10 16:42:29,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.04 | bwd_microstep: 369.84 | bwd_inner_microstep: 369.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2115
[2024-06-10 16:42:30,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.43 | bwd_microstep: 927.38 | bwd_inner_microstep: 927.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 16:42:32,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1529.67 | bwd_inner_microstep: 1529.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459
[2024-06-10 16:42:34,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.87 | bwd_microstep: 1441.32 | bwd_inner_microstep: 1441.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-10 16:42:36,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.87 | bwd_microstep: 806.16 | bwd_inner_microstep: 806.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2185
[2024-06-10 16:42:37,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.44 | bwd_microstep: 890.03 | bwd_inner_microstep: 890.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 16:42:39,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.74 | bwd_microstep: 1559.69 | bwd_inner_microstep: 1559.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1940
[2024-06-10 16:42:40,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.88 | bwd_microstep: 761.30 | bwd_inner_microstep: 761.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 16:42:42,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 1509.20 | bwd_inner_microstep: 1509.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 16:42:43,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.31 | bwd_microstep: 878.65 | bwd_inner_microstep: 878.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 16:42:46,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.84 | bwd_microstep: 1658.43 | bwd_inner_microstep: 1658.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796
[2024-06-10 16:42:48,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1447.30 | bwd_inner_microstep: 1447.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575
[2024-06-10 16:42:54,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.12 | optimizer_step: 6.58
[2024-06-10 16:42:54,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.15 | bwd_microstep: 5477.33 | bwd_inner_microstep: 1678.14 | bwd_allreduce_microstep: 3799.14 | step_microstep: 37.88
[2024-06-10 16:42:54,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15733.06 | bwd: 45945.09 | bwd_inner: 42144.97 | bwd_allreduce: 3799.41 | step: 39.40
{'loss': 1.2131, 'learning_rate': 1.838779077695766e-05, 'epoch': 0.54}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385
[2024-06-10 16:42:55,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.49 | bwd_microstep: 1330.36 | bwd_inner_microstep: 1330.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 16:42:57,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.09 | bwd_microstep: 1475.54 | bwd_inner_microstep: 1475.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 16:42:59,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1338.32 | bwd_inner_microstep: 1338.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781
[2024-06-10 16:43:02,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.57 | bwd_microstep: 1641.01 | bwd_inner_microstep: 1640.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 16:43:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.04 | bwd_microstep: 1388.11 | bwd_inner_microstep: 1388.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 16:43:05,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.62 | bwd_microstep: 1282.93 | bwd_inner_microstep: 1282.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 16:43:06,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.34 | bwd_microstep: 791.49 | bwd_inner_microstep: 791.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 16:43:08,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.37 | bwd_microstep: 1158.49 | bwd_inner_microstep: 1158.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498
[2024-06-10 16:43:10,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1415.99 | bwd_inner_microstep: 1415.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3715
[2024-06-10 16:43:12,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.99 | bwd_microstep: 1699.56 | bwd_inner_microstep: 1699.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1993
[2024-06-10 16:43:13,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.14 | bwd_microstep: 828.84 | bwd_inner_microstep: 828.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2186
[2024-06-10 16:43:15,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.67 | bwd_microstep: 1048.18 | bwd_inner_microstep: 1048.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497
[2024-06-10 16:43:17,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.87 | bwd_microstep: 1548.76 | bwd_inner_microstep: 1548.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 16:43:19,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.37 | bwd_microstep: 1335.48 | bwd_inner_microstep: 1335.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 16:43:21,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1390.32 | bwd_inner_microstep: 1390.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-10 16:43:23,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.64 | bwd_microstep: 1314.18 | bwd_inner_microstep: 1314.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 943
[2024-06-10 16:43:23,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 162.00 | bwd_microstep: 412.55 | bwd_inner_microstep: 412.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3610
[2024-06-10 16:43:25,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.55 | bwd_microstep: 1536.72 | bwd_inner_microstep: 1536.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834
[2024-06-10 16:43:28,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.74 | bwd_microstep: 1754.98 | bwd_inner_microstep: 1754.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3610
[2024-06-10 16:43:30,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.68 | bwd_microstep: 1807.84 | bwd_inner_microstep: 1807.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 16:43:32,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.40 | bwd_microstep: 1397.23 | bwd_inner_microstep: 1397.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 16:43:34,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1283.15 | bwd_inner_microstep: 1283.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3835
[2024-06-10 16:43:36,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.82 | bwd_microstep: 1618.45 | bwd_inner_microstep: 1618.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 16:43:38,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.00 | bwd_microstep: 1650.98 | bwd_inner_microstep: 1650.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 16:43:40,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 1449.81 | bwd_inner_microstep: 1449.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 16:43:42,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1340.24 | bwd_inner_microstep: 1340.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914
[2024-06-10 16:43:43,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.44 | bwd_microstep: 686.27 | bwd_inner_microstep: 686.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 16:43:45,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.25 | bwd_microstep: 1488.04 | bwd_inner_microstep: 1488.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2189
[2024-06-10 16:43:46,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.85 | bwd_microstep: 765.26 | bwd_inner_microstep: 765.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2196
[2024-06-10 16:43:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.18 | bwd_microstep: 958.12 | bwd_inner_microstep: 958.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 16:43:50,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1406.04 | bwd_inner_microstep: 1406.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 16:43:55,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-10 16:43:55,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.45 | bwd_microstep: 5237.04 | bwd_inner_microstep: 1460.13 | bwd_allreduce_microstep: 3776.84 | step_microstep: 38.84
[2024-06-10 16:43:55,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15634.44 | bwd: 45780.30 | bwd_inner: 42002.54 | bwd_allreduce: 3777.08 | step: 40.28
 54%|█████▍    | 928/1726 [16:01:24<13:32:57, 61.12s/it]
 54%|█████▍    | 929/1726 [16:02:26<13:36:38, 61.48s/it]
 54%|█████▍    | 930/1726 [16:03:27<13:34:04, 61.36s/it]
 54%|█████▍    | 931/1726 [16:04:28<13:32:23, 61.31s/it]
 54%|█████▍    | 932/1726 [16:05:30<13:34:07, 61.52s/it]
 54%|█████▍    | 933/1726 [16:06:32<13:34:00, 61.59
{'loss': 1.2285, 'learning_rate': 1.8350381823075062e-05, 'epoch': 0.54}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 16:43:57,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1387.33 | bwd_inner_microstep: 1387.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3930
[2024-06-10 16:43:59,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.37 | bwd_microstep: 1487.90 | bwd_inner_microstep: 1487.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3894
[2024-06-10 16:44:02,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.23 | bwd_microstep: 1583.03 | bwd_inner_microstep: 1583.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4303
[2024-06-10 16:44:04,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.70 | bwd_microstep: 1778.65 | bwd_inner_microstep: 1778.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 16:44:06,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1285.58 | bwd_inner_microstep: 1285.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3536
[2024-06-10 16:44:07,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.76 | bwd_microstep: 1197.19 | bwd_inner_microstep: 1197.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 16:44:09,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1396.06 | bwd_inner_microstep: 1396.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 16:44:11,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.53 | bwd_microstep: 1447.75 | bwd_inner_microstep: 1447.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 16:44:13,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.25 | bwd_microstep: 1390.53 | bwd_inner_microstep: 1390.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 16:44:15,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1345.99 | bwd_inner_microstep: 1345.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3493
[2024-06-10 16:44:17,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1575.84 | bwd_inner_microstep: 1575.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2381
[2024-06-10 16:44:19,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.44 | bwd_microstep: 929.76 | bwd_inner_microstep: 929.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-10 16:44:21,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1432.14 | bwd_inner_microstep: 1432.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3659
[2024-06-10 16:44:23,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.30 | bwd_microstep: 1822.69 | bwd_inner_microstep: 1822.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 16:44:25,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.29 | bwd_microstep: 1503.00 | bwd_inner_microstep: 1502.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 16:44:27,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.02 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 16:44:29,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1490.54 | bwd_inner_microstep: 1490.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 16:44:31,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1491.34 | bwd_inner_microstep: 1491.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3643
[2024-06-10 16:44:33,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.96 | bwd_microstep: 1345.79 | bwd_inner_microstep: 1345.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3852
[2024-06-10 16:44:35,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.92 | bwd_microstep: 1665.35 | bwd_inner_microstep: 1665.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 16:44:37,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.17 | bwd_microstep: 1384.32 | bwd_inner_microstep: 1384.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 16:44:39,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.80 | bwd_microstep: 1352.73 | bwd_inner_microstep: 1352.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900
[2024-06-10 16:44:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.31 | bwd_microstep: 685.49 | bwd_inner_microstep: 685.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-10 16:44:42,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.94 | bwd_microstep: 1423.76 | bwd_inner_microstep: 1423.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3479
[2024-06-10 16:44:44,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.57 | bwd_microstep: 1245.77 | bwd_inner_microstep: 1245.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3609
[2024-06-10 16:44:46,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.95 | bwd_microstep: 1309.14 | bwd_inner_microstep: 1309.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 16:44:47,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.37 | bwd_microstep: 805.69 | bwd_inner_microstep: 805.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 16:44:49,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.98 | bwd_microstep: 1294.20 | bwd_inner_microstep: 1294.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3832
[2024-06-10 16:44:51,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.43 | bwd_microstep: 1616.33 | bwd_inner_microstep: 1616.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 16:44:53,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.08 | bwd_microstep: 1553.14 | bwd_inner_microstep: 1553.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 16:44:55,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1348.55 | bwd_inner_microstep: 1348.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 16:44:57,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-10 16:44:57,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.87 | bwd_microstep: 1583.37 | bwd_inner_microstep: 1575.67 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.84
[2024-06-10 16:44:57,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16689.32 | bwd: 44648.09 | bwd_inner: 44639.55 | bwd_allreduce: 7.87 | step: 39.34
{'loss': 1.2199, 'learning_rate': 1.831297867913911e-05, 'epoch': 0.54}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3556
[2024-06-10 16:44:59,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.54 | bwd_microstep: 1591.44 | bwd_inner_microstep: 1591.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 16:45:01,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.59 | bwd_microstep: 1276.24 | bwd_inner_microstep: 1276.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448
[2024-06-10 16:45:03,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1415.65 | bwd_inner_microstep: 1415.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2317
[2024-06-10 16:45:04,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.75 | bwd_microstep: 931.66 | bwd_inner_microstep: 931.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 16:45:06,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.80 | bwd_microstep: 1246.58 | bwd_inner_microstep: 1246.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799
[2024-06-10 16:45:08,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.18 | bwd_microstep: 1549.49 | bwd_inner_microstep: 1549.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 16:45:10,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.85 | bwd_microstep: 1353.63 | bwd_inner_microstep: 1353.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3565
[2024-06-10 16:45:12,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1461.39 | bwd_inner_microstep: 1461.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1902
[2024-06-10 16:45:13,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.71 | bwd_microstep: 779.21 | bwd_inner_microstep: 779.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 16:45:15,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.63 | bwd_microstep: 1534.58 | bwd_inner_microstep: 1534.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407
[2024-06-10 16:45:17,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.82 | bwd_microstep: 1209.58 | bwd_inner_microstep: 1209.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3425
[2024-06-10 16:45:19,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1308.53 | bwd_inner_microstep: 1308.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 16:45:21,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1375.05 | bwd_inner_microstep: 1375.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 16:45:22,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1380.18 | bwd_inner_microstep: 1380.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511
[2024-06-10 16:45:24,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1444.52 | bwd_inner_microstep: 1444.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 16:45:27,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1480.98 | bwd_inner_microstep: 1480.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517
[2024-06-10 16:45:29,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.62 | bwd_microstep: 1587.34 | bwd_inner_microstep: 1587.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384
[2024-06-10 16:45:30,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.14 | bwd_microstep: 1240.61 | bwd_inner_microstep: 1240.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650
[2024-06-10 16:45:32,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.29 | bwd_microstep: 1425.84 | bwd_inner_microstep: 1425.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 16:45:35,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.77 | bwd_microstep: 1658.35 | bwd_inner_microstep: 1658.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-10 16:45:37,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.92 | bwd_microstep: 1599.85 | bwd_inner_microstep: 1599.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-10 16:45:39,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 615
[2024-06-10 16:45:39,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.89 | bwd_microstep: 260.58 | bwd_inner_microstep: 260.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 16:45:41,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1409.39 | bwd_inner_microstep: 1409.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 16:45:43,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1299.59 | bwd_inner_microstep: 1299.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 16:45:45,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.39 | bwd_microstep: 1297.56 | bwd_inner_microstep: 1297.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 16:45:47,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.97 | bwd_microstep: 1658.63 | bwd_inner_microstep: 1658.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 16:45:49,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.47 | bwd_microstep: 1449.72 | bwd_inner_microstep: 1449.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2062
[2024-06-10 16:45:50,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.53 | bwd_microstep: 1008.62 | bwd_inner_microstep: 1008.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3438
[2024-06-10 16:45:52,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.67 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3769
[2024-06-10 16:45:55,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1706.02 | bwd_inner_microstep: 1705.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-10 16:45:58,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 16:45:58,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.59 | bwd_microstep: 3152.64 | bwd_inner_microstep: 1830.39 | bwd_allreduce_microstep: 1322.21 | step_microstep: 37.99
[2024-06-10 16:45:58,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16125.90 | bwd: 44921.84 | bwd_inner: 43598.72 | bwd_allreduce: 1322.43 | step: 39.61
{'loss': 1.2554, 'learning_rate': 1.8275581476883472e-05, 'epoch': 0.54}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3552
[2024-06-10 16:46:00,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.55 | bwd_microstep: 1451.20 | bwd_inner_microstep: 1451.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 16:46:03,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.11 | bwd_microstep: 1619.42 | bwd_inner_microstep: 1619.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3864
[2024-06-10 16:46:05,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.96 | bwd_microstep: 1459.24 | bwd_inner_microstep: 1459.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 16:46:07,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1377.16 | bwd_inner_microstep: 1377.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 16:46:09,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1552.12 | bwd_inner_microstep: 1552.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 16:46:11,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1532.87 | bwd_inner_microstep: 1532.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1952
[2024-06-10 16:46:12,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.30 | bwd_microstep: 702.54 | bwd_inner_microstep: 702.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 16:46:14,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1386.15 | bwd_inner_microstep: 1386.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 16:46:15,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.18 | bwd_microstep: 790.59 | bwd_inner_microstep: 790.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2169
[2024-06-10 16:46:16,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.75 | bwd_microstep: 760.05 | bwd_inner_microstep: 760.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 16:46:18,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.67 | bwd_microstep: 1479.95 | bwd_inner_microstep: 1479.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 16:46:20,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1378.40 | bwd_inner_microstep: 1378.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 16:46:22,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1647.19 | bwd_inner_microstep: 1647.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3661
[2024-06-10 16:46:24,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.48 | bwd_microstep: 1661.28 | bwd_inner_microstep: 1661.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 16:46:26,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.99 | bwd_microstep: 1479.52 | bwd_inner_microstep: 1479.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505
[2024-06-10 16:46:29,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.28 | bwd_microstep: 1574.02 | bwd_inner_microstep: 1573.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 16:46:31,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.40 | bwd_microstep: 1599.01 | bwd_inner_microstep: 1598.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3830
[2024-06-10 16:46:33,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.74 | bwd_microstep: 1488.40 | bwd_inner_microstep: 1488.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 16:46:35,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1394.61 | bwd_inner_microstep: 1394.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 16:46:37,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.81 | bwd_microstep: 1282.61 | bwd_inner_microstep: 1282.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 16:46:38,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.10 | bwd_microstep: 808.64 | bwd_inner_microstep: 808.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 16:46:40,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.46 | bwd_microstep: 1498.54 | bwd_inner_microstep: 1498.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 16:46:42,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.81 | bwd_microstep: 1455.67 | bwd_inner_microstep: 1455.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3530
[2024-06-10 16:46:44,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1425.08 | bwd_inner_microstep: 1425.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627
[2024-06-10 16:46:46,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1310.91 | bwd_inner_microstep: 1310.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3783
[2024-06-10 16:46:47,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1395.82 | bwd_inner_microstep: 1395.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2147
[2024-06-10 16:46:49,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.95 | bwd_microstep: 948.65 | bwd_inner_microstep: 948.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3511
[2024-06-10 16:46:51,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1251.85 | bwd_inner_microstep: 1251.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276
[2024-06-10 16:46:52,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.89 | bwd_microstep: 875.93 | bwd_inner_microstep: 875.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3810
[2024-06-10 16:46:54,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1528.59 | bwd_inner_microstep: 1528.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457
[2024-06-10 16:46:56,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.40 | bwd_microstep: 1566.77 | bwd_inner_microstep: 1566.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801
[2024-06-10 16:47:00,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 16:47:00,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.80 | bwd_microstep: 3180.24 | bwd_inner_microstep: 1984.00 | bwd_allreduce_microstep: 1196.19 | step_microstep: 37.66
[2024-06-10 16:47:00,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16230.27 | bwd: 44863.01 | bwd_inner: 43665.93 | bwd_allreduce: 1196.42 | step: 39.17
{'loss': 1.1911, 'learning_rate': 1.823819034802091e-05, 'epoch': 0.54}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 16:47:02,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.30 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3462
[2024-06-10 16:47:04,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.01 | bwd_microstep: 1209.44 | bwd_inner_microstep: 1209.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-10 16:47:06,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.66 | bwd_microstep: 1406.38 | bwd_inner_microstep: 1406.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-10 16:47:08,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.71 | bwd_microstep: 1552.94 | bwd_inner_microstep: 1552.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 16:47:09,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1246.08 | bwd_inner_microstep: 1246.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2702
[2024-06-10 16:47:11,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.56 | bwd_microstep: 938.45 | bwd_inner_microstep: 938.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 16:47:12,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 16:47:14,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 16:47:16,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1281.60 | bwd_inner_microstep: 1281.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3724
[2024-06-10 16:47:18,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.59 | bwd_microstep: 1475.93 | bwd_inner_microstep: 1475.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 16:47:20,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1247.80 | bwd_inner_microstep: 1247.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 16:47:22,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.16 | bwd_microstep: 1339.31 | bwd_inner_microstep: 1339.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 16:47:24,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.39 | bwd_microstep: 1480.43 | bwd_inner_microstep: 1480.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3686
[2024-06-10 16:47:26,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.89 | bwd_microstep: 1551.49 | bwd_inner_microstep: 1551.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 16:47:28,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.77 | bwd_microstep: 1390.75 | bwd_inner_microstep: 1390.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3690
[2024-06-10 16:47:30,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.15 | bwd_microstep: 1424.11 | bwd_inner_microstep: 1424.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2138
[2024-06-10 16:47:31,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.30 | bwd_microstep: 928.74 | bwd_inner_microstep: 928.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637
[2024-06-10 16:47:33,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.52 | bwd_microstep: 1409.60 | bwd_inner_microstep: 1409.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 16:47:35,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1485.69 | bwd_inner_microstep: 1485.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 16:47:37,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1388.21 | bwd_inner_microstep: 1388.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 16:47:39,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.82 | bwd_microstep: 1321.17 | bwd_inner_microstep: 1321.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 16:47:41,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.33 | bwd_microstep: 1558.46 | bwd_inner_microstep: 1558.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 16:47:43,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.53 | bwd_microstep: 1357.94 | bwd_inner_microstep: 1357.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2227
[2024-06-10 16:47:44,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.51 | bwd_microstep: 800.56 | bwd_inner_microstep: 800.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2144
[2024-06-10 16:47:45,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.36 | bwd_microstep: 834.09 | bwd_inner_microstep: 834.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 16:47:47,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1466.42 | bwd_inner_microstep: 1466.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 16:47:49,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.99 | bwd_microstep: 1487.38 | bwd_inner_microstep: 1487.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 16:47:51,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.53 | bwd_microstep: 1251.81 | bwd_inner_microstep: 1251.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3633
[2024-06-10 16:47:53,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.07 | bwd_microstep: 1446.45 | bwd_inner_microstep: 1446.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3545
[2024-06-10 16:47:55,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.49 | bwd_microstep: 1454.27 | bwd_inner_microstep: 1454.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 16:47:57,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1499.31 | bwd_inner_microstep: 1499.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626
[2024-06-10 16:48:02,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 16:48:02,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.50 | bwd_microstep: 4332.41 | bwd_inner_microstep: 1813.98 | bwd_allreduce_microstep: 2518.37 | step_microstep: 37.75
[2024-06-10 16:48:02,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16053.91 | bwd: 45543.48 | bwd_inner: 43024.21 | bwd_allreduce: 2518.61 | step: 39.26
{'loss': 1.2361, 'learning_rate': 1.820080542424278e-05, 'epoch': 0.54}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-10 16:48:04,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.00 | bwd_microstep: 1427.91 | bwd_inner_microstep: 1427.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3987
[2024-06-10 16:48:06,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.40 | bwd_microstep: 1602.62 | bwd_inner_microstep: 1602.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 16:48:08,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1377.13 | bwd_inner_microstep: 1377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3883
[2024-06-10 16:48:10,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.35 | bwd_microstep: 1682.19 | bwd_inner_microstep: 1682.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 16:48:12,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1376.96 | bwd_inner_microstep: 1376.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 16:48:14,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1379.13 | bwd_inner_microstep: 1379.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2376
[2024-06-10 16:48:15,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.77 | bwd_microstep: 1027.31 | bwd_inner_microstep: 1027.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 16:48:17,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.88 | bwd_microstep: 1285.08 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 16:48:19,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.82 | bwd_microstep: 1383.41 | bwd_inner_microstep: 1383.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-10 16:48:20,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.69 | bwd_microstep: 697.76 | bwd_inner_microstep: 697.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 16:48:22,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1252.79 | bwd_inner_microstep: 1252.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 16:48:23,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.04 | bwd_microstep: 1151.48 | bwd_inner_microstep: 1151.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 16:48:26,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1501.56 | bwd_inner_microstep: 1501.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001
[2024-06-10 16:48:27,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.24 | bwd_microstep: 897.19 | bwd_inner_microstep: 897.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3442
[2024-06-10 16:48:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.22 | bwd_microstep: 1495.22 | bwd_inner_microstep: 1495.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3642
[2024-06-10 16:48:31,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.61 | bwd_microstep: 1643.43 | bwd_inner_microstep: 1643.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3635
[2024-06-10 16:48:33,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.64 | bwd_microstep: 1662.06 | bwd_inner_microstep: 1662.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 16:48:36,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.31 | bwd_microstep: 1599.39 | bwd_inner_microstep: 1599.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2419
[2024-06-10 16:48:37,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.06 | bwd_microstep: 963.87 | bwd_inner_microstep: 963.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999
[2024-06-10 16:48:38,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.06 | bwd_microstep: 739.05 | bwd_inner_microstep: 739.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2184
[2024-06-10 16:48:39,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.84 | bwd_microstep: 955.40 | bwd_inner_microstep: 955.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 16:48:41,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.78 | bwd_inner_microstep: 1508.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585
[2024-06-10 16:48:43,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.12 | bwd_microstep: 1337.16 | bwd_inner_microstep: 1337.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2134
[2024-06-10 16:48:44,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.85 | bwd_microstep: 863.30 | bwd_inner_microstep: 863.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 16:48:46,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.57 | bwd_microstep: 1377.35 | bwd_inner_microstep: 1377.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 16:48:48,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1278.35 | bwd_inner_microstep: 1278.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3805
[2024-06-10 16:48:50,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.17 | bwd_microstep: 1385.49 | bwd_inner_microstep: 1385.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1970
[2024-06-10 16:48:51,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.21 | bwd_microstep: 829.33 | bwd_inner_microstep: 829.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3727
[2024-06-10 16:48:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.86 | bwd_microstep: 1487.10 | bwd_inner_microstep: 1487.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-10 16:48:55,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.00 | bwd_microstep: 1640.24 | bwd_inner_microstep: 1640.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3607
[2024-06-10 16:48:57,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.00 | bwd_microstep: 1340.14 | bwd_inner_microstep: 1340.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 16:49:04,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 16:49:04,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.19 | bwd_microstep: 5707.87 | bwd_inner_microstep: 1662.06 | bwd_allreduce_microstep: 4045.75 | step_microstep: 38.21
[2024-06-10 16:49:04,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15585.07 | bwd: 45856.07 | bwd_inner: 41809.40 | bwd_allreduce: 4045.99 | step: 39.68
 54%|█████▍    | 933/1726 [16:06:32<13:34:00, 61.59s/it]
 54%|█████▍    | 934/1726 [16:07:34<13:33:19, 61.62s/it]
 54%|█████▍    | 935/1726 [16:08:35<13:31:25, 61.55s/it]
 54%|█████▍    | 936/1726 [16:09:37<13:29:57, 61.52s/it]
 54%|█████▍    | 937/1726 [16:10:39<13:30:34, 61.64s/it]
 54%|█████▍    | 938/1726 [16:11:40<13:30:03, 61.68s/it]
{'loss': 1.2365, 'learning_rate': 1.8163426837218604e-05, 'epoch': 0.54}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 16:49:06,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.47 | bwd_microstep: 1476.50 | bwd_inner_microstep: 1476.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907
[2024-06-10 16:49:08,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.00 | bwd_microstep: 1586.50 | bwd_inner_microstep: 1586.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 16:49:10,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.42 | bwd_microstep: 1548.66 | bwd_inner_microstep: 1548.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778
[2024-06-10 16:49:12,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.28 | bwd_microstep: 1347.60 | bwd_inner_microstep: 1347.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 16:49:14,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1379.80 | bwd_inner_microstep: 1379.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 16:49:15,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.44 | bwd_microstep: 679.47 | bwd_inner_microstep: 679.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 16:49:17,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1386.44 | bwd_inner_microstep: 1386.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2115
[2024-06-10 16:49:18,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.82 | bwd_microstep: 858.19 | bwd_inner_microstep: 858.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1886
[2024-06-10 16:49:19,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.49 | bwd_microstep: 680.88 | bwd_inner_microstep: 680.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082
[2024-06-10 16:49:20,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.01 | bwd_microstep: 822.79 | bwd_inner_microstep: 822.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3761
[2024-06-10 16:49:22,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.44 | bwd_microstep: 1470.88 | bwd_inner_microstep: 1470.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3687
[2024-06-10 16:49:24,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.66 | bwd_microstep: 1658.66 | bwd_inner_microstep: 1658.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1944
[2024-06-10 16:49:25,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.42 | bwd_microstep: 759.43 | bwd_inner_microstep: 759.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1969
[2024-06-10 16:49:26,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.09 | bwd_microstep: 852.96 | bwd_inner_microstep: 852.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418
[2024-06-10 16:49:28,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.46 | bwd_microstep: 1277.30 | bwd_inner_microstep: 1277.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397
[2024-06-10 16:49:30,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.77 | bwd_microstep: 1400.80 | bwd_inner_microstep: 1400.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-10 16:49:32,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.08 | bwd_microstep: 1312.31 | bwd_inner_microstep: 1312.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3515
[2024-06-10 16:49:34,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.76 | bwd_microstep: 1528.17 | bwd_inner_microstep: 1528.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033
[2024-06-10 16:49:35,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.72 | bwd_microstep: 810.74 | bwd_inner_microstep: 810.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2093
[2024-06-10 16:49:36,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.71 | bwd_microstep: 867.38 | bwd_inner_microstep: 867.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 16:49:38,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.21 | bwd_microstep: 1386.68 | bwd_inner_microstep: 1386.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 16:49:40,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.50 | bwd_microstep: 1438.92 | bwd_inner_microstep: 1438.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-10 16:49:42,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.20 | bwd_microstep: 1435.52 | bwd_inner_microstep: 1435.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 16:49:44,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1405.40 | bwd_inner_microstep: 1405.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 16:49:46,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1507.70 | bwd_inner_microstep: 1507.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 16:49:48,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.24 | bwd_microstep: 1457.94 | bwd_inner_microstep: 1457.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3535
[2024-06-10 16:49:50,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.13 | bwd_microstep: 1521.96 | bwd_inner_microstep: 1521.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 16:49:52,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.22 | bwd_microstep: 1352.62 | bwd_inner_microstep: 1352.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798
[2024-06-10 16:49:54,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.66 | bwd_microstep: 1457.54 | bwd_inner_microstep: 1457.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2031
[2024-06-10 16:49:55,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.99 | bwd_microstep: 901.76 | bwd_inner_microstep: 901.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2223
[2024-06-10 16:49:57,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.65 | bwd_microstep: 894.64 | bwd_inner_microstep: 894.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 16:50:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.23 | optimizer_step: 6.57
[2024-06-10 16:50:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 8095.18 | bwd_inner_microstep: 1681.79 | bwd_allreduce_microstep: 6413.33 | step_microstep: 38.22
[2024-06-10 16:50:05,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14959.40 | bwd: 46561.35 | bwd_inner: 40147.08 | bwd_allreduce: 6413.56 | step: 39.75
{'loss': 1.1699, 'learning_rate': 1.8126054718595553e-05, 'epoch': 0.54}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 16:50:07,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1381.80 | bwd_inner_microstep: 1381.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 16:50:09,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.02 | bwd_microstep: 1239.79 | bwd_inner_microstep: 1239.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 16:50:11,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.43 | bwd_microstep: 1340.60 | bwd_inner_microstep: 1340.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 16:50:13,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.17 | bwd_microstep: 1642.53 | bwd_inner_microstep: 1642.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 16:50:15,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.95 | bwd_microstep: 1554.08 | bwd_inner_microstep: 1554.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4103
[2024-06-10 16:50:18,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.56 | bwd_microstep: 1732.73 | bwd_inner_microstep: 1732.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1895
[2024-06-10 16:50:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.28 | bwd_microstep: 744.65 | bwd_inner_microstep: 744.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 16:50:20,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.41 | bwd_microstep: 1244.44 | bwd_inner_microstep: 1244.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 16:50:22,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1249.28 | bwd_inner_microstep: 1249.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3720
[2024-06-10 16:50:24,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.87 | bwd_microstep: 1461.68 | bwd_inner_microstep: 1461.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3451
[2024-06-10 16:50:26,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1302.35 | bwd_inner_microstep: 1302.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 16:50:27,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.72 | bwd_microstep: 793.88 | bwd_inner_microstep: 793.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2900
[2024-06-10 16:50:29,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.25 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-10 16:50:31,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1508.59 | bwd_inner_microstep: 1508.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 16:50:33,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.10 | bwd_microstep: 1445.51 | bwd_inner_microstep: 1445.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3556
[2024-06-10 16:50:34,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.49 | bwd_microstep: 1199.92 | bwd_inner_microstep: 1199.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3649
[2024-06-10 16:50:36,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1318.89 | bwd_inner_microstep: 1318.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103
[2024-06-10 16:50:38,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.95 | bwd_microstep: 919.56 | bwd_inner_microstep: 919.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 16:50:39,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1294.27 | bwd_inner_microstep: 1294.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 16:50:40,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.07 | bwd_microstep: 791.45 | bwd_inner_microstep: 791.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 16:50:42,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1385.99 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2015
[2024-06-10 16:50:44,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.64 | bwd_microstep: 897.24 | bwd_inner_microstep: 897.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3479
[2024-06-10 16:50:46,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1424.89 | bwd_inner_microstep: 1424.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3565
[2024-06-10 16:50:47,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.61 | bwd_microstep: 1345.91 | bwd_inner_microstep: 1345.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2668
[2024-06-10 16:50:49,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.14 | bwd_microstep: 958.30 | bwd_inner_microstep: 958.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 16:50:51,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.81 | bwd_microstep: 1349.00 | bwd_inner_microstep: 1348.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 16:50:52,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1299.27 | bwd_inner_microstep: 1299.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 16:50:54,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.00 | bwd_microstep: 1458.37 | bwd_inner_microstep: 1458.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 16:50:57,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.59 | bwd_microstep: 1517.23 | bwd_inner_microstep: 1517.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3057
[2024-06-10 16:50:58,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.22 | bwd_microstep: 1236.25 | bwd_inner_microstep: 1236.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 16:51:00,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1536.11 | bwd_inner_microstep: 1536.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2231
[2024-06-10 16:51:07,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 16:51:07,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.87 | bwd_microstep: 5748.35 | bwd_inner_microstep: 1090.53 | bwd_allreduce_microstep: 4657.77 | step_microstep: 38.07
[2024-06-10 16:51:07,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15278.06 | bwd: 45509.06 | bwd_inner: 40850.39 | bwd_allreduce: 4658.00 | step: 39.56
{'loss': 1.2266, 'learning_rate': 1.808868919999804e-05, 'epoch': 0.54}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3393
[2024-06-10 16:51:08,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.32 | bwd_microstep: 1235.69 | bwd_inner_microstep: 1235.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3977
[2024-06-10 16:51:11,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.39 | bwd_microstep: 1698.07 | bwd_inner_microstep: 1698.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 16:51:12,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.76 | bwd_microstep: 1244.44 | bwd_inner_microstep: 1244.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 16:51:14,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.03 | bwd_microstep: 1386.18 | bwd_inner_microstep: 1386.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 16:51:16,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.69 | bwd_microstep: 1447.88 | bwd_inner_microstep: 1447.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 16:51:18,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1398.47 | bwd_inner_microstep: 1398.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567
[2024-06-10 16:51:20,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.87 | bwd_microstep: 1296.72 | bwd_inner_microstep: 1296.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 16:51:22,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.47 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 16:51:23,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.64 | bwd_microstep: 1290.31 | bwd_inner_microstep: 1290.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2015
[2024-06-10 16:51:25,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.93 | bwd_microstep: 895.28 | bwd_inner_microstep: 895.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 16:51:27,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.99 | bwd_microstep: 1480.95 | bwd_inner_microstep: 1480.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1971
[2024-06-10 16:51:28,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.75 | bwd_microstep: 825.11 | bwd_inner_microstep: 825.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2781
[2024-06-10 16:51:29,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.99 | bwd_microstep: 1145.35 | bwd_inner_microstep: 1145.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3070
[2024-06-10 16:51:31,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.11 | bwd_microstep: 1236.08 | bwd_inner_microstep: 1236.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 16:51:33,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.21 | bwd_microstep: 1442.34 | bwd_inner_microstep: 1442.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904
[2024-06-10 16:51:34,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.29 | bwd_microstep: 685.29 | bwd_inner_microstep: 685.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 16:51:36,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.60 | bwd_microstep: 1614.95 | bwd_inner_microstep: 1614.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2570
[2024-06-10 16:51:38,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.67 | bwd_microstep: 1000.69 | bwd_inner_microstep: 1000.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 16:51:40,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.01 | bwd_microstep: 1402.51 | bwd_inner_microstep: 1402.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 16:51:42,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.64 | bwd_microstep: 1430.65 | bwd_inner_microstep: 1430.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 16:51:44,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1380.36 | bwd_inner_microstep: 1380.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 16:51:46,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.15 | bwd_microstep: 1554.82 | bwd_inner_microstep: 1554.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808
[2024-06-10 16:51:48,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.97 | bwd_microstep: 1385.97 | bwd_inner_microstep: 1385.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 16:51:49,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.05 | bwd_microstep: 1357.99 | bwd_inner_microstep: 1357.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 16:51:51,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.01 | bwd_microstep: 1416.06 | bwd_inner_microstep: 1416.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 16:51:53,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.15 | bwd_microstep: 1282.23 | bwd_inner_microstep: 1282.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 16:51:55,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2255
[2024-06-10 16:51:57,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.19 | bwd_microstep: 882.67 | bwd_inner_microstep: 882.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 16:51:59,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.65 | bwd_microstep: 1422.48 | bwd_inner_microstep: 1422.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-10 16:52:00,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.05 | bwd_microstep: 726.58 | bwd_inner_microstep: 726.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 16:52:02,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.80 | bwd_microstep: 1450.74 | bwd_inner_microstep: 1450.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 16:52:06,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.18 | optimizer_step: 6.59
[2024-06-10 16:52:06,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.33 | bwd_microstep: 3656.19 | bwd_inner_microstep: 2158.19 | bwd_allreduce_microstep: 1497.95 | step_microstep: 37.72
[2024-06-10 16:52:06,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15574.54 | bwd: 43580.16 | bwd_inner: 42081.31 | bwd_allreduce: 1498.18 | step: 39.16
{'loss': 1.2691, 'learning_rate': 1.8051330413027227e-05, 'epoch': 0.55}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-10 16:52:08,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.45 | bwd_microstep: 1268.72 | bwd_inner_microstep: 1268.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2639
[2024-06-10 16:52:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.98 | bwd_microstep: 1018.43 | bwd_inner_microstep: 1018.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851
[2024-06-10 16:52:11,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.51 | bwd_microstep: 1662.03 | bwd_inner_microstep: 1662.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3487
[2024-06-10 16:52:13,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.19 | bwd_microstep: 1231.72 | bwd_inner_microstep: 1231.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042
[2024-06-10 16:52:14,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.71 | bwd_microstep: 812.05 | bwd_inner_microstep: 812.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 16:52:16,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1281.00 | bwd_inner_microstep: 1280.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-10 16:52:17,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.41 | bwd_microstep: 795.84 | bwd_inner_microstep: 795.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3416
[2024-06-10 16:52:19,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.91 | bwd_microstep: 1299.76 | bwd_inner_microstep: 1299.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3700
[2024-06-10 16:52:21,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.16 | bwd_microstep: 1659.23 | bwd_inner_microstep: 1659.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903
[2024-06-10 16:52:22,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.12 | bwd_microstep: 684.34 | bwd_inner_microstep: 684.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3410
[2024-06-10 16:52:24,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.83 | bwd_microstep: 1309.78 | bwd_inner_microstep: 1309.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2082
[2024-06-10 16:52:25,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.48 | bwd_microstep: 761.91 | bwd_inner_microstep: 761.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 16:52:27,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1497.64 | bwd_inner_microstep: 1497.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 16:52:29,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1297.29 | bwd_inner_microstep: 1297.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1997
[2024-06-10 16:52:30,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.71 | bwd_microstep: 896.61 | bwd_inner_microstep: 896.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 16:52:32,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.10 | bwd_microstep: 1447.17 | bwd_inner_microstep: 1447.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389
[2024-06-10 16:52:34,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.03 | bwd_microstep: 1242.48 | bwd_inner_microstep: 1242.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3388
[2024-06-10 16:52:36,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.55 | bwd_microstep: 1244.70 | bwd_inner_microstep: 1244.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 16:52:38,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.43 | bwd_microstep: 1500.43 | bwd_inner_microstep: 1500.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 16:52:40,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.31 | bwd_microstep: 1555.64 | bwd_inner_microstep: 1555.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 16:52:42,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1289.08 | bwd_inner_microstep: 1289.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-10 16:52:43,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.30 | bwd_microstep: 1200.88 | bwd_inner_microstep: 1200.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3813
[2024-06-10 16:52:45,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.03 | bwd_microstep: 1262.52 | bwd_inner_microstep: 1262.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 604
[2024-06-10 16:52:45,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 107.34 | bwd_microstep: 259.18 | bwd_inner_microstep: 259.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 16:52:47,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1378.38 | bwd_inner_microstep: 1378.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062
[2024-06-10 16:52:48,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.26 | bwd_microstep: 819.55 | bwd_inner_microstep: 819.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 16:52:50,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1402.97 | bwd_inner_microstep: 1402.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 16:52:53,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.92 | bwd_microstep: 1653.56 | bwd_inner_microstep: 1653.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3767
[2024-06-10 16:52:55,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.28 | bwd_microstep: 1587.00 | bwd_inner_microstep: 1586.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 16:52:57,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.13 | bwd_microstep: 1658.87 | bwd_inner_microstep: 1658.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2980
[2024-06-10 16:52:59,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.03 | bwd_microstep: 1198.20 | bwd_inner_microstep: 1198.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882
[2024-06-10 16:53:06,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 16:53:06,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 7115.01 | bwd_inner_microstep: 810.96 | bwd_allreduce_microstep: 6303.99 | step_microstep: 38.10
[2024-06-10 16:53:06,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14603.60 | bwd: 45292.03 | bwd_inner: 38987.13 | bwd_allreduce: 6304.22 | step: 39.62
{'loss': 1.1832, 'learning_rate': 1.801397848926058e-05, 'epoch': 0.55}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 16:53:08,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.40 | bwd_microstep: 1332.36 | bwd_inner_microstep: 1332.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 16:53:10,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.44 | bwd_microstep: 1145.83 | bwd_inner_microstep: 1145.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 16:53:12,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.02 | bwd_microstep: 1351.22 | bwd_inner_microstep: 1351.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3496
[2024-06-10 16:53:13,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.29 | bwd_microstep: 1314.34 | bwd_inner_microstep: 1314.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1863
[2024-06-10 16:53:14,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.64 | bwd_microstep: 706.93 | bwd_inner_microstep: 706.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 16:53:16,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1244.66 | bwd_inner_microstep: 1244.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 16:53:18,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1281.02 | bwd_inner_microstep: 1280.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1948
[2024-06-10 16:53:19,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.32 | bwd_microstep: 760.54 | bwd_inner_microstep: 760.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2765
[2024-06-10 16:53:20,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.25 | bwd_microstep: 1047.79 | bwd_inner_microstep: 1047.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 16:53:22,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.59 | bwd_microstep: 1150.18 | bwd_inner_microstep: 1150.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 16:53:24,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1283.41 | bwd_inner_microstep: 1283.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3496
[2024-06-10 16:53:26,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.69 | bwd_microstep: 1643.88 | bwd_inner_microstep: 1643.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 16:53:28,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1349.50 | bwd_inner_microstep: 1349.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 16:53:30,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.30 | bwd_microstep: 1470.39 | bwd_inner_microstep: 1470.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 16:53:32,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1391.76 | bwd_inner_microstep: 1391.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 16:53:33,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.51 | bwd_microstep: 800.55 | bwd_inner_microstep: 800.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 16:53:35,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1376.76 | bwd_inner_microstep: 1376.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1906
[2024-06-10 16:53:36,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.12 | bwd_microstep: 717.11 | bwd_inner_microstep: 717.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2305
[2024-06-10 16:53:37,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.64 | bwd_microstep: 1010.79 | bwd_inner_microstep: 1010.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 16:53:39,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.20 | bwd_microstep: 1492.49 | bwd_inner_microstep: 1492.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2278
[2024-06-10 16:53:41,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.81 | bwd_microstep: 1070.26 | bwd_inner_microstep: 1070.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2503
[2024-06-10 16:53:42,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.44 | bwd_microstep: 1026.83 | bwd_inner_microstep: 1026.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-10 16:53:44,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.13 | bwd_microstep: 1189.48 | bwd_inner_microstep: 1189.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3708
[2024-06-10 16:53:46,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.40 | bwd_microstep: 1362.05 | bwd_inner_microstep: 1362.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 16:53:47,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.52 | bwd_microstep: 1296.00 | bwd_inner_microstep: 1295.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3591
[2024-06-10 16:53:49,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.77 | bwd_microstep: 1453.10 | bwd_inner_microstep: 1453.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 16:53:52,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.37 | bwd_microstep: 1501.37 | bwd_inner_microstep: 1501.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 16:53:54,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.93 | bwd_microstep: 1495.49 | bwd_inner_microstep: 1495.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 16:53:56,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1399.27 | bwd_inner_microstep: 1399.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 16:53:57,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.11 | bwd_microstep: 1358.16 | bwd_inner_microstep: 1358.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3580
[2024-06-10 16:54:00,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.79 | bwd_microstep: 1692.31 | bwd_inner_microstep: 1692.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 16:54:08,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.29 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 16:54:08,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 7189.24 | bwd_inner_microstep: 1682.00 | bwd_allreduce_microstep: 5507.18 | step_microstep: 38.72
[2024-06-10 16:54:08,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15116.43 | bwd: 45905.09 | bwd_inner: 40397.00 | bwd_allreduce: 5507.41 | step: 40.17
{'loss': 1.2168, 'learning_rate': 1.797663356025136e-05, 'epoch': 0.55}


 54%|█████▍    | 938/1726 [16:11:40<13:30:03, 61.68s/it]
 54%|█████▍    | 939/1726 [16:12:42<13:29:42, 61.73s/it]
 54%|█████▍    | 940/1726 [16:13:43<13:26:14, 61.55s/it]
 55%|█████▍    | 941/1726 [16:14:43<13:17:05, 60.92s/it]
 55%|█████▍    | 942/1726 [16:15:43<13:13:20, 60.71s/it]
 55%|█████▍    | 943/1726 [16:16:44<13:14:48, 60.90s/it]


dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 16:54:09,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.57 | bwd_microstep: 1375.01 | bwd_inner_microstep: 1374.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3872
[2024-06-10 16:54:12,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.88 | bwd_microstep: 1560.78 | bwd_inner_microstep: 1560.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 16:54:14,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1377.20 | bwd_inner_microstep: 1377.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788
[2024-06-10 16:54:16,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.82 | bwd_microstep: 1451.15 | bwd_inner_microstep: 1451.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4082
[2024-06-10 16:54:18,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.09 | bwd_microstep: 1526.74 | bwd_inner_microstep: 1526.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794
[2024-06-10 16:54:20,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 1549.52 | bwd_inner_microstep: 1549.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 16:54:21,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.87 | bwd_microstep: 812.28 | bwd_inner_microstep: 812.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3706
[2024-06-10 16:54:23,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.33 | bwd_microstep: 1264.24 | bwd_inner_microstep: 1264.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3739
[2024-06-10 16:54:25,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.21 | bwd_microstep: 1727.92 | bwd_inner_microstep: 1727.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2053
[2024-06-10 16:54:26,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.79 | bwd_microstep: 875.64 | bwd_inner_microstep: 875.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 16:54:28,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.38 | bwd_microstep: 1378.27 | bwd_inner_microstep: 1378.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 16:54:30,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.71 | bwd_microstep: 1350.50 | bwd_inner_microstep: 1350.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-10 16:54:32,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.27 | bwd_microstep: 1615.94 | bwd_inner_microstep: 1615.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 16:54:34,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.69 | bwd_microstep: 1251.95 | bwd_inner_microstep: 1251.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1916
[2024-06-10 16:54:35,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.26 | bwd_microstep: 750.24 | bwd_inner_microstep: 750.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3432
[2024-06-10 16:54:37,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.54 | bwd_microstep: 1230.95 | bwd_inner_microstep: 1230.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 16:54:39,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2120
[2024-06-10 16:54:40,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.66 | bwd_microstep: 734.17 | bwd_inner_microstep: 734.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 16:54:41,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1278.30 | bwd_inner_microstep: 1278.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 16:54:44,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.90 | bwd_microstep: 1531.30 | bwd_inner_microstep: 1531.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 16:54:45,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.57 | bwd_microstep: 1307.37 | bwd_inner_microstep: 1307.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 16:54:47,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1507.31 | bwd_inner_microstep: 1507.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 16:54:49,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1396.93 | bwd_inner_microstep: 1396.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 16:54:51,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.52 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2085
[2024-06-10 16:54:52,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.95 | bwd_microstep: 769.55 | bwd_inner_microstep: 769.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551
[2024-06-10 16:54:54,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.84 | bwd_microstep: 1201.19 | bwd_inner_microstep: 1201.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 16:54:56,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.80 | bwd_microstep: 1498.80 | bwd_inner_microstep: 1498.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 16:54:58,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.39 | bwd_microstep: 1557.25 | bwd_inner_microstep: 1557.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-10 16:55:00,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.20 | bwd_microstep: 1580.25 | bwd_inner_microstep: 1580.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 16:55:02,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1414.02 | bwd_inner_microstep: 1414.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3644
[2024-06-10 16:55:05,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.48 | bwd_microstep: 1708.64 | bwd_inner_microstep: 1708.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 16:55:08,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 16:55:08,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.01 | bwd_microstep: 2372.25 | bwd_inner_microstep: 1774.55 | bwd_allreduce_microstep: 597.65 | step_microstep: 37.76
[2024-06-10 16:55:08,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16033.12 | bwd: 43629.82 | bwd_inner: 43031.27 | bwd_allreduce: 597.87 | step: 39.18
{'loss': 1.2336, 'learning_rate': 1.7939295757528225e-05, 'epoch': 0.55}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1996
[2024-06-10 16:55:09,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.29 | bwd_microstep: 860.72 | bwd_inner_microstep: 860.66 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911
[2024-06-10 16:55:11,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.84 | bwd_microstep: 1688.78 | bwd_inner_microstep: 1688.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 16:55:13,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.56 | bwd_microstep: 1552.44 | bwd_inner_microstep: 1552.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 16:55:15,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1495.78 | bwd_inner_microstep: 1495.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 16:55:17,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1252.83 | bwd_inner_microstep: 1252.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 16:55:19,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.14 | bwd_microstep: 1533.67 | bwd_inner_microstep: 1533.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 16:55:21,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1389.57 | bwd_inner_microstep: 1389.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 16:55:23,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.08 | bwd_microstep: 1389.02 | bwd_inner_microstep: 1388.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 16:55:25,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1248.55 | bwd_inner_microstep: 1248.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 16:55:27,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1348.38 | bwd_inner_microstep: 1348.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 16:55:28,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.21 | bwd_microstep: 793.42 | bwd_inner_microstep: 793.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-10 16:55:30,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.28 | bwd_microstep: 1620.85 | bwd_inner_microstep: 1620.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 16:55:32,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.94 | bwd_microstep: 1475.40 | bwd_inner_microstep: 1475.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857
[2024-06-10 16:55:33,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.58 | bwd_microstep: 675.51 | bwd_inner_microstep: 675.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 16:55:35,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.56 | bwd_microstep: 1489.45 | bwd_inner_microstep: 1489.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 16:55:37,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.01 | bwd_microstep: 1301.34 | bwd_inner_microstep: 1301.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404
[2024-06-10 16:55:39,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.25 | bwd_microstep: 1306.54 | bwd_inner_microstep: 1306.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-10 16:55:40,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.73 | bwd_microstep: 697.50 | bwd_inner_microstep: 697.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2644
[2024-06-10 16:55:41,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.05 | bwd_microstep: 1115.88 | bwd_inner_microstep: 1115.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 16:55:43,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1256.05 | bwd_inner_microstep: 1256.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 16:55:45,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1391.48 | bwd_inner_microstep: 1391.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-10 16:55:46,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.93 | bwd_microstep: 1182.21 | bwd_inner_microstep: 1182.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 16:55:49,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.92 | bwd_microstep: 1653.44 | bwd_inner_microstep: 1653.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558
[2024-06-10 16:55:51,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1429.47 | bwd_inner_microstep: 1429.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 16:55:53,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.05 | bwd_microstep: 1509.77 | bwd_inner_microstep: 1509.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 16:55:55,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.06 | bwd_microstep: 1544.31 | bwd_inner_microstep: 1544.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3479
[2024-06-10 16:55:57,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.65 | bwd_microstep: 1217.09 | bwd_inner_microstep: 1217.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820
[2024-06-10 16:55:59,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.87 | bwd_microstep: 1581.35 | bwd_inner_microstep: 1581.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3826
[2024-06-10 16:56:01,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.69 | bwd_microstep: 1510.08 | bwd_inner_microstep: 1510.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 16:56:03,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.76 | bwd_microstep: 1460.28 | bwd_inner_microstep: 1460.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585
[2024-06-10 16:56:05,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.33 | bwd_microstep: 1305.76 | bwd_inner_microstep: 1305.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816
[2024-06-10 16:56:08,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.60
[2024-06-10 16:56:08,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.41 | bwd_microstep: 2400.33 | bwd_inner_microstep: 2052.06 | bwd_allreduce_microstep: 348.23 | step_microstep: 37.50
[2024-06-10 16:56:08,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16111.10 | bwd: 43677.29 | bwd_inner: 43328.13 | bwd_allreduce: 348.47 | step: 39.00
{'loss': 1.2113, 'learning_rate': 1.790196521259472e-05, 'epoch': 0.55}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2144
[2024-06-10 16:56:09,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.51 | bwd_microstep: 920.84 | bwd_inner_microstep: 920.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3878
[2024-06-10 16:56:11,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.13 | bwd_microstep: 1681.78 | bwd_inner_microstep: 1681.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 16:56:14,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.62 | bwd_microstep: 1653.15 | bwd_inner_microstep: 1653.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 16:56:15,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1244.85 | bwd_inner_microstep: 1244.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 16:56:17,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.96 | bwd_microstep: 1297.14 | bwd_inner_microstep: 1297.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 16:56:19,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1248.77 | bwd_inner_microstep: 1248.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 16:56:21,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1245.54 | bwd_inner_microstep: 1245.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592
[2024-06-10 16:56:22,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.24 | bwd_microstep: 1308.27 | bwd_inner_microstep: 1308.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 16:56:24,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.05 | bwd_microstep: 1410.83 | bwd_inner_microstep: 1410.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438
[2024-06-10 16:56:26,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1446.38 | bwd_inner_microstep: 1446.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665
[2024-06-10 16:56:29,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.57 | bwd_microstep: 1714.62 | bwd_inner_microstep: 1714.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 16:56:31,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1379.70 | bwd_inner_microstep: 1379.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3021
[2024-06-10 16:56:32,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.43 | bwd_microstep: 1229.05 | bwd_inner_microstep: 1229.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284
[2024-06-10 16:56:34,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.07 | bwd_microstep: 1068.49 | bwd_inner_microstep: 1068.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3471
[2024-06-10 16:56:36,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1421.81 | bwd_inner_microstep: 1421.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 16:56:38,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.27 | bwd_microstep: 1519.71 | bwd_inner_microstep: 1519.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 16:56:40,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.65 | bwd_microstep: 1390.98 | bwd_inner_microstep: 1390.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 16:56:42,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.86 | bwd_microstep: 1507.67 | bwd_inner_microstep: 1507.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987
[2024-06-10 16:56:43,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.73 | bwd_microstep: 706.70 | bwd_inner_microstep: 706.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 16:56:45,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1482.96 | bwd_inner_microstep: 1482.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 16:56:47,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.79 | bwd_microstep: 1336.12 | bwd_inner_microstep: 1336.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3425
[2024-06-10 16:56:48,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.97 | bwd_microstep: 1281.49 | bwd_inner_microstep: 1281.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433
[2024-06-10 16:56:50,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.89 | bwd_microstep: 1188.00 | bwd_inner_microstep: 1187.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083
[2024-06-10 16:56:51,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.25 | bwd_microstep: 916.65 | bwd_inner_microstep: 916.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 16:56:54,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1605.38 | bwd_inner_microstep: 1605.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 16:56:55,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.86 | bwd_microstep: 1381.16 | bwd_inner_microstep: 1381.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 16:56:57,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.07 | bwd_microstep: 1253.52 | bwd_inner_microstep: 1253.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3813
[2024-06-10 16:57:00,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.78 | bwd_microstep: 1859.93 | bwd_inner_microstep: 1859.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3563
[2024-06-10 16:57:02,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1525.39 | bwd_inner_microstep: 1525.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3760
[2024-06-10 16:57:04,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1307.71 | bwd_inner_microstep: 1307.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230
[2024-06-10 16:57:05,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.91 | bwd_microstep: 962.15 | bwd_inner_microstep: 962.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 16:57:08,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 16:57:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 2645.59 | bwd_inner_microstep: 1524.54 | bwd_allreduce_microstep: 1121.01 | step_microstep: 37.60
[2024-06-10 16:57:08,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15993.07 | bwd: 44142.36 | bwd_inner: 43020.43 | bwd_allreduce: 1121.24 | step: 39.02
{'loss': 1.2606, 'learning_rate': 1.7864642056928823e-05, 'epoch': 0.55}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 16:57:10,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.37 | bwd_microstep: 1289.75 | bwd_inner_microstep: 1289.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2623
[2024-06-10 16:57:11,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.31 | bwd_microstep: 1049.76 | bwd_inner_microstep: 1049.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3515
[2024-06-10 16:57:13,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1247.78 | bwd_inner_microstep: 1247.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 16:57:15,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.25 | bwd_microstep: 1404.93 | bwd_inner_microstep: 1404.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-10 16:57:17,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.40 | bwd_microstep: 1644.79 | bwd_inner_microstep: 1644.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3506
[2024-06-10 16:57:19,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.62 | bwd_microstep: 1551.57 | bwd_inner_microstep: 1551.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 16:57:21,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.54 | bwd_microstep: 1382.40 | bwd_inner_microstep: 1382.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408
[2024-06-10 16:57:23,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.94 | bwd_microstep: 1179.07 | bwd_inner_microstep: 1179.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3549
[2024-06-10 16:57:25,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.55 | bwd_microstep: 1198.51 | bwd_inner_microstep: 1198.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 16:57:27,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1482.61 | bwd_inner_microstep: 1482.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2196
[2024-06-10 16:57:28,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.22 | bwd_microstep: 954.65 | bwd_inner_microstep: 954.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 16:57:30,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 1283.77 | bwd_inner_microstep: 1283.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3464
[2024-06-10 16:57:32,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1324.18 | bwd_inner_microstep: 1324.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 16:57:34,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1511.50 | bwd_inner_microstep: 1511.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3669
[2024-06-10 16:57:36,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.22 | bwd_microstep: 1556.10 | bwd_inner_microstep: 1556.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970
[2024-06-10 16:57:37,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.13 | bwd_microstep: 890.39 | bwd_inner_microstep: 890.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385
[2024-06-10 16:57:39,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.34 | bwd_microstep: 1334.27 | bwd_inner_microstep: 1334.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-10 16:57:40,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.63 | bwd_microstep: 678.92 | bwd_inner_microstep: 678.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3375
[2024-06-10 16:57:42,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.85 | bwd_microstep: 1271.82 | bwd_inner_microstep: 1271.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 16:57:44,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.48 | bwd_microstep: 1485.15 | bwd_inner_microstep: 1485.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416
[2024-06-10 16:57:46,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1309.07 | bwd_inner_microstep: 1309.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 16:57:47,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.07 | bwd_microstep: 795.82 | bwd_inner_microstep: 795.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 16:57:49,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.21 | bwd_microstep: 1497.70 | bwd_inner_microstep: 1497.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 16:57:51,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.13 | bwd_microstep: 1350.81 | bwd_inner_microstep: 1350.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 16:57:52,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.55 | bwd_microstep: 803.28 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 16:57:54,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.70 | bwd_microstep: 1358.92 | bwd_inner_microstep: 1358.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 16:57:56,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.00 | bwd_microstep: 1458.21 | bwd_inner_microstep: 1458.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 16:57:57,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.53 | bwd_microstep: 1389.77 | bwd_inner_microstep: 1389.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 16:57:59,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.20 | bwd_microstep: 1451.61 | bwd_inner_microstep: 1451.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 16:58:01,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.46 | bwd_microstep: 973.04 | bwd_inner_microstep: 973.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 16:58:03,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.23 | bwd_microstep: 1551.51 | bwd_inner_microstep: 1551.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3430
[2024-06-10 16:58:10,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.61
[2024-06-10 16:58:10,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 6424.32 | bwd_inner_microstep: 1606.59 | bwd_allreduce_microstep: 4817.66 | step_microstep: 38.93
[2024-06-10 16:58:10,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15397.32 | bwd: 46086.01 | bwd_inner: 41267.44 | bwd_allreduce: 4817.90 | step: 40.42
{'loss': 1.1439, 'learning_rate': 1.7827326421982513e-05, 'epoch': 0.55}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456
[2024-06-10 16:58:12,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1537.97 | bwd_inner_microstep: 1537.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3891
[2024-06-10 16:58:14,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.48 | bwd_microstep: 1479.21 | bwd_inner_microstep: 1479.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 16:58:16,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.41 | bwd_microstep: 1343.51 | bwd_inner_microstep: 1343.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939
[2024-06-10 16:58:17,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.16 | bwd_microstep: 822.21 | bwd_inner_microstep: 822.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1355
[2024-06-10 16:58:18,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 221.49 | bwd_microstep: 581.81 | bwd_inner_microstep: 581.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4091
[2024-06-10 16:58:20,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.07 | bwd_microstep: 1693.03 | bwd_inner_microstep: 1693.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 16:58:22,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.64 | bwd_microstep: 1523.78 | bwd_inner_microstep: 1523.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421
[2024-06-10 16:58:24,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.59 | bwd_microstep: 1156.15 | bwd_inner_microstep: 1156.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 16:58:26,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1486.36 | bwd_inner_microstep: 1486.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3403
[2024-06-10 16:58:28,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.87 | bwd_microstep: 1215.75 | bwd_inner_microstep: 1215.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 16:58:30,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1522.71 | bwd_inner_microstep: 1522.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3614
[2024-06-10 16:58:32,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.03 | bwd_microstep: 1446.98 | bwd_inner_microstep: 1446.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 16:58:34,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1392.33 | bwd_inner_microstep: 1392.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963
[2024-06-10 16:58:35,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.24 | bwd_microstep: 828.33 | bwd_inner_microstep: 828.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3506
[2024-06-10 16:58:37,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.03 | bwd_microstep: 1318.61 | bwd_inner_microstep: 1318.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 16:58:39,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1390.62 | bwd_inner_microstep: 1390.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3651
[2024-06-10 16:58:41,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.85 | bwd_microstep: 1543.52 | bwd_inner_microstep: 1543.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3446
[2024-06-10 16:58:43,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.24 | bwd_microstep: 1414.06 | bwd_inner_microstep: 1414.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 16:58:45,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1648.04 | bwd_inner_microstep: 1648.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009
[2024-06-10 16:58:46,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.29 | bwd_microstep: 899.06 | bwd_inner_microstep: 899.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076
[2024-06-10 16:58:47,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.44 | bwd_microstep: 819.64 | bwd_inner_microstep: 819.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 16:58:49,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1347.08 | bwd_inner_microstep: 1347.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 16:58:51,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.01 | bwd_microstep: 1415.41 | bwd_inner_microstep: 1415.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 16:58:53,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.20 | bwd_microstep: 1179.48 | bwd_inner_microstep: 1179.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 16:58:55,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.17 | bwd_microstep: 1409.99 | bwd_inner_microstep: 1409.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2035
[2024-06-10 16:58:56,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.95 | bwd_microstep: 810.55 | bwd_inner_microstep: 810.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044
[2024-06-10 16:58:57,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.75 | bwd_microstep: 810.90 | bwd_inner_microstep: 810.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 16:58:59,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1398.15 | bwd_inner_microstep: 1398.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812
[2024-06-10 16:59:01,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.36 | bwd_microstep: 1357.47 | bwd_inner_microstep: 1357.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 16:59:03,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1502.26 | bwd_inner_microstep: 1502.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580
[2024-06-10 16:59:05,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.14 | bwd_microstep: 1602.11 | bwd_inner_microstep: 1602.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 16:59:12,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 16:59:12,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.75 | bwd_microstep: 6051.38 | bwd_inner_microstep: 1445.57 | bwd_allreduce_microstep: 4605.75 | step_microstep: 38.11
[2024-06-10 16:59:12,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15421.56 | bwd: 45948.47 | bwd_inner: 41341.82 | bwd_allreduce: 4605.97 | step: 39.61
{'loss': 1.2524, 'learning_rate': 1.7790018439181243e-05, 'epoch': 0.55}
   | 943/1726 [16:16:44<13:14:48, 60.90s/it]
 55%|█████▍    | 944/1726 [16:17:44<13:10:13, 60.63s/it]
 55%|█████▍    | 945/1726 [16:18:44<13:07:13, 60.48s/it]
 55%|█████▍    | 946/1726 [16:19:45<13:06:10, 60.48s/it]
 55%|█████▍    | 947/1726 [16:20:47<13:10:25, 60.88s/it]
 55%|█████▍    | 948/1726 [16:21:48<13:12:36, 61.13s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3447
[2024-06-10 16:59:14,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.78 | bwd_microstep: 1541.46 | bwd_inner_microstep: 1541.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3928
[2024-06-10 16:59:16,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.15 | bwd_microstep: 1587.63 | bwd_inner_microstep: 1587.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 16:59:18,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1383.47 | bwd_inner_microstep: 1383.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 16:59:20,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1479.49 | bwd_inner_microstep: 1479.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 16:59:22,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.77 | bwd_microstep: 1244.24 | bwd_inner_microstep: 1244.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952
[2024-06-10 16:59:23,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.26 | bwd_microstep: 728.30 | bwd_inner_microstep: 728.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 16:59:25,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1388.30 | bwd_inner_microstep: 1388.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 16:59:26,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.09 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906
[2024-06-10 16:59:27,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.28 | bwd_microstep: 684.68 | bwd_inner_microstep: 684.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3690
[2024-06-10 16:59:29,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1477.31 | bwd_inner_microstep: 1477.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3675
[2024-06-10 16:59:31,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.36 | bwd_microstep: 1356.14 | bwd_inner_microstep: 1356.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3661
[2024-06-10 16:59:33,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.30 | bwd_microstep: 1484.40 | bwd_inner_microstep: 1484.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2978
[2024-06-10 16:59:35,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.21 | bwd_microstep: 1195.87 | bwd_inner_microstep: 1195.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3684
[2024-06-10 16:59:37,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.61 | bwd_microstep: 1718.72 | bwd_inner_microstep: 1718.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465
[2024-06-10 16:59:39,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1329.53 | bwd_inner_microstep: 1329.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922
[2024-06-10 16:59:40,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.89 | bwd_microstep: 818.54 | bwd_inner_microstep: 818.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463
[2024-06-10 16:59:42,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.20 | bwd_microstep: 1241.98 | bwd_inner_microstep: 1241.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 16:59:44,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.84 | bwd_microstep: 1386.18 | bwd_inner_microstep: 1386.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3660
[2024-06-10 16:59:46,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 1323.26 | bwd_inner_microstep: 1323.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 16:59:48,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.61 | bwd_microstep: 1532.34 | bwd_inner_microstep: 1532.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-10 16:59:50,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.33 | bwd_microstep: 1628.26 | bwd_inner_microstep: 1628.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1865
[2024-06-10 16:59:51,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.15 | bwd_microstep: 739.27 | bwd_inner_microstep: 739.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 16:59:53,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.58 | bwd_microstep: 1560.17 | bwd_inner_microstep: 1560.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 16:59:55,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.86 | bwd_microstep: 1488.26 | bwd_inner_microstep: 1488.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 16:59:58,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1603.07 | bwd_inner_microstep: 1603.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2291
[2024-06-10 16:59:59,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.01 | bwd_microstep: 1071.31 | bwd_inner_microstep: 1071.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806
[2024-06-10 17:00:01,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.08 | bwd_microstep: 1749.89 | bwd_inner_microstep: 1749.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-10 17:00:03,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.34 | bwd_microstep: 973.15 | bwd_inner_microstep: 973.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3578
[2024-06-10 17:00:05,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1462.80 | bwd_inner_microstep: 1462.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4154
[2024-06-10 17:00:07,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.27 | bwd_microstep: 1552.71 | bwd_inner_microstep: 1552.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 17:00:09,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.60 | bwd_microstep: 1651.38 | bwd_inner_microstep: 1651.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 17:00:15,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 17:00:15,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.46 | bwd_microstep: 5095.53 | bwd_inner_microstep: 900.41 | bwd_allreduce_microstep: 4195.06 | step_microstep: 37.96
[2024-06-10 17:00:15,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15843.24 | bwd: 46760.38 | bwd_inner: 42564.41 | bwd_allreduce: 4195.29 | step: 39.41
{'loss': 1.2015, 'learning_rate': 1.775271823992354e-05, 'epoch': 0.55}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 17:00:17,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1392.54 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4386
[2024-06-10 17:00:19,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.59 | bwd_microstep: 1710.79 | bwd_inner_microstep: 1710.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 17:00:20,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.81 | bwd_microstep: 790.67 | bwd_inner_microstep: 790.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 17:00:22,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.47 | bwd_microstep: 1248.25 | bwd_inner_microstep: 1248.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2674
[2024-06-10 17:00:23,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.64 | bwd_microstep: 1024.54 | bwd_inner_microstep: 1024.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 17:00:25,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.26 | bwd_microstep: 1150.12 | bwd_inner_microstep: 1150.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3717
[2024-06-10 17:00:27,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.19 | bwd_microstep: 1270.77 | bwd_inner_microstep: 1270.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 17:00:29,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.01 | bwd_microstep: 1623.42 | bwd_inner_microstep: 1623.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 17:00:31,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1482.79 | bwd_inner_microstep: 1482.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 17:00:33,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1480.52 | bwd_inner_microstep: 1480.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 17:00:35,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.08 | bwd_microstep: 1285.09 | bwd_inner_microstep: 1285.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 17:00:36,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3074
[2024-06-10 17:00:38,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.81 | bwd_microstep: 1238.52 | bwd_inner_microstep: 1238.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3406
[2024-06-10 17:00:40,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.96 | bwd_microstep: 1309.28 | bwd_inner_microstep: 1309.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 17:00:42,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1394.73 | bwd_inner_microstep: 1394.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651
[2024-06-10 17:00:44,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.86 | bwd_microstep: 1323.20 | bwd_inner_microstep: 1323.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2188
[2024-06-10 17:00:45,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.38 | bwd_microstep: 858.32 | bwd_inner_microstep: 858.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-10 17:00:47,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.22 | bwd_microstep: 1532.58 | bwd_inner_microstep: 1532.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502
[2024-06-10 17:00:49,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1319.44 | bwd_inner_microstep: 1319.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 17:00:51,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.42 | bwd_microstep: 1291.53 | bwd_inner_microstep: 1291.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 17:00:53,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1557.88 | bwd_inner_microstep: 1557.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3703
[2024-06-10 17:00:55,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1233.54 | bwd_inner_microstep: 1233.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 17:00:57,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1499.97 | bwd_inner_microstep: 1499.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2003
[2024-06-10 17:00:58,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.39 | bwd_microstep: 740.15 | bwd_inner_microstep: 740.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 17:00:59,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1284.05 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 17:01:02,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.71 | bwd_microstep: 1540.55 | bwd_inner_microstep: 1540.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 17:01:03,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 1282.92 | bwd_inner_microstep: 1282.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565
[2024-06-10 17:01:05,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.71 | bwd_microstep: 1334.15 | bwd_inner_microstep: 1334.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-10 17:01:06,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.17 | bwd_microstep: 702.22 | bwd_inner_microstep: 702.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 17:01:08,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.04 | bwd_microstep: 1485.62 | bwd_inner_microstep: 1485.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 17:01:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.45 | bwd_microstep: 1249.21 | bwd_inner_microstep: 1249.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 17:01:15,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.32 | optimizer_step: 6.59
[2024-06-10 17:01:15,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.06 | bwd_microstep: 4903.99 | bwd_inner_microstep: 1814.54 | bwd_allreduce_microstep: 3089.39 | step_microstep: 39.50
[2024-06-10 17:01:15,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15642.06 | bwd: 44889.52 | bwd_inner: 41799.12 | bwd_allreduce: 3089.67 | step: 41.04
{'loss': 1.193, 'learning_rate': 1.7715425955580512e-05, 'epoch': 0.55}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1924
[2024-06-10 17:01:17,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.16 | bwd_microstep: 839.31 | bwd_inner_microstep: 839.18 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3931
[2024-06-10 17:01:19,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.84 | bwd_microstep: 1495.20 | bwd_inner_microstep: 1495.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 17:01:21,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.25 | bwd_microstep: 1551.62 | bwd_inner_microstep: 1551.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3875
[2024-06-10 17:01:23,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.87 | bwd_microstep: 1680.04 | bwd_inner_microstep: 1680.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 17:01:25,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1376.57 | bwd_inner_microstep: 1376.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 17:01:27,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.21 | bwd_microstep: 1478.85 | bwd_inner_microstep: 1478.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 17:01:29,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 17:01:31,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.57 | bwd_microstep: 1282.16 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-10 17:01:33,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.87 | bwd_microstep: 1526.60 | bwd_inner_microstep: 1526.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 17:01:35,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1496.85 | bwd_inner_microstep: 1496.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 17:01:37,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.43 | bwd_microstep: 1482.88 | bwd_inner_microstep: 1482.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 17:01:39,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1286.36 | bwd_inner_microstep: 1286.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 17:01:40,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.60 | bwd_microstep: 798.12 | bwd_inner_microstep: 798.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 17:01:41,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.35 | bwd_microstep: 1277.65 | bwd_inner_microstep: 1277.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3511
[2024-06-10 17:01:43,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1436.24 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3246
[2024-06-10 17:01:45,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.09 | bwd_microstep: 1278.63 | bwd_inner_microstep: 1278.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3612
[2024-06-10 17:01:47,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.12 | bwd_microstep: 1431.06 | bwd_inner_microstep: 1431.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 17:01:48,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 790.99 | bwd_inner_microstep: 790.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3537
[2024-06-10 17:01:51,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.08 | bwd_microstep: 1593.07 | bwd_inner_microstep: 1593.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-10 17:01:53,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.54 | bwd_microstep: 1619.64 | bwd_inner_microstep: 1619.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1958
[2024-06-10 17:01:54,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.11 | bwd_microstep: 736.22 | bwd_inner_microstep: 736.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 560
[2024-06-10 17:01:54,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 98.09 | bwd_microstep: 247.79 | bwd_inner_microstep: 247.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 17:01:56,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1395.00 | bwd_inner_microstep: 1394.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 17:01:58,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1609.22 | bwd_inner_microstep: 1609.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 17:02:00,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1493.05 | bwd_inner_microstep: 1493.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3566
[2024-06-10 17:02:02,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1549.28 | bwd_inner_microstep: 1549.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 17:02:04,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.87 | bwd_microstep: 1400.37 | bwd_inner_microstep: 1400.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-10 17:02:07,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.71 | bwd_microstep: 1653.42 | bwd_inner_microstep: 1653.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 17:02:08,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.71 | bwd_microstep: 970.44 | bwd_inner_microstep: 970.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023
[2024-06-10 17:02:09,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.76 | bwd_microstep: 807.81 | bwd_inner_microstep: 807.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 17:02:11,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1497.87 | bwd_inner_microstep: 1497.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 17:02:17,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-10 17:02:17,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 4940.49 | bwd_inner_microstep: 1687.20 | bwd_allreduce_microstep: 3253.24 | step_microstep: 38.45
[2024-06-10 17:02:17,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15641.87 | bwd: 45271.35 | bwd_inner: 42017.11 | bwd_allreduce: 3253.52 | step: 39.92
{'loss': 1.2385, 'learning_rate': 1.7678141717495394e-05, 'epoch': 0.55}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 17:02:19,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.54 | bwd_microstep: 1373.31 | bwd_inner_microstep: 1373.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3924
[2024-06-10 17:02:21,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.75 | bwd_microstep: 1590.81 | bwd_inner_microstep: 1590.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-10 17:02:22,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.09 | bwd_microstep: 1156.21 | bwd_inner_microstep: 1156.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3791
[2024-06-10 17:02:24,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.52 | bwd_microstep: 1444.89 | bwd_inner_microstep: 1444.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 17:02:26,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1343.00 | bwd_inner_microstep: 1342.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 17:02:28,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.16 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 17:02:30,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1550.22 | bwd_inner_microstep: 1550.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2225
[2024-06-10 17:02:32,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.31 | bwd_microstep: 959.33 | bwd_inner_microstep: 959.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2656
[2024-06-10 17:02:33,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.47 | bwd_microstep: 1022.61 | bwd_inner_microstep: 1022.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-10 17:02:35,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1417.43 | bwd_inner_microstep: 1417.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 17:02:37,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.37 | bwd_microstep: 1483.22 | bwd_inner_microstep: 1483.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 17:02:39,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1512.55 | bwd_inner_microstep: 1512.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 17:02:41,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.70 | bwd_microstep: 1282.57 | bwd_inner_microstep: 1282.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486
[2024-06-10 17:02:43,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.32 | bwd_microstep: 1314.61 | bwd_inner_microstep: 1314.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424
[2024-06-10 17:02:45,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1445.85 | bwd_inner_microstep: 1445.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 17:02:47,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.64 | bwd_microstep: 1606.40 | bwd_inner_microstep: 1606.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-10 17:02:49,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.74 | bwd_microstep: 1410.61 | bwd_inner_microstep: 1410.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606
[2024-06-10 17:02:51,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1314.39 | bwd_inner_microstep: 1314.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666
[2024-06-10 17:02:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1323.62 | bwd_inner_microstep: 1323.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534
[2024-06-10 17:02:54,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1228.95 | bwd_inner_microstep: 1228.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 17:02:56,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1414.33 | bwd_inner_microstep: 1414.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 17:02:58,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.75 | bwd_microstep: 1277.66 | bwd_inner_microstep: 1277.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 17:03:00,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1397.72 | bwd_inner_microstep: 1397.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 17:03:01,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.54 | bwd_microstep: 700.24 | bwd_inner_microstep: 700.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 17:03:03,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.26 | bwd_microstep: 1359.72 | bwd_inner_microstep: 1359.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3815
[2024-06-10 17:03:05,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.22 | bwd_microstep: 1506.13 | bwd_inner_microstep: 1506.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3865
[2024-06-10 17:03:07,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.51 | bwd_microstep: 1401.00 | bwd_inner_microstep: 1400.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2039
[2024-06-10 17:03:08,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.04 | bwd_microstep: 782.68 | bwd_inner_microstep: 782.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 17:03:10,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.79 | bwd_microstep: 1496.32 | bwd_inner_microstep: 1496.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 17:03:12,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1300.04 | bwd_inner_microstep: 1300.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 17:03:14,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.64 | bwd_microstep: 1598.88 | bwd_inner_microstep: 1598.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3593
[2024-06-10 17:03:18,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.63
[2024-06-10 17:03:18,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 3069.94 | bwd_inner_microstep: 1768.81 | bwd_allreduce_microstep: 1301.08 | step_microstep: 37.94
[2024-06-10 17:03:18,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16093.31 | bwd: 44429.23 | bwd_inner: 43127.25 | bwd_allreduce: 1301.31 | step: 39.47
{'loss': 1.2139, 'learning_rate': 1.7640865656983084e-05, 'epoch': 0.55}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486
[2024-06-10 17:03:20,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.57 | bwd_microstep: 1574.70 | bwd_inner_microstep: 1574.60 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 17:03:21,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.50 | bwd_microstep: 792.33 | bwd_inner_microstep: 792.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470
[2024-06-10 17:03:23,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.78 | bwd_microstep: 1341.02 | bwd_inner_microstep: 1340.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 17:03:25,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.13 | bwd_microstep: 1644.41 | bwd_inner_microstep: 1644.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2224
[2024-06-10 17:03:26,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 956.14 | bwd_inner_microstep: 956.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 17:03:27,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.42 | bwd_microstep: 796.46 | bwd_inner_microstep: 796.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 17:03:29,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.09 | bwd_microstep: 1245.60 | bwd_inner_microstep: 1245.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 17:03:31,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1245.44 | bwd_inner_microstep: 1245.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-10 17:03:33,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.58 | bwd_microstep: 1190.10 | bwd_inner_microstep: 1190.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 17:03:34,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.05 | bwd_microstep: 1278.42 | bwd_inner_microstep: 1278.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2116
[2024-06-10 17:03:36,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.38 | bwd_microstep: 929.01 | bwd_inner_microstep: 928.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 17:03:38,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1480.68 | bwd_inner_microstep: 1480.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955
[2024-06-10 17:03:39,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.77 | bwd_microstep: 825.72 | bwd_inner_microstep: 825.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1904
[2024-06-10 17:03:40,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.37 | bwd_microstep: 714.22 | bwd_inner_microstep: 714.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602
[2024-06-10 17:03:42,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.17 | bwd_microstep: 1432.65 | bwd_inner_microstep: 1432.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-10 17:03:43,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.05 | bwd_microstep: 800.10 | bwd_inner_microstep: 800.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 17:03:45,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.35 | bwd_microstep: 1353.18 | bwd_inner_microstep: 1353.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3430
[2024-06-10 17:03:46,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.65 | bwd_microstep: 1192.76 | bwd_inner_microstep: 1192.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 17:03:48,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1470.05 | bwd_inner_microstep: 1470.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3529
[2024-06-10 17:03:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.12 | bwd_microstep: 1399.40 | bwd_inner_microstep: 1399.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 17:03:52,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.55 | bwd_microstep: 1404.31 | bwd_inner_microstep: 1404.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 17:03:55,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.78 | bwd_microstep: 1658.16 | bwd_inner_microstep: 1658.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 17:03:56,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.61 | bwd_microstep: 806.59 | bwd_inner_microstep: 806.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 17:03:58,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2288
[2024-06-10 17:03:59,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.12 | bwd_microstep: 1022.55 | bwd_inner_microstep: 1022.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 17:04:01,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.32 | bwd_microstep: 1652.75 | bwd_inner_microstep: 1652.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 17:04:03,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.87 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1286.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3779
[2024-06-10 17:04:05,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.39 | bwd_microstep: 1384.33 | bwd_inner_microstep: 1384.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 17:04:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.99 | bwd_microstep: 1637.11 | bwd_inner_microstep: 1637.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 17:04:09,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.73 | bwd_microstep: 1653.70 | bwd_inner_microstep: 1653.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 17:04:11,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.84 | bwd_microstep: 1397.56 | bwd_inner_microstep: 1397.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 17:04:20,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.27 | optimizer_step: 6.57
[2024-06-10 17:04:20,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.65 | bwd_microstep: 7826.85 | bwd_inner_microstep: 1323.82 | bwd_allreduce_microstep: 6502.97 | step_microstep: 38.48
[2024-06-10 17:04:20,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15047.78 | bwd: 46789.80 | bwd_inner: 40285.82 | bwd_allreduce: 6503.27 | step: 40.01
{'loss': 1.2357, 'learning_rate': 1.7603597905329658e-05, 'epoch': 0.55}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 17:04:22,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.46 | bwd_microstep: 1375.67 | bwd_inner_microstep: 1375.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 17:04:23,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1240.92 | bwd_inner_microstep: 1240.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4242
[2024-06-10 17:04:26,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.70 | bwd_microstep: 1559.94 | bwd_inner_microstep: 1559.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 17:04:27,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1244.54 | bwd_inner_microstep: 1244.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 17:04:29,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.52 | bwd_microstep: 1538.00 | bwd_inner_microstep: 1537.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 17:04:31,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.98 | bwd_microstep: 1275.63 | bwd_inner_microstep: 1275.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866
[2024-06-10 17:04:32,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 676.84 | bwd_inner_microstep: 676.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2886
[2024-06-10 17:04:34,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.41 | bwd_microstep: 1181.81 | bwd_inner_microstep: 1181.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 17:04:36,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.34 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 17:04:38,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.17 | bwd_microstep: 1485.22 | bwd_inner_microstep: 1485.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 17:04:39,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.18 | bwd_microstep: 1388.43 | bwd_inner_microstep: 1388.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425
[2024-06-10 17:04:41,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.25 | bwd_microstep: 1158.37 | bwd_inner_microstep: 1158.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3647
[2024-06-10 17:04:43,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1279.48 | bwd_inner_microstep: 1279.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969
[2024-06-10 17:04:44,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.10 | bwd_microstep: 895.91 | bwd_inner_microstep: 895.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3533
[2024-06-10 17:04:46,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.17 | bwd_microstep: 1554.28 | bwd_inner_microstep: 1554.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3627
[2024-06-10 17:04:48,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.44 | bwd_microstep: 1454.69 | bwd_inner_microstep: 1454.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 17:04:50,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.00 | bwd_microstep: 1521.93 | bwd_inner_microstep: 1521.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 17:04:52,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.94 | bwd_microstep: 1477.46 | bwd_inner_microstep: 1477.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 17:04:54,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.01 | bwd_microstep: 1334.32 | bwd_inner_microstep: 1334.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-10 17:04:56,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1512.83 | bwd_inner_microstep: 1512.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3559
[2024-06-10 17:04:58,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.20 | bwd_microstep: 1360.17 | bwd_inner_microstep: 1360.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437
[2024-06-10 17:05:00,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.21 | bwd_microstep: 1284.06 | bwd_inner_microstep: 1284.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 17:05:02,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1353.00 | bwd_inner_microstep: 1352.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 17:05:04,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.39 | bwd_microstep: 1443.54 | bwd_inner_microstep: 1443.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2223
[2024-06-10 17:05:05,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.94 | bwd_microstep: 863.09 | bwd_inner_microstep: 863.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 17:05:07,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.96 | bwd_microstep: 1277.63 | bwd_inner_microstep: 1277.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758
[2024-06-10 17:05:09,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.37 | bwd_microstep: 1448.30 | bwd_inner_microstep: 1448.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 17:05:11,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.92 | bwd_microstep: 1510.04 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-10 17:05:13,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.12 | bwd_microstep: 1545.69 | bwd_inner_microstep: 1545.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 17:05:15,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.89 | bwd_microstep: 1653.96 | bwd_inner_microstep: 1653.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 17:05:17,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.28 | bwd_microstep: 1499.54 | bwd_inner_microstep: 1499.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 17:05:21,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.15 | optimizer_gradients: 4.06 | optimizer_step: 6.63
[2024-06-10 17:05:21,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.94 | bwd_microstep: 3447.13 | bwd_inner_microstep: 1757.31 | bwd_allreduce_microstep: 1689.76 | step_microstep: 38.17
[2024-06-10 17:05:21,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16181.54 | bwd: 45126.00 | bwd_inner: 43435.33 | bwd_allreduce: 1689.99 | step: 39.71

 55%|█████▍    | 949/1726 [16:22:51<13:18:36, 61.67s/it]
 55%|█████▌    | 950/1726 [16:23:52<13:14:30, 61.43s/it]
 55%|█████▌    | 951/1726 [16:24:53<13:12:46, 61.38s/it]
 55%|█████▌    | 952/1726 [16:25:54<13:09:43, 61.22s/it]
 55%|█████▌    | 953/1726 [16:26:57<13:12:21, 61.50s/it]
 55%|█████▌    | 954/1726 [16:2
{'loss': 1.1952, 'learning_rate': 1.7566338593791955e-05, 'epoch': 0.55}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410
[2024-06-10 17:05:23,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.88 | bwd_microstep: 1367.20 | bwd_inner_microstep: 1367.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294
[2024-06-10 17:05:25,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.63 | bwd_microstep: 874.36 | bwd_inner_microstep: 874.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 17:05:26,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1378.55 | bwd_inner_microstep: 1378.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 17:05:29,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.87 | bwd_microstep: 1543.10 | bwd_inner_microstep: 1543.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 17:05:30,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1381.15 | bwd_inner_microstep: 1381.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-10 17:05:32,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.32 | bwd_microstep: 1197.62 | bwd_inner_microstep: 1197.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3707
[2024-06-10 17:05:34,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.72 | bwd_microstep: 1488.69 | bwd_inner_microstep: 1488.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2162
[2024-06-10 17:05:35,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.52 | bwd_microstep: 910.27 | bwd_inner_microstep: 910.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2772
[2024-06-10 17:05:37,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.98 | bwd_microstep: 951.95 | bwd_inner_microstep: 951.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3483
[2024-06-10 17:05:39,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.63 | bwd_microstep: 1675.56 | bwd_inner_microstep: 1675.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3505
[2024-06-10 17:05:41,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1510.81 | bwd_inner_microstep: 1510.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 17:05:43,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.74 | bwd_microstep: 1337.20 | bwd_inner_microstep: 1337.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091
[2024-06-10 17:05:44,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.91 | bwd_microstep: 824.81 | bwd_inner_microstep: 824.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 17:05:46,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1452.74 | bwd_inner_microstep: 1452.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-10 17:05:48,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1511.56 | bwd_inner_microstep: 1511.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 17:05:50,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.72 | bwd_microstep: 1524.27 | bwd_inner_microstep: 1524.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3660
[2024-06-10 17:05:52,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.13 | bwd_microstep: 1259.45 | bwd_inner_microstep: 1259.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299
[2024-06-10 17:05:53,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.26 | bwd_microstep: 978.55 | bwd_inner_microstep: 978.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 17:05:54,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 697.72 | bwd_inner_microstep: 697.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 17:05:56,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.16 | bwd_microstep: 800.97 | bwd_inner_microstep: 800.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987
[2024-06-10 17:05:57,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.25 | bwd_microstep: 706.61 | bwd_inner_microstep: 706.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052
[2024-06-10 17:05:58,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.18 | bwd_microstep: 816.31 | bwd_inner_microstep: 816.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 17:06:00,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.03 | bwd_microstep: 1504.97 | bwd_inner_microstep: 1504.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533
[2024-06-10 17:06:02,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1324.54 | bwd_inner_microstep: 1324.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2074
[2024-06-10 17:06:03,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.78 | bwd_microstep: 1010.18 | bwd_inner_microstep: 1010.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 17:06:05,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.55 | bwd_microstep: 1402.41 | bwd_inner_microstep: 1402.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 17:06:07,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.06 | bwd_microstep: 1301.52 | bwd_inner_microstep: 1301.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818
[2024-06-10 17:06:09,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.31 | bwd_microstep: 1614.48 | bwd_inner_microstep: 1614.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760
[2024-06-10 17:06:11,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.94 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1542.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398
[2024-06-10 17:06:13,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.65 | bwd_microstep: 1368.15 | bwd_inner_microstep: 1368.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-10 17:06:15,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1405.15 | bwd_inner_microstep: 1405.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 17:06:24,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.58
[2024-06-10 17:06:24,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.45 | bwd_microstep: 8109.24 | bwd_inner_microstep: 1985.51 | bwd_allreduce_microstep: 6123.67 | step_microstep: 38.50
[2024-06-10 17:06:24,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15149.39 | bwd: 46772.94 | bwd_inner: 40648.36 | bwd_allreduce: 6123.91 | step: 40.08
{'loss': 1.1751, 'learning_rate': 1.7529087853597072e-05, 'epoch': 0.55}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462
[2024-06-10 17:06:26,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.10 | bwd_microstep: 1560.34 | bwd_inner_microstep: 1560.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3886
[2024-06-10 17:06:28,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.04 | bwd_microstep: 1478.22 | bwd_inner_microstep: 1478.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 17:06:30,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1242.07 | bwd_inner_microstep: 1242.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 17:06:32,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1476.93 | bwd_inner_microstep: 1476.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 17:06:34,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.68 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832
[2024-06-10 17:06:36,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.06 | bwd_microstep: 1479.89 | bwd_inner_microstep: 1479.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 17:06:38,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1387.02 | bwd_inner_microstep: 1386.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3477
[2024-06-10 17:06:39,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.87 | bwd_microstep: 1245.66 | bwd_inner_microstep: 1245.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 864
[2024-06-10 17:06:40,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 136.15 | bwd_microstep: 349.97 | bwd_inner_microstep: 349.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 17:06:42,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.97 | bwd_microstep: 1494.65 | bwd_inner_microstep: 1494.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-10 17:06:44,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.14 | bwd_microstep: 1302.43 | bwd_inner_microstep: 1302.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 720
[2024-06-10 17:06:44,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.08 | bwd_microstep: 292.49 | bwd_inner_microstep: 292.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099
[2024-06-10 17:06:45,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.76 | bwd_microstep: 915.23 | bwd_inner_microstep: 915.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-10 17:06:48,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.86 | bwd_microstep: 1716.21 | bwd_inner_microstep: 1716.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1934
[2024-06-10 17:06:49,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.13 | bwd_microstep: 758.12 | bwd_inner_microstep: 758.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3618
[2024-06-10 17:06:51,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.25 | bwd_microstep: 1567.77 | bwd_inner_microstep: 1567.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 17:06:53,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.81 | bwd_microstep: 1490.00 | bwd_inner_microstep: 1489.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3537
[2024-06-10 17:06:55,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1551.29 | bwd_inner_microstep: 1551.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3668
[2024-06-10 17:06:57,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.17 | bwd_microstep: 1624.79 | bwd_inner_microstep: 1624.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 17:06:59,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.43 | bwd_microstep: 973.39 | bwd_inner_microstep: 973.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 17:07:00,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1288.35 | bwd_inner_microstep: 1288.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 17:07:02,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.71 | bwd_microstep: 1391.73 | bwd_inner_microstep: 1391.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2159
[2024-06-10 17:07:04,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.79 | bwd_microstep: 857.07 | bwd_inner_microstep: 857.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 17:07:06,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.50 | bwd_microstep: 1460.11 | bwd_inner_microstep: 1460.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 17:07:07,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.48 | bwd_microstep: 1185.97 | bwd_inner_microstep: 1185.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 17:07:09,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.25 | bwd_microstep: 1496.49 | bwd_inner_microstep: 1496.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-10 17:07:10,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.41 | bwd_microstep: 687.72 | bwd_inner_microstep: 687.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 17:07:12,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 1439.64 | bwd_inner_microstep: 1439.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 17:07:14,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1555.46 | bwd_inner_microstep: 1555.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 17:07:16,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.40 | bwd_microstep: 1157.41 | bwd_inner_microstep: 1157.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806
[2024-06-10 17:07:18,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1481.08 | bwd_inner_microstep: 1481.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2261
[2024-06-10 17:07:26,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-10 17:07:26,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.65 | bwd_microstep: 7162.54 | bwd_inner_microstep: 1216.54 | bwd_allreduce_microstep: 5945.94 | step_microstep: 38.26
[2024-06-10 17:07:26,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15124.06 | bwd: 46458.43 | bwd_inner: 40511.56 | bwd_allreduce: 5946.18 | step: 39.72
{'loss': 1.1929, 'learning_rate': 1.7491845815941926e-05, 'epoch': 0.55}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 17:07:27,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.89 | bwd_microstep: 1356.31 | bwd_inner_microstep: 1356.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911
[2024-06-10 17:07:30,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.64 | bwd_microstep: 1682.61 | bwd_inner_microstep: 1682.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 17:07:32,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1376.89 | bwd_inner_microstep: 1376.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2254
[2024-06-10 17:07:33,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.62 | bwd_microstep: 900.22 | bwd_inner_microstep: 900.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-10 17:07:34,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.79 | bwd_microstep: 725.67 | bwd_inner_microstep: 725.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 17:07:36,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.33 | bwd_microstep: 1531.62 | bwd_inner_microstep: 1531.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1895
[2024-06-10 17:07:37,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.56 | bwd_microstep: 683.43 | bwd_inner_microstep: 683.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 17:07:38,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.23 | bwd_microstep: 678.96 | bwd_inner_microstep: 678.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-10 17:07:41,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.45 | bwd_microstep: 2300.76 | bwd_inner_microstep: 2300.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-10 17:07:43,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.66 | bwd_microstep: 1439.00 | bwd_inner_microstep: 1438.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 17:07:45,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1346.43 | bwd_inner_microstep: 1346.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3381
[2024-06-10 17:07:46,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.52 | bwd_microstep: 1176.87 | bwd_inner_microstep: 1176.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 17:07:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1348.25 | bwd_inner_microstep: 1348.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2488
[2024-06-10 17:07:50,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.44 | bwd_microstep: 1142.67 | bwd_inner_microstep: 1142.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 682
[2024-06-10 17:07:50,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.76 | bwd_microstep: 283.52 | bwd_inner_microstep: 283.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3864
[2024-06-10 17:07:52,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.60 | bwd_microstep: 1569.50 | bwd_inner_microstep: 1569.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 17:07:53,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.91 | bwd_microstep: 804.60 | bwd_inner_microstep: 804.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3928
[2024-06-10 17:07:55,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.39 | bwd_microstep: 1335.03 | bwd_inner_microstep: 1335.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 17:07:57,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.66 | bwd_microstep: 1295.96 | bwd_inner_microstep: 1295.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181
[2024-06-10 17:07:58,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.00 | bwd_microstep: 857.79 | bwd_inner_microstep: 857.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706
[2024-06-10 17:08:00,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1431.94 | bwd_inner_microstep: 1431.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 17:08:01,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.24 | bwd_microstep: 806.00 | bwd_inner_microstep: 805.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 17:08:03,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.82 | bwd_microstep: 1424.06 | bwd_inner_microstep: 1424.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3560
[2024-06-10 17:08:05,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1265.39 | bwd_inner_microstep: 1265.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3646
[2024-06-10 17:08:07,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.66 | bwd_microstep: 1438.78 | bwd_inner_microstep: 1438.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2226
[2024-06-10 17:08:08,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.23 | bwd_microstep: 959.69 | bwd_inner_microstep: 959.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2271
[2024-06-10 17:08:10,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.35 | bwd_microstep: 1003.96 | bwd_inner_microstep: 1003.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 17:08:12,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.00 | bwd_microstep: 1279.46 | bwd_inner_microstep: 1279.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 17:08:14,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1592.30 | bwd_inner_microstep: 1592.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3584
[2024-06-10 17:08:16,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.17 | bwd_microstep: 1348.25 | bwd_inner_microstep: 1348.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3589
[2024-06-10 17:08:17,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1339.25 | bwd_inner_microstep: 1339.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 17:08:26,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 17:08:26,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.79 | bwd_microstep: 7954.42 | bwd_inner_microstep: 1643.31 | bwd_allreduce_microstep: 6311.06 | step_microstep: 37.97
[2024-06-10 17:08:26,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14390.69 | bwd: 45679.61 | bwd_inner: 39367.64 | bwd_allreduce: 6311.29 | step: 39.58
{'loss': 1.2031, 'learning_rate': 1.7454612611992777e-05, 'epoch': 0.55}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 17:08:28,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1369.38 | bwd_inner_microstep: 1369.27 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 17:08:30,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1373.72 | bwd_inner_microstep: 1373.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3763
[2024-06-10 17:08:32,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.65 | bwd_microstep: 1461.31 | bwd_inner_microstep: 1461.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732
[2024-06-10 17:08:34,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1425.78 | bwd_inner_microstep: 1425.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 17:08:36,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.80 | bwd_microstep: 1379.84 | bwd_inner_microstep: 1379.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 17:08:38,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1381.62 | bwd_inner_microstep: 1381.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 17:08:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.25 | bwd_microstep: 1379.67 | bwd_inner_microstep: 1379.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638
[2024-06-10 17:08:41,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.62 | bwd_microstep: 1426.80 | bwd_inner_microstep: 1426.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 17:08:43,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.62 | bwd_microstep: 1154.35 | bwd_inner_microstep: 1154.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 17:08:45,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.63 | bwd_microstep: 1151.00 | bwd_inner_microstep: 1150.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503
[2024-06-10 17:08:47,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 1418.76 | bwd_inner_microstep: 1418.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-10 17:08:49,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.89 | bwd_microstep: 1434.69 | bwd_inner_microstep: 1434.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 17:08:50,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1359.06 | bwd_inner_microstep: 1359.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 17:08:52,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.83 | bwd_microstep: 1349.29 | bwd_inner_microstep: 1349.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 17:08:54,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1255.38 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3565
[2024-06-10 17:08:56,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.22 | bwd_microstep: 1430.13 | bwd_inner_microstep: 1430.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 17:08:58,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 1391.36 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 17:09:00,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1396.85 | bwd_inner_microstep: 1396.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443
[2024-06-10 17:09:01,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.66 | bwd_microstep: 1154.20 | bwd_inner_microstep: 1154.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 17:09:03,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 17:09:06,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.58 | bwd_microstep: 1656.65 | bwd_inner_microstep: 1656.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-10 17:09:08,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1622.93 | bwd_inner_microstep: 1622.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2107
[2024-06-10 17:09:09,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.78 | bwd_microstep: 920.76 | bwd_inner_microstep: 920.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823
[2024-06-10 17:09:11,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.02 | bwd_microstep: 1586.35 | bwd_inner_microstep: 1586.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-10 17:09:13,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.51 | bwd_microstep: 1302.26 | bwd_inner_microstep: 1302.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2193
[2024-06-10 17:09:14,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.18 | bwd_microstep: 890.86 | bwd_inner_microstep: 890.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-10 17:09:16,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.87 | bwd_microstep: 1282.79 | bwd_inner_microstep: 1282.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3570
[2024-06-10 17:09:18,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.93 | bwd_microstep: 1350.34 | bwd_inner_microstep: 1350.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 17:09:20,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.16 | bwd_microstep: 1549.58 | bwd_inner_microstep: 1549.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2902
[2024-06-10 17:09:22,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.68 | bwd_microstep: 1188.41 | bwd_inner_microstep: 1188.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804
[2024-06-10 17:09:24,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.34 | bwd_microstep: 1351.59 | bwd_inner_microstep: 1351.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 17:09:26,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-10 17:09:26,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1414.52 | bwd_inner_microstep: 1406.84 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.57
[2024-06-10 17:09:26,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16208.70 | bwd: 43207.99 | bwd_inner: 43199.38 | bwd_allreduce: 7.91 | step: 39.04
{'loss': 1.2159, 'learning_rate': 1.7417388372884775e-05, 'epoch': 0.56}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3440
[2024-06-10 17:09:28,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.49 | bwd_microstep: 1494.69 | bwd_inner_microstep: 1494.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 17:09:30,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1283.71 | bwd_inner_microstep: 1283.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 17:09:31,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.96 | bwd_microstep: 1374.35 | bwd_inner_microstep: 1374.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 17:09:34,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.43 | bwd_microstep: 1653.61 | bwd_inner_microstep: 1653.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 17:09:36,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.46 | bwd_microstep: 1478.39 | bwd_inner_microstep: 1478.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 17:09:37,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.72 | bwd_microstep: 1149.55 | bwd_inner_microstep: 1149.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 17:09:39,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.40 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2396
[2024-06-10 17:09:41,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.99 | bwd_microstep: 935.16 | bwd_inner_microstep: 935.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 17:09:42,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.32 | bwd_microstep: 1382.70 | bwd_inner_microstep: 1382.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954
[2024-06-10 17:09:43,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.46 | bwd_microstep: 700.74 | bwd_inner_microstep: 700.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3545
[2024-06-10 17:09:45,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1229.87 | bwd_inner_microstep: 1229.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 17:09:47,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.12 | bwd_microstep: 1486.69 | bwd_inner_microstep: 1486.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3604
[2024-06-10 17:09:49,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.34 | bwd_microstep: 1369.16 | bwd_inner_microstep: 1369.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3384
[2024-06-10 17:09:51,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.51 | bwd_microstep: 1143.24 | bwd_inner_microstep: 1143.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 17:09:53,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1379.27 | bwd_inner_microstep: 1379.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2165
[2024-06-10 17:09:54,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.78 | bwd_microstep: 916.87 | bwd_inner_microstep: 916.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2686
[2024-06-10 17:09:55,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.94 | bwd_microstep: 1121.52 | bwd_inner_microstep: 1121.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823
[2024-06-10 17:09:58,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.56 | bwd_microstep: 1511.94 | bwd_inner_microstep: 1511.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 17:09:59,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1380.67 | bwd_inner_microstep: 1380.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3818
[2024-06-10 17:10:02,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.41 | bwd_microstep: 1717.66 | bwd_inner_microstep: 1717.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 17:10:04,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.26 | bwd_microstep: 1388.69 | bwd_inner_microstep: 1388.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 17:10:06,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1374.58 | bwd_inner_microstep: 1374.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 17:10:08,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1512.57 | bwd_inner_microstep: 1512.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 17:10:10,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1343.81 | bwd_inner_microstep: 1343.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2283
[2024-06-10 17:10:11,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.15 | bwd_microstep: 1070.88 | bwd_inner_microstep: 1070.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 17:10:13,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.29 | bwd_microstep: 1654.32 | bwd_inner_microstep: 1654.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042
[2024-06-10 17:10:15,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.19 | bwd_microstep: 907.51 | bwd_inner_microstep: 907.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741
[2024-06-10 17:10:17,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.60 | bwd_microstep: 1639.99 | bwd_inner_microstep: 1639.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1982
[2024-06-10 17:10:18,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.14 | bwd_microstep: 735.13 | bwd_inner_microstep: 735.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 17:10:20,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.55 | bwd_microstep: 1402.54 | bwd_inner_microstep: 1402.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716
[2024-06-10 17:10:22,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.71 | bwd_microstep: 1463.83 | bwd_inner_microstep: 1463.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 17:10:24,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-10 17:10:24,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.88 | bwd_microstep: 1977.62 | bwd_inner_microstep: 1482.68 | bwd_allreduce_microstep: 494.89 | step_microstep: 37.68
[2024-06-10 17:10:24,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15721.19 | bwd: 42569.04 | bwd_inner: 42073.25 | bwd_allreduce: 495.11 | step: 39.18
 55%|█████▌    | 954/1726 [16:27:58<13:11:53, 61.55s/it]
 55%|█████▌    | 955/1726 [16:29:00<13:13:38, 61.76s/it]
 55%|█████▌    | 956/1726 [16:30:02<13:13:11, 61.81s/it]
 55%|█████▌    | 957/1726 [16:31:03<13:06:41, 61.38s/it]
 56%|█████▌    | 958/1726 [16:32:02<12:59:22, 60.89s/it]
 56%|█████▌    | 959/1726 [16:33:01<12:49:39, 60.21s/it]
{'loss': 1.2427, 'learning_rate': 1.7380173229721494e-05, 'epoch': 0.56}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 17:10:26,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.93 | bwd_microstep: 1380.52 | bwd_inner_microstep: 1380.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 17:10:28,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.92 | bwd_microstep: 1383.00 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 17:10:30,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.76 | bwd_microstep: 1383.09 | bwd_inner_microstep: 1383.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3801
[2024-06-10 17:10:32,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1556.90 | bwd_inner_microstep: 1556.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841
[2024-06-10 17:10:34,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.69 | bwd_microstep: 1462.89 | bwd_inner_microstep: 1462.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3744
[2024-06-10 17:10:36,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.62 | bwd_microstep: 1581.75 | bwd_inner_microstep: 1581.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 17:10:38,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1247.91 | bwd_inner_microstep: 1247.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 17:10:40,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.10 | bwd_microstep: 1280.12 | bwd_inner_microstep: 1280.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 17:10:41,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.62 | bwd_microstep: 794.29 | bwd_inner_microstep: 794.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2148
[2024-06-10 17:10:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.90 | bwd_microstep: 975.78 | bwd_inner_microstep: 975.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 17:10:44,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.11 | bwd_microstep: 1253.29 | bwd_inner_microstep: 1253.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3699
[2024-06-10 17:10:46,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.22 | bwd_microstep: 1513.09 | bwd_inner_microstep: 1513.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 17:10:48,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.29 | bwd_microstep: 1339.30 | bwd_inner_microstep: 1339.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3700
[2024-06-10 17:10:50,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.00 | bwd_microstep: 1720.06 | bwd_inner_microstep: 1720.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2612
[2024-06-10 17:10:52,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.95 | bwd_microstep: 947.62 | bwd_inner_microstep: 947.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527
[2024-06-10 17:10:54,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1322.06 | bwd_inner_microstep: 1322.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 17:10:56,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.38 | bwd_microstep: 1475.92 | bwd_inner_microstep: 1475.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 17:10:58,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1509.41 | bwd_inner_microstep: 1509.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948
[2024-06-10 17:10:59,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.21 | bwd_microstep: 698.12 | bwd_inner_microstep: 698.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2292
[2024-06-10 17:11:00,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.41 | bwd_microstep: 1004.24 | bwd_inner_microstep: 1004.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 17:11:02,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 17:11:04,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.96 | bwd_microstep: 1644.76 | bwd_inner_microstep: 1644.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 17:11:06,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.83 | bwd_microstep: 1430.08 | bwd_inner_microstep: 1430.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462
[2024-06-10 17:11:08,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.95 | bwd_microstep: 1181.64 | bwd_inner_microstep: 1181.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 17:11:10,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.90 | bwd_microstep: 1181.61 | bwd_inner_microstep: 1181.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 17:11:12,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.67 | bwd_microstep: 1555.99 | bwd_inner_microstep: 1555.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756
[2024-06-10 17:11:14,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.44 | bwd_microstep: 1444.08 | bwd_inner_microstep: 1444.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3569
[2024-06-10 17:11:16,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.32 | bwd_microstep: 1447.20 | bwd_inner_microstep: 1447.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913
[2024-06-10 17:11:17,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.57 | bwd_microstep: 779.57 | bwd_inner_microstep: 779.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 17:11:19,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1555.39 | bwd_inner_microstep: 1555.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 17:11:21,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.11 | bwd_microstep: 1551.16 | bwd_inner_microstep: 1551.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3460
[2024-06-10 17:11:33,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.60
[2024-06-10 17:11:33,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.98 | bwd_microstep: 10849.92 | bwd_inner_microstep: 2060.53 | bwd_allreduce_microstep: 8789.32 | step_microstep: 38.79
[2024-06-10 17:11:33,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15999.93 | bwd: 51941.16 | bwd_inner: 43150.92 | bwd_allreduce: 8789.56 | step: 40.28
{'loss': 1.1465, 'learning_rate': 1.734296731357448e-05, 'epoch': 0.56}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 17:11:35,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.64 | bwd_microstep: 1461.21 | bwd_inner_microstep: 1461.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 17:11:37,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1382.68 | bwd_inner_microstep: 1382.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4021
[2024-06-10 17:11:39,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.49 | bwd_microstep: 1604.75 | bwd_inner_microstep: 1604.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 17:11:41,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.11 | bwd_microstep: 1381.46 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 17:11:42,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.80 | bwd_microstep: 1243.57 | bwd_inner_microstep: 1243.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 17:11:44,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1396.97 | bwd_inner_microstep: 1396.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3616
[2024-06-10 17:11:46,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1245.62 | bwd_inner_microstep: 1245.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 17:11:48,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.26 | bwd_microstep: 1398.39 | bwd_inner_microstep: 1398.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 17:11:50,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.99 | bwd_microstep: 1480.99 | bwd_inner_microstep: 1480.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-10 17:11:51,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.07 | bwd_microstep: 794.24 | bwd_inner_microstep: 794.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1915
[2024-06-10 17:11:52,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.44 | bwd_microstep: 748.97 | bwd_inner_microstep: 748.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 17:11:54,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.35 | bwd_microstep: 1485.29 | bwd_inner_microstep: 1485.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3525
[2024-06-10 17:11:56,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.82 | bwd_microstep: 1444.31 | bwd_inner_microstep: 1444.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3662
[2024-06-10 17:11:59,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.89 | bwd_microstep: 1768.65 | bwd_inner_microstep: 1768.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672
[2024-06-10 17:12:01,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.15 | bwd_microstep: 1725.01 | bwd_inner_microstep: 1724.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3656
[2024-06-10 17:12:03,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.98 | bwd_microstep: 1618.80 | bwd_inner_microstep: 1618.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982
[2024-06-10 17:12:04,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.73 | bwd_microstep: 890.54 | bwd_inner_microstep: 890.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 17:12:06,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.02 | bwd_microstep: 1282.41 | bwd_inner_microstep: 1282.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503
[2024-06-10 17:12:08,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.49 | bwd_microstep: 1191.92 | bwd_inner_microstep: 1191.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2177
[2024-06-10 17:12:09,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.27 | bwd_microstep: 857.11 | bwd_inner_microstep: 857.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590
[2024-06-10 17:12:11,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1309.80 | bwd_inner_microstep: 1309.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 17:12:13,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.87 | bwd_microstep: 1297.02 | bwd_inner_microstep: 1297.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 17:12:15,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.92 | bwd_microstep: 1457.95 | bwd_inner_microstep: 1457.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 17:12:17,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1395.80 | bwd_inner_microstep: 1395.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 17:12:18,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.11 | bwd_microstep: 1297.31 | bwd_inner_microstep: 1297.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3800
[2024-06-10 17:12:20,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.58 | bwd_microstep: 1350.93 | bwd_inner_microstep: 1350.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 17:12:23,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.71 | bwd_microstep: 1645.95 | bwd_inner_microstep: 1645.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 17:12:25,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1490.08 | bwd_inner_microstep: 1490.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2281
[2024-06-10 17:12:26,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.34 | bwd_microstep: 1031.71 | bwd_inner_microstep: 1031.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2966
[2024-06-10 17:12:27,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.61 | bwd_microstep: 1040.25 | bwd_inner_microstep: 1040.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3584
[2024-06-10 17:12:29,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.70 | bwd_microstep: 1443.92 | bwd_inner_microstep: 1443.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 17:12:33,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.50 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 17:12:33,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 2534.22 | bwd_inner_microstep: 1851.89 | bwd_allreduce_microstep: 682.28 | step_microstep: 38.89
[2024-06-10 17:12:33,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16003.76 | bwd: 43697.87 | bwd_inner: 43014.68 | bwd_allreduce: 682.51 | step: 40.39
{'loss': 1.1938, 'learning_rate': 1.7305770755482788e-05, 'epoch': 0.56}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3437
[2024-06-10 17:12:35,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.26 | bwd_microstep: 1543.27 | bwd_inner_microstep: 1543.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472
[2024-06-10 17:12:37,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.79 | bwd_microstep: 1408.86 | bwd_inner_microstep: 1408.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 17:12:39,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.28 | bwd_microstep: 1507.10 | bwd_inner_microstep: 1507.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3794
[2024-06-10 17:12:41,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1380.71 | bwd_inner_microstep: 1380.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431
[2024-06-10 17:12:42,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.11 | bwd_microstep: 1180.65 | bwd_inner_microstep: 1180.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 17:12:44,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1379.25 | bwd_inner_microstep: 1379.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 17:12:46,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.15 | bwd_microstep: 1246.84 | bwd_inner_microstep: 1246.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 17:12:48,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1415.84 | bwd_inner_microstep: 1415.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 17:12:50,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1280.54 | bwd_inner_microstep: 1280.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 17:12:51,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1284.11 | bwd_inner_microstep: 1284.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-10 17:12:53,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.28 | bwd_microstep: 788.22 | bwd_inner_microstep: 788.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2009
[2024-06-10 17:12:54,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.43 | bwd_microstep: 775.47 | bwd_inner_microstep: 775.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 17:12:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1378.21 | bwd_inner_microstep: 1378.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 17:12:58,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.00 | bwd_microstep: 1450.05 | bwd_inner_microstep: 1450.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4013
[2024-06-10 17:13:00,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.74 | bwd_microstep: 1808.87 | bwd_inner_microstep: 1808.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427
[2024-06-10 17:13:02,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.55 | bwd_microstep: 1279.86 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 17:13:04,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.66 | bwd_microstep: 1518.86 | bwd_inner_microstep: 1518.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3670
[2024-06-10 17:13:06,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.77 | bwd_microstep: 1691.02 | bwd_inner_microstep: 1691.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3436
[2024-06-10 17:13:08,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.13 | bwd_microstep: 1334.13 | bwd_inner_microstep: 1334.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3483
[2024-06-10 17:13:10,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.84 | bwd_microstep: 1250.74 | bwd_inner_microstep: 1250.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 17:13:12,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.70 | bwd_microstep: 1408.39 | bwd_inner_microstep: 1408.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 17:13:14,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1395.78 | bwd_inner_microstep: 1395.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-10 17:13:16,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176
[2024-06-10 17:13:17,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.36 | bwd_microstep: 855.42 | bwd_inner_microstep: 855.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3505
[2024-06-10 17:13:19,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.56 | bwd_microstep: 1451.08 | bwd_inner_microstep: 1451.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 17:13:21,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 1552.15 | bwd_inner_microstep: 1552.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 17:13:23,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1412.41 | bwd_inner_microstep: 1412.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 17:13:25,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.47 | bwd_microstep: 1372.49 | bwd_inner_microstep: 1372.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2293
[2024-06-10 17:13:26,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.92 | bwd_microstep: 1072.98 | bwd_inner_microstep: 1072.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 17:13:29,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.74 | bwd_microstep: 1653.04 | bwd_inner_microstep: 1653.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 17:13:31,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.72 | bwd_microstep: 1352.74 | bwd_inner_microstep: 1352.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 17:13:34,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.07 | optimizer_step: 6.61
[2024-06-10 17:13:34,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.81 | bwd_microstep: 3270.98 | bwd_inner_microstep: 1889.51 | bwd_allreduce_microstep: 1381.41 | step_microstep: 37.70
[2024-06-10 17:13:34,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16219.37 | bwd: 45186.24 | bwd_inner: 43803.91 | bwd_allreduce: 1381.65 | step: 39.26
{'loss': 1.2126, 'learning_rate': 1.7268583686452474e-05, 'epoch': 0.56}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 17:13:36,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.48 | bwd_microstep: 1233.28 | bwd_inner_microstep: 1233.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3944
[2024-06-10 17:13:38,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.66 | bwd_microstep: 1691.95 | bwd_inner_microstep: 1691.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3854
[2024-06-10 17:13:41,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.43 | bwd_microstep: 1659.91 | bwd_inner_microstep: 1659.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3823
[2024-06-10 17:13:43,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1498.46 | bwd_inner_microstep: 1498.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 17:13:44,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1247.73 | bwd_inner_microstep: 1247.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743
[2024-06-10 17:13:47,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.84 | bwd_microstep: 1531.31 | bwd_inner_microstep: 1531.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 17:13:48,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.15 | bwd_microstep: 1150.68 | bwd_inner_microstep: 1150.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-10 17:13:49,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.44 | bwd_microstep: 794.05 | bwd_inner_microstep: 794.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1903
[2024-06-10 17:13:50,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.74 | bwd_microstep: 715.39 | bwd_inner_microstep: 715.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-10 17:13:52,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.30 | bwd_microstep: 1441.73 | bwd_inner_microstep: 1441.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 17:13:54,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.92 | bwd_microstep: 1342.77 | bwd_inner_microstep: 1342.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 17:13:56,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.01 | bwd_microstep: 1429.95 | bwd_inner_microstep: 1429.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3513
[2024-06-10 17:13:58,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.70 | bwd_microstep: 1652.73 | bwd_inner_microstep: 1652.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077
[2024-06-10 17:14:00,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.02 | bwd_microstep: 820.20 | bwd_inner_microstep: 820.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 17:14:01,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1283.76 | bwd_inner_microstep: 1283.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 17:14:03,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1285.29 | bwd_inner_microstep: 1285.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 17:14:05,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.23 | bwd_microstep: 1431.33 | bwd_inner_microstep: 1431.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 17:14:07,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 17:14:09,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1462.04 | bwd_inner_microstep: 1462.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 17:14:11,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1255.23 | bwd_inner_microstep: 1255.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 17:14:13,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1396.11 | bwd_inner_microstep: 1396.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3566
[2024-06-10 17:14:14,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.23 | bwd_microstep: 1237.61 | bwd_inner_microstep: 1237.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2069
[2024-06-10 17:14:16,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.91 | bwd_microstep: 960.64 | bwd_inner_microstep: 960.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 17:14:18,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1447.63 | bwd_inner_microstep: 1447.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 17:14:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.20 | bwd_microstep: 1599.06 | bwd_inner_microstep: 1599.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459
[2024-06-10 17:14:22,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.33 | bwd_microstep: 1569.88 | bwd_inner_microstep: 1569.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 17:14:24,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.02 | bwd_microstep: 1535.65 | bwd_inner_microstep: 1535.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059
[2024-06-10 17:14:25,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.23 | bwd_microstep: 913.38 | bwd_inner_microstep: 913.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3449
[2024-06-10 17:14:27,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.75 | bwd_microstep: 1300.21 | bwd_inner_microstep: 1300.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592
[2024-06-10 17:14:29,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.11 | bwd_microstep: 1305.92 | bwd_inner_microstep: 1305.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 17:14:31,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.99 | bwd_microstep: 1250.97 | bwd_inner_microstep: 1250.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3816
[2024-06-10 17:14:36,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-10 17:14:36,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 4243.04 | bwd_inner_microstep: 1797.78 | bwd_allreduce_microstep: 2445.21 | step_microstep: 37.94
[2024-06-10 17:14:36,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15872.55 | bwd: 45074.16 | bwd_inner: 42628.04 | bwd_allreduce: 2445.44 | step: 39.39
{'loss': 1.1877, 'learning_rate': 1.723140623745622e-05, 'epoch': 0.56}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 17:14:38,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.34 | bwd_microstep: 1465.02 | bwd_inner_microstep: 1464.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3902
[2024-06-10 17:14:40,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.55 | bwd_microstep: 1692.73 | bwd_inner_microstep: 1692.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 17:14:42,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1492.61 | bwd_inner_microstep: 1492.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-10 17:14:44,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.61 | bwd_microstep: 1557.74 | bwd_inner_microstep: 1557.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 17:14:46,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.78 | bwd_microstep: 1541.57 | bwd_inner_microstep: 1541.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 17:14:47,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.94 | bwd_microstep: 800.08 | bwd_inner_microstep: 800.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 17:14:49,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1380.75 | bwd_inner_microstep: 1380.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-10 17:14:51,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.89 | bwd_microstep: 1186.44 | bwd_inner_microstep: 1186.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-10 17:14:53,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.06 | bwd_microstep: 1629.67 | bwd_inner_microstep: 1629.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 17:14:55,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 17:14:57,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1414.71 | bwd_inner_microstep: 1414.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 17:14:59,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1390.28 | bwd_inner_microstep: 1390.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3717
[2024-06-10 17:15:01,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.94 | bwd_microstep: 1560.42 | bwd_inner_microstep: 1560.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 17:15:03,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.82 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-10 17:15:05,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.93 | bwd_microstep: 1525.68 | bwd_inner_microstep: 1525.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2126
[2024-06-10 17:15:06,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.11 | bwd_microstep: 863.16 | bwd_inner_microstep: 863.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925
[2024-06-10 17:15:09,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.06 | bwd_microstep: 1596.45 | bwd_inner_microstep: 1596.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 17:15:11,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1555.20 | bwd_inner_microstep: 1555.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108
[2024-06-10 17:15:12,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.63 | bwd_microstep: 824.61 | bwd_inner_microstep: 824.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 17:15:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021
[2024-06-10 17:15:15,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.81 | bwd_microstep: 807.36 | bwd_inner_microstep: 807.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 17:15:17,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.34 | bwd_microstep: 1346.27 | bwd_inner_microstep: 1346.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1994
[2024-06-10 17:15:18,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.05 | bwd_microstep: 709.95 | bwd_inner_microstep: 709.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 17:15:19,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.46 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 17:15:21,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.00 | bwd_microstep: 1299.82 | bwd_inner_microstep: 1299.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 17:15:23,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1280.57 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891
[2024-06-10 17:15:24,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.96 | bwd_microstep: 714.39 | bwd_inner_microstep: 714.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551
[2024-06-10 17:15:26,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1427.13 | bwd_inner_microstep: 1427.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-10 17:15:28,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.69 | bwd_microstep: 1454.19 | bwd_inner_microstep: 1454.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 17:15:30,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.58 | bwd_microstep: 1595.74 | bwd_inner_microstep: 1595.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 17:15:32,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.83 | bwd_microstep: 1335.97 | bwd_inner_microstep: 1335.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2225
[2024-06-10 17:15:36,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 17:15:36,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.50 | bwd_microstep: 3550.62 | bwd_inner_microstep: 1127.28 | bwd_allreduce_microstep: 2423.29 | step_microstep: 38.05
[2024-06-10 17:15:36,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15645.99 | bwd: 44295.42 | bwd_inner: 41871.22 | bwd_allreduce: 2423.52 | step: 39.56
{'loss': 1.2416, 'learning_rate': 1.7194238539432807e-05, 'epoch': 0.56}


 56%|█████▌    | 959/1726 [16:33:01<12:49:39, 60.21s/it]
 56%|█████▌    | 960/1726 [16:34:09<13:19:32, 62.63s/it]
 56%|█████▌    | 961/1726 [16:35:09<13:08:33, 61.85s/it]
 56%|█████▌    | 962/1726 [16:36:11<13:07:07, 61.82s/it]
 56%|█████▌    | 963/1726 [16:37:12<13:04:02, 61.65s/it]
 56%|█████▌    | 964/1726 [16:38:13<12:57:46, 61.24s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 17:15:38,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1362.69 | bwd_inner_microstep: 1362.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3946
[2024-06-10 17:15:40,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.06 | bwd_microstep: 1623.44 | bwd_inner_microstep: 1623.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 17:15:42,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.68 | bwd_microstep: 1349.36 | bwd_inner_microstep: 1349.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 17:15:44,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.07 | bwd_microstep: 1180.48 | bwd_inner_microstep: 1180.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2241
[2024-06-10 17:15:45,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.24 | bwd_microstep: 867.47 | bwd_inner_microstep: 867.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 17:15:47,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1350.41 | bwd_inner_microstep: 1350.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1921
[2024-06-10 17:15:48,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.87 | bwd_microstep: 725.31 | bwd_inner_microstep: 725.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 17:15:49,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.79 | bwd_microstep: 1280.93 | bwd_inner_microstep: 1280.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3693
[2024-06-10 17:15:51,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1390.50 | bwd_inner_microstep: 1390.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4143
[2024-06-10 17:15:54,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.86 | bwd_microstep: 1668.06 | bwd_inner_microstep: 1668.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.42
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 17:15:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1377.48 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3933
[2024-06-10 17:15:58,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.01 | bwd_microstep: 1688.17 | bwd_inner_microstep: 1688.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 17:16:00,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1525.71 | bwd_inner_microstep: 1525.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 17:16:02,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1245.05 | bwd_inner_microstep: 1245.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3441
[2024-06-10 17:16:04,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.11 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 17:16:06,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.82 | bwd_microstep: 1389.70 | bwd_inner_microstep: 1389.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453
[2024-06-10 17:16:07,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.34 | bwd_microstep: 1290.01 | bwd_inner_microstep: 1289.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-10 17:16:09,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1284.78 | bwd_inner_microstep: 1284.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-10 17:16:11,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.12 | bwd_microstep: 1285.89 | bwd_inner_microstep: 1285.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3443
[2024-06-10 17:16:13,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.74 | bwd_microstep: 1217.69 | bwd_inner_microstep: 1217.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 17:16:15,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1508.46 | bwd_inner_microstep: 1508.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2293
[2024-06-10 17:16:16,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.85 | bwd_microstep: 880.36 | bwd_inner_microstep: 880.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2082
[2024-06-10 17:16:17,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.84 | bwd_microstep: 851.80 | bwd_inner_microstep: 851.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 17:16:19,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1507.18 | bwd_inner_microstep: 1507.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 17:16:21,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1480.97 | bwd_inner_microstep: 1480.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3758
[2024-06-10 17:16:23,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.35 | bwd_microstep: 1676.27 | bwd_inner_microstep: 1676.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2692
[2024-06-10 17:16:25,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.92 | bwd_microstep: 1033.26 | bwd_inner_microstep: 1033.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 17:16:27,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.59 | bwd_microstep: 1425.73 | bwd_inner_microstep: 1425.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3576
[2024-06-10 17:16:29,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.61 | bwd_microstep: 1527.15 | bwd_inner_microstep: 1527.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3420
[2024-06-10 17:16:31,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.36 | bwd_microstep: 1377.10 | bwd_inner_microstep: 1377.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-10 17:16:33,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.79 | bwd_microstep: 1310.81 | bwd_inner_microstep: 1310.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 17:16:37,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 17:16:37,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.04 | bwd_microstep: 4221.18 | bwd_inner_microstep: 1445.68 | bwd_allreduce_microstep: 2775.46 | step_microstep: 37.76
[2024-06-10 17:16:37,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15887.72 | bwd: 45281.53 | bwd_inner: 42505.17 | bwd_allreduce: 2775.69 | step: 40.60
{'loss': 1.234, 'learning_rate': 1.715708072328668e-05, 'epoch': 0.56}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3493
[2024-06-10 17:16:39,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1433.27 | bwd_inner_microstep: 1433.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3472
[2024-06-10 17:16:41,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1325.68 | bwd_inner_microstep: 1325.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3490
[2024-06-10 17:16:43,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1244.45 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 17:16:45,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.98 | bwd_microstep: 1549.71 | bwd_inner_microstep: 1549.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 17:16:47,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1394.11 | bwd_inner_microstep: 1394.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070
[2024-06-10 17:16:48,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.60 | bwd_microstep: 818.96 | bwd_inner_microstep: 818.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 17:16:50,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.20 | bwd_microstep: 1533.36 | bwd_inner_microstep: 1533.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 17:16:52,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1247.11 | bwd_inner_microstep: 1247.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 17:16:54,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.85 | bwd_microstep: 1388.44 | bwd_inner_microstep: 1388.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 17:16:56,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1386.16 | bwd_inner_microstep: 1386.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 17:16:58,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1301.07 | bwd_inner_microstep: 1301.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583
[2024-06-10 17:16:59,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.76 | bwd_microstep: 1237.08 | bwd_inner_microstep: 1237.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448
[2024-06-10 17:17:01,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.84 | bwd_microstep: 1313.18 | bwd_inner_microstep: 1313.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401
[2024-06-10 17:17:03,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.58 | bwd_microstep: 1369.65 | bwd_inner_microstep: 1369.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 17:17:05,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1557.52 | bwd_inner_microstep: 1557.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1998
[2024-06-10 17:17:06,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.00 | bwd_microstep: 737.45 | bwd_inner_microstep: 737.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3629
[2024-06-10 17:17:08,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1372.28 | bwd_inner_microstep: 1372.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 17:17:10,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1511.79 | bwd_inner_microstep: 1511.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 17:17:12,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1379.65 | bwd_inner_microstep: 1379.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740
[2024-06-10 17:17:14,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.17 | bwd_microstep: 1639.83 | bwd_inner_microstep: 1639.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 17:17:16,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1279.52 | bwd_inner_microstep: 1279.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2047
[2024-06-10 17:17:17,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 809.79 | bwd_inner_microstep: 809.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 17:17:19,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.03 | bwd_microstep: 1381.01 | bwd_inner_microstep: 1380.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 17:17:21,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2183
[2024-06-10 17:17:22,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.15 | bwd_microstep: 953.89 | bwd_inner_microstep: 953.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 17:17:24,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1413.49 | bwd_inner_microstep: 1413.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-10 17:17:26,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.72 | bwd_microstep: 1355.70 | bwd_inner_microstep: 1355.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 17:17:28,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.86 | bwd_microstep: 1658.45 | bwd_inner_microstep: 1658.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3061
[2024-06-10 17:17:30,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.39 | bwd_microstep: 1234.79 | bwd_inner_microstep: 1234.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 17:17:32,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.75 | bwd_microstep: 1471.01 | bwd_inner_microstep: 1470.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3797
[2024-06-10 17:17:34,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1577.72 | bwd_inner_microstep: 1577.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-10 17:17:40,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.63
[2024-06-10 17:17:40,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.67 | bwd_microstep: 4755.36 | bwd_inner_microstep: 1617.11 | bwd_allreduce_microstep: 3138.20 | step_microstep: 37.90
[2024-06-10 17:17:40,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15986.57 | bwd: 45915.07 | bwd_inner: 42775.97 | bwd_allreduce: 3138.43 | step: 39.41
{'loss': 1.1931, 'learning_rate': 1.7119932919887453e-05, 'epoch': 0.56}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 17:17:42,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.72 | bwd_microstep: 1432.64 | bwd_inner_microstep: 1432.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1862
[2024-06-10 17:17:43,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.25 | bwd_microstep: 673.77 | bwd_inner_microstep: 673.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 17:17:44,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1370.72 | bwd_inner_microstep: 1370.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797
[2024-06-10 17:17:47,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.57 | bwd_microstep: 1642.70 | bwd_inner_microstep: 1642.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2248
[2024-06-10 17:17:48,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.83 | bwd_microstep: 868.16 | bwd_inner_microstep: 868.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 17:17:50,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1281.33 | bwd_inner_microstep: 1281.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3545
[2024-06-10 17:17:51,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.25 | bwd_microstep: 1197.90 | bwd_inner_microstep: 1197.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 17:17:53,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1481.42 | bwd_inner_microstep: 1481.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 17:17:55,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.52 | bwd_microstep: 1397.86 | bwd_inner_microstep: 1397.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 17:17:57,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1245.83 | bwd_inner_microstep: 1245.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 17:17:59,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.50 | bwd_microstep: 1185.89 | bwd_inner_microstep: 1185.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 17:18:00,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1252.76 | bwd_inner_microstep: 1252.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 17:18:02,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.84 | bwd_microstep: 1286.82 | bwd_inner_microstep: 1286.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3656
[2024-06-10 17:18:05,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.69 | bwd_microstep: 1652.28 | bwd_inner_microstep: 1652.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3642
[2024-06-10 17:18:07,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.90 | bwd_microstep: 1559.57 | bwd_inner_microstep: 1559.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 17:18:09,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.61 | bwd_microstep: 1406.38 | bwd_inner_microstep: 1406.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 17:18:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1445.56 | bwd_inner_microstep: 1445.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3850
[2024-06-10 17:18:13,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.93 | bwd_microstep: 1829.93 | bwd_inner_microstep: 1829.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3853
[2024-06-10 17:18:16,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.52 | bwd_microstep: 1764.85 | bwd_inner_microstep: 1764.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-10 17:18:18,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1512.71 | bwd_inner_microstep: 1512.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 17:18:20,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.88 | bwd_microstep: 1584.48 | bwd_inner_microstep: 1584.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 17:18:22,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1288.77 | bwd_inner_microstep: 1288.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 17:18:24,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1510.97 | bwd_inner_microstep: 1510.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2242
[2024-06-10 17:18:25,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.61 | bwd_microstep: 930.05 | bwd_inner_microstep: 930.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 17:18:27,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1299.09 | bwd_inner_microstep: 1299.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3605
[2024-06-10 17:18:29,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.80 | bwd_microstep: 1371.33 | bwd_inner_microstep: 1371.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813
[2024-06-10 17:18:31,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1354.96 | bwd_inner_microstep: 1354.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-10 17:18:32,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1417.29 | bwd_inner_microstep: 1417.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3558
[2024-06-10 17:18:34,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.42 | bwd_microstep: 1201.82 | bwd_inner_microstep: 1201.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 17:18:36,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1495.97 | bwd_inner_microstep: 1495.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 17:18:38,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1394.20 | bwd_inner_microstep: 1394.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3770
[2024-06-10 17:18:42,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 17:18:42,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.10 | bwd_microstep: 3387.09 | bwd_inner_microstep: 1976.94 | bwd_allreduce_microstep: 1410.10 | step_microstep: 37.96
[2024-06-10 17:18:42,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16492.28 | bwd: 45725.09 | bwd_inner: 44314.09 | bwd_allreduce: 1410.32 | step: 39.41
{'loss': 1.2262, 'learning_rate': 1.7082795260069515e-05, 'epoch': 0.56}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475
[2024-06-10 17:18:44,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.51 | bwd_microstep: 1212.31 | bwd_inner_microstep: 1212.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 17:18:46,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1252.12 | bwd_inner_microstep: 1252.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 17:18:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.81 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 17:18:49,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.70 | bwd_microstep: 1280.91 | bwd_inner_microstep: 1280.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402
[2024-06-10 17:18:51,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.58 | bwd_microstep: 1210.93 | bwd_inner_microstep: 1210.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807
[2024-06-10 17:18:53,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.97 | bwd_microstep: 1354.35 | bwd_inner_microstep: 1354.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 17:18:55,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.93 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 751
[2024-06-10 17:18:55,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.89 | bwd_microstep: 299.29 | bwd_inner_microstep: 299.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-10 17:18:57,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1417.14 | bwd_inner_microstep: 1417.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-10 17:18:59,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.31 | bwd_microstep: 1277.50 | bwd_inner_microstep: 1277.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3699
[2024-06-10 17:19:01,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.74 | bwd_microstep: 1490.36 | bwd_inner_microstep: 1490.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1980
[2024-06-10 17:19:02,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.11 | bwd_microstep: 856.87 | bwd_inner_microstep: 856.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3508
[2024-06-10 17:19:04,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.70 | bwd_microstep: 1545.02 | bwd_inner_microstep: 1544.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 17:19:06,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1482.47 | bwd_inner_microstep: 1482.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 17:19:08,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1351.50 | bwd_inner_microstep: 1351.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2172
[2024-06-10 17:19:10,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.12 | bwd_microstep: 1047.71 | bwd_inner_microstep: 1047.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3533
[2024-06-10 17:19:11,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.83 | bwd_microstep: 1257.50 | bwd_inner_microstep: 1257.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3943
[2024-06-10 17:19:14,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.57 | bwd_microstep: 1529.08 | bwd_inner_microstep: 1529.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 17:19:15,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.90 | bwd_microstep: 807.43 | bwd_inner_microstep: 807.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 17:19:17,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1424.38 | bwd_inner_microstep: 1424.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3547
[2024-06-10 17:19:19,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1420.61 | bwd_inner_microstep: 1420.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 17:19:21,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1510.22 | bwd_inner_microstep: 1510.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-10 17:19:23,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.31 | bwd_microstep: 1461.83 | bwd_inner_microstep: 1461.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 17:19:25,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1354.85 | bwd_inner_microstep: 1354.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607
[2024-06-10 17:19:26,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1309.56 | bwd_inner_microstep: 1309.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3839
[2024-06-10 17:19:28,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.85 | bwd_microstep: 1420.42 | bwd_inner_microstep: 1420.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 17:19:30,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.82 | bwd_microstep: 1503.32 | bwd_inner_microstep: 1503.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3730
[2024-06-10 17:19:33,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.25 | bwd_microstep: 1835.43 | bwd_inner_microstep: 1835.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3571
[2024-06-10 17:19:35,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.59 | bwd_microstep: 1444.96 | bwd_inner_microstep: 1444.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 17:19:37,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.60 | bwd_microstep: 1507.24 | bwd_inner_microstep: 1507.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066
[2024-06-10 17:19:38,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.77 | bwd_microstep: 917.45 | bwd_inner_microstep: 917.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791
[2024-06-10 17:19:43,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 17:19:43,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.18 | bwd_microstep: 3911.85 | bwd_inner_microstep: 1870.08 | bwd_allreduce_microstep: 2041.72 | step_microstep: 38.19
[2024-06-10 17:19:43,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15790.88 | bwd: 44466.62 | bwd_inner: 42424.00 | bwd_allreduce: 2041.95 | step: 39.65
{'loss': 1.2436, 'learning_rate': 1.704566787463151e-05, 'epoch': 0.56}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 17:19:45,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1244.22 | bwd_inner_microstep: 1244.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3945
[2024-06-10 17:19:47,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.33 | bwd_microstep: 1498.17 | bwd_inner_microstep: 1498.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471
[2024-06-10 17:19:48,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.13 | bwd_microstep: 1214.51 | bwd_inner_microstep: 1214.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 17:19:50,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.68 | bwd_microstep: 1444.33 | bwd_inner_microstep: 1444.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 17:19:53,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.80 | bwd_microstep: 1639.12 | bwd_inner_microstep: 1639.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3410
[2024-06-10 17:19:54,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.32 | bwd_microstep: 1307.75 | bwd_inner_microstep: 1307.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 17:19:56,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1247.60 | bwd_inner_microstep: 1247.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708
[2024-06-10 17:19:58,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.68 | bwd_microstep: 1626.31 | bwd_inner_microstep: 1626.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 847
[2024-06-10 17:19:59,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 134.32 | bwd_microstep: 346.49 | bwd_inner_microstep: 346.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3704
[2024-06-10 17:20:01,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.38 | bwd_microstep: 1630.09 | bwd_inner_microstep: 1630.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 17:20:03,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.13 | bwd_microstep: 1383.48 | bwd_inner_microstep: 1383.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 17:20:05,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1472.58 | bwd_inner_microstep: 1472.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 17:20:07,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1507.07 | bwd_inner_microstep: 1507.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 17:20:09,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.55 | bwd_microstep: 1377.62 | bwd_inner_microstep: 1377.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 17:20:11,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1512.57 | bwd_inner_microstep: 1512.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983
[2024-06-10 17:20:12,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.14 | bwd_microstep: 891.19 | bwd_inner_microstep: 891.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3876
[2024-06-10 17:20:14,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.73 | bwd_microstep: 1353.55 | bwd_inner_microstep: 1353.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 17:20:16,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.82 | bwd_microstep: 1299.97 | bwd_inner_microstep: 1299.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143
[2024-06-10 17:20:17,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.68 | bwd_microstep: 932.36 | bwd_inner_microstep: 932.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 17:20:19,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1417.22 | bwd_inner_microstep: 1417.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3654
[2024-06-10 17:20:21,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.71 | bwd_microstep: 1227.19 | bwd_inner_microstep: 1227.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915
[2024-06-10 17:20:22,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.09 | bwd_microstep: 687.36 | bwd_inner_microstep: 687.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 17:20:24,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.53 | bwd_microstep: 1405.42 | bwd_inner_microstep: 1405.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 17:20:26,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1379.83 | bwd_inner_microstep: 1379.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2283
[2024-06-10 17:20:27,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.66 | bwd_microstep: 940.86 | bwd_inner_microstep: 940.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279
[2024-06-10 17:20:28,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.74 | bwd_microstep: 877.15 | bwd_inner_microstep: 877.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 17:20:30,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.32 | bwd_microstep: 1501.88 | bwd_inner_microstep: 1501.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916
[2024-06-10 17:20:31,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.66 | bwd_microstep: 779.38 | bwd_inner_microstep: 779.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3601
[2024-06-10 17:20:33,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1308.91 | bwd_inner_microstep: 1308.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 17:20:35,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1396.61 | bwd_inner_microstep: 1396.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2021
[2024-06-10 17:20:36,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.69 | bwd_microstep: 904.08 | bwd_inner_microstep: 904.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3618
[2024-06-10 17:20:46,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.09 | optimizer_step: 6.59
[2024-06-10 17:20:46,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.17 | bwd_microstep: 8638.94 | bwd_inner_microstep: 1618.56 | bwd_allreduce_microstep: 7020.32 | step_microstep: 37.89
[2024-06-10 17:20:46,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15075.12 | bwd: 47393.84 | bwd_inner: 40372.61 | bwd_allreduce: 7020.56 | step: 39.38
{'loss': 1.1777, 'learning_rate': 1.700855089433589e-05, 'epoch': 0.56}
 56%|█████▌    | 964/1726 [16:38:13<12:57:46, 61.24s/it]
 56%|█████▌    | 965/1726 [16:39:14<12:57:43, 61.32s/it]
 56%|█████▌    | 966/1726 [16:40:16<13:00:11, 61.59s/it]
 56%|█████▌    | 967/1726 [16:41:19<13:02:47, 61.88s/it]
 56%|█████▌    | 968/1726 [16:42:20<12:56:52, 61.49s/it]
 56%|█████▌    | 969/1726 [16:43:22<13:00:46, 61.88s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 17:20:47,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.11 | bwd_microstep: 1334.37 | bwd_inner_microstep: 1334.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 17:20:49,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1379.32 | bwd_inner_microstep: 1379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3921
[2024-06-10 17:20:51,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.53 | bwd_microstep: 1449.79 | bwd_inner_microstep: 1449.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2453
[2024-06-10 17:20:53,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.92 | bwd_microstep: 929.44 | bwd_inner_microstep: 929.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 17:20:54,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.73 | bwd_microstep: 1247.70 | bwd_inner_microstep: 1247.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3711
[2024-06-10 17:20:56,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1430.36 | bwd_inner_microstep: 1430.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3613
[2024-06-10 17:20:58,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.47 | bwd_microstep: 1211.11 | bwd_inner_microstep: 1211.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 17:21:00,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1385.40 | bwd_inner_microstep: 1385.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705
[2024-06-10 17:21:02,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.87 | bwd_microstep: 1620.41 | bwd_inner_microstep: 1620.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3714
[2024-06-10 17:21:04,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.73 | bwd_microstep: 1660.98 | bwd_inner_microstep: 1660.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 17:21:06,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.01 | bwd_microstep: 1281.07 | bwd_inner_microstep: 1281.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3959
[2024-06-10 17:21:08,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1586.13 | bwd_inner_microstep: 1586.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2726
[2024-06-10 17:21:10,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.95 | bwd_microstep: 1132.31 | bwd_inner_microstep: 1132.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 17:21:12,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1341.37 | bwd_inner_microstep: 1341.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498
[2024-06-10 17:21:14,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.34 | bwd_microstep: 1444.39 | bwd_inner_microstep: 1444.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 17:21:16,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.74 | bwd_microstep: 1417.89 | bwd_inner_microstep: 1417.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 17:21:17,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.74 | bwd_microstep: 1182.31 | bwd_inner_microstep: 1182.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 17:21:19,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.12 | bwd_microstep: 1254.35 | bwd_inner_microstep: 1254.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 17:21:21,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 1293.48 | bwd_inner_microstep: 1293.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2000
[2024-06-10 17:21:22,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.93 | bwd_microstep: 707.93 | bwd_inner_microstep: 707.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1970
[2024-06-10 17:21:23,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 704.11 | bwd_inner_microstep: 704.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 17:21:25,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1431.33 | bwd_inner_microstep: 1431.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3720
[2024-06-10 17:21:27,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1335.85 | bwd_inner_microstep: 1335.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 17:21:29,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.25 | bwd_microstep: 1649.70 | bwd_inner_microstep: 1649.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-10 17:21:31,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.85 | bwd_microstep: 1402.34 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2298
[2024-06-10 17:21:32,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.90 | bwd_microstep: 877.90 | bwd_inner_microstep: 877.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032
[2024-06-10 17:21:33,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.74 | bwd_microstep: 808.29 | bwd_inner_microstep: 808.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 17:21:35,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.04 | bwd_microstep: 914.37 | bwd_inner_microstep: 914.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3832
[2024-06-10 17:21:37,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.55 | bwd_microstep: 1510.58 | bwd_inner_microstep: 1510.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815
[2024-06-10 17:21:39,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.90 | bwd_microstep: 1607.78 | bwd_inner_microstep: 1607.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-10 17:21:41,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.31 | bwd_microstep: 1444.09 | bwd_inner_microstep: 1444.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3590
[2024-06-10 17:21:48,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.34 | optimizer_step: 6.60
[2024-06-10 17:21:48,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.55 | bwd_microstep: 6432.29 | bwd_inner_microstep: 2005.52 | bwd_allreduce_microstep: 4426.71 | step_microstep: 40.68
[2024-06-10 17:21:48,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15651.40 | bwd: 46408.75 | bwd_inner: 41981.13 | bwd_allreduce: 4426.94 | step: 42.15
{'loss': 1.1195, 'learning_rate': 1.6971444449908474e-05, 'epoch': 0.56}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2462
[2024-06-10 17:21:49,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.71 | bwd_microstep: 1034.22 | bwd_inner_microstep: 1034.09 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3460
[2024-06-10 17:21:51,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.04 | bwd_microstep: 1240.41 | bwd_inner_microstep: 1240.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3884
[2024-06-10 17:21:53,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.08 | bwd_microstep: 1679.63 | bwd_inner_microstep: 1679.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 17:21:55,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1375.52 | bwd_inner_microstep: 1375.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 17:21:57,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.78 | bwd_microstep: 1279.38 | bwd_inner_microstep: 1279.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694
[2024-06-10 17:21:59,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.18 | bwd_microstep: 1625.22 | bwd_inner_microstep: 1625.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 17:22:01,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1383.49 | bwd_inner_microstep: 1383.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 17:22:02,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.26 | bwd_microstep: 787.25 | bwd_inner_microstep: 787.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1878
[2024-06-10 17:22:03,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.22 | bwd_microstep: 725.59 | bwd_inner_microstep: 725.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 17:22:05,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.99 | bwd_microstep: 1388.10 | bwd_inner_microstep: 1388.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 17:22:07,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1388.66 | bwd_inner_microstep: 1388.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3505
[2024-06-10 17:22:09,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.56 | bwd_microstep: 1249.77 | bwd_inner_microstep: 1249.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 17:22:11,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.63 | bwd_microstep: 1484.50 | bwd_inner_microstep: 1484.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 17:22:13,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.37 | bwd_microstep: 1487.10 | bwd_inner_microstep: 1487.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2118
[2024-06-10 17:22:14,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.82 | bwd_microstep: 984.75 | bwd_inner_microstep: 984.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2096
[2024-06-10 17:22:16,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.23 | bwd_microstep: 917.38 | bwd_inner_microstep: 917.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3443
[2024-06-10 17:22:18,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.95 | bwd_microstep: 1378.49 | bwd_inner_microstep: 1378.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2430
[2024-06-10 17:22:19,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.44 | bwd_microstep: 941.23 | bwd_inner_microstep: 941.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3843
[2024-06-10 17:22:21,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1267.96 | bwd_inner_microstep: 1267.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 17:22:23,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1353.60 | bwd_inner_microstep: 1353.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 17:22:24,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.01 | bwd_microstep: 1383.96 | bwd_inner_microstep: 1383.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 17:22:26,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1453.92 | bwd_inner_microstep: 1453.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 17:22:28,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1280.72 | bwd_inner_microstep: 1280.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 17:22:30,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1349.92 | bwd_inner_microstep: 1349.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 17:22:32,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.33 | bwd_microstep: 1507.02 | bwd_inner_microstep: 1507.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 17:22:34,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.09 | bwd_microstep: 1451.47 | bwd_inner_microstep: 1451.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2231
[2024-06-10 17:22:35,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 960.44 | bwd_inner_microstep: 960.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568
[2024-06-10 17:22:37,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.83 | bwd_microstep: 1298.55 | bwd_inner_microstep: 1298.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819
[2024-06-10 17:22:40,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.15 | bwd_microstep: 1727.15 | bwd_inner_microstep: 1727.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3392
[2024-06-10 17:22:41,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.22 | bwd_microstep: 1273.68 | bwd_inner_microstep: 1273.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-10 17:22:43,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1301.27 | bwd_inner_microstep: 1301.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799
[2024-06-10 17:22:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 17:22:50,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.58 | bwd_microstep: 6563.28 | bwd_inner_microstep: 2007.62 | bwd_allreduce_microstep: 4555.59 | step_microstep: 39.45
[2024-06-10 17:22:50,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15546.33 | bwd: 46523.66 | bwd_inner: 41967.05 | bwd_allreduce: 4555.88 | step: 41.03
{'loss': 1.21, 'learning_rate': 1.6934348672037956e-05, 'epoch': 0.56}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 17:22:52,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.20 | bwd_microstep: 1383.75 | bwd_inner_microstep: 1383.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 17:22:54,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.19 | bwd_microstep: 1274.15 | bwd_inner_microstep: 1274.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 17:22:56,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.26 | bwd_microstep: 1148.15 | bwd_inner_microstep: 1148.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 17:22:58,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.10 | bwd_microstep: 1385.02 | bwd_inner_microstep: 1385.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3427
[2024-06-10 17:22:59,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.77 | bwd_microstep: 1212.28 | bwd_inner_microstep: 1212.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4130
[2024-06-10 17:23:02,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.44 | bwd_microstep: 1636.54 | bwd_inner_microstep: 1636.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706
[2024-06-10 17:23:03,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.58 | bwd_microstep: 1421.55 | bwd_inner_microstep: 1421.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705
[2024-06-10 17:23:06,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.39 | bwd_microstep: 1522.97 | bwd_inner_microstep: 1522.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 17:23:08,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.27 | bwd_microstep: 1385.86 | bwd_inner_microstep: 1385.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687
[2024-06-10 17:23:10,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.57 | bwd_microstep: 1524.33 | bwd_inner_microstep: 1524.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433
[2024-06-10 17:23:11,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.96 | bwd_microstep: 1279.04 | bwd_inner_microstep: 1279.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2668
[2024-06-10 17:23:13,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.48 | bwd_microstep: 1117.22 | bwd_inner_microstep: 1117.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 17:23:14,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.32 | bwd_microstep: 791.54 | bwd_inner_microstep: 791.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3494
[2024-06-10 17:23:16,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.51 | bwd_microstep: 1678.97 | bwd_inner_microstep: 1678.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 17:23:18,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1246.91 | bwd_inner_microstep: 1246.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 17:23:20,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.60 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 17:23:21,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.49 | bwd_microstep: 1186.40 | bwd_inner_microstep: 1186.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 17:23:23,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.58 | bwd_microstep: 1296.91 | bwd_inner_microstep: 1296.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 17:23:25,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.54 | bwd_microstep: 1292.19 | bwd_inner_microstep: 1292.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 17:23:27,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1293.64 | bwd_inner_microstep: 1293.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 17:23:29,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.63 | bwd_microstep: 1291.58 | bwd_inner_microstep: 1291.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 17:23:31,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.76 | bwd_microstep: 1386.01 | bwd_inner_microstep: 1385.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 17:23:32,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.43 | bwd_microstep: 1291.07 | bwd_inner_microstep: 1291.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3829
[2024-06-10 17:23:34,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1498.01 | bwd_inner_microstep: 1497.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 17:23:36,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.87 | bwd_microstep: 1439.99 | bwd_inner_microstep: 1439.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2593
[2024-06-10 17:23:38,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.89 | bwd_microstep: 1064.37 | bwd_inner_microstep: 1064.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 17:23:40,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.75 | bwd_microstep: 1501.75 | bwd_inner_microstep: 1501.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-10 17:23:42,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.07 | bwd_microstep: 1407.44 | bwd_inner_microstep: 1407.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3633
[2024-06-10 17:23:44,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.42 | bwd_microstep: 1646.27 | bwd_inner_microstep: 1646.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 17:23:46,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 1603.72 | bwd_inner_microstep: 1603.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3572
[2024-06-10 17:23:48,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.22 | bwd_microstep: 1521.20 | bwd_inner_microstep: 1521.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 17:23:55,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 17:23:55,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 5860.30 | bwd_inner_microstep: 1635.17 | bwd_allreduce_microstep: 4225.07 | step_microstep: 38.02
[2024-06-10 17:23:55,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16289.11 | bwd: 47874.28 | bwd_inner: 43648.31 | bwd_allreduce: 4225.30 | step: 39.49
{'loss': 1.2054, 'learning_rate': 1.6897263691375475e-05, 'epoch': 0.56}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 17:23:57,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.16 | bwd_microstep: 1276.72 | bwd_inner_microstep: 1276.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462
[2024-06-10 17:23:58,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.99 | bwd_microstep: 1176.97 | bwd_inner_microstep: 1176.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3856
[2024-06-10 17:24:00,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.65 | bwd_microstep: 1458.07 | bwd_inner_microstep: 1458.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2267
[2024-06-10 17:24:01,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.70 | bwd_microstep: 778.67 | bwd_inner_microstep: 778.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 17:24:04,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 1531.89 | bwd_inner_microstep: 1531.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 17:24:05,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1394.34 | bwd_inner_microstep: 1394.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 17:24:08,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.73 | bwd_microstep: 1632.22 | bwd_inner_microstep: 1632.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2440
[2024-06-10 17:24:09,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.83 | bwd_microstep: 1043.02 | bwd_inner_microstep: 1042.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3404
[2024-06-10 17:24:11,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.28 | bwd_microstep: 1211.67 | bwd_inner_microstep: 1211.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 17:24:13,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.59 | bwd_microstep: 1285.23 | bwd_inner_microstep: 1285.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 17:24:14,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.02 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1284.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 17:24:16,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1484.17 | bwd_inner_microstep: 1484.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 17:24:18,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1343.89 | bwd_inner_microstep: 1343.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3383
[2024-06-10 17:24:20,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1430.16 | bwd_inner_microstep: 1430.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693
[2024-06-10 17:24:23,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.99 | bwd_microstep: 1722.99 | bwd_inner_microstep: 1722.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 17:24:25,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.11 | bwd_microstep: 1383.77 | bwd_inner_microstep: 1383.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437
[2024-06-10 17:24:26,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.26 | bwd_microstep: 1313.42 | bwd_inner_microstep: 1313.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3501
[2024-06-10 17:24:28,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.81 | bwd_microstep: 1190.00 | bwd_inner_microstep: 1189.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513
[2024-06-10 17:24:30,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.74 | bwd_microstep: 1319.29 | bwd_inner_microstep: 1319.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3478
[2024-06-10 17:24:32,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.01 | bwd_microstep: 1409.28 | bwd_inner_microstep: 1409.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 17:24:34,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1494.06 | bwd_inner_microstep: 1494.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 17:24:36,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.03 | bwd_microstep: 1487.05 | bwd_inner_microstep: 1487.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3633
[2024-06-10 17:24:38,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1376.27 | bwd_inner_microstep: 1376.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591
[2024-06-10 17:24:40,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.72 | bwd_microstep: 1605.72 | bwd_inner_microstep: 1605.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 17:24:42,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | bwd_microstep: 1284.53 | bwd_inner_microstep: 1284.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3610
[2024-06-10 17:24:44,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.53 | bwd_microstep: 1475.28 | bwd_inner_microstep: 1475.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3517
[2024-06-10 17:24:46,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1417.93 | bwd_inner_microstep: 1417.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044
[2024-06-10 17:24:47,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.94 | bwd_microstep: 908.62 | bwd_inner_microstep: 908.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 17:24:49,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.42 | bwd_microstep: 1636.25 | bwd_inner_microstep: 1636.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 17:24:51,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.20 | bwd_microstep: 1351.72 | bwd_inner_microstep: 1351.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 17:24:53,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.15 | bwd_microstep: 1482.64 | bwd_inner_microstep: 1482.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3796
[2024-06-10 17:24:57,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 17:24:57,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.85 | bwd_microstep: 2780.16 | bwd_inner_microstep: 1981.15 | bwd_allreduce_microstep: 798.97 | step_microstep: 37.93
[2024-06-10 17:24:57,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16422.85 | bwd: 44970.91 | bwd_inner: 44171.04 | bwd_allreduce: 799.19 | step: 39.42
{'loss': 1.274, 'learning_rate': 1.6860189638534142e-05, 'epoch': 0.56}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 17:24:58,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.38 | bwd_microstep: 787.89 | bwd_inner_microstep: 787.75 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4220
[2024-06-10 17:25:00,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.24 | bwd_microstep: 1656.18 | bwd_inner_microstep: 1656.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 17:25:02,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.80 | bwd_microstep: 1387.87 | bwd_inner_microstep: 1387.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4140
[2024-06-10 17:25:04,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.56 | bwd_microstep: 1639.61 | bwd_inner_microstep: 1639.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2475
[2024-06-10 17:25:05,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.86 | bwd_microstep: 856.53 | bwd_inner_microstep: 856.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 17:25:07,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.32 | bwd_microstep: 1281.78 | bwd_inner_microstep: 1281.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 17:25:08,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.78 | bwd_microstep: 804.26 | bwd_inner_microstep: 804.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 17:25:10,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.52 | bwd_microstep: 1148.69 | bwd_inner_microstep: 1148.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2081
[2024-06-10 17:25:11,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.14 | bwd_microstep: 820.13 | bwd_inner_microstep: 820.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.47
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498
[2024-06-10 17:25:13,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.31 | bwd_microstep: 1318.55 | bwd_inner_microstep: 1318.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3425
[2024-06-10 17:25:15,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.49 | bwd_microstep: 1372.56 | bwd_inner_microstep: 1372.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3680
[2024-06-10 17:25:17,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.48 | bwd_microstep: 1689.46 | bwd_inner_microstep: 1689.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 17:25:19,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.22 | bwd_microstep: 1376.06 | bwd_inner_microstep: 1376.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2923
[2024-06-10 17:25:21,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.47 | bwd_microstep: 1285.06 | bwd_inner_microstep: 1285.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-10 17:25:22,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3383
[2024-06-10 17:25:24,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.67 | bwd_microstep: 1241.77 | bwd_inner_microstep: 1241.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 17:25:26,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.36 | bwd_microstep: 1482.57 | bwd_inner_microstep: 1482.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3532
[2024-06-10 17:25:28,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.79 | bwd_microstep: 1452.90 | bwd_inner_microstep: 1452.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3619
[2024-06-10 17:25:30,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.64 | bwd_microstep: 1554.64 | bwd_inner_microstep: 1554.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3533
[2024-06-10 17:25:33,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 1542.32 | bwd_inner_microstep: 1542.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471
[2024-06-10 17:25:34,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.32 | bwd_microstep: 1215.89 | bwd_inner_microstep: 1215.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3553
[2024-06-10 17:25:36,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.26 | bwd_microstep: 1197.69 | bwd_inner_microstep: 1197.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3611
[2024-06-10 17:25:38,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.41 | bwd_microstep: 1374.19 | bwd_inner_microstep: 1374.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 17:25:40,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1393.71 | bwd_inner_microstep: 1393.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 17:25:42,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1496.76 | bwd_inner_microstep: 1496.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 17:25:44,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1399.01 | bwd_inner_microstep: 1398.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-10 17:25:46,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.55 | bwd_microstep: 1386.85 | bwd_inner_microstep: 1386.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502
[2024-06-10 17:25:47,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.32 | bwd_microstep: 1189.94 | bwd_inner_microstep: 1189.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3809
[2024-06-10 17:25:49,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1383.84 | bwd_inner_microstep: 1383.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-10 17:25:51,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.68 | bwd_microstep: 1649.74 | bwd_inner_microstep: 1649.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 17:25:54,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.27 | bwd_microstep: 1654.68 | bwd_inner_microstep: 1654.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-10 17:25:59,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 17:25:59,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 5172.56 | bwd_inner_microstep: 1465.87 | bwd_allreduce_microstep: 3706.63 | step_microstep: 38.03
[2024-06-10 17:25:59,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16005.53 | bwd: 46499.52 | bwd_inner: 42791.89 | bwd_allreduce: 3706.91 | step: 40.94
{'loss': 1.2102, 'learning_rate': 1.6823126644088586e-05, 'epoch': 0.56}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3509
[2024-06-10 17:26:02,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.82 | bwd_microstep: 1523.48 | bwd_inner_microstep: 1523.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484
[2024-06-10 17:26:03,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.96 | bwd_microstep: 1214.57 | bwd_inner_microstep: 1214.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 17:26:05,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.16 | bwd_microstep: 1451.35 | bwd_inner_microstep: 1451.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1904
[2024-06-10 17:26:06,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.73 | bwd_microstep: 782.17 | bwd_inner_microstep: 782.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 17:26:08,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.49 | bwd_microstep: 1245.09 | bwd_inner_microstep: 1245.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3002
[2024-06-10 17:26:10,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.73 | bwd_microstep: 1107.16 | bwd_inner_microstep: 1107.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 17:26:12,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.88 | bwd_microstep: 1551.52 | bwd_inner_microstep: 1551.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692
[2024-06-10 17:26:14,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.29 | bwd_microstep: 1525.76 | bwd_inner_microstep: 1525.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948
[2024-06-10 17:26:15,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.54 | bwd_microstep: 699.08 | bwd_inner_microstep: 699.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3495
[2024-06-10 17:26:17,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.55 | bwd_microstep: 1295.34 | bwd_inner_microstep: 1295.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3448
[2024-06-10 17:26:18,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.59 | bwd_microstep: 1333.76 | bwd_inner_microstep: 1333.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3469
[2024-06-10 17:26:20,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.31 | bwd_microstep: 1341.33 | bwd_inner_microstep: 1341.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 17:26:22,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1490.51 | bwd_inner_microstep: 1490.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2670
[2024-06-10 17:26:24,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.04 | bwd_microstep: 1054.93 | bwd_inner_microstep: 1054.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 17:26:26,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1389.51 | bwd_inner_microstep: 1389.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3647
[2024-06-10 17:26:28,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.24 | bwd_microstep: 1647.04 | bwd_inner_microstep: 1647.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-10 17:26:30,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1408.14 | bwd_inner_microstep: 1408.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 17:26:32,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.66 | bwd_microstep: 1377.52 | bwd_inner_microstep: 1377.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 17:26:34,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.76 | bwd_microstep: 1290.50 | bwd_inner_microstep: 1290.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 17:26:36,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.04 | bwd_microstep: 1655.96 | bwd_inner_microstep: 1655.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2192
[2024-06-10 17:26:37,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.09 | bwd_microstep: 764.65 | bwd_inner_microstep: 764.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-10 17:26:39,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.84 | bwd_microstep: 1433.34 | bwd_inner_microstep: 1433.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3816
[2024-06-10 17:26:41,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.05 | bwd_microstep: 1687.12 | bwd_inner_microstep: 1687.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3432
[2024-06-10 17:26:43,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.88 | bwd_microstep: 1282.81 | bwd_inner_microstep: 1282.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3532
[2024-06-10 17:26:45,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.51 | bwd_microstep: 1437.64 | bwd_inner_microstep: 1437.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3597
[2024-06-10 17:26:47,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.43 | bwd_microstep: 1465.42 | bwd_inner_microstep: 1465.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2054
[2024-06-10 17:26:48,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.73 | bwd_microstep: 726.09 | bwd_inner_microstep: 726.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 17:26:50,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.07 | bwd_microstep: 1464.55 | bwd_inner_microstep: 1464.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-10 17:26:52,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.98 | bwd_microstep: 1541.40 | bwd_inner_microstep: 1541.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759
[2024-06-10 17:26:54,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.45 | bwd_microstep: 1444.14 | bwd_inner_microstep: 1444.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2265
[2024-06-10 17:26:56,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.23 | bwd_microstep: 968.51 | bwd_inner_microstep: 968.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 17:27:01,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 17:27:01,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 4633.42 | bwd_inner_microstep: 1562.57 | bwd_allreduce_microstep: 3070.80 | step_microstep: 37.90
[2024-06-10 17:27:01,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15733.98 | bwd: 45233.83 | bwd_inner: 42162.12 | bwd_allreduce: 3071.03 | step: 39.32
 56%|█████▌    | 970/1726 [16:44:25<13:01:39, 62.04s/it]
 56%|█████▋    | 971/1726 [16:45:27<13:02:01, 62.15s/it]
 56%|█████▋    | 972/1726 [16:46:32<13:09:50, 62.85s/it]
 56%|█████▋    | 973/1726 [16:47:33<13:04:33, 62.51s/it]
 56%|█████▋    | 974/1726 [16:48:36<13:04:43, 62.61s/it]
{'loss': 1.2579, 'learning_rate': 1.678607483857448e-05, 'epoch': 0.56}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 17:27:03,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1370.27 | bwd_inner_microstep: 1370.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 17:27:05,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.23 | bwd_microstep: 1477.69 | bwd_inner_microstep: 1477.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4217
[2024-06-10 17:27:07,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.77 | bwd_microstep: 1463.40 | bwd_inner_microstep: 1463.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480
[2024-06-10 17:27:09,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.92 | bwd_microstep: 1345.75 | bwd_inner_microstep: 1345.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 17:27:10,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.11 | bwd_microstep: 1283.31 | bwd_inner_microstep: 1283.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 17:27:12,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.84 | bwd_microstep: 1383.48 | bwd_inner_microstep: 1383.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-10 17:27:14,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.14 | bwd_microstep: 1540.48 | bwd_inner_microstep: 1540.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 17:27:16,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.27 | bwd_microstep: 1438.02 | bwd_inner_microstep: 1437.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2099
[2024-06-10 17:27:18,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.35 | bwd_microstep: 824.10 | bwd_inner_microstep: 824.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 17:27:20,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.30 | bwd_microstep: 1483.79 | bwd_inner_microstep: 1483.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3447
[2024-06-10 17:27:22,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1399.00 | bwd_inner_microstep: 1398.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3673
[2024-06-10 17:27:24,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.90 | bwd_microstep: 1474.25 | bwd_inner_microstep: 1474.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 17:27:25,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1382.20 | bwd_inner_microstep: 1382.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418
[2024-06-10 17:27:27,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.02 | bwd_microstep: 1279.17 | bwd_inner_microstep: 1279.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2392
[2024-06-10 17:27:29,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.52 | bwd_microstep: 1031.19 | bwd_inner_microstep: 1031.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650
[2024-06-10 17:27:31,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.13 | bwd_microstep: 1615.51 | bwd_inner_microstep: 1615.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 17:27:33,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.62 | bwd_microstep: 1288.65 | bwd_inner_microstep: 1288.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3517
[2024-06-10 17:27:34,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.69 | bwd_microstep: 1194.07 | bwd_inner_microstep: 1194.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-10 17:27:35,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.75 | bwd_microstep: 703.32 | bwd_inner_microstep: 703.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 17:27:37,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.73 | bwd_microstep: 1428.18 | bwd_inner_microstep: 1428.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926
[2024-06-10 17:27:38,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 696.24 | bwd_inner_microstep: 696.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3571
[2024-06-10 17:27:40,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.62 | bwd_microstep: 1267.09 | bwd_inner_microstep: 1267.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 17:27:42,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.85 | bwd_microstep: 1512.98 | bwd_inner_microstep: 1512.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3441
[2024-06-10 17:27:44,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.03 | bwd_microstep: 1220.48 | bwd_inner_microstep: 1220.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 17:27:46,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.55 | bwd_microstep: 1537.76 | bwd_inner_microstep: 1537.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3922
[2024-06-10 17:27:48,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.21 | bwd_microstep: 1598.20 | bwd_inner_microstep: 1598.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3565
[2024-06-10 17:27:50,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.73 | bwd_microstep: 1431.23 | bwd_inner_microstep: 1431.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2085
[2024-06-10 17:27:51,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.49 | bwd_microstep: 832.90 | bwd_inner_microstep: 832.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 17:27:53,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.81 | bwd_microstep: 1352.95 | bwd_inner_microstep: 1352.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2020
[2024-06-10 17:27:54,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.79 | bwd_microstep: 905.83 | bwd_inner_microstep: 905.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3811
[2024-06-10 17:27:57,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.17 | bwd_microstep: 1621.56 | bwd_inner_microstep: 1621.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3032
[2024-06-10 17:28:02,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 17:28:02,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.70 | bwd_microstep: 4406.68 | bwd_inner_microstep: 1471.44 | bwd_allreduce_microstep: 2935.19 | step_microstep: 37.88
[2024-06-10 17:28:02,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15640.73 | bwd: 44789.73 | bwd_inner: 41853.65 | bwd_allreduce: 2935.41 | step: 39.31
{'loss': 1.2359, 'learning_rate': 1.6749034352488077e-05, 'epoch': 0.57}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1933
[2024-06-10 17:28:03,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.91 | bwd_microstep: 878.26 | bwd_inner_microstep: 878.20 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3899
[2024-06-10 17:28:05,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.31 | bwd_microstep: 1564.04 | bwd_inner_microstep: 1564.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 17:28:07,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.80 | bwd_microstep: 1344.78 | bwd_inner_microstep: 1344.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 17:28:09,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 1347.11 | bwd_inner_microstep: 1347.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 17:28:10,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.21 | bwd_microstep: 1340.69 | bwd_inner_microstep: 1340.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 17:28:12,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1245.41 | bwd_inner_microstep: 1245.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 17:28:14,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.26 | bwd_microstep: 1345.58 | bwd_inner_microstep: 1345.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 17:28:16,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1387.94 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 17:28:18,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 17:28:20,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.79 | bwd_microstep: 1251.60 | bwd_inner_microstep: 1251.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 17:28:22,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1414.65 | bwd_inner_microstep: 1414.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1995
[2024-06-10 17:28:23,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.42 | bwd_microstep: 834.54 | bwd_inner_microstep: 834.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511
[2024-06-10 17:28:25,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.35 | bwd_microstep: 1318.40 | bwd_inner_microstep: 1318.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3454
[2024-06-10 17:28:26,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.72 | bwd_microstep: 1221.00 | bwd_inner_microstep: 1220.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 17:28:28,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.38 | bwd_microstep: 1477.83 | bwd_inner_microstep: 1477.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3518
[2024-06-10 17:28:30,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.25 | bwd_microstep: 1536.71 | bwd_inner_microstep: 1536.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 17:28:32,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.67 | bwd_microstep: 1245.36 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 17:28:34,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.64 | bwd_microstep: 1650.32 | bwd_inner_microstep: 1650.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 17:28:36,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1555.11 | bwd_inner_microstep: 1555.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-10 17:28:39,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1454.17 | bwd_inner_microstep: 1454.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 17:28:40,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1280.53 | bwd_inner_microstep: 1280.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3538
[2024-06-10 17:28:42,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.33 | bwd_microstep: 1200.24 | bwd_inner_microstep: 1200.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2288
[2024-06-10 17:28:43,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.47 | bwd_microstep: 849.67 | bwd_inner_microstep: 849.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 17:28:45,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.57 | bwd_microstep: 1286.29 | bwd_inner_microstep: 1286.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 17:28:47,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.70 | bwd_microstep: 1646.53 | bwd_inner_microstep: 1646.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 17:28:49,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.44 | bwd_microstep: 1498.72 | bwd_inner_microstep: 1498.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 17:28:51,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.13 | bwd_microstep: 1556.81 | bwd_inner_microstep: 1556.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 17:28:53,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1249.47 | bwd_inner_microstep: 1249.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3443
[2024-06-10 17:28:55,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.90 | bwd_microstep: 1382.38 | bwd_inner_microstep: 1382.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 17:28:57,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.16 | bwd_microstep: 1552.02 | bwd_inner_microstep: 1551.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 17:28:59,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1258.00 | bwd_inner_microstep: 1257.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 17:29:02,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.64
[2024-06-10 17:29:02,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.89 | bwd_microstep: 2970.09 | bwd_inner_microstep: 777.60 | bwd_allreduce_microstep: 2192.44 | step_microstep: 37.84
[2024-06-10 17:29:02,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15826.01 | bwd: 44531.69 | bwd_inner: 42338.29 | bwd_allreduce: 2192.68 | step: 39.36
{'loss': 1.2379, 'learning_rate': 1.671200531628578e-05, 'epoch': 0.57}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 17:29:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.93 | bwd_microstep: 1329.75 | bwd_inner_microstep: 1329.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 17:29:06,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.74 | bwd_microstep: 1292.75 | bwd_inner_microstep: 1292.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3920
[2024-06-10 17:29:08,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.66 | bwd_microstep: 1487.53 | bwd_inner_microstep: 1487.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 17:29:10,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.89 | bwd_microstep: 1447.10 | bwd_inner_microstep: 1447.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-10 17:29:12,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1644.79 | bwd_inner_microstep: 1644.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 913
[2024-06-10 17:29:13,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.95 | bwd_microstep: 370.45 | bwd_inner_microstep: 370.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 17:29:15,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1393.90 | bwd_inner_microstep: 1393.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700
[2024-06-10 17:29:17,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.45 | bwd_microstep: 1422.00 | bwd_inner_microstep: 1421.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 17:29:18,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1385.33 | bwd_inner_microstep: 1385.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3465
[2024-06-10 17:29:20,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.59 | bwd_microstep: 1243.72 | bwd_inner_microstep: 1243.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2679
[2024-06-10 17:29:22,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.90 | bwd_microstep: 1062.34 | bwd_inner_microstep: 1062.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 17:29:24,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1480.98 | bwd_inner_microstep: 1480.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 17:29:26,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.83 | bwd_microstep: 1401.54 | bwd_inner_microstep: 1401.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 17:29:27,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.81 | bwd_microstep: 1280.31 | bwd_inner_microstep: 1280.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3481
[2024-06-10 17:29:29,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1402.64 | bwd_inner_microstep: 1402.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 17:29:31,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1383.18 | bwd_inner_microstep: 1383.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-10 17:29:33,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.96 | bwd_microstep: 894.03 | bwd_inner_microstep: 894.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2102
[2024-06-10 17:29:34,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.89 | bwd_microstep: 930.58 | bwd_inner_microstep: 930.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 17:29:36,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.21 | bwd_microstep: 1505.40 | bwd_inner_microstep: 1505.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3668
[2024-06-10 17:29:38,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.21 | bwd_microstep: 1621.35 | bwd_inner_microstep: 1621.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 17:29:40,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.64 | bwd_microstep: 1444.45 | bwd_inner_microstep: 1444.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3449
[2024-06-10 17:29:42,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.49 | bwd_microstep: 1288.94 | bwd_inner_microstep: 1288.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-10 17:29:44,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.03 | bwd_microstep: 1361.92 | bwd_inner_microstep: 1361.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 17:29:46,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.74 | bwd_microstep: 1284.64 | bwd_inner_microstep: 1284.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514
[2024-06-10 17:29:47,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.02 | bwd_microstep: 1192.66 | bwd_inner_microstep: 1192.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 17:29:49,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.19 | bwd_microstep: 1496.88 | bwd_inner_microstep: 1496.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 17:29:51,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.13 | bwd_microstep: 1403.09 | bwd_inner_microstep: 1403.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2025
[2024-06-10 17:29:52,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.48 | bwd_microstep: 837.83 | bwd_inner_microstep: 837.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3934
[2024-06-10 17:29:55,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.83 | bwd_microstep: 1561.63 | bwd_inner_microstep: 1561.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3547
[2024-06-10 17:29:57,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.71 | bwd_microstep: 1454.24 | bwd_inner_microstep: 1454.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3804
[2024-06-10 17:29:59,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.73 | bwd_microstep: 1477.26 | bwd_inner_microstep: 1477.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3577
[2024-06-10 17:30:01,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.02 | optimizer_step: 6.59
[2024-06-10 17:30:01,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.33 | bwd_microstep: 1999.83 | bwd_inner_microstep: 1455.85 | bwd_allreduce_microstep: 543.93 | step_microstep: 37.42
[2024-06-10 17:30:01,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15802.12 | bwd: 42783.03 | bwd_inner: 42238.21 | bwd_allreduce: 544.16 | step: 38.85
{'loss': 1.198, 'learning_rate': 1.667498786038367e-05, 'epoch': 0.57}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 17:30:03,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.88 | bwd_microstep: 1143.91 | bwd_inner_microstep: 1143.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1938
[2024-06-10 17:30:04,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.09 | bwd_microstep: 821.76 | bwd_inner_microstep: 821.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 17:30:06,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.95 | bwd_microstep: 1239.56 | bwd_inner_microstep: 1239.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2328
[2024-06-10 17:30:07,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.23 | bwd_microstep: 980.77 | bwd_inner_microstep: 980.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1901
[2024-06-10 17:30:08,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.42 | bwd_microstep: 775.95 | bwd_inner_microstep: 775.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 17:30:10,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 1382.68 | bwd_inner_microstep: 1382.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240
[2024-06-10 17:30:11,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.59 | bwd_microstep: 959.68 | bwd_inner_microstep: 959.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1870
[2024-06-10 17:30:12,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.58 | bwd_microstep: 740.85 | bwd_inner_microstep: 740.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 17:30:14,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 1250.90 | bwd_inner_microstep: 1250.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091
[2024-06-10 17:30:15,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.30 | bwd_microstep: 821.17 | bwd_inner_microstep: 821.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 17:30:17,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2075
[2024-06-10 17:30:18,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.20 | bwd_microstep: 914.83 | bwd_inner_microstep: 914.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3658
[2024-06-10 17:30:20,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.38 | bwd_microstep: 1450.16 | bwd_inner_microstep: 1450.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3455
[2024-06-10 17:30:23,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.01 | bwd_microstep: 1619.04 | bwd_inner_microstep: 1619.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 17:30:25,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1601.36 | bwd_inner_microstep: 1601.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-10 17:30:27,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.81 | bwd_microstep: 1517.61 | bwd_inner_microstep: 1517.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 17:30:29,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1378.40 | bwd_inner_microstep: 1378.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-10 17:30:31,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1408.74 | bwd_inner_microstep: 1408.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639
[2024-06-10 17:30:33,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.38 | bwd_microstep: 1573.35 | bwd_inner_microstep: 1573.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3857
[2024-06-10 17:30:35,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.04 | bwd_microstep: 1461.32 | bwd_inner_microstep: 1461.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3771
[2024-06-10 17:30:37,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1401.63 | bwd_inner_microstep: 1401.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 17:30:39,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1499.33 | bwd_inner_microstep: 1499.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834
[2024-06-10 17:30:41,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.22 | bwd_microstep: 1654.53 | bwd_inner_microstep: 1654.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3847
[2024-06-10 17:30:43,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.45 | bwd_microstep: 1561.06 | bwd_inner_microstep: 1561.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 17:30:45,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.81 | bwd_microstep: 1295.81 | bwd_inner_microstep: 1295.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063
[2024-06-10 17:30:46,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.78 | bwd_microstep: 815.56 | bwd_inner_microstep: 815.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 17:30:48,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1414.90 | bwd_inner_microstep: 1414.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 17:30:50,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1350.96 | bwd_inner_microstep: 1350.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816
[2024-06-10 17:30:52,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.16 | bwd_microstep: 1358.51 | bwd_inner_microstep: 1358.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 17:30:54,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.85 | bwd_microstep: 1501.50 | bwd_inner_microstep: 1501.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 17:30:56,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.99 | bwd_microstep: 1311.35 | bwd_inner_microstep: 1311.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3812
[2024-06-10 17:31:02,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.69 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 17:31:02,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.65 | bwd_microstep: 5857.06 | bwd_inner_microstep: 1913.28 | bwd_allreduce_microstep: 3943.74 | step_microstep: 38.93
[2024-06-10 17:31:02,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15435.94 | bwd: 45452.77 | bwd_inner: 41508.13 | bwd_allreduce: 3943.96 | step: 40.42
{'loss': 1.2344, 'learning_rate': 1.663798211515704e-05, 'epoch': 0.57}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3481
[2024-06-10 17:31:04,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.31 | bwd_microstep: 1335.76 | bwd_inner_microstep: 1335.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 17:31:06,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.95 | bwd_microstep: 1528.54 | bwd_inner_microstep: 1528.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 17:31:09,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.40 | bwd_microstep: 1652.65 | bwd_inner_microstep: 1652.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3857
[2024-06-10 17:31:11,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.44 | bwd_microstep: 1559.23 | bwd_inner_microstep: 1559.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 17:31:13,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.75 | bwd_microstep: 1546.43 | bwd_inner_microstep: 1546.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4173
[2024-06-10 17:31:15,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.61 | bwd_microstep: 1650.27 | bwd_inner_microstep: 1650.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073
[2024-06-10 17:31:16,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.27 | bwd_microstep: 818.19 | bwd_inner_microstep: 818.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425
[2024-06-10 17:31:18,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.30 | bwd_microstep: 1155.42 | bwd_inner_microstep: 1155.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-10 17:31:19,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.24 | bwd_microstep: 681.35 | bwd_inner_microstep: 681.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-10 17:31:21,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.17 | bwd_microstep: 1640.18 | bwd_inner_microstep: 1640.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 17:31:23,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.51 | bwd_microstep: 1392.66 | bwd_inner_microstep: 1392.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3690
[2024-06-10 17:31:25,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.38 | bwd_microstep: 1720.75 | bwd_inner_microstep: 1720.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3517
[2024-06-10 17:31:27,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.52 | bwd_microstep: 1548.53 | bwd_inner_microstep: 1548.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3650
[2024-06-10 17:31:30,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.22 | bwd_microstep: 1685.20 | bwd_inner_microstep: 1685.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 17:31:32,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.61 | bwd_microstep: 1589.70 | bwd_inner_microstep: 1589.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 17:31:34,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.36 | bwd_microstep: 1445.91 | bwd_inner_microstep: 1445.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 17:31:36,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1248.94 | bwd_inner_microstep: 1248.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2949
[2024-06-10 17:31:37,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.86 | bwd_microstep: 1196.82 | bwd_inner_microstep: 1196.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 17:31:39,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.34 | bwd_microstep: 1375.63 | bwd_inner_microstep: 1375.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925
[2024-06-10 17:31:40,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.47 | bwd_microstep: 726.37 | bwd_inner_microstep: 726.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 17:31:42,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1415.41 | bwd_inner_microstep: 1415.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-10 17:31:44,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1311.45 | bwd_inner_microstep: 1311.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-10 17:31:46,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1510.45 | bwd_inner_microstep: 1510.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3833
[2024-06-10 17:31:48,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.79 | bwd_microstep: 1261.47 | bwd_inner_microstep: 1261.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 17:31:50,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.67 | bwd_microstep: 1295.01 | bwd_inner_microstep: 1294.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 17:31:52,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1391.57 | bwd_inner_microstep: 1391.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2027
[2024-06-10 17:31:53,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.49 | bwd_microstep: 806.15 | bwd_inner_microstep: 806.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-10 17:31:55,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.00 | bwd_microstep: 1626.68 | bwd_inner_microstep: 1626.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3603
[2024-06-10 17:31:57,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.12 | bwd_microstep: 1705.24 | bwd_inner_microstep: 1705.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3651
[2024-06-10 17:31:59,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.03 | bwd_microstep: 1367.32 | bwd_inner_microstep: 1367.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3827
[2024-06-10 17:32:02,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.11 | bwd_microstep: 1866.48 | bwd_inner_microstep: 1866.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 17:32:04,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.04 | optimizer_step: 6.66
[2024-06-10 17:32:04,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1535.03 | bwd_inner_microstep: 1527.33 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.47
[2024-06-10 17:32:04,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16628.62 | bwd: 44590.83 | bwd_inner: 44582.29 | bwd_allreduce: 7.88 | step: 38.95
 56%|█████▋    | 975/1726 [16:49:37<12:58:44, 62.22s/it]
 57%|█████▋    | 976/1726 [16:50:38<12:52:13, 61.78s/it]
 57%|█████▋    | 977/1726 [16:51:39<12:47:07, 61.45s/it]
 57%|█████▋    | 978/1726 [16:52:38<12:36:36, 60.69s/it]
 57%|█████▋    | 979/1726 [16:53:39<12:37:35, 60.85s/it]
 57%|█████▋    | 980/1726 [16:54:41<12:39:11, 61.06s/it]
{'loss': 1.1842, 'learning_rate': 1.6600988210939915e-05, 'epoch': 0.57}
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3551
[2024-06-10 17:32:06,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.52 | bwd_microstep: 1405.61 | bwd_inner_microstep: 1405.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4039
[2024-06-10 17:32:08,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.42 | bwd_microstep: 1717.45 | bwd_inner_microstep: 1717.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1903
[2024-06-10 17:32:09,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.51 | bwd_microstep: 712.35 | bwd_inner_microstep: 712.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2387
[2024-06-10 17:32:11,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.23 | bwd_microstep: 1031.93 | bwd_inner_microstep: 1031.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 17:32:12,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1246.43 | bwd_inner_microstep: 1246.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 17:32:14,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.96 | bwd_microstep: 1384.02 | bwd_inner_microstep: 1383.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4189
[2024-06-10 17:32:16,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1549.35 | bwd_inner_microstep: 1549.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 17:32:18,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1247.50 | bwd_inner_microstep: 1247.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3490
[2024-06-10 17:32:20,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.35 | bwd_microstep: 1331.37 | bwd_inner_microstep: 1331.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 17:32:22,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.83 | bwd_microstep: 1519.42 | bwd_inner_microstep: 1519.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3545
[2024-06-10 17:32:24,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.09 | bwd_microstep: 1587.92 | bwd_inner_microstep: 1587.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2096
[2024-06-10 17:32:26,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.93 | bwd_microstep: 917.17 | bwd_inner_microstep: 917.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1997
[2024-06-10 17:32:27,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.59 | bwd_microstep: 895.43 | bwd_inner_microstep: 895.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524
[2024-06-10 17:32:29,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1324.17 | bwd_inner_microstep: 1324.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 17:32:31,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.74 | bwd_microstep: 1514.08 | bwd_inner_microstep: 1514.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 17:32:33,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1393.15 | bwd_inner_microstep: 1393.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 17:32:35,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.91 | bwd_microstep: 1399.21 | bwd_inner_microstep: 1399.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3682
[2024-06-10 17:32:36,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.80 | bwd_microstep: 1327.96 | bwd_inner_microstep: 1327.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 17:32:39,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.96 | bwd_microstep: 1612.21 | bwd_inner_microstep: 1612.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2292
[2024-06-10 17:32:40,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.17 | bwd_microstep: 791.69 | bwd_inner_microstep: 791.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076
[2024-06-10 17:32:41,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.65 | bwd_microstep: 815.08 | bwd_inner_microstep: 815.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3874
[2024-06-10 17:32:43,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.65 | bwd_microstep: 1527.48 | bwd_inner_microstep: 1527.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 17:32:45,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.96 | bwd_microstep: 1658.87 | bwd_inner_microstep: 1658.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059
[2024-06-10 17:32:46,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.75 | bwd_microstep: 913.11 | bwd_inner_microstep: 913.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454
[2024-06-10 17:32:48,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.15 | bwd_microstep: 1159.86 | bwd_inner_microstep: 1159.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 17:32:50,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.03 | bwd_microstep: 1514.84 | bwd_inner_microstep: 1514.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3824
[2024-06-10 17:32:53,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.69 | bwd_microstep: 1722.21 | bwd_inner_microstep: 1722.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1895
[2024-06-10 17:32:54,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.30 | bwd_microstep: 777.17 | bwd_inner_microstep: 777.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3382
[2024-06-10 17:32:55,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.53 | bwd_microstep: 1242.55 | bwd_inner_microstep: 1242.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3615
[2024-06-10 17:32:57,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 1471.60 | bwd_inner_microstep: 1471.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 17:32:59,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.83 | bwd_microstep: 1410.73 | bwd_inner_microstep: 1410.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 17:33:05,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.56
[2024-06-10 17:33:05,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.39 | bwd_microstep: 5389.55 | bwd_inner_microstep: 1866.77 | bwd_allreduce_microstep: 3522.73 | step_microstep: 37.91
[2024-06-10 17:33:05,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15630.69 | bwd: 45511.54 | bwd_inner: 41987.90 | bwd_allreduce: 3522.95 | step: 39.31
{'loss': 1.1232, 'learning_rate': 1.6564006278024646e-05, 'epoch': 0.57}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 17:33:06,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.93 | bwd_microstep: 782.90 | bwd_inner_microstep: 782.83 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3915
[2024-06-10 17:33:09,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.79 | bwd_microstep: 1682.07 | bwd_inner_microstep: 1682.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-10 17:33:11,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.91 | bwd_microstep: 1678.70 | bwd_inner_microstep: 1678.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 17:33:13,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.33 | bwd_microstep: 1646.44 | bwd_inner_microstep: 1646.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 17:33:15,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.05 | bwd_microstep: 1545.43 | bwd_inner_microstep: 1545.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 810
[2024-06-10 17:33:16,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 122.09 | bwd_microstep: 310.96 | bwd_inner_microstep: 310.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3525
[2024-06-10 17:33:18,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.31 | bwd_microstep: 1193.68 | bwd_inner_microstep: 1193.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 17:33:19,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1381.27 | bwd_inner_microstep: 1381.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 17:33:21,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.44 | bwd_microstep: 1283.69 | bwd_inner_microstep: 1283.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1997
[2024-06-10 17:33:22,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.28 | bwd_microstep: 769.00 | bwd_inner_microstep: 768.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512
[2024-06-10 17:33:24,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1316.80 | bwd_inner_microstep: 1316.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2013
[2024-06-10 17:33:25,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.56 | bwd_microstep: 772.59 | bwd_inner_microstep: 772.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2130
[2024-06-10 17:33:26,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.80 | bwd_microstep: 857.43 | bwd_inner_microstep: 857.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 17:33:29,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.63 | bwd_microstep: 1594.55 | bwd_inner_microstep: 1594.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4016
[2024-06-10 17:33:31,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.33 | bwd_microstep: 1805.16 | bwd_inner_microstep: 1805.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2083
[2024-06-10 17:33:32,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.16 | bwd_microstep: 754.51 | bwd_inner_microstep: 754.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3811
[2024-06-10 17:33:34,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.10 | bwd_microstep: 1513.93 | bwd_inner_microstep: 1513.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 17:33:36,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1253.20 | bwd_inner_microstep: 1253.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 17:33:38,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.73 | bwd_microstep: 1655.45 | bwd_inner_microstep: 1655.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 17:33:40,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.07 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 17:33:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.06 | bwd_microstep: 1398.34 | bwd_inner_microstep: 1398.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 17:33:43,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.14 | bwd_microstep: 796.51 | bwd_inner_microstep: 796.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 17:33:45,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1496.19 | bwd_inner_microstep: 1496.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004
[2024-06-10 17:33:46,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.56 | bwd_microstep: 708.43 | bwd_inner_microstep: 708.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 17:33:48,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1350.57 | bwd_inner_microstep: 1350.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3552
[2024-06-10 17:33:50,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.32 | bwd_microstep: 1345.87 | bwd_inner_microstep: 1345.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 17:33:52,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1492.91 | bwd_inner_microstep: 1492.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 17:33:54,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1346.38 | bwd_inner_microstep: 1346.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433
[2024-06-10 17:33:56,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.30 | bwd_microstep: 1188.31 | bwd_inner_microstep: 1188.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 17:33:58,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.80 | bwd_microstep: 1553.18 | bwd_inner_microstep: 1553.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3727
[2024-06-10 17:34:00,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1465.57 | bwd_inner_microstep: 1465.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571
[2024-06-10 17:34:08,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.28 | optimizer_step: 6.60
[2024-06-10 17:34:08,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.85 | bwd_microstep: 7371.78 | bwd_inner_microstep: 1602.85 | bwd_allreduce_microstep: 5768.86 | step_microstep: 38.32
[2024-06-10 17:34:08,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15284.54 | bwd: 46740.07 | bwd_inner: 40970.25 | bwd_allreduce: 5769.12 | step: 39.74
{'loss': 1.2546, 'learning_rate': 1.6527036446661396e-05, 'epoch': 0.57}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 17:34:09,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.84 | bwd_microstep: 1236.29 | bwd_inner_microstep: 1236.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3950
[2024-06-10 17:34:12,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.68 | bwd_microstep: 1589.61 | bwd_inner_microstep: 1589.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859
[2024-06-10 17:34:14,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.99 | bwd_microstep: 1556.38 | bwd_inner_microstep: 1556.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 17:34:16,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.23 | bwd_microstep: 1442.86 | bwd_inner_microstep: 1442.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 17:34:18,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.74 | bwd_microstep: 1548.87 | bwd_inner_microstep: 1548.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3589
[2024-06-10 17:34:20,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.39 | bwd_microstep: 1208.54 | bwd_inner_microstep: 1208.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 17:34:22,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.63 | bwd_microstep: 1487.62 | bwd_inner_microstep: 1487.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 17:34:23,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1243.36 | bwd_inner_microstep: 1243.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 17:34:25,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1344.75 | bwd_inner_microstep: 1344.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3582
[2024-06-10 17:34:27,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.29 | bwd_microstep: 1239.90 | bwd_inner_microstep: 1239.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4021
[2024-06-10 17:34:29,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.83 | bwd_microstep: 1511.46 | bwd_inner_microstep: 1511.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2173
[2024-06-10 17:34:30,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 879.92 | bwd_inner_microstep: 879.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-10 17:34:32,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.98 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 17:34:34,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1250.82 | bwd_inner_microstep: 1250.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3482
[2024-06-10 17:34:36,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 1527.39 | bwd_inner_microstep: 1527.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3397
[2024-06-10 17:34:38,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.33 | bwd_microstep: 1369.39 | bwd_inner_microstep: 1369.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2160
[2024-06-10 17:34:39,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.39 | bwd_microstep: 1041.27 | bwd_inner_microstep: 1041.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 17:34:41,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.09 | bwd_microstep: 1488.26 | bwd_inner_microstep: 1488.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-10 17:34:42,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.26 | bwd_microstep: 802.81 | bwd_inner_microstep: 802.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 17:34:44,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1377.23 | bwd_inner_microstep: 1377.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281
[2024-06-10 17:34:46,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.47 | bwd_microstep: 908.61 | bwd_inner_microstep: 908.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 17:34:47,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.15 | bwd_microstep: 1373.51 | bwd_inner_microstep: 1373.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812
[2024-06-10 17:34:50,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.45 | bwd_microstep: 1719.82 | bwd_inner_microstep: 1719.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1941
[2024-06-10 17:34:51,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.49 | bwd_microstep: 849.20 | bwd_inner_microstep: 849.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-10 17:34:53,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.96 | bwd_microstep: 1637.44 | bwd_inner_microstep: 1637.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 17:34:55,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.69 | bwd_microstep: 1360.21 | bwd_inner_microstep: 1360.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3383
[2024-06-10 17:34:57,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.34 | bwd_microstep: 1272.00 | bwd_inner_microstep: 1271.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 17:34:59,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1353.48 | bwd_inner_microstep: 1353.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3572
[2024-06-10 17:35:01,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.27 | bwd_microstep: 1426.14 | bwd_inner_microstep: 1426.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774
[2024-06-10 17:35:03,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.89 | bwd_microstep: 1742.05 | bwd_inner_microstep: 1742.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2383
[2024-06-10 17:35:05,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.08 | bwd_microstep: 1122.80 | bwd_inner_microstep: 1122.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242
[2024-06-10 17:35:09,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 17:35:09,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.96 | bwd_microstep: 4461.78 | bwd_inner_microstep: 1095.48 | bwd_allreduce_microstep: 3366.24 | step_microstep: 38.28
[2024-06-10 17:35:09,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15781.55 | bwd: 45658.91 | bwd_inner: 42291.72 | bwd_allreduce: 3366.48 | step: 39.81
{'loss': 1.1886, 'learning_rate': 1.6490078847057728e-05, 'epoch': 0.57}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 17:35:12,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.23 | bwd_microstep: 1476.36 | bwd_inner_microstep: 1476.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 17:35:13,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1248.36 | bwd_inner_microstep: 1248.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945
[2024-06-10 17:35:14,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.18 | bwd_microstep: 725.92 | bwd_inner_microstep: 725.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4313
[2024-06-10 17:35:16,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.38 | bwd_microstep: 1577.05 | bwd_inner_microstep: 1577.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 17:35:19,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.37 | bwd_microstep: 1540.16 | bwd_inner_microstep: 1540.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 17:35:21,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.57 | bwd_microstep: 1416.01 | bwd_inner_microstep: 1415.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 17:35:22,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.28 | bwd_microstep: 1151.14 | bwd_inner_microstep: 1151.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 17:35:24,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.04 | bwd_microstep: 1310.25 | bwd_inner_microstep: 1310.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1904
[2024-06-10 17:35:25,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.19 | bwd_microstep: 777.75 | bwd_inner_microstep: 777.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-10 17:35:27,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.01 | bwd_microstep: 1444.15 | bwd_inner_microstep: 1444.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3485
[2024-06-10 17:35:29,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.22 | bwd_microstep: 1410.20 | bwd_inner_microstep: 1410.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2479
[2024-06-10 17:35:30,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.24 | bwd_microstep: 1048.97 | bwd_inner_microstep: 1048.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 17:35:32,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.17 | bwd_microstep: 1390.32 | bwd_inner_microstep: 1390.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 17:35:34,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.43 | bwd_microstep: 1388.92 | bwd_inner_microstep: 1388.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3636
[2024-06-10 17:35:36,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.23 | bwd_microstep: 1539.33 | bwd_inner_microstep: 1539.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3518
[2024-06-10 17:35:38,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1319.80 | bwd_inner_microstep: 1319.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655
[2024-06-10 17:35:40,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.24 | bwd_microstep: 1425.42 | bwd_inner_microstep: 1425.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-10 17:35:42,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.10 | bwd_microstep: 1305.02 | bwd_inner_microstep: 1304.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 17:35:44,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.90 | bwd_microstep: 1282.35 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 17:35:45,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1254.22 | bwd_inner_microstep: 1254.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 17:35:47,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.84 | bwd_microstep: 1397.96 | bwd_inner_microstep: 1397.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 17:35:49,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.53 | bwd_microstep: 880.88 | bwd_inner_microstep: 880.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3741
[2024-06-10 17:35:51,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.46 | bwd_microstep: 1444.18 | bwd_inner_microstep: 1444.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 17:35:53,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.01 | bwd_microstep: 1536.17 | bwd_inner_microstep: 1536.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016
[2024-06-10 17:35:54,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.52 | bwd_microstep: 808.10 | bwd_inner_microstep: 808.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 17:35:55,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 974.19 | bwd_inner_microstep: 974.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591
[2024-06-10 17:35:57,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.26 | bwd_microstep: 1306.40 | bwd_inner_microstep: 1306.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3854
[2024-06-10 17:35:59,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.96 | bwd_microstep: 1569.51 | bwd_inner_microstep: 1569.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-10 17:36:01,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1510.17 | bwd_inner_microstep: 1510.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-10 17:36:03,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.00 | bwd_microstep: 1427.94 | bwd_inner_microstep: 1427.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807
[2024-06-10 17:36:05,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.68 | bwd_microstep: 1625.58 | bwd_inner_microstep: 1625.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 17:36:10,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.06 | optimizer_step: 6.58
[2024-06-10 17:36:10,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.76 | bwd_microstep: 3996.60 | bwd_inner_microstep: 1846.09 | bwd_allreduce_microstep: 2150.46 | step_microstep: 37.72
[2024-06-10 17:36:10,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15728.23 | bwd: 44509.38 | bwd_inner: 42358.01 | bwd_allreduce: 2150.69 | step: 39.20
{'loss': 1.1711, 'learning_rate': 1.6453133609378122e-05, 'epoch': 0.57}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-10 17:36:12,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.99 | bwd_microstep: 1332.68 | bwd_inner_microstep: 1332.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 17:36:14,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1381.28 | bwd_inner_microstep: 1381.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3394
[2024-06-10 17:36:15,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.99 | bwd_microstep: 1145.95 | bwd_inner_microstep: 1145.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3407
[2024-06-10 17:36:17,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.04 | bwd_microstep: 1305.84 | bwd_inner_microstep: 1305.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 17:36:19,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.64 | bwd_microstep: 1281.25 | bwd_inner_microstep: 1281.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 17:36:21,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 1279.37 | bwd_inner_microstep: 1279.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 17:36:22,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.89 | bwd_microstep: 680.28 | bwd_inner_microstep: 680.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 17:36:24,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.93 | bwd_microstep: 1542.13 | bwd_inner_microstep: 1542.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 17:36:26,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.35 | bwd_microstep: 1627.45 | bwd_inner_microstep: 1627.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-10 17:36:27,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.94 | bwd_microstep: 812.31 | bwd_inner_microstep: 812.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 17:36:29,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.87 | bwd_microstep: 1302.82 | bwd_inner_microstep: 1302.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-10 17:36:31,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.64 | bwd_microstep: 1285.80 | bwd_inner_microstep: 1285.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2957
[2024-06-10 17:36:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.14 | bwd_microstep: 1100.17 | bwd_inner_microstep: 1100.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 17:36:34,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.11 | bwd_microstep: 1480.81 | bwd_inner_microstep: 1480.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3481
[2024-06-10 17:36:36,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.87 | bwd_microstep: 1537.89 | bwd_inner_microstep: 1537.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 17:36:38,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.48 | bwd_microstep: 1285.64 | bwd_inner_microstep: 1285.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 17:36:40,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.79 | bwd_microstep: 1376.97 | bwd_inner_microstep: 1376.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 17:36:42,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 1512.01 | bwd_inner_microstep: 1511.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666
[2024-06-10 17:36:44,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1326.55 | bwd_inner_microstep: 1326.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 17:36:45,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.76 | bwd_microstep: 974.31 | bwd_inner_microstep: 974.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686
[2024-06-10 17:36:48,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.63 | bwd_microstep: 1522.42 | bwd_inner_microstep: 1522.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2396
[2024-06-10 17:36:49,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.05 | bwd_microstep: 1030.05 | bwd_inner_microstep: 1030.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 17:36:51,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.33 | bwd_microstep: 1659.97 | bwd_inner_microstep: 1659.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821
[2024-06-10 17:36:53,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.20 | bwd_microstep: 1583.70 | bwd_inner_microstep: 1583.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3807
[2024-06-10 17:36:56,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.11 | bwd_microstep: 1599.24 | bwd_inner_microstep: 1599.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 17:36:57,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.52 | bwd_microstep: 1184.26 | bwd_inner_microstep: 1184.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 17:36:59,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.36 | bwd_microstep: 1452.86 | bwd_inner_microstep: 1452.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 17:37:02,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.84 | bwd_microstep: 1645.17 | bwd_inner_microstep: 1645.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 17:37:04,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.38 | bwd_microstep: 1477.77 | bwd_inner_microstep: 1477.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808
[2024-06-10 17:37:06,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.46 | bwd_microstep: 1576.14 | bwd_inner_microstep: 1576.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604
[2024-06-10 17:37:08,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1444.28 | bwd_inner_microstep: 1444.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3382
[2024-06-10 17:37:11,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.07 | optimizer_step: 6.57
[2024-06-10 17:37:11,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.72 | bwd_microstep: 2815.19 | bwd_inner_microstep: 1403.83 | bwd_allreduce_microstep: 1411.31 | step_microstep: 37.62
[2024-06-10 17:37:11,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16102.03 | bwd: 44562.59 | bwd_inner: 43150.38 | bwd_allreduce: 1411.54 | step: 39.02
 57%|█████▋    | 980/1726 [16:54:41<12:39:11, 61.06s/it]
 57%|█████▋    | 981/1726 [16:55:42<12:39:42, 61.18s/it]
 57%|█████▋    | 982/1726 [16:56:44<12:43:00, 61.53s/it]
 57%|█████▋    | 983/1726 [16:57:46<12:42:52, 61.60s/it]
 57%|█████▋    | 984/1726 [16:58:47<12:37:57, 61.29s/it]
 57%|█████▋    | 985/1726 [16:59:48<12:35:51, 61.20s/it]
{'loss': 1.2102, 'learning_rate': 1.64162008637435e-05, 'epoch': 0.57}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 17:37:13,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1367.70 | bwd_inner_microstep: 1367.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 17:37:15,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.34 | bwd_microstep: 1145.08 | bwd_inner_microstep: 1145.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783
[2024-06-10 17:37:17,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.19 | bwd_microstep: 1451.08 | bwd_inner_microstep: 1451.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 17:37:19,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.58 | bwd_microstep: 1541.06 | bwd_inner_microstep: 1541.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 17:37:21,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.56 | bwd_microstep: 1396.00 | bwd_inner_microstep: 1395.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3816
[2024-06-10 17:37:23,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.58 | bwd_microstep: 1382.63 | bwd_inner_microstep: 1382.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 17:37:24,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1390.39 | bwd_inner_microstep: 1390.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 17:37:26,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.65 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 17:37:28,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.68 | bwd_microstep: 1285.47 | bwd_inner_microstep: 1285.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 17:37:30,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 17:37:32,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.35 | bwd_microstep: 1251.32 | bwd_inner_microstep: 1251.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3705
[2024-06-10 17:37:34,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.66 | bwd_microstep: 1663.95 | bwd_inner_microstep: 1663.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-10 17:37:36,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1576.36 | bwd_inner_microstep: 1576.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2097
[2024-06-10 17:37:37,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.92 | bwd_microstep: 918.29 | bwd_inner_microstep: 918.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 17:37:39,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.86 | bwd_microstep: 1391.62 | bwd_inner_microstep: 1391.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-10 17:37:40,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.67 | bwd_microstep: 805.74 | bwd_inner_microstep: 805.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077
[2024-06-10 17:37:42,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.91 | bwd_microstep: 820.17 | bwd_inner_microstep: 820.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-10 17:37:44,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.68 | bwd_microstep: 1611.54 | bwd_inner_microstep: 1611.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 17:37:46,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.41 | bwd_microstep: 1553.78 | bwd_inner_microstep: 1553.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 17:37:48,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.09 | bwd_microstep: 1189.10 | bwd_inner_microstep: 1189.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 17:37:50,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1509.41 | bwd_inner_microstep: 1509.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 17:37:52,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1483.20 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2762
[2024-06-10 17:37:53,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.39 | bwd_microstep: 1048.29 | bwd_inner_microstep: 1048.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 17:37:55,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.63 | bwd_microstep: 1660.72 | bwd_inner_microstep: 1660.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 17:37:57,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.05 | bwd_microstep: 1493.82 | bwd_inner_microstep: 1493.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688
[2024-06-10 17:37:59,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.48 | bwd_microstep: 1423.68 | bwd_inner_microstep: 1423.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 17:38:01,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1409.31 | bwd_inner_microstep: 1409.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 17:38:04,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.95 | bwd_microstep: 1646.55 | bwd_inner_microstep: 1646.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3832
[2024-06-10 17:38:05,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.07 | bwd_microstep: 1319.50 | bwd_inner_microstep: 1319.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 17:38:07,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1445.00 | bwd_inner_microstep: 1444.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 17:38:10,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.86 | bwd_microstep: 1535.49 | bwd_inner_microstep: 1535.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2230
[2024-06-10 17:38:12,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-10 17:38:12,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.67 | bwd_microstep: 1781.74 | bwd_inner_microstep: 1140.59 | bwd_allreduce_microstep: 641.09 | step_microstep: 37.79
[2024-06-10 17:38:12,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16222.18 | bwd: 44159.96 | bwd_inner: 43517.97 | bwd_allreduce: 641.32 | step: 39.24
{'loss': 1.2447, 'learning_rate': 1.6379280740230803e-05, 'epoch': 0.57}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 17:38:14,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1605.81 | bwd_inner_microstep: 1605.63 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 17:38:16,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.50 | bwd_microstep: 1380.27 | bwd_inner_microstep: 1380.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3865
[2024-06-10 17:38:18,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.77 | bwd_microstep: 1661.31 | bwd_inner_microstep: 1661.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 17:38:20,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1244.19 | bwd_inner_microstep: 1244.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 17:38:22,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1405.50 | bwd_inner_microstep: 1405.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732
[2024-06-10 17:38:24,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.67 | bwd_microstep: 1533.55 | bwd_inner_microstep: 1533.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 17:38:26,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1248.16 | bwd_inner_microstep: 1248.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 932
[2024-06-10 17:38:26,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.40 | bwd_microstep: 378.37 | bwd_inner_microstep: 378.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 17:38:28,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.78 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009
[2024-06-10 17:38:29,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.31 | bwd_microstep: 897.34 | bwd_inner_microstep: 897.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3496
[2024-06-10 17:38:31,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.60 | bwd_microstep: 1443.25 | bwd_inner_microstep: 1443.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 4098
[2024-06-10 17:38:34,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.13 | bwd_microstep: 1674.44 | bwd_inner_microstep: 1674.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-10 17:38:36,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.37 | bwd_microstep: 1611.58 | bwd_inner_microstep: 1611.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 17:38:38,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1488.61 | bwd_inner_microstep: 1488.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 17:38:40,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.12 | bwd_microstep: 1528.19 | bwd_inner_microstep: 1528.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 17:38:42,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1245.73 | bwd_inner_microstep: 1245.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 17:38:44,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1504.44 | bwd_inner_microstep: 1504.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3537
[2024-06-10 17:38:46,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.41 | bwd_microstep: 1342.20 | bwd_inner_microstep: 1342.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3925
[2024-06-10 17:38:48,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1553.03 | bwd_inner_microstep: 1553.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 17:38:50,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1293.11 | bwd_inner_microstep: 1293.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 17:38:52,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.88 | bwd_microstep: 1648.69 | bwd_inner_microstep: 1648.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3168
[2024-06-10 17:38:53,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.76 | bwd_microstep: 1071.22 | bwd_inner_microstep: 1071.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2266
[2024-06-10 17:38:55,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.16 | bwd_microstep: 970.50 | bwd_inner_microstep: 970.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 17:38:57,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1554.75 | bwd_inner_microstep: 1554.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 634
[2024-06-10 17:38:57,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.06 | bwd_microstep: 263.66 | bwd_inner_microstep: 263.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-10 17:38:59,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.33 | bwd_microstep: 1602.42 | bwd_inner_microstep: 1602.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451
[2024-06-10 17:39:01,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.35 | bwd_microstep: 1189.95 | bwd_inner_microstep: 1189.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598
[2024-06-10 17:39:03,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.46 | bwd_microstep: 1608.67 | bwd_inner_microstep: 1608.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 17:39:05,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1406.58 | bwd_inner_microstep: 1406.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3805
[2024-06-10 17:39:07,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.81 | bwd_microstep: 1580.77 | bwd_inner_microstep: 1580.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3498
[2024-06-10 17:39:09,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1250.03 | bwd_inner_microstep: 1250.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920
[2024-06-10 17:39:15,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-10 17:39:15,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.81 | bwd_microstep: 6094.82 | bwd_inner_microstep: 822.32 | bwd_allreduce_microstep: 5272.45 | step_microstep: 37.88
[2024-06-10 17:39:15,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15798.38 | bwd: 47620.88 | bwd_inner: 42347.39 | bwd_allreduce: 5272.75 | step: 39.40
{'loss': 1.2102, 'learning_rate': 1.634237336887252e-05, 'epoch': 0.57}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 17:39:18,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.61 | bwd_microstep: 1467.10 | bwd_inner_microstep: 1467.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875
[2024-06-10 17:39:20,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.65 | bwd_microstep: 1582.31 | bwd_inner_microstep: 1582.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 17:39:22,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1380.61 | bwd_inner_microstep: 1380.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 17:39:23,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 1341.96 | bwd_inner_microstep: 1341.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-10 17:39:26,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.85 | bwd_microstep: 1660.67 | bwd_inner_microstep: 1660.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4109
[2024-06-10 17:39:28,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.63 | bwd_microstep: 1633.50 | bwd_inner_microstep: 1633.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 780
[2024-06-10 17:39:28,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.78 | bwd_microstep: 309.20 | bwd_inner_microstep: 309.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 17:39:30,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.83 | bwd_microstep: 803.19 | bwd_inner_microstep: 803.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 17:39:31,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.66 | bwd_microstep: 1286.67 | bwd_inner_microstep: 1286.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 17:39:33,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.47 | bwd_microstep: 1417.18 | bwd_inner_microstep: 1417.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-10 17:39:36,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.90 | bwd_microstep: 1633.37 | bwd_inner_microstep: 1633.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3520
[2024-06-10 17:39:37,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.84 | bwd_microstep: 1324.44 | bwd_inner_microstep: 1324.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3554
[2024-06-10 17:39:39,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.59 | bwd_microstep: 1432.26 | bwd_inner_microstep: 1432.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3524
[2024-06-10 17:39:41,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.44 | bwd_microstep: 1354.34 | bwd_inner_microstep: 1354.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3528
[2024-06-10 17:39:43,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.52 | bwd_microstep: 1208.81 | bwd_inner_microstep: 1208.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3668
[2024-06-10 17:39:45,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.29 | bwd_microstep: 1652.25 | bwd_inner_microstep: 1652.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 17:39:47,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.38 | bwd_microstep: 1493.81 | bwd_inner_microstep: 1493.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 17:39:49,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.78 | bwd_microstep: 1293.72 | bwd_inner_microstep: 1293.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2027
[2024-06-10 17:39:50,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.16 | bwd_microstep: 840.99 | bwd_inner_microstep: 840.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2169
[2024-06-10 17:39:51,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.68 | bwd_microstep: 803.81 | bwd_inner_microstep: 803.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 17:39:53,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.57 | bwd_microstep: 1186.44 | bwd_inner_microstep: 1186.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453
[2024-06-10 17:39:55,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1289.84 | bwd_inner_microstep: 1289.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3469
[2024-06-10 17:39:57,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.53 | bwd_microstep: 1365.06 | bwd_inner_microstep: 1365.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 17:39:58,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.22 | bwd_microstep: 1285.60 | bwd_inner_microstep: 1285.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 17:40:01,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.97 | bwd_microstep: 1588.75 | bwd_inner_microstep: 1588.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 17:40:03,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.89 | bwd_microstep: 1606.04 | bwd_inner_microstep: 1606.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1969
[2024-06-10 17:40:04,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.02 | bwd_microstep: 704.82 | bwd_inner_microstep: 704.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2286
[2024-06-10 17:40:05,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.67 | bwd_microstep: 815.41 | bwd_inner_microstep: 815.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 17:40:07,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.11 | bwd_microstep: 1454.25 | bwd_inner_microstep: 1454.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3878
[2024-06-10 17:40:09,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.12 | bwd_microstep: 1490.45 | bwd_inner_microstep: 1490.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 17:40:11,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.66 | bwd_microstep: 1513.84 | bwd_inner_microstep: 1513.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 17:40:15,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-10 17:40:15,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.98 | bwd_microstep: 3460.86 | bwd_inner_microstep: 1524.08 | bwd_allreduce_microstep: 1936.73 | step_microstep: 37.78
[2024-06-10 17:40:15,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15619.71 | bwd: 43681.53 | bwd_inner: 41743.89 | bwd_allreduce: 1936.96 | step: 39.23
{'loss': 1.2179, 'learning_rate': 1.630547887965622e-05, 'epoch': 0.57}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3592
[2024-06-10 17:40:17,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.95 | bwd_microstep: 1431.06 | bwd_inner_microstep: 1431.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4499
[2024-06-10 17:40:20,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.18 | bwd_microstep: 1736.85 | bwd_inner_microstep: 1736.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3858
[2024-06-10 17:40:22,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.78 | bwd_microstep: 1659.64 | bwd_inner_microstep: 1659.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3846
[2024-06-10 17:40:24,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.89 | bwd_microstep: 1422.71 | bwd_inner_microstep: 1422.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3841
[2024-06-10 17:40:26,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1389.83 | bwd_inner_microstep: 1389.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924
[2024-06-10 17:40:27,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.80 | bwd_microstep: 696.34 | bwd_inner_microstep: 696.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2181
[2024-06-10 17:40:28,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.86 | bwd_microstep: 886.42 | bwd_inner_microstep: 886.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3697
[2024-06-10 17:40:30,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.23 | bwd_microstep: 1455.51 | bwd_inner_microstep: 1455.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 17:40:32,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.10 | bwd_microstep: 1286.71 | bwd_inner_microstep: 1286.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-10 17:40:33,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.51 | bwd_microstep: 1277.95 | bwd_inner_microstep: 1277.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 17:40:35,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.77 | bwd_microstep: 1388.01 | bwd_inner_microstep: 1387.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3453
[2024-06-10 17:40:37,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.36 | bwd_microstep: 1368.11 | bwd_inner_microstep: 1368.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 17:40:39,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.45 | bwd_microstep: 1286.39 | bwd_inner_microstep: 1286.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3429
[2024-06-10 17:40:41,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.19 | bwd_microstep: 1409.66 | bwd_inner_microstep: 1409.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 17:40:43,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.23 | bwd_microstep: 1511.38 | bwd_inner_microstep: 1511.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-10 17:40:45,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1429.18 | bwd_inner_microstep: 1429.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 17:40:47,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.97 | bwd_microstep: 1452.95 | bwd_inner_microstep: 1452.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 17:40:49,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.75 | bwd_microstep: 1278.62 | bwd_inner_microstep: 1278.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 17:40:51,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1254.38 | bwd_inner_microstep: 1254.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 17:40:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1396.23 | bwd_inner_microstep: 1396.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 17:40:54,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1409.05 | bwd_inner_microstep: 1409.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 17:40:56,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1507.06 | bwd_inner_microstep: 1507.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 17:40:59,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.39 | bwd_microstep: 1525.84 | bwd_inner_microstep: 1525.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 17:41:00,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.07 | bwd_microstep: 1284.80 | bwd_inner_microstep: 1284.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4319
[2024-06-10 17:41:03,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.60 | bwd_microstep: 1582.83 | bwd_inner_microstep: 1582.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2075
[2024-06-10 17:41:04,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.49 | bwd_microstep: 914.16 | bwd_inner_microstep: 914.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2716
[2024-06-10 17:41:05,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.98 | bwd_microstep: 969.83 | bwd_inner_microstep: 969.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 17:41:07,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1352.42 | bwd_inner_microstep: 1352.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2038
[2024-06-10 17:41:08,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.75 | bwd_microstep: 902.36 | bwd_inner_microstep: 902.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3597
[2024-06-10 17:41:10,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1572.49 | bwd_inner_microstep: 1572.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 17:41:13,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.14 | bwd_microstep: 1552.85 | bwd_inner_microstep: 1552.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555
[2024-06-10 17:41:18,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.39 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-10 17:41:18,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 5158.02 | bwd_inner_microstep: 1603.39 | bwd_allreduce_microstep: 3554.58 | step_microstep: 38.62
[2024-06-10 17:41:18,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16124.49 | bwd: 46749.65 | bwd_inner: 43194.15 | bwd_allreduce: 3554.81 | step: 40.15
{'loss': 1.2076, 'learning_rate': 1.6268597402524094e-05, 'epoch': 0.57}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3482
[2024-06-10 17:41:20,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.46 | bwd_microstep: 1336.14 | bwd_inner_microstep: 1336.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2402
[2024-06-10 17:41:21,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.99 | bwd_microstep: 901.18 | bwd_inner_microstep: 901.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 17:41:23,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.14 | bwd_microstep: 1244.20 | bwd_inner_microstep: 1244.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843
[2024-06-10 17:41:25,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.15 | bwd_microstep: 1657.22 | bwd_inner_microstep: 1657.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 17:41:27,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.06 | bwd_microstep: 1377.57 | bwd_inner_microstep: 1377.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 17:41:29,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.41 | bwd_microstep: 1379.56 | bwd_inner_microstep: 1379.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 17:41:31,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1380.37 | bwd_inner_microstep: 1380.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4013
[2024-06-10 17:41:33,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.52 | bwd_microstep: 1608.04 | bwd_inner_microstep: 1608.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 17:41:36,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.80 | bwd_microstep: 1633.12 | bwd_inner_microstep: 1633.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 17:41:37,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1253.32 | bwd_inner_microstep: 1253.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 17:41:39,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.12 | bwd_microstep: 1390.58 | bwd_inner_microstep: 1390.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 17:41:41,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.27 | bwd_microstep: 1391.26 | bwd_inner_microstep: 1391.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 17:41:43,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1414.82 | bwd_inner_microstep: 1414.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1896
[2024-06-10 17:41:44,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.19 | bwd_microstep: 682.28 | bwd_inner_microstep: 682.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3538
[2024-06-10 17:41:46,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.71 | bwd_microstep: 1585.01 | bwd_inner_microstep: 1584.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 17:41:48,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.02 | bwd_microstep: 1488.45 | bwd_inner_microstep: 1488.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3540
[2024-06-10 17:41:50,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.10 | bwd_microstep: 1525.88 | bwd_inner_microstep: 1525.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533
[2024-06-10 17:41:52,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.72 | bwd_microstep: 1423.60 | bwd_inner_microstep: 1423.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 17:41:54,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1395.83 | bwd_inner_microstep: 1395.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 17:41:56,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1380.77 | bwd_inner_microstep: 1380.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 17:41:58,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.90 | bwd_microstep: 1522.67 | bwd_inner_microstep: 1522.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3677
[2024-06-10 17:42:00,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.99 | bwd_microstep: 1495.02 | bwd_inner_microstep: 1494.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 17:42:03,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.93 | bwd_microstep: 1652.07 | bwd_inner_microstep: 1652.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3553
[2024-06-10 17:42:05,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.60 | bwd_microstep: 1526.04 | bwd_inner_microstep: 1526.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-10 17:42:06,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.00 | bwd_microstep: 877.32 | bwd_inner_microstep: 877.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2036
[2024-06-10 17:42:07,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.93 | bwd_microstep: 715.06 | bwd_inner_microstep: 715.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3575
[2024-06-10 17:42:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.68 | bwd_microstep: 1333.28 | bwd_inner_microstep: 1333.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3568
[2024-06-10 17:42:11,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.02 | bwd_microstep: 1329.38 | bwd_inner_microstep: 1329.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 17:42:13,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.76 | bwd_microstep: 1545.86 | bwd_inner_microstep: 1545.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 17:42:15,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.44 | bwd_microstep: 1377.84 | bwd_inner_microstep: 1377.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3792
[2024-06-10 17:42:17,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 1510.45 | bwd_inner_microstep: 1510.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615
[2024-06-10 17:42:21,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.05 | optimizer_step: 6.60
[2024-06-10 17:42:21,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.04 | bwd_microstep: 3512.09 | bwd_inner_microstep: 1618.59 | bwd_allreduce_microstep: 1893.45 | step_microstep: 37.61
[2024-06-10 17:42:21,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16397.09 | bwd: 45846.30 | bwd_inner: 43951.95 | bwd_allreduce: 1893.67 | step: 39.07
{'loss': 1.1846, 'learning_rate': 1.6231729067372518e-05, 'epoch': 0.57}


 57%|█████▋    | 985/1726 [16:59:48<12:35:51, 61.20s/it]
 57%|█████▋    | 986/1726 [17:00:48<12:33:01, 61.06s/it]
 57%|█████▋    | 987/1726 [17:01:52<12:41:57, 61.86s/it]
 57%|█████▋    | 988/1726 [17:02:52<12:32:40, 61.19s/it]
 57%|█████▋    | 989/1726 [17:03:55<12:39:04, 61.80s/it]
 57%|█████▋    | 990/1726 [17:04:58<12:40:54, 62.03s/it]
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977
[2024-06-10 17:42:22,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.43 | bwd_microstep: 817.54 | bwd_inner_microstep: 817.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 17:42:24,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.93 | bwd_microstep: 1249.33 | bwd_inner_microstep: 1249.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 17:42:26,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1295.78 | bwd_inner_microstep: 1295.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271
[2024-06-10 17:42:27,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.05 | bwd_microstep: 871.52 | bwd_inner_microstep: 871.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 17:42:29,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.84 | bwd_microstep: 1279.44 | bwd_inner_microstep: 1279.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 17:42:30,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1381.77 | bwd_inner_microstep: 1381.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 17:42:33,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.39 | bwd_microstep: 1527.66 | bwd_inner_microstep: 1527.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 17:42:34,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.26 | bwd_microstep: 1349.86 | bwd_inner_microstep: 1349.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 17:42:36,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.52 | bwd_microstep: 1383.78 | bwd_inner_microstep: 1383.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1360
[2024-06-10 17:42:37,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 212.61 | bwd_microstep: 548.94 | bwd_inner_microstep: 548.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3379
[2024-06-10 17:42:39,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.56 | bwd_microstep: 1271.66 | bwd_inner_microstep: 1271.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2611
[2024-06-10 17:42:40,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.53 | bwd_microstep: 992.21 | bwd_inner_microstep: 992.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-10 17:42:42,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.49 | bwd_microstep: 1614.05 | bwd_inner_microstep: 1614.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2179
[2024-06-10 17:42:44,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.12 | bwd_microstep: 1045.50 | bwd_inner_microstep: 1045.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2005
[2024-06-10 17:42:45,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.19 | bwd_microstep: 895.41 | bwd_inner_microstep: 895.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3642
[2024-06-10 17:42:47,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.50 | bwd_microstep: 1358.87 | bwd_inner_microstep: 1358.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 17:42:49,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.09 | bwd_microstep: 1374.44 | bwd_inner_microstep: 1374.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2305
[2024-06-10 17:42:50,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.27 | bwd_microstep: 932.49 | bwd_inner_microstep: 932.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 17:42:51,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.67 | bwd_microstep: 796.83 | bwd_inner_microstep: 796.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 17:42:53,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.61 | bwd_microstep: 973.81 | bwd_inner_microstep: 973.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 17:42:55,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.40 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 17:42:56,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1280.12 | bwd_inner_microstep: 1280.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-10 17:42:58,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.50 | bwd_microstep: 1492.14 | bwd_inner_microstep: 1492.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3447
[2024-06-10 17:43:00,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.25 | bwd_microstep: 1186.40 | bwd_inner_microstep: 1186.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2190
[2024-06-10 17:43:01,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.02 | bwd_microstep: 920.17 | bwd_inner_microstep: 920.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 17:43:03,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.37 | bwd_microstep: 1494.68 | bwd_inner_microstep: 1494.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4651
[2024-06-10 17:43:06,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 737.02 | bwd_microstep: 1980.97 | bwd_inner_microstep: 1980.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 17:43:08,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.99 | bwd_microstep: 1300.01 | bwd_inner_microstep: 1299.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3808
[2024-06-10 17:43:10,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.79 | bwd_microstep: 1748.92 | bwd_inner_microstep: 1748.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2023
[2024-06-10 17:43:12,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.21 | bwd_microstep: 855.00 | bwd_inner_microstep: 854.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 17:43:14,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.69 | bwd_microstep: 1552.00 | bwd_inner_microstep: 1551.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3736
[2024-06-10 17:43:22,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 17:43:22,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.07 | bwd_microstep: 7650.36 | bwd_inner_microstep: 2100.76 | bwd_allreduce_microstep: 5549.55 | step_microstep: 38.04
[2024-06-10 17:43:22,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14996.47 | bwd: 45807.52 | bwd_inner: 40257.05 | bwd_allreduce: 5549.78 | step: 39.65
{'loss': 1.2221, 'learning_rate': 1.619487400405158e-05, 'epoch': 0.57}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 17:43:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.33 | bwd_microstep: 1611.10 | bwd_inner_microstep: 1611.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3999
[2024-06-10 17:43:26,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.30 | bwd_microstep: 1600.97 | bwd_inner_microstep: 1600.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 17:43:28,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.61 | bwd_microstep: 1239.23 | bwd_inner_microstep: 1239.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 17:43:30,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.68 | bwd_microstep: 1549.98 | bwd_inner_microstep: 1549.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3838
[2024-06-10 17:43:32,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.20 | bwd_microstep: 1386.90 | bwd_inner_microstep: 1386.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 17:43:34,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1244.71 | bwd_inner_microstep: 1244.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 17:43:36,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1245.56 | bwd_inner_microstep: 1245.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-10 17:43:37,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 794.35 | bwd_inner_microstep: 794.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 17:43:39,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1402.41 | bwd_inner_microstep: 1402.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 17:43:40,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.12 | bwd_microstep: 1253.35 | bwd_inner_microstep: 1253.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3688
[2024-06-10 17:43:42,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.86 | bwd_microstep: 1457.60 | bwd_inner_microstep: 1457.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 17:43:44,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1392.54 | bwd_inner_microstep: 1392.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143
[2024-06-10 17:43:46,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.12 | bwd_microstep: 932.70 | bwd_inner_microstep: 932.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1910
[2024-06-10 17:43:47,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.18 | bwd_microstep: 717.23 | bwd_inner_microstep: 717.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2086
[2024-06-10 17:43:48,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.92 | bwd_microstep: 915.35 | bwd_inner_microstep: 915.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3525
[2024-06-10 17:43:50,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.51 | bwd_microstep: 1437.96 | bwd_inner_microstep: 1437.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 17:43:52,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.32 | bwd_microstep: 1349.13 | bwd_inner_microstep: 1349.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 17:43:54,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1345.37 | bwd_inner_microstep: 1345.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3628
[2024-06-10 17:43:56,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.62 | bwd_microstep: 1643.52 | bwd_inner_microstep: 1643.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-10 17:43:58,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.81 | bwd_microstep: 1602.40 | bwd_inner_microstep: 1602.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 17:44:00,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1405.00 | bwd_inner_microstep: 1404.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927
[2024-06-10 17:44:01,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 697.07 | bwd_inner_microstep: 697.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3539
[2024-06-10 17:44:03,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.80 | bwd_microstep: 1229.80 | bwd_inner_microstep: 1229.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2517
[2024-06-10 17:44:04,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.22 | bwd_microstep: 964.26 | bwd_inner_microstep: 964.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 17:44:06,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1611.00 | bwd_inner_microstep: 1610.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778
[2024-06-10 17:44:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.26 | bwd_microstep: 1355.22 | bwd_inner_microstep: 1355.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3808
[2024-06-10 17:44:10,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.69 | bwd_microstep: 1601.09 | bwd_inner_microstep: 1601.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 17:44:13,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.92 | bwd_microstep: 1593.63 | bwd_inner_microstep: 1593.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562
[2024-06-10 17:44:15,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.80 | bwd_microstep: 1426.87 | bwd_inner_microstep: 1426.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 17:44:17,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1498.88 | bwd_inner_microstep: 1498.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2527
[2024-06-10 17:44:18,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.17 | bwd_microstep: 1123.89 | bwd_inner_microstep: 1123.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 17:44:22,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-10 17:44:22,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.54 | bwd_microstep: 2875.25 | bwd_inner_microstep: 1679.93 | bwd_allreduce_microstep: 1195.27 | step_microstep: 37.94
[2024-06-10 17:44:22,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15749.76 | bwd: 43504.33 | bwd_inner: 42308.16 | bwd_allreduce: 1195.50 | step: 39.38
{'loss': 1.2264, 'learning_rate': 1.6158032342364623e-05, 'epoch': 0.57}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2490
[2024-06-10 17:44:23,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.77 | bwd_microstep: 1016.89 | bwd_inner_microstep: 1016.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 17:44:25,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1297.88 | bwd_inner_microstep: 1297.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792
[2024-06-10 17:44:27,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.41 | bwd_microstep: 1647.89 | bwd_inner_microstep: 1647.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-10 17:44:29,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.74 | bwd_microstep: 1181.66 | bwd_inner_microstep: 1181.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400
[2024-06-10 17:44:31,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1310.39 | bwd_inner_microstep: 1310.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 17:44:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.74 | bwd_microstep: 1245.34 | bwd_inner_microstep: 1245.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 17:44:34,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.66 | bwd_microstep: 1385.74 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1943
[2024-06-10 17:44:35,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.64 | bwd_microstep: 728.05 | bwd_inner_microstep: 728.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 17:44:37,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.12 | bwd_microstep: 1154.46 | bwd_inner_microstep: 1154.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 17:44:39,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1346.93 | bwd_inner_microstep: 1346.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2936
[2024-06-10 17:44:40,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.53 | bwd_microstep: 1093.77 | bwd_inner_microstep: 1093.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 17:44:42,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1382.51 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-10 17:44:44,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.08 | bwd_microstep: 1190.46 | bwd_inner_microstep: 1190.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 17:44:45,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.74 | bwd_microstep: 796.13 | bwd_inner_microstep: 796.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 17:44:47,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1408.63 | bwd_inner_microstep: 1408.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 17:44:49,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.72 | bwd_microstep: 1288.62 | bwd_inner_microstep: 1288.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 17:44:51,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1498.27 | bwd_inner_microstep: 1498.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-10 17:44:52,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.06 | bwd_microstep: 1217.04 | bwd_inner_microstep: 1217.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819
[2024-06-10 17:44:54,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.93 | bwd_microstep: 1517.44 | bwd_inner_microstep: 1517.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 17:44:56,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.05 | bwd_microstep: 1452.34 | bwd_inner_microstep: 1452.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 17:44:58,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.40 | bwd_microstep: 1257.23 | bwd_inner_microstep: 1257.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 17:45:00,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1460.67 | bwd_inner_microstep: 1460.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-10 17:45:02,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.26 | bwd_microstep: 1281.95 | bwd_inner_microstep: 1281.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 17:45:04,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.91 | bwd_microstep: 1250.96 | bwd_inner_microstep: 1250.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 17:45:05,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.07 | bwd_microstep: 973.48 | bwd_inner_microstep: 973.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-10 17:45:07,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1422.69 | bwd_inner_microstep: 1422.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 17:45:08,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.93 | bwd_microstep: 970.99 | bwd_inner_microstep: 970.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3413
[2024-06-10 17:45:11,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 1574.74 | bwd_inner_microstep: 1574.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3462
[2024-06-10 17:45:13,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.94 | bwd_microstep: 1539.72 | bwd_inner_microstep: 1539.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3583
[2024-06-10 17:45:15,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.09 | bwd_microstep: 1800.14 | bwd_inner_microstep: 1800.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817
[2024-06-10 17:45:17,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.71 | bwd_microstep: 1584.26 | bwd_inner_microstep: 1584.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760
[2024-06-10 17:45:21,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 20.10 | optimizer_gradients: 4.19 | optimizer_step: 6.62
[2024-06-10 17:45:21,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.81 | bwd_microstep: 3397.76 | bwd_inner_microstep: 1747.79 | bwd_allreduce_microstep: 1649.92 | step_microstep: 41.22
[2024-06-10 17:45:21,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15707.11 | bwd: 43675.03 | bwd_inner: 42024.17 | bwd_allreduce: 1650.16 | step: 42.72
{'loss': 1.2704, 'learning_rate': 1.612120421206778e-05, 'epoch': 0.58}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 17:45:22,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.64 | bwd_microstep: 784.56 | bwd_inner_microstep: 784.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 17:45:24,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.40 | bwd_microstep: 1353.32 | bwd_inner_microstep: 1353.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3853
[2024-06-10 17:45:27,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.77 | bwd_microstep: 1659.63 | bwd_inner_microstep: 1659.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3805
[2024-06-10 17:45:28,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.04 | bwd_microstep: 1348.69 | bwd_inner_microstep: 1348.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2321
[2024-06-10 17:45:30,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.29 | bwd_microstep: 979.71 | bwd_inner_microstep: 979.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 17:45:32,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.16 | bwd_microstep: 1396.03 | bwd_inner_microstep: 1396.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 17:45:33,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.86 | bwd_microstep: 1186.13 | bwd_inner_microstep: 1186.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3770
[2024-06-10 17:45:35,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.35 | bwd_microstep: 1341.64 | bwd_inner_microstep: 1341.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3408
[2024-06-10 17:45:37,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.82 | bwd_microstep: 1305.55 | bwd_inner_microstep: 1305.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 17:45:39,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.88 | bwd_microstep: 1341.46 | bwd_inner_microstep: 1341.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 17:45:41,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.28 | bwd_microstep: 1250.94 | bwd_inner_microstep: 1250.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3506
[2024-06-10 17:45:42,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.88 | bwd_microstep: 1316.66 | bwd_inner_microstep: 1316.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 17:45:44,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.70 | bwd_microstep: 886.80 | bwd_inner_microstep: 886.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 17:45:46,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.87 | bwd_microstep: 1494.45 | bwd_inner_microstep: 1494.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512
[2024-06-10 17:45:48,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.10 | bwd_microstep: 1586.04 | bwd_inner_microstep: 1586.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3677
[2024-06-10 17:45:50,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.35 | bwd_microstep: 1719.29 | bwd_inner_microstep: 1719.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3475
[2024-06-10 17:45:52,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.41 | bwd_microstep: 1571.08 | bwd_inner_microstep: 1571.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2133
[2024-06-10 17:45:54,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.90 | bwd_microstep: 926.72 | bwd_inner_microstep: 926.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-10 17:45:56,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1409.59 | bwd_inner_microstep: 1409.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3612
[2024-06-10 17:45:57,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.45 | bwd_microstep: 1246.01 | bwd_inner_microstep: 1245.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-10 17:46:00,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.85 | bwd_microstep: 1524.31 | bwd_inner_microstep: 1524.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 17:46:02,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.19 | bwd_microstep: 1484.08 | bwd_inner_microstep: 1484.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 17:46:03,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.86 | bwd_microstep: 1406.42 | bwd_inner_microstep: 1406.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797
[2024-06-10 17:46:06,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.31 | bwd_microstep: 1652.71 | bwd_inner_microstep: 1652.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 17:46:08,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1422.27 | bwd_inner_microstep: 1422.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 17:46:09,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1253.51 | bwd_inner_microstep: 1253.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3534
[2024-06-10 17:46:11,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1451.66 | bwd_inner_microstep: 1451.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3041
[2024-06-10 17:46:13,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.98 | bwd_microstep: 1329.18 | bwd_inner_microstep: 1329.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1932
[2024-06-10 17:46:14,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.38 | bwd_microstep: 774.99 | bwd_inner_microstep: 774.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 17:46:17,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.75 | bwd_microstep: 1646.94 | bwd_inner_microstep: 1646.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 17:46:19,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.86 | bwd_microstep: 1394.68 | bwd_inner_microstep: 1394.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 17:46:23,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 17:46:23,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 3515.52 | bwd_inner_microstep: 1568.55 | bwd_allreduce_microstep: 1946.92 | step_microstep: 38.05
[2024-06-10 17:46:23,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16020.60 | bwd: 44960.61 | bwd_inner: 43012.74 | bwd_allreduce: 1947.17 | step: 39.57
{'loss': 1.2703, 'learning_rate': 1.6084389742869543e-05, 'epoch': 0.58}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 17:46:25,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1373.09 | bwd_inner_microstep: 1373.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2933
[2024-06-10 17:46:26,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.94 | bwd_microstep: 1037.99 | bwd_inner_microstep: 1037.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3916
[2024-06-10 17:46:28,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.02 | bwd_microstep: 1690.25 | bwd_inner_microstep: 1690.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 17:46:30,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.06 | bwd_microstep: 1399.06 | bwd_inner_microstep: 1399.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 17:46:32,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1247.02 | bwd_inner_microstep: 1246.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 17:46:34,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.44 | bwd_microstep: 1482.68 | bwd_inner_microstep: 1482.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3420
[2024-06-10 17:46:36,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.81 | bwd_microstep: 1217.48 | bwd_inner_microstep: 1217.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3497
[2024-06-10 17:46:38,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1345.64 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3770
[2024-06-10 17:46:40,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.09 | bwd_microstep: 1736.56 | bwd_inner_microstep: 1736.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3547
[2024-06-10 17:46:42,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1425.75 | bwd_inner_microstep: 1425.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 17:46:43,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.15 | bwd_microstep: 801.57 | bwd_inner_microstep: 801.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4025
[2024-06-10 17:46:45,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.64 | bwd_microstep: 1611.15 | bwd_inner_microstep: 1611.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1912
[2024-06-10 17:46:46,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.74 | bwd_microstep: 781.80 | bwd_inner_microstep: 781.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 17:46:48,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.90 | bwd_microstep: 1485.67 | bwd_inner_microstep: 1485.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 17:46:50,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.67 | bwd_microstep: 789.84 | bwd_inner_microstep: 789.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3642
[2024-06-10 17:46:52,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.83 | bwd_microstep: 1474.82 | bwd_inner_microstep: 1474.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3429
[2024-06-10 17:46:53,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.97 | bwd_microstep: 1316.13 | bwd_inner_microstep: 1316.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1900
[2024-06-10 17:46:54,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.34 | bwd_microstep: 780.89 | bwd_inner_microstep: 780.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-10 17:46:55,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.86 | bwd_microstep: 689.00 | bwd_inner_microstep: 688.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2825
[2024-06-10 17:46:57,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.05 | bwd_microstep: 1125.61 | bwd_inner_microstep: 1125.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 17:46:59,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.20 | bwd_microstep: 1284.43 | bwd_inner_microstep: 1284.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 17:47:01,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.95 | bwd_microstep: 1459.94 | bwd_inner_microstep: 1459.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842
[2024-06-10 17:47:03,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.43 | bwd_microstep: 1468.34 | bwd_inner_microstep: 1468.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2283
[2024-06-10 17:47:04,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.76 | bwd_microstep: 909.86 | bwd_inner_microstep: 909.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3639
[2024-06-10 17:47:06,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1316.05 | bwd_inner_microstep: 1316.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3720
[2024-06-10 17:47:08,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1368.81 | bwd_inner_microstep: 1368.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 17:47:10,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1536.02 | bwd_inner_microstep: 1535.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 17:47:12,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1283.29 | bwd_inner_microstep: 1283.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 17:47:13,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.18 | bwd_microstep: 1252.03 | bwd_inner_microstep: 1252.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-10 17:47:15,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1311.40 | bwd_inner_microstep: 1311.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 17:47:17,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1406.32 | bwd_inner_microstep: 1406.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 17:47:23,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.35 | optimizer_step: 6.59
[2024-06-10 17:47:24,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.43 | bwd_microstep: 5770.25 | bwd_inner_microstep: 1580.91 | bwd_allreduce_microstep: 4189.27 | step_microstep: 38.80
[2024-06-10 17:47:24,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15327.85 | bwd: 45178.75 | bwd_inner: 40988.55 | bwd_allreduce: 4189.51 | step: 40.32
{'loss': 1.2268, 'learning_rate': 1.6047589064430268e-05, 'epoch': 0.58}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 17:47:25,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.45 | bwd_microstep: 1328.89 | bwd_inner_microstep: 1328.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 17:47:27,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.76 | bwd_microstep: 1278.54 | bwd_inner_microstep: 1278.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 17:47:29,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.90 | bwd_microstep: 1375.65 | bwd_inner_microstep: 1375.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 17:47:31,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.73 | bwd_microstep: 1240.44 | bwd_inner_microstep: 1240.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759
[2024-06-10 17:47:33,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.64 | bwd_microstep: 1638.64 | bwd_inner_microstep: 1638.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4139
[2024-06-10 17:47:35,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1540.31 | bwd_inner_microstep: 1540.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 17:47:37,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.38 | bwd_microstep: 1278.10 | bwd_inner_microstep: 1278.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 17:47:39,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.77 | bwd_microstep: 1382.31 | bwd_inner_microstep: 1382.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 17:47:41,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.28 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 17:47:43,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.31 | bwd_microstep: 1411.67 | bwd_inner_microstep: 1411.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 17:47:45,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.71 | bwd_microstep: 1526.36 | bwd_inner_microstep: 1526.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 17:47:46,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1255.01 | bwd_inner_microstep: 1254.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3671
[2024-06-10 17:47:49,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.66 | bwd_microstep: 1679.81 | bwd_inner_microstep: 1679.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3653
[2024-06-10 17:47:51,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.52 | bwd_microstep: 1660.22 | bwd_inner_microstep: 1660.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645
[2024-06-10 17:47:53,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1414.21 | bwd_inner_microstep: 1414.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1969
[2024-06-10 17:47:54,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.60 | bwd_microstep: 857.90 | bwd_inner_microstep: 857.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 17:47:56,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.69 | bwd_microstep: 1489.10 | bwd_inner_microstep: 1489.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-10 17:47:57,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.67 | bwd_microstep: 801.98 | bwd_inner_microstep: 801.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-10 17:47:59,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1502.88 | bwd_inner_microstep: 1502.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 17:48:01,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1388.47 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 17:48:03,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 1459.14 | bwd_inner_microstep: 1459.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2081
[2024-06-10 17:48:04,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.48 | bwd_microstep: 850.13 | bwd_inner_microstep: 850.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3713
[2024-06-10 17:48:06,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1333.97 | bwd_inner_microstep: 1333.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3704
[2024-06-10 17:48:08,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.33 | bwd_microstep: 1332.40 | bwd_inner_microstep: 1332.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 17:48:10,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.87 | bwd_microstep: 1426.78 | bwd_inner_microstep: 1426.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624
[2024-06-10 17:48:12,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.46 | bwd_microstep: 1535.61 | bwd_inner_microstep: 1535.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 17:48:14,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.11 | bwd_microstep: 975.37 | bwd_inner_microstep: 975.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 17:48:16,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.27 | bwd_microstep: 1475.13 | bwd_inner_microstep: 1475.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 17:48:18,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.33 | bwd_microstep: 1600.75 | bwd_inner_microstep: 1600.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235
[2024-06-10 17:48:19,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.20 | bwd_microstep: 959.58 | bwd_inner_microstep: 959.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 17:48:21,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.34 | bwd_microstep: 1477.31 | bwd_inner_microstep: 1477.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 17:48:26,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 17:48:26,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 3966.17 | bwd_inner_microstep: 1412.72 | bwd_allreduce_microstep: 2553.40 | step_microstep: 37.84
[2024-06-10 17:48:26,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16083.37 | bwd: 45726.18 | bwd_inner: 43171.88 | bwd_allreduce: 2553.63 | step: 39.24
 990/1726 [17:04:58<12:40:54, 62.03s/it]
 57%|█████▋    | 991/1726 [17:05:59<12:36:35, 61.76s/it]
 57%|█████▋    | 992/1726 [17:06:58<12:27:34, 61.11s/it]
 58%|█████▊    | 993/1726 [17:07:58<12:21:25, 60.69s/it]
 58%|█████▊    | 994/1726 [17:08:59<12:22:43, 60.88s/it]
 58%|█████▊    | 995/1726 [17:10:00<12:21:34, 60.87s/it]
{'loss': 1.2248, 'learning_rate': 1.6010802306361762e-05, 'epoch': 0.58}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 17:48:27,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.06 | bwd_microstep: 796.10 | bwd_inner_microstep: 796.04 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934
[2024-06-10 17:48:28,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.39 | bwd_microstep: 724.67 | bwd_inner_microstep: 724.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 17:48:30,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1376.41 | bwd_inner_microstep: 1376.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 17:48:32,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.86 | bwd_microstep: 1347.12 | bwd_inner_microstep: 1347.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 17:48:33,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.19 | bwd_microstep: 1385.21 | bwd_inner_microstep: 1385.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 17:48:35,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.88 | bwd_microstep: 1148.75 | bwd_inner_microstep: 1148.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942
[2024-06-10 17:48:36,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.47 | bwd_microstep: 821.02 | bwd_inner_microstep: 820.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 17:48:38,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.89 | bwd_microstep: 1243.89 | bwd_inner_microstep: 1243.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 17:48:39,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.19 | bwd_microstep: 1147.49 | bwd_inner_microstep: 1147.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 17:48:41,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1246.09 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 17:48:43,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.31 | bwd_microstep: 1245.46 | bwd_inner_microstep: 1245.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910
[2024-06-10 17:48:44,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.44 | bwd_microstep: 779.14 | bwd_inner_microstep: 779.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2174
[2024-06-10 17:48:45,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.15 | bwd_microstep: 853.67 | bwd_inner_microstep: 853.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 17:48:46,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.83 | bwd_microstep: 798.76 | bwd_inner_microstep: 798.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 17:48:48,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.66 | bwd_microstep: 1345.54 | bwd_inner_microstep: 1345.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 17:48:50,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.00 | bwd_microstep: 1352.79 | bwd_inner_microstep: 1352.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3965
[2024-06-10 17:48:52,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.85 | bwd_microstep: 1609.71 | bwd_inner_microstep: 1609.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928
[2024-06-10 17:48:53,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.60 | bwd_microstep: 697.72 | bwd_inner_microstep: 697.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 17:48:55,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.06 | bwd_microstep: 1417.47 | bwd_inner_microstep: 1417.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 17:48:57,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.12 | bwd_microstep: 1418.73 | bwd_inner_microstep: 1418.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 17:48:59,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1380.40 | bwd_inner_microstep: 1380.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 17:49:01,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1557.30 | bwd_inner_microstep: 1557.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1911
[2024-06-10 17:49:02,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.64 | bwd_microstep: 713.43 | bwd_inner_microstep: 713.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557
[2024-06-10 17:49:04,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1421.79 | bwd_inner_microstep: 1421.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2162
[2024-06-10 17:49:05,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.86 | bwd_microstep: 885.38 | bwd_inner_microstep: 885.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 17:49:08,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.77 | bwd_microstep: 1653.57 | bwd_inner_microstep: 1653.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2176
[2024-06-10 17:49:09,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.31 | bwd_microstep: 981.95 | bwd_inner_microstep: 981.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3561
[2024-06-10 17:49:11,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.81 | bwd_microstep: 1559.66 | bwd_inner_microstep: 1559.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-10 17:49:13,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.95 | bwd_microstep: 1642.95 | bwd_inner_microstep: 1642.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 17:49:15,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1492.15 | bwd_inner_microstep: 1492.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-10 17:49:18,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1506.49 | bwd_inner_microstep: 1506.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 17:49:28,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.42 | optimizer_step: 6.60
[2024-06-10 17:49:28,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 10118.72 | bwd_inner_microstep: 1755.65 | bwd_allreduce_microstep: 8363.00 | step_microstep: 39.93
[2024-06-10 17:49:28,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14596.49 | bwd: 47669.60 | bwd_inner: 39305.63 | bwd_allreduce: 8363.26 | step: 41.46
{'loss': 1.2944, 'learning_rate': 1.5974029598226796e-05, 'epoch': 0.58}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 17:49:30,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1343.26 | bwd_inner_microstep: 1343.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 17:49:32,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.55 | bwd_microstep: 1445.70 | bwd_inner_microstep: 1445.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 17:49:34,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1372.11 | bwd_inner_microstep: 1372.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 17:49:36,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.15 | bwd_microstep: 1375.52 | bwd_inner_microstep: 1375.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4110
[2024-06-10 17:49:38,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.30 | bwd_microstep: 1630.43 | bwd_inner_microstep: 1630.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 17:49:40,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1537.35 | bwd_inner_microstep: 1537.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 17:49:42,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.82 | bwd_microstep: 1243.36 | bwd_inner_microstep: 1243.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1953
[2024-06-10 17:49:43,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.20 | bwd_microstep: 699.65 | bwd_inner_microstep: 699.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2188
[2024-06-10 17:49:44,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.30 | bwd_microstep: 792.95 | bwd_inner_microstep: 792.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692
[2024-06-10 17:49:46,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.39 | bwd_microstep: 1455.65 | bwd_inner_microstep: 1455.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 17:49:48,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.84 | bwd_microstep: 1384.27 | bwd_inner_microstep: 1384.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 17:49:49,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.02 | bwd_microstep: 797.37 | bwd_inner_microstep: 797.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657
[2024-06-10 17:49:51,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.17 | bwd_microstep: 1716.04 | bwd_inner_microstep: 1716.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645
[2024-06-10 17:49:54,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.28 | bwd_microstep: 1706.35 | bwd_inner_microstep: 1706.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 17:49:55,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.25 | bwd_microstep: 791.69 | bwd_inner_microstep: 791.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3651
[2024-06-10 17:49:57,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.23 | bwd_microstep: 1649.34 | bwd_inner_microstep: 1649.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2095
[2024-06-10 17:49:58,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.16 | bwd_microstep: 850.42 | bwd_inner_microstep: 850.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 17:50:01,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.85 | bwd_microstep: 1654.13 | bwd_inner_microstep: 1654.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291
[2024-06-10 17:50:02,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.33 | bwd_microstep: 911.00 | bwd_inner_microstep: 910.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 17:50:04,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.43 | bwd_microstep: 1348.12 | bwd_inner_microstep: 1348.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 17:50:06,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.18 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2058
[2024-06-10 17:50:07,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.74 | bwd_microstep: 911.58 | bwd_inner_microstep: 911.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 17:50:09,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1509.06 | bwd_inner_microstep: 1509.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3636
[2024-06-10 17:50:11,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.06 | bwd_microstep: 1436.17 | bwd_inner_microstep: 1436.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 17:50:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1500.07 | bwd_inner_microstep: 1500.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 17:50:15,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.07 | bwd_microstep: 1649.82 | bwd_inner_microstep: 1649.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 17:50:17,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.48 | bwd_microstep: 976.15 | bwd_inner_microstep: 976.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-10 17:50:19,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.09 | bwd_microstep: 1442.02 | bwd_inner_microstep: 1441.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 17:50:21,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.82 | bwd_microstep: 1547.53 | bwd_inner_microstep: 1547.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 17:50:23,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 1508.72 | bwd_inner_microstep: 1508.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2882
[2024-06-10 17:50:24,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.50 | bwd_microstep: 1086.67 | bwd_inner_microstep: 1086.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1425
[2024-06-10 17:50:32,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.36 | optimizer_step: 6.60
[2024-06-10 17:50:32,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 206.64 | bwd_microstep: 7472.04 | bwd_inner_microstep: 610.88 | bwd_allreduce_microstep: 6861.09 | step_microstep: 38.94
[2024-06-10 17:50:32,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15384.50 | bwd: 48143.96 | bwd_inner: 41281.93 | bwd_allreduce: 6861.33 | step: 40.40
{'loss': 1.2004, 'learning_rate': 1.5937271069538665e-05, 'epoch': 0.58}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3933
[2024-06-10 17:50:34,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.83 | bwd_microstep: 1634.66 | bwd_inner_microstep: 1634.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 17:50:36,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.85 | bwd_microstep: 1304.10 | bwd_inner_microstep: 1304.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 17:50:38,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1377.39 | bwd_inner_microstep: 1377.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2374
[2024-06-10 17:50:39,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.73 | bwd_microstep: 993.55 | bwd_inner_microstep: 993.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 17:50:41,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1379.74 | bwd_inner_microstep: 1379.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 17:50:43,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1388.04 | bwd_inner_microstep: 1388.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 17:50:46,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.87 | bwd_microstep: 1627.56 | bwd_inner_microstep: 1627.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-10 17:50:47,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.64 | bwd_microstep: 1188.88 | bwd_inner_microstep: 1188.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 17:50:49,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.62 | bwd_microstep: 1523.82 | bwd_inner_microstep: 1523.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 17:50:51,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1387.48 | bwd_inner_microstep: 1387.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3428
[2024-06-10 17:50:53,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.80 | bwd_microstep: 1310.59 | bwd_inner_microstep: 1310.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2995
[2024-06-10 17:50:55,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.96 | bwd_microstep: 1104.02 | bwd_inner_microstep: 1103.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732
[2024-06-10 17:50:57,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.12 | bwd_microstep: 1620.52 | bwd_inner_microstep: 1620.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3488
[2024-06-10 17:50:59,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.88 | bwd_microstep: 1578.44 | bwd_inner_microstep: 1578.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3626
[2024-06-10 17:51:01,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.35 | bwd_microstep: 1468.67 | bwd_inner_microstep: 1468.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 17:51:03,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.13 | bwd_microstep: 1400.61 | bwd_inner_microstep: 1400.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 17:51:05,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1410.50 | bwd_inner_microstep: 1410.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3557
[2024-06-10 17:51:07,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.16 | bwd_microstep: 1476.26 | bwd_inner_microstep: 1476.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730
[2024-06-10 17:51:09,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1533.88 | bwd_inner_microstep: 1533.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2922
[2024-06-10 17:51:11,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.13 | bwd_microstep: 1161.19 | bwd_inner_microstep: 1161.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 17:51:13,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.71 | bwd_microstep: 1552.20 | bwd_inner_microstep: 1552.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 17:51:15,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.15 | bwd_microstep: 1553.46 | bwd_inner_microstep: 1553.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 17:51:17,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.83 | bwd_microstep: 1497.68 | bwd_inner_microstep: 1497.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 17:51:19,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.94 | bwd_microstep: 1393.02 | bwd_inner_microstep: 1393.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2184
[2024-06-10 17:51:20,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.60 | bwd_microstep: 763.14 | bwd_inner_microstep: 763.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 17:51:22,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.41 | bwd_microstep: 1410.76 | bwd_inner_microstep: 1410.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 17:51:24,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.02 | bwd_microstep: 1653.72 | bwd_inner_microstep: 1653.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 17:51:26,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1437.32 | bwd_inner_microstep: 1437.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 17:51:28,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.40 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568
[2024-06-10 17:51:30,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.47 | bwd_microstep: 1589.27 | bwd_inner_microstep: 1589.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 17:51:32,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.01 | bwd_microstep: 1481.39 | bwd_inner_microstep: 1481.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3588
[2024-06-10 17:51:35,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 17:51:35,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.18 | bwd_microstep: 2039.57 | bwd_inner_microstep: 1335.86 | bwd_allreduce_microstep: 703.67 | step_microstep: 37.73
[2024-06-10 17:51:35,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16764.00 | bwd: 45621.08 | bwd_inner: 44916.52 | bwd_allreduce: 703.89 | step: 39.17
{'loss': 1.2546, 'learning_rate': 1.5900526849760697e-05, 'epoch': 0.58}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3567
[2024-06-10 17:51:37,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.97 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4235
[2024-06-10 17:51:39,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.29 | bwd_microstep: 1596.73 | bwd_inner_microstep: 1596.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3814
[2024-06-10 17:51:41,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.35 | bwd_microstep: 1511.44 | bwd_inner_microstep: 1511.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 17:51:43,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1340.28 | bwd_inner_microstep: 1340.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 17:51:45,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1381.25 | bwd_inner_microstep: 1381.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 17:51:47,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.62 | bwd_microstep: 1489.01 | bwd_inner_microstep: 1488.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3759
[2024-06-10 17:51:49,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.67 | bwd_microstep: 1245.43 | bwd_inner_microstep: 1245.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 17:51:50,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.83 | bwd_microstep: 793.01 | bwd_inner_microstep: 792.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 17:51:51,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.77 | bwd_microstep: 795.86 | bwd_inner_microstep: 795.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2221
[2024-06-10 17:51:52,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.31 | bwd_microstep: 957.62 | bwd_inner_microstep: 957.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-10 17:51:53,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.11 | bwd_microstep: 798.92 | bwd_inner_microstep: 798.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-10 17:51:54,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.17 | bwd_microstep: 685.86 | bwd_inner_microstep: 685.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-10 17:51:56,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1417.94 | bwd_inner_microstep: 1417.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2125
[2024-06-10 17:51:57,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.68 | bwd_microstep: 1022.30 | bwd_inner_microstep: 1022.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 17:51:59,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.88 | bwd_microstep: 1288.69 | bwd_inner_microstep: 1288.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520
[2024-06-10 17:52:01,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1421.41 | bwd_inner_microstep: 1421.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510
[2024-06-10 17:52:03,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1418.08 | bwd_inner_microstep: 1418.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 17:52:05,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.20 | bwd_microstep: 1509.90 | bwd_inner_microstep: 1509.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3655
[2024-06-10 17:52:07,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.26 | bwd_microstep: 1325.66 | bwd_inner_microstep: 1325.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 17:52:09,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1297.55 | bwd_inner_microstep: 1297.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 17:52:11,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.45 | bwd_microstep: 1295.81 | bwd_inner_microstep: 1295.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3605
[2024-06-10 17:52:13,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.80 | bwd_microstep: 1557.70 | bwd_inner_microstep: 1557.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2008
[2024-06-10 17:52:14,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.35 | bwd_microstep: 834.40 | bwd_inner_microstep: 834.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 17:52:16,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1279.26 | bwd_inner_microstep: 1279.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2225
[2024-06-10 17:52:17,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.29 | bwd_microstep: 863.79 | bwd_inner_microstep: 863.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3615
[2024-06-10 17:52:19,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.25 | bwd_microstep: 1460.73 | bwd_inner_microstep: 1460.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 17:52:21,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1345.60 | bwd_inner_microstep: 1345.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 17:52:23,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1394.30 | bwd_inner_microstep: 1394.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3571
[2024-06-10 17:52:25,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.04 | bwd_microstep: 1661.53 | bwd_inner_microstep: 1661.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 17:52:27,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.13 | bwd_microstep: 1639.17 | bwd_inner_microstep: 1639.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3582
[2024-06-10 17:52:29,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.97 | bwd_microstep: 1529.68 | bwd_inner_microstep: 1529.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 17:52:36,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 17:52:36,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 5670.83 | bwd_inner_microstep: 1761.97 | bwd_allreduce_microstep: 3908.79 | step_microstep: 39.17
[2024-06-10 17:52:36,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15398.89 | bwd: 45183.09 | bwd_inner: 41273.38 | bwd_allreduce: 3909.03 | step: 40.72
{'loss': 1.2331, 'learning_rate': 1.586379706830586e-05, 'epoch': 0.58}
 58%|█████▊    | 996/1726 [17:11:02<12:25:11, 61.25s/it]
 58%|█████▊    | 997/1726 [17:12:05<12:29:03, 61.65s/it]
 58%|█████▊    | 998/1726 [17:13:09<12:36:04, 62.31s/it]
 58%|█████▊    | 999/1726 [17:14:12<12:36:31, 62.44s/it]
 58%|█████▊    | 1000/1726 [17:15:12<12:29:59, 61.98s/it]
[INFO|trainer.py:2936] 2024-06-10 17:52:38,621 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000
[INFO|configuration_utils.py:473] 2024-06-10 17:52:38,625 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/config.json
[INFO|configuration_utils.py:594] 2024-06-10 17:52:38,627 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-10 17:52:47,451 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-10 17:52:47,534 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-10 17:52:47,536 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-10 17:52:47,537 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/added_tokens.json
[2024-06-10 17:52:47,751] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is about to be saved!
[2024-06-10 17:52:47,763] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/mp_rank_00_model_states.pt
[2024-06-10 17:52:47,763] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/mp_rank_00_model_states.pt...
[2024-06-10 17:52:56,843] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/mp_rank_00_model_states.pt.
[2024-06-10 17:52:56,855] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-06-10 17:53:08,739] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-06-10 17:53:08,757] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-06-10 17:53:08,758] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1000 is ready now!
[INFO|trainer.py:3028] 2024-06-10 17:53:08,794 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-400] due to args.save_total_limit
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 17:53:11,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.15 | bwd_microstep: 1365.73 | bwd_inner_microstep: 1365.53 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 17:53:13,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.30 | bwd_microstep: 1483.85 | bwd_inner_microstep: 1483.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-10 17:53:15,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.64 | bwd_microstep: 1347.26 | bwd_inner_microstep: 1347.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795
[2024-06-10 17:53:17,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.51 | bwd_microstep: 1439.32 | bwd_inner_microstep: 1439.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 17:53:18,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.98 | bwd_microstep: 1334.64 | bwd_inner_microstep: 1334.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775
[2024-06-10 17:53:20,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.80 | bwd_microstep: 1440.86 | bwd_inner_microstep: 1440.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756
[2024-06-10 17:53:22,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1431.43 | bwd_inner_microstep: 1431.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 17:53:24,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.12 | bwd_microstep: 1147.85 | bwd_inner_microstep: 1147.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 17:53:26,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1284.43 | bwd_inner_microstep: 1284.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 17:53:27,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.29 | bwd_microstep: 788.04 | bwd_inner_microstep: 788.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 17:53:29,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.04 | bwd_microstep: 1482.15 | bwd_inner_microstep: 1482.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 17:53:31,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1243.58 | bwd_inner_microstep: 1243.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 17:53:33,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.91 | bwd_microstep: 1348.69 | bwd_inner_microstep: 1348.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695
[2024-06-10 17:53:35,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.18 | bwd_microstep: 1524.82 | bwd_inner_microstep: 1524.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 17:53:37,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.43 | bwd_microstep: 1476.80 | bwd_inner_microstep: 1476.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3639
[2024-06-10 17:53:39,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1342.49 | bwd_inner_microstep: 1342.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 17:53:40,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1349.18 | bwd_inner_microstep: 1349.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949
[2024-06-10 17:53:41,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.75 | bwd_microstep: 726.39 | bwd_inner_microstep: 726.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-10 17:53:43,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.28 | bwd_microstep: 801.12 | bwd_inner_microstep: 801.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1964
[2024-06-10 17:53:44,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.32 | bwd_microstep: 841.35 | bwd_inner_microstep: 841.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 17:53:46,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1556.81 | bwd_inner_microstep: 1556.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 17:53:48,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1387.22 | bwd_inner_microstep: 1387.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 17:53:50,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.87 | bwd_microstep: 1525.69 | bwd_inner_microstep: 1525.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 17:53:52,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.39 | bwd_microstep: 1460.26 | bwd_inner_microstep: 1460.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1998
[2024-06-10 17:53:53,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.18 | bwd_microstep: 896.99 | bwd_inner_microstep: 896.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2187
[2024-06-10 17:53:54,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.25 | bwd_microstep: 954.93 | bwd_inner_microstep: 954.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 17:53:56,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1391.03 | bwd_inner_microstep: 1391.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3822
[2024-06-10 17:53:59,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.04 | bwd_microstep: 1750.32 | bwd_inner_microstep: 1750.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3682
[2024-06-10 17:54:01,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.98 | bwd_microstep: 1587.52 | bwd_inner_microstep: 1587.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 17:54:03,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1284.10 | bwd_inner_microstep: 1284.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 17:54:05,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.11 | bwd_microstep: 1533.49 | bwd_inner_microstep: 1533.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 17:54:09,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 17:54:09,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 3536.41 | bwd_inner_microstep: 1751.46 | bwd_allreduce_microstep: 1784.90 | step_microstep: 38.01
[2024-06-10 17:54:09,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15785.95 | bwd: 44064.73 | bwd_inner: 42278.79 | bwd_allreduce: 1785.21 | step: 39.60
{'loss': 1.2108, 'learning_rate': 1.5827081854536237e-05, 'epoch': 0.58}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 17:54:11,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.59 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1278.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3933
[2024-06-10 17:54:13,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1494.41 | bwd_inner_microstep: 1494.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3878
[2024-06-10 17:54:15,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.48 | bwd_microstep: 1681.34 | bwd_inner_microstep: 1681.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1939
[2024-06-10 17:54:16,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.41 | bwd_microstep: 853.30 | bwd_inner_microstep: 853.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3773
[2024-06-10 17:54:18,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.29 | bwd_microstep: 1402.33 | bwd_inner_microstep: 1402.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 17:54:20,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1383.70 | bwd_inner_microstep: 1383.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794
[2024-06-10 17:54:22,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1549.97 | bwd_inner_microstep: 1549.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3479
[2024-06-10 17:54:24,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1404.39 | bwd_inner_microstep: 1404.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3611
[2024-06-10 17:54:27,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.23 | bwd_microstep: 1655.57 | bwd_inner_microstep: 1655.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3496
[2024-06-10 17:54:29,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.17 | bwd_microstep: 1717.35 | bwd_inner_microstep: 1717.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2103
[2024-06-10 17:54:30,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.19 | bwd_microstep: 1013.16 | bwd_inner_microstep: 1013.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 17:54:32,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.97 | bwd_microstep: 1487.33 | bwd_inner_microstep: 1487.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2056
[2024-06-10 17:54:34,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.95 | bwd_microstep: 913.59 | bwd_inner_microstep: 913.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 17:54:35,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.25 | bwd_microstep: 800.55 | bwd_inner_microstep: 800.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 17:54:37,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.62 | bwd_microstep: 1484.18 | bwd_inner_microstep: 1484.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-10 17:54:38,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.45 | bwd_microstep: 796.93 | bwd_inner_microstep: 796.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-10 17:54:39,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.37 | bwd_microstep: 1193.01 | bwd_inner_microstep: 1192.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 17:54:41,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1397.21 | bwd_inner_microstep: 1397.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 17:54:43,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.47 | bwd_microstep: 974.62 | bwd_inner_microstep: 974.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839
[2024-06-10 17:54:45,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.84 | bwd_microstep: 1460.20 | bwd_inner_microstep: 1460.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 17:54:47,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1256.22 | bwd_inner_microstep: 1256.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991
[2024-06-10 17:54:48,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.02 | bwd_microstep: 800.18 | bwd_inner_microstep: 800.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 17:54:50,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1516.38 | bwd_inner_microstep: 1516.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 17:54:52,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1399.49 | bwd_inner_microstep: 1399.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 17:54:54,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.22 | bwd_microstep: 1664.14 | bwd_inner_microstep: 1664.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 17:54:56,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1351.66 | bwd_inner_microstep: 1351.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3582
[2024-06-10 17:54:58,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.21 | bwd_microstep: 1253.64 | bwd_inner_microstep: 1253.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447
[2024-06-10 17:55:00,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.29 | bwd_microstep: 1442.97 | bwd_inner_microstep: 1442.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 17:55:02,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.90 | bwd_microstep: 1647.88 | bwd_inner_microstep: 1647.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2292
[2024-06-10 17:55:03,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.72 | bwd_microstep: 1073.79 | bwd_inner_microstep: 1073.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3440
[2024-06-10 17:55:10,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.11 | bwd_microstep: 1611.44 | bwd_inner_microstep: 1611.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2811
[2024-06-10 17:55:16,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 20.32 | optimizer_gradients: 4.32 | optimizer_step: 6.59
[2024-06-10 17:55:16,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.74 | bwd_microstep: 6208.04 | bwd_inner_microstep: 1303.45 | bwd_allreduce_microstep: 4904.53 | step_microstep: 41.94
[2024-06-10 17:55:16,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15721.17 | bwd: 47167.02 | bwd_inner: 42261.58 | bwd_allreduce: 4904.76 | step: 43.42
{'loss': 1.2651, 'learning_rate': 1.5790381337762643e-05, 'epoch': 0.58}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 17:55:18,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.66 | bwd_microstep: 1266.77 | bwd_inner_microstep: 1266.67 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3554
[2024-06-10 17:55:20,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1228.35 | bwd_inner_microstep: 1228.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 17:55:22,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.46 | bwd_microstep: 1284.78 | bwd_inner_microstep: 1284.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3925
[2024-06-10 17:55:24,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.96 | bwd_microstep: 1692.95 | bwd_inner_microstep: 1692.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928
[2024-06-10 17:55:25,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.24 | bwd_microstep: 726.17 | bwd_inner_microstep: 726.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 17:55:27,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.16 | bwd_microstep: 1244.81 | bwd_inner_microstep: 1244.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1873
[2024-06-10 17:55:28,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.10 | bwd_microstep: 678.11 | bwd_inner_microstep: 678.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2888
[2024-06-10 17:55:29,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.25 | bwd_microstep: 1022.65 | bwd_inner_microstep: 1022.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 17:55:31,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1411.49 | bwd_inner_microstep: 1411.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3760
[2024-06-10 17:55:33,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.12 | bwd_microstep: 1604.42 | bwd_inner_microstep: 1604.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 17:55:35,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1383.53 | bwd_inner_microstep: 1383.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3492
[2024-06-10 17:55:37,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.87 | bwd_microstep: 1442.22 | bwd_inner_microstep: 1442.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-10 17:55:38,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.27 | bwd_microstep: 807.70 | bwd_inner_microstep: 807.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3629
[2024-06-10 17:55:40,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.84 | bwd_microstep: 1533.53 | bwd_inner_microstep: 1533.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 17:55:42,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.24 | bwd_microstep: 1281.97 | bwd_inner_microstep: 1281.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495
[2024-06-10 17:55:44,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.63 | bwd_microstep: 1190.10 | bwd_inner_microstep: 1190.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1932
[2024-06-10 17:55:45,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.60 | bwd_microstep: 848.08 | bwd_inner_microstep: 848.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2117
[2024-06-10 17:55:46,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.76 | bwd_microstep: 922.63 | bwd_inner_microstep: 922.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-10 17:55:49,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.51 | bwd_microstep: 1618.39 | bwd_inner_microstep: 1618.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2962
[2024-06-10 17:55:50,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.03 | bwd_microstep: 1101.44 | bwd_inner_microstep: 1101.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2132
[2024-06-10 17:55:51,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.03 | bwd_microstep: 834.83 | bwd_inner_microstep: 834.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 17:55:53,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1484.24 | bwd_inner_microstep: 1484.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3607
[2024-06-10 17:55:55,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1341.37 | bwd_inner_microstep: 1341.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 17:55:57,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1404.65 | bwd_inner_microstep: 1404.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594
[2024-06-10 17:55:59,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.36 | bwd_microstep: 1306.68 | bwd_inner_microstep: 1306.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 17:56:01,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.94 | bwd_microstep: 1280.85 | bwd_inner_microstep: 1280.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3555
[2024-06-10 17:56:03,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.36 | bwd_microstep: 1347.70 | bwd_inner_microstep: 1347.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557
[2024-06-10 17:56:05,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.06 | bwd_microstep: 1420.58 | bwd_inner_microstep: 1420.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3570
[2024-06-10 17:56:06,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1361.22 | bwd_inner_microstep: 1361.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2238
[2024-06-10 17:56:08,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.65 | bwd_microstep: 863.46 | bwd_inner_microstep: 863.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2003
[2024-06-10 17:56:09,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.68 | bwd_microstep: 737.87 | bwd_inner_microstep: 737.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 17:56:17,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 17:56:17,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.90 | bwd_microstep: 8016.27 | bwd_inner_microstep: 1729.70 | bwd_allreduce_microstep: 6286.51 | step_microstep: 38.26
[2024-06-10 17:56:17,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14773.07 | bwd: 45689.85 | bwd_inner: 39402.34 | bwd_allreduce: 6286.80 | step: 39.82
{'loss': 1.2468, 'learning_rate': 1.5753695647244083e-05, 'epoch': 0.58}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 17:56:19,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.72 | bwd_microstep: 1340.76 | bwd_inner_microstep: 1340.55 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.13
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863
[2024-06-10 17:56:20,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.14 | bwd_microstep: 674.37 | bwd_inner_microstep: 674.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 17:56:22,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.11 | bwd_microstep: 1345.19 | bwd_inner_microstep: 1345.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1863
[2024-06-10 17:56:23,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.20 | bwd_microstep: 739.16 | bwd_inner_microstep: 739.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 17:56:25,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.49 | bwd_microstep: 1243.77 | bwd_inner_microstep: 1243.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 17:56:27,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1394.54 | bwd_inner_microstep: 1394.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 17:56:28,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3701
[2024-06-10 17:56:30,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1291.77 | bwd_inner_microstep: 1291.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 4032
[2024-06-10 17:56:32,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.57 | bwd_microstep: 1494.21 | bwd_inner_microstep: 1494.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 17:56:34,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1448.89 | bwd_inner_microstep: 1448.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 17:56:36,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.05 | bwd_microstep: 1449.62 | bwd_inner_microstep: 1449.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3279
[2024-06-10 17:56:38,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.96 | bwd_microstep: 1284.37 | bwd_inner_microstep: 1284.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507
[2024-06-10 17:56:40,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1448.18 | bwd_inner_microstep: 1448.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 17:56:42,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1391.86 | bwd_inner_microstep: 1391.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 17:56:44,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.99 | bwd_microstep: 1511.91 | bwd_inner_microstep: 1511.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 17:56:46,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1508.99 | bwd_inner_microstep: 1508.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3516
[2024-06-10 17:56:48,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.47 | bwd_microstep: 1224.03 | bwd_inner_microstep: 1224.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 17:56:50,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1392.05 | bwd_inner_microstep: 1392.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-10 17:56:52,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1299.99 | bwd_inner_microstep: 1299.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 17:56:53,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.25 | bwd_microstep: 1294.75 | bwd_inner_microstep: 1294.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 17:56:55,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.63 | bwd_microstep: 1391.29 | bwd_inner_microstep: 1391.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 17:56:57,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1389.56 | bwd_inner_microstep: 1389.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 17:56:59,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1294.45 | bwd_inner_microstep: 1294.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3708
[2024-06-10 17:57:01,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.67 | bwd_microstep: 1667.82 | bwd_inner_microstep: 1667.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2270
[2024-06-10 17:57:02,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.70 | bwd_microstep: 811.46 | bwd_inner_microstep: 811.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 17:57:05,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.52 | bwd_microstep: 1602.75 | bwd_inner_microstep: 1602.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 17:57:07,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.27 | bwd_microstep: 1562.70 | bwd_inner_microstep: 1562.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 17:57:09,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.90 | bwd_microstep: 1496.78 | bwd_inner_microstep: 1496.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3620
[2024-06-10 17:57:11,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.04 | bwd_microstep: 1675.27 | bwd_inner_microstep: 1675.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3566
[2024-06-10 17:57:13,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.79 | bwd_microstep: 1331.65 | bwd_inner_microstep: 1331.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 17:57:15,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.63 | bwd_microstep: 1335.13 | bwd_inner_microstep: 1335.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3439
[2024-06-10 17:57:21,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-10 17:57:21,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.38 | bwd_microstep: 5230.48 | bwd_inner_microstep: 1582.74 | bwd_allreduce_microstep: 3647.69 | step_microstep: 37.82
[2024-06-10 17:57:21,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16153.41 | bwd: 46948.34 | bwd_inner: 43299.60 | bwd_allreduce: 3648.00 | step: 39.41
{'loss': 1.2429, 'learning_rate': 1.571702491218738e-05, 'epoch': 0.58}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3396
[2024-06-10 17:57:22,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.88 | bwd_microstep: 1273.22 | bwd_inner_microstep: 1273.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 17:57:24,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1241.79 | bwd_inner_microstep: 1241.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3899
[2024-06-10 17:57:26,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.42 | bwd_microstep: 1482.26 | bwd_inner_microstep: 1482.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2915
[2024-06-10 17:57:28,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.14 | bwd_microstep: 1127.27 | bwd_inner_microstep: 1127.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 17:57:30,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.50 | bwd_microstep: 1638.21 | bwd_inner_microstep: 1638.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 17:57:32,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.05 | bwd_microstep: 1483.42 | bwd_inner_microstep: 1483.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3576
[2024-06-10 17:57:34,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.32 | bwd_microstep: 1205.33 | bwd_inner_microstep: 1205.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-10 17:57:35,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.83 | bwd_microstep: 700.38 | bwd_inner_microstep: 700.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 17:57:37,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1385.59 | bwd_inner_microstep: 1385.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 17:57:39,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.35 | bwd_microstep: 1428.52 | bwd_inner_microstep: 1428.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 17:57:40,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.13 | bwd_microstep: 1246.93 | bwd_inner_microstep: 1246.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-10 17:57:41,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.27 | bwd_microstep: 797.50 | bwd_inner_microstep: 797.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 17:57:43,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1344.66 | bwd_inner_microstep: 1344.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409
[2024-06-10 17:57:45,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1310.42 | bwd_inner_microstep: 1310.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 17:57:47,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1513.84 | bwd_inner_microstep: 1513.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3404
[2024-06-10 17:57:49,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.24 | bwd_microstep: 1436.29 | bwd_inner_microstep: 1436.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 17:57:51,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.56 | bwd_microstep: 1585.78 | bwd_inner_microstep: 1585.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-10 17:57:52,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.85 | bwd_microstep: 800.91 | bwd_inner_microstep: 800.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610
[2024-06-10 17:57:54,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.74 | bwd_microstep: 1346.11 | bwd_inner_microstep: 1346.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-10 17:57:55,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.29 | bwd_microstep: 805.02 | bwd_inner_microstep: 805.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3807
[2024-06-10 17:57:58,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.10 | bwd_microstep: 1486.01 | bwd_inner_microstep: 1485.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623
[2024-06-10 17:58:00,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.63 | bwd_microstep: 1606.68 | bwd_inner_microstep: 1606.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3825
[2024-06-10 17:58:02,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.69 | bwd_microstep: 1486.50 | bwd_inner_microstep: 1486.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 17:58:04,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.12 | bwd_microstep: 1300.82 | bwd_inner_microstep: 1300.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 17:58:06,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.77 | bwd_microstep: 1489.49 | bwd_inner_microstep: 1489.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3591
[2024-06-10 17:58:08,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.86 | bwd_microstep: 1762.69 | bwd_inner_microstep: 1762.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3607
[2024-06-10 17:58:10,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.87 | bwd_microstep: 1706.82 | bwd_inner_microstep: 1706.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-10 17:58:12,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.26 | bwd_microstep: 1401.80 | bwd_inner_microstep: 1401.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2051
[2024-06-10 17:58:13,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.24 | bwd_microstep: 752.40 | bwd_inner_microstep: 752.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3573
[2024-06-10 17:58:15,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.70 | bwd_microstep: 1364.92 | bwd_inner_microstep: 1364.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 17:58:17,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 1433.36 | bwd_inner_microstep: 1433.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577
[2024-06-10 17:58:21,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.63
[2024-06-10 17:58:21,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.31 | bwd_microstep: 3373.73 | bwd_inner_microstep: 1612.23 | bwd_allreduce_microstep: 1761.44 | step_microstep: 37.85
[2024-06-10 17:58:21,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15871.68 | bwd: 44318.67 | bwd_inner: 42556.33 | bwd_allreduce: 1761.67 | step: 39.43
{'loss': 1.2446, 'learning_rate': 1.5680369261746674e-05, 'epoch': 0.58}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1926
[2024-06-10 17:58:22,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.56 | bwd_microstep: 815.47 | bwd_inner_microstep: 815.40 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 17:58:23,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.49 | bwd_microstep: 793.62 | bwd_inner_microstep: 793.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 17:58:25,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.76 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 17:58:27,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.67 | bwd_microstep: 1252.40 | bwd_inner_microstep: 1252.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408
[2024-06-10 17:58:29,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.00 | bwd_microstep: 1183.02 | bwd_inner_microstep: 1182.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3586
[2024-06-10 17:58:31,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.70 | bwd_microstep: 1435.87 | bwd_inner_microstep: 1435.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 17:58:33,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.57 | bwd_microstep: 1291.19 | bwd_inner_microstep: 1291.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-10 17:58:34,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.02 | bwd_microstep: 1314.02 | bwd_inner_microstep: 1313.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 17:58:36,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1532.56 | bwd_inner_microstep: 1532.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 17:58:38,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.14 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 17:58:40,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.15 | bwd_microstep: 1161.29 | bwd_inner_microstep: 1161.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3947
[2024-06-10 17:58:42,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.47 | bwd_microstep: 1603.74 | bwd_inner_microstep: 1603.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3483
[2024-06-10 17:58:45,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.06 | bwd_microstep: 1679.01 | bwd_inner_microstep: 1678.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3642
[2024-06-10 17:58:47,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.78 | bwd_microstep: 1678.22 | bwd_inner_microstep: 1678.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3510
[2024-06-10 17:58:48,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.18 | bwd_microstep: 1193.63 | bwd_inner_microstep: 1193.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 17:58:51,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.33 | bwd_microstep: 1660.10 | bwd_inner_microstep: 1660.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 17:58:53,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.02 | bwd_microstep: 1404.78 | bwd_inner_microstep: 1404.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3820
[2024-06-10 17:58:55,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.53 | bwd_microstep: 1506.64 | bwd_inner_microstep: 1506.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 17:58:57,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1287.08 | bwd_inner_microstep: 1287.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 17:58:58,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.51 | bwd_microstep: 1357.50 | bwd_inner_microstep: 1357.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-10 17:59:00,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.49 | bwd_microstep: 1450.91 | bwd_inner_microstep: 1450.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629
[2024-06-10 17:59:02,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1317.05 | bwd_inner_microstep: 1317.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584
[2024-06-10 17:59:04,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.22 | bwd_microstep: 1423.57 | bwd_inner_microstep: 1423.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1996
[2024-06-10 17:59:05,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.28 | bwd_microstep: 832.61 | bwd_inner_microstep: 832.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2286
[2024-06-10 17:59:07,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.20 | bwd_microstep: 910.70 | bwd_inner_microstep: 910.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3637
[2024-06-10 17:59:09,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.84 | bwd_microstep: 1580.04 | bwd_inner_microstep: 1580.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 17:59:11,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.05 | bwd_microstep: 1303.78 | bwd_inner_microstep: 1303.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-10 17:59:12,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.81 | bwd_microstep: 973.14 | bwd_inner_microstep: 973.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878
[2024-06-10 17:59:14,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 1587.86 | bwd_inner_microstep: 1587.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3569
[2024-06-10 17:59:16,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.86 | bwd_microstep: 1697.10 | bwd_inner_microstep: 1697.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3831
[2024-06-10 17:59:19,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.84 | bwd_microstep: 1755.76 | bwd_inner_microstep: 1755.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3420
[2024-06-10 17:59:41,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-10 17:59:41,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.63 | bwd_microstep: 21864.03 | bwd_inner_microstep: 1761.36 | bwd_allreduce_microstep: 20102.61 | step_microstep: 38.59
[2024-06-10 17:59:41,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16185.81 | bwd: 63615.31 | bwd_inner: 43511.75 | bwd_allreduce: 20102.87 | step: 40.12

 58%|█████▊    | 1001/1726 [17:16:46<14:22:17, 71.36s/it]
 58%|█████▊    | 1002/1726 [17:17:53<14:07:01, 70.19s/it]
 58%|█████▊    | 1003/1726 [17:18:54<13:31:51, 67.37s/it]
 58%|█████▊    | 1004/1726 [17:19:57<13:16:32, 66.20s/it]
 58%|█████▊    | 1005/1726 [17:20:58<12:55:02, 64.50s/it]
{'loss': 1.215, 'learning_rate': 1.564372882502297e-05, 'epoch': 0.58}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3542
[2024-06-10 17:59:44,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.16 | bwd_microstep: 1572.17 | bwd_inner_microstep: 1572.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3986
[2024-06-10 17:59:46,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.66 | bwd_microstep: 1696.77 | bwd_inner_microstep: 1696.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 17:59:48,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.01 | bwd_microstep: 1640.51 | bwd_inner_microstep: 1640.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 17:59:50,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.43 | bwd_microstep: 1540.98 | bwd_inner_microstep: 1540.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3398
[2024-06-10 17:59:52,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.48 | bwd_microstep: 1145.03 | bwd_inner_microstep: 1145.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 17:59:54,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.39 | bwd_microstep: 1276.68 | bwd_inner_microstep: 1276.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 17:59:56,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2172
[2024-06-10 17:59:57,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.68 | bwd_microstep: 945.52 | bwd_inner_microstep: 945.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688
[2024-06-10 17:59:59,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.36 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397
[2024-06-10 18:00:01,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.50 | bwd_microstep: 1429.75 | bwd_inner_microstep: 1429.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3435
[2024-06-10 18:00:03,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.15 | bwd_microstep: 1306.68 | bwd_inner_microstep: 1306.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3433
[2024-06-10 18:00:05,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1443.40 | bwd_inner_microstep: 1443.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513
[2024-06-10 18:00:07,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.42 | bwd_microstep: 1428.76 | bwd_inner_microstep: 1428.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3489
[2024-06-10 18:00:08,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.99 | bwd_microstep: 1219.00 | bwd_inner_microstep: 1218.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1974
[2024-06-10 18:00:09,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.02 | bwd_microstep: 827.85 | bwd_inner_microstep: 827.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2110
[2024-06-10 18:00:11,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.97 | bwd_microstep: 918.72 | bwd_inner_microstep: 918.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2301
[2024-06-10 18:00:12,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.18 | bwd_microstep: 910.94 | bwd_inner_microstep: 910.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-10 18:00:14,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.41 | bwd_microstep: 1314.14 | bwd_inner_microstep: 1314.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-10 18:00:16,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.35 | bwd_microstep: 1300.84 | bwd_inner_microstep: 1300.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465
[2024-06-10 18:00:17,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.34 | bwd_microstep: 1185.28 | bwd_inner_microstep: 1185.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450
[2024-06-10 18:00:19,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1283.64 | bwd_inner_microstep: 1283.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1982
[2024-06-10 18:00:20,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.90 | bwd_microstep: 705.36 | bwd_inner_microstep: 705.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 18:00:22,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.07 | bwd_microstep: 1390.17 | bwd_inner_microstep: 1390.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 18:00:24,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.22 | bwd_microstep: 1611.51 | bwd_inner_microstep: 1611.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-10 18:00:26,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.27 | bwd_microstep: 1529.67 | bwd_inner_microstep: 1529.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3734
[2024-06-10 18:00:28,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.20 | bwd_microstep: 1457.45 | bwd_inner_microstep: 1457.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 18:00:30,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.59 | bwd_microstep: 1354.71 | bwd_inner_microstep: 1354.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 18:00:32,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.53 | bwd_microstep: 1491.12 | bwd_inner_microstep: 1491.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2174
[2024-06-10 18:00:33,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.44 | bwd_microstep: 887.79 | bwd_inner_microstep: 887.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2202
[2024-06-10 18:00:35,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.74 | bwd_microstep: 1053.89 | bwd_inner_microstep: 1053.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-10 18:00:37,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.30 | bwd_microstep: 1640.77 | bwd_inner_microstep: 1640.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3767
[2024-06-10 18:00:42,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.48 | optimizer_step: 6.62
[2024-06-10 18:00:42,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.59 | bwd_microstep: 4619.59 | bwd_inner_microstep: 1915.65 | bwd_allreduce_microstep: 2703.85 | step_microstep: 42.79
[2024-06-10 18:00:42,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15712.68 | bwd: 44934.24 | bwd_inner: 42229.44 | bwd_allreduce: 2704.10 | step: 44.26
{'loss': 1.1875, 'learning_rate': 1.5607103731063708e-05, 'epoch': 0.58}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3493
[2024-06-10 18:00:44,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.28 | bwd_microstep: 1336.43 | bwd_inner_microstep: 1336.32 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 18:00:46,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1276.56 | bwd_inner_microstep: 1276.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1935
[2024-06-10 18:00:47,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.37 | bwd_microstep: 756.19 | bwd_inner_microstep: 756.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3835
[2024-06-10 18:00:49,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.64 | bwd_microstep: 1602.33 | bwd_inner_microstep: 1602.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479
[2024-06-10 18:00:51,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.80 | bwd_microstep: 1314.62 | bwd_inner_microstep: 1314.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 18:00:53,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.04 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 18:00:54,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.22 | bwd_microstep: 793.79 | bwd_inner_microstep: 793.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441
[2024-06-10 18:00:56,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.75 | bwd_microstep: 1148.71 | bwd_inner_microstep: 1148.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 18:00:57,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.12 | bwd_microstep: 1285.43 | bwd_inner_microstep: 1285.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 18:00:59,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1516.81 | bwd_inner_microstep: 1516.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3651
[2024-06-10 18:01:01,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.64 | bwd_microstep: 1471.58 | bwd_inner_microstep: 1471.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614
[2024-06-10 18:01:04,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.01 | bwd_microstep: 1602.06 | bwd_inner_microstep: 1602.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 18:01:06,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1391.10 | bwd_inner_microstep: 1391.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 18:01:07,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1245.78 | bwd_inner_microstep: 1245.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 18:01:09,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.29 | bwd_microstep: 1386.21 | bwd_inner_microstep: 1386.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 18:01:11,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1485.35 | bwd_inner_microstep: 1485.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 18:01:13,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1513.29 | bwd_inner_microstep: 1513.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 18:01:15,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.81 | bwd_microstep: 1390.53 | bwd_inner_microstep: 1390.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 18:01:17,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1558.23 | bwd_inner_microstep: 1558.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 18:01:19,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.24 | bwd_microstep: 1495.50 | bwd_inner_microstep: 1495.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 18:01:21,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1458.78 | bwd_inner_microstep: 1458.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2140
[2024-06-10 18:01:23,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.52 | bwd_microstep: 738.08 | bwd_inner_microstep: 738.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3832
[2024-06-10 18:01:24,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.39 | bwd_microstep: 1295.71 | bwd_inner_microstep: 1295.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 18:01:26,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1383.12 | bwd_inner_microstep: 1383.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2273
[2024-06-10 18:01:28,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.80 | bwd_microstep: 1005.36 | bwd_inner_microstep: 1005.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3606
[2024-06-10 18:01:30,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.22 | bwd_microstep: 1370.81 | bwd_inner_microstep: 1370.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 18:01:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1494.07 | bwd_inner_microstep: 1494.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2922
[2024-06-10 18:01:33,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.46 | bwd_microstep: 1187.24 | bwd_inner_microstep: 1187.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3381
[2024-06-10 18:01:35,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.74 | bwd_microstep: 1436.12 | bwd_inner_microstep: 1436.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 18:01:37,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.72 | bwd_microstep: 1286.10 | bwd_inner_microstep: 1286.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 18:01:39,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 1379.36 | bwd_inner_microstep: 1379.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 18:01:45,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 18:01:45,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.29 | bwd_microstep: 4969.03 | bwd_inner_microstep: 1866.52 | bwd_allreduce_microstep: 3102.46 | step_microstep: 38.07
[2024-06-10 18:01:45,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15957.28 | bwd: 45861.08 | bwd_inner: 42757.61 | bwd_allreduce: 3102.74 | step: 39.57
{'loss': 1.1487, 'learning_rate': 1.5570494108862256e-05, 'epoch': 0.58}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 18:01:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.34 | bwd_microstep: 1467.89 | bwd_inner_microstep: 1467.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4012
[2024-06-10 18:01:49,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.62 | bwd_microstep: 1605.64 | bwd_inner_microstep: 1605.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 18:01:51,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.60 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 18:01:53,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.80 | bwd_microstep: 1549.33 | bwd_inner_microstep: 1549.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 18:01:55,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.69 | bwd_microstep: 1534.16 | bwd_inner_microstep: 1534.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-10 18:01:56,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.36 | bwd_microstep: 808.61 | bwd_inner_microstep: 808.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3508
[2024-06-10 18:01:58,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.48 | bwd_microstep: 1219.93 | bwd_inner_microstep: 1219.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2209
[2024-06-10 18:01:59,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.41 | bwd_microstep: 954.48 | bwd_inner_microstep: 954.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 18:02:01,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1386.06 | bwd_inner_microstep: 1386.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 18:02:03,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1386.98 | bwd_inner_microstep: 1386.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 18:02:05,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.26 | bwd_microstep: 1282.43 | bwd_inner_microstep: 1282.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 18:02:07,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.56 | bwd_microstep: 1348.74 | bwd_inner_microstep: 1348.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3664
[2024-06-10 18:02:09,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.76 | bwd_microstep: 1655.73 | bwd_inner_microstep: 1655.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3563
[2024-06-10 18:02:11,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.14 | bwd_microstep: 1457.34 | bwd_inner_microstep: 1457.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401
[2024-06-10 18:02:13,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.13 | bwd_microstep: 1369.29 | bwd_inner_microstep: 1369.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-10 18:02:15,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.54 | bwd_microstep: 1344.81 | bwd_inner_microstep: 1344.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 18:02:17,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.07 | bwd_microstep: 1192.96 | bwd_inner_microstep: 1192.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3523
[2024-06-10 18:02:18,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.50 | bwd_microstep: 1228.61 | bwd_inner_microstep: 1228.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-10 18:02:21,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.00 | bwd_microstep: 1639.77 | bwd_inner_microstep: 1639.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3627
[2024-06-10 18:02:23,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.22 | bwd_microstep: 1574.44 | bwd_inner_microstep: 1574.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 18:02:25,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.71 | bwd_microstep: 1488.72 | bwd_inner_microstep: 1488.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472
[2024-06-10 18:02:27,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.89 | bwd_microstep: 1342.59 | bwd_inner_microstep: 1342.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 18:02:29,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1495.62 | bwd_inner_microstep: 1495.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288
[2024-06-10 18:02:30,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.05 | bwd_microstep: 877.90 | bwd_inner_microstep: 877.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 18:02:32,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1405.45 | bwd_inner_microstep: 1405.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 18:02:34,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.73 | bwd_microstep: 1498.16 | bwd_inner_microstep: 1498.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751
[2024-06-10 18:02:36,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.28 | bwd_microstep: 1450.15 | bwd_inner_microstep: 1450.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 18:02:38,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1289.28 | bwd_inner_microstep: 1289.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2269
[2024-06-10 18:02:39,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.93 | bwd_microstep: 1005.79 | bwd_inner_microstep: 1005.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3733
[2024-06-10 18:02:41,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.46 | bwd_microstep: 1737.37 | bwd_inner_microstep: 1737.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568
[2024-06-10 18:02:43,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.93 | bwd_microstep: 1299.77 | bwd_inner_microstep: 1299.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576
[2024-06-10 18:02:47,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.17 | optimizer_step: 6.57
[2024-06-10 18:02:47,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.68 | bwd_microstep: 2819.09 | bwd_inner_microstep: 1708.86 | bwd_allreduce_microstep: 1110.18 | step_microstep: 37.91
[2024-06-10 18:02:47,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16421.29 | bwd: 45374.20 | bwd_inner: 44263.12 | bwd_allreduce: 1110.40 | step: 39.37
{'loss': 1.2081, 'learning_rate': 1.5533900087357527e-05, 'epoch': 0.58}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 18:02:48,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.28 | bwd_microstep: 1247.91 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 18:02:50,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.73 | bwd_microstep: 1408.26 | bwd_inner_microstep: 1408.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3900
[2024-06-10 18:02:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1584.79 | bwd_inner_microstep: 1584.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869
[2024-06-10 18:02:53,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.72 | bwd_microstep: 706.44 | bwd_inner_microstep: 706.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 18:02:55,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.33 | bwd_microstep: 790.00 | bwd_inner_microstep: 789.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 18:02:57,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1483.99 | bwd_inner_microstep: 1483.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569
[2024-06-10 18:02:58,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.84 | bwd_microstep: 1204.73 | bwd_inner_microstep: 1204.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 18:03:00,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1532.20 | bwd_inner_microstep: 1532.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-10 18:03:03,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.58 | bwd_microstep: 1632.23 | bwd_inner_microstep: 1632.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 18:03:05,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.92 | bwd_microstep: 1390.62 | bwd_inner_microstep: 1390.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 18:03:07,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.64 | bwd_microstep: 1529.80 | bwd_inner_microstep: 1529.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-10 18:03:09,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.91 | bwd_microstep: 1413.92 | bwd_inner_microstep: 1413.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 18:03:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.62 | bwd_microstep: 1439.37 | bwd_inner_microstep: 1439.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 18:03:12,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1256.08 | bwd_inner_microstep: 1256.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2150
[2024-06-10 18:03:14,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.26 | bwd_microstep: 1045.32 | bwd_inner_microstep: 1045.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3635
[2024-06-10 18:03:16,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.22 | bwd_microstep: 1424.17 | bwd_inner_microstep: 1424.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3427
[2024-06-10 18:03:18,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.64 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1315.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1851
[2024-06-10 18:03:19,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.44 | bwd_microstep: 671.93 | bwd_inner_microstep: 671.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 18:03:20,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.32 | bwd_microstep: 1439.85 | bwd_inner_microstep: 1439.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 18:03:22,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.20 | bwd_microstep: 1353.44 | bwd_inner_microstep: 1353.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 18:03:24,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.49 | bwd_microstep: 1364.66 | bwd_inner_microstep: 1364.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3440
[2024-06-10 18:03:26,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1425.79 | bwd_inner_microstep: 1425.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2008
[2024-06-10 18:03:27,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.16 | bwd_microstep: 772.53 | bwd_inner_microstep: 772.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 18:03:29,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.26 | bwd_microstep: 974.64 | bwd_inner_microstep: 974.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 18:03:31,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1455.29 | bwd_inner_microstep: 1455.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 18:03:33,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.82 | bwd_microstep: 1544.99 | bwd_inner_microstep: 1544.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1969
[2024-06-10 18:03:34,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.73 | bwd_microstep: 703.60 | bwd_inner_microstep: 703.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 18:03:35,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.77 | bwd_microstep: 978.13 | bwd_inner_microstep: 978.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-10 18:03:37,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.04 | bwd_microstep: 1587.72 | bwd_inner_microstep: 1587.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 18:03:39,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 18:03:41,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.62 | bwd_microstep: 1647.10 | bwd_inner_microstep: 1647.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037
[2024-06-10 18:03:47,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 18:03:47,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.66 | bwd_microstep: 5698.04 | bwd_inner_microstep: 929.83 | bwd_allreduce_microstep: 4768.16 | step_microstep: 37.92
[2024-06-10 18:03:47,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15116.87 | bwd: 45362.38 | bwd_inner: 40593.32 | bwd_allreduce: 4768.39 | step: 39.35
{'loss': 1.1692, 'learning_rate': 1.5497321795433474e-05, 'epoch': 0.59}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481
[2024-06-10 18:03:50,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.72 | bwd_microstep: 1570.19 | bwd_inner_microstep: 1570.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3975
[2024-06-10 18:03:52,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.11 | bwd_microstep: 1601.90 | bwd_inner_microstep: 1601.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 18:03:53,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.23 | bwd_microstep: 694.20 | bwd_inner_microstep: 694.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 18:03:55,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.09 | bwd_microstep: 1406.05 | bwd_inner_microstep: 1406.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 18:03:56,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.76 | bwd_microstep: 797.09 | bwd_inner_microstep: 797.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 18:03:58,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.39 | bwd_microstep: 1243.81 | bwd_inner_microstep: 1243.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3426
[2024-06-10 18:03:59,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.50 | bwd_microstep: 1186.43 | bwd_inner_microstep: 1186.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 18:04:01,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.40 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3438
[2024-06-10 18:04:03,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.58 | bwd_microstep: 1188.92 | bwd_inner_microstep: 1188.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3578
[2024-06-10 18:04:04,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1270.49 | bwd_inner_microstep: 1270.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2062
[2024-06-10 18:04:06,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.52 | bwd_microstep: 863.44 | bwd_inner_microstep: 863.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646
[2024-06-10 18:04:08,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.25 | bwd_microstep: 1412.10 | bwd_inner_microstep: 1412.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1985
[2024-06-10 18:04:09,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.62 | bwd_microstep: 894.93 | bwd_inner_microstep: 894.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-10 18:04:11,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1401.41 | bwd_inner_microstep: 1401.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3652
[2024-06-10 18:04:13,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.97 | bwd_microstep: 1447.64 | bwd_inner_microstep: 1447.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 18:04:15,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.19 | bwd_microstep: 1386.33 | bwd_inner_microstep: 1386.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 18:04:17,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1486.97 | bwd_inner_microstep: 1486.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-10 18:04:19,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.20 | bwd_microstep: 1603.49 | bwd_inner_microstep: 1603.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3398
[2024-06-10 18:04:21,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.53 | bwd_microstep: 1402.37 | bwd_inner_microstep: 1402.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 18:04:23,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.37 | bwd_microstep: 1358.31 | bwd_inner_microstep: 1358.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 18:04:25,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.70 | bwd_microstep: 1295.65 | bwd_inner_microstep: 1295.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3815
[2024-06-10 18:04:27,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.84 | bwd_microstep: 1616.81 | bwd_inner_microstep: 1616.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 18:04:29,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.96 | bwd_microstep: 1399.05 | bwd_inner_microstep: 1399.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 18:04:31,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.78 | bwd_microstep: 1409.11 | bwd_inner_microstep: 1409.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848
[2024-06-10 18:04:33,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.94 | bwd_microstep: 1561.20 | bwd_inner_microstep: 1561.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275
[2024-06-10 18:04:34,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.83 | bwd_microstep: 880.70 | bwd_inner_microstep: 880.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 18:04:36,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.17 | bwd_microstep: 1418.27 | bwd_inner_microstep: 1418.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3446
[2024-06-10 18:04:38,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.55 | bwd_microstep: 1312.85 | bwd_inner_microstep: 1312.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820
[2024-06-10 18:04:40,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.71 | bwd_microstep: 1480.46 | bwd_inner_microstep: 1480.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591
[2024-06-10 18:04:42,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.62 | bwd_microstep: 1600.47 | bwd_inner_microstep: 1600.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3569
[2024-06-10 18:04:44,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1555.10 | bwd_inner_microstep: 1555.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584
[2024-06-10 18:04:51,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.31 | optimizer_step: 6.59
[2024-06-10 18:04:51,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 6201.67 | bwd_inner_microstep: 1607.67 | bwd_allreduce_microstep: 4593.94 | step_microstep: 38.40
[2024-06-10 18:04:51,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15910.89 | bwd: 47232.38 | bwd_inner: 42637.52 | bwd_allreduce: 4594.17 | step: 39.82
 58%|█████▊    | 1006/1726 [17:22:18<13:50:18, 69.19s/it]
 58%|█████▊    | 1007/1726 [17:23:19<13:19:40, 66.73s/it]
 58%|█████▊    | 1008/1726 [17:24:21<13:02:06, 65.36s/it]
 58%|█████▊    | 1009/1726 [17:25:23<12:49:26, 64.39s/it]
 59%|█████▊    | 1010/1726 [17:26:24<12:35:32, 63.31s/it]
 59%|█████▊    | 1011/1726 [17:27:28<12:35:03
{'loss': 1.201, 'learning_rate': 1.546075936191866e-05, 'epoch': 0.59}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 18:04:53,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.26 | bwd_microstep: 1432.27 | bwd_inner_microstep: 1432.14 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3887
[2024-06-10 18:04:55,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.69 | bwd_microstep: 1578.97 | bwd_inner_microstep: 1578.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 18:04:57,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.55 | bwd_microstep: 1453.10 | bwd_inner_microstep: 1453.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3789
[2024-06-10 18:04:59,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.85 | bwd_microstep: 1349.54 | bwd_inner_microstep: 1349.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 18:05:01,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.82 | bwd_microstep: 1649.91 | bwd_inner_microstep: 1649.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2054
[2024-06-10 18:05:02,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.19 | bwd_microstep: 784.31 | bwd_inner_microstep: 784.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3475
[2024-06-10 18:05:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.36 | bwd_microstep: 1243.82 | bwd_inner_microstep: 1243.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 18:05:05,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 793.25 | bwd_inner_microstep: 793.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-10 18:05:07,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.28 | bwd_microstep: 1628.33 | bwd_inner_microstep: 1628.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-10 18:05:09,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.84 | bwd_microstep: 1160.70 | bwd_inner_microstep: 1160.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 18:05:11,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.38 | bwd_microstep: 1159.57 | bwd_inner_microstep: 1159.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3985
[2024-06-10 18:05:13,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1508.09 | bwd_inner_microstep: 1508.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424
[2024-06-10 18:05:14,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.75 | bwd_microstep: 1278.87 | bwd_inner_microstep: 1278.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891
[2024-06-10 18:05:16,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.37 | bwd_microstep: 774.75 | bwd_inner_microstep: 774.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3995
[2024-06-10 18:05:18,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.18 | bwd_microstep: 1702.43 | bwd_inner_microstep: 1702.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2478
[2024-06-10 18:05:19,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.20 | bwd_microstep: 1016.57 | bwd_inner_microstep: 1016.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2402
[2024-06-10 18:05:21,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.50 | bwd_microstep: 1098.93 | bwd_inner_microstep: 1098.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 18:05:23,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.08 | bwd_microstep: 1372.40 | bwd_inner_microstep: 1372.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 18:05:25,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.39 | bwd_microstep: 1387.71 | bwd_inner_microstep: 1387.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3545
[2024-06-10 18:05:26,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.02 | bwd_microstep: 1360.65 | bwd_inner_microstep: 1360.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 18:05:29,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.70 | bwd_microstep: 1644.71 | bwd_inner_microstep: 1644.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 18:05:31,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1397.03 | bwd_inner_microstep: 1397.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 18:05:33,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1344.36 | bwd_inner_microstep: 1344.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3818
[2024-06-10 18:05:35,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.27 | bwd_microstep: 1747.71 | bwd_inner_microstep: 1747.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3634
[2024-06-10 18:05:37,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.33 | bwd_microstep: 1456.93 | bwd_inner_microstep: 1456.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 18:05:39,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.20 | bwd_microstep: 1456.45 | bwd_inner_microstep: 1456.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2076
[2024-06-10 18:05:40,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.43 | bwd_microstep: 791.29 | bwd_inner_microstep: 791.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2045
[2024-06-10 18:05:41,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.23 | bwd_microstep: 750.50 | bwd_inner_microstep: 750.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3797
[2024-06-10 18:05:43,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.62 | bwd_microstep: 1356.19 | bwd_inner_microstep: 1356.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 18:05:45,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.87 | bwd_microstep: 1559.28 | bwd_inner_microstep: 1559.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 18:05:47,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1313.27 | bwd_inner_microstep: 1313.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 18:05:52,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 18:05:52,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.76 | bwd_microstep: 4440.24 | bwd_inner_microstep: 1437.85 | bwd_allreduce_microstep: 3002.34 | step_microstep: 38.29
[2024-06-10 18:05:52,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15688.45 | bwd: 44992.17 | bwd_inner: 41988.80 | bwd_allreduce: 3002.64 | step: 39.85
{'loss': 1.1801, 'learning_rate': 1.5424212915585766e-05, 'epoch': 0.59}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 18:05:54,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.43 | bwd_microstep: 1437.21 | bwd_inner_microstep: 1437.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3938
[2024-06-10 18:05:56,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.87 | bwd_microstep: 1556.27 | bwd_inner_microstep: 1556.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3423
[2024-06-10 18:05:58,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.01 | bwd_microstep: 1375.55 | bwd_inner_microstep: 1375.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 18:06:00,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.64 | bwd_microstep: 1378.69 | bwd_inner_microstep: 1378.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 18:06:02,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1479.51 | bwd_inner_microstep: 1479.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 18:06:04,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.30 | bwd_microstep: 1387.51 | bwd_inner_microstep: 1387.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738
[2024-06-10 18:06:06,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1431.21 | bwd_inner_microstep: 1431.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 18:06:08,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.52 | bwd_microstep: 1385.61 | bwd_inner_microstep: 1385.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 18:06:09,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 18:06:11,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1389.29 | bwd_inner_microstep: 1389.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1958
[2024-06-10 18:06:12,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.45 | bwd_microstep: 766.10 | bwd_inner_microstep: 766.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2653
[2024-06-10 18:06:14,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.39 | bwd_microstep: 1113.61 | bwd_inner_microstep: 1113.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 18:06:16,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.63 | bwd_microstep: 1353.21 | bwd_inner_microstep: 1353.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-10 18:06:18,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.71 | bwd_microstep: 1195.16 | bwd_inner_microstep: 1195.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 18:06:19,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1251.31 | bwd_inner_microstep: 1251.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 18:06:21,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1394.96 | bwd_inner_microstep: 1394.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 18:06:22,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.39 | bwd_microstep: 698.07 | bwd_inner_microstep: 698.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3837
[2024-06-10 18:06:24,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.62 | bwd_microstep: 1360.24 | bwd_inner_microstep: 1360.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-10 18:06:26,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.53 | bwd_microstep: 1522.28 | bwd_inner_microstep: 1522.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 18:06:28,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1392.64 | bwd_inner_microstep: 1392.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 18:06:30,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1285.08 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1963
[2024-06-10 18:06:31,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.51 | bwd_microstep: 703.73 | bwd_inner_microstep: 703.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 18:06:33,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1349.28 | bwd_inner_microstep: 1349.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3530
[2024-06-10 18:06:35,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1424.71 | bwd_inner_microstep: 1424.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 18:06:36,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 18:06:38,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1428.17 | bwd_inner_microstep: 1428.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 18:06:40,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1283.00 | bwd_inner_microstep: 1282.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3801
[2024-06-10 18:06:42,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1389.75 | bwd_inner_microstep: 1389.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459
[2024-06-10 18:06:44,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.93 | bwd_microstep: 1438.07 | bwd_inner_microstep: 1438.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3418
[2024-06-10 18:06:46,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.95 | bwd_microstep: 1540.68 | bwd_inner_microstep: 1540.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3764
[2024-06-10 18:06:48,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.54 | bwd_microstep: 1610.89 | bwd_inner_microstep: 1610.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3823
[2024-06-10 18:06:53,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.29 | optimizer_step: 6.61
[2024-06-10 18:06:53,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.45 | bwd_microstep: 3512.29 | bwd_inner_microstep: 1707.41 | bwd_allreduce_microstep: 1804.82 | step_microstep: 39.22
[2024-06-10 18:06:53,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15907.17 | bwd: 44370.71 | bwd_inner: 42564.96 | bwd_allreduce: 1805.05 | step: 40.86
{'loss': 1.2534, 'learning_rate': 1.53876825851512e-05, 'epoch': 0.59}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 18:06:54,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1274.18 | bwd_inner_microstep: 1273.98 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3919
[2024-06-10 18:06:56,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 18:06:59,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.18 | bwd_microstep: 1539.89 | bwd_inner_microstep: 1539.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 18:07:00,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.41 | bwd_microstep: 1382.34 | bwd_inner_microstep: 1382.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 18:07:02,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.32 | bwd_microstep: 1382.81 | bwd_inner_microstep: 1382.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 18:07:04,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.29 | bwd_microstep: 1283.32 | bwd_inner_microstep: 1283.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-10 18:07:05,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.97 | bwd_microstep: 709.62 | bwd_inner_microstep: 709.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3421
[2024-06-10 18:07:07,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1396.47 | bwd_inner_microstep: 1396.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3515
[2024-06-10 18:07:09,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.46 | bwd_microstep: 1429.26 | bwd_inner_microstep: 1429.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1862
[2024-06-10 18:07:10,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.46 | bwd_microstep: 706.58 | bwd_inner_microstep: 706.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476
[2024-06-10 18:07:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.17 | bwd_microstep: 1441.68 | bwd_inner_microstep: 1441.59 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.16
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 18:07:14,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.31 | bwd_microstep: 1491.67 | bwd_inner_microstep: 1491.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 18:07:15,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.76 | bwd_microstep: 795.60 | bwd_inner_microstep: 795.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 18:07:17,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1408.35 | bwd_inner_microstep: 1408.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612
[2024-06-10 18:07:19,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1313.79 | bwd_inner_microstep: 1313.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 18:07:21,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.78 | bwd_microstep: 1302.46 | bwd_inner_microstep: 1302.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2085
[2024-06-10 18:07:22,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.52 | bwd_microstep: 918.13 | bwd_inner_microstep: 918.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 18:07:24,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1296.19 | bwd_inner_microstep: 1296.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442
[2024-06-10 18:07:25,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.67 | bwd_microstep: 1158.09 | bwd_inner_microstep: 1158.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 18:07:27,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.10 | bwd_microstep: 1501.52 | bwd_inner_microstep: 1501.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 18:07:29,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.69 | bwd_microstep: 1385.97 | bwd_inner_microstep: 1385.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 18:07:31,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.24 | bwd_microstep: 1462.94 | bwd_inner_microstep: 1462.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 18:07:33,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.30 | bwd_microstep: 1186.06 | bwd_inner_microstep: 1186.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3422
[2024-06-10 18:07:35,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.26 | bwd_microstep: 1296.46 | bwd_inner_microstep: 1296.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2012
[2024-06-10 18:07:36,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.38 | bwd_microstep: 711.58 | bwd_inner_microstep: 711.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2185
[2024-06-10 18:07:37,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.25 | bwd_microstep: 834.84 | bwd_inner_microstep: 834.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-10 18:07:39,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1325.44 | bwd_inner_microstep: 1325.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3581
[2024-06-10 18:07:41,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1373.21 | bwd_inner_microstep: 1373.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3733
[2024-06-10 18:07:43,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.13 | bwd_microstep: 1339.99 | bwd_inner_microstep: 1339.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-10 18:07:44,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1346.88 | bwd_inner_microstep: 1346.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2960
[2024-06-10 18:07:46,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.11 | bwd_microstep: 1249.23 | bwd_inner_microstep: 1249.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385
[2024-06-10 18:07:54,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.60 | optimizer_step: 6.60
[2024-06-10 18:07:54,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 6824.60 | bwd_inner_microstep: 1631.82 | bwd_allreduce_microstep: 5192.66 | step_microstep: 41.50
[2024-06-10 18:07:54,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15128.92 | bwd: 45559.57 | bwd_inner: 40365.70 | bwd_allreduce: 5193.07 | step: 43.29
{'loss': 1.1768, 'learning_rate': 1.5351168499274588e-05, 'epoch': 0.59}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 18:07:56,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1392.72 | bwd_inner_microstep: 1392.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 18:07:58,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.84 | bwd_microstep: 1504.29 | bwd_inner_microstep: 1504.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 18:07:59,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.94 | bwd_microstep: 1344.22 | bwd_inner_microstep: 1344.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-10 18:08:02,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.33 | bwd_microstep: 1538.30 | bwd_inner_microstep: 1538.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 18:08:04,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.40 | bwd_microstep: 1435.57 | bwd_inner_microstep: 1435.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 18:08:06,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.54 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3769
[2024-06-10 18:08:08,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.62 | bwd_microstep: 1472.73 | bwd_inner_microstep: 1472.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 18:08:09,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.88 | bwd_microstep: 1383.58 | bwd_inner_microstep: 1383.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 18:08:11,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1387.76 | bwd_inner_microstep: 1387.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 18:08:13,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1250.32 | bwd_inner_microstep: 1250.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734
[2024-06-10 18:08:15,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1437.27 | bwd_inner_microstep: 1437.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 18:08:17,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1380.16 | bwd_inner_microstep: 1380.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2152
[2024-06-10 18:08:18,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.30 | bwd_microstep: 1044.82 | bwd_inner_microstep: 1044.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 18:08:20,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.48 | bwd_microstep: 1479.35 | bwd_inner_microstep: 1479.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-10 18:08:23,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.91 | bwd_microstep: 1519.12 | bwd_inner_microstep: 1519.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448
[2024-06-10 18:08:25,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.63 | bwd_microstep: 1452.34 | bwd_inner_microstep: 1452.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648
[2024-06-10 18:08:27,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1411.35 | bwd_inner_microstep: 1411.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-10 18:08:29,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.56 | bwd_microstep: 1454.30 | bwd_inner_microstep: 1454.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 18:08:30,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.41 | bwd_microstep: 1183.60 | bwd_inner_microstep: 1183.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646
[2024-06-10 18:08:32,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.87 | bwd_microstep: 1619.22 | bwd_inner_microstep: 1619.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1976
[2024-06-10 18:08:33,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.46 | bwd_microstep: 766.75 | bwd_inner_microstep: 766.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1989
[2024-06-10 18:08:34,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.73 | bwd_microstep: 708.36 | bwd_inner_microstep: 708.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 18:08:36,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.77 | bwd_microstep: 1219.59 | bwd_inner_microstep: 1219.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-10 18:08:38,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.83 | bwd_microstep: 1329.01 | bwd_inner_microstep: 1328.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2247
[2024-06-10 18:08:39,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.25 | bwd_microstep: 872.13 | bwd_inner_microstep: 872.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021
[2024-06-10 18:08:40,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.91 | bwd_microstep: 806.11 | bwd_inner_microstep: 806.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 18:08:42,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.22 | bwd_microstep: 1302.29 | bwd_inner_microstep: 1302.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 18:08:44,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.31 | bwd_microstep: 1260.65 | bwd_inner_microstep: 1260.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 18:08:46,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.50 | bwd_microstep: 1607.44 | bwd_inner_microstep: 1607.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3423
[2024-06-10 18:08:48,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.91 | bwd_microstep: 1478.93 | bwd_inner_microstep: 1478.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820
[2024-06-10 18:08:50,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.43 | bwd_microstep: 1515.29 | bwd_inner_microstep: 1515.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3807
[2024-06-10 18:08:55,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-10 18:08:55,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.06 | bwd_microstep: 4332.08 | bwd_inner_microstep: 1913.43 | bwd_allreduce_microstep: 2418.58 | step_microstep: 39.34
[2024-06-10 18:08:55,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15967.56 | bwd: 45302.46 | bwd_inner: 42882.92 | bwd_allreduce: 2418.82 | step: 41.07
{'loss': 1.2218, 'learning_rate': 1.5314670786558358e-05, 'epoch': 0.59}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 18:08:57,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1490.71 | bwd_inner_microstep: 1490.60 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 18:08:58,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.09 | bwd_microstep: 777.36 | bwd_inner_microstep: 777.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2312
[2024-06-10 18:09:00,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.10 | bwd_microstep: 916.00 | bwd_inner_microstep: 915.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 18:09:02,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.46 | bwd_microstep: 1552.10 | bwd_inner_microstep: 1552.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 18:09:04,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.03 | bwd_microstep: 1538.44 | bwd_inner_microstep: 1538.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2234
[2024-06-10 18:09:05,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.22 | bwd_microstep: 959.11 | bwd_inner_microstep: 959.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2239
[2024-06-10 18:09:06,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.84 | bwd_microstep: 866.47 | bwd_inner_microstep: 866.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 18:09:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1247.18 | bwd_inner_microstep: 1247.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495
[2024-06-10 18:09:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.45 | bwd_microstep: 1191.03 | bwd_inner_microstep: 1191.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 18:09:12,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.83 | bwd_microstep: 1248.12 | bwd_inner_microstep: 1248.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 18:09:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1393.02 | bwd_inner_microstep: 1392.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3506
[2024-06-10 18:09:15,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.83 | bwd_microstep: 1460.41 | bwd_inner_microstep: 1460.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 18:09:17,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 18:09:19,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.09 | bwd_microstep: 1378.05 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1935
[2024-06-10 18:09:20,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.64 | bwd_microstep: 824.60 | bwd_inner_microstep: 824.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2683
[2024-06-10 18:09:22,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.46 | bwd_microstep: 1200.45 | bwd_inner_microstep: 1200.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2172
[2024-06-10 18:09:24,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.83 | bwd_microstep: 1051.80 | bwd_inner_microstep: 1051.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 18:09:25,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1344.92 | bwd_inner_microstep: 1344.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818
[2024-06-10 18:09:28,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.12 | bwd_microstep: 1624.53 | bwd_inner_microstep: 1624.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2123
[2024-06-10 18:09:29,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.26 | bwd_microstep: 1026.57 | bwd_inner_microstep: 1026.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 18:09:31,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1558.60 | bwd_inner_microstep: 1558.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 18:09:33,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1257.71 | bwd_inner_microstep: 1257.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 18:09:35,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.66 | bwd_microstep: 1657.48 | bwd_inner_microstep: 1657.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 18:09:37,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.77 | bwd_microstep: 1416.31 | bwd_inner_microstep: 1416.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 18:09:39,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.07 | bwd_microstep: 1358.36 | bwd_inner_microstep: 1358.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 18:09:41,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.45 | bwd_microstep: 1557.45 | bwd_inner_microstep: 1557.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 18:09:43,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1414.65 | bwd_inner_microstep: 1414.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942
[2024-06-10 18:09:44,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.14 | bwd_microstep: 725.86 | bwd_inner_microstep: 725.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3822
[2024-06-10 18:09:46,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.39 | bwd_microstep: 1418.87 | bwd_inner_microstep: 1418.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 18:09:48,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.29 | bwd_microstep: 1533.99 | bwd_inner_microstep: 1533.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-10 18:09:50,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.35 | bwd_microstep: 1439.90 | bwd_inner_microstep: 1439.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 18:09:59,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.14 | optimizer_step: 6.58
[2024-06-10 18:09:59,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.29 | bwd_microstep: 8110.07 | bwd_inner_microstep: 1809.72 | bwd_allreduce_microstep: 6300.29 | step_microstep: 38.96
[2024-06-10 18:09:59,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15457.31 | bwd: 47931.84 | bwd_inner: 41630.53 | bwd_allreduce: 6300.58 | step: 40.52
 59%|█████▊    | 1011/1726 [17:27:28<12:35:03, 63.36s/it]
 59%|█████▊    | 1012/1726 [17:28:29<12:25:37, 62.66s/it]
 59%|█████▊    | 1013/1726 [17:29:29<12:17:19, 62.05s/it]
 59%|█████▊    | 1014/1726 [17:30:30<12:12:42, 61.75s/it]
 59%|█████▉    | 1015/1726 [17:31:32<12:11:15, 61.71s/it]
 59%|█████▉    | 1016/1726 [17:32:36<12:17:24, 62.32s/it]
{'loss': 1.2499, 'learning_rate': 1.5278189575547265e-05, 'epoch': 0.59}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 18:10:01,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.46 | bwd_microstep: 1237.22 | bwd_inner_microstep: 1237.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2434
[2024-06-10 18:10:02,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.68 | bwd_microstep: 1008.46 | bwd_inner_microstep: 1008.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 18:10:04,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1250.53 | bwd_inner_microstep: 1250.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 18:10:06,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.03 | bwd_microstep: 1275.99 | bwd_inner_microstep: 1275.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 18:10:07,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.53 | bwd_microstep: 795.04 | bwd_inner_microstep: 795.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 18:10:09,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1493.86 | bwd_inner_microstep: 1493.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2047
[2024-06-10 18:10:10,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.53 | bwd_microstep: 810.55 | bwd_inner_microstep: 810.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760
[2024-06-10 18:10:12,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.89 | bwd_microstep: 1538.93 | bwd_inner_microstep: 1538.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2156
[2024-06-10 18:10:13,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.62 | bwd_microstep: 1005.82 | bwd_inner_microstep: 1005.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3727
[2024-06-10 18:10:16,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.33 | bwd_microstep: 1681.25 | bwd_inner_microstep: 1681.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 18:10:18,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1346.34 | bwd_inner_microstep: 1346.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 18:10:20,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1484.93 | bwd_inner_microstep: 1484.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1975
[2024-06-10 18:10:21,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.93 | bwd_microstep: 763.41 | bwd_inner_microstep: 763.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 18:10:23,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.93 | bwd_microstep: 1647.77 | bwd_inner_microstep: 1647.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-10 18:10:25,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.94 | bwd_microstep: 1614.27 | bwd_inner_microstep: 1614.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 18:10:27,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1337.59 | bwd_inner_microstep: 1337.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279
[2024-06-10 18:10:28,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.25 | bwd_microstep: 971.58 | bwd_inner_microstep: 971.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 18:10:30,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1349.60 | bwd_inner_microstep: 1349.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 18:10:32,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.79 | bwd_microstep: 1475.59 | bwd_inner_microstep: 1475.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651
[2024-06-10 18:10:34,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.17 | bwd_microstep: 1516.97 | bwd_inner_microstep: 1516.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 18:10:36,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1372.81 | bwd_inner_microstep: 1372.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 18:10:38,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.26 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3720
[2024-06-10 18:10:40,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.84 | bwd_microstep: 1335.94 | bwd_inner_microstep: 1335.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 18:10:42,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.81 | bwd_microstep: 1654.83 | bwd_inner_microstep: 1654.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565
[2024-06-10 18:10:44,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.27 | bwd_microstep: 1599.02 | bwd_inner_microstep: 1599.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909
[2024-06-10 18:10:45,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.25 | bwd_microstep: 689.72 | bwd_inner_microstep: 689.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 18:10:47,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.43 | bwd_microstep: 1301.43 | bwd_inner_microstep: 1301.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447
[2024-06-10 18:10:49,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.45 | bwd_microstep: 1379.59 | bwd_inner_microstep: 1379.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 18:10:51,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.58 | bwd_microstep: 1496.57 | bwd_inner_microstep: 1496.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 18:10:53,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1412.39 | bwd_inner_microstep: 1412.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932
[2024-06-10 18:10:54,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.69 | bwd_microstep: 696.99 | bwd_inner_microstep: 696.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 18:11:01,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.12 | optimizer_step: 6.64
[2024-06-10 18:11:01,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.49 | bwd_microstep: 6806.35 | bwd_inner_microstep: 2177.32 | bwd_allreduce_microstep: 4628.98 | step_microstep: 38.06
[2024-06-10 18:11:01,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15518.75 | bwd: 46637.33 | bwd_inner: 42007.45 | bwd_allreduce: 4629.20 | step: 39.64
{'loss': 1.188, 'learning_rate': 1.5241724994727933e-05, 'epoch': 0.59}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3386
[2024-06-10 18:11:03,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.38 | bwd_microstep: 1232.09 | bwd_inner_microstep: 1231.84 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 18:11:04,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.10 | bwd_microstep: 788.11 | bwd_inner_microstep: 788.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3857
[2024-06-10 18:11:07,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.68 | bwd_microstep: 1656.71 | bwd_inner_microstep: 1656.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2261
[2024-06-10 18:11:08,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.62 | bwd_microstep: 869.19 | bwd_inner_microstep: 869.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 18:11:10,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1376.83 | bwd_inner_microstep: 1376.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 18:11:12,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1545.74 | bwd_inner_microstep: 1545.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3586
[2024-06-10 18:11:13,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.62 | bwd_microstep: 1208.79 | bwd_inner_microstep: 1208.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-10 18:11:16,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.73 | bwd_microstep: 1628.95 | bwd_inner_microstep: 1628.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705
[2024-06-10 18:11:18,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.57 | bwd_microstep: 1526.46 | bwd_inner_microstep: 1526.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181
[2024-06-10 18:11:19,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.02 | bwd_microstep: 952.41 | bwd_inner_microstep: 952.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3903
[2024-06-10 18:11:21,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.66 | bwd_microstep: 1558.02 | bwd_inner_microstep: 1557.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 18:11:23,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1244.13 | bwd_inner_microstep: 1244.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2130
[2024-06-10 18:11:24,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.82 | bwd_microstep: 928.95 | bwd_inner_microstep: 928.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 18:11:26,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.41 | bwd_microstep: 1444.84 | bwd_inner_microstep: 1444.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1896
[2024-06-10 18:11:27,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.57 | bwd_microstep: 776.11 | bwd_inner_microstep: 776.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 18:11:29,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.90 | bwd_microstep: 1482.28 | bwd_inner_microstep: 1482.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-10 18:11:30,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 699.48 | bwd_inner_microstep: 699.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 18:11:32,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.78 | bwd_microstep: 1276.88 | bwd_inner_microstep: 1276.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287
[2024-06-10 18:11:33,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.71 | bwd_microstep: 877.81 | bwd_inner_microstep: 877.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 18:11:35,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.92 | bwd_microstep: 1258.11 | bwd_inner_microstep: 1258.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 18:11:37,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.33 | bwd_microstep: 1295.76 | bwd_inner_microstep: 1295.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 18:11:39,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.41 | bwd_microstep: 1416.97 | bwd_inner_microstep: 1416.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 18:11:41,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.86 | bwd_microstep: 1383.01 | bwd_inner_microstep: 1382.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-10 18:11:42,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.88 | bwd_microstep: 708.74 | bwd_inner_microstep: 708.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 18:11:44,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.33 | bwd_microstep: 1256.13 | bwd_inner_microstep: 1256.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 18:11:46,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.12 | bwd_microstep: 1450.32 | bwd_inner_microstep: 1450.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749
[2024-06-10 18:11:48,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.56 | bwd_microstep: 1543.44 | bwd_inner_microstep: 1543.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3608
[2024-06-10 18:11:50,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.74 | bwd_microstep: 1534.35 | bwd_inner_microstep: 1534.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3776
[2024-06-10 18:11:52,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.52 | bwd_microstep: 1847.75 | bwd_inner_microstep: 1847.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3850
[2024-06-10 18:11:54,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.18 | bwd_microstep: 1519.94 | bwd_inner_microstep: 1519.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 18:11:56,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1253.84 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 18:12:01,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.64
[2024-06-10 18:12:01,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.74 | bwd_microstep: 4255.21 | bwd_inner_microstep: 1868.70 | bwd_allreduce_microstep: 2386.46 | step_microstep: 38.97
[2024-06-10 18:12:01,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15413.41 | bwd: 43797.38 | bwd_inner: 41409.81 | bwd_allreduce: 2386.78 | step: 40.61
{'loss': 1.1992, 'learning_rate': 1.5205277172528438e-05, 'epoch': 0.59}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1950
[2024-06-10 18:12:02,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.96 | bwd_microstep: 884.31 | bwd_inner_microstep: 884.22 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3946
[2024-06-10 18:12:05,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.37 | bwd_microstep: 1695.54 | bwd_inner_microstep: 1695.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3978
[2024-06-10 18:12:07,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.01 | bwd_microstep: 1602.31 | bwd_inner_microstep: 1602.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 18:12:09,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1551.78 | bwd_inner_microstep: 1551.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3396
[2024-06-10 18:12:11,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.75 | bwd_microstep: 1306.41 | bwd_inner_microstep: 1306.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 18:12:13,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1383.06 | bwd_inner_microstep: 1383.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 18:12:15,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.64 | bwd_microstep: 1385.60 | bwd_inner_microstep: 1385.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 18:12:16,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.64 | bwd_microstep: 1291.16 | bwd_inner_microstep: 1291.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-10 18:12:19,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.39 | bwd_microstep: 1641.91 | bwd_inner_microstep: 1641.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1950
[2024-06-10 18:12:20,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.92 | bwd_microstep: 854.06 | bwd_inner_microstep: 854.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3555
[2024-06-10 18:12:22,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.07 | bwd_microstep: 1236.63 | bwd_inner_microstep: 1236.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1875
[2024-06-10 18:12:23,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.72 | bwd_microstep: 709.27 | bwd_inner_microstep: 709.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 18:12:25,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.82 | bwd_microstep: 1484.01 | bwd_inner_microstep: 1483.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 18:12:26,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1395.02 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 18:12:28,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.20 | bwd_microstep: 1342.17 | bwd_inner_microstep: 1342.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660
[2024-06-10 18:12:31,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.86 | bwd_microstep: 1718.87 | bwd_inner_microstep: 1718.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513
[2024-06-10 18:12:33,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.72 | bwd_microstep: 1436.73 | bwd_inner_microstep: 1436.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-10 18:12:34,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.81 | bwd_microstep: 1308.28 | bwd_inner_microstep: 1308.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 18:12:36,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1420.09 | bwd_inner_microstep: 1420.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 18:12:38,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.80 | bwd_microstep: 1292.36 | bwd_inner_microstep: 1292.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 18:12:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1399.24 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 18:12:42,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1257.52 | bwd_inner_microstep: 1257.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 18:12:44,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.22 | bwd_microstep: 1286.42 | bwd_inner_microstep: 1286.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 18:12:46,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1408.18 | bwd_inner_microstep: 1408.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3770
[2024-06-10 18:12:48,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.77 | bwd_microstep: 1347.05 | bwd_inner_microstep: 1347.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 18:12:50,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.24 | bwd_microstep: 1487.01 | bwd_inner_microstep: 1486.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 18:12:51,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.68 | bwd_microstep: 1305.58 | bwd_inner_microstep: 1305.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-10 18:12:53,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.45 | bwd_microstep: 1360.32 | bwd_inner_microstep: 1360.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570
[2024-06-10 18:12:55,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1591.57 | bwd_inner_microstep: 1591.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637
[2024-06-10 18:12:57,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1320.17 | bwd_inner_microstep: 1320.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 18:12:59,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.62 | bwd_microstep: 1366.55 | bwd_inner_microstep: 1366.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 18:13:01,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.07 | optimizer_step: 6.61
[2024-06-10 18:13:01,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.08 | bwd_microstep: 1539.52 | bwd_inner_microstep: 1531.81 | bwd_allreduce_microstep: 7.66 | step_microstep: 37.56
[2024-06-10 18:13:01,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16332.92 | bwd: 43608.71 | bwd_inner: 43600.09 | bwd_allreduce: 7.93 | step: 39.17
{'loss': 1.2296, 'learning_rate': 1.5168846237317814e-05, 'epoch': 0.59}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3493
[2024-06-10 18:13:03,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.26 | bwd_microstep: 1521.83 | bwd_inner_microstep: 1521.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 18:13:05,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.96 | bwd_microstep: 1484.24 | bwd_inner_microstep: 1484.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-10 18:13:07,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.22 | bwd_microstep: 800.04 | bwd_inner_microstep: 800.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 18:13:09,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.69 | bwd_microstep: 1477.23 | bwd_inner_microstep: 1477.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 18:13:10,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1245.34 | bwd_inner_microstep: 1245.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 18:13:12,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.29 | bwd_microstep: 1390.19 | bwd_inner_microstep: 1390.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3517
[2024-06-10 18:13:14,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1435.71 | bwd_inner_microstep: 1435.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 18:13:16,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1285.63 | bwd_inner_microstep: 1285.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913
[2024-06-10 18:13:17,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.65 | bwd_microstep: 686.39 | bwd_inner_microstep: 686.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 18:13:19,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1398.58 | bwd_inner_microstep: 1398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-10 18:13:20,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.54 | bwd_microstep: 700.24 | bwd_inner_microstep: 700.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498
[2024-06-10 18:13:22,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1316.29 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-10 18:13:24,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.22 | bwd_microstep: 1621.66 | bwd_inner_microstep: 1621.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3690
[2024-06-10 18:13:26,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.57 | bwd_microstep: 1358.00 | bwd_inner_microstep: 1357.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2093
[2024-06-10 18:13:27,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.74 | bwd_microstep: 819.71 | bwd_inner_microstep: 819.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 18:13:29,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.68 | bwd_microstep: 1392.13 | bwd_inner_microstep: 1392.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-10 18:13:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.82 | bwd_microstep: 797.43 | bwd_inner_microstep: 797.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622
[2024-06-10 18:13:32,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1340.91 | bwd_inner_microstep: 1340.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-10 18:13:33,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.60 | bwd_microstep: 1180.57 | bwd_inner_microstep: 1180.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2106
[2024-06-10 18:13:35,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.32 | bwd_microstep: 921.63 | bwd_inner_microstep: 921.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 18:13:37,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1468.80 | bwd_inner_microstep: 1468.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 18:13:39,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.19 | bwd_microstep: 1545.65 | bwd_inner_microstep: 1545.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 18:13:41,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.73 | bwd_microstep: 1169.54 | bwd_inner_microstep: 1169.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 18:13:43,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.28 | bwd_microstep: 1751.99 | bwd_inner_microstep: 1751.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 18:13:45,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.48 | bwd_microstep: 1280.70 | bwd_inner_microstep: 1280.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 18:13:47,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.68 | bwd_microstep: 1485.63 | bwd_inner_microstep: 1485.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 18:13:49,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1400.25 | bwd_inner_microstep: 1400.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 18:13:51,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.20 | bwd_microstep: 1453.34 | bwd_inner_microstep: 1453.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3953
[2024-06-10 18:13:53,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.26 | bwd_microstep: 1506.66 | bwd_inner_microstep: 1506.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 18:13:55,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.60 | bwd_microstep: 1280.31 | bwd_inner_microstep: 1280.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2567
[2024-06-10 18:13:56,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.45 | bwd_microstep: 1161.08 | bwd_inner_microstep: 1161.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 18:14:03,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 18:14:03,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.76 | bwd_microstep: 5822.09 | bwd_inner_microstep: 1691.25 | bwd_allreduce_microstep: 4130.79 | step_microstep: 37.88
[2024-06-10 18:14:03,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15461.56 | bwd: 45499.80 | bwd_inner: 41368.10 | bwd_allreduce: 4131.02 | step: 39.33
{'loss': 1.2185, 'learning_rate': 1.5132432317405626e-05, 'epoch': 0.59}
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2639
[2024-06-10 18:14:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.01 | bwd_microstep: 1041.81 | bwd_inner_microstep: 1041.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4348
[2024-06-10 18:14:06,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.14 | bwd_microstep: 1700.57 | bwd_inner_microstep: 1700.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-10 18:14:08,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.61 | bwd_microstep: 874.10 | bwd_inner_microstep: 874.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 18:14:09,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.55 | bwd_microstep: 1341.18 | bwd_inner_microstep: 1341.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2679
[2024-06-10 18:14:11,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.37 | bwd_microstep: 1119.14 | bwd_inner_microstep: 1119.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 18:14:12,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.92 | bwd_microstep: 796.86 | bwd_inner_microstep: 796.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3752
[2024-06-10 18:14:14,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.30 | bwd_microstep: 1371.48 | bwd_inner_microstep: 1371.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 777
[2024-06-10 18:14:14,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.07 | bwd_microstep: 308.27 | bwd_inner_microstep: 308.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 18:14:16,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1354.91 | bwd_inner_microstep: 1354.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2157
[2024-06-10 18:14:17,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.95 | bwd_microstep: 851.07 | bwd_inner_microstep: 851.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1975
[2024-06-10 18:14:19,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.19 | bwd_microstep: 736.19 | bwd_inner_microstep: 736.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 18:14:20,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.12 | bwd_microstep: 796.59 | bwd_inner_microstep: 796.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 18:14:22,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1488.82 | bwd_inner_microstep: 1488.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 18:14:24,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1351.35 | bwd_inner_microstep: 1351.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 18:14:25,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.29 | bwd_microstep: 1376.17 | bwd_inner_microstep: 1376.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 18:14:27,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.99 | bwd_microstep: 1372.52 | bwd_inner_microstep: 1372.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2090
[2024-06-10 18:14:28,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.37 | bwd_microstep: 820.69 | bwd_inner_microstep: 820.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3570
[2024-06-10 18:14:31,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.77 | bwd_microstep: 1523.86 | bwd_inner_microstep: 1523.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 18:14:33,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.64 | bwd_microstep: 1390.61 | bwd_inner_microstep: 1390.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818
[2024-06-10 18:14:35,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.76 | bwd_microstep: 1583.83 | bwd_inner_microstep: 1583.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 18:14:37,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 18:14:39,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.78 | bwd_microstep: 1441.44 | bwd_inner_microstep: 1441.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 18:14:41,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.76 | bwd_microstep: 1405.83 | bwd_inner_microstep: 1405.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 18:14:42,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.75 | bwd_microstep: 1301.67 | bwd_inner_microstep: 1301.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 18:14:45,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.72 | bwd_microstep: 1656.72 | bwd_inner_microstep: 1656.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 18:14:47,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.13 | bwd_microstep: 1635.34 | bwd_inner_microstep: 1635.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 18:14:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.27 | bwd_microstep: 1416.77 | bwd_inner_microstep: 1416.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 18:14:51,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1407.52 | bwd_inner_microstep: 1407.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3431
[2024-06-10 18:14:53,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.18 | bwd_microstep: 1543.82 | bwd_inner_microstep: 1543.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 18:14:55,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1509.99 | bwd_inner_microstep: 1509.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-10 18:14:57,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.79 | bwd_microstep: 1310.09 | bwd_inner_microstep: 1310.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 18:15:04,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.29 | optimizer_step: 6.62
[2024-06-10 18:15:04,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.62 | bwd_microstep: 6709.89 | bwd_inner_microstep: 1883.33 | bwd_allreduce_microstep: 4826.51 | step_microstep: 38.05
[2024-06-10 18:15:04,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15306.29 | bwd: 45936.54 | bwd_inner: 41109.13 | bwd_allreduce: 4826.74 | step: 39.53


 59%|█████▉    | 1016/1726 [17:32:36<12:17:24, 62.32s/it]
 59%|█████▉    | 1017/1726 [17:33:38<12:17:02, 62.37s/it]
 59%|█████▉    | 1018/1726 [17:34:38<12:06:03, 61.53s/it]
 59%|█████▉    | 1019/1726 [17:35:38<12:00:36, 61.15s/it]
 59%|█████▉    | 1020/1726 [17:36:39<12:00:03, 61.19s/it]
 59%|█████▉    | 1021/1726 [17:37:41<12:00:21, 61.31s/it]
{'loss': 1.222, 'learning_rate': 1.509603554104152e-05, 'epoch': 0.59}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3460
[2024-06-10 18:15:06,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.24 | bwd_microstep: 1494.05 | bwd_inner_microstep: 1494.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3922
[2024-06-10 18:15:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 1390.40 | bwd_inner_microstep: 1390.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 18:15:10,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1377.16 | bwd_inner_microstep: 1377.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 18:15:12,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 1340.37 | bwd_inner_microstep: 1340.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 18:15:13,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 788.86 | bwd_inner_microstep: 788.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 18:15:15,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1279.69 | bwd_inner_microstep: 1279.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1947
[2024-06-10 18:15:16,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.69 | bwd_microstep: 821.02 | bwd_inner_microstep: 820.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 18:15:18,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1382.80 | bwd_inner_microstep: 1382.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 18:15:20,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1245.88 | bwd_inner_microstep: 1245.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 18:15:21,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1249.55 | bwd_inner_microstep: 1249.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 18:15:23,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.11 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 18:15:25,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1342.95 | bwd_inner_microstep: 1342.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 18:15:27,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.25 | bwd_microstep: 1313.46 | bwd_inner_microstep: 1313.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 18:15:29,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1250.28 | bwd_inner_microstep: 1250.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 18:15:31,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1452.77 | bwd_inner_microstep: 1452.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 18:15:33,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1353.47 | bwd_inner_microstep: 1353.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2098
[2024-06-10 18:15:34,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.06 | bwd_microstep: 852.06 | bwd_inner_microstep: 852.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 18:15:36,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.31 | bwd_microstep: 1389.45 | bwd_inner_microstep: 1389.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 18:15:37,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1281.92 | bwd_inner_microstep: 1281.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122
[2024-06-10 18:15:39,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.44 | bwd_microstep: 927.13 | bwd_inner_microstep: 927.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2069
[2024-06-10 18:15:40,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.81 | bwd_microstep: 914.40 | bwd_inner_microstep: 914.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 18:15:42,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.47 | bwd_microstep: 1256.78 | bwd_inner_microstep: 1256.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 18:15:44,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1394.83 | bwd_inner_microstep: 1394.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3935
[2024-06-10 18:15:46,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.19 | bwd_microstep: 1620.78 | bwd_inner_microstep: 1620.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279
[2024-06-10 18:15:47,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.64 | bwd_microstep: 973.73 | bwd_inner_microstep: 973.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 18:15:49,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1252.29 | bwd_inner_microstep: 1252.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3570
[2024-06-10 18:15:51,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.53 | bwd_microstep: 1663.50 | bwd_inner_microstep: 1663.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 18:15:53,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1448.16 | bwd_inner_microstep: 1448.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807
[2024-06-10 18:15:56,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.61 | bwd_microstep: 1752.93 | bwd_inner_microstep: 1752.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3589
[2024-06-10 18:15:58,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1464.21 | bwd_inner_microstep: 1464.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-10 18:16:00,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.30 | bwd_microstep: 1643.04 | bwd_inner_microstep: 1643.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-10 18:16:07,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 18:16:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.43 | bwd_microstep: 6565.57 | bwd_inner_microstep: 926.31 | bwd_allreduce_microstep: 5639.21 | step_microstep: 38.12
[2024-06-10 18:16:07,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15404.04 | bwd: 46901.59 | bwd_inner: 41261.47 | bwd_allreduce: 5639.44 | step: 39.61
{'loss': 1.2151, 'learning_rate': 1.5059656036414738e-05, 'epoch': 0.59}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 18:16:09,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1474.60 | bwd_inner_microstep: 1474.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 18:16:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.44 | bwd_microstep: 695.97 | bwd_inner_microstep: 695.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907
[2024-06-10 18:16:12,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.70 | bwd_microstep: 1587.33 | bwd_inner_microstep: 1587.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262
[2024-06-10 18:16:13,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.52 | bwd_microstep: 966.85 | bwd_inner_microstep: 966.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 18:16:15,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1380.58 | bwd_inner_microstep: 1380.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 18:16:17,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.53 | bwd_microstep: 1245.63 | bwd_inner_microstep: 1245.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960
[2024-06-10 18:16:18,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.06 | bwd_microstep: 702.50 | bwd_inner_microstep: 702.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 18:16:20,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.89 | bwd_microstep: 1385.91 | bwd_inner_microstep: 1385.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909
[2024-06-10 18:16:21,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.53 | bwd_microstep: 778.51 | bwd_inner_microstep: 778.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-10 18:16:23,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.48 | bwd_microstep: 1624.11 | bwd_inner_microstep: 1624.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4091
[2024-06-10 18:16:25,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.59 | bwd_microstep: 1525.46 | bwd_inner_microstep: 1525.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-10 18:16:27,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.85 | bwd_microstep: 1279.78 | bwd_inner_microstep: 1279.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3916
[2024-06-10 18:16:30,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.52 | bwd_microstep: 1790.23 | bwd_inner_microstep: 1790.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 18:16:31,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 791.12 | bwd_inner_microstep: 791.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 18:16:33,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1439.18 | bwd_inner_microstep: 1439.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-10 18:16:35,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.32 | bwd_microstep: 1445.89 | bwd_inner_microstep: 1445.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 18:16:37,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1485.60 | bwd_inner_microstep: 1485.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627
[2024-06-10 18:16:39,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.34 | bwd_microstep: 1613.02 | bwd_inner_microstep: 1613.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 18:16:41,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1416.09 | bwd_inner_microstep: 1416.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 18:16:43,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1278.74 | bwd_inner_microstep: 1278.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3529
[2024-06-10 18:16:44,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.99 | bwd_microstep: 1357.48 | bwd_inner_microstep: 1357.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 18:16:47,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.40 | bwd_microstep: 1500.64 | bwd_inner_microstep: 1500.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 18:16:48,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1397.82 | bwd_inner_microstep: 1397.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-10 18:16:50,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1411.78 | bwd_inner_microstep: 1411.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991
[2024-06-10 18:16:52,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.73 | bwd_microstep: 802.29 | bwd_inner_microstep: 802.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3706
[2024-06-10 18:16:53,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.56 | bwd_microstep: 1361.16 | bwd_inner_microstep: 1361.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 18:16:55,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1253.94 | bwd_inner_microstep: 1253.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3728
[2024-06-10 18:16:57,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.00 | bwd_microstep: 1682.27 | bwd_inner_microstep: 1682.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-10 18:17:00,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.95 | bwd_microstep: 1606.30 | bwd_inner_microstep: 1606.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3769
[2024-06-10 18:17:02,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.40 | bwd_microstep: 1744.03 | bwd_inner_microstep: 1744.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-10 18:17:04,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 1312.13 | bwd_inner_microstep: 1312.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3646
[2024-06-10 18:17:11,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 18:17:11,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 6719.02 | bwd_inner_microstep: 1544.39 | bwd_allreduce_microstep: 5174.58 | step_microstep: 38.21
[2024-06-10 18:17:11,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15946.72 | bwd: 48055.99 | bwd_inner: 42880.51 | bwd_allreduce: 5174.81 | step: 39.63
{'loss': 1.2306, 'learning_rate': 1.5023293931653714e-05, 'epoch': 0.59}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2906
[2024-06-10 18:17:13,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.65 | bwd_microstep: 1176.98 | bwd_inner_microstep: 1176.91 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 18:17:15,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1386.87 | bwd_inner_microstep: 1386.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3907
[2024-06-10 18:17:17,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.49 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-10 18:17:19,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.10 | bwd_microstep: 1652.79 | bwd_inner_microstep: 1652.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 18:17:21,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1382.66 | bwd_inner_microstep: 1382.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 18:17:23,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.41 | bwd_microstep: 1146.94 | bwd_inner_microstep: 1146.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3403
[2024-06-10 18:17:24,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.36 | bwd_microstep: 1207.54 | bwd_inner_microstep: 1207.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 18:17:26,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.22 | bwd_microstep: 1279.05 | bwd_inner_microstep: 1279.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2216
[2024-06-10 18:17:27,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.45 | bwd_microstep: 955.56 | bwd_inner_microstep: 955.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 18:17:29,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1242.44 | bwd_inner_microstep: 1242.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 18:17:31,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1286.61 | bwd_inner_microstep: 1286.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 18:17:33,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.87 | bwd_microstep: 1380.27 | bwd_inner_microstep: 1380.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3486
[2024-06-10 18:17:35,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1365.40 | bwd_inner_microstep: 1365.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497
[2024-06-10 18:17:37,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.88 | bwd_microstep: 1501.60 | bwd_inner_microstep: 1501.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-10 18:17:39,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.74 | bwd_microstep: 1448.25 | bwd_inner_microstep: 1448.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-10 18:17:41,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1449.62 | bwd_inner_microstep: 1449.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 18:17:43,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1405.37 | bwd_inner_microstep: 1405.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3439
[2024-06-10 18:17:44,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.66 | bwd_microstep: 1378.94 | bwd_inner_microstep: 1378.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2245
[2024-06-10 18:17:46,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.42 | bwd_microstep: 968.74 | bwd_inner_microstep: 968.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548
[2024-06-10 18:17:48,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1427.43 | bwd_inner_microstep: 1427.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 18:17:50,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.18 | bwd_microstep: 1421.26 | bwd_inner_microstep: 1421.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3672
[2024-06-10 18:17:52,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.79 | bwd_microstep: 1529.28 | bwd_inner_microstep: 1529.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 18:17:54,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1388.82 | bwd_inner_microstep: 1388.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-10 18:17:56,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.48 | bwd_microstep: 1549.79 | bwd_inner_microstep: 1549.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 18:17:58,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.32 | bwd_microstep: 1286.18 | bwd_inner_microstep: 1286.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484
[2024-06-10 18:18:00,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.64 | bwd_microstep: 1343.29 | bwd_inner_microstep: 1343.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2041
[2024-06-10 18:18:01,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.59 | bwd_microstep: 935.40 | bwd_inner_microstep: 935.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 18:18:03,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.51 | bwd_microstep: 1375.34 | bwd_inner_microstep: 1375.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-10 18:18:05,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.26 | bwd_microstep: 1602.65 | bwd_inner_microstep: 1602.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717
[2024-06-10 18:18:07,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.94 | bwd_microstep: 1643.51 | bwd_inner_microstep: 1643.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3425
[2024-06-10 18:18:09,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.89 | bwd_microstep: 1379.32 | bwd_inner_microstep: 1379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-10 18:18:12,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-10 18:18:12,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.63 | bwd_microstep: 2183.54 | bwd_inner_microstep: 1208.75 | bwd_allreduce_microstep: 974.74 | step_microstep: 37.50
[2024-06-10 18:18:12,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16048.59 | bwd: 44166.28 | bwd_inner: 43190.58 | bwd_allreduce: 975.00 | step: 39.00
{'loss': 1.2669, 'learning_rate': 1.498694935482559e-05, 'epoch': 0.59}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 18:18:14,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1375.56 | bwd_inner_microstep: 1375.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 18:18:15,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.23 | bwd_microstep: 1283.79 | bwd_inner_microstep: 1283.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2336
[2024-06-10 18:18:17,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.41 | bwd_microstep: 984.23 | bwd_inner_microstep: 984.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2304
[2024-06-10 18:18:18,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.31 | bwd_microstep: 908.29 | bwd_inner_microstep: 908.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4080
[2024-06-10 18:18:20,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.91 | bwd_microstep: 1719.52 | bwd_inner_microstep: 1719.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 4102
[2024-06-10 18:18:23,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 712.31 | bwd_microstep: 1939.58 | bwd_inner_microstep: 1939.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 18:18:25,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.98 | bwd_microstep: 1479.91 | bwd_inner_microstep: 1479.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 18:18:27,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.92 | bwd_microstep: 1245.92 | bwd_inner_microstep: 1245.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2105
[2024-06-10 18:18:28,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.53 | bwd_microstep: 760.40 | bwd_inner_microstep: 760.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 18:18:30,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1427.68 | bwd_inner_microstep: 1427.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592
[2024-06-10 18:18:32,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.86 | bwd_microstep: 1307.74 | bwd_inner_microstep: 1307.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 18:18:33,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.99 | bwd_microstep: 1287.92 | bwd_inner_microstep: 1287.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2463
[2024-06-10 18:18:35,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.51 | bwd_microstep: 981.13 | bwd_inner_microstep: 981.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 18:18:37,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1349.48 | bwd_inner_microstep: 1349.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3515
[2024-06-10 18:18:38,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.71 | bwd_microstep: 1348.01 | bwd_inner_microstep: 1347.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-10 18:18:41,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.80 | bwd_microstep: 1610.36 | bwd_inner_microstep: 1610.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 18:18:43,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1382.57 | bwd_inner_microstep: 1382.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3630
[2024-06-10 18:18:45,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.10 | bwd_microstep: 1776.63 | bwd_inner_microstep: 1776.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 18:18:47,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.63 | bwd_microstep: 1444.99 | bwd_inner_microstep: 1444.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 18:18:49,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.49 | bwd_microstep: 1521.21 | bwd_inner_microstep: 1521.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 18:18:51,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.65 | bwd_microstep: 1298.42 | bwd_inner_microstep: 1298.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 18:18:53,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.71 | bwd_microstep: 1312.30 | bwd_inner_microstep: 1312.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 18:18:55,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.98 | bwd_microstep: 1295.33 | bwd_inner_microstep: 1295.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 18:18:56,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.39 | bwd_microstep: 1178.07 | bwd_inner_microstep: 1178.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3565
[2024-06-10 18:18:58,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.34 | bwd_microstep: 1444.47 | bwd_inner_microstep: 1444.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 18:19:00,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1655.17 | bwd_inner_microstep: 1655.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 18:19:03,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1508.24 | bwd_inner_microstep: 1508.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732
[2024-06-10 18:19:05,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.78 | bwd_microstep: 1529.33 | bwd_inner_microstep: 1529.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559
[2024-06-10 18:19:06,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1333.58 | bwd_inner_microstep: 1333.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 18:19:09,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.52 | bwd_microstep: 1603.02 | bwd_inner_microstep: 1602.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3594
[2024-06-10 18:19:11,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1441.27 | bwd_inner_microstep: 1441.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 18:19:15,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-10 18:19:15,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.90 | bwd_microstep: 3491.50 | bwd_inner_microstep: 1696.01 | bwd_allreduce_microstep: 1795.45 | step_microstep: 37.76
[2024-06-10 18:19:15,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16538.89 | bwd: 46225.62 | bwd_inner: 44429.28 | bwd_allreduce: 1795.67 | step: 39.25
{'loss': 1.2366, 'learning_rate': 1.4950622433935786e-05, 'epoch': 0.59}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 18:19:17,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.43 | bwd_microstep: 1468.66 | bwd_inner_microstep: 1468.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3935
[2024-06-10 18:19:19,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.17 | bwd_microstep: 1425.22 | bwd_inner_microstep: 1425.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 18:19:21,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1480.12 | bwd_inner_microstep: 1480.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 18:19:23,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1248.20 | bwd_inner_microstep: 1248.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4170
[2024-06-10 18:19:25,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.85 | bwd_microstep: 1648.26 | bwd_inner_microstep: 1648.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 18:19:27,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.74 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738
[2024-06-10 18:19:29,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1438.96 | bwd_inner_microstep: 1438.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 18:19:30,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.58 | bwd_microstep: 1279.41 | bwd_inner_microstep: 1279.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 18:19:32,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.74 | bwd_microstep: 1278.12 | bwd_inner_microstep: 1278.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1901
[2024-06-10 18:19:33,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.63 | bwd_microstep: 685.72 | bwd_inner_microstep: 685.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 18:19:34,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 792.91 | bwd_inner_microstep: 792.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-10 18:19:36,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.22 | bwd_microstep: 1442.00 | bwd_inner_microstep: 1441.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2387
[2024-06-10 18:19:37,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.67 | bwd_microstep: 934.68 | bwd_inner_microstep: 934.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3640
[2024-06-10 18:19:39,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 18:19:41,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700
[2024-06-10 18:19:43,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.06 | bwd_microstep: 1425.88 | bwd_inner_microstep: 1425.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3631
[2024-06-10 18:19:45,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1348.07 | bwd_inner_microstep: 1348.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 18:19:47,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.85 | bwd_microstep: 1559.64 | bwd_inner_microstep: 1559.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2375
[2024-06-10 18:19:49,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.74 | bwd_microstep: 934.53 | bwd_inner_microstep: 934.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442
[2024-06-10 18:19:50,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.02 | bwd_microstep: 1157.22 | bwd_inner_microstep: 1157.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 18:19:52,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1511.31 | bwd_inner_microstep: 1511.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 18:19:54,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.53 | bwd_microstep: 1313.68 | bwd_inner_microstep: 1313.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 18:19:56,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.70 | bwd_microstep: 1501.46 | bwd_inner_microstep: 1501.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 18:19:58,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.06 | bwd_microstep: 1158.60 | bwd_inner_microstep: 1158.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 18:20:00,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1251.09 | bwd_inner_microstep: 1251.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3467
[2024-06-10 18:20:01,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.13 | bwd_microstep: 1246.06 | bwd_inner_microstep: 1246.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434
[2024-06-10 18:20:03,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.77 | bwd_microstep: 1375.67 | bwd_inner_microstep: 1375.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3601
[2024-06-10 18:20:05,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.08 | bwd_microstep: 1706.96 | bwd_inner_microstep: 1706.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229
[2024-06-10 18:20:07,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.07 | bwd_microstep: 959.55 | bwd_inner_microstep: 959.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-10 18:20:08,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.72 | bwd_microstep: 968.79 | bwd_inner_microstep: 968.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3645
[2024-06-10 18:20:10,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.79 | bwd_microstep: 1680.35 | bwd_inner_microstep: 1680.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 18:20:16,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 18:20:16,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 5127.50 | bwd_inner_microstep: 1579.81 | bwd_allreduce_microstep: 3547.64 | step_microstep: 38.07
[2024-06-10 18:20:16,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15631.98 | bwd: 45394.43 | bwd_inner: 41845.89 | bwd_allreduce: 3547.87 | step: 39.57
{'loss': 1.185, 'learning_rate': 1.491431329692751e-05, 'epoch': 0.59}


 59%|█████▉    | 1021/1726 [17:37:41<12:00:21, 61.31s/it]
 59%|█████▉    | 1022/1726 [17:38:44<12:03:59, 61.70s/it]
 59%|█████▉    | 1023/1726 [17:39:48<12:12:11, 62.49s/it]
 59%|█████▉    | 1024/1726 [17:40:48<12:04:21, 61.91s/it]
 59%|█████▉    | 1025/1726 [17:41:52<12:07:30, 62.27s/it]
 59%|█████▉    | 1026/1726 [17:42:53<12:03:18, 62.00s/it]


dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3411
[2024-06-10 18:20:18,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.28 | bwd_microstep: 1368.69 | bwd_inner_microstep: 1368.54 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3923
[2024-06-10 18:20:20,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1488.22 | bwd_inner_microstep: 1488.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 18:20:21,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.52 | bwd_microstep: 786.22 | bwd_inner_microstep: 786.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 18:20:23,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1287.44 | bwd_inner_microstep: 1287.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 18:20:25,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.22 | bwd_microstep: 1249.94 | bwd_inner_microstep: 1249.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-10 18:20:26,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.04 | bwd_microstep: 1185.63 | bwd_inner_microstep: 1185.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2806
[2024-06-10 18:20:28,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.31 | bwd_microstep: 1203.32 | bwd_inner_microstep: 1203.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2246
[2024-06-10 18:20:29,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.44 | bwd_microstep: 965.83 | bwd_inner_microstep: 965.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 18:20:31,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1243.07 | bwd_inner_microstep: 1243.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 18:20:33,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1385.74 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3692
[2024-06-10 18:20:35,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.73 | bwd_microstep: 1546.72 | bwd_inner_microstep: 1546.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3705
[2024-06-10 18:20:37,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1483.70 | bwd_inner_microstep: 1483.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3651
[2024-06-10 18:20:39,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.31 | bwd_microstep: 1541.43 | bwd_inner_microstep: 1541.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 18:20:41,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.77 | bwd_microstep: 1381.76 | bwd_inner_microstep: 1381.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 867
[2024-06-10 18:20:42,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.82 | bwd_microstep: 365.06 | bwd_inner_microstep: 365.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2435
[2024-06-10 18:20:43,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.51 | bwd_microstep: 944.92 | bwd_inner_microstep: 944.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 18:20:45,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.15 | bwd_microstep: 1286.67 | bwd_inner_microstep: 1286.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 18:20:47,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.88 | bwd_microstep: 1350.65 | bwd_inner_microstep: 1350.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-10 18:20:48,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.36 | bwd_microstep: 800.24 | bwd_inner_microstep: 800.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2085
[2024-06-10 18:20:49,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.18 | bwd_microstep: 916.22 | bwd_inner_microstep: 916.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 18:20:51,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1402.39 | bwd_inner_microstep: 1402.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2057
[2024-06-10 18:20:52,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.44 | bwd_microstep: 940.97 | bwd_inner_microstep: 940.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2379
[2024-06-10 18:20:54,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.02 | bwd_microstep: 1027.82 | bwd_inner_microstep: 1027.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3516
[2024-06-10 18:20:56,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.66 | bwd_microstep: 1545.34 | bwd_inner_microstep: 1545.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 18:20:58,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.31 | bwd_microstep: 1488.54 | bwd_inner_microstep: 1488.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3594
[2024-06-10 18:21:00,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.53 | bwd_microstep: 1367.56 | bwd_inner_microstep: 1367.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830
[2024-06-10 18:21:02,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.50 | bwd_microstep: 1752.16 | bwd_inner_microstep: 1752.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3790
[2024-06-10 18:21:05,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.54 | bwd_microstep: 1848.86 | bwd_inner_microstep: 1848.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3777
[2024-06-10 18:21:07,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.76 | bwd_microstep: 1609.47 | bwd_inner_microstep: 1609.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1890
[2024-06-10 18:21:08,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.59 | bwd_microstep: 775.74 | bwd_inner_microstep: 775.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2193
[2024-06-10 18:21:09,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.78 | bwd_microstep: 954.10 | bwd_inner_microstep: 954.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-10 18:21:22,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.59 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 18:21:22,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.01 | bwd_microstep: 11924.49 | bwd_inner_microstep: 1456.81 | bwd_allreduce_microstep: 10467.62 | step_microstep: 39.11
[2024-06-10 18:21:22,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14890.35 | bwd: 50418.93 | bwd_inner: 39950.28 | bwd_allreduce: 10467.91 | step: 40.64
{'loss': 1.1816, 'learning_rate': 1.4878022071681368e-05, 'epoch': 0.59}
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3398
[2024-06-10 18:21:24,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.13 | bwd_microstep: 1378.31 | bwd_inner_microstep: 1378.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3442
[2024-06-10 18:21:26,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.65 | bwd_microstep: 1400.42 | bwd_inner_microstep: 1400.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 18:21:28,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1395.23 | bwd_inner_microstep: 1395.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2348
[2024-06-10 18:21:29,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.42 | bwd_microstep: 918.69 | bwd_inner_microstep: 918.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-10 18:21:31,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.85 | bwd_microstep: 1444.87 | bwd_inner_microstep: 1444.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 18:21:33,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1374.41 | bwd_inner_microstep: 1374.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 18:21:35,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.88 | bwd_microstep: 1339.32 | bwd_inner_microstep: 1339.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 18:21:37,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1525.65 | bwd_inner_microstep: 1525.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 18:21:38,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.35 | bwd_microstep: 1146.26 | bwd_inner_microstep: 1146.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3696
[2024-06-10 18:21:40,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.67 | bwd_microstep: 1427.30 | bwd_inner_microstep: 1427.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 18:21:41,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 793.26 | bwd_inner_microstep: 793.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 18:21:43,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1342.78 | bwd_inner_microstep: 1342.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1468
[2024-06-10 18:21:44,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 199.22 | bwd_microstep: 514.87 | bwd_inner_microstep: 514.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 18:21:46,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.38 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 18:21:48,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 18:21:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1491.35 | bwd_inner_microstep: 1491.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 18:21:52,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1242.72 | bwd_inner_microstep: 1242.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 18:21:54,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1489.80 | bwd_inner_microstep: 1489.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3475
[2024-06-10 18:21:56,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.77 | bwd_microstep: 1506.57 | bwd_inner_microstep: 1506.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1967
[2024-06-10 18:21:57,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.00 | bwd_microstep: 764.37 | bwd_inner_microstep: 764.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 18:21:59,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.04 | bwd_microstep: 1493.51 | bwd_inner_microstep: 1493.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 527
[2024-06-10 18:21:59,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 96.71 | bwd_microstep: 241.94 | bwd_inner_microstep: 241.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 18:22:01,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.70 | bwd_microstep: 1490.27 | bwd_inner_microstep: 1490.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3550
[2024-06-10 18:22:03,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1262.96 | bwd_inner_microstep: 1262.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957
[2024-06-10 18:22:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 702.52 | bwd_inner_microstep: 702.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3579
[2024-06-10 18:22:06,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1426.99 | bwd_inner_microstep: 1426.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2292
[2024-06-10 18:22:07,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.20 | bwd_microstep: 1007.80 | bwd_inner_microstep: 1007.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3733
[2024-06-10 18:22:09,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1464.50 | bwd_inner_microstep: 1464.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 18:22:11,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.65 | bwd_microstep: 1481.93 | bwd_inner_microstep: 1481.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2425
[2024-06-10 18:22:13,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.94 | bwd_microstep: 1042.35 | bwd_inner_microstep: 1042.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2585
[2024-06-10 18:22:15,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.52 | bwd_microstep: 1161.23 | bwd_inner_microstep: 1161.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3578
[2024-06-10 18:22:23,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.39 | optimizer_step: 6.62
[2024-06-10 18:22:23,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 7608.32 | bwd_inner_microstep: 1651.65 | bwd_allreduce_microstep: 5956.59 | step_microstep: 40.04
[2024-06-10 18:22:23,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14847.81 | bwd: 45744.91 | bwd_inner: 39787.38 | bwd_allreduce: 5956.83 | step: 41.61
{'loss': 1.2242, 'learning_rate': 1.4841748886014866e-05, 'epoch': 0.6}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 18:22:25,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.68 | bwd_microstep: 1373.11 | bwd_inner_microstep: 1373.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882
[2024-06-10 18:22:26,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.64 | bwd_microstep: 679.80 | bwd_inner_microstep: 679.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2366
[2024-06-10 18:22:27,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.38 | bwd_microstep: 888.06 | bwd_inner_microstep: 888.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850
[2024-06-10 18:22:29,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1455.53 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 18:22:31,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.68 | bwd_microstep: 1276.19 | bwd_inner_microstep: 1276.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1760
[2024-06-10 18:22:31,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 243.11 | bwd_microstep: 624.43 | bwd_inner_microstep: 624.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 18:22:33,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1394.15 | bwd_inner_microstep: 1394.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 18:22:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1244.62 | bwd_inner_microstep: 1244.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 18:22:37,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1385.45 | bwd_inner_microstep: 1385.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 18:22:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.67 | bwd_microstep: 1380.96 | bwd_inner_microstep: 1380.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3441
[2024-06-10 18:22:41,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.29 | bwd_microstep: 1319.37 | bwd_inner_microstep: 1319.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3673
[2024-06-10 18:22:43,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.47 | bwd_microstep: 1821.68 | bwd_inner_microstep: 1821.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3693
[2024-06-10 18:22:46,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.36 | bwd_microstep: 1658.59 | bwd_inner_microstep: 1658.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2125
[2024-06-10 18:22:47,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.38 | bwd_microstep: 858.60 | bwd_inner_microstep: 858.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-10 18:22:48,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1153.89 | bwd_inner_microstep: 1153.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 18:22:50,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1282.68 | bwd_inner_microstep: 1282.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638
[2024-06-10 18:22:52,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.54 | bwd_microstep: 1311.74 | bwd_inner_microstep: 1311.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-10 18:22:54,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.55 | bwd_microstep: 1191.61 | bwd_inner_microstep: 1191.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3818
[2024-06-10 18:22:55,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.69 | bwd_microstep: 1258.90 | bwd_inner_microstep: 1258.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 18:22:56,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.78 | bwd_microstep: 696.82 | bwd_inner_microstep: 696.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 18:22:58,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1497.21 | bwd_inner_microstep: 1497.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 18:23:00,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.88 | bwd_microstep: 818.65 | bwd_inner_microstep: 818.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452
[2024-06-10 18:23:01,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.75 | bwd_microstep: 1191.07 | bwd_inner_microstep: 1191.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 18:23:03,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.49 | bwd_microstep: 1657.10 | bwd_inner_microstep: 1657.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 18:23:05,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 1459.30 | bwd_inner_microstep: 1459.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3608
[2024-06-10 18:23:07,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.73 | bwd_microstep: 1310.70 | bwd_inner_microstep: 1310.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 18:23:09,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1396.83 | bwd_inner_microstep: 1396.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 18:23:11,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.37 | bwd_microstep: 1452.46 | bwd_inner_microstep: 1452.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033
[2024-06-10 18:23:12,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.00 | bwd_microstep: 809.97 | bwd_inner_microstep: 809.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2068
[2024-06-10 18:23:14,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.78 | bwd_microstep: 867.47 | bwd_inner_microstep: 867.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 18:23:15,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.54 | bwd_microstep: 1352.95 | bwd_inner_microstep: 1352.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3615
[2024-06-10 18:23:23,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.61
[2024-06-10 18:23:23,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.83 | bwd_microstep: 6987.14 | bwd_inner_microstep: 1503.94 | bwd_allreduce_microstep: 5483.14 | step_microstep: 38.57
[2024-06-10 18:23:23,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14820.77 | bwd: 45057.04 | bwd_inner: 39572.99 | bwd_allreduce: 5483.37 | step: 40.03
{'loss': 1.2432, 'learning_rate': 1.4805493867681969e-05, 'epoch': 0.6}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 18:23:25,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1470.99 | bwd_inner_microstep: 1470.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2431
[2024-06-10 18:23:26,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 967.75 | bwd_inner_microstep: 967.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2433
[2024-06-10 18:23:28,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.40 | bwd_microstep: 974.20 | bwd_inner_microstep: 974.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 18:23:30,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.85 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2251
[2024-06-10 18:23:31,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.78 | bwd_microstep: 962.81 | bwd_inner_microstep: 962.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 18:23:33,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.48 | bwd_microstep: 1243.88 | bwd_inner_microstep: 1243.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3724
[2024-06-10 18:23:35,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.88 | bwd_microstep: 1362.60 | bwd_inner_microstep: 1362.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-10 18:23:36,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.16 | bwd_microstep: 1425.14 | bwd_inner_microstep: 1425.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 18:23:37,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.87 | bwd_microstep: 702.63 | bwd_inner_microstep: 702.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 18:23:39,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1296.80 | bwd_inner_microstep: 1296.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1984
[2024-06-10 18:23:40,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.55 | bwd_microstep: 827.32 | bwd_inner_microstep: 827.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2456
[2024-06-10 18:23:42,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.29 | bwd_microstep: 948.12 | bwd_inner_microstep: 948.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1968
[2024-06-10 18:23:43,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.76 | bwd_microstep: 822.30 | bwd_inner_microstep: 822.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-10 18:23:45,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.98 | bwd_microstep: 1578.76 | bwd_inner_microstep: 1578.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 18:23:47,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1508.85 | bwd_inner_microstep: 1508.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 18:23:48,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.00 | bwd_microstep: 698.92 | bwd_inner_microstep: 698.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632
[2024-06-10 18:23:50,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 18:23:52,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.04 | bwd_microstep: 1255.57 | bwd_inner_microstep: 1255.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2104
[2024-06-10 18:23:53,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.30 | bwd_microstep: 826.26 | bwd_inner_microstep: 826.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 18:23:55,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.61 | bwd_microstep: 1408.54 | bwd_inner_microstep: 1408.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3705
[2024-06-10 18:23:57,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.31 | bwd_microstep: 1267.10 | bwd_inner_microstep: 1267.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533
[2024-06-10 18:23:59,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.83 | bwd_microstep: 1417.05 | bwd_inner_microstep: 1417.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 18:24:00,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.05 | bwd_microstep: 1295.59 | bwd_inner_microstep: 1295.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797
[2024-06-10 18:24:03,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1549.80 | bwd_inner_microstep: 1549.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 18:24:05,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1557.61 | bwd_inner_microstep: 1557.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1931
[2024-06-10 18:24:06,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.05 | bwd_microstep: 727.58 | bwd_inner_microstep: 727.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068
[2024-06-10 18:24:07,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.29 | bwd_microstep: 913.47 | bwd_inner_microstep: 913.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3634
[2024-06-10 18:24:09,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.66 | bwd_microstep: 1639.07 | bwd_inner_microstep: 1639.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3604
[2024-06-10 18:24:11,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1463.63 | bwd_inner_microstep: 1463.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807
[2024-06-10 18:24:13,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.43 | bwd_microstep: 1614.07 | bwd_inner_microstep: 1614.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3781
[2024-06-10 18:24:15,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.49 | bwd_microstep: 1449.58 | bwd_inner_microstep: 1449.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3578
[2024-06-10 18:24:23,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 18:24:23,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.67 | bwd_microstep: 6493.27 | bwd_inner_microstep: 1770.40 | bwd_allreduce_microstep: 4722.81 | step_microstep: 37.95
[2024-06-10 18:24:23,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14841.57 | bwd: 44495.22 | bwd_inner: 39771.50 | bwd_allreduce: 4723.04 | step: 39.46
{'loss': 1.1959, 'learning_rate': 1.4769257144372668e-05, 'epoch': 0.6}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473
[2024-06-10 18:24:25,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1402.31 | bwd_inner_microstep: 1402.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 18:24:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.69 | bwd_microstep: 1196.08 | bwd_inner_microstep: 1196.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3882
[2024-06-10 18:24:29,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.91 | bwd_microstep: 1678.81 | bwd_inner_microstep: 1678.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782
[2024-06-10 18:24:30,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.80 | bwd_microstep: 1346.40 | bwd_inner_microstep: 1346.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-10 18:24:33,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.49 | bwd_microstep: 1543.21 | bwd_inner_microstep: 1543.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 18:24:34,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.95 | bwd_microstep: 1380.29 | bwd_inner_microstep: 1380.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-10 18:24:36,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.94 | bwd_microstep: 1153.52 | bwd_inner_microstep: 1153.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4067
[2024-06-10 18:24:38,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.84 | bwd_microstep: 1625.31 | bwd_inner_microstep: 1625.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3701
[2024-06-10 18:24:40,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.29 | bwd_microstep: 1356.21 | bwd_inner_microstep: 1356.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 18:24:42,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.61 | bwd_microstep: 1388.23 | bwd_inner_microstep: 1388.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 18:24:44,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1527.22 | bwd_inner_microstep: 1527.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 18:24:46,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.01 | bwd_microstep: 1281.50 | bwd_inner_microstep: 1281.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 18:24:48,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.04 | bwd_microstep: 1477.81 | bwd_inner_microstep: 1477.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526
[2024-06-10 18:24:50,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.51 | bwd_microstep: 1325.70 | bwd_inner_microstep: 1325.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 18:24:52,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1443.23 | bwd_inner_microstep: 1443.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3701
[2024-06-10 18:24:54,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.25 | bwd_microstep: 1471.85 | bwd_inner_microstep: 1471.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2575
[2024-06-10 18:24:55,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.85 | bwd_microstep: 972.59 | bwd_inner_microstep: 972.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 18:24:57,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.29 | bwd_microstep: 1415.96 | bwd_inner_microstep: 1415.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 18:24:59,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.85 | bwd_microstep: 1251.71 | bwd_inner_microstep: 1251.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540
[2024-06-10 18:25:01,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.31 | bwd_microstep: 1198.31 | bwd_inner_microstep: 1198.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 18:25:02,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.45 | bwd_microstep: 797.76 | bwd_inner_microstep: 797.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 18:25:03,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.13 | bwd_microstep: 1277.80 | bwd_inner_microstep: 1277.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 18:25:06,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 1555.62 | bwd_inner_microstep: 1555.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 18:25:07,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1297.65 | bwd_inner_microstep: 1297.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3675
[2024-06-10 18:25:09,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1374.71 | bwd_inner_microstep: 1374.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3429
[2024-06-10 18:25:11,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.84 | bwd_microstep: 1511.99 | bwd_inner_microstep: 1511.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 18:25:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.30 | bwd_microstep: 1544.36 | bwd_inner_microstep: 1544.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3558
[2024-06-10 18:25:15,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.75 | bwd_microstep: 1440.86 | bwd_inner_microstep: 1440.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2282
[2024-06-10 18:25:17,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.82 | bwd_microstep: 1066.64 | bwd_inner_microstep: 1066.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822
[2024-06-10 18:25:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.83 | bwd_microstep: 1484.49 | bwd_inner_microstep: 1484.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2198
[2024-06-10 18:25:20,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.17 | bwd_microstep: 985.22 | bwd_inner_microstep: 985.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3743
[2024-06-10 18:25:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 18:25:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.12 | bwd_microstep: 4053.22 | bwd_inner_microstep: 2151.31 | bwd_allreduce_microstep: 1901.86 | step_microstep: 37.91
[2024-06-10 18:25:25,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16276.10 | bwd: 45826.63 | bwd_inner: 43923.82 | bwd_allreduce: 1902.11 | step: 39.48
{'loss': 1.2071, 'learning_rate': 1.4733038843712515e-05, 'epoch': 0.6}
 59%|█████▉    | 1026/1726 [17:42:53<12:03:18, 62.00s/it]
 60%|█████▉    | 1027/1726 [17:43:59<12:15:03, 63.09s/it]
 60%|█████▉    | 1028/1726 [17:44:59<12:06:25, 62.44s/it]
 60%|█████▉    | 1029/1726 [17:46:00<11:57:33, 61.77s/it]
 60%|█████▉    | 1030/1726 [17:46:59<11:49:11, 61.14s/it]
 60%|█████▉    | 1031/1726 [17:48:02<11:52:43, 61.53s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 18:25:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1335.51 | bwd_inner_microstep: 1335.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3947
[2024-06-10 18:25:29,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.15 | bwd_microstep: 1689.53 | bwd_inner_microstep: 1689.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 18:25:31,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1395.84 | bwd_inner_microstep: 1395.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 18:25:33,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.51 | bwd_microstep: 1246.44 | bwd_inner_microstep: 1246.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 18:25:35,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1387.03 | bwd_inner_microstep: 1387.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 18:25:37,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.13 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 18:25:38,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.82 | bwd_microstep: 1151.03 | bwd_inner_microstep: 1151.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 18:25:40,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.02 | bwd_microstep: 1388.14 | bwd_inner_microstep: 1388.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768
[2024-06-10 18:25:42,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.08 | bwd_microstep: 1440.25 | bwd_inner_microstep: 1440.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 18:25:44,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.93 | bwd_microstep: 1282.53 | bwd_inner_microstep: 1282.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3523
[2024-06-10 18:25:46,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.73 | bwd_microstep: 1328.92 | bwd_inner_microstep: 1328.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 18:25:48,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.83 | bwd_microstep: 1590.92 | bwd_inner_microstep: 1590.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3491
[2024-06-10 18:25:50,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.58 | bwd_microstep: 1580.03 | bwd_inner_microstep: 1580.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3694
[2024-06-10 18:25:52,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1390.40 | bwd_inner_microstep: 1390.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 18:25:54,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.17 | bwd_microstep: 1178.65 | bwd_inner_microstep: 1178.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082
[2024-06-10 18:25:55,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.44 | bwd_microstep: 820.42 | bwd_inner_microstep: 820.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2032
[2024-06-10 18:25:56,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.99 | bwd_microstep: 715.24 | bwd_inner_microstep: 715.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1987
[2024-06-10 18:25:57,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.47 | bwd_microstep: 862.53 | bwd_inner_microstep: 862.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2289
[2024-06-10 18:25:58,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.04 | bwd_microstep: 911.05 | bwd_inner_microstep: 911.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2311
[2024-06-10 18:26:00,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 885.86 | bwd_inner_microstep: 885.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588
[2024-06-10 18:26:02,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.73 | bwd_microstep: 1607.06 | bwd_inner_microstep: 1607.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484
[2024-06-10 18:26:04,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.08 | bwd_microstep: 1347.88 | bwd_inner_microstep: 1347.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 18:26:06,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.13 | bwd_microstep: 1403.47 | bwd_inner_microstep: 1403.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-10 18:26:08,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.47 | bwd_microstep: 1544.10 | bwd_inner_microstep: 1544.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 18:26:09,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.27 | bwd_microstep: 1289.75 | bwd_inner_microstep: 1289.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2199
[2024-06-10 18:26:11,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.77 | bwd_microstep: 957.12 | bwd_inner_microstep: 957.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 18:26:13,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.61 | bwd_microstep: 1488.16 | bwd_inner_microstep: 1488.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 18:26:15,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1561.13 | bwd_inner_microstep: 1561.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 18:26:17,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1375.32 | bwd_inner_microstep: 1375.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3458
[2024-06-10 18:26:19,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.81 | bwd_microstep: 1408.25 | bwd_inner_microstep: 1408.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 18:26:21,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.10 | bwd_microstep: 1346.96 | bwd_inner_microstep: 1346.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 18:26:26,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.10 | optimizer_step: 6.61
[2024-06-10 18:26:26,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.54 | bwd_microstep: 4366.92 | bwd_inner_microstep: 1813.57 | bwd_allreduce_microstep: 2553.30 | step_microstep: 37.68
[2024-06-10 18:26:26,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15697.91 | bwd: 44619.25 | bwd_inner: 42065.05 | bwd_allreduce: 2553.54 | step: 39.24
{'loss': 1.1826, 'learning_rate': 1.469683909326217e-05, 'epoch': 0.6}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 18:26:28,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1330.63 | bwd_inner_microstep: 1330.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4074
[2024-06-10 18:26:30,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.54 | bwd_microstep: 1721.81 | bwd_inner_microstep: 1721.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3885
[2024-06-10 18:26:32,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1416.16 | bwd_inner_microstep: 1416.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867
[2024-06-10 18:26:35,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.89 | bwd_microstep: 2436.50 | bwd_inner_microstep: 2436.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744
[2024-06-10 18:26:37,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.16 | bwd_microstep: 1461.93 | bwd_inner_microstep: 1461.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 18:26:39,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.31 | bwd_microstep: 1246.31 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 18:26:40,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.94 | bwd_microstep: 1281.66 | bwd_inner_microstep: 1281.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467
[2024-06-10 18:26:42,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.19 | bwd_microstep: 1182.41 | bwd_inner_microstep: 1182.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 18:26:44,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1382.61 | bwd_inner_microstep: 1382.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 18:26:46,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1345.91 | bwd_inner_microstep: 1345.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3668
[2024-06-10 18:26:48,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.97 | bwd_microstep: 1448.67 | bwd_inner_microstep: 1448.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443
[2024-06-10 18:26:50,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.37 | bwd_microstep: 1281.04 | bwd_inner_microstep: 1281.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659
[2024-06-10 18:26:52,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.05 | bwd_microstep: 1716.61 | bwd_inner_microstep: 1716.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 18:26:54,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.11 | bwd_microstep: 1346.16 | bwd_inner_microstep: 1346.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 18:26:56,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.73 | bwd_microstep: 1473.78 | bwd_inner_microstep: 1473.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2095
[2024-06-10 18:26:57,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.94 | bwd_microstep: 927.74 | bwd_inner_microstep: 927.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105
[2024-06-10 18:26:58,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.00 | bwd_microstep: 920.71 | bwd_inner_microstep: 920.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687
[2024-06-10 18:27:00,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.33 | bwd_microstep: 1525.88 | bwd_inner_microstep: 1525.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624
[2024-06-10 18:27:03,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.96 | bwd_microstep: 1535.07 | bwd_inner_microstep: 1535.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-10 18:27:04,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.40 | bwd_microstep: 1352.64 | bwd_inner_microstep: 1352.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686
[2024-06-10 18:27:06,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.86 | bwd_microstep: 1429.17 | bwd_inner_microstep: 1429.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 18:27:08,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.43 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 18:27:10,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.22 | bwd_microstep: 1390.80 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 18:27:12,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.97 | bwd_microstep: 1277.66 | bwd_inner_microstep: 1277.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 18:27:13,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.86 | bwd_microstep: 802.48 | bwd_inner_microstep: 802.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3543
[2024-06-10 18:27:15,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.91 | bwd_microstep: 1442.84 | bwd_inner_microstep: 1442.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2221
[2024-06-10 18:27:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.85 | bwd_microstep: 960.17 | bwd_inner_microstep: 960.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 18:27:18,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.69 | bwd_microstep: 1376.47 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 18:27:21,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1549.96 | bwd_inner_microstep: 1549.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3765
[2024-06-10 18:27:23,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.97 | bwd_microstep: 1681.94 | bwd_inner_microstep: 1681.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 18:27:25,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.04 | bwd_microstep: 1644.46 | bwd_inner_microstep: 1644.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 18:27:27,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.16 | optimizer_step: 6.63
[2024-06-10 18:27:27,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1325.19 | bwd_inner_microstep: 1316.62 | bwd_allreduce_microstep: 8.53 | step_microstep: 37.64
[2024-06-10 18:27:27,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16365.48 | bwd: 44665.08 | bwd_inner: 44655.65 | bwd_allreduce: 8.76 | step: 39.08
{'loss': 1.2214, 'learning_rate': 1.4660658020516966e-05, 'epoch': 0.6}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2462
[2024-06-10 18:27:28,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.31 | bwd_microstep: 1034.88 | bwd_inner_microstep: 1034.81 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 18:27:30,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.37 | bwd_microstep: 1274.35 | bwd_inner_microstep: 1274.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 18:27:32,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.45 | bwd_microstep: 1342.73 | bwd_inner_microstep: 1342.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2231
[2024-06-10 18:27:33,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.33 | bwd_microstep: 769.01 | bwd_inner_microstep: 768.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 18:27:35,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.18 | bwd_microstep: 1632.62 | bwd_inner_microstep: 1632.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-10 18:27:37,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.98 | bwd_microstep: 1187.59 | bwd_inner_microstep: 1187.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 18:27:39,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1393.70 | bwd_inner_microstep: 1393.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 18:27:41,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.92 | bwd_microstep: 1189.46 | bwd_inner_microstep: 1189.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 18:27:42,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1250.43 | bwd_inner_microstep: 1250.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 18:27:44,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1483.82 | bwd_inner_microstep: 1483.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 18:27:46,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1278.37 | bwd_inner_microstep: 1278.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3428
[2024-06-10 18:27:48,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.67 | bwd_microstep: 1301.24 | bwd_inner_microstep: 1301.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 18:27:50,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1253.35 | bwd_inner_microstep: 1253.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3637
[2024-06-10 18:27:52,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1539.86 | bwd_inner_microstep: 1539.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 18:27:53,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.14 | bwd_microstep: 795.42 | bwd_inner_microstep: 795.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2490
[2024-06-10 18:27:54,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.06 | bwd_microstep: 954.16 | bwd_inner_microstep: 954.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820
[2024-06-10 18:27:56,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.14 | bwd_microstep: 1514.05 | bwd_inner_microstep: 1514.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 18:27:59,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.08 | bwd_microstep: 1658.21 | bwd_inner_microstep: 1658.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 18:28:01,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.56 | bwd_microstep: 1487.23 | bwd_inner_microstep: 1487.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 18:28:02,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.35 | bwd_microstep: 1284.60 | bwd_inner_microstep: 1284.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 18:28:04,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 1395.38 | bwd_inner_microstep: 1395.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275
[2024-06-10 18:28:06,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.79 | bwd_microstep: 877.11 | bwd_inner_microstep: 877.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 18:28:07,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 803.56 | bwd_inner_microstep: 803.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479
[2024-06-10 18:28:09,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.75 | bwd_microstep: 1313.55 | bwd_inner_microstep: 1313.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2687
[2024-06-10 18:28:10,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.01 | bwd_microstep: 1222.09 | bwd_inner_microstep: 1222.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3559
[2024-06-10 18:28:12,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.76 | bwd_microstep: 1250.44 | bwd_inner_microstep: 1250.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1934
[2024-06-10 18:28:13,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.14 | bwd_microstep: 761.24 | bwd_inner_microstep: 761.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 18:28:15,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.59 | bwd_microstep: 1493.54 | bwd_inner_microstep: 1493.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3425
[2024-06-10 18:28:17,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.98 | bwd_microstep: 1541.44 | bwd_inner_microstep: 1541.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572
[2024-06-10 18:28:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.99 | bwd_microstep: 1594.27 | bwd_inner_microstep: 1594.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-10 18:28:21,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.93 | bwd_microstep: 1506.27 | bwd_inner_microstep: 1506.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3562
[2024-06-10 18:28:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-10 18:28:29,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 6697.75 | bwd_inner_microstep: 1713.46 | bwd_allreduce_microstep: 4984.22 | step_microstep: 38.86
[2024-06-10 18:28:29,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15354.46 | bwd: 46081.76 | bwd_inner: 41096.56 | bwd_allreduce: 4984.49 | step: 40.36
{'loss': 1.2168, 'learning_rate': 1.4624495752906472e-05, 'epoch': 0.6}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2013
[2024-06-10 18:28:30,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.52 | bwd_microstep: 890.37 | bwd_inner_microstep: 890.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3973
[2024-06-10 18:28:32,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.63 | bwd_microstep: 1306.55 | bwd_inner_microstep: 1306.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3856
[2024-06-10 18:28:34,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.67 | bwd_microstep: 1553.70 | bwd_inner_microstep: 1553.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 18:28:36,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1376.02 | bwd_inner_microstep: 1376.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 18:28:38,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1480.30 | bwd_inner_microstep: 1480.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 18:28:40,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1244.06 | bwd_inner_microstep: 1244.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 18:28:41,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.54 | bwd_microstep: 1247.29 | bwd_inner_microstep: 1247.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 18:28:43,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.88 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 18:28:45,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1290.08 | bwd_inner_microstep: 1290.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 18:28:47,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1288.79 | bwd_inner_microstep: 1288.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 18:28:49,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1344.16 | bwd_inner_microstep: 1344.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448
[2024-06-10 18:28:51,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1410.91 | bwd_inner_microstep: 1410.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3569
[2024-06-10 18:28:53,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.70 | bwd_microstep: 1457.52 | bwd_inner_microstep: 1457.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 18:28:54,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.92 | bwd_microstep: 1341.03 | bwd_inner_microstep: 1341.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2350
[2024-06-10 18:28:56,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.51 | bwd_microstep: 990.36 | bwd_inner_microstep: 990.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103
[2024-06-10 18:28:57,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.26 | bwd_microstep: 921.54 | bwd_inner_microstep: 921.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 18:28:59,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.52 | bwd_microstep: 1339.38 | bwd_inner_microstep: 1339.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3640
[2024-06-10 18:29:01,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1472.91 | bwd_inner_microstep: 1472.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 18:29:03,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1292.51 | bwd_inner_microstep: 1292.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 18:29:05,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1459.01 | bwd_inner_microstep: 1458.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 643
[2024-06-10 18:29:05,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 108.95 | bwd_microstep: 274.14 | bwd_inner_microstep: 274.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834
[2024-06-10 18:29:07,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.68 | bwd_microstep: 1660.00 | bwd_inner_microstep: 1659.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 18:29:10,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1555.94 | bwd_inner_microstep: 1555.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 18:29:12,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.64 | bwd_microstep: 1657.47 | bwd_inner_microstep: 1657.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 18:29:14,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1557.30 | bwd_inner_microstep: 1557.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 18:29:16,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1395.03 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-10 18:29:17,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.43 | bwd_microstep: 809.15 | bwd_inner_microstep: 809.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 18:29:19,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.85 | bwd_microstep: 1423.65 | bwd_inner_microstep: 1423.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 18:29:21,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1494.94 | bwd_inner_microstep: 1494.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 18:29:23,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.34 | bwd_microstep: 1550.70 | bwd_inner_microstep: 1550.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583
[2024-06-10 18:29:25,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.10 | bwd_microstep: 1524.41 | bwd_inner_microstep: 1524.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 18:29:30,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 18:29:30,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.90 | bwd_microstep: 3627.24 | bwd_inner_microstep: 1750.24 | bwd_allreduce_microstep: 1876.93 | step_microstep: 38.15
[2024-06-10 18:29:30,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15895.33 | bwd: 44520.06 | bwd_inner: 42642.18 | bwd_allreduce: 1877.17 | step: 39.74
{'loss': 1.1991, 'learning_rate': 1.4588352417793976e-05, 'epoch': 0.6}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 18:29:32,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.08 | bwd_microstep: 1488.81 | bwd_inner_microstep: 1488.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3984
[2024-06-10 18:29:34,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.39 | bwd_microstep: 1703.21 | bwd_inner_microstep: 1703.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3476
[2024-06-10 18:29:36,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.98 | bwd_microstep: 1212.79 | bwd_inner_microstep: 1212.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 18:29:37,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.44 | bwd_microstep: 1313.08 | bwd_inner_microstep: 1313.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 18:29:39,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.24 | bwd_microstep: 1380.38 | bwd_inner_microstep: 1380.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 18:29:41,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.66 | bwd_microstep: 1389.99 | bwd_inner_microstep: 1389.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 18:29:43,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.44 | bwd_microstep: 1149.75 | bwd_inner_microstep: 1149.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 18:29:45,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1377.35 | bwd_inner_microstep: 1377.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-10 18:29:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.62 | bwd_microstep: 1153.56 | bwd_inner_microstep: 1153.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1900
[2024-06-10 18:29:47,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.31 | bwd_microstep: 748.49 | bwd_inner_microstep: 748.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3507
[2024-06-10 18:29:49,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1437.50 | bwd_inner_microstep: 1437.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1950
[2024-06-10 18:29:50,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.43 | bwd_microstep: 728.42 | bwd_inner_microstep: 728.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2493
[2024-06-10 18:29:52,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.93 | bwd_microstep: 1119.14 | bwd_inner_microstep: 1119.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 18:29:54,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-10 18:29:56,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.65 | bwd_microstep: 1276.60 | bwd_inner_microstep: 1276.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 18:29:58,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.30 | bwd_microstep: 1524.38 | bwd_inner_microstep: 1524.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3643
[2024-06-10 18:30:00,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.68 | bwd_microstep: 1680.91 | bwd_inner_microstep: 1680.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 18:30:02,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1396.08 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 18:30:04,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.93 | bwd_microstep: 1525.91 | bwd_inner_microstep: 1525.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 18:30:06,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.47 | bwd_microstep: 1490.57 | bwd_inner_microstep: 1490.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 18:30:08,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1296.12 | bwd_inner_microstep: 1296.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3036
[2024-06-10 18:30:10,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.79 | bwd_microstep: 1230.31 | bwd_inner_microstep: 1230.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3642
[2024-06-10 18:30:12,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.58 | bwd_microstep: 1437.86 | bwd_inner_microstep: 1437.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3593
[2024-06-10 18:30:13,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.82 | bwd_microstep: 1355.04 | bwd_inner_microstep: 1355.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 18:30:16,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1497.86 | bwd_inner_microstep: 1497.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3032
[2024-06-10 18:30:17,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.61 | bwd_microstep: 1170.48 | bwd_inner_microstep: 1170.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 18:30:19,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.81 | bwd_microstep: 1470.94 | bwd_inner_microstep: 1470.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805
[2024-06-10 18:30:22,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.58 | bwd_microstep: 1752.40 | bwd_inner_microstep: 1752.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 18:30:23,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1345.86 | bwd_inner_microstep: 1345.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384
[2024-06-10 18:30:25,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.45 | bwd_microstep: 1430.93 | bwd_inner_microstep: 1430.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438
[2024-06-10 18:30:27,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.81 | bwd_microstep: 1454.96 | bwd_inner_microstep: 1454.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 18:30:32,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 18:30:32,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 4132.50 | bwd_inner_microstep: 1699.53 | bwd_allreduce_microstep: 2432.92 | step_microstep: 37.95
[2024-06-10 18:30:32,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16250.98 | bwd: 46020.35 | bwd_inner: 43586.52 | bwd_allreduce: 2433.15 | step: 39.44
{'loss': 1.2325, 'learning_rate': 1.4552228142476138e-05, 'epoch': 0.6}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 18:30:34,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.22 | bwd_microstep: 1379.09 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3883
[2024-06-10 18:30:36,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.34 | bwd_microstep: 1578.19 | bwd_inner_microstep: 1578.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 18:30:38,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.30 | bwd_microstep: 1478.93 | bwd_inner_microstep: 1478.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 18:30:40,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1287.90 | bwd_inner_microstep: 1287.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 18:30:42,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.28 | bwd_microstep: 1436.46 | bwd_inner_microstep: 1436.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4075
[2024-06-10 18:30:44,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.36 | bwd_microstep: 1724.27 | bwd_inner_microstep: 1724.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 18:30:46,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.91 | bwd_microstep: 793.35 | bwd_inner_microstep: 793.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 18:30:47,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1381.99 | bwd_inner_microstep: 1381.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3425
[2024-06-10 18:30:49,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.42 | bwd_microstep: 1201.97 | bwd_inner_microstep: 1201.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2897
[2024-06-10 18:30:51,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.56 | bwd_microstep: 1133.91 | bwd_inner_microstep: 1133.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-10 18:30:52,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.93 | bwd_microstep: 891.47 | bwd_inner_microstep: 891.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3383
[2024-06-10 18:30:54,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.69 | bwd_microstep: 1271.90 | bwd_inner_microstep: 1271.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687
[2024-06-10 18:30:56,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.66 | bwd_microstep: 1624.72 | bwd_inner_microstep: 1624.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-10 18:30:58,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.08 | bwd_microstep: 1618.24 | bwd_inner_microstep: 1618.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3512
[2024-06-10 18:31:00,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1549.56 | bwd_inner_microstep: 1549.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3631
[2024-06-10 18:31:03,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.44 | bwd_microstep: 1710.38 | bwd_inner_microstep: 1710.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 18:31:04,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.05 | bwd_microstep: 795.37 | bwd_inner_microstep: 795.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-10 18:31:05,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.39 | bwd_microstep: 1158.34 | bwd_inner_microstep: 1158.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 18:31:06,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.21 | bwd_microstep: 696.01 | bwd_inner_microstep: 695.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 18:31:08,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1378.43 | bwd_inner_microstep: 1378.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 18:31:10,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.98 | bwd_microstep: 1487.27 | bwd_inner_microstep: 1487.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 18:31:12,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.85 | bwd_microstep: 1391.82 | bwd_inner_microstep: 1391.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3527
[2024-06-10 18:31:14,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1258.65 | bwd_inner_microstep: 1258.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 18:31:15,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.89 | bwd_microstep: 878.66 | bwd_inner_microstep: 878.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-10 18:31:17,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.21 | bwd_microstep: 1286.82 | bwd_inner_microstep: 1286.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-10 18:31:19,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.60 | bwd_microstep: 1311.23 | bwd_inner_microstep: 1311.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064
[2024-06-10 18:31:20,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.83 | bwd_microstep: 818.03 | bwd_inner_microstep: 818.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3838
[2024-06-10 18:31:22,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.39 | bwd_microstep: 1484.37 | bwd_inner_microstep: 1484.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 18:31:24,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.74 | bwd_microstep: 1300.32 | bwd_inner_microstep: 1300.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-10 18:31:26,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.04 | bwd_microstep: 1622.94 | bwd_inner_microstep: 1622.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 18:31:28,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1372.95 | bwd_inner_microstep: 1372.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583
[2024-06-10 18:31:35,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 18:31:35,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.30 | bwd_microstep: 6270.41 | bwd_inner_microstep: 1646.12 | bwd_allreduce_microstep: 4624.24 | step_microstep: 37.78
[2024-06-10 18:31:35,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15658.50 | bwd: 46573.98 | bwd_inner: 41948.79 | bwd_allreduce: 4624.48 | step: 39.37
 60%|█████▉    | 1032/1726 [17:49:02<11:48:38, 61.27s/it]
 60%|█████▉    | 1033/1726 [17:50:04<11:47:58, 61.30s/it]
 60%|█████▉    | 1034/1726 [17:51:06<11:48:33, 61.44s/it]
 60%|█████▉    | 1035/1726 [17:52:06<11:45:10, 61.23s/it]
 60%|██████    | 1036/1726 [17:53:09<11:48:53, 61.64s/it]
{'loss': 1.2181, 'learning_rate': 1.4516123054182457e-05, 'epoch': 0.6}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1939
[2024-06-10 18:31:36,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.14 | bwd_microstep: 836.43 | bwd_inner_microstep: 836.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 18:31:38,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1491.89 | bwd_inner_microstep: 1491.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 18:31:40,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.36 | bwd_microstep: 1551.20 | bwd_inner_microstep: 1551.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509
[2024-06-10 18:31:42,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.08 | bwd_microstep: 1429.10 | bwd_inner_microstep: 1429.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 18:31:44,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1341.47 | bwd_inner_microstep: 1341.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 18:31:46,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181
[2024-06-10 18:31:47,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.29 | bwd_microstep: 949.42 | bwd_inner_microstep: 949.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 18:31:49,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.47 | bwd_microstep: 1242.39 | bwd_inner_microstep: 1242.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 18:31:51,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1481.36 | bwd_inner_microstep: 1481.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 18:31:52,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.52 | bwd_microstep: 794.37 | bwd_inner_microstep: 794.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 18:31:54,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1251.20 | bwd_inner_microstep: 1251.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3501
[2024-06-10 18:31:56,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.56 | bwd_microstep: 1405.96 | bwd_inner_microstep: 1405.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3433
[2024-06-10 18:31:57,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.12 | bwd_microstep: 1295.44 | bwd_inner_microstep: 1295.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2042
[2024-06-10 18:31:59,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.13 | bwd_microstep: 829.56 | bwd_inner_microstep: 829.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2955
[2024-06-10 18:32:00,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.87 | bwd_microstep: 1196.66 | bwd_inner_microstep: 1196.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3702
[2024-06-10 18:32:02,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1557.62 | bwd_inner_microstep: 1557.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 18:32:04,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1381.97 | bwd_inner_microstep: 1381.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3538
[2024-06-10 18:32:06,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.63 | bwd_microstep: 1360.88 | bwd_inner_microstep: 1360.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3697
[2024-06-10 18:32:08,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.41 | bwd_microstep: 1261.80 | bwd_inner_microstep: 1261.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 18:32:10,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.86 | bwd_microstep: 1495.75 | bwd_inner_microstep: 1495.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1049
[2024-06-10 18:32:11,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 155.53 | bwd_microstep: 401.85 | bwd_inner_microstep: 401.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 18:32:13,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1399.36 | bwd_inner_microstep: 1399.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 18:32:14,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.85 | bwd_microstep: 1392.82 | bwd_inner_microstep: 1392.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3523
[2024-06-10 18:32:16,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1227.44 | bwd_inner_microstep: 1227.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2133
[2024-06-10 18:32:17,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.58 | bwd_microstep: 831.58 | bwd_inner_microstep: 831.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 18:32:18,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.43 | bwd_microstep: 805.14 | bwd_inner_microstep: 805.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 18:32:20,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.49 | bwd_microstep: 1402.31 | bwd_inner_microstep: 1402.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 18:32:22,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1508.65 | bwd_inner_microstep: 1508.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 18:32:24,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1406.38 | bwd_inner_microstep: 1406.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 18:32:26,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.67 | bwd_microstep: 1511.76 | bwd_inner_microstep: 1511.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3818
[2024-06-10 18:32:29,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.09 | bwd_microstep: 1691.20 | bwd_inner_microstep: 1691.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3557
[2024-06-10 18:32:38,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.09 | optimizer_step: 6.61
[2024-06-10 18:32:38,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.78 | bwd_microstep: 8639.44 | bwd_inner_microstep: 1841.57 | bwd_allreduce_microstep: 6797.81 | step_microstep: 37.80
[2024-06-10 18:32:38,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15264.14 | bwd: 47753.02 | bwd_inner: 40954.29 | bwd_allreduce: 6798.04 | step: 39.28
{'loss': 1.2485, 'learning_rate': 1.4480037280074876e-05, 'epoch': 0.6}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 18:32:40,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1377.94 | bwd_inner_microstep: 1377.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 18:32:42,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1376.15 | bwd_inner_microstep: 1376.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352
[2024-06-10 18:32:43,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.81 | bwd_microstep: 984.11 | bwd_inner_microstep: 984.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4263
[2024-06-10 18:32:46,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.31 | bwd_microstep: 1663.87 | bwd_inner_microstep: 1663.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3401
[2024-06-10 18:32:47,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1303.61 | bwd_inner_microstep: 1303.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 18:32:48,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.91 | bwd_microstep: 791.20 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 18:32:50,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1245.99 | bwd_inner_microstep: 1245.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3488
[2024-06-10 18:32:52,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.18 | bwd_microstep: 1215.59 | bwd_inner_microstep: 1215.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 18:32:54,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1242.33 | bwd_inner_microstep: 1242.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 18:32:55,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1250.55 | bwd_inner_microstep: 1250.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3500
[2024-06-10 18:32:57,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.82 | bwd_microstep: 1441.59 | bwd_inner_microstep: 1441.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447
[2024-06-10 18:32:59,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.94 | bwd_microstep: 1374.99 | bwd_inner_microstep: 1374.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461
[2024-06-10 18:33:01,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.74 | bwd_microstep: 1308.12 | bwd_inner_microstep: 1308.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 18:33:02,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.91 | bwd_microstep: 890.62 | bwd_inner_microstep: 890.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-10 18:33:04,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1482.13 | bwd_inner_microstep: 1482.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-10 18:33:07,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.56 | bwd_microstep: 1623.05 | bwd_inner_microstep: 1623.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 18:33:08,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1347.10 | bwd_inner_microstep: 1347.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 18:33:10,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.62 | bwd_microstep: 1443.11 | bwd_inner_microstep: 1443.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 18:33:11,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.04 | bwd_microstep: 798.67 | bwd_inner_microstep: 798.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 18:33:13,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.23 | bwd_microstep: 1409.49 | bwd_inner_microstep: 1409.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 18:33:15,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.09 | bwd_microstep: 1474.23 | bwd_inner_microstep: 1474.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-10 18:33:17,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.71 | bwd_microstep: 1159.66 | bwd_inner_microstep: 1159.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 18:33:19,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.99 | bwd_microstep: 1439.12 | bwd_inner_microstep: 1439.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 18:33:21,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 1552.67 | bwd_inner_microstep: 1552.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2293
[2024-06-10 18:33:22,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.16 | bwd_microstep: 911.21 | bwd_inner_microstep: 911.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3802
[2024-06-10 18:33:25,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.69 | bwd_microstep: 1581.50 | bwd_inner_microstep: 1581.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025
[2024-06-10 18:33:26,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.69 | bwd_microstep: 901.53 | bwd_inner_microstep: 901.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3567
[2024-06-10 18:33:28,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1361.86 | bwd_inner_microstep: 1361.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3537
[2024-06-10 18:33:30,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.46 | bwd_microstep: 1325.83 | bwd_inner_microstep: 1325.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2950
[2024-06-10 18:33:31,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.18 | bwd_microstep: 1198.21 | bwd_inner_microstep: 1198.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 18:33:33,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.23 | bwd_microstep: 1339.79 | bwd_inner_microstep: 1339.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446
[2024-06-10 18:33:39,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 18:33:39,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.25 | bwd_microstep: 5713.60 | bwd_inner_microstep: 1379.00 | bwd_allreduce_microstep: 4334.55 | step_microstep: 37.95
[2024-06-10 18:33:39,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15398.69 | bwd: 45529.43 | bwd_inner: 41193.97 | bwd_allreduce: 4334.78 | step: 39.38
{'loss': 1.2828, 'learning_rate': 1.4443970947247308e-05, 'epoch': 0.6}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 18:33:41,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.37 | bwd_microstep: 1374.36 | bwd_inner_microstep: 1374.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3996
[2024-06-10 18:33:43,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.12 | bwd_microstep: 1600.50 | bwd_inner_microstep: 1600.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 18:33:45,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1372.81 | bwd_inner_microstep: 1372.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933
[2024-06-10 18:33:46,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.85 | bwd_microstep: 819.32 | bwd_inner_microstep: 819.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3554
[2024-06-10 18:33:48,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.32 | bwd_microstep: 1199.94 | bwd_inner_microstep: 1199.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 18:33:50,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.96 | bwd_microstep: 1385.99 | bwd_inner_microstep: 1385.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 18:33:51,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 791.93 | bwd_inner_microstep: 791.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 18:33:53,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.33 | bwd_microstep: 1244.52 | bwd_inner_microstep: 1244.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4030
[2024-06-10 18:33:55,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.67 | bwd_microstep: 1609.24 | bwd_inner_microstep: 1609.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 18:33:57,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.85 | bwd_microstep: 1514.80 | bwd_inner_microstep: 1514.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3487
[2024-06-10 18:33:59,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.86 | bwd_microstep: 1423.46 | bwd_inner_microstep: 1423.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659
[2024-06-10 18:34:01,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.37 | bwd_microstep: 1440.12 | bwd_inner_microstep: 1440.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 18:34:03,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.73 | bwd_microstep: 1309.70 | bwd_inner_microstep: 1309.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3645
[2024-06-10 18:34:05,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.44 | bwd_microstep: 1432.78 | bwd_inner_microstep: 1432.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 18:34:06,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 698.29 | bwd_inner_microstep: 698.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 18:34:08,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.39 | bwd_microstep: 1339.95 | bwd_inner_microstep: 1339.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2458
[2024-06-10 18:34:09,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.53 | bwd_microstep: 949.70 | bwd_inner_microstep: 949.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 18:34:11,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1552.92 | bwd_inner_microstep: 1552.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 18:34:13,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.87 | bwd_microstep: 1385.69 | bwd_inner_microstep: 1385.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 18:34:15,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.78 | bwd_microstep: 1276.20 | bwd_inner_microstep: 1276.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3510
[2024-06-10 18:34:17,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1190.77 | bwd_inner_microstep: 1190.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-10 18:34:18,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.60 | bwd_microstep: 1153.12 | bwd_inner_microstep: 1153.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2177
[2024-06-10 18:34:19,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.40 | bwd_microstep: 854.80 | bwd_inner_microstep: 854.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 18:34:21,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.63 | bwd_microstep: 1295.75 | bwd_inner_microstep: 1295.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3662
[2024-06-10 18:34:23,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.84 | bwd_microstep: 1324.44 | bwd_inner_microstep: 1324.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279
[2024-06-10 18:34:24,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.19 | bwd_microstep: 906.64 | bwd_inner_microstep: 906.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2074
[2024-06-10 18:34:26,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.16 | bwd_microstep: 974.20 | bwd_inner_microstep: 974.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 18:34:28,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 1495.81 | bwd_inner_microstep: 1495.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3757
[2024-06-10 18:34:30,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.45 | bwd_microstep: 1608.98 | bwd_inner_microstep: 1608.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 18:34:32,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1498.46 | bwd_inner_microstep: 1498.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419
[2024-06-10 18:34:34,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.19 | bwd_microstep: 1280.22 | bwd_inner_microstep: 1280.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 18:34:41,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 18:34:41,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 6643.52 | bwd_inner_microstep: 1641.36 | bwd_allreduce_microstep: 5002.10 | step_microstep: 38.21
[2024-06-10 18:34:41,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15322.72 | bwd: 45948.93 | bwd_inner: 40945.92 | bwd_allreduce: 5002.33 | step: 39.64
{'loss': 1.1659, 'learning_rate': 1.4407924182725168e-05, 'epoch': 0.6}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474
[2024-06-10 18:34:43,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1568.07 | bwd_inner_microstep: 1568.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840
[2024-06-10 18:34:45,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.90 | bwd_microstep: 1652.06 | bwd_inner_microstep: 1652.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3945
[2024-06-10 18:34:48,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.93 | bwd_microstep: 1593.24 | bwd_inner_microstep: 1593.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4400
[2024-06-10 18:34:50,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.51 | bwd_microstep: 1712.10 | bwd_inner_microstep: 1712.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3406
[2024-06-10 18:34:52,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.73 | bwd_microstep: 1179.20 | bwd_inner_microstep: 1179.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-10 18:34:54,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 1540.26 | bwd_inner_microstep: 1540.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-10 18:34:56,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.58 | bwd_microstep: 1646.94 | bwd_inner_microstep: 1646.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 18:34:58,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.71 | bwd_microstep: 1151.95 | bwd_inner_microstep: 1151.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 18:34:59,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.53 | bwd_microstep: 1402.76 | bwd_inner_microstep: 1402.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4093
[2024-06-10 18:35:02,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.67 | bwd_microstep: 1526.16 | bwd_inner_microstep: 1526.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 18:35:04,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.47 | bwd_microstep: 1486.35 | bwd_inner_microstep: 1486.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 18:35:06,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1450.41 | bwd_inner_microstep: 1450.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030
[2024-06-10 18:35:07,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.26 | bwd_microstep: 906.88 | bwd_inner_microstep: 906.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 18:35:09,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1399.82 | bwd_inner_microstep: 1399.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3560
[2024-06-10 18:35:11,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.86 | bwd_microstep: 1590.81 | bwd_inner_microstep: 1590.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 18:35:13,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.56 | bwd_microstep: 1385.34 | bwd_inner_microstep: 1385.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3645
[2024-06-10 18:35:15,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1511.22 | bwd_inner_microstep: 1511.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 18:35:17,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.91 | bwd_microstep: 1245.44 | bwd_inner_microstep: 1245.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3498
[2024-06-10 18:35:19,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.27 | bwd_microstep: 1508.19 | bwd_inner_microstep: 1508.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3730
[2024-06-10 18:35:21,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1367.13 | bwd_inner_microstep: 1367.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2927
[2024-06-10 18:35:22,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.22 | bwd_microstep: 1228.87 | bwd_inner_microstep: 1228.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 18:35:24,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1488.81 | bwd_inner_microstep: 1488.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3824
[2024-06-10 18:35:27,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.32 | bwd_microstep: 1690.68 | bwd_inner_microstep: 1690.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3631
[2024-06-10 18:35:29,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1475.09 | bwd_inner_microstep: 1475.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 18:35:30,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.51 | bwd_microstep: 1183.22 | bwd_inner_microstep: 1183.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 18:35:32,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.00 | bwd_microstep: 1157.53 | bwd_inner_microstep: 1157.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 18:35:34,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.52 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 18:35:36,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.94 | bwd_microstep: 1451.90 | bwd_inner_microstep: 1451.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3574
[2024-06-10 18:35:38,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.59 | bwd_microstep: 1331.72 | bwd_inner_microstep: 1331.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 18:35:40,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1350.32 | bwd_inner_microstep: 1350.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-10 18:35:41,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1299.21 | bwd_inner_microstep: 1299.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 18:35:43,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.03 | optimizer_step: 6.66
[2024-06-10 18:35:43,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1326.58 | bwd_inner_microstep: 1317.69 | bwd_allreduce_microstep: 8.85 | step_microstep: 37.64
[2024-06-10 18:35:43,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16866.51 | bwd: 45089.46 | bwd_inner: 45079.71 | bwd_allreduce: 9.07 | step: 39.07
{'loss': 1.2037, 'learning_rate': 1.4371897113464992e-05, 'epoch': 0.6}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3458
[2024-06-10 18:35:45,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.51 | bwd_microstep: 1568.06 | bwd_inner_microstep: 1568.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3956
[2024-06-10 18:35:47,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1494.29 | bwd_inner_microstep: 1494.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1935
[2024-06-10 18:35:49,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.38 | bwd_microstep: 851.75 | bwd_inner_microstep: 851.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 18:35:51,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.24 | bwd_microstep: 1538.48 | bwd_inner_microstep: 1538.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 18:35:53,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.60 | bwd_microstep: 1378.28 | bwd_inner_microstep: 1378.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744
[2024-06-10 18:35:55,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.02 | bwd_microstep: 1632.64 | bwd_inner_microstep: 1632.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 18:35:57,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1246.62 | bwd_inner_microstep: 1246.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3508
[2024-06-10 18:35:58,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.25 | bwd_microstep: 1223.15 | bwd_inner_microstep: 1223.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740
[2024-06-10 18:36:01,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.11 | bwd_microstep: 1637.35 | bwd_inner_microstep: 1637.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 18:36:02,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 1251.69 | bwd_inner_microstep: 1251.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 18:36:04,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.79 | bwd_microstep: 1187.43 | bwd_inner_microstep: 1187.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507
[2024-06-10 18:36:06,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1446.31 | bwd_inner_microstep: 1446.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2202
[2024-06-10 18:36:07,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.96 | bwd_microstep: 960.59 | bwd_inner_microstep: 960.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 18:36:09,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.83 | bwd_microstep: 1389.53 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-10 18:36:11,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.83 | bwd_microstep: 1614.18 | bwd_inner_microstep: 1614.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 18:36:13,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1493.10 | bwd_inner_microstep: 1493.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651
[2024-06-10 18:36:16,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.87 | bwd_microstep: 1625.47 | bwd_inner_microstep: 1625.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431
[2024-06-10 18:36:18,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.78 | bwd_microstep: 1314.90 | bwd_inner_microstep: 1314.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 18:36:19,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.09 | bwd_microstep: 1352.43 | bwd_inner_microstep: 1352.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1937
[2024-06-10 18:36:21,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.90 | bwd_microstep: 819.22 | bwd_inner_microstep: 819.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3672
[2024-06-10 18:36:23,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.66 | bwd_microstep: 1785.38 | bwd_inner_microstep: 1785.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 18:36:25,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.99 | bwd_microstep: 1539.77 | bwd_inner_microstep: 1539.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2268
[2024-06-10 18:36:26,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.37 | bwd_microstep: 972.11 | bwd_inner_microstep: 972.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 18:36:29,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1506.83 | bwd_inner_microstep: 1506.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 18:36:30,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1402.55 | bwd_inner_microstep: 1402.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 18:36:32,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.17 | bwd_microstep: 1286.84 | bwd_inner_microstep: 1286.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 18:36:34,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1258.15 | bwd_inner_microstep: 1258.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 18:36:36,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.94 | bwd_microstep: 1187.76 | bwd_inner_microstep: 1187.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3749
[2024-06-10 18:36:38,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.08 | bwd_microstep: 1373.34 | bwd_inner_microstep: 1373.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093
[2024-06-10 18:36:39,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.80 | bwd_microstep: 921.48 | bwd_inner_microstep: 921.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3768
[2024-06-10 18:36:41,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.91 | bwd_microstep: 1676.79 | bwd_inner_microstep: 1676.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048
[2024-06-10 18:36:45,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-10 18:36:45,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.08 | bwd_microstep: 3710.82 | bwd_inner_microstep: 929.19 | bwd_allreduce_microstep: 2781.58 | step_microstep: 38.08
[2024-06-10 18:36:45,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15965.27 | bwd: 45647.29 | bwd_inner: 42864.81 | bwd_allreduce: 2781.81 | step: 39.53
 60%|██████    | 1037/1726 [17:54:11<11:51:04, 61.92s/it]
 60%|██████    | 1038/1726 [17:55:15<11:54:56, 62.35s/it]
 60%|██████    | 1039/1726 [17:56:16<11:50:07, 62.02s/it]
 60%|██████    | 1040/1726 [17:57:18<11:47:37, 61.89s/it]
 60%|██████    | 1041/1726 [17:58:20<11:47:57, 62.01s/it]
{'loss': 1.1925, 'learning_rate': 1.433588986635392e-05, 'epoch': 0.6}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 18:36:47,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.65 | bwd_microstep: 1330.66 | bwd_inner_microstep: 1330.47 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-10 18:36:48,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.66 | bwd_microstep: 804.26 | bwd_inner_microstep: 804.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 18:36:50,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.92 | bwd_microstep: 1245.23 | bwd_inner_microstep: 1245.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-10 18:36:52,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.51 | bwd_microstep: 1543.21 | bwd_inner_microstep: 1543.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 18:36:54,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.35 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 18:36:56,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1280.66 | bwd_inner_microstep: 1280.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 18:36:57,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.36 | bwd_microstep: 1273.07 | bwd_inner_microstep: 1273.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 18:36:59,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1414.41 | bwd_inner_microstep: 1414.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3689
[2024-06-10 18:37:02,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.71 | bwd_microstep: 1551.26 | bwd_inner_microstep: 1551.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 886
[2024-06-10 18:37:02,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.81 | bwd_microstep: 369.59 | bwd_inner_microstep: 369.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414
[2024-06-10 18:37:04,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.70 | bwd_microstep: 1307.63 | bwd_inner_microstep: 1307.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3539
[2024-06-10 18:37:06,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.48 | bwd_microstep: 1416.10 | bwd_inner_microstep: 1416.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2656
[2024-06-10 18:37:07,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.60 | bwd_microstep: 923.76 | bwd_inner_microstep: 923.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 18:37:09,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.12 | bwd_microstep: 1489.97 | bwd_inner_microstep: 1489.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 18:37:11,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.24 | bwd_microstep: 1353.21 | bwd_inner_microstep: 1353.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049
[2024-06-10 18:37:12,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.49 | bwd_microstep: 817.70 | bwd_inner_microstep: 817.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 18:37:14,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1390.35 | bwd_inner_microstep: 1390.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 18:37:16,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.51 | bwd_microstep: 1584.00 | bwd_inner_microstep: 1583.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819
[2024-06-10 18:37:19,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.64 | bwd_microstep: 1684.73 | bwd_inner_microstep: 1684.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 18:37:21,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.84 | bwd_microstep: 1455.76 | bwd_inner_microstep: 1455.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1940
[2024-06-10 18:37:22,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.81 | bwd_microstep: 729.35 | bwd_inner_microstep: 729.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 18:37:24,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.66 | bwd_microstep: 1449.85 | bwd_inner_microstep: 1449.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 18:37:26,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.18 | bwd_microstep: 1553.54 | bwd_inner_microstep: 1553.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 18:37:28,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1398.66 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3602
[2024-06-10 18:37:30,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.51 | bwd_microstep: 1555.16 | bwd_inner_microstep: 1555.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3824
[2024-06-10 18:37:32,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.71 | bwd_microstep: 1723.05 | bwd_inner_microstep: 1723.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817
[2024-06-10 18:37:34,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.27 | bwd_microstep: 1384.52 | bwd_inner_microstep: 1384.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654
[2024-06-10 18:37:36,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.88 | bwd_microstep: 1426.38 | bwd_inner_microstep: 1426.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3595
[2024-06-10 18:37:38,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.69 | bwd_microstep: 1457.06 | bwd_inner_microstep: 1457.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 18:37:40,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.82 | bwd_microstep: 1607.01 | bwd_inner_microstep: 1606.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 18:37:42,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1523.82 | bwd_inner_microstep: 1523.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 18:37:46,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.55 | optimizer_step: 6.62
[2024-06-10 18:37:46,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.17 | bwd_microstep: 3188.91 | bwd_inner_microstep: 1690.95 | bwd_allreduce_microstep: 1497.88 | step_microstep: 44.91
[2024-06-10 18:37:46,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16064.90 | bwd: 44618.66 | bwd_inner: 43119.71 | bwd_allreduce: 1498.20 | step: 46.48
{'loss': 1.2195, 'learning_rate': 1.4299902568209297e-05, 'epoch': 0.6}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1962
[2024-06-10 18:37:47,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.15 | bwd_microstep: 882.51 | bwd_inner_microstep: 882.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 18:37:49,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1252.64 | bwd_inner_microstep: 1252.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3872
[2024-06-10 18:37:51,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.75 | bwd_microstep: 1466.43 | bwd_inner_microstep: 1466.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2310
[2024-06-10 18:37:52,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.78 | bwd_microstep: 790.04 | bwd_inner_microstep: 790.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 18:37:53,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.37 | bwd_microstep: 788.21 | bwd_inner_microstep: 788.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3787
[2024-06-10 18:37:55,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1379.42 | bwd_inner_microstep: 1379.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 18:37:57,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 18:37:59,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1280.89 | bwd_inner_microstep: 1280.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 18:38:01,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1279.71 | bwd_inner_microstep: 1279.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 18:38:02,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1371
[2024-06-10 18:38:03,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 199.91 | bwd_microstep: 521.08 | bwd_inner_microstep: 521.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-10 18:38:05,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.11 | bwd_microstep: 1625.76 | bwd_inner_microstep: 1625.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3499
[2024-06-10 18:38:07,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.33 | bwd_microstep: 1221.73 | bwd_inner_microstep: 1221.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 18:38:08,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.09 | bwd_microstep: 789.97 | bwd_inner_microstep: 789.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1965
[2024-06-10 18:38:09,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 823.88 | bwd_inner_microstep: 823.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-10 18:38:11,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.26 | bwd_microstep: 1418.13 | bwd_inner_microstep: 1418.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513
[2024-06-10 18:38:13,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.30 | bwd_microstep: 1430.77 | bwd_inner_microstep: 1430.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493
[2024-06-10 18:38:15,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1313.31 | bwd_inner_microstep: 1313.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 18:38:17,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.73 | bwd_microstep: 1271.69 | bwd_inner_microstep: 1271.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-10 18:38:19,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.98 | bwd_microstep: 1539.08 | bwd_inner_microstep: 1539.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399
[2024-06-10 18:38:21,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.14 | bwd_microstep: 1367.09 | bwd_inner_microstep: 1367.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 18:38:23,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.98 | bwd_microstep: 1347.17 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 18:38:25,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.38 | bwd_microstep: 1314.73 | bwd_inner_microstep: 1314.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2545
[2024-06-10 18:38:26,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.45 | bwd_microstep: 993.98 | bwd_inner_microstep: 993.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3449
[2024-06-10 18:38:28,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1409.94 | bwd_inner_microstep: 1409.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 18:38:30,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1648.75 | bwd_inner_microstep: 1648.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-10 18:38:32,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.78 | bwd_microstep: 1544.27 | bwd_inner_microstep: 1544.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3466
[2024-06-10 18:38:34,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.50 | bwd_microstep: 1501.36 | bwd_inner_microstep: 1501.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3760
[2024-06-10 18:38:36,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1372.49 | bwd_inner_microstep: 1372.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3771
[2024-06-10 18:38:38,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.96 | bwd_microstep: 1572.37 | bwd_inner_microstep: 1572.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 18:38:40,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1395.90 | bwd_inner_microstep: 1395.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178
[2024-06-10 18:38:49,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.50 | optimizer_step: 6.60
[2024-06-10 18:38:49,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.82 | bwd_microstep: 8388.72 | bwd_inner_microstep: 974.32 | bwd_allreduce_microstep: 7414.33 | step_microstep: 40.78
[2024-06-10 18:38:49,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15026.58 | bwd: 47561.11 | bwd_inner: 40145.85 | bwd_allreduce: 7414.58 | step: 42.21
{'loss': 1.2184, 'learning_rate': 1.4263935345778202e-05, 'epoch': 0.6}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454
[2024-06-10 18:38:51,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.07 | bwd_microstep: 1378.37 | bwd_inner_microstep: 1378.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4159
[2024-06-10 18:38:53,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.09 | bwd_microstep: 1734.23 | bwd_inner_microstep: 1734.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 18:38:55,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.16 | bwd_microstep: 787.15 | bwd_inner_microstep: 787.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838
[2024-06-10 18:38:57,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.05 | bwd_microstep: 1456.18 | bwd_inner_microstep: 1456.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 18:38:58,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1241.76 | bwd_inner_microstep: 1241.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491
[2024-06-10 18:39:00,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.15 | bwd_microstep: 1412.35 | bwd_inner_microstep: 1412.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 18:39:02,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.87 | bwd_microstep: 1246.53 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 18:39:04,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1384.93 | bwd_inner_microstep: 1384.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 18:39:06,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.39 | bwd_microstep: 1386.13 | bwd_inner_microstep: 1386.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 18:39:08,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.98 | bwd_microstep: 1543.91 | bwd_inner_microstep: 1543.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 18:39:10,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.61 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 4133
[2024-06-10 18:39:12,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.49 | bwd_microstep: 1773.59 | bwd_inner_microstep: 1773.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 18:39:14,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1379.63 | bwd_inner_microstep: 1379.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143
[2024-06-10 18:39:15,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.77 | bwd_microstep: 834.13 | bwd_inner_microstep: 834.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 655
[2024-06-10 18:39:16,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.64 | bwd_microstep: 275.81 | bwd_inner_microstep: 275.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103
[2024-06-10 18:39:17,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.94 | bwd_microstep: 921.46 | bwd_inner_microstep: 921.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 18:39:19,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 1395.79 | bwd_inner_microstep: 1395.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 18:39:21,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.60 | bwd_microstep: 1295.38 | bwd_inner_microstep: 1295.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 18:39:23,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.71 | bwd_microstep: 1514.02 | bwd_inner_microstep: 1513.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 18:39:25,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.28 | bwd_microstep: 1490.66 | bwd_inner_microstep: 1490.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3556
[2024-06-10 18:39:27,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1364.65 | bwd_inner_microstep: 1364.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3834
[2024-06-10 18:39:28,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.22 | bwd_microstep: 1390.63 | bwd_inner_microstep: 1390.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 18:39:30,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.27 | bwd_microstep: 1303.02 | bwd_inner_microstep: 1303.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 18:39:32,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.96 | bwd_microstep: 1401.31 | bwd_inner_microstep: 1401.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3829
[2024-06-10 18:39:34,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.63 | bwd_microstep: 1620.45 | bwd_inner_microstep: 1620.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2290
[2024-06-10 18:39:36,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.62 | bwd_microstep: 908.18 | bwd_inner_microstep: 908.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3567
[2024-06-10 18:39:38,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.92 | bwd_microstep: 1423.93 | bwd_inner_microstep: 1423.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 18:39:40,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1499.85 | bwd_inner_microstep: 1499.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 18:39:42,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1478.27 | bwd_inner_microstep: 1478.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045
[2024-06-10 18:39:43,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.95 | bwd_microstep: 907.24 | bwd_inner_microstep: 907.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 18:39:45,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.11 | bwd_microstep: 1486.80 | bwd_inner_microstep: 1486.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3633
[2024-06-10 18:39:51,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.25 | optimizer_gradients: 4.33 | optimizer_step: 6.58
[2024-06-10 18:39:51,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.70 | bwd_microstep: 5020.89 | bwd_inner_microstep: 1778.00 | bwd_allreduce_microstep: 3242.83 | step_microstep: 40.00
[2024-06-10 18:39:51,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15761.37 | bwd: 45542.64 | bwd_inner: 42298.89 | bwd_allreduce: 3243.07 | step: 41.50
{'loss': 1.2385, 'learning_rate': 1.4227988325736991e-05, 'epoch': 0.61}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4828
[2024-06-10 18:39:53,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 708.22 | bwd_microstep: 1874.07 | bwd_inner_microstep: 1873.89 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3953
[2024-06-10 18:39:55,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1398.45 | bwd_inner_microstep: 1398.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856
[2024-06-10 18:39:58,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.56 | bwd_microstep: 1662.53 | bwd_inner_microstep: 1662.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 18:40:00,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1396.26 | bwd_inner_microstep: 1396.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 18:40:02,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.58 | bwd_microstep: 1552.85 | bwd_inner_microstep: 1552.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4194
[2024-06-10 18:40:04,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.41 | bwd_microstep: 1563.60 | bwd_inner_microstep: 1563.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 18:40:05,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.13 | bwd_microstep: 790.74 | bwd_inner_microstep: 790.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503
[2024-06-10 18:40:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.66 | bwd_microstep: 1191.39 | bwd_inner_microstep: 1191.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 18:40:08,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1249.91 | bwd_inner_microstep: 1249.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2137
[2024-06-10 18:40:09,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.36 | bwd_microstep: 832.86 | bwd_inner_microstep: 832.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3536
[2024-06-10 18:40:11,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1451.66 | bwd_inner_microstep: 1451.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 18:40:14,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.44 | bwd_microstep: 1483.57 | bwd_inner_microstep: 1483.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3487
[2024-06-10 18:40:16,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.07 | bwd_microstep: 1440.96 | bwd_inner_microstep: 1440.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636
[2024-06-10 18:40:18,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.83 | bwd_microstep: 1612.29 | bwd_inner_microstep: 1612.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3695
[2024-06-10 18:40:20,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.40 | bwd_microstep: 1487.14 | bwd_inner_microstep: 1487.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 18:40:22,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.02 | bwd_microstep: 1479.97 | bwd_inner_microstep: 1479.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3508
[2024-06-10 18:40:24,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.99 | bwd_microstep: 1352.72 | bwd_inner_microstep: 1352.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 640
[2024-06-10 18:40:24,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.41 | bwd_microstep: 264.71 | bwd_inner_microstep: 264.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 18:40:26,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1398.75 | bwd_inner_microstep: 1398.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 18:40:28,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1409.17 | bwd_inner_microstep: 1409.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 18:40:29,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.91 | bwd_microstep: 698.10 | bwd_inner_microstep: 698.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3619
[2024-06-10 18:40:31,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1343.47 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924
[2024-06-10 18:40:32,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.47 | bwd_microstep: 696.88 | bwd_inner_microstep: 696.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 18:40:34,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1382.85 | bwd_inner_microstep: 1382.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 18:40:36,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1492.36 | bwd_inner_microstep: 1492.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181
[2024-06-10 18:40:37,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 958.31 | bwd_inner_microstep: 958.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3814
[2024-06-10 18:40:39,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.12 | bwd_microstep: 1263.03 | bwd_inner_microstep: 1263.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2186
[2024-06-10 18:40:40,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.88 | bwd_microstep: 796.44 | bwd_inner_microstep: 796.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 18:40:42,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1372.49 | bwd_inner_microstep: 1372.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3779
[2024-06-10 18:40:44,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1476.49 | bwd_inner_microstep: 1476.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3574
[2024-06-10 18:40:46,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.36 | bwd_microstep: 1593.97 | bwd_inner_microstep: 1593.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2037
[2024-06-10 18:40:50,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-10 18:40:50,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.19 | bwd_microstep: 3465.65 | bwd_inner_microstep: 1073.20 | bwd_allreduce_microstep: 2392.38 | step_microstep: 38.92
[2024-06-10 18:40:50,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15362.88 | bwd: 43433.64 | bwd_inner: 41040.20 | bwd_allreduce: 2392.70 | step: 40.49
{'loss': 1.1524, 'learning_rate': 1.4192061634690892e-05, 'epoch': 0.61}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 18:40:52,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1470.03 | bwd_inner_microstep: 1470.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949
[2024-06-10 18:40:53,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.53 | bwd_microstep: 727.41 | bwd_inner_microstep: 727.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3808
[2024-06-10 18:40:55,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.09 | bwd_microstep: 1301.44 | bwd_inner_microstep: 1301.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 18:40:57,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.09 | bwd_microstep: 1345.46 | bwd_inner_microstep: 1345.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 18:40:58,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.80 | bwd_microstep: 1283.83 | bwd_inner_microstep: 1283.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 18:41:00,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1252.77 | bwd_inner_microstep: 1252.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 18:41:02,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1255.49 | bwd_inner_microstep: 1255.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3699
[2024-06-10 18:41:04,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.94 | bwd_microstep: 1433.48 | bwd_inner_microstep: 1433.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 18:41:06,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1349.47 | bwd_inner_microstep: 1349.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967
[2024-06-10 18:41:07,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.99 | bwd_microstep: 889.09 | bwd_inner_microstep: 889.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 18:41:09,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.09 | bwd_microstep: 1487.95 | bwd_inner_microstep: 1487.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3506
[2024-06-10 18:41:11,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.97 | bwd_microstep: 1552.37 | bwd_inner_microstep: 1552.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 18:41:13,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1344.68 | bwd_inner_microstep: 1344.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517
[2024-06-10 18:41:15,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.97 | bwd_microstep: 1584.23 | bwd_inner_microstep: 1584.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3884
[2024-06-10 18:41:17,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.45 | bwd_microstep: 1680.85 | bwd_inner_microstep: 1680.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 18:41:19,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1383.53 | bwd_inner_microstep: 1383.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710
[2024-06-10 18:41:21,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.15 | bwd_microstep: 1427.36 | bwd_inner_microstep: 1427.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2685
[2024-06-10 18:41:23,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.53 | bwd_microstep: 1125.22 | bwd_inner_microstep: 1125.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-10 18:41:24,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.57 | bwd_microstep: 800.32 | bwd_inner_microstep: 800.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 18:41:26,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.78 | bwd_microstep: 1402.24 | bwd_inner_microstep: 1402.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 18:41:28,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1350.61 | bwd_inner_microstep: 1350.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-10 18:41:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.36 | bwd_microstep: 1455.16 | bwd_inner_microstep: 1455.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 18:41:32,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.84 | bwd_microstep: 1295.93 | bwd_inner_microstep: 1295.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 18:41:34,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.76 | bwd_microstep: 1503.73 | bwd_inner_microstep: 1503.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2434
[2024-06-10 18:41:35,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.61 | bwd_microstep: 1041.57 | bwd_inner_microstep: 1041.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 18:41:37,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 18:41:39,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1396.75 | bwd_inner_microstep: 1396.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-10 18:41:41,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.99 | bwd_microstep: 1538.89 | bwd_inner_microstep: 1538.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 18:41:43,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1351.69 | bwd_inner_microstep: 1351.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2618
[2024-06-10 18:41:44,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.87 | bwd_microstep: 1013.63 | bwd_inner_microstep: 1013.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3599
[2024-06-10 18:41:46,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1567.98 | bwd_inner_microstep: 1567.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054
[2024-06-10 18:41:50,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 18:41:50,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.14 | bwd_microstep: 2967.69 | bwd_inner_microstep: 1040.40 | bwd_allreduce_microstep: 1927.23 | step_microstep: 39.10
[2024-06-10 18:41:50,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15642.76 | bwd: 43861.85 | bwd_inner: 41933.70 | bwd_allreduce: 1927.46 | step: 40.60
 60%|██████    | 1042/1726 [17:59:22<11:46:43, 61.99s/it]
 60%|██████    | 1043/1726 [18:00:23<11:42:24, 61.71s/it]
 60%|██████    | 1044/1726 [18:01:26<11:45:30, 62.07s/it]
 61%|██████    | 1045/1726 [18:02:28<11:43:00, 61.94s/it]
 61%|██████    | 1046/1726 [18:03:27<11:32:26, 61.10s/it]
 61%|██████    | 1047/1726 [18:04:26<11:27:09, 60
{'loss': 1.1369, 'learning_rate': 1.4156155399173526e-05, 'epoch': 0.61}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 18:41:52,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.48 | bwd_microstep: 1378.28 | bwd_inner_microstep: 1378.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2416
[2024-06-10 18:41:53,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.51 | bwd_microstep: 906.68 | bwd_inner_microstep: 906.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3827
[2024-06-10 18:41:55,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1403.76 | bwd_inner_microstep: 1403.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3877
[2024-06-10 18:41:57,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.74 | bwd_microstep: 1681.07 | bwd_inner_microstep: 1681.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3736
[2024-06-10 18:41:59,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.85 | bwd_microstep: 1428.75 | bwd_inner_microstep: 1428.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 18:42:00,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.83 | bwd_microstep: 698.10 | bwd_inner_microstep: 698.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 18:42:02,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1479.03 | bwd_inner_microstep: 1479.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2637
[2024-06-10 18:42:04,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.80 | bwd_microstep: 1114.96 | bwd_inner_microstep: 1114.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 18:42:06,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.55 | bwd_microstep: 1383.82 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 18:42:07,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1249.50 | bwd_inner_microstep: 1249.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-10 18:42:10,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.50 | bwd_microstep: 1620.85 | bwd_inner_microstep: 1620.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569
[2024-06-10 18:42:11,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.91 | bwd_microstep: 1362.28 | bwd_inner_microstep: 1362.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 18:42:13,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.83 | bwd_microstep: 1342.12 | bwd_inner_microstep: 1342.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 18:42:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.54 | bwd_microstep: 1494.65 | bwd_inner_microstep: 1494.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 18:42:17,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.37 | bwd_microstep: 1382.80 | bwd_inner_microstep: 1382.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 18:42:19,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.09 | bwd_microstep: 1589.38 | bwd_inner_microstep: 1589.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 18:42:22,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.09 | bwd_microstep: 1521.38 | bwd_inner_microstep: 1521.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3532
[2024-06-10 18:42:24,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.89 | bwd_microstep: 1434.33 | bwd_inner_microstep: 1434.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3999
[2024-06-10 18:42:26,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.70 | bwd_microstep: 1814.51 | bwd_inner_microstep: 1814.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2105
[2024-06-10 18:42:27,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.40 | bwd_microstep: 730.59 | bwd_inner_microstep: 730.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292
[2024-06-10 18:42:28,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.02 | bwd_microstep: 880.53 | bwd_inner_microstep: 880.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 18:42:30,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1395.14 | bwd_inner_microstep: 1395.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 18:42:32,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.30 | bwd_microstep: 1560.29 | bwd_inner_microstep: 1560.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 18:42:34,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.62 | bwd_microstep: 1286.59 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3558
[2024-06-10 18:42:36,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.08 | bwd_microstep: 1583.76 | bwd_inner_microstep: 1583.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 18:42:38,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.29 | bwd_microstep: 1508.78 | bwd_inner_microstep: 1508.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235
[2024-06-10 18:42:40,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.12 | bwd_microstep: 966.65 | bwd_inner_microstep: 966.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553
[2024-06-10 18:42:42,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.79 | bwd_microstep: 1592.46 | bwd_inner_microstep: 1592.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461
[2024-06-10 18:42:44,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.14 | bwd_microstep: 1576.33 | bwd_inner_microstep: 1576.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 18:42:46,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.92 | bwd_microstep: 1476.37 | bwd_inner_microstep: 1476.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-10 18:42:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1447.54 | bwd_inner_microstep: 1447.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 18:42:51,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.26 | optimizer_step: 6.62
[2024-06-10 18:42:51,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.18 | bwd_microstep: 2686.08 | bwd_inner_microstep: 1569.24 | bwd_allreduce_microstep: 1116.78 | step_microstep: 38.99
[2024-06-10 18:42:51,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16321.74 | bwd: 44977.39 | bwd_inner: 43859.70 | bwd_allreduce: 1117.01 | step: 41.51
{'loss': 1.2202, 'learning_rate': 1.4120269745646469e-05, 'epoch': 0.61}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 18:42:53,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.20 | bwd_microstep: 1471.84 | bwd_inner_microstep: 1471.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 18:42:55,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1376.55 | bwd_inner_microstep: 1376.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 18:42:57,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1347.48 | bwd_inner_microstep: 1347.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 18:42:59,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 1451.49 | bwd_inner_microstep: 1451.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 18:43:01,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1389.51 | bwd_inner_microstep: 1389.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 18:43:03,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 18:43:04,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.28 | bwd_microstep: 1149.24 | bwd_inner_microstep: 1149.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3912
[2024-06-10 18:43:07,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.55 | bwd_microstep: 1695.07 | bwd_inner_microstep: 1695.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 18:43:09,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.16 | bwd_microstep: 1290.94 | bwd_inner_microstep: 1290.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 18:43:10,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.46 | bwd_microstep: 1403.02 | bwd_inner_microstep: 1403.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-10 18:43:13,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.41 | bwd_microstep: 1630.98 | bwd_inner_microstep: 1630.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3687
[2024-06-10 18:43:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1479.69 | bwd_inner_microstep: 1479.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3383
[2024-06-10 18:43:17,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.65 | bwd_microstep: 1310.93 | bwd_inner_microstep: 1310.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507
[2024-06-10 18:43:19,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1408.94 | bwd_inner_microstep: 1408.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2351
[2024-06-10 18:43:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.33 | bwd_microstep: 991.20 | bwd_inner_microstep: 991.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 18:43:22,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.93 | bwd_microstep: 1395.72 | bwd_inner_microstep: 1395.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525
[2024-06-10 18:43:24,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1323.72 | bwd_inner_microstep: 1323.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3005
[2024-06-10 18:43:25,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.00 | bwd_microstep: 1110.15 | bwd_inner_microstep: 1110.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 18:43:27,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1295.97 | bwd_inner_microstep: 1295.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618
[2024-06-10 18:43:29,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.90 | bwd_microstep: 1615.95 | bwd_inner_microstep: 1615.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3810
[2024-06-10 18:43:31,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.00 | bwd_microstep: 1620.75 | bwd_inner_microstep: 1620.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-10 18:43:34,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.87 | bwd_microstep: 1582.95 | bwd_inner_microstep: 1582.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 18:43:36,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1537.35 | bwd_inner_microstep: 1537.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3820
[2024-06-10 18:43:38,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.71 | bwd_microstep: 1690.74 | bwd_inner_microstep: 1690.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3436
[2024-06-10 18:43:40,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1398.59 | bwd_inner_microstep: 1398.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 18:43:41,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.11 | bwd_microstep: 792.34 | bwd_inner_microstep: 792.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 18:43:43,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1292.64 | bwd_inner_microstep: 1292.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 18:43:45,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.93 | bwd_microstep: 1453.07 | bwd_inner_microstep: 1453.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 18:43:47,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.05 | bwd_microstep: 1496.57 | bwd_inner_microstep: 1496.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 18:43:49,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.79 | bwd_microstep: 1441.11 | bwd_inner_microstep: 1441.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-10 18:43:51,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.07 | bwd_microstep: 1638.18 | bwd_inner_microstep: 1638.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 18:43:53,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.68
[2024-06-10 18:43:53,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.85 | bwd_microstep: 1537.39 | bwd_inner_microstep: 1529.42 | bwd_allreduce_microstep: 7.92 | step_microstep: 37.85
[2024-06-10 18:43:53,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16734.55 | bwd: 44866.39 | bwd_inner: 44857.56 | bwd_allreduce: 8.14 | step: 39.41
{'loss': 1.214, 'learning_rate': 1.4084404800498796e-05, 'epoch': 0.61}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3388
[2024-06-10 18:43:55,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1300.77 | bwd_inner_microstep: 1300.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 18:43:57,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.72 | bwd_microstep: 1243.11 | bwd_inner_microstep: 1243.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 18:43:59,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3789
[2024-06-10 18:44:01,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1647.34 | bwd_inner_microstep: 1647.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878
[2024-06-10 18:44:02,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.58 | bwd_microstep: 681.91 | bwd_inner_microstep: 681.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 18:44:04,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.46 | bwd_microstep: 1152.28 | bwd_inner_microstep: 1152.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-10 18:44:06,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.22 | bwd_microstep: 1626.48 | bwd_inner_microstep: 1626.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3407
[2024-06-10 18:44:07,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.07 | bwd_microstep: 1180.76 | bwd_inner_microstep: 1180.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 18:44:09,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 18:44:11,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 1289.33 | bwd_inner_microstep: 1289.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 18:44:13,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.70 | bwd_microstep: 1255.24 | bwd_inner_microstep: 1255.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413
[2024-06-10 18:44:15,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1439.05 | bwd_inner_microstep: 1439.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615
[2024-06-10 18:44:17,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.25 | bwd_microstep: 1607.06 | bwd_inner_microstep: 1607.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 18:44:19,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.61 | bwd_microstep: 1349.74 | bwd_inner_microstep: 1349.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3498
[2024-06-10 18:44:21,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.32 | bwd_microstep: 1547.86 | bwd_inner_microstep: 1547.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3428
[2024-06-10 18:44:23,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.31 | bwd_microstep: 1396.30 | bwd_inner_microstep: 1396.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 18:44:25,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1396.57 | bwd_inner_microstep: 1396.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 18:44:27,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.21 | bwd_microstep: 1283.29 | bwd_inner_microstep: 1283.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 18:44:29,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352
[2024-06-10 18:44:30,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.05 | bwd_microstep: 988.76 | bwd_inner_microstep: 988.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 18:44:32,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1283.83 | bwd_inner_microstep: 1283.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-10 18:44:34,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.56 | bwd_microstep: 1296.96 | bwd_inner_microstep: 1296.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 754
[2024-06-10 18:44:34,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.95 | bwd_microstep: 302.80 | bwd_inner_microstep: 302.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 18:44:36,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1397.02 | bwd_inner_microstep: 1396.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3847
[2024-06-10 18:44:38,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.39 | bwd_microstep: 1696.84 | bwd_inner_microstep: 1696.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659
[2024-06-10 18:44:40,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1455.00 | bwd_inner_microstep: 1454.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3598
[2024-06-10 18:44:42,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1476.43 | bwd_inner_microstep: 1476.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2078
[2024-06-10 18:44:44,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.55 | bwd_microstep: 786.68 | bwd_inner_microstep: 786.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-10 18:44:46,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.12 | bwd_microstep: 1500.11 | bwd_inner_microstep: 1500.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 18:44:48,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.85 | bwd_microstep: 1494.84 | bwd_inner_microstep: 1494.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 18:44:50,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.79 | bwd_microstep: 1504.06 | bwd_inner_microstep: 1504.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3574
[2024-06-10 18:44:56,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.30 | optimizer_step: 6.62
[2024-06-10 18:44:56,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 5714.86 | bwd_inner_microstep: 1533.92 | bwd_allreduce_microstep: 4180.88 | step_microstep: 39.12
[2024-06-10 18:44:56,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15803.46 | bwd: 46514.47 | bwd_inner: 42332.69 | bwd_allreduce: 4181.10 | step: 40.68
{'loss': 1.2603, 'learning_rate': 1.4048560690046661e-05, 'epoch': 0.61}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533
[2024-06-10 18:44:58,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.24 | bwd_microstep: 1317.74 | bwd_inner_microstep: 1317.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4000
[2024-06-10 18:45:00,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.95 | bwd_microstep: 1531.44 | bwd_inner_microstep: 1531.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3433
[2024-06-10 18:45:02,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.51 | bwd_microstep: 1447.48 | bwd_inner_microstep: 1447.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 18:45:04,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1378.44 | bwd_inner_microstep: 1378.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485
[2024-06-10 18:45:06,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.44 | bwd_microstep: 1346.74 | bwd_inner_microstep: 1346.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 18:45:07,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.27 | bwd_microstep: 1279.55 | bwd_inner_microstep: 1279.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 842
[2024-06-10 18:45:08,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 132.39 | bwd_microstep: 344.83 | bwd_inner_microstep: 344.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-10 18:45:09,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.75 | bwd_microstep: 689.05 | bwd_inner_microstep: 689.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 18:45:10,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.23 | bwd_microstep: 795.59 | bwd_inner_microstep: 795.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 18:45:12,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1256.53 | bwd_inner_microstep: 1256.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3720
[2024-06-10 18:45:14,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.86 | bwd_microstep: 1681.62 | bwd_inner_microstep: 1681.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3671
[2024-06-10 18:45:16,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.03 | bwd_microstep: 1658.29 | bwd_inner_microstep: 1658.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 18:45:18,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1390.70 | bwd_inner_microstep: 1390.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3485
[2024-06-10 18:45:20,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.40 | bwd_microstep: 1581.17 | bwd_inner_microstep: 1581.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1932
[2024-06-10 18:45:22,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.97 | bwd_microstep: 886.40 | bwd_inner_microstep: 886.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 18:45:24,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1491.54 | bwd_inner_microstep: 1491.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3529
[2024-06-10 18:45:26,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.45 | bwd_microstep: 1589.76 | bwd_inner_microstep: 1589.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3375
[2024-06-10 18:45:28,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.73 | bwd_microstep: 1431.22 | bwd_inner_microstep: 1431.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434
[2024-06-10 18:45:30,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.92 | bwd_microstep: 1185.19 | bwd_inner_microstep: 1185.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441
[2024-06-10 18:45:31,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.67 | bwd_microstep: 1159.53 | bwd_inner_microstep: 1159.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 18:45:33,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1255.15 | bwd_inner_microstep: 1255.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3564
[2024-06-10 18:45:35,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.83 | bwd_microstep: 1203.07 | bwd_inner_microstep: 1203.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-10 18:45:36,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.05 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 18:45:38,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.51 | bwd_microstep: 1458.24 | bwd_inner_microstep: 1458.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925
[2024-06-10 18:45:39,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.03 | bwd_microstep: 696.13 | bwd_inner_microstep: 696.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051
[2024-06-10 18:45:40,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.50 | bwd_microstep: 910.91 | bwd_inner_microstep: 910.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 18:45:42,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.62 | bwd_microstep: 1382.56 | bwd_inner_microstep: 1382.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 18:45:44,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1277.62 | bwd_inner_microstep: 1277.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 18:45:46,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.86 | bwd_microstep: 1645.08 | bwd_inner_microstep: 1645.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 18:45:48,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1399.55 | bwd_inner_microstep: 1399.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2036
[2024-06-10 18:45:50,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.77 | bwd_microstep: 903.38 | bwd_inner_microstep: 903.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587
[2024-06-10 18:46:00,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.41 | optimizer_step: 6.60
[2024-06-10 18:46:00,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 10328.34 | bwd_inner_microstep: 1722.17 | bwd_allreduce_microstep: 8606.10 | step_microstep: 39.99
[2024-06-10 18:46:00,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15064.65 | bwd: 49090.67 | bwd_inner: 40483.64 | bwd_allreduce: 8606.34 | step: 41.59
{'loss': 1.1862, 'learning_rate': 1.4012737540532842e-05, 'epoch': 0.61}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 18:46:03,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1465.82 | bwd_inner_microstep: 1465.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4005
[2024-06-10 18:46:05,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.07 | bwd_microstep: 1536.63 | bwd_inner_microstep: 1536.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3487
[2024-06-10 18:46:06,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1329.40 | bwd_inner_microstep: 1329.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4232
[2024-06-10 18:46:09,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.40 | bwd_microstep: 1659.52 | bwd_inner_microstep: 1659.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 18:46:11,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1476.98 | bwd_inner_microstep: 1476.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 18:46:12,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.26 | bwd_microstep: 678.71 | bwd_inner_microstep: 678.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897
[2024-06-10 18:46:13,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.66 | bwd_microstep: 681.61 | bwd_inner_microstep: 681.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 18:46:15,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.93 | bwd_microstep: 1382.38 | bwd_inner_microstep: 1382.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922
[2024-06-10 18:46:16,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.03 | bwd_microstep: 697.11 | bwd_inner_microstep: 697.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 18:46:17,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.80 | bwd_microstep: 1150.53 | bwd_inner_microstep: 1150.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 18:46:19,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.28 | bwd_microstep: 1284.17 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 18:46:20,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.40 | bwd_microstep: 791.93 | bwd_inner_microstep: 791.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-10 18:46:22,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.01 | bwd_microstep: 1422.91 | bwd_inner_microstep: 1422.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3517
[2024-06-10 18:46:24,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.35 | bwd_microstep: 1647.52 | bwd_inner_microstep: 1647.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 18:46:26,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3381
[2024-06-10 18:46:28,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.57 | bwd_microstep: 1499.01 | bwd_inner_microstep: 1498.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 18:46:30,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.43 | bwd_microstep: 1340.63 | bwd_inner_microstep: 1340.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103
[2024-06-10 18:46:31,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.71 | bwd_microstep: 922.10 | bwd_inner_microstep: 922.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3556
[2024-06-10 18:46:33,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.81 | bwd_microstep: 1446.21 | bwd_inner_microstep: 1446.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 18:46:35,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1392.42 | bwd_inner_microstep: 1392.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 18:46:37,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.52 | bwd_microstep: 1354.49 | bwd_inner_microstep: 1354.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 18:46:39,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1494.23 | bwd_inner_microstep: 1494.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975
[2024-06-10 18:46:40,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.04 | bwd_microstep: 834.10 | bwd_inner_microstep: 834.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 18:46:42,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.33 | bwd_microstep: 1284.55 | bwd_inner_microstep: 1284.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3822
[2024-06-10 18:46:44,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1360.21 | bwd_inner_microstep: 1360.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 18:46:46,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | bwd_microstep: 1548.98 | bwd_inner_microstep: 1548.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706
[2024-06-10 18:46:48,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.70 | bwd_microstep: 1439.54 | bwd_inner_microstep: 1439.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 18:46:50,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1509.49 | bwd_inner_microstep: 1509.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 18:46:52,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1507.80 | bwd_inner_microstep: 1507.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 18:46:54,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1493.50 | bwd_inner_microstep: 1493.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2193
[2024-06-10 18:46:56,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.89 | bwd_microstep: 862.49 | bwd_inner_microstep: 862.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 18:47:02,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.61 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 18:47:02,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.94 | bwd_microstep: 6228.84 | bwd_inner_microstep: 1816.13 | bwd_allreduce_microstep: 4412.66 | step_microstep: 39.25
[2024-06-10 18:47:02,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15474.39 | bwd: 46107.48 | bwd_inner: 41693.92 | bwd_allreduce: 4412.89 | step: 40.86
 61%|██████    | 1047/1726 [18:04:26<11:27:09, 60.72s/it]
 61%|██████    | 1048/1726 [18:05:28<11:29:17, 61.00s/it]
 61%|██████    | 1049/1726 [18:06:30<11:31:30, 61.29s/it]
 61%|██████    | 1050/1726 [18:07:33<11:35:07, 61.70s/it]
 61%|██████    | 1051/1726 [18:08:37<11:43:30, 62.53s/it]
 61%|██████    | 1052/1726 [18:09:39<11:40:23, 62.35s/it]
{'loss': 1.2246, 'learning_rate': 1.3976935478126281e-05, 'epoch': 0.61}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 18:47:04,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1370.91 | bwd_inner_microstep: 1370.64 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3934
[2024-06-10 18:47:07,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1591.94 | bwd_inner_microstep: 1591.80 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.19
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865
[2024-06-10 18:47:09,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.37 | bwd_microstep: 1465.31 | bwd_inner_microstep: 1465.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 18:47:10,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.65 | bwd_microstep: 1373.61 | bwd_inner_microstep: 1373.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480
[2024-06-10 18:47:12,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.74 | bwd_microstep: 1343.69 | bwd_inner_microstep: 1343.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 18:47:14,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1247.92 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-10 18:47:16,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.11 | bwd_microstep: 1532.82 | bwd_inner_microstep: 1532.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 18:47:18,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.07 | bwd_microstep: 1527.68 | bwd_inner_microstep: 1527.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-10 18:47:20,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 1525.02 | bwd_inner_microstep: 1524.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 18:47:22,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1342.04 | bwd_inner_microstep: 1342.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 18:47:24,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.69 | bwd_microstep: 1630.28 | bwd_inner_microstep: 1630.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064
[2024-06-10 18:47:26,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.37 | bwd_microstep: 817.55 | bwd_inner_microstep: 817.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 18:47:27,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.44 | bwd_microstep: 1285.28 | bwd_inner_microstep: 1285.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663
[2024-06-10 18:47:30,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.15 | bwd_microstep: 1714.44 | bwd_inner_microstep: 1714.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-10 18:47:32,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1475.83 | bwd_inner_microstep: 1475.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3464
[2024-06-10 18:47:34,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1403.37 | bwd_inner_microstep: 1403.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3658
[2024-06-10 18:47:35,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.45 | bwd_microstep: 1226.60 | bwd_inner_microstep: 1226.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503
[2024-06-10 18:47:37,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.86 | bwd_microstep: 1192.72 | bwd_inner_microstep: 1192.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 18:47:39,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1403.38 | bwd_inner_microstep: 1403.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 18:47:41,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.24 | bwd_microstep: 1405.21 | bwd_inner_microstep: 1405.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2653
[2024-06-10 18:47:42,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.64 | bwd_microstep: 1022.78 | bwd_inner_microstep: 1022.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3536
[2024-06-10 18:47:44,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.95 | bwd_microstep: 1326.12 | bwd_inner_microstep: 1326.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3456
[2024-06-10 18:47:46,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.46 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457
[2024-06-10 18:47:48,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.82 | bwd_microstep: 1344.78 | bwd_inner_microstep: 1344.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 18:47:50,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.40 | bwd_microstep: 1401.28 | bwd_inner_microstep: 1401.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3551
[2024-06-10 18:47:52,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.71 | bwd_microstep: 1326.86 | bwd_inner_microstep: 1326.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 18:47:54,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.99 | bwd_microstep: 1659.04 | bwd_inner_microstep: 1659.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 18:47:56,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1296.74 | bwd_inner_microstep: 1296.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3822
[2024-06-10 18:47:58,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.73 | bwd_microstep: 1825.90 | bwd_inner_microstep: 1825.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812
[2024-06-10 18:48:00,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.76 | bwd_microstep: 1515.39 | bwd_inner_microstep: 1515.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3568
[2024-06-10 18:48:02,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1445.36 | bwd_inner_microstep: 1445.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-10 18:48:05,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.64
[2024-06-10 18:48:05,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.21 | bwd_microstep: 1635.38 | bwd_inner_microstep: 1627.36 | bwd_allreduce_microstep: 7.96 | step_microstep: 38.75
[2024-06-10 18:48:05,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16864.44 | bwd: 45055.34 | bwd_inner: 45046.10 | bwd_allreduce: 8.35 | step: 41.68
{'loss': 1.1998, 'learning_rate': 1.3941154628921654e-05, 'epoch': 0.61}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3415
[2024-06-10 18:48:07,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.69 | bwd_microstep: 1470.19 | bwd_inner_microstep: 1470.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 18:48:09,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1484.50 | bwd_inner_microstep: 1484.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2354
[2024-06-10 18:48:10,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.44 | bwd_microstep: 891.41 | bwd_inner_microstep: 891.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 18:48:12,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.19 | bwd_microstep: 1482.90 | bwd_inner_microstep: 1482.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 18:48:14,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.81 | bwd_microstep: 1315.33 | bwd_inner_microstep: 1315.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2208
[2024-06-10 18:48:15,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.65 | bwd_microstep: 955.55 | bwd_inner_microstep: 955.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 18:48:17,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.10 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 18:48:19,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1510.22 | bwd_inner_microstep: 1510.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 18:48:21,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1343.91 | bwd_inner_microstep: 1343.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 18:48:23,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1391.14 | bwd_inner_microstep: 1391.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3720
[2024-06-10 18:48:25,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1395.97 | bwd_inner_microstep: 1395.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3492
[2024-06-10 18:48:27,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.36 | bwd_microstep: 1435.50 | bwd_inner_microstep: 1435.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 18:48:29,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1382.24 | bwd_inner_microstep: 1382.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 18:48:31,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.21 | bwd_microstep: 1488.06 | bwd_inner_microstep: 1488.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3674
[2024-06-10 18:48:33,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.88 | bwd_microstep: 1786.51 | bwd_inner_microstep: 1786.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686
[2024-06-10 18:48:36,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.63 | bwd_microstep: 1719.37 | bwd_inner_microstep: 1719.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-10 18:48:37,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.55 | bwd_microstep: 1156.68 | bwd_inner_microstep: 1156.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 18:48:39,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1287.90 | bwd_inner_microstep: 1287.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628
[2024-06-10 18:48:41,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1436.13 | bwd_inner_microstep: 1436.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 18:48:43,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1460.49 | bwd_inner_microstep: 1460.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-10 18:48:44,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.15 | bwd_microstep: 1160.86 | bwd_inner_microstep: 1160.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3520
[2024-06-10 18:48:46,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.09 | bwd_microstep: 1192.75 | bwd_inner_microstep: 1192.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 18:48:48,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.58 | bwd_microstep: 974.69 | bwd_inner_microstep: 974.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 18:48:49,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1393.88 | bwd_inner_microstep: 1393.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705
[2024-06-10 18:48:52,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.81 | bwd_microstep: 1624.40 | bwd_inner_microstep: 1624.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3826
[2024-06-10 18:48:54,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1514.70 | bwd_inner_microstep: 1514.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2094
[2024-06-10 18:48:55,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.16 | bwd_microstep: 885.56 | bwd_inner_microstep: 885.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-10 18:48:57,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.99 | bwd_microstep: 1530.93 | bwd_inner_microstep: 1530.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2042
[2024-06-10 18:48:58,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.47 | bwd_microstep: 1003.27 | bwd_inner_microstep: 1003.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3588
[2024-06-10 18:49:00,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.66 | bwd_microstep: 1464.74 | bwd_inner_microstep: 1464.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 18:49:03,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1496.82 | bwd_inner_microstep: 1496.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3781
[2024-06-10 18:49:06,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.32 | optimizer_step: 6.60
[2024-06-10 18:49:06,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.17 | bwd_microstep: 2482.88 | bwd_inner_microstep: 1686.59 | bwd_allreduce_microstep: 796.23 | step_microstep: 39.48
[2024-06-10 18:49:06,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16225.28 | bwd: 44402.57 | bwd_inner: 43605.43 | bwd_allreduce: 796.47 | step: 41.07
{'loss': 1.2779, 'learning_rate': 1.3905395118938929e-05, 'epoch': 0.61}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456
[2024-06-10 18:49:08,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.64 | bwd_microstep: 1546.69 | bwd_inner_microstep: 1546.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-10 18:49:10,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.91 | bwd_microstep: 1244.14 | bwd_inner_microstep: 1244.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 18:49:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.44 | bwd_microstep: 788.11 | bwd_inner_microstep: 788.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867
[2024-06-10 18:49:13,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.52 | bwd_microstep: 1661.14 | bwd_inner_microstep: 1661.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 18:49:15,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 1351.10 | bwd_inner_microstep: 1351.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414
[2024-06-10 18:49:17,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.23 | bwd_microstep: 1311.20 | bwd_inner_microstep: 1311.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 18:49:18,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.62 | bwd_microstep: 1341.47 | bwd_inner_microstep: 1341.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 18:49:20,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1282.10 | bwd_inner_microstep: 1282.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-10 18:49:22,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.04 | bwd_microstep: 1155.30 | bwd_inner_microstep: 1155.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3531
[2024-06-10 18:49:24,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1355.19 | bwd_inner_microstep: 1355.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2891
[2024-06-10 18:49:25,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.41 | bwd_microstep: 1025.99 | bwd_inner_microstep: 1025.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 18:49:27,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.64 | bwd_microstep: 1483.57 | bwd_inner_microstep: 1483.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500
[2024-06-10 18:49:29,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.74 | bwd_microstep: 1584.90 | bwd_inner_microstep: 1584.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 18:49:31,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 18:49:33,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1398.50 | bwd_inner_microstep: 1398.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 18:49:35,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.52 | bwd_microstep: 1606.00 | bwd_inner_microstep: 1605.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294
[2024-06-10 18:49:37,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.13 | bwd_microstep: 878.27 | bwd_inner_microstep: 878.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 18:49:38,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.55 | bwd_microstep: 796.82 | bwd_inner_microstep: 796.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113
[2024-06-10 18:49:39,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.98 | bwd_microstep: 827.48 | bwd_inner_microstep: 827.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 18:49:40,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.05 | bwd_microstep: 799.52 | bwd_inner_microstep: 799.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475
[2024-06-10 18:49:42,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.31 | bwd_microstep: 1316.60 | bwd_inner_microstep: 1316.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 18:49:44,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1252.23 | bwd_inner_microstep: 1252.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 18:49:45,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1389.83 | bwd_inner_microstep: 1389.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2173
[2024-06-10 18:49:47,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.97 | bwd_microstep: 824.45 | bwd_inner_microstep: 824.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-10 18:49:48,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.23 | bwd_microstep: 1287.79 | bwd_inner_microstep: 1287.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 18:49:51,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.92 | bwd_microstep: 1644.20 | bwd_inner_microstep: 1644.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2239
[2024-06-10 18:49:52,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.24 | bwd_microstep: 995.84 | bwd_inner_microstep: 995.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3803
[2024-06-10 18:49:54,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1474.35 | bwd_inner_microstep: 1474.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3770
[2024-06-10 18:49:56,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.13 | bwd_microstep: 1581.04 | bwd_inner_microstep: 1581.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 18:49:58,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.53 | bwd_microstep: 1495.66 | bwd_inner_microstep: 1495.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3444
[2024-06-10 18:50:00,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.62 | bwd_microstep: 1411.24 | bwd_inner_microstep: 1411.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3628
[2024-06-10 18:50:07,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-10 18:50:07,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.81 | bwd_microstep: 5907.03 | bwd_inner_microstep: 1734.02 | bwd_allreduce_microstep: 4172.96 | step_microstep: 38.05
[2024-06-10 18:50:07,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15355.64 | bwd: 45397.74 | bwd_inner: 41223.86 | bwd_allreduce: 4173.19 | step: 39.64
{'loss': 1.2234, 'learning_rate': 1.3869657074122906e-05, 'epoch': 0.61}
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 3967
[2024-06-10 18:50:08,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.15 | bwd_microstep: 1196.89 | bwd_inner_microstep: 1196.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3971
[2024-06-10 18:50:11,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.61 | bwd_microstep: 1703.69 | bwd_inner_microstep: 1703.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 18:50:12,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.97 | bwd_microstep: 789.48 | bwd_inner_microstep: 789.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3461
[2024-06-10 18:50:14,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.96 | bwd_microstep: 1337.03 | bwd_inner_microstep: 1337.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 18:50:16,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.94 | bwd_microstep: 1341.12 | bwd_inner_microstep: 1341.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1966
[2024-06-10 18:50:17,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.70 | bwd_microstep: 764.42 | bwd_inner_microstep: 764.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794
[2024-06-10 18:50:19,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.73 | bwd_microstep: 1648.74 | bwd_inner_microstep: 1648.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 18:50:21,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1246.72 | bwd_inner_microstep: 1246.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 18:50:22,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.02 | bwd_microstep: 1343.46 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3504
[2024-06-10 18:50:25,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.82 | bwd_microstep: 1580.95 | bwd_inner_microstep: 1580.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109
[2024-06-10 18:50:26,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.10 | bwd_microstep: 923.15 | bwd_inner_microstep: 923.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3419
[2024-06-10 18:50:28,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1410.63 | bwd_inner_microstep: 1410.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3709
[2024-06-10 18:50:30,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.09 | bwd_microstep: 1671.23 | bwd_inner_microstep: 1671.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 18:50:32,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.37 | bwd_microstep: 1488.99 | bwd_inner_microstep: 1488.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 18:50:34,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1246.82 | bwd_inner_microstep: 1246.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 18:50:36,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1386.10 | bwd_inner_microstep: 1386.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 18:50:38,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1342.82 | bwd_inner_microstep: 1342.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3646
[2024-06-10 18:50:40,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.44 | bwd_microstep: 1444.14 | bwd_inner_microstep: 1444.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 18:50:42,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1411.19 | bwd_inner_microstep: 1411.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3643
[2024-06-10 18:50:43,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1283.02 | bwd_inner_microstep: 1283.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 18:50:45,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.52 | bwd_microstep: 802.77 | bwd_inner_microstep: 802.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 18:50:46,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1399.13 | bwd_inner_microstep: 1399.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815
[2024-06-10 18:50:48,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.76 | bwd_microstep: 1387.11 | bwd_inner_microstep: 1387.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 18:50:50,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.94 | bwd_microstep: 1393.50 | bwd_inner_microstep: 1393.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-10 18:50:52,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1297.60 | bwd_inner_microstep: 1297.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 18:50:54,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1412.41 | bwd_inner_microstep: 1412.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769
[2024-06-10 18:50:56,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.51 | bwd_microstep: 1350.06 | bwd_inner_microstep: 1350.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 18:50:58,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.49 | bwd_microstep: 1425.36 | bwd_inner_microstep: 1425.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3430
[2024-06-10 18:51:00,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.07 | bwd_microstep: 1373.03 | bwd_inner_microstep: 1373.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777
[2024-06-10 18:51:02,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.57 | bwd_microstep: 1640.37 | bwd_inner_microstep: 1640.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3593
[2024-06-10 18:51:04,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1433.39 | bwd_inner_microstep: 1433.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 18:51:08,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 18:51:08,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.67 | bwd_microstep: 3182.56 | bwd_inner_microstep: 901.62 | bwd_allreduce_microstep: 2280.88 | step_microstep: 39.09
[2024-06-10 18:51:08,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15818.38 | bwd: 44657.89 | bwd_inner: 42376.10 | bwd_allreduce: 2281.11 | step: 40.69
{'loss': 1.1897, 'learning_rate': 1.3833940620342803e-05, 'epoch': 0.61}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3467
[2024-06-10 18:51:09,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1331.10 | bwd_inner_microstep: 1331.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2384
[2024-06-10 18:51:11,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.12 | bwd_microstep: 994.65 | bwd_inner_microstep: 994.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 18:51:13,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.08 | bwd_microstep: 1480.03 | bwd_inner_microstep: 1480.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.63
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1928
[2024-06-10 18:51:14,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.58 | bwd_microstep: 819.77 | bwd_inner_microstep: 819.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3863
[2024-06-10 18:51:16,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.06 | bwd_microstep: 1666.24 | bwd_inner_microstep: 1666.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 18:51:18,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.25 | bwd_microstep: 1247.42 | bwd_inner_microstep: 1247.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-10 18:51:19,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.15 | bwd_microstep: 674.83 | bwd_inner_microstep: 674.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3680
[2024-06-10 18:51:21,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.05 | bwd_microstep: 1721.71 | bwd_inner_microstep: 1721.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 18:51:22,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.27 | bwd_microstep: 791.61 | bwd_inner_microstep: 791.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-10 18:51:24,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.89 | bwd_microstep: 794.29 | bwd_inner_microstep: 794.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 18:51:25,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.48 | bwd_microstep: 1314.34 | bwd_inner_microstep: 1314.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702
[2024-06-10 18:51:28,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.68 | bwd_microstep: 1628.44 | bwd_inner_microstep: 1628.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3628
[2024-06-10 18:51:29,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1281.77 | bwd_inner_microstep: 1281.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 18:51:31,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.14 | bwd_microstep: 1289.00 | bwd_inner_microstep: 1288.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 18:51:33,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.89 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1976
[2024-06-10 18:51:34,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.87 | bwd_microstep: 887.73 | bwd_inner_microstep: 887.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 18:51:36,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1391.69 | bwd_inner_microstep: 1391.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 18:51:37,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.44 | bwd_microstep: 796.12 | bwd_inner_microstep: 796.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 18:51:39,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1511.89 | bwd_inner_microstep: 1511.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837
[2024-06-10 18:51:41,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.90 | bwd_microstep: 1455.12 | bwd_inner_microstep: 1455.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2039
[2024-06-10 18:51:43,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.91 | bwd_microstep: 875.05 | bwd_inner_microstep: 875.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 18:51:45,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.03 | bwd_microstep: 1529.43 | bwd_inner_microstep: 1529.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3713
[2024-06-10 18:51:47,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.80 | bwd_microstep: 1562.83 | bwd_inner_microstep: 1562.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2268
[2024-06-10 18:51:48,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.49 | bwd_microstep: 974.28 | bwd_inner_microstep: 974.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830
[2024-06-10 18:51:50,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.54 | bwd_microstep: 1363.74 | bwd_inner_microstep: 1363.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3428
[2024-06-10 18:51:52,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.69 | bwd_microstep: 1376.11 | bwd_inner_microstep: 1376.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 18:51:54,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1380.91 | bwd_inner_microstep: 1380.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 18:51:56,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.21 | bwd_microstep: 1310.19 | bwd_inner_microstep: 1310.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 18:51:58,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.31 | bwd_microstep: 1542.73 | bwd_inner_microstep: 1542.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 18:52:00,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | bwd_microstep: 1377.88 | bwd_inner_microstep: 1377.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3597
[2024-06-10 18:52:02,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.47 | bwd_microstep: 1703.38 | bwd_inner_microstep: 1703.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 18:52:10,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 18:52:10,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.69 | bwd_microstep: 6845.80 | bwd_inner_microstep: 1678.71 | bwd_allreduce_microstep: 5167.03 | step_microstep: 37.93
[2024-06-10 18:52:10,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15315.33 | bwd: 46303.31 | bwd_inner: 41135.37 | bwd_allreduce: 5167.26 | step: 40.07


 61%|██████    | 1052/1726 [18:09:39<11:40:23, 62.35s/it]
 61%|██████    | 1053/1726 [18:10:41<11:39:05, 62.33s/it]
 61%|██████    | 1054/1726 [18:11:42<11:33:32, 61.92s/it]
 61%|██████    | 1055/1726 [18:12:44<11:29:45, 61.68s/it]
 61%|██████    | 1056/1726 [18:13:44<11:25:51, 61.42s/it]
 61%|██████    | 1057/1726 [18:14:46<11:26:39, 61.58s/it]
{'loss': 1.19, 'learning_rate': 1.3798245883391788e-05, 'epoch': 0.61}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 18:52:12,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.50 | bwd_microstep: 1478.16 | bwd_inner_microstep: 1478.00 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 18:52:13,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.16 | bwd_microstep: 788.33 | bwd_inner_microstep: 788.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475
[2024-06-10 18:52:15,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.43 | bwd_microstep: 1406.35 | bwd_inner_microstep: 1406.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 18:52:17,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1358.02 | bwd_inner_microstep: 1357.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 18:52:18,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.88 | bwd_microstep: 1383.58 | bwd_inner_microstep: 1383.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3839
[2024-06-10 18:52:20,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1486.90 | bwd_inner_microstep: 1486.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-10 18:52:22,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.35 | bwd_microstep: 1308.01 | bwd_inner_microstep: 1307.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 18:52:24,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1246.75 | bwd_inner_microstep: 1246.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 18:52:26,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.82 | bwd_microstep: 1282.39 | bwd_inner_microstep: 1282.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3496
[2024-06-10 18:52:27,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.18 | bwd_microstep: 1221.48 | bwd_inner_microstep: 1221.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 18:52:29,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1376.91 | bwd_inner_microstep: 1376.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-10 18:52:32,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1527.75 | bwd_inner_microstep: 1527.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3654
[2024-06-10 18:52:34,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.04 | bwd_microstep: 1582.58 | bwd_inner_microstep: 1582.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 18:52:35,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.76 | bwd_microstep: 1179.43 | bwd_inner_microstep: 1179.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 18:52:37,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1386.55 | bwd_inner_microstep: 1386.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 18:52:39,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2128
[2024-06-10 18:52:40,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.14 | bwd_microstep: 926.81 | bwd_inner_microstep: 926.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-10 18:52:43,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.22 | bwd_microstep: 1708.34 | bwd_inner_microstep: 1708.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 18:52:45,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1493.42 | bwd_inner_microstep: 1493.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-10 18:52:47,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1513.00 | bwd_inner_microstep: 1512.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3506
[2024-06-10 18:52:48,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.57 | bwd_microstep: 1221.45 | bwd_inner_microstep: 1221.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 18:52:51,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.38 | bwd_microstep: 1657.43 | bwd_inner_microstep: 1657.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 18:52:53,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1301.00 | bwd_inner_microstep: 1300.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 18:52:55,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.13 | bwd_microstep: 1457.20 | bwd_inner_microstep: 1457.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2193
[2024-06-10 18:52:56,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.95 | bwd_microstep: 864.37 | bwd_inner_microstep: 864.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3452
[2024-06-10 18:52:58,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.39 | bwd_microstep: 1381.52 | bwd_inner_microstep: 1381.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 18:53:00,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.52 | bwd_microstep: 1458.84 | bwd_inner_microstep: 1458.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2073
[2024-06-10 18:53:01,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.50 | bwd_microstep: 851.56 | bwd_inner_microstep: 851.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 18:53:03,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.12 | bwd_microstep: 1546.68 | bwd_inner_microstep: 1546.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 18:53:05,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.05 | bwd_microstep: 1651.36 | bwd_inner_microstep: 1651.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 18:53:07,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.60 | bwd_microstep: 1450.93 | bwd_inner_microstep: 1450.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 18:53:11,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.09 | optimizer_step: 6.61
[2024-06-10 18:53:11,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.95 | bwd_microstep: 3451.88 | bwd_inner_microstep: 1689.14 | bwd_allreduce_microstep: 1762.69 | step_microstep: 38.33
[2024-06-10 18:53:11,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16198.80 | bwd: 45199.59 | bwd_inner: 43435.87 | bwd_allreduce: 1762.98 | step: 39.88
{'loss': 1.1974, 'learning_rate': 1.3762572988986522e-05, 'epoch': 0.61}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 18:53:13,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.54 | bwd_microstep: 1391.07 | bwd_inner_microstep: 1390.88 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 18:53:15,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.05 | bwd_microstep: 1390.88 | bwd_inner_microstep: 1390.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3877
[2024-06-10 18:53:17,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.82 | bwd_microstep: 1582.39 | bwd_inner_microstep: 1582.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 18:53:18,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.55 | bwd_microstep: 794.48 | bwd_inner_microstep: 794.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3848
[2024-06-10 18:53:21,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.06 | bwd_microstep: 1592.59 | bwd_inner_microstep: 1592.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 18:53:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.66 | bwd_microstep: 1346.99 | bwd_inner_microstep: 1346.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3427
[2024-06-10 18:53:24,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.58 | bwd_microstep: 1154.64 | bwd_inner_microstep: 1154.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3783
[2024-06-10 18:53:26,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.84 | bwd_microstep: 1510.65 | bwd_inner_microstep: 1510.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 18:53:28,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.28 | bwd_microstep: 1631.82 | bwd_inner_microstep: 1631.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 18:53:30,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1254.23 | bwd_inner_microstep: 1254.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 18:53:32,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1391.93 | bwd_inner_microstep: 1391.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 18:53:34,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.25 | bwd_microstep: 1161.78 | bwd_inner_microstep: 1161.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2510
[2024-06-10 18:53:35,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.36 | bwd_microstep: 989.25 | bwd_inner_microstep: 989.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913
[2024-06-10 18:53:36,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.41 | bwd_microstep: 694.78 | bwd_inner_microstep: 694.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495
[2024-06-10 18:53:38,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.36 | bwd_microstep: 1578.28 | bwd_inner_microstep: 1578.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 18:53:40,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.89 | bwd_microstep: 1400.63 | bwd_inner_microstep: 1400.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 18:53:42,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.52 | bwd_microstep: 1622.16 | bwd_inner_microstep: 1622.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 18:53:44,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1354.20 | bwd_inner_microstep: 1354.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3842
[2024-06-10 18:53:47,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.62 | bwd_microstep: 1832.36 | bwd_inner_microstep: 1832.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 18:53:49,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.40 | bwd_microstep: 1349.48 | bwd_inner_microstep: 1349.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3753
[2024-06-10 18:53:51,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.85 | bwd_microstep: 1392.47 | bwd_inner_microstep: 1392.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3701
[2024-06-10 18:53:53,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.85 | bwd_microstep: 1435.59 | bwd_inner_microstep: 1435.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3564
[2024-06-10 18:53:54,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.10 | bwd_microstep: 1203.38 | bwd_inner_microstep: 1203.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 18:53:56,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1497.34 | bwd_inner_microstep: 1497.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3821
[2024-06-10 18:53:58,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.76 | bwd_microstep: 1626.22 | bwd_inner_microstep: 1626.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279
[2024-06-10 18:54:00,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.72 | bwd_microstep: 909.11 | bwd_inner_microstep: 909.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3823
[2024-06-10 18:54:02,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.85 | bwd_microstep: 1419.55 | bwd_inner_microstep: 1419.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 18:54:04,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.79 | bwd_microstep: 1551.05 | bwd_inner_microstep: 1551.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3773
[2024-06-10 18:54:06,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.32 | bwd_microstep: 1739.21 | bwd_inner_microstep: 1739.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816
[2024-06-10 18:54:09,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.02 | bwd_microstep: 1751.58 | bwd_inner_microstep: 1751.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 18:54:11,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.61 | bwd_microstep: 1502.48 | bwd_inner_microstep: 1502.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 18:54:13,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.31 | optimizer_step: 6.65
[2024-06-10 18:54:13,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1436.28 | bwd_inner_microstep: 1428.33 | bwd_allreduce_microstep: 7.90 | step_microstep: 37.84
[2024-06-10 18:54:13,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16594.72 | bwd: 44488.89 | bwd_inner: 44479.94 | bwd_allreduce: 8.21 | step: 39.35
{'loss': 1.2312, 'learning_rate': 1.3726922062766765e-05, 'epoch': 0.61}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 18:54:15,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1336.57 | bwd_inner_microstep: 1336.44 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3940
[2024-06-10 18:54:17,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.55 | bwd_microstep: 1598.15 | bwd_inner_microstep: 1598.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 18:54:19,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.63 | bwd_microstep: 1398.65 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 18:54:21,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.11 | bwd_microstep: 1349.60 | bwd_inner_microstep: 1349.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-10 18:54:22,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.64 | bwd_microstep: 798.26 | bwd_inner_microstep: 798.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3822
[2024-06-10 18:54:24,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.73 | bwd_microstep: 1355.00 | bwd_inner_microstep: 1354.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4037
[2024-06-10 18:54:26,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1516.05 | bwd_inner_microstep: 1516.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945
[2024-06-10 18:54:27,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.74 | bwd_microstep: 728.50 | bwd_inner_microstep: 728.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 18:54:29,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1424.87 | bwd_inner_microstep: 1424.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 18:54:30,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1281.50 | bwd_inner_microstep: 1281.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1473
[2024-06-10 18:54:31,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 209.80 | bwd_microstep: 543.65 | bwd_inner_microstep: 543.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3501
[2024-06-10 18:54:33,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911
[2024-06-10 18:54:34,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.09 | bwd_microstep: 779.26 | bwd_inner_microstep: 779.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954
[2024-06-10 18:54:35,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.11 | bwd_microstep: 884.64 | bwd_inner_microstep: 884.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-10 18:54:37,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.56 | bwd_microstep: 1342.51 | bwd_inner_microstep: 1342.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-10 18:54:38,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.24 | bwd_microstep: 797.65 | bwd_inner_microstep: 797.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 18:54:40,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1383.83 | bwd_inner_microstep: 1383.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832
[2024-06-10 18:54:42,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1388.60 | bwd_inner_microstep: 1388.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 18:54:44,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1373.16 | bwd_inner_microstep: 1373.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3568
[2024-06-10 18:54:46,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.84 | bwd_microstep: 1236.47 | bwd_inner_microstep: 1236.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2401
[2024-06-10 18:54:47,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.94 | bwd_microstep: 937.61 | bwd_inner_microstep: 937.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3472
[2024-06-10 18:54:49,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1360.07 | bwd_inner_microstep: 1360.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3820
[2024-06-10 18:54:51,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.25 | bwd_microstep: 1705.08 | bwd_inner_microstep: 1705.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-10 18:54:53,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.90 | bwd_microstep: 806.03 | bwd_inner_microstep: 806.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3654
[2024-06-10 18:54:55,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.46 | bwd_microstep: 1448.23 | bwd_inner_microstep: 1448.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587
[2024-06-10 18:54:56,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1339.11 | bwd_inner_microstep: 1339.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3706
[2024-06-10 18:54:58,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.67 | bwd_microstep: 1460.69 | bwd_inner_microstep: 1460.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 18:55:00,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1474.84 | bwd_inner_microstep: 1474.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3791
[2024-06-10 18:55:02,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.31 | bwd_microstep: 1414.49 | bwd_inner_microstep: 1414.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 18:55:05,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.42 | bwd_microstep: 1655.52 | bwd_inner_microstep: 1655.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3421
[2024-06-10 18:55:07,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1407.09 | bwd_inner_microstep: 1407.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3773
[2024-06-10 18:55:14,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 18:55:14,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.23 | bwd_microstep: 6287.43 | bwd_inner_microstep: 1912.64 | bwd_allreduce_microstep: 4374.73 | step_microstep: 37.95
[2024-06-10 18:55:14,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15251.51 | bwd: 45255.39 | bwd_inner: 40879.64 | bwd_allreduce: 4375.02 | step: 39.49
{'loss': 1.1915, 'learning_rate': 1.369129323029489e-05, 'epoch': 0.61}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 18:55:15,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.13 | bwd_microstep: 1329.25 | bwd_inner_microstep: 1329.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2427
[2024-06-10 18:55:17,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.38 | bwd_microstep: 905.76 | bwd_inner_microstep: 905.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 18:55:18,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.32 | bwd_microstep: 1241.86 | bwd_inner_microstep: 1241.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 18:55:21,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.87 | bwd_microstep: 1648.43 | bwd_inner_microstep: 1648.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 18:55:23,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.93 | bwd_microstep: 1553.17 | bwd_inner_microstep: 1553.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 18:55:25,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.46 | bwd_microstep: 1285.45 | bwd_inner_microstep: 1285.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416
[2024-06-10 18:55:26,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.48 | bwd_microstep: 1152.30 | bwd_inner_microstep: 1152.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 18:55:27,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.75 | bwd_microstep: 792.53 | bwd_inner_microstep: 792.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 18:55:29,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.97 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 18:55:31,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.58 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 18:55:33,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1413.02 | bwd_inner_microstep: 1412.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 18:55:34,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1250.61 | bwd_inner_microstep: 1250.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484
[2024-06-10 18:55:36,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.29 | bwd_microstep: 1411.71 | bwd_inner_microstep: 1411.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3679
[2024-06-10 18:55:39,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.72 | bwd_microstep: 1657.85 | bwd_inner_microstep: 1657.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2605
[2024-06-10 18:55:40,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.58 | bwd_microstep: 964.05 | bwd_inner_microstep: 964.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3519
[2024-06-10 18:55:42,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.99 | bwd_microstep: 1650.88 | bwd_inner_microstep: 1650.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 18:55:44,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1340.31 | bwd_inner_microstep: 1340.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2947
[2024-06-10 18:55:46,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.84 | bwd_microstep: 1199.93 | bwd_inner_microstep: 1199.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3647
[2024-06-10 18:55:48,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.62 | bwd_microstep: 1578.76 | bwd_inner_microstep: 1578.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744
[2024-06-10 18:55:50,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.96 | bwd_microstep: 1471.51 | bwd_inner_microstep: 1471.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2279
[2024-06-10 18:55:51,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.34 | bwd_microstep: 924.66 | bwd_inner_microstep: 924.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-10 18:55:53,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.47 | bwd_microstep: 1358.01 | bwd_inner_microstep: 1357.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-10 18:55:55,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.95 | bwd_microstep: 1309.53 | bwd_inner_microstep: 1309.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466
[2024-06-10 18:55:57,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.37 | bwd_microstep: 1308.87 | bwd_inner_microstep: 1308.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511
[2024-06-10 18:55:59,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.15 | bwd_microstep: 1355.36 | bwd_inner_microstep: 1355.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3636
[2024-06-10 18:56:01,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1437.60 | bwd_inner_microstep: 1437.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816
[2024-06-10 18:56:03,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.14 | bwd_microstep: 1621.45 | bwd_inner_microstep: 1621.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3825
[2024-06-10 18:56:05,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.56 | bwd_microstep: 1751.57 | bwd_inner_microstep: 1751.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 18:56:07,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1393.70 | bwd_inner_microstep: 1393.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939
[2024-06-10 18:56:08,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.52 | bwd_microstep: 819.66 | bwd_inner_microstep: 819.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720
[2024-06-10 18:56:11,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.52 | bwd_microstep: 1629.87 | bwd_inner_microstep: 1629.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2889
[2024-06-10 18:56:15,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 18:56:15,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.70 | bwd_microstep: 3424.28 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 2086.49 | step_microstep: 37.97
[2024-06-10 18:56:15,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15897.82 | bwd: 44753.34 | bwd_inner: 42665.96 | bwd_allreduce: 2086.72 | step: 39.53
{'loss': 1.2001, 'learning_rate': 1.3655686617055466e-05, 'epoch': 0.61}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 18:56:16,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.12 | bwd_microstep: 1276.49 | bwd_inner_microstep: 1276.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3532
[2024-06-10 18:56:18,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.70 | bwd_microstep: 1452.28 | bwd_inner_microstep: 1452.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848
[2024-06-10 18:56:20,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.85 | bwd_microstep: 1558.17 | bwd_inner_microstep: 1558.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476
[2024-06-10 18:56:22,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.08 | bwd_microstep: 1341.75 | bwd_inner_microstep: 1341.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4126
[2024-06-10 18:56:24,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1438.41 | bwd_inner_microstep: 1438.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750
[2024-06-10 18:56:26,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.86 | bwd_microstep: 1533.74 | bwd_inner_microstep: 1533.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 18:56:28,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.13 | bwd_microstep: 1244.07 | bwd_inner_microstep: 1244.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 18:56:30,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-10 18:56:32,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.16 | bwd_microstep: 1277.91 | bwd_inner_microstep: 1277.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 18:56:33,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.94 | bwd_microstep: 1189.69 | bwd_inner_microstep: 1189.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1968
[2024-06-10 18:56:35,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.18 | bwd_microstep: 839.23 | bwd_inner_microstep: 839.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2124
[2024-06-10 18:56:36,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.01 | bwd_microstep: 921.54 | bwd_inner_microstep: 921.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3491
[2024-06-10 18:56:38,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.78 | bwd_microstep: 1439.75 | bwd_inner_microstep: 1439.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1934
[2024-06-10 18:56:39,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.53 | bwd_microstep: 884.69 | bwd_inner_microstep: 884.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3536
[2024-06-10 18:56:41,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.66 | bwd_microstep: 1452.47 | bwd_inner_microstep: 1452.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2468
[2024-06-10 18:56:42,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.58 | bwd_microstep: 980.26 | bwd_inner_microstep: 980.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 18:56:44,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.45 | bwd_microstep: 1285.17 | bwd_inner_microstep: 1285.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 18:56:46,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.42 | bwd_microstep: 1289.80 | bwd_inner_microstep: 1289.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-10 18:56:48,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.00 | bwd_microstep: 1611.22 | bwd_inner_microstep: 1611.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3821
[2024-06-10 18:56:50,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1485.15 | bwd_inner_microstep: 1485.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 18:56:51,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.67 | bwd_microstep: 799.61 | bwd_inner_microstep: 799.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 18:56:54,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.58 | bwd_microstep: 1654.97 | bwd_inner_microstep: 1654.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 18:56:55,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1287.95 | bwd_inner_microstep: 1287.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 18:56:57,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.69 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2236
[2024-06-10 18:56:59,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.95 | bwd_microstep: 962.48 | bwd_inner_microstep: 962.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1512
[2024-06-10 18:56:59,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 226.38 | bwd_microstep: 589.68 | bwd_inner_microstep: 589.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808
[2024-06-10 18:57:02,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.59 | bwd_microstep: 1580.56 | bwd_inner_microstep: 1580.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 18:57:04,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1392.96 | bwd_inner_microstep: 1392.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2107
[2024-06-10 18:57:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.87 | bwd_microstep: 855.55 | bwd_inner_microstep: 855.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3605
[2024-06-10 18:57:07,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.72 | bwd_microstep: 1754.09 | bwd_inner_microstep: 1754.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 18:57:09,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.18 | bwd_microstep: 1543.04 | bwd_inner_microstep: 1543.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 18:57:16,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-10 18:57:16,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.41 | bwd_microstep: 6437.13 | bwd_inner_microstep: 1574.82 | bwd_allreduce_microstep: 4862.25 | step_microstep: 37.85
[2024-06-10 18:57:16,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15362.57 | bwd: 46018.64 | bwd_inner: 41155.47 | bwd_allreduce: 4862.49 | step: 39.31
{'loss': 1.2175, 'learning_rate': 1.3620102348454802e-05, 'epoch': 0.62}
 61%|██████    | 1057/1726 [18:14:46<11:26:39, 61.58s/it]
 61%|██████▏   | 1058/1726 [18:15:48<11:26:10, 61.63s/it]
 61%|██████▏   | 1059/1726 [18:16:49<11:24:26, 61.57s/it]
 61%|██████▏   | 1060/1726 [18:17:50<11:20:57, 61.35s/it]
 61%|██████▏   | 1061/1726 [18:18:51<11:18:44, 61.24s/it]
 62%|██████▏   | 1062/1726 [18:19:53<11:19:16, 61.38s/it]
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3962
[2024-06-10 18:57:19,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.91 | bwd_microstep: 1776.49 | bwd_inner_microstep: 1776.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 18:57:21,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.67 | bwd_microstep: 1381.57 | bwd_inner_microstep: 1381.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 18:57:22,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.61 | bwd_microstep: 1242.07 | bwd_inner_microstep: 1242.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3760
[2024-06-10 18:57:24,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1338.07 | bwd_inner_microstep: 1338.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 18:57:26,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1243.31 | bwd_inner_microstep: 1243.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 18:57:28,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.00 | bwd_microstep: 1281.65 | bwd_inner_microstep: 1281.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 18:57:29,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.68 | bwd_microstep: 1287.26 | bwd_inner_microstep: 1287.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 18:57:31,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1398.01 | bwd_inner_microstep: 1397.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 18:57:33,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.76 | bwd_microstep: 1284.41 | bwd_inner_microstep: 1284.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3418
[2024-06-10 18:57:35,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.32 | bwd_microstep: 1307.84 | bwd_inner_microstep: 1307.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 18:57:37,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1483.98 | bwd_inner_microstep: 1483.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2398
[2024-06-10 18:57:39,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.98 | bwd_microstep: 1097.16 | bwd_inner_microstep: 1097.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 18:57:41,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1604.30 | bwd_inner_microstep: 1604.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529
[2024-06-10 18:57:43,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.79 | bwd_microstep: 1413.70 | bwd_inner_microstep: 1413.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 18:57:45,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.54 | bwd_microstep: 1510.93 | bwd_inner_microstep: 1510.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 18:57:46,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.46 | bwd_microstep: 803.42 | bwd_inner_microstep: 803.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291
[2024-06-10 18:57:47,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.47 | bwd_microstep: 878.79 | bwd_inner_microstep: 878.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 18:57:49,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.83 | bwd_microstep: 1341.71 | bwd_inner_microstep: 1341.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 18:57:51,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.02 | bwd_microstep: 1284.24 | bwd_inner_microstep: 1284.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3464
[2024-06-10 18:57:52,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.19 | bwd_microstep: 1246.09 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 18:57:55,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.74 | bwd_microstep: 1493.91 | bwd_inner_microstep: 1493.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 18:57:57,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1497.19 | bwd_inner_microstep: 1497.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-10 18:57:59,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.66 | bwd_microstep: 1609.04 | bwd_inner_microstep: 1609.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1990
[2024-06-10 18:58:00,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.29 | bwd_microstep: 861.83 | bwd_inner_microstep: 861.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3569
[2024-06-10 18:58:02,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.82 | bwd_microstep: 1544.27 | bwd_inner_microstep: 1544.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 18:58:04,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.96 | bwd_microstep: 1538.58 | bwd_inner_microstep: 1538.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 18:58:07,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.11 | bwd_microstep: 1647.66 | bwd_inner_microstep: 1647.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3381
[2024-06-10 18:58:08,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.83 | bwd_microstep: 1273.62 | bwd_inner_microstep: 1273.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2281
[2024-06-10 18:58:09,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.39 | bwd_microstep: 782.86 | bwd_inner_microstep: 782.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3575
[2024-06-10 18:58:12,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1698.07 | bwd_inner_microstep: 1698.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3810
[2024-06-10 18:58:14,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.42 | bwd_microstep: 1585.97 | bwd_inner_microstep: 1585.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2229
[2024-06-10 18:58:18,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.20 | optimizer_step: 6.56
[2024-06-10 18:58:18,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.44 | bwd_microstep: 4184.16 | bwd_inner_microstep: 981.22 | bwd_allreduce_microstep: 3202.89 | step_microstep: 37.94
[2024-06-10 18:58:18,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15955.16 | bwd: 45922.17 | bwd_inner: 42718.37 | bwd_allreduce: 3203.12 | step: 39.42
{'loss': 1.1969, 'learning_rate': 1.3584540549820493e-05, 'epoch': 0.62}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 18:58:20,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.85 | bwd_microstep: 1375.73 | bwd_inner_microstep: 1375.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 18:58:22,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.11 | bwd_microstep: 1375.75 | bwd_inner_microstep: 1375.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 18:58:24,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.79 | bwd_microstep: 1338.20 | bwd_inner_microstep: 1338.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3044
[2024-06-10 18:58:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.82 | bwd_microstep: 1131.82 | bwd_inner_microstep: 1131.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 18:58:28,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.55 | bwd_microstep: 1383.49 | bwd_inner_microstep: 1383.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4017
[2024-06-10 18:58:30,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.10 | bwd_microstep: 1709.50 | bwd_inner_microstep: 1709.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 18:58:32,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1251.39 | bwd_inner_microstep: 1251.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1956
[2024-06-10 18:58:33,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.05 | bwd_microstep: 824.84 | bwd_inner_microstep: 824.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 18:58:35,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.01 | bwd_microstep: 1381.98 | bwd_inner_microstep: 1381.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 18:58:37,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1377.63 | bwd_inner_microstep: 1377.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3463
[2024-06-10 18:58:39,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.02 | bwd_microstep: 1362.16 | bwd_inner_microstep: 1362.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2432
[2024-06-10 18:58:40,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.02 | bwd_microstep: 1035.95 | bwd_inner_microstep: 1035.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967
[2024-06-10 18:58:41,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.46 | bwd_microstep: 892.09 | bwd_inner_microstep: 892.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 18:58:43,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.04 | bwd_microstep: 1611.05 | bwd_inner_microstep: 1611.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3649
[2024-06-10 18:58:46,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.89 | bwd_microstep: 1652.84 | bwd_inner_microstep: 1652.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642
[2024-06-10 18:58:48,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1512.04 | bwd_inner_microstep: 1512.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3705
[2024-06-10 18:58:50,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.44 | bwd_microstep: 1297.27 | bwd_inner_microstep: 1297.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3537
[2024-06-10 18:58:52,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1414.66 | bwd_inner_microstep: 1414.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 18:58:54,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.07 | bwd_microstep: 1489.08 | bwd_inner_microstep: 1489.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2289
[2024-06-10 18:58:55,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.38 | bwd_microstep: 942.36 | bwd_inner_microstep: 942.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3916
[2024-06-10 18:58:57,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.38 | bwd_microstep: 1792.75 | bwd_inner_microstep: 1792.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 18:59:00,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.92 | bwd_microstep: 1645.72 | bwd_inner_microstep: 1645.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 18:59:01,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1350.43 | bwd_inner_microstep: 1350.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 18:59:03,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.03 | bwd_microstep: 975.94 | bwd_inner_microstep: 975.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3722
[2024-06-10 18:59:05,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.05 | bwd_microstep: 1781.16 | bwd_inner_microstep: 1781.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-10 18:59:07,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.53 | bwd_microstep: 1359.80 | bwd_inner_microstep: 1359.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 18:59:09,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.62 | bwd_microstep: 1522.43 | bwd_inner_microstep: 1522.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3603
[2024-06-10 18:59:11,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.99 | bwd_microstep: 1641.63 | bwd_inner_microstep: 1641.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 18:59:14,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1556.72 | bwd_inner_microstep: 1556.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 18:59:16,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.63 | bwd_microstep: 1556.74 | bwd_inner_microstep: 1556.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 18:59:18,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1376.86 | bwd_inner_microstep: 1376.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 18:59:47,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 18:59:47,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.32 | bwd_microstep: 29101.22 | bwd_inner_microstep: 1951.10 | bwd_allreduce_microstep: 27150.05 | step_microstep: 38.86
[2024-06-10 18:59:47,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16565.85 | bwd: 72021.23 | bwd_inner: 44870.25 | bwd_allreduce: 27150.29 | step: 40.33
{'loss': 1.2115, 'learning_rate': 1.3549001346401017e-05, 'epoch': 0.62}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 18:59:49,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.19 | bwd_microstep: 1366.78 | bwd_inner_microstep: 1366.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3906
[2024-06-10 18:59:51,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.91 | bwd_microstep: 1479.69 | bwd_inner_microstep: 1479.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-10 18:59:53,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.78 | bwd_microstep: 1547.42 | bwd_inner_microstep: 1547.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 18:59:55,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.19 | bwd_microstep: 1273.27 | bwd_inner_microstep: 1273.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 18:59:57,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.43 | bwd_microstep: 1474.44 | bwd_inner_microstep: 1474.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 18:59:59,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1256.69 | bwd_inner_microstep: 1256.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 19:00:01,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.64 | bwd_microstep: 1626.15 | bwd_inner_microstep: 1626.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2087
[2024-06-10 19:00:02,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.60 | bwd_microstep: 787.79 | bwd_inner_microstep: 787.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 19:00:46,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.04 | bwd_microstep: 696.55 | bwd_inner_microstep: 696.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3626
[2024-06-10 19:00:48,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1428.12 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062
[2024-06-10 19:00:53,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.07 | bwd_microstep: 810.25 | bwd_inner_microstep: 810.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-10 19:00:55,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.72 | bwd_microstep: 1437.96 | bwd_inner_microstep: 1437.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3503
[2024-06-10 19:00:58,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.50 | bwd_microstep: 1666.28 | bwd_inner_microstep: 1666.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 19:00:59,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1335.23 | bwd_inner_microstep: 1335.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 19:01:01,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1371.11 | bwd_inner_microstep: 1371.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2133
[2024-06-10 19:01:02,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.19 | bwd_microstep: 857.02 | bwd_inner_microstep: 856.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 19:01:04,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.00 | bwd_microstep: 791.12 | bwd_inner_microstep: 791.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974
[2024-06-10 19:01:05,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.68 | bwd_microstep: 764.22 | bwd_inner_microstep: 764.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 19:01:07,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1388.71 | bwd_inner_microstep: 1388.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001
[2024-06-10 19:01:08,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.92 | bwd_microstep: 798.49 | bwd_inner_microstep: 798.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3829
[2024-06-10 19:01:09,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.21 | bwd_microstep: 1257.58 | bwd_inner_microstep: 1257.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545
[2024-06-10 19:01:11,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1318.43 | bwd_inner_microstep: 1318.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 19:01:13,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1296.47 | bwd_inner_microstep: 1296.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3088
[2024-06-10 19:01:15,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.02 | bwd_microstep: 1241.67 | bwd_inner_microstep: 1241.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778
[2024-06-10 19:01:17,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.55 | bwd_microstep: 1449.99 | bwd_inner_microstep: 1449.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 19:01:19,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.43 | bwd_microstep: 1296.00 | bwd_inner_microstep: 1295.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 19:01:21,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1491.85 | bwd_inner_microstep: 1491.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 19:01:23,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.88 | bwd_microstep: 1639.71 | bwd_inner_microstep: 1639.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2104
[2024-06-10 19:01:24,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.01 | bwd_microstep: 1014.82 | bwd_inner_microstep: 1014.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3440
[2024-06-10 19:01:26,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.45 | bwd_microstep: 1496.60 | bwd_inner_microstep: 1496.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-10 19:01:28,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.01 | bwd_microstep: 1592.23 | bwd_inner_microstep: 1592.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3777
[2024-06-10 19:01:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 19:01:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.39 | bwd_microstep: 13206.18 | bwd_inner_microstep: 2301.03 | bwd_allreduce_microstep: 10905.08 | step_microstep: 38.06
[2024-06-10 19:01:42,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15398.48 | bwd: 52458.83 | bwd_inner: 41552.83 | bwd_allreduce: 10905.31 | step: 39.52
{'loss': 1.2288, 'learning_rate': 1.3513484863365265e-05, 'epoch': 0.62}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 19:01:44,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1330.32 | bwd_inner_microstep: 1330.23 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 19:01:46,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.21 | bwd_microstep: 1542.35 | bwd_inner_microstep: 1542.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 19:01:48,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1349.59 | bwd_inner_microstep: 1349.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 19:01:50,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.48 | bwd_microstep: 1146.97 | bwd_inner_microstep: 1146.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-10 19:01:52,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.85 | bwd_microstep: 1632.19 | bwd_inner_microstep: 1632.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 19:01:54,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.34 | bwd_microstep: 1290.75 | bwd_inner_microstep: 1290.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-10 19:01:55,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.21 | bwd_microstep: 1152.07 | bwd_inner_microstep: 1152.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 19:01:57,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.11 | bwd_microstep: 1380.53 | bwd_inner_microstep: 1380.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 19:01:59,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1285.68 | bwd_inner_microstep: 1285.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 19:02:01,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1383.76 | bwd_inner_microstep: 1383.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 19:02:03,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.16 | bwd_microstep: 1286.48 | bwd_inner_microstep: 1286.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3625
[2024-06-10 19:02:05,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1441.58 | bwd_inner_microstep: 1441.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911
[2024-06-10 19:02:07,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.65 | bwd_microstep: 1679.98 | bwd_inner_microstep: 1679.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483
[2024-06-10 19:02:09,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.65 | bwd_microstep: 1310.58 | bwd_inner_microstep: 1310.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3879
[2024-06-10 19:02:11,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.29 | bwd_microstep: 1673.51 | bwd_inner_microstep: 1673.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3463
[2024-06-10 19:02:13,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.74 | bwd_microstep: 1568.40 | bwd_inner_microstep: 1568.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3569
[2024-06-10 19:02:15,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.14 | bwd_microstep: 1331.01 | bwd_inner_microstep: 1330.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462
[2024-06-10 19:02:17,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.83 | bwd_microstep: 1180.14 | bwd_inner_microstep: 1180.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540
[2024-06-10 19:02:19,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.21 | bwd_microstep: 1199.76 | bwd_inner_microstep: 1199.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032
[2024-06-10 19:02:20,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.04 | bwd_microstep: 808.72 | bwd_inner_microstep: 808.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 19:02:21,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.08 | bwd_microstep: 799.21 | bwd_inner_microstep: 799.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 19:02:23,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1252.62 | bwd_inner_microstep: 1252.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 19:02:24,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.90 | bwd_microstep: 1394.77 | bwd_inner_microstep: 1394.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 19:02:26,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.21 | bwd_microstep: 1293.38 | bwd_inner_microstep: 1293.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3995
[2024-06-10 19:02:29,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 695.92 | bwd_microstep: 1916.74 | bwd_inner_microstep: 1916.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 19:02:31,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.17 | bwd_microstep: 1282.86 | bwd_inner_microstep: 1282.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3735
[2024-06-10 19:02:33,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 1436.84 | bwd_inner_microstep: 1436.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3718
[2024-06-10 19:02:34,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.66 | bwd_microstep: 1298.96 | bwd_inner_microstep: 1298.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3423
[2024-06-10 19:02:36,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.96 | bwd_microstep: 1376.86 | bwd_inner_microstep: 1376.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2296
[2024-06-10 19:02:38,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.32 | bwd_microstep: 875.67 | bwd_inner_microstep: 875.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 19:02:39,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1346.82 | bwd_inner_microstep: 1346.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 19:03:06,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.36 | optimizer_step: 6.62
[2024-06-10 19:03:06,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 26410.54 | bwd_inner_microstep: 1800.07 | bwd_allreduce_microstep: 24610.40 | step_microstep: 38.98
[2024-06-10 19:03:06,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16079.66 | bwd: 67659.66 | bwd_inner: 43048.27 | bwd_allreduce: 24610.68 | step: 40.47
{'loss': 1.2629, 'learning_rate': 1.3477991225802103e-05, 'epoch': 0.62}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 19:03:08,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.49 | bwd_microstep: 1366.55 | bwd_inner_microstep: 1366.36 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3970
[2024-06-10 19:03:11,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.65 | bwd_microstep: 1596.09 | bwd_inner_microstep: 1596.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 19:03:13,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.84 | bwd_microstep: 1641.29 | bwd_inner_microstep: 1641.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 19:03:15,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1372.90 | bwd_inner_microstep: 1372.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3496
[2024-06-10 19:03:16,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.21 | bwd_microstep: 1217.12 | bwd_inner_microstep: 1217.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 19:03:18,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.59 | bwd_microstep: 1395.10 | bwd_inner_microstep: 1395.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 19:03:20,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.88 | bwd_microstep: 1277.67 | bwd_inner_microstep: 1277.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 19:03:22,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.58 | bwd_microstep: 1307.36 | bwd_inner_microstep: 1307.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449
[2024-06-10 19:04:05,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.06 | bwd_microstep: 1183.95 | bwd_inner_microstep: 1183.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 19:04:08,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.56 | bwd_microstep: 1377.14 | bwd_inner_microstep: 1377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509
[2024-06-10 19:04:10,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1423.33 | bwd_inner_microstep: 1423.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3547
[2024-06-10 19:04:11,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.31 | bwd_microstep: 1345.90 | bwd_inner_microstep: 1345.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 19:04:13,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.01 | bwd_microstep: 1475.98 | bwd_inner_microstep: 1475.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 19:04:15,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.30 | bwd_microstep: 1398.75 | bwd_inner_microstep: 1398.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3627
[2024-06-10 19:04:17,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.77 | bwd_microstep: 1459.60 | bwd_inner_microstep: 1459.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559
[2024-06-10 19:04:20,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.25 | bwd_microstep: 1585.97 | bwd_inner_microstep: 1585.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-10 19:04:22,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.93 | bwd_microstep: 1431.41 | bwd_inner_microstep: 1431.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 19:04:24,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.68 | bwd_microstep: 1483.13 | bwd_inner_microstep: 1483.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829
[2024-06-10 19:04:52,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.77 | bwd_microstep: 1740.23 | bwd_inner_microstep: 1740.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 19:04:54,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1409.15 | bwd_inner_microstep: 1409.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 19:04:56,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.94 | bwd_microstep: 1383.95 | bwd_inner_microstep: 1383.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-10 19:04:57,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.16 | bwd_microstep: 695.80 | bwd_inner_microstep: 695.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 19:04:59,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1411.59 | bwd_inner_microstep: 1411.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 19:05:01,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1345.84 | bwd_inner_microstep: 1345.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2035
[2024-06-10 19:05:02,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.89 | bwd_microstep: 838.61 | bwd_inner_microstep: 838.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 19:05:03,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1280.70 | bwd_inner_microstep: 1280.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 19:05:06,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.51 | bwd_microstep: 1528.49 | bwd_inner_microstep: 1528.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 19:05:08,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1504.34 | bwd_inner_microstep: 1504.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3915
[2024-06-10 19:05:39,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.14 | bwd_microstep: 1477.64 | bwd_inner_microstep: 1477.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 19:05:58,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.65 | bwd_microstep: 1631.05 | bwd_inner_microstep: 1631.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2237
[2024-06-10 19:06:00,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.79 | bwd_microstep: 1052.08 | bwd_inner_microstep: 1052.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3450
[2024-06-10 19:06:14,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.43 | optimizer_step: 6.60
[2024-06-10 19:06:14,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.14 | bwd_microstep: 13512.61 | bwd_inner_microstep: 1572.32 | bwd_allreduce_microstep: 11940.22 | step_microstep: 41.05
[2024-06-10 19:06:14,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16512.88 | bwd: 56151.37 | bwd_inner: 44210.07 | bwd_allreduce: 11940.54 | step: 42.63
{'loss': 1.1741, 'learning_rate': 1.3442520558719944e-05, 'epoch': 0.62}
62%|██████▏   | 1062/1726 [18:19:53<11:19:16, 61.38s/it]
 62%|██████▏   | 1063/1726 [18:20:55<11:21:01, 61.63s/it]
 62%|██████▏   | 1064/1726 [18:22:24<12:50:21, 69.82s/it]
 62%|██████▏   | 1065/1726 [18:24:19<15:18:29, 83.37s/it]
 62%|██████▏   | 1066/1726 [18:25:43<15:19:23, 83.58s/it]
 62%|██████▏   | 1067/1726 [18:28:50<20:59:39, 114.69s/it]
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3417
[2024-06-10 19:06:15,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.45 | bwd_microstep: 1195.22 | bwd_inner_microstep: 1195.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 19:06:17,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1368.07 | bwd_inner_microstep: 1368.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 19:06:19,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.85 | bwd_microstep: 1276.40 | bwd_inner_microstep: 1276.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 19:06:21,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.88 | bwd_microstep: 1238.24 | bwd_inner_microstep: 1238.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 19:06:23,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.34 | bwd_microstep: 1625.50 | bwd_inner_microstep: 1625.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 19:06:25,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.06 | bwd_microstep: 1374.94 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 19:06:27,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.03 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 19:06:29,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.46 | bwd_microstep: 1276.85 | bwd_inner_microstep: 1276.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 19:07:05,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.27 | bwd_microstep: 1241.43 | bwd_inner_microstep: 1241.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3448
[2024-06-10 19:07:06,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.19 | bwd_microstep: 1183.38 | bwd_inner_microstep: 1183.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 19:07:08,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.31 | bwd_microstep: 1382.46 | bwd_inner_microstep: 1382.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3452
[2024-06-10 19:07:10,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1402.50 | bwd_inner_microstep: 1402.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-10 19:07:11,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.43 | bwd_microstep: 873.28 | bwd_inner_microstep: 873.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2483
[2024-06-10 19:07:13,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.19 | bwd_microstep: 1070.96 | bwd_inner_microstep: 1070.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 19:07:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.67 | bwd_microstep: 1408.06 | bwd_inner_microstep: 1408.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 19:07:17,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1370.61 | bwd_inner_microstep: 1370.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 19:07:19,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 1386.89 | bwd_inner_microstep: 1386.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2950
[2024-06-10 19:07:20,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.84 | bwd_microstep: 1005.48 | bwd_inner_microstep: 1005.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 19:07:22,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1395.02 | bwd_inner_microstep: 1394.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2083
[2024-06-10 19:07:23,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.52 | bwd_microstep: 946.68 | bwd_inner_microstep: 946.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 19:07:25,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.12 | bwd_microstep: 1656.50 | bwd_inner_microstep: 1656.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681
[2024-06-10 19:07:28,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.05 | bwd_microstep: 1520.48 | bwd_inner_microstep: 1520.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 19:07:29,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.47 | bwd_microstep: 1305.51 | bwd_inner_microstep: 1305.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 19:07:31,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1397.64 | bwd_inner_microstep: 1397.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 19:07:33,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.43 | bwd_microstep: 1408.97 | bwd_inner_microstep: 1408.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3713
[2024-06-10 19:07:35,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.54 | bwd_microstep: 1490.29 | bwd_inner_microstep: 1490.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 19:07:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1414.70 | bwd_inner_microstep: 1414.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059
[2024-06-10 19:07:38,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.61 | bwd_microstep: 912.65 | bwd_inner_microstep: 912.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 19:07:40,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.09 | bwd_microstep: 1453.41 | bwd_inner_microstep: 1453.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 19:07:42,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.90 | bwd_microstep: 973.96 | bwd_inner_microstep: 973.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-10 19:07:44,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.06 | bwd_microstep: 1540.11 | bwd_inner_microstep: 1540.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 19:08:02,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 19:08:02,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 17137.64 | bwd_inner_microstep: 1551.76 | bwd_allreduce_microstep: 15585.81 | step_microstep: 38.34
[2024-06-10 19:08:02,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15728.26 | bwd: 57623.92 | bwd_inner: 42037.21 | bwd_allreduce: 15586.04 | step: 39.84
{'loss': 1.1831, 'learning_rate': 1.3407072987046283e-05, 'epoch': 0.62}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 19:08:03,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.63 | bwd_microstep: 1230.56 | bwd_inner_microstep: 1230.38 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 19:08:05,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1378.54 | bwd_inner_microstep: 1378.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 19:08:07,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.07 | bwd_microstep: 1496.86 | bwd_inner_microstep: 1496.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3782
[2024-06-10 19:08:09,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 1465.36 | bwd_inner_microstep: 1465.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3762
[2024-06-10 19:08:11,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.45 | bwd_microstep: 1494.96 | bwd_inner_microstep: 1494.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 19:08:13,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.09 | bwd_microstep: 786.65 | bwd_inner_microstep: 786.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 19:08:14,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.00 | bwd_microstep: 1246.61 | bwd_inner_microstep: 1246.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 19:08:15,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.43 | bwd_microstep: 678.27 | bwd_inner_microstep: 678.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3699
[2024-06-10 19:08:17,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.46 | bwd_microstep: 1620.14 | bwd_inner_microstep: 1620.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1956
[2024-06-10 19:08:19,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.72 | bwd_microstep: 839.77 | bwd_inner_microstep: 839.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 19:08:20,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1375.42 | bwd_inner_microstep: 1375.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 19:08:23,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1489.21 | bwd_inner_microstep: 1489.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 19:08:25,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.97 | bwd_microstep: 1489.38 | bwd_inner_microstep: 1489.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2919
[2024-06-10 19:08:26,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.49 | bwd_microstep: 1030.67 | bwd_inner_microstep: 1030.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 19:08:28,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.89 | bwd_microstep: 1391.70 | bwd_inner_microstep: 1391.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3524
[2024-06-10 19:08:30,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.71 | bwd_microstep: 1228.50 | bwd_inner_microstep: 1228.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469
[2024-06-10 19:08:32,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.60 | bwd_microstep: 1433.48 | bwd_inner_microstep: 1433.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-10 19:08:33,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.50 | bwd_microstep: 801.77 | bwd_inner_microstep: 801.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 19:08:35,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.62 | bwd_microstep: 1276.74 | bwd_inner_microstep: 1276.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 19:08:37,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.74 | bwd_microstep: 1427.86 | bwd_inner_microstep: 1427.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 19:08:38,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.39 | bwd_microstep: 1402.51 | bwd_inner_microstep: 1402.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 19:08:40,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.04 | bwd_microstep: 1387.07 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 19:08:42,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.72 | bwd_microstep: 1278.44 | bwd_inner_microstep: 1278.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 19:08:44,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.50 | bwd_microstep: 1290.01 | bwd_inner_microstep: 1289.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 19:08:46,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.25 | bwd_microstep: 1442.37 | bwd_inner_microstep: 1442.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 19:08:48,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.56 | bwd_microstep: 1607.89 | bwd_inner_microstep: 1607.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 19:08:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.72 | bwd_microstep: 1415.26 | bwd_inner_microstep: 1415.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 19:08:52,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1377.93 | bwd_inner_microstep: 1377.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3575
[2024-06-10 19:08:54,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.45 | bwd_microstep: 1663.66 | bwd_inner_microstep: 1663.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2957
[2024-06-10 19:08:56,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.94 | bwd_microstep: 1137.40 | bwd_inner_microstep: 1137.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3596
[2024-06-10 19:08:58,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.56 | bwd_microstep: 1702.74 | bwd_inner_microstep: 1702.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-10 19:09:02,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-10 19:09:02,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 3104.41 | bwd_inner_microstep: 1831.49 | bwd_allreduce_microstep: 1272.87 | step_microstep: 38.03
[2024-06-10 19:09:02,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15850.40 | bwd: 43992.16 | bwd_inner: 42718.25 | bwd_allreduce: 1273.17 | step: 39.65
{'loss': 1.1749, 'learning_rate': 1.3371648635627285e-05, 'epoch': 0.62}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465
[2024-06-10 19:09:04,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.34 | bwd_microstep: 1321.52 | bwd_inner_microstep: 1321.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 19:09:05,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1242.61 | bwd_inner_microstep: 1242.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 19:09:07,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.20 | bwd_microstep: 1377.81 | bwd_inner_microstep: 1377.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 19:09:09,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1492.21 | bwd_inner_microstep: 1492.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 19:09:12,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.50 | bwd_microstep: 1552.28 | bwd_inner_microstep: 1552.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 19:09:13,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.90 | bwd_microstep: 1384.23 | bwd_inner_microstep: 1384.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 19:09:15,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.03 | bwd_microstep: 1247.92 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3712
[2024-06-10 19:09:17,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.51 | bwd_microstep: 1460.46 | bwd_inner_microstep: 1460.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 19:09:19,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1377.01 | bwd_inner_microstep: 1376.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 19:09:21,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1407.95 | bwd_inner_microstep: 1407.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2499
[2024-06-10 19:09:22,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.89 | bwd_microstep: 957.32 | bwd_inner_microstep: 957.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-10 19:09:25,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1633.49 | bwd_inner_microstep: 1633.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719
[2024-06-10 19:09:27,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.04 | bwd_microstep: 1625.05 | bwd_inner_microstep: 1625.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3529
[2024-06-10 19:09:29,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.94 | bwd_microstep: 1354.31 | bwd_inner_microstep: 1354.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 19:09:30,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.82 | bwd_microstep: 699.73 | bwd_inner_microstep: 699.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 19:09:31,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1281.15 | bwd_inner_microstep: 1281.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534
[2024-06-10 19:09:33,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.32 | bwd_microstep: 1324.97 | bwd_inner_microstep: 1324.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469
[2024-06-10 19:09:35,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.11 | bwd_microstep: 1182.74 | bwd_inner_microstep: 1182.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654
[2024-06-10 19:09:37,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1425.24 | bwd_inner_microstep: 1425.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 19:09:39,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1295.59 | bwd_inner_microstep: 1295.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3467
[2024-06-10 19:09:41,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1364.04 | bwd_inner_microstep: 1364.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2007
[2024-06-10 19:09:42,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.99 | bwd_microstep: 837.13 | bwd_inner_microstep: 837.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-10 19:09:43,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.77 | bwd_microstep: 806.99 | bwd_inner_microstep: 806.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1976
[2024-06-10 19:09:44,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.86 | bwd_microstep: 735.49 | bwd_inner_microstep: 735.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 19:09:46,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.45 | bwd_microstep: 1310.08 | bwd_inner_microstep: 1310.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 19:09:48,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.74 | bwd_microstep: 1410.48 | bwd_inner_microstep: 1410.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 19:09:50,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.75 | bwd_microstep: 1547.91 | bwd_inner_microstep: 1547.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598
[2024-06-10 19:09:52,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.05 | bwd_microstep: 1600.30 | bwd_inner_microstep: 1600.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 19:09:54,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1390.90 | bwd_inner_microstep: 1390.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 19:09:56,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1394.59 | bwd_inner_microstep: 1394.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 19:09:58,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1503.32 | bwd_inner_microstep: 1503.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3808
[2024-06-10 19:10:01,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.05 | optimizer_step: 6.61
[2024-06-10 19:10:01,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.26 | bwd_microstep: 2430.25 | bwd_inner_microstep: 1899.62 | bwd_allreduce_microstep: 530.58 | step_microstep: 37.58
[2024-06-10 19:10:01,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15883.26 | bwd: 42975.10 | bwd_inner: 42443.62 | bwd_allreduce: 530.81 | step: 39.10
{'loss': 1.2407, 'learning_rate': 1.3336247629227339e-05, 'epoch': 0.62}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 19:10:03,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1499.92 | bwd_inner_microstep: 1499.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4603
[2024-06-10 19:10:06,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.11 | bwd_microstep: 1760.47 | bwd_inner_microstep: 1760.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4298
[2024-06-10 19:10:08,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.07 | bwd_microstep: 1679.24 | bwd_inner_microstep: 1679.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3766
[2024-06-10 19:10:10,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.19 | bwd_microstep: 1603.91 | bwd_inner_microstep: 1603.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 19:10:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1384.27 | bwd_inner_microstep: 1384.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2206
[2024-06-10 19:10:13,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.63 | bwd_microstep: 959.55 | bwd_inner_microstep: 959.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 19:10:15,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1554.99 | bwd_inner_microstep: 1554.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1878
[2024-06-10 19:10:16,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.58 | bwd_microstep: 712.98 | bwd_inner_microstep: 712.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3406
[2024-06-10 19:10:18,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | bwd_microstep: 1309.62 | bwd_inner_microstep: 1309.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3503
[2024-06-10 19:10:20,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.83 | bwd_microstep: 1328.33 | bwd_inner_microstep: 1328.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1915
[2024-06-10 19:10:21,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.47 | bwd_microstep: 879.09 | bwd_inner_microstep: 879.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3671
[2024-06-10 19:10:24,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.94 | bwd_microstep: 1664.26 | bwd_inner_microstep: 1664.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 19:10:25,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1340.62 | bwd_inner_microstep: 1340.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 19:10:27,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1251.74 | bwd_inner_microstep: 1251.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678
[2024-06-10 19:10:29,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.45 | bwd_microstep: 1614.10 | bwd_inner_microstep: 1614.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398
[2024-06-10 19:10:31,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.36 | bwd_microstep: 1439.70 | bwd_inner_microstep: 1439.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3873
[2024-06-10 19:10:34,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.26 | bwd_microstep: 1583.81 | bwd_inner_microstep: 1583.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 19:10:35,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.76 | bwd_microstep: 1375.79 | bwd_inner_microstep: 1375.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3668
[2024-06-10 19:10:38,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.77 | bwd_microstep: 1487.49 | bwd_inner_microstep: 1487.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2158
[2024-06-10 19:10:39,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.74 | bwd_microstep: 758.89 | bwd_inner_microstep: 758.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 19:10:41,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.81 | bwd_microstep: 1657.13 | bwd_inner_microstep: 1657.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 19:10:43,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.68 | bwd_microstep: 1427.75 | bwd_inner_microstep: 1427.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 19:10:45,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.57 | bwd_microstep: 1283.70 | bwd_inner_microstep: 1283.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 19:10:47,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.64 | bwd_microstep: 1556.51 | bwd_inner_microstep: 1556.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.28
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 19:10:49,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.87 | bwd_microstep: 1494.92 | bwd_inner_microstep: 1494.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4058
[2024-06-10 19:10:51,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.73 | bwd_microstep: 1621.99 | bwd_inner_microstep: 1621.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 19:10:53,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3547
[2024-06-10 19:10:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.21 | bwd_microstep: 1519.88 | bwd_inner_microstep: 1519.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3598
[2024-06-10 19:10:57,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.62 | bwd_microstep: 1701.49 | bwd_inner_microstep: 1701.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2970
[2024-06-10 19:10:59,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.59 | bwd_microstep: 1201.21 | bwd_inner_microstep: 1201.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3413
[2024-06-10 19:11:01,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.76 | bwd_microstep: 1308.26 | bwd_inner_microstep: 1308.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2264
[2024-06-10 19:11:03,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 19:11:03,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.29 | bwd_microstep: 2356.07 | bwd_inner_microstep: 1028.12 | bwd_allreduce_microstep: 1327.90 | step_microstep: 37.98
[2024-06-10 19:11:03,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16492.77 | bwd: 45603.93 | bwd_inner: 44275.13 | bwd_allreduce: 1328.13 | step: 41.78
{'loss': 1.1883, 'learning_rate': 1.3300870092528607e-05, 'epoch': 0.62}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 19:11:05,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 794.36 | bwd_inner_microstep: 794.22 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3964
[2024-06-10 19:11:07,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.43 | bwd_microstep: 1599.05 | bwd_inner_microstep: 1599.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3927
[2024-06-10 19:11:09,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1392.51 | bwd_inner_microstep: 1392.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 19:11:11,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1479.92 | bwd_inner_microstep: 1479.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 19:11:13,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1276.84 | bwd_inner_microstep: 1276.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1860
[2024-06-10 19:11:13,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.74 | bwd_microstep: 677.74 | bwd_inner_microstep: 677.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3433
[2024-06-10 19:11:15,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.45 | bwd_microstep: 1312.26 | bwd_inner_microstep: 1312.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 19:11:17,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1251.16 | bwd_inner_microstep: 1251.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 19:11:19,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1345.71 | bwd_inner_microstep: 1345.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 19:11:21,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.16 | bwd_microstep: 1391.67 | bwd_inner_microstep: 1391.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 19:11:23,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.89 | bwd_microstep: 1286.90 | bwd_inner_microstep: 1286.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2404
[2024-06-10 19:11:24,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.97 | bwd_microstep: 871.68 | bwd_inner_microstep: 871.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 19:11:26,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1320.78 | bwd_inner_microstep: 1320.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1985
[2024-06-10 19:11:27,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.71 | bwd_microstep: 784.36 | bwd_inner_microstep: 784.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 19:11:29,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.48 | bwd_microstep: 1353.31 | bwd_inner_microstep: 1353.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 19:11:31,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1490.90 | bwd_inner_microstep: 1490.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3665
[2024-06-10 19:11:33,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.23 | bwd_microstep: 1373.38 | bwd_inner_microstep: 1373.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3652
[2024-06-10 19:11:35,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.91 | bwd_microstep: 1445.66 | bwd_inner_microstep: 1445.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499
[2024-06-10 19:11:37,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.73 | bwd_microstep: 1437.13 | bwd_inner_microstep: 1437.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 19:11:38,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1295.52 | bwd_inner_microstep: 1295.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829
[2024-06-10 19:11:40,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.17 | bwd_microstep: 1356.82 | bwd_inner_microstep: 1356.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3608
[2024-06-10 19:11:42,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.15 | bwd_microstep: 1215.27 | bwd_inner_microstep: 1215.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994
[2024-06-10 19:11:43,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.29 | bwd_microstep: 835.74 | bwd_inner_microstep: 835.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1908
[2024-06-10 19:11:44,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.32 | bwd_microstep: 733.63 | bwd_inner_microstep: 733.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2449
[2024-06-10 19:11:46,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.37 | bwd_microstep: 1110.49 | bwd_inner_microstep: 1110.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2060
[2024-06-10 19:11:47,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.72 | bwd_microstep: 1009.90 | bwd_inner_microstep: 1009.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 19:11:48,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.50 | bwd_microstep: 792.88 | bwd_inner_microstep: 792.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3777
[2024-06-10 19:11:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.41 | bwd_microstep: 1815.67 | bwd_inner_microstep: 1815.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3766
[2024-06-10 19:11:53,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.18 | bwd_microstep: 1740.19 | bwd_inner_microstep: 1740.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 19:11:55,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.98 | bwd_microstep: 1453.42 | bwd_inner_microstep: 1453.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 19:11:57,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.61 | bwd_microstep: 1645.56 | bwd_inner_microstep: 1645.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 19:12:04,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 19:12:04,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 6036.77 | bwd_inner_microstep: 1526.23 | bwd_allreduce_microstep: 4510.49 | step_microstep: 38.45
[2024-06-10 19:12:04,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15078.38 | bwd: 44927.21 | bwd_inner: 40415.72 | bwd_allreduce: 4510.76 | step: 39.96
{'loss': 1.2107, 'learning_rate': 1.3265516150130577e-05, 'epoch': 0.62}
 62%|██████▏   | 1067/1726 [18:28:50<20:59:39, 114.69s/it]
 62%|██████▏   | 1068/1726 [18:30:38<20:35:35, 112.67s/it]
 62%|██████▏   | 1069/1726 [18:31:39<17:41:17, 96.92s/it]
 62%|██████▏   | 1070/1726 [18:32:38<15:35:55, 85.60s/it]
 62%|██████▏   | 1071/1726 [18:33:40<14:18:39, 78.66s/it]
 62%|██████▏   | 1072/1726 [18:34:41<13:17:27, 73.16s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 19:12:05,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.87 | bwd_microstep: 788.81 | bwd_inner_microstep: 788.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-10 19:12:06,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.77 | bwd_microstep: 676.98 | bwd_inner_microstep: 676.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3848
[2024-06-10 19:12:08,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.27 | bwd_microstep: 1461.39 | bwd_inner_microstep: 1461.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 19:12:10,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.61 | bwd_microstep: 1450.36 | bwd_inner_microstep: 1450.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-10 19:12:12,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.73 | bwd_microstep: 1535.80 | bwd_inner_microstep: 1535.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 19:12:14,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1247.01 | bwd_inner_microstep: 1246.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 19:12:16,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1344.24 | bwd_inner_microstep: 1344.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-10 19:12:17,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.40 | bwd_microstep: 701.01 | bwd_inner_microstep: 700.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 19:12:18,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 1344.08 | bwd_inner_microstep: 1344.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2888
[2024-06-10 19:12:20,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.92 | bwd_microstep: 1090.02 | bwd_inner_microstep: 1089.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 19:12:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.04 | bwd_microstep: 1524.08 | bwd_inner_microstep: 1524.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512
[2024-06-10 19:12:24,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.63 | bwd_microstep: 1436.80 | bwd_inner_microstep: 1436.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-10 19:12:26,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.13 | bwd_microstep: 1584.46 | bwd_inner_microstep: 1584.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 19:12:28,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.65 | bwd_microstep: 1254.26 | bwd_inner_microstep: 1254.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-10 19:12:30,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1425.05 | bwd_inner_microstep: 1425.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3512
[2024-06-10 19:12:32,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.58 | bwd_microstep: 1452.68 | bwd_inner_microstep: 1452.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3825
[2024-06-10 19:12:34,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.14 | bwd_microstep: 1684.33 | bwd_inner_microstep: 1684.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 19:12:36,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.03 | bwd_microstep: 1167.01 | bwd_inner_microstep: 1166.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2117
[2024-06-10 19:12:37,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.29 | bwd_microstep: 955.76 | bwd_inner_microstep: 955.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3519
[2024-06-10 19:12:39,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1448.26 | bwd_inner_microstep: 1448.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3616
[2024-06-10 19:12:41,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.86 | bwd_microstep: 1582.14 | bwd_inner_microstep: 1582.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3837
[2024-06-10 19:12:43,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1320.89 | bwd_inner_microstep: 1320.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 19:12:45,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.83 | bwd_microstep: 978.69 | bwd_inner_microstep: 978.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 19:12:46,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.39 | bwd_microstep: 975.07 | bwd_inner_microstep: 975.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3695
[2024-06-10 19:12:48,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.18 | bwd_microstep: 1329.27 | bwd_inner_microstep: 1329.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-10 19:12:50,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.69 | bwd_microstep: 1440.35 | bwd_inner_microstep: 1440.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 19:12:52,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1386.25 | bwd_inner_microstep: 1386.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2510
[2024-06-10 19:12:53,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.04 | bwd_microstep: 932.87 | bwd_inner_microstep: 932.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 19:12:55,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.64 | bwd_microstep: 1449.76 | bwd_inner_microstep: 1449.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 19:12:57,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1296.05 | bwd_inner_microstep: 1296.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-10 19:12:59,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.96 | bwd_microstep: 1438.29 | bwd_inner_microstep: 1438.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3561
[2024-06-10 19:13:07,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 19:13:07,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.65 | bwd_microstep: 8071.91 | bwd_inner_microstep: 1741.99 | bwd_allreduce_microstep: 6329.88 | step_microstep: 37.85
[2024-06-10 19:13:07,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15452.31 | bwd: 47773.94 | bwd_inner: 41443.13 | bwd_allreduce: 6330.11 | step: 39.31
{'loss': 1.2141, 'learning_rate': 1.3230185926549654e-05, 'epoch': 0.62}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 19:13:09,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.31 | bwd_microstep: 1276.35 | bwd_inner_microstep: 1276.23 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 19:13:11,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.02 | bwd_microstep: 1236.80 | bwd_inner_microstep: 1236.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 19:13:13,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1379.96 | bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 19:13:15,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.61 | bwd_microstep: 1342.75 | bwd_inner_microstep: 1342.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 19:13:16,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.18 | bwd_microstep: 1243.62 | bwd_inner_microstep: 1243.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 19:13:18,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.49 | bwd_microstep: 1376.62 | bwd_inner_microstep: 1376.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 19:13:20,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.16 | bwd_microstep: 1280.11 | bwd_inner_microstep: 1280.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 19:13:22,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.98 | bwd_microstep: 1284.19 | bwd_inner_microstep: 1284.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405
[2024-06-10 19:13:24,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1309.36 | bwd_inner_microstep: 1309.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 19:13:26,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.61 | bwd_microstep: 1552.64 | bwd_inner_microstep: 1552.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2030
[2024-06-10 19:13:27,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.89 | bwd_microstep: 851.93 | bwd_inner_microstep: 851.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937
[2024-06-10 19:13:28,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.33 | bwd_microstep: 757.62 | bwd_inner_microstep: 757.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1967
[2024-06-10 19:13:29,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.11 | bwd_microstep: 856.48 | bwd_inner_microstep: 856.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 19:13:31,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1343.63 | bwd_inner_microstep: 1343.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3657
[2024-06-10 19:13:33,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1556.41 | bwd_inner_microstep: 1556.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2718
[2024-06-10 19:13:34,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.90 | bwd_microstep: 939.70 | bwd_inner_microstep: 939.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 19:13:36,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.07 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463
[2024-06-10 19:13:38,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.31 | bwd_microstep: 1311.43 | bwd_inner_microstep: 1311.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 19:13:40,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1490.91 | bwd_inner_microstep: 1490.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2218
[2024-06-10 19:13:41,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.63 | bwd_microstep: 863.51 | bwd_inner_microstep: 863.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 19:13:43,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1348.22 | bwd_inner_microstep: 1348.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 19:13:45,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.12 | bwd_microstep: 1660.27 | bwd_inner_microstep: 1660.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 19:13:48,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.16 | bwd_microstep: 1499.34 | bwd_inner_microstep: 1499.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2272
[2024-06-10 19:13:49,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.05 | bwd_microstep: 1003.40 | bwd_inner_microstep: 1003.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3554
[2024-06-10 19:13:51,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.65 | bwd_microstep: 1202.56 | bwd_inner_microstep: 1202.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 19:13:53,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1406.13 | bwd_inner_microstep: 1406.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3834
[2024-06-10 19:13:54,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.76 | bwd_microstep: 1358.87 | bwd_inner_microstep: 1358.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281
[2024-06-10 19:13:56,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.27 | bwd_microstep: 909.28 | bwd_inner_microstep: 909.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3664
[2024-06-10 19:13:58,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.75 | bwd_microstep: 1476.63 | bwd_inner_microstep: 1476.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 19:13:59,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1284.66 | bwd_inner_microstep: 1284.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3730
[2024-06-10 19:14:02,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.40 | bwd_microstep: 1463.43 | bwd_inner_microstep: 1463.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 19:14:07,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 19:14:07,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.22 | bwd_microstep: 4379.49 | bwd_inner_microstep: 1745.73 | bwd_allreduce_microstep: 2633.70 | step_microstep: 38.04
[2024-06-10 19:14:07,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15291.32 | bwd: 43524.33 | bwd_inner: 40889.63 | bwd_allreduce: 2633.97 | step: 39.52
{'loss': 1.2215, 'learning_rate': 1.3194879546218709e-05, 'epoch': 0.62}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 19:14:08,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.95 | bwd_microstep: 1336.28 | bwd_inner_microstep: 1336.16 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 19:14:10,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1477.95 | bwd_inner_microstep: 1477.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 19:14:12,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1342.93 | bwd_inner_microstep: 1342.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397
[2024-06-10 19:14:14,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.17 | bwd_microstep: 1146.53 | bwd_inner_microstep: 1146.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 19:14:15,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.33 | bwd_microstep: 792.81 | bwd_inner_microstep: 792.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 19:14:17,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1388.11 | bwd_inner_microstep: 1388.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 19:14:19,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.14 | bwd_microstep: 1385.41 | bwd_inner_microstep: 1385.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 19:14:21,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1379.61 | bwd_inner_microstep: 1379.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 19:14:22,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.70 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479
[2024-06-10 19:14:24,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.99 | bwd_microstep: 1185.80 | bwd_inner_microstep: 1185.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 19:14:26,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.70 | bwd_microstep: 1387.22 | bwd_inner_microstep: 1387.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3710
[2024-06-10 19:14:28,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.29 | bwd_microstep: 1471.85 | bwd_inner_microstep: 1471.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 19:14:30,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.15 | bwd_microstep: 1338.43 | bwd_inner_microstep: 1338.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 19:14:32,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.79 | bwd_microstep: 1486.63 | bwd_inner_microstep: 1486.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1902
[2024-06-10 19:14:33,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.35 | bwd_microstep: 807.46 | bwd_inner_microstep: 807.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3513
[2024-06-10 19:14:35,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1510.89 | bwd_inner_microstep: 1510.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3550
[2024-06-10 19:14:37,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.67 | bwd_microstep: 1440.99 | bwd_inner_microstep: 1440.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3833
[2024-06-10 19:14:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.53 | bwd_microstep: 1689.37 | bwd_inner_microstep: 1689.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 19:14:42,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1491.09 | bwd_inner_microstep: 1491.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 19:14:43,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1390.82 | bwd_inner_microstep: 1390.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815
[2024-06-10 19:14:46,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.87 | bwd_microstep: 1511.14 | bwd_inner_microstep: 1511.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3520
[2024-06-10 19:14:47,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.78 | bwd_microstep: 1192.23 | bwd_inner_microstep: 1192.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 19:14:49,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1351.49 | bwd_inner_microstep: 1351.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 19:14:51,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1381.13 | bwd_inner_microstep: 1381.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 19:14:53,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.42 | bwd_microstep: 1295.52 | bwd_inner_microstep: 1295.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463
[2024-06-10 19:14:55,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.21 | bwd_microstep: 1435.22 | bwd_inner_microstep: 1435.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-10 19:14:56,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.17 | bwd_microstep: 969.91 | bwd_inner_microstep: 969.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3603
[2024-06-10 19:14:58,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.46 | bwd_microstep: 1703.86 | bwd_inner_microstep: 1703.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-10 19:15:00,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1434.89 | bwd_inner_microstep: 1434.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4111
[2024-06-10 19:15:03,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.16 | bwd_microstep: 1745.75 | bwd_inner_microstep: 1745.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 19:15:05,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.78 | bwd_microstep: 1395.92 | bwd_inner_microstep: 1395.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 19:15:07,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.14 | optimizer_step: 6.57
[2024-06-10 19:15:07,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.86 | bwd_microstep: 2012.51 | bwd_inner_microstep: 1525.41 | bwd_allreduce_microstep: 487.05 | step_microstep: 37.52
[2024-06-10 19:15:07,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16334.67 | bwd: 44164.75 | bwd_inner: 43676.70 | bwd_allreduce: 487.35 | step: 39.10
{'loss': 1.1837, 'learning_rate': 1.3159597133486628e-05, 'epoch': 0.62}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 19:15:09,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.84 | bwd_microstep: 1474.85 | bwd_inner_microstep: 1474.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 19:15:11,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.26 | bwd_microstep: 1178.14 | bwd_inner_microstep: 1178.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 19:15:13,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.01 | bwd_microstep: 1552.94 | bwd_inner_microstep: 1552.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 19:15:15,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1244.60 | bwd_inner_microstep: 1244.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 19:15:17,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1382.41 | bwd_inner_microstep: 1382.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 19:15:19,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1245.00 | bwd_inner_microstep: 1244.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876
[2024-06-10 19:15:19,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.75 | bwd_microstep: 679.24 | bwd_inner_microstep: 679.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 19:15:21,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.45 | bwd_microstep: 1281.70 | bwd_inner_microstep: 1281.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706
[2024-06-10 19:15:23,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.36 | bwd_microstep: 1618.02 | bwd_inner_microstep: 1618.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-10 19:15:25,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.86 | bwd_microstep: 1450.04 | bwd_inner_microstep: 1450.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 19:15:27,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.20 | bwd_microstep: 1351.78 | bwd_inner_microstep: 1351.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 19:15:29,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.32 | bwd_microstep: 1535.21 | bwd_inner_microstep: 1535.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3662
[2024-06-10 19:15:31,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.05 | bwd_microstep: 1354.48 | bwd_inner_microstep: 1354.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 19:15:34,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.48 | bwd_microstep: 1613.56 | bwd_inner_microstep: 1613.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-10 19:15:36,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1575.50 | bwd_inner_microstep: 1575.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 19:15:38,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.51 | bwd_microstep: 1498.30 | bwd_inner_microstep: 1498.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512
[2024-06-10 19:15:40,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.98 | bwd_microstep: 1437.90 | bwd_inner_microstep: 1437.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 19:15:42,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.35 | bwd_microstep: 1393.91 | bwd_inner_microstep: 1393.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 19:15:44,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.83 | bwd_microstep: 1487.99 | bwd_inner_microstep: 1487.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-10 19:15:46,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.71 | bwd_microstep: 1308.81 | bwd_inner_microstep: 1308.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 19:15:48,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.42 | bwd_microstep: 1553.94 | bwd_inner_microstep: 1553.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-10 19:15:50,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.00 | bwd_microstep: 1442.63 | bwd_inner_microstep: 1442.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 19:15:52,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.71 | bwd_microstep: 1501.72 | bwd_inner_microstep: 1501.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 19:15:54,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.76 | bwd_microstep: 1611.58 | bwd_inner_microstep: 1611.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563
[2024-06-10 19:15:56,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.14 | bwd_microstep: 1428.94 | bwd_inner_microstep: 1428.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 19:15:58,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.06 | bwd_microstep: 1662.80 | bwd_inner_microstep: 1662.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3516
[2024-06-10 19:16:00,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.48 | bwd_microstep: 1192.55 | bwd_inner_microstep: 1192.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 19:16:02,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1609.51 | bwd_inner_microstep: 1609.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 19:16:04,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.55 | bwd_microstep: 1442.31 | bwd_inner_microstep: 1442.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 19:16:06,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.87 | bwd_microstep: 1651.70 | bwd_inner_microstep: 1651.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 19:16:08,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1396.39 | bwd_inner_microstep: 1396.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 19:16:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 19:16:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1415.77 | bwd_inner_microstep: 1408.13 | bwd_allreduce_microstep: 7.59 | step_microstep: 37.61
[2024-06-10 19:16:10,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16977.60 | bwd: 45574.23 | bwd_inner: 45565.75 | bwd_allreduce: 7.82 | step: 39.04
{'loss': 1.1564, 'learning_rate': 1.3124338812617881e-05, 'epoch': 0.62}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012
[2024-06-10 19:16:11,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.16 | bwd_microstep: 799.96 | bwd_inner_microstep: 799.90 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 19:16:13,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.80 | bwd_microstep: 1385.43 | bwd_inner_microstep: 1385.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3850
[2024-06-10 19:16:15,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.61 | bwd_microstep: 1563.25 | bwd_inner_microstep: 1563.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 19:16:18,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.71 | bwd_microstep: 1654.25 | bwd_inner_microstep: 1654.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751
[2024-06-10 19:16:20,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.34 | bwd_microstep: 1441.39 | bwd_inner_microstep: 1441.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 19:16:22,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.18 | bwd_microstep: 1352.69 | bwd_inner_microstep: 1352.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 19:16:24,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1433.86 | bwd_inner_microstep: 1433.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 19:16:25,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.42 | bwd_microstep: 1381.44 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1873
[2024-06-10 19:16:26,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.13 | bwd_microstep: 678.48 | bwd_inner_microstep: 678.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 19:16:28,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.46 | bwd_microstep: 1286.68 | bwd_inner_microstep: 1286.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 19:16:30,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.56 | bwd_microstep: 1283.81 | bwd_inner_microstep: 1283.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2086
[2024-06-10 19:16:31,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.61 | bwd_microstep: 881.21 | bwd_inner_microstep: 881.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 19:16:33,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1345.46 | bwd_inner_microstep: 1345.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 19:16:35,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1451.81 | bwd_inner_microstep: 1451.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942
[2024-06-10 19:16:36,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.00 | bwd_microstep: 888.95 | bwd_inner_microstep: 888.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3568
[2024-06-10 19:16:38,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1362.86 | bwd_inner_microstep: 1362.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635
[2024-06-10 19:16:40,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1408.54 | bwd_inner_microstep: 1408.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 19:16:42,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.29 | bwd_microstep: 1394.37 | bwd_inner_microstep: 1394.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 19:16:44,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.61 | bwd_microstep: 1406.61 | bwd_inner_microstep: 1406.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 19:16:46,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1357.35 | bwd_inner_microstep: 1357.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-10 19:16:48,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.16 | bwd_microstep: 1632.51 | bwd_inner_microstep: 1632.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-10 19:16:50,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.66 | bwd_microstep: 1155.69 | bwd_inner_microstep: 1155.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3695
[2024-06-10 19:16:52,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.82 | bwd_microstep: 1724.49 | bwd_inner_microstep: 1724.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 19:16:54,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.00 | bwd_microstep: 1661.66 | bwd_inner_microstep: 1661.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 19:16:56,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.46 | bwd_microstep: 1302.44 | bwd_inner_microstep: 1302.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1928
[2024-06-10 19:16:57,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.64 | bwd_microstep: 761.12 | bwd_inner_microstep: 761.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 19:16:59,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.95 | bwd_microstep: 1306.10 | bwd_inner_microstep: 1306.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3700
[2024-06-10 19:17:01,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1391.37 | bwd_inner_microstep: 1391.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2672
[2024-06-10 19:17:03,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.06 | bwd_microstep: 1152.99 | bwd_inner_microstep: 1152.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 19:17:05,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.86 | bwd_microstep: 1601.09 | bwd_inner_microstep: 1601.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 19:17:07,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1376.05 | bwd_inner_microstep: 1376.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 19:17:12,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.34 | optimizer_step: 6.60
[2024-06-10 19:17:12,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.77 | bwd_microstep: 5064.44 | bwd_inner_microstep: 1864.27 | bwd_allreduce_microstep: 3200.10 | step_microstep: 38.69
[2024-06-10 19:17:12,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15865.08 | bwd: 45888.37 | bwd_inner: 42687.29 | bwd_allreduce: 3200.36 | step: 40.15
{'loss': 1.2133, 'learning_rate': 1.308910470779209e-05, 'epoch': 0.62}
 62%|██████▏   | 1072/1726 [18:34:41<13:17:27, 73.16s/it]
 62%|██████▏   | 1073/1726 [18:35:44<12:44:51, 70.28s/it]
 62%|██████▏   | 1074/1726 [18:36:43<12:07:22, 66.94s/it]
 62%|██████▏   | 1075/1726 [18:37:44<11:46:22, 65.10s/it]
 62%|██████▏   | 1076/1726 [18:38:47<11:38:05, 64.44s/it]
 62%|██████▏   | 1077/1726 [18:39:49<11:29:23, 63.73s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 19:17:14,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 1334.46 | bwd_inner_microstep: 1334.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915
[2024-06-10 19:17:16,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.24 | bwd_microstep: 1584.79 | bwd_inner_microstep: 1584.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3863
[2024-06-10 19:17:18,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.27 | bwd_microstep: 1520.83 | bwd_inner_microstep: 1520.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 19:17:20,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1375.55 | bwd_inner_microstep: 1375.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-10 19:17:22,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1537.11 | bwd_inner_microstep: 1537.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 19:17:24,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1252.54 | bwd_inner_microstep: 1252.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 19:17:26,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.04 | bwd_microstep: 1383.41 | bwd_inner_microstep: 1383.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 19:17:28,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1295.36 | bwd_inner_microstep: 1295.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 19:17:30,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.07 | bwd_microstep: 1149.82 | bwd_inner_microstep: 1149.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 19:17:31,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 1387.28 | bwd_inner_microstep: 1387.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689
[2024-06-10 19:17:34,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.14 | bwd_microstep: 1524.12 | bwd_inner_microstep: 1524.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 19:17:35,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.33 | bwd_microstep: 1278.96 | bwd_inner_microstep: 1278.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3642
[2024-06-10 19:17:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1436.32 | bwd_inner_microstep: 1436.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-10 19:17:39,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1416.10 | bwd_inner_microstep: 1416.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1190
[2024-06-10 19:17:40,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 177.39 | bwd_microstep: 458.63 | bwd_inner_microstep: 458.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-10 19:17:42,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.63 | bwd_microstep: 1520.82 | bwd_inner_microstep: 1520.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 19:17:44,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.72 | bwd_microstep: 1485.65 | bwd_inner_microstep: 1485.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1962
[2024-06-10 19:17:45,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.93 | bwd_microstep: 889.47 | bwd_inner_microstep: 889.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3453
[2024-06-10 19:17:47,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.20 | bwd_microstep: 1320.32 | bwd_inner_microstep: 1320.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 19:17:49,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.55 | bwd_microstep: 1261.35 | bwd_inner_microstep: 1261.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831
[2024-06-10 19:17:51,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.89 | bwd_microstep: 1389.68 | bwd_inner_microstep: 1389.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533
[2024-06-10 19:17:53,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1325.91 | bwd_inner_microstep: 1325.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-10 19:17:55,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.97 | bwd_microstep: 1485.38 | bwd_inner_microstep: 1485.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3595
[2024-06-10 19:17:57,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.86 | bwd_microstep: 1702.06 | bwd_inner_microstep: 1702.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-10 19:17:59,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1479.95 | bwd_inner_microstep: 1479.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-10 19:18:01,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.37 | bwd_microstep: 1300.92 | bwd_inner_microstep: 1300.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3582
[2024-06-10 19:18:03,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1365.50 | bwd_inner_microstep: 1365.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 19:18:05,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.70 | bwd_microstep: 1509.84 | bwd_inner_microstep: 1509.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3649
[2024-06-10 19:18:07,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.03 | bwd_microstep: 1583.04 | bwd_inner_microstep: 1583.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580
[2024-06-10 19:18:09,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.87 | bwd_microstep: 1461.11 | bwd_inner_microstep: 1461.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 19:18:11,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.74 | bwd_microstep: 1636.89 | bwd_inner_microstep: 1636.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 19:18:14,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.15 | optimizer_step: 6.63
[2024-06-10 19:18:14,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.10 | bwd_microstep: 2434.98 | bwd_inner_microstep: 1692.27 | bwd_allreduce_microstep: 742.64 | step_microstep: 37.76
[2024-06-10 19:18:14,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16536.36 | bwd: 45088.17 | bwd_inner: 44344.62 | bwd_allreduce: 742.86 | step: 39.25
{'loss': 1.2284, 'learning_rate': 1.3053894943103598e-05, 'epoch': 0.62}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 19:18:16,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1384.33 | bwd_inner_microstep: 1384.26 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908
[2024-06-10 19:18:18,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.76 | bwd_microstep: 1588.89 | bwd_inner_microstep: 1588.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 19:18:20,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.40 | bwd_microstep: 1339.50 | bwd_inner_microstep: 1339.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 19:18:22,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.26 | bwd_microstep: 1443.29 | bwd_inner_microstep: 1443.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-10 19:18:24,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1528.65 | bwd_inner_microstep: 1528.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 19:18:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1349.98 | bwd_inner_microstep: 1349.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760
[2024-06-10 19:18:28,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1536.49 | bwd_inner_microstep: 1536.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 19:18:30,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.89 | bwd_microstep: 1483.69 | bwd_inner_microstep: 1483.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 19:18:32,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1406.85 | bwd_inner_microstep: 1406.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 19:18:34,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.85 | bwd_microstep: 1474.78 | bwd_inner_microstep: 1474.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 19:18:36,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.26 | bwd_microstep: 1342.02 | bwd_inner_microstep: 1341.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 19:18:38,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1526.84 | bwd_inner_microstep: 1526.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 19:18:40,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1246.35 | bwd_inner_microstep: 1246.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3536
[2024-06-10 19:18:42,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.05 | bwd_microstep: 1520.73 | bwd_inner_microstep: 1520.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2092
[2024-06-10 19:18:43,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.08 | bwd_microstep: 758.10 | bwd_inner_microstep: 758.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 19:18:45,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1397.31 | bwd_inner_microstep: 1397.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 19:18:47,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.92 | bwd_microstep: 1458.98 | bwd_inner_microstep: 1458.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 19:18:49,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1297.64 | bwd_inner_microstep: 1297.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3924
[2024-06-10 19:18:51,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.90 | bwd_microstep: 1499.81 | bwd_inner_microstep: 1499.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 19:18:52,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.11 | bwd_microstep: 798.23 | bwd_inner_microstep: 798.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-10 19:18:53,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.60 | bwd_microstep: 974.61 | bwd_inner_microstep: 974.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 19:18:55,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1395.63 | bwd_inner_microstep: 1395.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896
[2024-06-10 19:18:58,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.35 | bwd_microstep: 1585.82 | bwd_inner_microstep: 1585.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-10 19:18:59,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.85 | bwd_microstep: 1307.00 | bwd_inner_microstep: 1306.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3437
[2024-06-10 19:19:01,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.55 | bwd_microstep: 1380.97 | bwd_inner_microstep: 1380.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 19:19:03,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1313.01 | bwd_inner_microstep: 1312.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 19:19:05,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.10 | bwd_microstep: 1516.33 | bwd_inner_microstep: 1516.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3571
[2024-06-10 19:19:07,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1358.79 | bwd_inner_microstep: 1358.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572
[2024-06-10 19:19:09,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.37 | bwd_microstep: 1593.06 | bwd_inner_microstep: 1593.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598
[2024-06-10 19:19:11,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.44 | bwd_microstep: 1597.47 | bwd_inner_microstep: 1597.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 19:19:14,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.23 | bwd_microstep: 1497.55 | bwd_inner_microstep: 1497.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 19:19:17,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-10 19:19:17,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.59 | bwd_microstep: 2659.86 | bwd_inner_microstep: 1935.59 | bwd_allreduce_microstep: 724.22 | step_microstep: 37.60
[2024-06-10 19:19:17,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16630.32 | bwd: 45562.59 | bwd_inner: 44837.41 | bwd_allreduce: 724.47 | step: 39.07
{'loss': 1.1892, 'learning_rate': 1.3018709642561e-05, 'epoch': 0.63}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 19:19:19,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1396.29 | bwd_inner_microstep: 1396.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 19:19:21,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.27 | bwd_microstep: 1401.79 | bwd_inner_microstep: 1401.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 19:19:23,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.98 | bwd_microstep: 1477.99 | bwd_inner_microstep: 1477.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924
[2024-06-10 19:19:24,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.03 | bwd_microstep: 695.57 | bwd_inner_microstep: 695.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2379
[2024-06-10 19:19:25,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.45 | bwd_microstep: 961.99 | bwd_inner_microstep: 961.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794
[2024-06-10 19:19:27,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.19 | bwd_microstep: 1648.65 | bwd_inner_microstep: 1648.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 19:19:29,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3774
[2024-06-10 19:19:31,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1345.51 | bwd_inner_microstep: 1345.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 19:19:33,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.02 | bwd_microstep: 1285.35 | bwd_inner_microstep: 1285.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101
[2024-06-10 19:19:34,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.11 | bwd_microstep: 824.07 | bwd_inner_microstep: 824.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 19:19:36,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1253.22 | bwd_inner_microstep: 1253.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498
[2024-06-10 19:19:38,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.77 | bwd_microstep: 1580.30 | bwd_inner_microstep: 1580.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481
[2024-06-10 19:19:40,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.99 | bwd_microstep: 1571.32 | bwd_inner_microstep: 1571.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3508
[2024-06-10 19:19:42,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.54 | bwd_microstep: 1347.65 | bwd_inner_microstep: 1347.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 19:19:43,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1252.40 | bwd_inner_microstep: 1252.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 19:19:45,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1398.15 | bwd_inner_microstep: 1398.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650
[2024-06-10 19:19:47,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.00 | bwd_microstep: 1416.95 | bwd_inner_microstep: 1416.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-10 19:19:49,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1417.44 | bwd_inner_microstep: 1417.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 19:19:51,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.91 | bwd_microstep: 1346.65 | bwd_inner_microstep: 1346.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 19:19:53,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1433.46 | bwd_inner_microstep: 1433.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 19:19:54,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 797.48 | bwd_inner_microstep: 797.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178
[2024-06-10 19:19:55,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.27 | bwd_microstep: 859.90 | bwd_inner_microstep: 859.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-10 19:19:56,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.67 | bwd_microstep: 729.43 | bwd_inner_microstep: 729.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3832
[2024-06-10 19:19:58,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1360.08 | bwd_inner_microstep: 1360.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-10 19:20:00,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.73 | bwd_microstep: 1498.97 | bwd_inner_microstep: 1498.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276
[2024-06-10 19:20:02,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.00 | bwd_microstep: 1006.35 | bwd_inner_microstep: 1006.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049
[2024-06-10 19:20:03,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.86 | bwd_microstep: 812.72 | bwd_inner_microstep: 812.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2235
[2024-06-10 19:20:04,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.16 | bwd_microstep: 837.17 | bwd_inner_microstep: 837.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 19:20:06,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1351.56 | bwd_inner_microstep: 1351.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2112
[2024-06-10 19:20:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.64 | bwd_microstep: 1018.91 | bwd_inner_microstep: 1018.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-10 19:20:09,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.71 | bwd_microstep: 1454.45 | bwd_inner_microstep: 1454.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3797
[2024-06-10 19:20:18,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 19:20:18,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.05 | bwd_microstep: 8473.61 | bwd_inner_microstep: 1827.22 | bwd_allreduce_microstep: 6646.33 | step_microstep: 38.47
[2024-06-10 19:20:18,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14863.13 | bwd: 46501.62 | bwd_inner: 39854.38 | bwd_allreduce: 6646.57 | step: 39.95
{'loss': 1.2096, 'learning_rate': 1.2983548930086757e-05, 'epoch': 0.63}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-10 19:20:20,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 1447.53 | bwd_inner_microstep: 1447.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938
[2024-06-10 19:20:21,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.06 | bwd_microstep: 696.75 | bwd_inner_microstep: 696.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423
[2024-06-10 19:20:23,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.30 | bwd_microstep: 1150.73 | bwd_inner_microstep: 1150.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4133
[2024-06-10 19:20:25,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.27 | bwd_microstep: 1535.31 | bwd_inner_microstep: 1535.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1869
[2024-06-10 19:20:26,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.51 | bwd_microstep: 741.62 | bwd_inner_microstep: 741.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-10 19:20:28,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.51 | bwd_microstep: 1644.81 | bwd_inner_microstep: 1644.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 19:20:30,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.20 | bwd_microstep: 1284.67 | bwd_inner_microstep: 1284.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 19:20:32,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1378.55 | bwd_inner_microstep: 1378.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3422
[2024-06-10 19:20:34,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.37 | bwd_microstep: 1280.09 | bwd_inner_microstep: 1280.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3756
[2024-06-10 19:20:36,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.73 | bwd_microstep: 1628.78 | bwd_inner_microstep: 1628.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 19:20:38,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.27 | bwd_microstep: 1343.28 | bwd_inner_microstep: 1343.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1888
[2024-06-10 19:20:39,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.77 | bwd_microstep: 835.80 | bwd_inner_microstep: 835.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 19:20:41,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1476.56 | bwd_inner_microstep: 1476.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 19:20:43,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.34 | bwd_microstep: 1336.11 | bwd_inner_microstep: 1336.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3663
[2024-06-10 19:20:45,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.20 | bwd_microstep: 1547.74 | bwd_inner_microstep: 1547.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 19:20:47,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.62 | bwd_microstep: 1387.50 | bwd_inner_microstep: 1387.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564
[2024-06-10 19:20:49,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.67 | bwd_microstep: 1596.63 | bwd_inner_microstep: 1596.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-10 19:20:52,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.22 | bwd_microstep: 1602.73 | bwd_inner_microstep: 1602.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3523
[2024-06-10 19:20:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.73 | bwd_microstep: 1453.31 | bwd_inner_microstep: 1453.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3681
[2024-06-10 19:20:55,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.44 | bwd_microstep: 1286.18 | bwd_inner_microstep: 1286.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3545
[2024-06-10 19:20:57,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.05 | bwd_microstep: 1545.60 | bwd_inner_microstep: 1545.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486
[2024-06-10 19:20:59,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.70 | bwd_microstep: 1345.19 | bwd_inner_microstep: 1345.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2944
[2024-06-10 19:21:01,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.20 | bwd_microstep: 1006.99 | bwd_inner_microstep: 1006.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 19:21:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1399.75 | bwd_inner_microstep: 1399.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3563
[2024-06-10 19:21:04,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.32 | bwd_microstep: 1332.58 | bwd_inner_microstep: 1332.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 19:21:07,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.45 | bwd_microstep: 1608.96 | bwd_inner_microstep: 1608.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 19:21:08,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.60 | bwd_microstep: 973.00 | bwd_inner_microstep: 972.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698
[2024-06-10 19:21:10,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.39 | bwd_microstep: 1533.47 | bwd_inner_microstep: 1533.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 19:21:12,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1432.78 | bwd_inner_microstep: 1432.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 842
[2024-06-10 19:21:13,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 132.41 | bwd_microstep: 345.41 | bwd_inner_microstep: 345.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756
[2024-06-10 19:21:15,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.40 | bwd_microstep: 1536.70 | bwd_inner_microstep: 1536.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585
[2024-06-10 19:21:20,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.58
[2024-06-10 19:21:20,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 4298.69 | bwd_inner_microstep: 1499.19 | bwd_allreduce_microstep: 2799.45 | step_microstep: 37.87
[2024-06-10 19:21:20,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15751.00 | bwd: 45013.83 | bwd_inner: 42213.46 | bwd_allreduce: 2799.67 | step: 39.39
{'loss': 1.191, 'learning_rate': 1.2948412929516703e-05, 'epoch': 0.63}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 19:21:21,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1332.67 | bwd_inner_microstep: 1332.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 19:21:23,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1249.58 | bwd_inner_microstep: 1249.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294
[2024-06-10 19:21:24,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.55 | bwd_microstep: 875.17 | bwd_inner_microstep: 875.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 19:21:26,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1244.60 | bwd_inner_microstep: 1244.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 19:21:28,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.75 | bwd_microstep: 1538.99 | bwd_inner_microstep: 1538.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 19:21:30,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 1245.69 | bwd_inner_microstep: 1245.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 19:21:32,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.03 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 19:21:34,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1251.90 | bwd_inner_microstep: 1251.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2433
[2024-06-10 19:21:35,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.26 | bwd_microstep: 946.47 | bwd_inner_microstep: 946.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-10 19:21:37,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.33 | bwd_microstep: 1184.59 | bwd_inner_microstep: 1184.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 19:21:38,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.35 | bwd_microstep: 789.73 | bwd_inner_microstep: 789.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3412
[2024-06-10 19:21:40,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1407.24 | bwd_inner_microstep: 1407.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3691
[2024-06-10 19:21:42,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.72 | bwd_microstep: 1721.36 | bwd_inner_microstep: 1721.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2019
[2024-06-10 19:21:43,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.70 | bwd_microstep: 743.55 | bwd_inner_microstep: 743.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 19:21:45,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.64 | bwd_microstep: 1485.96 | bwd_inner_microstep: 1485.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 19:21:47,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.00 | bwd_microstep: 1613.75 | bwd_inner_microstep: 1613.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 19:21:49,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1282.17 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1959
[2024-06-10 19:21:50,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.66 | bwd_microstep: 731.94 | bwd_inner_microstep: 731.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-10 19:21:52,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.58 | bwd_microstep: 1186.61 | bwd_inner_microstep: 1186.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2450
[2024-06-10 19:21:53,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.09 | bwd_microstep: 978.10 | bwd_inner_microstep: 978.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 19:21:55,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1414.49 | bwd_inner_microstep: 1414.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2191
[2024-06-10 19:21:56,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.54 | bwd_microstep: 860.70 | bwd_inner_microstep: 860.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-10 19:21:58,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.57 | bwd_microstep: 1287.67 | bwd_inner_microstep: 1287.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284
[2024-06-10 19:21:59,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.66 | bwd_microstep: 1069.54 | bwd_inner_microstep: 1069.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 19:22:02,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1496.28 | bwd_inner_microstep: 1496.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3814
[2024-06-10 19:22:04,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.27 | bwd_microstep: 1856.85 | bwd_inner_microstep: 1856.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 19:22:06,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.80 | bwd_microstep: 1355.06 | bwd_inner_microstep: 1355.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3823
[2024-06-10 19:22:08,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.12 | bwd_microstep: 1417.21 | bwd_inner_microstep: 1417.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 19:22:10,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.03 | bwd_microstep: 1473.98 | bwd_inner_microstep: 1473.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2328
[2024-06-10 19:22:11,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.70 | bwd_microstep: 983.62 | bwd_inner_microstep: 983.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 19:22:13,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1248.79 | bwd_inner_microstep: 1248.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044
[2024-06-10 19:22:21,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.10 | optimizer_step: 6.63
[2024-06-10 19:22:21,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.29 | bwd_microstep: 8097.93 | bwd_inner_microstep: 1059.11 | bwd_allreduce_microstep: 7038.76 | step_microstep: 38.03
[2024-06-10 19:22:21,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14803.38 | bwd: 46760.03 | bwd_inner: 39720.37 | bwd_allreduce: 7038.99 | step: 39.51
{'loss': 1.1777, 'learning_rate': 1.291330176459965e-05, 'epoch': 0.63}
 62%|██████▏   | 1077/1726 [18:39:49<11:29:23, 63.73s/it]
 62%|██████▏   | 1078/1726 [18:40:51<11:22:33, 63.20s/it]
 63%|██████▎   | 1079/1726 [18:41:54<11:19:19, 63.00s/it]
 63%|██████▎   | 1080/1726 [18:42:55<11:14:03, 62.61s/it]
 63%|██████▎   | 1081/1726 [18:43:56<11:08:08, 62.15s/it]
 63%|██████▎   | 1082/1726 [18:44:58<11:06:15, 62.07s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 19:22:23,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1437.19 | bwd_inner_microstep: 1437.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3974
[2024-06-10 19:22:26,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.42 | bwd_microstep: 1601.15 | bwd_inner_microstep: 1601.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4070
[2024-06-10 19:22:28,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.56 | bwd_microstep: 1552.42 | bwd_inner_microstep: 1552.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3514
[2024-06-10 19:22:30,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.15 | bwd_microstep: 1334.37 | bwd_inner_microstep: 1334.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 19:22:31,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.24 | bwd_microstep: 1243.12 | bwd_inner_microstep: 1243.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389
[2024-06-10 19:22:33,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.83 | bwd_microstep: 1242.49 | bwd_inner_microstep: 1242.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2439
[2024-06-10 19:22:34,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.76 | bwd_microstep: 946.37 | bwd_inner_microstep: 946.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3479
[2024-06-10 19:22:36,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.10 | bwd_microstep: 1215.24 | bwd_inner_microstep: 1215.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3697
[2024-06-10 19:22:38,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1482.33 | bwd_inner_microstep: 1482.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736
[2024-06-10 19:22:40,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.43 | bwd_microstep: 1632.20 | bwd_inner_microstep: 1632.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676
[2024-06-10 19:22:43,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.54 | bwd_microstep: 1717.37 | bwd_inner_microstep: 1717.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 19:22:45,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1400.55 | bwd_inner_microstep: 1400.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3560
[2024-06-10 19:22:47,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.40 | bwd_microstep: 1456.64 | bwd_inner_microstep: 1456.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3453
[2024-06-10 19:22:49,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1477.41 | bwd_inner_microstep: 1477.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3688
[2024-06-10 19:22:51,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.76 | bwd_microstep: 1587.02 | bwd_inner_microstep: 1587.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3663
[2024-06-10 19:22:53,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.09 | bwd_microstep: 1476.83 | bwd_inner_microstep: 1476.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 19:22:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1384.72 | bwd_inner_microstep: 1384.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 19:22:57,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1493.59 | bwd_inner_microstep: 1493.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-10 19:22:59,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.95 | bwd_microstep: 1608.64 | bwd_inner_microstep: 1608.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 19:23:01,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.87 | bwd_microstep: 1397.05 | bwd_inner_microstep: 1397.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 19:23:03,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.82 | bwd_microstep: 1252.26 | bwd_inner_microstep: 1252.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 19:23:05,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.04 | bwd_microstep: 1656.47 | bwd_inner_microstep: 1656.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 19:23:07,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.49 | bwd_microstep: 1282.42 | bwd_inner_microstep: 1282.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827
[2024-06-10 19:23:09,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.58 | bwd_microstep: 1516.48 | bwd_inner_microstep: 1516.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 19:23:11,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1282.60 | bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 19:23:13,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1287.46 | bwd_inner_microstep: 1287.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942
[2024-06-10 19:23:13,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.05 | bwd_microstep: 700.39 | bwd_inner_microstep: 700.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 19:23:16,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.57 | bwd_microstep: 1498.27 | bwd_inner_microstep: 1498.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-10 19:23:17,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.79 | bwd_microstep: 1402.34 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2410
[2024-06-10 19:23:19,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.48 | bwd_microstep: 1017.03 | bwd_inner_microstep: 1017.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438
[2024-06-10 19:23:21,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1444.14 | bwd_inner_microstep: 1444.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3406
[2024-06-10 19:23:23,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.18 | optimizer_step: 6.61
[2024-06-10 19:23:23,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.79 | bwd_microstep: 1543.50 | bwd_inner_microstep: 1535.82 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.78
[2024-06-10 19:23:23,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16639.09 | bwd: 44570.07 | bwd_inner: 44561.55 | bwd_allreduce: 7.86 | step: 39.34
{'loss': 1.2133, 'learning_rate': 1.2878215558996945e-05, 'epoch': 0.63}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 19:23:25,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.74 | bwd_microstep: 1275.84 | bwd_inner_microstep: 1275.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4328
[2024-06-10 19:23:27,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.74 | bwd_microstep: 1634.16 | bwd_inner_microstep: 1634.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874
[2024-06-10 19:23:29,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.36 | bwd_microstep: 1581.53 | bwd_inner_microstep: 1581.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3482
[2024-06-10 19:23:31,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.52 | bwd_microstep: 1342.73 | bwd_inner_microstep: 1342.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940
[2024-06-10 19:23:32,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.73 | bwd_microstep: 697.47 | bwd_inner_microstep: 697.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 19:23:34,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.43 | bwd_microstep: 1479.76 | bwd_inner_microstep: 1479.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 19:23:36,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1281.00 | bwd_inner_microstep: 1280.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 19:23:38,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.71 | bwd_microstep: 1342.42 | bwd_inner_microstep: 1342.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3737
[2024-06-10 19:23:40,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1382.07 | bwd_inner_microstep: 1382.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703
[2024-06-10 19:23:42,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.02 | bwd_microstep: 1628.20 | bwd_inner_microstep: 1628.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 19:23:44,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.09 | bwd_microstep: 1290.88 | bwd_inner_microstep: 1290.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680
[2024-06-10 19:23:46,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.11 | bwd_microstep: 1626.80 | bwd_inner_microstep: 1626.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 19:23:48,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1381.60 | bwd_inner_microstep: 1381.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2225
[2024-06-10 19:23:49,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.64 | bwd_microstep: 963.55 | bwd_inner_microstep: 963.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 19:23:51,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.26 | bwd_microstep: 1354.52 | bwd_inner_microstep: 1354.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3713
[2024-06-10 19:23:53,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.68 | bwd_microstep: 1696.08 | bwd_inner_microstep: 1696.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 19:23:55,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1350.35 | bwd_inner_microstep: 1350.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835
[2024-06-10 19:23:57,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.53 | bwd_microstep: 1462.94 | bwd_inner_microstep: 1462.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 19:23:59,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1293.09 | bwd_inner_microstep: 1293.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 19:24:01,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1298.71 | bwd_inner_microstep: 1298.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450
[2024-06-10 19:24:03,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.89 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3612
[2024-06-10 19:24:05,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1535.36 | bwd_inner_microstep: 1535.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3721
[2024-06-10 19:24:07,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.02 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 19:24:08,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.45 | bwd_microstep: 1257.71 | bwd_inner_microstep: 1257.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821
[2024-06-10 19:24:10,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.86 | bwd_microstep: 1417.74 | bwd_inner_microstep: 1417.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 19:24:12,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.26 | bwd_microstep: 1487.56 | bwd_inner_microstep: 1487.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2785
[2024-06-10 19:24:14,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.06 | bwd_microstep: 1148.47 | bwd_inner_microstep: 1148.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 19:24:16,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.29 | bwd_microstep: 1557.32 | bwd_inner_microstep: 1557.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 19:24:18,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.70 | bwd_microstep: 1249.20 | bwd_inner_microstep: 1249.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 19:24:20,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.00 | bwd_microstep: 1438.59 | bwd_inner_microstep: 1438.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820
[2024-06-10 19:24:22,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1584.56 | bwd_inner_microstep: 1584.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3811
[2024-06-10 19:24:25,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.94 | optimizer_gradients: 4.02 | optimizer_step: 6.62
[2024-06-10 19:24:25,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.79 | bwd_microstep: 2436.94 | bwd_inner_microstep: 2029.51 | bwd_allreduce_microstep: 407.38 | step_microstep: 37.61
[2024-06-10 19:24:25,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16668.87 | bwd: 45161.86 | bwd_inner: 44753.59 | bwd_allreduce: 407.61 | step: 39.06
{'loss': 1.1771, 'learning_rate': 1.2843154436282014e-05, 'epoch': 0.63}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 19:24:27,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.53 | bwd_microstep: 1146.76 | bwd_inner_microstep: 1146.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 19:24:29,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1247.17 | bwd_inner_microstep: 1247.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 19:24:30,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.34 | bwd_microstep: 1381.70 | bwd_inner_microstep: 1381.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-10 19:24:32,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.34 | bwd_microstep: 809.76 | bwd_inner_microstep: 809.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 19:24:33,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1396.02 | bwd_inner_microstep: 1395.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-10 19:24:36,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.17 | bwd_microstep: 1657.88 | bwd_inner_microstep: 1657.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 19:24:38,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1436.38 | bwd_inner_microstep: 1436.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3762
[2024-06-10 19:24:40,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1501.11 | bwd_inner_microstep: 1501.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 19:24:42,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.91 | bwd_microstep: 1309.07 | bwd_inner_microstep: 1309.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4061
[2024-06-10 19:24:44,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.55 | bwd_microstep: 1519.73 | bwd_inner_microstep: 1519.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954
[2024-06-10 19:24:45,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.42 | bwd_microstep: 732.45 | bwd_inner_microstep: 732.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 19:24:47,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1417.97 | bwd_inner_microstep: 1417.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 19:24:49,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1382.33 | bwd_inner_microstep: 1382.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2111
[2024-06-10 19:24:50,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.65 | bwd_microstep: 919.19 | bwd_inner_microstep: 919.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3578
[2024-06-10 19:24:52,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.29 | bwd_microstep: 1668.80 | bwd_inner_microstep: 1668.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630
[2024-06-10 19:24:54,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.55 | bwd_microstep: 1610.48 | bwd_inner_microstep: 1610.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3004
[2024-06-10 19:24:56,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.42 | bwd_microstep: 1207.72 | bwd_inner_microstep: 1207.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 19:24:57,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.34 | bwd_microstep: 975.29 | bwd_inner_microstep: 975.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 19:24:59,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.57 | bwd_microstep: 980.89 | bwd_inner_microstep: 980.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-10 19:25:01,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.99 | bwd_microstep: 1614.04 | bwd_inner_microstep: 1614.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 19:25:03,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1513.54 | bwd_inner_microstep: 1513.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-10 19:25:05,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.66 | bwd_microstep: 1190.74 | bwd_inner_microstep: 1190.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 19:25:07,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.78 | bwd_microstep: 1453.88 | bwd_inner_microstep: 1453.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3606
[2024-06-10 19:25:09,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.85 | bwd_microstep: 1435.22 | bwd_inner_microstep: 1435.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551
[2024-06-10 19:25:11,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.94 | bwd_microstep: 1420.26 | bwd_inner_microstep: 1420.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3444
[2024-06-10 19:25:13,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.14 | bwd_microstep: 1379.78 | bwd_inner_microstep: 1379.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 19:25:15,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.73 | bwd_microstep: 1648.73 | bwd_inner_microstep: 1648.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3779
[2024-06-10 19:25:17,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.74 | bwd_microstep: 1612.56 | bwd_inner_microstep: 1612.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2110
[2024-06-10 19:25:18,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.18 | bwd_microstep: 825.01 | bwd_inner_microstep: 824.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-10 19:25:20,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.68 | bwd_microstep: 1602.95 | bwd_inner_microstep: 1602.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-10 19:25:22,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.04 | bwd_microstep: 1445.50 | bwd_inner_microstep: 1445.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3480
[2024-06-10 19:25:27,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.06 | optimizer_step: 6.61
[2024-06-10 19:25:27,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.21 | bwd_microstep: 3883.58 | bwd_inner_microstep: 1780.90 | bwd_allreduce_microstep: 2102.63 | step_microstep: 37.56
[2024-06-10 19:25:27,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16044.24 | bwd: 45326.52 | bwd_inner: 43222.98 | bwd_allreduce: 2102.86 | step: 39.03
{'loss': 1.2323, 'learning_rate': 1.2808118519939965e-05, 'epoch': 0.63}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1977
[2024-06-10 19:25:28,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.96 | bwd_microstep: 886.73 | bwd_inner_microstep: 886.65 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-10 19:25:30,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.68 | bwd_microstep: 1274.60 | bwd_inner_microstep: 1274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3794
[2024-06-10 19:25:32,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.07 | bwd_microstep: 1346.27 | bwd_inner_microstep: 1346.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 19:25:34,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1378.70 | bwd_inner_microstep: 1378.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 19:25:35,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1292.75 | bwd_inner_microstep: 1292.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741
[2024-06-10 19:25:38,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.66 | bwd_microstep: 1629.89 | bwd_inner_microstep: 1629.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3420
[2024-06-10 19:25:40,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.93 | bwd_microstep: 1312.92 | bwd_inner_microstep: 1312.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3426
[2024-06-10 19:25:41,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.25 | bwd_microstep: 1152.22 | bwd_inner_microstep: 1152.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 19:25:42,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.77 | bwd_microstep: 791.32 | bwd_inner_microstep: 791.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 19:25:44,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1248.73 | bwd_inner_microstep: 1248.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3399
[2024-06-10 19:25:46,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.32 | bwd_microstep: 1180.90 | bwd_inner_microstep: 1180.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1874
[2024-06-10 19:25:47,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.39 | bwd_microstep: 804.37 | bwd_inner_microstep: 804.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 19:25:49,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.64 | bwd_microstep: 1617.92 | bwd_inner_microstep: 1617.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651
[2024-06-10 19:25:51,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.25 | bwd_microstep: 1508.30 | bwd_inner_microstep: 1508.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 19:25:53,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.94 | bwd_microstep: 1609.34 | bwd_inner_microstep: 1609.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 19:25:55,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1254.74 | bwd_inner_microstep: 1254.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 19:25:57,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1416.05 | bwd_inner_microstep: 1416.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 19:25:59,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1294.48 | bwd_inner_microstep: 1294.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 19:26:00,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 19:26:03,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.48 | bwd_microstep: 1658.76 | bwd_inner_microstep: 1658.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 19:26:05,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.52 | bwd_microstep: 1459.17 | bwd_inner_microstep: 1459.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3538
[2024-06-10 19:26:07,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1422.96 | bwd_inner_microstep: 1422.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 19:26:09,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1490.86 | bwd_inner_microstep: 1490.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613
[2024-06-10 19:26:11,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.26 | bwd_microstep: 1609.75 | bwd_inner_microstep: 1609.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 19:26:13,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1245.39 | bwd_inner_microstep: 1245.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 19:26:15,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 1490.04 | bwd_inner_microstep: 1490.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401
[2024-06-10 19:26:17,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.40 | bwd_microstep: 1369.78 | bwd_inner_microstep: 1369.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2389
[2024-06-10 19:26:18,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.45 | bwd_microstep: 1129.15 | bwd_inner_microstep: 1129.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3599
[2024-06-10 19:26:20,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.18 | bwd_microstep: 1246.04 | bwd_inner_microstep: 1246.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015
[2024-06-10 19:26:21,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.83 | bwd_microstep: 803.96 | bwd_inner_microstep: 803.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 19:26:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.99 | bwd_microstep: 1402.34 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3778
[2024-06-10 19:26:26,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 19:26:26,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.39 | bwd_microstep: 2739.31 | bwd_inner_microstep: 1976.77 | bwd_allreduce_microstep: 762.49 | step_microstep: 37.80
[2024-06-10 19:26:26,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15872.47 | bwd: 43353.58 | bwd_inner: 42590.13 | bwd_allreduce: 762.76 | step: 39.25
{'loss': 1.1982, 'learning_rate': 1.2773107933367093e-05, 'epoch': 0.63}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 19:26:28,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.14 | bwd_microstep: 782.66 | bwd_inner_microstep: 782.58 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060
[2024-06-10 19:26:29,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.74 | bwd_microstep: 812.89 | bwd_inner_microstep: 812.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3916
[2024-06-10 19:26:31,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.23 | bwd_microstep: 1690.59 | bwd_inner_microstep: 1690.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 19:26:33,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1553.12 | bwd_inner_microstep: 1553.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762
[2024-06-10 19:26:35,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.47 | bwd_microstep: 1469.07 | bwd_inner_microstep: 1469.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 19:26:37,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1283.07 | bwd_inner_microstep: 1283.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 19:26:38,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.76 | bwd_microstep: 793.31 | bwd_inner_microstep: 793.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 19:26:40,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1282.72 | bwd_inner_microstep: 1282.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 19:26:42,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.48 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 19:26:44,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.45 | bwd_microstep: 1532.90 | bwd_inner_microstep: 1532.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447
[2024-06-10 19:26:46,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1283.55 | bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 19:26:47,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | bwd_microstep: 1285.75 | bwd_inner_microstep: 1285.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2434
[2024-06-10 19:26:49,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.16 | bwd_microstep: 945.66 | bwd_inner_microstep: 945.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 19:26:51,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1353.27 | bwd_inner_microstep: 1353.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 19:26:53,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.93 | bwd_microstep: 1493.27 | bwd_inner_microstep: 1493.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 19:26:55,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.49 | bwd_microstep: 1490.07 | bwd_inner_microstep: 1490.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2958
[2024-06-10 19:26:57,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.45 | bwd_microstep: 1291.69 | bwd_inner_microstep: 1291.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1979
[2024-06-10 19:26:58,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.69 | bwd_microstep: 735.20 | bwd_inner_microstep: 735.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 19:27:00,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.70 | bwd_microstep: 1615.42 | bwd_inner_microstep: 1615.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2203
[2024-06-10 19:27:01,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.59 | bwd_microstep: 863.48 | bwd_inner_microstep: 863.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 19:27:03,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.06 | bwd_microstep: 1189.73 | bwd_inner_microstep: 1189.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555
[2024-06-10 19:27:04,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.34 | bwd_microstep: 1202.17 | bwd_inner_microstep: 1202.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3726
[2024-06-10 19:27:06,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.03 | bwd_microstep: 1339.20 | bwd_inner_microstep: 1339.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3612
[2024-06-10 19:27:08,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1641.88 | bwd_inner_microstep: 1641.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 19:27:10,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.60 | bwd_microstep: 1402.05 | bwd_inner_microstep: 1402.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952
[2024-06-10 19:27:11,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.89 | bwd_microstep: 731.39 | bwd_inner_microstep: 731.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3815
[2024-06-10 19:27:14,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.67 | bwd_microstep: 1689.82 | bwd_inner_microstep: 1689.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-10 19:27:16,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.05 | bwd_microstep: 1604.35 | bwd_inner_microstep: 1604.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2298
[2024-06-10 19:27:17,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.21 | bwd_microstep: 1006.20 | bwd_inner_microstep: 1006.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3556
[2024-06-10 19:27:19,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.22 | bwd_microstep: 1424.79 | bwd_inner_microstep: 1424.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3437
[2024-06-10 19:27:21,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.11 | bwd_microstep: 1365.59 | bwd_inner_microstep: 1365.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 19:27:27,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.32 | optimizer_step: 6.60
[2024-06-10 19:27:27,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.68 | bwd_microstep: 5295.01 | bwd_inner_microstep: 1628.17 | bwd_allreduce_microstep: 3666.79 | step_microstep: 38.34
[2024-06-10 19:27:27,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15361.73 | bwd: 44835.42 | bwd_inner: 41167.67 | bwd_allreduce: 3667.06 | step: 39.83
{'loss': 1.2565, 'learning_rate': 1.273812279987051e-05, 'epoch': 0.63}
 1082/1726 [18:44:58<11:06:15, 62.07s/it]
 63%|██████▎   | 1083/1726 [18:46:00<11:03:33, 61.92s/it]
 63%|██████▎   | 1084/1726 [18:47:02<11:03:18, 61.99s/it]
 63%|██████▎   | 1085/1726 [18:48:04<11:01:22, 61.91s/it]
 63%|██████▎   | 1086/1726 [18:49:03<10:52:48, 61.20s/it]
 63%|██████▎   | 1087/1726 [18:50:04<10:49:38, 61.00s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 19:27:29,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.27 | bwd_microstep: 1333.39 | bwd_inner_microstep: 1333.12 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.20
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 19:27:31,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1243.97 | bwd_inner_microstep: 1243.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 19:27:33,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.92 | bwd_microstep: 1474.01 | bwd_inner_microstep: 1473.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 19:27:35,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.81 | bwd_microstep: 1550.32 | bwd_inner_microstep: 1550.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 19:27:37,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.32 | bwd_microstep: 1448.08 | bwd_inner_microstep: 1448.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3740
[2024-06-10 19:27:39,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.67 | bwd_microstep: 1334.53 | bwd_inner_microstep: 1334.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2205
[2024-06-10 19:27:40,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.08 | bwd_microstep: 954.06 | bwd_inner_microstep: 954.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 19:27:42,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.74 | bwd_microstep: 1277.35 | bwd_inner_microstep: 1277.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 19:27:43,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.48 | bwd_microstep: 794.45 | bwd_inner_microstep: 794.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189
[2024-06-10 19:27:44,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.71 | bwd_microstep: 858.49 | bwd_inner_microstep: 858.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 19:27:46,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1520.98 | bwd_inner_microstep: 1520.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545
[2024-06-10 19:27:48,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.06 | bwd_microstep: 1327.55 | bwd_inner_microstep: 1327.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1919
[2024-06-10 19:27:49,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.13 | bwd_microstep: 778.65 | bwd_inner_microstep: 778.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3491
[2024-06-10 19:27:51,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1577.66 | bwd_inner_microstep: 1577.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 19:27:53,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.91 | bwd_microstep: 1503.03 | bwd_inner_microstep: 1503.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-10 19:27:55,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1579.64 | bwd_inner_microstep: 1579.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 19:27:57,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.24 | bwd_microstep: 1386.98 | bwd_inner_microstep: 1386.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 19:27:59,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.14 | bwd_microstep: 1500.83 | bwd_inner_microstep: 1500.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 19:28:01,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.74 | bwd_microstep: 1451.68 | bwd_inner_microstep: 1451.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 19:28:03,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.11 | bwd_microstep: 1498.58 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 19:28:05,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1378.29 | bwd_inner_microstep: 1378.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3377
[2024-06-10 19:28:07,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.94 | bwd_microstep: 1269.88 | bwd_inner_microstep: 1269.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 19:28:09,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.98 | bwd_microstep: 1497.05 | bwd_inner_microstep: 1497.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3819
[2024-06-10 19:28:11,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.57 | bwd_microstep: 1416.48 | bwd_inner_microstep: 1416.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2231
[2024-06-10 19:28:12,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.47 | bwd_microstep: 771.73 | bwd_inner_microstep: 771.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 19:28:15,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.21 | bwd_microstep: 1660.64 | bwd_inner_microstep: 1660.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-10 19:28:16,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.99 | bwd_microstep: 1312.18 | bwd_inner_microstep: 1312.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3550
[2024-06-10 19:28:19,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.19 | bwd_microstep: 1586.27 | bwd_inner_microstep: 1586.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744
[2024-06-10 19:28:21,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.14 | bwd_microstep: 1640.27 | bwd_inner_microstep: 1640.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 19:28:23,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1374.80 | bwd_inner_microstep: 1374.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 19:28:25,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.70 | bwd_microstep: 1388.85 | bwd_inner_microstep: 1388.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 19:28:29,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.20 | optimizer_step: 6.58
[2024-06-10 19:28:29,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 3567.32 | bwd_inner_microstep: 1571.51 | bwd_allreduce_microstep: 1995.76 | step_microstep: 40.14
[2024-06-10 19:28:29,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16138.97 | bwd: 45258.05 | bwd_inner: 43261.17 | bwd_allreduce: 1996.10 | step: 41.82
{'loss': 1.2228, 'learning_rate': 1.270316324266768e-05, 'epoch': 0.63}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 19:28:31,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1337.65 | bwd_inner_microstep: 1337.54 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908
[2024-06-10 19:28:33,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1587.93 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 19:28:35,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.81 | bwd_microstep: 1502.34 | bwd_inner_microstep: 1502.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2008
[2024-06-10 19:28:36,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.81 | bwd_microstep: 831.98 | bwd_inner_microstep: 831.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 19:28:38,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1380.25 | bwd_inner_microstep: 1380.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 19:28:40,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.70 | bwd_microstep: 1246.25 | bwd_inner_microstep: 1246.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433
[2024-06-10 19:28:41,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.89 | bwd_microstep: 1187.55 | bwd_inner_microstep: 1187.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 19:28:42,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 791.19 | bwd_inner_microstep: 791.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 19:28:44,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1387.43 | bwd_inner_microstep: 1387.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 19:28:46,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1389.95 | bwd_inner_microstep: 1389.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3503
[2024-06-10 19:28:48,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.36 | bwd_microstep: 1222.90 | bwd_inner_microstep: 1222.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 19:28:50,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1251.68 | bwd_inner_microstep: 1251.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063
[2024-06-10 19:28:51,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.51 | bwd_microstep: 915.64 | bwd_inner_microstep: 915.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2970
[2024-06-10 19:28:52,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.59 | bwd_microstep: 1041.17 | bwd_inner_microstep: 1041.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3527
[2024-06-10 19:28:55,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.84 | bwd_microstep: 1559.11 | bwd_inner_microstep: 1559.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 19:28:57,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 1510.66 | bwd_inner_microstep: 1510.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3837
[2024-06-10 19:28:59,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.88 | bwd_microstep: 1689.27 | bwd_inner_microstep: 1689.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 19:29:01,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.50 | bwd_microstep: 1341.88 | bwd_inner_microstep: 1341.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2637
[2024-06-10 19:29:02,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.88 | bwd_microstep: 1017.96 | bwd_inner_microstep: 1017.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 19:29:04,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.46 | bwd_microstep: 1304.19 | bwd_inner_microstep: 1304.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 19:29:06,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.87 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 19:29:08,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1400.23 | bwd_inner_microstep: 1400.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070
[2024-06-10 19:29:09,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.34 | bwd_microstep: 815.64 | bwd_inner_microstep: 815.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 19:29:11,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.66 | bwd_microstep: 1529.51 | bwd_inner_microstep: 1529.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 19:29:12,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.98 | bwd_microstep: 801.31 | bwd_inner_microstep: 801.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 19:29:14,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.67 | bwd_microstep: 1489.49 | bwd_inner_microstep: 1489.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2024
[2024-06-10 19:29:15,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.10 | bwd_microstep: 715.20 | bwd_inner_microstep: 715.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577
[2024-06-10 19:29:17,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1432.42 | bwd_inner_microstep: 1432.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841
[2024-06-10 19:29:19,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.92 | bwd_microstep: 1467.07 | bwd_inner_microstep: 1467.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3570
[2024-06-10 19:29:21,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1629.95 | bwd_inner_microstep: 1629.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3584
[2024-06-10 19:29:24,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.02 | bwd_microstep: 1619.83 | bwd_inner_microstep: 1619.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-10 19:29:31,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-10 19:29:31,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.06 | bwd_microstep: 6298.14 | bwd_inner_microstep: 1748.42 | bwd_allreduce_microstep: 4549.67 | step_microstep: 37.88
[2024-06-10 19:29:31,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15480.04 | bwd: 45980.83 | bwd_inner: 41430.15 | bwd_allreduce: 4549.97 | step: 39.44
{'loss': 1.2098, 'learning_rate': 1.266822938488597e-05, 'epoch': 0.63}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503
[2024-06-10 19:29:32,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 19:29:34,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.51 | bwd_microstep: 1274.24 | bwd_inner_microstep: 1274.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 19:29:37,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.14 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 19:29:38,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1382.01 | bwd_inner_microstep: 1381.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150
[2024-06-10 19:29:40,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.83 | bwd_microstep: 848.10 | bwd_inner_microstep: 848.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2526
[2024-06-10 19:29:41,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.19 | bwd_microstep: 1027.70 | bwd_inner_microstep: 1027.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-10 19:29:43,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.78 | bwd_microstep: 1186.14 | bwd_inner_microstep: 1186.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 19:29:44,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1279.08 | bwd_inner_microstep: 1279.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040
[2024-06-10 19:29:46,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.90 | bwd_microstep: 808.75 | bwd_inner_microstep: 808.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-10 19:29:47,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1345.19 | bwd_inner_microstep: 1345.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 19:29:49,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1249.27 | bwd_inner_microstep: 1249.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2103
[2024-06-10 19:29:50,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.13 | bwd_microstep: 822.91 | bwd_inner_microstep: 822.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3499
[2024-06-10 19:29:52,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.07 | bwd_microstep: 1324.94 | bwd_inner_microstep: 1324.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661
[2024-06-10 19:29:54,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.35 | bwd_microstep: 1610.67 | bwd_inner_microstep: 1610.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3505
[2024-06-10 19:29:56,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.33 | bwd_microstep: 1365.66 | bwd_inner_microstep: 1365.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3542
[2024-06-10 19:29:58,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.21 | bwd_microstep: 1417.32 | bwd_inner_microstep: 1417.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3940
[2024-06-10 19:30:01,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.07 | bwd_microstep: 1796.45 | bwd_inner_microstep: 1796.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 19:30:02,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1339.94 | bwd_inner_microstep: 1339.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 19:30:04,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.19 | bwd_microstep: 1449.97 | bwd_inner_microstep: 1449.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431
[2024-06-10 19:30:06,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.41 | bwd_microstep: 1180.43 | bwd_inner_microstep: 1180.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3814
[2024-06-10 19:30:08,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.06 | bwd_microstep: 1618.11 | bwd_inner_microstep: 1618.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 19:30:10,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1283.87 | bwd_inner_microstep: 1283.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 19:30:12,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.27 | bwd_microstep: 1654.08 | bwd_inner_microstep: 1654.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-10 19:30:14,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.83 | bwd_microstep: 1516.85 | bwd_inner_microstep: 1516.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 19:30:17,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1504.86 | bwd_inner_microstep: 1504.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2264
[2024-06-10 19:30:18,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.97 | bwd_microstep: 970.41 | bwd_inner_microstep: 970.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 19:30:20,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.82 | bwd_microstep: 1486.22 | bwd_inner_microstep: 1486.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 19:30:22,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1544.49 | bwd_inner_microstep: 1544.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 19:30:24,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1336.32 | bwd_inner_microstep: 1336.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 19:30:26,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.18 | bwd_microstep: 1388.67 | bwd_inner_microstep: 1388.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3947
[2024-06-10 19:30:28,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.06 | bwd_microstep: 1803.06 | bwd_inner_microstep: 1803.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3566
[2024-06-10 19:30:32,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 19:30:32,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.26 | bwd_microstep: 3456.46 | bwd_inner_microstep: 1717.21 | bwd_allreduce_microstep: 1739.20 | step_microstep: 37.76
[2024-06-10 19:30:32,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16213.50 | bwd: 45324.89 | bwd_inner: 43584.79 | bwd_allreduce: 1739.43 | step: 39.25
{'loss': 1.2051, 'learning_rate': 1.263332134956226e-05, 'epoch': 0.63}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423
[2024-06-10 19:30:34,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.96 | bwd_microstep: 1154.65 | bwd_inner_microstep: 1154.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2258
[2024-06-10 19:30:35,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.56 | bwd_microstep: 965.10 | bwd_inner_microstep: 965.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 19:30:37,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.61 | bwd_microstep: 1445.50 | bwd_inner_microstep: 1445.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 19:30:39,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.38 | bwd_microstep: 1278.07 | bwd_inner_microstep: 1278.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 19:30:41,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1389.46 | bwd_inner_microstep: 1389.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 19:30:43,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1281.92 | bwd_inner_microstep: 1281.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 19:30:45,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1378.32 | bwd_inner_microstep: 1378.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3493
[2024-06-10 19:30:47,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.08 | bwd_microstep: 1329.26 | bwd_inner_microstep: 1329.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3812
[2024-06-10 19:30:49,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.37 | bwd_microstep: 1529.64 | bwd_inner_microstep: 1529.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3671
[2024-06-10 19:30:51,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.21 | bwd_microstep: 1610.99 | bwd_inner_microstep: 1610.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499
[2024-06-10 19:30:53,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.10 | bwd_microstep: 1433.85 | bwd_inner_microstep: 1433.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 19:30:55,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1348.35 | bwd_inner_microstep: 1348.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4005
[2024-06-10 19:30:57,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.36 | bwd_microstep: 1677.13 | bwd_inner_microstep: 1677.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1888
[2024-06-10 19:30:58,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.25 | bwd_microstep: 775.10 | bwd_inner_microstep: 775.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3667
[2024-06-10 19:31:00,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.24 | bwd_microstep: 1716.78 | bwd_inner_microstep: 1716.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 19:31:03,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.05 | bwd_microstep: 1511.79 | bwd_inner_microstep: 1511.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2953
[2024-06-10 19:31:04,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.75 | bwd_microstep: 1100.04 | bwd_inner_microstep: 1100.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1974
[2024-06-10 19:31:05,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.49 | bwd_microstep: 859.54 | bwd_inner_microstep: 859.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 19:31:07,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.65 | bwd_microstep: 1554.37 | bwd_inner_microstep: 1554.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 19:31:09,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.23 | bwd_microstep: 1280.33 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638
[2024-06-10 19:31:11,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1416.58 | bwd_inner_microstep: 1416.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-10 19:31:13,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.48 | bwd_microstep: 1421.47 | bwd_inner_microstep: 1421.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 19:31:14,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.69 | bwd_microstep: 804.53 | bwd_inner_microstep: 804.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 19:31:16,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1402.19 | bwd_inner_microstep: 1402.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2246
[2024-06-10 19:31:17,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.32 | bwd_microstep: 931.49 | bwd_inner_microstep: 931.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 19:31:20,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.89 | bwd_microstep: 1654.26 | bwd_inner_microstep: 1654.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699
[2024-06-10 19:31:22,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1328.90 | bwd_inner_microstep: 1328.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045
[2024-06-10 19:31:23,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.52 | bwd_microstep: 899.93 | bwd_inner_microstep: 899.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-10 19:31:25,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.51 | bwd_microstep: 1606.66 | bwd_inner_microstep: 1606.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3612
[2024-06-10 19:31:27,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.93 | bwd_microstep: 1538.48 | bwd_inner_microstep: 1538.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 19:31:29,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.93 | bwd_microstep: 1540.91 | bwd_inner_microstep: 1540.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 19:31:33,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 19:31:33,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.75 | bwd_microstep: 3549.27 | bwd_inner_microstep: 1573.18 | bwd_allreduce_microstep: 1976.04 | step_microstep: 37.68
[2024-06-10 19:31:33,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15890.21 | bwd: 44714.85 | bwd_inner: 42737.91 | bwd_allreduce: 1976.27 | step: 39.13
{'loss': 1.224, 'learning_rate': 1.2598439259642459e-05, 'epoch': 0.63}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 19:31:35,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.99 | bwd_microstep: 1470.51 | bwd_inner_microstep: 1470.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2499
[2024-06-10 19:31:37,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.84 | bwd_microstep: 1020.84 | bwd_inner_microstep: 1020.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3865
[2024-06-10 19:31:39,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1522.20 | bwd_inner_microstep: 1522.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 19:31:41,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1648.72 | bwd_inner_microstep: 1648.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3956
[2024-06-10 19:31:43,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.38 | bwd_microstep: 1594.65 | bwd_inner_microstep: 1594.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3747
[2024-06-10 19:31:46,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.14 | bwd_microstep: 1637.66 | bwd_inner_microstep: 1637.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 19:31:47,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1248.63 | bwd_inner_microstep: 1248.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 19:31:49,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1388.26 | bwd_inner_microstep: 1388.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 19:31:51,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1343.92 | bwd_inner_microstep: 1343.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2109
[2024-06-10 19:31:52,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.46 | bwd_microstep: 825.29 | bwd_inner_microstep: 825.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3943
[2024-06-10 19:31:55,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.52 | bwd_microstep: 1694.94 | bwd_inner_microstep: 1694.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 19:31:56,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1251.55 | bwd_inner_microstep: 1251.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681
[2024-06-10 19:31:58,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.90 | bwd_microstep: 1532.30 | bwd_inner_microstep: 1532.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1931
[2024-06-10 19:31:59,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.73 | bwd_microstep: 775.21 | bwd_inner_microstep: 775.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686
[2024-06-10 19:32:02,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.26 | bwd_microstep: 1722.59 | bwd_inner_microstep: 1722.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 19:32:04,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.36 | bwd_microstep: 1581.93 | bwd_inner_microstep: 1581.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2112
[2024-06-10 19:32:05,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.37 | bwd_microstep: 827.71 | bwd_inner_microstep: 827.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2783
[2024-06-10 19:32:07,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.89 | bwd_microstep: 1053.54 | bwd_inner_microstep: 1053.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2095
[2024-06-10 19:32:08,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.70 | bwd_microstep: 979.38 | bwd_inner_microstep: 979.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2126
[2024-06-10 19:32:09,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.65 | bwd_microstep: 801.21 | bwd_inner_microstep: 801.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2141
[2024-06-10 19:32:11,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.59 | bwd_microstep: 1024.76 | bwd_inner_microstep: 1024.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2007
[2024-06-10 19:32:12,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.76 | bwd_microstep: 709.81 | bwd_inner_microstep: 709.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-10 19:32:14,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.50 | bwd_microstep: 1638.38 | bwd_inner_microstep: 1638.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 19:32:16,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.18 | bwd_microstep: 1455.52 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 19:32:18,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1377.87 | bwd_inner_microstep: 1377.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551
[2024-06-10 19:32:19,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.75 | bwd_microstep: 1298.72 | bwd_inner_microstep: 1298.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 19:32:22,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1541.54 | bwd_inner_microstep: 1541.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 19:32:23,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.68 | bwd_microstep: 1351.54 | bwd_inner_microstep: 1351.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 19:32:25,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1251.82 | bwd_inner_microstep: 1251.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 19:32:27,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.61 | bwd_microstep: 1558.47 | bwd_inner_microstep: 1558.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 19:32:29,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.27 | bwd_microstep: 1546.50 | bwd_inner_microstep: 1546.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 19:32:36,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 19:32:36,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.61 | bwd_microstep: 5600.98 | bwd_inner_microstep: 1871.15 | bwd_allreduce_microstep: 3729.78 | step_microstep: 37.93
[2024-06-10 19:32:36,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15791.32 | bwd: 46276.97 | bwd_inner: 42546.28 | bwd_allreduce: 3730.01 | step: 39.38
{'loss': 1.2286, 'learning_rate': 1.2563583237981103e-05, 'epoch': 0.63}
 63%|██████▎   | 1087/1726 [18:50:04<10:49:38, 61.00s/it]
 63%|██████▎   | 1088/1726 [18:51:05<10:51:00, 61.22s/it]
 63%|██████▎   | 1089/1726 [18:52:07<10:51:47, 61.39s/it]
 63%|██████▎   | 1090/1726 [18:53:09<10:52:17, 61.54s/it]
 63%|██████▎   | 1091/1726 [18:54:10<10:49:20, 61.36s/it]
 63%|██████▎   | 1092/1726 [18:55:12<10:51:39, 61.67s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 19:32:38,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.80 | bwd_microstep: 1333.45 | bwd_inner_microstep: 1333.31 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 19:32:40,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1455.00 | bwd_inner_microstep: 1454.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 19:32:41,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.16 | bwd_microstep: 1289.55 | bwd_inner_microstep: 1289.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3899
[2024-06-10 19:32:43,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.34 | bwd_microstep: 1517.29 | bwd_inner_microstep: 1517.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 19:32:45,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.81 | bwd_microstep: 1247.55 | bwd_inner_microstep: 1247.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1941
[2024-06-10 19:32:46,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.08 | bwd_microstep: 820.62 | bwd_inner_microstep: 820.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 19:32:48,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1248.07 | bwd_inner_microstep: 1248.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2888
[2024-06-10 19:32:50,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.48 | bwd_microstep: 1087.08 | bwd_inner_microstep: 1087.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 19:32:51,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1277.80 | bwd_inner_microstep: 1277.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495
[2024-06-10 19:32:53,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1413.44 | bwd_inner_microstep: 1413.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3479
[2024-06-10 19:32:55,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1329.12 | bwd_inner_microstep: 1329.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436
[2024-06-10 19:32:57,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.44 | bwd_microstep: 1314.44 | bwd_inner_microstep: 1314.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2074
[2024-06-10 19:32:58,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.63 | bwd_microstep: 818.66 | bwd_inner_microstep: 818.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-10 19:33:00,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.91 | bwd_microstep: 1514.73 | bwd_inner_microstep: 1514.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2960
[2024-06-10 19:33:02,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.43 | bwd_microstep: 1103.57 | bwd_inner_microstep: 1103.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-10 19:33:04,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1448.05 | bwd_inner_microstep: 1448.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 19:33:06,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1382.64 | bwd_inner_microstep: 1382.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 19:33:07,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.20 | bwd_microstep: 1284.38 | bwd_inner_microstep: 1284.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3837
[2024-06-10 19:33:09,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.75 | bwd_microstep: 1293.57 | bwd_inner_microstep: 1293.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 19:33:11,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.98 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 19:33:13,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1406.30 | bwd_inner_microstep: 1406.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 19:33:15,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1377.05 | bwd_inner_microstep: 1377.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 19:33:17,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.91 | bwd_microstep: 1558.75 | bwd_inner_microstep: 1558.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 19:33:19,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 1251.91 | bwd_inner_microstep: 1251.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 19:33:21,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.66 | bwd_microstep: 1407.89 | bwd_inner_microstep: 1407.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3726
[2024-06-10 19:33:22,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.06 | bwd_microstep: 1336.48 | bwd_inner_microstep: 1336.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2846
[2024-06-10 19:33:24,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.90 | bwd_microstep: 1098.72 | bwd_inner_microstep: 1098.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3726
[2024-06-10 19:33:26,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1557.63 | bwd_inner_microstep: 1557.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3591
[2024-06-10 19:33:28,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.68 | bwd_microstep: 1336.90 | bwd_inner_microstep: 1336.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2083
[2024-06-10 19:33:29,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.02 | bwd_microstep: 821.42 | bwd_inner_microstep: 821.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-10 19:33:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.32 | bwd_microstep: 1300.29 | bwd_inner_microstep: 1300.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 19:33:36,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 19:33:36,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 4548.46 | bwd_inner_microstep: 1526.73 | bwd_allreduce_microstep: 3021.68 | step_microstep: 37.82
[2024-06-10 19:33:36,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15514.05 | bwd: 44464.20 | bwd_inner: 41441.49 | bwd_allreduce: 3021.97 | step: 39.38
{'loss': 1.2142, 'learning_rate': 1.2528753407340929e-05, 'epoch': 0.63}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 19:33:38,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.32 | bwd_microstep: 1368.21 | bwd_inner_microstep: 1368.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2452
[2024-06-10 19:33:39,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.59 | bwd_microstep: 1044.45 | bwd_inner_microstep: 1044.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4241
[2024-06-10 19:33:42,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.27 | bwd_microstep: 1662.09 | bwd_inner_microstep: 1662.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 19:33:44,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.33 | bwd_microstep: 1649.68 | bwd_inner_microstep: 1649.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 19:33:46,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1378.32 | bwd_inner_microstep: 1378.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 19:33:48,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1280.88 | bwd_inner_microstep: 1280.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191
[2024-06-10 19:33:49,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.72 | bwd_microstep: 951.28 | bwd_inner_microstep: 951.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4080
[2024-06-10 19:33:51,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1522.61 | bwd_inner_microstep: 1522.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 19:33:53,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.57 | bwd_microstep: 1385.15 | bwd_inner_microstep: 1385.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004
[2024-06-10 19:33:54,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.06 | bwd_microstep: 894.64 | bwd_inner_microstep: 894.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3510
[2024-06-10 19:33:56,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 1510.45 | bwd_inner_microstep: 1510.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 19:33:58,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1386.79 | bwd_inner_microstep: 1386.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 19:34:00,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.97 | bwd_microstep: 1452.10 | bwd_inner_microstep: 1452.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3657
[2024-06-10 19:34:03,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.96 | bwd_microstep: 1818.66 | bwd_inner_microstep: 1818.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-10 19:34:04,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.07 | bwd_microstep: 1298.34 | bwd_inner_microstep: 1298.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 19:34:06,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.64 | bwd_microstep: 1289.32 | bwd_inner_microstep: 1289.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3875
[2024-06-10 19:34:08,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.32 | bwd_microstep: 1490.28 | bwd_inner_microstep: 1490.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279
[2024-06-10 19:34:10,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.22 | bwd_microstep: 972.98 | bwd_inner_microstep: 972.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 19:34:12,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1394.69 | bwd_inner_microstep: 1394.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 19:34:14,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1538.61 | bwd_inner_microstep: 1538.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3853
[2024-06-10 19:34:15,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1272.42 | bwd_inner_microstep: 1272.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3680
[2024-06-10 19:34:17,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.70 | bwd_microstep: 1262.34 | bwd_inner_microstep: 1262.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912
[2024-06-10 19:34:18,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.26 | bwd_microstep: 685.81 | bwd_inner_microstep: 685.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 19:34:20,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1354.89 | bwd_inner_microstep: 1354.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 19:34:22,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1531.84 | bwd_inner_microstep: 1531.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 19:34:24,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.20 | bwd_microstep: 1646.90 | bwd_inner_microstep: 1646.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3614
[2024-06-10 19:34:27,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.57 | bwd_microstep: 1562.39 | bwd_inner_microstep: 1562.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819
[2024-06-10 19:34:29,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.27 | bwd_microstep: 1690.42 | bwd_inner_microstep: 1690.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583
[2024-06-10 19:34:31,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.27 | bwd_microstep: 1531.13 | bwd_inner_microstep: 1531.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466
[2024-06-10 19:34:33,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.03 | bwd_microstep: 1572.80 | bwd_inner_microstep: 1572.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-10 19:34:35,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1349.62 | bwd_inner_microstep: 1349.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2932
[2024-06-10 19:34:37,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-10 19:34:37,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.67 | bwd_microstep: 1941.14 | bwd_inner_microstep: 1298.11 | bwd_allreduce_microstep: 642.99 | step_microstep: 37.67
[2024-06-10 19:34:37,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16421.59 | bwd: 44691.24 | bwd_inner: 44047.35 | bwd_allreduce: 643.21 | step: 39.13
{'loss': 1.1726, 'learning_rate': 1.2493949890392418e-05, 'epoch': 0.63}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 19:34:40,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.63 | bwd_microstep: 1470.62 | bwd_inner_microstep: 1470.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 19:34:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1475.34 | bwd_inner_microstep: 1475.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 19:34:43,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.02 | bwd_microstep: 1378.45 | bwd_inner_microstep: 1378.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 19:34:45,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.65 | bwd_microstep: 791.12 | bwd_inner_microstep: 791.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 19:34:46,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.97 | bwd_microstep: 695.96 | bwd_inner_microstep: 695.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 19:34:47,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1247.96 | bwd_inner_microstep: 1247.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 19:34:49,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.70 | bwd_microstep: 1279.10 | bwd_inner_microstep: 1279.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1957
[2024-06-10 19:34:50,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.02 | bwd_microstep: 730.84 | bwd_inner_microstep: 730.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 19:34:52,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1509.69 | bwd_inner_microstep: 1509.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 19:34:54,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.83 | bwd_microstep: 1155.77 | bwd_inner_microstep: 1155.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 19:34:56,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1341.23 | bwd_inner_microstep: 1341.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3092
[2024-06-10 19:34:57,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.32 | bwd_microstep: 1298.80 | bwd_inner_microstep: 1298.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3488
[2024-06-10 19:34:59,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1393.04 | bwd_inner_microstep: 1393.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 19:35:01,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.50 | bwd_microstep: 1478.23 | bwd_inner_microstep: 1478.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-10 19:35:03,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1192.82 | bwd_inner_microstep: 1192.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3824
[2024-06-10 19:35:05,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1419.93 | bwd_inner_microstep: 1419.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-10 19:35:07,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.95 | bwd_microstep: 1189.72 | bwd_inner_microstep: 1189.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 19:35:08,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.95 | bwd_microstep: 795.97 | bwd_inner_microstep: 795.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3519
[2024-06-10 19:35:09,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.85 | bwd_microstep: 1222.78 | bwd_inner_microstep: 1222.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-10 19:35:11,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.57 | bwd_microstep: 1192.24 | bwd_inner_microstep: 1192.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 19:35:13,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.29 | bwd_microstep: 1254.21 | bwd_inner_microstep: 1254.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3618
[2024-06-10 19:35:15,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.70 | bwd_microstep: 1466.57 | bwd_inner_microstep: 1466.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 19:35:17,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1553.82 | bwd_inner_microstep: 1553.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 19:35:19,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.48 | bwd_microstep: 1495.73 | bwd_inner_microstep: 1495.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 19:35:21,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.86 | bwd_microstep: 1253.11 | bwd_inner_microstep: 1253.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3593
[2024-06-10 19:35:23,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.84 | bwd_microstep: 1365.68 | bwd_inner_microstep: 1365.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3591
[2024-06-10 19:35:25,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.14 | bwd_microstep: 1458.04 | bwd_inner_microstep: 1458.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3814
[2024-06-10 19:35:27,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.96 | bwd_microstep: 1510.36 | bwd_inner_microstep: 1510.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2673
[2024-06-10 19:35:28,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.78 | bwd_microstep: 1116.28 | bwd_inner_microstep: 1116.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 19:35:30,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.38 | bwd_microstep: 1354.95 | bwd_inner_microstep: 1354.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3446
[2024-06-10 19:35:32,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.29 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 19:35:39,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 19:35:39,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.95 | bwd_microstep: 6078.94 | bwd_inner_microstep: 1570.34 | bwd_allreduce_microstep: 4508.54 | step_microstep: 38.16
[2024-06-10 19:35:39,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15385.04 | bwd: 45566.09 | bwd_inner: 41056.64 | bwd_allreduce: 4508.77 | step: 39.61
{'loss': 1.1916, 'learning_rate': 1.245917280971337e-05, 'epoch': 0.63}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 19:35:41,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1472.78 | bwd_inner_microstep: 1472.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 19:35:43,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1379.97 | bwd_inner_microstep: 1379.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 19:35:44,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.58 | bwd_microstep: 1275.86 | bwd_inner_microstep: 1275.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899
[2024-06-10 19:35:47,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.58 | bwd_microstep: 1584.98 | bwd_inner_microstep: 1584.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 19:35:49,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1380.99 | bwd_inner_microstep: 1380.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 19:35:50,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.71 | bwd_microstep: 1281.49 | bwd_inner_microstep: 1281.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 19:35:52,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.51 | bwd_microstep: 1314.25 | bwd_inner_microstep: 1314.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 19:35:54,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1251.68 | bwd_inner_microstep: 1251.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 19:35:56,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.55 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 19:35:58,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.87 | bwd_microstep: 1284.80 | bwd_inner_microstep: 1284.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 19:36:00,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.97 | bwd_microstep: 1396.28 | bwd_inner_microstep: 1396.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 19:36:01,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1396.63 | bwd_inner_microstep: 1396.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4056
[2024-06-10 19:36:04,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.77 | bwd_microstep: 1724.50 | bwd_inner_microstep: 1724.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433
[2024-06-10 19:36:05,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.18 | bwd_microstep: 1184.50 | bwd_inner_microstep: 1184.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 19:36:07,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1257.96 | bwd_inner_microstep: 1257.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3825
[2024-06-10 19:36:10,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.80 | bwd_microstep: 1751.32 | bwd_inner_microstep: 1751.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 19:36:11,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.79 | bwd_microstep: 796.39 | bwd_inner_microstep: 796.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-10 19:36:13,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.31 | bwd_microstep: 1595.26 | bwd_inner_microstep: 1595.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 19:36:15,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1287.46 | bwd_inner_microstep: 1287.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 19:36:16,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 19:36:18,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.19 | bwd_microstep: 1391.00 | bwd_inner_microstep: 1390.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 19:36:20,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1393.51 | bwd_inner_microstep: 1393.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3655
[2024-06-10 19:36:23,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.59 | bwd_microstep: 1655.04 | bwd_inner_microstep: 1655.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942
[2024-06-10 19:36:24,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 700.92 | bwd_inner_microstep: 700.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 19:36:25,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1257.54 | bwd_inner_microstep: 1257.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 19:36:27,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-10 19:36:29,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.98 | bwd_microstep: 1547.43 | bwd_inner_microstep: 1547.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3553
[2024-06-10 19:36:31,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.44 | bwd_microstep: 1459.85 | bwd_inner_microstep: 1459.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-10 19:36:34,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.88 | bwd_microstep: 1639.40 | bwd_inner_microstep: 1639.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2929
[2024-06-10 19:36:35,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.19 | bwd_microstep: 1225.44 | bwd_inner_microstep: 1225.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-10 19:36:37,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.04 | bwd_microstep: 1308.24 | bwd_inner_microstep: 1308.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3876
[2024-06-10 19:36:39,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-10 19:36:39,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.10 | bwd_microstep: 1521.89 | bwd_inner_microstep: 1514.24 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.53
[2024-06-10 19:36:39,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16414.08 | bwd: 43774.75 | bwd_inner: 43766.25 | bwd_allreduce: 7.84 | step: 39.02
{'loss': 1.1956, 'learning_rate': 1.242442228778848e-05, 'epoch': 0.63}
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1953
[2024-06-10 19:36:40,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.46 | bwd_microstep: 802.35 | bwd_inner_microstep: 802.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3879
[2024-06-10 19:36:43,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.50 | bwd_microstep: 1544.44 | bwd_inner_microstep: 1544.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 19:36:44,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1380.42 | bwd_inner_microstep: 1380.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 19:36:46,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1480.32 | bwd_inner_microstep: 1480.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 19:36:49,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.44 | bwd_microstep: 1548.40 | bwd_inner_microstep: 1548.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 19:36:50,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.66 | bwd_microstep: 790.86 | bwd_inner_microstep: 790.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 19:36:52,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 1529.72 | bwd_inner_microstep: 1529.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 19:36:53,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.38 | bwd_microstep: 788.76 | bwd_inner_microstep: 788.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 19:36:54,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.57 | bwd_microstep: 794.58 | bwd_inner_microstep: 794.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3690
[2024-06-10 19:36:56,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.28 | bwd_microstep: 1485.59 | bwd_inner_microstep: 1485.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4114
[2024-06-10 19:36:58,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.28 | bwd_microstep: 1668.21 | bwd_inner_microstep: 1668.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540
[2024-06-10 19:37:00,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.40 | bwd_microstep: 1324.72 | bwd_inner_microstep: 1324.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 19:37:02,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1379.55 | bwd_inner_microstep: 1379.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3936
[2024-06-10 19:37:04,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.14 | bwd_microstep: 1688.63 | bwd_inner_microstep: 1688.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 19:37:06,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1244.65 | bwd_inner_microstep: 1244.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3952
[2024-06-10 19:37:08,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1503.54 | bwd_inner_microstep: 1503.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522
[2024-06-10 19:37:10,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.93 | bwd_microstep: 1422.61 | bwd_inner_microstep: 1422.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 19:37:12,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.32 | bwd_microstep: 1381.37 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 19:37:14,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.76 | bwd_microstep: 1508.86 | bwd_inner_microstep: 1508.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 623
[2024-06-10 19:37:15,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.30 | bwd_microstep: 265.08 | bwd_inner_microstep: 265.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 618
[2024-06-10 19:37:15,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 101.61 | bwd_microstep: 261.40 | bwd_inner_microstep: 261.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3459
[2024-06-10 19:37:17,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.67 | bwd_microstep: 1505.09 | bwd_inner_microstep: 1505.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 19:37:19,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.73 | bwd_microstep: 1518.09 | bwd_inner_microstep: 1518.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 19:37:21,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1249.64 | bwd_inner_microstep: 1249.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 19:37:23,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.69 | bwd_microstep: 1485.43 | bwd_inner_microstep: 1485.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3857
[2024-06-10 19:37:25,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1363.32 | bwd_inner_microstep: 1363.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2006
[2024-06-10 19:37:26,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.98 | bwd_microstep: 709.53 | bwd_inner_microstep: 709.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4055
[2024-06-10 19:37:28,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.11 | bwd_microstep: 1458.55 | bwd_inner_microstep: 1458.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 19:37:30,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 19:37:32,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.83 | bwd_microstep: 1311.09 | bwd_inner_microstep: 1311.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 19:37:33,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1378.19 | bwd_inner_microstep: 1378.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2628
[2024-06-10 19:37:40,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.20 | optimizer_step: 6.57
[2024-06-10 19:37:40,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.84 | bwd_microstep: 6535.03 | bwd_inner_microstep: 1256.26 | bwd_allreduce_microstep: 5278.72 | step_microstep: 37.78
[2024-06-10 19:37:40,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15124.88 | bwd: 45703.22 | bwd_inner: 40423.58 | bwd_allreduce: 5278.96 | step: 39.27
{'loss': 1.2249, 'learning_rate': 1.2389698447008916e-05, 'epoch': 0.64}
 63%|██████▎   | 1093/1726 [18:56:13<10:46:19, 61.26s/it]
 63%|██████▎   | 1094/1726 [18:57:14<10:45:52, 61.32s/it]
 63%|██████▎   | 1095/1726 [18:58:16<10:44:42, 61.30s/it]
 63%|██████▎   | 1096/1726 [18:59:16<10:41:12, 61.07s/it]
 64%|██████▎   | 1097/1726 [19:00:17<10:40:27, 61.09s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1935
[2024-06-10 19:37:42,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.90 | bwd_microstep: 877.64 | bwd_inner_microstep: 877.55 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 19:37:44,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1409.28 | bwd_inner_microstep: 1409.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-10 19:37:46,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.36 | bwd_microstep: 1581.21 | bwd_inner_microstep: 1581.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2293
[2024-06-10 19:37:47,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.96 | bwd_microstep: 784.05 | bwd_inner_microstep: 784.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 19:37:49,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1281.76 | bwd_inner_microstep: 1281.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3799
[2024-06-10 19:37:51,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.52 | bwd_microstep: 1512.28 | bwd_inner_microstep: 1512.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 19:37:53,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.46 | bwd_microstep: 1405.28 | bwd_inner_microstep: 1405.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2425
[2024-06-10 19:37:54,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.99 | bwd_microstep: 841.91 | bwd_inner_microstep: 841.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839
[2024-06-10 19:37:56,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1559.18 | bwd_inner_microstep: 1559.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2154
[2024-06-10 19:37:57,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.22 | bwd_microstep: 909.92 | bwd_inner_microstep: 909.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 19:37:58,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.10 | bwd_microstep: 792.76 | bwd_inner_microstep: 792.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 19:38:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.87 | bwd_microstep: 1412.92 | bwd_inner_microstep: 1412.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3498
[2024-06-10 19:38:02,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.01 | bwd_microstep: 1331.56 | bwd_inner_microstep: 1331.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 19:38:03,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.27 | bwd_microstep: 888.72 | bwd_inner_microstep: 888.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 19:38:05,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.85 | bwd_microstep: 1482.76 | bwd_inner_microstep: 1482.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 19:38:07,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1376.50 | bwd_inner_microstep: 1376.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2287
[2024-06-10 19:38:09,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.12 | bwd_microstep: 1072.67 | bwd_inner_microstep: 1072.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 19:38:11,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.62 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 19:38:12,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.12 | bwd_microstep: 1387.15 | bwd_inner_microstep: 1387.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 19:38:14,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1297.16 | bwd_inner_microstep: 1297.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 19:38:16,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.19 | bwd_microstep: 1418.79 | bwd_inner_microstep: 1418.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 19:38:18,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.36 | bwd_microstep: 977.89 | bwd_inner_microstep: 977.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3607
[2024-06-10 19:38:20,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.42 | bwd_microstep: 1439.20 | bwd_inner_microstep: 1439.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 19:38:22,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.13 | bwd_microstep: 1488.63 | bwd_inner_microstep: 1488.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 19:38:23,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.09 | bwd_microstep: 1296.98 | bwd_inner_microstep: 1296.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3546
[2024-06-10 19:38:25,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.56 | bwd_microstep: 1375.56 | bwd_inner_microstep: 1375.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 19:38:27,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.49 | bwd_microstep: 1394.38 | bwd_inner_microstep: 1394.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 19:38:29,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1389.90 | bwd_inner_microstep: 1389.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3755
[2024-06-10 19:38:31,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1500.50 | bwd_inner_microstep: 1500.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3591
[2024-06-10 19:38:34,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.92 | bwd_microstep: 1702.15 | bwd_inner_microstep: 1702.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3565
[2024-06-10 19:38:36,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.62 | bwd_microstep: 1544.55 | bwd_inner_microstep: 1544.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 19:38:42,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.29 | optimizer_step: 6.63
[2024-06-10 19:38:42,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.15 | bwd_microstep: 5213.26 | bwd_inner_microstep: 1552.63 | bwd_allreduce_microstep: 3660.57 | step_microstep: 38.08
[2024-06-10 19:38:42,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15515.35 | bwd: 45229.62 | bwd_inner: 41568.07 | bwd_allreduce: 3660.84 | step: 39.63
{'loss': 1.2006, 'learning_rate': 1.2355001409671856e-05, 'epoch': 0.64}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 19:38:44,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.47 | bwd_microstep: 1473.90 | bwd_inner_microstep: 1473.73 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 19:38:45,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.15 | bwd_microstep: 1273.94 | bwd_inner_microstep: 1273.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3848
[2024-06-10 19:38:47,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.04 | bwd_microstep: 1460.96 | bwd_inner_microstep: 1460.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3484
[2024-06-10 19:38:49,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.64 | bwd_microstep: 1247.58 | bwd_inner_microstep: 1247.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882
[2024-06-10 19:38:50,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.38 | bwd_microstep: 679.66 | bwd_inner_microstep: 679.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 19:38:52,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1248.26 | bwd_inner_microstep: 1248.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4013
[2024-06-10 19:38:54,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.91 | bwd_microstep: 1619.06 | bwd_inner_microstep: 1619.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746
[2024-06-10 19:38:56,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.96 | bwd_microstep: 1442.97 | bwd_inner_microstep: 1442.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505
[2024-06-10 19:38:58,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.03 | bwd_microstep: 1319.29 | bwd_inner_microstep: 1319.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3501
[2024-06-10 19:39:00,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.66 | bwd_microstep: 1417.14 | bwd_inner_microstep: 1417.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3920
[2024-06-10 19:39:02,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.04 | bwd_microstep: 1584.63 | bwd_inner_microstep: 1584.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2621
[2024-06-10 19:39:04,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 424.46 | bwd_microstep: 1143.40 | bwd_inner_microstep: 1143.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 19:39:05,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1346.93 | bwd_inner_microstep: 1346.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3838
[2024-06-10 19:39:07,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.40 | bwd_microstep: 1391.20 | bwd_inner_microstep: 1391.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 19:39:09,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1392.45 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 19:39:11,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1280.80 | bwd_inner_microstep: 1280.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 19:39:13,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1500.64 | bwd_inner_microstep: 1500.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3682
[2024-06-10 19:39:15,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.78 | bwd_microstep: 1423.58 | bwd_inner_microstep: 1423.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 19:39:17,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.62 | bwd_microstep: 1611.90 | bwd_inner_microstep: 1611.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998
[2024-06-10 19:39:18,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.07 | bwd_microstep: 708.01 | bwd_inner_microstep: 707.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 19:39:20,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.49 | bwd_microstep: 1390.87 | bwd_inner_microstep: 1390.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063
[2024-06-10 19:39:21,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.30 | bwd_microstep: 913.83 | bwd_inner_microstep: 913.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443
[2024-06-10 19:39:23,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.29 | bwd_microstep: 1285.11 | bwd_inner_microstep: 1285.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 19:39:25,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1556.55 | bwd_inner_microstep: 1556.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3429
[2024-06-10 19:39:27,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.49 | bwd_microstep: 1379.78 | bwd_inner_microstep: 1379.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3427
[2024-06-10 19:39:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1409.14 | bwd_inner_microstep: 1409.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-10 19:39:31,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.08 | bwd_microstep: 1337.80 | bwd_inner_microstep: 1337.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 609
[2024-06-10 19:39:31,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.65 | bwd_microstep: 259.98 | bwd_inner_microstep: 259.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 19:39:34,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.24 | bwd_microstep: 1600.95 | bwd_inner_microstep: 1600.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 19:39:36,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.96 | bwd_microstep: 1757.16 | bwd_inner_microstep: 1757.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3606
[2024-06-10 19:39:38,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.13 | bwd_microstep: 1470.44 | bwd_inner_microstep: 1470.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 19:39:42,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 19:39:42,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 3518.85 | bwd_inner_microstep: 1579.26 | bwd_allreduce_microstep: 1939.54 | step_microstep: 37.75
[2024-06-10 19:39:42,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15842.83 | bwd: 44446.77 | bwd_inner: 42506.19 | bwd_allreduce: 1939.84 | step: 39.33
{'loss': 1.1916, 'learning_rate': 1.2320331297980097e-05, 'epoch': 0.64}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3468
[2024-06-10 19:39:44,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1398.69 | bwd_inner_microstep: 1398.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3913
[2024-06-10 19:39:46,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1490.02 | bwd_inner_microstep: 1489.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 19:39:48,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.49 | bwd_microstep: 1374.38 | bwd_inner_microstep: 1374.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 19:39:50,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.06 | bwd_microstep: 1296.54 | bwd_inner_microstep: 1296.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783
[2024-06-10 19:39:52,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1647.02 | bwd_inner_microstep: 1646.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 19:39:54,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.03 | bwd_microstep: 1180.92 | bwd_inner_microstep: 1180.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 19:39:55,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1244.70 | bwd_inner_microstep: 1244.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3705
[2024-06-10 19:39:57,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1327.14 | bwd_inner_microstep: 1327.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 19:39:59,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.85 | bwd_microstep: 1388.61 | bwd_inner_microstep: 1388.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 19:40:01,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1247.59 | bwd_inner_microstep: 1247.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2214
[2024-06-10 19:40:02,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.62 | bwd_microstep: 987.14 | bwd_inner_microstep: 987.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3494
[2024-06-10 19:40:04,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.86 | bwd_microstep: 1514.19 | bwd_inner_microstep: 1514.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 19:40:07,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.29 | bwd_microstep: 1525.57 | bwd_inner_microstep: 1525.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 19:40:09,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1467.14 | bwd_inner_microstep: 1467.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3647
[2024-06-10 19:40:11,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.46 | bwd_microstep: 1535.21 | bwd_inner_microstep: 1535.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446
[2024-06-10 19:40:12,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.83 | bwd_microstep: 1281.13 | bwd_inner_microstep: 1281.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988
[2024-06-10 19:40:14,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.76 | bwd_microstep: 833.20 | bwd_inner_microstep: 833.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 19:40:15,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 798.42 | bwd_inner_microstep: 798.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 19:40:16,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.70 | bwd_microstep: 1294.27 | bwd_inner_microstep: 1294.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-10 19:40:18,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1412.92 | bwd_inner_microstep: 1412.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3564
[2024-06-10 19:40:20,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.99 | bwd_microstep: 1447.49 | bwd_inner_microstep: 1447.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 19:40:22,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.05 | bwd_microstep: 1392.98 | bwd_inner_microstep: 1392.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546
[2024-06-10 19:40:24,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.40 | bwd_microstep: 1200.06 | bwd_inner_microstep: 1200.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 19:40:26,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1412.78 | bwd_inner_microstep: 1412.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 19:40:28,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1390.80 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812
[2024-06-10 19:40:30,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.84 | bwd_microstep: 1617.15 | bwd_inner_microstep: 1617.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3533
[2024-06-10 19:40:32,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.03 | bwd_microstep: 1453.94 | bwd_inner_microstep: 1453.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 19:40:34,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1284.97 | bwd_inner_microstep: 1284.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3591
[2024-06-10 19:40:36,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1438.34 | bwd_inner_microstep: 1438.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3556
[2024-06-10 19:40:38,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.09 | bwd_microstep: 1358.12 | bwd_inner_microstep: 1358.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2917
[2024-06-10 19:40:39,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.25 | bwd_microstep: 1190.05 | bwd_inner_microstep: 1190.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 19:40:45,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.28 | optimizer_step: 6.63
[2024-06-10 19:40:45,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.62 | bwd_microstep: 5416.75 | bwd_inner_microstep: 1867.45 | bwd_allreduce_microstep: 3549.25 | step_microstep: 38.97
[2024-06-10 19:40:46,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16166.95 | bwd: 46848.27 | bwd_inner: 43298.10 | bwd_allreduce: 3549.48 | step: 40.53
{'loss': 1.1933, 'learning_rate': 1.2285688234041575e-05, 'epoch': 0.64}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 19:40:48,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.66 | bwd_microstep: 1488.70 | bwd_inner_microstep: 1488.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 19:40:49,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1343.18 | bwd_inner_microstep: 1343.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 19:40:51,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.21 | bwd_microstep: 1310.77 | bwd_inner_microstep: 1310.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875
[2024-06-10 19:40:53,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.83 | bwd_microstep: 1581.59 | bwd_inner_microstep: 1581.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 19:40:56,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1552.37 | bwd_inner_microstep: 1552.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753
[2024-06-10 19:40:58,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.91 | bwd_microstep: 1640.79 | bwd_inner_microstep: 1640.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 19:40:59,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.75 | bwd_microstep: 1149.41 | bwd_inner_microstep: 1149.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1892
[2024-06-10 19:41:00,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.43 | bwd_microstep: 683.78 | bwd_inner_microstep: 683.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 19:41:02,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.08 | bwd_microstep: 1284.53 | bwd_inner_microstep: 1284.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 19:41:04,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1246.60 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 19:41:05,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.81 | bwd_microstep: 1154.66 | bwd_inner_microstep: 1154.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 19:41:07,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1394.65 | bwd_inner_microstep: 1394.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1981
[2024-06-10 19:41:09,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.47 | bwd_microstep: 830.55 | bwd_inner_microstep: 830.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-10 19:41:11,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.33 | bwd_microstep: 1530.51 | bwd_inner_microstep: 1530.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2088
[2024-06-10 19:41:12,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.38 | bwd_microstep: 919.95 | bwd_inner_microstep: 919.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 19:41:14,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.81 | bwd_microstep: 1612.70 | bwd_inner_microstep: 1612.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 19:41:16,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1471.92 | bwd_inner_microstep: 1471.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3643
[2024-06-10 19:41:18,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.40 | bwd_microstep: 1472.81 | bwd_inner_microstep: 1472.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535
[2024-06-10 19:41:20,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1421.60 | bwd_inner_microstep: 1421.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3583
[2024-06-10 19:41:22,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.81 | bwd_microstep: 1208.30 | bwd_inner_microstep: 1208.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654
[2024-06-10 19:41:24,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.25 | bwd_microstep: 1613.97 | bwd_inner_microstep: 1613.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 19:41:26,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.38 | bwd_microstep: 1485.45 | bwd_inner_microstep: 1485.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3617
[2024-06-10 19:41:28,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.10 | bwd_microstep: 1341.77 | bwd_inner_microstep: 1341.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2344
[2024-06-10 19:41:29,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.41 | bwd_microstep: 987.67 | bwd_inner_microstep: 987.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 19:41:31,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.83 | bwd_microstep: 1277.11 | bwd_inner_microstep: 1277.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 19:41:33,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1298.79 | bwd_inner_microstep: 1298.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3693
[2024-06-10 19:41:35,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.90 | bwd_microstep: 1453.67 | bwd_inner_microstep: 1453.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-10 19:41:37,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.10 | bwd_microstep: 1409.86 | bwd_inner_microstep: 1409.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 19:41:39,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1412.57 | bwd_inner_microstep: 1412.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 19:41:41,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1497.49 | bwd_inner_microstep: 1497.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2073
[2024-06-10 19:41:42,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.44 | bwd_microstep: 1010.23 | bwd_inner_microstep: 1010.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-10 19:41:46,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 19:41:46,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.16 | bwd_microstep: 3351.36 | bwd_inner_microstep: 1458.82 | bwd_allreduce_microstep: 1892.48 | step_microstep: 37.90
[2024-06-10 19:41:46,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15890.37 | bwd: 44439.31 | bwd_inner: 42545.92 | bwd_allreduce: 1892.71 | step: 39.51
{'loss': 1.2369, 'learning_rate': 1.2251072339868997e-05, 'epoch': 0.64}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 19:41:48,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.29 | bwd_microstep: 1332.14 | bwd_inner_microstep: 1332.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3391
[2024-06-10 19:41:50,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.40 | bwd_microstep: 1143.95 | bwd_inner_microstep: 1143.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2310
[2024-06-10 19:41:51,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.00 | bwd_microstep: 979.78 | bwd_inner_microstep: 979.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2321
[2024-06-10 19:41:52,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.49 | bwd_microstep: 820.57 | bwd_inner_microstep: 820.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793
[2024-06-10 19:41:54,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.62 | bwd_microstep: 1646.01 | bwd_inner_microstep: 1645.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-10 19:41:56,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.44 | bwd_microstep: 1451.73 | bwd_inner_microstep: 1451.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484
[2024-06-10 19:41:58,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1410.81 | bwd_inner_microstep: 1410.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 19:41:59,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.92 | bwd_microstep: 678.99 | bwd_inner_microstep: 678.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 19:42:00,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.71 | bwd_microstep: 804.26 | bwd_inner_microstep: 804.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 19:42:02,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.35 | bwd_microstep: 1317.55 | bwd_inner_microstep: 1317.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670
[2024-06-10 19:42:04,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.59 | bwd_microstep: 1512.33 | bwd_inner_microstep: 1512.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2906
[2024-06-10 19:42:06,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.95 | bwd_microstep: 1160.16 | bwd_inner_microstep: 1160.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3686
[2024-06-10 19:42:08,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.62 | bwd_microstep: 1328.65 | bwd_inner_microstep: 1328.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967
[2024-06-10 19:42:09,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.87 | bwd_microstep: 889.44 | bwd_inner_microstep: 889.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3401
[2024-06-10 19:42:11,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.71 | bwd_microstep: 1197.48 | bwd_inner_microstep: 1197.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 19:42:12,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1246.18 | bwd_inner_microstep: 1246.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655
[2024-06-10 19:42:14,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.26 | bwd_microstep: 1417.14 | bwd_inner_microstep: 1417.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 19:42:16,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1398.11 | bwd_inner_microstep: 1398.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1395
[2024-06-10 19:42:17,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.36 | bwd_microstep: 527.47 | bwd_inner_microstep: 527.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 19:42:19,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1507.23 | bwd_inner_microstep: 1507.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297
[2024-06-10 19:42:20,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.18 | bwd_microstep: 880.98 | bwd_inner_microstep: 880.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 19:42:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1515.03 | bwd_inner_microstep: 1515.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 19:42:25,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.36 | bwd_microstep: 1655.40 | bwd_inner_microstep: 1655.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 19:42:26,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.35 | bwd_microstep: 1255.74 | bwd_inner_microstep: 1255.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 19:42:28,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.77 | bwd_microstep: 1402.13 | bwd_inner_microstep: 1402.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025
[2024-06-10 19:42:29,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.01 | bwd_microstep: 811.30 | bwd_inner_microstep: 811.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3864
[2024-06-10 19:42:31,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.40 | bwd_microstep: 1465.22 | bwd_inner_microstep: 1465.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762
[2024-06-10 19:42:34,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.29 | bwd_microstep: 1474.47 | bwd_inner_microstep: 1474.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3575
[2024-06-10 19:42:35,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.50 | bwd_microstep: 1236.22 | bwd_inner_microstep: 1236.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240
[2024-06-10 19:42:37,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.02 | bwd_microstep: 964.56 | bwd_inner_microstep: 964.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2247
[2024-06-10 19:42:38,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.33 | bwd_microstep: 902.17 | bwd_inner_microstep: 902.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3593
[2024-06-10 19:42:48,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.10 | optimizer_step: 6.61
[2024-06-10 19:42:48,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.84 | bwd_microstep: 9164.91 | bwd_inner_microstep: 1874.73 | bwd_allreduce_microstep: 7290.12 | step_microstep: 37.99
[2024-06-10 19:42:48,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14566.33 | bwd: 46498.11 | bwd_inner: 39207.09 | bwd_allreduce: 7290.36 | step: 39.45
{'loss': 1.2101, 'learning_rate': 1.221648373737935e-05, 'epoch': 0.64}
 64%|██████▎   | 1098/1726 [19:01:18<10:39:23, 61.09s/it]
 64%|██████▎   | 1099/1726 [19:02:19<10:36:56, 60.95s/it]
 64%|██████▎   | 1100/1726 [19:03:22<10:43:27, 61.67s/it]
 64%|██████▍   | 1101/1726 [19:04:23<10:39:16, 61.37s/it]
 64%|██████▍   | 1102/1726 [19:05:24<10:38:18, 61.38s/it]
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960
[2024-06-10 19:42:49,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.33 | bwd_microstep: 698.60 | bwd_inner_microstep: 698.47 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4157
[2024-06-10 19:42:51,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.03 | bwd_microstep: 1638.11 | bwd_inner_microstep: 1638.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841
[2024-06-10 19:42:53,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.02 | bwd_microstep: 1653.06 | bwd_inner_microstep: 1653.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297
[2024-06-10 19:42:54,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.67 | bwd_microstep: 971.67 | bwd_inner_microstep: 971.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 19:42:56,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.33 | bwd_microstep: 1341.29 | bwd_inner_microstep: 1341.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 19:42:58,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.56 | bwd_microstep: 1274.98 | bwd_inner_microstep: 1274.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 19:43:00,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.81 | bwd_microstep: 1146.98 | bwd_inner_microstep: 1146.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754
[2024-06-10 19:43:02,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.75 | bwd_microstep: 1537.99 | bwd_inner_microstep: 1537.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2051
[2024-06-10 19:43:03,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.26 | bwd_microstep: 863.80 | bwd_inner_microstep: 863.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1910
[2024-06-10 19:43:04,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.49 | bwd_microstep: 748.76 | bwd_inner_microstep: 748.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 19:43:06,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.83 | bwd_microstep: 1378.59 | bwd_inner_microstep: 1378.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462
[2024-06-10 19:43:08,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.88 | bwd_microstep: 1566.01 | bwd_inner_microstep: 1565.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3502
[2024-06-10 19:43:10,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1510.07 | bwd_inner_microstep: 1510.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3489
[2024-06-10 19:43:12,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.39 | bwd_microstep: 1581.55 | bwd_inner_microstep: 1581.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 19:43:15,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.36 | bwd_microstep: 1647.89 | bwd_inner_microstep: 1647.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 19:43:17,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.82 | bwd_microstep: 1500.92 | bwd_inner_microstep: 1500.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 19:43:19,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1510.04 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-10 19:43:21,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1349.82 | bwd_inner_microstep: 1349.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680
[2024-06-10 19:43:23,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.94 | bwd_microstep: 1626.12 | bwd_inner_microstep: 1626.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 19:43:25,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1558.86 | bwd_inner_microstep: 1558.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3454
[2024-06-10 19:43:27,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.81 | bwd_microstep: 1219.86 | bwd_inner_microstep: 1219.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 19:43:29,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.54 | bwd_microstep: 1554.06 | bwd_inner_microstep: 1554.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 19:43:31,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.39 | bwd_microstep: 1508.74 | bwd_inner_microstep: 1508.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3425
[2024-06-10 19:43:33,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.39 | bwd_microstep: 1211.37 | bwd_inner_microstep: 1211.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3545
[2024-06-10 19:43:35,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.80 | bwd_microstep: 1585.12 | bwd_inner_microstep: 1585.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3814
[2024-06-10 19:43:37,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.31 | bwd_microstep: 1716.80 | bwd_inner_microstep: 1716.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-10 19:43:39,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.18 | bwd_microstep: 1584.54 | bwd_inner_microstep: 1584.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2620
[2024-06-10 19:43:41,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.64 | bwd_microstep: 1111.19 | bwd_inner_microstep: 1111.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 19:43:43,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.31 | bwd_microstep: 1302.12 | bwd_inner_microstep: 1302.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3606
[2024-06-10 19:43:44,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1338.58 | bwd_inner_microstep: 1338.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3568
[2024-06-10 19:43:46,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.30 | bwd_microstep: 1449.55 | bwd_inner_microstep: 1449.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960
[2024-06-10 19:43:48,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 19:43:48,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.29 | bwd_microstep: 1129.57 | bwd_inner_microstep: 740.72 | bwd_allreduce_microstep: 388.80 | step_microstep: 37.69
[2024-06-10 19:43:48,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16173.06 | bwd: 43816.62 | bwd_inner: 43426.83 | bwd_allreduce: 389.06 | step: 39.24
{'loss': 1.1748, 'learning_rate': 1.2181922548393519e-05, 'epoch': 0.64}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-10 19:43:50,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1442.16 | bwd_inner_microstep: 1442.07 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 19:43:52,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.52 | bwd_microstep: 1478.34 | bwd_inner_microstep: 1478.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 19:43:54,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.39 | bwd_microstep: 1483.38 | bwd_inner_microstep: 1483.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 19:43:56,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.57 | bwd_microstep: 1476.56 | bwd_inner_microstep: 1476.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 19:43:58,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1383.46 | bwd_inner_microstep: 1383.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 19:44:00,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1479.92 | bwd_inner_microstep: 1479.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 19:44:02,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1636.58 | bwd_inner_microstep: 1636.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868
[2024-06-10 19:44:03,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.56 | bwd_microstep: 708.80 | bwd_inner_microstep: 708.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 19:44:05,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1283.91 | bwd_inner_microstep: 1283.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3408
[2024-06-10 19:44:07,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1305.51 | bwd_inner_microstep: 1305.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 19:44:09,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.44 | bwd_microstep: 1442.86 | bwd_inner_microstep: 1442.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655
[2024-06-10 19:44:11,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.39 | bwd_microstep: 1612.38 | bwd_inner_microstep: 1612.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3689
[2024-06-10 19:44:13,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.38 | bwd_microstep: 1423.72 | bwd_inner_microstep: 1423.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-10 19:44:15,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.27 | bwd_microstep: 1423.46 | bwd_inner_microstep: 1423.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3461
[2024-06-10 19:44:17,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.38 | bwd_microstep: 1246.96 | bwd_inner_microstep: 1246.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 19:44:19,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.04 | bwd_microstep: 1425.50 | bwd_inner_microstep: 1425.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2088
[2024-06-10 19:44:20,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.56 | bwd_microstep: 727.48 | bwd_inner_microstep: 727.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-10 19:44:22,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.19 | bwd_microstep: 1511.24 | bwd_inner_microstep: 1511.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2088
[2024-06-10 19:44:23,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.22 | bwd_microstep: 791.12 | bwd_inner_microstep: 791.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 19:44:25,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.76 | bwd_microstep: 1658.38 | bwd_inner_microstep: 1658.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 19:44:26,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.86 | bwd_microstep: 977.35 | bwd_inner_microstep: 977.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 19:44:29,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.16 | bwd_microstep: 1524.48 | bwd_inner_microstep: 1524.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3824
[2024-06-10 19:44:30,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.57 | bwd_microstep: 1293.47 | bwd_inner_microstep: 1293.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3719
[2024-06-10 19:44:32,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.97 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 19:44:33,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.35 | bwd_microstep: 882.18 | bwd_inner_microstep: 882.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048
[2024-06-10 19:44:34,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.18 | bwd_microstep: 815.49 | bwd_inner_microstep: 815.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 19:44:36,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1445.85 | bwd_inner_microstep: 1445.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1746
[2024-06-10 19:44:37,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 244.14 | bwd_microstep: 627.55 | bwd_inner_microstep: 627.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3805
[2024-06-10 19:44:39,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1509.63 | bwd_inner_microstep: 1509.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 19:44:41,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1348.90 | bwd_inner_microstep: 1348.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3450
[2024-06-10 19:44:43,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1403.70 | bwd_inner_microstep: 1403.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 19:44:50,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-10 19:44:50,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 6524.86 | bwd_inner_microstep: 1643.96 | bwd_allreduce_microstep: 4880.85 | step_microstep: 37.80
[2024-06-10 19:44:50,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15539.63 | bwd: 46538.20 | bwd_inner: 41656.38 | bwd_allreduce: 4881.13 | step: 39.26
{'loss': 1.208, 'learning_rate': 1.2147388894635832e-05, 'epoch': 0.64}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 19:44:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.23 | bwd_microstep: 1269.32 | bwd_inner_microstep: 1269.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 19:44:54,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1411.03 | bwd_inner_microstep: 1411.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-10 19:44:56,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 19:44:58,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1389.59 | bwd_inner_microstep: 1389.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3768
[2024-06-10 19:45:00,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1505.53 | bwd_inner_microstep: 1505.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 19:45:02,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1377.64 | bwd_inner_microstep: 1377.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411
[2024-06-10 19:45:04,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.08 | bwd_microstep: 1184.59 | bwd_inner_microstep: 1184.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 19:45:05,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 1252.64 | bwd_inner_microstep: 1252.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509
[2024-06-10 19:45:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1320.01 | bwd_inner_microstep: 1319.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 19:45:09,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 1345.71 | bwd_inner_microstep: 1345.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 19:45:10,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.33 | bwd_microstep: 783.08 | bwd_inner_microstep: 783.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 19:45:12,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.06 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1994
[2024-06-10 19:45:13,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.64 | bwd_microstep: 897.21 | bwd_inner_microstep: 897.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3034
[2024-06-10 19:45:15,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.32 | bwd_microstep: 1230.60 | bwd_inner_microstep: 1230.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675
[2024-06-10 19:45:17,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.54 | bwd_microstep: 1514.11 | bwd_inner_microstep: 1514.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-10 19:45:18,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.10 | bwd_microstep: 891.14 | bwd_inner_microstep: 891.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-10 19:45:20,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1401.72 | bwd_inner_microstep: 1401.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3718
[2024-06-10 19:45:22,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.03 | bwd_microstep: 1562.56 | bwd_inner_microstep: 1562.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3532
[2024-06-10 19:45:24,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.82 | bwd_microstep: 1277.17 | bwd_inner_microstep: 1277.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3468
[2024-06-10 19:45:26,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1325.67 | bwd_inner_microstep: 1325.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-10 19:45:28,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.83 | bwd_microstep: 1693.85 | bwd_inner_microstep: 1693.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 19:45:30,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.90 | bwd_microstep: 1490.04 | bwd_inner_microstep: 1490.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903
[2024-06-10 19:45:31,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.34 | bwd_microstep: 684.33 | bwd_inner_microstep: 684.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 19:45:33,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.75 | bwd_microstep: 1294.89 | bwd_inner_microstep: 1294.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3563
[2024-06-10 19:45:35,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.13 | bwd_microstep: 1204.71 | bwd_inner_microstep: 1204.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3828
[2024-06-10 19:45:37,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1296.64 | bwd_inner_microstep: 1296.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 971
[2024-06-10 19:45:37,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 150.78 | bwd_microstep: 386.45 | bwd_inner_microstep: 386.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 19:45:39,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.54 | bwd_microstep: 1284.67 | bwd_inner_microstep: 1284.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 19:45:41,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.79 | bwd_microstep: 1458.46 | bwd_inner_microstep: 1458.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2248
[2024-06-10 19:45:42,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.60 | bwd_microstep: 842.34 | bwd_inner_microstep: 842.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1953
[2024-06-10 19:45:43,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 701.17 | bwd_inner_microstep: 701.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-10 19:45:52,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 19:45:52,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.03 | bwd_microstep: 8140.24 | bwd_inner_microstep: 930.71 | bwd_allreduce_microstep: 7209.47 | step_microstep: 38.02
[2024-06-10 19:45:52,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14685.16 | bwd: 46347.70 | bwd_inner: 39137.32 | bwd_allreduce: 7209.70 | step: 39.45
{'loss': 1.1387, 'learning_rate': 1.2112882897733634e-05, 'epoch': 0.64}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-10 19:45:53,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.47 | bwd_microstep: 1266.14 | bwd_inner_microstep: 1266.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063
[2024-06-10 19:45:55,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.25 | bwd_microstep: 815.13 | bwd_inner_microstep: 815.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4260
[2024-06-10 19:45:57,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.21 | bwd_microstep: 1666.99 | bwd_inner_microstep: 1666.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827
[2024-06-10 19:45:59,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.29 | bwd_microstep: 1518.57 | bwd_inner_microstep: 1518.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829
[2024-06-10 19:46:01,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.73 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 19:46:03,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.47 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 19:46:05,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1395.08 | bwd_inner_microstep: 1395.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 19:46:06,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1248.73 | bwd_inner_microstep: 1248.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 19:46:08,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1509.93 | bwd_inner_microstep: 1509.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-10 19:46:10,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.29 | bwd_microstep: 1312.51 | bwd_inner_microstep: 1312.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3500
[2024-06-10 19:46:12,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.23 | bwd_microstep: 1220.85 | bwd_inner_microstep: 1220.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3490
[2024-06-10 19:46:14,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.23 | bwd_microstep: 1508.56 | bwd_inner_microstep: 1508.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379
[2024-06-10 19:46:16,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.79 | bwd_microstep: 1239.18 | bwd_inner_microstep: 1239.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2487
[2024-06-10 19:46:17,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.53 | bwd_microstep: 952.99 | bwd_inner_microstep: 952.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3648
[2024-06-10 19:46:19,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.05 | bwd_microstep: 1312.67 | bwd_inner_microstep: 1312.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 19:46:21,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1473.13 | bwd_inner_microstep: 1473.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 19:46:23,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.79 | bwd_microstep: 1372.87 | bwd_inner_microstep: 1372.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465
[2024-06-10 19:46:25,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.95 | bwd_microstep: 1216.14 | bwd_inner_microstep: 1216.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2293
[2024-06-10 19:46:26,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.19 | bwd_microstep: 879.47 | bwd_inner_microstep: 879.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 19:46:28,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1389.36 | bwd_inner_microstep: 1389.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1490
[2024-06-10 19:46:28,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 210.06 | bwd_microstep: 546.08 | bwd_inner_microstep: 546.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 19:46:30,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.73 | bwd_microstep: 1384.85 | bwd_inner_microstep: 1384.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-10 19:46:31,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.94 | bwd_microstep: 801.21 | bwd_inner_microstep: 801.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-10 19:46:33,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.09 | bwd_microstep: 1353.61 | bwd_inner_microstep: 1353.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-10 19:46:35,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1414.80 | bwd_inner_microstep: 1414.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3811
[2024-06-10 19:46:37,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1411.90 | bwd_inner_microstep: 1411.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3614
[2024-06-10 19:46:39,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1572.76 | bwd_inner_microstep: 1572.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3824
[2024-06-10 19:46:42,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.80 | bwd_microstep: 1755.28 | bwd_inner_microstep: 1755.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3585
[2024-06-10 19:46:44,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.49 | bwd_microstep: 1463.64 | bwd_inner_microstep: 1463.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2252
[2024-06-10 19:46:45,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.20 | bwd_microstep: 869.09 | bwd_inner_microstep: 869.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-10 19:46:47,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.42 | bwd_microstep: 1635.38 | bwd_inner_microstep: 1635.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2908
[2024-06-10 19:46:53,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-10 19:46:53,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.12 | bwd_microstep: 5735.63 | bwd_inner_microstep: 1276.73 | bwd_allreduce_microstep: 4458.85 | step_microstep: 37.68
[2024-06-10 19:46:53,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15507.74 | bwd: 45987.38 | bwd_inner: 41527.63 | bwd_allreduce: 4459.08 | step: 39.13
{'loss': 1.1551, 'learning_rate': 1.2078404679216864e-05, 'epoch': 0.64}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 19:46:55,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1381.47 | bwd_inner_microstep: 1381.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860
[2024-06-10 19:46:58,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.56 | bwd_microstep: 1558.41 | bwd_inner_microstep: 1558.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 19:47:00,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.10 | bwd_microstep: 1479.18 | bwd_inner_microstep: 1479.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3462
[2024-06-10 19:47:01,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.37 | bwd_microstep: 1211.12 | bwd_inner_microstep: 1211.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 19:47:02,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 793.48 | bwd_inner_microstep: 793.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 19:47:04,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 19:47:06,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.95 | bwd_microstep: 1279.50 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4062
[2024-06-10 19:47:08,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.71 | bwd_microstep: 1585.89 | bwd_inner_microstep: 1585.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 19:47:10,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.63 | bwd_microstep: 1386.38 | bwd_inner_microstep: 1386.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2077
[2024-06-10 19:47:11,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.06 | bwd_microstep: 851.74 | bwd_inner_microstep: 851.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3697
[2024-06-10 19:47:13,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.93 | bwd_microstep: 1449.31 | bwd_inner_microstep: 1449.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490
[2024-06-10 19:47:15,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.83 | bwd_microstep: 1346.82 | bwd_inner_microstep: 1346.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670
[2024-06-10 19:47:17,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.55 | bwd_microstep: 1513.18 | bwd_inner_microstep: 1513.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960
[2024-06-10 19:47:18,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.72 | bwd_microstep: 885.42 | bwd_inner_microstep: 885.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 19:47:20,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.78 | bwd_microstep: 1257.23 | bwd_inner_microstep: 1257.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3883
[2024-06-10 19:47:23,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.08 | bwd_microstep: 1748.65 | bwd_inner_microstep: 1748.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 19:47:24,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.33 | bwd_microstep: 794.34 | bwd_inner_microstep: 794.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3504
[2024-06-10 19:47:26,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.29 | bwd_microstep: 1513.15 | bwd_inner_microstep: 1513.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 19:47:28,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1388.17 | bwd_inner_microstep: 1388.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 19:47:30,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.30 | bwd_microstep: 1548.25 | bwd_inner_microstep: 1548.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 19:47:32,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1558.37 | bwd_inner_microstep: 1558.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 19:47:34,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 19:47:35,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1252.95 | bwd_inner_microstep: 1252.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 19:47:37,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1295.63 | bwd_inner_microstep: 1295.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 19:47:39,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1397.05 | bwd_inner_microstep: 1397.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 19:47:41,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1416.01 | bwd_inner_microstep: 1415.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 19:47:43,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.76 | bwd_microstep: 1449.63 | bwd_inner_microstep: 1449.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-10 19:47:45,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.58 | bwd_microstep: 1497.36 | bwd_inner_microstep: 1497.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-10 19:47:47,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1297.44 | bwd_inner_microstep: 1297.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3574
[2024-06-10 19:47:49,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.44 | bwd_microstep: 1523.64 | bwd_inner_microstep: 1523.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3577
[2024-06-10 19:47:51,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.38 | bwd_microstep: 1526.76 | bwd_inner_microstep: 1526.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2044
[2024-06-10 19:47:54,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-10 19:47:54,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.42 | bwd_microstep: 2261.56 | bwd_inner_microstep: 965.52 | bwd_allreduce_microstep: 1296.00 | step_microstep: 37.67
[2024-06-10 19:47:54,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15962.49 | bwd: 44083.12 | bwd_inner: 42786.22 | bwd_allreduce: 1296.23 | step: 39.12
{'loss': 1.2537, 'learning_rate': 1.2043954360517635e-05, 'epoch': 0.64}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612
[2024-06-10 19:47:56,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1331.60 | bwd_inner_microstep: 1331.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3975
[2024-06-10 19:47:58,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.28 | bwd_microstep: 1602.85 | bwd_inner_microstep: 1602.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851
[2024-06-10 19:48:00,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.45 | bwd_microstep: 1557.81 | bwd_inner_microstep: 1557.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 19:48:02,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 19:48:04,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.38 | bwd_microstep: 1540.64 | bwd_inner_microstep: 1540.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3740
[2024-06-10 19:48:06,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1367.72 | bwd_inner_microstep: 1367.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 19:48:08,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.59 | bwd_microstep: 1389.73 | bwd_inner_microstep: 1389.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4112
[2024-06-10 19:48:10,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.88 | bwd_microstep: 1734.93 | bwd_inner_microstep: 1734.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 19:48:12,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.79 | bwd_microstep: 1386.28 | bwd_inner_microstep: 1386.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686
[2024-06-10 19:48:14,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1522.87 | bwd_inner_microstep: 1522.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3502
[2024-06-10 19:48:16,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1439.54 | bwd_inner_microstep: 1439.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 19:48:18,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1471.23 | bwd_inner_microstep: 1471.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-10 19:48:20,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.25 | bwd_microstep: 1241.83 | bwd_inner_microstep: 1241.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 19:48:22,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 1341.94 | bwd_inner_microstep: 1341.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 19:48:24,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1244.17 | bwd_inner_microstep: 1244.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-10 19:48:26,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.47 | bwd_microstep: 1645.50 | bwd_inner_microstep: 1645.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 19:48:28,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.95 | bwd_microstep: 1244.23 | bwd_inner_microstep: 1244.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 19:48:29,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1391.50 | bwd_inner_microstep: 1391.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446
[2024-06-10 19:48:31,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.86 | bwd_microstep: 1287.33 | bwd_inner_microstep: 1287.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 19:48:33,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.24 | bwd_microstep: 1349.44 | bwd_inner_microstep: 1349.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-10 19:48:35,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.39 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1486.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 19:48:37,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1554.87 | bwd_inner_microstep: 1554.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 19:48:39,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1378.99 | bwd_inner_microstep: 1378.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 19:48:41,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.28 | bwd_microstep: 1249.79 | bwd_inner_microstep: 1249.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 19:48:43,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.82 | bwd_microstep: 1456.88 | bwd_inner_microstep: 1456.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 19:48:44,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.74 | bwd_microstep: 974.14 | bwd_inner_microstep: 974.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 19:48:46,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.03 | bwd_microstep: 1551.46 | bwd_inner_microstep: 1551.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 19:48:49,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.87 | bwd_microstep: 1528.88 | bwd_inner_microstep: 1528.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2055
[2024-06-10 19:48:50,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.70 | bwd_microstep: 872.95 | bwd_inner_microstep: 872.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820
[2024-06-10 19:48:52,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.25 | bwd_microstep: 1516.01 | bwd_inner_microstep: 1515.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3413
[2024-06-10 19:48:54,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.71 | bwd_microstep: 1465.13 | bwd_inner_microstep: 1465.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 19:48:56,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-10 19:48:56,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1452.98 | bwd_inner_microstep: 1377.30 | bwd_allreduce_microstep: 75.63 | step_microstep: 37.75
[2024-06-10 19:48:56,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16767.28 | bwd: 44962.57 | bwd_inner: 44886.04 | bwd_allreduce: 75.85 | step: 39.24
 64%|██████▍   | 1103/1726 [19:06:25<10:34:00, 61.06s/it]
 64%|██████▍   | 1104/1726 [19:07:27<10:37:11, 61.46s/it]
 64%|██████▍   | 1105/1726 [19:08:28<10:35:48, 61.43s/it]
 64%|██████▍   | 1106/1726 [19:09:30<10:35:58, 61.55s/it]
 64%|██████▍   | 1107/1726 [19:10:31<10:31:18, 61.19s/it]
{'loss': 1.2238, 'learning_rate': 1.2009532062969801e-05, 'epoch': 0.64}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 19:48:58,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1373.92 | bwd_inner_microstep: 1373.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3910
[2024-06-10 19:49:00,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1421.35 | bwd_inner_microstep: 1421.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3858
[2024-06-10 19:49:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.10 | bwd_microstep: 1560.32 | bwd_inner_microstep: 1560.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 19:49:04,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.29 | bwd_microstep: 1374.84 | bwd_inner_microstep: 1374.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3403
[2024-06-10 19:49:05,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.63 | bwd_microstep: 1209.33 | bwd_inner_microstep: 1209.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3753
[2024-06-10 19:49:08,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.91 | bwd_microstep: 1586.29 | bwd_inner_microstep: 1586.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 19:49:09,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.28 | bwd_microstep: 806.92 | bwd_inner_microstep: 806.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 19:49:11,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1379.13 | bwd_inner_microstep: 1379.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 19:49:12,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 1249.70 | bwd_inner_microstep: 1249.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3692
[2024-06-10 19:49:15,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1555.54 | bwd_inner_microstep: 1555.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3406
[2024-06-10 19:49:16,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1391.13 | bwd_inner_microstep: 1391.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2012
[2024-06-10 19:49:18,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.12 | bwd_microstep: 862.20 | bwd_inner_microstep: 862.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-10 19:49:20,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1412.53 | bwd_inner_microstep: 1412.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642
[2024-06-10 19:49:22,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.96 | bwd_microstep: 1710.74 | bwd_inner_microstep: 1710.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3644
[2024-06-10 19:49:24,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.52 | bwd_microstep: 1504.39 | bwd_inner_microstep: 1504.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3706
[2024-06-10 19:49:26,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.90 | bwd_microstep: 1578.95 | bwd_inner_microstep: 1578.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3845
[2024-06-10 19:49:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.80 | bwd_microstep: 1463.40 | bwd_inner_microstep: 1463.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 19:49:30,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1513.53 | bwd_inner_microstep: 1513.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987
[2024-06-10 19:49:31,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.12 | bwd_microstep: 707.15 | bwd_inner_microstep: 707.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3578
[2024-06-10 19:49:33,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.93 | bwd_microstep: 1206.37 | bwd_inner_microstep: 1206.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 19:49:35,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.09 | bwd_microstep: 1606.48 | bwd_inner_microstep: 1606.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3620
[2024-06-10 19:49:37,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1440.97 | bwd_inner_microstep: 1440.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3450
[2024-06-10 19:49:39,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.44 | bwd_microstep: 1187.53 | bwd_inner_microstep: 1187.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 19:49:41,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.36 | bwd_microstep: 1372.21 | bwd_inner_microstep: 1372.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 19:49:43,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.97 | bwd_microstep: 1438.96 | bwd_inner_microstep: 1438.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2919
[2024-06-10 19:49:44,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.10 | bwd_microstep: 1193.62 | bwd_inner_microstep: 1193.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 19:49:46,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.42 | bwd_microstep: 1297.97 | bwd_inner_microstep: 1297.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 19:49:48,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.48 | bwd_microstep: 1185.14 | bwd_inner_microstep: 1185.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822
[2024-06-10 19:49:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.80 | bwd_microstep: 1512.78 | bwd_inner_microstep: 1512.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3803
[2024-06-10 19:49:52,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.51 | bwd_microstep: 1618.54 | bwd_inner_microstep: 1618.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3001
[2024-06-10 19:49:54,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.77 | bwd_microstep: 1203.47 | bwd_inner_microstep: 1203.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3584
[2024-06-10 19:49:56,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 19:49:56,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.72 | bwd_microstep: 1786.80 | bwd_inner_microstep: 1628.22 | bwd_allreduce_microstep: 158.53 | step_microstep: 37.90
[2024-06-10 19:49:56,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16296.87 | bwd: 43712.18 | bwd_inner: 43552.75 | bwd_allreduce: 158.76 | step: 39.41
{'loss': 1.247, 'learning_rate': 1.1975137907808492e-05, 'epoch': 0.64}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-10 19:49:58,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1509.49 | bwd_inner_microstep: 1509.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3952
[2024-06-10 19:50:01,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.92 | bwd_microstep: 1694.04 | bwd_inner_microstep: 1694.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804
[2024-06-10 19:50:03,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.79 | bwd_microstep: 1551.79 | bwd_inner_microstep: 1551.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778
[2024-06-10 19:50:05,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.77 | bwd_microstep: 1441.83 | bwd_inner_microstep: 1441.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 19:50:07,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1280.62 | bwd_inner_microstep: 1280.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742
[2024-06-10 19:50:09,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.36 | bwd_microstep: 1632.03 | bwd_inner_microstep: 1632.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-10 19:50:10,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.24 | bwd_microstep: 1215.47 | bwd_inner_microstep: 1215.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709
[2024-06-10 19:50:13,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.79 | bwd_microstep: 1625.82 | bwd_inner_microstep: 1625.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 19:50:15,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.38 | bwd_microstep: 1287.25 | bwd_inner_microstep: 1287.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 19:50:16,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 19:50:18,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.52 | bwd_microstep: 1422.14 | bwd_inner_microstep: 1422.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2661
[2024-06-10 19:50:20,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.91 | bwd_microstep: 1082.05 | bwd_inner_microstep: 1082.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3718
[2024-06-10 19:50:22,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.19 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2133
[2024-06-10 19:50:23,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.70 | bwd_microstep: 1023.06 | bwd_inner_microstep: 1023.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910
[2024-06-10 19:50:25,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.08 | bwd_microstep: 1665.63 | bwd_inner_microstep: 1665.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2143
[2024-06-10 19:50:26,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.49 | bwd_microstep: 740.77 | bwd_inner_microstep: 740.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3527
[2024-06-10 19:50:29,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.14 | bwd_microstep: 1581.61 | bwd_inner_microstep: 1581.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3420
[2024-06-10 19:50:31,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.03 | bwd_microstep: 1612.84 | bwd_inner_microstep: 1612.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3380
[2024-06-10 19:50:32,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.59 | bwd_microstep: 1241.68 | bwd_inner_microstep: 1241.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2911
[2024-06-10 19:50:34,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.46 | bwd_microstep: 1280.30 | bwd_inner_microstep: 1280.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3668
[2024-06-10 19:50:36,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.60 | bwd_microstep: 1555.62 | bwd_inner_microstep: 1555.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287
[2024-06-10 19:50:38,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.27 | bwd_microstep: 876.74 | bwd_inner_microstep: 876.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3820
[2024-06-10 19:50:40,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.17 | bwd_microstep: 1852.53 | bwd_inner_microstep: 1852.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051
[2024-06-10 19:50:41,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.96 | bwd_microstep: 912.68 | bwd_inner_microstep: 912.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-10 19:50:44,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.12 | bwd_microstep: 1610.99 | bwd_inner_microstep: 1610.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 19:50:46,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.61 | bwd_microstep: 1503.66 | bwd_inner_microstep: 1503.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 19:50:48,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1346.56 | bwd_inner_microstep: 1346.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 19:50:49,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1396.39 | bwd_inner_microstep: 1396.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 19:50:51,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1354.77 | bwd_inner_microstep: 1354.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549
[2024-06-10 19:50:53,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1326.87 | bwd_inner_microstep: 1326.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576
[2024-06-10 19:50:55,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.02 | bwd_microstep: 1599.70 | bwd_inner_microstep: 1599.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 19:50:57,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-10 19:50:57,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.03 | bwd_microstep: 1323.02 | bwd_inner_microstep: 1314.10 | bwd_allreduce_microstep: 8.88 | step_microstep: 37.55
[2024-06-10 19:50:57,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16234.77 | bwd: 44420.35 | bwd_inner: 44410.58 | bwd_allreduce: 9.11 | step: 39.10
{'loss': 1.2276, 'learning_rate': 1.1940772016169753e-05, 'epoch': 0.64}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 19:50:59,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.83 | bwd_microstep: 1473.62 | bwd_inner_microstep: 1473.45 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-10 19:51:01,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.80 | bwd_microstep: 1182.05 | bwd_inner_microstep: 1182.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037
[2024-06-10 19:51:02,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.04 | bwd_microstep: 808.66 | bwd_inner_microstep: 808.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3554
[2024-06-10 19:51:04,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.97 | bwd_microstep: 1360.33 | bwd_inner_microstep: 1360.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 19:51:05,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.96 | bwd_microstep: 791.40 | bwd_inner_microstep: 791.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 19:51:06,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.11 | bwd_microstep: 790.66 | bwd_inner_microstep: 790.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916
[2024-06-10 19:51:07,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.48 | bwd_microstep: 777.43 | bwd_inner_microstep: 777.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2528
[2024-06-10 19:51:08,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.24 | bwd_microstep: 933.04 | bwd_inner_microstep: 933.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3741
[2024-06-10 19:51:10,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.96 | bwd_microstep: 1439.35 | bwd_inner_microstep: 1439.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 19:51:12,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.05 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1286.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 19:51:14,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1385.93 | bwd_inner_microstep: 1385.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1953
[2024-06-10 19:51:15,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.93 | bwd_microstep: 823.78 | bwd_inner_microstep: 823.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562
[2024-06-10 19:51:18,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.21 | bwd_microstep: 1595.55 | bwd_inner_microstep: 1595.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626
[2024-06-10 19:51:20,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.06 | bwd_microstep: 1601.48 | bwd_inner_microstep: 1601.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3439
[2024-06-10 19:51:21,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.76 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 19:51:24,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.41 | bwd_microstep: 1603.80 | bwd_inner_microstep: 1603.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 19:51:26,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.33 | bwd_microstep: 1494.06 | bwd_inner_microstep: 1494.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 19:51:27,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.36 | bwd_microstep: 697.43 | bwd_inner_microstep: 697.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 19:51:28,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2130
[2024-06-10 19:51:29,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.96 | bwd_microstep: 830.64 | bwd_inner_microstep: 830.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 19:51:32,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.87 | bwd_microstep: 1492.70 | bwd_inner_microstep: 1492.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 19:51:33,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1277.49 | bwd_inner_microstep: 1277.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 19:51:35,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.72 | bwd_microstep: 1393.43 | bwd_inner_microstep: 1393.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822
[2024-06-10 19:51:37,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1388.92 | bwd_inner_microstep: 1388.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 19:51:39,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.10 | bwd_microstep: 1292.96 | bwd_inner_microstep: 1292.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 19:51:41,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.52 | bwd_microstep: 1660.48 | bwd_inner_microstep: 1660.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3461
[2024-06-10 19:51:43,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.16 | bwd_microstep: 1343.32 | bwd_inner_microstep: 1343.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-10 19:51:44,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 809.88 | bwd_inner_microstep: 809.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2658
[2024-06-10 19:51:46,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.41 | bwd_microstep: 1117.62 | bwd_inner_microstep: 1117.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-10 19:51:48,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.83 | bwd_microstep: 1593.35 | bwd_inner_microstep: 1593.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2943
[2024-06-10 19:51:50,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.54 | bwd_microstep: 1287.25 | bwd_inner_microstep: 1287.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3423
[2024-06-10 19:51:59,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.19 | optimizer_step: 6.64
[2024-06-10 19:51:59,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.40 | bwd_microstep: 8710.11 | bwd_inner_microstep: 1757.13 | bwd_allreduce_microstep: 6952.93 | step_microstep: 37.91
[2024-06-10 19:51:59,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14826.60 | bwd: 46682.22 | bwd_inner: 39728.23 | bwd_allreduce: 6953.23 | step: 39.43
{'loss': 1.2272, 'learning_rate': 1.190643450909008e-05, 'epoch': 0.64}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 19:52:00,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.04 | bwd_microstep: 784.93 | bwd_inner_microstep: 784.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2419
[2024-06-10 19:52:01,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.37 | bwd_microstep: 964.07 | bwd_inner_microstep: 964.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 19:52:03,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1382.76 | bwd_inner_microstep: 1382.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 19:52:05,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1339.28 | bwd_inner_microstep: 1339.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1950
[2024-06-10 19:52:06,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.16 | bwd_microstep: 728.11 | bwd_inner_microstep: 728.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 19:52:08,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.01 | bwd_microstep: 1144.37 | bwd_inner_microstep: 1144.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2885
[2024-06-10 19:52:09,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.31 | bwd_microstep: 1086.15 | bwd_inner_microstep: 1086.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2211
[2024-06-10 19:52:11,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.90 | bwd_microstep: 954.80 | bwd_inner_microstep: 954.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2141
[2024-06-10 19:52:12,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.09 | bwd_microstep: 799.45 | bwd_inner_microstep: 799.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-10 19:52:14,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.27 | bwd_microstep: 1317.88 | bwd_inner_microstep: 1317.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-10 19:52:16,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.35 | bwd_microstep: 1408.66 | bwd_inner_microstep: 1408.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3483
[2024-06-10 19:52:18,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.99 | bwd_microstep: 1570.46 | bwd_inner_microstep: 1570.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 19:52:20,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.21 | bwd_microstep: 1345.41 | bwd_inner_microstep: 1345.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3642
[2024-06-10 19:52:22,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.07 | bwd_microstep: 1673.49 | bwd_inner_microstep: 1673.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 19:52:24,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1383.16 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 19:52:26,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1440.41 | bwd_inner_microstep: 1440.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 19:52:28,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1380.70 | bwd_inner_microstep: 1380.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-10 19:52:29,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.40 | bwd_microstep: 797.92 | bwd_inner_microstep: 797.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644
[2024-06-10 19:52:31,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.68 | bwd_microstep: 1445.37 | bwd_inner_microstep: 1445.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3475
[2024-06-10 19:52:33,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.50 | bwd_microstep: 1435.03 | bwd_inner_microstep: 1435.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 19:52:35,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.07 | bwd_microstep: 1598.92 | bwd_inner_microstep: 1598.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3721
[2024-06-10 19:52:37,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 1465.93 | bwd_inner_microstep: 1465.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295
[2024-06-10 19:52:38,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.83 | bwd_microstep: 879.45 | bwd_inner_microstep: 879.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 19:52:40,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.86 | bwd_microstep: 1188.02 | bwd_inner_microstep: 1187.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 19:52:42,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1404.28 | bwd_inner_microstep: 1404.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-10 19:52:44,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1298.41 | bwd_inner_microstep: 1298.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 19:52:46,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.57 | bwd_microstep: 1438.81 | bwd_inner_microstep: 1438.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 19:52:48,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.68 | bwd_microstep: 1438.76 | bwd_inner_microstep: 1438.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 19:52:50,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1394.11 | bwd_inner_microstep: 1394.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 19:52:51,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.91 | bwd_microstep: 1405.49 | bwd_inner_microstep: 1405.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 19:52:54,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.42 | bwd_microstep: 1486.76 | bwd_inner_microstep: 1486.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 19:53:02,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.19 | optimizer_step: 6.58
[2024-06-10 19:53:02,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.12 | bwd_microstep: 7624.53 | bwd_inner_microstep: 1419.17 | bwd_allreduce_microstep: 6205.31 | step_microstep: 38.11
[2024-06-10 19:53:02,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15263.96 | bwd: 47005.91 | bwd_inner: 40799.69 | bwd_allreduce: 6205.54 | step: 39.52
{'loss': 1.1481, 'learning_rate': 1.1872125507505993e-05, 'epoch': 0.64}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 19:53:04,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1379.79 | bwd_inner_microstep: 1379.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916
[2024-06-10 19:53:05,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.39 | bwd_microstep: 775.48 | bwd_inner_microstep: 775.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 19:53:07,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.86 | bwd_microstep: 1475.20 | bwd_inner_microstep: 1475.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-10 19:53:09,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.17 | bwd_microstep: 1541.36 | bwd_inner_microstep: 1541.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 19:53:11,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.41 | bwd_microstep: 1549.63 | bwd_inner_microstep: 1549.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 19:53:13,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.29 | bwd_microstep: 1386.02 | bwd_inner_microstep: 1385.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1976
[2024-06-10 19:53:14,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.81 | bwd_microstep: 702.00 | bwd_inner_microstep: 701.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 19:53:16,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1379.90 | bwd_inner_microstep: 1379.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 19:53:18,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.11 | bwd_microstep: 1402.68 | bwd_inner_microstep: 1402.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3503
[2024-06-10 19:53:19,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1250.70 | bwd_inner_microstep: 1250.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-10 19:53:21,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.25 | bwd_microstep: 1156.90 | bwd_inner_microstep: 1156.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 19:53:23,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.92 | bwd_microstep: 1485.13 | bwd_inner_microstep: 1485.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 19:53:24,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.35 | bwd_microstep: 794.37 | bwd_inner_microstep: 794.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 19:53:26,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1418.92 | bwd_inner_microstep: 1418.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 19:53:28,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1389.22 | bwd_inner_microstep: 1389.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 19:53:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646
[2024-06-10 19:53:32,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1410.14 | bwd_inner_microstep: 1410.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3972
[2024-06-10 19:53:34,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.70 | bwd_microstep: 1638.53 | bwd_inner_microstep: 1638.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3515
[2024-06-10 19:53:36,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.35 | bwd_microstep: 1514.25 | bwd_inner_microstep: 1514.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-10 19:53:38,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1416.78 | bwd_inner_microstep: 1416.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 19:53:40,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.36 | bwd_microstep: 1634.84 | bwd_inner_microstep: 1634.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 19:53:43,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.70 | bwd_microstep: 1555.15 | bwd_inner_microstep: 1555.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 19:53:45,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.74 | bwd_microstep: 1406.93 | bwd_inner_microstep: 1406.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827
[2024-06-10 19:53:47,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.02 | bwd_microstep: 1511.83 | bwd_inner_microstep: 1511.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 19:53:49,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.21 | bwd_microstep: 1396.12 | bwd_inner_microstep: 1396.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 19:53:51,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1424.58 | bwd_inner_microstep: 1424.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3852
[2024-06-10 19:53:53,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.84 | bwd_microstep: 1731.95 | bwd_inner_microstep: 1731.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3596
[2024-06-10 19:53:55,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.64 | bwd_microstep: 1369.17 | bwd_inner_microstep: 1369.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2439
[2024-06-10 19:53:56,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.23 | bwd_microstep: 947.81 | bwd_inner_microstep: 947.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2041
[2024-06-10 19:53:57,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.95 | bwd_microstep: 747.06 | bwd_inner_microstep: 747.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 19:53:59,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.27 | bwd_microstep: 1402.19 | bwd_inner_microstep: 1402.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 19:54:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-10 19:54:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.21 | bwd_microstep: 3381.30 | bwd_inner_microstep: 1577.98 | bwd_allreduce_microstep: 1803.28 | step_microstep: 37.67
[2024-06-10 19:54:03,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16093.04 | bwd: 44963.78 | bwd_inner: 43159.61 | bwd_allreduce: 1803.51 | step: 39.11
 64%|██████▍   | 1108/1726 [19:11:33<10:32:59, 61.46s/it]
 64%|██████▍   | 1109/1726 [19:12:33<10:28:32, 61.12s/it]
 64%|██████▍   | 1110/1726 [19:13:34<10:27:08, 61.09s/it]
 64%|██████▍   | 1111/1726 [19:14:36<10:28:25, 61.31s/it]
 64%|██████▍   | 1112/1726 [19:15:38<10:31:20, 61.69s/it]
{'loss': 1.2004, 'learning_rate': 1.1837845132253615e-05, 'epoch': 0.64}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 19:54:05,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.89 | bwd_microstep: 1263.44 | bwd_inner_microstep: 1263.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 19:54:07,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1351.08 | bwd_inner_microstep: 1351.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 19:54:09,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1552.43 | bwd_inner_microstep: 1552.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2259
[2024-06-10 19:54:10,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.48 | bwd_microstep: 967.56 | bwd_inner_microstep: 967.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1411
[2024-06-10 19:54:11,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 215.70 | bwd_microstep: 562.53 | bwd_inner_microstep: 562.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435
[2024-06-10 19:54:13,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.23 | bwd_microstep: 1156.81 | bwd_inner_microstep: 1156.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404
[2024-06-10 19:54:14,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.43 | bwd_microstep: 1177.96 | bwd_inner_microstep: 1177.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3425
[2024-06-10 19:54:16,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.90 | bwd_microstep: 1214.78 | bwd_inner_microstep: 1214.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 19:54:18,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.79 | bwd_microstep: 1397.69 | bwd_inner_microstep: 1397.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 19:54:20,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1246.69 | bwd_inner_microstep: 1246.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 19:54:22,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.67 | bwd_microstep: 1526.80 | bwd_inner_microstep: 1526.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 712
[2024-06-10 19:54:22,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 115.51 | bwd_microstep: 289.80 | bwd_inner_microstep: 289.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3501
[2024-06-10 19:54:24,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.83 | bwd_microstep: 1511.73 | bwd_inner_microstep: 1511.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 19:54:26,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.94 | bwd_microstep: 1342.73 | bwd_inner_microstep: 1342.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507
[2024-06-10 19:54:28,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1444.00 | bwd_inner_microstep: 1443.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2674
[2024-06-10 19:54:29,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.56 | bwd_microstep: 984.15 | bwd_inner_microstep: 984.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 19:54:31,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.42 | bwd_microstep: 1526.94 | bwd_inner_microstep: 1526.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498
[2024-06-10 19:54:34,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1581.86 | bwd_inner_microstep: 1581.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 19:54:36,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.46 | bwd_microstep: 1379.69 | bwd_inner_microstep: 1379.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 19:54:37,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.71 | bwd_microstep: 1350.49 | bwd_inner_microstep: 1350.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 19:54:40,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.39 | bwd_microstep: 1611.57 | bwd_inner_microstep: 1611.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 19:54:42,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.46 | bwd_microstep: 1553.16 | bwd_inner_microstep: 1553.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699
[2024-06-10 19:54:44,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.14 | bwd_microstep: 1332.35 | bwd_inner_microstep: 1332.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 19:54:46,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1492.54 | bwd_inner_microstep: 1492.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 19:54:48,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1435.29 | bwd_inner_microstep: 1435.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2065
[2024-06-10 19:54:49,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.59 | bwd_microstep: 946.21 | bwd_inner_microstep: 946.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 19:54:51,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.75 | bwd_microstep: 1591.81 | bwd_inner_microstep: 1591.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 19:54:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1402.04 | bwd_inner_microstep: 1402.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2061
[2024-06-10 19:54:54,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.44 | bwd_microstep: 1007.91 | bwd_inner_microstep: 1007.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559
[2024-06-10 19:54:56,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.27 | bwd_microstep: 1430.55 | bwd_inner_microstep: 1430.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 19:54:58,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.77 | bwd_microstep: 1283.55 | bwd_inner_microstep: 1283.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580
[2024-06-10 19:55:04,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 19:55:04,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.74 | bwd_microstep: 5675.16 | bwd_inner_microstep: 1898.96 | bwd_allreduce_microstep: 3776.15 | step_microstep: 37.89
[2024-06-10 19:55:04,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15491.49 | bwd: 45591.33 | bwd_inner: 41814.28 | bwd_allreduce: 3776.38 | step: 39.32
{'loss': 1.2255, 'learning_rate': 1.1803593504068256e-05, 'epoch': 0.65}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400
[2024-06-10 19:55:06,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.71 | bwd_microstep: 1172.56 | bwd_inner_microstep: 1172.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 19:55:08,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1373.72 | bwd_inner_microstep: 1373.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3935
[2024-06-10 19:55:10,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.95 | bwd_microstep: 1691.20 | bwd_inner_microstep: 1691.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 19:55:12,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.41 | bwd_microstep: 1249.39 | bwd_inner_microstep: 1249.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 19:55:13,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.66 | bwd_microstep: 969.87 | bwd_inner_microstep: 969.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764
[2024-06-10 19:55:15,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1438.28 | bwd_inner_microstep: 1438.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4097
[2024-06-10 19:55:18,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.30 | bwd_microstep: 1565.75 | bwd_inner_microstep: 1565.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 19:55:19,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.17 | bwd_microstep: 1400.82 | bwd_inner_microstep: 1400.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 19:55:21,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.32 | bwd_microstep: 797.94 | bwd_inner_microstep: 797.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498
[2024-06-10 19:55:22,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.71 | bwd_microstep: 1316.11 | bwd_inner_microstep: 1316.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 19:55:24,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.84 | bwd_microstep: 1316.30 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 19:55:26,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1385.38 | bwd_inner_microstep: 1385.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3540
[2024-06-10 19:55:28,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.26 | bwd_microstep: 1688.24 | bwd_inner_microstep: 1688.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 19:55:30,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1386.73 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3623
[2024-06-10 19:55:33,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.82 | bwd_microstep: 1705.55 | bwd_inner_microstep: 1705.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3439
[2024-06-10 19:55:35,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.19 | bwd_microstep: 1310.38 | bwd_inner_microstep: 1310.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3638
[2024-06-10 19:55:37,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.20 | bwd_microstep: 1678.77 | bwd_inner_microstep: 1678.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632
[2024-06-10 19:55:39,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.39 | bwd_microstep: 1436.88 | bwd_inner_microstep: 1436.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-10 19:55:41,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.55 | bwd_microstep: 1439.31 | bwd_inner_microstep: 1439.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 19:55:43,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1287.37 | bwd_inner_microstep: 1287.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 19:55:45,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 1450.05 | bwd_inner_microstep: 1450.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 19:55:46,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1284.04 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 19:55:48,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.59 | bwd_microstep: 971.60 | bwd_inner_microstep: 971.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3697
[2024-06-10 19:55:50,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.20 | bwd_microstep: 1480.21 | bwd_inner_microstep: 1480.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3595
[2024-06-10 19:55:52,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.96 | bwd_microstep: 1430.26 | bwd_inner_microstep: 1430.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2005
[2024-06-10 19:55:53,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 894.84 | bwd_inner_microstep: 894.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 19:55:55,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.93 | bwd_microstep: 1599.02 | bwd_inner_microstep: 1598.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 19:55:57,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1247.87 | bwd_inner_microstep: 1247.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 19:55:59,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.56 | bwd_microstep: 1552.09 | bwd_inner_microstep: 1552.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469
[2024-06-10 19:56:01,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.56 | bwd_microstep: 1182.45 | bwd_inner_microstep: 1182.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 19:56:02,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.60 | bwd_microstep: 1311.51 | bwd_inner_microstep: 1311.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 19:56:06,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.17 | optimizer_step: 6.60
[2024-06-10 19:56:06,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 3328.39 | bwd_inner_microstep: 1643.36 | bwd_allreduce_microstep: 1684.98 | step_microstep: 37.67
[2024-06-10 19:56:06,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16262.83 | bwd: 45342.90 | bwd_inner: 43657.03 | bwd_allreduce: 1685.21 | step: 39.13
{'loss': 1.2148, 'learning_rate': 1.1769370743583957e-05, 'epoch': 0.65}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-10 19:56:08,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1383.37 | bwd_inner_microstep: 1383.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3971
[2024-06-10 19:56:11,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.43 | bwd_microstep: 1705.89 | bwd_inner_microstep: 1705.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 19:56:13,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1377.04 | bwd_inner_microstep: 1377.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3844
[2024-06-10 19:56:15,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.26 | bwd_microstep: 1425.28 | bwd_inner_microstep: 1425.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-10 19:56:16,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.14 | bwd_microstep: 807.80 | bwd_inner_microstep: 807.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 770
[2024-06-10 19:56:16,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 120.70 | bwd_microstep: 304.55 | bwd_inner_microstep: 304.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 19:56:18,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.09 | bwd_microstep: 1246.44 | bwd_inner_microstep: 1246.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 19:56:20,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1391.85 | bwd_inner_microstep: 1391.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 19:56:22,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1350.89 | bwd_inner_microstep: 1350.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 19:56:23,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.25 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951
[2024-06-10 19:56:24,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.44 | bwd_microstep: 730.30 | bwd_inner_microstep: 730.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2116
[2024-06-10 19:56:26,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.00 | bwd_microstep: 957.34 | bwd_inner_microstep: 957.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521
[2024-06-10 19:56:28,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.75 | bwd_microstep: 1451.43 | bwd_inner_microstep: 1451.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 19:56:30,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1485.24 | bwd_inner_microstep: 1485.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3821
[2024-06-10 19:56:32,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.38 | bwd_microstep: 1478.98 | bwd_inner_microstep: 1478.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 19:56:34,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.54 | bwd_microstep: 1559.34 | bwd_inner_microstep: 1559.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3644
[2024-06-10 19:56:36,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.59 | bwd_microstep: 1514.23 | bwd_inner_microstep: 1514.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3480
[2024-06-10 19:56:38,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.07 | bwd_microstep: 1215.60 | bwd_inner_microstep: 1215.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-10 19:56:40,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1511.17 | bwd_inner_microstep: 1511.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622
[2024-06-10 19:56:42,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1611.14 | bwd_inner_microstep: 1611.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 19:56:44,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.11 | bwd_microstep: 1358.79 | bwd_inner_microstep: 1358.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 19:56:46,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.47 | bwd_microstep: 1254.48 | bwd_inner_microstep: 1254.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2121
[2024-06-10 19:56:47,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.47 | bwd_microstep: 830.02 | bwd_inner_microstep: 829.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3870
[2024-06-10 19:56:49,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.80 | bwd_microstep: 1568.93 | bwd_inner_microstep: 1568.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 19:56:51,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.86 | bwd_microstep: 1287.22 | bwd_inner_microstep: 1287.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178
[2024-06-10 19:56:52,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.69 | bwd_microstep: 856.71 | bwd_inner_microstep: 856.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3652
[2024-06-10 19:56:54,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.58 | bwd_microstep: 1444.63 | bwd_inner_microstep: 1444.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3457
[2024-06-10 19:56:56,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1498.99 | bwd_inner_microstep: 1498.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004
[2024-06-10 19:56:57,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.47 | bwd_microstep: 894.20 | bwd_inner_microstep: 894.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3592
[2024-06-10 19:57:00,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.41 | bwd_microstep: 1805.83 | bwd_inner_microstep: 1805.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-10 19:57:02,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1425.20 | bwd_inner_microstep: 1425.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3775
[2024-06-10 19:57:08,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.28 | optimizer_step: 6.58
[2024-06-10 19:57:08,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.21 | bwd_microstep: 5254.22 | bwd_inner_microstep: 1668.67 | bwd_allreduce_microstep: 3585.49 | step_microstep: 38.07
[2024-06-10 19:57:08,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15523.31 | bwd: 45271.58 | bwd_inner: 41685.17 | bwd_allreduce: 3585.72 | step: 39.61
{'loss': 1.1815, 'learning_rate': 1.1735176971333115e-05, 'epoch': 0.65}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 19:57:10,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.33 | bwd_microstep: 1462.77 | bwd_inner_microstep: 1462.71 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4017
[2024-06-10 19:57:12,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.95 | bwd_microstep: 1609.27 | bwd_inner_microstep: 1609.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 19:57:14,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.28 | bwd_microstep: 1485.19 | bwd_inner_microstep: 1485.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 19:57:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.41 | bwd_microstep: 1653.04 | bwd_inner_microstep: 1653.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-10 19:57:18,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1641.75 | bwd_inner_microstep: 1641.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 19:57:20,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.10 | bwd_microstep: 1448.87 | bwd_inner_microstep: 1448.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 19:57:22,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1247.28 | bwd_inner_microstep: 1247.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706
[2024-06-10 19:57:24,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.12 | bwd_microstep: 1629.52 | bwd_inner_microstep: 1629.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3695
[2024-06-10 19:57:26,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.44 | bwd_microstep: 1587.82 | bwd_inner_microstep: 1587.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3664
[2024-06-10 19:57:29,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.15 | bwd_microstep: 1717.32 | bwd_inner_microstep: 1717.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3671
[2024-06-10 19:57:31,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.15 | bwd_microstep: 1324.38 | bwd_inner_microstep: 1324.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3406
[2024-06-10 19:57:33,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1392.58 | bwd_inner_microstep: 1392.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1954
[2024-06-10 19:57:34,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.05 | bwd_microstep: 920.36 | bwd_inner_microstep: 920.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-10 19:57:36,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.83 | bwd_microstep: 1521.07 | bwd_inner_microstep: 1521.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 19:57:38,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.58 | bwd_microstep: 1252.84 | bwd_inner_microstep: 1252.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818
[2024-06-10 19:57:40,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1386.71 | bwd_inner_microstep: 1386.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 19:57:42,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.48 | bwd_microstep: 1521.14 | bwd_inner_microstep: 1521.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2141
[2024-06-10 19:57:43,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.10 | bwd_microstep: 833.75 | bwd_inner_microstep: 833.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477
[2024-06-10 19:57:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.14 | bwd_microstep: 1348.94 | bwd_inner_microstep: 1348.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178
[2024-06-10 19:57:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.19 | bwd_microstep: 858.64 | bwd_inner_microstep: 858.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-10 19:57:47,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.97 | bwd_microstep: 800.82 | bwd_inner_microstep: 800.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 19:57:49,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.40 | bwd_microstep: 1452.73 | bwd_inner_microstep: 1452.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 19:57:51,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.55 | bwd_microstep: 1286.94 | bwd_inner_microstep: 1286.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-10 19:57:53,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1517.64 | bwd_inner_microstep: 1517.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 19:57:55,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1295.24 | bwd_inner_microstep: 1295.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2271
[2024-06-10 19:57:56,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.10 | bwd_microstep: 810.53 | bwd_inner_microstep: 810.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 19:57:58,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1403.47 | bwd_inner_microstep: 1403.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 19:58:00,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.40 | bwd_microstep: 1456.00 | bwd_inner_microstep: 1455.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2040
[2024-06-10 19:58:01,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.55 | bwd_microstep: 872.64 | bwd_inner_microstep: 872.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 19:58:02,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.94 | bwd_microstep: 790.03 | bwd_inner_microstep: 790.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3586
[2024-06-10 19:58:04,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.33 | bwd_microstep: 1565.00 | bwd_inner_microstep: 1564.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3748
[2024-06-10 19:58:08,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 19:58:08,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 2907.84 | bwd_inner_microstep: 1660.63 | bwd_allreduce_microstep: 1247.15 | step_microstep: 38.01
[2024-06-10 19:58:08,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15892.81 | bwd: 44002.15 | bwd_inner: 42754.05 | bwd_allreduce: 1247.40 | step: 39.53
{'loss': 1.2364, 'learning_rate': 1.1701012307746021e-05, 'epoch': 0.65}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1876
[2024-06-10 19:58:09,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.56 | bwd_microstep: 824.18 | bwd_inner_microstep: 824.12 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3921
[2024-06-10 19:58:11,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.10 | bwd_microstep: 1592.14 | bwd_inner_microstep: 1592.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3895
[2024-06-10 19:58:13,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1389.86 | bwd_inner_microstep: 1389.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 19:58:15,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1343.58 | bwd_inner_microstep: 1343.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922
[2024-06-10 19:58:16,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.41 | bwd_microstep: 824.61 | bwd_inner_microstep: 824.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2232
[2024-06-10 19:58:17,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.47 | bwd_microstep: 814.69 | bwd_inner_microstep: 814.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 19:58:19,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.64 | bwd_microstep: 1284.26 | bwd_inner_microstep: 1284.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 19:58:21,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1412.38 | bwd_inner_microstep: 1412.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 19:58:23,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751
[2024-06-10 19:58:25,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.70 | bwd_microstep: 1434.24 | bwd_inner_microstep: 1434.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 19:58:26,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.53 | bwd_microstep: 795.49 | bwd_inner_microstep: 795.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446
[2024-06-10 19:58:27,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.90 | bwd_microstep: 1220.36 | bwd_inner_microstep: 1220.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 19:58:29,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.04 | bwd_microstep: 794.84 | bwd_inner_microstep: 794.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2121
[2024-06-10 19:58:30,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.79 | bwd_microstep: 970.50 | bwd_inner_microstep: 970.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2429
[2024-06-10 19:58:31,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.20 | bwd_microstep: 1107.96 | bwd_inner_microstep: 1107.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3662
[2024-06-10 19:58:33,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.77 | bwd_microstep: 1447.85 | bwd_inner_microstep: 1447.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 19:58:35,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1281.60 | bwd_inner_microstep: 1281.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2068
[2024-06-10 19:58:36,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.21 | bwd_microstep: 849.45 | bwd_inner_microstep: 849.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3538
[2024-06-10 19:58:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.69 | bwd_microstep: 1199.67 | bwd_inner_microstep: 1199.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982
[2024-06-10 19:58:39,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 893.68 | bwd_inner_microstep: 893.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3604
[2024-06-10 19:58:41,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.16 | bwd_microstep: 1370.38 | bwd_inner_microstep: 1370.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 19:58:43,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.56 | bwd_microstep: 1292.45 | bwd_inner_microstep: 1292.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 19:58:45,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.14 | bwd_microstep: 1243.10 | bwd_inner_microstep: 1243.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3607
[2024-06-10 19:58:47,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.03 | bwd_microstep: 1359.86 | bwd_inner_microstep: 1359.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3605
[2024-06-10 19:58:49,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.66 | bwd_microstep: 1705.32 | bwd_inner_microstep: 1705.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968
[2024-06-10 19:58:50,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.03 | bwd_microstep: 703.67 | bwd_inner_microstep: 703.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 19:58:52,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.73 | bwd_microstep: 1390.21 | bwd_inner_microstep: 1390.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 19:58:54,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1345.46 | bwd_inner_microstep: 1345.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3817
[2024-06-10 19:58:56,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1481.05 | bwd_inner_microstep: 1481.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 19:58:58,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.24 | bwd_microstep: 1405.02 | bwd_inner_microstep: 1405.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 19:59:00,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.77 | bwd_microstep: 1500.63 | bwd_inner_microstep: 1500.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2060
[2024-06-10 19:59:11,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-10 19:59:11,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.65 | bwd_microstep: 10549.83 | bwd_inner_microstep: 977.33 | bwd_allreduce_microstep: 9572.43 | step_microstep: 38.62
[2024-06-10 19:59:11,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14449.88 | bwd: 48113.52 | bwd_inner: 38540.11 | bwd_allreduce: 9572.70 | step: 40.13
 64%|██████▍   | 1113/1726 [19:16:40<10:29:22, 61.60s/it]
 65%|██████▍   | 1114/1726 [19:17:41<10:27:45, 61.55s/it]
 65%|██████▍   | 1115/1726 [19:18:43<10:27:56, 61.66s/it]
 65%|██████▍   | 1116/1726 [19:19:44<10:25:15, 61.50s/it]
 65%|██████▍   | 1117/1726 [19:20:44<10:20:22, 61.12s/it]
{'loss': 1.1988, 'learning_rate': 1.166687687315043e-05, 'epoch': 0.65}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 19:59:12,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.78 | bwd_microstep: 1264.35 | bwd_inner_microstep: 1264.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 19:59:14,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.25 | bwd_microstep: 1387.97 | bwd_inner_microstep: 1387.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 19:59:16,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1372.36 | bwd_inner_microstep: 1372.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 19:59:18,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.77 | bwd_microstep: 1454.07 | bwd_inner_microstep: 1454.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 19:59:20,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.48 | bwd_microstep: 1644.90 | bwd_inner_microstep: 1644.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 19:59:22,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1376.71 | bwd_inner_microstep: 1376.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 19:59:24,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.19 | bwd_microstep: 1279.39 | bwd_inner_microstep: 1279.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486
[2024-06-10 19:59:26,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.68 | bwd_microstep: 1310.59 | bwd_inner_microstep: 1310.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 743
[2024-06-10 19:59:26,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.30 | bwd_microstep: 298.47 | bwd_inner_microstep: 298.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 19:59:28,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1387.76 | bwd_inner_microstep: 1387.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2140
[2024-06-10 19:59:30,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.65 | bwd_microstep: 861.62 | bwd_inner_microstep: 861.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424
[2024-06-10 19:59:31,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.76 | bwd_microstep: 1281.77 | bwd_inner_microstep: 1281.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-10 19:59:33,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.46 | bwd_microstep: 1422.41 | bwd_inner_microstep: 1422.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 19:59:35,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1492.98 | bwd_inner_microstep: 1492.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2398
[2024-06-10 19:59:37,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.73 | bwd_microstep: 1035.59 | bwd_inner_microstep: 1035.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2011
[2024-06-10 19:59:38,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.07 | bwd_microstep: 832.05 | bwd_inner_microstep: 832.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3828
[2024-06-10 19:59:40,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.34 | bwd_microstep: 1858.31 | bwd_inner_microstep: 1858.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476
[2024-06-10 19:59:42,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1437.17 | bwd_inner_microstep: 1437.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498
[2024-06-10 19:59:44,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.38 | bwd_microstep: 1448.14 | bwd_inner_microstep: 1448.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2290
[2024-06-10 19:59:46,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.26 | bwd_microstep: 1070.94 | bwd_inner_microstep: 1070.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 19:59:48,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.47 | bwd_microstep: 1450.55 | bwd_inner_microstep: 1450.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 19:59:50,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.82 | bwd_microstep: 1357.88 | bwd_inner_microstep: 1357.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 19:59:52,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.52 | bwd_microstep: 1277.78 | bwd_inner_microstep: 1277.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3538
[2024-06-10 19:59:54,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.90 | bwd_microstep: 1421.24 | bwd_inner_microstep: 1421.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061
[2024-06-10 19:59:55,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.09 | bwd_microstep: 913.22 | bwd_inner_microstep: 913.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546
[2024-06-10 19:59:57,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.42 | bwd_microstep: 1589.72 | bwd_inner_microstep: 1589.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 19:59:59,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.17 | bwd_microstep: 1439.65 | bwd_inner_microstep: 1439.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 20:00:01,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.52 | bwd_microstep: 1657.93 | bwd_inner_microstep: 1657.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 20:00:04,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.48 | bwd_microstep: 1659.45 | bwd_inner_microstep: 1659.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048
[2024-06-10 20:00:05,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.88 | bwd_microstep: 811.33 | bwd_inner_microstep: 811.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 20:00:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.19 | bwd_microstep: 1298.26 | bwd_inner_microstep: 1298.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 20:00:13,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 20:00:13,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 6466.55 | bwd_inner_microstep: 1580.97 | bwd_allreduce_microstep: 4885.53 | step_microstep: 37.85
[2024-06-10 20:00:14,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15662.55 | bwd: 46861.13 | bwd_inner: 41974.70 | bwd_allreduce: 4885.76 | step: 39.34
{'loss': 1.2173, 'learning_rate': 1.1632770787771167e-05, 'epoch': 0.65}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 20:00:16,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.17 | bwd_microstep: 1501.97 | bwd_inner_microstep: 1501.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 20:00:17,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1288.02 | bwd_inner_microstep: 1287.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3406
[2024-06-10 20:00:19,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.16 | bwd_microstep: 1204.25 | bwd_inner_microstep: 1204.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841
[2024-06-10 20:00:21,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.30 | bwd_microstep: 1656.22 | bwd_inner_microstep: 1656.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 20:00:23,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.20 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 20:00:25,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.62 | bwd_microstep: 1277.21 | bwd_inner_microstep: 1277.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 20:00:27,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1405.91 | bwd_inner_microstep: 1405.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 20:00:29,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1281.97 | bwd_inner_microstep: 1281.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-10 20:00:30,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.20 | bwd_microstep: 1297.80 | bwd_inner_microstep: 1297.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618
[2024-06-10 20:00:32,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.33 | bwd_microstep: 1312.25 | bwd_inner_microstep: 1312.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2082
[2024-06-10 20:00:33,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.98 | bwd_microstep: 859.47 | bwd_inner_microstep: 859.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 20:00:35,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1350.04 | bwd_inner_microstep: 1350.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 20:00:37,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.12 | bwd_microstep: 1340.29 | bwd_inner_microstep: 1340.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3224
[2024-06-10 20:00:39,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.14 | bwd_microstep: 1324.23 | bwd_inner_microstep: 1324.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 20:00:41,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.58 | bwd_microstep: 1315.35 | bwd_inner_microstep: 1315.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467
[2024-06-10 20:00:43,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1401.66 | bwd_inner_microstep: 1401.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3527
[2024-06-10 20:00:44,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.89 | bwd_microstep: 1195.27 | bwd_inner_microstep: 1195.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-10 20:00:46,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.31 | bwd_microstep: 1419.65 | bwd_inner_microstep: 1419.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446
[2024-06-10 20:00:48,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.89 | bwd_microstep: 1375.75 | bwd_inner_microstep: 1375.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 20:00:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1256.34 | bwd_inner_microstep: 1256.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 20:00:52,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.91 | bwd_microstep: 1663.31 | bwd_inner_microstep: 1663.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2927
[2024-06-10 20:00:54,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.49 | bwd_microstep: 1253.14 | bwd_inner_microstep: 1253.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293
[2024-06-10 20:00:55,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.90 | bwd_microstep: 974.90 | bwd_inner_microstep: 974.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3816
[2024-06-10 20:00:57,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1584.95 | bwd_inner_microstep: 1584.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2224
[2024-06-10 20:00:59,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.33 | bwd_microstep: 963.37 | bwd_inner_microstep: 963.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2538
[2024-06-10 20:01:00,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.76 | bwd_microstep: 966.39 | bwd_inner_microstep: 966.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2219
[2024-06-10 20:01:01,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.91 | bwd_microstep: 862.98 | bwd_inner_microstep: 862.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-10 20:01:03,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.74 | bwd_microstep: 1184.68 | bwd_inner_microstep: 1184.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 20:01:05,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.27 | bwd_microstep: 1460.12 | bwd_inner_microstep: 1460.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2453
[2024-06-10 20:01:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.77 | bwd_microstep: 1047.51 | bwd_inner_microstep: 1047.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590
[2024-06-10 20:01:09,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.18 | bwd_microstep: 1700.69 | bwd_inner_microstep: 1700.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 20:01:16,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 20:01:16,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.07 | bwd_microstep: 6960.70 | bwd_inner_microstep: 1685.88 | bwd_allreduce_microstep: 5274.76 | step_microstep: 37.99
[2024-06-10 20:01:16,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15558.80 | bwd: 46969.49 | bwd_inner: 41693.83 | bwd_allreduce: 5274.99 | step: 39.44
{'loss': 1.1683, 'learning_rate': 1.1598694171729703e-05, 'epoch': 0.65}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 20:01:18,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1462.28 | bwd_inner_microstep: 1462.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 20:01:20,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.24 | bwd_microstep: 1474.04 | bwd_inner_microstep: 1474.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842
[2024-06-10 20:01:22,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.58 | bwd_microstep: 1461.33 | bwd_inner_microstep: 1461.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857
[2024-06-10 20:01:23,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.10 | bwd_microstep: 677.84 | bwd_inner_microstep: 677.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2631
[2024-06-10 20:01:25,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.20 | bwd_microstep: 919.15 | bwd_inner_microstep: 919.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 20:01:27,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.00 | bwd_microstep: 1440.32 | bwd_inner_microstep: 1440.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 20:01:28,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.54 | bwd_microstep: 1284.46 | bwd_inner_microstep: 1284.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 20:01:30,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1279.76 | bwd_inner_microstep: 1279.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 20:01:31,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.68 | bwd_microstep: 793.79 | bwd_inner_microstep: 793.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408
[2024-06-10 20:01:33,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.22 | bwd_microstep: 1181.36 | bwd_inner_microstep: 1181.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3692
[2024-06-10 20:01:35,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1622.52 | bwd_inner_microstep: 1622.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3419
[2024-06-10 20:01:37,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.72 | bwd_microstep: 1309.87 | bwd_inner_microstep: 1309.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569
[2024-06-10 20:01:39,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1359.83 | bwd_inner_microstep: 1359.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1945
[2024-06-10 20:01:40,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.52 | bwd_microstep: 885.43 | bwd_inner_microstep: 885.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 20:01:42,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.76 | bwd_microstep: 1595.97 | bwd_inner_microstep: 1595.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3639
[2024-06-10 20:01:45,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.28 | bwd_microstep: 1706.80 | bwd_inner_microstep: 1706.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2107
[2024-06-10 20:01:46,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.34 | bwd_microstep: 918.06 | bwd_inner_microstep: 918.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2110
[2024-06-10 20:01:47,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.90 | bwd_microstep: 856.36 | bwd_inner_microstep: 856.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459
[2024-06-10 20:01:49,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.81 | bwd_microstep: 1569.47 | bwd_inner_microstep: 1569.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2893
[2024-06-10 20:01:51,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.85 | bwd_microstep: 1087.14 | bwd_inner_microstep: 1087.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686
[2024-06-10 20:01:53,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.88 | bwd_microstep: 1430.06 | bwd_inner_microstep: 1430.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 20:01:55,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.85 | bwd_microstep: 1375.59 | bwd_inner_microstep: 1375.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 20:01:57,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.37 | bwd_microstep: 1495.18 | bwd_inner_microstep: 1495.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2145
[2024-06-10 20:01:58,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.25 | bwd_microstep: 850.53 | bwd_inner_microstep: 850.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 20:02:00,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.39 | bwd_microstep: 1304.82 | bwd_inner_microstep: 1304.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2183
[2024-06-10 20:02:01,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.23 | bwd_microstep: 857.32 | bwd_inner_microstep: 857.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2285
[2024-06-10 20:02:02,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.01 | bwd_microstep: 784.57 | bwd_inner_microstep: 784.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 20:02:04,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.67 | bwd_microstep: 1505.56 | bwd_inner_microstep: 1505.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 20:02:06,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.35 | bwd_microstep: 1659.32 | bwd_inner_microstep: 1659.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572
[2024-06-10 20:02:09,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.76 | bwd_microstep: 1589.97 | bwd_inner_microstep: 1589.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 20:02:11,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.50 | bwd_microstep: 1471.36 | bwd_inner_microstep: 1471.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 20:02:17,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 20:02:17,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.20 | bwd_microstep: 5540.29 | bwd_inner_microstep: 1437.10 | bwd_allreduce_microstep: 4103.14 | step_microstep: 38.01
[2024-06-10 20:02:17,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15195.53 | bwd: 44750.34 | bwd_inner: 40646.30 | bwd_allreduce: 4103.37 | step: 39.47
{'loss': 1.1936, 'learning_rate': 1.156464714504369e-05, 'epoch': 0.65}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 20:02:19,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.89 | bwd_microstep: 1471.21 | bwd_inner_microstep: 1471.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3863
[2024-06-10 20:02:21,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1360.59 | bwd_inner_microstep: 1360.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484
[2024-06-10 20:02:22,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.47 | bwd_microstep: 1310.49 | bwd_inner_microstep: 1310.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 20:02:24,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1245.79 | bwd_inner_microstep: 1245.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 20:02:26,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.01 | bwd_microstep: 1247.95 | bwd_inner_microstep: 1247.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3716
[2024-06-10 20:02:28,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.82 | bwd_microstep: 1267.22 | bwd_inner_microstep: 1267.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 20:02:29,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1248.29 | bwd_inner_microstep: 1248.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 20:02:31,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1279.12 | bwd_inner_microstep: 1279.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734
[2024-06-10 20:02:33,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.11 | bwd_microstep: 1436.76 | bwd_inner_microstep: 1436.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3489
[2024-06-10 20:02:35,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1442.85 | bwd_inner_microstep: 1442.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3387
[2024-06-10 20:02:37,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.43 | bwd_microstep: 1290.32 | bwd_inner_microstep: 1290.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 20:02:39,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.44 | bwd_microstep: 1339.46 | bwd_inner_microstep: 1339.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-10 20:02:41,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.59 | bwd_microstep: 1516.44 | bwd_inner_microstep: 1516.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3666
[2024-06-10 20:02:43,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.46 | bwd_microstep: 1820.31 | bwd_inner_microstep: 1820.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-10 20:02:44,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.56 | bwd_microstep: 798.21 | bwd_inner_microstep: 798.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1970
[2024-06-10 20:02:45,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.66 | bwd_microstep: 765.43 | bwd_inner_microstep: 765.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912
[2024-06-10 20:02:46,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.39 | bwd_microstep: 685.35 | bwd_inner_microstep: 685.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-10 20:02:48,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.01 | bwd_microstep: 1526.41 | bwd_inner_microstep: 1526.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 20:02:50,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1284.62 | bwd_inner_microstep: 1284.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2011
[2024-06-10 20:02:51,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.08 | bwd_microstep: 739.04 | bwd_inner_microstep: 739.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3550
[2024-06-10 20:02:53,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.37 | bwd_microstep: 1205.69 | bwd_inner_microstep: 1205.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-10 20:02:55,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.52 | bwd_microstep: 1643.23 | bwd_inner_microstep: 1643.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 20:02:57,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.86 | bwd_microstep: 1400.58 | bwd_inner_microstep: 1400.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548
[2024-06-10 20:02:59,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.18 | bwd_microstep: 1427.67 | bwd_inner_microstep: 1427.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 20:03:01,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1395.63 | bwd_inner_microstep: 1395.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3808
[2024-06-10 20:03:03,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1615.65 | bwd_inner_microstep: 1615.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3909
[2024-06-10 20:03:05,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1394.63 | bwd_inner_microstep: 1394.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560
[2024-06-10 20:03:07,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.64 | bwd_microstep: 1421.79 | bwd_inner_microstep: 1421.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3734
[2024-06-10 20:03:09,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.84 | bwd_microstep: 1562.78 | bwd_inner_microstep: 1562.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 20:03:11,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1507.64 | bwd_inner_microstep: 1507.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2035
[2024-06-10 20:03:13,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.06 | bwd_microstep: 808.65 | bwd_inner_microstep: 808.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 20:03:16,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.18 | optimizer_step: 6.58
[2024-06-10 20:03:16,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 3005.55 | bwd_inner_microstep: 1417.74 | bwd_allreduce_microstep: 1587.76 | step_microstep: 37.78
[2024-06-10 20:03:16,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15660.89 | bwd: 43465.37 | bwd_inner: 41876.71 | bwd_allreduce: 1587.98 | step: 39.24
{'loss': 1.2184, 'learning_rate': 1.1530629827626583e-05, 'epoch': 0.65}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 20:03:18,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1473.13 | bwd_inner_microstep: 1473.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3475
[2024-06-10 20:03:20,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.62 | bwd_microstep: 1341.96 | bwd_inner_microstep: 1341.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4176
[2024-06-10 20:03:22,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.47 | bwd_microstep: 1752.12 | bwd_inner_microstep: 1752.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 20:03:25,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1550.08 | bwd_inner_microstep: 1550.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 20:03:26,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.29 | bwd_microstep: 1282.00 | bwd_inner_microstep: 1281.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 20:03:28,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.55 | bwd_microstep: 1150.48 | bwd_inner_microstep: 1150.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 20:03:29,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.64 | bwd_microstep: 793.45 | bwd_inner_microstep: 793.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 20:03:31,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.07 | bwd_microstep: 1386.74 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 20:03:33,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1253.64 | bwd_inner_microstep: 1253.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3496
[2024-06-10 20:03:35,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.58 | bwd_microstep: 1510.03 | bwd_inner_microstep: 1510.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 20:03:37,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1484.46 | bwd_inner_microstep: 1484.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2092
[2024-06-10 20:03:38,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.18 | bwd_microstep: 920.33 | bwd_inner_microstep: 920.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733
[2024-06-10 20:03:40,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.75 | bwd_microstep: 1624.68 | bwd_inner_microstep: 1624.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1976
[2024-06-10 20:03:41,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.73 | bwd_microstep: 854.14 | bwd_inner_microstep: 854.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 20:03:43,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.86 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3890
[2024-06-10 20:03:45,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1497.11 | bwd_inner_microstep: 1497.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675
[2024-06-10 20:03:47,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1527.09 | bwd_inner_microstep: 1527.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3638
[2024-06-10 20:03:50,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1568.61 | bwd_inner_microstep: 1568.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3699
[2024-06-10 20:03:52,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.88 | bwd_microstep: 1726.41 | bwd_inner_microstep: 1726.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3628
[2024-06-10 20:03:54,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.11 | bwd_microstep: 1711.97 | bwd_inner_microstep: 1711.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3678
[2024-06-10 20:03:56,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1328.04 | bwd_inner_microstep: 1328.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 20:03:58,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1512.72 | bwd_inner_microstep: 1512.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 20:04:00,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1460.95 | bwd_inner_microstep: 1460.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 20:04:02,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1412.61 | bwd_inner_microstep: 1412.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-10 20:04:03,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.81 | bwd_microstep: 797.06 | bwd_inner_microstep: 797.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 20:04:05,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.88 | bwd_microstep: 1535.73 | bwd_inner_microstep: 1535.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 917
[2024-06-10 20:04:06,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.47 | bwd_microstep: 374.65 | bwd_inner_microstep: 374.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2034
[2024-06-10 20:04:07,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.98 | bwd_microstep: 744.64 | bwd_inner_microstep: 744.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 20:04:09,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 1493.62 | bwd_inner_microstep: 1493.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820
[2024-06-10 20:04:11,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.45 | bwd_microstep: 1484.57 | bwd_inner_microstep: 1484.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 20:04:13,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.20 | bwd_microstep: 1639.31 | bwd_inner_microstep: 1639.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3831
[2024-06-10 20:04:16,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 20:04:16,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.34 | bwd_microstep: 1668.42 | bwd_inner_microstep: 1572.77 | bwd_allreduce_microstep: 95.60 | step_microstep: 37.46
[2024-06-10 20:04:16,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16031.78 | bwd: 43108.22 | bwd_inner: 43011.69 | bwd_allreduce: 95.83 | step: 39.05
 65%|██████▍   | 1118/1726 [19:21:47<10:24:43, 61.65s/it]
 65%|██████▍   | 1119/1726 [19:22:50<10:27:21, 62.01s/it]
 65%|██████▍   | 1120/1726 [19:23:53<10:28:52, 62.26s/it]
 65%|██████▍   | 1121/1726 [19:24:53<10:21:48, 61.67s/it]
 65%|██████▌   | 1122/1726 [19:25:53<10:14:05, 61.00s/it]
{'loss': 1.159, 'learning_rate': 1.1496642339287191e-05, 'epoch': 0.65}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 20:04:17,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1278.26 | bwd_inner_microstep: 1278.02 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 20:04:19,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1245.30 | bwd_inner_microstep: 1245.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471
[2024-06-10 20:04:21,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.84 | bwd_microstep: 1441.78 | bwd_inner_microstep: 1441.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 20:04:23,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.26 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2225
[2024-06-10 20:04:24,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.58 | bwd_microstep: 864.12 | bwd_inner_microstep: 864.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 20:04:26,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1284.67 | bwd_inner_microstep: 1284.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 20:04:28,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1248.33 | bwd_inner_microstep: 1248.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 20:04:30,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.59 | bwd_microstep: 1389.09 | bwd_inner_microstep: 1389.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 20:04:31,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.84 | bwd_microstep: 799.03 | bwd_inner_microstep: 799.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1914
[2024-06-10 20:04:32,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.07 | bwd_microstep: 731.96 | bwd_inner_microstep: 731.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1943
[2024-06-10 20:04:33,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.84 | bwd_microstep: 848.25 | bwd_inner_microstep: 848.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3665
[2024-06-10 20:04:35,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.82 | bwd_microstep: 1580.43 | bwd_inner_microstep: 1580.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663
[2024-06-10 20:04:37,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.60 | bwd_microstep: 1719.81 | bwd_inner_microstep: 1719.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-10 20:04:39,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1403.50 | bwd_inner_microstep: 1403.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 20:04:41,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.00 | bwd_microstep: 1374.00 | bwd_inner_microstep: 1373.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 20:04:43,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1252.21 | bwd_inner_microstep: 1252.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454
[2024-06-10 20:04:45,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.85 | bwd_microstep: 1315.58 | bwd_inner_microstep: 1315.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3478
[2024-06-10 20:04:47,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.17 | bwd_microstep: 1436.14 | bwd_inner_microstep: 1436.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 20:04:48,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.30 | bwd_microstep: 801.06 | bwd_inner_microstep: 801.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 20:04:50,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.35 | bwd_microstep: 1381.61 | bwd_inner_microstep: 1381.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715
[2024-06-10 20:04:52,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.18 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3824
[2024-06-10 20:04:54,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.84 | bwd_microstep: 1259.02 | bwd_inner_microstep: 1258.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 20:04:55,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.31 | bwd_microstep: 1403.37 | bwd_inner_microstep: 1403.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1912
[2024-06-10 20:04:57,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.82 | bwd_microstep: 750.33 | bwd_inner_microstep: 750.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 20:04:58,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1398.62 | bwd_inner_microstep: 1398.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825
[2024-06-10 20:05:01,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.69 | bwd_microstep: 1584.53 | bwd_inner_microstep: 1584.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2233
[2024-06-10 20:05:02,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.28 | bwd_microstep: 865.39 | bwd_inner_microstep: 865.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 20:05:04,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.59 | bwd_microstep: 1553.57 | bwd_inner_microstep: 1553.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3432
[2024-06-10 20:05:06,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.32 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 20:05:08,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1251.11 | bwd_inner_microstep: 1251.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-10 20:05:10,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.54 | bwd_microstep: 1479.67 | bwd_inner_microstep: 1479.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-10 20:05:19,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.37 | optimizer_step: 6.59
[2024-06-10 20:05:19,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.30 | bwd_microstep: 8316.65 | bwd_inner_microstep: 1824.57 | bwd_allreduce_microstep: 6492.02 | step_microstep: 40.28
[2024-06-10 20:05:19,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15290.42 | bwd: 47438.16 | bwd_inner: 40945.05 | bwd_allreduce: 6492.35 | step: 41.84
{'loss': 1.2558, 'learning_rate': 1.1462684799729272e-05, 'epoch': 0.65}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-10 20:05:21,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.28 | bwd_microstep: 1581.92 | bwd_inner_microstep: 1581.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 20:05:23,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.79 | bwd_microstep: 1275.66 | bwd_inner_microstep: 1275.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-10 20:05:24,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.54 | bwd_microstep: 1199.22 | bwd_inner_microstep: 1199.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 20:05:26,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.21 | bwd_microstep: 1479.37 | bwd_inner_microstep: 1479.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-10 20:05:27,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.71 | bwd_microstep: 808.03 | bwd_inner_microstep: 808.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 20:05:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.39 | bwd_microstep: 1277.23 | bwd_inner_microstep: 1277.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4096
[2024-06-10 20:05:31,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.45 | bwd_microstep: 1622.81 | bwd_inner_microstep: 1622.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 768
[2024-06-10 20:05:32,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 120.15 | bwd_microstep: 303.22 | bwd_inner_microstep: 303.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 20:05:34,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.09 | bwd_microstep: 1246.98 | bwd_inner_microstep: 1246.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 20:05:35,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.62 | bwd_microstep: 1383.89 | bwd_inner_microstep: 1383.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3726
[2024-06-10 20:05:38,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1557.02 | bwd_inner_microstep: 1557.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 20:05:40,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.61 | bwd_microstep: 1417.95 | bwd_inner_microstep: 1417.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513
[2024-06-10 20:05:42,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.11 | bwd_microstep: 1431.39 | bwd_inner_microstep: 1431.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 20:05:43,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.18 | bwd_microstep: 1273.97 | bwd_inner_microstep: 1273.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3442
[2024-06-10 20:05:45,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1297.80 | bwd_inner_microstep: 1297.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 20:05:47,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1396.11 | bwd_inner_microstep: 1396.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2065
[2024-06-10 20:05:48,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.87 | bwd_microstep: 913.95 | bwd_inner_microstep: 913.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 20:05:50,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1396.46 | bwd_inner_microstep: 1396.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 20:05:52,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.97 | bwd_microstep: 1388.32 | bwd_inner_microstep: 1388.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 20:05:54,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1275.86 | bwd_inner_microstep: 1275.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 557
[2024-06-10 20:05:54,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 98.53 | bwd_microstep: 248.52 | bwd_inner_microstep: 248.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3543
[2024-06-10 20:05:56,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.32 | bwd_microstep: 1358.61 | bwd_inner_microstep: 1358.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610
[2024-06-10 20:05:58,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.84 | bwd_microstep: 1338.35 | bwd_inner_microstep: 1338.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 20:06:00,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.33 | bwd_microstep: 1303.84 | bwd_inner_microstep: 1303.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3558
[2024-06-10 20:06:02,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1478.63 | bwd_inner_microstep: 1478.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819
[2024-06-10 20:06:04,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.89 | bwd_microstep: 1690.05 | bwd_inner_microstep: 1690.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 20:06:06,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.05 | bwd_microstep: 1476.46 | bwd_inner_microstep: 1476.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3561
[2024-06-10 20:06:09,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.32 | bwd_microstep: 1664.72 | bwd_inner_microstep: 1664.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 20:06:10,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1375.51 | bwd_inner_microstep: 1375.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3780
[2024-06-10 20:06:12,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.88 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3405
[2024-06-10 20:06:14,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.14 | bwd_microstep: 1208.99 | bwd_inner_microstep: 1208.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 20:06:18,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.94 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-10 20:06:18,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.37 | bwd_microstep: 3509.87 | bwd_inner_microstep: 1408.28 | bwd_allreduce_microstep: 2101.54 | step_microstep: 39.04
[2024-06-10 20:06:18,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15529.91 | bwd: 43565.45 | bwd_inner: 41463.01 | bwd_allreduce: 2101.76 | step: 40.54
{'loss': 1.1775, 'learning_rate': 1.142875732855111e-05, 'epoch': 0.65}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 20:06:20,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.43 | bwd_microstep: 1338.87 | bwd_inner_microstep: 1338.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 20:06:22,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1272.57 | bwd_inner_microstep: 1272.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 20:06:23,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.91 | bwd_microstep: 1181.74 | bwd_inner_microstep: 1181.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-10 20:06:25,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1409.25 | bwd_inner_microstep: 1409.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 20:06:27,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.51 | bwd_microstep: 1501.85 | bwd_inner_microstep: 1501.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 20:06:28,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.47 | bwd_microstep: 677.42 | bwd_inner_microstep: 677.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 20:06:30,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1243.05 | bwd_inner_microstep: 1243.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 20:06:32,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.64 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-10 20:06:34,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1533.46 | bwd_inner_microstep: 1533.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 20:06:36,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.65 | bwd_microstep: 1277.60 | bwd_inner_microstep: 1277.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2089
[2024-06-10 20:06:37,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.68 | bwd_microstep: 853.19 | bwd_inner_microstep: 853.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3835
[2024-06-10 20:06:39,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.93 | bwd_microstep: 1690.15 | bwd_inner_microstep: 1690.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-10 20:06:41,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1449.14 | bwd_inner_microstep: 1449.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-10 20:06:42,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.34 | bwd_microstep: 889.38 | bwd_inner_microstep: 889.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2663
[2024-06-10 20:06:44,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.51 | bwd_microstep: 1150.30 | bwd_inner_microstep: 1150.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2135
[2024-06-10 20:06:45,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.70 | bwd_microstep: 827.79 | bwd_inner_microstep: 827.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1993
[2024-06-10 20:06:46,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.03 | bwd_microstep: 895.92 | bwd_inner_microstep: 895.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 20:06:47,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.60 | bwd_microstep: 696.35 | bwd_inner_microstep: 696.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3644
[2024-06-10 20:06:49,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1510.02 | bwd_inner_microstep: 1509.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031
[2024-06-10 20:06:51,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.59 | bwd_microstep: 806.12 | bwd_inner_microstep: 806.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 20:06:52,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1394.16 | bwd_inner_microstep: 1394.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3529
[2024-06-10 20:06:54,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.03 | bwd_microstep: 1196.11 | bwd_inner_microstep: 1196.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 20:06:56,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.12 | bwd_microstep: 1190.92 | bwd_inner_microstep: 1190.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 20:06:58,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.43 | bwd_microstep: 1455.73 | bwd_inner_microstep: 1455.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 20:07:00,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1298.96 | bwd_inner_microstep: 1298.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 20:07:02,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.76 | bwd_microstep: 1401.42 | bwd_inner_microstep: 1401.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 20:07:04,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.58 | bwd_microstep: 1453.77 | bwd_inner_microstep: 1453.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 20:07:06,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1495.46 | bwd_inner_microstep: 1495.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 20:07:07,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1347.78 | bwd_inner_microstep: 1347.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 20:07:10,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.64 | bwd_microstep: 1647.69 | bwd_inner_microstep: 1647.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 20:07:12,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.58 | bwd_microstep: 1402.73 | bwd_inner_microstep: 1402.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 20:07:18,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 20:07:18,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 5667.29 | bwd_inner_microstep: 1417.76 | bwd_allreduce_microstep: 4249.48 | step_microstep: 38.33
[2024-06-10 20:07:18,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15042.39 | bwd: 44440.94 | bwd_inner: 40190.55 | bwd_allreduce: 4249.71 | step: 39.80
{'loss': 1.2329, 'learning_rate': 1.1394860045245084e-05, 'epoch': 0.65}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 20:07:20,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.65 | bwd_microstep: 1469.35 | bwd_inner_microstep: 1469.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3621
[2024-06-10 20:07:22,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1443.84 | bwd_inner_microstep: 1443.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2308
[2024-06-10 20:07:23,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.67 | bwd_microstep: 882.98 | bwd_inner_microstep: 882.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 20:07:25,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1483.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2890
[2024-06-10 20:07:27,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.40 | bwd_microstep: 1234.89 | bwd_inner_microstep: 1234.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 20:07:29,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.85 | bwd_microstep: 1242.38 | bwd_inner_microstep: 1242.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 20:07:30,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.35 | bwd_microstep: 790.79 | bwd_inner_microstep: 790.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 20:07:32,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.07 | bwd_microstep: 1387.19 | bwd_inner_microstep: 1387.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1922
[2024-06-10 20:07:33,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.88 | bwd_microstep: 879.13 | bwd_inner_microstep: 879.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3694
[2024-06-10 20:07:35,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.48 | bwd_microstep: 1359.42 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490
[2024-06-10 20:07:37,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1575.09 | bwd_inner_microstep: 1575.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3119
[2024-06-10 20:07:39,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1246.51 | bwd_inner_microstep: 1246.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 20:07:40,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1340.92 | bwd_inner_microstep: 1340.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 20:07:42,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1392.19 | bwd_inner_microstep: 1392.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3645
[2024-06-10 20:07:45,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.31 | bwd_microstep: 1639.21 | bwd_inner_microstep: 1639.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 20:07:47,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.05 | bwd_microstep: 1408.44 | bwd_inner_microstep: 1408.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2433
[2024-06-10 20:07:48,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.42 | bwd_microstep: 851.36 | bwd_inner_microstep: 851.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 20:07:50,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.40 | bwd_microstep: 1556.53 | bwd_inner_microstep: 1556.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 20:07:52,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1384.04 | bwd_inner_microstep: 1384.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-10 20:07:53,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.51 | bwd_microstep: 976.02 | bwd_inner_microstep: 975.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 20:07:55,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.30 | bwd_microstep: 1254.79 | bwd_inner_microstep: 1254.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432
[2024-06-10 20:07:56,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.19 | bwd_microstep: 1152.74 | bwd_inner_microstep: 1152.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-10 20:07:58,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.93 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-10 20:08:00,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1319.44 | bwd_inner_microstep: 1319.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-10 20:08:02,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.22 | bwd_microstep: 1594.27 | bwd_inner_microstep: 1594.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 20:08:04,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.29 | bwd_microstep: 1300.63 | bwd_inner_microstep: 1300.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 20:08:07,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.08 | bwd_microstep: 1657.71 | bwd_inner_microstep: 1657.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-10 20:08:09,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.99 | bwd_microstep: 1539.26 | bwd_inner_microstep: 1539.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 20:08:11,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.03 | bwd_microstep: 1642.58 | bwd_inner_microstep: 1642.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 20:08:13,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1646.55 | bwd_inner_microstep: 1646.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 20:08:15,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.07 | bwd_microstep: 1496.46 | bwd_inner_microstep: 1496.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 20:08:19,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.03 | optimizer_step: 6.58
[2024-06-10 20:08:19,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.00 | bwd_microstep: 2894.49 | bwd_inner_microstep: 1521.64 | bwd_allreduce_microstep: 1372.81 | step_microstep: 37.55
[2024-06-10 20:08:19,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16026.66 | bwd: 44441.46 | bwd_inner: 43067.76 | bwd_allreduce: 1373.04 | step: 39.03
{'loss': 1.1706, 'learning_rate': 1.1360993069197241e-05, 'epoch': 0.65}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418
[2024-06-10 20:08:20,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.75 | bwd_microstep: 1270.57 | bwd_inner_microstep: 1270.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 20:08:22,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1345.61 | bwd_inner_microstep: 1345.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3847
[2024-06-10 20:08:24,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1390.67 | bwd_inner_microstep: 1390.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 20:08:26,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1245.65 | bwd_inner_microstep: 1245.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 20:08:28,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1276.98 | bwd_inner_microstep: 1276.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 20:08:30,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.19 | bwd_microstep: 1649.22 | bwd_inner_microstep: 1649.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-10 20:08:32,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.42 | bwd_microstep: 1187.05 | bwd_inner_microstep: 1187.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 761
[2024-06-10 20:08:32,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.97 | bwd_microstep: 301.58 | bwd_inner_microstep: 301.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-10 20:08:34,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.53 | bwd_microstep: 1527.60 | bwd_inner_microstep: 1527.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 20:08:36,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.95 | bwd_microstep: 1313.55 | bwd_inner_microstep: 1313.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3492
[2024-06-10 20:08:38,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1441.49 | bwd_inner_microstep: 1441.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113
[2024-06-10 20:08:39,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.88 | bwd_microstep: 825.85 | bwd_inner_microstep: 825.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 20:08:41,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1340.11 | bwd_inner_microstep: 1340.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 20:08:43,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.00 | bwd_microstep: 1479.29 | bwd_inner_microstep: 1479.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3527
[2024-06-10 20:08:45,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.30 | bwd_microstep: 1591.13 | bwd_inner_microstep: 1591.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3657
[2024-06-10 20:08:47,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.22 | bwd_microstep: 1474.91 | bwd_inner_microstep: 1474.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1970
[2024-06-10 20:08:48,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.74 | bwd_microstep: 831.27 | bwd_inner_microstep: 831.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 20:08:50,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1352.91 | bwd_inner_microstep: 1352.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-10 20:08:52,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.24 | bwd_microstep: 1309.91 | bwd_inner_microstep: 1309.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3654
[2024-06-10 20:08:54,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1289.89 | bwd_inner_microstep: 1289.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3671
[2024-06-10 20:08:56,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.01 | bwd_microstep: 1457.23 | bwd_inner_microstep: 1457.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 20:08:58,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1416.05 | bwd_inner_microstep: 1416.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 20:09:00,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.77 | bwd_microstep: 1419.08 | bwd_inner_microstep: 1419.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3576
[2024-06-10 20:09:02,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.55 | bwd_microstep: 1308.08 | bwd_inner_microstep: 1308.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-10 20:09:03,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.91 | bwd_microstep: 1357.39 | bwd_inner_microstep: 1357.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3515
[2024-06-10 20:09:05,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.10 | bwd_microstep: 1222.65 | bwd_inner_microstep: 1222.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3058
[2024-06-10 20:09:07,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.98 | bwd_microstep: 1234.00 | bwd_inner_microstep: 1233.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-10 20:09:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1433.75 | bwd_inner_microstep: 1433.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-10 20:09:11,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.44 | bwd_microstep: 1538.74 | bwd_inner_microstep: 1538.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 20:09:13,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.44 | bwd_microstep: 1650.48 | bwd_inner_microstep: 1650.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2241
[2024-06-10 20:09:14,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.64 | bwd_microstep: 845.64 | bwd_inner_microstep: 845.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400
[2024-06-10 20:09:19,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-10 20:09:19,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.15 | bwd_microstep: 4301.39 | bwd_inner_microstep: 1563.05 | bwd_allreduce_microstep: 2738.28 | step_microstep: 37.74
[2024-06-10 20:09:19,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15654.24 | bwd: 44629.74 | bwd_inner: 41890.55 | bwd_allreduce: 2738.51 | step: 39.31
 65%|██████▌   | 1123/1726 [19:26:52<10:08:30, 60.55s/it]
 65%|██████▌   | 1124/1726 [19:27:55<10:15:05, 61.31s/it]
 65%|██████▌   | 1125/1726 [19:28:55<10:08:25, 60.74s/it]
 65%|██████▌   | 1126/1726 [19:29:55<10:04:35, 60.46s/it]
 65%|██████▌   | 1127/1726 [19:30:55<10:04:37, 60.56s/it]
 65%|██████▌   | 1128/172
{'loss': 1.2433, 'learning_rate': 1.1327156519686896e-05, 'epoch': 0.65}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 20:09:21,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.02 | bwd_microstep: 1434.85 | bwd_inner_microstep: 1434.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3888
[2024-06-10 20:09:24,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.76 | bwd_microstep: 1677.42 | bwd_inner_microstep: 1677.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 20:09:25,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.11 | bwd_microstep: 1340.81 | bwd_inner_microstep: 1340.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 20:09:28,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.17 | bwd_microstep: 1549.26 | bwd_inner_microstep: 1549.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 20:09:29,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1375.73 | bwd_inner_microstep: 1375.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2590
[2024-06-10 20:09:31,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.37 | bwd_microstep: 1007.97 | bwd_inner_microstep: 1007.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 20:09:33,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1380.93 | bwd_inner_microstep: 1380.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-10 20:09:34,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.21 | bwd_microstep: 799.13 | bwd_inner_microstep: 799.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2929
[2024-06-10 20:09:35,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.18 | bwd_microstep: 1093.32 | bwd_inner_microstep: 1093.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2170
[2024-06-10 20:09:37,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.84 | bwd_microstep: 948.68 | bwd_inner_microstep: 948.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 20:09:38,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.12 | bwd_microstep: 1289.46 | bwd_inner_microstep: 1289.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 725
[2024-06-10 20:09:39,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.90 | bwd_microstep: 294.30 | bwd_inner_microstep: 294.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942
[2024-06-10 20:09:40,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.08 | bwd_microstep: 825.07 | bwd_inner_microstep: 825.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656
[2024-06-10 20:09:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.43 | bwd_microstep: 1513.71 | bwd_inner_microstep: 1513.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3475
[2024-06-10 20:09:44,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.32 | bwd_microstep: 1577.24 | bwd_inner_microstep: 1577.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424
[2024-06-10 20:09:46,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1439.66 | bwd_inner_microstep: 1439.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930
[2024-06-10 20:09:47,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.21 | bwd_microstep: 699.06 | bwd_inner_microstep: 699.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829
[2024-06-10 20:09:49,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.34 | bwd_microstep: 1490.36 | bwd_inner_microstep: 1490.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-10 20:09:51,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.39 | bwd_microstep: 1159.03 | bwd_inner_microstep: 1159.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 20:09:53,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.82 | bwd_microstep: 1454.64 | bwd_inner_microstep: 1454.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2087
[2024-06-10 20:09:54,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.12 | bwd_microstep: 822.25 | bwd_inner_microstep: 822.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 4010
[2024-06-10 20:09:57,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 697.20 | bwd_microstep: 1920.73 | bwd_inner_microstep: 1920.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3523
[2024-06-10 20:09:59,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.28 | bwd_microstep: 1516.32 | bwd_inner_microstep: 1516.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 636
[2024-06-10 20:09:59,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.08 | bwd_microstep: 264.98 | bwd_inner_microstep: 264.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1983
[2024-06-10 20:10:00,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.87 | bwd_microstep: 832.22 | bwd_inner_microstep: 832.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-10 20:10:02,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1441.70 | bwd_inner_microstep: 1441.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834
[2024-06-10 20:10:05,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.95 | bwd_microstep: 1756.88 | bwd_inner_microstep: 1756.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3563
[2024-06-10 20:10:07,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.24 | bwd_microstep: 1594.61 | bwd_inner_microstep: 1594.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 20:10:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.54 | bwd_microstep: 1646.31 | bwd_inner_microstep: 1646.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 20:10:11,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.81 | bwd_microstep: 1518.33 | bwd_inner_microstep: 1518.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 20:10:13,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.41 | bwd_microstep: 1541.35 | bwd_inner_microstep: 1541.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 20:10:23,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-10 20:10:23,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.68 | bwd_microstep: 9093.61 | bwd_inner_microstep: 1099.43 | bwd_allreduce_microstep: 7994.12 | step_microstep: 37.84
[2024-06-10 20:10:23,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14995.16 | bwd: 48299.93 | bwd_inner: 40304.91 | bwd_allreduce: 7994.35 | step: 39.33
{'loss': 1.2375, 'learning_rate': 1.1293350515886203e-05, 'epoch': 0.65}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3458
[2024-06-10 20:10:25,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1423.25 | bwd_inner_microstep: 1423.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 20:10:27,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1398.75 | bwd_inner_microstep: 1398.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3980
[2024-06-10 20:10:29,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.04 | bwd_microstep: 1498.10 | bwd_inner_microstep: 1498.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 20:10:31,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 1648.70 | bwd_inner_microstep: 1648.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4129
[2024-06-10 20:10:33,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.62 | bwd_microstep: 1699.40 | bwd_inner_microstep: 1699.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-10 20:10:35,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.25 | bwd_microstep: 1181.84 | bwd_inner_microstep: 1181.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 20:10:37,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.21 | bwd_microstep: 1346.36 | bwd_inner_microstep: 1346.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705
[2024-06-10 20:10:39,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 1428.28 | bwd_inner_microstep: 1428.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-10 20:10:41,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.27 | bwd_microstep: 1303.64 | bwd_inner_microstep: 1303.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405
[2024-06-10 20:10:43,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1307.03 | bwd_inner_microstep: 1307.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3428
[2024-06-10 20:10:45,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1504.09 | bwd_inner_microstep: 1504.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 20:10:46,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.52 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2645
[2024-06-10 20:10:48,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.06 | bwd_microstep: 1112.29 | bwd_inner_microstep: 1112.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3549
[2024-06-10 20:10:50,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.35 | bwd_microstep: 1436.87 | bwd_inner_microstep: 1436.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638
[2024-06-10 20:10:52,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 1600.42 | bwd_inner_microstep: 1600.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3446
[2024-06-10 20:10:54,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1407.59 | bwd_inner_microstep: 1407.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3462
[2024-06-10 20:10:56,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.96 | bwd_microstep: 1400.06 | bwd_inner_microstep: 1400.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-10 20:10:57,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.07 | bwd_microstep: 791.67 | bwd_inner_microstep: 791.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2102
[2024-06-10 20:10:58,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.23 | bwd_microstep: 821.41 | bwd_inner_microstep: 821.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 20:11:00,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1286.05 | bwd_inner_microstep: 1286.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 20:11:02,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1373.31 | bwd_inner_microstep: 1373.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585
[2024-06-10 20:11:04,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.10 | bwd_microstep: 1530.70 | bwd_inner_microstep: 1530.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-10 20:11:06,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.17 | bwd_microstep: 1652.66 | bwd_inner_microstep: 1652.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 20:11:09,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.01 | bwd_microstep: 1651.98 | bwd_inner_microstep: 1651.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3708
[2024-06-10 20:11:10,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1329.47 | bwd_inner_microstep: 1329.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 20:11:12,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.80 | bwd_microstep: 1389.02 | bwd_inner_microstep: 1388.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 20:11:14,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1508.58 | bwd_inner_microstep: 1508.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 20:11:17,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1550.25 | bwd_inner_microstep: 1550.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-10 20:11:18,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1297.24 | bwd_inner_microstep: 1297.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 20:11:21,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.16 | bwd_microstep: 1655.81 | bwd_inner_microstep: 1655.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2133
[2024-06-10 20:11:22,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.51 | bwd_microstep: 893.25 | bwd_inner_microstep: 893.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-10 20:11:24,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.64 | optimizer_gradients: 4.14 | optimizer_step: 6.63
[2024-06-10 20:11:24,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.17 | bwd_microstep: 1347.80 | bwd_inner_microstep: 1307.02 | bwd_allreduce_microstep: 40.73 | step_microstep: 39.34
[2024-06-10 20:11:24,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16458.30 | bwd: 44113.63 | bwd_inner: 44072.00 | bwd_allreduce: 40.95 | step: 40.77
{'loss': 1.2005, 'learning_rate': 1.1259575176859739e-05, 'epoch': 0.65}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1880
[2024-06-10 20:11:25,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.95 | bwd_microstep: 768.41 | bwd_inner_microstep: 768.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752
[2024-06-10 20:11:27,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.30 | bwd_microstep: 1539.10 | bwd_inner_microstep: 1539.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 20:11:29,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1243.14 | bwd_inner_microstep: 1243.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3429
[2024-06-10 20:11:30,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.42 | bwd_microstep: 1282.14 | bwd_inner_microstep: 1282.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2340
[2024-06-10 20:11:32,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.24 | bwd_microstep: 889.09 | bwd_inner_microstep: 889.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2061
[2024-06-10 20:11:33,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 815.23 | bwd_inner_microstep: 815.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4066
[2024-06-10 20:11:35,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.09 | bwd_microstep: 1628.46 | bwd_inner_microstep: 1628.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2050
[2024-06-10 20:11:36,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.34 | bwd_microstep: 753.59 | bwd_inner_microstep: 753.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 20:11:38,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.92 | bwd_microstep: 1155.01 | bwd_inner_microstep: 1154.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 20:11:39,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1250.56 | bwd_inner_microstep: 1250.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 20:11:41,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1349.02 | bwd_inner_microstep: 1348.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 20:11:43,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.41 | bwd_microstep: 1248.78 | bwd_inner_microstep: 1248.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-10 20:11:45,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.64 | bwd_microstep: 1523.49 | bwd_inner_microstep: 1523.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2408
[2024-06-10 20:11:47,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.96 | bwd_microstep: 1039.90 | bwd_inner_microstep: 1039.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390
[2024-06-10 20:11:48,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.94 | bwd_microstep: 1241.40 | bwd_inner_microstep: 1241.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646
[2024-06-10 20:11:51,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.33 | bwd_microstep: 1604.41 | bwd_inner_microstep: 1604.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3611
[2024-06-10 20:11:53,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.31 | bwd_microstep: 1632.79 | bwd_inner_microstep: 1632.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913
[2024-06-10 20:11:54,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.79 | bwd_microstep: 779.69 | bwd_inner_microstep: 779.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3831
[2024-06-10 20:11:56,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.44 | bwd_microstep: 1855.81 | bwd_inner_microstep: 1855.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 20:11:58,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1444.64 | bwd_inner_microstep: 1444.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3478
[2024-06-10 20:12:00,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1331.10 | bwd_inner_microstep: 1331.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-10 20:12:01,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.79 | bwd_microstep: 797.33 | bwd_inner_microstep: 797.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3492
[2024-06-10 20:12:03,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.12 | bwd_microstep: 1562.20 | bwd_inner_microstep: 1562.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 20:12:05,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.69 | bwd_microstep: 1280.22 | bwd_inner_microstep: 1280.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3622
[2024-06-10 20:12:07,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.75 | bwd_microstep: 1537.89 | bwd_inner_microstep: 1537.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769
[2024-06-10 20:12:09,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.40 | bwd_microstep: 1448.64 | bwd_inner_microstep: 1448.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 20:12:11,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.68 | bwd_microstep: 1492.38 | bwd_inner_microstep: 1492.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816
[2024-06-10 20:12:13,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.43 | bwd_microstep: 1358.72 | bwd_inner_microstep: 1358.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2550
[2024-06-10 20:12:15,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.66 | bwd_microstep: 967.64 | bwd_inner_microstep: 967.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042
[2024-06-10 20:12:16,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.15 | bwd_microstep: 812.66 | bwd_inner_microstep: 812.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3814
[2024-06-10 20:12:18,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.11 | bwd_microstep: 1599.63 | bwd_inner_microstep: 1599.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 20:12:24,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.10 | optimizer_step: 6.62
[2024-06-10 20:12:24,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.04 | bwd_microstep: 5690.11 | bwd_inner_microstep: 2098.15 | bwd_allreduce_microstep: 3591.90 | step_microstep: 38.04
[2024-06-10 20:12:24,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15267.92 | bwd: 44923.20 | bwd_inner: 41330.37 | bwd_allreduce: 3592.14 | step: 39.56
{'loss': 1.1641, 'learning_rate': 1.122583062156406e-05, 'epoch': 0.66}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 20:12:25,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.34 | bwd_microstep: 784.19 | bwd_inner_microstep: 784.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3933
[2024-06-10 20:12:28,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.97 | bwd_microstep: 1691.97 | bwd_inner_microstep: 1691.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 20:12:29,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.76 | bwd_microstep: 1246.40 | bwd_inner_microstep: 1246.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-10 20:12:31,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.78 | bwd_microstep: 1446.28 | bwd_inner_microstep: 1446.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 20:12:33,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.05 | bwd_microstep: 1244.12 | bwd_inner_microstep: 1244.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 20:12:35,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3768
[2024-06-10 20:12:37,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.34 | bwd_microstep: 1339.15 | bwd_inner_microstep: 1339.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1941
[2024-06-10 20:12:38,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.46 | bwd_microstep: 761.45 | bwd_inner_microstep: 761.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446
[2024-06-10 20:12:40,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.23 | bwd_microstep: 1189.36 | bwd_inner_microstep: 1189.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 20:12:42,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.29 | bwd_microstep: 1381.69 | bwd_inner_microstep: 1381.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891
[2024-06-10 20:12:43,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.34 | bwd_microstep: 712.90 | bwd_inner_microstep: 712.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1917
[2024-06-10 20:12:44,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.53 | bwd_microstep: 749.47 | bwd_inner_microstep: 749.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 20:12:46,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.01 | bwd_microstep: 1486.11 | bwd_inner_microstep: 1486.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3719
[2024-06-10 20:12:48,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1563.15 | bwd_inner_microstep: 1563.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659
[2024-06-10 20:12:50,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.58 | bwd_microstep: 1611.11 | bwd_inner_microstep: 1611.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 20:12:52,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.65 | bwd_microstep: 1181.15 | bwd_inner_microstep: 1181.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3465
[2024-06-10 20:12:54,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.13 | bwd_microstep: 1542.45 | bwd_inner_microstep: 1542.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676
[2024-06-10 20:12:56,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.30 | bwd_microstep: 1620.62 | bwd_inner_microstep: 1620.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 20:12:58,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1416.21 | bwd_inner_microstep: 1416.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832
[2024-06-10 20:13:00,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.67 | bwd_microstep: 1480.72 | bwd_inner_microstep: 1480.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480
[2024-06-10 20:13:02,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.15 | bwd_microstep: 1405.75 | bwd_inner_microstep: 1405.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3831
[2024-06-10 20:13:04,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.62 | bwd_microstep: 1616.51 | bwd_inner_microstep: 1616.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 20:13:06,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1614.81 | bwd_inner_microstep: 1614.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 20:13:08,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1495.14 | bwd_inner_microstep: 1495.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 20:13:10,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1378.11 | bwd_inner_microstep: 1378.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 20:13:13,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.18 | bwd_microstep: 1663.57 | bwd_inner_microstep: 1663.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 20:13:15,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1392.79 | bwd_inner_microstep: 1392.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289
[2024-06-10 20:13:16,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.27 | bwd_microstep: 878.62 | bwd_inner_microstep: 878.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 20:13:18,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1391.49 | bwd_inner_microstep: 1391.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 20:13:20,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.89 | bwd_microstep: 1635.78 | bwd_inner_microstep: 1635.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 20:13:22,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1552.95 | bwd_inner_microstep: 1552.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581
[2024-06-10 20:13:26,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.28 | optimizer_step: 6.58
[2024-06-10 20:13:26,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.13 | bwd_microstep: 3091.53 | bwd_inner_microstep: 1926.42 | bwd_allreduce_microstep: 1165.05 | step_microstep: 39.40
[2024-06-10 20:13:26,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16244.00 | bwd: 44914.45 | bwd_inner: 43748.47 | bwd_allreduce: 1165.28 | step: 40.86
{'loss': 1.2043, 'learning_rate': 1.1192116968847313e-05, 'epoch': 0.66}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4351
[2024-06-10 20:13:28,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.33 | bwd_microstep: 1788.91 | bwd_inner_microstep: 1788.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 20:13:30,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1480.13 | bwd_inner_microstep: 1480.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486
[2024-06-10 20:13:32,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.02 | bwd_microstep: 1344.07 | bwd_inner_microstep: 1344.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 20:13:34,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.54 | bwd_microstep: 1651.68 | bwd_inner_microstep: 1651.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 20:13:36,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1344.90 | bwd_inner_microstep: 1344.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3499
[2024-06-10 20:13:38,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.80 | bwd_microstep: 1335.91 | bwd_inner_microstep: 1335.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3469
[2024-06-10 20:13:40,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.63 | bwd_microstep: 1341.52 | bwd_inner_microstep: 1341.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 20:13:42,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3747
[2024-06-10 20:13:44,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1638.66 | bwd_inner_microstep: 1638.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458
[2024-06-10 20:13:46,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.24 | bwd_microstep: 1211.96 | bwd_inner_microstep: 1211.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739
[2024-06-10 20:13:48,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.78 | bwd_microstep: 1432.08 | bwd_inner_microstep: 1432.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 20:13:50,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1480.45 | bwd_inner_microstep: 1480.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686
[2024-06-10 20:13:52,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.51 | bwd_microstep: 1720.09 | bwd_inner_microstep: 1720.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043
[2024-06-10 20:13:53,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.65 | bwd_microstep: 809.81 | bwd_inner_microstep: 809.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 20:13:55,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1386.14 | bwd_inner_microstep: 1386.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-10 20:13:56,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.08 | bwd_microstep: 695.38 | bwd_inner_microstep: 695.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3537
[2024-06-10 20:13:58,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1426.35 | bwd_inner_microstep: 1426.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2171
[2024-06-10 20:13:59,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.52 | bwd_microstep: 885.28 | bwd_inner_microstep: 885.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2317
[2024-06-10 20:14:01,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.72 | bwd_microstep: 982.84 | bwd_inner_microstep: 982.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-10 20:14:03,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.32 | bwd_microstep: 1557.47 | bwd_inner_microstep: 1557.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 20:14:04,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 975.27 | bwd_inner_microstep: 975.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940
[2024-06-10 20:14:05,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.08 | bwd_microstep: 697.39 | bwd_inner_microstep: 697.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576
[2024-06-10 20:14:07,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.26 | bwd_microstep: 1333.52 | bwd_inner_microstep: 1333.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 20:14:09,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.43 | bwd_microstep: 1662.68 | bwd_inner_microstep: 1662.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 20:14:11,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1401.73 | bwd_inner_microstep: 1401.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 20:14:13,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1284.72 | bwd_inner_microstep: 1284.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 20:14:15,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1449.74 | bwd_inner_microstep: 1449.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3495
[2024-06-10 20:14:17,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.92 | bwd_microstep: 1530.40 | bwd_inner_microstep: 1530.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 20:14:19,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1352.83 | bwd_inner_microstep: 1352.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 20:14:21,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.98 | bwd_microstep: 1597.09 | bwd_inner_microstep: 1597.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 20:14:23,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1403.03 | bwd_inner_microstep: 1403.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3468
[2024-06-10 20:14:29,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 20:14:29,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.96 | bwd_microstep: 4860.43 | bwd_inner_microstep: 1583.92 | bwd_allreduce_microstep: 3276.45 | step_microstep: 37.97
[2024-06-10 20:14:29,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16051.75 | bwd: 46347.86 | bwd_inner: 43070.50 | bwd_allreduce: 3276.69 | step: 39.48
 65%|██████▌   | 1128/1726 [19:31:56<10:03:45, 60.58s/it]
 65%|██████▌   | 1129/1726 [19:33:00<10:11:49, 61.49s/it]
 65%|██████▌   | 1130/1726 [19:34:01<10:09:04, 61.32s/it]
 66%|██████▌   | 1131/1726 [19:35:01<10:05:40, 61.08s/it]
 66%|██████▌   | 1132/1726 [19:36:03<10:05:54, 61.20s/it]
 66%|██████▌   | 1133/1726 [19:37:
{'loss': 1.1506, 'learning_rate': 1.1158434337448822e-05, 'epoch': 0.66}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 20:14:31,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1432.47 | bwd_inner_microstep: 1432.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4057
[2024-06-10 20:14:33,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.40 | bwd_microstep: 1546.30 | bwd_inner_microstep: 1546.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 20:14:35,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.25 | bwd_microstep: 1481.07 | bwd_inner_microstep: 1481.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 20:14:36,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.66 | bwd_microstep: 1278.80 | bwd_inner_microstep: 1278.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 20:14:38,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1409.44 | bwd_inner_microstep: 1409.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1871
[2024-06-10 20:14:39,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.70 | bwd_microstep: 742.39 | bwd_inner_microstep: 742.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4085
[2024-06-10 20:14:42,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.96 | bwd_microstep: 1588.82 | bwd_inner_microstep: 1588.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 20:14:43,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.30 | bwd_microstep: 1296.47 | bwd_inner_microstep: 1296.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 20:14:45,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.11 | bwd_microstep: 801.94 | bwd_inner_microstep: 801.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1885
[2024-06-10 20:14:46,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.75 | bwd_microstep: 804.69 | bwd_inner_microstep: 804.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3502
[2024-06-10 20:14:48,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.91 | bwd_microstep: 1334.10 | bwd_inner_microstep: 1334.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3415
[2024-06-10 20:14:49,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.28 | bwd_microstep: 1405.41 | bwd_inner_microstep: 1405.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3511
[2024-06-10 20:14:52,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.07 | bwd_microstep: 1681.33 | bwd_inner_microstep: 1681.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 20:14:54,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 1510.44 | bwd_inner_microstep: 1510.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 20:14:55,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.23 | bwd_microstep: 791.42 | bwd_inner_microstep: 791.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3912
[2024-06-10 20:14:57,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.61 | bwd_microstep: 1685.52 | bwd_inner_microstep: 1685.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2021
[2024-06-10 20:14:58,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.51 | bwd_microstep: 713.28 | bwd_inner_microstep: 713.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 20:15:00,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1278.78 | bwd_inner_microstep: 1278.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 20:15:02,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1390.58 | bwd_inner_microstep: 1390.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 20:15:03,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 800.88 | bwd_inner_microstep: 800.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3427
[2024-06-10 20:15:05,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.33 | bwd_microstep: 1234.03 | bwd_inner_microstep: 1234.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3711
[2024-06-10 20:15:07,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1391.22 | bwd_inner_microstep: 1391.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514
[2024-06-10 20:15:08,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.05 | bwd_microstep: 1195.50 | bwd_inner_microstep: 1195.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 20:15:10,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.05 | bwd_microstep: 1278.35 | bwd_inner_microstep: 1278.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 20:15:12,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.09 | bwd_microstep: 1400.62 | bwd_inner_microstep: 1400.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3827
[2024-06-10 20:15:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.89 | bwd_microstep: 1265.40 | bwd_inner_microstep: 1265.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3779
[2024-06-10 20:15:16,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.56 | bwd_microstep: 1473.70 | bwd_inner_microstep: 1473.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 20:15:18,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.18 | bwd_microstep: 1282.46 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2657
[2024-06-10 20:15:19,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.06 | bwd_microstep: 1155.96 | bwd_inner_microstep: 1155.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3763
[2024-06-10 20:15:21,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.07 | bwd_microstep: 1477.55 | bwd_inner_microstep: 1477.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3463
[2024-06-10 20:15:23,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.96 | bwd_microstep: 1538.53 | bwd_inner_microstep: 1538.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 20:15:30,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.43 | optimizer_step: 6.60
[2024-06-10 20:15:30,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 5660.66 | bwd_inner_microstep: 1525.98 | bwd_allreduce_microstep: 4134.60 | step_microstep: 39.97
[2024-06-10 20:15:30,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15387.11 | bwd: 45328.13 | bwd_inner: 41192.59 | bwd_allreduce: 4134.85 | step: 41.43
{'loss': 1.1378, 'learning_rate': 1.1124782845998632e-05, 'epoch': 0.66}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 20:15:31,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1343.32 | bwd_inner_microstep: 1343.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3600
[2024-06-10 20:15:33,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.36 | bwd_microstep: 1435.63 | bwd_inner_microstep: 1435.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 20:15:35,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.13 | bwd_microstep: 1476.21 | bwd_inner_microstep: 1476.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1864
[2024-06-10 20:15:36,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.07 | bwd_microstep: 704.48 | bwd_inner_microstep: 704.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797
[2024-06-10 20:15:38,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.57 | bwd_microstep: 1444.87 | bwd_inner_microstep: 1444.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 20:15:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1247.87 | bwd_inner_microstep: 1247.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878
[2024-06-10 20:15:41,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.35 | bwd_microstep: 742.49 | bwd_inner_microstep: 742.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 20:15:43,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.25 | bwd_microstep: 1147.66 | bwd_inner_microstep: 1147.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 20:15:45,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1399.94 | bwd_inner_microstep: 1399.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1427
[2024-06-10 20:15:45,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.54 | bwd_microstep: 535.65 | bwd_inner_microstep: 535.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1897
[2024-06-10 20:15:47,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.55 | bwd_microstep: 746.30 | bwd_inner_microstep: 746.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3407
[2024-06-10 20:15:49,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1437.73 | bwd_inner_microstep: 1437.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656
[2024-06-10 20:15:51,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.08 | bwd_microstep: 1522.13 | bwd_inner_microstep: 1522.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499
[2024-06-10 20:15:53,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.22 | bwd_microstep: 1575.70 | bwd_inner_microstep: 1575.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 20:15:55,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.54 | bwd_microstep: 1353.54 | bwd_inner_microstep: 1353.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927
[2024-06-10 20:15:56,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.99 | bwd_microstep: 697.98 | bwd_inner_microstep: 697.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 20:15:57,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.28 | bwd_microstep: 975.17 | bwd_inner_microstep: 975.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 20:15:59,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.15 | bwd_microstep: 1461.15 | bwd_inner_microstep: 1461.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-10 20:16:00,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.98 | bwd_microstep: 804.20 | bwd_inner_microstep: 804.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 20:16:02,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.41 | bwd_microstep: 1532.70 | bwd_inner_microstep: 1532.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 20:16:04,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.28 | bwd_microstep: 1258.86 | bwd_inner_microstep: 1258.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 20:16:06,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.43 | bwd_microstep: 1480.54 | bwd_inner_microstep: 1480.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3848
[2024-06-10 20:16:08,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.13 | bwd_microstep: 1365.55 | bwd_inner_microstep: 1365.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 20:16:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.43 | bwd_microstep: 1500.55 | bwd_inner_microstep: 1500.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469
[2024-06-10 20:16:12,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.34 | bwd_microstep: 1215.62 | bwd_inner_microstep: 1215.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3713
[2024-06-10 20:16:14,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.31 | bwd_microstep: 1359.09 | bwd_inner_microstep: 1359.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3632
[2024-06-10 20:16:16,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1542.51 | bwd_inner_microstep: 1542.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2268
[2024-06-10 20:16:17,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.99 | bwd_microstep: 935.92 | bwd_inner_microstep: 935.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 20:16:19,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.89 | bwd_microstep: 1645.57 | bwd_inner_microstep: 1645.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-10 20:16:21,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.00 | bwd_microstep: 1450.57 | bwd_inner_microstep: 1450.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775
[2024-06-10 20:16:24,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.60 | bwd_microstep: 1741.05 | bwd_inner_microstep: 1741.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3809
[2024-06-10 20:16:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.36 | optimizer_step: 6.61
[2024-06-10 20:16:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.93 | bwd_microstep: 6565.02 | bwd_inner_microstep: 1788.55 | bwd_allreduce_microstep: 4776.39 | step_microstep: 38.85
[2024-06-10 20:16:31,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15219.80 | bwd: 45645.57 | bwd_inner: 40868.25 | bwd_allreduce: 4776.64 | step: 40.32
{'loss': 1.2201, 'learning_rate': 1.1091162613017113e-05, 'epoch': 0.66}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555
[2024-06-10 20:16:33,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1420.34 | bwd_inner_microstep: 1420.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2977
[2024-06-10 20:16:34,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.18 | bwd_microstep: 1096.95 | bwd_inner_microstep: 1096.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-10 20:16:36,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.82 | bwd_microstep: 1309.86 | bwd_inner_microstep: 1309.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 20:16:38,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.87 | bwd_microstep: 1649.93 | bwd_inner_microstep: 1649.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3771
[2024-06-10 20:16:40,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1490.94 | bwd_inner_microstep: 1490.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 4158
[2024-06-10 20:16:42,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.02 | bwd_microstep: 1347.65 | bwd_inner_microstep: 1347.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 20:16:43,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.13 | bwd_microstep: 789.56 | bwd_inner_microstep: 789.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3768
[2024-06-10 20:16:45,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.73 | bwd_microstep: 1341.62 | bwd_inner_microstep: 1341.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 20:16:47,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.55 | bwd_microstep: 1148.26 | bwd_inner_microstep: 1148.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-10 20:16:49,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1429.57 | bwd_inner_microstep: 1429.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 20:16:51,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1284.69 | bwd_inner_microstep: 1284.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3496
[2024-06-10 20:16:52,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.15 | bwd_microstep: 1330.73 | bwd_inner_microstep: 1330.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 20:16:54,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1484.01 | bwd_inner_microstep: 1483.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-10 20:16:56,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.02 | bwd_microstep: 1443.88 | bwd_inner_microstep: 1443.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3487
[2024-06-10 20:16:58,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.08 | bwd_microstep: 1439.93 | bwd_inner_microstep: 1439.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 20:17:00,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.40 | bwd_microstep: 1478.52 | bwd_inner_microstep: 1478.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3624
[2024-06-10 20:17:03,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.47 | bwd_microstep: 1464.08 | bwd_inner_microstep: 1464.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 20:17:04,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1374.29 | bwd_inner_microstep: 1374.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499
[2024-06-10 20:17:06,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1405.47 | bwd_inner_microstep: 1405.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 20:17:08,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.24 | bwd_microstep: 1510.35 | bwd_inner_microstep: 1510.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2295
[2024-06-10 20:17:10,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.91 | bwd_microstep: 814.11 | bwd_inner_microstep: 814.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-10 20:17:11,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.47 | bwd_microstep: 1311.88 | bwd_inner_microstep: 1311.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-10 20:17:13,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.55 | bwd_microstep: 1434.67 | bwd_inner_microstep: 1434.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2280
[2024-06-10 20:17:14,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.40 | bwd_microstep: 786.06 | bwd_inner_microstep: 786.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 20:17:17,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1556.71 | bwd_inner_microstep: 1556.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 20:17:18,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1349.20 | bwd_inner_microstep: 1349.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 20:17:20,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.13 | bwd_microstep: 1374.76 | bwd_inner_microstep: 1374.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3669
[2024-06-10 20:17:22,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.86 | bwd_microstep: 1450.64 | bwd_inner_microstep: 1450.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 20:17:25,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1598.55 | bwd_inner_microstep: 1598.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3448
[2024-06-10 20:17:26,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1300.31 | bwd_inner_microstep: 1300.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 20:17:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.13 | bwd_microstep: 1375.47 | bwd_inner_microstep: 1375.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3385
[2024-06-10 20:17:30,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 20:17:30,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.94 | bwd_microstep: 1535.19 | bwd_inner_microstep: 1230.41 | bwd_allreduce_microstep: 304.73 | step_microstep: 37.53
[2024-06-10 20:17:30,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16049.97 | bwd: 43128.23 | bwd_inner: 42822.61 | bwd_allreduce: 304.95 | step: 39.04
{'loss': 1.2306, 'learning_rate': 1.1057573756914573e-05, 'epoch': 0.66}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465
[2024-06-10 20:17:32,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.75 | bwd_microstep: 1570.22 | bwd_inner_microstep: 1570.14 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-10 20:17:33,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.01 | bwd_microstep: 676.57 | bwd_inner_microstep: 676.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 20:17:35,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.87 | bwd_microstep: 1353.80 | bwd_inner_microstep: 1353.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 20:17:37,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1282.69 | bwd_inner_microstep: 1282.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-10 20:17:39,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1437.01 | bwd_inner_microstep: 1436.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3489
[2024-06-10 20:17:41,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.71 | bwd_microstep: 1219.69 | bwd_inner_microstep: 1219.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 20:17:43,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.32 | bwd_microstep: 1291.48 | bwd_inner_microstep: 1291.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1965
[2024-06-10 20:17:44,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.09 | bwd_microstep: 732.33 | bwd_inner_microstep: 732.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 20:17:45,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.22 | bwd_microstep: 1381.22 | bwd_inner_microstep: 1381.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2462
[2024-06-10 20:17:47,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.72 | bwd_microstep: 921.69 | bwd_inner_microstep: 921.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 20:17:49,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.34 | bwd_microstep: 1287.87 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 20:17:50,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.05 | bwd_microstep: 1392.89 | bwd_inner_microstep: 1392.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2893
[2024-06-10 20:17:52,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.47 | bwd_microstep: 1279.57 | bwd_inner_microstep: 1279.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3930
[2024-06-10 20:17:54,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.85 | bwd_microstep: 1619.40 | bwd_inner_microstep: 1619.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3434
[2024-06-10 20:17:56,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1391.93 | bwd_inner_microstep: 1391.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3443
[2024-06-10 20:17:58,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2936
[2024-06-10 20:18:00,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.94 | bwd_microstep: 1285.51 | bwd_inner_microstep: 1285.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 20:18:02,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1343.55 | bwd_inner_microstep: 1343.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-10 20:18:04,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.36 | bwd_microstep: 1578.97 | bwd_inner_microstep: 1578.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3463
[2024-06-10 20:18:06,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.31 | bwd_microstep: 1359.31 | bwd_inner_microstep: 1359.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3608
[2024-06-10 20:18:08,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.19 | bwd_microstep: 1649.50 | bwd_inner_microstep: 1649.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-10 20:18:10,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1510.81 | bwd_inner_microstep: 1510.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-10 20:18:12,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.98 | bwd_microstep: 1580.49 | bwd_inner_microstep: 1580.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2071
[2024-06-10 20:18:14,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.34 | bwd_microstep: 879.70 | bwd_inner_microstep: 879.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546
[2024-06-10 20:18:16,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.74 | bwd_microstep: 1327.13 | bwd_inner_microstep: 1327.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 20:18:18,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1498.41 | bwd_inner_microstep: 1498.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 20:18:20,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1459.79 | bwd_inner_microstep: 1459.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 20:18:22,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 20:18:24,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.11 | bwd_microstep: 1499.47 | bwd_inner_microstep: 1499.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 20:18:26,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.91 | bwd_microstep: 1415.45 | bwd_inner_microstep: 1415.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2066
[2024-06-10 20:18:27,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.76 | bwd_microstep: 819.83 | bwd_inner_microstep: 819.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3626
[2024-06-10 20:18:32,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.07 | optimizer_step: 6.63
[2024-06-10 20:18:32,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 4658.55 | bwd_inner_microstep: 1737.10 | bwd_allreduce_microstep: 2921.40 | step_microstep: 37.70
[2024-06-10 20:18:32,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15860.52 | bwd: 45499.95 | bwd_inner: 42577.58 | bwd_allreduce: 2921.67 | step: 39.22
{'loss': 1.1829, 'learning_rate': 1.1024016395990758e-05, 'epoch': 0.66}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399
[2024-06-10 20:18:34,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.91 | bwd_microstep: 1359.31 | bwd_inner_microstep: 1359.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844
[2024-06-10 20:18:36,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.44 | bwd_microstep: 1461.35 | bwd_inner_microstep: 1461.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 20:18:38,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.77 | bwd_microstep: 1343.84 | bwd_inner_microstep: 1343.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2900
[2024-06-10 20:18:39,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.84 | bwd_microstep: 996.90 | bwd_inner_microstep: 996.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 20:18:41,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.62 | bwd_microstep: 1288.25 | bwd_inner_microstep: 1288.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 20:18:43,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.98 | bwd_microstep: 1254.45 | bwd_inner_microstep: 1254.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-10 20:18:44,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.90 | bwd_microstep: 1149.50 | bwd_inner_microstep: 1149.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 20:18:46,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1384.91 | bwd_inner_microstep: 1384.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422
[2024-06-10 20:18:48,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.98 | bwd_microstep: 1185.83 | bwd_inner_microstep: 1185.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 20:18:50,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1341.48 | bwd_inner_microstep: 1341.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3484
[2024-06-10 20:18:52,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.27 | bwd_microstep: 1622.53 | bwd_inner_microstep: 1622.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3534
[2024-06-10 20:18:54,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.66 | bwd_microstep: 1584.20 | bwd_inner_microstep: 1584.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3488
[2024-06-10 20:18:56,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.39 | bwd_microstep: 1442.60 | bwd_inner_microstep: 1442.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 20:18:58,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.76 | bwd_microstep: 1155.07 | bwd_inner_microstep: 1155.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3509
[2024-06-10 20:18:59,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1250.43 | bwd_inner_microstep: 1250.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-10 20:19:01,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.59 | bwd_microstep: 1287.64 | bwd_inner_microstep: 1287.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637
[2024-06-10 20:19:03,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.91 | bwd_microstep: 1613.83 | bwd_inner_microstep: 1613.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 20:19:05,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1248.37 | bwd_inner_microstep: 1248.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434
[2024-06-10 20:19:07,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.73 | bwd_microstep: 1188.05 | bwd_inner_microstep: 1188.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 20:19:09,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.90 | bwd_microstep: 1502.76 | bwd_inner_microstep: 1502.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 20:19:11,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.05 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 20:19:13,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.54 | bwd_microstep: 1498.30 | bwd_inner_microstep: 1498.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 20:19:15,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.76 | bwd_microstep: 1535.04 | bwd_inner_microstep: 1535.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3634
[2024-06-10 20:19:17,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.34 | bwd_microstep: 1316.67 | bwd_inner_microstep: 1316.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-10 20:19:19,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.96 | bwd_microstep: 1402.46 | bwd_inner_microstep: 1402.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 20:19:20,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.27 | bwd_microstep: 810.04 | bwd_inner_microstep: 810.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3429
[2024-06-10 20:19:22,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.32 | bwd_microstep: 1203.85 | bwd_inner_microstep: 1203.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3381
[2024-06-10 20:19:23,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.11 | bwd_microstep: 1439.12 | bwd_inner_microstep: 1439.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3565
[2024-06-10 20:19:26,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.53 | bwd_microstep: 1566.07 | bwd_inner_microstep: 1566.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 20:19:28,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.81 | bwd_microstep: 1505.93 | bwd_inner_microstep: 1505.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 20:19:30,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.96 | bwd_microstep: 1411.82 | bwd_inner_microstep: 1411.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3800
[2024-06-10 20:19:33,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 20:19:33,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.36 | bwd_microstep: 2302.94 | bwd_inner_microstep: 1916.53 | bwd_allreduce_microstep: 386.35 | step_microstep: 37.51
[2024-06-10 20:19:33,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16286.87 | bwd: 44034.35 | bwd_inner: 43647.10 | bwd_allreduce: 386.57 | step: 38.97
 66%|██████▌   | 1133/1726 [19:37:05<10:09:25, 61.66s/it]
 66%|██████▌   | 1134/1726 [19:38:06<10:06:33, 61.47s/it]
 66%|██████▌   | 1135/1726 [19:39:08<10:04:41, 61.39s/it]
 66%|██████▌   | 1136/1726 [19:40:07<9:58:06, 60.82s/it]
 66%|██████▌   | 1137/1726 [19:41:09<9:59:38, 61.08s/it]
 66%|██████▌   | 1138/1726 [19:42:09<9:57:21, 60
{'loss': 1.2093, 'learning_rate': 1.0990490648434541e-05, 'epoch': 0.66}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399
[2024-06-10 20:19:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.80 | bwd_microstep: 1371.79 | bwd_inner_microstep: 1371.71 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 20:19:36,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1244.41 | bwd_inner_microstep: 1244.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2352
[2024-06-10 20:19:38,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.72 | bwd_microstep: 1050.93 | bwd_inner_microstep: 1050.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239
[2024-06-10 20:19:39,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.84 | bwd_microstep: 960.33 | bwd_inner_microstep: 960.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-10 20:19:40,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.94 | bwd_microstep: 707.95 | bwd_inner_microstep: 707.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 20:19:42,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1245.28 | bwd_inner_microstep: 1245.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3714
[2024-06-10 20:19:44,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.50 | bwd_microstep: 1493.13 | bwd_inner_microstep: 1493.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 20:19:46,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1246.18 | bwd_inner_microstep: 1246.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742
[2024-06-10 20:19:48,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.20 | bwd_microstep: 1638.32 | bwd_inner_microstep: 1638.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 20:19:50,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.66 | bwd_microstep: 1249.51 | bwd_inner_microstep: 1249.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3425
[2024-06-10 20:19:51,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.89 | bwd_microstep: 1214.39 | bwd_inner_microstep: 1214.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 20:19:53,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.64 | bwd_microstep: 1380.68 | bwd_inner_microstep: 1380.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1956
[2024-06-10 20:19:54,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.16 | bwd_microstep: 891.28 | bwd_inner_microstep: 891.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3908
[2024-06-10 20:19:57,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.27 | bwd_microstep: 1735.93 | bwd_inner_microstep: 1735.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-10 20:19:59,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.00 | bwd_microstep: 1481.47 | bwd_inner_microstep: 1481.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 20:20:01,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.24 | bwd_microstep: 1306.06 | bwd_inner_microstep: 1306.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-10 20:20:03,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.39 | bwd_microstep: 1656.72 | bwd_inner_microstep: 1656.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949
[2024-06-10 20:20:04,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.87 | bwd_microstep: 698.83 | bwd_inner_microstep: 698.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3814
[2024-06-10 20:20:06,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.90 | bwd_microstep: 1384.98 | bwd_inner_microstep: 1384.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648
[2024-06-10 20:20:08,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1418.82 | bwd_inner_microstep: 1418.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 20:20:10,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1494.02 | bwd_inner_microstep: 1493.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2293
[2024-06-10 20:20:11,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 785.07 | bwd_inner_microstep: 785.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 20:20:13,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1254.04 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 20:20:15,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1380.21 | bwd_inner_microstep: 1380.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3804
[2024-06-10 20:20:17,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.10 | bwd_microstep: 1581.80 | bwd_inner_microstep: 1581.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548
[2024-06-10 20:20:19,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.26 | bwd_microstep: 1325.95 | bwd_inner_microstep: 1325.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3598
[2024-06-10 20:20:21,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.88 | bwd_microstep: 1533.86 | bwd_inner_microstep: 1533.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 20:20:23,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.86 | bwd_microstep: 1600.18 | bwd_inner_microstep: 1600.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2279
[2024-06-10 20:20:24,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.81 | bwd_microstep: 1074.41 | bwd_inner_microstep: 1074.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2221
[2024-06-10 20:20:26,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.63 | bwd_microstep: 1060.37 | bwd_inner_microstep: 1060.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565
[2024-06-10 20:20:28,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.59 | bwd_microstep: 1598.48 | bwd_inner_microstep: 1598.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3581
[2024-06-10 20:20:33,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 20:20:33,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 4893.32 | bwd_inner_microstep: 1526.79 | bwd_allreduce_microstep: 3366.48 | step_microstep: 37.90
[2024-06-10 20:20:33,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15503.36 | bwd: 44958.77 | bwd_inner: 41591.32 | bwd_allreduce: 3366.74 | step: 39.43
{'loss': 1.1462, 'learning_rate': 1.095699663232342e-05, 'epoch': 0.66}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-10 20:20:35,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.72 | bwd_microstep: 1296.52 | bwd_inner_microstep: 1296.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3951
[2024-06-10 20:20:37,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.15 | bwd_microstep: 1595.09 | bwd_inner_microstep: 1595.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 20:20:39,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1378.74 | bwd_inner_microstep: 1378.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 20:20:41,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.11 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 20:20:43,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1383.87 | bwd_inner_microstep: 1383.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 20:20:45,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.56 | bwd_microstep: 1150.67 | bwd_inner_microstep: 1150.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-10 20:20:47,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.51 | bwd_microstep: 1634.60 | bwd_inner_microstep: 1634.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 20:20:48,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.95 | bwd_microstep: 795.57 | bwd_inner_microstep: 795.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3491
[2024-06-10 20:20:50,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.10 | bwd_microstep: 1526.40 | bwd_inner_microstep: 1526.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495
[2024-06-10 20:20:52,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1575.86 | bwd_inner_microstep: 1575.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 20:20:54,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1477.58 | bwd_inner_microstep: 1477.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 20:20:56,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1447.79 | bwd_inner_microstep: 1447.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 20:20:58,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.82 | bwd_microstep: 804.89 | bwd_inner_microstep: 804.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3627
[2024-06-10 20:21:00,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.99 | bwd_microstep: 1705.32 | bwd_inner_microstep: 1705.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3473
[2024-06-10 20:21:02,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.90 | bwd_microstep: 1437.08 | bwd_inner_microstep: 1437.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 20:21:04,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.50 | bwd_microstep: 1375.67 | bwd_inner_microstep: 1375.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 20:21:06,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1395.46 | bwd_inner_microstep: 1395.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 20:21:07,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1289.66 | bwd_inner_microstep: 1289.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-10 20:21:10,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.40 | bwd_microstep: 1621.86 | bwd_inner_microstep: 1621.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 20:21:11,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2168
[2024-06-10 20:21:13,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.95 | bwd_microstep: 950.53 | bwd_inner_microstep: 950.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 20:21:15,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.34 | bwd_microstep: 1492.03 | bwd_inner_microstep: 1492.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 20:21:17,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.86 | bwd_microstep: 1280.11 | bwd_inner_microstep: 1280.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 20:21:19,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.43 | bwd_microstep: 1392.88 | bwd_inner_microstep: 1392.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3803
[2024-06-10 20:21:21,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.85 | bwd_microstep: 1600.53 | bwd_inner_microstep: 1600.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 20:21:23,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 1406.41 | bwd_inner_microstep: 1406.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 20:21:25,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1509.42 | bwd_inner_microstep: 1509.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 20:21:27,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.09 | bwd_microstep: 1416.13 | bwd_inner_microstep: 1416.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3612
[2024-06-10 20:21:29,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.75 | bwd_microstep: 1440.79 | bwd_inner_microstep: 1440.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 20:21:31,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1383.17 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743
[2024-06-10 20:21:33,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.44 | bwd_microstep: 1438.31 | bwd_inner_microstep: 1438.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068
[2024-06-10 20:21:37,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.14 | optimizer_step: 6.61
[2024-06-10 20:21:37,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.88 | bwd_microstep: 3903.11 | bwd_inner_microstep: 1044.29 | bwd_allreduce_microstep: 2858.77 | step_microstep: 38.83
[2024-06-10 20:21:37,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16369.06 | bwd: 46771.77 | bwd_inner: 43912.10 | bwd_allreduce: 2859.00 | step: 40.31
{'loss': 1.176, 'learning_rate': 1.0923534465623165e-05, 'epoch': 0.66}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 20:21:39,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.37 | bwd_microstep: 1469.07 | bwd_inner_microstep: 1469.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 20:21:41,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.42 | bwd_microstep: 1273.84 | bwd_inner_microstep: 1273.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511
[2024-06-10 20:21:43,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.58 | bwd_microstep: 1345.97 | bwd_inner_microstep: 1345.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-10 20:21:45,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1542.37 | bwd_inner_microstep: 1542.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3741
[2024-06-10 20:21:47,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1430.26 | bwd_inner_microstep: 1430.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 20:21:49,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1484.05 | bwd_inner_microstep: 1484.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 20:21:50,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 793.14 | bwd_inner_microstep: 793.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 20:21:52,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.04 | bwd_microstep: 1287.88 | bwd_inner_microstep: 1287.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3414
[2024-06-10 20:21:53,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.64 | bwd_microstep: 1200.16 | bwd_inner_microstep: 1200.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2097
[2024-06-10 20:21:54,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.09 | bwd_microstep: 759.49 | bwd_inner_microstep: 759.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 20:21:56,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1258.52 | bwd_inner_microstep: 1258.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1974
[2024-06-10 20:21:57,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.55 | bwd_microstep: 890.75 | bwd_inner_microstep: 890.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-10 20:21:59,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1410.36 | bwd_inner_microstep: 1410.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 20:22:01,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1387.07 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 20:22:03,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.78 | bwd_microstep: 1420.46 | bwd_inner_microstep: 1420.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 20:22:05,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1276.89 | bwd_inner_microstep: 1276.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431
[2024-06-10 20:22:07,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.02 | bwd_microstep: 1309.68 | bwd_inner_microstep: 1309.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-10 20:22:08,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1286.46 | bwd_inner_microstep: 1286.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2433
[2024-06-10 20:22:10,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.58 | bwd_microstep: 1043.39 | bwd_inner_microstep: 1043.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2353
[2024-06-10 20:22:11,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.54 | bwd_microstep: 830.33 | bwd_inner_microstep: 830.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2301
[2024-06-10 20:22:12,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.65 | bwd_microstep: 912.16 | bwd_inner_microstep: 912.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-10 20:22:14,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.55 | bwd_microstep: 1311.96 | bwd_inner_microstep: 1311.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481
[2024-06-10 20:22:16,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.76 | bwd_microstep: 1316.47 | bwd_inner_microstep: 1316.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 20:22:18,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1405.09 | bwd_inner_microstep: 1405.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 20:22:20,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1377.29 | bwd_inner_microstep: 1377.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 20:22:22,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.09 | bwd_microstep: 1299.53 | bwd_inner_microstep: 1299.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 20:22:24,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.22 | bwd_microstep: 1477.36 | bwd_inner_microstep: 1477.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 20:22:26,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.39 | bwd_microstep: 1644.11 | bwd_inner_microstep: 1644.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3803
[2024-06-10 20:22:28,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.11 | bwd_microstep: 1723.39 | bwd_inner_microstep: 1723.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 20:22:30,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1282.52 | bwd_inner_microstep: 1282.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 20:22:32,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.12 | bwd_microstep: 1558.00 | bwd_inner_microstep: 1557.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815
[2024-06-10 20:22:39,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 20:22:39,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.86 | bwd_microstep: 6479.32 | bwd_inner_microstep: 1987.76 | bwd_allreduce_microstep: 4491.51 | step_microstep: 37.95
[2024-06-10 20:22:39,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15664.23 | bwd: 46487.37 | bwd_inner: 41994.95 | bwd_allreduce: 4491.74 | step: 39.38
{'loss': 1.2104, 'learning_rate': 1.089010426618732e-05, 'epoch': 0.66}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 20:22:41,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1331.89 | bwd_inner_microstep: 1331.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2397
[2024-06-10 20:22:42,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.30 | bwd_microstep: 901.33 | bwd_inner_microstep: 901.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2718
[2024-06-10 20:22:44,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.05 | bwd_microstep: 1028.40 | bwd_inner_microstep: 1028.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 20:22:46,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.02 | bwd_microstep: 1145.78 | bwd_inner_microstep: 1145.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 20:22:47,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1389.12 | bwd_inner_microstep: 1389.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 20:22:49,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.63 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 20:22:51,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1249.64 | bwd_inner_microstep: 1249.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-10 20:22:52,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.02 | bwd_microstep: 807.88 | bwd_inner_microstep: 807.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703
[2024-06-10 20:22:54,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1628.09 | bwd_inner_microstep: 1628.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3438
[2024-06-10 20:22:56,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.53 | bwd_microstep: 1541.15 | bwd_inner_microstep: 1541.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3675
[2024-06-10 20:22:59,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.01 | bwd_microstep: 1718.63 | bwd_inner_microstep: 1718.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3966
[2024-06-10 20:23:01,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.00 | bwd_microstep: 1691.21 | bwd_inner_microstep: 1691.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-10 20:23:03,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.54 | bwd_microstep: 1513.36 | bwd_inner_microstep: 1513.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447
[2024-06-10 20:23:05,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.22 | bwd_microstep: 1373.50 | bwd_inner_microstep: 1373.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-10 20:23:07,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1336.47 | bwd_inner_microstep: 1336.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3224
[2024-06-10 20:23:08,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.54 | bwd_microstep: 1173.37 | bwd_inner_microstep: 1173.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 20:23:10,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1379.21 | bwd_inner_microstep: 1379.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1976
[2024-06-10 20:23:11,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.04 | bwd_microstep: 704.20 | bwd_inner_microstep: 704.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 20:23:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1510.74 | bwd_inner_microstep: 1510.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3870
[2024-06-10 20:23:15,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.28 | bwd_microstep: 1463.62 | bwd_inner_microstep: 1463.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 955
[2024-06-10 20:23:16,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.22 | bwd_microstep: 380.09 | bwd_inner_microstep: 380.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 20:23:18,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1248.40 | bwd_inner_microstep: 1248.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 20:23:20,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.09 | bwd_microstep: 1546.27 | bwd_inner_microstep: 1546.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 20:23:22,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1398.60 | bwd_inner_microstep: 1398.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813
[2024-06-10 20:23:24,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1356.82 | bwd_inner_microstep: 1356.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821
[2024-06-10 20:23:26,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.36 | bwd_microstep: 1418.68 | bwd_inner_microstep: 1418.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768
[2024-06-10 20:23:28,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.09 | bwd_microstep: 1468.43 | bwd_inner_microstep: 1468.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 20:23:30,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.00 | bwd_microstep: 1454.62 | bwd_inner_microstep: 1454.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3643
[2024-06-10 20:23:32,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.04 | bwd_microstep: 1616.05 | bwd_inner_microstep: 1616.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2057
[2024-06-10 20:23:33,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.75 | bwd_microstep: 847.93 | bwd_inner_microstep: 847.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-10 20:23:35,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.14 | bwd_microstep: 1193.00 | bwd_inner_microstep: 1192.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 20:23:42,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 20:23:42,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.52 | bwd_microstep: 7030.13 | bwd_inner_microstep: 1691.40 | bwd_allreduce_microstep: 5338.68 | step_microstep: 37.92
[2024-06-10 20:23:42,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15558.35 | bwd: 47093.67 | bwd_inner: 41754.09 | bwd_allreduce: 5338.90 | step: 39.36
{'loss': 1.2498, 'learning_rate': 1.0856706151756902e-05, 'epoch': 0.66}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418
[2024-06-10 20:23:44,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.33 | bwd_microstep: 1273.55 | bwd_inner_microstep: 1273.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 20:23:46,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.57 | bwd_microstep: 1277.47 | bwd_inner_microstep: 1277.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-10 20:23:47,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.97 | bwd_microstep: 872.88 | bwd_inner_microstep: 872.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 20:23:49,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.78 | bwd_microstep: 1378.47 | bwd_inner_microstep: 1378.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 20:23:51,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.43 | bwd_microstep: 1279.53 | bwd_inner_microstep: 1279.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760
[2024-06-10 20:23:53,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1437.99 | bwd_inner_microstep: 1437.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014
[2024-06-10 20:23:54,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.28 | bwd_microstep: 802.24 | bwd_inner_microstep: 802.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2937
[2024-06-10 20:23:55,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.09 | bwd_microstep: 1031.78 | bwd_inner_microstep: 1031.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739
[2024-06-10 20:23:57,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.54 | bwd_microstep: 1428.48 | bwd_inner_microstep: 1428.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 20:23:59,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.07 | bwd_microstep: 1251.87 | bwd_inner_microstep: 1251.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 20:24:01,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1346.87 | bwd_inner_microstep: 1346.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 20:24:03,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1458.05 | bwd_inner_microstep: 1458.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678
[2024-06-10 20:24:05,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.72 | bwd_microstep: 1616.82 | bwd_inner_microstep: 1616.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2797
[2024-06-10 20:24:07,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.80 | bwd_microstep: 1150.16 | bwd_inner_microstep: 1150.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1976
[2024-06-10 20:24:08,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.07 | bwd_microstep: 891.51 | bwd_inner_microstep: 891.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649
[2024-06-10 20:24:10,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.83 | bwd_microstep: 1516.23 | bwd_inner_microstep: 1516.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474
[2024-06-10 20:24:12,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.44 | bwd_microstep: 1581.73 | bwd_inner_microstep: 1581.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3525
[2024-06-10 20:24:14,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1519.37 | bwd_inner_microstep: 1519.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534
[2024-06-10 20:24:16,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.13 | bwd_microstep: 1227.24 | bwd_inner_microstep: 1227.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-10 20:24:18,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 1370.95 | bwd_inner_microstep: 1370.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 20:24:20,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.39 | bwd_microstep: 1450.99 | bwd_inner_microstep: 1450.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3400
[2024-06-10 20:24:22,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.41 | bwd_microstep: 1205.01 | bwd_inner_microstep: 1204.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3539
[2024-06-10 20:24:23,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.35 | bwd_microstep: 1197.76 | bwd_inner_microstep: 1197.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3809
[2024-06-10 20:24:26,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.40 | bwd_microstep: 1716.85 | bwd_inner_microstep: 1716.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181
[2024-06-10 20:24:27,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.16 | bwd_microstep: 864.51 | bwd_inner_microstep: 864.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 20:24:29,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.84 | bwd_microstep: 1348.74 | bwd_inner_microstep: 1348.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 20:24:31,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.62 | bwd_microstep: 1401.06 | bwd_inner_microstep: 1401.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 20:24:33,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.01 | bwd_microstep: 1626.91 | bwd_inner_microstep: 1626.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 20:24:35,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.37 | bwd_microstep: 1651.57 | bwd_inner_microstep: 1651.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569
[2024-06-10 20:24:37,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.00 | bwd_microstep: 1203.98 | bwd_inner_microstep: 1203.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-10 20:24:39,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1499.71 | bwd_inner_microstep: 1499.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2049
[2024-06-10 20:24:46,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.35 | optimizer_step: 6.64
[2024-06-10 20:24:46,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.55 | bwd_microstep: 7063.24 | bwd_inner_microstep: 1078.65 | bwd_allreduce_microstep: 5984.53 | step_microstep: 42.19
[2024-06-10 20:24:46,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15681.59 | bwd: 47943.55 | bwd_inner: 41958.10 | bwd_allreduce: 5984.76 | step: 43.67
 66%|██████▌   | 1138/1726 [19:42:09<9:57:21, 60.96s/it]
 66%|██████▌   | 1139/1726 [19:43:10<9:55:53, 60.91s/it]
 66%|██████▌   | 1140/1726 [19:44:14<10:02:25, 61.68s/it]
 66%|██████▌   | 1141/1726 [19:45:16<10:03:43, 61.92s/it]
 66%|██████▌   | 1142/1726 [19:46:19<10:05:46, 62.24s/it]
 66%|██████▌   | 1143/1726 [19:47:23<10:09:45, 62.75s/it]
{'loss': 1.163, 'learning_rate': 1.0823340239959883e-05, 'epoch': 0.66}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3475
[2024-06-10 20:24:48,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.14 | bwd_microstep: 1567.51 | bwd_inner_microstep: 1567.36 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 20:24:50,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1252.20 | bwd_inner_microstep: 1252.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2400
[2024-06-10 20:24:51,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.70 | bwd_microstep: 901.87 | bwd_inner_microstep: 901.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3801
[2024-06-10 20:24:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1480.15 | bwd_inner_microstep: 1480.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 20:24:55,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238
[2024-06-10 20:24:57,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.74 | bwd_microstep: 961.03 | bwd_inner_microstep: 961.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 20:24:59,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1348.45 | bwd_inner_microstep: 1348.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 20:25:01,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.23 | bwd_microstep: 1388.69 | bwd_inner_microstep: 1388.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 20:25:02,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.62 | bwd_microstep: 793.50 | bwd_inner_microstep: 793.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948
[2024-06-10 20:25:03,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.38 | bwd_microstep: 730.30 | bwd_inner_microstep: 730.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 20:25:05,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.45 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 20:25:06,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.45 | bwd_microstep: 799.61 | bwd_inner_microstep: 799.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3444
[2024-06-10 20:25:07,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1205.83 | bwd_inner_microstep: 1205.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3984
[2024-06-10 20:25:10,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.49 | bwd_microstep: 1810.60 | bwd_inner_microstep: 1810.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3652
[2024-06-10 20:25:12,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.81 | bwd_microstep: 1716.76 | bwd_inner_microstep: 1716.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3675
[2024-06-10 20:25:14,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.69 | bwd_microstep: 1356.96 | bwd_inner_microstep: 1356.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3521
[2024-06-10 20:25:16,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.07 | bwd_microstep: 1322.23 | bwd_inner_microstep: 1322.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 20:25:18,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.47 | bwd_microstep: 1257.92 | bwd_inner_microstep: 1257.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 20:25:20,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.43 | bwd_microstep: 1409.79 | bwd_inner_microstep: 1409.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3609
[2024-06-10 20:25:22,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.29 | bwd_microstep: 1673.22 | bwd_inner_microstep: 1673.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 20:25:24,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1401.84 | bwd_inner_microstep: 1401.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835
[2024-06-10 20:25:26,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.46 | bwd_microstep: 1456.51 | bwd_inner_microstep: 1456.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812
[2024-06-10 20:25:28,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.17 | bwd_microstep: 1514.41 | bwd_inner_microstep: 1514.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525
[2024-06-10 20:25:30,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.95 | bwd_microstep: 1328.12 | bwd_inner_microstep: 1328.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555
[2024-06-10 20:25:32,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.07 | bwd_microstep: 1298.66 | bwd_inner_microstep: 1298.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454
[2024-06-10 20:25:33,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.14 | bwd_microstep: 1317.82 | bwd_inner_microstep: 1317.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-10 20:25:36,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 1582.10 | bwd_inner_microstep: 1582.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3799
[2024-06-10 20:25:38,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.58 | bwd_microstep: 1622.38 | bwd_inner_microstep: 1622.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-10 20:25:40,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.24 | bwd_microstep: 1445.62 | bwd_inner_microstep: 1445.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3748
[2024-06-10 20:25:42,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.04 | bwd_microstep: 1738.33 | bwd_inner_microstep: 1738.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-10 20:25:44,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.62 | bwd_microstep: 1548.13 | bwd_inner_microstep: 1548.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 20:25:46,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.07 | optimizer_gradients: 4.04 | optimizer_step: 6.63
[2024-06-10 20:25:46,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1516.48 | bwd_inner_microstep: 1508.79 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.74
[2024-06-10 20:25:46,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16240.64 | bwd: 43512.32 | bwd_inner: 43503.66 | bwd_allreduce: 7.93 | step: 39.26
{'loss': 1.1666, 'learning_rate': 1.0790006648310828e-05, 'epoch': 0.66}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1887
[2024-06-10 20:25:47,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.29 | bwd_microstep: 773.58 | bwd_inner_microstep: 773.50 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 20:25:49,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.10 | bwd_microstep: 1355.63 | bwd_inner_microstep: 1355.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3952
[2024-06-10 20:25:51,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.66 | bwd_microstep: 1501.93 | bwd_inner_microstep: 1501.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 20:25:53,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1283.19 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480
[2024-06-10 20:25:55,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.82 | bwd_microstep: 1186.01 | bwd_inner_microstep: 1185.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 20:25:57,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.88 | bwd_microstep: 1301.24 | bwd_inner_microstep: 1301.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-10 20:25:59,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.34 | bwd_microstep: 1635.82 | bwd_inner_microstep: 1635.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4084
[2024-06-10 20:26:01,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.48 | bwd_microstep: 1729.08 | bwd_inner_microstep: 1729.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3712
[2024-06-10 20:26:04,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.91 | bwd_microstep: 1631.23 | bwd_inner_microstep: 1631.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968
[2024-06-10 20:26:05,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.89 | bwd_microstep: 709.06 | bwd_inner_microstep: 709.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3458
[2024-06-10 20:26:06,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1423.70 | bwd_inner_microstep: 1423.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 20:26:08,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1343.45 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397
[2024-06-10 20:26:10,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.54 | bwd_microstep: 1439.79 | bwd_inner_microstep: 1439.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3730
[2024-06-10 20:26:13,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.09 | bwd_microstep: 1730.10 | bwd_inner_microstep: 1730.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518
[2024-06-10 20:26:15,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.92 | bwd_microstep: 1584.45 | bwd_inner_microstep: 1584.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 20:26:17,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.96 | bwd_microstep: 1608.84 | bwd_inner_microstep: 1608.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3751
[2024-06-10 20:26:20,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.20 | bwd_microstep: 1844.23 | bwd_inner_microstep: 1844.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3646
[2024-06-10 20:26:22,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.86 | bwd_microstep: 1542.46 | bwd_inner_microstep: 1542.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2037
[2024-06-10 20:26:23,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.63 | bwd_microstep: 717.94 | bwd_inner_microstep: 717.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 618
[2024-06-10 20:26:23,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.56 | bwd_microstep: 260.31 | bwd_inner_microstep: 260.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 20:26:25,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.00 | bwd_microstep: 1461.47 | bwd_inner_microstep: 1461.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 20:26:27,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.65 | bwd_microstep: 1379.89 | bwd_inner_microstep: 1379.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 20:26:29,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.54 | bwd_microstep: 1422.58 | bwd_inner_microstep: 1422.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3446
[2024-06-10 20:26:31,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.69 | bwd_microstep: 1312.54 | bwd_inner_microstep: 1312.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 20:26:33,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.43 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3609
[2024-06-10 20:26:35,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.63 | bwd_microstep: 1370.54 | bwd_inner_microstep: 1370.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 20:26:37,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.58 | bwd_microstep: 1551.16 | bwd_inner_microstep: 1551.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1907
[2024-06-10 20:26:38,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.27 | bwd_microstep: 685.78 | bwd_inner_microstep: 685.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 20:26:39,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.10 | bwd_microstep: 1157.01 | bwd_inner_microstep: 1156.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3789
[2024-06-10 20:26:42,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.94 | bwd_microstep: 1718.14 | bwd_inner_microstep: 1718.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438
[2024-06-10 20:26:44,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.38 | bwd_microstep: 1453.79 | bwd_inner_microstep: 1453.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3427
[2024-06-10 20:26:49,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.27 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 20:26:49,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 4740.03 | bwd_inner_microstep: 1695.19 | bwd_allreduce_microstep: 3044.79 | step_microstep: 38.40
[2024-06-10 20:26:49,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16028.88 | bwd: 46235.06 | bwd_inner: 43189.30 | bwd_allreduce: 3045.06 | step: 39.84
{'loss': 1.1101, 'learning_rate': 1.0756705494210489e-05, 'epoch': 0.66}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 20:26:51,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.14 | bwd_microstep: 1341.16 | bwd_inner_microstep: 1341.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3549
[2024-06-10 20:26:53,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1359.37 | bwd_inner_microstep: 1359.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 20:26:55,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3798
[2024-06-10 20:26:57,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.70 | bwd_microstep: 1647.54 | bwd_inner_microstep: 1647.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 20:26:59,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.19 | bwd_microstep: 1285.35 | bwd_inner_microstep: 1285.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 20:27:01,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1384.18 | bwd_inner_microstep: 1384.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 20:27:02,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.79 | bwd_microstep: 1247.49 | bwd_inner_microstep: 1247.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-10 20:27:04,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.08 | bwd_microstep: 1151.24 | bwd_inner_microstep: 1151.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3577
[2024-06-10 20:27:06,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.54 | bwd_microstep: 1207.79 | bwd_inner_microstep: 1207.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 20:27:08,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.12 | bwd_microstep: 1388.34 | bwd_inner_microstep: 1388.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688
[2024-06-10 20:27:10,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1526.76 | bwd_inner_microstep: 1526.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2187
[2024-06-10 20:27:11,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.52 | bwd_microstep: 859.37 | bwd_inner_microstep: 859.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637
[2024-06-10 20:27:13,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1319.19 | bwd_inner_microstep: 1319.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1990
[2024-06-10 20:27:14,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.62 | bwd_microstep: 898.60 | bwd_inner_microstep: 898.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 20:27:16,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.06 | bwd_microstep: 1388.72 | bwd_inner_microstep: 1388.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3126
[2024-06-10 20:27:17,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.83 | bwd_microstep: 1062.97 | bwd_inner_microstep: 1062.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3827
[2024-06-10 20:27:20,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.77 | bwd_microstep: 1761.23 | bwd_inner_microstep: 1761.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 20:27:22,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3822
[2024-06-10 20:27:24,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1498.10 | bwd_inner_microstep: 1498.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3536
[2024-06-10 20:27:26,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.48 | bwd_microstep: 1420.42 | bwd_inner_microstep: 1420.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 20:27:27,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.57 | bwd_microstep: 1256.48 | bwd_inner_microstep: 1256.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 20:27:29,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1399.50 | bwd_inner_microstep: 1399.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012
[2024-06-10 20:27:30,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.88 | bwd_microstep: 806.38 | bwd_inner_microstep: 806.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2020
[2024-06-10 20:27:32,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.41 | bwd_microstep: 899.33 | bwd_inner_microstep: 899.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3603
[2024-06-10 20:27:34,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.08 | bwd_microstep: 1565.28 | bwd_inner_microstep: 1565.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 20:27:36,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.23 | bwd_microstep: 1651.64 | bwd_inner_microstep: 1651.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 20:27:38,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 1451.86 | bwd_inner_microstep: 1451.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-10 20:27:39,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.65 | bwd_microstep: 810.06 | bwd_inner_microstep: 810.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 20:27:41,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.24 | bwd_microstep: 1489.27 | bwd_inner_microstep: 1489.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 20:27:43,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.94 | bwd_microstep: 1249.95 | bwd_inner_microstep: 1249.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 20:27:45,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.33 | bwd_microstep: 1401.49 | bwd_inner_microstep: 1401.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 20:27:51,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 20:27:51,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 5061.60 | bwd_inner_microstep: 1687.96 | bwd_allreduce_microstep: 3373.58 | step_microstep: 38.19
[2024-06-10 20:27:51,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15751.88 | bwd: 45557.72 | bwd_inner: 42183.22 | bwd_allreduce: 3373.82 | step: 39.66
{'loss': 1.2439, 'learning_rate': 1.0723436894945345e-05, 'epoch': 0.66}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 20:27:52,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.83 | bwd_microstep: 1302.98 | bwd_inner_microstep: 1302.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3964
[2024-06-10 20:27:55,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.84 | bwd_microstep: 1596.09 | bwd_inner_microstep: 1596.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3835
[2024-06-10 20:27:57,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.53 | bwd_microstep: 1484.61 | bwd_inner_microstep: 1484.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 20:27:59,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1341.56 | bwd_inner_microstep: 1341.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 20:28:00,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1275.94 | bwd_inner_microstep: 1275.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743
[2024-06-10 20:28:02,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.28 | bwd_microstep: 1438.28 | bwd_inner_microstep: 1438.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2729
[2024-06-10 20:28:04,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.37 | bwd_microstep: 1037.83 | bwd_inner_microstep: 1037.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 20:28:05,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1254.16 | bwd_inner_microstep: 1254.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-10 20:28:07,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1411.97 | bwd_inner_microstep: 1411.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2089
[2024-06-10 20:28:09,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.19 | bwd_microstep: 866.89 | bwd_inner_microstep: 866.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 20:28:10,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1255.35 | bwd_inner_microstep: 1255.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3486
[2024-06-10 20:28:12,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1404.11 | bwd_inner_microstep: 1404.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 20:28:14,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.98 | bwd_microstep: 1381.86 | bwd_inner_microstep: 1381.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3435
[2024-06-10 20:28:16,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.56 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3650
[2024-06-10 20:28:18,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.75 | bwd_microstep: 1711.93 | bwd_inner_microstep: 1711.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 20:28:20,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.78 | bwd_microstep: 1343.32 | bwd_inner_microstep: 1343.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2078
[2024-06-10 20:28:22,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.33 | bwd_microstep: 1011.11 | bwd_inner_microstep: 1011.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 20:28:23,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.07 | bwd_microstep: 1293.33 | bwd_inner_microstep: 1293.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 20:28:26,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.93 | bwd_microstep: 1659.23 | bwd_inner_microstep: 1659.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3651
[2024-06-10 20:28:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.17 | bwd_microstep: 1446.75 | bwd_inner_microstep: 1446.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717
[2024-06-10 20:28:30,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.13 | bwd_microstep: 1634.47 | bwd_inner_microstep: 1634.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-10 20:28:32,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.75 | bwd_microstep: 1562.06 | bwd_inner_microstep: 1562.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 923
[2024-06-10 20:28:33,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 156.34 | bwd_microstep: 405.80 | bwd_inner_microstep: 405.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2303
[2024-06-10 20:28:34,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.67 | bwd_microstep: 848.83 | bwd_inner_microstep: 848.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-10 20:28:36,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1496.94 | bwd_inner_microstep: 1496.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3607
[2024-06-10 20:28:38,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1243.39 | bwd_inner_microstep: 1243.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 20:28:40,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.08 | bwd_microstep: 1463.15 | bwd_inner_microstep: 1463.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3598
[2024-06-10 20:28:42,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.49 | bwd_microstep: 1450.81 | bwd_inner_microstep: 1450.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 20:28:44,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1407.95 | bwd_inner_microstep: 1407.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603
[2024-06-10 20:28:46,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.51 | bwd_microstep: 1463.18 | bwd_inner_microstep: 1463.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 20:28:47,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.45 | bwd_microstep: 1348.62 | bwd_inner_microstep: 1348.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 20:28:51,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 20:28:51,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.68 | bwd_microstep: 2583.20 | bwd_inner_microstep: 1324.83 | bwd_allreduce_microstep: 1258.32 | step_microstep: 37.78
[2024-06-10 20:28:51,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15854.33 | bwd: 43735.90 | bwd_inner: 42476.68 | bwd_allreduce: 1258.55 | step: 39.18
{'loss': 1.2171, 'learning_rate': 1.0690200967687234e-05, 'epoch': 0.66}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 20:28:53,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.82 | bwd_microstep: 1475.69 | bwd_inner_microstep: 1475.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3987
[2024-06-10 20:28:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.67 | bwd_microstep: 1701.44 | bwd_inner_microstep: 1701.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 20:28:57,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1344.07 | bwd_inner_microstep: 1344.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 20:28:59,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1554.38 | bwd_inner_microstep: 1554.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 20:29:01,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1545.40 | bwd_inner_microstep: 1545.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3748
[2024-06-10 20:29:04,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.38 | bwd_microstep: 1845.31 | bwd_inner_microstep: 1845.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 20:29:05,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.03 | bwd_microstep: 1252.85 | bwd_inner_microstep: 1252.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1366
[2024-06-10 20:29:06,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 201.66 | bwd_microstep: 519.59 | bwd_inner_microstep: 519.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 20:29:08,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1376.25 | bwd_inner_microstep: 1376.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3440
[2024-06-10 20:29:10,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1313.34 | bwd_inner_microstep: 1313.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 20:29:12,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1291.82 | bwd_inner_microstep: 1291.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 20:29:13,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1381.71 | bwd_inner_microstep: 1381.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3613
[2024-06-10 20:29:16,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.25 | bwd_microstep: 1560.09 | bwd_inner_microstep: 1560.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 20:29:17,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.57 | bwd_microstep: 1350.99 | bwd_inner_microstep: 1350.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952
[2024-06-10 20:29:19,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.57 | bwd_microstep: 890.81 | bwd_inner_microstep: 890.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3655
[2024-06-10 20:29:21,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.26 | bwd_microstep: 1817.63 | bwd_inner_microstep: 1817.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 20:29:23,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1345.00 | bwd_inner_microstep: 1344.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1963
[2024-06-10 20:29:24,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.75 | bwd_microstep: 702.34 | bwd_inner_microstep: 702.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 20:29:26,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1508.65 | bwd_inner_microstep: 1508.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 20:29:28,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1398.19 | bwd_inner_microstep: 1398.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-10 20:29:29,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.84 | bwd_microstep: 684.43 | bwd_inner_microstep: 684.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 20:29:31,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.81 | bwd_microstep: 1382.48 | bwd_inner_microstep: 1382.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 20:29:33,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.60 | bwd_microstep: 1288.52 | bwd_inner_microstep: 1288.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3668
[2024-06-10 20:29:34,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.92 | bwd_microstep: 1230.10 | bwd_inner_microstep: 1230.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3474
[2024-06-10 20:29:36,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1230.04 | bwd_inner_microstep: 1230.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-10 20:29:38,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1423.43 | bwd_inner_microstep: 1423.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 20:29:40,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.23 | bwd_microstep: 1473.28 | bwd_inner_microstep: 1473.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1982
[2024-06-10 20:29:41,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.04 | bwd_microstep: 927.26 | bwd_inner_microstep: 927.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 20:29:43,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.22 | bwd_microstep: 1375.69 | bwd_inner_microstep: 1375.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 20:29:45,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1493.06 | bwd_inner_microstep: 1493.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 20:29:47,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1528.28 | bwd_inner_microstep: 1528.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3460
[2024-06-10 20:29:53,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.19 | optimizer_step: 6.62
[2024-06-10 20:29:53,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.49 | bwd_microstep: 4567.51 | bwd_inner_microstep: 1696.95 | bwd_allreduce_microstep: 2870.51 | step_microstep: 38.04
[2024-06-10 20:29:53,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15968.36 | bwd: 45779.63 | bwd_inner: 42908.21 | bwd_allreduce: 2870.74 | step: 39.47


 66%|██████▌   | 1143/1726 [19:47:23<10:09:45, 62.75s/it]
 66%|██████▋   | 1144/1726 [19:48:23<10:00:56, 61.95s/it]
 66%|██████▋   | 1145/1726 [19:49:26<10:01:46, 62.15s/it]
 66%|██████▋   | 1146/1726 [19:50:27<9:59:15, 61.99s/it]
 66%|██████▋   | 1147/1726 [19:51:27<9:52:12, 61.37s/it]
 67%|██████▋   | 1148/1726 [19:52:29<9:53:13, 61.58s/it]
{'loss': 1.1854, 'learning_rate': 1.0656997829492912e-05, 'epoch': 0.67}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 20:29:55,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1459.90 | bwd_inner_microstep: 1459.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3388
[2024-06-10 20:29:56,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.26 | bwd_microstep: 1140.83 | bwd_inner_microstep: 1140.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 20:29:58,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.01 | bwd_microstep: 1338.28 | bwd_inner_microstep: 1338.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3868
[2024-06-10 20:30:00,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.90 | bwd_microstep: 1561.27 | bwd_inner_microstep: 1561.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 20:30:02,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1381.88 | bwd_inner_microstep: 1381.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 20:30:04,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.09 | bwd_microstep: 1285.84 | bwd_inner_microstep: 1285.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-10 20:30:06,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.23 | bwd_microstep: 1187.26 | bwd_inner_microstep: 1187.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3739
[2024-06-10 20:30:08,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.67 | bwd_microstep: 1634.39 | bwd_inner_microstep: 1634.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 20:30:10,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.68 | bwd_microstep: 1247.59 | bwd_inner_microstep: 1247.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 20:30:11,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.12 | bwd_microstep: 1387.99 | bwd_inner_microstep: 1387.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 852
[2024-06-10 20:30:12,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 134.01 | bwd_microstep: 347.91 | bwd_inner_microstep: 347.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 20:30:14,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.55 | bwd_microstep: 1247.55 | bwd_inner_microstep: 1247.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446
[2024-06-10 20:30:15,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1192.79 | bwd_inner_microstep: 1192.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1977
[2024-06-10 20:30:17,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.40 | bwd_microstep: 856.10 | bwd_inner_microstep: 856.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3698
[2024-06-10 20:30:19,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.18 | bwd_microstep: 1653.59 | bwd_inner_microstep: 1653.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 20:30:21,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1377.23 | bwd_inner_microstep: 1377.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492
[2024-06-10 20:30:22,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.37 | bwd_microstep: 1190.45 | bwd_inner_microstep: 1190.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 20:30:24,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.01 | bwd_microstep: 1286.30 | bwd_inner_microstep: 1286.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 20:30:26,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.25 | bwd_microstep: 1384.10 | bwd_inner_microstep: 1384.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 20:30:28,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1391.91 | bwd_inner_microstep: 1391.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3622
[2024-06-10 20:30:30,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.68 | bwd_microstep: 1371.88 | bwd_inner_microstep: 1371.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3625
[2024-06-10 20:30:32,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.51 | bwd_microstep: 1442.45 | bwd_inner_microstep: 1442.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3646
[2024-06-10 20:30:34,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1472.61 | bwd_inner_microstep: 1472.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3491
[2024-06-10 20:30:36,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.84 | bwd_microstep: 1252.19 | bwd_inner_microstep: 1252.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072
[2024-06-10 20:30:37,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.71 | bwd_microstep: 917.05 | bwd_inner_microstep: 917.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3721
[2024-06-10 20:30:39,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.88 | bwd_microstep: 1583.88 | bwd_inner_microstep: 1583.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 20:30:41,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1351.87 | bwd_inner_microstep: 1351.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1844
[2024-06-10 20:30:42,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.84 | bwd_microstep: 671.08 | bwd_inner_microstep: 671.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 20:30:44,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1495.31 | bwd_inner_microstep: 1495.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2233
[2024-06-10 20:30:45,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.20 | bwd_microstep: 1060.07 | bwd_inner_microstep: 1060.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-10 20:30:47,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.73 | bwd_microstep: 1448.12 | bwd_inner_microstep: 1448.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-10 20:30:54,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.10 | optimizer_step: 6.58
[2024-06-10 20:30:54,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.63 | bwd_microstep: 5784.29 | bwd_inner_microstep: 911.37 | bwd_allreduce_microstep: 4872.87 | step_microstep: 37.95
[2024-06-10 20:30:54,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15163.78 | bwd: 45403.95 | bwd_inner: 40530.18 | bwd_allreduce: 4873.10 | step: 39.41
{'loss': 1.2743, 'learning_rate': 1.0623827597303679e-05, 'epoch': 0.67}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3457
[2024-06-10 20:30:55,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.22 | bwd_microstep: 1232.77 | bwd_inner_microstep: 1232.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3392
[2024-06-10 20:30:57,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.96 | bwd_microstep: 1302.73 | bwd_inner_microstep: 1302.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 20:30:59,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1478.42 | bwd_inner_microstep: 1478.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 20:31:01,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.11 | bwd_microstep: 1380.79 | bwd_inner_microstep: 1380.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 854
[2024-06-10 20:31:01,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.59 | bwd_microstep: 346.98 | bwd_inner_microstep: 346.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-10 20:31:03,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.44 | bwd_microstep: 1185.64 | bwd_inner_microstep: 1185.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-10 20:31:04,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.27 | bwd_microstep: 789.62 | bwd_inner_microstep: 789.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400
[2024-06-10 20:31:06,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.77 | bwd_microstep: 1179.29 | bwd_inner_microstep: 1179.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-10 20:31:08,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.91 | bwd_microstep: 1189.91 | bwd_inner_microstep: 1189.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3971
[2024-06-10 20:31:10,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.04 | bwd_microstep: 1650.72 | bwd_inner_microstep: 1650.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072
[2024-06-10 20:31:11,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.89 | bwd_microstep: 912.74 | bwd_inner_microstep: 912.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3491
[2024-06-10 20:31:13,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.90 | bwd_microstep: 1531.47 | bwd_inner_microstep: 1531.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 20:31:15,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1377.91 | bwd_inner_microstep: 1377.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 20:31:17,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.68 | bwd_microstep: 1381.28 | bwd_inner_microstep: 1381.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 20:31:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1251.51 | bwd_inner_microstep: 1251.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2112
[2024-06-10 20:31:20,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.37 | bwd_microstep: 982.21 | bwd_inner_microstep: 982.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-10 20:31:22,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.85 | bwd_microstep: 1624.64 | bwd_inner_microstep: 1624.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 20:31:24,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.10 | bwd_microstep: 1452.51 | bwd_inner_microstep: 1452.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-10 20:31:26,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.32 | bwd_microstep: 1611.05 | bwd_inner_microstep: 1611.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 20:31:28,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1251.21 | bwd_inner_microstep: 1251.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3822
[2024-06-10 20:31:30,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.04 | bwd_microstep: 1622.60 | bwd_inner_microstep: 1622.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 20:31:32,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1287.79 | bwd_inner_microstep: 1287.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2070
[2024-06-10 20:31:33,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.79 | bwd_microstep: 848.11 | bwd_inner_microstep: 848.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3679
[2024-06-10 20:31:35,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.48 | bwd_microstep: 1326.90 | bwd_inner_microstep: 1326.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 20:31:37,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.56 | bwd_microstep: 1404.95 | bwd_inner_microstep: 1404.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 20:31:39,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1399.17 | bwd_inner_microstep: 1399.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3380
[2024-06-10 20:31:41,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.54 | bwd_microstep: 1304.01 | bwd_inner_microstep: 1303.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3418
[2024-06-10 20:31:43,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.23 | bwd_microstep: 1376.87 | bwd_inner_microstep: 1376.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276
[2024-06-10 20:31:44,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.31 | bwd_microstep: 1004.34 | bwd_inner_microstep: 1004.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3768
[2024-06-10 20:31:46,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.23 | bwd_microstep: 1374.50 | bwd_inner_microstep: 1374.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3572
[2024-06-10 20:31:48,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.82 | bwd_microstep: 1646.36 | bwd_inner_microstep: 1646.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3555
[2024-06-10 20:31:55,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.38 | optimizer_step: 6.60
[2024-06-10 20:31:55,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.99 | bwd_microstep: 6173.92 | bwd_inner_microstep: 1376.05 | bwd_allreduce_microstep: 4797.80 | step_microstep: 38.79
[2024-06-10 20:31:55,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.83 | bwd: 45882.92 | bwd_inner: 41084.19 | bwd_allreduce: 4798.05 | step: 40.27
{'loss': 1.203, 'learning_rate': 1.059069038794489e-05, 'epoch': 0.67}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3473
[2024-06-10 20:31:57,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.32 | bwd_microstep: 1432.03 | bwd_inner_microstep: 1432.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2128
[2024-06-10 20:31:58,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.73 | bwd_microstep: 924.16 | bwd_inner_microstep: 924.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4510
[2024-06-10 20:32:01,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.71 | bwd_microstep: 1640.28 | bwd_inner_microstep: 1640.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3876
[2024-06-10 20:32:03,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.32 | bwd_microstep: 1683.67 | bwd_inner_microstep: 1683.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2404
[2024-06-10 20:32:04,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.99 | bwd_microstep: 1001.96 | bwd_inner_microstep: 1001.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 20:32:06,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.89 | bwd_microstep: 1344.20 | bwd_inner_microstep: 1344.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 20:32:08,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1283.94 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 20:32:09,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.12 | bwd_microstep: 790.94 | bwd_inner_microstep: 790.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748
[2024-06-10 20:32:11,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.97 | bwd_microstep: 1635.26 | bwd_inner_microstep: 1635.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 20:32:13,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.79 | bwd_microstep: 1280.68 | bwd_inner_microstep: 1280.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-10 20:32:15,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.61 | bwd_microstep: 1612.56 | bwd_inner_microstep: 1612.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3752
[2024-06-10 20:32:18,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.80 | bwd_microstep: 1680.52 | bwd_inner_microstep: 1680.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-10 20:32:20,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.02 | bwd_microstep: 1629.75 | bwd_inner_microstep: 1629.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 20:32:22,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.14 | bwd_microstep: 1341.48 | bwd_inner_microstep: 1341.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 20:32:24,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1487.56 | bwd_inner_microstep: 1487.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674
[2024-06-10 20:32:26,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.92 | bwd_microstep: 1689.29 | bwd_inner_microstep: 1689.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 20:32:27,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.53 | bwd_microstep: 793.87 | bwd_inner_microstep: 793.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-10 20:32:29,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.57 | bwd_microstep: 1181.93 | bwd_inner_microstep: 1181.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 20:32:31,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.29 | bwd_microstep: 1279.60 | bwd_inner_microstep: 1279.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 20:32:33,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.26 | bwd_microstep: 1556.47 | bwd_inner_microstep: 1556.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2135
[2024-06-10 20:32:34,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.04 | bwd_microstep: 736.65 | bwd_inner_microstep: 736.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2000
[2024-06-10 20:32:35,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.33 | bwd_microstep: 737.22 | bwd_inner_microstep: 737.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 20:32:37,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1534.07 | bwd_inner_microstep: 1534.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 20:32:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.04 | bwd_microstep: 1498.31 | bwd_inner_microstep: 1498.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 20:32:41,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.02 | bwd_microstep: 1283.19 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2192
[2024-06-10 20:32:42,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.80 | bwd_microstep: 795.60 | bwd_inner_microstep: 795.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 20:32:44,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1256.99 | bwd_inner_microstep: 1256.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 20:32:46,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1656.32 | bwd_inner_microstep: 1656.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2077
[2024-06-10 20:32:47,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.52 | bwd_microstep: 997.78 | bwd_inner_microstep: 997.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 20:32:49,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.72 | bwd_microstep: 1600.74 | bwd_inner_microstep: 1600.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 20:32:52,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.48 | bwd_microstep: 1652.50 | bwd_inner_microstep: 1652.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2368
[2024-06-10 20:32:59,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 20:32:59,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.64 | bwd_microstep: 6859.02 | bwd_inner_microstep: 1277.64 | bwd_allreduce_microstep: 5581.31 | step_microstep: 38.74
[2024-06-10 20:32:59,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15709.37 | bwd: 47878.55 | bwd_inner: 42296.31 | bwd_allreduce: 5581.55 | step: 40.17
{'loss': 1.1962, 'learning_rate': 1.055758631812565e-05, 'epoch': 0.67}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3470
[2024-06-10 20:33:01,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1494.26 | bwd_inner_microstep: 1494.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 20:33:03,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.55 | bwd_microstep: 1157.26 | bwd_inner_microstep: 1157.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 20:33:04,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.32 | bwd_microstep: 1275.43 | bwd_inner_microstep: 1275.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 20:33:06,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.74 | bwd_microstep: 1477.48 | bwd_inner_microstep: 1477.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3776
[2024-06-10 20:33:09,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1488.22 | bwd_inner_microstep: 1488.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750
[2024-06-10 20:33:11,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.89 | bwd_microstep: 1432.25 | bwd_inner_microstep: 1432.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-10 20:33:12,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.83 | bwd_microstep: 794.63 | bwd_inner_microstep: 794.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 20:33:14,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-10 20:33:16,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.50 | bwd_microstep: 1526.98 | bwd_inner_microstep: 1526.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 20:33:18,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.67 | bwd_microstep: 1388.82 | bwd_inner_microstep: 1388.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-10 20:33:19,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.94 | bwd_microstep: 798.21 | bwd_inner_microstep: 798.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915
[2024-06-10 20:33:20,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.87 | bwd_microstep: 689.20 | bwd_inner_microstep: 689.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3678
[2024-06-10 20:33:22,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1402.71 | bwd_inner_microstep: 1402.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2774
[2024-06-10 20:33:23,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.47 | bwd_microstep: 1245.99 | bwd_inner_microstep: 1245.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 20:33:25,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.53 | bwd_microstep: 1384.69 | bwd_inner_microstep: 1384.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663
[2024-06-10 20:33:28,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.81 | bwd_microstep: 1716.73 | bwd_inner_microstep: 1716.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 20:33:29,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.01 | bwd_microstep: 798.04 | bwd_inner_microstep: 798.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3675
[2024-06-10 20:33:30,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.43 | bwd_microstep: 1230.70 | bwd_inner_microstep: 1230.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-10 20:33:32,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.38 | bwd_microstep: 1295.45 | bwd_inner_microstep: 1295.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3611
[2024-06-10 20:33:34,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.91 | bwd_microstep: 1212.76 | bwd_inner_microstep: 1212.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2144
[2024-06-10 20:33:35,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.01 | bwd_microstep: 834.69 | bwd_inner_microstep: 834.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 20:33:37,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.13 | bwd_microstep: 1288.83 | bwd_inner_microstep: 1288.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-10 20:33:39,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.96 | bwd_microstep: 1637.63 | bwd_inner_microstep: 1637.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 20:33:41,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.42 | bwd_microstep: 1400.43 | bwd_inner_microstep: 1400.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-10 20:33:42,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.35 | bwd_microstep: 697.29 | bwd_inner_microstep: 697.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 20:33:44,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.02 | bwd_microstep: 1292.56 | bwd_inner_microstep: 1292.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2627
[2024-06-10 20:33:45,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.84 | bwd_microstep: 1017.02 | bwd_inner_microstep: 1016.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 20:33:47,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.45 | bwd_microstep: 1601.20 | bwd_inner_microstep: 1601.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 20:33:49,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1376.47 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3574
[2024-06-10 20:33:51,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.23 | bwd_microstep: 1554.63 | bwd_inner_microstep: 1554.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-10 20:33:54,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.58 | bwd_microstep: 1604.79 | bwd_inner_microstep: 1604.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3426
[2024-06-10 20:33:58,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.60
[2024-06-10 20:33:58,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.72 | bwd_microstep: 3987.27 | bwd_inner_microstep: 1599.53 | bwd_allreduce_microstep: 2387.69 | step_microstep: 38.07
[2024-06-10 20:33:58,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15325.94 | bwd: 43488.17 | bwd_inner: 41099.56 | bwd_allreduce: 2387.91 | step: 39.59
{'loss': 1.2328, 'learning_rate': 1.0524515504438302e-05, 'epoch': 0.67}
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4539
[2024-06-10 20:34:01,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 711.65 | bwd_microstep: 1930.23 | bwd_inner_microstep: 1930.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941
[2024-06-10 20:34:02,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.21 | bwd_microstep: 725.15 | bwd_inner_microstep: 725.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 20:34:04,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.21 | bwd_microstep: 1481.73 | bwd_inner_microstep: 1481.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 20:34:06,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1386.10 | bwd_inner_microstep: 1386.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 20:34:08,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1437.25 | bwd_inner_microstep: 1437.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 859
[2024-06-10 20:34:08,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 136.05 | bwd_microstep: 347.95 | bwd_inner_microstep: 347.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 20:34:10,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1244.21 | bwd_inner_microstep: 1244.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697
[2024-06-10 20:34:12,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.48 | bwd_microstep: 1420.90 | bwd_inner_microstep: 1420.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870
[2024-06-10 20:34:13,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.28 | bwd_microstep: 679.57 | bwd_inner_microstep: 679.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511
[2024-06-10 20:34:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.26 | bwd_microstep: 1412.29 | bwd_inner_microstep: 1412.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 20:34:17,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.60 | bwd_microstep: 1310.27 | bwd_inner_microstep: 1310.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1964
[2024-06-10 20:34:18,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.19 | bwd_microstep: 920.89 | bwd_inner_microstep: 920.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954
[2024-06-10 20:34:19,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.66 | bwd_microstep: 889.88 | bwd_inner_microstep: 889.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2780
[2024-06-10 20:34:20,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.22 | bwd_microstep: 954.32 | bwd_inner_microstep: 954.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651
[2024-06-10 20:34:23,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.51 | bwd_microstep: 1517.41 | bwd_inner_microstep: 1517.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 20:34:25,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1490.52 | bwd_inner_microstep: 1490.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-10 20:34:27,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.36 | bwd_microstep: 1434.66 | bwd_inner_microstep: 1434.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636
[2024-06-10 20:34:29,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.14 | bwd_microstep: 1611.82 | bwd_inner_microstep: 1611.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462
[2024-06-10 20:34:31,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.58 | bwd_microstep: 1309.02 | bwd_inner_microstep: 1309.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-10 20:34:32,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.82 | bwd_microstep: 1196.05 | bwd_inner_microstep: 1196.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 20:34:34,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.54 | bwd_microstep: 1276.96 | bwd_inner_microstep: 1276.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 20:34:36,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.10 | bwd_microstep: 1555.52 | bwd_inner_microstep: 1555.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3890
[2024-06-10 20:34:38,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.11 | bwd_microstep: 1489.90 | bwd_inner_microstep: 1489.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-10 20:34:40,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.77 | bwd_microstep: 1599.50 | bwd_inner_microstep: 1599.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-10 20:34:43,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.22 | bwd_microstep: 1575.27 | bwd_inner_microstep: 1575.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 20:34:45,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1391.25 | bwd_inner_microstep: 1391.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3817
[2024-06-10 20:34:47,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.96 | bwd_microstep: 1484.85 | bwd_inner_microstep: 1484.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 20:34:49,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.84 | bwd_microstep: 1454.26 | bwd_inner_microstep: 1454.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 20:34:51,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1380.19 | bwd_inner_microstep: 1380.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 20:34:53,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.94 | bwd_microstep: 1643.04 | bwd_inner_microstep: 1643.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564
[2024-06-10 20:34:55,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.08 | bwd_microstep: 1593.22 | bwd_inner_microstep: 1593.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059
[2024-06-10 20:35:01,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.29 | optimizer_step: 6.59
[2024-06-10 20:35:01,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.73 | bwd_microstep: 5494.82 | bwd_inner_microstep: 933.32 | bwd_allreduce_microstep: 4561.44 | step_microstep: 39.07
[2024-06-10 20:35:01,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15708.15 | bwd: 46639.03 | bwd_inner: 42076.68 | bwd_allreduce: 4561.67 | step: 40.53


 67%|██████▋   | 1148/1726 [19:52:29<9:53:13, 61.58s/it]
 67%|██████▋   | 1149/1726 [19:53:30<9:50:12, 61.37s/it]
 67%|██████▋   | 1150/1726 [19:54:32<9:49:42, 61.43s/it]
 67%|██████▋   | 1151/1726 [19:55:36<9:55:51, 62.18s/it]
 67%|██████▋   | 1152/1726 [19:56:35<9:46:05, 61.26s/it]
 67%|██████▋   | 1153/1726 [19:57:38<9:49:06, 61.69s/it]
{'loss': 1.1984, 'learning_rate': 1.0491478063358096e-05, 'epoch': 0.67}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3405
[2024-06-10 20:35:03,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.66 | bwd_microstep: 1436.23 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 20:35:05,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.94 | bwd_microstep: 1245.64 | bwd_inner_microstep: 1245.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882
[2024-06-10 20:35:06,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.95 | bwd_microstep: 711.04 | bwd_inner_microstep: 711.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 20:35:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.27 | bwd_microstep: 1350.91 | bwd_inner_microstep: 1350.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2637
[2024-06-10 20:35:09,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.96 | bwd_microstep: 1112.46 | bwd_inner_microstep: 1112.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3414
[2024-06-10 20:35:11,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.06 | bwd_microstep: 1209.16 | bwd_inner_microstep: 1209.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238
[2024-06-10 20:35:12,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.66 | bwd_microstep: 959.56 | bwd_inner_microstep: 959.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 20:35:13,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.66 | bwd_microstep: 970.46 | bwd_inner_microstep: 970.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 20:35:15,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.05 | bwd_microstep: 1457.13 | bwd_inner_microstep: 1457.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2210
[2024-06-10 20:35:16,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.85 | bwd_microstep: 891.39 | bwd_inner_microstep: 891.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4057
[2024-06-10 20:35:19,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.40 | bwd_microstep: 1622.52 | bwd_inner_microstep: 1622.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 20:35:21,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.67 | bwd_microstep: 1418.40 | bwd_inner_microstep: 1418.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3670
[2024-06-10 20:35:23,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1366.92 | bwd_inner_microstep: 1366.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 20:35:24,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1276.47 | bwd_inner_microstep: 1276.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487
[2024-06-10 20:35:26,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.02 | bwd_microstep: 1406.26 | bwd_inner_microstep: 1406.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-10 20:35:28,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1406.73 | bwd_inner_microstep: 1406.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3438
[2024-06-10 20:35:30,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.42 | bwd_microstep: 1215.04 | bwd_inner_microstep: 1215.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3671
[2024-06-10 20:35:32,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.48 | bwd_microstep: 1454.42 | bwd_inner_microstep: 1454.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3626
[2024-06-10 20:35:34,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.20 | bwd_microstep: 1433.29 | bwd_inner_microstep: 1433.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 20:35:36,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.84 | bwd_microstep: 1454.40 | bwd_inner_microstep: 1454.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 20:35:38,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.13 | bwd_microstep: 1299.21 | bwd_inner_microstep: 1299.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 20:35:40,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.43 | bwd_microstep: 1555.06 | bwd_inner_microstep: 1555.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 20:35:42,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1392.69 | bwd_inner_microstep: 1392.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 20:35:44,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1557.62 | bwd_inner_microstep: 1557.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2150
[2024-06-10 20:35:45,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.30 | bwd_microstep: 948.92 | bwd_inner_microstep: 948.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 20:35:47,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1258.51 | bwd_inner_microstep: 1258.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3817
[2024-06-10 20:35:49,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.82 | bwd_microstep: 1479.23 | bwd_inner_microstep: 1479.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3721
[2024-06-10 20:35:51,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.81 | bwd_microstep: 1467.82 | bwd_inner_microstep: 1467.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3377
[2024-06-10 20:35:53,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.33 | bwd_microstep: 1365.58 | bwd_inner_microstep: 1365.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3396
[2024-06-10 20:35:55,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.42 | bwd_microstep: 1277.39 | bwd_inner_microstep: 1277.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-10 20:35:57,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.49 | bwd_microstep: 1747.99 | bwd_inner_microstep: 1747.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 20:36:01,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.05 | optimizer_step: 6.61
[2024-06-10 20:36:01,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.84 | bwd_microstep: 3197.21 | bwd_inner_microstep: 1664.48 | bwd_allreduce_microstep: 1532.67 | step_microstep: 37.58
[2024-06-10 20:36:01,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15816.53 | bwd: 43945.66 | bwd_inner: 42412.08 | bwd_allreduce: 1532.90 | step: 39.00
{'loss': 1.1819, 'learning_rate': 1.0458474111242723e-05, 'epoch': 0.67}
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2677
[2024-06-10 20:36:02,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.24 | bwd_microstep: 1015.32 | bwd_inner_microstep: 1015.15 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3904
[2024-06-10 20:36:05,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.12 | bwd_microstep: 1684.83 | bwd_inner_microstep: 1684.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 20:36:07,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.33 | bwd_microstep: 1484.49 | bwd_inner_microstep: 1484.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892
[2024-06-10 20:36:09,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1583.81 | bwd_inner_microstep: 1583.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-10 20:36:11,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.57 | bwd_microstep: 1402.39 | bwd_inner_microstep: 1402.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 20:36:13,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1386.32 | bwd_inner_microstep: 1386.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937
[2024-06-10 20:36:14,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.55 | bwd_microstep: 759.52 | bwd_inner_microstep: 759.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1926
[2024-06-10 20:36:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.24 | bwd_microstep: 725.27 | bwd_inner_microstep: 725.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1896
[2024-06-10 20:36:16,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.61 | bwd_microstep: 776.55 | bwd_inner_microstep: 776.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 20:36:18,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1384.85 | bwd_inner_microstep: 1384.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3670
[2024-06-10 20:36:19,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.12 | bwd_microstep: 1228.96 | bwd_inner_microstep: 1228.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 20:36:22,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.83 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687
[2024-06-10 20:36:24,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.04 | bwd_microstep: 1616.45 | bwd_inner_microstep: 1616.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 20:36:26,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.27 | bwd_microstep: 1343.71 | bwd_inner_microstep: 1343.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2954
[2024-06-10 20:36:27,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.85 | bwd_microstep: 1131.42 | bwd_inner_microstep: 1131.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3837
[2024-06-10 20:36:29,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.20 | bwd_microstep: 1354.48 | bwd_inner_microstep: 1354.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2081
[2024-06-10 20:36:30,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.86 | bwd_microstep: 916.51 | bwd_inner_microstep: 916.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 20:36:32,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.01 | bwd_microstep: 1381.10 | bwd_inner_microstep: 1381.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-10 20:36:34,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1414.65 | bwd_inner_microstep: 1414.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2494
[2024-06-10 20:36:35,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.41 | bwd_microstep: 955.69 | bwd_inner_microstep: 955.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834
[2024-06-10 20:36:38,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.35 | bwd_microstep: 1657.22 | bwd_inner_microstep: 1657.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 20:36:40,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3681
[2024-06-10 20:36:42,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1479.57 | bwd_inner_microstep: 1479.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-10 20:36:44,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.72 | bwd_microstep: 1443.20 | bwd_inner_microstep: 1443.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3815
[2024-06-10 20:36:46,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.33 | bwd_microstep: 1719.60 | bwd_inner_microstep: 1719.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 20:36:48,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.76 | bwd_microstep: 1553.40 | bwd_inner_microstep: 1553.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 20:36:50,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.15 | bwd_microstep: 1280.80 | bwd_inner_microstep: 1280.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3728
[2024-06-10 20:36:52,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.45 | bwd_microstep: 1560.76 | bwd_inner_microstep: 1560.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 20:36:54,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1348.86 | bwd_inner_microstep: 1348.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 20:36:56,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 1546.32 | bwd_inner_microstep: 1546.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 20:36:58,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1551.73 | bwd_inner_microstep: 1551.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3782
[2024-06-10 20:37:04,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-10 20:37:04,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.99 | bwd_microstep: 5024.77 | bwd_inner_microstep: 1410.28 | bwd_allreduce_microstep: 3614.44 | step_microstep: 37.88
[2024-06-10 20:37:04,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15966.37 | bwd: 46483.80 | bwd_inner: 42868.34 | bwd_allreduce: 3614.73 | step: 39.45
{'loss': 1.2227, 'learning_rate': 1.0425503764331925e-05, 'epoch': 0.67}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 20:37:05,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.71 | bwd_microstep: 778.62 | bwd_inner_microstep: 778.56 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 20:37:06,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1243.23 | bwd_inner_microstep: 1243.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 20:37:09,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.48 | bwd_microstep: 1455.38 | bwd_inner_microstep: 1455.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 20:37:10,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.76 | bwd_microstep: 1388.03 | bwd_inner_microstep: 1388.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 20:37:12,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1248.05 | bwd_inner_microstep: 1248.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 20:37:13,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.16 | bwd_microstep: 677.83 | bwd_inner_microstep: 677.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-10 20:37:15,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.50 | bwd_microstep: 1435.30 | bwd_inner_microstep: 1435.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-10 20:37:17,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.31 | bwd_microstep: 1149.53 | bwd_inner_microstep: 1149.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 20:37:19,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942
[2024-06-10 20:37:20,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.00 | bwd_microstep: 821.79 | bwd_inner_microstep: 821.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3845
[2024-06-10 20:37:22,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1484.35 | bwd_inner_microstep: 1484.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3670
[2024-06-10 20:37:24,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 20:37:26,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 1341.04 | bwd_inner_microstep: 1341.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-10 20:37:27,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.67 | bwd_microstep: 1318.02 | bwd_inner_microstep: 1317.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3461
[2024-06-10 20:37:29,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.94 | bwd_microstep: 1342.87 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3826
[2024-06-10 20:37:31,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.45 | bwd_microstep: 1320.41 | bwd_inner_microstep: 1320.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 20:37:33,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.35 | bwd_microstep: 1287.29 | bwd_inner_microstep: 1287.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3952
[2024-06-10 20:37:35,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.35 | bwd_microstep: 1599.25 | bwd_inner_microstep: 1599.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 20:37:37,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1607.89 | bwd_inner_microstep: 1607.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 20:37:39,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1278.69 | bwd_inner_microstep: 1278.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1987
[2024-06-10 20:37:40,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.00 | bwd_microstep: 735.66 | bwd_inner_microstep: 735.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454
[2024-06-10 20:37:42,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.57 | bwd_microstep: 1157.98 | bwd_inner_microstep: 1157.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 20:37:44,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.73 | bwd_microstep: 1393.38 | bwd_inner_microstep: 1393.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723
[2024-06-10 20:37:46,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.62 | bwd_microstep: 1436.35 | bwd_inner_microstep: 1436.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 20:37:47,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.03 | bwd_microstep: 795.36 | bwd_inner_microstep: 795.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551
[2024-06-10 20:37:49,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1298.24 | bwd_inner_microstep: 1298.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 20:37:50,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.22 | bwd_microstep: 875.67 | bwd_inner_microstep: 875.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558
[2024-06-10 20:37:52,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1429.19 | bwd_inner_microstep: 1429.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3566
[2024-06-10 20:37:54,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.47 | bwd_microstep: 1558.46 | bwd_inner_microstep: 1558.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 20:37:56,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.37 | bwd_microstep: 1281.13 | bwd_inner_microstep: 1281.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3795
[2024-06-10 20:37:58,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.45 | bwd_microstep: 1514.07 | bwd_inner_microstep: 1514.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 20:38:06,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.56
[2024-06-10 20:38:06,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 7172.61 | bwd_inner_microstep: 1579.35 | bwd_allreduce_microstep: 5593.20 | step_microstep: 38.26
[2024-06-10 20:38:06,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15256.79 | bwd: 46290.32 | bwd_inner: 40696.17 | bwd_allreduce: 5593.46 | step: 39.73
{'loss': 1.191, 'learning_rate': 1.0392567138747101e-05, 'epoch': 0.67}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 20:38:07,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.95 | bwd_microstep: 1237.80 | bwd_inner_microstep: 1237.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 20:38:09,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.00 | bwd_microstep: 1276.05 | bwd_inner_microstep: 1276.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2431
[2024-06-10 20:38:10,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.49 | bwd_microstep: 1032.86 | bwd_inner_microstep: 1032.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3892
[2024-06-10 20:38:12,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.17 | bwd_microstep: 1411.78 | bwd_inner_microstep: 1411.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 20:38:14,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.69 | bwd_microstep: 1374.95 | bwd_inner_microstep: 1374.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 20:38:16,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.00 | bwd_microstep: 1536.04 | bwd_inner_microstep: 1536.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 20:38:18,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1389.20 | bwd_inner_microstep: 1389.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-10 20:38:20,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.32 | bwd_microstep: 1190.37 | bwd_inner_microstep: 1190.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 20:38:22,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1389.85 | bwd_inner_microstep: 1389.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3482
[2024-06-10 20:38:24,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.50 | bwd_microstep: 1510.07 | bwd_inner_microstep: 1510.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3496
[2024-06-10 20:38:26,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.65 | bwd_microstep: 1365.98 | bwd_inner_microstep: 1365.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 894
[2024-06-10 20:38:26,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.31 | bwd_microstep: 367.68 | bwd_inner_microstep: 367.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1969
[2024-06-10 20:38:27,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.38 | bwd_microstep: 764.57 | bwd_inner_microstep: 764.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3694
[2024-06-10 20:38:30,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.27 | bwd_microstep: 1557.17 | bwd_inner_microstep: 1557.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3735
[2024-06-10 20:38:32,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.45 | bwd_microstep: 1696.02 | bwd_inner_microstep: 1696.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 20:38:34,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.90 | bwd_microstep: 1479.27 | bwd_inner_microstep: 1479.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4004
[2024-06-10 20:38:36,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.93 | bwd_microstep: 1812.81 | bwd_inner_microstep: 1812.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 20:38:38,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1395.03 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3552
[2024-06-10 20:38:40,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1380.99 | bwd_inner_microstep: 1380.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 20:38:42,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1403.55 | bwd_inner_microstep: 1403.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 20:38:44,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.81 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-10 20:38:45,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 974.03 | bwd_inner_microstep: 974.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 20:38:47,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.50 | bwd_microstep: 1283.51 | bwd_inner_microstep: 1283.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 20:38:48,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.74 | bwd_microstep: 973.62 | bwd_inner_microstep: 973.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 20:38:51,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.32 | bwd_microstep: 1655.91 | bwd_inner_microstep: 1655.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 20:38:53,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1393.36 | bwd_inner_microstep: 1393.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 20:38:54,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.20 | bwd_microstep: 1293.51 | bwd_inner_microstep: 1293.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 20:38:56,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.84 | bwd_microstep: 1395.13 | bwd_inner_microstep: 1395.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438
[2024-06-10 20:38:58,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.79 | bwd_microstep: 1448.11 | bwd_inner_microstep: 1448.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 20:39:00,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.87 | bwd_microstep: 1253.32 | bwd_inner_microstep: 1253.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3611
[2024-06-10 20:39:02,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.94 | bwd_microstep: 1707.42 | bwd_inner_microstep: 1707.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3453
[2024-06-10 20:39:06,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.84 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 20:39:06,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.60 | bwd_microstep: 2561.30 | bwd_inner_microstep: 1565.45 | bwd_allreduce_microstep: 995.81 | step_microstep: 39.74
[2024-06-10 20:39:06,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15946.78 | bwd: 43795.76 | bwd_inner: 42799.06 | bwd_allreduce: 996.03 | step: 41.23
{'loss': 1.2321, 'learning_rate': 1.035966435049086e-05, 'epoch': 0.67}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 20:39:08,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.82 | bwd_microstep: 1473.82 | bwd_inner_microstep: 1473.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 20:39:10,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1383.40 | bwd_inner_microstep: 1383.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4505
[2024-06-10 20:39:12,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.15 | bwd_microstep: 1638.02 | bwd_inner_microstep: 1638.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 20:39:14,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.56 | bwd_microstep: 1478.99 | bwd_inner_microstep: 1478.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2248
[2024-06-10 20:39:15,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.15 | bwd_microstep: 964.86 | bwd_inner_microstep: 964.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 20:39:17,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.69 | bwd_microstep: 1637.92 | bwd_inner_microstep: 1637.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 20:39:19,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.88 | bwd_microstep: 1339.85 | bwd_inner_microstep: 1339.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 20:39:20,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.37 | bwd_microstep: 677.96 | bwd_inner_microstep: 677.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 20:39:21,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.87 | bwd_microstep: 790.34 | bwd_inner_microstep: 790.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 20:39:23,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.28 | bwd_microstep: 1410.20 | bwd_inner_microstep: 1410.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 20:39:25,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 20:39:27,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.21 | bwd_microstep: 1622.49 | bwd_inner_microstep: 1622.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 20:39:29,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1485.02 | bwd_inner_microstep: 1484.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3506
[2024-06-10 20:39:31,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.89 | bwd_microstep: 1429.10 | bwd_inner_microstep: 1429.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 20:39:33,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1251.91 | bwd_inner_microstep: 1251.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-10 20:39:35,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.36 | bwd_microstep: 1421.31 | bwd_inner_microstep: 1421.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 20:39:37,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.62 | bwd_microstep: 1480.76 | bwd_inner_microstep: 1480.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1990
[2024-06-10 20:39:38,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.13 | bwd_microstep: 737.01 | bwd_inner_microstep: 736.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827
[2024-06-10 20:39:40,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.90 | bwd_microstep: 1387.78 | bwd_inner_microstep: 1387.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987
[2024-06-10 20:39:41,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.34 | bwd_microstep: 705.82 | bwd_inner_microstep: 705.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705
[2024-06-10 20:39:43,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.80 | bwd_microstep: 1527.55 | bwd_inner_microstep: 1527.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3562
[2024-06-10 20:39:45,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1264.17 | bwd_inner_microstep: 1264.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3708
[2024-06-10 20:39:47,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.57 | bwd_microstep: 1265.08 | bwd_inner_microstep: 1265.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3684
[2024-06-10 20:39:48,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1322.99 | bwd_inner_microstep: 1322.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2005
[2024-06-10 20:39:49,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.44 | bwd_microstep: 771.05 | bwd_inner_microstep: 771.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2963
[2024-06-10 20:39:51,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.15 | bwd_microstep: 1137.98 | bwd_inner_microstep: 1137.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3790
[2024-06-10 20:39:53,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.28 | bwd_microstep: 1455.63 | bwd_inner_microstep: 1455.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3859
[2024-06-10 20:39:55,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.44 | bwd_microstep: 1399.12 | bwd_inner_microstep: 1399.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3616
[2024-06-10 20:39:57,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1542.46 | bwd_inner_microstep: 1542.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032
[2024-06-10 20:39:58,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.74 | bwd_microstep: 902.76 | bwd_inner_microstep: 902.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 20:40:01,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.94 | bwd_microstep: 1638.16 | bwd_inner_microstep: 1638.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596
[2024-06-10 20:40:07,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 20:40:07,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.49 | bwd_microstep: 5836.19 | bwd_inner_microstep: 1811.82 | bwd_allreduce_microstep: 4024.32 | step_microstep: 38.13
[2024-06-10 20:40:07,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15494.55 | bwd: 45627.68 | bwd_inner: 41602.46 | bwd_allreduce: 4024.54 | step: 39.63
{'loss': 1.2522, 'learning_rate': 1.0326795515446666e-05, 'epoch': 0.67}


 67%|██████▋   | 1153/1726 [19:57:38<9:49:06, 61.69s/it]
 67%|██████▋   | 1154/1726 [19:58:38<9:43:30, 61.21s/it]
 67%|██████▋   | 1155/1726 [19:59:40<9:46:59, 61.68s/it]
 67%|██████▋   | 1156/1726 [20:00:42<9:46:29, 61.74s/it]
 67%|██████▋   | 1157/1726 [20:01:42<9:40:43, 61.24s/it]
 67%|██████▋   | 1158/1726 [20:02:44<9:40:19, 61.30s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 20:40:09,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1442.73 | bwd_inner_microstep: 1442.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904
[2024-06-10 20:40:11,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.87 | bwd_microstep: 1581.82 | bwd_inner_microstep: 1581.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856
[2024-06-10 20:40:14,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.35 | bwd_microstep: 1662.75 | bwd_inner_microstep: 1662.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 20:40:15,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1287.24 | bwd_inner_microstep: 1287.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 20:40:17,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.09 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 20:40:19,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.95 | bwd_microstep: 1488.35 | bwd_inner_microstep: 1488.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 20:40:21,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.74 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 20:40:23,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.13 | bwd_microstep: 1385.49 | bwd_inner_microstep: 1385.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 20:40:25,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1385.68 | bwd_inner_microstep: 1385.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3493
[2024-06-10 20:40:27,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.20 | bwd_microstep: 1219.27 | bwd_inner_microstep: 1219.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 20:40:29,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.27 | bwd_microstep: 1475.97 | bwd_inner_microstep: 1475.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 20:40:30,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.50 | bwd_microstep: 1256.84 | bwd_inner_microstep: 1256.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-10 20:40:33,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.75 | bwd_microstep: 1631.52 | bwd_inner_microstep: 1631.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 20:40:35,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.29 | bwd_microstep: 1478.10 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 20:40:37,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.25 | bwd_microstep: 1416.56 | bwd_inner_microstep: 1416.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678
[2024-06-10 20:40:39,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.09 | bwd_microstep: 1456.11 | bwd_inner_microstep: 1456.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2138
[2024-06-10 20:40:40,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.22 | bwd_microstep: 832.25 | bwd_inner_microstep: 832.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839
[2024-06-10 20:40:42,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.91 | bwd_microstep: 1464.48 | bwd_inner_microstep: 1464.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1875
[2024-06-10 20:40:43,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.21 | bwd_microstep: 715.19 | bwd_inner_microstep: 715.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2130
[2024-06-10 20:40:44,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.16 | bwd_microstep: 836.82 | bwd_inner_microstep: 836.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 20:40:46,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1511.38 | bwd_inner_microstep: 1511.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073
[2024-06-10 20:40:47,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.68 | bwd_microstep: 818.14 | bwd_inner_microstep: 818.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3552
[2024-06-10 20:40:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.32 | bwd_microstep: 1262.06 | bwd_inner_microstep: 1262.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3765
[2024-06-10 20:40:51,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.89 | bwd_microstep: 1476.22 | bwd_inner_microstep: 1476.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-10 20:40:53,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.17 | bwd_microstep: 1217.87 | bwd_inner_microstep: 1217.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 20:40:55,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.63 | bwd_microstep: 1557.64 | bwd_inner_microstep: 1557.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 20:40:57,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.37 | bwd_microstep: 1597.42 | bwd_inner_microstep: 1597.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3810
[2024-06-10 20:40:59,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1321.76 | bwd_inner_microstep: 1321.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3764
[2024-06-10 20:41:01,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.82 | bwd_microstep: 1474.25 | bwd_inner_microstep: 1474.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 20:41:03,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.06 | bwd_microstep: 1544.05 | bwd_inner_microstep: 1544.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891
[2024-06-10 20:41:04,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.26 | bwd_microstep: 775.20 | bwd_inner_microstep: 775.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382
[2024-06-10 20:41:09,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.31 | optimizer_step: 6.61
[2024-06-10 20:41:09,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.85 | bwd_microstep: 3987.64 | bwd_inner_microstep: 1518.08 | bwd_allreduce_microstep: 2469.49 | step_microstep: 38.63
[2024-06-10 20:41:09,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15953.36 | bwd: 45228.96 | bwd_inner: 42758.52 | bwd_allreduce: 2469.73 | step: 40.19
{'loss': 1.2004, 'learning_rate': 1.0293960749378384e-05, 'epoch': 0.67}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 20:41:11,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.12 | bwd_microstep: 1436.36 | bwd_inner_microstep: 1436.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401
[2024-06-10 20:41:12,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.92 | bwd_microstep: 1372.34 | bwd_inner_microstep: 1372.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 20:41:14,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1343.92 | bwd_inner_microstep: 1343.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 20:41:16,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1377.72 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1950
[2024-06-10 20:41:17,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.07 | bwd_microstep: 729.13 | bwd_inner_microstep: 729.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-10 20:41:19,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.55 | bwd_microstep: 1557.07 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 20:41:21,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1246.76 | bwd_inner_microstep: 1246.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-10 20:41:23,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1449.98 | bwd_inner_microstep: 1449.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1919
[2024-06-10 20:41:24,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.50 | bwd_microstep: 782.19 | bwd_inner_microstep: 782.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 20:41:26,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1399.68 | bwd_inner_microstep: 1399.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 20:41:28,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1298.16 | bwd_inner_microstep: 1298.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906
[2024-06-10 20:41:29,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.49 | bwd_microstep: 685.29 | bwd_inner_microstep: 685.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3659
[2024-06-10 20:41:31,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.26 | bwd_microstep: 1354.97 | bwd_inner_microstep: 1354.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2132
[2024-06-10 20:41:32,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.82 | bwd_microstep: 799.79 | bwd_inner_microstep: 799.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3656
[2024-06-10 20:41:34,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1442.61 | bwd_inner_microstep: 1442.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-10 20:41:36,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1407.37 | bwd_inner_microstep: 1407.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-10 20:41:38,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.08 | bwd_microstep: 1522.40 | bwd_inner_microstep: 1522.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-10 20:41:40,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.06 | bwd_microstep: 1615.48 | bwd_inner_microstep: 1615.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 20:41:42,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1377.00 | bwd_inner_microstep: 1376.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2696
[2024-06-10 20:41:43,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.16 | bwd_microstep: 1035.23 | bwd_inner_microstep: 1035.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2427
[2024-06-10 20:41:45,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.50 | bwd_microstep: 942.81 | bwd_inner_microstep: 942.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 20:41:47,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1513.31 | bwd_inner_microstep: 1513.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838
[2024-06-10 20:41:49,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.17 | bwd_microstep: 1459.51 | bwd_inner_microstep: 1459.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2426
[2024-06-10 20:41:50,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.22 | bwd_microstep: 939.58 | bwd_inner_microstep: 939.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 20:41:51,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.12 | bwd_microstep: 879.42 | bwd_inner_microstep: 879.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 20:41:54,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.77 | bwd_microstep: 1557.00 | bwd_inner_microstep: 1556.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832
[2024-06-10 20:41:55,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.80 | bwd_microstep: 1390.20 | bwd_inner_microstep: 1390.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-10 20:41:57,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1422.30 | bwd_inner_microstep: 1422.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 20:41:59,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1406.52 | bwd_inner_microstep: 1406.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 20:42:01,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.99 | bwd_microstep: 974.95 | bwd_inner_microstep: 974.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3812
[2024-06-10 20:42:03,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 20:42:08,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-10 20:42:08,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 4526.83 | bwd_inner_microstep: 1521.13 | bwd_allreduce_microstep: 3005.64 | step_microstep: 37.72
[2024-06-10 20:42:08,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15167.86 | bwd: 43631.16 | bwd_inner: 40624.61 | bwd_allreduce: 3005.87 | step: 39.19
{'loss': 1.1851, 'learning_rate': 1.0261160167929884e-05, 'epoch': 0.67}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 20:42:10,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.14 | bwd_microstep: 1380.39 | bwd_inner_microstep: 1380.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1870
[2024-06-10 20:42:11,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.49 | bwd_microstep: 736.74 | bwd_inner_microstep: 736.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3932
[2024-06-10 20:42:13,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.27 | bwd_microstep: 1589.73 | bwd_inner_microstep: 1589.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 20:42:15,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.48 | bwd_microstep: 1474.55 | bwd_inner_microstep: 1474.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 20:42:17,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1395.82 | bwd_inner_microstep: 1395.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 20:42:19,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.45 | bwd_microstep: 1650.51 | bwd_inner_microstep: 1650.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 20:42:21,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.41 | bwd_microstep: 1247.19 | bwd_inner_microstep: 1247.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 20:42:23,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1281.81 | bwd_inner_microstep: 1281.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2890
[2024-06-10 20:42:24,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.30 | bwd_microstep: 1087.26 | bwd_inner_microstep: 1087.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 20:42:26,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1392.13 | bwd_inner_microstep: 1392.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 20:42:28,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.23 | bwd_microstep: 1527.13 | bwd_inner_microstep: 1527.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 20:42:30,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.11 | bwd_microstep: 1523.93 | bwd_inner_microstep: 1523.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2173
[2024-06-10 20:42:31,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.27 | bwd_microstep: 883.13 | bwd_inner_microstep: 883.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-10 20:42:34,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.87 | bwd_microstep: 1580.23 | bwd_inner_microstep: 1580.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2710
[2024-06-10 20:42:35,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.32 | bwd_microstep: 1128.98 | bwd_inner_microstep: 1128.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3417
[2024-06-10 20:42:37,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1536.93 | bwd_inner_microstep: 1536.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3643
[2024-06-10 20:42:40,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.96 | bwd_microstep: 1679.21 | bwd_inner_microstep: 1679.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 20:42:42,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1482.23 | bwd_inner_microstep: 1482.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2318
[2024-06-10 20:42:43,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.04 | bwd_microstep: 825.05 | bwd_inner_microstep: 825.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3668
[2024-06-10 20:42:45,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.90 | bwd_microstep: 1448.88 | bwd_inner_microstep: 1448.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 20:42:47,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.51 | bwd_microstep: 1286.04 | bwd_inner_microstep: 1286.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3433
[2024-06-10 20:42:49,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.97 | bwd_microstep: 1439.68 | bwd_inner_microstep: 1439.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 20:42:50,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.88 | bwd_microstep: 1307.16 | bwd_inner_microstep: 1307.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 20:42:52,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1253.73 | bwd_inner_microstep: 1253.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-10 20:42:54,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1397.46 | bwd_inner_microstep: 1397.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3830
[2024-06-10 20:42:56,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.59 | bwd_microstep: 1585.72 | bwd_inner_microstep: 1585.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 604
[2024-06-10 20:42:57,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.30 | bwd_microstep: 257.97 | bwd_inner_microstep: 257.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3569
[2024-06-10 20:42:59,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.05 | bwd_microstep: 1526.89 | bwd_inner_microstep: 1526.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2916
[2024-06-10 20:43:00,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.02 | bwd_microstep: 1281.59 | bwd_inner_microstep: 1281.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3887
[2024-06-10 20:43:03,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.05 | bwd_microstep: 1788.16 | bwd_inner_microstep: 1788.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-10 20:43:05,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.87 | bwd_microstep: 1646.48 | bwd_inner_microstep: 1646.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 20:43:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 20:43:10,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.70 | bwd_microstep: 4008.40 | bwd_inner_microstep: 2107.61 | bwd_allreduce_microstep: 1900.74 | step_microstep: 37.69
[2024-06-10 20:43:10,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16134.67 | bwd: 45631.15 | bwd_inner: 43729.50 | bwd_allreduce: 1900.97 | step: 39.13
{'loss': 1.2438, 'learning_rate': 1.0228393886624639e-05, 'epoch': 0.67}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 20:43:12,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1366.50 | bwd_inner_microstep: 1366.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3982
[2024-06-10 20:43:14,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.89 | bwd_microstep: 1502.39 | bwd_inner_microstep: 1502.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3396
[2024-06-10 20:43:16,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.91 | bwd_microstep: 1306.53 | bwd_inner_microstep: 1306.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3793
[2024-06-10 20:43:17,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.69 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3874
[2024-06-10 20:43:19,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.00 | bwd_microstep: 1448.30 | bwd_inner_microstep: 1448.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870
[2024-06-10 20:43:20,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.85 | bwd_microstep: 679.39 | bwd_inner_microstep: 679.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 20:43:22,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1407.30 | bwd_inner_microstep: 1407.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 20:43:24,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.14 | bwd_microstep: 1249.35 | bwd_inner_microstep: 1249.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 20:43:26,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.65 | bwd_microstep: 1282.72 | bwd_inner_microstep: 1282.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 20:43:28,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3729
[2024-06-10 20:43:30,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1367.04 | bwd_inner_microstep: 1367.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 20:43:31,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.01 | bwd_microstep: 1153.95 | bwd_inner_microstep: 1153.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994
[2024-06-10 20:43:32,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.91 | bwd_microstep: 830.91 | bwd_inner_microstep: 830.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2656
[2024-06-10 20:43:34,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.04 | bwd_microstep: 1027.82 | bwd_inner_microstep: 1027.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 20:43:36,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1380.35 | bwd_inner_microstep: 1380.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2637
[2024-06-10 20:43:37,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.09 | bwd_microstep: 1016.65 | bwd_inner_microstep: 1016.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1926
[2024-06-10 20:43:38,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.13 | bwd_microstep: 726.16 | bwd_inner_microstep: 726.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3665
[2024-06-10 20:43:41,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.97 | bwd_microstep: 1820.89 | bwd_inner_microstep: 1820.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3661
[2024-06-10 20:43:43,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.90 | bwd_microstep: 1819.96 | bwd_inner_microstep: 1819.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 20:43:45,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.14 | bwd_microstep: 1436.53 | bwd_inner_microstep: 1436.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 20:43:47,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1379.74 | bwd_inner_microstep: 1379.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 20:43:49,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1438.47 | bwd_inner_microstep: 1438.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 20:43:51,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1279.44 | bwd_inner_microstep: 1279.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604
[2024-06-10 20:43:53,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.68 | bwd_microstep: 1608.35 | bwd_inner_microstep: 1608.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2225
[2024-06-10 20:43:54,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.67 | bwd_microstep: 862.66 | bwd_inner_microstep: 862.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-10 20:43:55,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.95 | bwd_microstep: 808.61 | bwd_inner_microstep: 808.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 20:43:57,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.14 | bwd_microstep: 1158.20 | bwd_inner_microstep: 1158.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-10 20:43:58,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.73 | bwd_microstep: 821.96 | bwd_inner_microstep: 821.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2037
[2024-06-10 20:43:59,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.55 | bwd_microstep: 903.31 | bwd_inner_microstep: 903.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 20:44:02,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.87 | bwd_microstep: 1601.06 | bwd_inner_microstep: 1601.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828
[2024-06-10 20:44:04,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.42 | bwd_microstep: 1489.39 | bwd_inner_microstep: 1489.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3581
[2024-06-10 20:44:12,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.34 | optimizer_step: 6.63
[2024-06-10 20:44:12,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.53 | bwd_microstep: 7463.33 | bwd_inner_microstep: 1611.46 | bwd_allreduce_microstep: 5851.80 | step_microstep: 38.87
[2024-06-10 20:44:12,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15129.90 | bwd: 46372.29 | bwd_inner: 40519.57 | bwd_allreduce: 5852.04 | step: 40.35
{'loss': 1.2067, 'learning_rate': 1.0195662020865333e-05, 'epoch': 0.67}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 20:44:14,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1464.36 | bwd_inner_microstep: 1464.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 20:44:16,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1391.81 | bwd_inner_microstep: 1391.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 20:44:17,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.21 | bwd_microstep: 1281.10 | bwd_inner_microstep: 1281.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3962
[2024-06-10 20:44:20,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 1593.52 | bwd_inner_microstep: 1593.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 20:44:21,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1249.34 | bwd_inner_microstep: 1249.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 20:44:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.67 | bwd_microstep: 1543.35 | bwd_inner_microstep: 1543.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 20:44:25,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.32 | bwd_microstep: 1147.14 | bwd_inner_microstep: 1147.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-10 20:44:27,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.79 | bwd_microstep: 1435.00 | bwd_inner_microstep: 1434.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 20:44:28,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.13 | bwd_microstep: 800.10 | bwd_inner_microstep: 800.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703
[2024-06-10 20:44:30,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.96 | bwd_microstep: 1625.73 | bwd_inner_microstep: 1625.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 20:44:32,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1384.70 | bwd_inner_microstep: 1384.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 880
[2024-06-10 20:44:33,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.97 | bwd_microstep: 367.97 | bwd_inner_microstep: 367.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3959
[2024-06-10 20:44:35,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.97 | bwd_microstep: 1627.73 | bwd_inner_microstep: 1627.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1966
[2024-06-10 20:44:36,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.21 | bwd_microstep: 825.27 | bwd_inner_microstep: 825.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 20:44:38,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1347.05 | bwd_inner_microstep: 1347.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2126
[2024-06-10 20:44:39,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.11 | bwd_microstep: 828.56 | bwd_inner_microstep: 828.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 20:44:41,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1378.15 | bwd_inner_microstep: 1378.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 20:44:43,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1513.83 | bwd_inner_microstep: 1513.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1989
[2024-06-10 20:44:44,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.52 | bwd_microstep: 769.90 | bwd_inner_microstep: 769.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3453
[2024-06-10 20:44:46,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.26 | bwd_microstep: 1321.80 | bwd_inner_microstep: 1321.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999
[2024-06-10 20:44:47,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.25 | bwd_microstep: 707.39 | bwd_inner_microstep: 707.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 20:44:49,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1496.93 | bwd_inner_microstep: 1496.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3775
[2024-06-10 20:44:51,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.77 | bwd_microstep: 1345.83 | bwd_inner_microstep: 1345.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 20:44:53,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1293.87 | bwd_inner_microstep: 1293.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3573
[2024-06-10 20:44:54,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.95 | bwd_microstep: 1206.53 | bwd_inner_microstep: 1206.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-10 20:44:57,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1509.80 | bwd_inner_microstep: 1509.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 20:44:58,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1401.62 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2196
[2024-06-10 20:45:00,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.32 | bwd_microstep: 797.74 | bwd_inner_microstep: 797.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 20:45:02,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.62 | bwd_microstep: 1645.52 | bwd_inner_microstep: 1645.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760
[2024-06-10 20:45:04,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.27 | bwd_microstep: 1642.15 | bwd_inner_microstep: 1642.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 20:45:06,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.76 | bwd_microstep: 1297.58 | bwd_inner_microstep: 1297.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3847
[2024-06-10 20:45:15,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-10 20:45:15,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.85 | bwd_microstep: 8371.88 | bwd_inner_microstep: 1997.93 | bwd_allreduce_microstep: 6373.90 | step_microstep: 37.85
[2024-06-10 20:45:15,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15381.28 | bwd: 47613.27 | bwd_inner: 41238.47 | bwd_allreduce: 6374.13 | step: 39.37
{'loss': 1.1528, 'learning_rate': 1.0162964685933426e-05, 'epoch': 0.67}
 67%|██████▋   | 1158/1726 [20:02:44<9:40:19, 61.30s/it]
 67%|██████▋   | 1159/1726 [20:03:45<9:39:55, 61.37s/it]
 67%|██████▋   | 1160/1726 [20:04:44<9:32:32, 60.69s/it]
 67%|██████▋   | 1161/1726 [20:05:47<9:35:29, 61.11s/it]
 67%|██████▋   | 1162/1726 [20:06:48<9:36:28, 61.33s/it]
 67%|██████▋   | 1163/1726 [20:07:52<9:41:03, 61.93s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 20:45:17,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.60 | bwd_microstep: 1330.75 | bwd_inner_microstep: 1330.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 20:45:19,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1383.83 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3893
[2024-06-10 20:45:21,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.19 | bwd_microstep: 1580.53 | bwd_inner_microstep: 1580.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 20:45:23,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.73 | bwd_microstep: 1342.65 | bwd_inner_microstep: 1342.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 20:45:25,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.29 | bwd_microstep: 1487.16 | bwd_inner_microstep: 1487.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 20:45:27,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.48 | bwd_microstep: 1243.01 | bwd_inner_microstep: 1242.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3858
[2024-06-10 20:45:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.77 | bwd_microstep: 1659.13 | bwd_inner_microstep: 1659.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 20:45:31,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.61 | bwd_microstep: 1346.67 | bwd_inner_microstep: 1346.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-10 20:45:33,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1408.68 | bwd_inner_microstep: 1408.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 20:45:34,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.92 | bwd_microstep: 1296.40 | bwd_inner_microstep: 1296.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3755
[2024-06-10 20:45:36,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.60 | bwd_microstep: 1467.78 | bwd_inner_microstep: 1467.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 20:45:38,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1391.38 | bwd_inner_microstep: 1391.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3676
[2024-06-10 20:45:41,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.07 | bwd_microstep: 1585.09 | bwd_inner_microstep: 1585.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2151
[2024-06-10 20:45:42,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.12 | bwd_microstep: 1043.27 | bwd_inner_microstep: 1043.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-10 20:45:44,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1440.80 | bwd_inner_microstep: 1440.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2936
[2024-06-10 20:45:46,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.96 | bwd_microstep: 1131.07 | bwd_inner_microstep: 1131.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 20:45:47,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1387.23 | bwd_inner_microstep: 1387.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530
[2024-06-10 20:45:50,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.44 | bwd_microstep: 1583.16 | bwd_inner_microstep: 1583.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3461
[2024-06-10 20:45:52,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.30 | bwd_microstep: 1608.07 | bwd_inner_microstep: 1608.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2097
[2024-06-10 20:45:53,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.08 | bwd_microstep: 729.30 | bwd_inner_microstep: 729.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605
[2024-06-10 20:45:55,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.77 | bwd_microstep: 1538.32 | bwd_inner_microstep: 1538.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 20:45:57,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1556.25 | bwd_inner_microstep: 1556.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3537
[2024-06-10 20:45:59,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1227.87 | bwd_inner_microstep: 1227.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472
[2024-06-10 20:46:01,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.15 | bwd_microstep: 1247.52 | bwd_inner_microstep: 1247.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3818
[2024-06-10 20:46:03,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.36 | bwd_microstep: 1489.08 | bwd_inner_microstep: 1489.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3780
[2024-06-10 20:46:05,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.00 | bwd_microstep: 1579.76 | bwd_inner_microstep: 1579.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2168
[2024-06-10 20:46:06,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.97 | bwd_microstep: 804.86 | bwd_inner_microstep: 804.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2035
[2024-06-10 20:46:07,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.59 | bwd_microstep: 807.15 | bwd_inner_microstep: 807.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 20:46:09,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.86 | bwd_microstep: 1553.57 | bwd_inner_microstep: 1553.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3590
[2024-06-10 20:46:12,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.34 | bwd_microstep: 1807.52 | bwd_inner_microstep: 1807.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3441
[2024-06-10 20:46:14,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.68 | bwd_microstep: 1479.45 | bwd_inner_microstep: 1479.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2207
[2024-06-10 20:46:17,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.08 | optimizer_step: 6.58
[2024-06-10 20:46:17,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.22 | bwd_microstep: 2945.80 | bwd_inner_microstep: 1048.01 | bwd_allreduce_microstep: 1897.74 | step_microstep: 37.72
[2024-06-10 20:46:17,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16233.50 | bwd: 45483.12 | bwd_inner: 43584.47 | bwd_allreduce: 1897.96 | step: 39.19
{'loss': 1.2032, 'learning_rate': 1.0130301996988755e-05, 'epoch': 0.67}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1905
[2024-06-10 20:46:18,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.29 | bwd_microstep: 770.83 | bwd_inner_microstep: 770.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 20:46:20,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1376.88 | bwd_inner_microstep: 1376.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 20:46:22,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1249.90 | bwd_inner_microstep: 1249.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-10 20:46:24,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.69 | bwd_microstep: 1636.71 | bwd_inner_microstep: 1636.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 20:46:26,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1383.04 | bwd_inner_microstep: 1383.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 20:46:28,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 20:46:30,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.39 | bwd_microstep: 1631.64 | bwd_inner_microstep: 1631.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2632
[2024-06-10 20:46:31,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.79 | bwd_microstep: 949.72 | bwd_inner_microstep: 949.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2467
[2024-06-10 20:46:33,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.62 | bwd_microstep: 953.72 | bwd_inner_microstep: 953.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422
[2024-06-10 20:46:34,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.21 | bwd_microstep: 1184.87 | bwd_inner_microstep: 1184.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1912
[2024-06-10 20:46:35,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.89 | bwd_microstep: 719.12 | bwd_inner_microstep: 719.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3408
[2024-06-10 20:46:37,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.50 | bwd_microstep: 1371.97 | bwd_inner_microstep: 1371.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3544
[2024-06-10 20:46:39,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.80 | bwd_microstep: 1638.83 | bwd_inner_microstep: 1638.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-10 20:46:41,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.74 | bwd_microstep: 1438.84 | bwd_inner_microstep: 1438.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3458
[2024-06-10 20:46:43,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.95 | bwd_microstep: 1434.78 | bwd_inner_microstep: 1434.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-10 20:46:46,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.87 | bwd_microstep: 1614.59 | bwd_inner_microstep: 1614.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2662
[2024-06-10 20:46:47,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.46 | bwd_microstep: 1132.83 | bwd_inner_microstep: 1132.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 20:46:49,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.93 | bwd_microstep: 1522.76 | bwd_inner_microstep: 1522.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3829
[2024-06-10 20:46:52,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.72 | bwd_microstep: 1691.20 | bwd_inner_microstep: 1691.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3532
[2024-06-10 20:46:54,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.39 | bwd_microstep: 1358.63 | bwd_inner_microstep: 1358.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-10 20:46:55,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1300.83 | bwd_inner_microstep: 1300.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3530
[2024-06-10 20:46:57,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.30 | bwd_microstep: 1198.01 | bwd_inner_microstep: 1197.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 20:46:59,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.44 | bwd_microstep: 1396.33 | bwd_inner_microstep: 1396.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 20:47:01,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1496.25 | bwd_inner_microstep: 1496.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 20:47:03,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1508.10 | bwd_inner_microstep: 1508.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-10 20:47:05,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.75 | bwd_microstep: 1308.71 | bwd_inner_microstep: 1308.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-10 20:47:07,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.31 | bwd_microstep: 1386.93 | bwd_inner_microstep: 1386.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722
[2024-06-10 20:47:09,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.52 | bwd_microstep: 1338.85 | bwd_inner_microstep: 1338.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2195
[2024-06-10 20:47:10,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.16 | bwd_microstep: 828.94 | bwd_inner_microstep: 828.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3675
[2024-06-10 20:47:12,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.12 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2058
[2024-06-10 20:47:13,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.78 | bwd_microstep: 911.48 | bwd_inner_microstep: 911.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 20:47:19,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-10 20:47:19,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 4975.04 | bwd_inner_microstep: 1684.76 | bwd_allreduce_microstep: 3290.22 | step_microstep: 37.78
[2024-06-10 20:47:19,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15794.41 | bwd: 45579.74 | bwd_inner: 42288.60 | bwd_allreduce: 3290.45 | step: 39.22
{'loss': 1.1463, 'learning_rate': 1.0097674069069132e-05, 'epoch': 0.67}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562
[2024-06-10 20:47:21,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.83 | bwd_microstep: 1417.92 | bwd_inner_microstep: 1417.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3736
[2024-06-10 20:47:23,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.90 | bwd_microstep: 1732.45 | bwd_inner_microstep: 1732.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 20:47:25,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1391.76 | bwd_inner_microstep: 1391.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 20:47:27,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.91 | bwd_microstep: 1475.70 | bwd_inner_microstep: 1475.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 20:47:29,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1474.77 | bwd_inner_microstep: 1474.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 20:47:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.00 | bwd_microstep: 1249.11 | bwd_inner_microstep: 1249.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 20:47:33,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 20:47:35,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1496.72 | bwd_inner_microstep: 1496.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 20:47:36,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1247.74 | bwd_inner_microstep: 1247.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 20:47:38,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1341.79 | bwd_inner_microstep: 1341.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 20:47:40,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.57 | bwd_microstep: 1154.92 | bwd_inner_microstep: 1154.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3440
[2024-06-10 20:47:42,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.55 | bwd_microstep: 1296.88 | bwd_inner_microstep: 1296.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2054
[2024-06-10 20:47:43,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.54 | bwd_microstep: 848.19 | bwd_inner_microstep: 848.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3567
[2024-06-10 20:47:45,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.01 | bwd_microstep: 1562.92 | bwd_inner_microstep: 1562.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3492
[2024-06-10 20:47:47,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.24 | bwd_microstep: 1429.25 | bwd_inner_microstep: 1429.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 20:47:49,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.25 | bwd_microstep: 1375.44 | bwd_inner_microstep: 1375.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3539
[2024-06-10 20:47:51,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1459.89 | bwd_inner_microstep: 1459.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3483
[2024-06-10 20:47:53,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.02 | bwd_microstep: 1433.26 | bwd_inner_microstep: 1433.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 20:47:55,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1384.43 | bwd_inner_microstep: 1384.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-10 20:47:57,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.71 | bwd_microstep: 1552.57 | bwd_inner_microstep: 1552.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 20:47:59,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1394.95 | bwd_inner_microstep: 1394.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 20:48:01,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1512.16 | bwd_inner_microstep: 1512.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1944
[2024-06-10 20:48:02,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.34 | bwd_microstep: 758.81 | bwd_inner_microstep: 758.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 20:48:04,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.41 | bwd_microstep: 1495.53 | bwd_inner_microstep: 1495.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2005
[2024-06-10 20:48:05,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.73 | bwd_microstep: 896.36 | bwd_inner_microstep: 896.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004
[2024-06-10 20:48:07,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.15 | bwd_microstep: 895.34 | bwd_inner_microstep: 895.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-10 20:48:09,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.49 | bwd_microstep: 1749.13 | bwd_inner_microstep: 1749.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829
[2024-06-10 20:48:11,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.82 | bwd_microstep: 1483.68 | bwd_inner_microstep: 1483.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 20:48:13,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.50 | bwd_microstep: 1653.65 | bwd_inner_microstep: 1653.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-10 20:48:15,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.33 | bwd_microstep: 1530.10 | bwd_inner_microstep: 1530.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 20:48:17,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1390.99 | bwd_inner_microstep: 1390.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466
[2024-06-10 20:48:22,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-10 20:48:22,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.14 | bwd_microstep: 3826.25 | bwd_inner_microstep: 1356.34 | bwd_allreduce_microstep: 2469.86 | step_microstep: 37.85
[2024-06-10 20:48:22,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16350.84 | bwd: 46297.00 | bwd_inner: 43826.24 | bwd_allreduce: 2470.09 | step: 39.25
{'loss': 1.1702, 'learning_rate': 1.006508101708997e-05, 'epoch': 0.68}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 20:48:24,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.65 | bwd_microstep: 1473.67 | bwd_inner_microstep: 1473.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 20:48:26,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.54 | bwd_microstep: 1374.90 | bwd_inner_microstep: 1374.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3854
[2024-06-10 20:48:28,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1465.71 | bwd_inner_microstep: 1465.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 20:48:29,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.93 | bwd_microstep: 1276.94 | bwd_inner_microstep: 1276.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4144
[2024-06-10 20:48:32,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.62 | bwd_microstep: 1638.44 | bwd_inner_microstep: 1638.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-10 20:48:34,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1347.74 | bwd_inner_microstep: 1347.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 20:48:35,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.94 | bwd_microstep: 1378.61 | bwd_inner_microstep: 1378.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3445
[2024-06-10 20:48:37,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.98 | bwd_microstep: 1377.37 | bwd_inner_microstep: 1377.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1897
[2024-06-10 20:48:38,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.66 | bwd_microstep: 777.40 | bwd_inner_microstep: 777.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 20:48:40,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1299.72 | bwd_inner_microstep: 1299.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3496
[2024-06-10 20:48:42,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.95 | bwd_microstep: 1331.04 | bwd_inner_microstep: 1331.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2931
[2024-06-10 20:48:44,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.37 | bwd_microstep: 1092.30 | bwd_inner_microstep: 1092.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-10 20:48:46,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.92 | bwd_microstep: 1611.77 | bwd_inner_microstep: 1611.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-10 20:48:48,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.85 | bwd_microstep: 1582.48 | bwd_inner_microstep: 1582.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-10 20:48:49,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.86 | bwd_microstep: 891.43 | bwd_inner_microstep: 891.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3472
[2024-06-10 20:48:51,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1360.00 | bwd_inner_microstep: 1359.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519
[2024-06-10 20:48:53,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.23 | bwd_microstep: 1319.31 | bwd_inner_microstep: 1319.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3963
[2024-06-10 20:48:55,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.82 | bwd_microstep: 1510.52 | bwd_inner_microstep: 1510.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 20:48:57,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1251.78 | bwd_inner_microstep: 1251.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925
[2024-06-10 20:48:58,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.45 | bwd_microstep: 725.66 | bwd_inner_microstep: 725.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 20:49:00,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1495.66 | bwd_inner_microstep: 1495.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 20:49:02,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.58 | bwd_microstep: 1498.05 | bwd_inner_microstep: 1498.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-10 20:49:04,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1393.96 | bwd_inner_microstep: 1393.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 20:49:06,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.28 | bwd_microstep: 1405.23 | bwd_inner_microstep: 1405.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 20:49:08,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.87 | bwd_microstep: 1358.16 | bwd_inner_microstep: 1358.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 20:49:10,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.84 | bwd_microstep: 1396.60 | bwd_inner_microstep: 1396.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 20:49:12,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1449.06 | bwd_inner_microstep: 1449.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-10 20:49:14,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.10 | bwd_microstep: 1481.35 | bwd_inner_microstep: 1481.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 20:49:16,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.33 | bwd_microstep: 1550.83 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 20:49:18,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.51 | bwd_microstep: 1502.89 | bwd_inner_microstep: 1502.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426
[2024-06-10 20:49:20,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.39 | bwd_microstep: 1545.12 | bwd_inner_microstep: 1545.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 20:49:22,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.03 | optimizer_step: 6.60
[2024-06-10 20:49:22,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.61 | bwd_microstep: 1839.65 | bwd_inner_microstep: 1390.95 | bwd_allreduce_microstep: 448.66 | step_microstep: 37.46
[2024-06-10 20:49:22,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16310.07 | bwd: 44003.37 | bwd_inner: 43553.82 | bwd_allreduce: 448.89 | step: 38.90
{'loss': 1.2102, 'learning_rate': 1.0032522955843822e-05, 'epoch': 0.68}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 20:49:24,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.29 | bwd_microstep: 1332.17 | bwd_inner_microstep: 1332.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3999
[2024-06-10 20:49:26,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1408.63 | bwd_inner_microstep: 1408.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3911
[2024-06-10 20:49:28,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.14 | bwd_microstep: 1588.74 | bwd_inner_microstep: 1588.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4295
[2024-06-10 20:49:31,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.69 | bwd_microstep: 1779.46 | bwd_inner_microstep: 1779.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 20:49:33,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.08 | bwd_microstep: 1250.81 | bwd_inner_microstep: 1250.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 20:49:34,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.30 | bwd_microstep: 1282.30 | bwd_inner_microstep: 1282.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 20:49:36,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1395.51 | bwd_inner_microstep: 1395.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 20:49:37,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.32 | bwd_microstep: 789.64 | bwd_inner_microstep: 789.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 20:49:39,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.69 | bwd_microstep: 1184.55 | bwd_inner_microstep: 1184.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 20:49:41,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.42 | bwd_microstep: 1249.09 | bwd_inner_microstep: 1249.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584
[2024-06-10 20:49:43,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1396.32 | bwd_inner_microstep: 1396.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 20:49:45,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.78 | bwd_microstep: 1523.00 | bwd_inner_microstep: 1522.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493
[2024-06-10 20:49:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.06 | bwd_microstep: 1315.23 | bwd_inner_microstep: 1315.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2937
[2024-06-10 20:49:48,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.37 | bwd_microstep: 1130.22 | bwd_inner_microstep: 1130.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 20:49:50,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.40 | bwd_microstep: 1287.45 | bwd_inner_microstep: 1287.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 20:49:52,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.66 | bwd_microstep: 1483.20 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 20:49:54,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.32 | bwd_microstep: 1383.58 | bwd_inner_microstep: 1383.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 20:49:56,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.82 | bwd_microstep: 1391.95 | bwd_inner_microstep: 1391.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3833
[2024-06-10 20:49:58,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.85 | bwd_microstep: 1465.95 | bwd_inner_microstep: 1465.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 20:50:00,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1415.17 | bwd_inner_microstep: 1415.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867
[2024-06-10 20:50:02,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.48 | bwd_microstep: 1669.28 | bwd_inner_microstep: 1669.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3531
[2024-06-10 20:50:04,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.98 | bwd_microstep: 1353.18 | bwd_inner_microstep: 1353.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 20:50:06,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1283.18 | bwd_inner_microstep: 1283.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 20:50:08,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1498.18 | bwd_inner_microstep: 1498.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 20:50:10,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.20 | bwd_microstep: 1396.79 | bwd_inner_microstep: 1396.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3634
[2024-06-10 20:50:12,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.93 | bwd_microstep: 1476.79 | bwd_inner_microstep: 1476.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3457
[2024-06-10 20:50:13,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.30 | bwd_microstep: 1215.17 | bwd_inner_microstep: 1215.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2666
[2024-06-10 20:50:15,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.57 | bwd_microstep: 1120.46 | bwd_inner_microstep: 1120.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 20:50:17,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 1481.85 | bwd_inner_microstep: 1481.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 20:50:19,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.02 | bwd_microstep: 1650.10 | bwd_inner_microstep: 1650.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2081
[2024-06-10 20:50:21,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.09 | bwd_microstep: 1012.07 | bwd_inner_microstep: 1012.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3454
[2024-06-10 20:50:25,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-10 20:50:25,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 3468.29 | bwd_inner_microstep: 1691.01 | bwd_allreduce_microstep: 1777.23 | step_microstep: 37.94
[2024-06-10 20:50:25,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16384.80 | bwd: 45678.32 | bwd_inner: 43900.18 | bwd_allreduce: 1777.46 | step: 39.51
{'loss': 1.2357, 'learning_rate': 1.0000000000000006e-05, 'epoch': 0.68}
3/1726 [20:07:52<9:41:03, 61.93s/it]
 67%|██████▋   | 1164/1726 [20:08:54<9:40:23, 61.96s/it]
 67%|██████▋   | 1165/1726 [20:09:55<9:38:36, 61.88s/it]
 68%|██████▊   | 1166/1726 [20:10:58<9:40:39, 62.21s/it]
 68%|██████▊   | 1167/1726 [20:11:59<9:35:13, 61.74s/it]
 68%|██████▊   | 1168/1726 [20:13:01<9:36:01, 61.94s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1984
[2024-06-10 20:50:26,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.15 | bwd_microstep: 882.26 | bwd_inner_microstep: 882.18 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 20:50:27,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.81 | bwd_microstep: 788.41 | bwd_inner_microstep: 788.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2322
[2024-06-10 20:50:28,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.59 | bwd_microstep: 885.03 | bwd_inner_microstep: 885.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4273
[2024-06-10 20:50:30,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.11 | bwd_microstep: 1599.73 | bwd_inner_microstep: 1599.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740
[2024-06-10 20:50:33,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.49 | bwd_microstep: 1634.94 | bwd_inner_microstep: 1634.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4024
[2024-06-10 20:50:35,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.27 | bwd_microstep: 1519.36 | bwd_inner_microstep: 1519.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-10 20:50:36,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.82 | bwd_microstep: 804.14 | bwd_inner_microstep: 804.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 20:50:37,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 791.95 | bwd_inner_microstep: 791.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3697
[2024-06-10 20:50:39,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.24 | bwd_microstep: 1457.89 | bwd_inner_microstep: 1457.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653
[2024-06-10 20:50:41,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.00 | bwd_microstep: 1440.03 | bwd_inner_microstep: 1440.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672
[2024-06-10 20:50:43,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.84 | bwd_microstep: 1720.61 | bwd_inner_microstep: 1720.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2115
[2024-06-10 20:50:45,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.77 | bwd_microstep: 873.88 | bwd_inner_microstep: 873.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3399
[2024-06-10 20:50:47,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.69 | bwd_microstep: 1400.65 | bwd_inner_microstep: 1400.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3650
[2024-06-10 20:50:49,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.62 | bwd_microstep: 1719.32 | bwd_inner_microstep: 1719.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 20:50:51,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.02 | bwd_microstep: 1378.86 | bwd_inner_microstep: 1378.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-10 20:50:53,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1415.49 | bwd_inner_microstep: 1415.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512
[2024-06-10 20:50:55,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.61 | bwd_microstep: 1318.48 | bwd_inner_microstep: 1318.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 20:50:56,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1282.21 | bwd_inner_microstep: 1282.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 20:50:58,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1396.82 | bwd_inner_microstep: 1396.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 20:51:00,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1377.62 | bwd_inner_microstep: 1377.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 20:51:02,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1490.77 | bwd_inner_microstep: 1490.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 20:51:04,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.41 | bwd_microstep: 1297.04 | bwd_inner_microstep: 1297.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610
[2024-06-10 20:51:06,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1341.98 | bwd_inner_microstep: 1341.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271
[2024-06-10 20:51:07,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.60 | bwd_microstep: 877.65 | bwd_inner_microstep: 877.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754
[2024-06-10 20:51:09,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.17 | bwd_microstep: 1543.04 | bwd_inner_microstep: 1543.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 20:51:11,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1379.07 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2569
[2024-06-10 20:51:13,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.57 | bwd_microstep: 975.04 | bwd_inner_microstep: 975.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3603
[2024-06-10 20:51:14,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.09 | bwd_microstep: 1430.72 | bwd_inner_microstep: 1430.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816
[2024-06-10 20:51:17,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.84 | bwd_microstep: 1615.58 | bwd_inner_microstep: 1615.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 20:51:19,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1649.74 | bwd_inner_microstep: 1649.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3770
[2024-06-10 20:51:21,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.62 | bwd_microstep: 1570.89 | bwd_inner_microstep: 1570.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2259
[2024-06-10 20:51:26,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.07 | optimizer_step: 6.58
[2024-06-10 20:51:26,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.13 | bwd_microstep: 4646.79 | bwd_inner_microstep: 1134.87 | bwd_allreduce_microstep: 3511.86 | step_microstep: 37.75
[2024-06-10 20:51:26,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15649.09 | bwd: 45506.01 | bwd_inner: 41993.17 | bwd_allreduce: 3512.13 | step: 39.27
{'loss': 1.1798, 'learning_rate': 9.967512264104204e-06, 'epoch': 0.68}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3473
[2024-06-10 20:51:28,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.84 | bwd_microstep: 1563.72 | bwd_inner_microstep: 1563.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3905
[2024-06-10 20:51:30,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1388.13 | bwd_inner_microstep: 1388.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3849
[2024-06-10 20:51:32,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1488.10 | bwd_inner_microstep: 1488.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3855
[2024-06-10 20:51:35,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.56 | bwd_microstep: 1657.35 | bwd_inner_microstep: 1657.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 20:51:36,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.80 | bwd_microstep: 792.43 | bwd_inner_microstep: 792.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 20:51:38,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1483.97 | bwd_inner_microstep: 1483.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-10 20:51:40,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.95 | bwd_microstep: 1647.71 | bwd_inner_microstep: 1647.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-10 20:51:42,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.27 | bwd_microstep: 1284.47 | bwd_inner_microstep: 1284.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1960
[2024-06-10 20:51:43,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.52 | bwd_microstep: 765.71 | bwd_inner_microstep: 765.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2484
[2024-06-10 20:51:44,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.45 | bwd_microstep: 956.02 | bwd_inner_microstep: 956.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 20:51:46,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1403.42 | bwd_inner_microstep: 1403.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3745
[2024-06-10 20:51:48,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.32 | bwd_microstep: 1566.73 | bwd_inner_microstep: 1566.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2140
[2024-06-10 20:51:50,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.51 | bwd_microstep: 928.65 | bwd_inner_microstep: 928.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 20:51:51,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1339.93 | bwd_inner_microstep: 1339.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2106
[2024-06-10 20:51:53,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.58 | bwd_microstep: 1014.42 | bwd_inner_microstep: 1014.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3845
[2024-06-10 20:51:55,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.25 | bwd_microstep: 1761.38 | bwd_inner_microstep: 1761.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 20:51:57,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1527.15 | bwd_inner_microstep: 1527.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 20:51:59,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.51 | bwd_microstep: 1320.14 | bwd_inner_microstep: 1320.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675
[2024-06-10 20:52:01,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.84 | bwd_microstep: 1519.00 | bwd_inner_microstep: 1518.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829
[2024-06-10 20:52:03,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.73 | bwd_microstep: 1486.26 | bwd_inner_microstep: 1486.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 20:52:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1291.86 | bwd_inner_microstep: 1291.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 20:52:07,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.23 | bwd_microstep: 1310.21 | bwd_inner_microstep: 1310.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-10 20:52:09,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.15 | bwd_microstep: 1693.76 | bwd_inner_microstep: 1693.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 20:52:12,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.34 | bwd_microstep: 1646.48 | bwd_inner_microstep: 1646.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 20:52:13,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 20:52:15,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1400.87 | bwd_inner_microstep: 1400.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 20:52:17,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.84 | bwd_microstep: 1397.32 | bwd_inner_microstep: 1397.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 20:52:19,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.38 | bwd_microstep: 1284.06 | bwd_inner_microstep: 1284.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-10 20:52:21,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.82 | bwd_microstep: 1606.06 | bwd_inner_microstep: 1606.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 20:52:23,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.93 | bwd_microstep: 1540.68 | bwd_inner_microstep: 1540.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3802
[2024-06-10 20:52:26,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.20 | bwd_microstep: 1718.72 | bwd_inner_microstep: 1718.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499
[2024-06-10 20:52:28,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-10 20:52:28,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.39 | bwd_microstep: 1471.27 | bwd_inner_microstep: 1463.56 | bwd_allreduce_microstep: 7.66 | step_microstep: 37.77
[2024-06-10 20:52:28,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16644.62 | bwd: 44651.66 | bwd_inner: 44643.10 | bwd_allreduce: 7.89 | step: 39.17
{'loss': 1.1453, 'learning_rate': 9.935059862578047e-06, 'epoch': 0.68}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1931
[2024-06-10 20:52:29,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.09 | bwd_microstep: 880.27 | bwd_inner_microstep: 880.18 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 20:52:31,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.36 | bwd_microstep: 1297.94 | bwd_inner_microstep: 1297.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4313
[2024-06-10 20:52:33,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.52 | bwd_microstep: 1586.15 | bwd_inner_microstep: 1586.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 20:52:35,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1482.38 | bwd_inner_microstep: 1482.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 20:52:37,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1281.74 | bwd_inner_microstep: 1281.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 20:52:38,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 788.75 | bwd_inner_microstep: 788.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 20:52:40,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.49 | bwd_microstep: 1376.29 | bwd_inner_microstep: 1376.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-10 20:52:42,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1250.89 | bwd_inner_microstep: 1250.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4117
[2024-06-10 20:52:44,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.74 | bwd_microstep: 1532.34 | bwd_inner_microstep: 1532.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3548
[2024-06-10 20:52:46,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.03 | bwd_microstep: 1279.69 | bwd_inner_microstep: 1279.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3427
[2024-06-10 20:52:47,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.78 | bwd_microstep: 1298.15 | bwd_inner_microstep: 1298.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-10 20:52:49,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1431.42 | bwd_inner_microstep: 1431.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2944
[2024-06-10 20:52:51,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 434.80 | bwd_microstep: 1148.24 | bwd_inner_microstep: 1148.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-10 20:52:52,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 797.45 | bwd_inner_microstep: 797.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-10 20:52:54,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.94 | bwd_microstep: 1633.17 | bwd_inner_microstep: 1633.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561
[2024-06-10 20:52:56,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.79 | bwd_microstep: 1330.23 | bwd_inner_microstep: 1330.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3712
[2024-06-10 20:52:58,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.98 | bwd_microstep: 1724.12 | bwd_inner_microstep: 1724.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988
[2024-06-10 20:53:00,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.55 | bwd_microstep: 831.95 | bwd_inner_microstep: 831.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2152
[2024-06-10 20:53:01,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.87 | bwd_microstep: 851.04 | bwd_inner_microstep: 851.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3539
[2024-06-10 20:53:03,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.03 | bwd_microstep: 1661.78 | bwd_inner_microstep: 1661.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3848
[2024-06-10 20:53:05,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.93 | bwd_microstep: 1695.05 | bwd_inner_microstep: 1695.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2918
[2024-06-10 20:53:07,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.36 | bwd_microstep: 1284.85 | bwd_inner_microstep: 1284.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3725
[2024-06-10 20:53:09,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.49 | bwd_microstep: 1585.45 | bwd_inner_microstep: 1585.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1264
[2024-06-10 20:53:10,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 186.07 | bwd_microstep: 487.09 | bwd_inner_microstep: 487.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 20:53:12,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.50 | bwd_microstep: 1511.47 | bwd_inner_microstep: 1511.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3809
[2024-06-10 20:53:14,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.10 | bwd_microstep: 1688.33 | bwd_inner_microstep: 1688.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 20:53:16,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 20:53:18,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 1350.83 | bwd_inner_microstep: 1350.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 20:53:20,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.16 | bwd_microstep: 1398.88 | bwd_inner_microstep: 1398.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 20:53:22,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.58 | bwd_microstep: 1563.39 | bwd_inner_microstep: 1563.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-10 20:53:24,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.14 | bwd_microstep: 1604.91 | bwd_inner_microstep: 1604.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3380
[2024-06-10 20:53:28,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.19 | optimizer_step: 6.57
[2024-06-10 20:53:28,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 2569.57 | bwd_inner_microstep: 1628.03 | bwd_allreduce_microstep: 941.49 | step_microstep: 37.78
[2024-06-10 20:53:28,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15847.83 | bwd: 43489.89 | bwd_inner: 42547.43 | bwd_allreduce: 941.76 | step: 39.32
{'loss': 1.1812, 'learning_rate': 9.902642909718737e-06, 'epoch': 0.68}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1918
[2024-06-10 20:53:29,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.98 | bwd_microstep: 776.86 | bwd_inner_microstep: 776.78 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4007
[2024-06-10 20:53:31,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.01 | bwd_microstep: 1711.96 | bwd_inner_microstep: 1711.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 20:53:33,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.21 | bwd_microstep: 1483.29 | bwd_inner_microstep: 1483.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 20:53:35,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.48 | bwd_microstep: 1277.08 | bwd_inner_microstep: 1277.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 20:53:37,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 20:53:38,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.93 | bwd_microstep: 1155.92 | bwd_inner_microstep: 1155.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 20:53:40,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.08 | bwd_microstep: 1150.44 | bwd_inner_microstep: 1150.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3486
[2024-06-10 20:53:42,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.15 | bwd_microstep: 1249.90 | bwd_inner_microstep: 1249.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-10 20:53:44,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1452.28 | bwd_inner_microstep: 1452.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3499
[2024-06-10 20:53:46,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.36 | bwd_microstep: 1513.62 | bwd_inner_microstep: 1513.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455
[2024-06-10 20:53:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.35 | bwd_microstep: 1329.07 | bwd_inner_microstep: 1329.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3513
[2024-06-10 20:53:49,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.47 | bwd_microstep: 1252.78 | bwd_inner_microstep: 1252.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3460
[2024-06-10 20:53:51,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.78 | bwd_microstep: 1212.67 | bwd_inner_microstep: 1212.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2412
[2024-06-10 20:53:52,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.71 | bwd_microstep: 971.20 | bwd_inner_microstep: 971.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1992
[2024-06-10 20:53:53,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.31 | bwd_microstep: 750.96 | bwd_inner_microstep: 750.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-10 20:53:56,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.66 | bwd_microstep: 1614.05 | bwd_inner_microstep: 1614.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 20:53:58,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1514.29 | bwd_inner_microstep: 1514.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 20:54:00,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.51 | bwd_microstep: 1512.13 | bwd_inner_microstep: 1512.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3548
[2024-06-10 20:54:02,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.29 | bwd_microstep: 1442.04 | bwd_inner_microstep: 1442.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1985
[2024-06-10 20:54:03,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.96 | bwd_microstep: 767.20 | bwd_inner_microstep: 767.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 641
[2024-06-10 20:54:03,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 108.71 | bwd_microstep: 272.89 | bwd_inner_microstep: 272.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 20:54:05,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.75 | bwd_microstep: 1498.97 | bwd_inner_microstep: 1498.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2436
[2024-06-10 20:54:07,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.30 | bwd_microstep: 976.96 | bwd_inner_microstep: 976.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 20:54:09,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1378.82 | bwd_inner_microstep: 1378.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609
[2024-06-10 20:54:11,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.45 | bwd_microstep: 1603.55 | bwd_inner_microstep: 1603.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732
[2024-06-10 20:54:13,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.34 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 20:54:15,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1556.77 | bwd_inner_microstep: 1556.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 20:54:16,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 791.77 | bwd_inner_microstep: 791.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437
[2024-06-10 20:54:18,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.32 | bwd_microstep: 1281.47 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3547
[2024-06-10 20:54:20,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1457.07 | bwd_inner_microstep: 1457.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3609
[2024-06-10 20:54:22,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.85 | bwd_microstep: 1457.91 | bwd_inner_microstep: 1457.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 20:54:30,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-10 20:54:30,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.99 | bwd_microstep: 7328.73 | bwd_inner_microstep: 1749.62 | bwd_allreduce_microstep: 5579.06 | step_microstep: 37.86
[2024-06-10 20:54:30,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15214.85 | bwd: 46570.74 | bwd_inner: 40990.72 | bwd_allreduce: 5579.33 | step: 39.50
{'loss': 1.155, 'learning_rate': 9.870261519698612e-06, 'epoch': 0.68}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 20:54:31,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.78 | bwd_microstep: 1328.20 | bwd_inner_microstep: 1328.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3387
[2024-06-10 20:54:33,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.04 | bwd_microstep: 1143.05 | bwd_inner_microstep: 1143.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832
[2024-06-10 20:54:35,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 1479.05 | bwd_inner_microstep: 1479.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 20:54:37,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.15 | bwd_microstep: 1143.00 | bwd_inner_microstep: 1142.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3423
[2024-06-10 20:54:38,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.25 | bwd_microstep: 1210.82 | bwd_inner_microstep: 1210.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566
[2024-06-10 20:54:40,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.77 | bwd_microstep: 1203.02 | bwd_inner_microstep: 1202.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 20:54:42,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1382.60 | bwd_inner_microstep: 1382.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 20:54:43,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 791.64 | bwd_inner_microstep: 791.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 20:54:45,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.09 | bwd_microstep: 1384.90 | bwd_inner_microstep: 1384.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533
[2024-06-10 20:54:47,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.79 | bwd_microstep: 1230.97 | bwd_inner_microstep: 1230.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 20:54:48,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.78 | bwd_microstep: 1282.40 | bwd_inner_microstep: 1282.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3659
[2024-06-10 20:54:51,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.18 | bwd_microstep: 1475.44 | bwd_inner_microstep: 1475.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3954
[2024-06-10 20:54:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.47 | bwd_microstep: 1433.67 | bwd_inner_microstep: 1433.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 20:54:54,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1396.71 | bwd_inner_microstep: 1396.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 20:54:56,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1399.29 | bwd_inner_microstep: 1399.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 20:54:58,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.98 | bwd_microstep: 1452.66 | bwd_inner_microstep: 1452.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 20:55:00,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1386.49 | bwd_inner_microstep: 1386.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436
[2024-06-10 20:55:02,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1155.24 | bwd_inner_microstep: 1155.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-10 20:55:03,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.09 | bwd_microstep: 798.30 | bwd_inner_microstep: 798.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 20:55:05,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1387.76 | bwd_inner_microstep: 1387.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 20:55:07,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.11 | bwd_microstep: 1459.40 | bwd_inner_microstep: 1459.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-10 20:55:09,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 1610.80 | bwd_inner_microstep: 1610.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-10 20:55:10,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.18 | bwd_microstep: 685.42 | bwd_inner_microstep: 685.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 20:55:12,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.46 | bwd_microstep: 1281.88 | bwd_inner_microstep: 1281.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 20:55:14,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1285.29 | bwd_inner_microstep: 1285.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3598
[2024-06-10 20:55:16,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.86 | bwd_microstep: 1637.88 | bwd_inner_microstep: 1637.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 20:55:18,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.21 | bwd_microstep: 1256.34 | bwd_inner_microstep: 1256.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 20:55:20,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.25 | bwd_microstep: 1496.12 | bwd_inner_microstep: 1496.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1859
[2024-06-10 20:55:21,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.40 | bwd_microstep: 706.38 | bwd_inner_microstep: 706.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2089
[2024-06-10 20:55:22,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.50 | bwd_microstep: 852.38 | bwd_inner_microstep: 852.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3746
[2024-06-10 20:55:24,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.00 | bwd_microstep: 1682.44 | bwd_inner_microstep: 1682.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3773
[2024-06-10 20:55:30,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-10 20:55:30,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.33 | bwd_microstep: 4694.04 | bwd_inner_microstep: 2391.07 | bwd_allreduce_microstep: 2302.91 | step_microstep: 37.69
[2024-06-10 20:55:30,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15494.60 | bwd: 44113.59 | bwd_inner: 41809.77 | bwd_allreduce: 2303.14 | step: 39.19
{'loss': 1.2294, 'learning_rate': 9.837915806564753e-06, 'epoch': 0.68}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 20:55:32,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1471.32 | bwd_inner_microstep: 1471.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 20:55:33,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1276.53 | bwd_inner_microstep: 1276.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 20:55:35,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1343.16 | bwd_inner_microstep: 1343.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 20:55:38,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.85 | bwd_microstep: 1660.18 | bwd_inner_microstep: 1660.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 20:55:39,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1394.41 | bwd_inner_microstep: 1394.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-10 20:55:40,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.36 | bwd_microstep: 678.06 | bwd_inner_microstep: 678.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 20:55:42,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.24 | bwd_microstep: 1396.72 | bwd_inner_microstep: 1396.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 20:55:44,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1381.55 | bwd_inner_microstep: 1381.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 20:55:46,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1388.96 | bwd_inner_microstep: 1388.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 20:55:48,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.90 | bwd_microstep: 1294.45 | bwd_inner_microstep: 1294.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 20:55:50,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1391.62 | bwd_inner_microstep: 1391.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3513
[2024-06-10 20:55:52,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.02 | bwd_microstep: 1549.77 | bwd_inner_microstep: 1549.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3503
[2024-06-10 20:55:54,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.57 | bwd_microstep: 1553.39 | bwd_inner_microstep: 1553.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2656
[2024-06-10 20:55:56,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.14 | bwd_microstep: 1019.30 | bwd_inner_microstep: 1019.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637
[2024-06-10 20:55:58,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.62 | bwd_microstep: 1600.98 | bwd_inner_microstep: 1600.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 20:56:00,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.89 | bwd_microstep: 1342.89 | bwd_inner_microstep: 1342.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3637
[2024-06-10 20:56:02,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.36 | bwd_microstep: 1436.21 | bwd_inner_microstep: 1436.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 20:56:03,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1259.49 | bwd_inner_microstep: 1259.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 20:56:05,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.79 | bwd_microstep: 1486.80 | bwd_inner_microstep: 1486.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 20:56:07,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1159.64 | bwd_inner_microstep: 1159.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 20:56:09,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1516.11 | bwd_inner_microstep: 1516.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 20:56:11,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.29 | bwd_microstep: 1254.33 | bwd_inner_microstep: 1254.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495
[2024-06-10 20:56:13,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1577.85 | bwd_inner_microstep: 1577.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-10 20:56:15,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1429.45 | bwd_inner_microstep: 1429.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 20:56:17,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.16 | bwd_microstep: 1590.07 | bwd_inner_microstep: 1590.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821
[2024-06-10 20:56:19,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.03 | bwd_microstep: 1587.00 | bwd_inner_microstep: 1586.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 20:56:21,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1517.03 | bwd_inner_microstep: 1517.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 20:56:23,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.64 | bwd_microstep: 1470.80 | bwd_inner_microstep: 1470.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 20:56:25,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.85 | bwd_microstep: 1279.52 | bwd_inner_microstep: 1279.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 20:56:27,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.72 | bwd_microstep: 1286.90 | bwd_inner_microstep: 1286.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 20:56:29,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1509.05 | bwd_inner_microstep: 1509.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 20:56:31,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-10 20:56:31,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.18 | bwd_microstep: 1319.28 | bwd_inner_microstep: 1310.43 | bwd_allreduce_microstep: 8.81 | step_microstep: 39.48
[2024-06-10 20:56:31,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16588.57 | bwd: 44422.85 | bwd_inner: 44413.14 | bwd_allreduce: 9.04 | step: 40.94
 68%|██████▊   | 1169/1726 [20:14:03<9:33:45, 61.80s/it]
 68%|██████▊   | 1170/1726 [20:15:05<9:32:14, 61.75s/it]
 68%|██████▊   | 1171/1726 [20:16:04<9:25:27, 61.13s/it]
 68%|██████▊   | 1172/1726 [20:17:06<9:27:11, 61.43s/it]
 68%|██████▊   | 1173/1726 [20:18:06<9:22:02, 60.98s/it]
{'loss': 1.2026, 'learning_rate': 9.805605884238587e-06, 'epoch': 0.68}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456
[2024-06-10 20:56:33,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1547.40 | bwd_inner_microstep: 1547.20 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3973
[2024-06-10 20:56:35,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.94 | bwd_microstep: 1704.37 | bwd_inner_microstep: 1704.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3798
[2024-06-10 20:56:37,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1479.20 | bwd_inner_microstep: 1479.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 20:56:39,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1248.87 | bwd_inner_microstep: 1248.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406
[2024-06-10 20:56:41,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.93 | bwd_microstep: 1278.60 | bwd_inner_microstep: 1278.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3529
[2024-06-10 20:56:43,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.74 | bwd_microstep: 1195.69 | bwd_inner_microstep: 1195.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 20:56:44,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1255.75 | bwd_inner_microstep: 1255.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-10 20:56:45,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.28 | bwd_microstep: 699.41 | bwd_inner_microstep: 699.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-10 20:56:47,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.87 | bwd_microstep: 1403.24 | bwd_inner_microstep: 1403.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 20:56:49,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.57 | bwd_microstep: 1522.08 | bwd_inner_microstep: 1522.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 20:56:51,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.67 | bwd_microstep: 1523.58 | bwd_inner_microstep: 1523.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 20:56:53,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.85 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 20:56:55,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.74 | bwd_microstep: 1277.82 | bwd_inner_microstep: 1277.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1910
[2024-06-10 20:56:56,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.43 | bwd_microstep: 751.29 | bwd_inner_microstep: 751.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967
[2024-06-10 20:56:57,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.51 | bwd_microstep: 891.75 | bwd_inner_microstep: 891.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 20:56:59,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.02 | bwd_microstep: 1387.97 | bwd_inner_microstep: 1387.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497
[2024-06-10 20:57:01,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.72 | bwd_microstep: 1409.05 | bwd_inner_microstep: 1409.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 20:57:03,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.93 | bwd_microstep: 1335.97 | bwd_inner_microstep: 1335.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2107
[2024-06-10 20:57:04,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.55 | bwd_microstep: 921.73 | bwd_inner_microstep: 921.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1975
[2024-06-10 20:57:05,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.56 | bwd_microstep: 769.28 | bwd_inner_microstep: 769.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-10 20:57:08,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.81 | bwd_microstep: 1585.18 | bwd_inner_microstep: 1585.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-10 20:57:10,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.37 | bwd_microstep: 1594.63 | bwd_inner_microstep: 1594.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3106
[2024-06-10 20:57:11,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1249.78 | bwd_inner_microstep: 1249.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3268
[2024-06-10 20:57:13,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.01 | bwd_microstep: 1246.09 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 20:57:15,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1491.86 | bwd_inner_microstep: 1491.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3577
[2024-06-10 20:57:17,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.49 | bwd_microstep: 1527.10 | bwd_inner_microstep: 1527.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3814
[2024-06-10 20:57:19,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.58 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 20:57:21,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.16 | bwd_microstep: 1423.48 | bwd_inner_microstep: 1423.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 20:57:23,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.19 | bwd_microstep: 1498.86 | bwd_inner_microstep: 1498.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2178
[2024-06-10 20:57:25,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.09 | bwd_microstep: 954.04 | bwd_inner_microstep: 954.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-10 20:57:26,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.42 | bwd_microstep: 1158.62 | bwd_inner_microstep: 1158.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717
[2024-06-10 20:57:31,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 20:57:31,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.14 | bwd_microstep: 3844.94 | bwd_inner_microstep: 1856.74 | bwd_allreduce_microstep: 1988.14 | step_microstep: 39.03
[2024-06-10 20:57:31,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15596.93 | bwd: 43847.45 | bwd_inner: 41858.24 | bwd_allreduce: 1988.46 | step: 40.60
{'loss': 1.1539, 'learning_rate': 9.77333186651551e-06, 'epoch': 0.68}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 20:57:33,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1269.82 | bwd_inner_microstep: 1269.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3946
[2024-06-10 20:57:35,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.36 | bwd_microstep: 1525.81 | bwd_inner_microstep: 1525.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 20:57:37,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1549.96 | bwd_inner_microstep: 1549.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-10 20:57:39,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.07 | bwd_microstep: 1296.38 | bwd_inner_microstep: 1296.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3782
[2024-06-10 20:57:41,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.06 | bwd_microstep: 1476.77 | bwd_inner_microstep: 1476.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742
[2024-06-10 20:57:43,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.55 | bwd_microstep: 1632.40 | bwd_inner_microstep: 1632.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 20:57:44,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.72 | bwd_microstep: 792.59 | bwd_inner_microstep: 792.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432
[2024-06-10 20:57:46,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.57 | bwd_microstep: 1153.37 | bwd_inner_microstep: 1153.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 20:57:47,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 20:57:49,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1384.88 | bwd_inner_microstep: 1384.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 20:57:50,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3978
[2024-06-10 20:57:53,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.45 | bwd_microstep: 1605.17 | bwd_inner_microstep: 1605.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 20:57:54,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.88 | bwd_microstep: 1378.67 | bwd_inner_microstep: 1378.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988
[2024-06-10 20:57:56,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.22 | bwd_microstep: 895.55 | bwd_inner_microstep: 895.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3911
[2024-06-10 20:57:58,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.55 | bwd_microstep: 1732.71 | bwd_inner_microstep: 1732.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3508
[2024-06-10 20:58:00,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.74 | bwd_microstep: 1428.57 | bwd_inner_microstep: 1428.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534
[2024-06-10 20:58:02,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1228.81 | bwd_inner_microstep: 1228.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3622
[2024-06-10 20:58:04,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.13 | bwd_microstep: 1314.41 | bwd_inner_microstep: 1314.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 20:58:06,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.56 | bwd_microstep: 1500.33 | bwd_inner_microstep: 1500.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 20:58:08,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1491.65 | bwd_inner_microstep: 1491.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-10 20:58:10,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.12 | bwd_microstep: 1605.98 | bwd_inner_microstep: 1605.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637
[2024-06-10 20:58:12,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.99 | bwd_microstep: 1313.12 | bwd_inner_microstep: 1313.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 20:58:14,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1294.15 | bwd_inner_microstep: 1294.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899
[2024-06-10 20:58:16,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1587.88 | bwd_inner_microstep: 1587.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 20:58:18,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.00 | bwd_microstep: 1555.10 | bwd_inner_microstep: 1555.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 20:58:20,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1396.39 | bwd_inner_microstep: 1396.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2141
[2024-06-10 20:58:21,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.03 | bwd_microstep: 893.87 | bwd_inner_microstep: 893.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3432
[2024-06-10 20:58:23,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.01 | bwd_microstep: 1315.69 | bwd_inner_microstep: 1315.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3802
[2024-06-10 20:58:25,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1418.05 | bwd_inner_microstep: 1418.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590
[2024-06-10 20:58:27,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.09 | bwd_microstep: 1700.95 | bwd_inner_microstep: 1700.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3444
[2024-06-10 20:58:29,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.43 | bwd_microstep: 1513.66 | bwd_inner_microstep: 1513.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3743
[2024-06-10 20:58:31,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.66
[2024-06-10 20:58:31,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.73 | bwd_microstep: 1654.18 | bwd_inner_microstep: 1645.93 | bwd_allreduce_microstep: 8.19 | step_microstep: 39.37
[2024-06-10 20:58:31,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16444.26 | bwd: 43979.69 | bwd_inner: 43970.59 | bwd_allreduce: 8.42 | step: 40.87
{'loss': 1.1792, 'learning_rate': 9.74109386706443e-06, 'epoch': 0.68}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3063
[2024-06-10 20:58:33,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.30 | bwd_microstep: 1170.67 | bwd_inner_microstep: 1170.44 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.22
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3986
[2024-06-10 20:58:36,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.16 | bwd_microstep: 1746.05 | bwd_inner_microstep: 1746.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-10 20:58:38,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.09 | bwd_microstep: 1647.77 | bwd_inner_microstep: 1647.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-10 20:58:40,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.62 | bwd_microstep: 1297.73 | bwd_inner_microstep: 1297.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 20:58:42,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.46 | bwd_microstep: 1397.23 | bwd_inner_microstep: 1397.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878
[2024-06-10 20:58:42,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.32 | bwd_microstep: 678.38 | bwd_inner_microstep: 678.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 20:58:44,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.27 | bwd_microstep: 1284.71 | bwd_inner_microstep: 1284.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 20:58:46,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1242.85 | bwd_inner_microstep: 1242.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 20:58:48,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.82 | bwd_microstep: 1387.83 | bwd_inner_microstep: 1387.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688
[2024-06-10 20:58:50,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.96 | bwd_microstep: 1522.73 | bwd_inner_microstep: 1522.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 20:58:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.39 | bwd_microstep: 1150.62 | bwd_inner_microstep: 1150.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-10 20:58:53,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.29 | bwd_microstep: 1276.54 | bwd_inner_microstep: 1276.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2928
[2024-06-10 20:58:55,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.30 | bwd_microstep: 1094.42 | bwd_inner_microstep: 1094.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3676
[2024-06-10 20:58:57,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.17 | bwd_microstep: 1548.76 | bwd_inner_microstep: 1548.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507
[2024-06-10 20:58:59,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.74 | bwd_microstep: 1444.21 | bwd_inner_microstep: 1444.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 20:59:00,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.53 | bwd_microstep: 799.64 | bwd_inner_microstep: 799.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 20:59:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.23 | bwd_microstep: 1293.06 | bwd_inner_microstep: 1293.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 20:59:04,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1286.20 | bwd_inner_microstep: 1286.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 20:59:06,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1348.41 | bwd_inner_microstep: 1348.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2968
[2024-06-10 20:59:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.82 | bwd_microstep: 1201.12 | bwd_inner_microstep: 1201.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 20:59:09,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.30 | bwd_microstep: 1499.89 | bwd_inner_microstep: 1499.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 20:59:11,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1449.01 | bwd_inner_microstep: 1448.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-10 20:59:13,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.67 | bwd_microstep: 1503.80 | bwd_inner_microstep: 1503.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3832
[2024-06-10 20:59:16,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.20 | bwd_microstep: 1583.89 | bwd_inner_microstep: 1583.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 20:59:17,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.40 | bwd_microstep: 1251.49 | bwd_inner_microstep: 1251.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 20:59:19,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.16 | bwd_microstep: 1282.74 | bwd_inner_microstep: 1282.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3592
[2024-06-10 20:59:21,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1241.62 | bwd_inner_microstep: 1241.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3749
[2024-06-10 20:59:23,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.09 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559
[2024-06-10 20:59:25,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1423.43 | bwd_inner_microstep: 1423.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3541
[2024-06-10 20:59:27,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.08 | bwd_microstep: 1462.81 | bwd_inner_microstep: 1462.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3776
[2024-06-10 20:59:29,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.77 | bwd_microstep: 1743.87 | bwd_inner_microstep: 1743.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3575
[2024-06-10 20:59:33,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.12 | optimizer_step: 6.59
[2024-06-10 20:59:33,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.24 | bwd_microstep: 2835.41 | bwd_inner_microstep: 1723.07 | bwd_allreduce_microstep: 1112.28 | step_microstep: 38.87
[2024-06-10 20:59:33,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16215.92 | bwd: 44501.24 | bwd_inner: 43387.86 | bwd_allreduce: 1112.62 | step: 40.59
{'loss': 1.2482, 'learning_rate': 9.70889199942743e-06, 'epoch': 0.68}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 20:59:34,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1242.49 | bwd_inner_microstep: 1242.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4056
[2024-06-10 20:59:37,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.18 | bwd_microstep: 1616.29 | bwd_inner_microstep: 1616.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 20:59:38,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.12 | bwd_microstep: 1375.92 | bwd_inner_microstep: 1375.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-10 20:59:41,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.37 | bwd_microstep: 1550.84 | bwd_inner_microstep: 1550.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 20:59:42,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.20 | bwd_microstep: 1148.95 | bwd_inner_microstep: 1148.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 20:59:44,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.18 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2212
[2024-06-10 20:59:45,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.58 | bwd_microstep: 957.73 | bwd_inner_microstep: 957.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 20:59:47,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1252.46 | bwd_inner_microstep: 1252.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 20:59:49,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.60 | bwd_microstep: 1159.97 | bwd_inner_microstep: 1159.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 20:59:51,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.43 | bwd_microstep: 1386.95 | bwd_inner_microstep: 1386.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1991
[2024-06-10 20:59:51,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.18 | bwd_microstep: 707.96 | bwd_inner_microstep: 707.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1359
[2024-06-10 20:59:52,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.61 | bwd_microstep: 521.70 | bwd_inner_microstep: 521.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-10 20:59:53,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.70 | bwd_microstep: 806.38 | bwd_inner_microstep: 806.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 20:59:55,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1389.56 | bwd_inner_microstep: 1389.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 20:59:57,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1392.45 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 20:59:59,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.13 | bwd_microstep: 1374.82 | bwd_inner_microstep: 1374.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 21:00:01,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1399.52 | bwd_inner_microstep: 1399.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 21:00:03,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1398.72 | bwd_inner_microstep: 1398.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932
[2024-06-10 21:00:04,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.61 | bwd_microstep: 700.47 | bwd_inner_microstep: 700.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 21:00:06,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.07 | bwd_microstep: 1652.30 | bwd_inner_microstep: 1652.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 21:00:08,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1558.79 | bwd_inner_microstep: 1558.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 21:00:10,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.20 | bwd_microstep: 1480.32 | bwd_inner_microstep: 1480.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 21:00:12,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1508.49 | bwd_inner_microstep: 1508.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 21:00:15,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1497.00 | bwd_inner_microstep: 1496.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091
[2024-06-10 21:00:16,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.22 | bwd_microstep: 823.41 | bwd_inner_microstep: 823.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605
[2024-06-10 21:00:18,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1435.15 | bwd_inner_microstep: 1435.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3787
[2024-06-10 21:00:20,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.05 | bwd_microstep: 1695.96 | bwd_inner_microstep: 1695.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2189
[2024-06-10 21:00:21,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.67 | bwd_microstep: 908.33 | bwd_inner_microstep: 908.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1908
[2024-06-10 21:00:22,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.18 | bwd_microstep: 750.49 | bwd_inner_microstep: 750.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 21:00:25,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1607.08 | bwd_inner_microstep: 1607.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592
[2024-06-10 21:00:27,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.09 | bwd_microstep: 1703.71 | bwd_inner_microstep: 1703.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 21:00:34,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 21:00:34,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 6092.78 | bwd_inner_microstep: 1542.70 | bwd_allreduce_microstep: 4550.01 | step_microstep: 38.55
[2024-06-10 21:00:34,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15240.43 | bwd: 45382.13 | bwd_inner: 40831.19 | bwd_allreduce: 4550.25 | step: 40.08
{'loss': 1.1832, 'learning_rate': 9.676726377019296e-06, 'epoch': 0.68}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 21:00:35,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.59 | bwd_microstep: 1339.01 | bwd_inner_microstep: 1338.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2966
[2024-06-10 21:00:37,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.03 | bwd_microstep: 1141.31 | bwd_inner_microstep: 1141.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4384
[2024-06-10 21:00:39,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.23 | bwd_microstep: 1808.82 | bwd_inner_microstep: 1808.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 21:00:41,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.09 | bwd_microstep: 1279.27 | bwd_inner_microstep: 1279.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 21:00:43,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.61 | bwd_microstep: 1280.93 | bwd_inner_microstep: 1280.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3738
[2024-06-10 21:00:45,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.95 | bwd_microstep: 1334.11 | bwd_inner_microstep: 1334.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 21:00:47,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1383.40 | bwd_inner_microstep: 1383.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 21:00:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.75 | bwd_microstep: 1281.15 | bwd_inner_microstep: 1281.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3708
[2024-06-10 21:00:50,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1329.95 | bwd_inner_microstep: 1329.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 21:00:52,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.71 | bwd_microstep: 1159.87 | bwd_inner_microstep: 1159.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 21:00:54,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1157.19 | bwd_inner_microstep: 1157.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 21:00:55,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.29 | bwd_microstep: 1283.63 | bwd_inner_microstep: 1283.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 21:00:57,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.35 | bwd_microstep: 1254.47 | bwd_inner_microstep: 1254.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 21:00:59,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.26 | bwd_microstep: 1382.57 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 21:01:01,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.78 | bwd_microstep: 1383.63 | bwd_inner_microstep: 1383.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3505
[2024-06-10 21:01:03,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.38 | bwd_microstep: 1551.67 | bwd_inner_microstep: 1551.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3606
[2024-06-10 21:01:05,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1481.21 | bwd_inner_microstep: 1481.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 21:01:07,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1417.61 | bwd_inner_microstep: 1417.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3635
[2024-06-10 21:01:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.15 | bwd_microstep: 1813.55 | bwd_inner_microstep: 1813.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829
[2024-06-10 21:01:12,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.71 | bwd_microstep: 1756.92 | bwd_inner_microstep: 1756.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941
[2024-06-10 21:01:13,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.81 | bwd_microstep: 698.90 | bwd_inner_microstep: 698.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2373
[2024-06-10 21:01:14,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.10 | bwd_microstep: 901.46 | bwd_inner_microstep: 901.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3455
[2024-06-10 21:01:16,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.09 | bwd_microstep: 1192.29 | bwd_inner_microstep: 1192.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 21:01:18,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1253.24 | bwd_inner_microstep: 1253.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3795
[2024-06-10 21:01:19,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.70 | bwd_microstep: 1354.93 | bwd_inner_microstep: 1354.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2522
[2024-06-10 21:01:21,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.88 | bwd_microstep: 1059.66 | bwd_inner_microstep: 1059.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3485
[2024-06-10 21:01:23,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1335.84 | bwd_inner_microstep: 1335.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 21:01:25,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1398.09 | bwd_inner_microstep: 1398.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3552
[2024-06-10 21:01:26,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.11 | bwd_microstep: 1282.15 | bwd_inner_microstep: 1282.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 21:01:28,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.70 | bwd_microstep: 1406.44 | bwd_inner_microstep: 1406.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 21:01:31,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.45 | bwd_microstep: 1595.58 | bwd_inner_microstep: 1595.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564
[2024-06-10 21:01:35,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 21:01:35,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.31 | bwd_microstep: 4015.07 | bwd_inner_microstep: 1798.60 | bwd_allreduce_microstep: 2216.41 | step_microstep: 39.38
[2024-06-10 21:01:35,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16095.83 | bwd: 45313.95 | bwd_inner: 43096.64 | bwd_allreduce: 2216.64 | step: 41.17
 68%|██████▊   | 1174/1726 [20:19:08<9:22:03, 61.09s/it]
 68%|██████▊   | 1175/1726 [20:20:07<9:17:25, 60.70s/it]
 68%|██████▊   | 1176/1726 [20:21:08<9:16:35, 60.72s/it]
 68%|██████▊   | 1177/1726 [20:22:09<9:16:30, 60.82s/it]
 68%|██████▊   | 1178/1726 [20:23:10<9:15:51, 60.86s/it]
 68%|██████▊   | 1179
{'loss': 1.1488, 'learning_rate': 9.644597113127206e-06, 'epoch': 0.68}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 21:01:37,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.55 | bwd_microstep: 1331.73 | bwd_inner_microstep: 1331.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908
[2024-06-10 21:01:39,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1586.63 | bwd_inner_microstep: 1586.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 21:01:41,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1550.69 | bwd_inner_microstep: 1550.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.37
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4160
[2024-06-10 21:01:44,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.81 | bwd_microstep: 1641.57 | bwd_inner_microstep: 1641.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 21:01:46,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1354.26 | bwd_inner_microstep: 1354.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-10 21:01:48,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.93 | bwd_microstep: 1538.50 | bwd_inner_microstep: 1538.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749
[2024-06-10 21:01:50,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.32 | bwd_microstep: 1536.56 | bwd_inner_microstep: 1536.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929
[2024-06-10 21:01:51,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.90 | bwd_microstep: 727.23 | bwd_inner_microstep: 727.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442
[2024-06-10 21:01:53,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.04 | bwd_microstep: 1450.59 | bwd_inner_microstep: 1450.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2131
[2024-06-10 21:01:54,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.57 | bwd_microstep: 863.85 | bwd_inner_microstep: 863.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-10 21:01:56,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1486.39 | bwd_inner_microstep: 1486.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 21:01:58,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.94 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 21:01:59,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.31 | bwd_microstep: 794.95 | bwd_inner_microstep: 794.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691
[2024-06-10 21:02:01,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.50 | bwd_microstep: 1629.54 | bwd_inner_microstep: 1629.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841
[2024-06-10 21:02:03,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.27 | bwd_microstep: 1464.42 | bwd_inner_microstep: 1464.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3645
[2024-06-10 21:02:06,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.59 | bwd_microstep: 1560.78 | bwd_inner_microstep: 1560.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 21:02:07,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1380.31 | bwd_inner_microstep: 1380.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2416
[2024-06-10 21:02:09,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.55 | bwd_microstep: 842.84 | bwd_inner_microstep: 842.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 21:02:10,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.46 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 21:02:12,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1284.38 | bwd_inner_microstep: 1284.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 21:02:14,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.31 | bwd_microstep: 1284.77 | bwd_inner_microstep: 1284.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 21:02:16,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.22 | bwd_microstep: 1252.18 | bwd_inner_microstep: 1252.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 21:02:18,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.09 | bwd_microstep: 1402.92 | bwd_inner_microstep: 1402.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 21:02:20,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1412.67 | bwd_inner_microstep: 1412.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 21:02:21,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1398.12 | bwd_inner_microstep: 1398.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 21:02:23,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1382.27 | bwd_inner_microstep: 1382.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3561
[2024-06-10 21:02:25,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.02 | bwd_microstep: 1365.71 | bwd_inner_microstep: 1365.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 21:02:27,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1341.38 | bwd_inner_microstep: 1341.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 21:02:29,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.10 | bwd_microstep: 1305.94 | bwd_inner_microstep: 1305.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3412
[2024-06-10 21:02:31,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.20 | bwd_microstep: 1309.84 | bwd_inner_microstep: 1309.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2046
[2024-06-10 21:02:32,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.67 | bwd_microstep: 953.78 | bwd_inner_microstep: 953.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 21:02:35,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-10 21:02:35,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.04 | bwd_microstep: 2158.09 | bwd_inner_microstep: 1815.43 | bwd_allreduce_microstep: 342.60 | step_microstep: 38.88
[2024-06-10 21:02:35,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15975.29 | bwd: 43253.73 | bwd_inner: 42910.18 | bwd_allreduce: 342.83 | step: 40.80
{'loss': 1.1653, 'learning_rate': 9.612504320910249e-06, 'epoch': 0.68}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 21:02:37,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.24 | bwd_microstep: 1471.83 | bwd_inner_microstep: 1471.61 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2339
[2024-06-10 21:02:38,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.75 | bwd_microstep: 949.92 | bwd_inner_microstep: 949.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 21:02:40,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.54 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3430
[2024-06-10 21:02:42,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.91 | bwd_microstep: 1180.94 | bwd_inner_microstep: 1180.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4088
[2024-06-10 21:02:44,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.44 | bwd_microstep: 1625.23 | bwd_inner_microstep: 1625.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-10 21:02:46,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.84 | bwd_microstep: 1149.25 | bwd_inner_microstep: 1149.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 21:02:47,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1248.45 | bwd_inner_microstep: 1248.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 21:02:49,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1344.75 | bwd_inner_microstep: 1344.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3496
[2024-06-10 21:02:51,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.14 | bwd_microstep: 1415.50 | bwd_inner_microstep: 1415.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583
[2024-06-10 21:02:53,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1453.09 | bwd_inner_microstep: 1453.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3708
[2024-06-10 21:02:55,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.42 | bwd_microstep: 1694.69 | bwd_inner_microstep: 1694.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3674
[2024-06-10 21:02:58,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.96 | bwd_microstep: 1584.58 | bwd_inner_microstep: 1584.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 21:03:00,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.76 | bwd_microstep: 1483.45 | bwd_inner_microstep: 1483.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 21:03:01,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1286.53 | bwd_inner_microstep: 1286.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 21:03:03,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1248.93 | bwd_inner_microstep: 1248.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 21:03:05,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.96 | bwd_microstep: 1285.38 | bwd_inner_microstep: 1285.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806
[2024-06-10 21:03:07,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.63 | bwd_microstep: 1754.26 | bwd_inner_microstep: 1754.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3643
[2024-06-10 21:03:09,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1446.16 | bwd_inner_microstep: 1446.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 21:03:11,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1492.73 | bwd_inner_microstep: 1492.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2296
[2024-06-10 21:03:13,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.26 | bwd_microstep: 977.66 | bwd_inner_microstep: 977.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 21:03:15,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1293.95 | bwd_inner_microstep: 1293.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-10 21:03:17,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.51 | bwd_microstep: 1521.53 | bwd_inner_microstep: 1521.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 21:03:19,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1511.00 | bwd_inner_microstep: 1510.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 21:03:21,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1411.79 | bwd_inner_microstep: 1411.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2192
[2024-06-10 21:03:22,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.10 | bwd_microstep: 771.17 | bwd_inner_microstep: 771.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 21:03:24,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1289.74 | bwd_inner_microstep: 1289.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681
[2024-06-10 21:03:26,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1529.15 | bwd_inner_microstep: 1529.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 21:03:28,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566
[2024-06-10 21:03:30,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.13 | bwd_microstep: 1596.90 | bwd_inner_microstep: 1596.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-10 21:03:32,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.37 | bwd_microstep: 1543.73 | bwd_inner_microstep: 1543.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-10 21:03:34,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.62 | bwd_microstep: 1310.23 | bwd_inner_microstep: 1310.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 21:03:36,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.27 | optimizer_step: 6.65
[2024-06-10 21:03:36,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.78 | bwd_microstep: 1585.16 | bwd_inner_microstep: 1577.27 | bwd_allreduce_microstep: 7.83 | step_microstep: 38.69
[2024-06-10 21:03:36,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16499.18 | bwd: 44179.88 | bwd_inner: 44170.98 | bwd_allreduce: 8.15 | step: 40.27
{'loss': 1.2132, 'learning_rate': 9.580448113399069e-06, 'epoch': 0.68}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992
[2024-06-10 21:03:37,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 899.53 | bwd_inner_microstep: 899.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 21:03:39,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.73 | bwd_microstep: 1374.96 | bwd_inner_microstep: 1374.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 21:03:41,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.12 | bwd_microstep: 1243.76 | bwd_inner_microstep: 1243.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482
[2024-06-10 21:03:43,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.53 | bwd_microstep: 1313.83 | bwd_inner_microstep: 1313.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 21:03:45,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.11 | bwd_microstep: 1482.02 | bwd_inner_microstep: 1481.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 21:03:47,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1492.56 | bwd_inner_microstep: 1492.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-10 21:03:48,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1280.10 | bwd_inner_microstep: 1280.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 21:03:51,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.13 | bwd_microstep: 1484.47 | bwd_inner_microstep: 1484.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 21:03:53,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.54 | bwd_microstep: 1479.83 | bwd_inner_microstep: 1479.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 21:03:54,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1388.77 | bwd_inner_microstep: 1388.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507
[2024-06-10 21:03:56,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1315.77 | bwd_inner_microstep: 1315.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3657
[2024-06-10 21:03:58,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.56 | bwd_microstep: 1322.06 | bwd_inner_microstep: 1322.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 21:04:00,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.45 | bwd_microstep: 1618.06 | bwd_inner_microstep: 1618.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 819
[2024-06-10 21:04:01,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 131.30 | bwd_microstep: 341.46 | bwd_inner_microstep: 341.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 21:04:03,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.05 | bwd_microstep: 1350.72 | bwd_inner_microstep: 1350.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2887
[2024-06-10 21:04:04,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.97 | bwd_microstep: 1184.76 | bwd_inner_microstep: 1184.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 21:04:06,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | bwd_microstep: 1377.32 | bwd_inner_microstep: 1377.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 21:04:08,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.70 | bwd_microstep: 1475.97 | bwd_inner_microstep: 1475.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 21:04:10,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.10 | bwd_microstep: 1389.00 | bwd_inner_microstep: 1388.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3481
[2024-06-10 21:04:12,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.13 | bwd_microstep: 1243.04 | bwd_inner_microstep: 1243.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 21:04:14,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.71 | bwd_microstep: 1293.93 | bwd_inner_microstep: 1293.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 21:04:16,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.29 | bwd_microstep: 1558.41 | bwd_inner_microstep: 1558.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2252
[2024-06-10 21:04:17,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.99 | bwd_microstep: 872.97 | bwd_inner_microstep: 872.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 21:04:19,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1399.92 | bwd_inner_microstep: 1399.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-10 21:04:21,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.73 | bwd_microstep: 1411.70 | bwd_inner_microstep: 1411.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 21:04:23,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1398.87 | bwd_inner_microstep: 1398.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 21:04:25,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1461.77 | bwd_inner_microstep: 1461.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 21:04:27,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.81 | bwd_microstep: 1385.27 | bwd_inner_microstep: 1385.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-10 21:04:29,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.41 | bwd_microstep: 1322.26 | bwd_inner_microstep: 1322.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822
[2024-06-10 21:04:31,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.09 | bwd_microstep: 1493.01 | bwd_inner_microstep: 1492.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3811
[2024-06-10 21:04:33,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.63 | bwd_microstep: 1856.40 | bwd_inner_microstep: 1856.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-10 21:04:38,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.26 | optimizer_step: 6.58
[2024-06-10 21:04:38,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 3730.16 | bwd_inner_microstep: 1698.38 | bwd_allreduce_microstep: 2031.72 | step_microstep: 38.92
[2024-06-10 21:04:38,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16142.78 | bwd: 45242.68 | bwd_inner: 43210.01 | bwd_allreduce: 2031.97 | step: 40.40
{'loss': 1.2194, 'learning_rate': 9.54842860349548e-06, 'epoch': 0.68}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 21:04:39,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.41 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.31 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 21:04:42,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.87 | bwd_microstep: 1486.80 | bwd_inner_microstep: 1486.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 21:04:43,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.39 | bwd_microstep: 1406.05 | bwd_inner_microstep: 1406.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3865
[2024-06-10 21:04:46,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.94 | bwd_microstep: 1523.99 | bwd_inner_microstep: 1523.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878
[2024-06-10 21:04:47,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.13 | bwd_microstep: 679.52 | bwd_inner_microstep: 679.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 21:04:48,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778
[2024-06-10 21:04:50,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.62 | bwd_microstep: 1347.19 | bwd_inner_microstep: 1347.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 21:04:52,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.01 | bwd_microstep: 1148.47 | bwd_inner_microstep: 1148.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-10 21:04:54,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.70 | bwd_microstep: 1532.21 | bwd_inner_microstep: 1532.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 21:04:56,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1250.36 | bwd_inner_microstep: 1250.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 21:04:57,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1249.77 | bwd_inner_microstep: 1249.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069
[2024-06-10 21:04:58,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.62 | bwd_microstep: 821.25 | bwd_inner_microstep: 821.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1963
[2024-06-10 21:05:00,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.25 | bwd_microstep: 856.88 | bwd_inner_microstep: 856.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-10 21:05:02,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.46 | bwd_microstep: 1456.08 | bwd_inner_microstep: 1456.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-10 21:05:04,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.51 | bwd_microstep: 1519.60 | bwd_inner_microstep: 1519.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 21:05:06,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.53 | bwd_microstep: 1405.05 | bwd_inner_microstep: 1405.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960
[2024-06-10 21:05:07,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.05 | bwd_microstep: 891.12 | bwd_inner_microstep: 891.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 21:05:09,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1504.02 | bwd_inner_microstep: 1503.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 21:05:11,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.60 | bwd_microstep: 1392.91 | bwd_inner_microstep: 1392.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2730
[2024-06-10 21:05:12,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.82 | bwd_microstep: 1042.70 | bwd_inner_microstep: 1042.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-10 21:05:15,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.53 | bwd_microstep: 1652.47 | bwd_inner_microstep: 1652.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 21:05:17,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1434.94 | bwd_inner_microstep: 1434.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 21:05:18,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.60 | bwd_microstep: 800.96 | bwd_inner_microstep: 800.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2168
[2024-06-10 21:05:19,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.64 | bwd_microstep: 951.78 | bwd_inner_microstep: 951.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3685
[2024-06-10 21:05:21,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.24 | bwd_microstep: 1359.18 | bwd_inner_microstep: 1359.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 21:05:23,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1399.76 | bwd_inner_microstep: 1399.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1384
[2024-06-10 21:05:24,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 233.20 | bwd_microstep: 620.90 | bwd_inner_microstep: 620.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3716
[2024-06-10 21:05:26,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.21 | bwd_microstep: 1729.83 | bwd_inner_microstep: 1729.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055
[2024-06-10 21:05:27,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.75 | bwd_microstep: 912.68 | bwd_inner_microstep: 912.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 21:05:29,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1252.68 | bwd_inner_microstep: 1252.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 21:05:31,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.02 | bwd_microstep: 1403.22 | bwd_inner_microstep: 1403.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3580
[2024-06-10 21:05:36,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 21:05:36,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.27 | bwd_microstep: 4567.55 | bwd_inner_microstep: 1724.96 | bwd_allreduce_microstep: 2842.54 | step_microstep: 37.85
[2024-06-10 21:05:36,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15065.25 | bwd: 43224.64 | bwd_inner: 40381.07 | bwd_allreduce: 2842.84 | step: 40.63
{'loss': 1.1996, 'learning_rate': 9.516445903972005e-06, 'epoch': 0.69}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 21:05:37,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.36 | bwd_microstep: 796.80 | bwd_inner_microstep: 796.67 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2639
[2024-06-10 21:05:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.05 | bwd_microstep: 1017.79 | bwd_inner_microstep: 1017.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3889
[2024-06-10 21:05:41,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 1583.52 | bwd_inner_microstep: 1583.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 21:05:43,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.02 | bwd_microstep: 1281.48 | bwd_inner_microstep: 1281.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3758
[2024-06-10 21:05:45,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.00 | bwd_microstep: 1303.86 | bwd_inner_microstep: 1303.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3586
[2024-06-10 21:05:47,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.34 | bwd_microstep: 1435.23 | bwd_inner_microstep: 1435.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-10 21:05:49,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.73 | bwd_microstep: 1546.11 | bwd_inner_microstep: 1546.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-10 21:05:50,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.81 | bwd_microstep: 1151.50 | bwd_inner_microstep: 1151.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 21:05:52,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.78 | bwd_microstep: 1292.54 | bwd_inner_microstep: 1292.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2640
[2024-06-10 21:05:53,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.10 | bwd_microstep: 1022.07 | bwd_inner_microstep: 1022.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 21:05:55,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1248.69 | bwd_inner_microstep: 1248.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 21:05:57,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1497.49 | bwd_inner_microstep: 1497.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 21:05:59,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.99 | bwd_microstep: 1510.53 | bwd_inner_microstep: 1510.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 21:06:01,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.25 | bwd_microstep: 1482.85 | bwd_inner_microstep: 1482.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511
[2024-06-10 21:06:03,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.12 | bwd_microstep: 1418.00 | bwd_inner_microstep: 1417.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 21:06:05,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1406.75 | bwd_inner_microstep: 1406.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-10 21:06:07,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1297.90 | bwd_inner_microstep: 1297.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 21:06:09,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1253.94 | bwd_inner_microstep: 1253.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-10 21:06:11,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.81 | bwd_microstep: 1520.48 | bwd_inner_microstep: 1520.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-10 21:06:12,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.34 | bwd_microstep: 981.70 | bwd_inner_microstep: 981.53 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 21:06:14,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1396.56 | bwd_inner_microstep: 1396.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3447
[2024-06-10 21:06:16,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1192.07 | bwd_inner_microstep: 1192.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 21:06:18,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1394.86 | bwd_inner_microstep: 1394.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 21:06:19,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.23 | bwd_microstep: 805.39 | bwd_inner_microstep: 805.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 21:06:21,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.78 | bwd_microstep: 1311.43 | bwd_inner_microstep: 1311.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2943
[2024-06-10 21:06:22,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.84 | bwd_microstep: 1162.04 | bwd_inner_microstep: 1162.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 21:06:24,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.61 | bwd_microstep: 1497.28 | bwd_inner_microstep: 1497.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3823
[2024-06-10 21:06:27,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.22 | bwd_microstep: 1684.84 | bwd_inner_microstep: 1684.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 21:06:28,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.38 | bwd_microstep: 789.86 | bwd_inner_microstep: 789.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2605
[2024-06-10 21:06:29,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.86 | bwd_microstep: 1026.97 | bwd_inner_microstep: 1026.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 21:06:31,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.51 | bwd_microstep: 1498.57 | bwd_inner_microstep: 1498.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3574
[2024-06-10 21:06:36,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.28 | optimizer_step: 6.60
[2024-06-10 21:06:36,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.21 | bwd_microstep: 4599.53 | bwd_inner_microstep: 1708.81 | bwd_allreduce_microstep: 2890.67 | step_microstep: 39.09
[2024-06-10 21:06:36,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15470.56 | bwd: 44408.68 | bwd_inner: 41516.87 | bwd_allreduce: 2891.02 | step: 40.67
 68%|██████▊   | 1179/1726 [20:24:12<9:17:19, 61.13s/it]
 68%|██████▊   | 1180/1726 [20:25:12<9:12:05, 60.67s/it]
 68%|██████▊   | 1181/1726 [20:26:13<9:12:02, 60.78s/it]
 68%|██████▊   | 1182/1726 [20:27:14<9:13:37, 61.06s/it]
 69%|██████▊   | 1183/1726 [20:28:13<9:06:01, 60.33s/it]
 69%|██████▊   | 1184/1726 [20:29:13<9:04:41,
{'loss': 1.1969, 'learning_rate': 9.484500127471562e-06, 'epoch': 0.69}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3460
[2024-06-10 21:06:38,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1421.80 | bwd_inner_microstep: 1421.72 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 21:06:40,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.03 | bwd_microstep: 1278.67 | bwd_inner_microstep: 1278.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 21:06:42,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1280.99 | bwd_inner_microstep: 1280.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1344
[2024-06-10 21:06:43,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 197.85 | bwd_microstep: 514.02 | bwd_inner_microstep: 513.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3777
[2024-06-10 21:06:45,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.37 | bwd_microstep: 1311.99 | bwd_inner_microstep: 1311.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-10 21:06:46,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.68 | bwd_microstep: 1284.09 | bwd_inner_microstep: 1284.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 21:06:48,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1387.55 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 21:06:50,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.85 | bwd_microstep: 1632.76 | bwd_inner_microstep: 1632.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 21:06:52,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.90 | bwd_microstep: 1386.74 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3696
[2024-06-10 21:06:54,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1327.30 | bwd_inner_microstep: 1327.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2078
[2024-06-10 21:06:55,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.71 | bwd_microstep: 849.61 | bwd_inner_microstep: 849.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 21:06:57,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.30 | bwd_microstep: 1380.98 | bwd_inner_microstep: 1380.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665
[2024-06-10 21:07:00,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.19 | bwd_microstep: 1716.74 | bwd_inner_microstep: 1716.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-10 21:07:01,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.75 | bwd_microstep: 1313.46 | bwd_inner_microstep: 1313.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 21:07:03,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.01 | bwd_microstep: 1392.10 | bwd_inner_microstep: 1392.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690
[2024-06-10 21:07:05,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1332.72 | bwd_inner_microstep: 1332.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3637
[2024-06-10 21:07:08,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.85 | bwd_microstep: 1659.42 | bwd_inner_microstep: 1659.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3451
[2024-06-10 21:07:09,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1317.75 | bwd_inner_microstep: 1317.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-10 21:07:12,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.50 | bwd_microstep: 1588.62 | bwd_inner_microstep: 1588.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-10 21:07:13,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.41 | bwd_microstep: 696.02 | bwd_inner_microstep: 696.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-10 21:07:14,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 807.99 | bwd_inner_microstep: 807.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-10 21:07:16,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1488.43 | bwd_inner_microstep: 1488.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 21:07:17,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.73 | bwd_microstep: 1299.34 | bwd_inner_microstep: 1299.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 21:07:19,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1456.98 | bwd_inner_microstep: 1456.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 21:07:21,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1396.46 | bwd_inner_microstep: 1396.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3725
[2024-06-10 21:07:24,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1561.07 | bwd_inner_microstep: 1561.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3819
[2024-06-10 21:07:26,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.36 | bwd_microstep: 1821.93 | bwd_inner_microstep: 1821.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812
[2024-06-10 21:07:28,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.59 | bwd_microstep: 1619.54 | bwd_inner_microstep: 1619.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3699
[2024-06-10 21:07:30,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.36 | bwd_microstep: 1450.61 | bwd_inner_microstep: 1450.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 21:07:32,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1339.45 | bwd_inner_microstep: 1339.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-10 21:07:34,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 1315.10 | bwd_inner_microstep: 1315.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 21:07:37,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.23 | optimizer_step: 6.59
[2024-06-10 21:07:37,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.00 | bwd_microstep: 2415.67 | bwd_inner_microstep: 1729.70 | bwd_allreduce_microstep: 685.92 | step_microstep: 39.06
[2024-06-10 21:07:37,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16163.29 | bwd: 44045.94 | bwd_inner: 43359.04 | bwd_allreduce: 686.19 | step: 40.68
{'loss': 1.1522, 'learning_rate': 9.452591386506999e-06, 'epoch': 0.69}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 21:07:39,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.85 | bwd_microstep: 1523.61 | bwd_inner_microstep: 1523.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3910
[2024-06-10 21:07:41,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.95 | bwd_microstep: 1692.24 | bwd_inner_microstep: 1692.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469
[2024-06-10 21:07:43,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.94 | bwd_microstep: 1181.03 | bwd_inner_microstep: 1181.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 21:07:45,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1479.26 | bwd_inner_microstep: 1479.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4318
[2024-06-10 21:07:48,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.08 | bwd_microstep: 1782.64 | bwd_inner_microstep: 1782.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 21:07:49,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.67 | bwd_microstep: 1245.00 | bwd_inner_microstep: 1244.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 21:07:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1492.10 | bwd_inner_microstep: 1492.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2257
[2024-06-10 21:07:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.01 | bwd_microstep: 779.16 | bwd_inner_microstep: 779.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052
[2024-06-10 21:07:54,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.21 | bwd_microstep: 814.94 | bwd_inner_microstep: 814.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3627
[2024-06-10 21:07:55,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.55 | bwd_microstep: 1250.94 | bwd_inner_microstep: 1250.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-10 21:07:57,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.77 | bwd_microstep: 1280.55 | bwd_inner_microstep: 1280.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3405
[2024-06-10 21:07:59,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.88 | bwd_microstep: 1207.26 | bwd_inner_microstep: 1207.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 21:08:01,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1479.68 | bwd_inner_microstep: 1479.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 21:08:03,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1515.96 | bwd_inner_microstep: 1515.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520
[2024-06-10 21:08:05,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1446.24 | bwd_inner_microstep: 1446.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3501
[2024-06-10 21:08:07,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.36 | bwd_microstep: 1187.57 | bwd_inner_microstep: 1187.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514
[2024-06-10 21:08:08,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1416.27 | bwd_inner_microstep: 1416.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 919
[2024-06-10 21:08:09,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.98 | bwd_microstep: 375.03 | bwd_inner_microstep: 375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 21:08:11,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.49 | bwd_microstep: 1420.29 | bwd_inner_microstep: 1420.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 21:08:12,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.29 | bwd_microstep: 802.16 | bwd_inner_microstep: 802.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 21:08:14,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1491.89 | bwd_inner_microstep: 1491.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3483
[2024-06-10 21:08:16,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.40 | bwd_microstep: 1347.07 | bwd_inner_microstep: 1347.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-10 21:08:18,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.24 | bwd_microstep: 1496.69 | bwd_inner_microstep: 1496.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 21:08:20,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1509.23 | bwd_inner_microstep: 1509.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 21:08:21,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.15 | bwd_microstep: 698.38 | bwd_inner_microstep: 698.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 21:08:23,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1350.44 | bwd_inner_microstep: 1350.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 21:08:25,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.30 | bwd_microstep: 1621.30 | bwd_inner_microstep: 1621.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-10 21:08:27,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.75 | bwd_microstep: 1605.25 | bwd_inner_microstep: 1605.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3803
[2024-06-10 21:08:30,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.76 | bwd_microstep: 1535.69 | bwd_inner_microstep: 1535.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-10 21:08:32,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.97 | bwd_microstep: 1639.11 | bwd_inner_microstep: 1639.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3772
[2024-06-10 21:08:34,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.30 | bwd_microstep: 1579.07 | bwd_inner_microstep: 1579.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273
[2024-06-10 21:08:36,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 21:08:36,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.58 | bwd_microstep: 2128.11 | bwd_inner_microstep: 993.03 | bwd_allreduce_microstep: 1135.04 | step_microstep: 37.66
[2024-06-10 21:08:36,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15747.60 | bwd: 43374.18 | bwd_inner: 42238.24 | bwd_allreduce: 1135.27 | step: 39.09
{'loss': 1.2125, 'learning_rate': 9.420719793460758e-06, 'epoch': 0.69}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 21:08:38,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1368.46 | bwd_inner_microstep: 1368.33 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3864
[2024-06-10 21:08:40,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.42 | bwd_microstep: 1466.20 | bwd_inner_microstep: 1466.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 21:08:43,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.57 | bwd_microstep: 1550.94 | bwd_inner_microstep: 1550.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 21:08:44,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.23 | bwd_microstep: 1341.67 | bwd_inner_microstep: 1341.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3509
[2024-06-10 21:08:46,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.38 | bwd_microstep: 1191.90 | bwd_inner_microstep: 1191.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4020
[2024-06-10 21:08:48,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.54 | bwd_microstep: 1614.06 | bwd_inner_microstep: 1614.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 21:08:50,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.44 | bwd_microstep: 1283.19 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 21:08:52,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.26 | bwd_microstep: 1296.84 | bwd_inner_microstep: 1296.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 21:08:53,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.85 | bwd_microstep: 801.16 | bwd_inner_microstep: 801.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 21:08:55,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.94 | bwd_microstep: 1310.78 | bwd_inner_microstep: 1310.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3499
[2024-06-10 21:08:57,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.17 | bwd_microstep: 1333.70 | bwd_inner_microstep: 1333.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3595
[2024-06-10 21:08:59,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1371.42 | bwd_inner_microstep: 1371.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 21:09:01,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1514.79 | bwd_inner_microstep: 1514.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 21:09:03,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-10 21:09:05,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1589.88 | bwd_inner_microstep: 1589.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524
[2024-06-10 21:09:07,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.14 | bwd_microstep: 1582.13 | bwd_inner_microstep: 1582.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 21:09:09,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1437.38 | bwd_inner_microstep: 1437.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3518
[2024-06-10 21:09:11,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.52 | bwd_microstep: 1253.55 | bwd_inner_microstep: 1253.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-10 21:09:12,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.38 | bwd_microstep: 802.72 | bwd_inner_microstep: 802.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-10 21:09:14,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.59 | bwd_microstep: 1628.59 | bwd_inner_microstep: 1628.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492
[2024-06-10 21:09:16,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.73 | bwd_microstep: 1190.04 | bwd_inner_microstep: 1190.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551
[2024-06-10 21:09:17,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.06 | bwd_microstep: 1201.22 | bwd_inner_microstep: 1201.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-10 21:09:20,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1621.08 | bwd_inner_microstep: 1621.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-10 21:09:22,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 2290.28 | bwd_inner_microstep: 2290.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809
[2024-06-10 21:09:24,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.03 | bwd_microstep: 1487.48 | bwd_inner_microstep: 1487.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 21:09:27,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.85 | bwd_microstep: 1657.22 | bwd_inner_microstep: 1657.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 21:09:29,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1500.00 | bwd_inner_microstep: 1499.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 21:09:31,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1398.97 | bwd_inner_microstep: 1398.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 21:09:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.99 | bwd_microstep: 974.80 | bwd_inner_microstep: 974.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584
[2024-06-10 21:09:34,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.02 | bwd_microstep: 1527.69 | bwd_inner_microstep: 1527.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3589
[2024-06-10 21:09:37,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.88 | bwd_microstep: 1699.76 | bwd_inner_microstep: 1699.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-10 21:09:39,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.11 | optimizer_gradients: 4.10 | optimizer_step: 6.61
[2024-06-10 21:09:39,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.72 | bwd_microstep: 1441.03 | bwd_inner_microstep: 1433.34 | bwd_allreduce_microstep: 7.65 | step_microstep: 39.21
[2024-06-10 21:09:39,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16587.56 | bwd: 45133.25 | bwd_inner: 45124.61 | bwd_allreduce: 7.92 | step: 40.75
{'loss': 1.1793, 'learning_rate': 9.388885460584392e-06, 'epoch': 0.69}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 21:09:40,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.88 | bwd_microstep: 1280.83 | bwd_inner_microstep: 1280.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 21:09:42,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.70 | bwd_microstep: 1244.04 | bwd_inner_microstep: 1244.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2331
[2024-06-10 21:09:43,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.52 | bwd_microstep: 886.66 | bwd_inner_microstep: 886.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 21:09:45,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.33 | bwd_microstep: 1556.18 | bwd_inner_microstep: 1556.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 21:09:47,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1245.35 | bwd_inner_microstep: 1245.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-10 21:09:49,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1249.57 | bwd_inner_microstep: 1249.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 21:09:51,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.49 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 21:09:53,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-10 21:09:55,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.39 | bwd_microstep: 1421.39 | bwd_inner_microstep: 1421.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689
[2024-06-10 21:09:57,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1624.90 | bwd_inner_microstep: 1624.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 21:09:59,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1282.10 | bwd_inner_microstep: 1282.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 21:10:01,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 1513.83 | bwd_inner_microstep: 1513.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 21:10:02,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1247.94 | bwd_inner_microstep: 1247.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-10 21:10:03,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.92 | bwd_microstep: 796.22 | bwd_inner_microstep: 796.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-10 21:10:06,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.17 | bwd_microstep: 1489.48 | bwd_inner_microstep: 1489.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3843
[2024-06-10 21:10:08,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.43 | bwd_microstep: 1792.13 | bwd_inner_microstep: 1792.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-10 21:10:10,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.81 | bwd_microstep: 1373.62 | bwd_inner_microstep: 1373.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 21:10:12,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.95 | bwd_microstep: 1551.02 | bwd_inner_microstep: 1550.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 21:10:14,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1553.93 | bwd_inner_microstep: 1553.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3618
[2024-06-10 21:10:16,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.67 | bwd_microstep: 1342.53 | bwd_inner_microstep: 1342.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 21:10:18,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1382.17 | bwd_inner_microstep: 1382.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3689
[2024-06-10 21:10:20,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.62 | bwd_microstep: 1591.07 | bwd_inner_microstep: 1591.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3543
[2024-06-10 21:10:22,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.48 | bwd_microstep: 1554.79 | bwd_inner_microstep: 1554.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2134
[2024-06-10 21:10:23,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 767.87 | bwd_inner_microstep: 767.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3604
[2024-06-10 21:10:26,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.90 | bwd_microstep: 1648.37 | bwd_inner_microstep: 1648.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 21:10:28,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1398.24 | bwd_inner_microstep: 1398.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 21:10:30,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 1424.93 | bwd_inner_microstep: 1424.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2018
[2024-06-10 21:10:31,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.85 | bwd_microstep: 902.45 | bwd_inner_microstep: 902.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3723
[2024-06-10 21:10:33,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.78 | bwd_microstep: 1732.43 | bwd_inner_microstep: 1732.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 21:10:35,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.62 | bwd_microstep: 1302.25 | bwd_inner_microstep: 1302.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580
[2024-06-10 21:10:37,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1300.71 | bwd_inner_microstep: 1300.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2441
[2024-06-10 21:10:40,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.04 | optimizer_step: 6.61
[2024-06-10 21:10:40,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.28 | bwd_microstep: 3072.55 | bwd_inner_microstep: 1189.50 | bwd_allreduce_microstep: 1883.00 | step_microstep: 37.78
[2024-06-10 21:10:40,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16161.63 | bwd: 45201.58 | bwd_inner: 43317.69 | bwd_allreduce: 1883.23 | step: 39.29
{'loss': 1.1949, 'learning_rate': 9.35708849999828e-06, 'epoch': 0.69}
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3403
[2024-06-10 21:10:42,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.32 | bwd_microstep: 1451.94 | bwd_inner_microstep: 1451.85 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 21:10:44,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.24 | bwd_microstep: 1383.15 | bwd_inner_microstep: 1383.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 21:10:46,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.39 | bwd_microstep: 1355.22 | bwd_inner_microstep: 1355.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-10 21:10:48,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.93 | bwd_microstep: 1396.30 | bwd_inner_microstep: 1396.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 21:10:50,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1946
[2024-06-10 21:10:51,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.70 | bwd_microstep: 727.36 | bwd_inner_microstep: 727.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 21:10:53,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.27 | bwd_microstep: 1254.23 | bwd_inner_microstep: 1254.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2401
[2024-06-10 21:10:54,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.97 | bwd_microstep: 937.25 | bwd_inner_microstep: 937.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3448
[2024-06-10 21:10:56,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.52 | bwd_microstep: 1184.82 | bwd_inner_microstep: 1184.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 21:10:57,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.76 | bwd_microstep: 799.64 | bwd_inner_microstep: 799.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-10 21:10:59,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1315.44 | bwd_inner_microstep: 1315.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2132
[2024-06-10 21:11:00,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.43 | bwd_microstep: 888.95 | bwd_inner_microstep: 888.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972
[2024-06-10 21:11:01,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.49 | bwd_microstep: 703.36 | bwd_inner_microstep: 703.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1846
[2024-06-10 21:11:02,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 258.17 | bwd_microstep: 671.46 | bwd_inner_microstep: 671.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 21:11:03,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1296.01 | bwd_inner_microstep: 1295.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3515
[2024-06-10 21:11:05,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1324.27 | bwd_inner_microstep: 1324.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3433
[2024-06-10 21:11:07,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.76 | bwd_microstep: 1309.09 | bwd_inner_microstep: 1309.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 21:11:09,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1428.46 | bwd_inner_microstep: 1428.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 21:11:11,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1492.62 | bwd_inner_microstep: 1492.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-10 21:11:13,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.20 | bwd_microstep: 1631.92 | bwd_inner_microstep: 1631.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3681
[2024-06-10 21:11:15,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.93 | bwd_microstep: 1379.04 | bwd_inner_microstep: 1379.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 21:11:17,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.71 | bwd_microstep: 1290.77 | bwd_inner_microstep: 1290.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2443
[2024-06-10 21:11:19,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.47 | bwd_microstep: 1046.41 | bwd_inner_microstep: 1046.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 21:11:21,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.17 | bwd_microstep: 1504.41 | bwd_inner_microstep: 1504.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 21:11:23,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.34 | bwd_microstep: 1498.14 | bwd_inner_microstep: 1498.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-10 21:11:24,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.24 | bwd_microstep: 811.27 | bwd_inner_microstep: 811.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3596
[2024-06-10 21:11:26,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3640
[2024-06-10 21:11:28,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.83 | bwd_microstep: 1531.60 | bwd_inner_microstep: 1531.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044
[2024-06-10 21:11:29,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.98 | bwd_microstep: 810.57 | bwd_inner_microstep: 810.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 21:11:31,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.42 | bwd_microstep: 1452.25 | bwd_inner_microstep: 1452.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-10 21:11:33,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.52 | bwd_microstep: 1587.02 | bwd_inner_microstep: 1587.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 21:11:42,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.31 | optimizer_step: 6.60
[2024-06-10 21:11:42,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.75 | bwd_microstep: 8564.95 | bwd_inner_microstep: 1935.21 | bwd_allreduce_microstep: 6629.67 | step_microstep: 39.94
[2024-06-10 21:11:42,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14917.83 | bwd: 46750.89 | bwd_inner: 40120.21 | bwd_allreduce: 6629.96 | step: 41.57
 69%|██████▊   | 1184/1726 [20:29:13<9:04:41, 60.30s/it]
 69%|██████▊   | 1185/1726 [20:30:14<9:04:24, 60.38s/it]
 69%|██████▊   | 1186/1726 [20:31:13<9:00:54, 60.10s/it]
 69%|██████▉   | 1187/1726 [20:32:15<9:05:12, 60.69s/it]
 69%|██████▉   | 1188/1726 [20:33:17<9:06:55, 61.00s/it]
 69%|██████▉   | 1189/1726 [20:34:19<9:08:35, 61.30s/it]
{'loss': 1.1688, 'learning_rate': 9.325329023691137e-06, 'epoch': 0.69}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 21:11:44,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.63 | bwd_microstep: 1230.40 | bwd_inner_microstep: 1230.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4436
[2024-06-10 21:11:46,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.83 | bwd_microstep: 1816.96 | bwd_inner_microstep: 1816.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3857
[2024-06-10 21:11:49,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 1490.18 | bwd_inner_microstep: 1490.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 21:11:50,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1377.12 | bwd_inner_microstep: 1377.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 21:11:52,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1381.48 | bwd_inner_microstep: 1381.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 21:11:54,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.81 | bwd_microstep: 1533.98 | bwd_inner_microstep: 1533.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 21:11:56,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1284.30 | bwd_inner_microstep: 1284.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 21:11:58,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.09 | bwd_microstep: 1148.00 | bwd_inner_microstep: 1147.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-10 21:11:59,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.94 | bwd_microstep: 1151.13 | bwd_inner_microstep: 1151.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3436
[2024-06-10 21:12:01,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.56 | bwd_microstep: 1185.68 | bwd_inner_microstep: 1185.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182
[2024-06-10 21:12:02,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.91 | bwd_microstep: 860.12 | bwd_inner_microstep: 860.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 21:12:04,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.22 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 21:12:06,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.01 | bwd_microstep: 1531.83 | bwd_inner_microstep: 1531.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 21:12:07,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 795.13 | bwd_inner_microstep: 795.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508
[2024-06-10 21:12:09,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.07 | bwd_microstep: 1408.50 | bwd_inner_microstep: 1408.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665
[2024-06-10 21:12:11,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.61 | bwd_microstep: 1717.83 | bwd_inner_microstep: 1717.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 21:12:14,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1513.01 | bwd_inner_microstep: 1512.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3623
[2024-06-10 21:12:16,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.14 | bwd_microstep: 1705.04 | bwd_inner_microstep: 1705.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3619
[2024-06-10 21:12:18,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.84 | bwd_microstep: 1707.38 | bwd_inner_microstep: 1707.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821
[2024-06-10 21:12:20,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.75 | bwd_microstep: 1580.92 | bwd_inner_microstep: 1580.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829
[2024-06-10 21:12:23,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.53 | bwd_microstep: 1755.71 | bwd_inner_microstep: 1755.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 21:12:25,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.46 | bwd_microstep: 1292.66 | bwd_inner_microstep: 1292.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 21:12:27,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.34 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 21:12:28,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1391.37 | bwd_inner_microstep: 1391.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-10 21:12:30,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.50 | bwd_microstep: 877.57 | bwd_inner_microstep: 877.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3085
[2024-06-10 21:12:32,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.71 | bwd_microstep: 1332.16 | bwd_inner_microstep: 1332.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3604
[2024-06-10 21:12:34,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.99 | bwd_microstep: 1534.93 | bwd_inner_microstep: 1534.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 21:12:36,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1399.97 | bwd_inner_microstep: 1399.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 21:12:37,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1280.52 | bwd_inner_microstep: 1280.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2018
[2024-06-10 21:12:38,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.12 | bwd_microstep: 714.04 | bwd_inner_microstep: 714.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729
[2024-06-10 21:12:40,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.68 | bwd_microstep: 1443.64 | bwd_inner_microstep: 1443.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 21:12:45,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-10 21:12:45,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.39 | bwd_microstep: 3715.08 | bwd_inner_microstep: 1664.10 | bwd_allreduce_microstep: 2050.93 | step_microstep: 38.03
[2024-06-10 21:12:45,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16281.61 | bwd: 45796.09 | bwd_inner: 43744.25 | bwd_allreduce: 2051.16 | step: 39.65
{'loss': 1.2349, 'learning_rate': 9.293607143519685e-06, 'epoch': 0.69}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 21:12:46,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.81 | bwd_microstep: 1275.63 | bwd_inner_microstep: 1275.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3475
[2024-06-10 21:12:48,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.10 | bwd_microstep: 1344.09 | bwd_inner_microstep: 1344.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3943
[2024-06-10 21:12:51,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.74 | bwd_microstep: 1691.88 | bwd_inner_microstep: 1691.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 21:12:53,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.08 | bwd_microstep: 1482.72 | bwd_inner_microstep: 1482.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942
[2024-06-10 21:12:54,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.58 | bwd_microstep: 726.18 | bwd_inner_microstep: 726.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2271
[2024-06-10 21:12:55,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.13 | bwd_microstep: 934.92 | bwd_inner_microstep: 934.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 21:12:57,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1384.98 | bwd_inner_microstep: 1384.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3612
[2024-06-10 21:12:59,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.27 | bwd_microstep: 1214.29 | bwd_inner_microstep: 1214.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2565
[2024-06-10 21:13:00,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.72 | bwd_microstep: 975.75 | bwd_inner_microstep: 975.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436
[2024-06-10 21:13:02,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1310.97 | bwd_inner_microstep: 1310.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2339
[2024-06-10 21:13:03,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.73 | bwd_microstep: 894.74 | bwd_inner_microstep: 894.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 21:13:05,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1485.33 | bwd_inner_microstep: 1485.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 21:13:07,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.77 | bwd_microstep: 1339.80 | bwd_inner_microstep: 1339.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3388
[2024-06-10 21:13:09,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.31 | bwd_microstep: 1339.10 | bwd_inner_microstep: 1339.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3560
[2024-06-10 21:13:11,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.71 | bwd_microstep: 1340.46 | bwd_inner_microstep: 1340.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3684
[2024-06-10 21:13:13,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.97 | bwd_microstep: 1721.05 | bwd_inner_microstep: 1721.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 21:13:15,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1350.10 | bwd_inner_microstep: 1350.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3696
[2024-06-10 21:13:17,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.21 | bwd_microstep: 1492.83 | bwd_inner_microstep: 1492.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 21:13:19,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1377.00 | bwd_inner_microstep: 1376.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 21:13:21,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1541.81 | bwd_inner_microstep: 1541.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1407
[2024-06-10 21:13:22,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 215.15 | bwd_microstep: 560.42 | bwd_inner_microstep: 560.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3425
[2024-06-10 21:13:23,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3688
[2024-06-10 21:13:25,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1329.29 | bwd_inner_microstep: 1329.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 21:13:27,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.75 | bwd_microstep: 1440.62 | bwd_inner_microstep: 1440.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-10 21:13:29,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.88 | bwd_microstep: 1301.78 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3491
[2024-06-10 21:13:31,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.84 | bwd_microstep: 1220.39 | bwd_inner_microstep: 1220.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-10 21:13:32,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.25 | bwd_microstep: 809.68 | bwd_inner_microstep: 809.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 21:13:34,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.44 | bwd_microstep: 1509.92 | bwd_inner_microstep: 1509.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3817
[2024-06-10 21:13:36,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.06 | bwd_microstep: 1615.40 | bwd_inner_microstep: 1615.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3435
[2024-06-10 21:13:38,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.87 | bwd_microstep: 1224.23 | bwd_inner_microstep: 1224.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3573
[2024-06-10 21:13:40,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.57 | bwd_microstep: 1566.31 | bwd_inner_microstep: 1566.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3807
[2024-06-10 21:13:44,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-10 21:13:44,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.70 | bwd_microstep: 3502.19 | bwd_inner_microstep: 1676.00 | bwd_allreduce_microstep: 1826.11 | step_microstep: 39.84
[2024-06-10 21:13:44,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15579.79 | bwd: 43585.05 | bwd_inner: 41758.01 | bwd_allreduce: 1826.35 | step: 41.45
{'loss': 1.1595, 'learning_rate': 9.261922971208217e-06, 'epoch': 0.69}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 21:13:45,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.32 | bwd_microstep: 773.88 | bwd_inner_microstep: 773.74 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 21:13:46,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.75 | bwd_microstep: 791.32 | bwd_inner_microstep: 791.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 21:13:48,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1341.85 | bwd_inner_microstep: 1341.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 21:13:50,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1246.26 | bwd_inner_microstep: 1246.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 21:13:52,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.21 | bwd_microstep: 1653.20 | bwd_inner_microstep: 1653.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-10 21:13:54,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.90 | bwd_microstep: 1560.85 | bwd_inner_microstep: 1560.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 21:13:57,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.62 | bwd_microstep: 1625.99 | bwd_inner_microstep: 1625.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 21:13:58,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1254.76 | bwd_inner_microstep: 1254.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 21:14:00,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495
[2024-06-10 21:14:02,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1443.98 | bwd_inner_microstep: 1443.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-10 21:14:04,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.04 | bwd_microstep: 1606.14 | bwd_inner_microstep: 1606.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954
[2024-06-10 21:14:05,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.07 | bwd_microstep: 889.55 | bwd_inner_microstep: 889.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507
[2024-06-10 21:14:08,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.90 | bwd_microstep: 1586.89 | bwd_inner_microstep: 1586.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431
[2024-06-10 21:14:09,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.60 | bwd_microstep: 1189.78 | bwd_inner_microstep: 1189.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 21:14:12,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.64 | bwd_microstep: 1658.57 | bwd_inner_microstep: 1658.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-10 21:14:13,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.17 | bwd_microstep: 1300.88 | bwd_inner_microstep: 1300.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3850
[2024-06-10 21:14:16,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.20 | bwd_microstep: 1564.05 | bwd_inner_microstep: 1564.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 21:14:18,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.71 | bwd_microstep: 1516.09 | bwd_inner_microstep: 1516.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 21:14:20,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1396.97 | bwd_inner_microstep: 1396.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636
[2024-06-10 21:14:22,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.51 | bwd_microstep: 1615.25 | bwd_inner_microstep: 1615.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-10 21:14:24,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.85 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 21:14:26,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1378.89 | bwd_inner_microstep: 1378.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859
[2024-06-10 21:14:28,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.10 | bwd_microstep: 1568.52 | bwd_inner_microstep: 1568.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 21:14:30,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.69 | bwd_microstep: 1281.55 | bwd_inner_microstep: 1281.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809
[2024-06-10 21:14:32,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.23 | bwd_microstep: 1477.05 | bwd_inner_microstep: 1477.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 21:14:34,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1408.10 | bwd_inner_microstep: 1408.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3602
[2024-06-10 21:14:35,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 1305.59 | bwd_inner_microstep: 1305.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3455
[2024-06-10 21:14:37,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1415.46 | bwd_inner_microstep: 1415.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3778
[2024-06-10 21:14:40,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.24 | bwd_microstep: 1748.43 | bwd_inner_microstep: 1748.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 21:14:42,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.58 | bwd_microstep: 1636.68 | bwd_inner_microstep: 1636.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 21:14:44,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1352.15 | bwd_inner_microstep: 1352.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736
[2024-06-10 21:14:47,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-10 21:14:47,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.75 | bwd_microstep: 2422.84 | bwd_inner_microstep: 1834.97 | bwd_allreduce_microstep: 587.82 | step_microstep: 37.88
[2024-06-10 21:14:47,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16735.46 | bwd: 45691.87 | bwd_inner: 45103.05 | bwd_allreduce: 588.10 | step: 39.42
{'loss': 1.2084, 'learning_rate': 9.230276618348224e-06, 'epoch': 0.69}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3440
[2024-06-10 21:14:49,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.41 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1904
[2024-06-10 21:14:50,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 810.76 | bwd_inner_microstep: 810.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3485
[2024-06-10 21:14:52,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1234.30 | bwd_inner_microstep: 1234.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904
[2024-06-10 21:14:53,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.01 | bwd_microstep: 684.09 | bwd_inner_microstep: 684.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799
[2024-06-10 21:14:55,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.22 | bwd_microstep: 1549.24 | bwd_inner_microstep: 1549.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 21:14:57,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1490.94 | bwd_inner_microstep: 1490.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-10 21:14:59,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.31 | bwd_microstep: 1251.04 | bwd_inner_microstep: 1251.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1979
[2024-06-10 21:15:00,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.18 | bwd_microstep: 705.54 | bwd_inner_microstep: 705.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-10 21:15:02,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1413.23 | bwd_inner_microstep: 1413.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-10 21:15:03,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.21 | bwd_microstep: 1159.34 | bwd_inner_microstep: 1159.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-10 21:15:05,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.34 | bwd_microstep: 1533.70 | bwd_inner_microstep: 1533.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 21:15:07,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1385.73 | bwd_inner_microstep: 1385.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 21:15:09,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1256.75 | bwd_inner_microstep: 1256.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-10 21:15:11,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3380
[2024-06-10 21:15:13,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1337.62 | bwd_inner_microstep: 1337.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385
[2024-06-10 21:15:14,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.15 | bwd_microstep: 1242.52 | bwd_inner_microstep: 1242.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 21:15:16,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1488.88 | bwd_inner_microstep: 1488.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2625
[2024-06-10 21:15:18,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.34 | bwd_microstep: 984.99 | bwd_inner_microstep: 984.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 21:15:20,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1491.67 | bwd_inner_microstep: 1491.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 21:15:22,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1394.70 | bwd_inner_microstep: 1394.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1936
[2024-06-10 21:15:23,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.66 | bwd_microstep: 758.50 | bwd_inner_microstep: 758.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397
[2024-06-10 21:15:25,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1439.96 | bwd_inner_microstep: 1439.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446
[2024-06-10 21:15:26,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.39 | bwd_microstep: 1216.98 | bwd_inner_microstep: 1216.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3572
[2024-06-10 21:15:28,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.11 | bwd_microstep: 1425.71 | bwd_inner_microstep: 1425.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452
[2024-06-10 21:15:30,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.25 | bwd_microstep: 1191.04 | bwd_inner_microstep: 1191.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108
[2024-06-10 21:15:31,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.65 | bwd_microstep: 822.98 | bwd_inner_microstep: 822.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3439
[2024-06-10 21:15:33,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.78 | bwd_microstep: 1212.04 | bwd_inner_microstep: 1212.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 21:15:35,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1404.38 | bwd_inner_microstep: 1404.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 21:15:37,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1458.77 | bwd_inner_microstep: 1458.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-10 21:15:39,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1507.11 | bwd_inner_microstep: 1507.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 21:15:40,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.49 | bwd_microstep: 975.89 | bwd_inner_microstep: 975.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 21:15:51,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-10 21:15:51,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.26 | bwd_microstep: 9910.92 | bwd_inner_microstep: 1547.59 | bwd_allreduce_microstep: 8363.25 | step_microstep: 39.44
[2024-06-10 21:15:51,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15023.82 | bwd: 48501.30 | bwd_inner: 40137.12 | bwd_allreduce: 8363.49 | step: 40.95
{'loss': 1.2062, 'learning_rate': 9.198668196397995e-06, 'epoch': 0.69}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1876
[2024-06-10 21:15:52,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.54 | bwd_microstep: 763.72 | bwd_inner_microstep: 763.55 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 21:15:54,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1367.32 | bwd_inner_microstep: 1367.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-10 21:15:56,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.38 | bwd_microstep: 1547.17 | bwd_inner_microstep: 1547.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797
[2024-06-10 21:15:58,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.13 | bwd_microstep: 1442.05 | bwd_inner_microstep: 1442.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 21:16:00,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1387.32 | bwd_inner_microstep: 1387.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1923
[2024-06-10 21:16:01,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.20 | bwd_microstep: 725.82 | bwd_inner_microstep: 725.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 21:16:03,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1341.16 | bwd_inner_microstep: 1341.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 21:16:04,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.37 | bwd_microstep: 1284.16 | bwd_inner_microstep: 1284.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421
[2024-06-10 21:16:06,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.65 | bwd_microstep: 1150.88 | bwd_inner_microstep: 1150.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 21:16:08,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1504.65 | bwd_inner_microstep: 1504.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3496
[2024-06-10 21:16:10,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.82 | bwd_microstep: 1349.87 | bwd_inner_microstep: 1349.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 21:16:12,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.94 | bwd_microstep: 1338.15 | bwd_inner_microstep: 1338.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3387
[2024-06-10 21:16:14,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1430.25 | bwd_inner_microstep: 1430.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3789
[2024-06-10 21:16:16,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.90 | bwd_microstep: 1577.31 | bwd_inner_microstep: 1577.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2072
[2024-06-10 21:16:17,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.81 | bwd_microstep: 817.08 | bwd_inner_microstep: 817.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654
[2024-06-10 21:16:19,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.47 | bwd_microstep: 1618.95 | bwd_inner_microstep: 1618.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 21:16:22,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.58 | bwd_microstep: 1653.88 | bwd_inner_microstep: 1653.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3522
[2024-06-10 21:16:23,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1355.30 | bwd_inner_microstep: 1355.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 21:16:25,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1280.10 | bwd_inner_microstep: 1280.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050
[2024-06-10 21:16:27,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.79 | bwd_microstep: 910.38 | bwd_inner_microstep: 910.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2278
[2024-06-10 21:16:28,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.10 | bwd_microstep: 811.92 | bwd_inner_microstep: 811.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 21:16:30,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1499.38 | bwd_inner_microstep: 1499.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2638
[2024-06-10 21:16:31,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.09 | bwd_microstep: 1148.77 | bwd_inner_microstep: 1148.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-10 21:16:33,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.53 | bwd_microstep: 1434.92 | bwd_inner_microstep: 1434.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3462
[2024-06-10 21:16:35,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1326.46 | bwd_inner_microstep: 1326.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2278
[2024-06-10 21:16:37,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.18 | bwd_microstep: 1070.80 | bwd_inner_microstep: 1070.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 21:16:39,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1410.58 | bwd_inner_microstep: 1410.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 21:16:41,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.23 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3550
[2024-06-10 21:16:42,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1360.51 | bwd_inner_microstep: 1360.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2958
[2024-06-10 21:16:44,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.31 | bwd_microstep: 1016.74 | bwd_inner_microstep: 1016.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906
[2024-06-10 21:16:45,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.11 | bwd_microstep: 778.20 | bwd_inner_microstep: 778.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3842
[2024-06-10 21:16:52,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.11 | optimizer_step: 6.58
[2024-06-10 21:16:52,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.39 | bwd_microstep: 5918.62 | bwd_inner_microstep: 2329.57 | bwd_allreduce_microstep: 3589.01 | step_microstep: 37.79
[2024-06-10 21:16:52,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15341.30 | bwd: 45102.02 | bwd_inner: 41511.99 | bwd_allreduce: 3589.31 | step: 39.34


 69%|██████▉   | 1189/1726 [20:34:19<9:08:35, 61.30s/it]
 69%|██████▉   | 1190/1726 [20:35:21<9:10:35, 61.63s/it]
 69%|██████▉   | 1191/1726 [20:36:21<9:03:51, 60.99s/it]
 69%|██████▉   | 1192/1726 [20:37:24<9:07:35, 61.53s/it]
 69%|██████▉   | 1193/1726 [20:38:28<9:12:46, 62.23s/it]
 69%|██████▉   | 1194/1726 [20:39:28<9:07:54, 61.79s/it]
{'loss': 1.1968, 'learning_rate': 9.167097816682218e-06, 'epoch': 0.69}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 21:16:53,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.84 | bwd_microstep: 1331.64 | bwd_inner_microstep: 1331.45 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2452
[2024-06-10 21:16:55,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.80 | bwd_microstep: 948.03 | bwd_inner_microstep: 948.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 21:16:57,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.43 | bwd_microstep: 1376.34 | bwd_inner_microstep: 1376.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 21:16:59,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1375.38 | bwd_inner_microstep: 1375.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3498
[2024-06-10 21:17:00,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.35 | bwd_microstep: 1218.29 | bwd_inner_microstep: 1218.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2259
[2024-06-10 21:17:02,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.38 | bwd_microstep: 929.94 | bwd_inner_microstep: 929.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 21:17:03,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1249.39 | bwd_inner_microstep: 1249.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 21:17:05,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.65 | bwd_microstep: 1349.32 | bwd_inner_microstep: 1349.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 21:17:07,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.01 | bwd_microstep: 1533.33 | bwd_inner_microstep: 1533.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 21:17:09,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1393.21 | bwd_inner_microstep: 1393.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2197
[2024-06-10 21:17:10,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.69 | bwd_microstep: 860.97 | bwd_inner_microstep: 860.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3383
[2024-06-10 21:17:12,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.83 | bwd_microstep: 1241.19 | bwd_inner_microstep: 1241.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523
[2024-06-10 21:17:14,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1414.66 | bwd_inner_microstep: 1414.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2886
[2024-06-10 21:17:16,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.13 | bwd_microstep: 1087.21 | bwd_inner_microstep: 1087.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 21:17:18,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.71 | bwd_microstep: 1605.09 | bwd_inner_microstep: 1605.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2381
[2024-06-10 21:17:19,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.31 | bwd_microstep: 1030.96 | bwd_inner_microstep: 1030.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-10 21:17:21,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.35 | bwd_microstep: 1407.06 | bwd_inner_microstep: 1407.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-10 21:17:23,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.15 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 21:17:25,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1282.66 | bwd_inner_microstep: 1282.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 21:17:27,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.28 | bwd_microstep: 1389.96 | bwd_inner_microstep: 1389.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 21:17:29,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1555.93 | bwd_inner_microstep: 1555.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 21:17:30,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.51 | bwd_microstep: 1288.75 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 21:17:33,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.67 | bwd_microstep: 1557.08 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3739
[2024-06-10 21:17:34,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.03 | bwd_microstep: 1339.29 | bwd_inner_microstep: 1339.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 21:17:36,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1298.63 | bwd_inner_microstep: 1298.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 21:17:38,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.10 | bwd_microstep: 1159.25 | bwd_inner_microstep: 1159.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3728
[2024-06-10 21:17:40,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.98 | bwd_microstep: 1336.70 | bwd_inner_microstep: 1336.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542
[2024-06-10 21:17:42,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1327.80 | bwd_inner_microstep: 1327.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-10 21:17:43,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.45 | bwd_microstep: 789.99 | bwd_inner_microstep: 789.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-10 21:17:44,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.35 | bwd_microstep: 791.01 | bwd_inner_microstep: 790.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928
[2024-06-10 21:17:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.47 | bwd_microstep: 730.90 | bwd_inner_microstep: 730.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 21:17:51,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-10 21:17:51,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 5279.71 | bwd_inner_microstep: 1525.33 | bwd_allreduce_microstep: 3754.33 | step_microstep: 38.20
[2024-06-10 21:17:51,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14986.39 | bwd: 43665.79 | bwd_inner: 39910.43 | bwd_allreduce: 3754.63 | step: 39.80
{'loss': 1.1994, 'learning_rate': 9.135565590391633e-06, 'epoch': 0.69}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 21:17:53,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.84 | bwd_microstep: 1472.57 | bwd_inner_microstep: 1472.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-10 21:17:55,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.22 | bwd_microstep: 1678.23 | bwd_inner_microstep: 1678.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 21:17:57,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1372.25 | bwd_inner_microstep: 1372.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775
[2024-06-10 21:17:59,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.41 | bwd_microstep: 1438.19 | bwd_inner_microstep: 1438.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2636
[2024-06-10 21:18:00,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.01 | bwd_microstep: 1016.66 | bwd_inner_microstep: 1016.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 21:18:02,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1383.41 | bwd_inner_microstep: 1383.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 21:18:04,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1411.50 | bwd_inner_microstep: 1411.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3537
[2024-06-10 21:18:06,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.75 | bwd_microstep: 1257.75 | bwd_inner_microstep: 1257.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 21:18:07,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 796.55 | bwd_inner_microstep: 796.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 21:18:09,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 21:18:11,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.98 | bwd_microstep: 1514.91 | bwd_inner_microstep: 1514.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 21:18:13,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.16 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3707
[2024-06-10 21:18:15,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.96 | bwd_microstep: 1489.62 | bwd_inner_microstep: 1489.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3654
[2024-06-10 21:18:17,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.17 | bwd_microstep: 1547.78 | bwd_inner_microstep: 1547.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 21:18:19,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1347.94 | bwd_inner_microstep: 1347.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501
[2024-06-10 21:18:21,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.91 | bwd_microstep: 1222.76 | bwd_inner_microstep: 1222.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 21:18:22,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.57 | bwd_microstep: 1295.25 | bwd_inner_microstep: 1295.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3518
[2024-06-10 21:18:24,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.40 | bwd_microstep: 1224.79 | bwd_inner_microstep: 1224.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 21:18:25,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.28 | bwd_microstep: 697.10 | bwd_inner_microstep: 697.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 21:18:27,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.07 | bwd_microstep: 1451.19 | bwd_inner_microstep: 1451.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 21:18:29,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1554.79 | bwd_inner_microstep: 1554.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 21:18:31,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.59 | bwd_microstep: 1289.28 | bwd_inner_microstep: 1289.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 21:18:33,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.60 | bwd_microstep: 1459.84 | bwd_inner_microstep: 1459.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 5558
[2024-06-10 21:18:36,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 742.22 | bwd_microstep: 1971.62 | bwd_inner_microstep: 1971.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3728
[2024-06-10 21:18:38,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.59 | bwd_microstep: 1563.63 | bwd_inner_microstep: 1563.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 21:18:40,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.58 | bwd_microstep: 1550.68 | bwd_inner_microstep: 1550.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 21:18:42,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.07 | bwd_microstep: 1545.21 | bwd_inner_microstep: 1545.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3562
[2024-06-10 21:18:44,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.88 | bwd_microstep: 1545.92 | bwd_inner_microstep: 1545.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3415
[2024-06-10 21:18:46,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1409.44 | bwd_inner_microstep: 1409.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3589
[2024-06-10 21:18:48,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1534.55 | bwd_inner_microstep: 1534.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 21:18:51,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.23 | bwd_microstep: 1644.80 | bwd_inner_microstep: 1644.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2533
[2024-06-10 21:18:52,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.04 | optimizer_step: 6.61
[2024-06-10 21:18:52,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.33 | bwd_microstep: 1227.63 | bwd_inner_microstep: 1219.95 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.40
[2024-06-10 21:18:52,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16695.25 | bwd: 44681.51 | bwd_inner: 44672.99 | bwd_allreduce: 7.86 | step: 38.98
{'loss': 1.1384, 'learning_rate': 9.104071628582542e-06, 'epoch': 0.69}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1931
[2024-06-10 21:18:54,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.61 | bwd_microstep: 880.29 | bwd_inner_microstep: 880.23 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 21:18:55,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1279.77 | bwd_inner_microstep: 1279.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819
[2024-06-10 21:18:57,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.21 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2257
[2024-06-10 21:18:58,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.44 | bwd_microstep: 902.69 | bwd_inner_microstep: 902.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 21:19:00,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1247.84 | bwd_inner_microstep: 1247.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 21:19:02,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.19 | bwd_microstep: 1542.44 | bwd_inner_microstep: 1542.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4131
[2024-06-10 21:19:05,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.10 | bwd_microstep: 1637.95 | bwd_inner_microstep: 1637.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 21:19:07,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.37 | bwd_microstep: 1409.07 | bwd_inner_microstep: 1409.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4029
[2024-06-10 21:19:09,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.64 | bwd_microstep: 1615.59 | bwd_inner_microstep: 1615.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 21:19:11,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.83 | bwd_microstep: 1432.95 | bwd_inner_microstep: 1432.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 21:19:13,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.23 | bwd_microstep: 1525.60 | bwd_inner_microstep: 1525.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436
[2024-06-10 21:19:15,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.03 | bwd_microstep: 1313.11 | bwd_inner_microstep: 1313.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-10 21:19:16,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.72 | bwd_microstep: 803.22 | bwd_inner_microstep: 803.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3489
[2024-06-10 21:19:18,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.75 | bwd_microstep: 1446.56 | bwd_inner_microstep: 1446.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2095
[2024-06-10 21:19:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.51 | bwd_microstep: 1015.26 | bwd_inner_microstep: 1015.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 21:19:21,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1508.67 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 21:19:23,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1389.39 | bwd_inner_microstep: 1389.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3435
[2024-06-10 21:19:25,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.14 | bwd_microstep: 1188.02 | bwd_inner_microstep: 1187.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 21:19:27,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.86 | bwd_microstep: 1559.28 | bwd_inner_microstep: 1559.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2919
[2024-06-10 21:19:28,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.88 | bwd_microstep: 1096.05 | bwd_inner_microstep: 1096.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2284
[2024-06-10 21:19:30,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.24 | bwd_microstep: 1005.26 | bwd_inner_microstep: 1005.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3834
[2024-06-10 21:19:32,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1269.63 | bwd_inner_microstep: 1269.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822
[2024-06-10 21:19:34,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.74 | bwd_microstep: 1519.65 | bwd_inner_microstep: 1519.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 21:19:35,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2280
[2024-06-10 21:19:37,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.74 | bwd_microstep: 1072.57 | bwd_inner_microstep: 1072.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 21:19:39,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.67 | bwd_microstep: 1452.81 | bwd_inner_microstep: 1452.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776
[2024-06-10 21:19:41,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.08 | bwd_microstep: 1449.57 | bwd_inner_microstep: 1449.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-10 21:19:43,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.17 | bwd_microstep: 1303.69 | bwd_inner_microstep: 1303.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778
[2024-06-10 21:19:45,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.77 | bwd_microstep: 1448.57 | bwd_inner_microstep: 1448.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559
[2024-06-10 21:19:47,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.31 | bwd_microstep: 1332.52 | bwd_inner_microstep: 1332.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 21:19:49,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.92 | bwd_microstep: 1452.81 | bwd_inner_microstep: 1452.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045
[2024-06-10 21:19:53,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 21:19:53,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.30 | bwd_microstep: 3557.24 | bwd_inner_microstep: 1039.27 | bwd_allreduce_microstep: 2517.93 | step_microstep: 38.09
[2024-06-10 21:19:53,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15635.66 | bwd: 44325.18 | bwd_inner: 41806.31 | bwd_allreduce: 2518.18 | step: 39.58
{'loss': 1.1997, 'learning_rate': 9.072616042176543e-06, 'epoch': 0.69}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 21:19:55,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.22 | bwd_microstep: 1433.74 | bwd_inner_microstep: 1433.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 21:19:56,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.15 | bwd_microstep: 1377.60 | bwd_inner_microstep: 1377.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-10 21:19:58,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 789.31 | bwd_inner_microstep: 789.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-10 21:20:00,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.20 | bwd_microstep: 1446.61 | bwd_inner_microstep: 1446.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 21:20:02,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1550.60 | bwd_inner_microstep: 1550.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-10 21:20:04,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1410.22 | bwd_inner_microstep: 1410.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416
[2024-06-10 21:20:05,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.53 | bwd_microstep: 1151.89 | bwd_inner_microstep: 1151.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 21:20:07,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.36 | bwd_microstep: 1358.76 | bwd_inner_microstep: 1358.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 21:20:09,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1280.22 | bwd_inner_microstep: 1280.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 21:20:11,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1293.64 | bwd_inner_microstep: 1293.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 21:20:13,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1348.70 | bwd_inner_microstep: 1348.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-10 21:20:14,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.31 | bwd_microstep: 699.17 | bwd_inner_microstep: 699.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1962
[2024-06-10 21:20:15,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.59 | bwd_microstep: 733.88 | bwd_inner_microstep: 733.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 21:20:16,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.58 | bwd_microstep: 1286.11 | bwd_inner_microstep: 1286.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-10 21:20:19,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.29 | bwd_microstep: 1590.34 | bwd_inner_microstep: 1590.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3654
[2024-06-10 21:20:21,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.95 | bwd_microstep: 1445.60 | bwd_inner_microstep: 1445.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 21:20:23,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1478.49 | bwd_inner_microstep: 1478.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 21:20:24,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1392.78 | bwd_inner_microstep: 1392.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3632
[2024-06-10 21:20:27,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.24 | bwd_microstep: 1657.46 | bwd_inner_microstep: 1657.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 21:20:29,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.02 | bwd_microstep: 1495.36 | bwd_inner_microstep: 1495.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-10 21:20:31,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.08 | bwd_microstep: 1425.18 | bwd_inner_microstep: 1425.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2969
[2024-06-10 21:20:32,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.62 | bwd_microstep: 1204.53 | bwd_inner_microstep: 1204.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 21:20:35,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1494.27 | bwd_inner_microstep: 1494.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3555
[2024-06-10 21:20:36,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.65 | bwd_microstep: 1262.56 | bwd_inner_microstep: 1262.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 21:20:38,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.24 | bwd_microstep: 1502.41 | bwd_inner_microstep: 1502.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 21:20:40,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.35 | bwd_microstep: 1393.20 | bwd_inner_microstep: 1393.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 21:20:42,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1281.31 | bwd_inner_microstep: 1281.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555
[2024-06-10 21:20:44,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.79 | bwd_microstep: 1298.48 | bwd_inner_microstep: 1298.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572
[2024-06-10 21:20:46,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.35 | bwd_microstep: 1600.99 | bwd_inner_microstep: 1600.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2261
[2024-06-10 21:20:47,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.02 | bwd_microstep: 972.57 | bwd_inner_microstep: 972.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 21:20:49,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 1377.62 | bwd_inner_microstep: 1377.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 21:20:53,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-10 21:20:53,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 2945.06 | bwd_inner_microstep: 1761.59 | bwd_allreduce_microstep: 1183.42 | step_microstep: 37.84
[2024-06-10 21:20:53,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16003.83 | bwd: 43978.68 | bwd_inner: 42794.35 | bwd_allreduce: 1183.65 | step: 39.51
{'loss': 1.1777, 'learning_rate': 9.04119894196003e-06, 'epoch': 0.69}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 21:20:55,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.18 | bwd_microstep: 1474.54 | bwd_inner_microstep: 1474.45 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4010
[2024-06-10 21:20:57,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.18 | bwd_microstep: 1705.84 | bwd_inner_microstep: 1705.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851
[2024-06-10 21:20:59,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.78 | bwd_microstep: 1558.23 | bwd_inner_microstep: 1558.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 21:21:02,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.55 | bwd_microstep: 1653.49 | bwd_inner_microstep: 1653.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3774
[2024-06-10 21:21:03,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.22 | bwd_microstep: 1244.95 | bwd_inner_microstep: 1244.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 21:21:05,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1384.97 | bwd_inner_microstep: 1384.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 21:21:07,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1248.56 | bwd_inner_microstep: 1248.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 21:21:09,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1350.35 | bwd_inner_microstep: 1350.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1968
[2024-06-10 21:21:10,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.64 | bwd_microstep: 732.46 | bwd_inner_microstep: 732.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-10 21:21:12,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1528.41 | bwd_inner_microstep: 1528.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3500
[2024-06-10 21:21:14,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.59 | bwd_microstep: 1492.08 | bwd_inner_microstep: 1492.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3704
[2024-06-10 21:21:16,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.33 | bwd_microstep: 1677.78 | bwd_inner_microstep: 1677.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492
[2024-06-10 21:21:19,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.16 | bwd_microstep: 1582.46 | bwd_inner_microstep: 1582.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-10 21:21:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.04 | bwd_microstep: 1718.77 | bwd_inner_microstep: 1718.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 21:21:23,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1397.78 | bwd_inner_microstep: 1397.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 21:21:25,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1508.81 | bwd_inner_microstep: 1508.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1984
[2024-06-10 21:21:26,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.31 | bwd_microstep: 705.35 | bwd_inner_microstep: 705.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480
[2024-06-10 21:21:28,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.25 | bwd_microstep: 1189.42 | bwd_inner_microstep: 1189.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 21:21:29,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.30 | bwd_microstep: 1284.68 | bwd_inner_microstep: 1284.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3666
[2024-06-10 21:21:31,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.18 | bwd_microstep: 1455.60 | bwd_inner_microstep: 1455.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3627
[2024-06-10 21:21:34,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.83 | bwd_microstep: 1540.84 | bwd_inner_microstep: 1540.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489
[2024-06-10 21:21:35,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.37 | bwd_microstep: 1316.31 | bwd_inner_microstep: 1316.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 21:21:38,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1591.73 | bwd_inner_microstep: 1591.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-10 21:21:39,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1403.80 | bwd_inner_microstep: 1403.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 21:21:42,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1506.73 | bwd_inner_microstep: 1506.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688
[2024-06-10 21:21:44,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.39 | bwd_microstep: 1430.75 | bwd_inner_microstep: 1430.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 21:21:45,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.96 | bwd_microstep: 1401.72 | bwd_inner_microstep: 1401.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469
[2024-06-10 21:21:47,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1435.51 | bwd_inner_microstep: 1435.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 21:21:49,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.91 | bwd_microstep: 1349.24 | bwd_inner_microstep: 1349.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-10 21:21:51,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.16 | bwd_microstep: 1442.95 | bwd_inner_microstep: 1442.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3600
[2024-06-10 21:21:54,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.27 | bwd_microstep: 1704.22 | bwd_inner_microstep: 1704.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3564
[2024-06-10 21:21:56,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.99 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 21:21:56,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.41 | bwd_microstep: 2069.60 | bwd_inner_microstep: 1908.65 | bwd_allreduce_microstep: 160.90 | step_microstep: 37.80
[2024-06-10 21:21:56,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17073.27 | bwd: 46087.94 | bwd_inner: 45926.07 | bwd_allreduce: 161.17 | step: 39.36
{'loss': 1.1104, 'learning_rate': 9.009820438583881e-06, 'epoch': 0.69}


 69%|██████▉   | 1194/1726 [20:39:28<9:07:54, 61.79s/it]
 69%|██████▉   | 1195/1726 [20:40:27<8:59:25, 60.95s/it]
 69%|██████▉   | 1196/1726 [20:41:29<9:00:24, 61.18s/it]
 69%|██████▉   | 1197/1726 [20:42:29<8:57:02, 60.91s/it]
 69%|██████▉   | 1198/1726 [20:43:30<8:54:28, 60.74s/it]
 69%|██████▉   | 1199/1726 [20:44:33<9:00:47, 61.57s/it]
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925
[2024-06-10 21:21:57,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.72 | bwd_microstep: 721.70 | bwd_inner_microstep: 721.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 21:21:59,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1244.98 | bwd_inner_microstep: 1244.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 21:22:01,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1298.88 | bwd_inner_microstep: 1298.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 21:22:03,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.29 | bwd_microstep: 1548.91 | bwd_inner_microstep: 1548.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 21:22:05,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1377.83 | bwd_inner_microstep: 1377.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759
[2024-06-10 21:22:07,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.43 | bwd_microstep: 1436.33 | bwd_inner_microstep: 1436.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4114
[2024-06-10 21:22:09,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.93 | bwd_microstep: 1638.24 | bwd_inner_microstep: 1638.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2012
[2024-06-10 21:22:10,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.92 | bwd_microstep: 710.47 | bwd_inner_microstep: 710.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 21:22:12,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.99 | bwd_microstep: 1376.28 | bwd_inner_microstep: 1376.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 21:22:14,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.61 | bwd_microstep: 1410.79 | bwd_inner_microstep: 1410.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 21:22:16,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.79 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3445
[2024-06-10 21:22:18,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.17 | bwd_microstep: 1302.62 | bwd_inner_microstep: 1302.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-10 21:22:19,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.64 | bwd_microstep: 1340.43 | bwd_inner_microstep: 1340.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1925
[2024-06-10 21:22:21,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.45 | bwd_microstep: 759.17 | bwd_inner_microstep: 759.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3503
[2024-06-10 21:22:23,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1445.35 | bwd_inner_microstep: 1445.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657
[2024-06-10 21:22:25,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.00 | bwd_microstep: 1718.21 | bwd_inner_microstep: 1718.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 21:22:27,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1379.60 | bwd_inner_microstep: 1379.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 21:22:29,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1356.98 | bwd_inner_microstep: 1356.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3657
[2024-06-10 21:22:31,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.18 | bwd_microstep: 1523.53 | bwd_inner_microstep: 1523.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3552
[2024-06-10 21:22:33,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1427.79 | bwd_inner_microstep: 1427.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 21:22:35,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.85 | bwd_microstep: 1358.52 | bwd_inner_microstep: 1358.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756
[2024-06-10 21:22:37,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.12 | bwd_microstep: 1541.58 | bwd_inner_microstep: 1541.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055
[2024-06-10 21:22:38,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.49 | bwd_microstep: 914.87 | bwd_inner_microstep: 914.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 21:22:40,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1301.47 | bwd_inner_microstep: 1301.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3673
[2024-06-10 21:22:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.10 | bwd_microstep: 1276.77 | bwd_inner_microstep: 1276.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 21:22:43,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.86 | bwd_microstep: 1377.29 | bwd_inner_microstep: 1377.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 21:22:45,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1411.00 | bwd_inner_microstep: 1410.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559
[2024-06-10 21:22:47,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1330.24 | bwd_inner_microstep: 1330.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-10 21:22:49,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | bwd_microstep: 1386.32 | bwd_inner_microstep: 1386.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-10 21:22:51,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.59 | bwd_microstep: 1602.14 | bwd_inner_microstep: 1602.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3450
[2024-06-10 21:22:53,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.46 | bwd_microstep: 1512.96 | bwd_inner_microstep: 1512.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 21:22:58,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.49 | optimizer_step: 6.61
[2024-06-10 21:22:58,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.30 | bwd_microstep: 3551.50 | bwd_inner_microstep: 1690.58 | bwd_allreduce_microstep: 1860.84 | step_microstep: 43.09
[2024-06-10 21:22:58,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16065.59 | bwd: 44830.19 | bwd_inner: 42968.38 | bwd_allreduce: 1861.10 | step: 44.57
{'loss': 1.1653, 'learning_rate': 8.978480642563015e-06, 'epoch': 0.7}
 70%|██████▉   | 1200/1726 [20:45:34<8:58:52, 61.47s/it]
[INFO|trainer.py:2936] 2024-06-10 21:23:01,003 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200
[INFO|configuration_utils.py:473] 2024-06-10 21:23:01,007 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/config.json
[INFO|configuration_utils.py:594] 2024-06-10 21:23:01,009 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-10 21:23:09,316 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-10 21:23:09,339 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-10 21:23:09,342 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-10 21:23:09,343 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/added_tokens.json
[2024-06-10 21:23:09,644] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1200 is about to be saved!
[2024-06-10 21:23:09,656] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/mp_rank_00_model_states.pt
[2024-06-10 21:23:09,657] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/mp_rank_00_model_states.pt...
[2024-06-10 21:23:18,017] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/mp_rank_00_model_states.pt.
[2024-06-10 21:23:18,033] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-06-10 21:23:30,076] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-06-10 21:23:30,104] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-06-10 21:23:30,105] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1200 is ready now!
[INFO|trainer.py:3028] 2024-06-10 21:23:30,325 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-600] due to args.save_total_limit
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-10 21:23:33,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.73 | bwd_microstep: 1323.38 | bwd_inner_microstep: 1323.28 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3409
[2024-06-10 21:23:35,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.51 | bwd_microstep: 1366.03 | bwd_inner_microstep: 1366.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 21:23:36,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.06 | bwd_microstep: 1385.97 | bwd_inner_microstep: 1385.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 21:23:38,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1479.49 | bwd_inner_microstep: 1479.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-10 21:23:41,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.10 | bwd_microstep: 1534.80 | bwd_inner_microstep: 1534.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3396
[2024-06-10 21:23:42,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.70 | bwd_microstep: 1206.87 | bwd_inner_microstep: 1206.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 21:23:44,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1254.96 | bwd_inner_microstep: 1254.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 21:23:46,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1373.70 | bwd_inner_microstep: 1373.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 21:23:48,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.37 | bwd_microstep: 1383.05 | bwd_inner_microstep: 1383.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167
[2024-06-10 21:23:49,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.76 | bwd_microstep: 852.69 | bwd_inner_microstep: 852.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1985
[2024-06-10 21:23:50,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.39 | bwd_microstep: 826.42 | bwd_inner_microstep: 826.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 21:23:51,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.34 | bwd_microstep: 802.93 | bwd_inner_microstep: 802.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3421
[2024-06-10 21:23:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.89 | bwd_microstep: 1370.87 | bwd_inner_microstep: 1370.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 21:23:54,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 795.57 | bwd_inner_microstep: 795.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 21:23:56,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1392.43 | bwd_inner_microstep: 1392.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 21:23:58,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.93 | bwd_microstep: 1184.05 | bwd_inner_microstep: 1184.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3648
[2024-06-10 21:24:00,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.27 | bwd_microstep: 1315.55 | bwd_inner_microstep: 1315.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2090
[2024-06-10 21:24:01,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.27 | bwd_microstep: 918.35 | bwd_inner_microstep: 918.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 21:24:03,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1256.15 | bwd_inner_microstep: 1256.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 21:24:05,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1389.20 | bwd_inner_microstep: 1389.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 21:24:07,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.65 | bwd_microstep: 1551.38 | bwd_inner_microstep: 1551.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 21:24:09,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1455.10 | bwd_inner_microstep: 1455.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 21:24:10,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1285.06 | bwd_inner_microstep: 1285.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 21:24:12,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1258.93 | bwd_inner_microstep: 1258.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-10 21:24:14,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1416.33 | bwd_inner_microstep: 1416.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3525
[2024-06-10 21:24:16,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.56 | bwd_microstep: 1352.90 | bwd_inner_microstep: 1352.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3721
[2024-06-10 21:24:19,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.84 | bwd_microstep: 1781.40 | bwd_inner_microstep: 1781.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836
[2024-06-10 21:24:21,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.30 | bwd_microstep: 1761.35 | bwd_inner_microstep: 1761.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-10 21:24:23,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.83 | bwd_microstep: 1244.13 | bwd_inner_microstep: 1244.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3765
[2024-06-10 21:24:25,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.62 | bwd_microstep: 1402.91 | bwd_inner_microstep: 1402.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 21:24:26,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1373.37 | bwd_inner_microstep: 1373.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 21:24:32,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-10 21:24:32,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.95 | bwd_microstep: 5249.36 | bwd_inner_microstep: 1648.69 | bwd_allreduce_microstep: 3600.61 | step_microstep: 39.11
[2024-06-10 21:24:32,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15642.39 | bwd: 45544.71 | bwd_inner: 41943.10 | bwd_allreduce: 3600.90 | step: 40.85
{'loss': 1.2298, 'learning_rate': 8.947179664276028e-06, 'epoch': 0.7}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 21:24:34,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1469.36 | bwd_inner_microstep: 1469.23 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4003
[2024-06-10 21:24:36,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1505.92 | bwd_inner_microstep: 1505.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 21:24:38,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 1482.78 | bwd_inner_microstep: 1482.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2302
[2024-06-10 21:24:40,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.55 | bwd_microstep: 814.58 | bwd_inner_microstep: 814.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 21:24:41,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 793.84 | bwd_inner_microstep: 793.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-10 21:24:43,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.89 | bwd_microstep: 1444.13 | bwd_inner_microstep: 1444.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 21:24:44,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 789.75 | bwd_inner_microstep: 789.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2457
[2024-06-10 21:24:45,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.75 | bwd_microstep: 920.81 | bwd_inner_microstep: 920.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 21:24:47,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.97 | bwd_microstep: 1302.74 | bwd_inner_microstep: 1302.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499
[2024-06-10 21:24:49,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.22 | bwd_microstep: 1432.92 | bwd_inner_microstep: 1432.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3499
[2024-06-10 21:24:51,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.37 | bwd_microstep: 1533.50 | bwd_inner_microstep: 1533.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3668
[2024-06-10 21:24:53,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.93 | bwd_microstep: 1356.37 | bwd_inner_microstep: 1356.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-10 21:24:55,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1416.61 | bwd_inner_microstep: 1416.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957
[2024-06-10 21:24:56,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.13 | bwd_microstep: 891.04 | bwd_inner_microstep: 891.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 21:24:58,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1245.04 | bwd_inner_microstep: 1245.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4022
[2024-06-10 21:25:00,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.15 | bwd_microstep: 1620.31 | bwd_inner_microstep: 1620.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2509
[2024-06-10 21:25:01,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.66 | bwd_microstep: 962.19 | bwd_inner_microstep: 962.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 21:25:04,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1616.39 | bwd_inner_microstep: 1616.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 21:25:05,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1278.08 | bwd_inner_microstep: 1278.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3894
[2024-06-10 21:25:07,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.79 | bwd_microstep: 1516.60 | bwd_inner_microstep: 1516.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 21:25:09,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.03 | bwd_microstep: 801.96 | bwd_inner_microstep: 801.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2111
[2024-06-10 21:25:10,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.03 | bwd_microstep: 823.84 | bwd_inner_microstep: 823.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3544
[2024-06-10 21:25:12,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.00 | bwd_microstep: 1329.07 | bwd_inner_microstep: 1329.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 21:25:13,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1300.54 | bwd_inner_microstep: 1300.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-10 21:25:15,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.62 | bwd_microstep: 1453.81 | bwd_inner_microstep: 1453.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639
[2024-06-10 21:25:18,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.15 | bwd_microstep: 1581.08 | bwd_inner_microstep: 1581.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1944
[2024-06-10 21:25:19,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.77 | bwd_microstep: 729.49 | bwd_inner_microstep: 729.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3779
[2024-06-10 21:25:21,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.95 | bwd_microstep: 1747.20 | bwd_inner_microstep: 1747.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 21:25:23,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1507.77 | bwd_inner_microstep: 1507.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052
[2024-06-10 21:25:24,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.68 | bwd_microstep: 813.84 | bwd_inner_microstep: 813.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2243
[2024-06-10 21:25:25,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.46 | bwd_microstep: 966.95 | bwd_inner_microstep: 966.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3438
[2024-06-10 21:25:35,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 21:25:35,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.16 | bwd_microstep: 8999.92 | bwd_inner_microstep: 1720.63 | bwd_allreduce_microstep: 7279.24 | step_microstep: 38.09
[2024-06-10 21:25:35,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14967.12 | bwd: 47448.45 | bwd_inner: 40168.19 | bwd_allreduce: 7279.52 | step: 39.61
{'loss': 1.1638, 'learning_rate': 8.9159176139648e-06, 'epoch': 0.7}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 21:25:37,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.88 | bwd_microstep: 1371.07 | bwd_inner_microstep: 1370.99 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2446
[2024-06-10 21:25:38,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.04 | bwd_microstep: 913.55 | bwd_inner_microstep: 913.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3869
[2024-06-10 21:25:40,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.73 | bwd_microstep: 1557.71 | bwd_inner_microstep: 1557.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 21:25:42,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1443.47 | bwd_inner_microstep: 1443.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 21:25:44,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1436.16 | bwd_inner_microstep: 1436.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-10 21:25:46,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1434.83 | bwd_inner_microstep: 1434.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 21:25:48,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1241.84 | bwd_inner_microstep: 1241.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 767
[2024-06-10 21:25:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.68 | bwd_microstep: 310.44 | bwd_inner_microstep: 310.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 21:25:50,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1388.38 | bwd_inner_microstep: 1388.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 21:25:53,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.04 | bwd_microstep: 1525.20 | bwd_inner_microstep: 1525.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444
[2024-06-10 21:25:54,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.47 | bwd_microstep: 1217.37 | bwd_inner_microstep: 1217.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2096
[2024-06-10 21:25:55,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.58 | bwd_microstep: 849.05 | bwd_inner_microstep: 849.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657
[2024-06-10 21:25:58,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.17 | bwd_microstep: 1717.57 | bwd_inner_microstep: 1717.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2101
[2024-06-10 21:25:59,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.79 | bwd_microstep: 730.19 | bwd_inner_microstep: 730.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 21:26:01,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1407.51 | bwd_inner_microstep: 1407.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 21:26:02,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.67 | bwd_microstep: 791.73 | bwd_inner_microstep: 791.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 21:26:04,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.12 | bwd_microstep: 1481.37 | bwd_inner_microstep: 1481.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1962
[2024-06-10 21:26:05,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.83 | bwd_microstep: 887.73 | bwd_inner_microstep: 887.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-10 21:26:07,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.29 | bwd_microstep: 1183.43 | bwd_inner_microstep: 1183.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465
[2024-06-10 21:26:09,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.26 | bwd_microstep: 1440.67 | bwd_inner_microstep: 1440.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3693
[2024-06-10 21:26:10,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1231.91 | bwd_inner_microstep: 1231.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 21:26:12,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.35 | bwd_microstep: 1498.06 | bwd_inner_microstep: 1498.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 21:26:14,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.43 | bwd_microstep: 1282.55 | bwd_inner_microstep: 1282.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3823
[2024-06-10 21:26:16,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.36 | bwd_microstep: 1479.81 | bwd_inner_microstep: 1479.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3776
[2024-06-10 21:26:19,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.45 | bwd_microstep: 1675.44 | bwd_inner_microstep: 1675.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 21:26:21,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.65 | bwd_microstep: 1647.61 | bwd_inner_microstep: 1647.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-10 21:26:23,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.42 | bwd_microstep: 1601.65 | bwd_inner_microstep: 1601.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 21:26:25,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.27 | bwd_microstep: 1550.32 | bwd_inner_microstep: 1550.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 21:26:27,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1402.91 | bwd_inner_microstep: 1402.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2237
[2024-06-10 21:26:28,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.65 | bwd_microstep: 895.97 | bwd_inner_microstep: 895.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3731
[2024-06-10 21:26:31,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.99 | bwd_microstep: 1600.35 | bwd_inner_microstep: 1600.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 21:26:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 6.42 | optimizer_step: 6.60
[2024-06-10 21:26:36,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 4984.79 | bwd_inner_microstep: 1755.00 | bwd_allreduce_microstep: 3229.72 | step_microstep: 42.23
[2024-06-10 21:26:36,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15611.32 | bwd: 45180.67 | bwd_inner: 41949.96 | bwd_allreduce: 3230.00 | step: 43.78
{'loss': 1.2172, 'learning_rate': 8.884694601734123e-06, 'epoch': 0.7}
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2637
[2024-06-10 21:26:38,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.61 | bwd_microstep: 1134.27 | bwd_inner_microstep: 1134.13 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 21:26:40,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.58 | bwd_microstep: 1242.77 | bwd_inner_microstep: 1242.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3868
[2024-06-10 21:26:42,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.65 | bwd_microstep: 1661.84 | bwd_inner_microstep: 1661.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 21:26:44,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.24 | bwd_microstep: 1341.94 | bwd_inner_microstep: 1341.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3690
[2024-06-10 21:26:46,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.13 | bwd_microstep: 1360.28 | bwd_inner_microstep: 1360.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3496
[2024-06-10 21:26:48,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1443.61 | bwd_inner_microstep: 1443.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2714
[2024-06-10 21:26:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.46 | bwd_microstep: 939.53 | bwd_inner_microstep: 939.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 21:26:50,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.25 | bwd_microstep: 790.72 | bwd_inner_microstep: 790.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2186
[2024-06-10 21:26:51,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.71 | bwd_microstep: 953.14 | bwd_inner_microstep: 953.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 21:26:53,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1247.66 | bwd_inner_microstep: 1247.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 21:26:55,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.71 | bwd_microstep: 1404.39 | bwd_inner_microstep: 1404.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 21:26:57,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.55 | bwd_microstep: 1287.12 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 21:26:59,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 1532.23 | bwd_inner_microstep: 1532.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1911
[2024-06-10 21:27:00,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.96 | bwd_microstep: 763.92 | bwd_inner_microstep: 763.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 21:27:02,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1389.07 | bwd_inner_microstep: 1389.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-10 21:27:04,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.27 | bwd_microstep: 1611.95 | bwd_inner_microstep: 1611.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3560
[2024-06-10 21:27:06,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.67 | bwd_microstep: 1658.22 | bwd_inner_microstep: 1658.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3472
[2024-06-10 21:27:08,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.35 | bwd_microstep: 1213.67 | bwd_inner_microstep: 1213.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3841
[2024-06-10 21:27:10,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1491.36 | bwd_inner_microstep: 1491.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-10 21:27:12,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.53 | bwd_microstep: 1284.24 | bwd_inner_microstep: 1284.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3488
[2024-06-10 21:27:14,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.70 | bwd_microstep: 1441.34 | bwd_inner_microstep: 1441.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001
[2024-06-10 21:27:15,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.24 | bwd_microstep: 803.98 | bwd_inner_microstep: 803.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-10 21:27:16,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.79 | bwd_microstep: 881.47 | bwd_inner_microstep: 881.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3554
[2024-06-10 21:27:18,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.66 | bwd_microstep: 1330.84 | bwd_inner_microstep: 1330.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3812
[2024-06-10 21:27:20,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.94 | bwd_microstep: 1385.99 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 21:27:22,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.60 | bwd_microstep: 1433.72 | bwd_inner_microstep: 1433.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2268
[2024-06-10 21:27:23,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.46 | bwd_microstep: 971.05 | bwd_inner_microstep: 971.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-10 21:27:25,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1488.47 | bwd_inner_microstep: 1488.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 21:27:27,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1495.10 | bwd_inner_microstep: 1495.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3769
[2024-06-10 21:27:29,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.06 | bwd_microstep: 1470.89 | bwd_inner_microstep: 1470.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3579
[2024-06-10 21:27:31,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.54 | bwd_microstep: 1526.51 | bwd_inner_microstep: 1526.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806
[2024-06-10 21:27:36,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.22 | optimizer_step: 6.94
[2024-06-10 21:27:36,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.79 | bwd_microstep: 3948.36 | bwd_inner_microstep: 1670.98 | bwd_allreduce_microstep: 2277.33 | step_microstep: 38.35
[2024-06-10 21:27:36,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15557.43 | bwd: 43929.68 | bwd_inner: 41651.33 | bwd_allreduce: 2277.61 | step: 39.92
{'loss': 1.1509, 'learning_rate': 8.853510737551274e-06, 'epoch': 0.7}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3402
[2024-06-10 21:27:38,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.95 | bwd_microstep: 1497.38 | bwd_inner_microstep: 1497.22 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 21:27:40,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1380.53 | bwd_inner_microstep: 1380.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 21:27:42,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.39 | bwd_microstep: 1348.78 | bwd_inner_microstep: 1348.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 21:27:44,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.47 | bwd_microstep: 1480.73 | bwd_inner_microstep: 1480.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868
[2024-06-10 21:27:45,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 708.26 | bwd_inner_microstep: 708.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 21:27:47,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.56 | bwd_microstep: 1376.14 | bwd_inner_microstep: 1376.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 21:27:49,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | bwd_microstep: 1376.90 | bwd_inner_microstep: 1376.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405
[2024-06-10 21:27:50,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.40 | bwd_microstep: 1180.34 | bwd_inner_microstep: 1180.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 21:27:52,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.07 | bwd_microstep: 1291.89 | bwd_inner_microstep: 1291.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709
[2024-06-10 21:27:54,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.30 | bwd_microstep: 1628.46 | bwd_inner_microstep: 1628.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1950
[2024-06-10 21:27:56,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.47 | bwd_microstep: 886.55 | bwd_inner_microstep: 886.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3699
[2024-06-10 21:27:58,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.49 | bwd_microstep: 1722.76 | bwd_inner_microstep: 1722.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3990
[2024-06-10 21:28:00,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.26 | bwd_microstep: 1637.02 | bwd_inner_microstep: 1637.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 21:28:02,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1376.72 | bwd_inner_microstep: 1376.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3665
[2024-06-10 21:28:04,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.77 | bwd_microstep: 1687.99 | bwd_inner_microstep: 1687.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-10 21:28:06,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1405.57 | bwd_inner_microstep: 1405.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 21:28:08,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.25 | bwd_microstep: 796.57 | bwd_inner_microstep: 796.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 21:28:09,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.26 | bwd_microstep: 1429.67 | bwd_inner_microstep: 1429.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 21:28:11,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1392.56 | bwd_inner_microstep: 1392.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-10 21:28:14,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.44 | bwd_microstep: 1607.30 | bwd_inner_microstep: 1607.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 21:28:16,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1398.60 | bwd_inner_microstep: 1398.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 21:28:18,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1559.64 | bwd_inner_microstep: 1559.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 21:28:19,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.66 | bwd_microstep: 698.18 | bwd_inner_microstep: 698.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-10 21:28:21,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 1402.30 | bwd_inner_microstep: 1402.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 21:28:22,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.65 | bwd_microstep: 1347.54 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3781
[2024-06-10 21:28:24,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.17 | bwd_microstep: 1317.05 | bwd_inner_microstep: 1317.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-10 21:28:26,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 1294.93 | bwd_inner_microstep: 1294.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3563
[2024-06-10 21:28:28,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.91 | bwd_microstep: 1329.76 | bwd_inner_microstep: 1329.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-10 21:28:30,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1490.38 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2016
[2024-06-10 21:28:31,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.32 | bwd_microstep: 775.10 | bwd_inner_microstep: 775.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3689
[2024-06-10 21:28:34,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.43 | bwd_microstep: 1827.24 | bwd_inner_microstep: 1827.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3774
[2024-06-10 21:28:39,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.07 | optimizer_gradients: 4.32 | optimizer_step: 6.62
[2024-06-10 21:28:39,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.64 | bwd_microstep: 4342.43 | bwd_inner_microstep: 1804.36 | bwd_allreduce_microstep: 2538.01 | step_microstep: 39.76
[2024-06-10 21:28:39,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16173.71 | bwd: 45995.32 | bwd_inner: 43456.26 | bwd_allreduce: 2538.33 | step: 41.28
{'loss': 1.242, 'learning_rate': 8.822366131245664e-06, 'epoch': 0.7}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3410
[2024-06-10 21:28:40,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.96 | bwd_microstep: 1175.35 | bwd_inner_microstep: 1175.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1909
[2024-06-10 21:28:41,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.76 | bwd_microstep: 714.97 | bwd_inner_microstep: 714.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-10 21:28:43,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1244.30 | bwd_inner_microstep: 1244.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3577
[2024-06-10 21:28:45,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.86 | bwd_microstep: 1294.77 | bwd_inner_microstep: 1294.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3795
[2024-06-10 21:28:47,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.43 | bwd_microstep: 1595.78 | bwd_inner_microstep: 1595.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 21:28:49,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.89 | bwd_microstep: 1281.64 | bwd_inner_microstep: 1281.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 21:28:51,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.21 | bwd_microstep: 1384.66 | bwd_inner_microstep: 1384.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-10 21:28:53,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.22 | bwd_microstep: 1528.36 | bwd_inner_microstep: 1528.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 21:28:55,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1386.55 | bwd_inner_microstep: 1386.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 21:28:56,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1249.78 | bwd_inner_microstep: 1249.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 21:28:57,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 793.75 | bwd_inner_microstep: 793.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-10 21:28:59,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.53 | bwd_microstep: 1152.79 | bwd_inner_microstep: 1152.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3407
[2024-06-10 21:29:01,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.79 | bwd_microstep: 1436.42 | bwd_inner_microstep: 1436.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3666
[2024-06-10 21:29:03,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.75 | bwd_microstep: 1585.67 | bwd_inner_microstep: 1585.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-10 21:29:05,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.97 | bwd_microstep: 975.70 | bwd_inner_microstep: 975.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 21:29:07,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.00 | bwd_microstep: 1484.43 | bwd_inner_microstep: 1484.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3123
[2024-06-10 21:29:08,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.47 | bwd_microstep: 1093.90 | bwd_inner_microstep: 1093.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2961
[2024-06-10 21:29:10,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.42 | bwd_microstep: 1135.81 | bwd_inner_microstep: 1135.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3423
[2024-06-10 21:29:11,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.76 | bwd_microstep: 1185.43 | bwd_inner_microstep: 1185.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3837
[2024-06-10 21:29:13,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.09 | bwd_microstep: 1488.93 | bwd_inner_microstep: 1488.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3616
[2024-06-10 21:29:15,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1389.36 | bwd_inner_microstep: 1389.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 21:29:17,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.98 | bwd_microstep: 1397.60 | bwd_inner_microstep: 1397.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-10 21:29:19,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.05 | bwd_microstep: 1327.57 | bwd_inner_microstep: 1327.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 21:29:21,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1504.86 | bwd_inner_microstep: 1504.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2285
[2024-06-10 21:29:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.31 | bwd_microstep: 784.60 | bwd_inner_microstep: 784.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1982
[2024-06-10 21:29:23,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.31 | bwd_microstep: 709.58 | bwd_inner_microstep: 709.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2919
[2024-06-10 21:29:25,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.19 | bwd_microstep: 1187.98 | bwd_inner_microstep: 1187.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3578
[2024-06-10 21:29:27,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1364.95 | bwd_inner_microstep: 1364.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3566
[2024-06-10 21:29:29,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.95 | bwd_microstep: 1428.53 | bwd_inner_microstep: 1428.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-10 21:29:31,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.84 | bwd_microstep: 1543.04 | bwd_inner_microstep: 1543.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 21:29:33,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.91 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3570
[2024-06-10 21:29:40,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.35 | optimizer_step: 6.56
[2024-06-10 21:29:40,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 6360.78 | bwd_inner_microstep: 1871.47 | bwd_allreduce_microstep: 4489.24 | step_microstep: 38.80
[2024-06-10 21:29:40,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15323.15 | bwd: 45439.98 | bwd_inner: 40949.81 | bwd_allreduce: 4489.48 | step: 40.27

 70%|██████▉   | 1201/1726 [20:47:09<10:25:02, 71.43s/it]
 70%|██████▉   | 1202/1726 [20:48:12<10:01:05, 68.83s/it]
 70%|██████▉   | 1203/1726 [20:49:13<9:39:50, 66.52s/it]
 70%|██████▉   | 1204/1726 [20:50:13<9:21:14, 64.51s/it]
 70%|██████▉   | 1205/1726 [20:51:15<9:14:58, 63.91s/it]
{'loss': 1.1899, 'learning_rate': 8.79126089250843e-06, 'epoch': 0.7}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 21:29:42,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1467.35 | bwd_inner_microstep: 1467.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 21:29:43,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.01 | bwd_microstep: 1308.55 | bwd_inner_microstep: 1308.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3839
[2024-06-10 21:29:45,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.37 | bwd_microstep: 1353.99 | bwd_inner_microstep: 1353.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 21:29:47,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.01 | bwd_microstep: 1446.06 | bwd_inner_microstep: 1446.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3771
[2024-06-10 21:29:49,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.77 | bwd_microstep: 1340.48 | bwd_inner_microstep: 1340.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-10 21:29:51,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.07 | bwd_microstep: 1308.73 | bwd_inner_microstep: 1308.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 21:29:53,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.79 | bwd_microstep: 1385.28 | bwd_inner_microstep: 1385.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 21:29:55,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1496.35 | bwd_inner_microstep: 1496.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 21:29:57,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.47 | bwd_microstep: 1389.93 | bwd_inner_microstep: 1389.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-10 21:29:59,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1189.29 | bwd_inner_microstep: 1189.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625
[2024-06-10 21:30:00,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1313.37 | bwd_inner_microstep: 1313.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 21:30:01,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.31 | bwd_microstep: 793.37 | bwd_inner_microstep: 793.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 21:30:03,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.69 | bwd_microstep: 1289.03 | bwd_inner_microstep: 1289.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3674
[2024-06-10 21:30:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.38 | bwd_microstep: 1549.42 | bwd_inner_microstep: 1549.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 21:30:07,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1377.25 | bwd_inner_microstep: 1377.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 21:30:10,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.09 | bwd_microstep: 1625.91 | bwd_inner_microstep: 1625.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 21:30:11,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.22 | bwd_microstep: 800.78 | bwd_inner_microstep: 800.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2597
[2024-06-10 21:30:12,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.48 | bwd_microstep: 1094.59 | bwd_inner_microstep: 1094.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-10 21:30:14,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1511.71 | bwd_inner_microstep: 1511.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3672
[2024-06-10 21:30:16,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.50 | bwd_microstep: 1454.17 | bwd_inner_microstep: 1454.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-10 21:30:17,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.02 | bwd_microstep: 877.04 | bwd_inner_microstep: 877.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 21:30:19,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.15 | bwd_microstep: 820.42 | bwd_inner_microstep: 820.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 21:30:20,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.77 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076
[2024-06-10 21:30:22,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.59 | bwd_microstep: 917.55 | bwd_inner_microstep: 917.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2023
[2024-06-10 21:30:23,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.61 | bwd_microstep: 904.78 | bwd_inner_microstep: 904.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3806
[2024-06-10 21:30:25,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.81 | bwd_microstep: 1574.34 | bwd_inner_microstep: 1574.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3793
[2024-06-10 21:30:27,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.90 | bwd_microstep: 1749.35 | bwd_inner_microstep: 1749.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2913
[2024-06-10 21:30:29,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.16 | bwd_microstep: 1094.28 | bwd_inner_microstep: 1094.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834
[2024-06-10 21:30:31,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.23 | bwd_microstep: 1752.89 | bwd_inner_microstep: 1752.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 21:30:34,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.74 | bwd_microstep: 1547.05 | bwd_inner_microstep: 1547.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591
[2024-06-10 21:30:35,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1307.08 | bwd_inner_microstep: 1307.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-10 21:30:39,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 21:30:39,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.72 | bwd_microstep: 3361.32 | bwd_inner_microstep: 1861.37 | bwd_allreduce_microstep: 1499.90 | step_microstep: 39.15
[2024-06-10 21:30:39,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15708.50 | bwd: 43684.95 | bwd_inner: 42184.15 | bwd_allreduce: 1500.12 | step: 41.74
{'loss': 1.1673, 'learning_rate': 8.76019513089206e-06, 'epoch': 0.7}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5710
[2024-06-10 21:30:42,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 805.54 | bwd_microstep: 2152.04 | bwd_inner_microstep: 2152.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 21:30:44,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.17 | bwd_microstep: 1275.46 | bwd_inner_microstep: 1275.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3881
[2024-06-10 21:30:46,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.10 | bwd_microstep: 1582.76 | bwd_inner_microstep: 1582.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3420
[2024-06-10 21:30:48,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.72 | bwd_microstep: 1474.81 | bwd_inner_microstep: 1474.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929
[2024-06-10 21:30:49,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.99 | bwd_microstep: 727.00 | bwd_inner_microstep: 726.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 21:30:51,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1383.62 | bwd_inner_microstep: 1383.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 21:30:53,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.55 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 21:30:55,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.62 | bwd_microstep: 1247.68 | bwd_inner_microstep: 1247.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 21:30:57,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.06 | bwd_microstep: 1284.78 | bwd_inner_microstep: 1284.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 21:30:59,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.07 | bwd_microstep: 1542.61 | bwd_inner_microstep: 1542.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3515
[2024-06-10 21:31:01,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.43 | bwd_microstep: 1333.86 | bwd_inner_microstep: 1333.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 21:31:03,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.44 | bwd_microstep: 1378.61 | bwd_inner_microstep: 1378.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 21:31:04,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1376.11 | bwd_inner_microstep: 1376.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646
[2024-06-10 21:31:06,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1410.04 | bwd_inner_microstep: 1410.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3900
[2024-06-10 21:31:09,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.93 | bwd_microstep: 1785.08 | bwd_inner_microstep: 1785.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507
[2024-06-10 21:31:11,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.01 | bwd_microstep: 1415.90 | bwd_inner_microstep: 1415.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 21:31:12,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.73 | bwd_microstep: 685.60 | bwd_inner_microstep: 685.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 21:31:14,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.71 | bwd_microstep: 1291.91 | bwd_inner_microstep: 1291.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3931
[2024-06-10 21:31:16,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1493.59 | bwd_inner_microstep: 1493.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-10 21:31:18,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.04 | bwd_microstep: 1659.98 | bwd_inner_microstep: 1659.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 21:31:20,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1287.67 | bwd_inner_microstep: 1287.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 21:31:21,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1254.70 | bwd_inner_microstep: 1254.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2303
[2024-06-10 21:31:23,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.47 | bwd_microstep: 977.44 | bwd_inner_microstep: 977.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511
[2024-06-10 21:31:24,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.98 | bwd_microstep: 1192.06 | bwd_inner_microstep: 1192.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006
[2024-06-10 21:31:26,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.05 | bwd_microstep: 833.82 | bwd_inner_microstep: 833.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 21:31:28,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.05 | bwd_microstep: 1453.43 | bwd_inner_microstep: 1453.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3555
[2024-06-10 21:31:30,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.39 | bwd_microstep: 1543.16 | bwd_inner_microstep: 1543.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3787
[2024-06-10 21:31:32,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.07 | bwd_microstep: 1599.05 | bwd_inner_microstep: 1599.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2230
[2024-06-10 21:31:33,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.37 | bwd_microstep: 1024.44 | bwd_inner_microstep: 1024.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3762
[2024-06-10 21:31:36,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 1606.42 | bwd_inner_microstep: 1606.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3770
[2024-06-10 21:31:38,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.53 | bwd_microstep: 1842.66 | bwd_inner_microstep: 1842.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2406
[2024-06-10 21:31:40,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.15 | optimizer_gradients: 4.17 | optimizer_step: 6.59
[2024-06-10 21:31:40,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.80 | bwd_microstep: 1596.10 | bwd_inner_microstep: 1150.90 | bwd_allreduce_microstep: 445.03 | step_microstep: 39.35
[2024-06-10 21:31:40,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16276.45 | bwd: 44096.30 | bwd_inner: 43650.23 | bwd_allreduce: 445.25 | step: 40.89
{'loss': 1.2092, 'learning_rate': 8.729168955810015e-06, 'epoch': 0.7}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3427
[2024-06-10 21:31:42,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1509.26 | bwd_inner_microstep: 1509.18 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 21:31:43,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.98 | bwd_microstep: 679.70 | bwd_inner_microstep: 679.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3870
[2024-06-10 21:31:45,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1495.15 | bwd_inner_microstep: 1495.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3824
[2024-06-10 21:31:47,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.89 | bwd_microstep: 1513.11 | bwd_inner_microstep: 1513.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 21:31:49,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.91 | bwd_microstep: 1278.50 | bwd_inner_microstep: 1278.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 21:31:51,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1379.51 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 21:31:53,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1403.74 | bwd_inner_microstep: 1403.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 21:31:55,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1379.05 | bwd_inner_microstep: 1379.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 21:31:57,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.32 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-10 21:31:58,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1255.35 | bwd_inner_microstep: 1255.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702
[2024-06-10 21:32:00,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.14 | bwd_microstep: 1423.01 | bwd_inner_microstep: 1422.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947
[2024-06-10 21:32:01,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.96 | bwd_microstep: 727.33 | bwd_inner_microstep: 727.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2195
[2024-06-10 21:32:03,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.03 | bwd_microstep: 859.45 | bwd_inner_microstep: 859.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 21:32:04,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.26 | bwd_microstep: 1286.98 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1936
[2024-06-10 21:32:06,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.46 | bwd_microstep: 882.36 | bwd_inner_microstep: 882.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 21:32:08,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.97 | bwd_microstep: 1645.09 | bwd_inner_microstep: 1645.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 21:32:10,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1377.04 | bwd_inner_microstep: 1377.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 21:32:12,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.14 | bwd_microstep: 1385.42 | bwd_inner_microstep: 1385.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2609
[2024-06-10 21:32:13,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.24 | bwd_microstep: 1011.80 | bwd_inner_microstep: 1011.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3623
[2024-06-10 21:32:15,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.82 | bwd_microstep: 1356.98 | bwd_inner_microstep: 1356.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 21:32:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.77 | bwd_microstep: 1158.60 | bwd_inner_microstep: 1158.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 21:32:18,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.69 | bwd_microstep: 1354.56 | bwd_inner_microstep: 1354.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 21:32:21,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.36 | bwd_microstep: 1503.66 | bwd_inner_microstep: 1503.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919
[2024-06-10 21:32:21,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.84 | bwd_microstep: 687.29 | bwd_inner_microstep: 687.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820
[2024-06-10 21:32:23,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.94 | bwd_microstep: 1387.31 | bwd_inner_microstep: 1387.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2067
[2024-06-10 21:32:25,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.89 | bwd_microstep: 914.92 | bwd_inner_microstep: 914.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3973
[2024-06-10 21:32:27,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.57 | bwd_microstep: 1607.23 | bwd_inner_microstep: 1607.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 21:32:28,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.19 | bwd_microstep: 799.12 | bwd_inner_microstep: 799.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 21:32:30,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1600.37 | bwd_inner_microstep: 1600.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-10 21:32:32,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.24 | bwd_microstep: 1307.97 | bwd_inner_microstep: 1307.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584
[2024-06-10 21:32:34,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1400.32 | bwd_inner_microstep: 1400.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808
[2024-06-10 21:32:43,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 21:32:43,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 8517.94 | bwd_inner_microstep: 2072.54 | bwd_allreduce_microstep: 6445.33 | step_microstep: 38.89
[2024-06-10 21:32:43,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15216.90 | bwd: 47475.53 | bwd_inner: 41029.22 | bwd_allreduce: 6445.60 | step: 40.40
{'loss': 1.1537, 'learning_rate': 8.698182476536316e-06, 'epoch': 0.7}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 21:32:45,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.24 | bwd_microstep: 1370.63 | bwd_inner_microstep: 1370.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2322
[2024-06-10 21:32:46,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.48 | bwd_microstep: 882.18 | bwd_inner_microstep: 882.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3825
[2024-06-10 21:32:48,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1401.39 | bwd_inner_microstep: 1401.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3792
[2024-06-10 21:32:50,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.74 | bwd_microstep: 1471.04 | bwd_inner_microstep: 1471.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-10 21:32:52,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.49 | bwd_microstep: 1645.66 | bwd_inner_microstep: 1645.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760
[2024-06-10 21:32:55,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.34 | bwd_microstep: 1537.11 | bwd_inner_microstep: 1537.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 21:32:56,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1342.38 | bwd_inner_microstep: 1342.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2910
[2024-06-10 21:32:58,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.59 | bwd_microstep: 999.27 | bwd_inner_microstep: 999.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2182
[2024-06-10 21:32:59,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.16 | bwd_microstep: 951.75 | bwd_inner_microstep: 951.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 21:33:01,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.90 | bwd_microstep: 1402.52 | bwd_inner_microstep: 1402.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511
[2024-06-10 21:33:03,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1446.61 | bwd_inner_microstep: 1446.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3434
[2024-06-10 21:33:05,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1394.75 | bwd_inner_microstep: 1394.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384
[2024-06-10 21:33:07,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.85 | bwd_microstep: 1240.12 | bwd_inner_microstep: 1240.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 21:33:09,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1384.41 | bwd_inner_microstep: 1384.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-10 21:33:11,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.74 | bwd_microstep: 1524.43 | bwd_inner_microstep: 1524.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974
[2024-06-10 21:33:12,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.64 | bwd_microstep: 766.26 | bwd_inner_microstep: 766.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 21:33:14,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.23 | bwd_microstep: 1287.02 | bwd_inner_microstep: 1286.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-10 21:33:16,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1434.99 | bwd_inner_microstep: 1434.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999
[2024-06-10 21:33:17,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.26 | bwd_microstep: 737.37 | bwd_inner_microstep: 737.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.39
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3539
[2024-06-10 21:33:18,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.19 | bwd_microstep: 1199.06 | bwd_inner_microstep: 1199.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998
[2024-06-10 21:33:19,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.54 | bwd_microstep: 706.85 | bwd_inner_microstep: 706.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 21:33:22,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.72 | bwd_microstep: 1658.35 | bwd_inner_microstep: 1658.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 21:33:24,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.62 | bwd_microstep: 1655.05 | bwd_inner_microstep: 1655.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 21:33:26,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1654.18 | bwd_inner_microstep: 1654.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269
[2024-06-10 21:33:27,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.41 | bwd_microstep: 975.71 | bwd_inner_microstep: 975.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 21:33:29,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.11 | bwd_microstep: 1407.71 | bwd_inner_microstep: 1407.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805
[2024-06-10 21:33:31,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.54 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3772
[2024-06-10 21:33:33,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.33 | bwd_microstep: 1437.16 | bwd_inner_microstep: 1437.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3782
[2024-06-10 21:33:35,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1380.82 | bwd_inner_microstep: 1380.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 21:33:37,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1449.28 | bwd_inner_microstep: 1449.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3761
[2024-06-10 21:33:39,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.80 | bwd_microstep: 1637.24 | bwd_inner_microstep: 1637.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3592
[2024-06-10 21:33:45,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 21:33:45,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.12 | bwd_microstep: 5240.13 | bwd_inner_microstep: 1908.13 | bwd_allreduce_microstep: 3331.93 | step_microstep: 38.79
[2024-06-10 21:33:45,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15829.38 | bwd: 46070.78 | bwd_inner: 42737.93 | bwd_allreduce: 3332.17 | step: 41.58
{'loss': 1.2004, 'learning_rate': 8.667235802205183e-06, 'epoch': 0.7}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3495
[2024-06-10 21:33:47,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.57 | bwd_microstep: 1506.58 | bwd_inner_microstep: 1506.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 21:33:49,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.90 | bwd_microstep: 1272.98 | bwd_inner_microstep: 1272.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2391
[2024-06-10 21:33:50,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.29 | bwd_microstep: 838.81 | bwd_inner_microstep: 838.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2756
[2024-06-10 21:33:52,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.71 | bwd_microstep: 1002.27 | bwd_inner_microstep: 1002.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4133
[2024-06-10 21:33:54,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.99 | bwd_microstep: 1589.07 | bwd_inner_microstep: 1589.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 21:33:56,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1251.56 | bwd_inner_microstep: 1251.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 21:33:57,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1245.29 | bwd_inner_microstep: 1245.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 21:33:59,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1392.44 | bwd_inner_microstep: 1392.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 21:34:01,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1247.60 | bwd_inner_microstep: 1247.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928
[2024-06-10 21:34:02,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.85 | bwd_microstep: 725.12 | bwd_inner_microstep: 725.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060
[2024-06-10 21:34:03,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.51 | bwd_microstep: 818.36 | bwd_inner_microstep: 818.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428
[2024-06-10 21:34:05,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.51 | bwd_microstep: 1154.97 | bwd_inner_microstep: 1154.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-10 21:34:07,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.34 | bwd_microstep: 1347.04 | bwd_inner_microstep: 1347.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3699
[2024-06-10 21:34:09,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1482.45 | bwd_inner_microstep: 1482.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2107
[2024-06-10 21:34:10,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.87 | bwd_microstep: 982.37 | bwd_inner_microstep: 982.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 21:34:12,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1346.09 | bwd_inner_microstep: 1346.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3945
[2024-06-10 21:34:14,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.57 | bwd_microstep: 1600.25 | bwd_inner_microstep: 1600.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-10 21:34:16,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.27 | bwd_microstep: 1627.75 | bwd_inner_microstep: 1627.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 21:34:18,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.58 | bwd_microstep: 1292.23 | bwd_inner_microstep: 1292.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3539
[2024-06-10 21:34:20,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.64 | bwd_microstep: 1451.83 | bwd_inner_microstep: 1451.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 21:34:22,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.18 | bwd_microstep: 1547.66 | bwd_inner_microstep: 1547.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 21:34:25,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.97 | bwd_microstep: 1658.83 | bwd_inner_microstep: 1658.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813
[2024-06-10 21:34:26,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.35 | bwd_microstep: 1356.52 | bwd_inner_microstep: 1356.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543
[2024-06-10 21:34:28,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.47 | bwd_microstep: 1326.75 | bwd_inner_microstep: 1326.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782
[2024-06-10 21:34:30,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.14 | bwd_microstep: 1353.04 | bwd_inner_microstep: 1353.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 21:34:32,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.88 | bwd_microstep: 1475.13 | bwd_inner_microstep: 1475.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 21:34:34,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.20 | bwd_microstep: 1373.55 | bwd_inner_microstep: 1373.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2286
[2024-06-10 21:34:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.84 | bwd_microstep: 1006.25 | bwd_inner_microstep: 1006.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3424
[2024-06-10 21:34:37,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.75 | bwd_microstep: 1310.96 | bwd_inner_microstep: 1310.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 21:34:39,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1253.65 | bwd_inner_microstep: 1253.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 21:34:41,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.60 | bwd_microstep: 1651.77 | bwd_inner_microstep: 1651.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3431
[2024-06-10 21:34:48,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 21:34:48,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.59 | bwd_microstep: 5719.60 | bwd_inner_microstep: 1721.02 | bwd_allreduce_microstep: 3998.52 | step_microstep: 38.62
[2024-06-10 21:34:48,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15773.81 | bwd: 46208.79 | bwd_inner: 42209.36 | bwd_allreduce: 3998.75 | step: 40.03
 70%|██████▉   | 1206/1726 [20:52:16<9:06:33, 63.06s/it]
 70%|██████▉   | 1207/1726 [20:53:16<8:56:50, 62.06s/it]
 70%|██████▉   | 1208/1726 [20:54:17<8:52:17, 61.66s/it]
 70%|███████   | 1209/1726 [20:55:20<8:54:47, 62.07s/it]
 70%|███████   | 1210/1726 [20:56:22<8:54:11, 62.12s/it]
 70%|███████   | 1211/1726
{'loss': 1.1983, 'learning_rate': 8.636329041810632e-06, 'epoch': 0.7}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-10 21:34:49,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.78 | bwd_microstep: 1271.65 | bwd_inner_microstep: 1271.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 21:34:51,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 21:34:53,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1349.92 | bwd_inner_microstep: 1349.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-10 21:34:55,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.92 | bwd_microstep: 1210.53 | bwd_inner_microstep: 1210.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 21:34:56,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1276.50 | bwd_inner_microstep: 1276.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 21:34:58,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.73 | bwd_microstep: 1277.40 | bwd_inner_microstep: 1277.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2240
[2024-06-10 21:34:59,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.99 | bwd_microstep: 895.20 | bwd_inner_microstep: 895.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-10 21:35:01,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.72 | bwd_microstep: 1483.43 | bwd_inner_microstep: 1483.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1887
[2024-06-10 21:35:02,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.82 | bwd_microstep: 682.46 | bwd_inner_microstep: 682.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-10 21:35:03,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.82 | bwd_microstep: 698.50 | bwd_inner_microstep: 698.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 21:35:05,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.31 | bwd_microstep: 1248.95 | bwd_inner_microstep: 1248.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 21:35:07,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1380.46 | bwd_inner_microstep: 1380.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-10 21:35:09,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.37 | bwd_microstep: 1512.14 | bwd_inner_microstep: 1512.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2199
[2024-06-10 21:35:10,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.95 | bwd_microstep: 958.39 | bwd_inner_microstep: 958.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3541
[2024-06-10 21:35:13,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.62 | bwd_microstep: 1522.32 | bwd_inner_microstep: 1522.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 21:35:14,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.01 | bwd_microstep: 1387.97 | bwd_inner_microstep: 1387.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472
[2024-06-10 21:35:16,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.97 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883
[2024-06-10 21:35:17,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.12 | bwd_microstep: 711.35 | bwd_inner_microstep: 711.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070
[2024-06-10 21:35:19,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.67 | bwd_microstep: 818.44 | bwd_inner_microstep: 818.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-10 21:35:20,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1405.36 | bwd_inner_microstep: 1405.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 21:35:22,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1389.79 | bwd_inner_microstep: 1389.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3610
[2024-06-10 21:35:24,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.24 | bwd_microstep: 1457.43 | bwd_inner_microstep: 1457.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-10 21:35:26,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.59 | bwd_microstep: 1430.77 | bwd_inner_microstep: 1430.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 21:35:28,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.41 | bwd_microstep: 1391.79 | bwd_inner_microstep: 1391.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472
[2024-06-10 21:35:30,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.69 | bwd_microstep: 1345.27 | bwd_inner_microstep: 1345.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-10 21:35:32,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.34 | bwd_microstep: 1415.83 | bwd_inner_microstep: 1415.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2245
[2024-06-10 21:35:34,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.43 | bwd_microstep: 1002.29 | bwd_inner_microstep: 1002.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818
[2024-06-10 21:35:36,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.68 | bwd_microstep: 1621.75 | bwd_inner_microstep: 1621.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-10 21:35:38,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.39 | bwd_microstep: 1344.92 | bwd_inner_microstep: 1344.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3724
[2024-06-10 21:35:40,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.62 | bwd_microstep: 1477.06 | bwd_inner_microstep: 1477.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050
[2024-06-10 21:35:41,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.62 | bwd_microstep: 909.55 | bwd_inner_microstep: 909.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-10 21:35:49,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-10 21:35:49,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.49 | bwd_microstep: 7443.35 | bwd_inner_microstep: 1448.26 | bwd_allreduce_microstep: 5995.03 | step_microstep: 37.95
[2024-06-10 21:35:49,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14980.17 | bwd: 45969.78 | bwd_inner: 39973.84 | bwd_allreduce: 5995.26 | step: 39.41
{'loss': 1.233, 'learning_rate': 8.605462304206129e-06, 'epoch': 0.7}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405
[2024-06-10 21:35:51,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1296.20 | bwd_inner_microstep: 1296.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-10 21:35:53,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.87 | bwd_microstep: 1618.92 | bwd_inner_microstep: 1618.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 21:35:55,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.39 | bwd_microstep: 1242.49 | bwd_inner_microstep: 1242.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-10 21:35:56,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.76 | bwd_microstep: 677.75 | bwd_inner_microstep: 677.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 21:35:58,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1380.93 | bwd_inner_microstep: 1380.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945
[2024-06-10 21:35:58,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.49 | bwd_microstep: 702.45 | bwd_inner_microstep: 702.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219
[2024-06-10 21:36:00,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.18 | bwd_microstep: 957.82 | bwd_inner_microstep: 957.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 21:36:01,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.36 | bwd_microstep: 790.61 | bwd_inner_microstep: 790.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 21:36:03,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1379.70 | bwd_inner_microstep: 1379.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3483
[2024-06-10 21:36:05,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.09 | bwd_microstep: 1440.58 | bwd_inner_microstep: 1440.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-10 21:36:07,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.33 | bwd_microstep: 1412.24 | bwd_inner_microstep: 1412.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 21:36:09,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.06 | bwd_microstep: 1338.58 | bwd_inner_microstep: 1338.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 21:36:10,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.99 | bwd_microstep: 1276.03 | bwd_inner_microstep: 1276.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665
[2024-06-10 21:36:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1414.74 | bwd_inner_microstep: 1414.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3422
[2024-06-10 21:36:14,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.23 | bwd_microstep: 1540.62 | bwd_inner_microstep: 1540.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3684
[2024-06-10 21:36:17,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.11 | bwd_microstep: 1554.12 | bwd_inner_microstep: 1554.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640
[2024-06-10 21:36:18,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.23 | bwd_microstep: 1317.43 | bwd_inner_microstep: 1317.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016
[2024-06-10 21:36:19,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.22 | bwd_microstep: 711.33 | bwd_inner_microstep: 711.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 21:36:21,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1288.40 | bwd_inner_microstep: 1288.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 21:36:23,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.79 | bwd_microstep: 1351.91 | bwd_inner_microstep: 1351.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-10 21:36:25,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.97 | bwd_microstep: 1182.21 | bwd_inner_microstep: 1182.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822
[2024-06-10 21:36:27,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.43 | bwd_microstep: 1687.83 | bwd_inner_microstep: 1687.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2912
[2024-06-10 21:36:29,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.19 | bwd_microstep: 1092.89 | bwd_inner_microstep: 1092.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-10 21:36:30,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1409.11 | bwd_inner_microstep: 1409.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3818
[2024-06-10 21:36:33,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.62 | bwd_microstep: 1516.50 | bwd_inner_microstep: 1516.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 21:36:35,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.48 | bwd_microstep: 1499.45 | bwd_inner_microstep: 1499.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-10 21:36:37,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.80 | bwd_microstep: 1428.33 | bwd_inner_microstep: 1428.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 21:36:39,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1400.14 | bwd_inner_microstep: 1400.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3443
[2024-06-10 21:36:40,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.98 | bwd_microstep: 1316.58 | bwd_inner_microstep: 1316.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-10 21:36:42,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.37 | bwd_microstep: 1532.63 | bwd_inner_microstep: 1532.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 21:36:44,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1405.18 | bwd_inner_microstep: 1405.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3812
[2024-06-10 21:36:52,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 21:36:52,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.81 | bwd_microstep: 7036.94 | bwd_inner_microstep: 2120.08 | bwd_allreduce_microstep: 4916.81 | step_microstep: 38.12
[2024-06-10 21:36:52,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15742.73 | bwd: 47200.68 | bwd_inner: 42282.97 | bwd_allreduce: 4917.04 | step: 39.62
{'loss': 1.2115, 'learning_rate': 8.57463569810415e-06, 'epoch': 0.7}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 21:36:54,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.31 | bwd_microstep: 1330.75 | bwd_inner_microstep: 1330.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-10 21:36:56,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.36 | bwd_microstep: 1295.73 | bwd_inner_microstep: 1295.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3940
[2024-06-10 21:36:58,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.39 | bwd_microstep: 1592.63 | bwd_inner_microstep: 1592.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 21:37:00,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1477.35 | bwd_inner_microstep: 1477.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 21:37:02,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.15 | bwd_microstep: 1548.71 | bwd_inner_microstep: 1548.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786
[2024-06-10 21:37:04,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.10 | bwd_microstep: 1645.93 | bwd_inner_microstep: 1645.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3733
[2024-06-10 21:37:06,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.86 | bwd_microstep: 1461.25 | bwd_inner_microstep: 1461.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1884
[2024-06-10 21:37:08,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.53 | bwd_microstep: 745.44 | bwd_inner_microstep: 745.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1984
[2024-06-10 21:37:09,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.71 | bwd_microstep: 735.38 | bwd_inner_microstep: 735.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3963
[2024-06-10 21:37:11,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.95 | bwd_microstep: 1498.59 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 21:37:13,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1391.17 | bwd_inner_microstep: 1391.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3503
[2024-06-10 21:37:14,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1435.37 | bwd_inner_microstep: 1435.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-10 21:37:16,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.41 | bwd_microstep: 786.04 | bwd_inner_microstep: 786.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1944
[2024-06-10 21:37:17,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.53 | bwd_microstep: 760.54 | bwd_inner_microstep: 760.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 21:37:19,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1482.67 | bwd_inner_microstep: 1482.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423
[2024-06-10 21:37:20,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1310.44 | bwd_inner_microstep: 1310.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-10 21:37:23,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.02 | bwd_microstep: 1614.97 | bwd_inner_microstep: 1614.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-10 21:37:25,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.66 | bwd_microstep: 1311.14 | bwd_inner_microstep: 1311.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 21:37:27,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.43 | bwd_microstep: 1457.96 | bwd_inner_microstep: 1457.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-10 21:37:29,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.33 | bwd_microstep: 1655.05 | bwd_inner_microstep: 1655.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3487
[2024-06-10 21:37:31,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.92 | bwd_microstep: 1543.75 | bwd_inner_microstep: 1543.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-10 21:37:33,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509
[2024-06-10 21:37:35,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.19 | bwd_microstep: 1320.78 | bwd_inner_microstep: 1320.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 21:37:36,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1256.17 | bwd_inner_microstep: 1256.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522
[2024-06-10 21:37:38,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1422.50 | bwd_inner_microstep: 1422.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3388
[2024-06-10 21:37:40,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.70 | bwd_microstep: 1338.96 | bwd_inner_microstep: 1338.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3828
[2024-06-10 21:37:42,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.59 | bwd_microstep: 1706.90 | bwd_inner_microstep: 1706.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-10 21:37:44,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1411.90 | bwd_inner_microstep: 1411.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2084
[2024-06-10 21:37:46,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.75 | bwd_microstep: 848.83 | bwd_inner_microstep: 848.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 21:37:48,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.49 | bwd_microstep: 1623.70 | bwd_inner_microstep: 1623.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3487
[2024-06-10 21:37:50,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.57 | bwd_microstep: 1506.93 | bwd_inner_microstep: 1506.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3581
[2024-06-10 21:37:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.07 | optimizer_step: 6.58
[2024-06-10 21:37:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.05 | bwd_microstep: 5009.65 | bwd_inner_microstep: 1875.85 | bwd_allreduce_microstep: 3133.75 | step_microstep: 38.38
[2024-06-10 21:37:56,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16223.93 | bwd: 46813.14 | bwd_inner: 43678.50 | bwd_allreduce: 3133.97 | step: 39.93
{'loss': 1.1902, 'learning_rate': 8.543849332075862e-06, 'epoch': 0.7}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4285
[2024-06-10 21:37:58,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.98 | bwd_microstep: 1517.48 | bwd_inner_microstep: 1517.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3886
[2024-06-10 21:38:00,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.84 | bwd_microstep: 1583.80 | bwd_inner_microstep: 1583.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3873
[2024-06-10 21:38:02,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1317.64 | bwd_inner_microstep: 1317.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3787
[2024-06-10 21:38:04,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1378.65 | bwd_inner_microstep: 1378.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 21:38:06,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.61 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 21:38:07,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.04 | bwd_microstep: 1245.50 | bwd_inner_microstep: 1245.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510
[2024-06-10 21:38:09,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1350.52 | bwd_inner_microstep: 1350.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 21:38:11,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.73 | bwd_microstep: 1151.40 | bwd_inner_microstep: 1151.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2225
[2024-06-10 21:38:12,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.01 | bwd_microstep: 798.75 | bwd_inner_microstep: 798.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 21:38:14,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.47 | bwd_microstep: 1247.72 | bwd_inner_microstep: 1247.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-10 21:38:15,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.70 | bwd_microstep: 1309.15 | bwd_inner_microstep: 1309.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3523
[2024-06-10 21:38:17,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.45 | bwd_microstep: 1458.14 | bwd_inner_microstep: 1458.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 21:38:19,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.16 | bwd_microstep: 1492.07 | bwd_inner_microstep: 1492.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 21:38:21,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1348.70 | bwd_inner_microstep: 1348.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530
[2024-06-10 21:38:23,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1592.29 | bwd_inner_microstep: 1592.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-10 21:38:26,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.80 | bwd_microstep: 1526.86 | bwd_inner_microstep: 1526.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 21:38:27,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1288.65 | bwd_inner_microstep: 1288.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3536
[2024-06-10 21:38:29,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.30 | bwd_microstep: 1522.57 | bwd_inner_microstep: 1522.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3627
[2024-06-10 21:38:32,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.71 | bwd_microstep: 1588.65 | bwd_inner_microstep: 1588.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 21:38:34,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.69 | bwd_microstep: 1381.64 | bwd_inner_microstep: 1381.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 21:38:36,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.14 | bwd_microstep: 1502.38 | bwd_inner_microstep: 1502.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 21:38:37,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.77 | bwd_microstep: 1288.00 | bwd_inner_microstep: 1287.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3577
[2024-06-10 21:38:39,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.87 | bwd_microstep: 1301.09 | bwd_inner_microstep: 1301.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 21:38:41,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.66 | bwd_microstep: 1412.43 | bwd_inner_microstep: 1412.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-10 21:38:42,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.19 | bwd_microstep: 808.31 | bwd_inner_microstep: 808.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2281
[2024-06-10 21:38:44,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.06 | bwd_microstep: 1005.20 | bwd_inner_microstep: 1005.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-10 21:38:46,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.48 | bwd_microstep: 1534.62 | bwd_inner_microstep: 1534.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 21:38:48,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.89 | bwd_microstep: 1615.73 | bwd_inner_microstep: 1615.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819
[2024-06-10 21:38:50,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.61 | bwd_microstep: 1724.71 | bwd_inner_microstep: 1724.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-10 21:38:53,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.77 | bwd_microstep: 1596.25 | bwd_inner_microstep: 1596.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2227
[2024-06-10 21:38:54,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.34 | bwd_microstep: 958.53 | bwd_inner_microstep: 958.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465
[2024-06-10 21:38:56,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-10 21:38:56,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.29 | bwd_microstep: 1612.79 | bwd_inner_microstep: 1605.01 | bwd_allreduce_microstep: 7.74 | step_microstep: 37.85
[2024-06-10 21:38:56,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16401.71 | bwd: 43848.03 | bwd_inner: 43839.40 | bwd_allreduce: 7.97 | step: 39.32
{'loss': 1.1769, 'learning_rate': 8.513103314550657e-06, 'epoch': 0.7}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 21:38:57,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.07 | bwd_microstep: 781.04 | bwd_inner_microstep: 781.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3937
[2024-06-10 21:38:59,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.54 | bwd_microstep: 1592.19 | bwd_inner_microstep: 1592.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 21:39:02,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.40 | bwd_microstep: 1553.09 | bwd_inner_microstep: 1553.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-10 21:39:04,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1478.44 | bwd_inner_microstep: 1478.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 21:39:05,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.32 | bwd_microstep: 1151.79 | bwd_inner_microstep: 1151.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862
[2024-06-10 21:39:07,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.22 | bwd_microstep: 1460.32 | bwd_inner_microstep: 1460.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 21:39:09,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1352.49 | bwd_inner_microstep: 1352.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3489
[2024-06-10 21:39:11,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1430.83 | bwd_inner_microstep: 1430.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 21:39:13,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1290.52 | bwd_inner_microstep: 1290.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 21:39:15,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1389.04 | bwd_inner_microstep: 1389.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3904
[2024-06-10 21:39:17,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1394.44 | bwd_inner_microstep: 1394.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510
[2024-06-10 21:39:18,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.30 | bwd_microstep: 1225.48 | bwd_inner_microstep: 1225.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-10 21:39:20,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.81 | bwd_microstep: 1153.61 | bwd_inner_microstep: 1153.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1913
[2024-06-10 21:39:21,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.02 | bwd_microstep: 717.60 | bwd_inner_microstep: 717.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2097
[2024-06-10 21:39:22,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.03 | bwd_microstep: 770.74 | bwd_inner_microstep: 770.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 21:39:24,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1378.59 | bwd_inner_microstep: 1378.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3524
[2024-06-10 21:39:26,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.43 | bwd_microstep: 1520.75 | bwd_inner_microstep: 1520.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 21:39:28,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.30 | bwd_microstep: 1285.41 | bwd_inner_microstep: 1285.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-10 21:39:30,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.57 | bwd_microstep: 1709.48 | bwd_inner_microstep: 1709.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917
[2024-06-10 21:39:31,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.08 | bwd_microstep: 779.38 | bwd_inner_microstep: 779.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 21:39:33,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.01 | bwd_microstep: 1432.07 | bwd_inner_microstep: 1432.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3072
[2024-06-10 21:39:35,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.76 | bwd_microstep: 1177.26 | bwd_inner_microstep: 1177.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692
[2024-06-10 21:39:37,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 1426.39 | bwd_inner_microstep: 1426.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-10 21:39:39,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.89 | bwd_microstep: 1606.99 | bwd_inner_microstep: 1606.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-10 21:39:41,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.95 | bwd_microstep: 1182.10 | bwd_inner_microstep: 1182.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 617
[2024-06-10 21:39:41,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.21 | bwd_microstep: 261.23 | bwd_inner_microstep: 261.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 21:39:43,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.40 | bwd_microstep: 1500.19 | bwd_inner_microstep: 1500.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483
[2024-06-10 21:39:45,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1219.73 | bwd_inner_microstep: 1219.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768
[2024-06-10 21:39:47,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.72 | bwd_microstep: 1449.21 | bwd_inner_microstep: 1449.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-10 21:39:49,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.91 | bwd_microstep: 1609.41 | bwd_inner_microstep: 1609.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-10 21:39:51,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.60 | bwd_microstep: 1497.43 | bwd_inner_microstep: 1497.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044
[2024-06-10 21:39:57,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-10 21:39:57,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.85 | bwd_microstep: 5314.25 | bwd_inner_microstep: 1038.19 | bwd_allreduce_microstep: 4276.01 | step_microstep: 37.96
[2024-06-10 21:39:57,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15266.18 | bwd: 45091.50 | bwd_inner: 40814.57 | bwd_allreduce: 4276.24 | step: 39.44
 70%|███████   | 1211/1726 [20:57:24<8:53:39, 62.17s/it]
 70%|███████   | 1212/1726 [20:58:26<8:50:18, 61.90s/it]
 70%|███████   | 1213/1726 [20:59:29<8:52:47, 62.31s/it]
 70%|███████   | 1214/1726 [21:00:32<8:54:27, 62.63s/it]
 70%|███████   | 1215/1726 [21:01:33<8:48:11, 62.02s/it]
 70%|███████   | 1216/1726 [21:02:34<8:43:45, 61.6
{'loss': 1.2051, 'learning_rate': 8.482397753815872e-06, 'epoch': 0.7}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 21:39:59,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.42 | bwd_microstep: 1330.34 | bwd_inner_microstep: 1330.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 21:40:01,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.77 | bwd_microstep: 1473.11 | bwd_inner_microstep: 1473.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 21:40:03,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.81 | bwd_microstep: 1403.30 | bwd_inner_microstep: 1403.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3864
[2024-06-10 21:40:05,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.37 | bwd_microstep: 1660.94 | bwd_inner_microstep: 1660.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 21:40:07,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.38 | bwd_microstep: 1276.67 | bwd_inner_microstep: 1276.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 21:40:09,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.71 | bwd_microstep: 1651.13 | bwd_inner_microstep: 1651.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-10 21:40:11,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1250.03 | bwd_inner_microstep: 1250.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-10 21:40:13,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.74 | bwd_microstep: 1413.19 | bwd_inner_microstep: 1413.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 21:40:14,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1248.77 | bwd_inner_microstep: 1248.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3481
[2024-06-10 21:40:16,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1429.73 | bwd_inner_microstep: 1429.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408
[2024-06-10 21:40:18,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.43 | bwd_microstep: 1274.90 | bwd_inner_microstep: 1274.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-10 21:40:20,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1346.02 | bwd_inner_microstep: 1346.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 21:40:22,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.92 | bwd_microstep: 1520.84 | bwd_inner_microstep: 1520.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2446
[2024-06-10 21:40:24,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.91 | bwd_microstep: 1132.57 | bwd_inner_microstep: 1132.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-10 21:40:26,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.42 | bwd_microstep: 1508.56 | bwd_inner_microstep: 1508.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-10 21:40:28,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.30 | bwd_microstep: 1522.02 | bwd_inner_microstep: 1521.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2402
[2024-06-10 21:40:29,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.70 | bwd_microstep: 965.17 | bwd_inner_microstep: 965.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2115
[2024-06-10 21:40:30,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.84 | bwd_microstep: 735.56 | bwd_inner_microstep: 735.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988
[2024-06-10 21:40:31,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.23 | bwd_microstep: 827.24 | bwd_inner_microstep: 827.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-10 21:40:33,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.22 | bwd_microstep: 1314.07 | bwd_inner_microstep: 1314.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 21:40:35,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.89 | bwd_microstep: 1520.47 | bwd_inner_microstep: 1520.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 21:40:37,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1398.50 | bwd_inner_microstep: 1398.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 21:40:39,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.81 | bwd_microstep: 1452.51 | bwd_inner_microstep: 1452.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 21:40:41,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.17 | bwd_microstep: 1557.76 | bwd_inner_microstep: 1557.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-10 21:40:43,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.85 | bwd_microstep: 1539.59 | bwd_inner_microstep: 1539.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1262
[2024-06-10 21:40:44,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.49 | bwd_microstep: 455.04 | bwd_inner_microstep: 455.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 21:40:46,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1254.32 | bwd_inner_microstep: 1254.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1944
[2024-06-10 21:40:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.37 | bwd_microstep: 727.27 | bwd_inner_microstep: 727.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3813
[2024-06-10 21:40:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.93 | bwd_microstep: 1699.52 | bwd_inner_microstep: 1699.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-10 21:40:51,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.95 | bwd_microstep: 1444.78 | bwd_inner_microstep: 1444.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 21:40:53,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.40 | bwd_microstep: 1543.56 | bwd_inner_microstep: 1543.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 21:40:59,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 21:40:59,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.78 | bwd_microstep: 5584.40 | bwd_inner_microstep: 2014.54 | bwd_allreduce_microstep: 3569.81 | step_microstep: 38.24
[2024-06-10 21:40:59,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15864.09 | bwd: 46461.88 | bwd_inner: 42891.17 | bwd_allreduce: 3570.04 | step: 39.73
{'loss': 1.1925, 'learning_rate': 8.451732758016322e-06, 'epoch': 0.71}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 21:41:01,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1371.50 | bwd_inner_microstep: 1371.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869
[2024-06-10 21:41:02,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.27 | bwd_microstep: 707.52 | bwd_inner_microstep: 707.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459
[2024-06-10 21:41:04,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.76 | bwd_microstep: 1435.34 | bwd_inner_microstep: 1435.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-10 21:41:06,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.53 | bwd_microstep: 1340.37 | bwd_inner_microstep: 1340.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 21:41:08,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.16 | bwd_microstep: 1242.94 | bwd_inner_microstep: 1242.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 21:41:10,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.97 | bwd_microstep: 1386.51 | bwd_inner_microstep: 1386.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 21:41:12,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.38 | bwd_microstep: 1387.06 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-10 21:41:13,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 701.60 | bwd_inner_microstep: 701.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 21:41:15,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1377.13 | bwd_inner_microstep: 1377.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3724
[2024-06-10 21:41:17,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 1594.94 | bwd_inner_microstep: 1594.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1980
[2024-06-10 21:41:18,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.32 | bwd_microstep: 896.44 | bwd_inner_microstep: 896.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 21:41:20,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1493.98 | bwd_inner_microstep: 1493.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3503
[2024-06-10 21:41:22,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.71 | bwd_microstep: 1443.40 | bwd_inner_microstep: 1443.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1975
[2024-06-10 21:41:23,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.55 | bwd_microstep: 858.46 | bwd_inner_microstep: 858.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661
[2024-06-10 21:41:25,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.03 | bwd_microstep: 1566.66 | bwd_inner_microstep: 1566.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 21:41:27,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1392.40 | bwd_inner_microstep: 1392.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-10 21:41:29,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1286.23 | bwd_inner_microstep: 1286.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3675
[2024-06-10 21:41:31,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.29 | bwd_microstep: 1259.47 | bwd_inner_microstep: 1259.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2679
[2024-06-10 21:41:32,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.16 | bwd_microstep: 1118.52 | bwd_inner_microstep: 1118.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 21:41:34,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.29 | bwd_microstep: 1287.19 | bwd_inner_microstep: 1287.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 21:41:36,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.65 | bwd_microstep: 1352.56 | bwd_inner_microstep: 1352.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2114
[2024-06-10 21:41:37,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.51 | bwd_microstep: 827.18 | bwd_inner_microstep: 827.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 21:41:39,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.61 | bwd_microstep: 1507.93 | bwd_inner_microstep: 1507.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1912
[2024-06-10 21:41:40,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 716.14 | bwd_inner_microstep: 716.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 21:41:42,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1404.41 | bwd_inner_microstep: 1404.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535
[2024-06-10 21:41:44,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.45 | bwd_microstep: 1423.67 | bwd_inner_microstep: 1423.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 21:41:46,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.61 | bwd_microstep: 1507.55 | bwd_inner_microstep: 1507.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2032
[2024-06-10 21:41:47,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.50 | bwd_microstep: 714.61 | bwd_inner_microstep: 714.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-10 21:41:49,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1398.86 | bwd_inner_microstep: 1398.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-10 21:41:51,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.89 | bwd_microstep: 973.09 | bwd_inner_microstep: 973.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3806
[2024-06-10 21:41:53,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.15 | bwd_microstep: 1412.09 | bwd_inner_microstep: 1412.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3636
[2024-06-10 21:42:02,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-10 21:42:02,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 8732.98 | bwd_inner_microstep: 1740.27 | bwd_allreduce_microstep: 6992.66 | step_microstep: 37.76
[2024-06-10 21:42:02,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14971.86 | bwd: 47118.75 | bwd_inner: 40125.19 | bwd_allreduce: 6992.89 | step: 39.20
{'loss': 1.1659, 'learning_rate': 8.421108435153964e-06, 'epoch': 0.71}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 21:42:04,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.73 | bwd_microstep: 1465.12 | bwd_inner_microstep: 1465.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3865
[2024-06-10 21:42:06,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1361.98 | bwd_inner_microstep: 1361.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 21:42:08,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.27 | bwd_microstep: 1552.18 | bwd_inner_microstep: 1552.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 21:42:10,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-10 21:42:12,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1430.85 | bwd_inner_microstep: 1430.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899
[2024-06-10 21:42:13,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.99 | bwd_microstep: 777.37 | bwd_inner_microstep: 777.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732
[2024-06-10 21:42:15,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.14 | bwd_microstep: 1532.52 | bwd_inner_microstep: 1532.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 21:42:17,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1386.91 | bwd_inner_microstep: 1386.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-10 21:42:19,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1413.17 | bwd_inner_microstep: 1413.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751
[2024-06-10 21:42:21,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.35 | bwd_microstep: 1640.04 | bwd_inner_microstep: 1640.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3546
[2024-06-10 21:42:23,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.18 | bwd_microstep: 1425.67 | bwd_inner_microstep: 1425.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 21:42:25,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1389.90 | bwd_inner_microstep: 1389.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-10 21:42:27,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.19 | bwd_microstep: 1525.17 | bwd_inner_microstep: 1525.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 21:42:29,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.36 | bwd_microstep: 1487.61 | bwd_inner_microstep: 1487.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 21:42:31,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.83 | bwd_microstep: 1450.64 | bwd_inner_microstep: 1450.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665
[2024-06-10 21:42:34,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.88 | bwd_microstep: 1720.26 | bwd_inner_microstep: 1720.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-10 21:42:36,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.83 | bwd_microstep: 1435.94 | bwd_inner_microstep: 1435.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3013
[2024-06-10 21:42:37,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.34 | bwd_microstep: 1275.02 | bwd_inner_microstep: 1274.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982
[2024-06-10 21:42:38,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.27 | bwd_microstep: 827.02 | bwd_inner_microstep: 826.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 21:42:40,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1378.40 | bwd_inner_microstep: 1378.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 620
[2024-06-10 21:42:41,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.29 | bwd_microstep: 261.94 | bwd_inner_microstep: 261.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2171
[2024-06-10 21:42:42,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.53 | bwd_microstep: 760.66 | bwd_inner_microstep: 760.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 21:42:44,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 21:42:46,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.95 | bwd_microstep: 1664.06 | bwd_inner_microstep: 1664.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3599
[2024-06-10 21:42:48,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1339.93 | bwd_inner_microstep: 1339.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638
[2024-06-10 21:42:50,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.16 | bwd_microstep: 1419.63 | bwd_inner_microstep: 1419.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 21:42:51,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.13 | bwd_microstep: 1180.78 | bwd_inner_microstep: 1180.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834
[2024-06-10 21:42:54,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.58 | bwd_microstep: 1758.13 | bwd_inner_microstep: 1758.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 21:42:56,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.12 | bwd_microstep: 1492.62 | bwd_inner_microstep: 1492.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 21:42:58,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.32 | bwd_microstep: 1551.08 | bwd_inner_microstep: 1551.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 21:43:00,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.84 | bwd_microstep: 1547.24 | bwd_inner_microstep: 1547.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3806
[2024-06-10 21:43:03,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.01 | optimizer_step: 6.60
[2024-06-10 21:43:03,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1925.72 | bwd_inner_microstep: 1571.11 | bwd_allreduce_microstep: 354.55 | step_microstep: 37.40
[2024-06-10 21:43:03,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16309.34 | bwd: 44138.97 | bwd_inner: 43783.52 | bwd_allreduce: 354.78 | step: 38.84
{'loss': 1.1002, 'learning_rate': 8.390524893087505e-06, 'epoch': 0.71}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3470
[2024-06-10 21:43:05,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.29 | bwd_microstep: 1565.40 | bwd_inner_microstep: 1565.31 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 21:43:07,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1352.67 | bwd_inner_microstep: 1352.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 21:43:09,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.53 | bwd_microstep: 1373.85 | bwd_inner_microstep: 1373.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 21:43:10,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1280.68 | bwd_inner_microstep: 1280.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 21:43:12,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.18 | bwd_microstep: 1390.81 | bwd_inner_microstep: 1390.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 21:43:14,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.98 | bwd_microstep: 1354.87 | bwd_inner_microstep: 1354.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3614
[2024-06-10 21:43:16,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1373.12 | bwd_inner_microstep: 1373.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2919
[2024-06-10 21:43:18,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.45 | bwd_microstep: 1091.02 | bwd_inner_microstep: 1091.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 21:43:19,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.08 | bwd_microstep: 793.15 | bwd_inner_microstep: 793.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408
[2024-06-10 21:43:20,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.04 | bwd_microstep: 1181.44 | bwd_inner_microstep: 1181.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 21:43:22,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1380.32 | bwd_inner_microstep: 1380.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3499
[2024-06-10 21:43:24,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.92 | bwd_microstep: 1316.05 | bwd_inner_microstep: 1316.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 21:43:26,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1344.20 | bwd_inner_microstep: 1344.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2608
[2024-06-10 21:43:27,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.34 | bwd_microstep: 1044.34 | bwd_inner_microstep: 1044.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3415
[2024-06-10 21:43:29,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.22 | bwd_microstep: 1539.05 | bwd_inner_microstep: 1539.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-10 21:43:31,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.28 | bwd_microstep: 1337.30 | bwd_inner_microstep: 1337.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915
[2024-06-10 21:43:34,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.13 | bwd_microstep: 1592.83 | bwd_inner_microstep: 1592.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2089
[2024-06-10 21:43:35,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.77 | bwd_microstep: 820.49 | bwd_inner_microstep: 820.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069
[2024-06-10 21:43:36,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.92 | bwd_microstep: 817.41 | bwd_inner_microstep: 817.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657
[2024-06-10 21:43:38,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.13 | bwd_microstep: 1418.86 | bwd_inner_microstep: 1418.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2074
[2024-06-10 21:43:39,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.64 | bwd_microstep: 823.98 | bwd_inner_microstep: 823.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637
[2024-06-10 21:43:41,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.21 | bwd_microstep: 1602.42 | bwd_inner_microstep: 1602.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167
[2024-06-10 21:43:42,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.60 | bwd_microstep: 852.83 | bwd_inner_microstep: 852.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928
[2024-06-10 21:43:43,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.70 | bwd_microstep: 697.09 | bwd_inner_microstep: 697.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3773
[2024-06-10 21:43:45,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.41 | bwd_microstep: 1250.33 | bwd_inner_microstep: 1250.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-10 21:43:47,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1295.08 | bwd_inner_microstep: 1295.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-10 21:43:49,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.25 | bwd_microstep: 1553.66 | bwd_inner_microstep: 1553.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3829
[2024-06-10 21:43:51,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1389.32 | bwd_inner_microstep: 1389.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 21:43:53,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1348.97 | bwd_inner_microstep: 1348.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037
[2024-06-10 21:43:54,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.82 | bwd_microstep: 810.73 | bwd_inner_microstep: 810.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 21:43:56,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.55 | bwd_microstep: 1349.67 | bwd_inner_microstep: 1349.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3601
[2024-06-10 21:44:04,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.27 | optimizer_step: 6.63
[2024-06-10 21:44:04,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.96 | bwd_microstep: 7727.74 | bwd_inner_microstep: 1769.39 | bwd_allreduce_microstep: 5958.30 | step_microstep: 38.54
[2024-06-10 21:44:04,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14976.38 | bwd: 46069.69 | bwd_inner: 40110.41 | bwd_allreduce: 5958.58 | step: 40.08
{'loss': 1.1271, 'learning_rate': 8.359982239532016e-06, 'epoch': 0.71}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 21:44:06,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1242.46 | bwd_inner_microstep: 1242.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3966
[2024-06-10 21:44:08,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.23 | bwd_microstep: 1692.91 | bwd_inner_microstep: 1692.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3843
[2024-06-10 21:44:10,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.34 | bwd_microstep: 1586.12 | bwd_inner_microstep: 1586.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2267
[2024-06-10 21:44:12,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.09 | bwd_microstep: 968.75 | bwd_inner_microstep: 968.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 21:44:13,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1280.83 | bwd_inner_microstep: 1280.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485
[2024-06-10 21:44:15,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.49 | bwd_microstep: 1348.13 | bwd_inner_microstep: 1348.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 21:44:17,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1247.13 | bwd_inner_microstep: 1247.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 21:44:19,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1251.42 | bwd_inner_microstep: 1251.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3691
[2024-06-10 21:44:21,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.12 | bwd_microstep: 1587.39 | bwd_inner_microstep: 1587.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 21:44:23,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1352.12 | bwd_inner_microstep: 1352.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3695
[2024-06-10 21:44:25,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.57 | bwd_microstep: 1659.36 | bwd_inner_microstep: 1659.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 21:44:27,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.08 | bwd_microstep: 1379.13 | bwd_inner_microstep: 1379.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3445
[2024-06-10 21:44:29,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1301.03 | bwd_inner_microstep: 1301.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603
[2024-06-10 21:44:31,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.98 | bwd_microstep: 1467.38 | bwd_inner_microstep: 1467.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 21:44:33,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.89 | bwd_microstep: 1371.14 | bwd_inner_microstep: 1371.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 21:44:34,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1286.72 | bwd_inner_microstep: 1286.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2481
[2024-06-10 21:44:36,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.20 | bwd_microstep: 1002.63 | bwd_inner_microstep: 1002.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628
[2024-06-10 21:44:38,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1441.95 | bwd_inner_microstep: 1441.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3651
[2024-06-10 21:44:40,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.87 | bwd_microstep: 1465.69 | bwd_inner_microstep: 1465.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-10 21:44:42,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.70 | bwd_microstep: 1611.36 | bwd_inner_microstep: 1611.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-10 21:44:44,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.69 | bwd_microstep: 1306.19 | bwd_inner_microstep: 1306.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 21:44:46,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.28 | bwd_microstep: 1555.25 | bwd_inner_microstep: 1555.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-10 21:44:48,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.18 | bwd_microstep: 1404.98 | bwd_inner_microstep: 1404.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 21:44:50,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.66 | bwd_microstep: 1353.61 | bwd_inner_microstep: 1353.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 21:44:52,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1609.11 | bwd_inner_microstep: 1609.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 21:44:54,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1253.89 | bwd_inner_microstep: 1253.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-10 21:44:56,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.95 | bwd_microstep: 1535.64 | bwd_inner_microstep: 1535.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3435
[2024-06-10 21:44:58,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.43 | bwd_microstep: 1406.14 | bwd_inner_microstep: 1406.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045
[2024-06-10 21:44:59,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.74 | bwd_microstep: 906.40 | bwd_inner_microstep: 906.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 21:45:01,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1500.44 | bwd_inner_microstep: 1500.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 21:45:03,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.91 | bwd_microstep: 1508.20 | bwd_inner_microstep: 1508.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 21:45:06,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.13 | optimizer_step: 6.60
[2024-06-10 21:45:06,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.02 | bwd_microstep: 2171.30 | bwd_inner_microstep: 1508.55 | bwd_allreduce_microstep: 662.70 | step_microstep: 37.63
[2024-06-10 21:45:06,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16507.99 | bwd: 45054.85 | bwd_inner: 44391.25 | bwd_allreduce: 662.93 | step: 39.11
 70%|███████   | 1216/1726 [21:02:34<8:43:45, 61.62s/it]
 71%|███████   | 1217/1726 [21:03:36<8:45:22, 61.93s/it]
 71%|███████   | 1218/1726 [21:04:39<8:45:34, 62.08s/it]
 71%|███████   | 1219/1726 [21:05:39<8:41:15, 61.69s/it]
 71%|███████   | 1220/1726 [21:06:41<8:39:26, 61.59s/it]
 71%|███████   | 1221/1726 [21:07:43<8:39:10, 61.69s/it]
{'loss': 1.2058, 'learning_rate': 8.329480582058574e-06, 'epoch': 0.71}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510
[2024-06-10 21:45:08,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.05 | bwd_microstep: 1218.80 | bwd_inner_microstep: 1218.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477
[2024-06-10 21:45:10,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.83 | bwd_microstep: 1344.26 | bwd_inner_microstep: 1344.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-10 21:45:11,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1397.21 | bwd_inner_microstep: 1397.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-10 21:45:14,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.10 | bwd_microstep: 1645.41 | bwd_inner_microstep: 1645.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 21:45:15,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1245.82 | bwd_inner_microstep: 1245.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 21:45:17,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1279.58 | bwd_inner_microstep: 1279.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 21:45:19,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1246.69 | bwd_inner_microstep: 1246.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 21:45:21,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1246.36 | bwd_inner_microstep: 1246.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1018
[2024-06-10 21:45:21,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 163.64 | bwd_microstep: 427.81 | bwd_inner_microstep: 427.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 21:45:22,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 790.60 | bwd_inner_microstep: 790.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 21:45:24,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.22 | bwd_microstep: 1284.75 | bwd_inner_microstep: 1284.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 21:45:26,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.45 | bwd_microstep: 1279.83 | bwd_inner_microstep: 1279.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 21:45:28,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.11 | bwd_microstep: 1480.24 | bwd_inner_microstep: 1480.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 21:45:30,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1492.03 | bwd_inner_microstep: 1492.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3628
[2024-06-10 21:45:32,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.33 | bwd_microstep: 1644.80 | bwd_inner_microstep: 1644.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 21:45:34,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.17 | bwd_microstep: 1377.66 | bwd_inner_microstep: 1377.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 21:45:36,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.92 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 21:45:38,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1413.58 | bwd_inner_microstep: 1413.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 21:45:40,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1394.90 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3539
[2024-06-10 21:45:42,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1325.72 | bwd_inner_microstep: 1325.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3442
[2024-06-10 21:45:43,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.88 | bwd_microstep: 1181.65 | bwd_inner_microstep: 1181.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2206
[2024-06-10 21:45:44,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.20 | bwd_microstep: 767.09 | bwd_inner_microstep: 767.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1947
[2024-06-10 21:45:45,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.43 | bwd_microstep: 761.69 | bwd_inner_microstep: 761.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 21:45:47,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1413.62 | bwd_inner_microstep: 1413.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3423
[2024-06-10 21:45:49,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1300.29 | bwd_inner_microstep: 1300.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2185
[2024-06-10 21:45:50,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.89 | bwd_microstep: 889.88 | bwd_inner_microstep: 889.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3564
[2024-06-10 21:45:52,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.54 | bwd_microstep: 1360.72 | bwd_inner_microstep: 1360.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3387
[2024-06-10 21:45:54,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.19 | bwd_microstep: 1438.13 | bwd_inner_microstep: 1438.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2277
[2024-06-10 21:45:55,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.60 | bwd_microstep: 828.19 | bwd_inner_microstep: 828.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3540
[2024-06-10 21:45:57,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.44 | bwd_microstep: 1448.84 | bwd_inner_microstep: 1448.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780
[2024-06-10 21:45:59,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.08 | bwd_microstep: 1444.70 | bwd_inner_microstep: 1444.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2055
[2024-06-10 21:46:08,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.29 | optimizer_step: 6.62
[2024-06-10 21:46:08,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.45 | bwd_microstep: 8414.19 | bwd_inner_microstep: 974.90 | bwd_allreduce_microstep: 7439.23 | step_microstep: 38.34
[2024-06-10 21:46:08,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14869.54 | bwd: 47071.54 | bwd_inner: 39631.40 | bwd_allreduce: 7439.46 | step: 39.78
{'loss': 1.2015, 'learning_rate': 8.299020028093844e-06, 'epoch': 0.71}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-10 21:46:10,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1399.84 | bwd_inner_microstep: 1399.71 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3981
[2024-06-10 21:46:12,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.41 | bwd_microstep: 1703.05 | bwd_inner_microstep: 1703.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4223
[2024-06-10 21:46:15,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.02 | bwd_microstep: 1654.14 | bwd_inner_microstep: 1654.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3811
[2024-06-10 21:46:17,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.91 | bwd_microstep: 1498.29 | bwd_inner_microstep: 1498.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 21:46:19,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1382.58 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3740
[2024-06-10 21:46:21,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.15 | bwd_microstep: 1331.06 | bwd_inner_microstep: 1331.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 21:46:22,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486
[2024-06-10 21:46:23,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.89 | bwd_microstep: 1186.48 | bwd_inner_microstep: 1186.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 21:46:25,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1345.08 | bwd_inner_microstep: 1345.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-10 21:46:27,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1250.32 | bwd_inner_microstep: 1250.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 21:46:29,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1245.63 | bwd_inner_microstep: 1245.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 21:46:31,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.05 | bwd_microstep: 1336.96 | bwd_inner_microstep: 1336.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-10 21:46:32,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.94 | bwd_microstep: 1441.56 | bwd_inner_microstep: 1441.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659
[2024-06-10 21:46:34,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.97 | bwd_microstep: 1438.88 | bwd_inner_microstep: 1438.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3420
[2024-06-10 21:46:37,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.66 | bwd_microstep: 1536.56 | bwd_inner_microstep: 1536.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-10 21:46:39,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.76 | bwd_microstep: 1501.62 | bwd_inner_microstep: 1501.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 21:46:41,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.01 | bwd_microstep: 1487.07 | bwd_inner_microstep: 1487.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2111
[2024-06-10 21:46:42,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.50 | bwd_microstep: 1016.14 | bwd_inner_microstep: 1016.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 21:46:44,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1287.40 | bwd_inner_microstep: 1287.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297
[2024-06-10 21:46:45,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.90 | bwd_microstep: 976.06 | bwd_inner_microstep: 976.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2170
[2024-06-10 21:46:46,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.24 | bwd_microstep: 885.95 | bwd_inner_microstep: 885.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 21:46:48,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.47 | bwd_microstep: 972.49 | bwd_inner_microstep: 972.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1942
[2024-06-10 21:46:49,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.86 | bwd_microstep: 760.65 | bwd_inner_microstep: 760.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2273
[2024-06-10 21:46:50,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.10 | bwd_microstep: 1067.49 | bwd_inner_microstep: 1067.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 21:46:52,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.62 | bwd_microstep: 1452.92 | bwd_inner_microstep: 1452.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2282
[2024-06-10 21:46:54,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.21 | bwd_microstep: 1006.31 | bwd_inner_microstep: 1006.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800
[2024-06-10 21:46:56,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.42 | bwd_microstep: 1651.58 | bwd_inner_microstep: 1651.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941
[2024-06-10 21:46:57,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.51 | bwd_microstep: 697.87 | bwd_inner_microstep: 697.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 21:46:59,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.52 | bwd_microstep: 1652.27 | bwd_inner_microstep: 1652.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004
[2024-06-10 21:47:00,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.11 | bwd_microstep: 709.73 | bwd_inner_microstep: 709.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-10 21:47:02,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1494.45 | bwd_inner_microstep: 1494.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2238
[2024-06-10 21:47:10,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.12 | optimizer_step: 6.59
[2024-06-10 21:47:10,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.27 | bwd_microstep: 7615.54 | bwd_inner_microstep: 982.37 | bwd_allreduce_microstep: 6633.10 | step_microstep: 38.03
[2024-06-10 21:47:10,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14990.09 | bwd: 46776.48 | bwd_inner: 40142.34 | bwd_allreduce: 6633.40 | step: 39.66
{'loss': 1.2402, 'learning_rate': 8.268600684919765e-06, 'epoch': 0.71}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 21:47:12,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1462.20 | bwd_inner_microstep: 1462.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3971
[2024-06-10 21:47:14,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.44 | bwd_microstep: 1496.86 | bwd_inner_microstep: 1496.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 21:47:16,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.28 | bwd_microstep: 1444.68 | bwd_inner_microstep: 1444.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 21:47:18,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.05 | bwd_microstep: 1287.55 | bwd_inner_microstep: 1287.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 21:47:20,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.42 | bwd_microstep: 1283.16 | bwd_inner_microstep: 1283.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 21:47:22,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.09 | bwd_microstep: 1394.02 | bwd_inner_microstep: 1393.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 21:47:24,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1379.35 | bwd_inner_microstep: 1379.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 21:47:26,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.78 | bwd_microstep: 1281.09 | bwd_inner_microstep: 1281.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3629
[2024-06-10 21:47:28,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1535.08 | bwd_inner_microstep: 1535.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3038
[2024-06-10 21:47:29,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.29 | bwd_microstep: 1203.61 | bwd_inner_microstep: 1203.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.86
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553
[2024-06-10 21:47:31,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.43 | bwd_microstep: 1325.66 | bwd_inner_microstep: 1325.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 21:47:33,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1347.17 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 21:47:35,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1241.74 | bwd_inner_microstep: 1241.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 21:47:37,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.54 | bwd_microstep: 1479.16 | bwd_inner_microstep: 1479.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 21:47:39,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1378.54 | bwd_inner_microstep: 1378.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 21:47:41,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1288.86 | bwd_inner_microstep: 1288.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3647
[2024-06-10 21:47:42,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.81 | bwd_microstep: 1315.21 | bwd_inner_microstep: 1315.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 899
[2024-06-10 21:47:43,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.07 | bwd_microstep: 371.48 | bwd_inner_microstep: 371.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 21:47:45,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.16 | bwd_microstep: 1393.94 | bwd_inner_microstep: 1393.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 21:47:47,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.98 | bwd_microstep: 1653.80 | bwd_inner_microstep: 1653.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 21:47:49,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.33 | bwd_microstep: 1399.34 | bwd_inner_microstep: 1399.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 21:47:51,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.90 | bwd_microstep: 1634.62 | bwd_inner_microstep: 1634.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1974
[2024-06-10 21:47:52,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.33 | bwd_microstep: 704.84 | bwd_inner_microstep: 704.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055
[2024-06-10 21:47:53,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.82 | bwd_microstep: 813.38 | bwd_inner_microstep: 813.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3817
[2024-06-10 21:47:55,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.31 | bwd_microstep: 1474.91 | bwd_inner_microstep: 1474.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3481
[2024-06-10 21:47:57,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.52 | bwd_microstep: 1217.51 | bwd_inner_microstep: 1217.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 21:47:59,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.20 | bwd_microstep: 1380.91 | bwd_inner_microstep: 1380.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 21:48:01,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.52 | bwd_microstep: 1298.37 | bwd_inner_microstep: 1298.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-10 21:48:03,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.70 | bwd_microstep: 1341.92 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3600
[2024-06-10 21:48:05,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.10 | bwd_microstep: 1665.89 | bwd_inner_microstep: 1665.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3818
[2024-06-10 21:48:07,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.65 | bwd_microstep: 1599.40 | bwd_inner_microstep: 1599.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054
[2024-06-10 21:48:09,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.28 | optimizer_step: 6.59
[2024-06-10 21:48:09,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.00 | bwd_microstep: 1964.44 | bwd_inner_microstep: 1041.43 | bwd_allreduce_microstep: 922.96 | step_microstep: 39.16
[2024-06-10 21:48:09,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15783.97 | bwd: 43058.72 | bwd_inner: 42134.85 | bwd_allreduce: 923.19 | step: 42.45
{'loss': 1.2039, 'learning_rate': 8.238222659673071e-06, 'epoch': 0.71}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3464
[2024-06-10 21:48:12,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.73 | bwd_microstep: 1570.26 | bwd_inner_microstep: 1570.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3928
[2024-06-10 21:48:14,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1493.37 | bwd_inner_microstep: 1493.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2411
[2024-06-10 21:48:15,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.65 | bwd_microstep: 1001.68 | bwd_inner_microstep: 1001.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4267
[2024-06-10 21:48:18,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.40 | bwd_microstep: 1765.40 | bwd_inner_microstep: 1765.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-10 21:48:20,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.06 | bwd_microstep: 1446.90 | bwd_inner_microstep: 1446.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 21:48:22,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.33 | bwd_microstep: 1540.78 | bwd_inner_microstep: 1540.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741
[2024-06-10 21:48:24,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.52 | bwd_microstep: 1533.84 | bwd_inner_microstep: 1533.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-10 21:48:26,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.78 | bwd_microstep: 1283.90 | bwd_inner_microstep: 1283.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 21:48:27,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.65 | bwd_microstep: 1384.83 | bwd_inner_microstep: 1384.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 21:48:29,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.54 | bwd_microstep: 1384.64 | bwd_inner_microstep: 1384.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-10 21:48:31,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 1427.28 | bwd_inner_microstep: 1427.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-10 21:48:33,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452
[2024-06-10 21:48:35,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.27 | bwd_microstep: 1318.42 | bwd_inner_microstep: 1318.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542
[2024-06-10 21:48:37,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.73 | bwd_microstep: 1454.97 | bwd_inner_microstep: 1454.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-10 21:48:39,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.19 | bwd_microstep: 1311.59 | bwd_inner_microstep: 1311.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660
[2024-06-10 21:48:41,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.83 | bwd_microstep: 1716.98 | bwd_inner_microstep: 1716.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 21:48:43,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1285.31 | bwd_inner_microstep: 1285.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 21:48:45,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.83 | bwd_microstep: 1482.59 | bwd_inner_microstep: 1482.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2089
[2024-06-10 21:48:46,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.46 | bwd_microstep: 854.30 | bwd_inner_microstep: 854.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3962
[2024-06-10 21:48:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.88 | bwd_microstep: 1803.85 | bwd_inner_microstep: 1803.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-10 21:48:51,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.59 | bwd_microstep: 1615.23 | bwd_inner_microstep: 1615.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-10 21:48:53,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.11 | bwd_microstep: 1181.40 | bwd_inner_microstep: 1181.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 21:48:55,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.96 | bwd_microstep: 1429.17 | bwd_inner_microstep: 1429.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-10 21:48:56,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1288.56 | bwd_inner_microstep: 1288.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3585
[2024-06-10 21:48:58,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.88 | bwd_microstep: 1428.06 | bwd_inner_microstep: 1428.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3546
[2024-06-10 21:49:00,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.92 | bwd_microstep: 1567.45 | bwd_inner_microstep: 1567.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 21:49:02,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.11 | bwd_microstep: 1309.42 | bwd_inner_microstep: 1309.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2073
[2024-06-10 21:49:04,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.20 | bwd_microstep: 916.33 | bwd_inner_microstep: 916.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 21:49:06,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.70 | bwd_microstep: 1399.50 | bwd_inner_microstep: 1399.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3449
[2024-06-10 21:49:08,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.36 | bwd_microstep: 1514.04 | bwd_inner_microstep: 1514.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 21:49:10,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.63 | bwd_microstep: 1505.22 | bwd_inner_microstep: 1505.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472
[2024-06-10 21:49:12,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.04 | optimizer_step: 6.62
[2024-06-10 21:49:12,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.93 | bwd_microstep: 1450.26 | bwd_inner_microstep: 1442.61 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.35
[2024-06-10 21:49:12,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16814.27 | bwd: 45063.10 | bwd_inner: 45054.60 | bwd_allreduce: 7.83 | step: 38.83
{'loss': 1.1784, 'learning_rate': 8.207886059345034e-06, 'epoch': 0.71}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422
[2024-06-10 21:49:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1377.29 | bwd_inner_microstep: 1377.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2448
[2024-06-10 21:49:15,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.48 | bwd_microstep: 948.13 | bwd_inner_microstep: 948.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 21:49:17,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.54 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843
[2024-06-10 21:49:19,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.35 | bwd_microstep: 1663.75 | bwd_inner_microstep: 1663.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778
[2024-06-10 21:49:21,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1444.40 | bwd_inner_microstep: 1444.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 21:49:23,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1383.64 | bwd_inner_microstep: 1383.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3771
[2024-06-10 21:49:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1488.19 | bwd_inner_microstep: 1488.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 21:49:27,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1285.87 | bwd_inner_microstep: 1285.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-10 21:49:28,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.54 | bwd_microstep: 1183.99 | bwd_inner_microstep: 1183.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 21:49:30,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.00 | bwd_microstep: 1390.38 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1916
[2024-06-10 21:49:31,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.71 | bwd_microstep: 718.02 | bwd_inner_microstep: 718.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 21:49:32,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.19 | bwd_microstep: 806.24 | bwd_inner_microstep: 806.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511
[2024-06-10 21:49:34,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.50 | bwd_microstep: 1445.14 | bwd_inner_microstep: 1445.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1928
[2024-06-10 21:49:36,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.52 | bwd_microstep: 851.48 | bwd_inner_microstep: 851.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3641
[2024-06-10 21:49:38,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.99 | bwd_microstep: 1551.44 | bwd_inner_microstep: 1551.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 21:49:40,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.03 | bwd_microstep: 1340.02 | bwd_inner_microstep: 1340.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 21:49:42,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.21 | bwd_microstep: 1524.70 | bwd_inner_microstep: 1524.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1897
[2024-06-10 21:49:43,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.61 | bwd_microstep: 777.92 | bwd_inner_microstep: 777.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3850
[2024-06-10 21:49:45,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.33 | bwd_microstep: 1664.61 | bwd_inner_microstep: 1664.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3869
[2024-06-10 21:49:47,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1471.90 | bwd_inner_microstep: 1471.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-10 21:49:49,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1491.81 | bwd_inner_microstep: 1491.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 21:49:51,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.69 | bwd_microstep: 1344.27 | bwd_inner_microstep: 1344.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-10 21:49:53,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.78 | bwd_microstep: 1576.32 | bwd_inner_microstep: 1576.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 21:49:55,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.73 | bwd_microstep: 1348.55 | bwd_inner_microstep: 1348.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 21:49:57,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1247.99 | bwd_inner_microstep: 1247.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2017
[2024-06-10 21:49:58,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.24 | bwd_microstep: 713.96 | bwd_inner_microstep: 713.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012
[2024-06-10 21:49:59,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.00 | bwd_microstep: 803.07 | bwd_inner_microstep: 803.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 21:50:01,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1287.93 | bwd_inner_microstep: 1287.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3599
[2024-06-10 21:50:03,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.93 | bwd_microstep: 1372.06 | bwd_inner_microstep: 1372.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 21:50:04,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1297.46 | bwd_inner_microstep: 1297.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584
[2024-06-10 21:50:07,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.73 | bwd_microstep: 1528.43 | bwd_inner_microstep: 1528.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3771
[2024-06-10 21:50:12,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.34 | optimizer_step: 6.60
[2024-06-10 21:50:12,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.28 | bwd_microstep: 4958.90 | bwd_inner_microstep: 1781.41 | bwd_allreduce_microstep: 3177.43 | step_microstep: 38.56
[2024-06-10 21:50:12,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15438.52 | bwd: 44631.40 | bwd_inner: 41453.04 | bwd_allreduce: 3177.67 | step: 40.07
 71%|███████   | 1221/1726 [21:07:43<8:39:10, 61.69s/it]
 71%|███████   | 1222/1726 [21:08:45<8:39:36, 61.86s/it]
 71%|███████   | 1223/1726 [21:09:47<8:39:11, 61.93s/it]
 71%|███████   | 1224/1726 [21:10:46<8:31:13, 61.10s/it]
 71%|███████   | 1225/1726 [21:11:48<8:33:00, 61.44s/it]
 71%|███████   | 1226/1726 [21:12:49<8:29:22, 61.13s/it]
{'loss': 1.1285, 'learning_rate': 8.177590990780988e-06, 'epoch': 0.71}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 21:50:14,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.94 | bwd_microstep: 1248.43 | bwd_inner_microstep: 1248.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2437
[2024-06-10 21:50:15,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.92 | bwd_microstep: 1011.65 | bwd_inner_microstep: 1011.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2332
[2024-06-10 21:50:17,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.74 | bwd_microstep: 984.16 | bwd_inner_microstep: 984.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 21:50:18,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.84 | bwd_microstep: 1242.82 | bwd_inner_microstep: 1242.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-10 21:50:20,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.86 | bwd_microstep: 1279.93 | bwd_inner_microstep: 1279.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 21:50:22,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.86 | bwd_microstep: 1384.65 | bwd_inner_microstep: 1384.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1987
[2024-06-10 21:50:23,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.88 | bwd_microstep: 737.48 | bwd_inner_microstep: 737.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 21:50:25,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1532.45 | bwd_inner_microstep: 1532.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2098
[2024-06-10 21:50:26,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.53 | bwd_microstep: 852.87 | bwd_inner_microstep: 852.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3674
[2024-06-10 21:50:29,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.14 | bwd_microstep: 1582.72 | bwd_inner_microstep: 1582.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2957
[2024-06-10 21:50:30,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.57 | bwd_microstep: 1192.23 | bwd_inner_microstep: 1192.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3448
[2024-06-10 21:50:32,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.22 | bwd_microstep: 1548.73 | bwd_inner_microstep: 1548.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1950
[2024-06-10 21:50:34,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.82 | bwd_microstep: 885.07 | bwd_inner_microstep: 885.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 21:50:35,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1348.90 | bwd_inner_microstep: 1348.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2731
[2024-06-10 21:50:37,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.09 | bwd_microstep: 1232.31 | bwd_inner_microstep: 1232.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-10 21:50:39,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1184.82 | bwd_inner_microstep: 1184.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 21:50:40,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.91 | bwd_microstep: 1156.21 | bwd_inner_microstep: 1156.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521
[2024-06-10 21:50:42,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1454.70 | bwd_inner_microstep: 1454.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 21:50:44,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1508.42 | bwd_inner_microstep: 1508.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3658
[2024-06-10 21:50:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.99 | bwd_microstep: 1450.17 | bwd_inner_microstep: 1450.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3449
[2024-06-10 21:50:48,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1287.87 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991
[2024-06-10 21:50:49,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.72 | bwd_microstep: 802.20 | bwd_inner_microstep: 802.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 21:50:51,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1285.40 | bwd_inner_microstep: 1285.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 21:50:53,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.89 | bwd_microstep: 1500.71 | bwd_inner_microstep: 1500.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 21:50:55,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.38 | bwd_microstep: 1308.77 | bwd_inner_microstep: 1308.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3713
[2024-06-10 21:50:57,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.86 | bwd_microstep: 1478.81 | bwd_inner_microstep: 1478.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 21:50:59,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.27 | bwd_microstep: 1543.00 | bwd_inner_microstep: 1542.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 21:51:01,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1634.94 | bwd_inner_microstep: 1634.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 21:51:03,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.95 | bwd_microstep: 1450.58 | bwd_inner_microstep: 1450.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3425
[2024-06-10 21:51:05,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1308.11 | bwd_inner_microstep: 1308.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-10 21:51:07,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.48 | bwd_microstep: 1589.18 | bwd_inner_microstep: 1589.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 21:51:15,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-10 21:51:15,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 6936.90 | bwd_inner_microstep: 1699.79 | bwd_allreduce_microstep: 5237.06 | step_microstep: 37.96
[2024-06-10 21:51:15,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15547.55 | bwd: 46945.22 | bwd_inner: 41707.23 | bwd_allreduce: 5237.29 | step: 39.50
{'loss': 1.1495, 'learning_rate': 8.147337560680022e-06, 'epoch': 0.71}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992
[2024-06-10 21:51:16,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.93 | bwd_microstep: 890.09 | bwd_inner_microstep: 889.95 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-10 21:51:18,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1475.34 | bwd_inner_microstep: 1475.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844
[2024-06-10 21:51:20,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1456.80 | bwd_inner_microstep: 1456.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3431
[2024-06-10 21:51:22,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.68 | bwd_microstep: 1157.11 | bwd_inner_microstep: 1157.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228
[2024-06-10 21:51:23,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.45 | bwd_microstep: 957.52 | bwd_inner_microstep: 957.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 21:51:25,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1388.86 | bwd_inner_microstep: 1388.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884
[2024-06-10 21:51:26,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.98 | bwd_microstep: 679.32 | bwd_inner_microstep: 679.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638
[2024-06-10 21:51:28,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.64 | bwd_microstep: 1313.41 | bwd_inner_microstep: 1313.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 21:51:30,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1388.40 | bwd_inner_microstep: 1388.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 21:51:31,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.44 | bwd_microstep: 1180.25 | bwd_inner_microstep: 1180.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2140
[2024-06-10 21:51:33,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.77 | bwd_microstep: 891.53 | bwd_inner_microstep: 891.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487
[2024-06-10 21:51:35,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.71 | bwd_microstep: 1576.72 | bwd_inner_microstep: 1576.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 21:51:37,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1391.90 | bwd_inner_microstep: 1391.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463
[2024-06-10 21:51:39,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1404.65 | bwd_inner_microstep: 1404.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 21:51:41,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1382.51 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3559
[2024-06-10 21:51:43,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.63 | bwd_microstep: 1691.99 | bwd_inner_microstep: 1691.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3823
[2024-06-10 21:51:45,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.60 | bwd_microstep: 1749.21 | bwd_inner_microstep: 1749.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 21:51:46,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 799.92 | bwd_inner_microstep: 799.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294
[2024-06-10 21:51:48,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.89 | bwd_microstep: 975.12 | bwd_inner_microstep: 975.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-10 21:51:50,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1408.66 | bwd_inner_microstep: 1408.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 21:51:52,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1494.20 | bwd_inner_microstep: 1494.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 21:51:54,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.23 | bwd_microstep: 1343.13 | bwd_inner_microstep: 1343.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3608
[2024-06-10 21:51:56,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.38 | bwd_microstep: 1641.01 | bwd_inner_microstep: 1640.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 21:51:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.56 | bwd_microstep: 1454.38 | bwd_inner_microstep: 1454.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 21:52:00,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1279.88 | bwd_inner_microstep: 1279.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 21:52:02,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.43 | bwd_microstep: 1495.63 | bwd_inner_microstep: 1495.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540
[2024-06-10 21:52:03,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.80 | bwd_microstep: 1200.94 | bwd_inner_microstep: 1200.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3565
[2024-06-10 21:52:06,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.06 | bwd_microstep: 1558.25 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025
[2024-06-10 21:52:07,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.70 | bwd_microstep: 802.02 | bwd_inner_microstep: 801.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 21:52:09,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.45 | bwd_microstep: 1477.13 | bwd_inner_microstep: 1477.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3765
[2024-06-10 21:52:11,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.68 | bwd_microstep: 1843.68 | bwd_inner_microstep: 1843.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2139
[2024-06-10 21:52:18,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-10 21:52:18,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.10 | bwd_microstep: 6108.03 | bwd_inner_microstep: 982.33 | bwd_allreduce_microstep: 5125.65 | step_microstep: 37.81
[2024-06-10 21:52:18,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15578.73 | bwd: 46857.63 | bwd_inner: 41730.98 | bwd_allreduce: 5125.93 | step: 39.28
{'loss': 1.18, 'learning_rate': 8.11712587559455e-06, 'epoch': 0.71}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3483
[2024-06-10 21:52:20,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.45 | bwd_microstep: 1569.70 | bwd_inner_microstep: 1569.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3947
[2024-06-10 21:52:22,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.11 | bwd_microstep: 1489.02 | bwd_inner_microstep: 1488.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4471
[2024-06-10 21:52:24,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.36 | bwd_microstep: 1824.91 | bwd_inner_microstep: 1824.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-10 21:52:26,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1492.70 | bwd_inner_microstep: 1492.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 21:52:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.93 | bwd_microstep: 1278.67 | bwd_inner_microstep: 1278.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 21:52:30,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.74 | bwd_microstep: 1382.06 | bwd_inner_microstep: 1382.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-10 21:52:32,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.49 | bwd_microstep: 1274.55 | bwd_inner_microstep: 1274.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 21:52:34,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.59 | bwd_microstep: 1473.00 | bwd_inner_microstep: 1472.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 21:52:36,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.23 | bwd_microstep: 1403.02 | bwd_inner_microstep: 1402.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3479
[2024-06-10 21:52:38,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.45 | bwd_microstep: 1291.01 | bwd_inner_microstep: 1290.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 21:52:40,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1376.52 | bwd_inner_microstep: 1376.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497
[2024-06-10 21:52:42,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.48 | bwd_microstep: 1503.18 | bwd_inner_microstep: 1503.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 21:52:44,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1384.21 | bwd_inner_microstep: 1384.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 21:52:45,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1353.46 | bwd_inner_microstep: 1353.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3503
[2024-06-10 21:52:47,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.03 | bwd_microstep: 1462.71 | bwd_inner_microstep: 1462.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 21:52:49,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1377.22 | bwd_inner_microstep: 1377.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 21:52:51,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1510.38 | bwd_inner_microstep: 1510.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 21:52:53,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.43 | bwd_microstep: 1386.87 | bwd_inner_microstep: 1386.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831
[2024-06-10 21:52:55,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.31 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 21:52:57,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1343.98 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-10 21:52:59,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.68 | bwd_microstep: 1622.91 | bwd_inner_microstep: 1622.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 21:53:01,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1394.10 | bwd_inner_microstep: 1394.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-10 21:53:03,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1484.62 | bwd_inner_microstep: 1484.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 21:53:05,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.13 | bwd_microstep: 1551.70 | bwd_inner_microstep: 1551.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2003
[2024-06-10 21:53:07,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.69 | bwd_microstep: 739.10 | bwd_inner_microstep: 739.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829
[2024-06-10 21:53:08,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.73 | bwd_microstep: 1358.04 | bwd_inner_microstep: 1358.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-10 21:53:10,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1443.84 | bwd_inner_microstep: 1443.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-10 21:53:12,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.45 | bwd_microstep: 1440.42 | bwd_inner_microstep: 1440.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2026
[2024-06-10 21:53:14,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.04 | bwd_microstep: 839.17 | bwd_inner_microstep: 839.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2033
[2024-06-10 21:53:15,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.01 | bwd_microstep: 714.15 | bwd_inner_microstep: 714.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 21:53:16,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1408.10 | bwd_inner_microstep: 1408.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3572
[2024-06-10 21:53:21,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.32 | optimizer_step: 6.59
[2024-06-10 21:53:21,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.19 | bwd_microstep: 4266.48 | bwd_inner_microstep: 1757.07 | bwd_allreduce_microstep: 2509.34 | step_microstep: 38.57
[2024-06-10 21:53:21,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16511.06 | bwd: 46825.65 | bwd_inner: 44315.39 | bwd_allreduce: 2509.58 | step: 39.98
{'loss': 1.2666, 'learning_rate': 8.08695604192997e-06, 'epoch': 0.71}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3513
[2024-06-10 21:53:23,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.39 | bwd_microstep: 1340.15 | bwd_inner_microstep: 1340.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930
[2024-06-10 21:53:24,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.31 | bwd_microstep: 789.42 | bwd_inner_microstep: 789.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 21:53:26,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1377.67 | bwd_inner_microstep: 1377.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3877
[2024-06-10 21:53:28,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.49 | bwd_microstep: 1480.37 | bwd_inner_microstep: 1480.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 21:53:30,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 21:53:32,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.08 | bwd_microstep: 1292.31 | bwd_inner_microstep: 1292.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851
[2024-06-10 21:53:34,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1463.22 | bwd_inner_microstep: 1463.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2383
[2024-06-10 21:53:35,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.29 | bwd_microstep: 932.68 | bwd_inner_microstep: 932.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 21:53:37,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1247.71 | bwd_inner_microstep: 1247.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3691
[2024-06-10 21:53:39,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.64 | bwd_microstep: 1327.16 | bwd_inner_microstep: 1327.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2101
[2024-06-10 21:53:40,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.95 | bwd_microstep: 760.32 | bwd_inner_microstep: 760.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 21:53:42,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.02 | bwd_microstep: 1348.97 | bwd_inner_microstep: 1348.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3520
[2024-06-10 21:53:44,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.85 | bwd_microstep: 1461.40 | bwd_inner_microstep: 1461.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-10 21:53:46,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.44 | bwd_microstep: 1447.27 | bwd_inner_microstep: 1447.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1958
[2024-06-10 21:53:47,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.65 | bwd_microstep: 852.51 | bwd_inner_microstep: 852.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 21:53:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1253.97 | bwd_inner_microstep: 1253.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-10 21:53:50,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.81 | bwd_microstep: 798.29 | bwd_inner_microstep: 798.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 21:53:52,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1387.42 | bwd_inner_microstep: 1387.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3600
[2024-06-10 21:53:54,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.23 | bwd_microstep: 1470.61 | bwd_inner_microstep: 1470.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-10 21:53:56,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.20 | bwd_microstep: 1300.52 | bwd_inner_microstep: 1300.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 21:53:58,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.62 | bwd_microstep: 1660.49 | bwd_inner_microstep: 1660.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 21:54:00,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1253.44 | bwd_inner_microstep: 1253.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 21:54:02,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1349.45 | bwd_inner_microstep: 1349.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3703
[2024-06-10 21:54:04,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.95 | bwd_microstep: 1483.06 | bwd_inner_microstep: 1483.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 21:54:05,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 798.95 | bwd_inner_microstep: 798.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 21:54:06,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.85 | bwd_microstep: 885.07 | bwd_inner_microstep: 885.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583
[2024-06-10 21:54:08,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1501.90 | bwd_inner_microstep: 1501.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3612
[2024-06-10 21:54:10,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.69 | bwd_microstep: 1705.28 | bwd_inner_microstep: 1705.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 21:54:12,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.12 | bwd_microstep: 1493.21 | bwd_inner_microstep: 1493.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1900
[2024-06-10 21:54:14,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.79 | bwd_microstep: 777.40 | bwd_inner_microstep: 777.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 21:54:16,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.36 | bwd_microstep: 1496.80 | bwd_inner_microstep: 1496.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3587
[2024-06-10 21:54:23,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-10 21:54:23,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 6900.89 | bwd_inner_microstep: 1543.84 | bwd_allreduce_microstep: 5357.00 | step_microstep: 38.01
[2024-06-10 21:54:23,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15209.21 | bwd: 46127.06 | bwd_inner: 40769.17 | bwd_allreduce: 5357.22 | step: 39.49
{'loss': 1.1472, 'learning_rate': 8.056828165944282e-06, 'epoch': 0.71}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2034
[2024-06-10 21:54:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.25 | bwd_microstep: 893.40 | bwd_inner_microstep: 893.33 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 21:54:26,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.92 | bwd_microstep: 1375.68 | bwd_inner_microstep: 1375.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846
[2024-06-10 21:54:28,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.58 | bwd_microstep: 1455.41 | bwd_inner_microstep: 1455.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 21:54:30,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.85 | bwd_microstep: 1490.67 | bwd_inner_microstep: 1490.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 21:54:32,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.60 | bwd_microstep: 1380.26 | bwd_inner_microstep: 1380.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 21:54:34,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.62 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 21:54:36,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1281.58 | bwd_inner_microstep: 1281.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 21:54:37,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.02 | bwd_microstep: 678.34 | bwd_inner_microstep: 678.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3410
[2024-06-10 21:54:38,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.73 | bwd_microstep: 1197.44 | bwd_inner_microstep: 1197.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2645
[2024-06-10 21:54:40,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.97 | bwd_microstep: 1114.50 | bwd_inner_microstep: 1114.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1878
[2024-06-10 21:54:41,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.94 | bwd_microstep: 832.80 | bwd_inner_microstep: 832.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795
[2024-06-10 21:54:43,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1640.13 | bwd_inner_microstep: 1640.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503
[2024-06-10 21:54:45,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.63 | bwd_microstep: 1407.03 | bwd_inner_microstep: 1407.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 21:54:46,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.82 | bwd_microstep: 792.52 | bwd_inner_microstep: 792.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1389
[2024-06-10 21:54:47,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.86 | bwd_microstep: 525.60 | bwd_inner_microstep: 525.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659
[2024-06-10 21:54:49,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.83 | bwd_microstep: 1456.36 | bwd_inner_microstep: 1456.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3633
[2024-06-10 21:54:51,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.33 | bwd_microstep: 1311.32 | bwd_inner_microstep: 1311.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-10 21:54:53,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.92 | bwd_microstep: 1419.39 | bwd_inner_microstep: 1419.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 21:54:55,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.51 | bwd_microstep: 1286.30 | bwd_inner_microstep: 1286.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 21:54:56,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1255.20 | bwd_inner_microstep: 1255.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3449
[2024-06-10 21:54:58,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.20 | bwd_microstep: 1314.98 | bwd_inner_microstep: 1314.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3711
[2024-06-10 21:55:00,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1332.27 | bwd_inner_microstep: 1332.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 21:55:02,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.03 | bwd_microstep: 1465.37 | bwd_inner_microstep: 1465.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3819
[2024-06-10 21:55:04,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.73 | bwd_microstep: 1414.16 | bwd_inner_microstep: 1414.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2245
[2024-06-10 21:55:05,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.88 | bwd_microstep: 871.96 | bwd_inner_microstep: 871.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 903
[2024-06-10 21:55:06,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 159.67 | bwd_microstep: 404.11 | bwd_inner_microstep: 404.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2060
[2024-06-10 21:55:07,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.65 | bwd_microstep: 911.24 | bwd_inner_microstep: 911.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2041
[2024-06-10 21:55:08,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.66 | bwd_microstep: 933.32 | bwd_inner_microstep: 933.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596
[2024-06-10 21:55:10,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1463.79 | bwd_inner_microstep: 1463.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101
[2024-06-10 21:55:11,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.74 | bwd_microstep: 821.91 | bwd_inner_microstep: 821.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3610
[2024-06-10 21:55:14,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.20 | bwd_microstep: 1772.16 | bwd_inner_microstep: 1772.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 21:55:25,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.50 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 21:55:25,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 10113.09 | bwd_inner_microstep: 1870.93 | bwd_allreduce_microstep: 8242.10 | step_microstep: 39.07
[2024-06-10 21:55:25,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14382.61 | bwd: 46857.01 | bwd_inner: 38613.94 | bwd_allreduce: 8242.37 | step: 40.52
{'loss': 1.1982, 'learning_rate': 8.026742353747698e-06, 'epoch': 0.71}


 71%|███████   | 1226/1726 [21:12:49<8:29:22, 61.13s/it]
 71%|███████   | 1227/1726 [21:13:52<8:32:35, 61.64s/it]
 71%|███████   | 1228/1726 [21:14:54<8:34:22, 61.97s/it]
 71%|███████   | 1229/1726 [21:15:58<8:37:34, 62.48s/it]
 71%|███████▏  | 1230/1726 [21:17:00<8:34:29, 62.24s/it]
 71%|███████▏  | 1231/1726 [21:18:01<8:31:47, 62.04s/it]


dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 21:55:26,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.74 | bwd_microstep: 1368.68 | bwd_inner_microstep: 1368.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 21:55:28,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.24 | bwd_microstep: 1285.91 | bwd_inner_microstep: 1285.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-10 21:55:30,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.01 | bwd_microstep: 1447.25 | bwd_inner_microstep: 1447.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 21:55:32,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.73 | bwd_microstep: 1434.13 | bwd_inner_microstep: 1434.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-10 21:55:34,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.98 | bwd_microstep: 1379.43 | bwd_inner_microstep: 1379.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-10 21:55:36,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.27 | bwd_microstep: 1535.45 | bwd_inner_microstep: 1535.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 21:55:38,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.45 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-10 21:55:40,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 1393.36 | bwd_inner_microstep: 1393.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 21:55:42,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1385.20 | bwd_inner_microstep: 1385.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 21:55:44,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1449.16 | bwd_inner_microstep: 1449.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-10 21:55:46,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.45 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 21:55:47,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.09 | bwd_microstep: 1291.58 | bwd_inner_microstep: 1291.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3645
[2024-06-10 21:55:49,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.09 | bwd_microstep: 1471.83 | bwd_inner_microstep: 1471.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 21:55:51,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.28 | bwd_microstep: 1383.20 | bwd_inner_microstep: 1383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 21:55:53,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.15 | bwd_microstep: 1375.68 | bwd_inner_microstep: 1375.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3631
[2024-06-10 21:55:56,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.66 | bwd_microstep: 1808.10 | bwd_inner_microstep: 1808.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404
[2024-06-10 21:55:57,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.90 | bwd_microstep: 1308.99 | bwd_inner_microstep: 1308.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2135
[2024-06-10 21:55:59,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.00 | bwd_microstep: 928.31 | bwd_inner_microstep: 928.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-10 21:56:01,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.27 | bwd_microstep: 1648.52 | bwd_inner_microstep: 1648.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3667
[2024-06-10 21:56:03,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.95 | bwd_microstep: 1486.31 | bwd_inner_microstep: 1486.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 940
[2024-06-10 21:56:04,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.37 | bwd_microstep: 377.84 | bwd_inner_microstep: 377.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-10 21:56:06,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.62 | bwd_microstep: 1601.51 | bwd_inner_microstep: 1601.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 21:56:08,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.51 | bwd_microstep: 1258.56 | bwd_inner_microstep: 1258.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 21:56:09,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.61 | bwd_microstep: 1286.65 | bwd_inner_microstep: 1286.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 21:56:11,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.33 | bwd_microstep: 1401.85 | bwd_inner_microstep: 1401.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3649
[2024-06-10 21:56:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.27 | bwd_microstep: 1610.98 | bwd_inner_microstep: 1610.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 21:56:15,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1409.72 | bwd_inner_microstep: 1409.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-10 21:56:17,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.01 | bwd_microstep: 1485.62 | bwd_inner_microstep: 1485.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751
[2024-06-10 21:56:19,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.48 | bwd_microstep: 1445.30 | bwd_inner_microstep: 1445.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2233
[2024-06-10 21:56:21,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.75 | bwd_microstep: 835.93 | bwd_inner_microstep: 835.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3815
[2024-06-10 21:56:23,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.97 | bwd_microstep: 1703.75 | bwd_inner_microstep: 1703.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2359
[2024-06-10 21:56:25,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.03 | optimizer_step: 6.58
[2024-06-10 21:56:25,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.74 | bwd_microstep: 2073.34 | bwd_inner_microstep: 1202.97 | bwd_allreduce_microstep: 870.32 | step_microstep: 37.49
[2024-06-10 21:56:25,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16189.44 | bwd: 44347.62 | bwd_inner: 43476.40 | bwd_allreduce: 870.54 | step: 38.99
{'loss': 1.1704, 'learning_rate': 7.996698711302315e-06, 'epoch': 0.71}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535
[2024-06-10 21:56:27,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.40 | bwd_microstep: 1417.68 | bwd_inner_microstep: 1417.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 21:56:29,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.41 | bwd_microstep: 788.78 | bwd_inner_microstep: 788.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 21:56:30,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.10 | bwd_microstep: 1342.84 | bwd_inner_microstep: 1342.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 21:56:32,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1246.41 | bwd_inner_microstep: 1246.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 21:56:34,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1384.08 | bwd_inner_microstep: 1384.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4124
[2024-06-10 21:56:36,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.79 | bwd_microstep: 1440.03 | bwd_inner_microstep: 1440.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 21:56:38,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.16 | bwd_microstep: 1479.94 | bwd_inner_microstep: 1479.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3698
[2024-06-10 21:56:40,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 1456.72 | bwd_inner_microstep: 1456.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907
[2024-06-10 21:56:41,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.18 | bwd_microstep: 775.66 | bwd_inner_microstep: 775.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 21:56:43,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.70 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689
[2024-06-10 21:56:45,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.91 | bwd_microstep: 1616.18 | bwd_inner_microstep: 1616.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3677
[2024-06-10 21:56:47,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.64 | bwd_microstep: 1580.74 | bwd_inner_microstep: 1580.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 21:56:49,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1348.23 | bwd_inner_microstep: 1348.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3390
[2024-06-10 21:56:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.89 | bwd_microstep: 1273.69 | bwd_inner_microstep: 1273.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055
[2024-06-10 21:56:52,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.89 | bwd_microstep: 914.31 | bwd_inner_microstep: 914.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519
[2024-06-10 21:56:54,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.57 | bwd_microstep: 1418.91 | bwd_inner_microstep: 1418.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2128
[2024-06-10 21:56:55,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.78 | bwd_microstep: 831.56 | bwd_inner_microstep: 831.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 21:56:57,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.68 | bwd_microstep: 1285.30 | bwd_inner_microstep: 1285.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 21:56:59,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.28 | bwd_microstep: 1557.51 | bwd_inner_microstep: 1557.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 21:57:01,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.35 | bwd_microstep: 1490.93 | bwd_inner_microstep: 1490.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702
[2024-06-10 21:57:03,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.89 | bwd_microstep: 1327.55 | bwd_inner_microstep: 1327.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-10 21:57:05,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.20 | bwd_microstep: 1431.91 | bwd_inner_microstep: 1431.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 21:57:07,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.59 | bwd_microstep: 1457.29 | bwd_inner_microstep: 1457.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 21:57:08,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.25 | bwd_microstep: 804.93 | bwd_inner_microstep: 804.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3553
[2024-06-10 21:57:10,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.94 | bwd_microstep: 1260.79 | bwd_inner_microstep: 1260.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 21:57:11,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.38 | bwd_microstep: 815.35 | bwd_inner_microstep: 815.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2030
[2024-06-10 21:57:12,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.06 | bwd_microstep: 866.01 | bwd_inner_microstep: 865.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-10 21:57:15,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.29 | bwd_microstep: 1535.48 | bwd_inner_microstep: 1535.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2059
[2024-06-10 21:57:16,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.01 | bwd_microstep: 847.61 | bwd_inner_microstep: 847.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-10 21:57:17,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.21 | bwd_microstep: 974.98 | bwd_inner_microstep: 974.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-10 21:57:19,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.20 | bwd_microstep: 1480.27 | bwd_inner_microstep: 1480.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2034
[2024-06-10 21:57:26,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.10 | optimizer_step: 6.60
[2024-06-10 21:57:26,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.84 | bwd_microstep: 6731.60 | bwd_inner_microstep: 1037.58 | bwd_allreduce_microstep: 5693.98 | step_microstep: 37.86
[2024-06-10 21:57:26,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14882.22 | bwd: 45564.05 | bwd_inner: 39869.17 | bwd_allreduce: 5694.20 | step: 39.30
{'loss': 1.217, 'learning_rate': 7.966697344421658e-06, 'epoch': 0.71}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 21:57:28,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.40 | bwd_microstep: 1269.10 | bwd_inner_microstep: 1269.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 21:57:30,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1343.96 | bwd_inner_microstep: 1343.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 21:57:32,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.42 | bwd_microstep: 1475.54 | bwd_inner_microstep: 1475.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3877
[2024-06-10 21:57:34,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.98 | bwd_microstep: 1545.00 | bwd_inner_microstep: 1544.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 21:57:36,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.73 | bwd_microstep: 1476.87 | bwd_inner_microstep: 1476.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 21:57:37,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.09 | bwd_microstep: 788.74 | bwd_inner_microstep: 788.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740
[2024-06-10 21:57:39,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.54 | bwd_microstep: 1632.54 | bwd_inner_microstep: 1632.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 21:57:41,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.10 | bwd_microstep: 1352.46 | bwd_inner_microstep: 1352.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 21:57:43,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1244.79 | bwd_inner_microstep: 1244.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 21:57:45,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1292.27 | bwd_inner_microstep: 1292.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 21:57:47,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.06 | bwd_microstep: 1277.42 | bwd_inner_microstep: 1277.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 21:57:48,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1388.02 | bwd_inner_microstep: 1387.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-10 21:57:50,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.10 | bwd_microstep: 1314.07 | bwd_inner_microstep: 1314.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2173
[2024-06-10 21:57:52,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.82 | bwd_microstep: 945.65 | bwd_inner_microstep: 945.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863
[2024-06-10 21:57:53,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.54 | bwd_microstep: 673.92 | bwd_inner_microstep: 673.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-10 21:57:54,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.58 | bwd_microstep: 1306.14 | bwd_inner_microstep: 1306.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2173
[2024-06-10 21:57:56,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.91 | bwd_microstep: 852.01 | bwd_inner_microstep: 851.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 21:57:57,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.89 | bwd_microstep: 1253.51 | bwd_inner_microstep: 1253.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3653
[2024-06-10 21:57:59,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.26 | bwd_microstep: 1481.30 | bwd_inner_microstep: 1481.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-10 21:58:01,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.29 | bwd_microstep: 1492.19 | bwd_inner_microstep: 1492.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2678
[2024-06-10 21:58:03,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.45 | bwd_microstep: 1125.34 | bwd_inner_microstep: 1125.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-10 21:58:05,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.76 | bwd_microstep: 1658.80 | bwd_inner_microstep: 1658.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 21:58:07,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.92 | bwd_microstep: 1295.03 | bwd_inner_microstep: 1295.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3579
[2024-06-10 21:58:09,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.40 | bwd_microstep: 1205.82 | bwd_inner_microstep: 1205.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-10 21:58:11,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.00 | bwd_microstep: 1310.03 | bwd_inner_microstep: 1310.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3851
[2024-06-10 21:58:13,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.54 | bwd_microstep: 1600.12 | bwd_inner_microstep: 1600.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952
[2024-06-10 21:58:14,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.08 | bwd_microstep: 729.30 | bwd_inner_microstep: 729.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3737
[2024-06-10 21:58:16,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.97 | bwd_microstep: 1680.75 | bwd_inner_microstep: 1680.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3558
[2024-06-10 21:58:18,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.29 | bwd_microstep: 1471.17 | bwd_inner_microstep: 1471.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 21:58:20,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.16 | bwd_microstep: 1486.46 | bwd_inner_microstep: 1486.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3770
[2024-06-10 21:58:22,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.63 | bwd_microstep: 1673.28 | bwd_inner_microstep: 1673.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630
[2024-06-10 21:58:27,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 21:58:27,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.35 | bwd_microstep: 4247.78 | bwd_inner_microstep: 1816.97 | bwd_allreduce_microstep: 2430.76 | step_microstep: 37.79
[2024-06-10 21:58:27,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15845.78 | bwd: 44889.40 | bwd_inner: 42457.74 | bwd_allreduce: 2430.99 | step: 39.30
{'loss': 1.159, 'learning_rate': 7.936738358770409e-06, 'epoch': 0.71}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3414
[2024-06-10 21:58:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.89 | bwd_microstep: 1365.40 | bwd_inner_microstep: 1365.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3979
[2024-06-10 21:58:32,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.24 | bwd_microstep: 1702.80 | bwd_inner_microstep: 1702.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3936
[2024-06-10 21:58:34,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1491.59 | bwd_inner_microstep: 1491.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 21:58:36,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.61 | bwd_microstep: 1656.18 | bwd_inner_microstep: 1656.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 21:58:38,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1378.07 | bwd_inner_microstep: 1378.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 21:58:39,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.20 | bwd_microstep: 791.27 | bwd_inner_microstep: 791.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 21:58:41,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.76 | bwd_microstep: 1286.85 | bwd_inner_microstep: 1286.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3644
[2024-06-10 21:58:42,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1316.78 | bwd_inner_microstep: 1316.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 21:58:44,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1388.22 | bwd_inner_microstep: 1388.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083
[2024-06-10 21:58:46,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.67 | bwd_microstep: 919.39 | bwd_inner_microstep: 919.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-10 21:58:48,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1525.26 | bwd_inner_microstep: 1525.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2137
[2024-06-10 21:58:49,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.08 | bwd_microstep: 1022.88 | bwd_inner_microstep: 1022.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-10 21:58:51,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.36 | bwd_microstep: 1407.07 | bwd_inner_microstep: 1407.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3513
[2024-06-10 21:58:53,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.28 | bwd_microstep: 1256.39 | bwd_inner_microstep: 1256.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 21:58:54,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 792.28 | bwd_inner_microstep: 792.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 21:58:56,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.04 | bwd_microstep: 1497.65 | bwd_inner_microstep: 1497.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 21:58:58,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.31 | bwd_microstep: 1391.07 | bwd_inner_microstep: 1391.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 21:59:00,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.65 | bwd_microstep: 1656.35 | bwd_inner_microstep: 1656.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-10 21:59:02,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.14 | bwd_microstep: 1427.98 | bwd_inner_microstep: 1427.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999
[2024-06-10 21:59:03,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.43 | bwd_microstep: 707.73 | bwd_inner_microstep: 707.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 21:59:04,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.85 | bwd_microstep: 798.65 | bwd_inner_microstep: 798.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 21:59:06,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1394.49 | bwd_inner_microstep: 1394.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3548
[2024-06-10 21:59:08,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.73 | bwd_microstep: 1457.91 | bwd_inner_microstep: 1457.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-10 21:59:09,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.46 | bwd_microstep: 687.55 | bwd_inner_microstep: 687.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-10 21:59:11,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.20 | bwd_microstep: 1313.64 | bwd_inner_microstep: 1313.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 21:59:12,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.96 | bwd_microstep: 802.75 | bwd_inner_microstep: 802.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3822
[2024-06-10 21:59:14,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.92 | bwd_microstep: 1536.18 | bwd_inner_microstep: 1536.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 21:59:16,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.30 | bwd_microstep: 1406.03 | bwd_inner_microstep: 1406.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 909
[2024-06-10 21:59:17,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.65 | bwd_microstep: 373.73 | bwd_inner_microstep: 373.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2001
[2024-06-10 21:59:18,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.52 | bwd_microstep: 828.98 | bwd_inner_microstep: 828.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 21:59:20,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.01 | bwd_microstep: 1284.11 | bwd_inner_microstep: 1284.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3468
[2024-06-10 21:59:28,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 21:59:28,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.81 | bwd_microstep: 8161.69 | bwd_inner_microstep: 1624.34 | bwd_allreduce_microstep: 6537.30 | step_microstep: 38.75
[2024-06-10 21:59:28,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14733.38 | bwd: 46026.93 | bwd_inner: 39488.72 | bwd_allreduce: 6537.52 | step: 40.16
{'loss': 1.2384, 'learning_rate': 7.90682185986394e-06, 'epoch': 0.72}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3478
[2024-06-10 21:59:30,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1496.74 | bwd_inner_microstep: 1496.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3913
[2024-06-10 21:59:33,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.85 | bwd_microstep: 1682.13 | bwd_inner_microstep: 1682.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-10 21:59:35,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.36 | bwd_microstep: 1579.14 | bwd_inner_microstep: 1579.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 21:59:36,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.02 | bwd_microstep: 789.15 | bwd_inner_microstep: 789.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 21:59:38,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.93 | bwd_microstep: 1337.43 | bwd_inner_microstep: 1337.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-10 21:59:39,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.73 | bwd_microstep: 789.16 | bwd_inner_microstep: 789.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 21:59:41,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1382.85 | bwd_inner_microstep: 1382.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-10 21:59:43,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.53 | bwd_microstep: 1290.03 | bwd_inner_microstep: 1290.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2369
[2024-06-10 21:59:44,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.14 | bwd_microstep: 929.49 | bwd_inner_microstep: 929.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-10 21:59:46,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.92 | bwd_microstep: 1182.11 | bwd_inner_microstep: 1182.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 21:59:48,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.23 | bwd_microstep: 1503.80 | bwd_inner_microstep: 1503.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 21:59:50,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1393.58 | bwd_inner_microstep: 1393.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 21:59:52,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1383.00 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2118
[2024-06-10 21:59:53,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.01 | bwd_microstep: 859.02 | bwd_inner_microstep: 858.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-10 21:59:55,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.18 | bwd_microstep: 1609.14 | bwd_inner_microstep: 1609.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520
[2024-06-10 21:59:57,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.05 | bwd_microstep: 1583.16 | bwd_inner_microstep: 1583.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 21:59:59,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.90 | bwd_microstep: 1383.38 | bwd_inner_microstep: 1383.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3527
[2024-06-10 22:00:01,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.97 | bwd_microstep: 1351.33 | bwd_inner_microstep: 1351.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 22:00:03,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.13 | bwd_microstep: 1280.12 | bwd_inner_microstep: 1280.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 22:00:05,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.08 | bwd_microstep: 1428.18 | bwd_inner_microstep: 1428.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 22:00:07,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1555.50 | bwd_inner_microstep: 1555.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820
[2024-06-10 22:00:09,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1487.41 | bwd_inner_microstep: 1487.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2006
[2024-06-10 22:00:10,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.75 | bwd_microstep: 740.07 | bwd_inner_microstep: 740.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3813
[2024-06-10 22:00:12,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1515.49 | bwd_inner_microstep: 1515.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1896
[2024-06-10 22:00:13,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.74 | bwd_microstep: 746.45 | bwd_inner_microstep: 746.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2057
[2024-06-10 22:00:14,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 819.77 | bwd_inner_microstep: 819.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 22:00:15,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.49 | bwd_microstep: 798.04 | bwd_inner_microstep: 798.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3736
[2024-06-10 22:00:17,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1460.90 | bwd_inner_microstep: 1460.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 22:00:19,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.56 | bwd_microstep: 1498.66 | bwd_inner_microstep: 1498.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 22:00:21,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.56 | bwd_microstep: 1597.82 | bwd_inner_microstep: 1597.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 22:00:23,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1250.19 | bwd_inner_microstep: 1250.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 22:00:31,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-10 22:00:31,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.35 | bwd_microstep: 7572.38 | bwd_inner_microstep: 1742.25 | bwd_allreduce_microstep: 5830.06 | step_microstep: 38.34
[2024-06-10 22:00:31,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15444.22 | bwd: 47275.62 | bwd_inner: 41444.64 | bwd_allreduce: 5830.31 | step: 39.81
{'loss': 1.1897, 'learning_rate': 7.87694795306802e-06, 'epoch': 0.72}
 71%|███████▏  | 1231/1726 [21:18:01<8:31:47, 62.04s/it]
 71%|███████▏  | 1232/1726 [21:19:02<8:27:52, 61.68s/it]
 71%|███████▏  | 1233/1726 [21:20:03<8:24:35, 61.41s/it]
 71%|███████▏  | 1234/1726 [21:21:04<8:22:43, 61.31s/it]
 72%|███████▏  | 1235/1726 [21:22:05<8:21:08, 61.24s/it]
 72%|███████▏  | 1236/1726 [21:23:08<8:24:32, 61.78s/it]
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1886
[2024-06-10 22:00:33,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 793.57 | bwd_inner_microstep: 793.46 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3926
[2024-06-10 22:00:35,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1585.98 | bwd_inner_microstep: 1585.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 22:00:37,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1340.27 | bwd_inner_microstep: 1340.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 22:00:39,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.79 | bwd_microstep: 1475.33 | bwd_inner_microstep: 1475.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 22:00:41,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.21 | bwd_microstep: 1534.13 | bwd_inner_microstep: 1534.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 22:00:43,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1397.92 | bwd_inner_microstep: 1397.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3731
[2024-06-10 22:00:44,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.79 | bwd_microstep: 1267.14 | bwd_inner_microstep: 1267.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 22:00:46,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.19 | bwd_microstep: 1283.98 | bwd_inner_microstep: 1283.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 22:00:47,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.71 | bwd_microstep: 790.57 | bwd_inner_microstep: 790.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 22:00:49,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.97 | bwd_microstep: 1280.03 | bwd_inner_microstep: 1280.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 22:00:51,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.23 | bwd_microstep: 1342.77 | bwd_inner_microstep: 1342.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-10 22:00:53,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1413.24 | bwd_inner_microstep: 1413.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1975
[2024-06-10 22:00:54,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.12 | bwd_microstep: 854.83 | bwd_inner_microstep: 854.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3481
[2024-06-10 22:00:56,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.03 | bwd_microstep: 1540.12 | bwd_inner_microstep: 1540.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3634
[2024-06-10 22:00:58,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.74 | bwd_microstep: 1640.89 | bwd_inner_microstep: 1640.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3551
[2024-06-10 22:01:01,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.01 | bwd_microstep: 1585.93 | bwd_inner_microstep: 1585.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 22:01:03,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1557.55 | bwd_inner_microstep: 1557.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3629
[2024-06-10 22:01:04,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1246.68 | bwd_inner_microstep: 1246.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623
[2024-06-10 22:01:07,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.11 | bwd_microstep: 1610.25 | bwd_inner_microstep: 1610.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 22:01:09,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.70 | bwd_microstep: 1403.87 | bwd_inner_microstep: 1403.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 22:01:11,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1414.52 | bwd_inner_microstep: 1414.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 22:01:13,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.64 | bwd_microstep: 1495.72 | bwd_inner_microstep: 1495.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113
[2024-06-10 22:01:14,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.15 | bwd_microstep: 829.23 | bwd_inner_microstep: 829.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3468
[2024-06-10 22:01:16,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1400.52 | bwd_inner_microstep: 1400.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 22:01:18,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.12 | bwd_microstep: 1487.44 | bwd_inner_microstep: 1487.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3404
[2024-06-10 22:01:20,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.03 | bwd_microstep: 1436.24 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3652
[2024-06-10 22:01:22,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.35 | bwd_microstep: 1325.27 | bwd_inner_microstep: 1325.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 22:01:23,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.58 | bwd_microstep: 1254.23 | bwd_inner_microstep: 1254.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725
[2024-06-10 22:01:25,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1339.74 | bwd_inner_microstep: 1339.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 22:01:27,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | bwd_microstep: 1280.72 | bwd_inner_microstep: 1280.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-10 22:01:29,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1311.38 | bwd_inner_microstep: 1311.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 22:01:33,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-10 22:01:33,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.19 | bwd_microstep: 3225.70 | bwd_inner_microstep: 1443.05 | bwd_allreduce_microstep: 1782.60 | step_microstep: 37.86
[2024-06-10 22:01:33,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16037.12 | bwd: 44745.78 | bwd_inner: 42962.18 | bwd_allreduce: 1782.88 | step: 39.31
{'loss': 1.2072, 'learning_rate': 7.847116743598392e-06, 'epoch': 0.72}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 22:01:35,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1474.06 | bwd_inner_microstep: 1473.88 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4036
[2024-06-10 22:01:37,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.44 | bwd_microstep: 1715.69 | bwd_inner_microstep: 1715.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2395
[2024-06-10 22:01:38,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.43 | bwd_microstep: 904.51 | bwd_inner_microstep: 904.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3833
[2024-06-10 22:01:40,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.29 | bwd_microstep: 1457.77 | bwd_inner_microstep: 1457.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402
[2024-06-10 22:01:42,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.33 | bwd_microstep: 1211.14 | bwd_inner_microstep: 1211.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795
[2024-06-10 22:01:44,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.72 | bwd_microstep: 1446.14 | bwd_inner_microstep: 1446.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4085
[2024-06-10 22:01:46,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.92 | bwd_microstep: 1626.35 | bwd_inner_microstep: 1626.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 22:01:47,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.84 | bwd_microstep: 795.35 | bwd_inner_microstep: 795.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-10 22:01:49,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.71 | bwd_microstep: 1429.52 | bwd_inner_microstep: 1429.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-10 22:01:51,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.56 | bwd_microstep: 1282.76 | bwd_inner_microstep: 1282.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 22:01:53,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.50 | bwd_microstep: 1289.26 | bwd_inner_microstep: 1289.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519
[2024-06-10 22:01:55,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1320.81 | bwd_inner_microstep: 1320.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3493
[2024-06-10 22:01:56,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1350.26 | bwd_inner_microstep: 1350.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3652
[2024-06-10 22:01:58,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.05 | bwd_microstep: 1320.50 | bwd_inner_microstep: 1320.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3677
[2024-06-10 22:02:01,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.77 | bwd_microstep: 1719.38 | bwd_inner_microstep: 1719.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-10 22:02:03,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.10 | bwd_microstep: 1524.61 | bwd_inner_microstep: 1524.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654
[2024-06-10 22:02:05,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.28 | bwd_microstep: 1616.84 | bwd_inner_microstep: 1616.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3828
[2024-06-10 22:02:07,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.11 | bwd_microstep: 1707.69 | bwd_inner_microstep: 1707.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029
[2024-06-10 22:02:08,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.72 | bwd_microstep: 808.04 | bwd_inner_microstep: 808.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 22:02:10,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1399.10 | bwd_inner_microstep: 1399.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 22:02:12,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1410.93 | bwd_inner_microstep: 1410.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-10 22:02:14,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.76 | bwd_microstep: 1487.16 | bwd_inner_microstep: 1487.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014
[2024-06-10 22:02:15,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 802.88 | bwd_inner_microstep: 802.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-10 22:02:18,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1511.36 | bwd_inner_microstep: 1511.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 22:02:20,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.25 | bwd_microstep: 1547.79 | bwd_inner_microstep: 1547.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032
[2024-06-10 22:02:21,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.17 | bwd_microstep: 809.90 | bwd_inner_microstep: 809.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070
[2024-06-10 22:02:22,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.76 | bwd_microstep: 819.40 | bwd_inner_microstep: 819.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 22:02:24,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1508.20 | bwd_inner_microstep: 1508.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 22:02:26,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.06 | bwd_microstep: 1648.53 | bwd_inner_microstep: 1648.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276
[2024-06-10 22:02:28,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.75 | bwd_microstep: 1003.94 | bwd_inner_microstep: 1003.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 22:02:30,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.52 | bwd_microstep: 1504.40 | bwd_inner_microstep: 1504.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3779
[2024-06-10 22:02:35,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-10 22:02:35,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.62 | bwd_microstep: 4503.78 | bwd_inner_microstep: 1987.64 | bwd_allreduce_microstep: 2516.08 | step_microstep: 38.83
[2024-06-10 22:02:35,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16099.62 | bwd: 45958.09 | bwd_inner: 43440.96 | bwd_allreduce: 2516.39 | step: 40.36
{'loss': 1.1691, 'learning_rate': 7.817328336520412e-06, 'epoch': 0.72}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 22:02:37,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.18 | bwd_microstep: 1334.95 | bwd_inner_microstep: 1334.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3870
[2024-06-10 22:02:39,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.75 | bwd_microstep: 1564.43 | bwd_inner_microstep: 1564.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-10 22:02:41,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.28 | bwd_microstep: 1183.18 | bwd_inner_microstep: 1183.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 22:02:43,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.13 | bwd_microstep: 1380.65 | bwd_inner_microstep: 1380.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 22:02:44,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.41 | bwd_microstep: 1340.57 | bwd_inner_microstep: 1340.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 22:02:46,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1391.54 | bwd_inner_microstep: 1391.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 22:02:47,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.67 | bwd_microstep: 790.82 | bwd_inner_microstep: 790.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2756
[2024-06-10 22:02:49,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.28 | bwd_microstep: 1047.34 | bwd_inner_microstep: 1047.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 22:02:51,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1391.48 | bwd_inner_microstep: 1391.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3406
[2024-06-10 22:02:52,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.82 | bwd_microstep: 1180.75 | bwd_inner_microstep: 1180.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502
[2024-06-10 22:02:54,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.81 | bwd_microstep: 1315.22 | bwd_inner_microstep: 1315.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1935
[2024-06-10 22:02:55,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.79 | bwd_microstep: 885.48 | bwd_inner_microstep: 885.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2109
[2024-06-10 22:02:57,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.89 | bwd_microstep: 950.51 | bwd_inner_microstep: 950.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655
[2024-06-10 22:02:59,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.22 | bwd_microstep: 1577.51 | bwd_inner_microstep: 1577.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645
[2024-06-10 22:03:01,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.33 | bwd_microstep: 1708.71 | bwd_inner_microstep: 1708.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-10 22:03:03,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.19 | bwd_microstep: 1439.40 | bwd_inner_microstep: 1439.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2105
[2024-06-10 22:03:04,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.12 | bwd_microstep: 824.52 | bwd_inner_microstep: 824.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-10 22:03:06,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1495.13 | bwd_inner_microstep: 1495.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 22:03:08,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.81 | bwd_microstep: 1392.04 | bwd_inner_microstep: 1392.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3541
[2024-06-10 22:03:10,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.11 | bwd_microstep: 1201.11 | bwd_inner_microstep: 1201.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2263
[2024-06-10 22:03:11,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.01 | bwd_microstep: 780.21 | bwd_inner_microstep: 780.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815
[2024-06-10 22:03:13,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.88 | bwd_microstep: 1508.72 | bwd_inner_microstep: 1508.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675
[2024-06-10 22:03:15,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1527.92 | bwd_inner_microstep: 1527.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3785
[2024-06-10 22:03:18,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.52 | bwd_microstep: 1611.40 | bwd_inner_microstep: 1611.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705
[2024-06-10 22:03:20,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.83 | bwd_microstep: 1426.50 | bwd_inner_microstep: 1426.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 22:03:22,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.75 | bwd_microstep: 1604.04 | bwd_inner_microstep: 1604.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465
[2024-06-10 22:03:24,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1441.74 | bwd_inner_microstep: 1441.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3452
[2024-06-10 22:03:26,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.91 | bwd_microstep: 1303.26 | bwd_inner_microstep: 1303.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2019
[2024-06-10 22:03:27,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.73 | bwd_microstep: 846.44 | bwd_inner_microstep: 846.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3810
[2024-06-10 22:03:29,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.93 | bwd_microstep: 1719.31 | bwd_inner_microstep: 1719.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3807
[2024-06-10 22:03:31,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.21 | bwd_microstep: 1583.24 | bwd_inner_microstep: 1583.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3814
[2024-06-10 22:03:36,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.62
[2024-06-10 22:03:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.54 | bwd_microstep: 4348.85 | bwd_inner_microstep: 1819.44 | bwd_allreduce_microstep: 2529.35 | step_microstep: 38.59
[2024-06-10 22:03:36,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15844.98 | bwd: 45097.00 | bwd_inner: 42566.73 | bwd_allreduce: 2529.59 | step: 40.07
{'loss': 1.1587, 'learning_rate': 7.787582836748692e-06, 'epoch': 0.72}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463
[2024-06-10 22:03:38,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.74 | bwd_microstep: 1467.32 | bwd_inner_microstep: 1467.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3924
[2024-06-10 22:03:41,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.05 | bwd_microstep: 1785.24 | bwd_inner_microstep: 1785.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3857
[2024-06-10 22:03:43,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1393.95 | bwd_inner_microstep: 1393.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 22:03:44,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.85 | bwd_microstep: 1247.56 | bwd_inner_microstep: 1247.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3399
[2024-06-10 22:03:46,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.74 | bwd_microstep: 1147.56 | bwd_inner_microstep: 1147.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 22:03:48,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1374.17 | bwd_inner_microstep: 1374.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-10 22:03:49,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 790.71 | bwd_inner_microstep: 790.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 22:03:51,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.31 | bwd_microstep: 1284.97 | bwd_inner_microstep: 1284.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3699
[2024-06-10 22:03:53,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.41 | bwd_microstep: 1623.76 | bwd_inner_microstep: 1623.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 22:03:55,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1248.31 | bwd_inner_microstep: 1248.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-10 22:03:56,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2123
[2024-06-10 22:03:58,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.14 | bwd_microstep: 828.78 | bwd_inner_microstep: 828.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-10 22:03:59,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.91 | bwd_microstep: 1276.40 | bwd_inner_microstep: 1276.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-10 22:04:01,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.79 | bwd_microstep: 1447.77 | bwd_inner_microstep: 1447.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-10 22:04:04,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.93 | bwd_microstep: 1707.26 | bwd_inner_microstep: 1707.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 22:04:06,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.11 | bwd_microstep: 1511.10 | bwd_inner_microstep: 1511.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 22:04:08,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.23 | bwd_microstep: 1377.94 | bwd_inner_microstep: 1377.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-10 22:04:10,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1417.86 | bwd_inner_microstep: 1417.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 22:04:12,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.72 | bwd_microstep: 1380.48 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294
[2024-06-10 22:04:13,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.26 | bwd_microstep: 877.51 | bwd_inner_microstep: 877.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2130
[2024-06-10 22:04:14,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.49 | bwd_microstep: 830.66 | bwd_inner_microstep: 830.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-10 22:04:16,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1491.98 | bwd_inner_microstep: 1491.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-10 22:04:18,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.41 | bwd_microstep: 1505.79 | bwd_inner_microstep: 1505.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2128
[2024-06-10 22:04:19,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.91 | bwd_microstep: 831.68 | bwd_inner_microstep: 831.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3556
[2024-06-10 22:04:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1232.09 | bwd_inner_microstep: 1232.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 22:04:23,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1371.00 | bwd_inner_microstep: 1370.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3822
[2024-06-10 22:04:25,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.98 | bwd_microstep: 1582.68 | bwd_inner_microstep: 1582.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2669
[2024-06-10 22:04:27,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.02 | bwd_microstep: 1216.04 | bwd_inner_microstep: 1216.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-10 22:04:29,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.31 | bwd_microstep: 1495.02 | bwd_inner_microstep: 1494.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3583
[2024-06-10 22:04:31,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.24 | bwd_microstep: 1692.76 | bwd_inner_microstep: 1692.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 22:04:33,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.55 | bwd_microstep: 1277.31 | bwd_inner_microstep: 1277.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3570
[2024-06-10 22:04:38,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.06 | optimizer_step: 6.63
[2024-06-10 22:04:38,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.46 | bwd_microstep: 4907.58 | bwd_inner_microstep: 1918.75 | bwd_allreduce_microstep: 2988.78 | step_microstep: 37.62
[2024-06-10 22:04:38,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15963.14 | bwd: 45871.81 | bwd_inner: 42882.12 | bwd_allreduce: 2989.01 | step: 39.30
{'loss': 1.1737, 'learning_rate': 7.757880349046742e-06, 'epoch': 0.72}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4151
[2024-06-10 22:04:41,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.46 | bwd_microstep: 1619.40 | bwd_inner_microstep: 1619.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3937
[2024-06-10 22:04:43,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1495.26 | bwd_inner_microstep: 1495.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3915
[2024-06-10 22:04:45,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.37 | bwd_microstep: 1440.67 | bwd_inner_microstep: 1440.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-10 22:04:47,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.36 | bwd_microstep: 1650.74 | bwd_inner_microstep: 1650.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 22:04:49,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1245.33 | bwd_inner_microstep: 1245.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1918
[2024-06-10 22:04:50,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.21 | bwd_microstep: 779.71 | bwd_inner_microstep: 779.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 22:04:52,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.02 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783
[2024-06-10 22:04:54,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.41 | bwd_microstep: 1451.70 | bwd_inner_microstep: 1451.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-10 22:04:56,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.72 | bwd_microstep: 1626.93 | bwd_inner_microstep: 1626.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 22:04:58,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.35 | bwd_microstep: 1249.20 | bwd_inner_microstep: 1249.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2157
[2024-06-10 22:04:59,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.65 | bwd_microstep: 912.81 | bwd_inner_microstep: 912.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3702
[2024-06-10 22:05:01,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.13 | bwd_microstep: 1283.22 | bwd_inner_microstep: 1283.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1909
[2024-06-10 22:05:02,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.25 | bwd_microstep: 810.47 | bwd_inner_microstep: 810.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2187
[2024-06-10 22:05:03,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.01 | bwd_microstep: 1050.31 | bwd_inner_microstep: 1050.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3656
[2024-06-10 22:05:05,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.21 | bwd_microstep: 1576.19 | bwd_inner_microstep: 1576.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-10 22:05:07,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.76 | bwd_microstep: 1344.26 | bwd_inner_microstep: 1344.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825
[2024-06-10 22:05:09,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.06 | bwd_microstep: 1583.41 | bwd_inner_microstep: 1583.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660
[2024-06-10 22:05:11,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.52 | bwd_microstep: 1429.56 | bwd_inner_microstep: 1429.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 22:05:13,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.10 | bwd_microstep: 1488.26 | bwd_inner_microstep: 1488.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-10 22:05:16,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.02 | bwd_microstep: 1654.99 | bwd_inner_microstep: 1654.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 22:05:18,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1396.22 | bwd_inner_microstep: 1396.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-10 22:05:19,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.89 | bwd_microstep: 1358.93 | bwd_inner_microstep: 1358.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3671
[2024-06-10 22:05:21,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.95 | bwd_microstep: 1259.69 | bwd_inner_microstep: 1259.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 22:05:23,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1290.15 | bwd_inner_microstep: 1290.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 22:05:25,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.95 | bwd_microstep: 1544.04 | bwd_inner_microstep: 1544.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 22:05:27,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.15 | bwd_microstep: 1288.59 | bwd_inner_microstep: 1288.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-10 22:05:29,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.37 | bwd_microstep: 1584.28 | bwd_inner_microstep: 1584.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546
[2024-06-10 22:05:31,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.43 | bwd_microstep: 1586.11 | bwd_inner_microstep: 1586.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559
[2024-06-10 22:05:33,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.33 | bwd_microstep: 1595.05 | bwd_inner_microstep: 1595.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 22:05:35,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1339.70 | bwd_inner_microstep: 1339.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729
[2024-06-10 22:05:37,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1465.39 | bwd_inner_microstep: 1465.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 22:05:41,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 | optimizer_gradients: 4.03 | optimizer_step: 6.59
[2024-06-10 22:05:41,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.60 | bwd_microstep: 2596.49 | bwd_inner_microstep: 1799.23 | bwd_allreduce_microstep: 797.21 | step_microstep: 37.54
[2024-06-10 22:05:41,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16533.67 | bwd: 45278.04 | bwd_inner: 44479.94 | bwd_allreduce: 797.43 | step: 38.99
{'loss': 1.22, 'learning_rate': 7.728220978026563e-06, 'epoch': 0.72}
 72%|███████▏  | 1236/1726 [21:23:08<8:24:32, 61.78s/it]
 72%|███████▏  | 1237/1726 [21:24:09<8:21:52, 61.58s/it]
 72%|███████▏  | 1238/1726 [21:25:12<8:22:51, 61.83s/it]
 72%|███████▏  | 1239/1726 [21:26:13<8:20:29, 61.66s/it]
 72%|███████▏  | 1240/1726 [21:27:15<8:20:45, 61.82s/it]
 72%|███████▏  | 1241/1726 [21:28:17<8:20:30, 61.92s/it]
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2376
[2024-06-10 22:05:42,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.90 | bwd_microstep: 1018.99 | bwd_inner_microstep: 1018.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4006
[2024-06-10 22:05:44,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.80 | bwd_microstep: 1505.82 | bwd_inner_microstep: 1505.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3864
[2024-06-10 22:05:46,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.67 | bwd_microstep: 1560.40 | bwd_inner_microstep: 1560.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3806
[2024-06-10 22:05:48,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.08 | bwd_microstep: 1350.47 | bwd_inner_microstep: 1350.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 22:05:50,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2246
[2024-06-10 22:05:51,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.41 | bwd_microstep: 805.12 | bwd_inner_microstep: 805.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-10 22:05:53,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.23 | bwd_microstep: 1426.43 | bwd_inner_microstep: 1426.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 22:05:55,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1275.89 | bwd_inner_microstep: 1275.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-10 22:05:57,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.67 | bwd_microstep: 1530.48 | bwd_inner_microstep: 1530.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2096
[2024-06-10 22:05:58,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.44 | bwd_microstep: 819.55 | bwd_inner_microstep: 819.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3713
[2024-06-10 22:06:00,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.57 | bwd_microstep: 1587.65 | bwd_inner_microstep: 1587.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2628
[2024-06-10 22:06:02,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.48 | bwd_microstep: 1014.28 | bwd_inner_microstep: 1014.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3451
[2024-06-10 22:06:04,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.10 | bwd_microstep: 1541.90 | bwd_inner_microstep: 1541.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3691
[2024-06-10 22:06:06,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.36 | bwd_microstep: 1718.58 | bwd_inner_microstep: 1718.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 22:06:08,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1379.12 | bwd_inner_microstep: 1379.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3937
[2024-06-10 22:06:10,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.64 | bwd_microstep: 1687.35 | bwd_inner_microstep: 1687.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3383
[2024-06-10 22:06:12,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.39 | bwd_microstep: 1364.98 | bwd_inner_microstep: 1364.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-10 22:06:14,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.16 | bwd_microstep: 1611.28 | bwd_inner_microstep: 1611.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3539
[2024-06-10 22:06:17,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.30 | bwd_microstep: 1583.87 | bwd_inner_microstep: 1583.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 22:06:19,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.82 | bwd_microstep: 1646.81 | bwd_inner_microstep: 1646.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3824
[2024-06-10 22:06:21,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.63 | bwd_microstep: 1593.19 | bwd_inner_microstep: 1593.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580
[2024-06-10 22:06:23,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.75 | bwd_microstep: 1461.34 | bwd_inner_microstep: 1461.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 22:06:25,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.14 | bwd_microstep: 1253.96 | bwd_inner_microstep: 1253.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 22:06:27,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1508.55 | bwd_inner_microstep: 1508.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-10 22:06:29,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1428.81 | bwd_inner_microstep: 1428.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-10 22:06:31,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.17 | bwd_microstep: 1358.53 | bwd_inner_microstep: 1358.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655
[2024-06-10 22:06:33,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.01 | bwd_microstep: 1418.71 | bwd_inner_microstep: 1418.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3602
[2024-06-10 22:06:35,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1307.34 | bwd_inner_microstep: 1307.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815
[2024-06-10 22:06:37,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.89 | bwd_microstep: 1386.06 | bwd_inner_microstep: 1386.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 22:06:38,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.30 | bwd_microstep: 1404.11 | bwd_inner_microstep: 1404.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3731
[2024-06-10 22:06:41,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.31 | bwd_microstep: 1594.32 | bwd_inner_microstep: 1594.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1017
[2024-06-10 22:06:42,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.02 | optimizer_step: 6.61
[2024-06-10 22:06:42,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 152.61 | bwd_microstep: 1069.01 | bwd_inner_microstep: 452.31 | bwd_allreduce_microstep: 616.66 | step_microstep: 38.09
[2024-06-10 22:06:42,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16424.29 | bwd: 44598.28 | bwd_inner: 43980.73 | bwd_allreduce: 616.89 | step: 39.59
{'loss': 1.1748, 'learning_rate': 7.698604828148306e-06, 'epoch': 0.72}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863
[2024-06-10 22:06:43,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.25 | bwd_microstep: 674.66 | bwd_inner_microstep: 674.50 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3983
[2024-06-10 22:06:45,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.07 | bwd_microstep: 1456.28 | bwd_inner_microstep: 1456.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 22:06:47,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1278.63 | bwd_inner_microstep: 1278.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3841
[2024-06-10 22:06:49,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.10 | bwd_microstep: 1557.08 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 22:06:51,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1553.94 | bwd_inner_microstep: 1553.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-10 22:06:52,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.52 | bwd_microstep: 788.65 | bwd_inner_microstep: 788.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-10 22:06:54,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.96 | bwd_microstep: 1620.20 | bwd_inner_microstep: 1620.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435
[2024-06-10 22:06:56,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.44 | bwd_microstep: 1157.21 | bwd_inner_microstep: 1157.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 22:06:58,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1252.95 | bwd_inner_microstep: 1252.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2093
[2024-06-10 22:06:59,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.53 | bwd_microstep: 824.42 | bwd_inner_microstep: 824.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 22:07:01,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1387.89 | bwd_inner_microstep: 1387.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512
[2024-06-10 22:07:03,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1418.07 | bwd_inner_microstep: 1418.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 22:07:05,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.94 | bwd_microstep: 1382.52 | bwd_inner_microstep: 1382.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3422
[2024-06-10 22:07:07,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.35 | bwd_microstep: 1510.48 | bwd_inner_microstep: 1510.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3396
[2024-06-10 22:07:08,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.69 | bwd_microstep: 1368.43 | bwd_inner_microstep: 1368.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 22:07:11,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.74 | bwd_microstep: 1487.61 | bwd_inner_microstep: 1487.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-10 22:07:12,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.25 | bwd_microstep: 798.20 | bwd_inner_microstep: 798.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385
[2024-06-10 22:07:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.90 | bwd_microstep: 1431.35 | bwd_inner_microstep: 1431.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-10 22:07:16,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.90 | bwd_microstep: 1525.60 | bwd_inner_microstep: 1525.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2488
[2024-06-10 22:07:17,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 960.51 | bwd_inner_microstep: 960.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 22:07:19,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1382.02 | bwd_inner_microstep: 1381.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546
[2024-06-10 22:07:21,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.72 | bwd_microstep: 1327.36 | bwd_inner_microstep: 1327.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004
[2024-06-10 22:07:22,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.87 | bwd_microstep: 897.66 | bwd_inner_microstep: 897.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 22:07:24,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.33 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 22:07:26,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.86 | bwd_microstep: 1560.59 | bwd_inner_microstep: 1560.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733
[2024-06-10 22:07:28,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1431.02 | bwd_inner_microstep: 1430.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2170
[2024-06-10 22:07:29,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.51 | bwd_microstep: 855.20 | bwd_inner_microstep: 855.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3767
[2024-06-10 22:07:31,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.72 | bwd_microstep: 1348.12 | bwd_inner_microstep: 1348.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3435
[2024-06-10 22:07:33,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1412.68 | bwd_inner_microstep: 1412.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3831
[2024-06-10 22:07:35,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.14 | bwd_microstep: 1753.43 | bwd_inner_microstep: 1753.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 22:07:37,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1506.03 | bwd_inner_microstep: 1506.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 22:07:45,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.26 | optimizer_gradients: 4.10 | optimizer_step: 6.62
[2024-06-10 22:07:45,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 6656.04 | bwd_inner_microstep: 1554.10 | bwd_allreduce_microstep: 5101.89 | step_microstep: 38.44
[2024-06-10 22:07:45,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15562.01 | bwd: 46851.82 | bwd_inner: 41748.92 | bwd_allreduce: 5102.18 | step: 39.91
{'loss': 1.1524, 'learning_rate': 7.669032003719894e-06, 'epoch': 0.72}
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4068
[2024-06-10 22:07:47,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.91 | bwd_microstep: 1803.12 | bwd_inner_microstep: 1803.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 22:07:49,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.84 | bwd_microstep: 1369.99 | bwd_inner_microstep: 1369.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-10 22:07:51,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.33 | bwd_microstep: 1548.65 | bwd_inner_microstep: 1548.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 22:07:53,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 1346.37 | bwd_inner_microstep: 1346.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 22:07:55,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1477.35 | bwd_inner_microstep: 1477.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 22:07:57,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.04 | bwd_microstep: 1528.25 | bwd_inner_microstep: 1528.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 22:07:59,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1281.80 | bwd_inner_microstep: 1281.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2057
[2024-06-10 22:08:00,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.04 | bwd_microstep: 752.77 | bwd_inner_microstep: 752.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3708
[2024-06-10 22:08:02,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.95 | bwd_microstep: 1330.21 | bwd_inner_microstep: 1330.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3669
[2024-06-10 22:08:04,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.24 | bwd_microstep: 1544.53 | bwd_inner_microstep: 1544.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 22:08:06,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.94 | bwd_microstep: 1480.48 | bwd_inner_microstep: 1480.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3489
[2024-06-10 22:08:08,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.37 | bwd_microstep: 1440.86 | bwd_inner_microstep: 1440.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-10 22:08:10,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.21 | bwd_microstep: 1472.79 | bwd_inner_microstep: 1472.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3552
[2024-06-10 22:08:12,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.68 | bwd_microstep: 1694.57 | bwd_inner_microstep: 1694.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3657
[2024-06-10 22:08:14,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.09 | bwd_microstep: 1367.37 | bwd_inner_microstep: 1367.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-10 22:08:16,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.68 | bwd_microstep: 1184.47 | bwd_inner_microstep: 1184.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 22:08:18,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1378.63 | bwd_inner_microstep: 1378.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827
[2024-06-10 22:08:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1388.17 | bwd_inner_microstep: 1388.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2277
[2024-06-10 22:08:21,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.13 | bwd_microstep: 908.53 | bwd_inner_microstep: 908.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2289
[2024-06-10 22:08:22,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.06 | bwd_microstep: 1003.46 | bwd_inner_microstep: 1003.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 22:08:24,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1514.59 | bwd_inner_microstep: 1514.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-10 22:08:25,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.60 | bwd_microstep: 700.39 | bwd_inner_microstep: 700.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003
[2024-06-10 22:08:27,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.82 | bwd_microstep: 899.12 | bwd_inner_microstep: 899.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457
[2024-06-10 22:08:28,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.26 | bwd_microstep: 1313.13 | bwd_inner_microstep: 1313.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3831
[2024-06-10 22:08:31,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.82 | bwd_microstep: 1752.16 | bwd_inner_microstep: 1752.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651
[2024-06-10 22:08:33,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.68 | bwd_microstep: 1612.16 | bwd_inner_microstep: 1612.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3772
[2024-06-10 22:08:35,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.63 | bwd_microstep: 1344.66 | bwd_inner_microstep: 1344.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-10 22:08:37,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.24 | bwd_microstep: 1157.91 | bwd_inner_microstep: 1157.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2234
[2024-06-10 22:08:38,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.92 | bwd_microstep: 801.74 | bwd_inner_microstep: 801.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2158
[2024-06-10 22:08:39,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.94 | bwd_microstep: 950.71 | bwd_inner_microstep: 950.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 22:08:41,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.40 | bwd_microstep: 1293.49 | bwd_inner_microstep: 1293.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2040
[2024-06-10 22:08:44,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.09 | optimizer_step: 6.59
[2024-06-10 22:08:44,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.12 | bwd_microstep: 2811.64 | bwd_inner_microstep: 1091.94 | bwd_allreduce_microstep: 1719.64 | step_microstep: 37.80
[2024-06-10 22:08:44,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15578.81 | bwd: 43454.08 | bwd_inner: 41733.55 | bwd_allreduce: 1719.87 | step: 39.24
{'loss': 1.1829, 'learning_rate': 7.639502608896653e-06, 'epoch': 0.72}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1866
[2024-06-10 22:08:45,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.74 | bwd_microstep: 699.00 | bwd_inner_microstep: 698.93 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3966
[2024-06-10 22:08:47,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.08 | bwd_microstep: 1594.68 | bwd_inner_microstep: 1594.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 22:08:49,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 1284.78 | bwd_inner_microstep: 1284.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3794
[2024-06-10 22:08:51,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.90 | bwd_microstep: 1380.76 | bwd_inner_microstep: 1380.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 22:08:53,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3422
[2024-06-10 22:08:55,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.16 | bwd_microstep: 1280.16 | bwd_inner_microstep: 1280.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 22:08:57,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1387.45 | bwd_inner_microstep: 1387.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 22:08:58,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.49 | bwd_microstep: 1288.52 | bwd_inner_microstep: 1288.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 22:09:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.79 | bwd_microstep: 1387.15 | bwd_inner_microstep: 1387.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3500
[2024-06-10 22:09:02,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1415.20 | bwd_inner_microstep: 1415.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 22:09:04,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.57 | bwd_microstep: 1384.61 | bwd_inner_microstep: 1384.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 22:09:06,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3507
[2024-06-10 22:09:08,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.30 | bwd_microstep: 1434.54 | bwd_inner_microstep: 1434.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 22:09:10,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.33 | bwd_microstep: 1391.54 | bwd_inner_microstep: 1391.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3659
[2024-06-10 22:09:12,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.61 | bwd_microstep: 1683.93 | bwd_inner_microstep: 1683.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2073
[2024-06-10 22:09:14,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.36 | bwd_microstep: 947.43 | bwd_inner_microstep: 947.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3700
[2024-06-10 22:09:15,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1234.95 | bwd_inner_microstep: 1234.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3619
[2024-06-10 22:09:17,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.06 | bwd_microstep: 1553.55 | bwd_inner_microstep: 1553.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 22:09:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.77 | bwd_microstep: 1649.25 | bwd_inner_microstep: 1649.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3878
[2024-06-10 22:09:22,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.36 | bwd_microstep: 1684.94 | bwd_inner_microstep: 1684.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 22:09:24,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.54 | bwd_microstep: 1160.90 | bwd_inner_microstep: 1160.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-10 22:09:26,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1484.60 | bwd_inner_microstep: 1484.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3528
[2024-06-10 22:09:28,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1551.69 | bwd_inner_microstep: 1551.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3550
[2024-06-10 22:09:30,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.11 | bwd_microstep: 1563.50 | bwd_inner_microstep: 1563.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3545
[2024-06-10 22:09:32,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.18 | bwd_microstep: 1589.94 | bwd_inner_microstep: 1589.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 22:09:34,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.54 | bwd_microstep: 1291.13 | bwd_inner_microstep: 1291.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 22:09:36,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1350.14 | bwd_inner_microstep: 1350.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 22:09:37,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.04 | bwd_microstep: 694.81 | bwd_inner_microstep: 694.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584
[2024-06-10 22:09:39,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1407.47 | bwd_inner_microstep: 1407.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3584
[2024-06-10 22:09:41,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.66 | bwd_microstep: 1333.14 | bwd_inner_microstep: 1333.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2266
[2024-06-10 22:09:42,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.33 | bwd_microstep: 1005.55 | bwd_inner_microstep: 1005.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 22:09:44,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.02 | optimizer_step: 6.63
[2024-06-10 22:09:44,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1598.48 | bwd_inner_microstep: 1525.75 | bwd_allreduce_microstep: 72.68 | step_microstep: 37.57
[2024-06-10 22:09:44,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16232.39 | bwd: 43588.28 | bwd_inner: 43514.64 | bwd_allreduce: 72.94 | step: 39.05
{'loss': 1.1328, 'learning_rate': 7.61001674768098e-06, 'epoch': 0.72}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-10 22:09:46,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.26 | bwd_microstep: 1474.65 | bwd_inner_microstep: 1474.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-10 22:09:49,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.14 | bwd_microstep: 1150.83 | bwd_inner_microstep: 1150.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 22:09:51,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.69 | bwd_microstep: 1242.12 | bwd_inner_microstep: 1242.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 22:09:53,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1354.23 | bwd_inner_microstep: 1354.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-10 22:09:55,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 1539.35 | bwd_inner_microstep: 1539.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3800
[2024-06-10 22:09:57,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.50 | bwd_microstep: 1352.41 | bwd_inner_microstep: 1352.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846
[2024-06-10 22:09:59,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.43 | bwd_microstep: 1458.78 | bwd_inner_microstep: 1458.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-10 22:10:01,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.30 | bwd_microstep: 1344.94 | bwd_inner_microstep: 1344.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 22:10:03,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1345.65 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739
[2024-06-10 22:10:05,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1430.87 | bwd_inner_microstep: 1430.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-10 22:10:07,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1511.62 | bwd_inner_microstep: 1511.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2207
[2024-06-10 22:10:08,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.14 | bwd_microstep: 930.18 | bwd_inner_microstep: 930.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954
[2024-06-10 22:10:09,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.32 | bwd_microstep: 888.28 | bwd_inner_microstep: 888.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690
[2024-06-10 22:10:11,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.86 | bwd_microstep: 1327.18 | bwd_inner_microstep: 1327.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1941
[2024-06-10 22:10:12,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.05 | bwd_microstep: 883.82 | bwd_inner_microstep: 883.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 22:10:14,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-10 22:10:16,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.10 | bwd_microstep: 1342.15 | bwd_inner_microstep: 1342.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 22:10:18,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.36 | bwd_microstep: 1383.82 | bwd_inner_microstep: 1383.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-10 22:10:19,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.57 | bwd_microstep: 695.87 | bwd_inner_microstep: 695.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 22:10:21,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.01 | bwd_microstep: 1656.57 | bwd_inner_microstep: 1656.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2923
[2024-06-10 22:10:23,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.11 | bwd_microstep: 1095.80 | bwd_inner_microstep: 1095.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 22:10:25,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.45 | bwd_microstep: 1559.68 | bwd_inner_microstep: 1559.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 22:10:27,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1283.53 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3825
[2024-06-10 22:10:29,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.80 | bwd_microstep: 1752.06 | bwd_inner_microstep: 1752.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 22:10:31,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.39 | bwd_microstep: 1356.11 | bwd_inner_microstep: 1356.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 22:10:33,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.36 | bwd_microstep: 1379.99 | bwd_inner_microstep: 1379.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3865
[2024-06-10 22:10:35,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.43 | bwd_microstep: 1664.12 | bwd_inner_microstep: 1664.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797
[2024-06-10 22:10:37,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.71 | bwd_microstep: 1448.17 | bwd_inner_microstep: 1448.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 22:10:39,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1500.96 | bwd_inner_microstep: 1500.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 22:10:41,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.51 | bwd_microstep: 1657.38 | bwd_inner_microstep: 1657.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3901
[2024-06-10 22:10:43,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1396.02 | bwd_inner_microstep: 1395.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495
[2024-06-10 22:10:47,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 22:10:47,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 3456.66 | bwd_inner_microstep: 1629.10 | bwd_allreduce_microstep: 1827.51 | step_microstep: 37.78
[2024-06-10 22:10:47,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16183.02 | bwd: 45242.52 | bwd_inner: 43414.12 | bwd_allreduce: 1827.74 | step: 39.23
{'loss': 1.2109, 'learning_rate': 7.580574523921906e-06, 'epoch': 0.72}
 72%|███████▏  | 1241/1726 [21:28:17<8:20:30, 61.92s/it]
 72%|███████▏  | 1242/1726 [21:29:19<8:18:07, 61.75s/it]
 72%|███████▏  | 1243/1726 [21:30:21<8:19:28, 62.05s/it]
 72%|███████▏  | 1244/1726 [21:31:21<8:11:57, 61.24s/it]
 72%|███████▏  | 1245/1726 [21:32:21<8:08:19, 60.91s/it]
 72%|███████▏  | 1246/1726 [21:33:24<8:12:52, 61.61s/it]
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3402
[2024-06-10 22:10:49,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.78 | bwd_microstep: 1167.86 | bwd_inner_microstep: 1167.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3906
[2024-06-10 22:10:51,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.62 | bwd_microstep: 1517.82 | bwd_inner_microstep: 1517.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871
[2024-06-10 22:10:53,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.32 | bwd_microstep: 1662.22 | bwd_inner_microstep: 1662.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3777
[2024-06-10 22:10:55,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1505.31 | bwd_inner_microstep: 1505.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1902
[2024-06-10 22:10:57,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.54 | bwd_microstep: 747.08 | bwd_inner_microstep: 747.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-10 22:10:59,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.98 | bwd_microstep: 1530.90 | bwd_inner_microstep: 1530.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-10 22:11:00,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.61 | bwd_microstep: 789.50 | bwd_inner_microstep: 789.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239
[2024-06-10 22:11:01,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.29 | bwd_microstep: 961.62 | bwd_inner_microstep: 961.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499
[2024-06-10 22:11:03,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.93 | bwd_microstep: 1428.00 | bwd_inner_microstep: 1427.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922
[2024-06-10 22:11:04,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 791.16 | bwd_inner_microstep: 791.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 22:11:06,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.04 | bwd_microstep: 1382.34 | bwd_inner_microstep: 1382.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 22:11:08,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.99 | bwd_microstep: 1478.69 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 22:11:10,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.67 | bwd_microstep: 1417.68 | bwd_inner_microstep: 1417.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508
[2024-06-10 22:11:12,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.96 | bwd_microstep: 1408.48 | bwd_inner_microstep: 1408.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3466
[2024-06-10 22:11:14,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.18 | bwd_microstep: 1243.91 | bwd_inner_microstep: 1243.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1909
[2024-06-10 22:11:15,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.35 | bwd_microstep: 715.04 | bwd_inner_microstep: 715.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2943
[2024-06-10 22:11:16,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.75 | bwd_microstep: 1192.58 | bwd_inner_microstep: 1192.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-10 22:11:18,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.86 | bwd_microstep: 1509.99 | bwd_inner_microstep: 1509.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 22:11:20,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.66 | bwd_microstep: 795.70 | bwd_inner_microstep: 795.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3785
[2024-06-10 22:11:21,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.11 | bwd_microstep: 1286.32 | bwd_inner_microstep: 1286.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3626
[2024-06-10 22:11:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.61 | bwd_microstep: 1471.07 | bwd_inner_microstep: 1471.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 22:11:25,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1280.25 | bwd_inner_microstep: 1280.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2429
[2024-06-10 22:11:26,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.04 | bwd_microstep: 941.22 | bwd_inner_microstep: 941.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 22:11:28,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.46 | bwd_microstep: 1401.79 | bwd_inner_microstep: 1401.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 22:11:31,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.98 | bwd_microstep: 1653.75 | bwd_inner_microstep: 1653.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 818
[2024-06-10 22:11:31,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 131.77 | bwd_microstep: 342.37 | bwd_inner_microstep: 342.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3804
[2024-06-10 22:11:33,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.57 | bwd_microstep: 1717.47 | bwd_inner_microstep: 1717.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3793
[2024-06-10 22:11:36,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.35 | bwd_microstep: 1747.30 | bwd_inner_microstep: 1747.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 22:11:38,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1425.22 | bwd_inner_microstep: 1425.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 22:11:39,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.42 | bwd_microstep: 972.28 | bwd_inner_microstep: 972.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-10 22:11:41,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.10 | bwd_microstep: 1593.89 | bwd_inner_microstep: 1593.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3809
[2024-06-10 22:11:50,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.18 | optimizer_step: 6.61
[2024-06-10 22:11:50,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 7717.68 | bwd_inner_microstep: 1565.46 | bwd_allreduce_microstep: 6152.16 | step_microstep: 38.28
[2024-06-10 22:11:50,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15129.58 | bwd: 46796.51 | bwd_inner: 40643.44 | bwd_allreduce: 6152.40 | step: 39.74
{'loss': 1.2112, 'learning_rate': 7.5511760413148e-06, 'epoch': 0.72}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3388
[2024-06-10 22:11:51,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.21 | bwd_microstep: 1235.05 | bwd_inner_microstep: 1235.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 22:11:53,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.66 | bwd_microstep: 1280.93 | bwd_inner_microstep: 1280.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3834
[2024-06-10 22:11:55,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 22:11:57,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.65 | bwd_microstep: 1240.38 | bwd_inner_microstep: 1240.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-10 22:11:58,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1243.14 | bwd_inner_microstep: 1243.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2633
[2024-06-10 22:12:00,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 399.52 | bwd_microstep: 1066.24 | bwd_inner_microstep: 1066.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3571
[2024-06-10 22:12:02,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.12 | bwd_microstep: 1203.55 | bwd_inner_microstep: 1203.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 22:12:04,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1382.89 | bwd_inner_microstep: 1382.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2933
[2024-06-10 22:12:05,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.59 | bwd_microstep: 1177.72 | bwd_inner_microstep: 1177.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 22:12:07,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.17 | bwd_microstep: 1524.15 | bwd_inner_microstep: 1524.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3506
[2024-06-10 22:12:09,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.43 | bwd_microstep: 1331.54 | bwd_inner_microstep: 1331.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 22:12:11,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.03 | bwd_microstep: 1363.71 | bwd_inner_microstep: 1363.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 22:12:13,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.87 | bwd_microstep: 1605.89 | bwd_inner_microstep: 1605.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-10 22:12:15,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.58 | bwd_microstep: 1627.60 | bwd_inner_microstep: 1627.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 22:12:17,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.64 | bwd_microstep: 1345.77 | bwd_inner_microstep: 1345.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2094
[2024-06-10 22:12:19,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.87 | bwd_microstep: 880.05 | bwd_inner_microstep: 880.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-10 22:12:20,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.20 | bwd_microstep: 1423.91 | bwd_inner_microstep: 1423.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3638
[2024-06-10 22:12:23,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1574.15 | bwd_inner_microstep: 1574.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 22:12:24,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.66 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-10 22:12:26,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.16 | bwd_microstep: 1422.31 | bwd_inner_microstep: 1422.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2610
[2024-06-10 22:12:28,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.37 | bwd_microstep: 1001.33 | bwd_inner_microstep: 1001.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 874
[2024-06-10 22:12:28,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 152.13 | bwd_microstep: 397.65 | bwd_inner_microstep: 397.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2947
[2024-06-10 22:12:30,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.15 | bwd_microstep: 1194.02 | bwd_inner_microstep: 1194.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3819
[2024-06-10 22:12:32,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.05 | bwd_microstep: 1594.81 | bwd_inner_microstep: 1594.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3819
[2024-06-10 22:12:34,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1407.83 | bwd_inner_microstep: 1407.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 22:12:36,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1253.04 | bwd_inner_microstep: 1253.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 22:12:38,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1606.90 | bwd_inner_microstep: 1606.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 22:12:40,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.02 | bwd_microstep: 1287.20 | bwd_inner_microstep: 1287.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 22:12:42,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.18 | bwd_microstep: 1373.42 | bwd_inner_microstep: 1373.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2078
[2024-06-10 22:12:43,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.16 | bwd_microstep: 916.87 | bwd_inner_microstep: 916.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 22:12:45,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.38 | bwd_microstep: 1750.44 | bwd_inner_microstep: 1750.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-10 22:12:49,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.17 | optimizer_step: 6.62
[2024-06-10 22:12:49,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.19 | bwd_microstep: 3441.67 | bwd_inner_microstep: 1749.82 | bwd_allreduce_microstep: 1691.80 | step_microstep: 38.98
[2024-06-10 22:12:49,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15686.71 | bwd: 43818.71 | bwd_inner: 42125.91 | bwd_allreduce: 1692.03 | step: 40.47
{'loss': 1.2234, 'learning_rate': 7.521821403400955e-06, 'epoch': 0.72}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 22:12:52,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1473.58 | bwd_inner_microstep: 1473.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3994
[2024-06-10 22:12:54,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.25 | bwd_microstep: 1502.00 | bwd_inner_microstep: 1501.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 22:12:56,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.21 | bwd_microstep: 1479.81 | bwd_inner_microstep: 1479.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 22:12:58,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.41 | bwd_microstep: 1448.74 | bwd_inner_microstep: 1448.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 22:13:00,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1551.48 | bwd_inner_microstep: 1551.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 22:13:01,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1243.84 | bwd_inner_microstep: 1243.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1883
[2024-06-10 22:13:02,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.00 | bwd_microstep: 679.51 | bwd_inner_microstep: 679.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2100
[2024-06-10 22:13:04,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.38 | bwd_microstep: 823.16 | bwd_inner_microstep: 823.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-10 22:13:05,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.54 | bwd_microstep: 789.09 | bwd_inner_microstep: 789.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 22:13:07,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.43 | bwd_microstep: 1388.47 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694
[2024-06-10 22:13:09,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.49 | bwd_microstep: 1627.09 | bwd_inner_microstep: 1627.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-10 22:13:11,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.41 | bwd_microstep: 1528.30 | bwd_inner_microstep: 1528.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2133
[2024-06-10 22:13:12,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.53 | bwd_microstep: 971.55 | bwd_inner_microstep: 971.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 22:13:14,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1494.11 | bwd_inner_microstep: 1494.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 22:13:16,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.17 | bwd_microstep: 1379.22 | bwd_inner_microstep: 1379.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3863
[2024-06-10 22:13:19,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.85 | bwd_microstep: 1659.70 | bwd_inner_microstep: 1659.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459
[2024-06-10 22:13:21,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.40 | bwd_microstep: 1438.20 | bwd_inner_microstep: 1438.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 22:13:22,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1399.48 | bwd_inner_microstep: 1399.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-10 22:13:25,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1496.56 | bwd_inner_microstep: 1496.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 22:13:27,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1552.27 | bwd_inner_microstep: 1552.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 22:13:29,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1507.59 | bwd_inner_microstep: 1507.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 22:13:31,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.74 | bwd_microstep: 1646.51 | bwd_inner_microstep: 1646.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-10 22:13:33,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1497.52 | bwd_inner_microstep: 1497.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 22:13:35,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1504.81 | bwd_inner_microstep: 1504.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 22:13:37,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1391.89 | bwd_inner_microstep: 1391.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 22:13:39,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1251.88 | bwd_inner_microstep: 1251.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-10 22:13:41,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1499.81 | bwd_inner_microstep: 1499.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2891
[2024-06-10 22:13:43,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.73 | bwd_microstep: 1184.20 | bwd_inner_microstep: 1184.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3769
[2024-06-10 22:13:44,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.33 | bwd_microstep: 1376.85 | bwd_inner_microstep: 1376.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 22:13:46,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1280.66 | bwd_inner_microstep: 1280.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585
[2024-06-10 22:13:48,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.61 | bwd_microstep: 1526.76 | bwd_inner_microstep: 1526.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3799
[2024-06-10 22:13:51,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.06 | optimizer_step: 6.60
[2024-06-10 22:13:51,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.05 | bwd_microstep: 1689.20 | bwd_inner_microstep: 1681.54 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.36
[2024-06-10 22:13:51,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16533.18 | bwd: 44283.85 | bwd_inner: 44275.34 | bwd_allreduce: 7.84 | step: 38.92
{'loss': 1.1818, 'learning_rate': 7.492510713567265e-06, 'epoch': 0.72}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3470
[2024-06-10 22:13:53,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1402.62 | bwd_inner_microstep: 1402.45 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 22:13:54,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.12 | bwd_microstep: 1145.27 | bwd_inner_microstep: 1145.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394
[2024-06-10 22:13:56,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.35 | bwd_microstep: 1341.31 | bwd_inner_microstep: 1341.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 22:13:58,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1380.16 | bwd_inner_microstep: 1380.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-10 22:14:00,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.09 | bwd_microstep: 1533.29 | bwd_inner_microstep: 1533.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 22:14:02,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1247.34 | bwd_inner_microstep: 1247.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 22:14:04,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1381.59 | bwd_inner_microstep: 1381.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2196
[2024-06-10 22:14:05,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.93 | bwd_microstep: 953.65 | bwd_inner_microstep: 953.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3421
[2024-06-10 22:14:07,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.15 | bwd_microstep: 1293.91 | bwd_inner_microstep: 1293.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1894
[2024-06-10 22:14:08,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.44 | bwd_microstep: 712.52 | bwd_inner_microstep: 712.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 22:14:10,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.42 | bwd_microstep: 1276.66 | bwd_inner_microstep: 1276.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 22:14:11,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1243.91 | bwd_inner_microstep: 1243.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3505
[2024-06-10 22:14:13,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.06 | bwd_microstep: 1430.15 | bwd_inner_microstep: 1430.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920
[2024-06-10 22:14:14,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.95 | bwd_microstep: 778.68 | bwd_inner_microstep: 778.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2728
[2024-06-10 22:14:16,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.13 | bwd_microstep: 1139.01 | bwd_inner_microstep: 1138.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2138
[2024-06-10 22:14:17,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.81 | bwd_microstep: 832.03 | bwd_inner_microstep: 832.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 22:14:19,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1555.03 | bwd_inner_microstep: 1555.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 22:14:21,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.12 | bwd_microstep: 1298.27 | bwd_inner_microstep: 1298.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 22:14:23,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.52 | bwd_microstep: 1390.66 | bwd_inner_microstep: 1390.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-10 22:14:25,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.24 | bwd_microstep: 1432.75 | bwd_inner_microstep: 1432.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 22:14:27,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.31 | bwd_microstep: 1354.07 | bwd_inner_microstep: 1354.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1997
[2024-06-10 22:14:28,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.82 | bwd_microstep: 706.35 | bwd_inner_microstep: 706.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 22:14:30,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1512.54 | bwd_inner_microstep: 1512.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 22:14:32,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.96 | bwd_microstep: 1279.85 | bwd_inner_microstep: 1279.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3716
[2024-06-10 22:14:33,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1337.01 | bwd_inner_microstep: 1336.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 22:14:35,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.32 | bwd_microstep: 1404.78 | bwd_inner_microstep: 1404.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 22:14:37,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1377.75 | bwd_inner_microstep: 1377.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-10 22:14:39,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.98 | bwd_microstep: 1453.79 | bwd_inner_microstep: 1453.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-10 22:14:41,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.41 | bwd_microstep: 1550.12 | bwd_inner_microstep: 1550.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 22:14:43,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1375.46 | bwd_inner_microstep: 1375.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2201
[2024-06-10 22:14:45,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.44 | bwd_microstep: 985.79 | bwd_inner_microstep: 985.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 22:14:53,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.56
[2024-06-10 22:14:53,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 7515.15 | bwd_inner_microstep: 1566.40 | bwd_allreduce_microstep: 5948.68 | step_microstep: 38.81
[2024-06-10 22:14:53,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15234.66 | bwd: 46621.49 | bwd_inner: 40671.75 | bwd_allreduce: 5948.99 | step: 40.34
{'loss': 1.1904, 'learning_rate': 7.463244075045815e-06, 'epoch': 0.72}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 22:14:55,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.58 | bwd_microstep: 1363.94 | bwd_inner_microstep: 1363.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3926
[2024-06-10 22:14:57,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.59 | bwd_microstep: 1587.21 | bwd_inner_microstep: 1587.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849
[2024-06-10 22:14:59,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.27 | bwd_microstep: 1557.35 | bwd_inner_microstep: 1557.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 22:15:01,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.01 | bwd_microstep: 1649.23 | bwd_inner_microstep: 1649.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843
[2024-06-10 22:15:03,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.99 | bwd_microstep: 1463.84 | bwd_inner_microstep: 1463.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 22:15:05,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.88 | bwd_microstep: 1274.51 | bwd_inner_microstep: 1274.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3487
[2024-06-10 22:15:07,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.60 | bwd_microstep: 1216.45 | bwd_inner_microstep: 1216.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 22:15:09,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1382.60 | bwd_inner_microstep: 1382.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3702
[2024-06-10 22:15:11,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1457.87 | bwd_inner_microstep: 1457.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3711
[2024-06-10 22:15:13,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.55 | bwd_microstep: 1655.89 | bwd_inner_microstep: 1655.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-10 22:15:15,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.32 | bwd_microstep: 1288.60 | bwd_inner_microstep: 1288.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678
[2024-06-10 22:15:17,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.70 | bwd_microstep: 1619.83 | bwd_inner_microstep: 1619.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3542
[2024-06-10 22:15:19,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.31 | bwd_microstep: 1561.57 | bwd_inner_microstep: 1561.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 22:15:21,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.13 | bwd_microstep: 1286.80 | bwd_inner_microstep: 1286.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3657
[2024-06-10 22:15:23,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.15 | bwd_microstep: 1662.01 | bwd_inner_microstep: 1661.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-10 22:15:25,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.29 | bwd_microstep: 1595.07 | bwd_inner_microstep: 1595.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 22:15:27,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1341.53 | bwd_inner_microstep: 1341.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 22:15:29,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.77 | bwd_microstep: 1490.91 | bwd_inner_microstep: 1490.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-10 22:15:30,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.80 | bwd_microstep: 806.04 | bwd_inner_microstep: 806.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845
[2024-06-10 22:15:33,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.24 | bwd_microstep: 1666.90 | bwd_inner_microstep: 1666.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-10 22:15:34,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.75 | bwd_microstep: 1184.18 | bwd_inner_microstep: 1184.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-10 22:15:36,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 975.34 | bwd_inner_microstep: 975.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3886
[2024-06-10 22:15:38,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.78 | bwd_microstep: 1782.50 | bwd_inner_microstep: 1782.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-10 22:15:40,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1398.80 | bwd_inner_microstep: 1398.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-10 22:15:43,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.38 | bwd_microstep: 1750.65 | bwd_inner_microstep: 1750.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-10 22:15:45,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.35 | bwd_microstep: 1645.64 | bwd_inner_microstep: 1645.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968
[2024-06-10 22:15:46,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.72 | bwd_microstep: 703.53 | bwd_inner_microstep: 703.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2892
[2024-06-10 22:15:47,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.42 | bwd_microstep: 1025.40 | bwd_inner_microstep: 1025.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547
[2024-06-10 22:15:49,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.36 | bwd_microstep: 1329.71 | bwd_inner_microstep: 1329.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3565
[2024-06-10 22:15:51,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1423.13 | bwd_inner_microstep: 1423.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 22:15:53,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1512.05 | bwd_inner_microstep: 1512.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064
[2024-06-10 22:15:55,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-10 22:15:55,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.59 | bwd_microstep: 1958.15 | bwd_inner_microstep: 918.86 | bwd_allreduce_microstep: 1039.25 | step_microstep: 38.06
[2024-06-10 22:15:55,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16626.29 | bwd: 45617.26 | bwd_inner: 44577.12 | bwd_allreduce: 1039.48 | step: 39.51
{'loss': 1.1749, 'learning_rate': 7.434021590913573e-06, 'epoch': 0.72}
 72%|███████▏  | 1246/1726 [21:33:24<8:12:52, 61.61s/it]
 72%|███████▏  | 1247/1726 [21:34:26<8:13:23, 61.80s/it]
 72%|███████▏  | 1248/1726 [21:35:26<8:07:39, 61.21s/it]
 72%|███████▏  | 1249/1726 [21:36:27<8:06:29, 61.19s/it]
 72%|███████▏  | 1250/1726 [21:37:30<8:07:49, 61.49s/it]
 72%|███████▏  | 1251/1726 [21:38:32<8:09:23, 61.82s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 22:15:57,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.16 | bwd_microstep: 1471.14 | bwd_inner_microstep: 1471.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3968
[2024-06-10 22:15:59,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.70 | bwd_microstep: 1430.30 | bwd_inner_microstep: 1430.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3880
[2024-06-10 22:16:02,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.70 | bwd_microstep: 1517.41 | bwd_inner_microstep: 1517.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 22:16:03,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.83 | bwd_microstep: 1347.19 | bwd_inner_microstep: 1346.03 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511
[2024-06-10 22:16:05,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1321.32 | bwd_inner_microstep: 1321.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3524
[2024-06-10 22:16:07,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1228.39 | bwd_inner_microstep: 1228.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-10 22:16:09,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.58 | bwd_microstep: 1352.00 | bwd_inner_microstep: 1351.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 22:16:11,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1247.61 | bwd_inner_microstep: 1247.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-10 22:16:12,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.07 | bwd_microstep: 1155.02 | bwd_inner_microstep: 1155.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3500
[2024-06-10 22:16:14,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.34 | bwd_microstep: 1247.98 | bwd_inner_microstep: 1247.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 22:16:16,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.90 | bwd_microstep: 1259.74 | bwd_inner_microstep: 1259.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 22:16:17,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.58 | bwd_microstep: 1285.90 | bwd_inner_microstep: 1285.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3446
[2024-06-10 22:16:19,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1337.21 | bwd_inner_microstep: 1337.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 22:16:21,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1389.21 | bwd_inner_microstep: 1389.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 22:16:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1380.99 | bwd_inner_microstep: 1380.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2157
[2024-06-10 22:16:24,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.60 | bwd_microstep: 1044.36 | bwd_inner_microstep: 1044.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 22:16:27,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.97 | bwd_microstep: 1655.37 | bwd_inner_microstep: 1655.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3620
[2024-06-10 22:16:29,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.51 | bwd_microstep: 1648.38 | bwd_inner_microstep: 1648.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 22:16:31,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.38 | bwd_microstep: 1601.48 | bwd_inner_microstep: 1601.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-10 22:16:34,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.94 | bwd_microstep: 1710.72 | bwd_inner_microstep: 1710.57 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-10 22:16:35,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1285.66 | bwd_inner_microstep: 1285.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 22:16:37,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.26 | bwd_microstep: 1292.54 | bwd_inner_microstep: 1292.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 22:16:39,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1378.26 | bwd_inner_microstep: 1378.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514
[2024-06-10 22:16:41,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.63 | bwd_microstep: 1195.84 | bwd_inner_microstep: 1195.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 22:16:42,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.54 | bwd_microstep: 877.39 | bwd_inner_microstep: 877.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3460
[2024-06-10 22:16:44,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.90 | bwd_microstep: 1213.82 | bwd_inner_microstep: 1213.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3558
[2024-06-10 22:16:45,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.02 | bwd_microstep: 1207.74 | bwd_inner_microstep: 1207.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3804
[2024-06-10 22:16:48,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.04 | bwd_microstep: 1686.03 | bwd_inner_microstep: 1686.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2208
[2024-06-10 22:16:49,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.81 | bwd_microstep: 957.02 | bwd_inner_microstep: 956.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 22:16:51,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.62 | bwd_microstep: 1447.08 | bwd_inner_microstep: 1447.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3806
[2024-06-10 22:16:53,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.84 | bwd_microstep: 1604.31 | bwd_inner_microstep: 1604.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3769
[2024-06-10 22:16:59,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.44 | optimizer_gradients: 4.27 | optimizer_step: 6.59
[2024-06-10 22:16:59,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.32 | bwd_microstep: 4989.67 | bwd_inner_microstep: 2202.87 | bwd_allreduce_microstep: 2786.74 | step_microstep: 39.12
[2024-06-10 22:16:59,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16271.60 | bwd: 46767.10 | bwd_inner: 43979.23 | bwd_allreduce: 2787.07 | step: 40.78
{'loss': 1.1265, 'learning_rate': 7.404843364091951e-06, 'epoch': 0.73}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3467
[2024-06-10 22:17:01,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1514.02 | bwd_inner_microstep: 1514.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3920
[2024-06-10 22:17:03,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.70 | bwd_microstep: 1326.55 | bwd_inner_microstep: 1326.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798
[2024-06-10 22:17:05,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.68 | bwd_microstep: 1441.82 | bwd_inner_microstep: 1441.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1275
[2024-06-10 22:17:05,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 186.31 | bwd_microstep: 487.01 | bwd_inner_microstep: 486.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760
[2024-06-10 22:17:07,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.42 | bwd_microstep: 1432.42 | bwd_inner_microstep: 1432.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 22:17:09,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1381.44 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-10 22:17:11,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.23 | bwd_microstep: 1184.63 | bwd_inner_microstep: 1184.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904
[2024-06-10 22:17:12,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.58 | bwd_microstep: 682.98 | bwd_inner_microstep: 682.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3756
[2024-06-10 22:17:14,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.39 | bwd_microstep: 1644.03 | bwd_inner_microstep: 1644.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3486
[2024-06-10 22:17:16,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.22 | bwd_microstep: 1438.86 | bwd_inner_microstep: 1438.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2884
[2024-06-10 22:17:18,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.71 | bwd_microstep: 1122.37 | bwd_inner_microstep: 1122.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 22:17:20,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.03 | bwd_microstep: 1615.94 | bwd_inner_microstep: 1615.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1959
[2024-06-10 22:17:21,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.08 | bwd_microstep: 886.89 | bwd_inner_microstep: 886.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420
[2024-06-10 22:17:23,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1444.44 | bwd_inner_microstep: 1444.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3656
[2024-06-10 22:17:25,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.15 | bwd_microstep: 1443.55 | bwd_inner_microstep: 1443.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2141
[2024-06-10 22:17:26,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.29 | bwd_microstep: 834.87 | bwd_inner_microstep: 834.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1921
[2024-06-10 22:17:27,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.73 | bwd_microstep: 726.71 | bwd_inner_microstep: 726.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 22:17:29,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.59 | bwd_microstep: 1280.15 | bwd_inner_microstep: 1280.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 22:17:31,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 1553.94 | bwd_inner_microstep: 1553.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 22:17:33,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3528
[2024-06-10 22:17:35,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.38 | bwd_microstep: 1198.96 | bwd_inner_microstep: 1198.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 22:17:37,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1286.98 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-10 22:17:39,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1443.97 | bwd_inner_microstep: 1443.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 22:17:41,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.79 | bwd_microstep: 1506.03 | bwd_inner_microstep: 1506.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 22:17:42,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.14 | bwd_microstep: 1314.55 | bwd_inner_microstep: 1314.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 22:17:45,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.61 | bwd_microstep: 1653.56 | bwd_inner_microstep: 1653.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818
[2024-06-10 22:17:47,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.23 | bwd_microstep: 1581.23 | bwd_inner_microstep: 1581.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2929
[2024-06-10 22:17:49,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.78 | bwd_microstep: 1190.36 | bwd_inner_microstep: 1190.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 22:17:50,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1247.43 | bwd_inner_microstep: 1247.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 22:17:52,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.56 | bwd_microstep: 1548.44 | bwd_inner_microstep: 1548.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-10 22:17:55,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.86 | bwd_microstep: 1532.70 | bwd_inner_microstep: 1532.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3613
[2024-06-10 22:18:02,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-10 22:18:02,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.56 | bwd_microstep: 6509.04 | bwd_inner_microstep: 1541.82 | bwd_allreduce_microstep: 4967.16 | step_microstep: 38.00
[2024-06-10 22:18:02,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15657.37 | bwd: 46840.23 | bwd_inner: 41872.18 | bwd_allreduce: 4967.39 | step: 39.50
{'loss': 1.1938, 'learning_rate': 7.37570949734653e-06, 'epoch': 0.73}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 22:18:04,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.57 | bwd_microstep: 1460.87 | bwd_inner_microstep: 1460.68 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3919
[2024-06-10 22:18:06,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.33 | bwd_microstep: 1517.80 | bwd_inner_microstep: 1517.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3880
[2024-06-10 22:18:08,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.45 | bwd_microstep: 1579.30 | bwd_inner_microstep: 1579.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 22:18:10,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.83 | bwd_microstep: 1178.84 | bwd_inner_microstep: 1178.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3752
[2024-06-10 22:18:11,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1335.57 | bwd_inner_microstep: 1335.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-10 22:18:12,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.49 | bwd_microstep: 787.23 | bwd_inner_microstep: 787.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3480
[2024-06-10 22:18:14,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.07 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-10 22:18:16,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.59 | bwd_microstep: 1247.37 | bwd_inner_microstep: 1247.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-10 22:18:18,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.46 | bwd_microstep: 1628.96 | bwd_inner_microstep: 1628.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 22:18:20,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.15 | bwd_microstep: 1487.58 | bwd_inner_microstep: 1487.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 22:18:22,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1478.18 | bwd_inner_microstep: 1478.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3591
[2024-06-10 22:18:25,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.90 | bwd_microstep: 1702.48 | bwd_inner_microstep: 1702.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-10 22:18:26,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.71 | bwd_microstep: 1299.76 | bwd_inner_microstep: 1299.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2030
[2024-06-10 22:18:27,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.20 | bwd_microstep: 744.95 | bwd_inner_microstep: 744.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 22:18:29,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.93 | bwd_microstep: 1380.94 | bwd_inner_microstep: 1380.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 22:18:31,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.51 | bwd_microstep: 1347.97 | bwd_inner_microstep: 1347.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3734
[2024-06-10 22:18:33,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.59 | bwd_microstep: 1336.33 | bwd_inner_microstep: 1336.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 22:18:35,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1281.72 | bwd_inner_microstep: 1281.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3615
[2024-06-10 22:18:37,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.66 | bwd_microstep: 1342.68 | bwd_inner_microstep: 1342.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 22:18:39,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1509.47 | bwd_inner_microstep: 1509.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2005
[2024-06-10 22:18:40,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.87 | bwd_microstep: 838.19 | bwd_inner_microstep: 838.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3535
[2024-06-10 22:18:42,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1228.39 | bwd_inner_microstep: 1228.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-10 22:18:44,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1426.86 | bwd_inner_microstep: 1426.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463
[2024-06-10 22:18:45,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1314.28 | bwd_inner_microstep: 1314.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-10 22:18:47,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1341.93 | bwd_inner_microstep: 1341.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3543
[2024-06-10 22:18:49,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.62 | bwd_microstep: 1539.98 | bwd_inner_microstep: 1539.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3534
[2024-06-10 22:18:52,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.77 | bwd_microstep: 1586.77 | bwd_inner_microstep: 1586.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816
[2024-06-10 22:18:54,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.41 | bwd_microstep: 1756.13 | bwd_inner_microstep: 1756.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 22:18:56,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.50 | bwd_microstep: 1647.01 | bwd_inner_microstep: 1646.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2015
[2024-06-10 22:18:58,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.17 | bwd_microstep: 930.88 | bwd_inner_microstep: 930.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2538
[2024-06-10 22:18:59,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.47 | bwd_microstep: 1089.79 | bwd_inner_microstep: 1089.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-10 22:19:02,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.00 | optimizer_step: 6.57
[2024-06-10 22:19:02,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.94 | bwd_microstep: 2293.00 | bwd_inner_microstep: 1798.65 | bwd_allreduce_microstep: 494.30 | step_microstep: 37.28
[2024-06-10 22:19:02,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16188.30 | bwd: 43884.24 | bwd_inner: 43388.90 | bwd_allreduce: 494.61 | step: 38.98
{'loss': 1.1942, 'learning_rate': 7.3466200932866334e-06, 'epoch': 0.73}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 22:19:04,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.25 | bwd_microstep: 1374.48 | bwd_inner_microstep: 1374.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 22:19:06,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.64 | bwd_microstep: 1345.12 | bwd_inner_microstep: 1345.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 22:19:08,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.24 | bwd_microstep: 1482.10 | bwd_inner_microstep: 1482.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 22:19:10,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1246.18 | bwd_inner_microstep: 1246.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 22:19:11,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.44 | bwd_microstep: 1379.51 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775
[2024-06-10 22:19:14,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.85 | bwd_microstep: 1541.37 | bwd_inner_microstep: 1541.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 22:19:15,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1345.65 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 22:19:17,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1384.50 | bwd_inner_microstep: 1384.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1874
[2024-06-10 22:19:18,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.03 | bwd_microstep: 709.51 | bwd_inner_microstep: 709.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3404
[2024-06-10 22:19:20,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.86 | bwd_microstep: 1292.82 | bwd_inner_microstep: 1292.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2707
[2024-06-10 22:19:22,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.36 | bwd_microstep: 1208.05 | bwd_inner_microstep: 1208.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-10 22:19:24,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.66 | bwd_microstep: 1442.80 | bwd_inner_microstep: 1442.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3495
[2024-06-10 22:19:26,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.10 | bwd_microstep: 1548.21 | bwd_inner_microstep: 1548.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-10 22:19:28,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1510.61 | bwd_inner_microstep: 1510.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 22:19:30,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1390.25 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-10 22:19:31,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.68 | bwd_microstep: 685.18 | bwd_inner_microstep: 685.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 22:19:33,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1256.33 | bwd_inner_microstep: 1256.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-10 22:19:34,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.12 | bwd_microstep: 1311.29 | bwd_inner_microstep: 1311.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 22:19:37,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.21 | bwd_microstep: 1524.75 | bwd_inner_microstep: 1524.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-10 22:19:38,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1410.96 | bwd_inner_microstep: 1410.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043
[2024-06-10 22:19:40,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.66 | bwd_microstep: 807.66 | bwd_inner_microstep: 807.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 22:19:41,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1351.01 | bwd_inner_microstep: 1350.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 22:19:43,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1392.27 | bwd_inner_microstep: 1392.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-10 22:19:45,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.54 | bwd_microstep: 1529.90 | bwd_inner_microstep: 1529.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 22:19:48,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.40 | bwd_microstep: 1658.16 | bwd_inner_microstep: 1658.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 22:19:50,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.97 | bwd_microstep: 1415.68 | bwd_inner_microstep: 1415.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2651
[2024-06-10 22:19:51,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.69 | bwd_microstep: 1006.77 | bwd_inner_microstep: 1006.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 22:19:53,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.65 | bwd_microstep: 1348.42 | bwd_inner_microstep: 1348.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-10 22:19:55,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1390.98 | bwd_inner_microstep: 1390.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-10 22:19:57,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1496.94 | bwd_inner_microstep: 1496.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-10 22:19:59,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.93 | bwd_microstep: 1444.22 | bwd_inner_microstep: 1444.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3587
[2024-06-10 22:20:03,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 22:20:03,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.40 | bwd_microstep: 3754.59 | bwd_inner_microstep: 1729.07 | bwd_allreduce_microstep: 2025.47 | step_microstep: 37.94
[2024-06-10 22:20:03,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15986.16 | bwd: 44986.27 | bwd_inner: 42959.90 | bwd_allreduce: 2025.69 | step: 39.36
{'loss': 1.2394, 'learning_rate': 7.31757525436499e-06, 'epoch': 0.73}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 22:20:05,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1365.57 | bwd_inner_microstep: 1365.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 22:20:07,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1376.06 | bwd_inner_microstep: 1376.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 22:20:09,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.68 | bwd_microstep: 1275.14 | bwd_inner_microstep: 1275.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3877
[2024-06-10 22:20:11,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1481.01 | bwd_inner_microstep: 1480.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 22:20:12,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.35 | bwd_microstep: 675.62 | bwd_inner_microstep: 675.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-10 22:20:13,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.85 | bwd_microstep: 708.66 | bwd_inner_microstep: 708.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-10 22:20:15,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1499.55 | bwd_inner_microstep: 1499.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-10 22:20:17,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1345.62 | bwd_inner_microstep: 1345.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4084
[2024-06-10 22:20:19,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.62 | bwd_microstep: 1555.25 | bwd_inner_microstep: 1555.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 22:20:21,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 22:20:23,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1285.42 | bwd_inner_microstep: 1285.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 22:20:25,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1388.74 | bwd_inner_microstep: 1388.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 22:20:26,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.53 | bwd_microstep: 1287.54 | bwd_inner_microstep: 1287.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-10 22:20:28,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.20 | bwd_microstep: 893.35 | bwd_inner_microstep: 893.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2393
[2024-06-10 22:20:29,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.94 | bwd_microstep: 1028.44 | bwd_inner_microstep: 1028.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 22:20:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.89 | bwd_microstep: 1484.54 | bwd_inner_microstep: 1484.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3645
[2024-06-10 22:20:33,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.27 | bwd_microstep: 1647.98 | bwd_inner_microstep: 1647.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 22:20:35,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.70 | bwd_microstep: 1286.93 | bwd_inner_microstep: 1286.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-10 22:20:37,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1289.66 | bwd_inner_microstep: 1289.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 22:20:39,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.28 | bwd_microstep: 1516.46 | bwd_inner_microstep: 1516.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-10 22:20:41,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 1494.28 | bwd_inner_microstep: 1494.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472
[2024-06-10 22:20:43,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.71 | bwd_microstep: 1405.71 | bwd_inner_microstep: 1405.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2060
[2024-06-10 22:20:44,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.79 | bwd_microstep: 916.69 | bwd_inner_microstep: 916.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941
[2024-06-10 22:20:45,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.47 | bwd_microstep: 731.74 | bwd_inner_microstep: 731.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 22:20:47,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.20 | bwd_microstep: 1495.42 | bwd_inner_microstep: 1495.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 22:20:49,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1293.24 | bwd_inner_microstep: 1293.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3552
[2024-06-10 22:20:51,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1358.48 | bwd_inner_microstep: 1358.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 22:20:53,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.28 | bwd_microstep: 1505.41 | bwd_inner_microstep: 1505.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 22:20:55,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.42 | bwd_microstep: 1550.83 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-10 22:20:57,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1432.70 | bwd_inner_microstep: 1432.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054
[2024-06-10 22:20:58,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.63 | bwd_microstep: 911.38 | bwd_inner_microstep: 911.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 22:21:04,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.15 | optimizer_step: 6.57
[2024-06-10 22:21:04,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.64 | bwd_microstep: 4649.35 | bwd_inner_microstep: 1639.51 | bwd_allreduce_microstep: 3009.78 | step_microstep: 39.27
[2024-06-10 22:21:04,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15498.41 | bwd: 44526.91 | bwd_inner: 41516.19 | bwd_allreduce: 3010.01 | step: 40.79
{'loss': 1.1534, 'learning_rate': 7.2885750828773694e-06, 'epoch': 0.73}
███▏  | 1251/1726 [21:38:32<8:09:23, 61.82s/it]
 73%|███████▎  | 1252/1726 [21:39:36<8:12:04, 62.29s/it]
 73%|███████▎  | 1253/1726 [21:40:38<8:12:17, 62.45s/it]
 73%|███████▎  | 1254/1726 [21:41:39<8:06:27, 61.84s/it]
 73%|███████▎  | 1255/1726 [21:42:40<8:04:10, 61.68s/it]
 73%|███████▎  | 1256/1726 [21:43:40<8:00:02, 61.28s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 22:21:06,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1343.59 | bwd_inner_microstep: 1343.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 22:21:07,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.15 | bwd_microstep: 1241.29 | bwd_inner_microstep: 1241.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 22:21:09,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1342.33 | bwd_inner_microstep: 1342.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3799
[2024-06-10 22:21:11,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.21 | bwd_microstep: 1256.27 | bwd_inner_microstep: 1256.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759
[2024-06-10 22:21:13,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.55 | bwd_microstep: 1638.82 | bwd_inner_microstep: 1638.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 22:21:15,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.75 | bwd_microstep: 1384.38 | bwd_inner_microstep: 1384.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 22:21:17,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1286.47 | bwd_inner_microstep: 1286.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 22:21:19,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1382.81 | bwd_inner_microstep: 1382.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 22:21:21,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.54 | bwd_microstep: 1379.66 | bwd_inner_microstep: 1379.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3628
[2024-06-10 22:21:23,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.57 | bwd_microstep: 1564.70 | bwd_inner_microstep: 1564.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 22:21:25,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1473.09 | bwd_inner_microstep: 1473.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3677
[2024-06-10 22:21:27,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.99 | bwd_microstep: 1719.70 | bwd_inner_microstep: 1719.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 22:21:29,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.08 | bwd_microstep: 1343.16 | bwd_inner_microstep: 1343.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454
[2024-06-10 22:21:31,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.71 | bwd_microstep: 1313.87 | bwd_inner_microstep: 1313.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3672
[2024-06-10 22:21:33,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1375.34 | bwd_inner_microstep: 1375.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3522
[2024-06-10 22:21:35,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1458.67 | bwd_inner_microstep: 1458.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-10 22:21:37,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.49 | bwd_microstep: 1409.01 | bwd_inner_microstep: 1408.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3529
[2024-06-10 22:21:38,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.16 | bwd_microstep: 1196.37 | bwd_inner_microstep: 1196.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 22:21:40,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.65 | bwd_microstep: 1280.50 | bwd_inner_microstep: 1280.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 22:21:42,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1382.82 | bwd_inner_microstep: 1382.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 22:21:44,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.79 | bwd_microstep: 1285.46 | bwd_inner_microstep: 1285.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 22:21:46,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1256.47 | bwd_inner_microstep: 1256.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2177
[2024-06-10 22:21:47,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.83 | bwd_microstep: 857.61 | bwd_inner_microstep: 857.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3806
[2024-06-10 22:21:49,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.63 | bwd_microstep: 1290.28 | bwd_inner_microstep: 1290.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-10 22:21:50,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.70 | bwd_microstep: 1299.15 | bwd_inner_microstep: 1299.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-10 22:21:52,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.88 | bwd_microstep: 1307.02 | bwd_inner_microstep: 1306.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2060
[2024-06-10 22:21:53,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.84 | bwd_microstep: 850.75 | bwd_inner_microstep: 850.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-10 22:21:55,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1389.18 | bwd_inner_microstep: 1389.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2171
[2024-06-10 22:21:57,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.54 | bwd_microstep: 1012.64 | bwd_inner_microstep: 1012.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3815
[2024-06-10 22:21:59,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.44 | bwd_microstep: 1817.88 | bwd_inner_microstep: 1817.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 22:22:01,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1603.45 | bwd_inner_microstep: 1603.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-10 22:22:04,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.13 | optimizer_step: 6.64
[2024-06-10 22:22:04,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.96 | bwd_microstep: 1700.85 | bwd_inner_microstep: 1684.68 | bwd_allreduce_microstep: 16.12 | step_microstep: 39.14
[2024-06-10 22:22:04,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16253.62 | bwd: 43443.61 | bwd_inner: 43426.58 | bwd_allreduce: 16.35 | step: 40.71
{'loss': 1.2284, 'learning_rate': 7.259619680962222e-06, 'epoch': 0.73}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 22:22:06,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.16 | bwd_microstep: 1379.99 | bwd_inner_microstep: 1379.89 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893
[2024-06-10 22:22:07,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.52 | bwd_microstep: 712.96 | bwd_inner_microstep: 712.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3894
[2024-06-10 22:22:09,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1587.26 | bwd_inner_microstep: 1587.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4208
[2024-06-10 22:22:11,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.19 | bwd_microstep: 1563.07 | bwd_inner_microstep: 1563.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776
[2024-06-10 22:22:13,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.53 | bwd_microstep: 1438.94 | bwd_inner_microstep: 1438.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-10 22:22:15,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.45 | bwd_microstep: 1284.05 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 22:22:17,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.42 | bwd_microstep: 1654.56 | bwd_inner_microstep: 1654.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3613
[2024-06-10 22:22:19,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.36 | bwd_microstep: 1217.77 | bwd_inner_microstep: 1217.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3416
[2024-06-10 22:22:20,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.50 | bwd_microstep: 1214.21 | bwd_inner_microstep: 1214.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 22:22:22,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1510.88 | bwd_inner_microstep: 1510.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3678
[2024-06-10 22:22:24,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.43 | bwd_microstep: 1366.83 | bwd_inner_microstep: 1366.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3504
[2024-06-10 22:22:26,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1399.24 | bwd_inner_microstep: 1399.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-10 22:22:28,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.23 | bwd_microstep: 1338.10 | bwd_inner_microstep: 1338.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 22:22:30,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1339.25 | bwd_inner_microstep: 1339.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2107
[2024-06-10 22:22:31,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.47 | bwd_microstep: 1018.19 | bwd_inner_microstep: 1018.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 22:22:33,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.18 | bwd_microstep: 1343.48 | bwd_inner_microstep: 1343.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3638
[2024-06-10 22:22:35,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.08 | bwd_microstep: 1571.79 | bwd_inner_microstep: 1571.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3508
[2024-06-10 22:22:38,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.87 | bwd_microstep: 1557.90 | bwd_inner_microstep: 1557.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-10 22:22:40,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1428.18 | bwd_inner_microstep: 1428.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1984
[2024-06-10 22:22:41,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.65 | bwd_microstep: 734.51 | bwd_inner_microstep: 734.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 22:22:42,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.43 | bwd_microstep: 1289.68 | bwd_inner_microstep: 1289.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 22:22:44,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1396.21 | bwd_inner_microstep: 1396.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 22:22:45,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.13 | bwd_microstep: 797.87 | bwd_inner_microstep: 797.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-10 22:22:47,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.26 | bwd_microstep: 1296.12 | bwd_inner_microstep: 1296.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076
[2024-06-10 22:22:48,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.83 | bwd_microstep: 917.71 | bwd_inner_microstep: 917.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 22:22:50,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.17 | bwd_microstep: 1304.01 | bwd_inner_microstep: 1303.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2032
[2024-06-10 22:22:51,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.84 | bwd_microstep: 839.41 | bwd_inner_microstep: 839.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-10 22:22:53,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.01 | bwd_microstep: 1451.76 | bwd_inner_microstep: 1451.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2054
[2024-06-10 22:22:55,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.24 | bwd_microstep: 874.51 | bwd_inner_microstep: 874.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-10 22:22:56,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.70 | bwd_microstep: 808.03 | bwd_inner_microstep: 808.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2990
[2024-06-10 22:22:57,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.69 | bwd_microstep: 1139.93 | bwd_inner_microstep: 1139.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 22:23:06,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-10 22:23:06,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.02 | bwd_microstep: 8170.92 | bwd_inner_microstep: 1741.20 | bwd_allreduce_microstep: 6429.66 | step_microstep: 38.11
[2024-06-10 22:23:06,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15121.65 | bwd: 46947.33 | bwd_inner: 40516.67 | bwd_allreduce: 6429.94 | step: 39.64
{'loss': 1.211, 'learning_rate': 7.2307091506003325e-06, 'epoch': 0.73}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 22:23:08,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1336.47 | bwd_inner_microstep: 1336.39 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 3125
[2024-06-10 22:23:09,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.57 | bwd_microstep: 1014.15 | bwd_inner_microstep: 1014.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2321
[2024-06-10 22:23:11,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.02 | bwd_microstep: 882.39 | bwd_inner_microstep: 882.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3911
[2024-06-10 22:23:13,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.30 | bwd_microstep: 1587.41 | bwd_inner_microstep: 1587.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-10 22:23:15,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.11 | bwd_microstep: 1350.45 | bwd_inner_microstep: 1350.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 22:23:17,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.82 | bwd_microstep: 1376.21 | bwd_inner_microstep: 1376.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 22:23:18,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.66 | bwd_microstep: 1149.92 | bwd_inner_microstep: 1149.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 749
[2024-06-10 22:23:19,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.54 | bwd_microstep: 301.01 | bwd_inner_microstep: 300.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-10 22:23:20,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.03 | bwd_microstep: 793.26 | bwd_inner_microstep: 793.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 22:23:21,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.67 | bwd_microstep: 795.30 | bwd_inner_microstep: 795.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 22:23:23,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.67 | bwd_microstep: 1388.05 | bwd_inner_microstep: 1388.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3492
[2024-06-10 22:23:25,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.88 | bwd_microstep: 1329.65 | bwd_inner_microstep: 1329.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1934
[2024-06-10 22:23:26,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.17 | bwd_microstep: 820.78 | bwd_inner_microstep: 820.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-10 22:23:28,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1493.17 | bwd_inner_microstep: 1493.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 22:23:30,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.25 | bwd_microstep: 1473.94 | bwd_inner_microstep: 1473.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472
[2024-06-10 22:23:32,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.94 | bwd_microstep: 1344.28 | bwd_inner_microstep: 1344.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3430
[2024-06-10 22:23:34,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1393.43 | bwd_inner_microstep: 1393.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 934
[2024-06-10 22:23:34,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.93 | bwd_microstep: 378.23 | bwd_inner_microstep: 378.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3634
[2024-06-10 22:23:36,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1343.88 | bwd_inner_microstep: 1343.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828
[2024-06-10 22:23:38,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.56 | bwd_microstep: 1482.13 | bwd_inner_microstep: 1482.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3619
[2024-06-10 22:23:40,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.71 | bwd_microstep: 1453.83 | bwd_inner_microstep: 1453.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055
[2024-06-10 22:23:41,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.53 | bwd_microstep: 909.89 | bwd_inner_microstep: 909.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-10 22:23:43,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.82 | bwd_microstep: 1158.00 | bwd_inner_microstep: 1157.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589
[2024-06-10 22:23:45,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.05 | bwd_microstep: 1438.02 | bwd_inner_microstep: 1437.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-10 22:23:47,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.06 | bwd_microstep: 1297.60 | bwd_inner_microstep: 1297.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871
[2024-06-10 22:23:49,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.47 | bwd_microstep: 1662.58 | bwd_inner_microstep: 1662.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1991
[2024-06-10 22:23:50,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.42 | bwd_microstep: 833.47 | bwd_inner_microstep: 833.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 22:23:52,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.05 | bwd_microstep: 1537.89 | bwd_inner_microstep: 1537.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3622
[2024-06-10 22:23:54,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.46 | bwd_microstep: 1537.25 | bwd_inner_microstep: 1537.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3805
[2024-06-10 22:23:57,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.44 | bwd_microstep: 1685.32 | bwd_inner_microstep: 1685.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-10 22:23:58,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1315.93 | bwd_inner_microstep: 1315.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085
[2024-06-10 22:24:08,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.29 | optimizer_step: 6.60
[2024-06-10 22:24:08,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.15 | bwd_microstep: 9094.04 | bwd_inner_microstep: 972.01 | bwd_allreduce_microstep: 8121.97 | step_microstep: 39.28
[2024-06-10 22:24:08,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14527.09 | bwd: 46957.98 | bwd_inner: 38835.04 | bwd_allreduce: 8122.24 | step: 40.82
{'loss': 1.2119, 'learning_rate': 7.201843593614428e-06, 'epoch': 0.73}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 22:24:10,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.01 | bwd_microstep: 1330.46 | bwd_inner_microstep: 1330.30 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 22:24:12,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1478.06 | bwd_inner_microstep: 1478.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 22:24:14,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.33 | bwd_microstep: 1470.81 | bwd_inner_microstep: 1470.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 22:24:16,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.38 | bwd_microstep: 1540.44 | bwd_inner_microstep: 1540.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 22:24:18,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1242.35 | bwd_inner_microstep: 1242.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 22:24:20,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1381.08 | bwd_inner_microstep: 1381.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750
[2024-06-10 22:24:22,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.26 | bwd_microstep: 1538.31 | bwd_inner_microstep: 1538.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-10 22:24:23,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.45 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3435
[2024-06-10 22:24:25,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.97 | bwd_microstep: 1298.97 | bwd_inner_microstep: 1298.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-10 22:24:27,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.57 | bwd_microstep: 1189.31 | bwd_inner_microstep: 1189.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3995
[2024-06-10 22:24:29,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.91 | bwd_microstep: 1708.02 | bwd_inner_microstep: 1707.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-10 22:24:31,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.50 | bwd_microstep: 1251.47 | bwd_inner_microstep: 1251.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 22:24:33,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1478.69 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-10 22:24:35,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.17 | bwd_microstep: 1428.58 | bwd_inner_microstep: 1428.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-10 22:24:37,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1544.21 | bwd_inner_microstep: 1544.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3643
[2024-06-10 22:24:39,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.46 | bwd_microstep: 1448.17 | bwd_inner_microstep: 1448.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446
[2024-06-10 22:24:41,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.91 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 22:24:43,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1351.50 | bwd_inner_microstep: 1351.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-10 22:24:45,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.34 | bwd_microstep: 1293.43 | bwd_inner_microstep: 1293.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3632
[2024-06-10 22:24:47,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.14 | bwd_microstep: 1343.10 | bwd_inner_microstep: 1343.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 22:24:49,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.61 | bwd_microstep: 1529.69 | bwd_inner_microstep: 1529.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535
[2024-06-10 22:24:51,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.78 | bwd_microstep: 1417.08 | bwd_inner_microstep: 1417.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1976
[2024-06-10 22:24:52,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.77 | bwd_microstep: 829.49 | bwd_inner_microstep: 829.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2187
[2024-06-10 22:24:53,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.64 | bwd_microstep: 956.20 | bwd_inner_microstep: 956.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3676
[2024-06-10 22:24:55,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.27 | bwd_microstep: 1552.85 | bwd_inner_microstep: 1552.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 22:24:57,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.62 | bwd_microstep: 1279.72 | bwd_inner_microstep: 1279.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 22:24:59,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1501.31 | bwd_inner_microstep: 1501.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-10 22:25:01,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.10 | bwd_microstep: 1317.29 | bwd_inner_microstep: 1317.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-10 22:25:03,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.85 | bwd_microstep: 1508.64 | bwd_inner_microstep: 1508.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588
[2024-06-10 22:25:05,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.69 | bwd_microstep: 1703.74 | bwd_inner_microstep: 1703.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3433
[2024-06-10 22:25:07,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.47 | bwd_microstep: 1374.25 | bwd_inner_microstep: 1374.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 22:25:10,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-10 22:25:10,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.37 | bwd_microstep: 1983.47 | bwd_inner_microstep: 1686.42 | bwd_allreduce_microstep: 296.99 | step_microstep: 38.70
[2024-06-10 22:25:10,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16649.40 | bwd: 44929.71 | bwd_inner: 44631.68 | bwd_allreduce: 297.29 | step: 40.29
{'loss': 1.1763, 'learning_rate': 7.173023111668868e-06, 'epoch': 0.73}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461
[2024-06-10 22:25:12,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.60 | bwd_microstep: 1571.71 | bwd_inner_microstep: 1571.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 22:25:14,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1242.59 | bwd_inner_microstep: 1242.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-10 22:25:16,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.17 | bwd_microstep: 1653.85 | bwd_inner_microstep: 1653.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-10 22:25:18,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.64 | bwd_microstep: 1437.24 | bwd_inner_microstep: 1437.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 22:25:20,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1280.32 | bwd_inner_microstep: 1280.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 22:25:22,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1384.86 | bwd_inner_microstep: 1384.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 22:25:23,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.11 | bwd_microstep: 1281.22 | bwd_inner_microstep: 1281.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3751
[2024-06-10 22:25:26,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.62 | bwd_microstep: 1504.75 | bwd_inner_microstep: 1504.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 22:25:27,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1386.80 | bwd_inner_microstep: 1386.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 22:25:29,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.22 | bwd_microstep: 1288.84 | bwd_inner_microstep: 1288.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3490
[2024-06-10 22:25:31,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.37 | bwd_microstep: 1507.74 | bwd_inner_microstep: 1507.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-10 22:25:34,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.38 | bwd_microstep: 1577.01 | bwd_inner_microstep: 1576.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3441
[2024-06-10 22:25:35,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1397.92 | bwd_inner_microstep: 1397.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2651
[2024-06-10 22:25:37,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.22 | bwd_microstep: 1212.26 | bwd_inner_microstep: 1212.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 22:25:39,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.22 | bwd_microstep: 1250.31 | bwd_inner_microstep: 1250.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 22:25:41,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.65 | bwd_microstep: 1318.42 | bwd_inner_microstep: 1318.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3442
[2024-06-10 22:25:42,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.96 | bwd_microstep: 1186.96 | bwd_inner_microstep: 1186.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3504
[2024-06-10 22:25:44,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.92 | bwd_microstep: 1222.82 | bwd_inner_microstep: 1222.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 22:25:46,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1509.79 | bwd_inner_microstep: 1509.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 22:25:48,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.78 | bwd_microstep: 1259.44 | bwd_inner_microstep: 1259.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-10 22:25:50,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.92 | bwd_microstep: 1558.26 | bwd_inner_microstep: 1558.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-10 22:25:52,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.83 | bwd_microstep: 1185.52 | bwd_inner_microstep: 1185.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 22:25:53,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1285.17 | bwd_inner_microstep: 1285.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 22:25:55,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.86 | bwd_microstep: 1388.88 | bwd_inner_microstep: 1388.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 22:25:57,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1556.55 | bwd_inner_microstep: 1556.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3833
[2024-06-10 22:26:00,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.90 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-10 22:26:02,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.49 | bwd_microstep: 1451.95 | bwd_inner_microstep: 1451.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 22:26:04,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1489.29 | bwd_inner_microstep: 1489.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3550
[2024-06-10 22:26:06,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.88 | bwd_microstep: 1421.07 | bwd_inner_microstep: 1421.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3570
[2024-06-10 22:26:08,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.78 | bwd_microstep: 1695.89 | bwd_inner_microstep: 1695.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 22:26:10,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.99 | bwd_microstep: 1646.38 | bwd_inner_microstep: 1646.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3388
[2024-06-10 22:26:14,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-10 22:26:14,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.91 | bwd_microstep: 3103.43 | bwd_inner_microstep: 1556.34 | bwd_allreduce_microstep: 1547.03 | step_microstep: 39.43
[2024-06-10 22:26:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16847.21 | bwd: 46746.39 | bwd_inner: 45198.45 | bwd_allreduce: 1547.26 | step: 41.26
{'loss': 1.2022, 'learning_rate': 7.1442478062692135e-06, 'epoch': 0.73}
 73%|███████▎  | 1256/1726 [21:43:40<8:00:02, 61.28s/it]
 73%|███████▎  | 1257/1726 [21:44:40<7:56:06, 60.91s/it]
 73%|███████▎  | 1258/1726 [21:45:43<7:58:34, 61.36s/it]
 73%|███████▎  | 1259/1726 [21:46:45<7:58:38, 61.49s/it]
 73%|███████▎  | 1260/1726 [21:47:47<7:58:36, 61.62s/it]
 73%|███████▎  | 1261/1726 [21:48:51<8:02:59, 62.32s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482
[2024-06-10 22:26:16,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.06 | bwd_microstep: 1571.04 | bwd_inner_microstep: 1571.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3923
[2024-06-10 22:26:18,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1391.85 | bwd_inner_microstep: 1391.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3879
[2024-06-10 22:26:20,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1586.65 | bwd_inner_microstep: 1586.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3902
[2024-06-10 22:26:22,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1585.21 | bwd_inner_microstep: 1585.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 22:26:24,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.80 | bwd_microstep: 1245.68 | bwd_inner_microstep: 1245.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2924
[2024-06-10 22:26:26,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.25 | bwd_microstep: 1188.54 | bwd_inner_microstep: 1188.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-10 22:26:27,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.25 | bwd_microstep: 1301.32 | bwd_inner_microstep: 1301.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 22:26:29,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.24 | bwd_microstep: 1403.50 | bwd_inner_microstep: 1403.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 22:26:31,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.13 | bwd_microstep: 1484.16 | bwd_inner_microstep: 1484.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 22:26:33,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.45 | bwd_microstep: 1194.01 | bwd_inner_microstep: 1193.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3497
[2024-06-10 22:26:35,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1446.55 | bwd_inner_microstep: 1446.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 22:26:37,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1485.98 | bwd_inner_microstep: 1485.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 22:26:39,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1345.38 | bwd_inner_microstep: 1345.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 22:26:41,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.41 | bwd_microstep: 1603.56 | bwd_inner_microstep: 1603.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-10 22:26:43,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1412.73 | bwd_inner_microstep: 1412.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3544
[2024-06-10 22:26:45,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.53 | bwd_microstep: 1234.11 | bwd_inner_microstep: 1234.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-10 22:26:47,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1409.71 | bwd_inner_microstep: 1409.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 22:26:49,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1401.23 | bwd_inner_microstep: 1401.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-10 22:26:51,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1356.16 | bwd_inner_microstep: 1356.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3742
[2024-06-10 22:26:52,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.41 | bwd_microstep: 1341.26 | bwd_inner_microstep: 1341.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 22:26:54,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1389.52 | bwd_inner_microstep: 1389.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-10 22:26:56,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1412.44 | bwd_inner_microstep: 1412.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-10 22:26:58,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.29 | bwd_microstep: 1182.77 | bwd_inner_microstep: 1182.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3533
[2024-06-10 22:27:00,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.90 | bwd_microstep: 1201.15 | bwd_inner_microstep: 1201.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3575
[2024-06-10 22:27:02,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.15 | bwd_microstep: 1457.97 | bwd_inner_microstep: 1457.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3920
[2024-06-10 22:27:04,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.94 | bwd_microstep: 1333.62 | bwd_inner_microstep: 1333.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2684
[2024-06-10 22:27:05,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.89 | bwd_microstep: 1220.05 | bwd_inner_microstep: 1220.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 22:27:07,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1457.17 | bwd_inner_microstep: 1457.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3850
[2024-06-10 22:27:09,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1491.34 | bwd_inner_microstep: 1491.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467
[2024-06-10 22:27:11,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.83 | bwd_microstep: 1441.30 | bwd_inner_microstep: 1441.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-10 22:27:13,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.12 | bwd_microstep: 1407.16 | bwd_inner_microstep: 1407.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2890
[2024-06-10 22:27:17,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.32 | optimizer_step: 6.60
[2024-06-10 22:27:17,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.64 | bwd_microstep: 3054.72 | bwd_inner_microstep: 1127.58 | bwd_allreduce_microstep: 1927.08 | step_microstep: 39.11
[2024-06-10 22:27:17,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16521.24 | bwd: 46037.86 | bwd_inner: 44109.85 | bwd_allreduce: 1927.31 | step: 40.61
{'loss': 1.19, 'learning_rate': 7.115517778761963e-06, 'epoch': 0.73}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512
[2024-06-10 22:27:19,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.72 | bwd_microstep: 1412.52 | bwd_inner_microstep: 1412.44 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2893
[2024-06-10 22:27:20,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.69 | bwd_microstep: 998.90 | bwd_inner_microstep: 998.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 22:27:22,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1341.80 | bwd_inner_microstep: 1341.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795
[2024-06-10 22:27:24,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.21 | bwd_microstep: 1545.03 | bwd_inner_microstep: 1545.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 22:27:26,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1379.93 | bwd_inner_microstep: 1379.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 22:27:28,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.29 | bwd_microstep: 1341.49 | bwd_inner_microstep: 1341.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-10 22:27:30,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.23 | bwd_microstep: 1538.54 | bwd_inner_microstep: 1538.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-10 22:27:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.13 | bwd_microstep: 1188.45 | bwd_inner_microstep: 1188.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710
[2024-06-10 22:27:34,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1526.85 | bwd_inner_microstep: 1526.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-10 22:27:36,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.26 | bwd_microstep: 1491.32 | bwd_inner_microstep: 1491.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1996
[2024-06-10 22:27:37,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.18 | bwd_microstep: 830.16 | bwd_inner_microstep: 830.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689
[2024-06-10 22:27:39,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.49 | bwd_microstep: 1519.71 | bwd_inner_microstep: 1519.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3521
[2024-06-10 22:27:41,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1303.63 | bwd_inner_microstep: 1303.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3507
[2024-06-10 22:27:43,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.35 | bwd_microstep: 1336.87 | bwd_inner_microstep: 1336.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3785
[2024-06-10 22:27:45,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.51 | bwd_microstep: 1353.29 | bwd_inner_microstep: 1353.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-10 22:27:46,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.90 | bwd_microstep: 1342.07 | bwd_inner_microstep: 1342.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 22:27:48,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.69 | bwd_microstep: 1380.23 | bwd_inner_microstep: 1380.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-10 22:27:51,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.32 | bwd_microstep: 1658.55 | bwd_inner_microstep: 1658.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 22:27:53,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.33 | bwd_microstep: 1414.83 | bwd_inner_microstep: 1414.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 22:27:54,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.60 | bwd_microstep: 1286.26 | bwd_inner_microstep: 1286.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719
[2024-06-10 22:27:57,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.71 | bwd_microstep: 1634.63 | bwd_inner_microstep: 1634.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 22:27:58,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1399.40 | bwd_inner_microstep: 1399.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3807
[2024-06-10 22:28:01,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1479.12 | bwd_inner_microstep: 1479.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 22:28:03,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.18 | bwd_microstep: 1459.03 | bwd_inner_microstep: 1459.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 22:28:04,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.92 | bwd_microstep: 1375.39 | bwd_inner_microstep: 1375.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3477
[2024-06-10 22:28:07,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.64 | bwd_microstep: 2056.60 | bwd_inner_microstep: 2056.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-10 22:28:09,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.08 | bwd_microstep: 1351.04 | bwd_inner_microstep: 1351.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-10 22:28:11,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.57 | bwd_microstep: 1472.96 | bwd_inner_microstep: 1472.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 22:28:13,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.43 | bwd_microstep: 1593.62 | bwd_inner_microstep: 1593.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-10 22:28:15,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1448.83 | bwd_inner_microstep: 1448.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-10 22:28:17,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.71 | bwd_microstep: 1643.63 | bwd_inner_microstep: 1643.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3430
[2024-06-10 22:28:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-10 22:28:19,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.87 | bwd_microstep: 1513.37 | bwd_inner_microstep: 1505.63 | bwd_allreduce_microstep: 7.70 | step_microstep: 38.14
[2024-06-10 22:28:19,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16745.85 | bwd: 45618.09 | bwd_inner: 45609.43 | bwd_allreduce: 7.96 | step: 39.74
{'loss': 1.2217, 'learning_rate': 7.086833130334107e-06, 'epoch': 0.73}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471
[2024-06-10 22:28:21,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.04 | bwd_microstep: 1430.76 | bwd_inner_microstep: 1430.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 22:28:23,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.02 | bwd_microstep: 1476.87 | bwd_inner_microstep: 1476.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 22:28:26,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1548.26 | bwd_inner_microstep: 1548.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954
[2024-06-10 22:28:27,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.02 | bwd_microstep: 732.12 | bwd_inner_microstep: 732.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 22:28:28,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.14 | bwd_microstep: 1155.38 | bwd_inner_microstep: 1155.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 22:28:29,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 791.77 | bwd_inner_microstep: 791.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 22:28:31,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.41 | bwd_microstep: 1244.46 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 22:28:33,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1383.77 | bwd_inner_microstep: 1383.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1990
[2024-06-10 22:28:34,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.27 | bwd_microstep: 846.56 | bwd_inner_microstep: 846.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3516
[2024-06-10 22:28:36,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.16 | bwd_microstep: 1433.45 | bwd_inner_microstep: 1433.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669
[2024-06-10 22:28:38,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.49 | bwd_microstep: 1617.54 | bwd_inner_microstep: 1617.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649
[2024-06-10 22:28:40,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.08 | bwd_microstep: 1415.73 | bwd_inner_microstep: 1415.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3484
[2024-06-10 22:28:42,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.12 | bwd_microstep: 1506.59 | bwd_inner_microstep: 1506.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3406
[2024-06-10 22:28:44,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.72 | bwd_microstep: 1370.65 | bwd_inner_microstep: 1370.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 22:28:46,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1556.14 | bwd_inner_microstep: 1556.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 22:28:48,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.94 | bwd_microstep: 1374.49 | bwd_inner_microstep: 1374.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 22:28:50,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.42 | bwd_microstep: 1280.47 | bwd_inner_microstep: 1280.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 22:28:52,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1281.35 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 22:28:54,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.70 | bwd_microstep: 1282.16 | bwd_inner_microstep: 1282.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 22:28:55,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1277.98 | bwd_inner_microstep: 1277.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-10 22:28:56,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.42 | bwd_microstep: 710.28 | bwd_inner_microstep: 710.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-10 22:28:59,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.62 | bwd_microstep: 1659.92 | bwd_inner_microstep: 1659.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2290
[2024-06-10 22:29:00,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.13 | bwd_microstep: 855.30 | bwd_inner_microstep: 855.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 22:29:02,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 22:29:04,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1329.26 | bwd_inner_microstep: 1329.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-10 22:29:06,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.31 | bwd_microstep: 1405.45 | bwd_inner_microstep: 1405.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3560
[2024-06-10 22:29:07,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.08 | bwd_microstep: 1204.45 | bwd_inner_microstep: 1204.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812
[2024-06-10 22:29:10,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1517.67 | bwd_inner_microstep: 1517.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3568
[2024-06-10 22:29:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.78 | bwd_microstep: 1554.77 | bwd_inner_microstep: 1554.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3420
[2024-06-10 22:29:14,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.33 | bwd_microstep: 1374.08 | bwd_inner_microstep: 1374.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-10 22:29:16,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.97 | bwd_microstep: 1639.62 | bwd_inner_microstep: 1639.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-10 22:29:21,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.09 | optimizer_step: 6.61
[2024-06-10 22:29:21,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.63 | bwd_microstep: 4976.50 | bwd_inner_microstep: 1861.69 | bwd_allreduce_microstep: 3114.76 | step_microstep: 38.03
[2024-06-10 22:29:21,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15929.42 | bwd: 45790.13 | bwd_inner: 42674.47 | bwd_allreduce: 3114.99 | step: 39.56
{'loss': 1.1961, 'learning_rate': 7.0581939620128515e-06, 'epoch': 0.73}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 22:29:23,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.67 | bwd_microstep: 1334.06 | bwd_inner_microstep: 1334.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3907
[2024-06-10 22:29:26,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.16 | bwd_microstep: 1691.97 | bwd_inner_microstep: 1691.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3905
[2024-06-10 22:29:28,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1390.24 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3786
[2024-06-10 22:29:29,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1344.87 | bwd_inner_microstep: 1344.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-10 22:29:31,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1453.44 | bwd_inner_microstep: 1453.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743
[2024-06-10 22:29:34,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1532.21 | bwd_inner_microstep: 1532.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1941
[2024-06-10 22:29:35,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.58 | bwd_microstep: 823.92 | bwd_inner_microstep: 823.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 22:29:36,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.14 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 22:29:38,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1301.29 | bwd_inner_microstep: 1301.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3699
[2024-06-10 22:29:40,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.29 | bwd_microstep: 1477.24 | bwd_inner_microstep: 1477.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 22:29:42,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.73 | bwd_microstep: 1284.66 | bwd_inner_microstep: 1284.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3573
[2024-06-10 22:29:44,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1428.54 | bwd_inner_microstep: 1428.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3508
[2024-06-10 22:29:46,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.53 | bwd_microstep: 1251.88 | bwd_inner_microstep: 1251.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 22:29:48,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2112
[2024-06-10 22:29:49,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.93 | bwd_microstep: 1020.25 | bwd_inner_microstep: 1020.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685
[2024-06-10 22:29:51,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.67 | bwd_microstep: 1516.95 | bwd_inner_microstep: 1516.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390
[2024-06-10 22:29:53,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.74 | bwd_microstep: 1243.90 | bwd_inner_microstep: 1243.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 22:29:55,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.26 | bwd_microstep: 1381.21 | bwd_inner_microstep: 1381.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525
[2024-06-10 22:29:56,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.94 | bwd_microstep: 1322.25 | bwd_inner_microstep: 1322.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-10 22:29:59,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1559.16 | bwd_inner_microstep: 1559.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 22:30:00,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1356.13 | bwd_inner_microstep: 1356.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 22:30:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.40 | bwd_microstep: 1287.81 | bwd_inner_microstep: 1287.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1962
[2024-06-10 22:30:03,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.55 | bwd_microstep: 732.51 | bwd_inner_microstep: 732.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 22:30:05,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1398.16 | bwd_inner_microstep: 1398.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 22:30:07,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1557.10 | bwd_inner_microstep: 1557.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-10 22:30:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.09 | bwd_microstep: 1297.82 | bwd_inner_microstep: 1297.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-10 22:30:11,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1633.40 | bwd_inner_microstep: 1633.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1891
[2024-06-10 22:30:12,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.10 | bwd_microstep: 688.64 | bwd_inner_microstep: 688.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3815
[2024-06-10 22:30:15,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.42 | bwd_microstep: 1823.88 | bwd_inner_microstep: 1823.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2713
[2024-06-10 22:30:16,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.56 | bwd_microstep: 1096.59 | bwd_inner_microstep: 1096.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 22:30:19,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.02 | bwd_microstep: 1591.50 | bwd_inner_microstep: 1591.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3390
[2024-06-10 22:30:24,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-10 22:30:24,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.96 | bwd_microstep: 5185.15 | bwd_inner_microstep: 1558.98 | bwd_allreduce_microstep: 3626.11 | step_microstep: 37.79
[2024-06-10 22:30:24,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15982.22 | bwd: 46535.71 | bwd_inner: 42908.69 | bwd_allreduce: 3626.34 | step: 39.28
{'loss': 1.1922, 'learning_rate': 7.029600374665171e-06, 'epoch': 0.73}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 22:30:26,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.58 | bwd_microstep: 1238.24 | bwd_inner_microstep: 1238.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 22:30:28,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.82 | bwd_microstep: 1244.02 | bwd_inner_microstep: 1243.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 22:30:30,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.28 | bwd_microstep: 1445.68 | bwd_inner_microstep: 1445.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 22:30:32,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1448.33 | bwd_inner_microstep: 1448.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3789
[2024-06-10 22:30:34,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.79 | bwd_microstep: 1252.98 | bwd_inner_microstep: 1252.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 22:30:36,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.91 | bwd_microstep: 1545.18 | bwd_inner_microstep: 1545.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4174
[2024-06-10 22:30:38,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.78 | bwd_microstep: 1549.21 | bwd_inner_microstep: 1549.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-10 22:30:40,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.51 | bwd_microstep: 1437.89 | bwd_inner_microstep: 1437.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-10 22:30:42,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1344.76 | bwd_inner_microstep: 1344.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 22:30:43,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.65 | bwd_microstep: 1251.70 | bwd_inner_microstep: 1251.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-10 22:30:44,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.31 | bwd_microstep: 680.71 | bwd_inner_microstep: 680.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3413
[2024-06-10 22:30:46,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.09 | bwd_microstep: 1308.28 | bwd_inner_microstep: 1308.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 22:30:48,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1520.94 | bwd_inner_microstep: 1520.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3496
[2024-06-10 22:30:50,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.58 | bwd_microstep: 1551.10 | bwd_inner_microstep: 1551.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2060
[2024-06-10 22:30:51,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.89 | bwd_microstep: 724.16 | bwd_inner_microstep: 724.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3641
[2024-06-10 22:30:54,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.45 | bwd_microstep: 1571.29 | bwd_inner_microstep: 1571.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-10 22:30:55,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.06 | bwd_microstep: 1409.03 | bwd_inner_microstep: 1409.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-10 22:30:57,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.17 | bwd_microstep: 1337.23 | bwd_inner_microstep: 1337.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3969
[2024-06-10 22:30:59,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1528.69 | bwd_inner_microstep: 1528.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-10 22:31:01,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1452.78 | bwd_inner_microstep: 1452.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-10 22:31:03,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.10 | bwd_microstep: 975.14 | bwd_inner_microstep: 975.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 22:31:05,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.01 | bwd_microstep: 1351.13 | bwd_inner_microstep: 1351.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3632
[2024-06-10 22:31:06,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.69 | bwd_microstep: 1247.41 | bwd_inner_microstep: 1247.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 22:31:08,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.51 | bwd_microstep: 1354.42 | bwd_inner_microstep: 1354.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 22:31:10,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.49 | bwd_microstep: 1556.07 | bwd_inner_microstep: 1556.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 22:31:12,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.46 | bwd_microstep: 1440.20 | bwd_inner_microstep: 1440.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 22:31:14,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1412.48 | bwd_inner_microstep: 1412.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-10 22:31:17,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.60 | bwd_microstep: 1657.73 | bwd_inner_microstep: 1657.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3800
[2024-06-10 22:31:18,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.11 | bwd_microstep: 1358.98 | bwd_inner_microstep: 1358.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825
[2024-06-10 22:31:21,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.76 | bwd_microstep: 1581.71 | bwd_inner_microstep: 1581.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-10 22:31:23,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.48 | bwd_microstep: 1442.77 | bwd_inner_microstep: 1442.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3767
[2024-06-10 22:31:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.07 | optimizer_step: 6.64
[2024-06-10 22:31:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.15 | bwd_microstep: 4464.57 | bwd_inner_microstep: 1659.91 | bwd_allreduce_microstep: 2804.61 | step_microstep: 37.61
[2024-06-10 22:31:28,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16341.86 | bwd: 46684.83 | bwd_inner: 43879.30 | bwd_allreduce: 2804.84 | step: 39.11
{'loss': 1.2325, 'learning_rate': 7.001052468997551e-06, 'epoch': 0.73}
 73%|███████▎  | 1261/1726 [21:48:51<8:02:59, 62.32s/it]
 73%|███████▎  | 1262/1726 [21:49:53<8:03:18, 62.50s/it]
 73%|███████▎  | 1263/1726 [21:50:56<8:02:45, 62.56s/it]
 73%|███████▎  | 1264/1726 [21:51:58<8:00:33, 62.41s/it]
 73%|███████▎  | 1265/1726 [21:53:01<8:00:32, 62.54s/it]
 73%|███████▎  | 1266/1726 [21:54:04<8:01:22, 62.79s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425
[2024-06-10 22:31:29,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.77 | bwd_microstep: 1143.29 | bwd_inner_microstep: 1143.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-10 22:31:30,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.03 | bwd_microstep: 788.10 | bwd_inner_microstep: 788.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3851
[2024-06-10 22:31:33,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.45 | bwd_microstep: 1623.47 | bwd_inner_microstep: 1623.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3848
[2024-06-10 22:31:35,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.85 | bwd_microstep: 1657.02 | bwd_inner_microstep: 1656.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-10 22:31:37,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1376.38 | bwd_inner_microstep: 1376.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2969
[2024-06-10 22:31:38,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.63 | bwd_microstep: 1009.25 | bwd_inner_microstep: 1009.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 22:31:40,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1285.51 | bwd_inner_microstep: 1285.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 22:31:42,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1246.45 | bwd_inner_microstep: 1246.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 22:31:43,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1284.94 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 22:31:45,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1258.19 | bwd_inner_microstep: 1258.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014
[2024-06-10 22:31:46,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.26 | bwd_microstep: 896.35 | bwd_inner_microstep: 896.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 22:31:48,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.60 | bwd_microstep: 1345.78 | bwd_inner_microstep: 1345.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447
[2024-06-10 22:31:50,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.58 | bwd_microstep: 1441.65 | bwd_inner_microstep: 1441.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 22:31:52,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1487.06 | bwd_inner_microstep: 1487.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3644
[2024-06-10 22:31:55,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.36 | bwd_microstep: 1637.31 | bwd_inner_microstep: 1637.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3637
[2024-06-10 22:31:57,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.13 | bwd_microstep: 1810.15 | bwd_inner_microstep: 1810.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2550
[2024-06-10 22:31:59,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.88 | bwd_microstep: 1059.26 | bwd_inner_microstep: 1059.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 22:32:00,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1292.63 | bwd_inner_microstep: 1292.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-10 22:32:02,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.73 | bwd_microstep: 1458.88 | bwd_inner_microstep: 1458.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1401
[2024-06-10 22:32:03,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.84 | bwd_microstep: 525.44 | bwd_inner_microstep: 525.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2302
[2024-06-10 22:32:04,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.15 | bwd_microstep: 882.51 | bwd_inner_microstep: 882.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 22:32:06,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1559.17 | bwd_inner_microstep: 1559.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-10 22:32:08,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.18 | bwd_microstep: 798.00 | bwd_inner_microstep: 797.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 22:32:09,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.44 | bwd_microstep: 1328.53 | bwd_inner_microstep: 1328.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-10 22:32:11,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1393.31 | bwd_inner_microstep: 1393.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-10 22:32:13,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1509.93 | bwd_inner_microstep: 1509.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2272
[2024-06-10 22:32:15,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.99 | bwd_microstep: 782.37 | bwd_inner_microstep: 782.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495
[2024-06-10 22:32:16,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1417.74 | bwd_inner_microstep: 1417.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-10 22:32:18,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.15 | bwd_microstep: 1295.16 | bwd_inner_microstep: 1295.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3565
[2024-06-10 22:32:21,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.71 | bwd_microstep: 1667.21 | bwd_inner_microstep: 1667.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3605
[2024-06-10 22:32:23,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.68 | bwd_microstep: 1640.74 | bwd_inner_microstep: 1640.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-10 22:32:29,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 22:32:29,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 5391.73 | bwd_inner_microstep: 1705.36 | bwd_allreduce_microstep: 3686.31 | step_microstep: 38.00
[2024-06-10 22:32:29,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15496.24 | bwd: 45293.48 | bwd_inner: 41606.27 | bwd_allreduce: 3686.54 | step: 39.42
{'loss': 1.1277, 'learning_rate': 6.97255034555556e-06, 'epoch': 0.73}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 22:32:31,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1466.49 | bwd_inner_microstep: 1466.35 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-10 22:32:33,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.94 | bwd_microstep: 1482.93 | bwd_inner_microstep: 1482.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 22:32:35,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.92 | bwd_microstep: 1245.60 | bwd_inner_microstep: 1245.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 22:32:36,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.13 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3773
[2024-06-10 22:32:38,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1501.60 | bwd_inner_microstep: 1501.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-10 22:32:40,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1385.81 | bwd_inner_microstep: 1385.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-10 22:32:41,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.29 | bwd_microstep: 794.28 | bwd_inner_microstep: 794.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-10 22:32:44,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.64 | bwd_microstep: 1629.66 | bwd_inner_microstep: 1629.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-10 22:32:45,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.91 | bwd_microstep: 677.72 | bwd_inner_microstep: 677.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-10 22:32:47,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.94 | bwd_microstep: 1526.54 | bwd_inner_microstep: 1526.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3517
[2024-06-10 22:32:48,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.51 | bwd_microstep: 1191.11 | bwd_inner_microstep: 1191.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3479
[2024-06-10 22:32:50,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 1327.14 | bwd_inner_microstep: 1327.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2205
[2024-06-10 22:32:52,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.62 | bwd_microstep: 956.26 | bwd_inner_microstep: 956.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960
[2024-06-10 22:32:53,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.71 | bwd_microstep: 882.95 | bwd_inner_microstep: 882.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-10 22:32:55,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.44 | bwd_microstep: 1482.68 | bwd_inner_microstep: 1482.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3655
[2024-06-10 22:32:57,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.31 | bwd_microstep: 1716.75 | bwd_inner_microstep: 1716.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1896
[2024-06-10 22:32:58,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.95 | bwd_microstep: 775.17 | bwd_inner_microstep: 775.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-10 22:33:00,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1502.37 | bwd_inner_microstep: 1502.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 22:33:02,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.66 | bwd_microstep: 1377.02 | bwd_inner_microstep: 1376.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 22:33:04,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 22:33:06,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.18 | bwd_microstep: 1256.39 | bwd_inner_microstep: 1256.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3863
[2024-06-10 22:33:08,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.44 | bwd_microstep: 1736.40 | bwd_inner_microstep: 1736.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 22:33:10,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1356.62 | bwd_inner_microstep: 1356.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2090
[2024-06-10 22:33:11,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.90 | bwd_microstep: 917.53 | bwd_inner_microstep: 917.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 22:33:13,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.50 | bwd_microstep: 1432.70 | bwd_inner_microstep: 1432.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059
[2024-06-10 22:33:14,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.91 | bwd_microstep: 813.71 | bwd_inner_microstep: 813.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2852
[2024-06-10 22:33:16,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.34 | bwd_microstep: 1095.18 | bwd_inner_microstep: 1095.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-10 22:33:18,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.84 | bwd_microstep: 1350.81 | bwd_inner_microstep: 1350.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 22:33:20,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.72 | bwd_microstep: 1554.41 | bwd_inner_microstep: 1554.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-10 22:33:22,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1542.93 | bwd_inner_microstep: 1542.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 22:33:24,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1281.35 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-10 22:33:30,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-10 22:33:30,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 5758.72 | bwd_inner_microstep: 1567.32 | bwd_allreduce_microstep: 4191.35 | step_microstep: 37.98
[2024-06-10 22:33:30,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15413.85 | bwd: 45548.72 | bwd_inner: 41356.35 | bwd_allreduce: 4191.64 | step: 39.47
{'loss': 1.1898, 'learning_rate': 6.94409410472352e-06, 'epoch': 0.73}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1929
[2024-06-10 22:33:31,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.80 | bwd_microstep: 874.28 | bwd_inner_microstep: 874.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 22:33:33,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.75 | bwd_microstep: 1344.04 | bwd_inner_microstep: 1344.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-10 22:33:35,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.69 | bwd_microstep: 1551.84 | bwd_inner_microstep: 1551.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4238
[2024-06-10 22:33:38,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.75 | bwd_microstep: 1662.78 | bwd_inner_microstep: 1662.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-10 22:33:40,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.84 | bwd_microstep: 1647.88 | bwd_inner_microstep: 1647.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-10 22:33:42,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.99 | bwd_microstep: 1182.91 | bwd_inner_microstep: 1182.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-10 22:33:44,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.30 | bwd_microstep: 1523.32 | bwd_inner_microstep: 1523.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 22:33:46,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1546.25 | bwd_inner_microstep: 1546.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1875
[2024-06-10 22:33:47,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.68 | bwd_microstep: 744.91 | bwd_inner_microstep: 744.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 22:33:48,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.02 | bwd_microstep: 802.34 | bwd_inner_microstep: 802.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 22:33:50,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.75 | bwd_microstep: 1283.36 | bwd_inner_microstep: 1283.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450
[2024-06-10 22:33:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1413.15 | bwd_inner_microstep: 1413.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398
[2024-06-10 22:33:53,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.05 | bwd_microstep: 1365.79 | bwd_inner_microstep: 1365.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-10 22:33:55,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.82 | bwd_microstep: 1434.92 | bwd_inner_microstep: 1434.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3644
[2024-06-10 22:33:58,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.65 | bwd_microstep: 1814.26 | bwd_inner_microstep: 1814.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 22:33:59,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.02 | bwd_microstep: 791.16 | bwd_inner_microstep: 791.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-10 22:34:01,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.61 | bwd_microstep: 1717.96 | bwd_inner_microstep: 1717.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562
[2024-06-10 22:34:03,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.75 | bwd_microstep: 1427.96 | bwd_inner_microstep: 1427.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650
[2024-06-10 22:34:06,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.77 | bwd_microstep: 1582.02 | bwd_inner_microstep: 1581.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-10 22:34:07,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1405.57 | bwd_inner_microstep: 1405.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 22:34:09,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.67 | bwd_microstep: 1326.23 | bwd_inner_microstep: 1326.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-10 22:34:11,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1395.64 | bwd_inner_microstep: 1395.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692
[2024-06-10 22:34:13,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.86 | bwd_microstep: 1331.13 | bwd_inner_microstep: 1331.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 22:34:15,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 1376.77 | bwd_inner_microstep: 1376.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-10 22:34:17,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 1648.44 | bwd_inner_microstep: 1648.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2412
[2024-06-10 22:34:19,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 936.97 | bwd_inner_microstep: 936.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 22:34:21,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1399.35 | bwd_inner_microstep: 1399.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 22:34:23,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.83 | bwd_microstep: 1648.92 | bwd_inner_microstep: 1648.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3591
[2024-06-10 22:34:25,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 1460.54 | bwd_inner_microstep: 1460.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3591
[2024-06-10 22:34:27,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.69 | bwd_microstep: 1566.97 | bwd_inner_microstep: 1566.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3532
[2024-06-10 22:34:29,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1451.06 | bwd_inner_microstep: 1451.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 22:34:31,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 22:34:31,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1539.74 | bwd_inner_microstep: 1532.10 | bwd_allreduce_microstep: 7.59 | step_microstep: 37.53
[2024-06-10 22:34:31,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16476.20 | bwd: 44198.46 | bwd_inner: 44189.98 | bwd_allreduce: 7.81 | step: 39.02
{'loss': 1.201, 'learning_rate': 6.915683846724188e-06, 'epoch': 0.74}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 22:34:33,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1336.31 | bwd_inner_microstep: 1336.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4119
[2024-06-10 22:34:35,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.38 | bwd_microstep: 1736.34 | bwd_inner_microstep: 1736.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 22:34:37,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.06 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2239
[2024-06-10 22:34:38,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.17 | bwd_microstep: 863.14 | bwd_inner_microstep: 863.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-10 22:34:41,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.27 | bwd_microstep: 1485.08 | bwd_inner_microstep: 1485.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 22:34:42,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.16 | bwd_microstep: 1391.23 | bwd_inner_microstep: 1391.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 22:34:44,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1386.19 | bwd_inner_microstep: 1386.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-10 22:34:46,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.73 | bwd_microstep: 1527.27 | bwd_inner_microstep: 1527.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3752
[2024-06-10 22:34:49,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1566.23 | bwd_inner_microstep: 1566.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3684
[2024-06-10 22:34:51,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.12 | bwd_microstep: 1721.48 | bwd_inner_microstep: 1721.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 22:34:53,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1248.74 | bwd_inner_microstep: 1248.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2458
[2024-06-10 22:34:54,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.27 | bwd_microstep: 994.92 | bwd_inner_microstep: 994.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 22:34:56,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1350.16 | bwd_inner_microstep: 1350.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-10 22:34:58,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.51 | bwd_microstep: 1449.28 | bwd_inner_microstep: 1449.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3508
[2024-06-10 22:35:00,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.94 | bwd_microstep: 1446.08 | bwd_inner_microstep: 1446.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3519
[2024-06-10 22:35:02,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.23 | bwd_microstep: 1553.30 | bwd_inner_microstep: 1553.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3501
[2024-06-10 22:35:04,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 1432.53 | bwd_inner_microstep: 1432.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-10 22:35:06,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.22 | bwd_microstep: 1522.07 | bwd_inner_microstep: 1522.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3615
[2024-06-10 22:35:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.19 | bwd_microstep: 1468.04 | bwd_inner_microstep: 1468.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 22:35:10,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1499.15 | bwd_inner_microstep: 1499.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 22:35:12,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.31 | bwd_microstep: 1491.25 | bwd_inner_microstep: 1491.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 22:35:14,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.15 | bwd_microstep: 1477.73 | bwd_inner_microstep: 1477.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 22:35:16,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.43 | bwd_microstep: 1452.81 | bwd_inner_microstep: 1452.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 22:35:19,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.02 | bwd_microstep: 1660.13 | bwd_inner_microstep: 1660.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2220
[2024-06-10 22:35:20,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.00 | bwd_microstep: 960.62 | bwd_inner_microstep: 960.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 22:35:22,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.47 | bwd_microstep: 1184.37 | bwd_inner_microstep: 1184.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 22:35:24,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1514.93 | bwd_inner_microstep: 1514.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597
[2024-06-10 22:35:26,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1437.17 | bwd_inner_microstep: 1437.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 22:35:28,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1383.15 | bwd_inner_microstep: 1383.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 22:35:30,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1557.35 | bwd_inner_microstep: 1557.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-10 22:35:32,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1357.57 | bwd_inner_microstep: 1357.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 22:35:34,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-10 22:35:34,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1418.84 | bwd_inner_microstep: 1411.16 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.54
[2024-06-10 22:35:34,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16881.40 | bwd: 45256.00 | bwd_inner: 45247.46 | bwd_allreduce: 7.85 | step: 38.99
{'loss': 1.1756, 'learning_rate': 6.887319671618315e-06, 'epoch': 0.74}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 22:35:36,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1448.01 | bwd_inner_microstep: 1447.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3945
[2024-06-10 22:35:38,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.01 | bwd_microstep: 1702.63 | bwd_inner_microstep: 1702.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 22:35:40,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1383.37 | bwd_inner_microstep: 1383.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 22:35:42,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.09 | bwd_microstep: 1556.74 | bwd_inner_microstep: 1556.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 22:35:44,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1249.16 | bwd_inner_microstep: 1249.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3742
[2024-06-10 22:35:46,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1497.66 | bwd_inner_microstep: 1497.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421
[2024-06-10 22:35:47,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.90 | bwd_microstep: 1154.14 | bwd_inner_microstep: 1154.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-10 22:35:49,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.22 | bwd_microstep: 1532.70 | bwd_inner_microstep: 1532.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 22:35:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.48 | bwd_microstep: 1531.29 | bwd_inner_microstep: 1531.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 748
[2024-06-10 22:35:52,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.19 | bwd_microstep: 300.55 | bwd_inner_microstep: 300.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 22:35:54,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3439
[2024-06-10 22:35:56,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.07 | bwd_microstep: 1411.88 | bwd_inner_microstep: 1411.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-10 22:35:57,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1252.60 | bwd_inner_microstep: 1252.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 22:35:59,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.44 | bwd_microstep: 1340.78 | bwd_inner_microstep: 1340.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921
[2024-06-10 22:36:00,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.66 | bwd_microstep: 788.69 | bwd_inner_microstep: 788.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 22:36:02,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1373.99 | bwd_inner_microstep: 1373.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514
[2024-06-10 22:36:04,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.59 | bwd_microstep: 1446.73 | bwd_inner_microstep: 1446.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988
[2024-06-10 22:36:06,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.68 | bwd_microstep: 897.33 | bwd_inner_microstep: 897.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-10 22:36:07,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1383.02 | bwd_inner_microstep: 1383.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-10 22:36:10,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.72 | bwd_microstep: 1526.15 | bwd_inner_microstep: 1526.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3986
[2024-06-10 22:36:12,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.32 | bwd_microstep: 1814.23 | bwd_inner_microstep: 1814.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-10 22:36:14,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1411.40 | bwd_inner_microstep: 1411.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 22:36:16,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.43 | bwd_microstep: 1279.42 | bwd_inner_microstep: 1279.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 616
[2024-06-10 22:36:16,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.50 | bwd_microstep: 261.61 | bwd_inner_microstep: 261.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555
[2024-06-10 22:36:18,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.55 | bwd_microstep: 1200.66 | bwd_inner_microstep: 1200.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-10 22:36:20,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.98 | bwd_microstep: 1302.55 | bwd_inner_microstep: 1302.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-10 22:36:21,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1310.59 | bwd_inner_microstep: 1310.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-10 22:36:24,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.65 | bwd_microstep: 1502.71 | bwd_inner_microstep: 1502.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818
[2024-06-10 22:36:26,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.05 | bwd_microstep: 1585.68 | bwd_inner_microstep: 1585.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3781
[2024-06-10 22:36:28,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.78 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577
[2024-06-10 22:36:30,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.96 | bwd_microstep: 1426.51 | bwd_inner_microstep: 1426.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4011
[2024-06-10 22:36:36,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.10 | optimizer_step: 6.61
[2024-06-10 22:36:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.32 | bwd_microstep: 5668.14 | bwd_inner_microstep: 1621.23 | bwd_allreduce_microstep: 4046.86 | step_microstep: 38.16
[2024-06-10 22:36:36,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15754.40 | bwd: 46210.41 | bwd_inner: 42162.65 | bwd_allreduce: 4047.09 | step: 39.63
{'loss': 1.1764, 'learning_rate': 6.859001679304398e-06, 'epoch': 0.74}
  | 1266/1726 [21:54:04<8:01:22, 62.79s/it]
 73%|███████▎  | 1267/1726 [21:55:06<7:56:29, 62.29s/it]
 73%|███████▎  | 1268/1726 [21:56:07<7:53:09, 61.99s/it]
 74%|███████▎  | 1269/1726 [21:57:08<7:49:53, 61.69s/it]
 74%|███████▎  | 1270/1726 [21:58:10<7:50:39, 61.93s/it]
 74%|███████▎  | 1271/1726 [21:59:13<7:50:27, 62.04s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 22:36:37,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 787.04 | bwd_inner_microstep: 786.92 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4494
[2024-06-10 22:36:39,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.28 | bwd_microstep: 1641.09 | bwd_inner_microstep: 1641.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-10 22:36:41,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.67 | bwd_microstep: 1397.70 | bwd_inner_microstep: 1397.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-10 22:36:43,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.41 | bwd_microstep: 1277.71 | bwd_inner_microstep: 1277.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 22:36:45,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.11 | bwd_microstep: 1545.61 | bwd_inner_microstep: 1545.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-10 22:36:47,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.01 | bwd_microstep: 1430.21 | bwd_inner_microstep: 1430.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2217
[2024-06-10 22:36:48,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.20 | bwd_microstep: 956.75 | bwd_inner_microstep: 956.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708
[2024-06-10 22:36:51,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.05 | bwd_microstep: 1628.52 | bwd_inner_microstep: 1628.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-10 22:36:53,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.51 | bwd_microstep: 1525.27 | bwd_inner_microstep: 1525.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-10 22:36:55,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.53 | bwd_microstep: 1427.10 | bwd_inner_microstep: 1427.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 22:36:57,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1438.77 | bwd_inner_microstep: 1438.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920
[2024-06-10 22:36:58,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.15 | bwd_microstep: 720.70 | bwd_inner_microstep: 720.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3674
[2024-06-10 22:37:00,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.08 | bwd_microstep: 1549.04 | bwd_inner_microstep: 1549.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3504
[2024-06-10 22:37:02,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3424
[2024-06-10 22:37:03,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1258.28 | bwd_inner_microstep: 1258.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 22:37:05,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1469.06 | bwd_inner_microstep: 1469.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-10 22:37:07,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.02 | bwd_microstep: 1388.78 | bwd_inner_microstep: 1388.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2422
[2024-06-10 22:37:09,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.99 | bwd_microstep: 937.77 | bwd_inner_microstep: 937.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639
[2024-06-10 22:37:11,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.12 | bwd_microstep: 1605.01 | bwd_inner_microstep: 1604.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3619
[2024-06-10 22:37:13,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.61 | bwd_microstep: 1374.32 | bwd_inner_microstep: 1374.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3503
[2024-06-10 22:37:14,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.08 | bwd_microstep: 1235.25 | bwd_inner_microstep: 1235.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-10 22:37:16,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.40 | bwd_microstep: 1554.66 | bwd_inner_microstep: 1554.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 22:37:18,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.74 | bwd_microstep: 1283.72 | bwd_inner_microstep: 1283.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808
[2024-06-10 22:37:20,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1353.01 | bwd_inner_microstep: 1352.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 22:37:22,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3803
[2024-06-10 22:37:24,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3825
[2024-06-10 22:37:26,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1501.01 | bwd_inner_microstep: 1500.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 22:37:28,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.53 | bwd_microstep: 1496.90 | bwd_inner_microstep: 1496.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2258
[2024-06-10 22:37:29,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.71 | bwd_microstep: 968.95 | bwd_inner_microstep: 968.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596
[2024-06-10 22:37:31,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.37 | bwd_microstep: 1432.31 | bwd_inner_microstep: 1432.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-10 22:37:34,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.36 | bwd_microstep: 1700.93 | bwd_inner_microstep: 1700.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 22:37:36,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.02 | optimizer_step: 6.60
[2024-06-10 22:37:36,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1866.03 | bwd_inner_microstep: 1442.55 | bwd_allreduce_microstep: 423.43 | step_microstep: 37.56
[2024-06-10 22:37:36,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16190.29 | bwd: 43769.05 | bwd_inner: 43344.63 | bwd_allreduce: 423.70 | step: 39.04
{'loss': 1.1987, 'learning_rate': 6.830729969518246e-06, 'epoch': 0.74}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-10 22:37:38,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.38 | bwd_microstep: 1371.82 | bwd_inner_microstep: 1371.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913
[2024-06-10 22:37:39,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.11 | bwd_microstep: 776.20 | bwd_inner_microstep: 776.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 22:37:41,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1555.70 | bwd_inner_microstep: 1555.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-10 22:37:43,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1240.69 | bwd_inner_microstep: 1240.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4071
[2024-06-10 22:37:45,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.97 | bwd_microstep: 1624.15 | bwd_inner_microstep: 1624.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 22:37:47,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.62 | bwd_microstep: 1481.17 | bwd_inner_microstep: 1481.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 22:37:49,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1250.17 | bwd_inner_microstep: 1250.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-10 22:37:51,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1244.74 | bwd_inner_microstep: 1244.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 22:37:52,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.57 | bwd_microstep: 1246.62 | bwd_inner_microstep: 1246.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-10 22:37:54,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1351.45 | bwd_inner_microstep: 1351.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-10 22:37:56,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.70 | bwd_microstep: 1407.95 | bwd_inner_microstep: 1407.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3490
[2024-06-10 22:37:58,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1445.64 | bwd_inner_microstep: 1445.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 22:38:00,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1372.16 | bwd_inner_microstep: 1372.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 22:38:02,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.03 | bwd_microstep: 1252.29 | bwd_inner_microstep: 1252.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-10 22:38:04,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.86 | bwd_microstep: 1584.09 | bwd_inner_microstep: 1584.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2153
[2024-06-10 22:38:05,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.13 | bwd_microstep: 976.60 | bwd_inner_microstep: 976.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-10 22:38:07,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.74 | bwd_microstep: 791.69 | bwd_inner_microstep: 791.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 22:38:09,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1556.12 | bwd_inner_microstep: 1556.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954
[2024-06-10 22:38:10,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.94 | bwd_microstep: 700.21 | bwd_inner_microstep: 700.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-10 22:38:11,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1248.69 | bwd_inner_microstep: 1248.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 22:38:13,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1390.38 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 22:38:15,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1350.88 | bwd_inner_microstep: 1350.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520
[2024-06-10 22:38:17,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1416.71 | bwd_inner_microstep: 1416.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566
[2024-06-10 22:38:19,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.03 | bwd_microstep: 1597.24 | bwd_inner_microstep: 1597.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768
[2024-06-10 22:38:21,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.74 | bwd_microstep: 1465.28 | bwd_inner_microstep: 1465.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009
[2024-06-10 22:38:22,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.64 | bwd_microstep: 802.21 | bwd_inner_microstep: 802.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2055
[2024-06-10 22:38:24,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.53 | bwd_microstep: 1009.83 | bwd_inner_microstep: 1009.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-10 22:38:26,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.04 | bwd_microstep: 1646.83 | bwd_inner_microstep: 1646.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 22:38:28,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1556.76 | bwd_inner_microstep: 1556.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935
[2024-06-10 22:38:29,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.68 | bwd_microstep: 728.05 | bwd_inner_microstep: 728.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2029
[2024-06-10 22:38:31,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.04 | bwd_microstep: 913.01 | bwd_inner_microstep: 912.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3776
[2024-06-10 22:38:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-10 22:38:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.67 | bwd_microstep: 6291.98 | bwd_inner_microstep: 1705.18 | bwd_allreduce_microstep: 4586.75 | step_microstep: 37.93
[2024-06-10 22:38:37,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15277.27 | bwd: 45647.30 | bwd_inner: 41059.64 | bwd_allreduce: 4586.97 | step: 39.46
{'loss': 1.2205, 'learning_rate': 6.80250464183269e-06, 'epoch': 0.74}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456
[2024-06-10 22:38:40,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.64 | bwd_microstep: 1543.42 | bwd_inner_microstep: 1543.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-10 22:38:41,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.44 | bwd_microstep: 1379.10 | bwd_inner_microstep: 1379.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3931
[2024-06-10 22:38:44,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.90 | bwd_microstep: 1590.66 | bwd_inner_microstep: 1590.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3780
[2024-06-10 22:38:46,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.34 | bwd_microstep: 1476.89 | bwd_inner_microstep: 1476.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732
[2024-06-10 22:38:48,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.07 | bwd_microstep: 1529.55 | bwd_inner_microstep: 1529.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 22:38:50,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.89 | bwd_microstep: 1274.59 | bwd_inner_microstep: 1274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 22:38:51,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1280.92 | bwd_inner_microstep: 1280.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3705
[2024-06-10 22:38:53,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.67 | bwd_microstep: 1455.15 | bwd_inner_microstep: 1455.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491
[2024-06-10 22:38:55,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1404.70 | bwd_inner_microstep: 1404.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 22:38:57,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.15 | bwd_microstep: 1274.60 | bwd_inner_microstep: 1274.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498
[2024-06-10 22:38:59,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1442.73 | bwd_inner_microstep: 1442.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-10 22:39:01,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.39 | bwd_microstep: 1339.82 | bwd_inner_microstep: 1339.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3663
[2024-06-10 22:39:03,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1466.55 | bwd_inner_microstep: 1466.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3068
[2024-06-10 22:39:05,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.89 | bwd_microstep: 1237.60 | bwd_inner_microstep: 1237.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 22:39:07,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.11 | bwd_microstep: 1451.44 | bwd_inner_microstep: 1451.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-10 22:39:08,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.57 | bwd_microstep: 697.76 | bwd_inner_microstep: 697.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 22:39:10,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.76 | bwd_microstep: 1455.19 | bwd_inner_microstep: 1455.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070
[2024-06-10 22:39:11,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.46 | bwd_microstep: 915.31 | bwd_inner_microstep: 915.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 22:39:13,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1249.80 | bwd_inner_microstep: 1249.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 22:39:15,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1398.41 | bwd_inner_microstep: 1398.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-10 22:39:16,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1398.64 | bwd_inner_microstep: 1398.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3504
[2024-06-10 22:39:18,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1248.53 | bwd_inner_microstep: 1248.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3874
[2024-06-10 22:39:20,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.65 | bwd_microstep: 1612.16 | bwd_inner_microstep: 1612.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-10 22:39:22,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.26 | bwd_microstep: 1502.59 | bwd_inner_microstep: 1502.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 22:39:24,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.92 | bwd_microstep: 1445.80 | bwd_inner_microstep: 1445.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-10 22:39:25,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.70 | bwd_microstep: 702.62 | bwd_inner_microstep: 702.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-10 22:39:27,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.07 | bwd_microstep: 878.59 | bwd_inner_microstep: 878.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3453
[2024-06-10 22:39:29,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.31 | bwd_microstep: 1547.47 | bwd_inner_microstep: 1547.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3617
[2024-06-10 22:39:31,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.91 | bwd_microstep: 1245.53 | bwd_inner_microstep: 1245.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2043
[2024-06-10 22:39:32,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.44 | bwd_microstep: 716.49 | bwd_inner_microstep: 716.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-10 22:39:34,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.36 | bwd_microstep: 1542.50 | bwd_inner_microstep: 1542.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585
[2024-06-10 22:39:36,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.03 | optimizer_step: 6.58
[2024-06-10 22:39:36,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1959.91 | bwd_inner_microstep: 1482.79 | bwd_allreduce_microstep: 477.07 | step_microstep: 37.58
[2024-06-10 22:39:36,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15743.27 | bwd: 42665.05 | bwd_inner: 42187.04 | bwd_allreduce: 477.30 | step: 39.10
{'loss': 1.1478, 'learning_rate': 6.774325795657175e-06, 'epoch': 0.74}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507
[2024-06-10 22:39:38,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.04 | bwd_microstep: 1577.57 | bwd_inner_microstep: 1577.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-10 22:39:40,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.75 | bwd_microstep: 1382.71 | bwd_inner_microstep: 1382.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3885
[2024-06-10 22:39:43,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.26 | bwd_microstep: 1684.27 | bwd_inner_microstep: 1684.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 22:39:44,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1379.23 | bwd_inner_microstep: 1379.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476
[2024-06-10 22:39:46,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.51 | bwd_microstep: 1343.74 | bwd_inner_microstep: 1343.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 22:39:48,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.65 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 22:39:50,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1247.10 | bwd_inner_microstep: 1247.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-10 22:39:52,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1383.70 | bwd_inner_microstep: 1383.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 22:39:54,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1386.33 | bwd_inner_microstep: 1386.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 22:39:56,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1247.16 | bwd_inner_microstep: 1247.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 22:39:57,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.23 | bwd_microstep: 1247.90 | bwd_inner_microstep: 1247.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-10 22:40:00,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.88 | bwd_microstep: 1618.48 | bwd_inner_microstep: 1618.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3573
[2024-06-10 22:40:02,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.73 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 22:40:03,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1284.01 | bwd_inner_microstep: 1283.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 22:40:05,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3675
[2024-06-10 22:40:07,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1554.47 | bwd_inner_microstep: 1554.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 22:40:09,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.79 | bwd_microstep: 1388.67 | bwd_inner_microstep: 1388.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 22:40:11,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.83 | bwd_microstep: 1277.00 | bwd_inner_microstep: 1276.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 22:40:13,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1450.52 | bwd_inner_microstep: 1450.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 22:40:15,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.39 | bwd_microstep: 1606.91 | bwd_inner_microstep: 1606.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 22:40:17,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.59 | bwd_microstep: 1389.47 | bwd_inner_microstep: 1389.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3620
[2024-06-10 22:40:19,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1341.64 | bwd_inner_microstep: 1341.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1977
[2024-06-10 22:40:20,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.50 | bwd_microstep: 705.30 | bwd_inner_microstep: 705.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555
[2024-06-10 22:40:22,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.17 | bwd_microstep: 1333.05 | bwd_inner_microstep: 1333.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-10 22:40:24,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 1511.26 | bwd_inner_microstep: 1511.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-10 22:40:26,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.97 | bwd_microstep: 1556.19 | bwd_inner_microstep: 1556.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560
[2024-06-10 22:40:28,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.67 | bwd_microstep: 1429.03 | bwd_inner_microstep: 1429.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3435
[2024-06-10 22:40:30,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.19 | bwd_microstep: 1215.60 | bwd_inner_microstep: 1215.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 22:40:31,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.42 | bwd_microstep: 875.64 | bwd_inner_microstep: 875.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-10 22:40:32,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.55 | bwd_microstep: 973.62 | bwd_inner_microstep: 973.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-10 22:40:34,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.50 | bwd_microstep: 1491.93 | bwd_inner_microstep: 1491.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-10 22:40:38,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.04 | optimizer_step: 6.60
[2024-06-10 22:40:38,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.24 | bwd_microstep: 2674.96 | bwd_inner_microstep: 1749.63 | bwd_allreduce_microstep: 925.28 | step_microstep: 37.37
[2024-06-10 22:40:38,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16356.75 | bwd: 44762.16 | bwd_inner: 43835.98 | bwd_allreduce: 925.51 | step: 38.96
{'loss': 1.1573, 'learning_rate': 6.746193530237457e-06, 'epoch': 0.74}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 22:40:39,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.65 | bwd_microstep: 1274.87 | bwd_inner_microstep: 1274.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 22:40:41,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1379.16 | bwd_inner_microstep: 1379.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3891
[2024-06-10 22:40:43,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.56 | bwd_microstep: 1482.17 | bwd_inner_microstep: 1482.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 22:40:45,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1342.36 | bwd_inner_microstep: 1342.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-10 22:40:47,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.30 | bwd_microstep: 1550.25 | bwd_inner_microstep: 1550.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-10 22:40:49,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1479.85 | bwd_inner_microstep: 1479.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 22:40:51,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.64 | bwd_microstep: 1391.88 | bwd_inner_microstep: 1391.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-10 22:40:53,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1405.03 | bwd_inner_microstep: 1405.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 22:40:55,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.01 | bwd_microstep: 1384.56 | bwd_inner_microstep: 1384.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 22:40:57,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1247.24 | bwd_inner_microstep: 1247.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654
[2024-06-10 22:40:59,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.35 | bwd_microstep: 1620.30 | bwd_inner_microstep: 1620.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 22:41:01,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.00 | bwd_microstep: 1381.99 | bwd_inner_microstep: 1381.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3692
[2024-06-10 22:41:03,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.40 | bwd_microstep: 1617.88 | bwd_inner_microstep: 1617.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3656
[2024-06-10 22:41:05,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1542.88 | bwd_inner_microstep: 1542.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3623
[2024-06-10 22:41:07,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1373.48 | bwd_inner_microstep: 1373.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 22:41:09,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.32 | bwd_microstep: 1287.48 | bwd_inner_microstep: 1287.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-10 22:41:11,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1287.54 | bwd_inner_microstep: 1287.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 22:41:13,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.82 | bwd_microstep: 1485.22 | bwd_inner_microstep: 1485.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 22:41:15,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.03 | bwd_microstep: 1410.08 | bwd_inner_microstep: 1410.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 22:41:17,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1403.90 | bwd_inner_microstep: 1403.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 22:41:19,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.58 | bwd_microstep: 1291.04 | bwd_inner_microstep: 1291.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 22:41:21,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.57 | bwd_microstep: 1459.02 | bwd_inner_microstep: 1459.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 22:41:22,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.06 | bwd_microstep: 1288.90 | bwd_inner_microstep: 1288.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2577
[2024-06-10 22:41:24,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.36 | bwd_microstep: 1068.93 | bwd_inner_microstep: 1068.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613
[2024-06-10 22:41:26,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.98 | bwd_microstep: 1605.81 | bwd_inner_microstep: 1605.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 22:41:28,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1255.37 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3747
[2024-06-10 22:41:30,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1563.39 | bwd_inner_microstep: 1563.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-10 22:41:32,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.02 | bwd_microstep: 1403.07 | bwd_inner_microstep: 1403.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-10 22:41:34,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.35 | bwd_microstep: 1656.99 | bwd_inner_microstep: 1656.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-10 22:41:36,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.33 | bwd_microstep: 1592.56 | bwd_inner_microstep: 1592.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3590
[2024-06-10 22:41:39,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.85 | bwd_microstep: 1670.49 | bwd_inner_microstep: 1670.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3781
[2024-06-10 22:41:41,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.04 | optimizer_step: 6.62
[2024-06-10 22:41:41,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.60 | bwd_microstep: 1488.08 | bwd_inner_microstep: 1480.46 | bwd_allreduce_microstep: 7.58 | step_microstep: 37.38
[2024-06-10 22:41:41,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17061.04 | bwd: 45691.78 | bwd_inner: 45683.31 | bwd_allreduce: 7.81 | step: 38.88
{'loss': 1.2352, 'learning_rate': 6.7181079446552165e-06, 'epoch': 0.74}
1271/1726 [21:59:13<7:50:27, 62.04s/it]
 74%|███████▎  | 1272/1726 [22:00:13<7:45:27, 61.51s/it]
 74%|███████▍  | 1273/1726 [22:01:14<7:43:51, 61.44s/it]
 74%|███████▍  | 1274/1726 [22:02:13<7:36:44, 60.63s/it]
 74%|███████▍  | 1275/1726 [22:03:14<7:37:36, 60.88s/it]
 74%|███████▍  | 1276/1726 [22:04:17<7:41:33, 61.54s/it]
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2629
[2024-06-10 22:41:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.30 | bwd_microstep: 1011.12 | bwd_inner_microstep: 1011.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4667
[2024-06-10 22:41:44,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.44 | bwd_microstep: 1674.46 | bwd_inner_microstep: 1674.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 22:41:46,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.39 | bwd_microstep: 1343.11 | bwd_inner_microstep: 1343.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 22:41:48,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.01 | bwd_microstep: 1293.80 | bwd_inner_microstep: 1293.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-10 22:41:50,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.02 | bwd_microstep: 1442.59 | bwd_inner_microstep: 1442.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 22:41:52,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.25 | bwd_microstep: 1388.15 | bwd_inner_microstep: 1388.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 22:41:54,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.79 | bwd_microstep: 1149.03 | bwd_inner_microstep: 1149.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-10 22:41:55,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.33 | bwd_microstep: 1341.10 | bwd_inner_microstep: 1341.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3514
[2024-06-10 22:41:57,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.84 | bwd_microstep: 1433.33 | bwd_inner_microstep: 1433.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4124
[2024-06-10 22:42:00,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.09 | bwd_microstep: 1729.34 | bwd_inner_microstep: 1729.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509
[2024-06-10 22:42:02,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.59 | bwd_microstep: 1448.78 | bwd_inner_microstep: 1448.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2138
[2024-06-10 22:42:03,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.56 | bwd_microstep: 926.15 | bwd_inner_microstep: 926.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960
[2024-06-10 22:42:04,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.26 | bwd_microstep: 886.95 | bwd_inner_microstep: 886.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519
[2024-06-10 22:42:06,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1324.61 | bwd_inner_microstep: 1324.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525
[2024-06-10 22:42:08,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1322.12 | bwd_inner_microstep: 1322.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-10 22:42:10,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.61 | bwd_microstep: 1493.61 | bwd_inner_microstep: 1493.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2188
[2024-06-10 22:42:11,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.66 | bwd_microstep: 763.87 | bwd_inner_microstep: 763.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2180
[2024-06-10 22:42:12,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.00 | bwd_microstep: 857.08 | bwd_inner_microstep: 857.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928
[2024-06-10 22:42:13,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.16 | bwd_microstep: 697.64 | bwd_inner_microstep: 697.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3655
[2024-06-10 22:42:15,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 1322.91 | bwd_inner_microstep: 1322.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-10 22:42:17,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 1410.93 | bwd_inner_microstep: 1410.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 22:42:19,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.64 | bwd_microstep: 1506.19 | bwd_inner_microstep: 1506.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621
[2024-06-10 22:42:21,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.23 | bwd_microstep: 1310.39 | bwd_inner_microstep: 1310.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-10 22:42:22,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.99 | bwd_microstep: 820.54 | bwd_inner_microstep: 820.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-10 22:42:24,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.27 | bwd_microstep: 1449.50 | bwd_inner_microstep: 1449.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3599
[2024-06-10 22:42:26,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1341.41 | bwd_inner_microstep: 1341.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-10 22:42:27,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.45 | bwd_microstep: 699.84 | bwd_inner_microstep: 699.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 22:42:29,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.75 | bwd_microstep: 1645.18 | bwd_inner_microstep: 1645.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3821
[2024-06-10 22:42:32,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.41 | bwd_microstep: 1708.31 | bwd_inner_microstep: 1708.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2287
[2024-06-10 22:42:33,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.38 | bwd_microstep: 1009.98 | bwd_inner_microstep: 1009.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2057
[2024-06-10 22:42:34,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.49 | bwd_microstep: 941.35 | bwd_inner_microstep: 941.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3047
[2024-06-10 22:42:40,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-10 22:42:40,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.24 | bwd_microstep: 5515.40 | bwd_inner_microstep: 1566.69 | bwd_allreduce_microstep: 3948.65 | step_microstep: 37.86
[2024-06-10 22:42:40,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15041.04 | bwd: 44208.78 | bwd_inner: 40259.23 | bwd_allreduce: 3948.87 | step: 39.31
{'loss': 1.188, 'learning_rate': 6.690069137827757e-06, 'epoch': 0.74}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1932
[2024-06-10 22:42:41,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.69 | bwd_microstep: 816.38 | bwd_inner_microstep: 816.31 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3888
[2024-06-10 22:42:43,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1477.31 | bwd_inner_microstep: 1477.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4478
[2024-06-10 22:42:46,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.89 | bwd_microstep: 1633.36 | bwd_inner_microstep: 1633.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2930
[2024-06-10 22:42:47,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.55 | bwd_microstep: 1142.58 | bwd_inner_microstep: 1142.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756
[2024-06-10 22:42:49,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.14 | bwd_microstep: 1434.09 | bwd_inner_microstep: 1434.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751
[2024-06-10 22:42:51,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.51 | bwd_microstep: 1441.66 | bwd_inner_microstep: 1441.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3743
[2024-06-10 22:42:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.51 | bwd_microstep: 1635.97 | bwd_inner_microstep: 1635.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2233
[2024-06-10 22:42:55,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.82 | bwd_microstep: 770.12 | bwd_inner_microstep: 770.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893
[2024-06-10 22:42:56,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.79 | bwd_microstep: 714.18 | bwd_inner_microstep: 714.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 22:42:57,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1280.42 | bwd_inner_microstep: 1280.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3507
[2024-06-10 22:42:59,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.74 | bwd_microstep: 1191.57 | bwd_inner_microstep: 1191.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1959
[2024-06-10 22:43:00,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.52 | bwd_microstep: 827.15 | bwd_inner_microstep: 827.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3512
[2024-06-10 22:43:02,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1515.89 | bwd_inner_microstep: 1515.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969
[2024-06-10 22:43:03,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.00 | bwd_microstep: 889.41 | bwd_inner_microstep: 889.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-10 22:43:05,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.45 | bwd_microstep: 1356.96 | bwd_inner_microstep: 1356.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098
[2024-06-10 22:43:07,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.36 | bwd_microstep: 917.91 | bwd_inner_microstep: 917.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-10 22:43:09,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.80 | bwd_microstep: 1552.00 | bwd_inner_microstep: 1551.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-10 22:43:11,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1416.65 | bwd_inner_microstep: 1416.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2264
[2024-06-10 22:43:12,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.61 | bwd_microstep: 873.25 | bwd_inner_microstep: 873.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 22:43:14,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.45 | bwd_microstep: 1286.77 | bwd_inner_microstep: 1286.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-10 22:43:15,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.39 | bwd_microstep: 1159.04 | bwd_inner_microstep: 1159.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 22:43:17,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1555.07 | bwd_inner_microstep: 1555.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-10 22:43:19,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.16 | bwd_microstep: 1403.14 | bwd_inner_microstep: 1403.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 22:43:21,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1398.00 | bwd_inner_microstep: 1397.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 22:43:23,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1285.12 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 22:43:25,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.58 | bwd_microstep: 1499.72 | bwd_inner_microstep: 1499.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3678
[2024-06-10 22:43:27,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1672.64 | bwd_inner_microstep: 1672.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3824
[2024-06-10 22:43:30,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.95 | bwd_microstep: 1749.41 | bwd_inner_microstep: 1749.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3070
[2024-06-10 22:43:32,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.67 | bwd_microstep: 1298.29 | bwd_inner_microstep: 1298.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-10 22:43:34,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.06 | bwd_microstep: 1654.69 | bwd_inner_microstep: 1654.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2884
[2024-06-10 22:43:35,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.53 | bwd_microstep: 1086.41 | bwd_inner_microstep: 1086.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3563
[2024-06-10 22:43:43,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 22:43:43,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.91 | bwd_microstep: 6546.13 | bwd_inner_microstep: 1798.88 | bwd_allreduce_microstep: 4747.16 | step_microstep: 39.06
[2024-06-10 22:43:43,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15533.12 | bwd: 46481.32 | bwd_inner: 41733.15 | bwd_allreduce: 4747.46 | step: 40.53
{'loss': 1.1993, 'learning_rate': 6.662077208507603e-06, 'epoch': 0.74}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 5427
[2024-06-10 22:43:45,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 705.97 | bwd_microstep: 1867.74 | bwd_inner_microstep: 1867.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3958
[2024-06-10 22:43:47,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.36 | bwd_microstep: 1490.75 | bwd_inner_microstep: 1490.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2337
[2024-06-10 22:43:49,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.65 | bwd_microstep: 981.95 | bwd_inner_microstep: 981.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3869
[2024-06-10 22:43:51,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.91 | bwd_microstep: 1564.96 | bwd_inner_microstep: 1564.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 22:43:53,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1385.90 | bwd_inner_microstep: 1385.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 22:43:55,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.30 | bwd_microstep: 1481.44 | bwd_inner_microstep: 1481.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 22:43:56,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.25 | bwd_microstep: 677.41 | bwd_inner_microstep: 677.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 22:43:58,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.09 | bwd_microstep: 1395.21 | bwd_inner_microstep: 1395.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 22:43:59,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.38 | bwd_microstep: 801.85 | bwd_inner_microstep: 801.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 22:44:01,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1403.09 | bwd_inner_microstep: 1403.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3495
[2024-06-10 22:44:02,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.16 | bwd_microstep: 1223.26 | bwd_inner_microstep: 1223.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 732
[2024-06-10 22:44:03,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.90 | bwd_microstep: 296.61 | bwd_inner_microstep: 296.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483
[2024-06-10 22:44:05,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.42 | bwd_microstep: 1315.71 | bwd_inner_microstep: 1315.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 22:44:06,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1345.81 | bwd_inner_microstep: 1345.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2405
[2024-06-10 22:44:08,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.33 | bwd_microstep: 1031.85 | bwd_inner_microstep: 1031.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-10 22:44:10,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.38 | bwd_microstep: 1339.69 | bwd_inner_microstep: 1339.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 22:44:12,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.20 | bwd_microstep: 1604.88 | bwd_inner_microstep: 1604.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3836
[2024-06-10 22:44:14,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.23 | bwd_microstep: 1721.01 | bwd_inner_microstep: 1720.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-10 22:44:15,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.79 | bwd_microstep: 797.68 | bwd_inner_microstep: 797.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3920
[2024-06-10 22:44:18,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.22 | bwd_microstep: 1798.87 | bwd_inner_microstep: 1798.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 22:44:20,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1390.88 | bwd_inner_microstep: 1390.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276
[2024-06-10 22:44:21,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.97 | bwd_microstep: 876.01 | bwd_inner_microstep: 875.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2519
[2024-06-10 22:44:23,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.55 | bwd_microstep: 1059.62 | bwd_inner_microstep: 1059.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 22:44:24,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.18 | bwd_microstep: 1435.45 | bwd_inner_microstep: 1435.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 22:44:27,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.11 | bwd_microstep: 1533.41 | bwd_inner_microstep: 1533.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-10 22:44:29,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.59 | bwd_microstep: 1644.00 | bwd_inner_microstep: 1643.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 22:44:31,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.88 | bwd_microstep: 1492.58 | bwd_inner_microstep: 1492.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2534
[2024-06-10 22:44:33,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.32 | bwd_microstep: 1187.20 | bwd_inner_microstep: 1187.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653
[2024-06-10 22:44:35,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.08 | bwd_microstep: 1512.64 | bwd_inner_microstep: 1512.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400
[2024-06-10 22:44:36,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1300.59 | bwd_inner_microstep: 1300.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3575
[2024-06-10 22:44:38,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.50 | bwd_microstep: 1428.74 | bwd_inner_microstep: 1428.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 22:44:44,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-10 22:44:44,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 5063.93 | bwd_inner_microstep: 1780.72 | bwd_allreduce_microstep: 3283.15 | step_microstep: 38.47
[2024-06-10 22:44:44,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15631.54 | bwd: 45450.74 | bwd_inner: 42166.67 | bwd_allreduce: 3283.39 | step: 39.90
{'loss': 1.1584, 'learning_rate': 6.634132255282182e-06, 'epoch': 0.74}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1930
[2024-06-10 22:44:45,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.72 | bwd_microstep: 809.19 | bwd_inner_microstep: 809.12 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 22:44:47,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1381.18 | bwd_inner_microstep: 1381.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3876
[2024-06-10 22:44:49,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.63 | bwd_microstep: 1478.84 | bwd_inner_microstep: 1478.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3831
[2024-06-10 22:44:51,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.34 | bwd_microstep: 1583.21 | bwd_inner_microstep: 1583.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 22:44:53,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1480.15 | bwd_inner_microstep: 1480.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 22:44:55,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.58 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-10 22:44:57,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1252.01 | bwd_inner_microstep: 1251.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2246
[2024-06-10 22:44:58,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.89 | bwd_microstep: 964.20 | bwd_inner_microstep: 964.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-10 22:45:00,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.06 | bwd_microstep: 1392.10 | bwd_inner_microstep: 1392.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 22:45:02,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1253.33 | bwd_inner_microstep: 1253.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 22:45:04,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.59 | bwd_microstep: 1652.93 | bwd_inner_microstep: 1652.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-10 22:45:05,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.63 | bwd_microstep: 798.81 | bwd_inner_microstep: 798.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-10 22:45:07,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1318.77 | bwd_inner_microstep: 1318.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 22:45:09,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.42 | bwd_microstep: 1256.28 | bwd_inner_microstep: 1256.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-10 22:45:11,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.92 | bwd_microstep: 1311.45 | bwd_inner_microstep: 1311.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3926
[2024-06-10 22:45:13,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.94 | bwd_microstep: 1762.30 | bwd_inner_microstep: 1762.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-10 22:45:15,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1512.80 | bwd_inner_microstep: 1512.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603
[2024-06-10 22:45:17,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.47 | bwd_microstep: 1473.74 | bwd_inner_microstep: 1473.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 22:45:19,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.98 | bwd_microstep: 1397.11 | bwd_inner_microstep: 1397.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 22:45:21,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1279.09 | bwd_inner_microstep: 1279.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2933
[2024-06-10 22:45:23,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.54 | bwd_microstep: 1196.35 | bwd_inner_microstep: 1196.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630
[2024-06-10 22:45:25,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.28 | bwd_microstep: 1615.97 | bwd_inner_microstep: 1615.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 22:45:27,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.47 | bwd_microstep: 1557.81 | bwd_inner_microstep: 1557.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-10 22:45:29,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.51 | bwd_microstep: 1628.19 | bwd_inner_microstep: 1628.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 22:45:31,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1413.16 | bwd_inner_microstep: 1413.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242
[2024-06-10 22:45:33,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.40 | bwd_microstep: 967.12 | bwd_inner_microstep: 967.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 22:45:35,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.83 | bwd_microstep: 1661.95 | bwd_inner_microstep: 1661.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 22:45:37,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1553.79 | bwd_inner_microstep: 1553.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-10 22:45:39,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1506.71 | bwd_inner_microstep: 1506.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3380
[2024-06-10 22:45:41,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1436.00 | bwd_inner_microstep: 1435.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 22:45:43,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.14 | bwd_microstep: 1500.77 | bwd_inner_microstep: 1500.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3755
[2024-06-10 22:45:45,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.04 | optimizer_step: 6.65
[2024-06-10 22:45:45,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.85 | bwd_microstep: 1505.59 | bwd_inner_microstep: 1497.85 | bwd_allreduce_microstep: 7.69 | step_microstep: 37.61
[2024-06-10 22:45:45,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16516.46 | bwd: 44286.10 | bwd_inner: 44277.46 | bwd_allreduce: 7.94 | step: 39.11
{'loss': 1.168, 'learning_rate': 6.6062343765734774e-06, 'epoch': 0.74}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 22:45:47,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1472.40 | bwd_inner_microstep: 1472.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 22:45:49,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1342.63 | bwd_inner_microstep: 1342.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 22:45:51,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1342.99 | bwd_inner_microstep: 1342.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301
[2024-06-10 22:45:52,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.39 | bwd_microstep: 975.34 | bwd_inner_microstep: 975.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 22:45:54,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.55 | bwd_microstep: 1346.22 | bwd_inner_microstep: 1346.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-10 22:45:56,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.35 | bwd_microstep: 1190.78 | bwd_inner_microstep: 1190.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-10 22:45:58,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1353.84 | bwd_inner_microstep: 1353.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949
[2024-06-10 22:45:59,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.04 | bwd_microstep: 727.62 | bwd_inner_microstep: 727.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522
[2024-06-10 22:46:00,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.90 | bwd_microstep: 1197.55 | bwd_inner_microstep: 1197.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3663
[2024-06-10 22:46:02,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1370.49 | bwd_inner_microstep: 1370.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1919
[2024-06-10 22:46:03,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.28 | bwd_microstep: 817.94 | bwd_inner_microstep: 817.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497
[2024-06-10 22:46:06,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.92 | bwd_microstep: 1584.71 | bwd_inner_microstep: 1584.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562
[2024-06-10 22:46:08,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1597.16 | bwd_inner_microstep: 1597.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547
[2024-06-10 22:46:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.64 | bwd_microstep: 1594.28 | bwd_inner_microstep: 1594.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3649
[2024-06-10 22:46:12,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.42 | bwd_microstep: 1483.59 | bwd_inner_microstep: 1483.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3697
[2024-06-10 22:46:14,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.45 | bwd_microstep: 1724.49 | bwd_inner_microstep: 1724.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 22:46:16,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.11 | bwd_microstep: 1487.78 | bwd_inner_microstep: 1487.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 22:46:18,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.75 | bwd_microstep: 1486.04 | bwd_inner_microstep: 1486.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 22:46:20,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1254.37 | bwd_inner_microstep: 1254.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384
[2024-06-10 22:46:22,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 1435.29 | bwd_inner_microstep: 1435.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2188
[2024-06-10 22:46:23,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.03 | bwd_microstep: 860.71 | bwd_inner_microstep: 860.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 539
[2024-06-10 22:46:24,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 97.21 | bwd_microstep: 246.56 | bwd_inner_microstep: 246.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526
[2024-06-10 22:46:26,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.92 | bwd_microstep: 1328.73 | bwd_inner_microstep: 1328.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 22:46:28,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.77 | bwd_microstep: 1523.87 | bwd_inner_microstep: 1523.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2230
[2024-06-10 22:46:29,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.62 | bwd_microstep: 867.34 | bwd_inner_microstep: 867.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3546
[2024-06-10 22:46:31,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1455.41 | bwd_inner_microstep: 1455.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 22:46:33,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.39 | bwd_microstep: 1601.01 | bwd_inner_microstep: 1600.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279
[2024-06-10 22:46:34,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.32 | bwd_microstep: 878.50 | bwd_inner_microstep: 878.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2105
[2024-06-10 22:46:36,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.36 | bwd_microstep: 884.71 | bwd_inner_microstep: 884.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566
[2024-06-10 22:46:37,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.17 | bwd_microstep: 1203.36 | bwd_inner_microstep: 1203.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823
[2024-06-10 22:46:39,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1514.92 | bwd_inner_microstep: 1514.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 22:46:47,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 20.91 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-10 22:46:47,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 7216.91 | bwd_inner_microstep: 1691.87 | bwd_allreduce_microstep: 5524.98 | step_microstep: 42.35
[2024-06-10 22:46:47,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15210.67 | bwd: 46367.58 | bwd_inner: 40841.69 | bwd_allreduce: 5525.22 | step: 43.85
{'loss': 1.1745, 'learning_rate': 6.578383670637662e-06, 'epoch': 0.74}
/1726 [22:04:17<7:41:33, 61.54s/it]
 74%|███████▍  | 1277/1726 [22:05:17<7:36:06, 60.95s/it]
 74%|███████▍  | 1278/1726 [22:06:19<7:38:13, 61.37s/it]
 74%|███████▍  | 1279/1726 [22:07:21<7:37:17, 61.38s/it]
 74%|███████▍  | 1280/1726 [22:08:22<7:35:43, 61.31s/it]
 74%|███████▍  | 1281/1726 [22:09:24<7:36:02, 61.49s/it]
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3504
[2024-06-10 22:46:49,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1420.88 | bwd_inner_microstep: 1420.71 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 22:46:51,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.94 | bwd_microstep: 1340.72 | bwd_inner_microstep: 1340.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4185
[2024-06-10 22:46:53,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.93 | bwd_microstep: 1746.08 | bwd_inner_microstep: 1746.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 22:46:55,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.52 | bwd_microstep: 1547.69 | bwd_inner_microstep: 1547.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863
[2024-06-10 22:46:56,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.18 | bwd_microstep: 675.84 | bwd_inner_microstep: 675.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4025
[2024-06-10 22:46:59,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.15 | bwd_microstep: 1707.06 | bwd_inner_microstep: 1707.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075
[2024-06-10 22:47:00,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.46 | bwd_microstep: 818.11 | bwd_inner_microstep: 818.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-10 22:47:02,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.29 | bwd_microstep: 1630.05 | bwd_inner_microstep: 1630.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3733
[2024-06-10 22:47:04,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.94 | bwd_microstep: 1663.17 | bwd_inner_microstep: 1663.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1864
[2024-06-10 22:47:05,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.52 | bwd_microstep: 709.41 | bwd_inner_microstep: 709.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3409
[2024-06-10 22:47:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.79 | bwd_microstep: 1435.98 | bwd_inner_microstep: 1435.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 22:47:09,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1488.39 | bwd_inner_microstep: 1488.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408
[2024-06-10 22:47:11,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1438.82 | bwd_inner_microstep: 1438.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3976
[2024-06-10 22:47:14,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.31 | bwd_microstep: 1799.04 | bwd_inner_microstep: 1799.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3703
[2024-06-10 22:47:16,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.89 | bwd_microstep: 1721.35 | bwd_inner_microstep: 1721.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 22:47:17,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 794.01 | bwd_inner_microstep: 793.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 22:47:19,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1388.47 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628
[2024-06-10 22:47:21,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.08 | bwd_microstep: 1413.51 | bwd_inner_microstep: 1413.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-10 22:47:23,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1388.45 | bwd_inner_microstep: 1388.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 22:47:25,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1556.63 | bwd_inner_microstep: 1556.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 22:47:27,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.03 | bwd_microstep: 1383.43 | bwd_inner_microstep: 1383.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-10 22:47:28,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.60 | bwd_microstep: 697.78 | bwd_inner_microstep: 697.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 22:47:30,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.67 | bwd_microstep: 1452.93 | bwd_inner_microstep: 1452.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-10 22:47:32,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.24 | bwd_microstep: 1433.29 | bwd_inner_microstep: 1433.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 22:47:34,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.48 | bwd_microstep: 1598.67 | bwd_inner_microstep: 1598.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4701
[2024-06-10 22:47:37,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.95 | bwd_microstep: 1685.50 | bwd_inner_microstep: 1685.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3553
[2024-06-10 22:47:39,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.17 | bwd_microstep: 1442.32 | bwd_inner_microstep: 1442.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-10 22:47:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.19 | bwd_microstep: 1437.32 | bwd_inner_microstep: 1437.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 22:47:43,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1556.63 | bwd_inner_microstep: 1556.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-10 22:47:45,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.93 | bwd_microstep: 1519.49 | bwd_inner_microstep: 1519.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2713
[2024-06-10 22:47:46,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.01 | bwd_microstep: 1128.98 | bwd_inner_microstep: 1128.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3416
[2024-06-10 22:47:49,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-10 22:47:49,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1720.84 | bwd_inner_microstep: 1509.49 | bwd_allreduce_microstep: 211.31 | step_microstep: 37.53
[2024-06-10 22:47:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16570.38 | bwd: 44740.87 | bwd_inner: 44528.54 | bwd_allreduce: 211.61 | step: 39.02
{'loss': 1.2216, 'learning_rate': 6.550580235564794e-06, 'epoch': 0.74}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 22:47:50,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.01 | bwd_microstep: 1245.96 | bwd_inner_microstep: 1245.77 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1866
[2024-06-10 22:47:51,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.87 | bwd_microstep: 706.18 | bwd_inner_microstep: 706.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-10 22:47:53,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1399.25 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3852
[2024-06-10 22:47:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.54 | bwd_microstep: 1659.01 | bwd_inner_microstep: 1658.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-10 22:47:58,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1448.15 | bwd_inner_microstep: 1448.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 22:48:00,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.43 | bwd_microstep: 1481.50 | bwd_inner_microstep: 1481.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835
[2024-06-10 22:48:02,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.74 | bwd_microstep: 1451.03 | bwd_inner_microstep: 1451.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-10 22:48:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.31 | bwd_microstep: 1278.82 | bwd_inner_microstep: 1278.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751
[2024-06-10 22:48:06,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.53 | bwd_microstep: 1637.19 | bwd_inner_microstep: 1637.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 22:48:08,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1411.28 | bwd_inner_microstep: 1411.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3525
[2024-06-10 22:48:10,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.49 | bwd_microstep: 1338.00 | bwd_inner_microstep: 1337.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-10 22:48:11,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.62 | bwd_microstep: 1334.36 | bwd_inner_microstep: 1334.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 22:48:13,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1383.05 | bwd_inner_microstep: 1383.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 22:48:15,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.56 | bwd_microstep: 1339.78 | bwd_inner_microstep: 1339.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 22:48:17,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1246.14 | bwd_inner_microstep: 1246.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632
[2024-06-10 22:48:19,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.55 | bwd_microstep: 1433.29 | bwd_inner_microstep: 1433.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3631
[2024-06-10 22:48:21,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.79 | bwd_microstep: 1741.62 | bwd_inner_microstep: 1741.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 22:48:23,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.97 | bwd_microstep: 1452.20 | bwd_inner_microstep: 1452.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 22:48:25,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.19 | bwd_microstep: 1253.16 | bwd_inner_microstep: 1253.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-10 22:48:27,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.58 | bwd_microstep: 1293.01 | bwd_inner_microstep: 1292.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 22:48:29,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1380.84 | bwd_inner_microstep: 1380.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2114
[2024-06-10 22:48:30,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.73 | bwd_microstep: 923.39 | bwd_inner_microstep: 923.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3608
[2024-06-10 22:48:32,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1340.83 | bwd_inner_microstep: 1340.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 22:48:34,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1506.31 | bwd_inner_microstep: 1506.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449
[2024-06-10 22:48:36,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1383.16 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3863
[2024-06-10 22:48:38,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1471.32 | bwd_inner_microstep: 1471.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3722
[2024-06-10 22:48:40,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1240.09 | bwd_inner_microstep: 1240.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2416
[2024-06-10 22:48:41,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.54 | bwd_microstep: 842.92 | bwd_inner_microstep: 842.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2004
[2024-06-10 22:48:42,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.06 | bwd_microstep: 833.26 | bwd_inner_microstep: 833.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 22:48:44,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1246.09 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-10 22:48:45,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075
[2024-06-10 22:48:49,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-10 22:48:49,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.41 | bwd_microstep: 3719.95 | bwd_inner_microstep: 941.01 | bwd_allreduce_microstep: 2778.88 | step_microstep: 37.82
[2024-06-10 22:48:49,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15678.76 | bwd: 44704.40 | bwd_inner: 41924.48 | bwd_allreduce: 2779.18 | step: 39.35
{'loss': 1.1341, 'learning_rate': 6.522824169278419e-06, 'epoch': 0.74}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3476
[2024-06-10 22:48:51,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1401.16 | bwd_inner_microstep: 1401.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2367
[2024-06-10 22:48:53,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.21 | bwd_microstep: 893.05 | bwd_inner_microstep: 893.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3521
[2024-06-10 22:48:54,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1222.25 | bwd_inner_microstep: 1222.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-10 22:48:56,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.91 | bwd_microstep: 1480.22 | bwd_inner_microstep: 1480.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3426
[2024-06-10 22:48:58,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.98 | bwd_microstep: 1309.27 | bwd_inner_microstep: 1309.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 22:49:00,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1245.38 | bwd_inner_microstep: 1245.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 22:49:02,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.40 | bwd_microstep: 1376.15 | bwd_inner_microstep: 1376.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3672
[2024-06-10 22:49:04,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.09 | bwd_microstep: 1354.06 | bwd_inner_microstep: 1354.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-10 22:49:05,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1287.08 | bwd_inner_microstep: 1287.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 22:49:07,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1249.10 | bwd_inner_microstep: 1249.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-10 22:49:09,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.10 | bwd_microstep: 1188.38 | bwd_inner_microstep: 1188.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3500
[2024-06-10 22:49:11,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1351.92 | bwd_inner_microstep: 1351.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979
[2024-06-10 22:49:12,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.86 | bwd_microstep: 893.99 | bwd_inner_microstep: 893.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 22:49:14,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1393.78 | bwd_inner_microstep: 1393.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-10 22:49:16,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.20 | bwd_microstep: 1476.11 | bwd_inner_microstep: 1476.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3625
[2024-06-10 22:49:18,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1464.01 | bwd_inner_microstep: 1463.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-10 22:49:19,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.33 | bwd_microstep: 793.13 | bwd_inner_microstep: 793.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-10 22:49:21,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.77 | bwd_microstep: 1612.69 | bwd_inner_microstep: 1612.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3842
[2024-06-10 22:49:24,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.65 | bwd_microstep: 1765.84 | bwd_inner_microstep: 1765.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 22:49:25,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.76 | bwd_microstep: 1285.68 | bwd_inner_microstep: 1285.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3689
[2024-06-10 22:49:27,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1235.36 | bwd_inner_microstep: 1235.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 22:49:29,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.45 | bwd_microstep: 1285.93 | bwd_inner_microstep: 1285.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-10 22:49:31,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1433.83 | bwd_inner_microstep: 1433.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2041
[2024-06-10 22:49:32,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 906.30 | bwd_inner_microstep: 906.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3480
[2024-06-10 22:49:34,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.22 | bwd_microstep: 1217.09 | bwd_inner_microstep: 1217.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3722
[2024-06-10 22:49:36,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.35 | bwd_microstep: 1563.33 | bwd_inner_microstep: 1563.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551
[2024-06-10 22:49:38,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.44 | bwd_microstep: 1297.92 | bwd_inner_microstep: 1297.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-10 22:49:40,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.34 | bwd_microstep: 1490.10 | bwd_inner_microstep: 1490.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-10 22:49:42,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.40 | bwd_microstep: 1547.39 | bwd_inner_microstep: 1547.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3802
[2024-06-10 22:49:44,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1562.36 | bwd_inner_microstep: 1562.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3422
[2024-06-10 22:49:46,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.19 | bwd_microstep: 1544.33 | bwd_inner_microstep: 1544.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-10 22:49:51,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-10 22:49:51,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 3573.30 | bwd_inner_microstep: 1685.96 | bwd_allreduce_microstep: 1887.28 | step_microstep: 37.67
[2024-06-10 22:49:51,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16001.22 | bwd: 44700.49 | bwd_inner: 42812.31 | bwd_allreduce: 1887.51 | step: 39.19
{'loss': 1.1561, 'learning_rate': 6.4951155695352595e-06, 'epoch': 0.74}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 22:49:53,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.31 | bwd_microstep: 1470.67 | bwd_inner_microstep: 1470.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 22:49:54,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1244.53 | bwd_inner_microstep: 1244.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2304
[2024-06-10 22:49:56,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 973.76 | bwd_inner_microstep: 973.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3013
[2024-06-10 22:49:57,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.67 | bwd_microstep: 1226.26 | bwd_inner_microstep: 1226.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-10 22:50:00,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.81 | bwd_microstep: 1643.12 | bwd_inner_microstep: 1643.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 22:50:01,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.15 | bwd_microstep: 1278.94 | bwd_inner_microstep: 1278.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793
[2024-06-10 22:50:03,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.37 | bwd_microstep: 1442.14 | bwd_inner_microstep: 1442.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-10 22:50:05,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.09 | bwd_microstep: 1146.57 | bwd_inner_microstep: 1146.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401
[2024-06-10 22:50:07,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.74 | bwd_microstep: 1178.70 | bwd_inner_microstep: 1178.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751
[2024-06-10 22:50:09,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.91 | bwd_microstep: 1442.52 | bwd_inner_microstep: 1442.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-10 22:50:10,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.89 | bwd_microstep: 1187.32 | bwd_inner_microstep: 1187.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2156
[2024-06-10 22:50:12,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.87 | bwd_microstep: 947.11 | bwd_inner_microstep: 947.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3439
[2024-06-10 22:50:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.72 | bwd_microstep: 1409.72 | bwd_inner_microstep: 1409.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693
[2024-06-10 22:50:16,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.00 | bwd_microstep: 1720.50 | bwd_inner_microstep: 1720.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 22:50:18,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.32 | bwd_microstep: 1340.73 | bwd_inner_microstep: 1340.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 22:50:20,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.37 | bwd_microstep: 1490.22 | bwd_inner_microstep: 1490.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3687
[2024-06-10 22:50:22,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.09 | bwd_microstep: 1331.66 | bwd_inner_microstep: 1331.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-10 22:50:24,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1450.29 | bwd_inner_microstep: 1450.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2933
[2024-06-10 22:50:25,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.05 | bwd_microstep: 1097.44 | bwd_inner_microstep: 1097.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 22:50:27,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.86 | bwd_microstep: 1558.37 | bwd_inner_microstep: 1558.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 22:50:29,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.41 | bwd_microstep: 1256.70 | bwd_inner_microstep: 1256.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-10 22:50:30,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.98 | bwd_microstep: 697.05 | bwd_inner_microstep: 697.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 22:50:32,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.82 | bwd_microstep: 1286.04 | bwd_inner_microstep: 1286.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 22:50:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1299.08 | bwd_inner_microstep: 1299.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827
[2024-06-10 22:50:36,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1491.58 | bwd_inner_microstep: 1491.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-10 22:50:37,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1251.60 | bwd_inner_microstep: 1251.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806
[2024-06-10 22:50:39,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.86 | bwd_microstep: 1515.03 | bwd_inner_microstep: 1515.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 22:50:42,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.82 | bwd_microstep: 1546.50 | bwd_inner_microstep: 1546.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2268
[2024-06-10 22:50:43,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.33 | bwd_microstep: 809.09 | bwd_inner_microstep: 809.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808
[2024-06-10 22:50:45,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.37 | bwd_microstep: 1359.49 | bwd_inner_microstep: 1359.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 22:50:47,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.86 | bwd_microstep: 1550.61 | bwd_inner_microstep: 1550.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 22:50:49,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.04 | optimizer_step: 6.60
[2024-06-10 22:50:49,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 1850.86 | bwd_inner_microstep: 1089.59 | bwd_allreduce_microstep: 761.23 | step_microstep: 37.50
[2024-06-10 22:50:49,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15634.05 | bwd: 42494.27 | bwd_inner: 41732.11 | bwd_allreduce: 761.46 | step: 38.97
{'loss': 1.1984, 'learning_rate': 6.46745453392485e-06, 'epoch': 0.74}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3462
[2024-06-10 22:50:51,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.75 | bwd_microstep: 1407.79 | bwd_inner_microstep: 1407.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 22:50:53,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.48 | bwd_microstep: 1379.46 | bwd_inner_microstep: 1379.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3887
[2024-06-10 22:50:55,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.22 | bwd_microstep: 1480.69 | bwd_inner_microstep: 1480.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 22:50:57,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.17 | bwd_microstep: 1244.55 | bwd_inner_microstep: 1244.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-10 22:50:59,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1436.24 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 22:51:00,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1380.88 | bwd_inner_microstep: 1380.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025
[2024-06-10 22:51:02,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.17 | bwd_microstep: 804.96 | bwd_inner_microstep: 804.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4019
[2024-06-10 22:51:04,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.74 | bwd_microstep: 1612.57 | bwd_inner_microstep: 1612.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4011
[2024-06-10 22:51:06,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 664.30 | bwd_microstep: 1815.80 | bwd_inner_microstep: 1815.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550
[2024-06-10 22:51:08,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1491.11 | bwd_inner_microstep: 1491.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412
[2024-06-10 22:51:10,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.90 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3673
[2024-06-10 22:51:12,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1580.43 | bwd_inner_microstep: 1580.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2117
[2024-06-10 22:51:14,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.46 | bwd_microstep: 919.54 | bwd_inner_microstep: 919.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3478
[2024-06-10 22:51:16,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.84 | bwd_microstep: 1576.07 | bwd_inner_microstep: 1576.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3454
[2024-06-10 22:51:18,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1398.95 | bwd_inner_microstep: 1398.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 22:51:20,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.68 | bwd_microstep: 1352.69 | bwd_inner_microstep: 1352.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487
[2024-06-10 22:51:22,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.76 | bwd_microstep: 1570.00 | bwd_inner_microstep: 1569.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983
[2024-06-10 22:51:23,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.22 | bwd_microstep: 893.15 | bwd_inner_microstep: 893.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3629
[2024-06-10 22:51:25,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.36 | bwd_microstep: 1533.59 | bwd_inner_microstep: 1533.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 22:51:27,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1609.03 | bwd_inner_microstep: 1609.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 22:51:29,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1394.97 | bwd_inner_microstep: 1394.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3895
[2024-06-10 22:51:31,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1483.86 | bwd_inner_microstep: 1483.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-10 22:51:33,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.69 | bwd_microstep: 1182.18 | bwd_inner_microstep: 1182.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3812
[2024-06-10 22:51:35,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.56 | bwd_microstep: 1414.46 | bwd_inner_microstep: 1414.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009
[2024-06-10 22:51:36,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.13 | bwd_microstep: 804.63 | bwd_inner_microstep: 804.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-10 22:51:38,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1256.34 | bwd_inner_microstep: 1256.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-10 22:51:39,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.00 | bwd_microstep: 1274.35 | bwd_inner_microstep: 1274.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-10 22:51:42,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.04 | bwd_microstep: 1606.68 | bwd_inner_microstep: 1606.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3580
[2024-06-10 22:51:44,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.20 | bwd_microstep: 1471.97 | bwd_inner_microstep: 1471.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584
[2024-06-10 22:51:46,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1305.55 | bwd_inner_microstep: 1305.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461
[2024-06-10 22:51:48,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.44 | bwd_microstep: 1567.70 | bwd_inner_microstep: 1567.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3426
[2024-06-10 22:51:51,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-10 22:51:51,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.15 | bwd_microstep: 3145.15 | bwd_inner_microstep: 1445.69 | bwd_allreduce_microstep: 1699.41 | step_microstep: 37.83
[2024-06-10 22:51:51,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16389.08 | bwd: 45675.92 | bwd_inner: 43975.62 | bwd_allreduce: 1699.64 | step: 39.29
{'loss': 1.1699, 'learning_rate': 6.439841159869233e-06, 'epoch': 0.75}
 74%|███████▍  | 1282/1726 [22:10:25<7:35:22, 61.54s/it]
 74%|███████▍  | 1283/1726 [22:11:26<7:32:31, 61.29s/it]
 74%|███████▍  | 1284/1726 [22:12:27<7:30:57, 61.22s/it]
 74%|███████▍  | 1285/1726 [22:13:26<7:23:50, 60.39s/it]
 75%|███████▍  | 1286/1726 [22:14:28<7:27:16, 60.99s/it]
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3550
[2024-06-10 22:51:53,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.13 | bwd_microstep: 1246.95 | bwd_inner_microstep: 1246.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3897
[2024-06-10 22:51:55,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.59 | bwd_microstep: 1581.28 | bwd_inner_microstep: 1581.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2348
[2024-06-10 22:51:57,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.54 | bwd_microstep: 984.91 | bwd_inner_microstep: 984.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794
[2024-06-10 22:51:59,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.17 | bwd_microstep: 1544.23 | bwd_inner_microstep: 1544.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 22:52:01,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.97 | bwd_microstep: 1385.75 | bwd_inner_microstep: 1385.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 22:52:02,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-10 22:52:05,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.72 | bwd_microstep: 1640.69 | bwd_inner_microstep: 1640.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 22:52:07,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.44 | bwd_microstep: 1386.12 | bwd_inner_microstep: 1386.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 22:52:08,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1255.81 | bwd_inner_microstep: 1255.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 22:52:10,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1286.41 | bwd_inner_microstep: 1286.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1873
[2024-06-10 22:52:11,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.31 | bwd_microstep: 741.14 | bwd_inner_microstep: 741.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 22:52:13,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.01 | bwd_microstep: 1476.64 | bwd_inner_microstep: 1476.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3667
[2024-06-10 22:52:15,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.05 | bwd_microstep: 1548.96 | bwd_inner_microstep: 1548.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1949
[2024-06-10 22:52:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.42 | bwd_microstep: 890.32 | bwd_inner_microstep: 890.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-10 22:52:19,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.37 | bwd_microstep: 1508.58 | bwd_inner_microstep: 1508.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-10 22:52:21,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1386.90 | bwd_inner_microstep: 1386.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-10 22:52:23,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.78 | bwd_microstep: 1393.63 | bwd_inner_microstep: 1393.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-10 22:52:25,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.86 | bwd_microstep: 1620.31 | bwd_inner_microstep: 1620.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596
[2024-06-10 22:52:27,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.83 | bwd_microstep: 1467.51 | bwd_inner_microstep: 1467.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859
[2024-06-10 22:52:29,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.25 | bwd_microstep: 1569.77 | bwd_inner_microstep: 1569.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 22:52:31,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.85 | bwd_microstep: 1394.94 | bwd_inner_microstep: 1394.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 22:52:33,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.72 | bwd_microstep: 1423.80 | bwd_inner_microstep: 1423.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 22:52:35,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1492.62 | bwd_inner_microstep: 1492.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2652
[2024-06-10 22:52:37,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.02 | bwd_microstep: 1222.02 | bwd_inner_microstep: 1221.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 22:52:39,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.50 | bwd_microstep: 1662.13 | bwd_inner_microstep: 1662.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3550
[2024-06-10 22:52:41,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.72 | bwd_microstep: 1586.41 | bwd_inner_microstep: 1586.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2202
[2024-06-10 22:52:42,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.19 | bwd_microstep: 858.95 | bwd_inner_microstep: 858.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3726
[2024-06-10 22:52:45,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.58 | bwd_microstep: 1730.37 | bwd_inner_microstep: 1730.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3420
[2024-06-10 22:52:46,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.71 | bwd_microstep: 1313.60 | bwd_inner_microstep: 1313.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3563
[2024-06-10 22:52:49,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.40 | bwd_microstep: 1525.00 | bwd_inner_microstep: 1524.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3770
[2024-06-10 22:52:50,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.14 | bwd_microstep: 1356.58 | bwd_inner_microstep: 1356.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 22:52:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.05 | optimizer_step: 6.63
[2024-06-10 22:52:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.90 | bwd_microstep: 1998.99 | bwd_inner_microstep: 1749.42 | bwd_allreduce_microstep: 249.53 | step_microstep: 37.76
[2024-06-10 22:52:53,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16572.63 | bwd: 44766.09 | bwd_inner: 44515.67 | bwd_allreduce: 249.75 | step: 39.24
{'loss': 1.1843, 'learning_rate': 6.412275544622557e-06, 'epoch': 0.75}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1856
[2024-06-10 22:52:54,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.12 | bwd_microstep: 759.18 | bwd_inner_microstep: 759.11 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-10 22:52:56,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.82 | bwd_microstep: 1280.96 | bwd_inner_microstep: 1280.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2691
[2024-06-10 22:52:57,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.87 | bwd_microstep: 1125.15 | bwd_inner_microstep: 1125.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846
[2024-06-10 22:52:59,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.36 | bwd_microstep: 1466.07 | bwd_inner_microstep: 1466.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3479
[2024-06-10 22:53:01,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.39 | bwd_microstep: 1215.27 | bwd_inner_microstep: 1215.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4124
[2024-06-10 22:53:03,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.67 | bwd_microstep: 1595.15 | bwd_inner_microstep: 1595.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3738
[2024-06-10 22:53:06,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.79 | bwd_microstep: 1630.91 | bwd_inner_microstep: 1630.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1889
[2024-06-10 22:53:07,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.95 | bwd_microstep: 712.25 | bwd_inner_microstep: 712.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 22:53:08,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.55 | bwd_microstep: 1247.14 | bwd_inner_microstep: 1247.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2385
[2024-06-10 22:53:09,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.99 | bwd_microstep: 837.06 | bwd_inner_microstep: 837.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-10 22:53:11,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.91 | bwd_microstep: 1152.43 | bwd_inner_microstep: 1152.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 22:53:13,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.68 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 22:53:15,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.98 | bwd_microstep: 1477.61 | bwd_inner_microstep: 1477.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 22:53:17,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.58 | bwd_microstep: 1446.05 | bwd_inner_microstep: 1446.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1954
[2024-06-10 22:53:18,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.99 | bwd_microstep: 850.72 | bwd_inner_microstep: 850.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437
[2024-06-10 22:53:20,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1308.78 | bwd_inner_microstep: 1308.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3829
[2024-06-10 22:53:22,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1387.34 | bwd_inner_microstep: 1387.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529
[2024-06-10 22:53:24,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.50 | bwd_microstep: 1411.49 | bwd_inner_microstep: 1411.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2135
[2024-06-10 22:53:25,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.46 | bwd_microstep: 833.24 | bwd_inner_microstep: 833.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3618
[2024-06-10 22:53:27,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 1434.31 | bwd_inner_microstep: 1434.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 22:53:29,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1457.76 | bwd_inner_microstep: 1457.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 22:53:31,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1281.39 | bwd_inner_microstep: 1281.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-10 22:53:33,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1456.67 | bwd_inner_microstep: 1456.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1998
[2024-06-10 22:53:34,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.27 | bwd_microstep: 737.27 | bwd_inner_microstep: 737.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 22:53:36,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1283.48 | bwd_inner_microstep: 1283.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3777
[2024-06-10 22:53:38,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.93 | bwd_microstep: 1473.16 | bwd_inner_microstep: 1473.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-10 22:53:40,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1507.81 | bwd_inner_microstep: 1507.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-10 22:53:42,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1445.08 | bwd_inner_microstep: 1445.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3738
[2024-06-10 22:53:44,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.07 | bwd_microstep: 1457.06 | bwd_inner_microstep: 1457.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-10 22:53:46,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.73 | bwd_microstep: 1634.74 | bwd_inner_microstep: 1634.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-10 22:53:48,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.70 | bwd_microstep: 1304.56 | bwd_inner_microstep: 1304.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-10 22:53:54,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.33 | optimizer_step: 6.63
[2024-06-10 22:53:54,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.08 | bwd_microstep: 5821.06 | bwd_inner_microstep: 1477.97 | bwd_allreduce_microstep: 4343.02 | step_microstep: 38.80
[2024-06-10 22:53:54,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15383.97 | bwd: 45415.06 | bwd_inner: 41071.07 | bwd_allreduce: 4343.30 | step: 40.26
{'loss': 1.1461, 'learning_rate': 6.384757785270777e-06, 'epoch': 0.75}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 22:53:55,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.25 | bwd_microstep: 797.51 | bwd_inner_microstep: 797.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 22:53:57,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.25 | bwd_microstep: 1242.47 | bwd_inner_microstep: 1242.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2817
[2024-06-10 22:53:59,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.02 | bwd_microstep: 1108.97 | bwd_inner_microstep: 1108.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1878
[2024-06-10 22:54:00,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.47 | bwd_microstep: 771.40 | bwd_inner_microstep: 771.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 22:54:01,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.72 | bwd_microstep: 1339.46 | bwd_inner_microstep: 1339.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-10 22:54:03,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.27 | bwd_microstep: 1277.13 | bwd_inner_microstep: 1277.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1934
[2024-06-10 22:54:04,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.86 | bwd_microstep: 741.64 | bwd_inner_microstep: 741.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 877
[2024-06-10 22:54:05,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 151.24 | bwd_microstep: 398.19 | bwd_inner_microstep: 398.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483
[2024-06-10 22:54:06,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.60 | bwd_microstep: 1216.11 | bwd_inner_microstep: 1216.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3486
[2024-06-10 22:54:08,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.25 | bwd_microstep: 1431.07 | bwd_inner_microstep: 1431.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3590
[2024-06-10 22:54:10,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.58 | bwd_microstep: 1367.72 | bwd_inner_microstep: 1367.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 22:54:12,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.35 | bwd_microstep: 1416.29 | bwd_inner_microstep: 1416.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3490
[2024-06-10 22:54:14,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.47 | bwd_microstep: 1528.72 | bwd_inner_microstep: 1528.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3827
[2024-06-10 22:54:17,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.08 | bwd_microstep: 1583.96 | bwd_inner_microstep: 1583.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 22:54:19,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 22:54:20,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.89 | bwd_microstep: 799.85 | bwd_inner_microstep: 799.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3904
[2024-06-10 22:54:22,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.72 | bwd_microstep: 1492.95 | bwd_inner_microstep: 1492.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445
[2024-06-10 22:54:23,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.06 | bwd_microstep: 1284.35 | bwd_inner_microstep: 1284.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4054
[2024-06-10 22:54:26,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.65 | bwd_microstep: 1822.18 | bwd_inner_microstep: 1822.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 22:54:28,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.30 | bwd_microstep: 1553.69 | bwd_inner_microstep: 1553.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-10 22:54:30,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.21 | bwd_microstep: 1613.44 | bwd_inner_microstep: 1613.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-10 22:54:32,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.23 | bwd_microstep: 1402.10 | bwd_inner_microstep: 1402.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 22:54:34,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1284.84 | bwd_inner_microstep: 1284.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 22:54:36,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.41 | bwd_microstep: 1347.88 | bwd_inner_microstep: 1347.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 866
[2024-06-10 22:54:36,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 140.97 | bwd_microstep: 366.26 | bwd_inner_microstep: 366.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3817
[2024-06-10 22:54:38,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1401.30 | bwd_inner_microstep: 1401.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-10 22:54:41,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.55 | bwd_microstep: 1624.71 | bwd_inner_microstep: 1624.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3232
[2024-06-10 22:54:43,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1424.32 | bwd_inner_microstep: 1424.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992
[2024-06-10 22:54:44,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.56 | bwd_microstep: 897.04 | bwd_inner_microstep: 897.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3544
[2024-06-10 22:54:46,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.71 | bwd_microstep: 1543.89 | bwd_inner_microstep: 1543.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449
[2024-06-10 22:54:48,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.12 | bwd_microstep: 1377.98 | bwd_inner_microstep: 1377.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-10 22:54:55,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 22:54:55,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.22 | bwd_microstep: 6173.26 | bwd_inner_microstep: 1803.63 | bwd_allreduce_microstep: 4369.56 | step_microstep: 38.57
[2024-06-10 22:54:55,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15108.27 | bwd: 45016.08 | bwd_inner: 40645.57 | bwd_allreduce: 4369.81 | step: 40.05
{'loss': 1.2173, 'learning_rate': 6.357287978731292e-06, 'epoch': 0.75}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411
[2024-06-10 22:54:56,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1300.88 | bwd_inner_microstep: 1300.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2102
[2024-06-10 22:54:58,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.59 | bwd_microstep: 821.07 | bwd_inner_microstep: 821.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867
[2024-06-10 22:55:00,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.87 | bwd_microstep: 1658.86 | bwd_inner_microstep: 1658.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 22:55:02,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.97 | bwd_microstep: 1309.99 | bwd_inner_microstep: 1309.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3583
[2024-06-10 22:55:03,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1264.87 | bwd_inner_microstep: 1264.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1946
[2024-06-10 22:55:04,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.75 | bwd_microstep: 728.02 | bwd_inner_microstep: 728.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498
[2024-06-10 22:55:06,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.24 | bwd_microstep: 1187.69 | bwd_inner_microstep: 1187.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 22:55:08,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1386.27 | bwd_inner_microstep: 1386.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-10 22:55:10,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1427.26 | bwd_inner_microstep: 1427.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-10 22:55:12,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1398.95 | bwd_inner_microstep: 1398.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-10 22:55:14,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.34 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1896
[2024-06-10 22:55:15,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.96 | bwd_microstep: 683.47 | bwd_inner_microstep: 683.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 22:55:17,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1390.63 | bwd_inner_microstep: 1390.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3707
[2024-06-10 22:55:19,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.38 | bwd_microstep: 1466.83 | bwd_inner_microstep: 1466.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-10 22:55:21,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.25 | bwd_microstep: 1614.53 | bwd_inner_microstep: 1614.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446
[2024-06-10 22:55:23,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.51 | bwd_microstep: 1281.12 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3447
[2024-06-10 22:55:25,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 1476.75 | bwd_inner_microstep: 1476.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932
[2024-06-10 22:55:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.70 | bwd_microstep: 696.50 | bwd_inner_microstep: 696.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484
[2024-06-10 22:55:28,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1403.97 | bwd_inner_microstep: 1403.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3628
[2024-06-10 22:55:29,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.90 | bwd_microstep: 1216.47 | bwd_inner_microstep: 1216.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076
[2024-06-10 22:55:30,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.62 | bwd_microstep: 913.70 | bwd_inner_microstep: 913.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-10 22:55:33,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1509.93 | bwd_inner_microstep: 1509.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 22:55:35,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.94 | bwd_microstep: 1556.38 | bwd_inner_microstep: 1556.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-10 22:55:37,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.16 | bwd_microstep: 1502.60 | bwd_inner_microstep: 1502.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2531
[2024-06-10 22:55:38,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.93 | bwd_microstep: 1150.12 | bwd_inner_microstep: 1150.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 22:55:40,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1389.41 | bwd_inner_microstep: 1389.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-10 22:55:42,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1491.57 | bwd_inner_microstep: 1491.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 22:55:44,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.64 | bwd_microstep: 1455.99 | bwd_inner_microstep: 1455.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 22:55:46,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.26 | bwd_microstep: 975.19 | bwd_inner_microstep: 975.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3780
[2024-06-10 22:55:48,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.31 | bwd_microstep: 1692.31 | bwd_inner_microstep: 1692.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228
[2024-06-10 22:55:49,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.35 | bwd_microstep: 961.82 | bwd_inner_microstep: 961.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 22:55:57,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-10 22:55:57,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 7411.73 | bwd_inner_microstep: 1661.19 | bwd_allreduce_microstep: 5750.48 | step_microstep: 37.73
[2024-06-10 22:55:57,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15382.19 | bwd: 47008.51 | bwd_inner: 41257.12 | bwd_allreduce: 5750.71 | step: 39.22
{'loss': 1.2659, 'learning_rate': 6.3298662217526315e-06, 'epoch': 0.75}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3488
[2024-06-10 22:55:59,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.93 | bwd_microstep: 1433.30 | bwd_inner_microstep: 1433.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-10 22:56:01,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.36 | bwd_microstep: 1278.74 | bwd_inner_microstep: 1278.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2330
[2024-06-10 22:56:02,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.72 | bwd_microstep: 914.26 | bwd_inner_microstep: 914.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-10 22:56:05,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.85 | bwd_microstep: 1640.71 | bwd_inner_microstep: 1640.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 22:56:06,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.46 | bwd_microstep: 1295.73 | bwd_inner_microstep: 1295.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 22:56:08,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.11 | bwd_microstep: 1338.12 | bwd_inner_microstep: 1338.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-10 22:56:10,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.70 | bwd_microstep: 1430.08 | bwd_inner_microstep: 1430.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 22:56:12,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1377.17 | bwd_inner_microstep: 1377.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 879
[2024-06-10 22:56:13,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.61 | bwd_microstep: 367.22 | bwd_inner_microstep: 367.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1907
[2024-06-10 22:56:14,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.09 | bwd_microstep: 715.47 | bwd_inner_microstep: 715.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551
[2024-06-10 22:56:15,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.82 | bwd_microstep: 1301.27 | bwd_inner_microstep: 1301.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-10 22:56:17,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.44 | bwd_microstep: 1338.55 | bwd_inner_microstep: 1338.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3660
[2024-06-10 22:56:19,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.81 | bwd_microstep: 1577.71 | bwd_inner_microstep: 1577.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 22:56:21,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1346.53 | bwd_inner_microstep: 1346.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3497
[2024-06-10 22:56:24,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.47 | bwd_microstep: 1678.85 | bwd_inner_microstep: 1678.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-10 22:56:26,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.11 | bwd_microstep: 1491.41 | bwd_inner_microstep: 1491.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-10 22:56:28,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.00 | bwd_microstep: 1554.13 | bwd_inner_microstep: 1554.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 22:56:30,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.74 | bwd_microstep: 1378.64 | bwd_inner_microstep: 1378.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 22:56:32,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 877
[2024-06-10 22:56:32,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 151.44 | bwd_microstep: 397.62 | bwd_inner_microstep: 397.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3477
[2024-06-10 22:56:34,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.38 | bwd_microstep: 1419.83 | bwd_inner_microstep: 1419.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-10 22:56:36,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1395.75 | bwd_inner_microstep: 1395.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-10 22:56:38,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1399.01 | bwd_inner_microstep: 1398.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 22:56:40,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.02 | bwd_microstep: 1391.36 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-10 22:56:41,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.08 | bwd_microstep: 1159.36 | bwd_inner_microstep: 1159.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-10 22:56:43,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1348.11 | bwd_inner_microstep: 1348.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556
[2024-06-10 22:56:45,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.22 | bwd_microstep: 1293.74 | bwd_inner_microstep: 1293.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 22:56:47,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1491.40 | bwd_inner_microstep: 1491.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3562
[2024-06-10 22:56:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.45 | bwd_microstep: 1440.72 | bwd_inner_microstep: 1440.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 22:56:51,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.31 | bwd_microstep: 1500.52 | bwd_inner_microstep: 1500.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2658
[2024-06-10 22:56:53,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.75 | bwd_microstep: 1020.71 | bwd_inner_microstep: 1020.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235
[2024-06-10 22:56:59,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.09 | optimizer_step: 6.63
[2024-06-10 22:56:59,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.08 | bwd_microstep: 5678.51 | bwd_inner_microstep: 1087.67 | bwd_allreduce_microstep: 4590.79 | step_microstep: 37.81
[2024-06-10 22:56:59,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15384.94 | bwd: 45678.21 | bwd_inner: 41086.52 | bwd_allreduce: 4591.02 | step: 39.40
{'loss': 1.1452, 'learning_rate': 6.3024926109140725e-06, 'epoch': 0.75}
 75%|███████▍  | 1287/1726 [22:15:30<7:27:46, 61.20s/it]
 75%|███████▍  | 1288/1726 [22:16:31<7:26:34, 61.18s/it]
 75%|███████▍  | 1289/1726 [22:17:31<7:23:58, 60.96s/it]
 75%|███████▍  | 1290/1726 [22:18:34<7:26:47, 61.49s/it]
 75%|███████▍  | 1291/1726 [22:19:35<7:25:35, 61.46s/it]
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3039
[2024-06-10 22:57:00,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.01 | bwd_microstep: 1272.50 | bwd_inner_microstep: 1272.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 22:57:02,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.94 | bwd_microstep: 1242.52 | bwd_inner_microstep: 1242.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3860
[2024-06-10 22:57:04,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.02 | bwd_microstep: 1455.96 | bwd_inner_microstep: 1455.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3886
[2024-06-10 22:57:06,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.41 | bwd_microstep: 1489.82 | bwd_inner_microstep: 1489.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-10 22:57:08,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.14 | bwd_microstep: 1543.41 | bwd_inner_microstep: 1543.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 22:57:10,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.82 | bwd_microstep: 1341.68 | bwd_inner_microstep: 1341.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 22:57:12,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.50 | bwd_microstep: 1283.76 | bwd_inner_microstep: 1283.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938
[2024-06-10 22:57:13,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.69 | bwd_microstep: 725.90 | bwd_inner_microstep: 725.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 22:57:15,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.03 | bwd_microstep: 1283.55 | bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3427
[2024-06-10 22:57:16,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.22 | bwd_microstep: 1183.95 | bwd_inner_microstep: 1183.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3644
[2024-06-10 22:57:18,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.97 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3492
[2024-06-10 22:57:20,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1442.12 | bwd_inner_microstep: 1442.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-10 22:57:22,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1479.11 | bwd_inner_microstep: 1479.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3657
[2024-06-10 22:57:25,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.12 | bwd_microstep: 1655.04 | bwd_inner_microstep: 1655.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1934
[2024-06-10 22:57:26,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.71 | bwd_microstep: 821.95 | bwd_inner_microstep: 821.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3679
[2024-06-10 22:57:28,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.79 | bwd_microstep: 1449.31 | bwd_inner_microstep: 1449.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1963
[2024-06-10 22:57:29,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.08 | bwd_microstep: 854.76 | bwd_inner_microstep: 854.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3381
[2024-06-10 22:57:31,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.91 | bwd_microstep: 1271.81 | bwd_inner_microstep: 1271.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 22:57:33,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.20 | bwd_microstep: 1345.37 | bwd_inner_microstep: 1345.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3619
[2024-06-10 22:57:35,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.56 | bwd_microstep: 1570.70 | bwd_inner_microstep: 1570.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 22:57:37,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.88 | bwd_microstep: 1479.61 | bwd_inner_microstep: 1479.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646
[2024-06-10 22:57:39,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1412.72 | bwd_inner_microstep: 1412.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 22:57:41,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.29 | bwd_microstep: 1536.66 | bwd_inner_microstep: 1536.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 22:57:43,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1460.42 | bwd_inner_microstep: 1460.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3558
[2024-06-10 22:57:45,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.14 | bwd_microstep: 1360.27 | bwd_inner_microstep: 1360.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 22:57:47,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.81 | bwd_microstep: 1289.65 | bwd_inner_microstep: 1289.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 22:57:49,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1508.70 | bwd_inner_microstep: 1508.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3606
[2024-06-10 22:57:51,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.53 | bwd_microstep: 1540.64 | bwd_inner_microstep: 1540.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801
[2024-06-10 22:57:53,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1555.68 | bwd_inner_microstep: 1555.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-10 22:57:54,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.66 | bwd_microstep: 977.29 | bwd_inner_microstep: 977.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-10 22:57:56,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.16 | bwd_microstep: 1505.13 | bwd_inner_microstep: 1505.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276
[2024-06-10 22:58:00,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 22:58:00,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.46 | bwd_microstep: 3397.79 | bwd_inner_microstep: 1139.93 | bwd_allreduce_microstep: 2257.81 | step_microstep: 38.02
[2024-06-10 22:58:00,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15981.76 | bwd: 45155.83 | bwd_inner: 42897.12 | bwd_allreduce: 2258.04 | step: 39.49
{'loss': 1.1915, 'learning_rate': 6.275167242625331e-06, 'epoch': 0.75}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 22:58:02,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.52 | bwd_microstep: 1472.12 | bwd_inner_microstep: 1471.93 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927
[2024-06-10 22:58:03,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.17 | bwd_microstep: 695.91 | bwd_inner_microstep: 695.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915
[2024-06-10 22:58:05,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1585.46 | bwd_inner_microstep: 1585.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2330
[2024-06-10 22:58:07,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.38 | bwd_microstep: 920.27 | bwd_inner_microstep: 920.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 22:58:09,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1490.20 | bwd_inner_microstep: 1490.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-10 22:58:10,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 793.92 | bwd_inner_microstep: 793.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 22:58:12,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1247.18 | bwd_inner_microstep: 1247.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 22:58:13,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1388.82 | bwd_inner_microstep: 1388.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411
[2024-06-10 22:58:15,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.78 | bwd_microstep: 1308.03 | bwd_inner_microstep: 1308.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 22:58:17,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1388.79 | bwd_inner_microstep: 1388.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3518
[2024-06-10 22:58:19,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.10 | bwd_microstep: 1511.61 | bwd_inner_microstep: 1511.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1964
[2024-06-10 22:58:20,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.45 | bwd_microstep: 853.88 | bwd_inner_microstep: 853.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430
[2024-06-10 22:58:22,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.47 | bwd_microstep: 1284.96 | bwd_inner_microstep: 1284.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3518
[2024-06-10 22:58:24,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1434.32 | bwd_inner_microstep: 1434.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 22:58:26,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.32 | bwd_microstep: 1386.26 | bwd_inner_microstep: 1386.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 22:58:28,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.70 | bwd_microstep: 1387.95 | bwd_inner_microstep: 1387.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-10 22:58:30,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.09 | bwd_microstep: 1314.24 | bwd_inner_microstep: 1314.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2390
[2024-06-10 22:58:31,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.34 | bwd_microstep: 906.81 | bwd_inner_microstep: 906.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143
[2024-06-10 22:58:32,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.11 | bwd_microstep: 930.73 | bwd_inner_microstep: 930.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-10 22:58:34,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1311.97 | bwd_inner_microstep: 1311.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 22:58:36,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.10 | bwd_microstep: 1293.59 | bwd_inner_microstep: 1293.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-10 22:58:38,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.86 | bwd_microstep: 1460.66 | bwd_inner_microstep: 1460.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 22:58:40,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.85 | bwd_microstep: 1654.63 | bwd_inner_microstep: 1654.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-10 22:58:42,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1510.55 | bwd_inner_microstep: 1510.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 22:58:44,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1498.17 | bwd_inner_microstep: 1498.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3809
[2024-06-10 22:58:46,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.50 | bwd_microstep: 1414.76 | bwd_inner_microstep: 1414.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3586
[2024-06-10 22:58:48,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1436.08 | bwd_inner_microstep: 1436.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3765
[2024-06-10 22:58:50,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.56 | bwd_microstep: 1469.20 | bwd_inner_microstep: 1469.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 22:58:53,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.49 | bwd_microstep: 1512.40 | bwd_inner_microstep: 1512.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-10 22:58:55,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.34 | bwd_microstep: 1636.07 | bwd_inner_microstep: 1636.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-10 22:58:57,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.71 | bwd_microstep: 1538.52 | bwd_inner_microstep: 1538.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-10 22:59:00,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.04 | optimizer_step: 6.63
[2024-06-10 22:59:00,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.47 | bwd_microstep: 2537.21 | bwd_inner_microstep: 1818.58 | bwd_allreduce_microstep: 718.58 | step_microstep: 37.69
[2024-06-10 22:59:00,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15953.58 | bwd: 43575.26 | bwd_inner: 42855.65 | bwd_allreduce: 718.88 | step: 39.35
{'loss': 1.1535, 'learning_rate': 6.247890213126213e-06, 'epoch': 0.75}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 22:59:02,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.39 | bwd_microstep: 1273.43 | bwd_inner_microstep: 1273.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 22:59:04,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1379.27 | bwd_inner_microstep: 1379.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405
[2024-06-10 22:59:05,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.29 | bwd_microstep: 1180.14 | bwd_inner_microstep: 1180.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-10 22:59:07,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.54 | bwd_microstep: 1378.03 | bwd_inner_microstep: 1378.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4303
[2024-06-10 22:59:10,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.28 | bwd_microstep: 1781.89 | bwd_inner_microstep: 1781.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 22:59:12,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1345.24 | bwd_inner_microstep: 1345.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 22:59:13,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.58 | bwd_microstep: 1342.66 | bwd_inner_microstep: 1342.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 22:59:15,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.36 | bwd_microstep: 1246.30 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-10 22:59:17,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.04 | bwd_microstep: 1534.04 | bwd_inner_microstep: 1534.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4020
[2024-06-10 22:59:19,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.52 | bwd_microstep: 1519.78 | bwd_inner_microstep: 1519.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2108
[2024-06-10 22:59:20,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.28 | bwd_microstep: 732.26 | bwd_inner_microstep: 732.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1924
[2024-06-10 22:59:21,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.31 | bwd_microstep: 728.15 | bwd_inner_microstep: 728.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3707
[2024-06-10 22:59:24,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.22 | bwd_microstep: 1545.92 | bwd_inner_microstep: 1545.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 22:59:25,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.53 | bwd_microstep: 1380.61 | bwd_inner_microstep: 1380.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523
[2024-06-10 22:59:28,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.73 | bwd_microstep: 1585.76 | bwd_inner_microstep: 1585.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2331
[2024-06-10 22:59:29,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.89 | bwd_microstep: 919.02 | bwd_inner_microstep: 918.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-10 22:59:31,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3523
[2024-06-10 22:59:33,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.85 | bwd_microstep: 1196.90 | bwd_inner_microstep: 1196.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 22:59:33,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.63 | bwd_microstep: 696.77 | bwd_inner_microstep: 696.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-10 22:59:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1454.60 | bwd_inner_microstep: 1454.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 22:59:37,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1353.22 | bwd_inner_microstep: 1353.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-10 22:59:39,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1432.77 | bwd_inner_microstep: 1432.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-10 22:59:41,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.03 | bwd_microstep: 1359.09 | bwd_inner_microstep: 1359.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-10 22:59:43,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.80 | bwd_microstep: 1289.97 | bwd_inner_microstep: 1289.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458
[2024-06-10 22:59:45,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.95 | bwd_microstep: 1216.11 | bwd_inner_microstep: 1216.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3558
[2024-06-10 22:59:46,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.10 | bwd_microstep: 1233.39 | bwd_inner_microstep: 1233.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3831
[2024-06-10 22:59:49,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.25 | bwd_microstep: 1518.02 | bwd_inner_microstep: 1517.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3432
[2024-06-10 22:59:50,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1426.17 | bwd_inner_microstep: 1426.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 22:59:52,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.31 | bwd_microstep: 1339.77 | bwd_inner_microstep: 1339.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595
[2024-06-10 22:59:55,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.33 | bwd_microstep: 1596.70 | bwd_inner_microstep: 1596.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2084
[2024-06-10 22:59:56,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.55 | bwd_microstep: 1012.24 | bwd_inner_microstep: 1012.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-10 23:00:00,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.09 | optimizer_step: 6.61
[2024-06-10 23:00:00,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.75 | bwd_microstep: 3524.11 | bwd_inner_microstep: 1855.47 | bwd_allreduce_microstep: 1668.59 | step_microstep: 37.68
[2024-06-10 23:00:00,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15774.91 | bwd: 43908.91 | bwd_inner: 42239.42 | bwd_allreduce: 1668.82 | step: 39.20
{'loss': 1.1959, 'learning_rate': 6.220661618486268e-06, 'epoch': 0.75}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933
[2024-06-10 23:00:01,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.14 | bwd_microstep: 817.20 | bwd_inner_microstep: 817.07 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3928
[2024-06-10 23:00:04,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 688.20 | bwd_microstep: 1894.76 | bwd_inner_microstep: 1894.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3896
[2024-06-10 23:00:06,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.46 | bwd_microstep: 1512.13 | bwd_inner_microstep: 1512.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-10 23:00:07,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.08 | bwd_microstep: 791.08 | bwd_inner_microstep: 791.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3474
[2024-06-10 23:00:09,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.15 | bwd_microstep: 1212.00 | bwd_inner_microstep: 1211.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 23:00:11,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.43 | bwd_microstep: 1403.87 | bwd_inner_microstep: 1403.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 23:00:13,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1388.25 | bwd_inner_microstep: 1388.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3715
[2024-06-10 23:00:15,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.61 | bwd_microstep: 1464.44 | bwd_inner_microstep: 1464.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 23:00:16,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1249.06 | bwd_inner_microstep: 1249.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2202
[2024-06-10 23:00:18,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.18 | bwd_microstep: 987.66 | bwd_inner_microstep: 987.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 23:00:19,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.18 | bwd_microstep: 803.14 | bwd_inner_microstep: 803.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 23:00:21,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.14 | bwd_microstep: 1295.29 | bwd_inner_microstep: 1295.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506
[2024-06-10 23:00:22,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1347.46 | bwd_inner_microstep: 1347.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 23:00:24,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.05 | bwd_microstep: 1350.63 | bwd_inner_microstep: 1350.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 23:00:26,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.00 | bwd_microstep: 1390.58 | bwd_inner_microstep: 1390.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3529
[2024-06-10 23:00:28,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.79 | bwd_microstep: 1553.54 | bwd_inner_microstep: 1553.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3533
[2024-06-10 23:00:30,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.75 | bwd_microstep: 1518.50 | bwd_inner_microstep: 1518.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3537
[2024-06-10 23:00:33,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.72 | bwd_microstep: 1535.08 | bwd_inner_microstep: 1535.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3465
[2024-06-10 23:00:34,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.63 | bwd_microstep: 1244.41 | bwd_inner_microstep: 1244.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3819
[2024-06-10 23:00:36,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.37 | bwd_microstep: 1511.94 | bwd_inner_microstep: 1511.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3832
[2024-06-10 23:00:39,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.33 | bwd_microstep: 1584.69 | bwd_inner_microstep: 1584.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 23:00:40,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1377.20 | bwd_inner_microstep: 1377.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3527
[2024-06-10 23:00:42,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.19 | bwd_microstep: 1356.36 | bwd_inner_microstep: 1356.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 23:00:44,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.73 | bwd_microstep: 1558.37 | bwd_inner_microstep: 1558.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-10 23:00:46,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.30 | bwd_microstep: 1272.52 | bwd_inner_microstep: 1272.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-10 23:00:48,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1448.68 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 23:00:50,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1256.01 | bwd_inner_microstep: 1255.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3769
[2024-06-10 23:00:52,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1474.68 | bwd_inner_microstep: 1474.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 23:00:54,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.60 | bwd_microstep: 1372.19 | bwd_inner_microstep: 1372.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3382
[2024-06-10 23:00:56,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.54 | bwd_microstep: 1271.87 | bwd_inner_microstep: 1271.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809
[2024-06-10 23:00:58,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.41 | bwd_microstep: 1478.64 | bwd_inner_microstep: 1478.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 23:01:01,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.03 | optimizer_step: 6.59
[2024-06-10 23:01:01,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.70 | bwd_microstep: 2449.36 | bwd_inner_microstep: 1734.86 | bwd_allreduce_microstep: 714.46 | step_microstep: 37.58
[2024-06-10 23:01:01,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16218.18 | bwd: 44171.64 | bwd_inner: 43456.18 | bwd_allreduce: 714.73 | step: 39.10
{'loss': 1.1778, 'learning_rate': 6.1934815546044765e-06, 'epoch': 0.75}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-10 23:01:03,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 1252.98 | bwd_inner_microstep: 1252.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893
[2024-06-10 23:01:05,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.12 | bwd_microstep: 1680.71 | bwd_inner_microstep: 1680.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-10 23:01:07,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1300.95 | bwd_inner_microstep: 1300.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4138
[2024-06-10 23:01:09,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.38 | bwd_microstep: 1442.47 | bwd_inner_microstep: 1442.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3469
[2024-06-10 23:01:11,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.72 | bwd_microstep: 1329.87 | bwd_inner_microstep: 1329.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758
[2024-06-10 23:01:12,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1437.50 | bwd_inner_microstep: 1437.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 23:01:13,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.06 | bwd_microstep: 678.73 | bwd_inner_microstep: 678.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 23:01:15,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.55 | bwd_microstep: 1148.16 | bwd_inner_microstep: 1148.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 23:01:17,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.33 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 23:01:19,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.56 | bwd_microstep: 1624.56 | bwd_inner_microstep: 1624.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 23:01:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1282.71 | bwd_inner_microstep: 1282.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2054
[2024-06-10 23:01:22,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.16 | bwd_microstep: 864.09 | bwd_inner_microstep: 864.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1964
[2024-06-10 23:01:23,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.60 | bwd_microstep: 826.68 | bwd_inner_microstep: 826.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-10 23:01:26,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.23 | bwd_microstep: 1719.51 | bwd_inner_microstep: 1719.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 23:01:28,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1349.75 | bwd_inner_microstep: 1349.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 23:01:29,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3516
[2024-06-10 23:01:31,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1318.46 | bwd_inner_microstep: 1318.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-10 23:01:33,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1518.51 | bwd_inner_microstep: 1518.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3467
[2024-06-10 23:01:35,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.34 | bwd_microstep: 1455.45 | bwd_inner_microstep: 1455.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3706
[2024-06-10 23:01:37,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.09 | bwd_microstep: 1493.51 | bwd_inner_microstep: 1493.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3513
[2024-06-10 23:01:39,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.73 | bwd_microstep: 1505.03 | bwd_inner_microstep: 1505.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3614
[2024-06-10 23:01:41,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.28 | bwd_microstep: 1339.47 | bwd_inner_microstep: 1339.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-10 23:01:43,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.37 | bwd_microstep: 1493.70 | bwd_inner_microstep: 1493.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 23:01:45,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.18 | bwd_microstep: 1429.16 | bwd_inner_microstep: 1429.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 23:01:47,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.43 | bwd_microstep: 1535.14 | bwd_inner_microstep: 1535.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 23:01:49,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1416.28 | bwd_inner_microstep: 1416.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-10 23:01:52,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.12 | bwd_microstep: 1606.31 | bwd_inner_microstep: 1606.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-10 23:01:54,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 1409.33 | bwd_inner_microstep: 1409.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-10 23:01:56,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1612.48 | bwd_inner_microstep: 1612.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3453
[2024-06-10 23:01:57,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.32 | bwd_microstep: 1188.81 | bwd_inner_microstep: 1188.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3680
[2024-06-10 23:01:59,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1306.78 | bwd_inner_microstep: 1306.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-10 23:02:02,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.04 | optimizer_step: 6.61
[2024-06-10 23:02:02,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.55 | bwd_microstep: 1876.16 | bwd_inner_microstep: 1588.85 | bwd_allreduce_microstep: 287.27 | step_microstep: 37.49
[2024-06-10 23:02:02,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16392.42 | bwd: 44207.11 | bwd_inner: 43918.95 | bwd_allreduce: 287.50 | step: 38.93
{'loss': 1.2053, 'learning_rate': 6.1663501172088726e-06, 'epoch': 0.75}
 75%|███████▍  | 1292/1726 [22:20:37<7:24:35, 61.46s/it]
 75%|███████▍  | 1293/1726 [22:21:37<7:20:05, 60.98s/it]
 75%|███████▍  | 1294/1726 [22:22:37<7:16:59, 60.69s/it]
 75%|███████▌  | 1295/1726 [22:23:38<7:16:03, 60.70s/it]
 75%|███████▌  | 1296/1726 [22:24:38<7:15:32, 60.77s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 23:02:04,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.19 | bwd_microstep: 1341.82 | bwd_inner_microstep: 1341.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 23:02:05,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.22 | bwd_microstep: 1278.90 | bwd_inner_microstep: 1278.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3899
[2024-06-10 23:02:08,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.84 | bwd_microstep: 1684.23 | bwd_inner_microstep: 1684.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3854
[2024-06-10 23:02:10,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.76 | bwd_microstep: 1662.41 | bwd_inner_microstep: 1662.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2266
[2024-06-10 23:02:11,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.27 | bwd_microstep: 967.52 | bwd_inner_microstep: 967.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-10 23:02:13,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1390.79 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3939
[2024-06-10 23:02:15,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1395.60 | bwd_inner_microstep: 1395.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-10 23:02:17,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.06 | bwd_microstep: 1631.70 | bwd_inner_microstep: 1631.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1974
[2024-06-10 23:02:19,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.89 | bwd_microstep: 842.11 | bwd_inner_microstep: 842.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1862
[2024-06-10 23:02:20,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.04 | bwd_microstep: 739.45 | bwd_inner_microstep: 739.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2833
[2024-06-10 23:02:21,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.70 | bwd_microstep: 1062.70 | bwd_inner_microstep: 1062.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 23:02:23,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.78 | bwd_microstep: 1519.74 | bwd_inner_microstep: 1519.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 23:02:25,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.19 | bwd_microstep: 1420.55 | bwd_inner_microstep: 1420.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2151
[2024-06-10 23:02:26,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.53 | bwd_microstep: 945.58 | bwd_inner_microstep: 945.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3684
[2024-06-10 23:02:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.43 | bwd_microstep: 1721.04 | bwd_inner_microstep: 1721.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3524
[2024-06-10 23:02:31,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.58 | bwd_microstep: 1405.75 | bwd_inner_microstep: 1405.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3669
[2024-06-10 23:02:33,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.65 | bwd_microstep: 1472.91 | bwd_inner_microstep: 1472.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3532
[2024-06-10 23:02:35,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.28 | bwd_microstep: 1442.88 | bwd_inner_microstep: 1442.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 23:02:37,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.23 | bwd_microstep: 1527.11 | bwd_inner_microstep: 1527.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-10 23:02:39,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.89 | bwd_microstep: 1617.04 | bwd_inner_microstep: 1617.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-10 23:02:41,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1507.87 | bwd_inner_microstep: 1507.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2189
[2024-06-10 23:02:42,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.87 | bwd_microstep: 894.72 | bwd_inner_microstep: 894.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3564
[2024-06-10 23:02:44,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.45 | bwd_microstep: 1430.45 | bwd_inner_microstep: 1430.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 23:02:45,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.81 | bwd_microstep: 696.18 | bwd_inner_microstep: 696.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-10 23:02:46,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.17 | bwd_microstep: 697.33 | bwd_inner_microstep: 697.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2431
[2024-06-10 23:02:48,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.70 | bwd_microstep: 990.77 | bwd_inner_microstep: 990.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-10 23:02:49,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.71 | bwd_microstep: 803.95 | bwd_inner_microstep: 803.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 23:02:51,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1499.03 | bwd_inner_microstep: 1499.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2059
[2024-06-10 23:02:52,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.66 | bwd_microstep: 875.03 | bwd_inner_microstep: 875.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3579
[2024-06-10 23:02:54,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.27 | bwd_microstep: 1643.96 | bwd_inner_microstep: 1643.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 23:02:56,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.39 | bwd_microstep: 1500.31 | bwd_inner_microstep: 1500.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3763
[2024-06-10 23:03:02,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.09 | optimizer_step: 6.61
[2024-06-10 23:03:02,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.82 | bwd_microstep: 4920.95 | bwd_inner_microstep: 1937.03 | bwd_allreduce_microstep: 2983.87 | step_microstep: 37.71
[2024-06-10 23:03:02,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15431.72 | bwd: 44530.42 | bwd_inner: 41545.65 | bwd_allreduce: 2984.10 | step: 39.15
{'loss': 1.1307, 'learning_rate': 6.1392674018562525e-06, 'epoch': 0.75}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 23:03:04,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.77 | bwd_microstep: 1372.98 | bwd_inner_microstep: 1372.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416
[2024-06-10 23:03:06,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.57 | bwd_microstep: 1149.92 | bwd_inner_microstep: 1149.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2268
[2024-06-10 23:03:07,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.40 | bwd_microstep: 967.76 | bwd_inner_microstep: 967.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-10 23:03:09,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.30 | bwd_microstep: 1656.05 | bwd_inner_microstep: 1656.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3791
[2024-06-10 23:03:11,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.44 | bwd_microstep: 1445.20 | bwd_inner_microstep: 1445.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 23:03:13,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1245.34 | bwd_inner_microstep: 1245.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432
[2024-06-10 23:03:14,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.68 | bwd_microstep: 1156.28 | bwd_inner_microstep: 1156.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 23:03:16,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.44 | bwd_microstep: 1278.60 | bwd_inner_microstep: 1278.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 23:03:18,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1389.25 | bwd_inner_microstep: 1389.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-10 23:03:20,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.87 | bwd_microstep: 1187.42 | bwd_inner_microstep: 1187.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 23:03:22,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.43 | bwd_microstep: 1483.58 | bwd_inner_microstep: 1483.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2166
[2024-06-10 23:03:23,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.29 | bwd_microstep: 820.36 | bwd_inner_microstep: 820.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-10 23:03:25,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.52 | bwd_microstep: 1490.63 | bwd_inner_microstep: 1490.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3622
[2024-06-10 23:03:27,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.60 | bwd_microstep: 1710.52 | bwd_inner_microstep: 1710.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2016
[2024-06-10 23:03:29,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.66 | bwd_microstep: 897.36 | bwd_inner_microstep: 897.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1969
[2024-06-10 23:03:30,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.17 | bwd_microstep: 851.75 | bwd_inner_microstep: 851.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3668
[2024-06-10 23:03:32,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.60 | bwd_microstep: 1716.89 | bwd_inner_microstep: 1716.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113
[2024-06-10 23:03:33,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.83 | bwd_microstep: 827.54 | bwd_inner_microstep: 827.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108
[2024-06-10 23:03:34,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.54 | bwd_microstep: 826.99 | bwd_inner_microstep: 826.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3506
[2024-06-10 23:03:36,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.91 | bwd_microstep: 1192.80 | bwd_inner_microstep: 1192.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 23:03:38,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.09 | bwd_microstep: 1360.68 | bwd_inner_microstep: 1360.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009
[2024-06-10 23:03:39,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.50 | bwd_microstep: 803.77 | bwd_inner_microstep: 803.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-10 23:03:41,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.50 | bwd_microstep: 1497.90 | bwd_inner_microstep: 1497.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-10 23:03:43,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.32 | bwd_microstep: 1312.35 | bwd_inner_microstep: 1312.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-10 23:03:45,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.70 | bwd_microstep: 1656.28 | bwd_inner_microstep: 1656.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192
[2024-06-10 23:03:47,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.66 | bwd_microstep: 957.99 | bwd_inner_microstep: 957.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-10 23:03:49,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.72 | bwd_microstep: 1357.24 | bwd_inner_microstep: 1357.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3574
[2024-06-10 23:03:51,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.06 | bwd_microstep: 1527.60 | bwd_inner_microstep: 1527.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3703
[2024-06-10 23:03:52,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1335.30 | bwd_inner_microstep: 1335.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 23:03:54,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.74 | bwd_microstep: 1281.31 | bwd_inner_microstep: 1281.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3393
[2024-06-10 23:03:56,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.63 | bwd_microstep: 1400.60 | bwd_inner_microstep: 1400.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2300
[2024-06-10 23:04:02,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-10 23:04:02,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.78 | bwd_microstep: 5743.88 | bwd_inner_microstep: 1217.15 | bwd_allreduce_microstep: 4526.68 | step_microstep: 37.92
[2024-06-10 23:04:02,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15099.29 | bwd: 44902.13 | bwd_inner: 40374.53 | bwd_allreduce: 4526.91 | step: 39.34
{'loss': 1.1437, 'learning_rate': 6.112233503931775e-06, 'epoch': 0.75}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-10 23:04:04,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.41 | bwd_microstep: 1268.08 | bwd_inner_microstep: 1268.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-10 23:04:06,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1479.45 | bwd_inner_microstep: 1479.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862
[2024-06-10 23:04:08,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1459.98 | bwd_inner_microstep: 1459.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 23:04:10,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.56 | bwd_microstep: 1544.89 | bwd_inner_microstep: 1544.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4071
[2024-06-10 23:04:13,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.31 | bwd_microstep: 1624.83 | bwd_inner_microstep: 1624.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-10 23:04:15,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1437.50 | bwd_inner_microstep: 1437.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422
[2024-06-10 23:04:16,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.53 | bwd_microstep: 1376.34 | bwd_inner_microstep: 1376.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 23:04:18,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1385.18 | bwd_inner_microstep: 1385.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3749
[2024-06-10 23:04:20,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1339.27 | bwd_inner_microstep: 1339.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1977
[2024-06-10 23:04:21,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.62 | bwd_microstep: 704.87 | bwd_inner_microstep: 704.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 23:04:23,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.35 | bwd_microstep: 1478.83 | bwd_inner_microstep: 1478.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3442
[2024-06-10 23:04:25,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.42 | bwd_microstep: 1311.38 | bwd_inner_microstep: 1311.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2665
[2024-06-10 23:04:26,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.57 | bwd_microstep: 1020.08 | bwd_inner_microstep: 1020.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-10 23:04:28,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.91 | bwd_microstep: 1481.87 | bwd_inner_microstep: 1481.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2921
[2024-06-10 23:04:30,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.08 | bwd_microstep: 1158.66 | bwd_inner_microstep: 1158.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-10 23:04:31,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.18 | bwd_microstep: 799.76 | bwd_inner_microstep: 799.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-10 23:04:33,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.58 | bwd_microstep: 1336.72 | bwd_inner_microstep: 1336.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-10 23:04:35,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.78 | bwd_microstep: 1616.23 | bwd_inner_microstep: 1616.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3618
[2024-06-10 23:04:37,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.83 | bwd_microstep: 1537.41 | bwd_inner_microstep: 1537.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-10 23:04:40,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.17 | bwd_microstep: 1605.72 | bwd_inner_microstep: 1605.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3549
[2024-06-10 23:04:41,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.71 | bwd_microstep: 1341.63 | bwd_inner_microstep: 1341.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037
[2024-06-10 23:04:43,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.19 | bwd_microstep: 808.52 | bwd_inner_microstep: 808.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-10 23:04:45,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1510.77 | bwd_inner_microstep: 1510.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 23:04:47,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.31 | bwd_microstep: 1355.35 | bwd_inner_microstep: 1355.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-10 23:04:48,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1290.19 | bwd_inner_microstep: 1290.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-10 23:04:49,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.01 | bwd_microstep: 804.62 | bwd_inner_microstep: 804.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740
[2024-06-10 23:04:52,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.61 | bwd_microstep: 1541.14 | bwd_inner_microstep: 1541.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806
[2024-06-10 23:04:54,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.84 | bwd_microstep: 1484.71 | bwd_inner_microstep: 1484.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3593
[2024-06-10 23:04:56,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.18 | bwd_microstep: 1644.68 | bwd_inner_microstep: 1644.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2278
[2024-06-10 23:04:57,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.38 | bwd_microstep: 1035.15 | bwd_inner_microstep: 1035.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3860
[2024-06-10 23:05:00,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.96 | bwd_microstep: 1672.43 | bwd_inner_microstep: 1672.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-10 23:05:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.18 | optimizer_step: 6.57
[2024-06-10 23:05:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.06 | bwd_microstep: 2433.18 | bwd_inner_microstep: 1842.38 | bwd_allreduce_microstep: 590.75 | step_microstep: 37.94
[2024-06-10 23:05:03,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16071.62 | bwd: 43889.42 | bwd_inner: 43297.77 | bwd_allreduce: 590.98 | step: 39.42
{'loss': 1.1749, 'learning_rate': 6.085248518648708e-06, 'epoch': 0.75}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-10 23:05:05,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1477.08 | bwd_inner_microstep: 1477.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 23:05:06,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.63 | bwd_microstep: 1248.28 | bwd_inner_microstep: 1248.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 23:05:08,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.43 | bwd_microstep: 1152.49 | bwd_inner_microstep: 1152.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3872
[2024-06-10 23:05:10,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.58 | bwd_microstep: 1664.20 | bwd_inner_microstep: 1664.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-10 23:05:12,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.11 | bwd_microstep: 1314.33 | bwd_inner_microstep: 1314.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3786
[2024-06-10 23:05:14,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.95 | bwd_microstep: 1445.40 | bwd_inner_microstep: 1445.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 23:05:16,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.18 | bwd_microstep: 1393.95 | bwd_inner_microstep: 1393.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 23:05:17,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.66 | bwd_microstep: 794.18 | bwd_inner_microstep: 794.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 23:05:19,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1247.58 | bwd_inner_microstep: 1247.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3701
[2024-06-10 23:05:21,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.40 | bwd_microstep: 1423.78 | bwd_inner_microstep: 1423.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 23:05:23,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1255.43 | bwd_inner_microstep: 1255.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-10 23:05:25,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.64 | bwd_microstep: 1392.84 | bwd_inner_microstep: 1392.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583
[2024-06-10 23:05:27,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1463.58 | bwd_inner_microstep: 1463.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 23:05:28,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1376.03 | bwd_inner_microstep: 1376.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3715
[2024-06-10 23:05:31,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.60 | bwd_microstep: 1726.86 | bwd_inner_microstep: 1726.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-10 23:05:33,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.38 | bwd_microstep: 1485.55 | bwd_inner_microstep: 1485.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3643
[2024-06-10 23:05:35,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.30 | bwd_microstep: 1602.88 | bwd_inner_microstep: 1602.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2069
[2024-06-10 23:05:36,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.36 | bwd_microstep: 943.55 | bwd_inner_microstep: 943.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025
[2024-06-10 23:05:38,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.70 | bwd_microstep: 904.91 | bwd_inner_microstep: 904.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514
[2024-06-10 23:05:39,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.13 | bwd_microstep: 1191.90 | bwd_inner_microstep: 1191.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-10 23:05:42,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.59 | bwd_microstep: 1657.39 | bwd_inner_microstep: 1657.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 23:05:43,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.29 | bwd_microstep: 1373.27 | bwd_inner_microstep: 1373.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 23:05:45,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.99 | bwd_microstep: 1483.12 | bwd_inner_microstep: 1483.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-10 23:05:47,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.21 | bwd_microstep: 1404.81 | bwd_inner_microstep: 1404.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-10 23:05:49,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.15 | bwd_microstep: 1472.99 | bwd_inner_microstep: 1472.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1884
[2024-06-10 23:05:51,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.59 | bwd_microstep: 791.44 | bwd_inner_microstep: 791.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3557
[2024-06-10 23:05:53,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.82 | bwd_microstep: 1453.11 | bwd_inner_microstep: 1453.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 23:05:55,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.69 | bwd_microstep: 1508.88 | bwd_inner_microstep: 1508.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-10 23:05:57,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.93 | bwd_microstep: 1492.40 | bwd_inner_microstep: 1492.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 23:05:59,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.67 | bwd_microstep: 1472.68 | bwd_inner_microstep: 1472.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3785
[2024-06-10 23:06:01,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.59 | bwd_microstep: 1677.60 | bwd_inner_microstep: 1677.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-10 23:06:06,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-10 23:06:06,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.05 | bwd_microstep: 4295.62 | bwd_inner_microstep: 927.34 | bwd_allreduce_microstep: 3368.24 | step_microstep: 37.76
[2024-06-10 23:06:06,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16105.83 | bwd: 46588.12 | bwd_inner: 43218.95 | bwd_allreduce: 3368.47 | step: 39.28
{'loss': 1.1138, 'learning_rate': 6.058312541048021e-06, 'epoch': 0.75}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 23:06:08,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.56 | bwd_microstep: 1364.83 | bwd_inner_microstep: 1364.64 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 23:06:09,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.64 | bwd_microstep: 1343.13 | bwd_inner_microstep: 1343.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911
[2024-06-10 23:06:12,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.34 | bwd_microstep: 1685.53 | bwd_inner_microstep: 1685.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-10 23:06:14,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.32 | bwd_microstep: 1279.85 | bwd_inner_microstep: 1279.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3763
[2024-06-10 23:06:15,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1339.52 | bwd_inner_microstep: 1339.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-10 23:06:18,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.04 | bwd_microstep: 1645.66 | bwd_inner_microstep: 1645.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 23:06:19,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.19 | bwd_microstep: 1283.72 | bwd_inner_microstep: 1283.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 23:06:21,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1354.45 | bwd_inner_microstep: 1354.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 23:06:23,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1391.83 | bwd_inner_microstep: 1391.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-10 23:06:25,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.80 | bwd_inner_microstep: 1385.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-10 23:06:27,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.38 | bwd_microstep: 1435.18 | bwd_inner_microstep: 1435.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 23:06:29,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1345.11 | bwd_inner_microstep: 1345.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1936
[2024-06-10 23:06:30,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.86 | bwd_microstep: 823.28 | bwd_inner_microstep: 823.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3435
[2024-06-10 23:06:32,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.89 | bwd_microstep: 1313.10 | bwd_inner_microstep: 1313.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-10 23:06:34,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1255.05 | bwd_inner_microstep: 1255.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665
[2024-06-10 23:06:36,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1428.27 | bwd_inner_microstep: 1428.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3656
[2024-06-10 23:06:38,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.82 | bwd_microstep: 1444.17 | bwd_inner_microstep: 1444.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3519
[2024-06-10 23:06:39,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.82 | bwd_microstep: 1224.17 | bwd_inner_microstep: 1224.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3643
[2024-06-10 23:06:41,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.79 | bwd_microstep: 1445.77 | bwd_inner_microstep: 1445.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667
[2024-06-10 23:06:44,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.14 | bwd_microstep: 1625.18 | bwd_inner_microstep: 1625.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-10 23:06:45,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.81 | bwd_microstep: 1416.49 | bwd_inner_microstep: 1416.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-10 23:06:47,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.56 | bwd_microstep: 1402.15 | bwd_inner_microstep: 1402.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829
[2024-06-10 23:06:49,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.43 | bwd_microstep: 1359.33 | bwd_inner_microstep: 1359.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1934
[2024-06-10 23:06:50,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.09 | bwd_microstep: 745.23 | bwd_inner_microstep: 745.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 23:06:52,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.66 | bwd_microstep: 1254.34 | bwd_inner_microstep: 1254.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3470
[2024-06-10 23:06:54,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1332.34 | bwd_inner_microstep: 1332.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 23:06:56,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1407.21 | bwd_inner_microstep: 1407.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3701
[2024-06-10 23:06:58,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.17 | bwd_microstep: 1475.30 | bwd_inner_microstep: 1475.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-10 23:07:00,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.31 | bwd_microstep: 1504.26 | bwd_inner_microstep: 1504.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3559
[2024-06-10 23:07:02,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.63 | bwd_microstep: 1362.66 | bwd_inner_microstep: 1362.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377
[2024-06-10 23:07:04,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.64 | bwd_microstep: 1338.24 | bwd_inner_microstep: 1338.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2268
[2024-06-10 23:07:05,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.01 | optimizer_step: 6.66
[2024-06-10 23:07:05,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.50 | bwd_microstep: 1270.43 | bwd_inner_microstep: 1048.32 | bwd_allreduce_microstep: 222.07 | step_microstep: 37.41
[2024-06-10 23:07:05,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16116.83 | bwd: 43281.62 | bwd_inner: 43058.51 | bwd_allreduce: 222.37 | step: 38.93
{'loss': 1.1942, 'learning_rate': 6.03142566599809e-06, 'epoch': 0.75}
 75%|███████▌  | 1297/1726 [22:25:39<7:13:29, 60.63s/it]
 75%|███████▌  | 1298/1726 [22:26:39<7:11:49, 60.54s/it]
 75%|███████▌  | 1299/1726 [22:27:39<7:10:18, 60.46s/it]
 75%|███████▌  | 1300/1726 [22:28:42<7:14:45, 61.23s/it]
 75%|███████▌  | 1301/1726 [22:29:42<7:10:32, 60.78s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-10 23:07:07,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.39 | bwd_microstep: 1145.06 | bwd_inner_microstep: 1145.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 23:07:09,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1245.19 | bwd_inner_microstep: 1245.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 23:07:11,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.73 | bwd_microstep: 1475.20 | bwd_inner_microstep: 1475.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-10 23:07:12,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.22 | bwd_microstep: 1181.38 | bwd_inner_microstep: 1181.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-10 23:07:14,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.43 | bwd_microstep: 1482.05 | bwd_inner_microstep: 1482.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-10 23:07:16,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1297.21 | bwd_inner_microstep: 1297.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 23:07:18,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1383.26 | bwd_inner_microstep: 1383.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-10 23:07:20,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.11 | bwd_microstep: 1648.63 | bwd_inner_microstep: 1648.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2631
[2024-06-10 23:07:22,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.75 | bwd_microstep: 1017.09 | bwd_inner_microstep: 1017.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3697
[2024-06-10 23:07:24,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.40 | bwd_microstep: 1359.22 | bwd_inner_microstep: 1359.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-10 23:07:25,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.28 | bwd_microstep: 794.97 | bwd_inner_microstep: 794.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010
[2024-06-10 23:07:26,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.21 | bwd_microstep: 804.36 | bwd_inner_microstep: 804.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3488
[2024-06-10 23:07:28,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.40 | bwd_microstep: 1530.70 | bwd_inner_microstep: 1530.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1965
[2024-06-10 23:07:29,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 784.74 | bwd_inner_microstep: 784.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-10 23:07:31,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.77 | bwd_microstep: 1286.90 | bwd_inner_microstep: 1286.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-10 23:07:33,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.81 | bwd_microstep: 1293.60 | bwd_inner_microstep: 1293.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-10 23:07:35,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1408.80 | bwd_inner_microstep: 1408.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 23:07:36,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.42 | bwd_microstep: 797.11 | bwd_inner_microstep: 797.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 23:07:38,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1559.31 | bwd_inner_microstep: 1559.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 23:07:39,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.32 | bwd_microstep: 975.18 | bwd_inner_microstep: 975.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-10 23:07:41,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.57 | bwd_microstep: 1397.04 | bwd_inner_microstep: 1397.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-10 23:07:43,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.33 | bwd_microstep: 1398.57 | bwd_inner_microstep: 1398.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613
[2024-06-10 23:07:45,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.70 | bwd_microstep: 1608.59 | bwd_inner_microstep: 1608.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-10 23:07:46,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.53 | bwd_microstep: 803.24 | bwd_inner_microstep: 803.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3555
[2024-06-10 23:07:48,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.64 | bwd_microstep: 1233.62 | bwd_inner_microstep: 1233.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3602
[2024-06-10 23:07:50,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.00 | bwd_microstep: 1369.03 | bwd_inner_microstep: 1369.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649
[2024-06-10 23:07:52,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.65 | bwd_microstep: 1420.41 | bwd_inner_microstep: 1420.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3462
[2024-06-10 23:07:54,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1429.19 | bwd_inner_microstep: 1429.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 23:07:56,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.46 | bwd_microstep: 1253.85 | bwd_inner_microstep: 1253.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3822
[2024-06-10 23:07:58,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.76 | bwd_microstep: 1754.59 | bwd_inner_microstep: 1754.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806
[2024-06-10 23:08:01,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.73 | bwd_microstep: 1748.35 | bwd_inner_microstep: 1748.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801
[2024-06-10 23:08:06,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.04 | optimizer_step: 6.62
[2024-06-10 23:08:06,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.65 | bwd_microstep: 4576.42 | bwd_inner_microstep: 1991.61 | bwd_allreduce_microstep: 2584.76 | step_microstep: 37.67
[2024-06-10 23:08:06,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15622.26 | bwd: 44462.87 | bwd_inner: 41877.21 | bwd_allreduce: 2584.99 | step: 39.16
{'loss': 1.2276, 'learning_rate': 6.004587988194342e-06, 'epoch': 0.75}
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3566
[2024-06-10 23:08:08,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.93 | bwd_microstep: 1434.73 | bwd_inner_microstep: 1434.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 23:08:10,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.69 | bwd_microstep: 1389.06 | bwd_inner_microstep: 1389.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-10 23:08:12,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.26 | bwd_microstep: 1557.13 | bwd_inner_microstep: 1557.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2253
[2024-06-10 23:08:13,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.52 | bwd_microstep: 968.16 | bwd_inner_microstep: 968.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 23:08:15,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.01 | bwd_microstep: 1280.33 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4102
[2024-06-10 23:08:17,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.50 | bwd_microstep: 1732.58 | bwd_inner_microstep: 1732.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 23:08:19,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.54 | bwd_microstep: 1346.66 | bwd_inner_microstep: 1346.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 23:08:21,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.82 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 23:08:23,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.33 | bwd_microstep: 1389.16 | bwd_inner_microstep: 1389.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3582
[2024-06-10 23:08:25,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.68 | bwd_microstep: 1531.24 | bwd_inner_microstep: 1531.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-10 23:08:27,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.66 | bwd_microstep: 1485.96 | bwd_inner_microstep: 1485.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3648
[2024-06-10 23:08:30,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.55 | bwd_microstep: 1714.03 | bwd_inner_microstep: 1714.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3750
[2024-06-10 23:08:32,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.36 | bwd_microstep: 1735.37 | bwd_inner_microstep: 1735.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-10 23:08:34,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.15 | bwd_microstep: 1584.06 | bwd_inner_microstep: 1584.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-10 23:08:36,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-10 23:08:38,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.51 | bwd_microstep: 1453.73 | bwd_inner_microstep: 1453.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-10 23:08:40,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1448.89 | bwd_inner_microstep: 1448.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2084
[2024-06-10 23:08:41,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.11 | bwd_microstep: 948.06 | bwd_inner_microstep: 948.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-10 23:08:43,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.81 | bwd_microstep: 1426.02 | bwd_inner_microstep: 1425.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-10 23:08:45,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.65 | bwd_microstep: 1301.71 | bwd_inner_microstep: 1301.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2000
[2024-06-10 23:08:46,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.31 | bwd_microstep: 706.83 | bwd_inner_microstep: 706.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-10 23:08:48,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1293.46 | bwd_inner_microstep: 1293.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 23:08:50,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1509.44 | bwd_inner_microstep: 1509.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 23:08:52,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1450.62 | bwd_inner_microstep: 1450.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 23:08:54,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1490.86 | bwd_inner_microstep: 1490.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2273
[2024-06-10 23:08:55,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.23 | bwd_microstep: 846.35 | bwd_inner_microstep: 846.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-10 23:08:57,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.36 | bwd_microstep: 1547.44 | bwd_inner_microstep: 1547.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775
[2024-06-10 23:08:59,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.96 | bwd_microstep: 1451.54 | bwd_inner_microstep: 1451.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-10 23:09:01,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.85 | bwd_microstep: 1280.01 | bwd_inner_microstep: 1279.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-10 23:09:03,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.94 | bwd_microstep: 1454.94 | bwd_inner_microstep: 1454.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2279
[2024-06-10 23:09:04,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.97 | bwd_microstep: 812.89 | bwd_inner_microstep: 812.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2281
[2024-06-10 23:09:07,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-10 23:09:07,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.16 | bwd_microstep: 2065.36 | bwd_inner_microstep: 1213.24 | bwd_allreduce_microstep: 852.07 | step_microstep: 37.72
[2024-06-10 23:09:07,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16250.01 | bwd: 44461.81 | bwd_inner: 43608.83 | bwd_allreduce: 852.30 | step: 39.16
{'loss': 1.2029, 'learning_rate': 5.977799602158949e-06, 'epoch': 0.75}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 23:09:09,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.34 | bwd_microstep: 1336.01 | bwd_inner_microstep: 1335.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 23:09:11,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1377.32 | bwd_inner_microstep: 1377.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793
[2024-06-10 23:09:13,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.41 | bwd_microstep: 1644.59 | bwd_inner_microstep: 1644.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-10 23:09:15,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1495.07 | bwd_inner_microstep: 1495.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3756
[2024-06-10 23:09:17,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.07 | bwd_microstep: 1340.80 | bwd_inner_microstep: 1340.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-10 23:09:19,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.23 | bwd_microstep: 1532.31 | bwd_inner_microstep: 1532.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-10 23:09:21,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.95 | bwd_microstep: 1434.56 | bwd_inner_microstep: 1434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-10 23:09:23,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1409.70 | bwd_inner_microstep: 1409.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 23:09:25,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1481.35 | bwd_inner_microstep: 1481.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3406
[2024-06-10 23:09:27,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.03 | bwd_microstep: 1198.08 | bwd_inner_microstep: 1198.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 23:09:29,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.70 | bwd_microstep: 1602.49 | bwd_inner_microstep: 1602.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3545
[2024-06-10 23:09:31,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1447.42 | bwd_inner_microstep: 1447.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 23:09:33,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.15 | bwd_microstep: 1476.93 | bwd_inner_microstep: 1476.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3661
[2024-06-10 23:09:35,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.38 | bwd_microstep: 1818.95 | bwd_inner_microstep: 1818.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481
[2024-06-10 23:09:37,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.28 | bwd_microstep: 1185.23 | bwd_inner_microstep: 1185.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2096
[2024-06-10 23:09:38,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.27 | bwd_microstep: 819.93 | bwd_inner_microstep: 819.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 23:09:40,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.35 | bwd_microstep: 1349.39 | bwd_inner_microstep: 1349.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 23:09:42,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.33 | bwd_microstep: 1383.52 | bwd_inner_microstep: 1383.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 23:09:44,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.98 | bwd_microstep: 1656.22 | bwd_inner_microstep: 1656.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 23:09:46,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1395.56 | bwd_inner_microstep: 1395.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-10 23:09:48,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.51 | bwd_microstep: 1659.34 | bwd_inner_microstep: 1659.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828
[2024-06-10 23:09:50,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.27 | bwd_microstep: 1356.39 | bwd_inner_microstep: 1356.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3542
[2024-06-10 23:09:52,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.88 | bwd_microstep: 1541.75 | bwd_inner_microstep: 1541.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 662
[2024-06-10 23:09:53,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.03 | bwd_microstep: 277.55 | bwd_inner_microstep: 277.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2512
[2024-06-10 23:09:54,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.02 | bwd_microstep: 960.48 | bwd_inner_microstep: 960.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-10 23:09:56,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.65 | bwd_microstep: 1509.18 | bwd_inner_microstep: 1509.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-10 23:09:58,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.33 | bwd_microstep: 1281.81 | bwd_inner_microstep: 1281.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 973
[2024-06-10 23:09:58,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 150.52 | bwd_microstep: 386.22 | bwd_inner_microstep: 386.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 23:10:01,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.07 | bwd_microstep: 1644.83 | bwd_inner_microstep: 1644.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 23:10:03,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.60 | bwd_microstep: 1446.09 | bwd_inner_microstep: 1446.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3458
[2024-06-10 23:10:05,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.90 | bwd_microstep: 1357.11 | bwd_inner_microstep: 1357.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3436
[2024-06-10 23:10:09,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-10 23:10:09,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.41 | bwd_microstep: 3660.43 | bwd_inner_microstep: 1716.13 | bwd_allreduce_microstep: 1944.25 | step_microstep: 38.01
[2024-06-10 23:10:09,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16175.49 | bwd: 45466.62 | bwd_inner: 43521.47 | bwd_allreduce: 1944.49 | step: 39.49
{'loss': 1.1916, 'learning_rate': 5.951060602240464e-06, 'epoch': 0.76}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-10 23:10:11,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.22 | bwd_microstep: 1332.96 | bwd_inner_microstep: 1332.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1898
[2024-06-10 23:10:12,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.36 | bwd_microstep: 710.66 | bwd_inner_microstep: 710.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3846
[2024-06-10 23:10:14,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.82 | bwd_microstep: 1660.93 | bwd_inner_microstep: 1660.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2311
[2024-06-10 23:10:15,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.92 | bwd_microstep: 978.73 | bwd_inner_microstep: 978.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484
[2024-06-10 23:10:17,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.12 | bwd_microstep: 1342.78 | bwd_inner_microstep: 1342.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-10 23:10:19,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1345.84 | bwd_inner_microstep: 1345.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-10 23:10:21,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.58 | bwd_microstep: 1478.66 | bwd_inner_microstep: 1478.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2654
[2024-06-10 23:10:22,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.76 | bwd_microstep: 1021.47 | bwd_inner_microstep: 1021.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-10 23:10:25,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.44 | bwd_microstep: 1482.21 | bwd_inner_microstep: 1482.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-10 23:10:26,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.83 | bwd_microstep: 1289.57 | bwd_inner_microstep: 1289.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-10 23:10:28,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.85 | bwd_microstep: 1426.37 | bwd_inner_microstep: 1426.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 23:10:30,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1488.48 | bwd_inner_microstep: 1488.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4039
[2024-06-10 23:10:33,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.28 | bwd_microstep: 1623.43 | bwd_inner_microstep: 1623.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3492
[2024-06-10 23:10:34,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.66 | bwd_microstep: 1344.62 | bwd_inner_microstep: 1344.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520
[2024-06-10 23:10:36,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.33 | bwd_microstep: 1447.56 | bwd_inner_microstep: 1447.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3679
[2024-06-10 23:10:39,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.48 | bwd_microstep: 1590.92 | bwd_inner_microstep: 1590.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2135
[2024-06-10 23:10:40,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.28 | bwd_microstep: 834.17 | bwd_inner_microstep: 834.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 23:10:41,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.97 | bwd_microstep: 1184.92 | bwd_inner_microstep: 1184.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2606
[2024-06-10 23:10:43,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.36 | bwd_microstep: 997.68 | bwd_inner_microstep: 997.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 23:10:45,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1279.16 | bwd_inner_microstep: 1279.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291
[2024-06-10 23:10:46,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.11 | bwd_microstep: 880.03 | bwd_inner_microstep: 880.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 23:10:48,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.92 | bwd_microstep: 1400.29 | bwd_inner_microstep: 1400.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915
[2024-06-10 23:10:49,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.73 | bwd_microstep: 686.77 | bwd_inner_microstep: 686.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3708
[2024-06-10 23:10:51,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.22 | bwd_microstep: 1557.49 | bwd_inner_microstep: 1557.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722
[2024-06-10 23:10:53,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.75 | bwd_microstep: 1341.21 | bwd_inner_microstep: 1341.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2958
[2024-06-10 23:10:54,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.48 | bwd_microstep: 1136.10 | bwd_inner_microstep: 1136.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381
[2024-06-10 23:10:56,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.84 | bwd_microstep: 1242.35 | bwd_inner_microstep: 1242.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3746
[2024-06-10 23:10:58,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.05 | bwd_microstep: 1342.58 | bwd_inner_microstep: 1342.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3726
[2024-06-10 23:11:00,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.81 | bwd_microstep: 1730.45 | bwd_inner_microstep: 1730.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491
[2024-06-10 23:11:02,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.38 | bwd_microstep: 1319.11 | bwd_inner_microstep: 1319.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-10 23:11:04,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.38 | bwd_microstep: 1529.10 | bwd_inner_microstep: 1529.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3806
[2024-06-10 23:11:09,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.07 | optimizer_step: 6.61
[2024-06-10 23:11:09,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 4480.21 | bwd_inner_microstep: 1576.70 | bwd_allreduce_microstep: 2903.46 | step_microstep: 37.74
[2024-06-10 23:11:09,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15524.54 | bwd: 44506.82 | bwd_inner: 41602.47 | bwd_allreduce: 2903.69 | step: 39.25
{'loss': 1.1704, 'learning_rate': 5.924371082613496e-06, 'epoch': 0.76}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3472
[2024-06-10 23:11:11,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.06 | bwd_microstep: 1566.78 | bwd_inner_microstep: 1566.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4586
[2024-06-10 23:11:14,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 686.14 | bwd_microstep: 1852.94 | bwd_inner_microstep: 1852.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1963
[2024-06-10 23:11:15,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.11 | bwd_microstep: 699.51 | bwd_inner_microstep: 699.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-10 23:11:17,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.27 | bwd_microstep: 1475.93 | bwd_inner_microstep: 1475.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 23:11:19,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.31 | bwd_microstep: 1272.07 | bwd_inner_microstep: 1272.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 23:11:20,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.09 | bwd_microstep: 1147.07 | bwd_inner_microstep: 1147.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-10 23:11:22,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1246.83 | bwd_inner_microstep: 1246.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2204
[2024-06-10 23:11:23,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.34 | bwd_microstep: 861.32 | bwd_inner_microstep: 861.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 23:11:25,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.55 | bwd_microstep: 1390.14 | bwd_inner_microstep: 1390.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3744
[2024-06-10 23:11:27,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.17 | bwd_microstep: 1491.39 | bwd_inner_microstep: 1491.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-10 23:11:29,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1387.36 | bwd_inner_microstep: 1387.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 23:11:31,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.60 | bwd_microstep: 1340.34 | bwd_inner_microstep: 1340.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3533
[2024-06-10 23:11:33,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.32 | bwd_microstep: 1543.13 | bwd_inner_microstep: 1543.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651
[2024-06-10 23:11:35,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.62 | bwd_microstep: 1511.57 | bwd_inner_microstep: 1511.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-10 23:11:36,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.35 | bwd_microstep: 789.79 | bwd_inner_microstep: 789.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449
[2024-06-10 23:11:38,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.68 | bwd_microstep: 1377.05 | bwd_inner_microstep: 1377.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661
[2024-06-10 23:11:40,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.48 | bwd_microstep: 1426.09 | bwd_inner_microstep: 1426.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-10 23:11:42,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1295.99 | bwd_inner_microstep: 1295.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3857
[2024-06-10 23:11:44,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.58 | bwd_microstep: 1556.00 | bwd_inner_microstep: 1555.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-10 23:11:46,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1415.13 | bwd_inner_microstep: 1415.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-10 23:11:48,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.74 | bwd_microstep: 1421.89 | bwd_inner_microstep: 1421.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-10 23:11:50,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1373.67 | bwd_inner_microstep: 1373.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-10 23:11:52,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.01 | bwd_microstep: 1156.04 | bwd_inner_microstep: 1156.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3459
[2024-06-10 23:11:53,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.97 | bwd_microstep: 1402.34 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-10 23:11:55,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.86 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562
[2024-06-10 23:11:58,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.79 | bwd_microstep: 1599.08 | bwd_inner_microstep: 1599.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-10 23:11:59,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.61 | bwd_microstep: 878.22 | bwd_inner_microstep: 878.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3764
[2024-06-10 23:12:01,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1405.65 | bwd_inner_microstep: 1405.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-10 23:12:02,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1249.56 | bwd_inner_microstep: 1249.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2039
[2024-06-10 23:12:04,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.03 | bwd_microstep: 907.14 | bwd_inner_microstep: 907.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 23:12:06,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1402.15 | bwd_inner_microstep: 1402.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 23:12:11,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.30 | optimizer_step: 6.59
[2024-06-10 23:12:11,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 4560.70 | bwd_inner_microstep: 1630.65 | bwd_allreduce_microstep: 2929.99 | step_microstep: 38.20
[2024-06-10 23:12:11,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15874.68 | bwd: 45381.59 | bwd_inner: 42450.70 | bwd_allreduce: 2930.22 | step: 39.66
{'loss': 1.1938, 'learning_rate': 5.897731137278417e-06, 'epoch': 0.76}
 75%|███████▌  | 1302/1726 [22:30:43<7:08:44, 60.67s/it]
 75%|███████▌  | 1303/1726 [22:31:44<7:08:31, 60.78s/it]
 76%|███████▌  | 1304/1726 [22:32:46<7:10:01, 61.14s/it]
 76%|███████▌  | 1305/1726 [22:33:46<7:07:21, 60.91s/it]
 76%|███████▌  | 1306/1726 [22:34:48<7:07:45, 61.11s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468
[2024-06-10 23:12:13,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.33 | bwd_microstep: 1565.73 | bwd_inner_microstep: 1565.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3890
[2024-06-10 23:12:15,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.38 | bwd_microstep: 1678.63 | bwd_inner_microstep: 1678.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1905
[2024-06-10 23:12:16,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.87 | bwd_microstep: 749.01 | bwd_inner_microstep: 748.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-10 23:12:18,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.77 | bwd_microstep: 1484.18 | bwd_inner_microstep: 1484.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3755
[2024-06-10 23:12:20,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1369.91 | bwd_inner_microstep: 1369.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2284
[2024-06-10 23:12:21,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.78 | bwd_microstep: 873.40 | bwd_inner_microstep: 873.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1885
[2024-06-10 23:12:22,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.13 | bwd_microstep: 680.10 | bwd_inner_microstep: 680.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876
[2024-06-10 23:12:23,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.68 | bwd_microstep: 678.55 | bwd_inner_microstep: 678.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1361
[2024-06-10 23:12:24,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 221.37 | bwd_microstep: 580.78 | bwd_inner_microstep: 580.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 23:12:26,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.89 | bwd_microstep: 1339.48 | bwd_inner_microstep: 1339.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-10 23:12:28,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 1483.90 | bwd_inner_microstep: 1483.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3507
[2024-06-10 23:12:30,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.01 | bwd_microstep: 1516.54 | bwd_inner_microstep: 1516.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2660
[2024-06-10 23:12:32,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.31 | bwd_microstep: 1115.61 | bwd_inner_microstep: 1115.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 23:12:33,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.09 | bwd_microstep: 803.40 | bwd_inner_microstep: 803.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520
[2024-06-10 23:12:35,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1418.79 | bwd_inner_microstep: 1418.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.54
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2164
[2024-06-10 23:12:36,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.67 | bwd_microstep: 883.78 | bwd_inner_microstep: 883.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3706
[2024-06-10 23:12:38,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.83 | bwd_microstep: 1580.15 | bwd_inner_microstep: 1580.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 677
[2024-06-10 23:12:39,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.00 | bwd_microstep: 282.35 | bwd_inner_microstep: 282.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3622
[2024-06-10 23:12:41,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.72 | bwd_microstep: 1539.60 | bwd_inner_microstep: 1539.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3493
[2024-06-10 23:12:42,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.90 | bwd_microstep: 1218.44 | bwd_inner_microstep: 1218.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 23:12:44,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.21 | bwd_microstep: 1532.24 | bwd_inner_microstep: 1532.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 23:12:46,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.43 | bwd_microstep: 1279.81 | bwd_inner_microstep: 1279.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3735
[2024-06-10 23:12:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1369.16 | bwd_inner_microstep: 1369.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-10 23:12:50,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.89 | bwd_microstep: 1215.28 | bwd_inner_microstep: 1215.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-10 23:12:51,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.53 | bwd_microstep: 810.18 | bwd_inner_microstep: 810.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746
[2024-06-10 23:12:53,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.98 | bwd_microstep: 1442.13 | bwd_inner_microstep: 1442.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-10 23:12:55,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.43 | bwd_microstep: 1612.49 | bwd_inner_microstep: 1612.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3600
[2024-06-10 23:12:57,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.75 | bwd_microstep: 1672.81 | bwd_inner_microstep: 1672.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3568
[2024-06-10 23:12:59,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.75 | bwd_microstep: 1450.41 | bwd_inner_microstep: 1450.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-10 23:13:02,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.43 | bwd_microstep: 1644.04 | bwd_inner_microstep: 1644.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3444
[2024-06-10 23:13:04,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1399.48 | bwd_inner_microstep: 1399.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3578
[2024-06-10 23:13:12,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.23 | optimizer_step: 6.64
[2024-06-10 23:13:12,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.89 | bwd_microstep: 7976.34 | bwd_inner_microstep: 1960.22 | bwd_allreduce_microstep: 6016.05 | step_microstep: 38.87
[2024-06-10 23:13:12,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14936.75 | bwd: 46246.73 | bwd_inner: 40229.74 | bwd_allreduce: 6016.30 | step: 40.37
{'loss': 1.1932, 'learning_rate': 5.871140860060951e-06, 'epoch': 0.76}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-10 23:13:14,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1392.97 | bwd_inner_microstep: 1392.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1867
[2024-06-10 23:13:15,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.13 | bwd_microstep: 751.46 | bwd_inner_microstep: 751.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-10 23:13:17,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.74 | bwd_microstep: 1552.23 | bwd_inner_microstep: 1552.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-10 23:13:19,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.50 | bwd_microstep: 1277.29 | bwd_inner_microstep: 1277.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-10 23:13:21,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1247.14 | bwd_inner_microstep: 1247.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4186
[2024-06-10 23:13:23,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1452.88 | bwd_inner_microstep: 1452.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-10 23:13:25,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3492
[2024-06-10 23:13:27,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.64 | bwd_microstep: 1350.22 | bwd_inner_microstep: 1350.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495
[2024-06-10 23:13:29,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1412.59 | bwd_inner_microstep: 1412.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1914
[2024-06-10 23:13:30,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.65 | bwd_microstep: 875.24 | bwd_inner_microstep: 875.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-10 23:13:32,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.91 | bwd_microstep: 1334.30 | bwd_inner_microstep: 1334.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3665
[2024-06-10 23:13:34,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.20 | bwd_microstep: 1562.70 | bwd_inner_microstep: 1562.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2746
[2024-06-10 23:13:35,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.58 | bwd_microstep: 1135.61 | bwd_inner_microstep: 1135.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657
[2024-06-10 23:13:38,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.27 | bwd_microstep: 1609.34 | bwd_inner_microstep: 1609.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449
[2024-06-10 23:13:39,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.00 | bwd_microstep: 1184.91 | bwd_inner_microstep: 1184.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3644
[2024-06-10 23:13:41,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.21 | bwd_microstep: 1410.49 | bwd_inner_microstep: 1410.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-10 23:13:42,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.03 | bwd_microstep: 795.48 | bwd_inner_microstep: 795.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-10 23:13:44,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.15 | bwd_microstep: 1186.78 | bwd_inner_microstep: 1186.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 23:13:46,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.02 | bwd_microstep: 1296.33 | bwd_inner_microstep: 1296.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 23:13:48,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.11 | bwd_microstep: 1453.41 | bwd_inner_microstep: 1453.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-10 23:13:50,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.95 | bwd_microstep: 1309.24 | bwd_inner_microstep: 1309.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-10 23:13:52,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.12 | bwd_microstep: 1450.53 | bwd_inner_microstep: 1450.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-10 23:13:53,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1257.79 | bwd_inner_microstep: 1257.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2924
[2024-06-10 23:13:55,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.87 | bwd_microstep: 1126.20 | bwd_inner_microstep: 1126.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3580
[2024-06-10 23:13:57,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1529.35 | bwd_inner_microstep: 1529.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-10 23:13:59,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1508.27 | bwd_inner_microstep: 1508.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 23:14:01,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1289.09 | bwd_inner_microstep: 1289.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-10 23:14:03,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.61 | bwd_microstep: 1500.74 | bwd_inner_microstep: 1500.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 23:14:05,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1352.05 | bwd_inner_microstep: 1352.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761
[2024-06-10 23:14:07,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.46 | bwd_microstep: 1543.78 | bwd_inner_microstep: 1543.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817
[2024-06-10 23:14:09,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.89 | bwd_microstep: 1755.55 | bwd_inner_microstep: 1755.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-10 23:14:13,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-10 23:14:13,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.62 | bwd_microstep: 3558.86 | bwd_inner_microstep: 1645.80 | bwd_allreduce_microstep: 1913.01 | step_microstep: 37.69
[2024-06-10 23:14:13,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16008.00 | bwd: 44846.13 | bwd_inner: 42932.22 | bwd_allreduce: 1913.24 | step: 39.11
{'loss': 1.2024, 'learning_rate': 5.844600344611931e-06, 'epoch': 0.76}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952
[2024-06-10 23:14:15,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.54 | bwd_microstep: 887.63 | bwd_inner_microstep: 887.51 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 23:14:16,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.15 | bwd_microstep: 1241.66 | bwd_inner_microstep: 1241.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 23:14:18,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1472.66 | bwd_inner_microstep: 1472.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3882
[2024-06-10 23:14:21,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.74 | bwd_microstep: 1682.61 | bwd_inner_microstep: 1682.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-10 23:14:23,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.39 | bwd_microstep: 1275.18 | bwd_inner_microstep: 1275.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 23:14:24,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1246.88 | bwd_inner_microstep: 1246.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488
[2024-06-10 23:14:26,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.90 | bwd_microstep: 1411.25 | bwd_inner_microstep: 1411.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1946
[2024-06-10 23:14:27,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.76 | bwd_microstep: 728.76 | bwd_inner_microstep: 728.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-10 23:14:29,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.84 | bwd_microstep: 1625.32 | bwd_inner_microstep: 1625.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3696
[2024-06-10 23:14:32,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.45 | bwd_microstep: 1472.50 | bwd_inner_microstep: 1472.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-10 23:14:33,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1287.45 | bwd_inner_microstep: 1287.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3495
[2024-06-10 23:14:35,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.26 | bwd_microstep: 1490.74 | bwd_inner_microstep: 1490.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505
[2024-06-10 23:14:38,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.11 | bwd_microstep: 1581.37 | bwd_inner_microstep: 1581.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 23:14:40,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.57 | bwd_microstep: 1605.04 | bwd_inner_microstep: 1605.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 23:14:42,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1389.45 | bwd_inner_microstep: 1389.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3904
[2024-06-10 23:14:44,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.82 | bwd_microstep: 1787.04 | bwd_inner_microstep: 1787.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-10 23:14:46,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.35 | bwd_microstep: 1292.16 | bwd_inner_microstep: 1292.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3627
[2024-06-10 23:14:48,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.28 | bwd_microstep: 1705.76 | bwd_inner_microstep: 1705.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-10 23:14:49,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.37 | bwd_microstep: 819.44 | bwd_inner_microstep: 819.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2246
[2024-06-10 23:14:51,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.40 | bwd_microstep: 901.89 | bwd_inner_microstep: 901.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2285
[2024-06-10 23:14:52,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.20 | bwd_microstep: 956.13 | bwd_inner_microstep: 956.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2036
[2024-06-10 23:14:53,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.42 | bwd_microstep: 715.87 | bwd_inner_microstep: 715.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016
[2024-06-10 23:14:54,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.52 | bwd_microstep: 711.62 | bwd_inner_microstep: 711.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-10 23:14:56,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1388.66 | bwd_inner_microstep: 1388.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 23:14:58,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.36 | bwd_microstep: 1658.32 | bwd_inner_microstep: 1658.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1974
[2024-06-10 23:14:59,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.45 | bwd_microstep: 704.55 | bwd_inner_microstep: 704.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3600
[2024-06-10 23:15:01,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.80 | bwd_microstep: 1213.93 | bwd_inner_microstep: 1213.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2050
[2024-06-10 23:15:02,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.35 | bwd_microstep: 892.86 | bwd_inner_microstep: 892.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-10 23:15:04,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1251.32 | bwd_inner_microstep: 1251.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3737
[2024-06-10 23:15:06,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.71 | bwd_microstep: 1834.27 | bwd_inner_microstep: 1834.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-10 23:15:08,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 972.71 | bwd_inner_microstep: 972.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732
[2024-06-10 23:15:17,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-10 23:15:17,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.43 | bwd_microstep: 8321.92 | bwd_inner_microstep: 1625.67 | bwd_allreduce_microstep: 6696.20 | step_microstep: 37.98
[2024-06-10 23:15:17,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15168.89 | bwd: 47526.97 | bwd_inner: 40829.78 | bwd_allreduce: 6696.47 | step: 39.49
{'loss': 1.2101, 'learning_rate': 5.8181096844069055e-06, 'epoch': 0.76}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-10 23:15:18,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.96 | bwd_microstep: 1332.78 | bwd_inner_microstep: 1332.69 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 23:15:20,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.76 | bwd_microstep: 1244.27 | bwd_inner_microstep: 1244.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839
[2024-06-10 23:15:22,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1550.27 | bwd_inner_microstep: 1550.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-10 23:15:24,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.75 | bwd_microstep: 1543.74 | bwd_inner_microstep: 1543.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-10 23:15:26,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.92 | bwd_microstep: 1183.60 | bwd_inner_microstep: 1183.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-10 23:15:28,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.55 | bwd_microstep: 1246.51 | bwd_inner_microstep: 1246.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1372
[2024-06-10 23:15:28,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 199.79 | bwd_microstep: 519.79 | bwd_inner_microstep: 519.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-10 23:15:30,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.99 | bwd_microstep: 1158.36 | bwd_inner_microstep: 1158.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 23:15:32,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.29 | bwd_microstep: 1529.86 | bwd_inner_microstep: 1529.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-10 23:15:33,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.71 | bwd_microstep: 794.63 | bwd_inner_microstep: 794.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4038
[2024-06-10 23:15:35,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1545.04 | bwd_inner_microstep: 1545.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 23:15:37,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.85 | bwd_microstep: 1505.87 | bwd_inner_microstep: 1505.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-10 23:15:39,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.16 | bwd_microstep: 1276.56 | bwd_inner_microstep: 1276.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3516
[2024-06-10 23:15:42,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.62 | bwd_microstep: 1648.91 | bwd_inner_microstep: 1648.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-10 23:15:44,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1519.54 | bwd_inner_microstep: 1519.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-10 23:15:45,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.18 | bwd_microstep: 1245.35 | bwd_inner_microstep: 1245.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2487
[2024-06-10 23:15:47,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.02 | bwd_microstep: 1081.35 | bwd_inner_microstep: 1081.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1935
[2024-06-10 23:15:48,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.12 | bwd_microstep: 759.90 | bwd_inner_microstep: 759.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 23:15:50,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.29 | bwd_microstep: 1491.51 | bwd_inner_microstep: 1491.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-10 23:15:52,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1493.80 | bwd_inner_microstep: 1493.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3543
[2024-06-10 23:15:54,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.97 | bwd_microstep: 1636.98 | bwd_inner_microstep: 1636.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3510
[2024-06-10 23:15:56,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.99 | bwd_microstep: 1512.13 | bwd_inner_microstep: 1512.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2935
[2024-06-10 23:15:58,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.12 | bwd_microstep: 1285.54 | bwd_inner_microstep: 1285.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 23:16:00,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1645.57 | bwd_inner_microstep: 1645.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-10 23:16:02,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.13 | bwd_microstep: 1402.40 | bwd_inner_microstep: 1402.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3595
[2024-06-10 23:16:05,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.05 | bwd_microstep: 1702.36 | bwd_inner_microstep: 1702.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-10 23:16:07,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.14 | bwd_microstep: 1503.23 | bwd_inner_microstep: 1503.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-10 23:16:09,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.87 | bwd_microstep: 1503.29 | bwd_inner_microstep: 1503.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-10 23:16:11,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 1534.85 | bwd_inner_microstep: 1534.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239
[2024-06-10 23:16:12,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.96 | bwd_microstep: 964.43 | bwd_inner_microstep: 964.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 23:16:14,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1256.78 | bwd_inner_microstep: 1256.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483
[2024-06-10 23:16:20,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-10 23:16:20,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.22 | bwd_microstep: 5495.29 | bwd_inner_microstep: 1359.64 | bwd_allreduce_microstep: 4135.59 | step_microstep: 37.97
[2024-06-10 23:16:20,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16011.80 | bwd: 47114.49 | bwd_inner: 42977.92 | bwd_allreduce: 4135.87 | step: 39.48
{'loss': 1.1845, 'learning_rate': 5.791668972745859e-06, 'epoch': 0.76}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 23:16:22,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.61 | bwd_microstep: 1235.01 | bwd_inner_microstep: 1234.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3926
[2024-06-10 23:16:24,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.67 | bwd_microstep: 1539.03 | bwd_inner_microstep: 1539.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3406
[2024-06-10 23:16:25,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.05 | bwd_microstep: 1209.81 | bwd_inner_microstep: 1209.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-10 23:16:27,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1396.83 | bwd_inner_microstep: 1396.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4194
[2024-06-10 23:16:30,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.24 | bwd_microstep: 1561.45 | bwd_inner_microstep: 1561.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015
[2024-06-10 23:16:31,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.04 | bwd_microstep: 803.80 | bwd_inner_microstep: 803.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 23:16:33,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.51 | bwd_microstep: 1352.41 | bwd_inner_microstep: 1352.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-10 23:16:34,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.45 | bwd_microstep: 679.34 | bwd_inner_microstep: 679.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 23:16:35,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1248.04 | bwd_inner_microstep: 1248.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 23:16:37,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1285.48 | bwd_inner_microstep: 1285.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-10 23:16:38,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.47 | bwd_microstep: 793.02 | bwd_inner_microstep: 793.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 23:16:40,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.51 | bwd_microstep: 1283.73 | bwd_inner_microstep: 1283.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-10 23:16:42,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.20 | bwd_microstep: 1288.77 | bwd_inner_microstep: 1288.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3648
[2024-06-10 23:16:44,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.64 | bwd_microstep: 1315.49 | bwd_inner_microstep: 1315.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-10 23:16:46,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.99 | bwd_microstep: 1514.41 | bwd_inner_microstep: 1514.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850
[2024-06-10 23:16:48,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1459.78 | bwd_inner_microstep: 1459.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 23:16:50,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.25 | bwd_microstep: 1480.57 | bwd_inner_microstep: 1480.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-10 23:16:51,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.44 | bwd_microstep: 1293.11 | bwd_inner_microstep: 1293.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-10 23:16:54,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.25 | bwd_microstep: 1562.72 | bwd_inner_microstep: 1562.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3824
[2024-06-10 23:16:56,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.61 | bwd_microstep: 1487.56 | bwd_inner_microstep: 1487.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3824
[2024-06-10 23:16:58,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.46 | bwd_microstep: 1527.69 | bwd_inner_microstep: 1527.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-10 23:17:00,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1282.58 | bwd_inner_microstep: 1282.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 23:17:02,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.75 | bwd_microstep: 1495.19 | bwd_inner_microstep: 1495.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3444
[2024-06-10 23:17:04,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1409.24 | bwd_inner_microstep: 1409.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-10 23:17:06,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.20 | bwd_microstep: 1486.60 | bwd_inner_microstep: 1486.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554
[2024-06-10 23:17:07,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.49 | bwd_microstep: 1300.31 | bwd_inner_microstep: 1300.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-10 23:17:09,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.56 | bwd_microstep: 1398.87 | bwd_inner_microstep: 1398.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-10 23:17:11,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.86 | bwd_microstep: 1343.44 | bwd_inner_microstep: 1343.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3485
[2024-06-10 23:17:13,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1405.05 | bwd_inner_microstep: 1405.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2921
[2024-06-10 23:17:15,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.88 | bwd_microstep: 1226.23 | bwd_inner_microstep: 1226.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 23:17:17,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.69 | bwd_microstep: 1455.38 | bwd_inner_microstep: 1455.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3577
[2024-06-10 23:17:22,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.08 | optimizer_step: 6.57
[2024-06-10 23:17:22,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.08 | bwd_microstep: 4634.79 | bwd_inner_microstep: 1920.21 | bwd_allreduce_microstep: 2714.53 | step_microstep: 37.66
[2024-06-10 23:17:22,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16086.33 | bwd: 45755.74 | bwd_inner: 43040.31 | bwd_allreduce: 2714.76 | step: 39.11
{'loss': 1.147, 'learning_rate': 5.765278302752815e-06, 'epoch': 0.76}
 61.11s/it]
 76%|███████▌  | 1307/1726 [22:35:49<7:07:35, 61.23s/it]
 76%|███████▌  | 1308/1726 [22:36:50<7:06:27, 61.21s/it]
 76%|███████▌  | 1309/1726 [22:37:53<7:09:14, 61.76s/it]
 76%|███████▌  | 1310/1726 [22:38:57<7:11:45, 62.27s/it]
 76%|███████▌  | 1311/1726 [22:39:59<7:10:30, 62.24s/it]
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-10 23:17:24,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.36 | bwd_microstep: 1428.78 | bwd_inner_microstep: 1428.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 23:17:26,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1370.58 | bwd_inner_microstep: 1370.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-10 23:17:28,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1349.34 | bwd_inner_microstep: 1349.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-10 23:17:30,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1392.63 | bwd_inner_microstep: 1392.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 23:17:32,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1380.03 | bwd_inner_microstep: 1380.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709
[2024-06-10 23:17:34,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.86 | bwd_microstep: 1627.26 | bwd_inner_microstep: 1627.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3414
[2024-06-10 23:17:36,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.76 | bwd_microstep: 1182.93 | bwd_inner_microstep: 1182.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 23:17:37,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.96 | bwd_microstep: 789.73 | bwd_inner_microstep: 789.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 23:17:39,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.56 | bwd_microstep: 1484.86 | bwd_inner_microstep: 1484.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-10 23:17:41,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.58 | bwd_microstep: 1283.51 | bwd_inner_microstep: 1283.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-10 23:17:42,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.41 | bwd_microstep: 1340.65 | bwd_inner_microstep: 1340.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-10 23:17:44,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.67 | bwd_microstep: 1374.81 | bwd_inner_microstep: 1374.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2082
[2024-06-10 23:17:46,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.24 | bwd_microstep: 943.25 | bwd_inner_microstep: 943.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-10 23:17:48,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.51 | bwd_microstep: 1613.51 | bwd_inner_microstep: 1613.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598
[2024-06-10 23:17:50,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.72 | bwd_microstep: 1605.91 | bwd_inner_microstep: 1605.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-10 23:17:52,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.54 | bwd_microstep: 1316.10 | bwd_inner_microstep: 1316.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-10 23:17:54,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.93 | bwd_microstep: 1422.27 | bwd_inner_microstep: 1422.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-10 23:17:56,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1609.69 | bwd_inner_microstep: 1609.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 23:17:58,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.14 | bwd_microstep: 1283.83 | bwd_inner_microstep: 1283.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 23:18:00,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.33 | bwd_microstep: 1384.09 | bwd_inner_microstep: 1384.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 23:18:02,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.67 | bwd_microstep: 1355.20 | bwd_inner_microstep: 1355.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 23:18:04,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1492.90 | bwd_inner_microstep: 1492.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293
[2024-06-10 23:18:05,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.02 | bwd_microstep: 973.36 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930
[2024-06-10 23:18:06,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.54 | bwd_microstep: 697.89 | bwd_inner_microstep: 697.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276
[2024-06-10 23:18:07,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.45 | bwd_microstep: 877.47 | bwd_inner_microstep: 877.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 23:18:09,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.44 | bwd_microstep: 1559.90 | bwd_inner_microstep: 1559.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3820
[2024-06-10 23:18:12,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.33 | bwd_microstep: 1632.96 | bwd_inner_microstep: 1632.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3573
[2024-06-10 23:18:14,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.38 | bwd_microstep: 1423.94 | bwd_inner_microstep: 1423.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3447
[2024-06-10 23:18:16,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.33 | bwd_microstep: 1515.96 | bwd_inner_microstep: 1515.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3772
[2024-06-10 23:18:18,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.40 | bwd_microstep: 1488.47 | bwd_inner_microstep: 1488.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-10 23:18:20,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1355.16 | bwd_inner_microstep: 1355.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3678
[2024-06-10 23:18:23,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-10 23:18:23,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.18 | bwd_microstep: 2879.92 | bwd_inner_microstep: 1526.10 | bwd_allreduce_microstep: 1353.77 | step_microstep: 37.79
[2024-06-10 23:18:23,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16061.91 | bwd: 44436.88 | bwd_inner: 43082.20 | bwd_allreduce: 1354.00 | step: 39.28
{'loss': 1.2161, 'learning_rate': 5.738937767375596e-06, 'epoch': 0.76}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 23:18:25,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.98 | bwd_microstep: 1376.27 | bwd_inner_microstep: 1376.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-10 23:18:27,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.90 | bwd_microstep: 1479.43 | bwd_inner_microstep: 1479.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3997
[2024-06-10 23:18:29,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.14 | bwd_microstep: 1536.29 | bwd_inner_microstep: 1536.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3787
[2024-06-10 23:18:31,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.62 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-10 23:18:33,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1535.01 | bwd_inner_microstep: 1534.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 23:18:35,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.92 | bwd_microstep: 1246.73 | bwd_inner_microstep: 1246.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-10 23:18:36,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.82 | bwd_microstep: 793.90 | bwd_inner_microstep: 793.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-10 23:18:37,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.60 | bwd_microstep: 1153.25 | bwd_inner_microstep: 1153.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2668
[2024-06-10 23:18:39,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.50 | bwd_microstep: 959.99 | bwd_inner_microstep: 959.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3717
[2024-06-10 23:18:41,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.86 | bwd_microstep: 1482.03 | bwd_inner_microstep: 1482.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3439
[2024-06-10 23:18:42,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.53 | bwd_microstep: 1187.91 | bwd_inner_microstep: 1187.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-10 23:18:44,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.65 | bwd_microstep: 806.72 | bwd_inner_microstep: 806.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2130
[2024-06-10 23:18:45,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.29 | bwd_microstep: 926.22 | bwd_inner_microstep: 926.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-10 23:18:47,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1448.18 | bwd_inner_microstep: 1448.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657
[2024-06-10 23:18:49,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.86 | bwd_microstep: 1714.28 | bwd_inner_microstep: 1714.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3541
[2024-06-10 23:18:51,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.21 | bwd_microstep: 1552.82 | bwd_inner_microstep: 1552.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 23:18:53,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.18 | bwd_microstep: 1344.97 | bwd_inner_microstep: 1344.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 23:18:55,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.89 | bwd_microstep: 1549.29 | bwd_inner_microstep: 1549.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920
[2024-06-10 23:18:56,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.67 | bwd_microstep: 686.88 | bwd_inner_microstep: 686.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 23:18:58,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.23 | bwd_microstep: 1185.38 | bwd_inner_microstep: 1185.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-10 23:19:00,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1251.76 | bwd_inner_microstep: 1251.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-10 23:19:02,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.36 | bwd_microstep: 1539.38 | bwd_inner_microstep: 1539.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2088
[2024-06-10 23:19:03,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.99 | bwd_microstep: 918.11 | bwd_inner_microstep: 918.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1981
[2024-06-10 23:19:04,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.88 | bwd_microstep: 734.20 | bwd_inner_microstep: 734.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-10 23:19:06,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.40 | bwd_microstep: 1562.50 | bwd_inner_microstep: 1562.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-10 23:19:07,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.93 | bwd_microstep: 806.32 | bwd_inner_microstep: 806.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-10 23:19:09,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.06 | bwd_microstep: 1461.12 | bwd_inner_microstep: 1461.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 23:19:11,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.72 | bwd_microstep: 1550.11 | bwd_inner_microstep: 1550.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3574
[2024-06-10 23:19:14,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.45 | bwd_microstep: 1524.18 | bwd_inner_microstep: 1524.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-10 23:19:15,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.71 | bwd_microstep: 1306.11 | bwd_inner_microstep: 1306.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-10 23:19:18,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.08 | bwd_microstep: 1644.62 | bwd_inner_microstep: 1644.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3809
[2024-06-10 23:19:26,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.26 | optimizer_step: 6.60
[2024-06-10 23:19:26,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.04 | bwd_microstep: 7503.52 | bwd_inner_microstep: 1831.88 | bwd_allreduce_microstep: 5671.57 | step_microstep: 38.28
[2024-06-10 23:19:26,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15418.27 | bwd: 47052.25 | bwd_inner: 41379.76 | bwd_allreduce: 5671.81 | step: 39.72
{'loss': 1.1571, 'learning_rate': 5.712647459385425e-06, 'epoch': 0.76}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3472
[2024-06-10 23:19:28,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.23 | bwd_microstep: 1426.41 | bwd_inner_microstep: 1426.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 23:19:30,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.22 | bwd_microstep: 1380.27 | bwd_inner_microstep: 1380.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 23:19:31,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.62 | bwd_microstep: 1247.64 | bwd_inner_microstep: 1247.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3830
[2024-06-10 23:19:33,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.65 | bwd_microstep: 1483.67 | bwd_inner_microstep: 1483.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 23:19:35,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1252.23 | bwd_inner_microstep: 1252.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-10 23:19:37,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.55 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 23:19:39,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.28 | bwd_microstep: 1373.16 | bwd_inner_microstep: 1373.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-10 23:19:41,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1374.03 | bwd_inner_microstep: 1374.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 23:19:43,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.76 | bwd_microstep: 1486.64 | bwd_inner_microstep: 1486.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-10 23:19:45,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.38 | bwd_microstep: 1627.55 | bwd_inner_microstep: 1627.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3755
[2024-06-10 23:19:47,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1565.60 | bwd_inner_microstep: 1565.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511
[2024-06-10 23:19:49,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.96 | bwd_microstep: 1318.92 | bwd_inner_microstep: 1318.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677
[2024-06-10 23:19:51,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1324.38 | bwd_inner_microstep: 1324.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2615
[2024-06-10 23:19:52,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.65 | bwd_microstep: 1045.27 | bwd_inner_microstep: 1045.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3708
[2024-06-10 23:19:55,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.81 | bwd_microstep: 1723.98 | bwd_inner_microstep: 1723.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-10 23:19:57,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.92 | bwd_microstep: 1481.93 | bwd_inner_microstep: 1481.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3545
[2024-06-10 23:19:59,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.58 | bwd_microstep: 1590.33 | bwd_inner_microstep: 1590.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 23:20:01,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1340.98 | bwd_inner_microstep: 1340.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3533
[2024-06-10 23:20:03,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.70 | bwd_microstep: 1565.39 | bwd_inner_microstep: 1565.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3449
[2024-06-10 23:20:05,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.57 | bwd_microstep: 1335.34 | bwd_inner_microstep: 1335.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 23:20:07,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.07 | bwd_microstep: 1437.38 | bwd_inner_microstep: 1437.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 23:20:09,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.61 | bwd_microstep: 1283.40 | bwd_inner_microstep: 1283.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-10 23:20:10,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.81 | bwd_microstep: 1398.97 | bwd_inner_microstep: 1398.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-10 23:20:13,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.22 | bwd_microstep: 1649.19 | bwd_inner_microstep: 1649.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-10 23:20:14,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1245.49 | bwd_inner_microstep: 1245.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3586
[2024-06-10 23:20:16,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.69 | bwd_microstep: 1438.34 | bwd_inner_microstep: 1438.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-10 23:20:19,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1550.19 | bwd_inner_microstep: 1550.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3601
[2024-06-10 23:20:21,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.93 | bwd_microstep: 1433.67 | bwd_inner_microstep: 1433.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-10 23:20:23,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.03 | bwd_microstep: 1607.94 | bwd_inner_microstep: 1607.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484
[2024-06-10 23:20:25,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.06 | bwd_microstep: 1315.39 | bwd_inner_microstep: 1315.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2050
[2024-06-10 23:20:26,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.06 | bwd_microstep: 850.98 | bwd_inner_microstep: 850.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2246
[2024-06-10 23:20:28,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.07 | optimizer_step: 6.59
[2024-06-10 23:20:28,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.46 | bwd_microstep: 1867.41 | bwd_inner_microstep: 981.75 | bwd_allreduce_microstep: 885.61 | step_microstep: 37.61
[2024-06-10 23:20:28,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16572.85 | bwd: 45307.25 | bwd_inner: 44420.73 | bwd_allreduce: 885.83 | step: 39.10
{'loss': 1.1782, 'learning_rate': 5.686407471376623e-06, 'epoch': 0.76}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 23:20:30,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1382.49 | bwd_inner_microstep: 1382.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881
[2024-06-10 23:20:31,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.95 | bwd_microstep: 708.96 | bwd_inner_microstep: 708.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520
[2024-06-10 23:20:33,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.87 | bwd_microstep: 1452.39 | bwd_inner_microstep: 1452.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-10 23:20:35,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.34 | bwd_microstep: 1442.17 | bwd_inner_microstep: 1442.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-10 23:20:37,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.51 | bwd_microstep: 1539.70 | bwd_inner_microstep: 1539.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-10 23:20:39,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1436.43 | bwd_inner_microstep: 1436.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 23:20:41,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1246.81 | bwd_inner_microstep: 1246.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 23:20:43,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1388.58 | bwd_inner_microstep: 1388.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 23:20:44,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1247.89 | bwd_inner_microstep: 1247.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 23:20:46,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1256.06 | bwd_inner_microstep: 1256.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-10 23:20:48,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.75 | bwd_microstep: 1285.05 | bwd_inner_microstep: 1285.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525
[2024-06-10 23:20:50,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1324.91 | bwd_inner_microstep: 1324.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3518
[2024-06-10 23:20:52,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 1320.00 | bwd_inner_microstep: 1319.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434
[2024-06-10 23:20:53,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.18 | bwd_microstep: 1311.04 | bwd_inner_microstep: 1311.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3449
[2024-06-10 23:20:55,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.39 | bwd_microstep: 1316.30 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 23:20:57,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1346.53 | bwd_inner_microstep: 1346.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3680
[2024-06-10 23:20:59,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.63 | bwd_microstep: 1358.61 | bwd_inner_microstep: 1358.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 23:21:01,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1407.06 | bwd_inner_microstep: 1407.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438
[2024-06-10 23:21:03,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.74 | bwd_microstep: 1318.09 | bwd_inner_microstep: 1318.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-10 23:21:05,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.72 | bwd_microstep: 1497.03 | bwd_inner_microstep: 1497.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2140
[2024-06-10 23:21:06,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.75 | bwd_microstep: 770.97 | bwd_inner_microstep: 770.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1988
[2024-06-10 23:21:07,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.17 | bwd_microstep: 739.75 | bwd_inner_microstep: 739.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 23:21:09,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.04 | bwd_microstep: 1555.31 | bwd_inner_microstep: 1555.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3554
[2024-06-10 23:21:11,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1235.53 | bwd_inner_microstep: 1235.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-10 23:21:12,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1259.26 | bwd_inner_microstep: 1259.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-10 23:21:15,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1563.26 | bwd_inner_microstep: 1563.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-10 23:21:16,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.08 | bwd_microstep: 697.52 | bwd_inner_microstep: 697.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-10 23:21:18,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.19 | bwd_microstep: 1450.26 | bwd_inner_microstep: 1450.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-10 23:21:20,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1410.49 | bwd_inner_microstep: 1410.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-10 23:21:21,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1383.57 | bwd_inner_microstep: 1383.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3601
[2024-06-10 23:21:24,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.38 | bwd_microstep: 1535.93 | bwd_inner_microstep: 1535.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434
[2024-06-10 23:21:28,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-10 23:21:28,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.11 | bwd_microstep: 4002.92 | bwd_inner_microstep: 1600.86 | bwd_allreduce_microstep: 2402.01 | step_microstep: 37.76
[2024-06-10 23:21:28,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15631.77 | bwd: 44190.89 | bwd_inner: 41787.97 | bwd_allreduce: 2402.24 | step: 39.26
{'loss': 1.1498, 'learning_rate': 5.660217895766302e-06, 'epoch': 0.76}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 23:21:30,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.44 | bwd_microstep: 1468.85 | bwd_inner_microstep: 1468.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900
[2024-06-10 23:21:31,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.31 | bwd_microstep: 682.46 | bwd_inner_microstep: 682.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 23:21:33,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.61 | bwd_microstep: 1274.07 | bwd_inner_microstep: 1274.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-10 23:21:35,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.92 | bwd_microstep: 1646.88 | bwd_inner_microstep: 1646.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-10 23:21:37,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.34 | bwd_microstep: 1538.39 | bwd_inner_microstep: 1538.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 23:21:39,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1397.27 | bwd_inner_microstep: 1397.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 23:21:41,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.46 | bwd_microstep: 1184.59 | bwd_inner_microstep: 1184.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3443
[2024-06-10 23:21:43,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.25 | bwd_microstep: 1216.62 | bwd_inner_microstep: 1216.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3463
[2024-06-10 23:21:44,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.78 | bwd_microstep: 1217.85 | bwd_inner_microstep: 1217.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1986
[2024-06-10 23:21:45,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.32 | bwd_microstep: 772.92 | bwd_inner_microstep: 772.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1920
[2024-06-10 23:21:46,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.15 | bwd_microstep: 735.77 | bwd_inner_microstep: 735.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 23:21:48,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.41 | bwd_microstep: 1340.42 | bwd_inner_microstep: 1340.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678
[2024-06-10 23:21:50,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.58 | bwd_microstep: 1518.75 | bwd_inner_microstep: 1518.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-10 23:21:52,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.57 | bwd_microstep: 1395.75 | bwd_inner_microstep: 1395.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2962
[2024-06-10 23:21:54,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.08 | bwd_microstep: 1199.31 | bwd_inner_microstep: 1199.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523
[2024-06-10 23:21:56,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.21 | bwd_microstep: 1590.14 | bwd_inner_microstep: 1590.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3531
[2024-06-10 23:21:58,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.80 | bwd_microstep: 1593.97 | bwd_inner_microstep: 1593.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 23:22:00,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.60 | bwd_microstep: 1404.60 | bwd_inner_microstep: 1404.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-10 23:22:02,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1416.88 | bwd_inner_microstep: 1416.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3959
[2024-06-10 23:22:05,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.87 | bwd_microstep: 1807.23 | bwd_inner_microstep: 1807.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-10 23:22:06,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.67 | bwd_microstep: 1282.07 | bwd_inner_microstep: 1282.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-10 23:22:09,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.00 | bwd_microstep: 1559.67 | bwd_inner_microstep: 1559.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3549
[2024-06-10 23:22:10,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.46 | bwd_microstep: 1202.39 | bwd_inner_microstep: 1202.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-10 23:22:13,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.65 | bwd_microstep: 1663.99 | bwd_inner_microstep: 1663.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3459
[2024-06-10 23:22:15,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.50 | bwd_microstep: 1506.42 | bwd_inner_microstep: 1506.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 23:22:17,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.18 | bwd_microstep: 1386.86 | bwd_inner_microstep: 1386.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2287
[2024-06-10 23:22:18,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.64 | bwd_microstep: 912.59 | bwd_inner_microstep: 912.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-10 23:22:20,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1556.19 | bwd_inner_microstep: 1556.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3742
[2024-06-10 23:22:22,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.34 | bwd_microstep: 1495.60 | bwd_inner_microstep: 1495.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2063
[2024-06-10 23:22:23,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.30 | bwd_microstep: 1009.37 | bwd_inner_microstep: 1009.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2280
[2024-06-10 23:22:25,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.71 | bwd_microstep: 1004.46 | bwd_inner_microstep: 1004.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3807
[2024-06-10 23:22:31,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.22 | optimizer_step: 6.57
[2024-06-10 23:22:31,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.81 | bwd_microstep: 5547.63 | bwd_inner_microstep: 2214.45 | bwd_allreduce_microstep: 3333.11 | step_microstep: 38.74
[2024-06-10 23:22:31,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15980.33 | bwd: 46529.99 | bwd_inner: 43195.93 | bwd_allreduce: 3333.34 | step: 40.31
{'loss': 1.1392, 'learning_rate': 5.634078824794009e-06, 'epoch': 0.76}
 76%|███████▌  | 1312/1726 [22:41:00<7:06:32, 61.82s/it]
 76%|███████▌  | 1313/1726 [22:42:03<7:07:32, 62.11s/it]
 76%|███████▌  | 1314/1726 [22:43:05<7:06:42, 62.14s/it]
 76%|███████▌  | 1315/1726 [22:44:05<7:01:35, 61.55s/it]
 76%|███████▌  | 1316/1726 [22:45:08<7:03:14, 61.94s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 23:22:33,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.74 | bwd_microstep: 1335.66 | bwd_inner_microstep: 1335.58 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-10 23:22:35,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.33 | bwd_microstep: 1383.89 | bwd_inner_microstep: 1383.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-10 23:22:37,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1342.65 | bwd_inner_microstep: 1342.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 23:22:39,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.86 | bwd_microstep: 1438.07 | bwd_inner_microstep: 1438.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 850
[2024-06-10 23:22:39,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.33 | bwd_microstep: 347.14 | bwd_inner_microstep: 347.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3741
[2024-06-10 23:22:41,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1429.23 | bwd_inner_microstep: 1429.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 23:22:43,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.12 | bwd_microstep: 1403.03 | bwd_inner_microstep: 1403.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 23:22:45,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.52 | bwd_microstep: 1388.43 | bwd_inner_microstep: 1388.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-10 23:22:47,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.69 | bwd_microstep: 1283.57 | bwd_inner_microstep: 1283.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-10 23:22:49,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.45 | bwd_microstep: 1526.41 | bwd_inner_microstep: 1526.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1915
[2024-06-10 23:22:50,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.11 | bwd_microstep: 780.26 | bwd_inner_microstep: 780.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2445
[2024-06-10 23:22:51,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.80 | bwd_microstep: 1015.46 | bwd_inner_microstep: 1015.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083
[2024-06-10 23:22:53,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.33 | bwd_microstep: 918.29 | bwd_inner_microstep: 918.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-10 23:22:55,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1418.77 | bwd_inner_microstep: 1418.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664
[2024-06-10 23:22:57,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.78 | bwd_microstep: 1614.10 | bwd_inner_microstep: 1614.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2929
[2024-06-10 23:22:59,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.58 | bwd_microstep: 1284.40 | bwd_inner_microstep: 1284.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-10 23:23:00,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.18 | bwd_microstep: 1153.83 | bwd_inner_microstep: 1153.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-10 23:23:02,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.20 | bwd_microstep: 1439.17 | bwd_inner_microstep: 1439.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-10 23:23:04,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1403.50 | bwd_inner_microstep: 1403.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-10 23:23:05,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.72 | bwd_microstep: 797.27 | bwd_inner_microstep: 797.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3544
[2024-06-10 23:23:07,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.05 | bwd_microstep: 1327.14 | bwd_inner_microstep: 1327.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-10 23:23:09,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.25 | bwd_microstep: 1310.41 | bwd_inner_microstep: 1310.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-10 23:23:10,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.24 | bwd_microstep: 809.11 | bwd_inner_microstep: 809.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-10 23:23:12,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.10 | bwd_microstep: 1296.36 | bwd_inner_microstep: 1296.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 23:23:13,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.19 | bwd_microstep: 1185.52 | bwd_inner_microstep: 1185.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2186
[2024-06-10 23:23:15,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.79 | bwd_microstep: 857.95 | bwd_inner_microstep: 857.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-10 23:23:17,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.71 | bwd_microstep: 1436.73 | bwd_inner_microstep: 1436.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 23:23:18,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.41 | bwd_microstep: 1378.27 | bwd_inner_microstep: 1378.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2237
[2024-06-10 23:23:20,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.77 | bwd_microstep: 1062.39 | bwd_inner_microstep: 1062.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3536
[2024-06-10 23:23:22,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.97 | bwd_microstep: 1448.83 | bwd_inner_microstep: 1448.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2038
[2024-06-10 23:23:23,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.72 | bwd_microstep: 901.67 | bwd_inner_microstep: 901.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 23:23:35,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.41 | optimizer_step: 6.60
[2024-06-10 23:23:35,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 10884.96 | bwd_inner_microstep: 1543.79 | bwd_allreduce_microstep: 9341.10 | step_microstep: 39.93
[2024-06-10 23:23:35,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14703.93 | bwd: 48602.52 | bwd_inner: 39260.43 | bwd_allreduce: 9341.38 | step: 41.54
{'loss': 1.2072, 'learning_rate': 5.607990350521413e-06, 'epoch': 0.76}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-10 23:23:37,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1392.57 | bwd_inner_microstep: 1392.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2381
[2024-06-10 23:23:38,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.16 | bwd_microstep: 957.90 | bwd_inner_microstep: 957.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-10 23:23:40,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1336.22 | bwd_inner_microstep: 1336.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 23:23:42,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1374.96 | bwd_inner_microstep: 1374.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 23:23:44,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1394.63 | bwd_inner_microstep: 1394.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732
[2024-06-10 23:23:46,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.49 | bwd_microstep: 1425.73 | bwd_inner_microstep: 1425.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884
[2024-06-10 23:23:47,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.13 | bwd_microstep: 710.52 | bwd_inner_microstep: 710.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2257
[2024-06-10 23:23:48,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 968.92 | bwd_inner_microstep: 968.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 23:23:50,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.10 | bwd_microstep: 1396.18 | bwd_inner_microstep: 1396.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1945
[2024-06-10 23:23:51,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.71 | bwd_microstep: 849.27 | bwd_inner_microstep: 849.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3477
[2024-06-10 23:23:53,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.46 | bwd_microstep: 1326.48 | bwd_inner_microstep: 1326.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503
[2024-06-10 23:23:55,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1582.72 | bwd_inner_microstep: 1582.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3661
[2024-06-10 23:23:57,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.53 | bwd_microstep: 1717.41 | bwd_inner_microstep: 1717.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520
[2024-06-10 23:24:00,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.96 | bwd_microstep: 1577.20 | bwd_inner_microstep: 1577.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 23:24:01,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1389.25 | bwd_inner_microstep: 1389.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-10 23:24:03,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.13 | bwd_microstep: 1405.60 | bwd_inner_microstep: 1405.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3522
[2024-06-10 23:24:05,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.47 | bwd_microstep: 1449.89 | bwd_inner_microstep: 1449.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3530
[2024-06-10 23:24:07,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1275.59 | bwd_inner_microstep: 1275.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3538
[2024-06-10 23:24:09,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1227.66 | bwd_inner_microstep: 1227.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054
[2024-06-10 23:24:10,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.93 | bwd_microstep: 910.39 | bwd_inner_microstep: 910.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284
[2024-06-10 23:24:12,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.44 | bwd_microstep: 1071.40 | bwd_inner_microstep: 1071.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 23:24:14,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.82 | bwd_microstep: 1523.06 | bwd_inner_microstep: 1523.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-10 23:24:16,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1509.84 | bwd_inner_microstep: 1509.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 23:24:18,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.92 | bwd_microstep: 1550.53 | bwd_inner_microstep: 1550.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3822
[2024-06-10 23:24:20,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.02 | bwd_microstep: 1261.11 | bwd_inner_microstep: 1261.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-10 23:24:21,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.28 | bwd_microstep: 876.31 | bwd_inner_microstep: 876.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3713
[2024-06-10 23:24:23,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.68 | bwd_microstep: 1730.50 | bwd_inner_microstep: 1730.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3592
[2024-06-10 23:24:25,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.79 | bwd_microstep: 1532.18 | bwd_inner_microstep: 1532.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-10 23:24:27,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1350.94 | bwd_inner_microstep: 1350.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2668
[2024-06-10 23:24:29,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.98 | bwd_microstep: 1119.19 | bwd_inner_microstep: 1119.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685
[2024-06-10 23:24:31,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.84 | bwd_microstep: 1522.89 | bwd_inner_microstep: 1522.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 23:24:35,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-10 23:24:35,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.32 | bwd_microstep: 3812.37 | bwd_inner_microstep: 1757.53 | bwd_allreduce_microstep: 2054.79 | step_microstep: 37.73
[2024-06-10 23:24:35,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15783.70 | bwd: 44529.42 | bwd_inner: 42473.74 | bwd_allreduce: 2055.01 | step: 39.19
{'loss': 1.2379, 'learning_rate': 5.581952564831978e-06, 'epoch': 0.76}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 23:24:37,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.70 | bwd_microstep: 1336.50 | bwd_inner_microstep: 1336.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-10 23:24:39,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.33 | bwd_microstep: 1377.32 | bwd_inner_microstep: 1377.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 23:24:41,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.47 | bwd_microstep: 1278.49 | bwd_inner_microstep: 1278.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-10 23:24:43,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1241.84 | bwd_inner_microstep: 1241.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-10 23:24:45,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.83 | bwd_microstep: 1544.90 | bwd_inner_microstep: 1544.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2034
[2024-06-10 23:24:46,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.33 | bwd_microstep: 811.05 | bwd_inner_microstep: 811.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-10 23:24:48,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.65 | bwd_microstep: 1342.48 | bwd_inner_microstep: 1342.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-10 23:24:50,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.93 | bwd_microstep: 1384.01 | bwd_inner_microstep: 1383.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3707
[2024-06-10 23:24:51,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.28 | bwd_microstep: 1358.07 | bwd_inner_microstep: 1358.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594
[2024-06-10 23:24:53,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.12 | bwd_microstep: 1306.09 | bwd_inner_microstep: 1306.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3686
[2024-06-10 23:24:56,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.53 | bwd_microstep: 1687.80 | bwd_inner_microstep: 1687.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4816
[2024-06-10 23:24:58,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 692.76 | bwd_microstep: 1828.13 | bwd_inner_microstep: 1828.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513
[2024-06-10 23:25:00,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1318.81 | bwd_inner_microstep: 1318.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3511
[2024-06-10 23:25:02,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.76 | bwd_microstep: 1533.99 | bwd_inner_microstep: 1533.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3518
[2024-06-10 23:25:04,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1445.66 | bwd_inner_microstep: 1445.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 23:25:06,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.41 | bwd_microstep: 1587.83 | bwd_inner_microstep: 1587.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-10 23:25:08,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1296.08 | bwd_inner_microstep: 1296.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3480
[2024-06-10 23:25:10,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.75 | bwd_microstep: 1526.54 | bwd_inner_microstep: 1526.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 23:25:12,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1517.13 | bwd_inner_microstep: 1517.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 23:25:14,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1396.22 | bwd_inner_microstep: 1396.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-10 23:25:16,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.47 | bwd_microstep: 1181.76 | bwd_inner_microstep: 1181.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-10 23:25:18,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.87 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 23:25:19,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.66 | bwd_microstep: 1259.08 | bwd_inner_microstep: 1259.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741
[2024-06-10 23:25:21,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.54 | bwd_microstep: 1540.82 | bwd_inner_microstep: 1540.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287
[2024-06-10 23:25:23,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.70 | bwd_microstep: 977.67 | bwd_inner_microstep: 977.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3714
[2024-06-10 23:25:25,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.04 | bwd_microstep: 1365.67 | bwd_inner_microstep: 1365.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 23:25:26,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.26 | bwd_microstep: 1158.49 | bwd_inner_microstep: 1158.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-10 23:25:28,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.96 | bwd_microstep: 1536.06 | bwd_inner_microstep: 1536.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3464
[2024-06-10 23:25:30,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.00 | bwd_microstep: 1340.57 | bwd_inner_microstep: 1340.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3565
[2024-06-10 23:25:32,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.49 | bwd_microstep: 1460.85 | bwd_inner_microstep: 1460.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-10 23:25:34,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.17 | bwd_microstep: 1439.65 | bwd_inner_microstep: 1439.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2270
[2024-06-10 23:25:37,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.01 | optimizer_step: 6.58
[2024-06-10 23:25:37,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.72 | bwd_microstep: 1825.37 | bwd_inner_microstep: 1061.23 | bwd_allreduce_microstep: 764.09 | step_microstep: 37.48
[2024-06-10 23:25:37,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16373.74 | bwd: 44515.16 | bwd_inner: 43750.16 | bwd_allreduce: 764.32 | step: 38.95
{'loss': 1.1775, 'learning_rate': 5.555965559430671e-06, 'epoch': 0.76}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-10 23:25:38,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.64 | bwd_microstep: 1327.06 | bwd_inner_microstep: 1327.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2398
[2024-06-10 23:25:40,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.21 | bwd_microstep: 998.74 | bwd_inner_microstep: 998.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 23:25:42,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1350.95 | bwd_inner_microstep: 1350.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-10 23:25:43,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.09 | bwd_microstep: 823.85 | bwd_inner_microstep: 823.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-10 23:25:45,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-10 23:25:47,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.79 | bwd_microstep: 1532.44 | bwd_inner_microstep: 1532.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3420
[2024-06-10 23:25:48,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.92 | bwd_microstep: 1212.15 | bwd_inner_microstep: 1212.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2929
[2024-06-10 23:25:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.98 | bwd_microstep: 1062.82 | bwd_inner_microstep: 1062.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-10 23:25:51,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.41 | bwd_microstep: 792.88 | bwd_inner_microstep: 792.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3691
[2024-06-10 23:25:53,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1425.06 | bwd_inner_microstep: 1425.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 23:25:55,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1343.28 | bwd_inner_microstep: 1343.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-10 23:25:57,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.56 | bwd_microstep: 1444.71 | bwd_inner_microstep: 1444.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2165
[2024-06-10 23:25:58,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.84 | bwd_microstep: 1045.51 | bwd_inner_microstep: 1045.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3414
[2024-06-10 23:26:00,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1394.56 | bwd_inner_microstep: 1394.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3465
[2024-06-10 23:26:02,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.13 | bwd_microstep: 1555.20 | bwd_inner_microstep: 1555.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2423
[2024-06-10 23:26:04,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.98 | bwd_microstep: 939.99 | bwd_inner_microstep: 939.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-10 23:26:05,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.61 | bwd_microstep: 1181.50 | bwd_inner_microstep: 1181.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 23:26:07,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.93 | bwd_microstep: 1390.01 | bwd_inner_microstep: 1389.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 23:26:09,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.81 | bwd_microstep: 1557.16 | bwd_inner_microstep: 1557.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3471
[2024-06-10 23:26:11,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.48 | bwd_microstep: 1328.02 | bwd_inner_microstep: 1328.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-10 23:26:13,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.79 | bwd_microstep: 1431.14 | bwd_inner_microstep: 1431.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3536
[2024-06-10 23:26:15,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.22 | bwd_microstep: 1325.84 | bwd_inner_microstep: 1325.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-10 23:26:16,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.18 | bwd_microstep: 699.88 | bwd_inner_microstep: 699.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-10 23:26:17,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.65 | bwd_microstep: 802.74 | bwd_inner_microstep: 802.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3662
[2024-06-10 23:26:19,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.81 | bwd_microstep: 1325.82 | bwd_inner_microstep: 1325.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3555
[2024-06-10 23:26:21,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.39 | bwd_microstep: 1454.54 | bwd_inner_microstep: 1454.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4184
[2024-06-10 23:26:23,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.83 | bwd_microstep: 1523.57 | bwd_inner_microstep: 1523.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-10 23:26:25,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1551.89 | bwd_inner_microstep: 1551.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3766
[2024-06-10 23:26:27,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1476.58 | bwd_inner_microstep: 1476.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2194
[2024-06-10 23:26:29,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.70 | bwd_microstep: 953.68 | bwd_inner_microstep: 953.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-10 23:26:31,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.42 | bwd_microstep: 1511.09 | bwd_inner_microstep: 1511.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2037
[2024-06-10 23:26:37,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.09 | optimizer_step: 6.57
[2024-06-10 23:26:37,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.59 | bwd_microstep: 5791.17 | bwd_inner_microstep: 966.15 | bwd_allreduce_microstep: 4824.96 | step_microstep: 37.95
[2024-06-10 23:26:37,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15000.78 | bwd: 44941.69 | bwd_inner: 40115.83 | bwd_allreduce: 4825.19 | step: 39.39
{'loss': 1.1842, 'learning_rate': 5.530029425843564e-06, 'epoch': 0.76}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 23:26:38,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.13 | bwd_microstep: 1234.61 | bwd_inner_microstep: 1234.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3855
[2024-06-10 23:26:41,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.55 | bwd_microstep: 1455.06 | bwd_inner_microstep: 1455.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3781
[2024-06-10 23:26:42,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.12 | bwd_microstep: 1441.56 | bwd_inner_microstep: 1441.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412
[2024-06-10 23:26:44,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.30 | bwd_microstep: 1276.82 | bwd_inner_microstep: 1276.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729
[2024-06-10 23:26:47,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.92 | bwd_microstep: 1630.66 | bwd_inner_microstep: 1630.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-10 23:26:48,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1432.05 | bwd_inner_microstep: 1432.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 23:26:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1383.36 | bwd_inner_microstep: 1383.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-10 23:26:52,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1431.32 | bwd_inner_microstep: 1431.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1908
[2024-06-10 23:26:53,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.67 | bwd_microstep: 747.61 | bwd_inner_microstep: 747.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-10 23:26:55,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.90 | bwd_microstep: 1480.68 | bwd_inner_microstep: 1480.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408
[2024-06-10 23:26:57,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.67 | bwd_microstep: 1277.74 | bwd_inner_microstep: 1277.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692
[2024-06-10 23:26:59,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.83 | bwd_microstep: 1445.52 | bwd_inner_microstep: 1445.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3125
[2024-06-10 23:27:01,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.24 | bwd_microstep: 1342.32 | bwd_inner_microstep: 1342.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685
[2024-06-10 23:27:03,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.98 | bwd_microstep: 1520.00 | bwd_inner_microstep: 1519.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 23:27:05,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.64 | bwd_microstep: 1391.92 | bwd_inner_microstep: 1391.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695
[2024-06-10 23:27:07,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.60 | bwd_microstep: 1630.32 | bwd_inner_microstep: 1630.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-10 23:27:09,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1387.43 | bwd_inner_microstep: 1387.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-10 23:27:11,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.28 | bwd_microstep: 1377.45 | bwd_inner_microstep: 1377.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-10 23:27:13,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.93 | bwd_microstep: 1396.33 | bwd_inner_microstep: 1396.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 23:27:15,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.67 | bwd_microstep: 1348.10 | bwd_inner_microstep: 1348.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-10 23:27:16,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.18 | bwd_microstep: 977.65 | bwd_inner_microstep: 977.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070
[2024-06-10 23:27:18,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.16 | bwd_microstep: 915.13 | bwd_inner_microstep: 915.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 23:27:19,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1413.72 | bwd_inner_microstep: 1413.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-10 23:27:22,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 1557.67 | bwd_inner_microstep: 1557.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3726
[2024-06-10 23:27:24,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 1416.62 | bwd_inner_microstep: 1416.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817
[2024-06-10 23:27:26,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1387.72 | bwd_inner_microstep: 1387.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3557
[2024-06-10 23:27:28,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.08 | bwd_microstep: 1591.68 | bwd_inner_microstep: 1591.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1924
[2024-06-10 23:27:29,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.17 | bwd_microstep: 852.44 | bwd_inner_microstep: 852.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-10 23:27:31,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1353.24 | bwd_inner_microstep: 1353.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 23:27:33,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.13 | bwd_microstep: 1645.96 | bwd_inner_microstep: 1645.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 23:27:35,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.98 | bwd_microstep: 1496.59 | bwd_inner_microstep: 1496.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-10 23:27:40,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-10 23:27:40,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.23 | bwd_microstep: 4362.20 | bwd_inner_microstep: 897.11 | bwd_allreduce_microstep: 3465.04 | step_microstep: 38.14
[2024-06-10 23:27:40,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16060.88 | bwd: 46601.52 | bwd_inner: 43135.55 | bwd_allreduce: 3465.28 | step: 39.60
{'loss': 1.2159, 'learning_rate': 5.504144255417605e-06, 'epoch': 0.77}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910
[2024-06-10 23:27:41,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.05 | bwd_microstep: 771.41 | bwd_inner_microstep: 771.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3877
[2024-06-10 23:27:43,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.78 | bwd_microstep: 1577.31 | bwd_inner_microstep: 1577.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3852
[2024-06-10 23:27:45,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.37 | bwd_microstep: 1655.93 | bwd_inner_microstep: 1655.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844
[2024-06-10 23:27:47,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.92 | bwd_microstep: 1461.15 | bwd_inner_microstep: 1461.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3497
[2024-06-10 23:27:49,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.92 | bwd_microstep: 1350.60 | bwd_inner_microstep: 1350.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-10 23:27:51,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1529.30 | bwd_inner_microstep: 1529.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2637
[2024-06-10 23:27:53,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.26 | bwd_microstep: 1022.32 | bwd_inner_microstep: 1022.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491
[2024-06-10 23:27:55,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.78 | bwd_microstep: 1410.47 | bwd_inner_microstep: 1410.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3748
[2024-06-10 23:27:57,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.62 | bwd_microstep: 1538.66 | bwd_inner_microstep: 1538.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3502
[2024-06-10 23:27:59,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.26 | bwd_microstep: 1532.17 | bwd_inner_microstep: 1532.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494
[2024-06-10 23:28:01,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1414.68 | bwd_inner_microstep: 1414.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-10 23:28:02,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.71 | bwd_microstep: 892.47 | bwd_inner_microstep: 892.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3709
[2024-06-10 23:28:04,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.86 | bwd_microstep: 1691.56 | bwd_inner_microstep: 1691.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3649
[2024-06-10 23:28:07,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.41 | bwd_microstep: 1819.97 | bwd_inner_microstep: 1819.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3654
[2024-06-10 23:28:09,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.28 | bwd_microstep: 1751.17 | bwd_inner_microstep: 1751.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-10 23:28:11,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.86 | bwd_microstep: 1615.29 | bwd_inner_microstep: 1615.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-10 23:28:13,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1389.60 | bwd_inner_microstep: 1389.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-10 23:28:16,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.19 | bwd_microstep: 1525.60 | bwd_inner_microstep: 1525.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-10 23:28:17,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1377.40 | bwd_inner_microstep: 1377.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-10 23:28:19,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.81 | bwd_microstep: 1290.69 | bwd_inner_microstep: 1290.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-10 23:28:20,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.06 | bwd_microstep: 806.13 | bwd_inner_microstep: 806.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-10 23:28:22,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.13 | bwd_microstep: 1358.49 | bwd_inner_microstep: 1358.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 23:28:24,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.91 | bwd_microstep: 1253.02 | bwd_inner_microstep: 1252.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1967
[2024-06-10 23:28:25,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.77 | bwd_microstep: 704.89 | bwd_inner_microstep: 704.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-10 23:28:27,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1506.39 | bwd_inner_microstep: 1506.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-10 23:28:29,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.74 | bwd_microstep: 1661.02 | bwd_inner_microstep: 1660.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 23:28:32,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.78 | bwd_microstep: 1640.39 | bwd_inner_microstep: 1640.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3608
[2024-06-10 23:28:34,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.29 | bwd_microstep: 1557.55 | bwd_inner_microstep: 1557.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-10 23:28:36,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1508.10 | bwd_inner_microstep: 1508.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-10 23:28:38,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1491.52 | bwd_inner_microstep: 1491.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3753
[2024-06-10 23:28:40,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.88 | bwd_microstep: 1503.41 | bwd_inner_microstep: 1503.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080
[2024-06-10 23:28:42,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.05 | optimizer_step: 6.60
[2024-06-10 23:28:42,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.48 | bwd_microstep: 2065.82 | bwd_inner_microstep: 1063.78 | bwd_allreduce_microstep: 1001.99 | step_microstep: 37.94
[2024-06-10 23:28:42,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16562.45 | bwd: 45674.51 | bwd_inner: 44671.59 | bwd_allreduce: 1002.23 | step: 39.46
 76%|███████▋  | 1317/1726 [22:46:11<7:05:42, 62.45s/it]
 76%|███████▋  | 1318/1726 [22:47:12<7:00:59, 61.91s/it]
 76%|███████▋  | 1319/1726 [22:48:13<6:58:33, 61.70s/it]
 76%|███████▋  | 1320/1726 [22:49:14<6:54:36, 61.27s/it]
 77%|███████▋  | 1321/1726 [22:50:17<6:57:04, 61.79s/it]
{'loss': 1.1806, 'learning_rate': 5.478310139320213e-06, 'epoch': 0.77}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-10 23:28:43,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.78 | bwd_microstep: 784.83 | bwd_inner_microstep: 784.66 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2056
[2024-06-10 23:28:45,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.23 | bwd_microstep: 811.44 | bwd_inner_microstep: 811.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-10 23:28:47,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1392.49 | bwd_inner_microstep: 1392.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-10 23:28:48,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.94 | bwd_microstep: 1288.25 | bwd_inner_microstep: 1288.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 23:28:50,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.36 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-10 23:28:52,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.57 | bwd_microstep: 1247.63 | bwd_inner_microstep: 1247.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 23:28:53,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1253.70 | bwd_inner_microstep: 1253.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-10 23:28:55,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.81 | bwd_microstep: 793.42 | bwd_inner_microstep: 793.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2419
[2024-06-10 23:28:56,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.48 | bwd_microstep: 1033.27 | bwd_inner_microstep: 1033.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-10 23:28:58,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1247.45 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1993
[2024-06-10 23:28:59,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.29 | bwd_microstep: 707.61 | bwd_inner_microstep: 707.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 23:29:00,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1248.46 | bwd_inner_microstep: 1248.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 23:29:02,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1390.17 | bwd_inner_microstep: 1390.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103
[2024-06-10 23:29:04,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.01 | bwd_microstep: 923.36 | bwd_inner_microstep: 923.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 23:29:06,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1474.47 | bwd_inner_microstep: 1474.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-10 23:29:08,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1438.76 | bwd_inner_microstep: 1438.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628
[2024-06-10 23:29:10,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.46 | bwd_microstep: 1436.23 | bwd_inner_microstep: 1436.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 23:29:12,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.51 | bwd_microstep: 1486.95 | bwd_inner_microstep: 1486.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3824
[2024-06-10 23:29:14,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.83 | bwd_microstep: 1390.17 | bwd_inner_microstep: 1390.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-10 23:29:16,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1605.32 | bwd_inner_microstep: 1605.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3453
[2024-06-10 23:29:17,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.46 | bwd_microstep: 1192.59 | bwd_inner_microstep: 1192.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716
[2024-06-10 23:29:19,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.58 | bwd_microstep: 1464.28 | bwd_inner_microstep: 1464.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 23:29:21,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.71 | bwd_microstep: 1281.92 | bwd_inner_microstep: 1281.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2427
[2024-06-10 23:29:23,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.49 | bwd_microstep: 968.08 | bwd_inner_microstep: 968.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-10 23:29:25,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.52 | bwd_microstep: 1449.65 | bwd_inner_microstep: 1449.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-10 23:29:27,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.22 | bwd_microstep: 1503.16 | bwd_inner_microstep: 1503.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-10 23:29:28,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.23 | bwd_microstep: 696.86 | bwd_inner_microstep: 696.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692
[2024-06-10 23:29:30,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.48 | bwd_microstep: 1422.24 | bwd_inner_microstep: 1422.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3588
[2024-06-10 23:29:32,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.49 | bwd_microstep: 1554.44 | bwd_inner_microstep: 1554.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-10 23:29:34,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.14 | bwd_microstep: 1599.63 | bwd_inner_microstep: 1599.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030
[2024-06-10 23:29:35,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.20 | bwd_microstep: 902.97 | bwd_inner_microstep: 902.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-10 23:29:44,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-10 23:29:44,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.31 | bwd_microstep: 7991.21 | bwd_inner_microstep: 1693.04 | bwd_allreduce_microstep: 6298.11 | step_microstep: 38.88
[2024-06-10 23:29:44,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14902.47 | bwd: 46167.14 | bwd_inner: 39867.99 | bwd_allreduce: 6298.41 | step: 40.48
{'loss': 1.1919, 'learning_rate': 5.452527168539026e-06, 'epoch': 0.77}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1916
[2024-06-10 23:29:45,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.72 | bwd_microstep: 866.09 | bwd_inner_microstep: 866.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-10 23:29:47,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.92 | bwd_microstep: 1245.48 | bwd_inner_microstep: 1245.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3834
[2024-06-10 23:29:49,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.13 | bwd_microstep: 1414.64 | bwd_inner_microstep: 1414.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 23:29:50,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1272.75 | bwd_inner_microstep: 1272.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 23:29:52,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1242.36 | bwd_inner_microstep: 1242.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3837
[2024-06-10 23:29:54,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.92 | bwd_microstep: 1602.43 | bwd_inner_microstep: 1602.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 23:29:56,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.41 | bwd_microstep: 1246.47 | bwd_inner_microstep: 1246.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-10 23:29:58,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.11 | bwd_microstep: 1341.88 | bwd_inner_microstep: 1341.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-10 23:30:00,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1247.57 | bwd_inner_microstep: 1247.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-10 23:30:01,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.23 | bwd_microstep: 1250.43 | bwd_inner_microstep: 1250.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3993
[2024-06-10 23:30:04,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 696.11 | bwd_microstep: 1913.22 | bwd_inner_microstep: 1913.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3604
[2024-06-10 23:30:06,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.08 | bwd_microstep: 1370.64 | bwd_inner_microstep: 1370.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2891
[2024-06-10 23:30:08,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.52 | bwd_microstep: 1183.08 | bwd_inner_microstep: 1183.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-10 23:30:10,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.06 | bwd_microstep: 1613.58 | bwd_inner_microstep: 1613.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637
[2024-06-10 23:30:12,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.54 | bwd_microstep: 1522.32 | bwd_inner_microstep: 1522.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 23:30:14,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.62 | bwd_microstep: 1482.63 | bwd_inner_microstep: 1482.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3643
[2024-06-10 23:30:16,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.67 | bwd_microstep: 1343.60 | bwd_inner_microstep: 1343.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611
[2024-06-10 23:30:18,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.82 | bwd_microstep: 1608.39 | bwd_inner_microstep: 1608.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2129
[2024-06-10 23:30:19,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.01 | bwd_microstep: 798.60 | bwd_inner_microstep: 798.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-10 23:30:20,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.37 | bwd_microstep: 796.80 | bwd_inner_microstep: 796.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2078
[2024-06-10 23:30:21,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.28 | bwd_microstep: 819.34 | bwd_inner_microstep: 819.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3608
[2024-06-10 23:30:23,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.70 | bwd_microstep: 1310.50 | bwd_inner_microstep: 1310.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702
[2024-06-10 23:30:25,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.28 | bwd_microstep: 1433.92 | bwd_inner_microstep: 1433.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3464
[2024-06-10 23:30:27,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.62 | bwd_microstep: 1214.28 | bwd_inner_microstep: 1214.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-10 23:30:29,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1399.63 | bwd_inner_microstep: 1399.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2082
[2024-06-10 23:30:30,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.85 | bwd_microstep: 918.60 | bwd_inner_microstep: 918.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-10 23:30:32,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.70 | bwd_microstep: 1299.64 | bwd_inner_microstep: 1299.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3468
[2024-06-10 23:30:33,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.12 | bwd_microstep: 1182.44 | bwd_inner_microstep: 1182.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3562
[2024-06-10 23:30:35,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.69 | bwd_microstep: 1363.28 | bwd_inner_microstep: 1363.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-10 23:30:37,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1483.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3824
[2024-06-10 23:30:39,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.79 | bwd_microstep: 1513.22 | bwd_inner_microstep: 1513.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3382
[2024-06-10 23:30:48,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.58
[2024-06-10 23:30:48,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.31 | bwd_microstep: 7762.84 | bwd_inner_microstep: 1441.69 | bwd_allreduce_microstep: 6321.09 | step_microstep: 37.95
[2024-06-10 23:30:48,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15603.10 | bwd: 48063.73 | bwd_inner: 41741.70 | bwd_allreduce: 6321.33 | step: 39.46
{'loss': 1.1705, 'learning_rate': 5.426795433881527e-06, 'epoch': 0.77}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 23:30:50,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1363.82 | bwd_inner_microstep: 1363.74 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3459
[2024-06-10 23:30:51,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.36 | bwd_microstep: 1208.04 | bwd_inner_microstep: 1208.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 23:30:53,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.63 | bwd_microstep: 1472.05 | bwd_inner_microstep: 1472.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-10 23:30:55,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1379.14 | bwd_inner_microstep: 1379.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-10 23:30:57,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.44 | bwd_microstep: 1370.19 | bwd_inner_microstep: 1370.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3521
[2024-06-10 23:30:59,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.18 | bwd_microstep: 1193.12 | bwd_inner_microstep: 1193.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 23:31:00,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 795.32 | bwd_inner_microstep: 795.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-10 23:31:02,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.88 | bwd_microstep: 1273.75 | bwd_inner_microstep: 1273.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3968
[2024-06-10 23:31:04,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.04 | bwd_microstep: 1694.35 | bwd_inner_microstep: 1694.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-10 23:31:06,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1246.33 | bwd_inner_microstep: 1246.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-10 23:31:08,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.34 | bwd_microstep: 1275.85 | bwd_inner_microstep: 1275.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 11, images per sample: 2.75, dynamic token length: 1101
[2024-06-10 23:31:08,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 189.99 | bwd_microstep: 500.54 | bwd_inner_microstep: 500.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692
[2024-06-10 23:31:10,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.57 | bwd_microstep: 1325.73 | bwd_inner_microstep: 1325.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-10 23:31:12,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.66 | bwd_microstep: 1520.20 | bwd_inner_microstep: 1520.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3931
[2024-06-10 23:31:15,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.31 | bwd_microstep: 1737.08 | bwd_inner_microstep: 1737.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3383
[2024-06-10 23:31:17,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.29 | bwd_microstep: 1432.06 | bwd_inner_microstep: 1432.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 23:31:18,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.04 | bwd_microstep: 1380.22 | bwd_inner_microstep: 1380.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933
[2024-06-10 23:31:19,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.54 | bwd_microstep: 726.92 | bwd_inner_microstep: 726.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3642
[2024-06-10 23:31:22,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 1536.98 | bwd_inner_microstep: 1536.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-10 23:31:23,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.60 | bwd_microstep: 701.79 | bwd_inner_microstep: 701.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-10 23:31:25,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1552.83 | bwd_inner_microstep: 1552.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-10 23:31:27,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1505.43 | bwd_inner_microstep: 1505.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2287
[2024-06-10 23:31:28,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.51 | bwd_microstep: 1070.53 | bwd_inner_microstep: 1070.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004
[2024-06-10 23:31:29,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.33 | bwd_microstep: 896.05 | bwd_inner_microstep: 896.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-10 23:31:32,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.73 | bwd_microstep: 1600.30 | bwd_inner_microstep: 1600.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-10 23:31:34,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1346.42 | bwd_inner_microstep: 1346.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3382
[2024-06-10 23:31:35,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.58 | bwd_microstep: 1273.39 | bwd_inner_microstep: 1273.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-10 23:31:37,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1465.62 | bwd_inner_microstep: 1465.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568
[2024-06-10 23:31:39,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.81 | bwd_microstep: 1302.18 | bwd_inner_microstep: 1302.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3562
[2024-06-10 23:31:41,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1473.12 | bwd_inner_microstep: 1473.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 23:31:43,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1250.54 | bwd_inner_microstep: 1250.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3736
[2024-06-10 23:31:50,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-10 23:31:50,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.98 | bwd_microstep: 6042.72 | bwd_inner_microstep: 1623.09 | bwd_allreduce_microstep: 4419.57 | step_microstep: 37.85
[2024-06-10 23:31:50,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15507.76 | bwd: 45912.66 | bwd_inner: 41492.12 | bwd_allreduce: 4419.85 | step: 39.39
{'loss': 1.1718, 'learning_rate': 5.40111502597475e-06, 'epoch': 0.77}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-10 23:31:51,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.95 | bwd_microstep: 1374.52 | bwd_inner_microstep: 1374.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1857
[2024-06-10 23:31:52,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.12 | bwd_microstep: 740.80 | bwd_inner_microstep: 740.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851
[2024-06-10 23:31:55,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.62 | bwd_microstep: 1659.27 | bwd_inner_microstep: 1659.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-10 23:31:57,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.40 | bwd_microstep: 1380.93 | bwd_inner_microstep: 1380.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 23:31:58,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.21 | bwd_microstep: 1277.30 | bwd_inner_microstep: 1277.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 23:32:00,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.61 | bwd_microstep: 1275.34 | bwd_inner_microstep: 1275.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799
[2024-06-10 23:32:02,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.39 | bwd_microstep: 1550.22 | bwd_inner_microstep: 1550.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-10 23:32:04,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.95 | bwd_microstep: 1278.95 | bwd_inner_microstep: 1278.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-10 23:32:05,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.49 | bwd_microstep: 806.05 | bwd_inner_microstep: 806.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502
[2024-06-10 23:32:07,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.87 | bwd_microstep: 1187.80 | bwd_inner_microstep: 1187.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522
[2024-06-10 23:32:09,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.32 | bwd_microstep: 1420.48 | bwd_inner_microstep: 1420.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-10 23:32:11,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.48 | bwd_microstep: 1483.88 | bwd_inner_microstep: 1483.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3694
[2024-06-10 23:32:13,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.32 | bwd_microstep: 1565.85 | bwd_inner_microstep: 1565.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-10 23:32:15,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1393.64 | bwd_inner_microstep: 1393.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3489
[2024-06-10 23:32:17,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1543.32 | bwd_inner_microstep: 1543.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2079
[2024-06-10 23:32:18,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.90 | bwd_microstep: 819.11 | bwd_inner_microstep: 819.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3423
[2024-06-10 23:32:20,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.76 | bwd_microstep: 1369.28 | bwd_inner_microstep: 1369.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1965
[2024-06-10 23:32:21,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.18 | bwd_microstep: 702.56 | bwd_inner_microstep: 702.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3495
[2024-06-10 23:32:23,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.78 | bwd_microstep: 1431.49 | bwd_inner_microstep: 1431.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-10 23:32:25,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1388.01 | bwd_inner_microstep: 1387.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-10 23:32:27,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1399.01 | bwd_inner_microstep: 1398.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-10 23:32:29,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.53 | bwd_microstep: 1414.31 | bwd_inner_microstep: 1414.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 23:32:31,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.64 | bwd_microstep: 1456.71 | bwd_inner_microstep: 1456.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3600
[2024-06-10 23:32:33,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.06 | bwd_microstep: 1516.87 | bwd_inner_microstep: 1516.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612
[2024-06-10 23:32:35,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1505.97 | bwd_inner_microstep: 1505.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-10 23:32:37,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.29 | bwd_microstep: 1461.82 | bwd_inner_microstep: 1461.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-10 23:32:39,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.35 | bwd_microstep: 1646.56 | bwd_inner_microstep: 1646.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-10 23:32:41,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.32 | bwd_microstep: 1475.80 | bwd_inner_microstep: 1475.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3461
[2024-06-10 23:32:43,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.03 | bwd_microstep: 1426.17 | bwd_inner_microstep: 1426.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2083
[2024-06-10 23:32:44,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.33 | bwd_microstep: 820.44 | bwd_inner_microstep: 820.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3737
[2024-06-10 23:32:47,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.58 | bwd_microstep: 1696.52 | bwd_inner_microstep: 1696.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3777
[2024-06-10 23:32:50,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 23:32:50,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.56 | bwd_microstep: 3035.71 | bwd_inner_microstep: 1784.50 | bwd_allreduce_microstep: 1251.17 | step_microstep: 37.98
[2024-06-10 23:32:50,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16092.56 | bwd: 44504.71 | bwd_inner: 43252.64 | bwd_allreduce: 1251.40 | step: 39.46
{'loss': 1.193, 'learning_rate': 5.375486035264961e-06, 'epoch': 0.77}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466
[2024-06-10 23:32:53,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.46 | bwd_microstep: 1560.30 | bwd_inner_microstep: 1560.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-10 23:32:55,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1475.45 | bwd_inner_microstep: 1475.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-10 23:32:57,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.12 | bwd_microstep: 1548.90 | bwd_inner_microstep: 1548.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-10 23:32:59,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.88 | bwd_microstep: 1250.31 | bwd_inner_microstep: 1250.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3439
[2024-06-10 23:33:00,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.45 | bwd_microstep: 1153.98 | bwd_inner_microstep: 1153.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-10 23:33:02,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1482.28 | bwd_inner_microstep: 1482.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-10 23:33:03,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 797.54 | bwd_inner_microstep: 797.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3907
[2024-06-10 23:33:05,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1359.28 | bwd_inner_microstep: 1359.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3483
[2024-06-10 23:33:07,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1412.57 | bwd_inner_microstep: 1412.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3711
[2024-06-10 23:33:09,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.63 | bwd_microstep: 1677.81 | bwd_inner_microstep: 1677.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-10 23:33:11,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1254.26 | bwd_inner_microstep: 1254.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3491
[2024-06-10 23:33:13,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1332.94 | bwd_inner_microstep: 1332.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-10 23:33:15,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.12 | bwd_microstep: 1512.24 | bwd_inner_microstep: 1512.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-10 23:33:17,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.58 | bwd_microstep: 1515.35 | bwd_inner_microstep: 1515.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 23:33:19,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1383.07 | bwd_inner_microstep: 1383.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3188
[2024-06-10 23:33:21,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.88 | bwd_microstep: 1168.75 | bwd_inner_microstep: 1168.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 23:33:23,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.52 | bwd_microstep: 1579.79 | bwd_inner_microstep: 1579.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-10 23:33:25,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.41 | bwd_microstep: 1275.49 | bwd_inner_microstep: 1275.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529
[2024-06-10 23:33:27,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.78 | bwd_microstep: 1418.88 | bwd_inner_microstep: 1418.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 23:33:29,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.40 | bwd_microstep: 1396.34 | bwd_inner_microstep: 1396.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617
[2024-06-10 23:33:31,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1610.47 | bwd_inner_microstep: 1610.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 23:33:33,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.85 | bwd_microstep: 1285.89 | bwd_inner_microstep: 1285.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2186
[2024-06-10 23:33:34,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.59 | bwd_microstep: 764.77 | bwd_inner_microstep: 764.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 23:33:35,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.90 | bwd_microstep: 1356.06 | bwd_inner_microstep: 1356.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 23:33:37,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.13 | bwd_microstep: 802.54 | bwd_inner_microstep: 802.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2067
[2024-06-10 23:33:38,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.30 | bwd_microstep: 874.53 | bwd_inner_microstep: 874.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 23:33:40,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1552.99 | bwd_inner_microstep: 1552.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-10 23:33:42,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1393.69 | bwd_inner_microstep: 1393.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-10 23:33:44,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.90 | bwd_microstep: 1358.11 | bwd_inner_microstep: 1358.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2277
[2024-06-10 23:33:45,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.08 | bwd_microstep: 820.28 | bwd_inner_microstep: 820.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3564
[2024-06-10 23:33:47,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.14 | bwd_microstep: 1563.52 | bwd_inner_microstep: 1563.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-10 23:33:51,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.55 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-10 23:33:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.01 | bwd_microstep: 3743.03 | bwd_inner_microstep: 1697.21 | bwd_allreduce_microstep: 2045.76 | step_microstep: 38.87
[2024-06-10 23:33:51,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15926.94 | bwd: 44681.46 | bwd_inner: 42634.78 | bwd_allreduce: 2045.99 | step: 40.35
 77%|███████▋  | 1322/1726 [22:51:19<6:57:38, 62.03s/it]
 77%|███████▋  | 1323/1726 [22:52:20<6:55:20, 61.84s/it]
 77%|███████▋  | 1324/1726 [22:53:24<6:58:39, 62.49s/it]
 77%|███████▋  | 1325/1726 [22:54:26<6:56:08, 62.27s/it]
 77%|███████▋  | 1326/1726 [22:55:27<6:52:27, 61.87s/it]
{'loss': 1.2503, 'learning_rate': 5.349908552017323e-06, 'epoch': 0.77}
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1906
[2024-06-10 23:33:52,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.56 | bwd_microstep: 738.61 | bwd_inner_microstep: 738.52 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 969
[2024-06-10 23:33:53,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 150.01 | bwd_microstep: 385.61 | bwd_inner_microstep: 385.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3883
[2024-06-10 23:33:55,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.89 | bwd_microstep: 1577.62 | bwd_inner_microstep: 1577.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3403
[2024-06-10 23:33:57,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.13 | bwd_microstep: 1304.64 | bwd_inner_microstep: 1304.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 23:33:59,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.05 | bwd_microstep: 1277.82 | bwd_inner_microstep: 1277.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-10 23:34:00,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.31 | bwd_microstep: 787.55 | bwd_inner_microstep: 787.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3405
[2024-06-10 23:34:01,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.73 | bwd_microstep: 1210.61 | bwd_inner_microstep: 1210.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1910
[2024-06-10 23:34:02,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.02 | bwd_microstep: 715.18 | bwd_inner_microstep: 715.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 23:34:05,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.93 | bwd_microstep: 1525.95 | bwd_inner_microstep: 1525.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1896
[2024-06-10 23:34:06,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.82 | bwd_microstep: 715.20 | bwd_inner_microstep: 715.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 23:34:07,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1388.20 | bwd_inner_microstep: 1388.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3501
[2024-06-10 23:34:09,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.43 | bwd_microstep: 1251.77 | bwd_inner_microstep: 1251.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2594
[2024-06-10 23:34:11,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.47 | bwd_microstep: 968.14 | bwd_inner_microstep: 968.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-10 23:34:13,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1483.54 | bwd_inner_microstep: 1483.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692
[2024-06-10 23:34:15,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.51 | bwd_microstep: 1525.05 | bwd_inner_microstep: 1525.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-10 23:34:17,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.41 | bwd_microstep: 1612.88 | bwd_inner_microstep: 1612.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481
[2024-06-10 23:34:19,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.09 | bwd_microstep: 1573.15 | bwd_inner_microstep: 1573.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 23:34:21,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1376.03 | bwd_inner_microstep: 1376.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3553
[2024-06-10 23:34:23,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.98 | bwd_microstep: 1525.87 | bwd_inner_microstep: 1525.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-10 23:34:25,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.05 | bwd_microstep: 1492.79 | bwd_inner_microstep: 1492.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 23:34:27,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.86 | bwd_microstep: 974.94 | bwd_inner_microstep: 974.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 23:34:28,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.15 | bwd_microstep: 1284.57 | bwd_inner_microstep: 1284.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-10 23:34:30,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.20 | bwd_microstep: 1160.40 | bwd_inner_microstep: 1160.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2402
[2024-06-10 23:34:31,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.85 | bwd_microstep: 1098.24 | bwd_inner_microstep: 1098.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-10 23:34:33,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.35 | bwd_microstep: 1377.48 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-10 23:34:36,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.76 | bwd_microstep: 1638.92 | bwd_inner_microstep: 1638.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1999
[2024-06-10 23:34:37,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.78 | bwd_microstep: 894.49 | bwd_inner_microstep: 894.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-10 23:34:39,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1489.09 | bwd_inner_microstep: 1489.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-10 23:34:41,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.81 | bwd_microstep: 1496.94 | bwd_inner_microstep: 1496.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-10 23:34:43,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.82 | bwd_microstep: 1282.80 | bwd_inner_microstep: 1282.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-10 23:34:45,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.06 | bwd_microstep: 1315.72 | bwd_inner_microstep: 1315.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2198
[2024-06-10 23:35:39,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-10 23:35:39,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.67 | bwd_microstep: 53625.70 | bwd_inner_microstep: 1010.94 | bwd_allreduce_microstep: 52614.69 | step_microstep: 38.85
[2024-06-10 23:35:39,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14754.87 | bwd: 92075.53 | bwd_inner: 39459.85 | bwd_allreduce: 52614.98 | step: 40.31
{'loss': 1.2153, 'learning_rate': 5.32438266631561e-06, 'epoch': 0.77}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-10 23:35:41,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.79 | bwd_microstep: 1456.01 | bwd_inner_microstep: 1455.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3979
[2024-06-10 23:35:43,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.76 | bwd_microstep: 1689.42 | bwd_inner_microstep: 1689.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-10 23:35:45,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1368.57 | bwd_inner_microstep: 1368.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-10 23:35:46,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.97 | bwd_microstep: 1238.46 | bwd_inner_microstep: 1238.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-10 23:35:48,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 1398.14 | bwd_inner_microstep: 1398.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-10 23:35:50,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.93 | bwd_microstep: 1495.79 | bwd_inner_microstep: 1495.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 23:35:52,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.82 | bwd_microstep: 1381.20 | bwd_inner_microstep: 1381.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2204
[2024-06-10 23:35:54,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.78 | bwd_microstep: 886.28 | bwd_inner_microstep: 886.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-10 23:35:55,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.18 | bwd_microstep: 787.49 | bwd_inner_microstep: 787.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685
[2024-06-10 23:35:57,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1519.69 | bwd_inner_microstep: 1519.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3498
[2024-06-10 23:36:52,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.47 | bwd_microstep: 1530.41 | bwd_inner_microstep: 1530.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-10 23:36:53,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.68 | bwd_microstep: 1236.58 | bwd_inner_microstep: 1236.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3663
[2024-06-10 23:36:56,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.01 | bwd_microstep: 1481.66 | bwd_inner_microstep: 1481.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1914
[2024-06-10 23:36:57,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.50 | bwd_microstep: 774.04 | bwd_inner_microstep: 774.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3634
[2024-06-10 23:36:59,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.18 | bwd_microstep: 1794.51 | bwd_inner_microstep: 1794.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-10 23:37:01,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.03 | bwd_microstep: 1474.79 | bwd_inner_microstep: 1474.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-10 23:37:03,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.77 | bwd_microstep: 1504.49 | bwd_inner_microstep: 1504.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-10 23:37:05,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.39 | bwd_microstep: 1281.78 | bwd_inner_microstep: 1281.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 23:37:07,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.89 | bwd_microstep: 1450.84 | bwd_inner_microstep: 1450.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668
[2024-06-10 23:37:09,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.69 | bwd_microstep: 1421.21 | bwd_inner_microstep: 1421.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-10 23:37:11,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.00 | bwd_microstep: 1290.73 | bwd_inner_microstep: 1290.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-10 23:37:12,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1252.92 | bwd_inner_microstep: 1252.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-10 23:37:14,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.25 | bwd_microstep: 1426.40 | bwd_inner_microstep: 1426.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 23:37:17,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.48 | bwd_microstep: 1554.55 | bwd_inner_microstep: 1554.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3775
[2024-06-10 23:37:18,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.44 | bwd_microstep: 1353.31 | bwd_inner_microstep: 1353.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-10 23:37:20,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.66 | bwd_microstep: 1293.53 | bwd_inner_microstep: 1293.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-10 23:37:22,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1440.79 | bwd_inner_microstep: 1440.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3826
[2024-06-10 23:37:25,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.24 | bwd_microstep: 1721.81 | bwd_inner_microstep: 1721.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281
[2024-06-10 23:37:26,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.45 | bwd_microstep: 914.30 | bwd_inner_microstep: 914.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-10 23:37:27,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.18 | bwd_microstep: 803.79 | bwd_inner_microstep: 803.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3609
[2024-06-10 23:37:29,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.72 | bwd_microstep: 1471.78 | bwd_inner_microstep: 1471.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-10 23:37:33,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 23:37:33,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.22 | bwd_microstep: 3449.17 | bwd_inner_microstep: 1461.77 | bwd_allreduce_microstep: 1987.35 | step_microstep: 37.77
[2024-06-10 23:37:33,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16073.04 | bwd: 45144.44 | bwd_inner: 43156.19 | bwd_allreduce: 1987.58 | step: 39.25
{'loss': 1.1987, 'learning_rate': 5.298908468061859e-06, 'epoch': 0.77}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3406
[2024-06-10 23:37:35,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.85 | bwd_microstep: 1434.81 | bwd_inner_microstep: 1434.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876
[2024-06-10 23:37:36,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.05 | bwd_microstep: 677.73 | bwd_inner_microstep: 677.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-10 23:37:38,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.95 | bwd_microstep: 1477.41 | bwd_inner_microstep: 1477.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3872
[2024-06-10 23:37:40,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.48 | bwd_microstep: 1660.40 | bwd_inner_microstep: 1660.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-10 23:37:42,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.44 | bwd_microstep: 1637.48 | bwd_inner_microstep: 1637.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-10 23:37:44,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1245.77 | bwd_inner_microstep: 1245.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-10 23:37:45,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.50 | bwd_microstep: 793.04 | bwd_inner_microstep: 793.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-10 23:37:47,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.91 | bwd_microstep: 1249.52 | bwd_inner_microstep: 1249.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2095
[2024-06-10 23:37:48,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.91 | bwd_microstep: 822.18 | bwd_inner_microstep: 822.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-10 23:37:50,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.96 | bwd_microstep: 1275.50 | bwd_inner_microstep: 1275.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-10 23:37:51,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 790.83 | bwd_inner_microstep: 790.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-10 23:37:53,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.82 | bwd_microstep: 1276.38 | bwd_inner_microstep: 1276.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 23:37:55,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.01 | bwd_microstep: 1382.88 | bwd_inner_microstep: 1382.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-10 23:37:57,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1485.43 | bwd_inner_microstep: 1485.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2934
[2024-06-10 23:37:58,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.79 | bwd_microstep: 1194.00 | bwd_inner_microstep: 1193.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3620
[2024-06-10 23:38:00,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.41 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 658
[2024-06-10 23:38:01,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.32 | bwd_microstep: 275.68 | bwd_inner_microstep: 275.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-10 23:38:02,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.61 | bwd_microstep: 1286.81 | bwd_inner_microstep: 1286.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-10 23:38:04,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.64 | bwd_microstep: 1252.51 | bwd_inner_microstep: 1252.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3702
[2024-06-10 23:38:06,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.75 | bwd_microstep: 1244.46 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-10 23:38:07,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.97 | bwd_microstep: 799.10 | bwd_inner_microstep: 799.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-10 23:38:09,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.67 | bwd_microstep: 1286.69 | bwd_inner_microstep: 1286.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 23:38:11,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1377.24 | bwd_inner_microstep: 1377.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2000
[2024-06-10 23:38:12,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.22 | bwd_microstep: 707.49 | bwd_inner_microstep: 707.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-10 23:38:13,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.83 | bwd_microstep: 1190.63 | bwd_inner_microstep: 1190.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-10 23:38:15,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.90 | bwd_microstep: 1254.98 | bwd_inner_microstep: 1254.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3495
[2024-06-10 23:38:17,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.89 | bwd_microstep: 1219.76 | bwd_inner_microstep: 1219.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2016
[2024-06-10 23:38:18,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.84 | bwd_microstep: 897.43 | bwd_inner_microstep: 897.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555
[2024-06-10 23:38:20,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.96 | bwd_microstep: 1203.44 | bwd_inner_microstep: 1203.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-10 23:38:22,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.55 | bwd_microstep: 1447.31 | bwd_inner_microstep: 1447.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-10 23:38:24,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.60 | bwd_microstep: 1645.09 | bwd_inner_microstep: 1645.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-10 23:38:32,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.10 | optimizer_step: 6.58
[2024-06-10 23:38:32,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.12 | bwd_microstep: 7823.85 | bwd_inner_microstep: 1809.09 | bwd_allreduce_microstep: 6014.70 | step_microstep: 37.94
[2024-06-10 23:38:32,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14465.74 | bwd: 44656.61 | bwd_inner: 38641.01 | bwd_allreduce: 6014.93 | step: 39.38
{'loss': 1.2046, 'learning_rate': 5.273486046976057e-06, 'epoch': 0.77}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 23:38:34,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.17 | bwd_microstep: 1374.05 | bwd_inner_microstep: 1374.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-10 23:38:36,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.40 | bwd_microstep: 1287.70 | bwd_inner_microstep: 1287.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-10 23:38:38,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.10 | bwd_microstep: 1277.49 | bwd_inner_microstep: 1277.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-10 23:38:40,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.26 | bwd_microstep: 1337.86 | bwd_inner_microstep: 1337.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-10 23:38:42,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.67 | bwd_microstep: 1533.29 | bwd_inner_microstep: 1533.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 23:38:43,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.74 | bwd_microstep: 1247.44 | bwd_inner_microstep: 1247.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-10 23:38:45,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.04 | bwd_microstep: 1245.99 | bwd_inner_microstep: 1245.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-10 23:38:47,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.27 | bwd_microstep: 1281.06 | bwd_inner_microstep: 1281.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 23:38:49,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1282.68 | bwd_inner_microstep: 1282.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742
[2024-06-10 23:38:51,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.51 | bwd_microstep: 1635.87 | bwd_inner_microstep: 1635.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-10 23:38:53,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.50 | bwd_microstep: 1312.49 | bwd_inner_microstep: 1312.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495
[2024-06-10 23:38:55,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1444.93 | bwd_inner_microstep: 1444.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-10 23:38:57,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1348.14 | bwd_inner_microstep: 1348.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-10 23:38:59,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.22 | bwd_microstep: 1602.63 | bwd_inner_microstep: 1602.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2107
[2024-06-10 23:39:00,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.64 | bwd_microstep: 1016.18 | bwd_inner_microstep: 1016.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2134
[2024-06-10 23:39:02,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.31 | bwd_microstep: 929.16 | bwd_inner_microstep: 929.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 23:39:04,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.26 | bwd_microstep: 1555.03 | bwd_inner_microstep: 1555.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-10 23:39:05,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1283.70 | bwd_inner_microstep: 1283.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505
[2024-06-10 23:39:07,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1350.79 | bwd_inner_microstep: 1350.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-10 23:39:09,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.42 | bwd_microstep: 1497.11 | bwd_inner_microstep: 1497.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-10 23:39:12,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.15 | bwd_microstep: 1658.84 | bwd_inner_microstep: 1658.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-10 23:39:14,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.29 | bwd_microstep: 1411.53 | bwd_inner_microstep: 1411.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2078
[2024-06-10 23:39:15,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.24 | bwd_microstep: 820.21 | bwd_inner_microstep: 820.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2042
[2024-06-10 23:39:16,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.12 | bwd_microstep: 840.36 | bwd_inner_microstep: 840.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3545
[2024-06-10 23:39:18,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.58 | bwd_microstep: 1418.87 | bwd_inner_microstep: 1418.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-10 23:39:20,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1431.45 | bwd_inner_microstep: 1431.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2673
[2024-06-10 23:39:21,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.41 | bwd_microstep: 1119.01 | bwd_inner_microstep: 1118.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3560
[2024-06-10 23:39:24,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.88 | bwd_microstep: 1542.93 | bwd_inner_microstep: 1542.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 23:39:25,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1348.29 | bwd_inner_microstep: 1348.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-10 23:39:28,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.52 | bwd_microstep: 1647.05 | bwd_inner_microstep: 1647.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1884
[2024-06-10 23:39:29,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.00 | bwd_microstep: 758.73 | bwd_inner_microstep: 758.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3583
[2024-06-10 23:39:35,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.09 | optimizer_step: 6.63
[2024-06-10 23:39:35,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.19 | bwd_microstep: 5580.89 | bwd_inner_microstep: 2039.96 | bwd_allreduce_microstep: 3540.88 | step_microstep: 37.73
[2024-06-10 23:39:35,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15858.22 | bwd: 46421.76 | bwd_inner: 42879.98 | bwd_allreduce: 3541.10 | step: 39.14
{'loss': 1.1769, 'learning_rate': 5.248115492595837e-06, 'epoch': 0.77}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-10 23:39:37,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.23 | bwd_microstep: 1271.61 | bwd_inner_microstep: 1271.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859
[2024-06-10 23:39:39,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.09 | bwd_microstep: 1560.54 | bwd_inner_microstep: 1560.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4075
[2024-06-10 23:39:41,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.68 | bwd_microstep: 1720.42 | bwd_inner_microstep: 1720.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793
[2024-06-10 23:39:44,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.99 | bwd_microstep: 1650.22 | bwd_inner_microstep: 1650.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2253
[2024-06-10 23:39:45,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.37 | bwd_microstep: 868.34 | bwd_inner_microstep: 868.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-10 23:39:46,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.79 | bwd_microstep: 1286.38 | bwd_inner_microstep: 1286.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-10 23:39:48,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1350.97 | bwd_inner_microstep: 1350.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-10 23:39:50,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.91 | bwd_microstep: 1284.46 | bwd_inner_microstep: 1284.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-10 23:39:51,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.69 | bwd_microstep: 798.05 | bwd_inner_microstep: 798.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-10 23:39:53,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.15 | bwd_microstep: 1342.86 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509
[2024-06-10 23:39:55,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.00 | bwd_microstep: 1433.30 | bwd_inner_microstep: 1433.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-10 23:39:56,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.95 | bwd_microstep: 683.89 | bwd_inner_microstep: 683.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578
[2024-06-10 23:39:58,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.56 | bwd_microstep: 1600.78 | bwd_inner_microstep: 1600.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969
[2024-06-10 23:39:59,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.16 | bwd_microstep: 892.53 | bwd_inner_microstep: 892.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3660
[2024-06-10 23:40:02,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.79 | bwd_microstep: 1568.41 | bwd_inner_microstep: 1568.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642
[2024-06-10 23:40:04,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.90 | bwd_microstep: 1708.07 | bwd_inner_microstep: 1708.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-10 23:40:06,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 1654.02 | bwd_inner_microstep: 1653.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3539
[2024-06-10 23:40:08,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.43 | bwd_microstep: 1342.09 | bwd_inner_microstep: 1342.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2030
[2024-06-10 23:40:09,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.38 | bwd_microstep: 743.91 | bwd_inner_microstep: 743.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3458
[2024-06-10 23:40:11,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.59 | bwd_microstep: 1337.67 | bwd_inner_microstep: 1337.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-10 23:40:13,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.25 | bwd_microstep: 1615.93 | bwd_inner_microstep: 1615.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3674
[2024-06-10 23:40:15,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.16 | bwd_microstep: 1326.78 | bwd_inner_microstep: 1326.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3901
[2024-06-10 23:40:18,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 689.83 | bwd_microstep: 1898.82 | bwd_inner_microstep: 1898.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3723
[2024-06-10 23:40:20,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.08 | bwd_microstep: 1465.44 | bwd_inner_microstep: 1465.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-10 23:40:22,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1492.83 | bwd_inner_microstep: 1492.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2285
[2024-06-10 23:40:23,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.86 | bwd_microstep: 910.06 | bwd_inner_microstep: 910.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3725
[2024-06-10 23:40:25,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.63 | bwd_microstep: 1399.13 | bwd_inner_microstep: 1399.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3684
[2024-06-10 23:40:27,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1426.35 | bwd_inner_microstep: 1426.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-10 23:40:29,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.74 | bwd_microstep: 1594.63 | bwd_inner_microstep: 1594.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2066
[2024-06-10 23:40:30,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.25 | bwd_microstep: 1007.88 | bwd_inner_microstep: 1007.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3438
[2024-06-10 23:40:32,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.79 | bwd_microstep: 1413.22 | bwd_inner_microstep: 1413.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 23:40:39,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-10 23:40:39,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.85 | bwd_microstep: 6069.64 | bwd_inner_microstep: 1754.79 | bwd_allreduce_microstep: 4314.80 | step_microstep: 37.96
[2024-06-10 23:40:39,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16104.52 | bwd: 47719.25 | bwd_inner: 43403.56 | bwd_allreduce: 4315.03 | step: 39.36
 77%|███████▋  | 1327/1726 [22:56:28<6:49:34, 61.59s/it]
 77%|███████▋  | 1328/1726 [22:58:15<8:19:13, 75.26s/it]
 77%|███████▋  | 1329/1726 [23:00:10<9:35:35, 86.99s/it]
 77%|███████▋  | 1330/1726 [23:01:09<8:39:35, 78.73s/it]
 77%|███████▋  | 1331/1726 [23:02:12<8:06:26, 73.89s/it]
{'loss': 1.1704, 'learning_rate': 5.222796894276172e-06, 'epoch': 0.77}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-10 23:40:41,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.48 | bwd_microstep: 1346.93 | bwd_inner_microstep: 1346.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2674
[2024-06-10 23:40:43,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.35 | bwd_microstep: 1215.21 | bwd_inner_microstep: 1215.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 23:40:45,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1392.72 | bwd_inner_microstep: 1392.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-10 23:40:46,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.33 | bwd_microstep: 1284.05 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-10 23:40:48,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.62 | bwd_microstep: 1475.21 | bwd_inner_microstep: 1475.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 23:40:50,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1401.55 | bwd_inner_microstep: 1401.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610
[2024-06-10 23:40:52,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1341.42 | bwd_inner_microstep: 1341.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-10 23:40:53,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.02 | bwd_microstep: 677.41 | bwd_inner_microstep: 677.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-10 23:40:55,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.86 | bwd_microstep: 1527.45 | bwd_inner_microstep: 1527.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3755
[2024-06-10 23:40:57,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.60 | bwd_microstep: 1442.55 | bwd_inner_microstep: 1442.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 23:40:59,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.42 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 23:41:01,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.50 | bwd_microstep: 1285.80 | bwd_inner_microstep: 1285.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 23:41:03,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.96 | bwd_microstep: 1394.47 | bwd_inner_microstep: 1394.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-10 23:41:05,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1387.29 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3522
[2024-06-10 23:41:07,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1550.80 | bwd_inner_microstep: 1550.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 23:41:09,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.69 | bwd_microstep: 1483.06 | bwd_inner_microstep: 1483.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3442
[2024-06-10 23:41:11,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.87 | bwd_microstep: 1265.39 | bwd_inner_microstep: 1265.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625
[2024-06-10 23:41:12,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1315.17 | bwd_inner_microstep: 1315.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491
[2024-06-10 23:41:14,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1416.30 | bwd_inner_microstep: 1416.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2933
[2024-06-10 23:41:16,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.23 | bwd_microstep: 1191.90 | bwd_inner_microstep: 1191.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3691
[2024-06-10 23:41:18,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1432.73 | bwd_inner_microstep: 1432.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-10 23:41:20,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.62 | bwd_microstep: 1376.78 | bwd_inner_microstep: 1376.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 23:41:22,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1281.40 | bwd_inner_microstep: 1281.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-10 23:41:24,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.16 | bwd_microstep: 1656.59 | bwd_inner_microstep: 1656.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 634
[2024-06-10 23:41:24,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.39 | bwd_microstep: 262.82 | bwd_inner_microstep: 262.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2666
[2024-06-10 23:41:26,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.23 | bwd_microstep: 1026.46 | bwd_inner_microstep: 1026.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2913
[2024-06-10 23:41:27,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.40 | bwd_microstep: 1128.77 | bwd_inner_microstep: 1128.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3594
[2024-06-10 23:41:29,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1340.60 | bwd_inner_microstep: 1340.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2233
[2024-06-10 23:41:30,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.71 | bwd_microstep: 962.73 | bwd_inner_microstep: 962.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-10 23:41:33,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.74 | bwd_microstep: 1494.94 | bwd_inner_microstep: 1494.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-10 23:41:35,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.67 | bwd_microstep: 1605.20 | bwd_inner_microstep: 1605.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 23:41:40,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.26 | optimizer_step: 6.62
[2024-06-10 23:41:40,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 4302.47 | bwd_inner_microstep: 1545.15 | bwd_allreduce_microstep: 2757.26 | step_microstep: 38.02
[2024-06-10 23:41:40,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15620.96 | bwd: 44549.73 | bwd_inner: 41791.56 | bwd_allreduce: 2757.49 | step: 39.46
{'loss': 1.198, 'learning_rate': 5.1975303411890235e-06, 'epoch': 0.77}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-10 23:41:41,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.61 | bwd_microstep: 1234.66 | bwd_inner_microstep: 1234.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-10 23:41:43,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1379.54 | bwd_inner_microstep: 1379.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3856
[2024-06-10 23:41:45,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.41 | bwd_microstep: 1484.04 | bwd_inner_microstep: 1484.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794
[2024-06-10 23:41:48,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.88 | bwd_microstep: 1640.08 | bwd_inner_microstep: 1640.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3766
[2024-06-10 23:41:50,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.38 | bwd_microstep: 1472.08 | bwd_inner_microstep: 1472.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3595
[2024-06-10 23:41:51,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.42 | bwd_microstep: 1209.90 | bwd_inner_microstep: 1209.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 23:41:53,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.38 | bwd_microstep: 1480.15 | bwd_inner_microstep: 1480.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-10 23:41:55,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 1251.50 | bwd_inner_microstep: 1251.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-10 23:41:57,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.75 | bwd_microstep: 1424.60 | bwd_inner_microstep: 1424.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3677
[2024-06-10 23:41:59,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.52 | bwd_microstep: 1586.92 | bwd_inner_microstep: 1586.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-10 23:42:01,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.56 | bwd_microstep: 1343.37 | bwd_inner_microstep: 1343.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 23:42:03,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.25 | bwd_microstep: 1484.23 | bwd_inner_microstep: 1484.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3700
[2024-06-10 23:42:05,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.74 | bwd_microstep: 1594.37 | bwd_inner_microstep: 1594.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 23:42:07,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1346.29 | bwd_inner_microstep: 1346.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1890
[2024-06-10 23:42:08,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.74 | bwd_microstep: 775.29 | bwd_inner_microstep: 775.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-10 23:42:10,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.52 | bwd_microstep: 1340.59 | bwd_inner_microstep: 1340.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906
[2024-06-10 23:42:11,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.77 | bwd_microstep: 685.79 | bwd_inner_microstep: 685.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3641
[2024-06-10 23:42:13,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.93 | bwd_microstep: 1248.74 | bwd_inner_microstep: 1248.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-10 23:42:15,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.02 | bwd_microstep: 1526.39 | bwd_inner_microstep: 1526.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 23:42:17,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1396.06 | bwd_inner_microstep: 1396.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-10 23:42:19,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.03 | bwd_microstep: 1556.03 | bwd_inner_microstep: 1556.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-10 23:42:21,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.41 | bwd_microstep: 1435.62 | bwd_inner_microstep: 1435.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 23:42:23,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1480.93 | bwd_inner_microstep: 1480.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 23:42:25,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1396.36 | bwd_inner_microstep: 1396.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3508
[2024-06-10 23:42:27,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.24 | bwd_microstep: 1452.48 | bwd_inner_microstep: 1452.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3086
[2024-06-10 23:42:29,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1245.30 | bwd_inner_microstep: 1245.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-10 23:42:31,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.49 | bwd_microstep: 1461.42 | bwd_inner_microstep: 1461.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3768
[2024-06-10 23:42:33,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 667.98 | bwd_microstep: 1845.56 | bwd_inner_microstep: 1845.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2402
[2024-06-10 23:42:35,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.77 | bwd_microstep: 1002.99 | bwd_inner_microstep: 1002.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3820
[2024-06-10 23:42:37,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.24 | bwd_microstep: 1802.73 | bwd_inner_microstep: 1802.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 23:42:39,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.53 | bwd_microstep: 1291.92 | bwd_inner_microstep: 1291.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-10 23:42:41,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.25 | optimizer_step: 6.64
[2024-06-10 23:42:41,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.15 | bwd_microstep: 1949.14 | bwd_inner_microstep: 1636.31 | bwd_allreduce_microstep: 312.78 | step_microstep: 42.36
[2024-06-10 23:42:41,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16548.04 | bwd: 44825.11 | bwd_inner: 44511.41 | bwd_allreduce: 313.01 | step: 43.93
{'loss': 1.1899, 'learning_rate': 5.172315922323064e-06, 'epoch': 0.77}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-10 23:42:44,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.60 | bwd_microstep: 1577.80 | bwd_inner_microstep: 1577.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3949
[2024-06-10 23:42:46,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.53 | bwd_microstep: 1591.80 | bwd_inner_microstep: 1591.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3953
[2024-06-10 23:42:48,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.52 | bwd_microstep: 1596.78 | bwd_inner_microstep: 1596.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2373
[2024-06-10 23:42:49,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.10 | bwd_microstep: 1030.25 | bwd_inner_microstep: 1030.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-10 23:42:51,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.47 | bwd_microstep: 1540.54 | bwd_inner_microstep: 1540.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756
[2024-06-10 23:42:54,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.81 | bwd_microstep: 1540.12 | bwd_inner_microstep: 1540.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-10 23:42:55,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1285.62 | bwd_inner_microstep: 1285.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-10 23:42:57,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.77 | bwd_microstep: 1550.10 | bwd_inner_microstep: 1550.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3762
[2024-06-10 23:42:59,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1373.97 | bwd_inner_microstep: 1373.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3506
[2024-06-10 23:43:01,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.78 | bwd_microstep: 1269.83 | bwd_inner_microstep: 1269.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3714
[2024-06-10 23:43:03,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.48 | bwd_microstep: 1662.98 | bwd_inner_microstep: 1662.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1984
[2024-06-10 23:43:05,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.44 | bwd_microstep: 856.99 | bwd_inner_microstep: 856.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3626
[2024-06-10 23:43:07,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.01 | bwd_microstep: 1677.31 | bwd_inner_microstep: 1677.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-10 23:43:09,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.84 | bwd_microstep: 1526.81 | bwd_inner_microstep: 1526.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-10 23:43:11,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1373.58 | bwd_inner_microstep: 1373.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 23:43:13,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.74 | bwd_microstep: 1371.24 | bwd_inner_microstep: 1371.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3541
[2024-06-10 23:43:15,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.07 | bwd_microstep: 1202.86 | bwd_inner_microstep: 1202.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3622
[2024-06-10 23:43:16,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1313.64 | bwd_inner_microstep: 1313.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-10 23:43:18,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1392.43 | bwd_inner_microstep: 1392.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-10 23:43:20,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.25 | bwd_microstep: 1531.44 | bwd_inner_microstep: 1531.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-10 23:43:22,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.86 | bwd_microstep: 1288.93 | bwd_inner_microstep: 1288.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-10 23:43:24,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.25 | bwd_microstep: 1502.62 | bwd_inner_microstep: 1502.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-10 23:43:26,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1405.80 | bwd_inner_microstep: 1405.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3825
[2024-06-10 23:43:28,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.26 | bwd_microstep: 1509.66 | bwd_inner_microstep: 1509.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2069
[2024-06-10 23:43:30,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.21 | bwd_microstep: 917.95 | bwd_inner_microstep: 917.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377
[2024-06-10 23:43:31,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.90 | bwd_microstep: 1337.35 | bwd_inner_microstep: 1337.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1928
[2024-06-10 23:43:32,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.11 | bwd_microstep: 760.91 | bwd_inner_microstep: 760.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-10 23:43:35,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.56 | bwd_microstep: 1547.56 | bwd_inner_microstep: 1547.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070
[2024-06-10 23:43:36,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.89 | bwd_microstep: 913.91 | bwd_inner_microstep: 913.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-10 23:43:38,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.12 | bwd_microstep: 1451.43 | bwd_inner_microstep: 1451.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-10 23:43:40,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.06 | bwd_microstep: 1558.25 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-10 23:43:44,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 | optimizer_gradients: 4.07 | optimizer_step: 6.57
[2024-06-10 23:43:44,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.00 | bwd_microstep: 3628.34 | bwd_inner_microstep: 1427.16 | bwd_allreduce_microstep: 2201.13 | step_microstep: 37.78
[2024-06-10 23:43:44,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16334.26 | bwd: 46088.82 | bwd_inner: 43886.80 | bwd_allreduce: 2201.35 | step: 39.44
{'loss': 1.2227, 'learning_rate': 5.147153726483338e-06, 'epoch': 0.77}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3455
[2024-06-10 23:43:46,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.82 | bwd_microstep: 1493.62 | bwd_inner_microstep: 1493.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-10 23:43:48,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1341.65 | bwd_inner_microstep: 1341.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1878
[2024-06-10 23:43:49,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.93 | bwd_microstep: 715.24 | bwd_inner_microstep: 715.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 23:43:51,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.62 | bwd_microstep: 1279.28 | bwd_inner_microstep: 1279.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 23:43:53,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.18 | bwd_microstep: 1245.00 | bwd_inner_microstep: 1244.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-10 23:43:54,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.31 | bwd_microstep: 1381.82 | bwd_inner_microstep: 1381.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4107
[2024-06-10 23:43:57,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.71 | bwd_microstep: 1528.65 | bwd_inner_microstep: 1528.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 23:43:58,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1349.88 | bwd_inner_microstep: 1349.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942
[2024-06-10 23:43:59,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.01 | bwd_microstep: 726.97 | bwd_inner_microstep: 726.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-10 23:44:01,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1345.78 | bwd_inner_microstep: 1345.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-10 23:44:03,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.95 | bwd_microstep: 1503.63 | bwd_inner_microstep: 1503.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3696
[2024-06-10 23:44:06,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.84 | bwd_microstep: 1590.94 | bwd_inner_microstep: 1590.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3617
[2024-06-10 23:44:08,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.75 | bwd_microstep: 1464.47 | bwd_inner_microstep: 1464.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3668
[2024-06-10 23:44:09,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.90 | bwd_microstep: 1321.82 | bwd_inner_microstep: 1321.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 23:44:11,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1385.27 | bwd_inner_microstep: 1385.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2118
[2024-06-10 23:44:12,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.25 | bwd_microstep: 826.95 | bwd_inner_microstep: 826.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2586
[2024-06-10 23:44:14,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 399.81 | bwd_microstep: 1069.46 | bwd_inner_microstep: 1069.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524
[2024-06-10 23:44:16,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.84 | bwd_microstep: 1327.64 | bwd_inner_microstep: 1327.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443
[2024-06-10 23:44:17,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.69 | bwd_microstep: 1156.42 | bwd_inner_microstep: 1156.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3691
[2024-06-10 23:44:19,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1361.39 | bwd_inner_microstep: 1361.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2432
[2024-06-10 23:44:21,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.90 | bwd_microstep: 940.22 | bwd_inner_microstep: 940.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 23:44:23,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.52 | bwd_microstep: 1554.76 | bwd_inner_microstep: 1554.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-10 23:44:25,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.93 | bwd_microstep: 1392.56 | bwd_inner_microstep: 1392.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-10 23:44:27,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.20 | bwd_microstep: 1598.51 | bwd_inner_microstep: 1598.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512
[2024-06-10 23:44:29,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.19 | bwd_microstep: 1410.66 | bwd_inner_microstep: 1410.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3623
[2024-06-10 23:44:31,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1574.47 | bwd_inner_microstep: 1574.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-10 23:44:33,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1394.25 | bwd_inner_microstep: 1394.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1972
[2024-06-10 23:44:34,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.03 | bwd_microstep: 829.36 | bwd_inner_microstep: 829.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3923
[2024-06-10 23:44:36,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.97 | bwd_microstep: 1629.96 | bwd_inner_microstep: 1629.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3554
[2024-06-10 23:44:38,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.39 | bwd_microstep: 1331.12 | bwd_inner_microstep: 1331.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3579
[2024-06-10 23:44:40,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.87 | bwd_microstep: 1431.50 | bwd_inner_microstep: 1431.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572
[2024-06-10 23:44:47,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-10 23:44:47,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.89 | bwd_microstep: 6612.36 | bwd_inner_microstep: 1798.24 | bwd_allreduce_microstep: 4814.07 | step_microstep: 37.97
[2024-06-10 23:44:47,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15778.60 | bwd: 47115.65 | bwd_inner: 42300.68 | bwd_allreduce: 4814.30 | step: 39.45
{'loss': 1.2243, 'learning_rate': 5.12204384229098e-06, 'epoch': 0.77}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-10 23:44:49,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.67 | bwd_microstep: 1235.07 | bwd_inner_microstep: 1235.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477
[2024-06-10 23:44:51,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.56 | bwd_microstep: 1211.11 | bwd_inner_microstep: 1211.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865
[2024-06-10 23:44:53,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.43 | bwd_microstep: 1456.46 | bwd_inner_microstep: 1456.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799
[2024-06-10 23:44:55,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1445.55 | bwd_inner_microstep: 1445.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 23:44:57,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1281.71 | bwd_inner_microstep: 1281.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2153
[2024-06-10 23:44:58,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.38 | bwd_microstep: 948.68 | bwd_inner_microstep: 948.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-10 23:44:59,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1151.61 | bwd_inner_microstep: 1151.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4062
[2024-06-10 23:45:02,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.01 | bwd_microstep: 1620.16 | bwd_inner_microstep: 1620.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-10 23:45:04,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.35 | bwd_microstep: 1561.66 | bwd_inner_microstep: 1561.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3933
[2024-06-10 23:45:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1495.29 | bwd_inner_microstep: 1495.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959
[2024-06-10 23:45:07,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 703.72 | bwd_inner_microstep: 703.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-10 23:45:09,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.32 | bwd_microstep: 1320.09 | bwd_inner_microstep: 1320.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695
[2024-06-10 23:45:11,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.18 | bwd_microstep: 1517.40 | bwd_inner_microstep: 1517.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 23:45:13,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.02 | bwd_microstep: 1486.47 | bwd_inner_microstep: 1486.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3814
[2024-06-10 23:45:15,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.54 | bwd_microstep: 1854.23 | bwd_inner_microstep: 1854.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 23:45:17,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.20 | bwd_microstep: 1340.07 | bwd_inner_microstep: 1340.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-10 23:45:19,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.58 | bwd_microstep: 1350.29 | bwd_inner_microstep: 1350.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1973
[2024-06-10 23:45:20,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.04 | bwd_microstep: 732.92 | bwd_inner_microstep: 732.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-10 23:45:21,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.33 | bwd_microstep: 798.42 | bwd_inner_microstep: 798.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2766
[2024-06-10 23:45:23,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.74 | bwd_microstep: 1020.36 | bwd_inner_microstep: 1020.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-10 23:45:25,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1386.80 | bwd_inner_microstep: 1386.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-10 23:45:26,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.24 | bwd_microstep: 1392.12 | bwd_inner_microstep: 1392.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1918
[2024-06-10 23:45:27,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.51 | bwd_microstep: 719.41 | bwd_inner_microstep: 719.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-10 23:45:29,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1255.17 | bwd_inner_microstep: 1255.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-10 23:45:31,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.01 | bwd_microstep: 1511.23 | bwd_inner_microstep: 1511.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040
[2024-06-10 23:45:32,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.88 | bwd_microstep: 811.10 | bwd_inner_microstep: 811.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-10 23:45:34,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1280.55 | bwd_inner_microstep: 1280.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-10 23:45:36,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.66 | bwd_microstep: 1499.82 | bwd_inner_microstep: 1499.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2266
[2024-06-10 23:45:38,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.79 | bwd_microstep: 1004.12 | bwd_inner_microstep: 1004.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3483
[2024-06-10 23:45:40,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.90 | bwd_microstep: 1526.58 | bwd_inner_microstep: 1526.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3585
[2024-06-10 23:45:42,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.16 | bwd_microstep: 1553.15 | bwd_inner_microstep: 1553.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3806
[2024-06-10 23:45:47,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.07 | optimizer_step: 6.61
[2024-06-10 23:45:47,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.72 | bwd_microstep: 4558.14 | bwd_inner_microstep: 1908.56 | bwd_allreduce_microstep: 2649.53 | step_microstep: 37.71
[2024-06-10 23:45:47,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15415.66 | bwd: 44029.45 | bwd_inner: 41379.01 | bwd_allreduce: 2649.76 | step: 39.15
 77%|███████▋  | 1332/1726 [23:03:16<7:46:02, 70.97s/it]
 77%|███████▋  | 1333/1726 [23:04:16<7:24:17, 67.83s/it]
 77%|███████▋  | 1334/1726 [23:05:18<7:11:11, 66.00s/it]
 77%|███████▋  | 1335/1726 [23:06:21<7:03:47, 65.03s/it]
 77%|███████▋  | 1336/1726 [23:07:24<6:59:10, 64.49s/it]
{'loss': 1.1716, 'learning_rate': 5.096986358182867e-06, 'epoch': 0.77}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-10 23:45:49,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.02 | bwd_microstep: 1471.00 | bwd_inner_microstep: 1470.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-10 23:45:50,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.42 | bwd_microstep: 677.62 | bwd_inner_microstep: 677.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3904
[2024-06-10 23:45:52,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.77 | bwd_microstep: 1516.14 | bwd_inner_microstep: 1516.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4411
[2024-06-10 23:45:54,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.24 | bwd_microstep: 1549.01 | bwd_inner_microstep: 1548.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813
[2024-06-10 23:45:56,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.01 | bwd_microstep: 1350.52 | bwd_inner_microstep: 1350.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-10 23:45:58,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-10 23:46:00,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.95 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3420
[2024-06-10 23:46:02,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.01 | bwd_microstep: 1214.66 | bwd_inner_microstep: 1214.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1892
[2024-06-10 23:46:03,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.97 | bwd_microstep: 743.16 | bwd_inner_microstep: 743.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717
[2024-06-10 23:46:05,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.94 | bwd_microstep: 1631.58 | bwd_inner_microstep: 1631.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-10 23:46:07,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.74 | bwd_microstep: 1382.53 | bwd_inner_microstep: 1382.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 23:46:09,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1389.54 | bwd_inner_microstep: 1389.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-10 23:46:11,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1252.62 | bwd_inner_microstep: 1252.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-10 23:46:13,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.13 | bwd_microstep: 1521.65 | bwd_inner_microstep: 1521.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 23:46:15,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.13 | bwd_microstep: 1489.20 | bwd_inner_microstep: 1489.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929
[2024-06-10 23:46:16,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.78 | bwd_microstep: 728.99 | bwd_inner_microstep: 728.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 23:46:18,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1480.47 | bwd_inner_microstep: 1480.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 23:46:19,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.16 | bwd_microstep: 677.97 | bwd_inner_microstep: 677.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-10 23:46:21,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.43 | bwd_microstep: 1604.22 | bwd_inner_microstep: 1604.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-10 23:46:23,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1247.21 | bwd_inner_microstep: 1247.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-10 23:46:25,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.17 | bwd_microstep: 1707.83 | bwd_inner_microstep: 1707.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 23:46:27,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.94 | bwd_microstep: 1551.38 | bwd_inner_microstep: 1551.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3542
[2024-06-10 23:46:29,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.68 | bwd_microstep: 1689.57 | bwd_inner_microstep: 1689.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3429
[2024-06-10 23:46:31,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.02 | bwd_microstep: 1458.84 | bwd_inner_microstep: 1458.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3713
[2024-06-10 23:46:33,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.61 | bwd_microstep: 1238.72 | bwd_inner_microstep: 1238.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-10 23:46:35,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.07 | bwd_microstep: 1391.66 | bwd_inner_microstep: 1391.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108
[2024-06-10 23:46:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.92 | bwd_microstep: 825.43 | bwd_inner_microstep: 825.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-10 23:46:39,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.26 | bwd_microstep: 1707.55 | bwd_inner_microstep: 1707.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3539
[2024-06-10 23:46:41,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1420.89 | bwd_inner_microstep: 1420.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3556
[2024-06-10 23:46:42,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1359.19 | bwd_inner_microstep: 1359.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423
[2024-06-10 23:46:44,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.14 | bwd_microstep: 1151.91 | bwd_inner_microstep: 1151.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2030
[2024-06-10 23:46:50,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.35 | optimizer_step: 6.79
[2024-06-10 23:46:50,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.02 | bwd_microstep: 5229.76 | bwd_inner_microstep: 815.70 | bwd_allreduce_microstep: 4413.99 | step_microstep: 39.24
[2024-06-10 23:46:50,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15685.70 | bwd: 46431.54 | bwd_inner: 42016.63 | bwd_allreduce: 4414.23 | step: 40.78
{'loss': 1.1746, 'learning_rate': 5.071981362411327e-06, 'epoch': 0.78}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 23:46:51,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.78 | bwd_microstep: 1335.64 | bwd_inner_microstep: 1335.44 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3963
[2024-06-10 23:46:54,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.99 | bwd_microstep: 1697.51 | bwd_inner_microstep: 1697.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-10 23:46:56,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.56 | bwd_microstep: 1446.89 | bwd_inner_microstep: 1446.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2361
[2024-06-10 23:46:57,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.57 | bwd_microstep: 986.25 | bwd_inner_microstep: 986.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-10 23:46:59,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.78 | bwd_microstep: 1536.71 | bwd_inner_microstep: 1536.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-10 23:47:01,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.09 | bwd_microstep: 1183.73 | bwd_inner_microstep: 1183.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-10 23:47:03,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3499
[2024-06-10 23:47:05,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1553.67 | bwd_inner_microstep: 1553.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-10 23:47:07,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.42 | bwd_microstep: 1398.03 | bwd_inner_microstep: 1398.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-10 23:47:09,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.16 | bwd_microstep: 1529.01 | bwd_inner_microstep: 1528.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-10 23:47:11,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.71 | bwd_microstep: 1285.84 | bwd_inner_microstep: 1285.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414
[2024-06-10 23:47:12,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.34 | bwd_microstep: 1309.18 | bwd_inner_microstep: 1309.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1959
[2024-06-10 23:47:14,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.89 | bwd_microstep: 890.77 | bwd_inner_microstep: 890.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 23:47:15,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.93 | bwd_microstep: 1339.02 | bwd_inner_microstep: 1339.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521
[2024-06-10 23:47:18,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.98 | bwd_microstep: 1581.42 | bwd_inner_microstep: 1581.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-10 23:47:20,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1484.39 | bwd_inner_microstep: 1484.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-10 23:47:22,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.08 | bwd_microstep: 1382.53 | bwd_inner_microstep: 1382.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3821
[2024-06-10 23:47:24,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.91 | bwd_microstep: 1753.83 | bwd_inner_microstep: 1753.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-10 23:47:26,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1490.43 | bwd_inner_microstep: 1490.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-10 23:47:27,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.36 | bwd_microstep: 799.07 | bwd_inner_microstep: 799.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3534
[2024-06-10 23:47:29,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.12 | bwd_microstep: 1197.98 | bwd_inner_microstep: 1197.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-10 23:47:31,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.15 | bwd_microstep: 1407.65 | bwd_inner_microstep: 1407.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-10 23:47:33,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1399.59 | bwd_inner_microstep: 1399.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-10 23:47:34,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.51 | bwd_microstep: 977.10 | bwd_inner_microstep: 977.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2939
[2024-06-10 23:47:36,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.37 | bwd_microstep: 1097.88 | bwd_inner_microstep: 1097.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-10 23:47:38,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.06 | bwd_microstep: 1549.71 | bwd_inner_microstep: 1549.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-10 23:47:40,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.47 | bwd_microstep: 1460.10 | bwd_inner_microstep: 1460.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725
[2024-06-10 23:47:42,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.87 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2208
[2024-06-10 23:47:43,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.95 | bwd_microstep: 1054.62 | bwd_inner_microstep: 1054.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3556
[2024-06-10 23:47:45,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.86 | bwd_microstep: 1332.99 | bwd_inner_microstep: 1332.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3587
[2024-06-10 23:47:47,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.77 | bwd_microstep: 1634.17 | bwd_inner_microstep: 1634.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271
[2024-06-10 23:47:50,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.01 | optimizer_step: 6.58
[2024-06-10 23:47:50,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.80 | bwd_microstep: 2953.66 | bwd_inner_microstep: 1128.93 | bwd_allreduce_microstep: 1824.68 | step_microstep: 37.68
[2024-06-10 23:47:50,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15923.57 | bwd: 44636.33 | bwd_inner: 42810.59 | bwd_allreduce: 1824.99 | step: 39.24
{'loss': 1.1429, 'learning_rate': 5.047028943043826e-06, 'epoch': 0.78}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-10 23:47:52,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.37 | bwd_microstep: 1444.63 | bwd_inner_microstep: 1444.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-10 23:47:55,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1482.63 | bwd_inner_microstep: 1482.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3472
[2024-06-10 23:47:56,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1327.41 | bwd_inner_microstep: 1327.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3855
[2024-06-10 23:47:58,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.52 | bwd_microstep: 1520.01 | bwd_inner_microstep: 1519.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838
[2024-06-10 23:48:01,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.36 | bwd_microstep: 1654.35 | bwd_inner_microstep: 1654.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-10 23:48:03,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.71 | bwd_microstep: 1392.45 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3395
[2024-06-10 23:48:04,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.36 | bwd_microstep: 1178.11 | bwd_inner_microstep: 1178.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-10 23:48:06,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1386.82 | bwd_inner_microstep: 1386.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535
[2024-06-10 23:48:08,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1394.70 | bwd_inner_microstep: 1394.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2109
[2024-06-10 23:48:10,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.50 | bwd_microstep: 1676.62 | bwd_inner_microstep: 1676.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-10 23:48:12,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1388.90 | bwd_inner_microstep: 1388.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2142
[2024-06-10 23:48:13,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.66 | bwd_microstep: 926.49 | bwd_inner_microstep: 926.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521
[2024-06-10 23:48:15,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1451.55 | bwd_inner_microstep: 1451.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-10 23:48:17,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.76 | bwd_microstep: 1482.61 | bwd_inner_microstep: 1482.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484
[2024-06-10 23:48:20,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.00 | bwd_microstep: 1582.69 | bwd_inner_microstep: 1582.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-10 23:48:21,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.69 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1281.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-10 23:48:23,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1396.09 | bwd_inner_microstep: 1396.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-10 23:48:25,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1552.93 | bwd_inner_microstep: 1552.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-10 23:48:27,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.38 | bwd_microstep: 1254.99 | bwd_inner_microstep: 1254.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3521
[2024-06-10 23:48:29,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1323.43 | bwd_inner_microstep: 1323.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817
[2024-06-10 23:48:31,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.52 | bwd_microstep: 1386.25 | bwd_inner_microstep: 1386.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-10 23:48:32,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 807.79 | bwd_inner_microstep: 807.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3491
[2024-06-10 23:48:34,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1348.52 | bwd_inner_microstep: 1348.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-10 23:48:36,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.03 | bwd_microstep: 1280.43 | bwd_inner_microstep: 1280.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-10 23:48:38,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.44 | bwd_microstep: 1455.75 | bwd_inner_microstep: 1455.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3762
[2024-06-10 23:48:40,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.41 | bwd_microstep: 1346.53 | bwd_inner_microstep: 1346.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-10 23:48:41,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.43 | bwd_microstep: 1407.93 | bwd_inner_microstep: 1407.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603
[2024-06-10 23:48:43,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.28 | bwd_microstep: 1462.88 | bwd_inner_microstep: 1462.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-10 23:48:46,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.01 | bwd_microstep: 1549.62 | bwd_inner_microstep: 1549.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3568
[2024-06-10 23:48:48,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.82 | bwd_microstep: 1460.90 | bwd_inner_microstep: 1460.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3454
[2024-06-10 23:48:50,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.97 | bwd_microstep: 1543.60 | bwd_inner_microstep: 1543.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438
[2024-06-10 23:48:53,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 5.35 | optimizer_step: 6.63
[2024-06-10 23:48:53,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.25 | bwd_microstep: 2611.60 | bwd_inner_microstep: 1645.03 | bwd_allreduce_microstep: 966.51 | step_microstep: 39.29
[2024-06-10 23:48:53,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16386.07 | bwd: 45761.23 | bwd_inner: 44793.80 | bwd_allreduce: 966.74 | step: 40.72
{'loss': 1.2206, 'learning_rate': 5.022129187962648e-06, 'epoch': 0.78}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3423
[2024-06-10 23:48:55,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.83 | bwd_microstep: 1271.39 | bwd_inner_microstep: 1271.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3946
[2024-06-10 23:48:57,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.32 | bwd_microstep: 1690.30 | bwd_inner_microstep: 1690.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3842
[2024-06-10 23:48:59,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 1489.23 | bwd_inner_microstep: 1489.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-10 23:49:00,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.84 | bwd_microstep: 677.44 | bwd_inner_microstep: 677.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 23:49:02,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.35 | bwd_microstep: 1648.74 | bwd_inner_microstep: 1648.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2257
[2024-06-10 23:49:04,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.78 | bwd_microstep: 964.73 | bwd_inner_microstep: 964.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-10 23:49:05,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.20 | bwd_microstep: 677.32 | bwd_inner_microstep: 677.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040
[2024-06-10 23:49:06,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.31 | bwd_microstep: 812.32 | bwd_inner_microstep: 812.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-10 23:49:07,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.57 | bwd_microstep: 795.45 | bwd_inner_microstep: 795.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-10 23:49:09,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1252.04 | bwd_inner_microstep: 1252.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-10 23:49:10,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1379.53 | bwd_inner_microstep: 1379.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-10 23:49:12,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1445.94 | bwd_inner_microstep: 1445.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-10 23:49:14,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.01 | bwd_microstep: 1415.97 | bwd_inner_microstep: 1415.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3653
[2024-06-10 23:49:17,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.76 | bwd_microstep: 1676.73 | bwd_inner_microstep: 1676.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613
[2024-06-10 23:49:19,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.95 | bwd_microstep: 1597.56 | bwd_inner_microstep: 1597.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675
[2024-06-10 23:49:21,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.26 | bwd_microstep: 1519.84 | bwd_inner_microstep: 1519.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 23:49:23,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.09 | bwd_microstep: 1480.33 | bwd_inner_microstep: 1480.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3627
[2024-06-10 23:49:25,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.34 | bwd_microstep: 1527.33 | bwd_inner_microstep: 1527.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3827
[2024-06-10 23:49:28,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.37 | bwd_microstep: 1756.84 | bwd_inner_microstep: 1756.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-10 23:49:30,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1644.16 | bwd_inner_microstep: 1644.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2741
[2024-06-10 23:49:31,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.24 | bwd_microstep: 1134.33 | bwd_inner_microstep: 1134.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4091
[2024-06-10 23:49:34,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.95 | bwd_microstep: 1662.68 | bwd_inner_microstep: 1662.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-10 23:49:36,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1491.45 | bwd_inner_microstep: 1491.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3804
[2024-06-10 23:49:38,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.76 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3811
[2024-06-10 23:49:40,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1408.44 | bwd_inner_microstep: 1408.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3627
[2024-06-10 23:49:42,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.19 | bwd_microstep: 1441.71 | bwd_inner_microstep: 1441.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-10 23:49:43,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1398.11 | bwd_inner_microstep: 1398.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-10 23:49:46,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.31 | bwd_microstep: 1550.07 | bwd_inner_microstep: 1550.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 23:49:47,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.41 | bwd_microstep: 1279.39 | bwd_inner_microstep: 1279.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3554
[2024-06-10 23:49:50,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.87 | bwd_microstep: 1590.51 | bwd_inner_microstep: 1590.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2286
[2024-06-10 23:49:51,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.42 | bwd_microstep: 1069.39 | bwd_inner_microstep: 1069.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-10 23:49:54,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-10 23:49:54,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 2543.35 | bwd_inner_microstep: 1691.78 | bwd_allreduce_microstep: 851.52 | step_microstep: 37.63
[2024-06-10 23:49:54,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16263.80 | bwd: 44675.63 | bwd_inner: 43823.22 | bwd_allreduce: 851.74 | step: 39.10
{'loss': 1.2176, 'learning_rate': 4.997282184864613e-06, 'epoch': 0.78}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-10 23:49:56,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.31 | bwd_microstep: 1471.36 | bwd_inner_microstep: 1471.27 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-10 23:49:58,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.78 | bwd_microstep: 1341.95 | bwd_inner_microstep: 1341.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-10 23:50:00,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.36 | bwd_microstep: 1274.10 | bwd_inner_microstep: 1274.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4148
[2024-06-10 23:50:02,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.03 | bwd_microstep: 1737.92 | bwd_inner_microstep: 1737.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876
[2024-06-10 23:50:03,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.65 | bwd_microstep: 679.31 | bwd_inner_microstep: 679.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-10 23:50:05,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.22 | bwd_microstep: 1532.41 | bwd_inner_microstep: 1532.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091
[2024-06-10 23:50:06,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.70 | bwd_microstep: 821.80 | bwd_inner_microstep: 821.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511
[2024-06-10 23:50:08,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.36 | bwd_microstep: 1350.06 | bwd_inner_microstep: 1350.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1956
[2024-06-10 23:50:09,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.28 | bwd_microstep: 826.62 | bwd_inner_microstep: 826.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-10 23:50:11,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1342.20 | bwd_inner_microstep: 1342.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3583
[2024-06-10 23:50:13,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.46 | bwd_microstep: 1425.08 | bwd_inner_microstep: 1425.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3659
[2024-06-10 23:50:16,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.13 | bwd_microstep: 1817.69 | bwd_inner_microstep: 1817.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396
[2024-06-10 23:50:18,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 1339.80 | bwd_inner_microstep: 1339.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3645
[2024-06-10 23:50:20,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.83 | bwd_microstep: 1456.94 | bwd_inner_microstep: 1456.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-10 23:50:22,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.37 | bwd_microstep: 1391.94 | bwd_inner_microstep: 1391.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-10 23:50:23,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.09 | bwd_microstep: 1256.77 | bwd_inner_microstep: 1256.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 23:50:25,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.03 | bwd_microstep: 1416.54 | bwd_inner_microstep: 1416.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-10 23:50:27,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.96 | bwd_microstep: 1416.49 | bwd_inner_microstep: 1416.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3514
[2024-06-10 23:50:29,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.89 | bwd_microstep: 1254.17 | bwd_inner_microstep: 1254.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-10 23:50:31,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1511.65 | bwd_inner_microstep: 1511.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-10 23:50:33,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.10 | bwd_microstep: 1513.36 | bwd_inner_microstep: 1513.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-10 23:50:35,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.21 | bwd_microstep: 1395.84 | bwd_inner_microstep: 1395.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-10 23:50:37,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1414.28 | bwd_inner_microstep: 1414.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463
[2024-06-10 23:50:39,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.44 | bwd_microstep: 1434.99 | bwd_inner_microstep: 1434.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-10 23:50:41,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.46 | bwd_microstep: 1490.63 | bwd_inner_microstep: 1490.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-10 23:50:43,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.73 | bwd_microstep: 1603.38 | bwd_inner_microstep: 1603.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2028
[2024-06-10 23:50:44,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.26 | bwd_microstep: 744.87 | bwd_inner_microstep: 744.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3806
[2024-06-10 23:50:47,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.17 | bwd_microstep: 1695.12 | bwd_inner_microstep: 1695.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3079
[2024-06-10 23:50:48,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.70 | bwd_microstep: 1145.55 | bwd_inner_microstep: 1145.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3679
[2024-06-10 23:50:50,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.74 | bwd_microstep: 1498.64 | bwd_inner_microstep: 1498.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-10 23:50:52,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.41 | bwd_microstep: 1280.29 | bwd_inner_microstep: 1280.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-10 23:50:54,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.04 | optimizer_step: 6.62
[2024-06-10 23:50:54,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 1639.22 | bwd_inner_microstep: 1631.52 | bwd_allreduce_microstep: 7.66 | step_microstep: 37.37
[2024-06-10 23:50:54,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16256.18 | bwd: 43521.01 | bwd_inner: 43512.37 | bwd_allreduce: 7.93 | step: 38.94
 77%|███████▋  | 1337/1726 [23:08:24<6:48:55, 63.07s/it]
 78%|███████▊  | 1338/1726 [23:09:26<6:46:40, 62.89s/it]
 78%|███████▊  | 1339/1726 [23:10:27<6:41:46, 62.29s/it]
 78%|███████▊  | 1340/1726 [23:11:30<6:41:06, 62.35s/it]
 78%|███████▊  | 1341/1726 [23:12:31<6:37:59, 62.02s/it]
{'loss': 1.1218, 'learning_rate': 4.972488021260733e-06, 'epoch': 0.78}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-10 23:50:56,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.60 | bwd_microstep: 1151.54 | bwd_inner_microstep: 1151.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 23:50:58,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1490.63 | bwd_inner_microstep: 1490.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408
[2024-06-10 23:51:00,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.22 | bwd_microstep: 1180.50 | bwd_inner_microstep: 1180.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-10 23:51:02,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1410.95 | bwd_inner_microstep: 1410.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752
[2024-06-10 23:51:04,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.49 | bwd_microstep: 1538.52 | bwd_inner_microstep: 1538.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-10 23:51:06,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.96 | bwd_microstep: 1405.95 | bwd_inner_microstep: 1405.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-10 23:51:08,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.98 | bwd_microstep: 1539.09 | bwd_inner_microstep: 1539.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3607
[2024-06-10 23:51:09,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1247.54 | bwd_inner_microstep: 1247.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-10 23:51:11,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1289.11 | bwd_inner_microstep: 1289.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955
[2024-06-10 23:51:12,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.99 | bwd_microstep: 824.70 | bwd_inner_microstep: 824.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-10 23:51:14,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1484.71 | bwd_inner_microstep: 1484.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3643
[2024-06-10 23:51:17,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.20 | bwd_microstep: 1811.59 | bwd_inner_microstep: 1811.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907
[2024-06-10 23:51:18,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.30 | bwd_microstep: 780.15 | bwd_inner_microstep: 780.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-10 23:51:20,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.67 | bwd_microstep: 1340.00 | bwd_inner_microstep: 1339.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-10 23:51:22,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1387.05 | bwd_inner_microstep: 1387.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1219
[2024-06-10 23:51:22,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 180.15 | bwd_microstep: 466.83 | bwd_inner_microstep: 466.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178
[2024-06-10 23:51:24,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.00 | bwd_microstep: 857.93 | bwd_inner_microstep: 857.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2172
[2024-06-10 23:51:25,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.00 | bwd_microstep: 858.03 | bwd_inner_microstep: 858.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3566
[2024-06-10 23:51:27,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.79 | bwd_microstep: 1236.32 | bwd_inner_microstep: 1236.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-10 23:51:28,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1295.64 | bwd_inner_microstep: 1295.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 23:51:30,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1292.02 | bwd_inner_microstep: 1291.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-10 23:51:32,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.50 | bwd_microstep: 1495.68 | bwd_inner_microstep: 1495.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-10 23:51:34,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.41 | bwd_microstep: 1461.62 | bwd_inner_microstep: 1461.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3612
[2024-06-10 23:51:36,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.02 | bwd_microstep: 1539.17 | bwd_inner_microstep: 1539.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2012
[2024-06-10 23:51:37,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.11 | bwd_microstep: 788.23 | bwd_inner_microstep: 788.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3593
[2024-06-10 23:51:39,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.68 | bwd_microstep: 1467.29 | bwd_inner_microstep: 1467.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-10 23:51:42,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.84 | bwd_microstep: 1529.16 | bwd_inner_microstep: 1529.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702
[2024-06-10 23:51:44,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.98 | bwd_microstep: 1621.65 | bwd_inner_microstep: 1621.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3606
[2024-06-10 23:51:46,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.45 | bwd_microstep: 1339.97 | bwd_inner_microstep: 1339.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-10 23:51:48,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1510.39 | bwd_inner_microstep: 1510.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3828
[2024-06-10 23:51:50,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.84 | bwd_microstep: 1823.99 | bwd_inner_microstep: 1823.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3778
[2024-06-10 23:51:56,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.11 | optimizer_step: 6.59
[2024-06-10 23:51:56,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.39 | bwd_microstep: 5250.93 | bwd_inner_microstep: 1983.33 | bwd_allreduce_microstep: 3267.54 | step_microstep: 38.28
[2024-06-10 23:51:56,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15782.54 | bwd: 45716.90 | bwd_inner: 42448.46 | bwd_allreduce: 3267.77 | step: 39.74
{'loss': 1.1477, 'learning_rate': 4.947746784475919e-06, 'epoch': 0.78}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-10 23:51:58,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.38 | bwd_microstep: 1469.05 | bwd_inner_microstep: 1469.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908
[2024-06-10 23:52:00,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.68 | bwd_microstep: 1585.65 | bwd_inner_microstep: 1585.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-10 23:52:02,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.09 | bwd_microstep: 1344.16 | bwd_inner_microstep: 1344.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-10 23:52:04,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1349.47 | bwd_inner_microstep: 1349.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851
[2024-06-10 23:52:06,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.04 | bwd_microstep: 1660.02 | bwd_inner_microstep: 1660.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 23:52:08,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.63 | bwd_microstep: 1278.40 | bwd_inner_microstep: 1278.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 23:52:10,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1387.05 | bwd_inner_microstep: 1387.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-10 23:52:12,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1354.40 | bwd_inner_microstep: 1354.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-10 23:52:14,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1389.00 | bwd_inner_microstep: 1388.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-10 23:52:16,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.41 | bwd_microstep: 1247.92 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1901
[2024-06-10 23:52:17,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.71 | bwd_microstep: 684.14 | bwd_inner_microstep: 684.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3700
[2024-06-10 23:52:19,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.79 | bwd_microstep: 1757.56 | bwd_inner_microstep: 1757.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-10 23:52:21,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.83 | bwd_microstep: 1487.38 | bwd_inner_microstep: 1487.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1946
[2024-06-10 23:52:22,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.27 | bwd_microstep: 885.14 | bwd_inner_microstep: 885.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-10 23:52:24,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.73 | bwd_microstep: 1587.15 | bwd_inner_microstep: 1587.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3680
[2024-06-10 23:52:26,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.20 | bwd_microstep: 1481.03 | bwd_inner_microstep: 1481.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-10 23:52:28,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1254.47 | bwd_inner_microstep: 1254.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653
[2024-06-10 23:52:30,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.07 | bwd_microstep: 1422.35 | bwd_inner_microstep: 1422.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3707
[2024-06-10 23:52:32,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.82 | bwd_microstep: 1333.04 | bwd_inner_microstep: 1333.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-10 23:52:34,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1398.86 | bwd_inner_microstep: 1398.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3622
[2024-06-10 23:52:36,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.98 | bwd_microstep: 1311.82 | bwd_inner_microstep: 1311.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-10 23:52:38,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.34 | bwd_microstep: 1524.04 | bwd_inner_microstep: 1524.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627
[2024-06-10 23:52:40,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.06 | bwd_microstep: 1315.74 | bwd_inner_microstep: 1315.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-10 23:52:42,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.62 | bwd_microstep: 1492.44 | bwd_inner_microstep: 1492.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906
[2024-06-10 23:52:43,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.32 | bwd_microstep: 683.95 | bwd_inner_microstep: 683.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3607
[2024-06-10 23:52:45,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.61 | bwd_microstep: 1340.01 | bwd_inner_microstep: 1339.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3823
[2024-06-10 23:52:46,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1388.21 | bwd_inner_microstep: 1388.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584
[2024-06-10 23:52:48,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3577
[2024-06-10 23:52:50,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1362.65 | bwd_inner_microstep: 1362.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 23:52:52,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.79 | bwd_microstep: 1396.18 | bwd_inner_microstep: 1396.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-10 23:52:54,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.96 | bwd_microstep: 977.93 | bwd_inner_microstep: 977.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3800
[2024-06-10 23:52:57,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-10 23:52:57,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.42 | bwd_microstep: 2821.42 | bwd_inner_microstep: 1843.96 | bwd_allreduce_microstep: 977.41 | step_microstep: 37.66
[2024-06-10 23:52:57,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16155.63 | bwd: 44369.89 | bwd_inner: 43391.58 | bwd_allreduce: 977.64 | step: 39.12
{'loss': 1.1985, 'learning_rate': 4.923058561648677e-06, 'epoch': 0.78}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-10 23:52:59,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.14 | bwd_microstep: 1335.26 | bwd_inner_microstep: 1335.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-10 23:53:01,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1343.33 | bwd_inner_microstep: 1343.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-10 23:53:03,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1482.92 | bwd_inner_microstep: 1482.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-10 23:53:05,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.26 | bwd_microstep: 1382.88 | bwd_inner_microstep: 1382.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048
[2024-06-10 23:53:06,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.93 | bwd_microstep: 810.65 | bwd_inner_microstep: 810.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1866
[2024-06-10 23:53:07,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.48 | bwd_microstep: 707.42 | bwd_inner_microstep: 707.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933
[2024-06-10 23:53:08,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.36 | bwd_microstep: 727.53 | bwd_inner_microstep: 727.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-10 23:53:10,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1525.14 | bwd_inner_microstep: 1525.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3676
[2024-06-10 23:53:12,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.34 | bwd_microstep: 1476.15 | bwd_inner_microstep: 1476.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2968
[2024-06-10 23:53:13,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.35 | bwd_microstep: 1100.34 | bwd_inner_microstep: 1100.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-10 23:53:15,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.02 | bwd_microstep: 1280.77 | bwd_inner_microstep: 1280.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3655
[2024-06-10 23:53:18,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.56 | bwd_microstep: 1716.06 | bwd_inner_microstep: 1716.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-10 23:53:20,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.12 | bwd_microstep: 1481.24 | bwd_inner_microstep: 1481.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3641
[2024-06-10 23:53:22,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.09 | bwd_microstep: 1436.32 | bwd_inner_microstep: 1436.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543
[2024-06-10 23:53:23,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.73 | bwd_microstep: 1325.88 | bwd_inner_microstep: 1325.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-10 23:53:26,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.21 | bwd_microstep: 1623.97 | bwd_inner_microstep: 1623.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-10 23:53:28,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.28 | bwd_microstep: 1358.04 | bwd_inner_microstep: 1358.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-10 23:53:30,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1420.66 | bwd_inner_microstep: 1420.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-10 23:53:32,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.51 | bwd_microstep: 1555.54 | bwd_inner_microstep: 1555.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-10 23:53:34,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.51 | bwd_microstep: 1554.29 | bwd_inner_microstep: 1554.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-10 23:53:35,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.02 | bwd_microstep: 1183.24 | bwd_inner_microstep: 1183.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628
[2024-06-10 23:53:37,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1407.94 | bwd_inner_microstep: 1407.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-10 23:53:38,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 698.07 | bwd_inner_microstep: 698.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-10 23:53:40,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.56 | bwd_microstep: 1306.17 | bwd_inner_microstep: 1306.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050
[2024-06-10 23:53:41,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.96 | bwd_microstep: 912.31 | bwd_inner_microstep: 912.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-10 23:53:44,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.20 | bwd_microstep: 1656.16 | bwd_inner_microstep: 1656.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3804
[2024-06-10 23:53:46,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.24 | bwd_microstep: 1475.62 | bwd_inner_microstep: 1475.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 869
[2024-06-10 23:53:46,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 151.83 | bwd_microstep: 397.59 | bwd_inner_microstep: 397.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3782
[2024-06-10 23:53:48,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.28 | bwd_microstep: 1409.11 | bwd_inner_microstep: 1409.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1963
[2024-06-10 23:53:49,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.56 | bwd_microstep: 730.61 | bwd_inner_microstep: 730.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069
[2024-06-10 23:53:50,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.67 | bwd_microstep: 814.94 | bwd_inner_microstep: 814.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2048
[2024-06-10 23:53:59,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-10 23:53:59,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.59 | bwd_microstep: 7753.53 | bwd_inner_microstep: 1037.36 | bwd_allreduce_microstep: 6716.12 | step_microstep: 37.76
[2024-06-10 23:53:59,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14800.06 | bwd: 46389.69 | bwd_inner: 39672.67 | bwd_allreduce: 6716.35 | step: 39.32
{'loss': 1.2317, 'learning_rate': 4.8984234397308086e-06, 'epoch': 0.78}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1940
[2024-06-10 23:54:00,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.28 | bwd_microstep: 871.77 | bwd_inner_microstep: 871.60 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-10 23:54:02,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.88 | bwd_microstep: 1701.99 | bwd_inner_microstep: 1701.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476
[2024-06-10 23:54:04,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.96 | bwd_microstep: 1340.40 | bwd_inner_microstep: 1340.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1869
[2024-06-10 23:54:05,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.48 | bwd_microstep: 767.68 | bwd_inner_microstep: 767.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3472
[2024-06-10 23:54:07,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.85 | bwd_microstep: 1213.35 | bwd_inner_microstep: 1213.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-10 23:54:08,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.65 | bwd_microstep: 1280.41 | bwd_inner_microstep: 1280.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-10 23:54:10,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.10 | bwd_microstep: 1297.34 | bwd_inner_microstep: 1297.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-10 23:54:12,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.84 | bwd_microstep: 1148.77 | bwd_inner_microstep: 1148.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-10 23:54:14,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.97 | bwd_microstep: 1277.06 | bwd_inner_microstep: 1277.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3489
[2024-06-10 23:54:16,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.76 | bwd_microstep: 1581.22 | bwd_inner_microstep: 1581.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1966
[2024-06-10 23:54:17,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.53 | bwd_microstep: 857.97 | bwd_inner_microstep: 857.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-10 23:54:19,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.50 | bwd_microstep: 1480.57 | bwd_inner_microstep: 1480.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512
[2024-06-10 23:54:21,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.70 | bwd_microstep: 1409.72 | bwd_inner_microstep: 1409.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3496
[2024-06-10 23:54:23,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1317.42 | bwd_inner_microstep: 1317.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3498
[2024-06-10 23:54:25,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1549.13 | bwd_inner_microstep: 1549.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-10 23:54:26,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.53 | bwd_microstep: 889.65 | bwd_inner_microstep: 889.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3645
[2024-06-10 23:54:28,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.06 | bwd_microstep: 1569.96 | bwd_inner_microstep: 1569.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673
[2024-06-10 23:54:30,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.10 | bwd_microstep: 1326.54 | bwd_inner_microstep: 1326.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-10 23:54:32,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.83 | bwd_microstep: 1525.44 | bwd_inner_microstep: 1525.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3666
[2024-06-10 23:54:34,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.89 | bwd_microstep: 1259.88 | bwd_inner_microstep: 1259.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1901
[2024-06-10 23:54:35,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.44 | bwd_microstep: 685.94 | bwd_inner_microstep: 685.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2006
[2024-06-10 23:54:36,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.89 | bwd_microstep: 852.26 | bwd_inner_microstep: 852.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-10 23:54:38,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1556.09 | bwd_inner_microstep: 1556.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-10 23:54:40,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.84 | bwd_microstep: 1432.76 | bwd_inner_microstep: 1432.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-10 23:54:43,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.64 | bwd_microstep: 1654.37 | bwd_inner_microstep: 1654.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719
[2024-06-10 23:54:45,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.60 | bwd_microstep: 1638.05 | bwd_inner_microstep: 1638.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760
[2024-06-10 23:54:47,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.36 | bwd_microstep: 1641.00 | bwd_inner_microstep: 1640.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3588
[2024-06-10 23:54:49,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.47 | bwd_microstep: 1535.70 | bwd_inner_microstep: 1535.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766
[2024-06-10 23:54:51,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.11 | bwd_microstep: 1545.44 | bwd_inner_microstep: 1545.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-10 23:54:53,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.51 | bwd_microstep: 1286.31 | bwd_inner_microstep: 1286.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-10 23:54:55,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.44 | bwd_microstep: 1598.94 | bwd_inner_microstep: 1598.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778
[2024-06-10 23:55:00,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-10 23:55:00,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.60 | bwd_microstep: 3689.06 | bwd_inner_microstep: 1878.17 | bwd_allreduce_microstep: 1810.84 | step_microstep: 37.76
[2024-06-10 23:55:00,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15871.84 | bwd: 44782.25 | bwd_inner: 42970.36 | bwd_allreduce: 1811.14 | step: 39.33
{'loss': 1.2019, 'learning_rate': 4.8738415054870735e-06, 'epoch': 0.78}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-10 23:55:02,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.28 | bwd_microstep: 1474.93 | bwd_inner_microstep: 1474.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-10 23:55:03,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.92 | bwd_microstep: 1239.92 | bwd_inner_microstep: 1239.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4296
[2024-06-10 23:55:06,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.36 | bwd_microstep: 1775.42 | bwd_inner_microstep: 1775.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-10 23:55:08,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1551.98 | bwd_inner_microstep: 1551.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-10 23:55:10,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.10 | bwd_microstep: 1277.23 | bwd_inner_microstep: 1277.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-10 23:55:11,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.59 | bwd_microstep: 1340.37 | bwd_inner_microstep: 1340.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-10 23:55:13,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1244.42 | bwd_inner_microstep: 1244.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-10 23:55:15,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.64 | bwd_microstep: 1484.46 | bwd_inner_microstep: 1484.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-10 23:55:17,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1298.46 | bwd_inner_microstep: 1298.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883
[2024-06-10 23:55:18,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.57 | bwd_microstep: 710.56 | bwd_inner_microstep: 710.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3415
[2024-06-10 23:55:20,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.91 | bwd_microstep: 1399.39 | bwd_inner_microstep: 1399.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-10 23:55:22,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.84 | bwd_microstep: 1276.66 | bwd_inner_microstep: 1276.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3700
[2024-06-10 23:55:24,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1657.87 | bwd_inner_microstep: 1657.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3965
[2024-06-10 23:55:26,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.81 | bwd_microstep: 1626.92 | bwd_inner_microstep: 1626.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-10 23:55:28,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.35 | bwd_microstep: 1349.76 | bwd_inner_microstep: 1349.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2107
[2024-06-10 23:55:29,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.96 | bwd_microstep: 825.68 | bwd_inner_microstep: 825.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-10 23:55:31,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.48 | bwd_microstep: 1408.50 | bwd_inner_microstep: 1408.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-10 23:55:33,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1276.03 | bwd_inner_microstep: 1276.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-10 23:55:35,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.01 | bwd_microstep: 1421.65 | bwd_inner_microstep: 1421.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3838
[2024-06-10 23:55:37,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.23 | bwd_microstep: 1261.52 | bwd_inner_microstep: 1261.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829
[2024-06-10 23:55:39,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.03 | bwd_microstep: 1359.57 | bwd_inner_microstep: 1359.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-10 23:55:40,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.38 | bwd_microstep: 1223.35 | bwd_inner_microstep: 1223.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-10 23:55:41,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.55 | bwd_microstep: 698.15 | bwd_inner_microstep: 698.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-10 23:55:43,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1554.70 | bwd_inner_microstep: 1554.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1905
[2024-06-10 23:55:44,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.08 | bwd_microstep: 684.93 | bwd_inner_microstep: 684.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2590
[2024-06-10 23:55:46,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.64 | bwd_microstep: 852.13 | bwd_inner_microstep: 852.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3826
[2024-06-10 23:55:48,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.43 | bwd_microstep: 1687.58 | bwd_inner_microstep: 1687.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-10 23:55:50,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.52 | bwd_microstep: 1295.06 | bwd_inner_microstep: 1295.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3571
[2024-06-10 23:55:52,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.67 | bwd_microstep: 1343.40 | bwd_inner_microstep: 1343.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592
[2024-06-10 23:55:54,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.76 | bwd_microstep: 1701.12 | bwd_inner_microstep: 1701.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3575
[2024-06-10 23:55:56,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.31 | bwd_microstep: 1425.30 | bwd_inner_microstep: 1425.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3420
[2024-06-10 23:56:01,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-10 23:56:01,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 4250.56 | bwd_inner_microstep: 1479.65 | bwd_allreduce_microstep: 2770.86 | step_microstep: 37.79
[2024-06-10 23:56:01,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15780.05 | bwd: 44977.62 | bwd_inner: 42205.86 | bwd_allreduce: 2771.08 | step: 39.36
 78%|███████▊  | 1342/1726 [23:13:31<6:33:17, 61.45s/it]
 78%|███████▊  | 1343/1726 [23:14:33<6:33:00, 61.57s/it]
 78%|███████▊  | 1344/1726 [23:15:34<6:30:37, 61.35s/it]
 78%|███████▊  | 1345/1726 [23:16:35<6:29:54, 61.40s/it]
 78%|███████▊  | 1346/1726 [23:17:36<6:28:06, 61.28s/it]
{'loss': 1.1495, 'learning_rate': 4.849312845494936e-06, 'epoch': 0.78}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-10 23:56:02,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.20 | bwd_microstep: 1276.46 | bwd_inner_microstep: 1276.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2352
[2024-06-10 23:56:04,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.61 | bwd_microstep: 891.49 | bwd_inner_microstep: 891.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3889
[2024-06-10 23:56:06,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.63 | bwd_microstep: 1583.30 | bwd_inner_microstep: 1583.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 844
[2024-06-10 23:56:06,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.67 | bwd_microstep: 346.78 | bwd_inner_microstep: 346.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3752
[2024-06-10 23:56:08,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.75 | bwd_microstep: 1339.60 | bwd_inner_microstep: 1339.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-10 23:56:09,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 789.56 | bwd_inner_microstep: 789.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-10 23:56:11,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1288.20 | bwd_inner_microstep: 1288.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-10 23:56:13,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.90 | bwd_microstep: 1281.76 | bwd_inner_microstep: 1281.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-10 23:56:14,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.77 | bwd_microstep: 796.70 | bwd_inner_microstep: 796.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4175
[2024-06-10 23:56:16,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.51 | bwd_microstep: 1487.52 | bwd_inner_microstep: 1487.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-10 23:56:18,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.21 | bwd_microstep: 1379.81 | bwd_inner_microstep: 1379.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-10 23:56:20,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.31 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3412
[2024-06-10 23:56:22,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1447.97 | bwd_inner_microstep: 1447.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2130
[2024-06-10 23:56:23,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.18 | bwd_microstep: 1024.13 | bwd_inner_microstep: 1024.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-10 23:56:25,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.93 | bwd_microstep: 1383.61 | bwd_inner_microstep: 1383.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-10 23:56:27,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.91 | bwd_microstep: 1603.74 | bwd_inner_microstep: 1603.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-10 23:56:28,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.40 | bwd_microstep: 801.47 | bwd_inner_microstep: 801.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-10 23:56:30,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.18 | bwd_microstep: 1160.66 | bwd_inner_microstep: 1160.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665
[2024-06-10 23:56:32,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.95 | bwd_microstep: 1323.84 | bwd_inner_microstep: 1323.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3601
[2024-06-10 23:56:34,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1339.72 | bwd_inner_microstep: 1339.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-10 23:56:36,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.84 | bwd_microstep: 1308.35 | bwd_inner_microstep: 1308.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-10 23:56:38,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.43 | bwd_microstep: 1555.30 | bwd_inner_microstep: 1555.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930
[2024-06-10 23:56:39,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.36 | bwd_microstep: 697.09 | bwd_inner_microstep: 697.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-10 23:56:40,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.62 | bwd_microstep: 1283.95 | bwd_inner_microstep: 1283.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-10 23:56:42,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.30 | bwd_microstep: 1497.12 | bwd_inner_microstep: 1497.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3624
[2024-06-10 23:56:45,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.08 | bwd_microstep: 1710.94 | bwd_inner_microstep: 1710.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3826
[2024-06-10 23:56:47,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.29 | bwd_microstep: 1602.05 | bwd_inner_microstep: 1602.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3391
[2024-06-10 23:56:49,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.67 | bwd_microstep: 1364.79 | bwd_inner_microstep: 1364.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-10 23:56:51,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1403.43 | bwd_inner_microstep: 1403.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2184
[2024-06-10 23:56:52,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.71 | bwd_microstep: 859.71 | bwd_inner_microstep: 859.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-10 23:56:54,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.12 | bwd_microstep: 1555.62 | bwd_inner_microstep: 1555.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3460
[2024-06-10 23:57:01,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-10 23:57:01,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 5839.35 | bwd_inner_microstep: 1487.42 | bwd_allreduce_microstep: 4351.88 | step_microstep: 37.94
[2024-06-10 23:57:01,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15039.99 | bwd: 44607.02 | bwd_inner: 40254.23 | bwd_allreduce: 4352.11 | step: 39.41
{'loss': 1.1585, 'learning_rate': 4.824837546144183e-06, 'epoch': 0.78}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-10 23:57:02,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.62 | bwd_microstep: 1362.06 | bwd_inner_microstep: 1362.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-10 23:57:04,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.84 | bwd_microstep: 1374.94 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3901
[2024-06-10 23:57:06,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.13 | bwd_microstep: 1487.39 | bwd_inner_microstep: 1487.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-10 23:57:08,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.60 | bwd_microstep: 1340.61 | bwd_inner_microstep: 1340.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876
[2024-06-10 23:57:09,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.43 | bwd_microstep: 679.31 | bwd_inner_microstep: 679.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-10 23:57:11,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.30 | bwd_microstep: 1478.32 | bwd_inner_microstep: 1478.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-10 23:57:13,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.34 | bwd_microstep: 1405.44 | bwd_inner_microstep: 1405.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-10 23:57:15,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.80 | bwd_microstep: 1244.52 | bwd_inner_microstep: 1244.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-10 23:57:17,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.94 | bwd_microstep: 1539.79 | bwd_inner_microstep: 1539.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3470
[2024-06-10 23:57:19,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.72 | bwd_microstep: 1503.79 | bwd_inner_microstep: 1503.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3687
[2024-06-10 23:57:21,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.95 | bwd_microstep: 1659.15 | bwd_inner_microstep: 1659.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645
[2024-06-10 23:57:23,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1413.11 | bwd_inner_microstep: 1413.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-10 23:57:25,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1447.19 | bwd_inner_microstep: 1447.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1967
[2024-06-10 23:57:27,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.41 | bwd_microstep: 845.52 | bwd_inner_microstep: 845.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3483
[2024-06-10 23:57:28,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.76 | bwd_microstep: 1344.94 | bwd_inner_microstep: 1344.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 23:57:30,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.42 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664
[2024-06-10 23:57:32,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.72 | bwd_microstep: 1623.87 | bwd_inner_microstep: 1623.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3852
[2024-06-10 23:57:35,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.31 | bwd_microstep: 1665.43 | bwd_inner_microstep: 1665.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-10 23:57:37,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.37 | bwd_microstep: 1659.73 | bwd_inner_microstep: 1659.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-10 23:57:39,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.23 | bwd_microstep: 1499.69 | bwd_inner_microstep: 1499.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-10 23:57:41,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.98 | bwd_microstep: 1292.79 | bwd_inner_microstep: 1292.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-10 23:57:43,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.90 | bwd_microstep: 1409.51 | bwd_inner_microstep: 1409.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820
[2024-06-10 23:57:45,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.04 | bwd_microstep: 1580.12 | bwd_inner_microstep: 1580.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-10 23:57:47,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1512.39 | bwd_inner_microstep: 1512.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-10 23:57:49,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1185.51 | bwd_inner_microstep: 1185.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444
[2024-06-10 23:57:50,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1285.21 | bwd_inner_microstep: 1285.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-10 23:57:52,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 1438.38 | bwd_inner_microstep: 1438.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2061
[2024-06-10 23:57:54,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.94 | bwd_microstep: 846.48 | bwd_inner_microstep: 846.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-10 23:57:55,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.46 | bwd_microstep: 1278.15 | bwd_inner_microstep: 1278.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2005
[2024-06-10 23:57:57,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.48 | bwd_microstep: 897.86 | bwd_inner_microstep: 897.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230
[2024-06-10 23:57:58,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.36 | bwd_microstep: 962.01 | bwd_inner_microstep: 961.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-10 23:58:01,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.06 | optimizer_step: 6.58
[2024-06-10 23:58:01,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.91 | bwd_microstep: 2851.72 | bwd_inner_microstep: 1537.61 | bwd_allreduce_microstep: 1314.05 | step_microstep: 37.72
[2024-06-10 23:58:01,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16062.29 | bwd: 44399.03 | bwd_inner: 43084.07 | bwd_allreduce: 1314.29 | step: 39.18
{'loss': 1.1965, 'learning_rate': 4.800415693636709e-06, 'epoch': 0.78}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3409
[2024-06-10 23:58:03,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1438.81 | bwd_inner_microstep: 1438.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410
[2024-06-10 23:58:05,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.85 | bwd_microstep: 1374.65 | bwd_inner_microstep: 1374.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-10 23:58:07,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.34 | bwd_microstep: 1349.61 | bwd_inner_microstep: 1349.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-10 23:58:09,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.58 | bwd_microstep: 1275.44 | bwd_inner_microstep: 1275.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760
[2024-06-10 23:58:11,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.78 | bwd_microstep: 1539.07 | bwd_inner_microstep: 1539.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2663
[2024-06-10 23:58:12,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.76 | bwd_microstep: 1022.78 | bwd_inner_microstep: 1022.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2079
[2024-06-10 23:58:14,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.93 | bwd_microstep: 789.65 | bwd_inner_microstep: 789.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-10 23:58:15,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1384.18 | bwd_inner_microstep: 1384.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 739
[2024-06-10 23:58:16,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.52 | bwd_microstep: 298.34 | bwd_inner_microstep: 298.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-10 23:58:18,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-10 23:58:19,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.51 | bwd_microstep: 1253.27 | bwd_inner_microstep: 1253.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519
[2024-06-10 23:58:21,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.78 | bwd_microstep: 1419.95 | bwd_inner_microstep: 1419.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3416
[2024-06-10 23:58:23,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.14 | bwd_microstep: 1471.01 | bwd_inner_microstep: 1470.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416
[2024-06-10 23:58:25,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.45 | bwd_microstep: 1444.96 | bwd_inner_microstep: 1444.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-10 23:58:28,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.66 | bwd_microstep: 1583.95 | bwd_inner_microstep: 1583.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-10 23:58:30,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.62 | bwd_microstep: 1499.16 | bwd_inner_microstep: 1499.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-10 23:58:32,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1397.82 | bwd_inner_microstep: 1397.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-10 23:58:33,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.24 | bwd_microstep: 1158.54 | bwd_inner_microstep: 1158.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2087
[2024-06-10 23:58:34,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.77 | bwd_microstep: 918.21 | bwd_inner_microstep: 918.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293
[2024-06-10 23:58:36,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.63 | bwd_microstep: 979.93 | bwd_inner_microstep: 979.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3531
[2024-06-10 23:58:37,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.71 | bwd_microstep: 1198.18 | bwd_inner_microstep: 1198.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3622
[2024-06-10 23:58:39,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.47 | bwd_microstep: 1244.54 | bwd_inner_microstep: 1244.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3713
[2024-06-10 23:58:41,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.23 | bwd_microstep: 1395.72 | bwd_inner_microstep: 1395.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-10 23:58:43,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.16 | bwd_microstep: 1396.85 | bwd_inner_microstep: 1396.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-10 23:58:45,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1398.35 | bwd_inner_microstep: 1398.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558
[2024-06-10 23:58:47,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.18 | bwd_microstep: 1430.55 | bwd_inner_microstep: 1430.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-10 23:58:49,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1343.47 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3600
[2024-06-10 23:58:51,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1570.16 | bwd_inner_microstep: 1570.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-10 23:58:53,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.88 | bwd_microstep: 1551.45 | bwd_inner_microstep: 1551.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3468
[2024-06-10 23:58:55,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.85 | bwd_microstep: 1424.58 | bwd_inner_microstep: 1424.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2275
[2024-06-10 23:58:57,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.68 | bwd_microstep: 1068.85 | bwd_inner_microstep: 1068.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-10 23:59:03,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.19 | optimizer_step: 6.59
[2024-06-10 23:59:03,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.26 | bwd_microstep: 6246.76 | bwd_inner_microstep: 1686.65 | bwd_allreduce_microstep: 4560.05 | step_microstep: 38.42
[2024-06-10 23:59:03,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15527.52 | bwd: 46152.48 | bwd_inner: 41591.51 | bwd_allreduce: 4560.29 | step: 39.87
{'loss': 1.1841, 'learning_rate': 4.776047373986148e-06, 'epoch': 0.78}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-10 23:59:05,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1371.67 | bwd_inner_microstep: 1371.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-10 23:59:06,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.42 | bwd_microstep: 677.62 | bwd_inner_microstep: 677.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3856
[2024-06-10 23:59:08,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.17 | bwd_microstep: 1390.20 | bwd_inner_microstep: 1390.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-10 23:59:10,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1486.84 | bwd_inner_microstep: 1486.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-10 23:59:12,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | bwd_microstep: 1277.74 | bwd_inner_microstep: 1277.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-10 23:59:14,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.87 | bwd_microstep: 1482.47 | bwd_inner_microstep: 1482.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958
[2024-06-10 23:59:15,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.22 | bwd_microstep: 700.91 | bwd_inner_microstep: 700.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-10 23:59:17,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.14 | bwd_microstep: 1154.26 | bwd_inner_microstep: 1154.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899
[2024-06-10 23:59:18,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.63 | bwd_microstep: 776.55 | bwd_inner_microstep: 776.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3768
[2024-06-10 23:59:20,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1403.34 | bwd_inner_microstep: 1403.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2987
[2024-06-10 23:59:21,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.07 | bwd_microstep: 1199.68 | bwd_inner_microstep: 1199.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-10 23:59:23,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.85 | bwd_microstep: 1390.13 | bwd_inner_microstep: 1390.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-10 23:59:25,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.18 | bwd_microstep: 1287.56 | bwd_inner_microstep: 1287.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3519
[2024-06-10 23:59:27,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.56 | bwd_microstep: 1190.57 | bwd_inner_microstep: 1190.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4094
[2024-06-10 23:59:29,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.54 | bwd_microstep: 1730.61 | bwd_inner_microstep: 1730.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-10 23:59:31,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1492.02 | bwd_inner_microstep: 1492.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-10 23:59:33,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.67 | bwd_microstep: 1470.22 | bwd_inner_microstep: 1470.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829
[2024-06-10 23:59:35,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.20 | bwd_microstep: 1758.54 | bwd_inner_microstep: 1758.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-10 23:59:37,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1445.41 | bwd_inner_microstep: 1445.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-10 23:59:39,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.84 | bwd_microstep: 1350.79 | bwd_inner_microstep: 1350.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2035
[2024-06-10 23:59:41,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.37 | bwd_microstep: 905.01 | bwd_inner_microstep: 904.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-10 23:59:42,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.58 | bwd_microstep: 1287.63 | bwd_inner_microstep: 1287.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-10 23:59:44,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.65 | bwd_microstep: 1298.66 | bwd_inner_microstep: 1298.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2281
[2024-06-10 23:59:45,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.05 | bwd_microstep: 938.11 | bwd_inner_microstep: 938.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-10 23:59:47,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.05 | bwd_microstep: 1347.32 | bwd_inner_microstep: 1347.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-10 23:59:49,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.01 | bwd_microstep: 1524.15 | bwd_inner_microstep: 1524.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3827
[2024-06-10 23:59:52,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.00 | bwd_microstep: 1705.91 | bwd_inner_microstep: 1705.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2274
[2024-06-10 23:59:53,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.05 | bwd_microstep: 1005.73 | bwd_inner_microstep: 1005.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3450
[2024-06-10 23:59:55,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.92 | bwd_microstep: 1187.87 | bwd_inner_microstep: 1187.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3608
[2024-06-10 23:59:57,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.03 | bwd_microstep: 1371.65 | bwd_inner_microstep: 1371.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-10 23:59:58,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.42 | bwd_microstep: 1182.89 | bwd_inner_microstep: 1182.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-11 00:00:04,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-11 00:00:04,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.10 | bwd_microstep: 4558.14 | bwd_inner_microstep: 1579.46 | bwd_allreduce_microstep: 2978.63 | step_microstep: 37.77
[2024-06-11 00:00:04,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15451.19 | bwd: 44350.22 | bwd_inner: 41370.68 | bwd_allreduce: 2978.86 | step: 39.22
{'loss': 1.1935, 'learning_rate': 4.751732673017589e-06, 'epoch': 0.78}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404
[2024-06-11 00:00:05,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.31 | bwd_microstep: 1177.16 | bwd_inner_microstep: 1177.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 00:00:07,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.12 | bwd_microstep: 1276.86 | bwd_inner_microstep: 1276.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-11 00:00:08,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.11 | bwd_microstep: 792.54 | bwd_inner_microstep: 792.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3802
[2024-06-11 00:00:10,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.93 | bwd_microstep: 1479.19 | bwd_inner_microstep: 1479.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3885
[2024-06-11 00:00:12,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.22 | bwd_microstep: 1516.29 | bwd_inner_microstep: 1516.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2766
[2024-06-11 00:00:14,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.92 | bwd_microstep: 1081.44 | bwd_inner_microstep: 1081.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 00:00:16,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.30 | bwd_microstep: 1382.84 | bwd_inner_microstep: 1382.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728
[2024-06-11 00:00:18,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.04 | bwd_microstep: 1633.59 | bwd_inner_microstep: 1633.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3719
[2024-06-11 00:00:20,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1235.85 | bwd_inner_microstep: 1235.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-11 00:00:22,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.39 | bwd_microstep: 1552.07 | bwd_inner_microstep: 1552.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-11 00:00:23,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.68 | bwd_microstep: 680.52 | bwd_inner_microstep: 680.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3654
[2024-06-11 00:00:24,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.66 | bwd_microstep: 1325.33 | bwd_inner_microstep: 1325.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2199
[2024-06-11 00:00:26,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.27 | bwd_microstep: 858.31 | bwd_inner_microstep: 858.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 4030
[2024-06-11 00:00:28,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.99 | bwd_microstep: 1749.03 | bwd_inner_microstep: 1749.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2123
[2024-06-11 00:00:29,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.65 | bwd_microstep: 833.60 | bwd_inner_microstep: 833.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3927
[2024-06-11 00:00:31,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1491.55 | bwd_inner_microstep: 1491.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694
[2024-06-11 00:00:33,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.20 | bwd_microstep: 1423.44 | bwd_inner_microstep: 1423.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-11 00:00:35,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.95 | bwd_microstep: 1474.02 | bwd_inner_microstep: 1474.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3661
[2024-06-11 00:00:38,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.70 | bwd_microstep: 1654.18 | bwd_inner_microstep: 1654.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469
[2024-06-11 00:00:39,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.76 | bwd_microstep: 1181.08 | bwd_inner_microstep: 1181.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 00:00:41,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.13 | bwd_microstep: 1281.14 | bwd_inner_microstep: 1281.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3538
[2024-06-11 00:00:43,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.71 | bwd_microstep: 1259.47 | bwd_inner_microstep: 1259.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451
[2024-06-11 00:00:45,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.32 | bwd_microstep: 1379.29 | bwd_inner_microstep: 1379.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-11 00:00:46,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.08 | bwd_microstep: 1258.17 | bwd_inner_microstep: 1258.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-11 00:00:48,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.94 | bwd_microstep: 1525.17 | bwd_inner_microstep: 1525.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-11 00:00:50,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.97 | bwd_microstep: 1418.82 | bwd_inner_microstep: 1418.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-11 00:00:52,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1395.87 | bwd_inner_microstep: 1395.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-11 00:00:54,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1412.37 | bwd_inner_microstep: 1412.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2919
[2024-06-11 00:00:56,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.54 | bwd_microstep: 1187.73 | bwd_inner_microstep: 1187.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667
[2024-06-11 00:00:58,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.40 | bwd_microstep: 1419.03 | bwd_inner_microstep: 1419.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-11 00:01:00,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.20 | bwd_microstep: 1507.64 | bwd_inner_microstep: 1507.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-11 00:01:04,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.20 | optimizer_step: 6.63
[2024-06-11 00:01:04,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.42 | bwd_microstep: 3066.23 | bwd_inner_microstep: 1810.89 | bwd_allreduce_microstep: 1255.29 | step_microstep: 37.93
[2024-06-11 00:01:04,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15924.51 | bwd: 43909.87 | bwd_inner: 42653.68 | bwd_allreduce: 1255.52 | step: 39.40
 78%|███████▊  | 1347/1726 [23:18:37<6:26:42, 61.22s/it]
 78%|███████▊  | 1348/1726 [23:19:37<6:23:19, 60.85s/it]
 78%|███████▊  | 1349/1726 [23:20:38<6:22:12, 60.83s/it]
 78%|███████▊  | 1350/1726 [23:21:40<6:23:24, 61.18s/it]
 78%|███████▊  | 1351/1726 [23:22:40<6:20:24, 60.86s/it]
{'loss': 1.1208, 'learning_rate': 4.727471676367299e-06, 'epoch': 0.78}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-11 00:01:05,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.07 | bwd_microstep: 1148.49 | bwd_inner_microstep: 1148.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 00:01:07,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.77 | bwd_microstep: 1376.81 | bwd_inner_microstep: 1376.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-11 00:01:09,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.18 | bwd_microstep: 1454.95 | bwd_inner_microstep: 1454.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4171
[2024-06-11 00:01:11,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.77 | bwd_microstep: 1648.70 | bwd_inner_microstep: 1648.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780
[2024-06-11 00:01:14,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.19 | bwd_microstep: 1643.62 | bwd_inner_microstep: 1643.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730
[2024-06-11 00:01:16,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 1531.36 | bwd_inner_microstep: 1531.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 00:01:18,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.88 | bwd_microstep: 1384.01 | bwd_inner_microstep: 1383.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 00:01:20,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1379.26 | bwd_inner_microstep: 1379.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3478
[2024-06-11 00:01:22,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.51 | bwd_microstep: 1443.08 | bwd_inner_microstep: 1443.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3698
[2024-06-11 00:01:24,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.09 | bwd_microstep: 1717.57 | bwd_inner_microstep: 1717.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 00:01:26,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1338.77 | bwd_inner_microstep: 1338.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-11 00:01:28,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.80 | bwd_microstep: 1490.26 | bwd_inner_microstep: 1490.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 00:01:30,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.66 | bwd_microstep: 1377.39 | bwd_inner_microstep: 1377.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-11 00:01:32,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.11 | bwd_microstep: 1600.89 | bwd_inner_microstep: 1600.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553
[2024-06-11 00:01:34,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.32 | bwd_microstep: 1586.31 | bwd_inner_microstep: 1586.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2319
[2024-06-11 00:01:36,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.19 | bwd_microstep: 986.51 | bwd_inner_microstep: 986.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1968
[2024-06-11 00:01:37,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.64 | bwd_microstep: 731.78 | bwd_inner_microstep: 731.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2038
[2024-06-11 00:01:38,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.81 | bwd_microstep: 810.31 | bwd_inner_microstep: 810.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597
[2024-06-11 00:01:40,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.52 | bwd_microstep: 1507.39 | bwd_inner_microstep: 1507.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 00:01:42,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.40 | bwd_microstep: 1257.46 | bwd_inner_microstep: 1257.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-11 00:01:43,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.38 | bwd_microstep: 1158.00 | bwd_inner_microstep: 1157.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648
[2024-06-11 00:01:45,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.20 | bwd_microstep: 1411.04 | bwd_inner_microstep: 1411.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-11 00:01:47,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.27 | bwd_microstep: 1391.24 | bwd_inner_microstep: 1391.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-11 00:01:49,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.71 | bwd_microstep: 1405.16 | bwd_inner_microstep: 1405.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2022
[2024-06-11 00:01:50,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.36 | bwd_microstep: 902.33 | bwd_inner_microstep: 902.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3475
[2024-06-11 00:01:52,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.95 | bwd_microstep: 1245.74 | bwd_inner_microstep: 1245.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3512
[2024-06-11 00:01:54,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.13 | bwd_microstep: 1222.94 | bwd_inner_microstep: 1222.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-11 00:01:56,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1416.78 | bwd_inner_microstep: 1416.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3580
[2024-06-11 00:01:57,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.64 | bwd_microstep: 1334.66 | bwd_inner_microstep: 1334.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-11 00:02:00,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.27 | bwd_microstep: 1603.36 | bwd_inner_microstep: 1603.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 00:02:02,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.83 | bwd_microstep: 1495.45 | bwd_inner_microstep: 1495.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-11 00:02:05,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-11 00:02:05,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.85 | bwd_microstep: 2625.72 | bwd_inner_microstep: 1738.08 | bwd_allreduce_microstep: 887.58 | step_microstep: 37.66
[2024-06-11 00:02:05,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16305.79 | bwd: 44627.37 | bwd_inner: 43738.87 | bwd_allreduce: 887.81 | step: 39.16
{'loss': 1.143, 'learning_rate': 4.703264469482358e-06, 'epoch': 0.78}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 00:02:07,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 1367.90 | bwd_inner_microstep: 1367.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2366
[2024-06-11 00:02:08,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.02 | bwd_microstep: 953.18 | bwd_inner_microstep: 953.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3861
[2024-06-11 00:02:10,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.49 | bwd_microstep: 1660.32 | bwd_inner_microstep: 1660.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3774
[2024-06-11 00:02:12,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.63 | bwd_microstep: 1341.52 | bwd_inner_microstep: 1341.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-11 00:02:14,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.20 | bwd_microstep: 1404.82 | bwd_inner_microstep: 1404.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2191
[2024-06-11 00:02:16,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.42 | bwd_microstep: 920.60 | bwd_inner_microstep: 920.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-11 00:02:18,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.21 | bwd_microstep: 1632.98 | bwd_inner_microstep: 1632.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-11 00:02:19,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.30 | bwd_microstep: 699.49 | bwd_inner_microstep: 699.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 00:02:21,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.66 | bwd_microstep: 1390.89 | bwd_inner_microstep: 1390.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 00:02:23,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1392.61 | bwd_inner_microstep: 1392.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-11 00:02:25,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1452.76 | bwd_inner_microstep: 1452.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2192
[2024-06-11 00:02:26,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.40 | bwd_microstep: 1055.43 | bwd_inner_microstep: 1055.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 00:02:28,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.78 | bwd_microstep: 1385.73 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3380
[2024-06-11 00:02:30,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.48 | bwd_microstep: 1274.14 | bwd_inner_microstep: 1274.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2949
[2024-06-11 00:02:31,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.82 | bwd_microstep: 1101.37 | bwd_inner_microstep: 1101.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 00:02:33,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1247.35 | bwd_inner_microstep: 1247.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-11 00:02:35,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1512.46 | bwd_inner_microstep: 1512.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-11 00:02:37,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.92 | bwd_microstep: 1532.38 | bwd_inner_microstep: 1532.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-11 00:02:39,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.63 | bwd_microstep: 1511.38 | bwd_inner_microstep: 1511.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3526
[2024-06-11 00:02:41,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1423.53 | bwd_inner_microstep: 1423.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683
[2024-06-11 00:02:43,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.74 | bwd_microstep: 1425.70 | bwd_inner_microstep: 1425.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2653
[2024-06-11 00:02:45,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.20 | bwd_microstep: 1052.99 | bwd_inner_microstep: 1052.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-11 00:02:47,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.74 | bwd_microstep: 1461.62 | bwd_inner_microstep: 1461.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-11 00:02:49,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1412.66 | bwd_inner_microstep: 1412.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 00:02:51,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.15 | bwd_microstep: 1399.80 | bwd_inner_microstep: 1399.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-11 00:02:52,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.08 | bwd_microstep: 1412.88 | bwd_inner_microstep: 1412.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3807
[2024-06-11 00:02:54,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.58 | bwd_microstep: 1289.12 | bwd_inner_microstep: 1289.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-11 00:02:56,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.76 | bwd_microstep: 1508.51 | bwd_inner_microstep: 1508.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3491
[2024-06-11 00:02:58,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.97 | bwd_microstep: 1221.03 | bwd_inner_microstep: 1221.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3636
[2024-06-11 00:03:00,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.94 | bwd_microstep: 1540.88 | bwd_inner_microstep: 1540.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470
[2024-06-11 00:03:02,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.81 | bwd_microstep: 1342.71 | bwd_inner_microstep: 1342.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2729
[2024-06-11 00:03:08,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-11 00:03:08,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.68 | bwd_microstep: 5175.16 | bwd_inner_microstep: 1320.44 | bwd_allreduce_microstep: 3854.66 | step_microstep: 38.20
[2024-06-11 00:03:08,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15900.59 | bwd: 46503.94 | bwd_inner: 42648.33 | bwd_allreduce: 3854.91 | step: 39.68
{'loss': 1.2024, 'learning_rate': 4.679111137620442e-06, 'epoch': 0.78}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 00:03:10,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1365.25 | bwd_inner_microstep: 1365.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2385
[2024-06-11 00:03:11,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.86 | bwd_microstep: 898.29 | bwd_inner_microstep: 898.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3881
[2024-06-11 00:03:13,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.21 | bwd_microstep: 1481.26 | bwd_inner_microstep: 1481.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4093
[2024-06-11 00:03:15,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.58 | bwd_microstep: 1626.19 | bwd_inner_microstep: 1626.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3792
[2024-06-11 00:03:17,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1407.68 | bwd_inner_microstep: 1407.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3773
[2024-06-11 00:03:19,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.52 | bwd_microstep: 1488.27 | bwd_inner_microstep: 1488.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3398
[2024-06-11 00:03:21,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.33 | bwd_microstep: 1179.76 | bwd_inner_microstep: 1179.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-11 00:03:23,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.46 | bwd_microstep: 1316.11 | bwd_inner_microstep: 1316.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488
[2024-06-11 00:03:24,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.10 | bwd_microstep: 1343.11 | bwd_inner_microstep: 1343.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-11 00:03:26,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1342.95 | bwd_inner_microstep: 1342.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2711
[2024-06-11 00:03:28,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.77 | bwd_microstep: 1129.34 | bwd_inner_microstep: 1129.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-11 00:03:29,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.81 | bwd_microstep: 793.08 | bwd_inner_microstep: 793.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-11 00:03:31,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.85 | bwd_microstep: 1247.58 | bwd_inner_microstep: 1247.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-11 00:03:33,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.51 | bwd_microstep: 1492.19 | bwd_inner_microstep: 1492.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 00:03:35,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.96 | bwd_microstep: 1397.16 | bwd_inner_microstep: 1397.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3504
[2024-06-11 00:03:36,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.86 | bwd_microstep: 1190.74 | bwd_inner_microstep: 1190.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 00:03:38,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.10 | bwd_microstep: 1288.82 | bwd_inner_microstep: 1288.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2129
[2024-06-11 00:03:39,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.85 | bwd_microstep: 831.31 | bwd_inner_microstep: 831.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-11 00:03:41,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.40 | bwd_microstep: 1412.01 | bwd_inner_microstep: 1411.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-11 00:03:43,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.01 | bwd_microstep: 1499.13 | bwd_inner_microstep: 1499.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-11 00:03:45,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.30 | bwd_microstep: 1416.98 | bwd_inner_microstep: 1416.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-11 00:03:47,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 975.24 | bwd_inner_microstep: 975.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-11 00:03:49,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.84 | bwd_microstep: 1652.92 | bwd_inner_microstep: 1652.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-11 00:03:51,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.83 | bwd_microstep: 1293.61 | bwd_inner_microstep: 1293.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-11 00:03:52,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.22 | bwd_microstep: 804.26 | bwd_inner_microstep: 804.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 00:03:54,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.20 | bwd_microstep: 1351.24 | bwd_inner_microstep: 1351.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3583
[2024-06-11 00:03:55,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.80 | bwd_microstep: 1332.39 | bwd_inner_microstep: 1332.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-11 00:03:57,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.12 | bwd_microstep: 1462.16 | bwd_inner_microstep: 1462.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2280
[2024-06-11 00:03:59,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.85 | bwd_microstep: 909.46 | bwd_inner_microstep: 909.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3470
[2024-06-11 00:04:01,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1403.41 | bwd_inner_microstep: 1403.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 00:04:03,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.98 | bwd_microstep: 1641.84 | bwd_inner_microstep: 1641.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3010
[2024-06-11 00:04:08,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-11 00:04:08,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.92 | bwd_microstep: 4843.59 | bwd_inner_microstep: 1387.24 | bwd_allreduce_microstep: 3456.28 | step_microstep: 39.23
[2024-06-11 00:04:08,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15481.99 | bwd: 44817.34 | bwd_inner: 41360.13 | bwd_allreduce: 3456.53 | step: 40.74
{'loss': 1.2197, 'learning_rate': 4.655011765849448e-06, 'epoch': 0.78}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3402
[2024-06-11 00:04:10,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 1289.20 | bwd_inner_microstep: 1289.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3941
[2024-06-11 00:04:12,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.46 | bwd_microstep: 1691.81 | bwd_inner_microstep: 1691.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 00:04:14,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.28 | bwd_microstep: 1342.28 | bwd_inner_microstep: 1342.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2250
[2024-06-11 00:04:15,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.94 | bwd_microstep: 867.72 | bwd_inner_microstep: 867.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3842
[2024-06-11 00:04:18,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.02 | bwd_microstep: 1660.91 | bwd_inner_microstep: 1660.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 00:04:20,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730
[2024-06-11 00:04:22,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.26 | bwd_microstep: 1529.44 | bwd_inner_microstep: 1529.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 00:04:24,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.91 | bwd_microstep: 1384.41 | bwd_inner_microstep: 1384.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3474
[2024-06-11 00:04:26,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1326.96 | bwd_inner_microstep: 1326.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 00:04:27,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1245.02 | bwd_inner_microstep: 1245.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-11 00:04:29,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.14 | bwd_microstep: 1308.12 | bwd_inner_microstep: 1308.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3454
[2024-06-11 00:04:31,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1301.44 | bwd_inner_microstep: 1301.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 00:04:33,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 1252.73 | bwd_inner_microstep: 1252.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2482
[2024-06-11 00:04:34,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.02 | bwd_microstep: 1142.18 | bwd_inner_microstep: 1142.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-11 00:04:36,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.88 | bwd_microstep: 1408.06 | bwd_inner_microstep: 1408.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933
[2024-06-11 00:04:38,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1594.29 | bwd_inner_microstep: 1594.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3864
[2024-06-11 00:04:40,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.88 | bwd_microstep: 1301.40 | bwd_inner_microstep: 1301.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471
[2024-06-11 00:04:42,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.83 | bwd_microstep: 1214.00 | bwd_inner_microstep: 1213.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-11 00:04:44,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1491.35 | bwd_inner_microstep: 1491.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-11 00:04:46,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.66 | bwd_microstep: 1429.89 | bwd_inner_microstep: 1429.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 00:04:48,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.48 | bwd_microstep: 1281.56 | bwd_inner_microstep: 1281.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1265
[2024-06-11 00:04:48,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.47 | bwd_microstep: 455.55 | bwd_inner_microstep: 455.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 763
[2024-06-11 00:04:49,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.73 | bwd_microstep: 302.72 | bwd_inner_microstep: 302.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-11 00:04:51,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.62 | bwd_microstep: 1451.91 | bwd_inner_microstep: 1451.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-11 00:04:53,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.89 | bwd_microstep: 1400.38 | bwd_inner_microstep: 1400.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3607
[2024-06-11 00:04:55,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.04 | bwd_microstep: 1535.60 | bwd_inner_microstep: 1535.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3812
[2024-06-11 00:04:57,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1498.70 | bwd_inner_microstep: 1498.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2283
[2024-06-11 00:04:58,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.94 | bwd_microstep: 1071.81 | bwd_inner_microstep: 1071.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3815
[2024-06-11 00:05:01,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.99 | bwd_microstep: 1620.80 | bwd_inner_microstep: 1620.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-11 00:05:02,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1397.69 | bwd_inner_microstep: 1397.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3401
[2024-06-11 00:05:04,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1398.03 | bwd_inner_microstep: 1398.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2049
[2024-06-11 00:05:11,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-11 00:05:11,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.28 | bwd_microstep: 6628.66 | bwd_inner_microstep: 1039.47 | bwd_allreduce_microstep: 5589.11 | step_microstep: 38.78
[2024-06-11 00:05:11,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15561.70 | bwd: 47205.95 | bwd_inner: 41615.90 | bwd_allreduce: 5589.35 | step: 40.25
{'loss': 1.1146, 'learning_rate': 4.630966439047255e-06, 'epoch': 0.79}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-11 00:05:13,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.38 | bwd_microstep: 882.51 | bwd_inner_microstep: 882.38 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3956
[2024-06-11 00:05:15,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1556.70 | bwd_inner_microstep: 1556.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3888
[2024-06-11 00:05:17,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.21 | bwd_microstep: 1578.52 | bwd_inner_microstep: 1578.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 00:05:19,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.28 | bwd_microstep: 1372.23 | bwd_inner_microstep: 1372.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 00:05:21,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.27 | bwd_microstep: 1275.15 | bwd_inner_microstep: 1275.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 00:05:23,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1380.10 | bwd_inner_microstep: 1380.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 00:05:24,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.40 | bwd_microstep: 1373.71 | bwd_inner_microstep: 1373.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 00:05:26,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1384.79 | bwd_inner_microstep: 1384.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 00:05:28,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1380.38 | bwd_inner_microstep: 1380.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-11 00:05:30,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.75 | bwd_microstep: 1307.31 | bwd_inner_microstep: 1307.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3630
[2024-06-11 00:05:32,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1374.38 | bwd_inner_microstep: 1374.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 00:05:34,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.01 | bwd_microstep: 1382.58 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-11 00:05:36,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.74 | bwd_microstep: 1602.77 | bwd_inner_microstep: 1602.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3491
[2024-06-11 00:05:38,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.82 | bwd_microstep: 1512.78 | bwd_inner_microstep: 1512.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-11 00:05:40,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.07 | bwd_microstep: 1251.67 | bwd_inner_microstep: 1251.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2521
[2024-06-11 00:05:41,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.48 | bwd_microstep: 867.34 | bwd_inner_microstep: 867.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-11 00:05:43,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.10 | bwd_microstep: 1612.81 | bwd_inner_microstep: 1612.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-11 00:05:45,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.86 | bwd_microstep: 1488.18 | bwd_inner_microstep: 1488.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-11 00:05:47,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.78 | bwd_microstep: 1510.15 | bwd_inner_microstep: 1510.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-11 00:05:49,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.54 | bwd_microstep: 1309.47 | bwd_inner_microstep: 1309.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-11 00:05:50,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 797.66 | bwd_inner_microstep: 797.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3907
[2024-06-11 00:05:53,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.78 | bwd_microstep: 1697.97 | bwd_inner_microstep: 1697.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2147
[2024-06-11 00:05:54,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.99 | bwd_microstep: 853.98 | bwd_inner_microstep: 853.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3825
[2024-06-11 00:05:56,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.04 | bwd_microstep: 1718.54 | bwd_inner_microstep: 1718.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518
[2024-06-11 00:05:58,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.03 | bwd_microstep: 1584.71 | bwd_inner_microstep: 1584.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605
[2024-06-11 00:06:00,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.39 | bwd_microstep: 1432.62 | bwd_inner_microstep: 1432.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-11 00:06:03,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.67 | bwd_microstep: 1545.47 | bwd_inner_microstep: 1545.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3594
[2024-06-11 00:06:05,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.62 | bwd_microstep: 1700.03 | bwd_inner_microstep: 1700.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3472
[2024-06-11 00:06:07,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.74 | bwd_microstep: 1443.24 | bwd_inner_microstep: 1443.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3570
[2024-06-11 00:06:09,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.64 | bwd_microstep: 1206.36 | bwd_inner_microstep: 1206.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2237
[2024-06-11 00:06:10,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.15 | bwd_microstep: 898.09 | bwd_inner_microstep: 898.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-11 00:06:12,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.04 | optimizer_step: 6.65
[2024-06-11 00:06:12,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.32 | bwd_microstep: 1535.30 | bwd_inner_microstep: 1527.54 | bwd_allreduce_microstep: 7.71 | step_microstep: 37.49
[2024-06-11 00:06:12,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16344.13 | bwd: 43817.50 | bwd_inner: 43808.80 | bwd_allreduce: 7.98 | step: 39.01
 78%|███████▊  | 1352/1726 [23:23:40<6:18:05, 60.66s/it]
 78%|███████▊  | 1353/1726 [23:24:42<6:18:13, 60.84s/it]
 78%|███████▊  | 1354/1726 [23:25:44<6:20:44, 61.41s/it]
 79%|███████▊  | 1355/1726 [23:26:45<6:18:15, 61.17s/it]
 79%|███████▊  | 1356/1726 [23:27:48<6:20:48, 61.75s/it]
{'loss': 1.1865, 'learning_rate': 4.606975241901354e-06, 'epoch': 0.79}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 00:06:14,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.22 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-11 00:06:16,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1244.74 | bwd_inner_microstep: 1244.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-11 00:06:18,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1489.48 | bwd_inner_microstep: 1489.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841
[2024-06-11 00:06:20,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.35 | bwd_microstep: 1660.19 | bwd_inner_microstep: 1660.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3758
[2024-06-11 00:06:22,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1486.48 | bwd_inner_microstep: 1486.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 00:06:24,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.61 | bwd_microstep: 1285.87 | bwd_inner_microstep: 1285.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 00:06:26,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.91 | bwd_microstep: 1285.90 | bwd_inner_microstep: 1285.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2214
[2024-06-11 00:06:27,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.23 | bwd_microstep: 860.73 | bwd_inner_microstep: 860.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-11 00:06:28,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.72 | bwd_microstep: 1187.45 | bwd_inner_microstep: 1187.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-11 00:06:30,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1442.87 | bwd_inner_microstep: 1442.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 889
[2024-06-11 00:06:31,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 153.63 | bwd_microstep: 401.83 | bwd_inner_microstep: 401.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1912
[2024-06-11 00:06:32,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.55 | bwd_microstep: 796.79 | bwd_inner_microstep: 796.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-11 00:06:34,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1494.89 | bwd_inner_microstep: 1494.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1412
[2024-06-11 00:06:35,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 206.74 | bwd_microstep: 537.09 | bwd_inner_microstep: 537.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-11 00:06:37,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1483.22 | bwd_inner_microstep: 1483.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-11 00:06:39,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.80 | bwd_microstep: 1387.98 | bwd_inner_microstep: 1387.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 00:06:41,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.05 | bwd_microstep: 1392.71 | bwd_inner_microstep: 1392.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-11 00:06:43,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.26 | bwd_microstep: 1526.77 | bwd_inner_microstep: 1526.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-11 00:06:45,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.67 | bwd_microstep: 1430.17 | bwd_inner_microstep: 1430.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-11 00:06:47,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1295.90 | bwd_inner_microstep: 1295.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-11 00:06:49,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.80 | bwd_microstep: 1608.86 | bwd_inner_microstep: 1608.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-11 00:06:50,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.82 | bwd_microstep: 790.92 | bwd_inner_microstep: 790.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-11 00:06:51,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.51 | bwd_microstep: 702.47 | bwd_inner_microstep: 702.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-11 00:06:52,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.19 | bwd_microstep: 975.94 | bwd_inner_microstep: 975.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3620
[2024-06-11 00:06:54,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.05 | bwd_microstep: 1343.29 | bwd_inner_microstep: 1343.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-11 00:06:56,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.67 | bwd_microstep: 1443.74 | bwd_inner_microstep: 1443.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-11 00:06:58,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.88 | bwd_microstep: 1438.67 | bwd_inner_microstep: 1438.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1937
[2024-06-11 00:06:59,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.93 | bwd_microstep: 727.68 | bwd_inner_microstep: 727.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3602
[2024-06-11 00:07:01,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.21 | bwd_microstep: 1705.47 | bwd_inner_microstep: 1705.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3839
[2024-06-11 00:07:03,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1480.31 | bwd_inner_microstep: 1480.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3428
[2024-06-11 00:07:05,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.83 | bwd_microstep: 1377.18 | bwd_inner_microstep: 1377.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-11 00:07:12,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 00:07:12,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 6418.28 | bwd_inner_microstep: 2106.39 | bwd_allreduce_microstep: 4311.84 | step_microstep: 37.85
[2024-06-11 00:07:12,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15083.18 | bwd: 45082.62 | bwd_inner: 40769.88 | bwd_allreduce: 4312.07 | step: 39.58
{'loss': 1.2323, 'learning_rate': 4.583038258908641e-06, 'epoch': 0.79}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-11 00:07:14,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.88 | bwd_microstep: 1268.06 | bwd_inner_microstep: 1268.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3888
[2024-06-11 00:07:16,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.71 | bwd_microstep: 1579.50 | bwd_inner_microstep: 1579.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-11 00:07:18,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1340.50 | bwd_inner_microstep: 1340.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3909
[2024-06-11 00:07:20,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.27 | bwd_microstep: 1521.94 | bwd_inner_microstep: 1521.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3805
[2024-06-11 00:07:23,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.90 | bwd_microstep: 1598.28 | bwd_inner_microstep: 1598.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-11 00:07:24,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1387.30 | bwd_inner_microstep: 1387.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-11 00:07:26,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.32 | bwd_microstep: 804.58 | bwd_inner_microstep: 804.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2089
[2024-06-11 00:07:27,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.62 | bwd_microstep: 819.54 | bwd_inner_microstep: 819.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716
[2024-06-11 00:07:29,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.30 | bwd_microstep: 1632.23 | bwd_inner_microstep: 1632.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191
[2024-06-11 00:07:30,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.80 | bwd_microstep: 953.82 | bwd_inner_microstep: 953.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-11 00:07:32,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.46 | bwd_microstep: 1286.63 | bwd_inner_microstep: 1286.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3994
[2024-06-11 00:07:34,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.91 | bwd_microstep: 1611.01 | bwd_inner_microstep: 1610.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-11 00:07:36,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1249.01 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025
[2024-06-11 00:07:37,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.47 | bwd_microstep: 903.02 | bwd_inner_microstep: 902.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512
[2024-06-11 00:07:39,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.33 | bwd_microstep: 1322.46 | bwd_inner_microstep: 1322.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-11 00:07:41,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.09 | bwd_microstep: 1512.50 | bwd_inner_microstep: 1512.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 563
[2024-06-11 00:07:41,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 98.78 | bwd_microstep: 248.14 | bwd_inner_microstep: 248.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-11 00:07:43,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.91 | bwd_microstep: 1312.37 | bwd_inner_microstep: 1312.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 00:07:45,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.81 | bwd_microstep: 1386.88 | bwd_inner_microstep: 1386.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3474
[2024-06-11 00:07:47,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.35 | bwd_microstep: 1232.83 | bwd_inner_microstep: 1232.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-11 00:07:49,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.39 | bwd_microstep: 1433.54 | bwd_inner_microstep: 1433.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714
[2024-06-11 00:07:51,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1437.89 | bwd_inner_microstep: 1437.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442
[2024-06-11 00:07:52,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.68 | bwd_microstep: 1159.89 | bwd_inner_microstep: 1159.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-11 00:07:54,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.85 | bwd_microstep: 1437.74 | bwd_inner_microstep: 1437.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534
[2024-06-11 00:07:56,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1323.29 | bwd_inner_microstep: 1323.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1923
[2024-06-11 00:07:57,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.56 | bwd_microstep: 760.34 | bwd_inner_microstep: 760.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 00:07:59,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.42 | bwd_microstep: 1253.02 | bwd_inner_microstep: 1253.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 00:08:01,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.13 | bwd_microstep: 1567.85 | bwd_inner_microstep: 1567.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-11 00:08:03,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1498.85 | bwd_inner_microstep: 1498.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 00:08:05,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.80 | bwd_microstep: 1354.25 | bwd_inner_microstep: 1354.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2436
[2024-06-11 00:08:07,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.66 | bwd_microstep: 974.34 | bwd_inner_microstep: 974.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276
[2024-06-11 00:08:16,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-11 00:08:16,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.66 | bwd_microstep: 8744.51 | bwd_inner_microstep: 1213.56 | bwd_allreduce_microstep: 7530.90 | step_microstep: 38.01
[2024-06-11 00:08:16,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15096.94 | bwd: 47916.14 | bwd_inner: 40384.35 | bwd_allreduce: 7531.12 | step: 39.46
{'loss': 1.2061, 'learning_rate': 4.559155574375025e-06, 'epoch': 0.79}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480
[2024-06-11 00:08:18,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 1333.45 | bwd_inner_microstep: 1333.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-11 00:08:19,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.53 | bwd_microstep: 1143.09 | bwd_inner_microstep: 1143.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3894
[2024-06-11 00:08:21,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.10 | bwd_microstep: 1543.79 | bwd_inner_microstep: 1543.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3778
[2024-06-11 00:08:24,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.48 | bwd_microstep: 1608.38 | bwd_inner_microstep: 1608.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-11 00:08:25,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.56 | bwd_microstep: 789.72 | bwd_inner_microstep: 789.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 00:08:26,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1243.78 | bwd_inner_microstep: 1243.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2152
[2024-06-11 00:08:28,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.78 | bwd_microstep: 848.87 | bwd_inner_microstep: 848.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957
[2024-06-11 00:08:29,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 702.95 | bwd_inner_microstep: 702.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 00:08:30,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1245.88 | bwd_inner_microstep: 1245.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-11 00:08:32,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.60 | bwd_microstep: 1276.27 | bwd_inner_microstep: 1276.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-11 00:08:34,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.42 | bwd_microstep: 1290.52 | bwd_inner_microstep: 1290.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3492
[2024-06-11 00:08:36,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1393.55 | bwd_inner_microstep: 1393.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2199
[2024-06-11 00:08:37,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.28 | bwd_microstep: 958.19 | bwd_inner_microstep: 958.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465
[2024-06-11 00:08:39,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.24 | bwd_microstep: 1566.39 | bwd_inner_microstep: 1566.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-11 00:08:41,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.38 | bwd_microstep: 1418.28 | bwd_inner_microstep: 1418.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-11 00:08:43,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.68 | bwd_microstep: 1489.58 | bwd_inner_microstep: 1489.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3973
[2024-06-11 00:08:45,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.28 | bwd_microstep: 1610.89 | bwd_inner_microstep: 1610.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-11 00:08:47,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.77 | bwd_microstep: 1185.85 | bwd_inner_microstep: 1185.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623
[2024-06-11 00:08:49,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.27 | bwd_microstep: 1609.42 | bwd_inner_microstep: 1609.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 00:08:51,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.12 | bwd_microstep: 1452.44 | bwd_inner_microstep: 1452.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 00:08:53,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.51 | bwd_microstep: 1392.67 | bwd_inner_microstep: 1392.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-11 00:08:55,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.73 | bwd_microstep: 1525.81 | bwd_inner_microstep: 1525.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-11 00:08:56,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.26 | bwd_microstep: 801.66 | bwd_inner_microstep: 801.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2084
[2024-06-11 00:08:58,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.91 | bwd_microstep: 821.80 | bwd_inner_microstep: 821.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2926
[2024-06-11 00:08:59,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.76 | bwd_microstep: 1095.27 | bwd_inner_microstep: 1095.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 896
[2024-06-11 00:09:00,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.28 | bwd_microstep: 369.75 | bwd_inner_microstep: 369.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3806
[2024-06-11 00:09:02,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1383.75 | bwd_inner_microstep: 1383.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722
[2024-06-11 00:09:04,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.06 | bwd_microstep: 1439.02 | bwd_inner_microstep: 1438.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-11 00:09:06,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 1498.09 | bwd_inner_microstep: 1498.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3754
[2024-06-11 00:09:08,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.19 | bwd_microstep: 1517.51 | bwd_inner_microstep: 1517.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-11 00:09:09,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.72 | bwd_microstep: 1246.72 | bwd_inner_microstep: 1246.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-11 00:09:18,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.14 | optimizer_step: 6.57
[2024-06-11 00:09:18,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.50 | bwd_microstep: 7845.02 | bwd_inner_microstep: 1690.44 | bwd_allreduce_microstep: 6154.53 | step_microstep: 38.73
[2024-06-11 00:09:18,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15133.62 | bwd: 46648.37 | bwd_inner: 40492.93 | bwd_allreduce: 6154.76 | step: 40.21
{'loss': 1.1564, 'learning_rate': 4.535327272415215e-06, 'epoch': 0.79}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429
[2024-06-11 00:09:20,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.12 | bwd_microstep: 1436.44 | bwd_inner_microstep: 1436.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406
[2024-06-11 00:09:22,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.21 | bwd_microstep: 1280.77 | bwd_inner_microstep: 1280.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 00:09:24,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1375.44 | bwd_inner_microstep: 1375.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 00:09:25,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1373.60 | bwd_inner_microstep: 1373.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2265
[2024-06-11 00:09:27,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.63 | bwd_microstep: 918.65 | bwd_inner_microstep: 918.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-11 00:09:29,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.28 | bwd_microstep: 1535.93 | bwd_inner_microstep: 1535.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899
[2024-06-11 00:09:30,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.14 | bwd_microstep: 775.86 | bwd_inner_microstep: 775.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2190
[2024-06-11 00:09:31,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.60 | bwd_microstep: 856.18 | bwd_inner_microstep: 856.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3441
[2024-06-11 00:09:33,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.82 | bwd_microstep: 1186.18 | bwd_inner_microstep: 1186.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-11 00:09:35,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.34 | bwd_microstep: 1636.45 | bwd_inner_microstep: 1636.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2115
[2024-06-11 00:09:36,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.54 | bwd_microstep: 739.10 | bwd_inner_microstep: 739.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3504
[2024-06-11 00:09:38,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1249.67 | bwd_inner_microstep: 1249.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 00:09:40,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.59 | bwd_microstep: 1484.94 | bwd_inner_microstep: 1484.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 00:09:42,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1386.69 | bwd_inner_microstep: 1386.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-11 00:09:44,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.69 | bwd_microstep: 1586.97 | bwd_inner_microstep: 1586.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-11 00:09:46,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1483.02 | bwd_inner_microstep: 1482.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-11 00:09:47,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.79 | bwd_microstep: 686.58 | bwd_inner_microstep: 686.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3525
[2024-06-11 00:09:49,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.06 | bwd_microstep: 1446.50 | bwd_inner_microstep: 1446.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-11 00:09:51,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.05 | bwd_microstep: 1600.67 | bwd_inner_microstep: 1600.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-11 00:09:53,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.11 | bwd_microstep: 1433.71 | bwd_inner_microstep: 1433.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2176
[2024-06-11 00:09:54,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.41 | bwd_microstep: 794.22 | bwd_inner_microstep: 794.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646
[2024-06-11 00:09:56,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1415.51 | bwd_inner_microstep: 1415.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-11 00:09:58,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1419.63 | bwd_inner_microstep: 1419.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 00:10:00,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1495.20 | bwd_inner_microstep: 1495.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-11 00:10:02,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.13 | bwd_microstep: 1296.44 | bwd_inner_microstep: 1296.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2897
[2024-06-11 00:10:04,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.33 | bwd_microstep: 1175.11 | bwd_inner_microstep: 1175.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 00:10:06,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.43 | bwd_microstep: 1545.69 | bwd_inner_microstep: 1545.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3717
[2024-06-11 00:10:08,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.85 | bwd_microstep: 1565.39 | bwd_inner_microstep: 1565.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3754
[2024-06-11 00:10:10,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.71 | bwd_microstep: 1445.41 | bwd_inner_microstep: 1445.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 00:10:12,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.05 | bwd_microstep: 1287.22 | bwd_inner_microstep: 1287.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 00:10:14,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.67 | bwd_microstep: 1400.15 | bwd_inner_microstep: 1400.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2452
[2024-06-11 00:10:19,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-11 00:10:19,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.79 | bwd_microstep: 5493.98 | bwd_inner_microstep: 1133.50 | bwd_allreduce_microstep: 4360.42 | step_microstep: 38.66
[2024-06-11 00:10:19,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15478.74 | bwd: 45807.31 | bwd_inner: 41445.97 | bwd_allreduce: 4360.66 | step: 40.12
{'loss': 1.2341, 'learning_rate': 4.511553436952356e-06, 'epoch': 0.79}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2046
[2024-06-11 00:10:21,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.21 | bwd_microstep: 842.00 | bwd_inner_microstep: 841.88 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3982
[2024-06-11 00:10:23,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.18 | bwd_microstep: 1601.25 | bwd_inner_microstep: 1601.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 00:10:25,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.41 | bwd_microstep: 1371.32 | bwd_inner_microstep: 1371.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-11 00:10:27,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.20 | bwd_microstep: 1647.20 | bwd_inner_microstep: 1647.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-11 00:10:28,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.77 | bwd_microstep: 974.25 | bwd_inner_microstep: 974.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1892
[2024-06-11 00:10:29,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.15 | bwd_microstep: 713.62 | bwd_inner_microstep: 713.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744
[2024-06-11 00:10:32,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.21 | bwd_microstep: 1635.19 | bwd_inner_microstep: 1635.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-11 00:10:34,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.50 | bwd_microstep: 1638.81 | bwd_inner_microstep: 1638.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-11 00:10:36,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.98 | bwd_microstep: 1482.71 | bwd_inner_microstep: 1482.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 00:10:38,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1247.81 | bwd_inner_microstep: 1247.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694
[2024-06-11 00:10:40,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.52 | bwd_microstep: 1524.36 | bwd_inner_microstep: 1524.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 00:10:41,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.48 | bwd_microstep: 1257.96 | bwd_inner_microstep: 1257.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 00:10:44,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.94 | bwd_microstep: 1507.43 | bwd_inner_microstep: 1507.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1966
[2024-06-11 00:10:45,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.08 | bwd_microstep: 895.44 | bwd_inner_microstep: 895.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3627
[2024-06-11 00:10:47,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.77 | bwd_microstep: 1459.09 | bwd_inner_microstep: 1459.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3642
[2024-06-11 00:10:49,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.57 | bwd_microstep: 1663.58 | bwd_inner_microstep: 1663.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3503
[2024-06-11 00:10:51,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.86 | bwd_microstep: 1551.33 | bwd_inner_microstep: 1551.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2390
[2024-06-11 00:10:53,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.60 | bwd_microstep: 1028.19 | bwd_inner_microstep: 1028.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-11 00:10:55,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.81 | bwd_microstep: 1479.42 | bwd_inner_microstep: 1479.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1919
[2024-06-11 00:10:56,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.84 | bwd_microstep: 842.00 | bwd_inner_microstep: 841.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2686
[2024-06-11 00:10:57,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.25 | bwd_microstep: 1056.42 | bwd_inner_microstep: 1056.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-11 00:11:00,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1605.07 | bwd_inner_microstep: 1605.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3831
[2024-06-11 00:11:01,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.05 | bwd_microstep: 1357.62 | bwd_inner_microstep: 1357.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983
[2024-06-11 00:11:03,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.86 | bwd_microstep: 889.25 | bwd_inner_microstep: 889.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 00:11:05,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.07 | bwd_microstep: 1495.98 | bwd_inner_microstep: 1495.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3808
[2024-06-11 00:11:07,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.49 | bwd_microstep: 1686.48 | bwd_inner_microstep: 1686.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-11 00:11:09,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1531.72 | bwd_inner_microstep: 1531.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3608
[2024-06-11 00:11:11,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.37 | bwd_microstep: 1311.51 | bwd_inner_microstep: 1311.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3873
[2024-06-11 00:11:13,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.70 | bwd_microstep: 1790.45 | bwd_inner_microstep: 1790.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-11 00:11:15,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.76 | bwd_microstep: 1544.37 | bwd_inner_microstep: 1544.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3573
[2024-06-11 00:11:17,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.72 | bwd_microstep: 1425.90 | bwd_inner_microstep: 1425.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3688
[2024-06-11 00:11:20,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.04 | optimizer_step: 6.58
[2024-06-11 00:11:20,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.13 | bwd_microstep: 2255.91 | bwd_inner_microstep: 1546.35 | bwd_allreduce_microstep: 709.51 | step_microstep: 37.40
[2024-06-11 00:11:20,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16168.55 | bwd: 44313.67 | bwd_inner: 43603.17 | bwd_allreduce: 709.78 | step: 39.00
 79%|███████▊  | 1357/1726 [23:28:49<6:17:29, 61.38s/it]
 79%|███████▊  | 1358/1726 [23:29:49<6:14:49, 61.11s/it]
 79%|███████▊  | 1359/1726 [23:30:52<6:17:53, 61.78s/it]
 79%|███████▉  | 1360/1726 [23:31:55<6:17:27, 61.88s/it]
 79%|███████▉  | 1361/1726 [23:32:56<6:15:57, 61.80s/it]
{'loss': 1.2122, 'learning_rate': 4.487834151717778e-06, 'epoch': 0.79}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2393
[2024-06-11 00:11:22,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.07 | bwd_microstep: 1020.04 | bwd_inner_microstep: 1020.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 00:11:24,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.61 | bwd_microstep: 1377.87 | bwd_inner_microstep: 1377.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-11 00:11:26,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.73 | bwd_microstep: 1654.41 | bwd_inner_microstep: 1654.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820
[2024-06-11 00:11:28,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-11 00:11:30,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.10 | bwd_microstep: 1496.03 | bwd_inner_microstep: 1496.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-11 00:11:32,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.47 | bwd_microstep: 1638.04 | bwd_inner_microstep: 1638.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 00:11:34,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1379.50 | bwd_inner_microstep: 1379.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 00:11:36,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.06 | bwd_microstep: 1400.25 | bwd_inner_microstep: 1400.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3866
[2024-06-11 00:11:38,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.49 | bwd_microstep: 1310.35 | bwd_inner_microstep: 1310.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2854
[2024-06-11 00:11:39,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 434.14 | bwd_microstep: 1160.59 | bwd_inner_microstep: 1160.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 00:11:41,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.13 | bwd_microstep: 1378.42 | bwd_inner_microstep: 1378.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 00:11:43,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1251.40 | bwd_inner_microstep: 1251.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657
[2024-06-11 00:11:45,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 1417.63 | bwd_inner_microstep: 1417.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2115
[2024-06-11 00:11:46,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.97 | bwd_microstep: 921.09 | bwd_inner_microstep: 921.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505
[2024-06-11 00:11:48,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.99 | bwd_microstep: 1346.25 | bwd_inner_microstep: 1346.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952
[2024-06-11 00:11:49,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.03 | bwd_microstep: 733.60 | bwd_inner_microstep: 733.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490
[2024-06-11 00:11:51,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.29 | bwd_microstep: 1219.75 | bwd_inner_microstep: 1219.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676
[2024-06-11 00:11:53,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.68 | bwd_microstep: 1717.76 | bwd_inner_microstep: 1717.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-11 00:11:55,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.60 | bwd_microstep: 1185.24 | bwd_inner_microstep: 1185.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462
[2024-06-11 00:11:56,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.97 | bwd_microstep: 1180.96 | bwd_inner_microstep: 1180.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-11 00:11:58,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.64 | bwd_microstep: 1392.12 | bwd_inner_microstep: 1392.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3883
[2024-06-11 00:12:00,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.17 | bwd_microstep: 1492.50 | bwd_inner_microstep: 1492.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-11 00:12:02,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.33 | bwd_microstep: 1296.39 | bwd_inner_microstep: 1296.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 00:12:04,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.71 | bwd_microstep: 1554.42 | bwd_inner_microstep: 1554.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2184
[2024-06-11 00:12:06,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.34 | bwd_microstep: 952.42 | bwd_inner_microstep: 952.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 00:12:08,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1556.02 | bwd_inner_microstep: 1555.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-11 00:12:10,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.34 | bwd_microstep: 1383.67 | bwd_inner_microstep: 1383.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 00:12:12,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1252.21 | bwd_inner_microstep: 1252.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2046
[2024-06-11 00:12:13,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.97 | bwd_microstep: 842.12 | bwd_inner_microstep: 842.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-11 00:12:15,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.19 | bwd_microstep: 1645.94 | bwd_inner_microstep: 1645.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3582
[2024-06-11 00:12:17,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.82 | bwd_microstep: 1561.32 | bwd_inner_microstep: 1561.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-11 00:12:20,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.01 | optimizer_step: 6.65
[2024-06-11 00:12:20,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.04 | bwd_microstep: 2281.57 | bwd_inner_microstep: 1807.52 | bwd_allreduce_microstep: 474.00 | step_microstep: 37.38
[2024-06-11 00:12:20,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16002.25 | bwd: 43384.64 | bwd_inner: 42909.73 | bwd_allreduce: 474.23 | step: 38.82
{'loss': 1.1924, 'learning_rate': 4.464169500250677e-06, 'epoch': 0.79}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-11 00:12:21,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.56 | bwd_microstep: 790.15 | bwd_inner_microstep: 790.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3893
[2024-06-11 00:12:23,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.62 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3409
[2024-06-11 00:12:25,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.59 | bwd_microstep: 1392.43 | bwd_inner_microstep: 1392.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-11 00:12:27,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1538.26 | bwd_inner_microstep: 1538.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 00:12:29,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1248.90 | bwd_inner_microstep: 1248.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1891
[2024-06-11 00:12:30,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.39 | bwd_microstep: 684.62 | bwd_inner_microstep: 684.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1952
[2024-06-11 00:12:31,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.67 | bwd_microstep: 699.63 | bwd_inner_microstep: 699.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-11 00:12:32,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.43 | bwd_microstep: 698.15 | bwd_inner_microstep: 698.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-11 00:12:33,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.53 | bwd_microstep: 678.21 | bwd_inner_microstep: 678.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-11 00:12:35,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1411.89 | bwd_inner_microstep: 1411.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-11 00:12:36,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.31 | bwd_microstep: 1342.58 | bwd_inner_microstep: 1342.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2751
[2024-06-11 00:12:38,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.23 | bwd_microstep: 1100.23 | bwd_inner_microstep: 1100.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3686
[2024-06-11 00:12:40,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.17 | bwd_microstep: 1544.54 | bwd_inner_microstep: 1544.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-11 00:12:42,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.51 | bwd_microstep: 1641.18 | bwd_inner_microstep: 1641.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-11 00:12:44,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1386.93 | bwd_inner_microstep: 1386.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-11 00:12:45,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 798.13 | bwd_inner_microstep: 798.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3670
[2024-06-11 00:12:47,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.14 | bwd_microstep: 1324.62 | bwd_inner_microstep: 1324.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-11 00:12:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.17 | bwd_microstep: 1420.60 | bwd_inner_microstep: 1420.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016
[2024-06-11 00:12:50,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.53 | bwd_microstep: 805.24 | bwd_inner_microstep: 805.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-11 00:12:51,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.26 | bwd_microstep: 817.87 | bwd_inner_microstep: 817.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-11 00:12:54,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1557.58 | bwd_inner_microstep: 1557.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3693
[2024-06-11 00:12:55,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1329.64 | bwd_inner_microstep: 1329.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-11 00:12:57,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1511.70 | bwd_inner_microstep: 1511.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-11 00:13:00,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.36 | bwd_microstep: 1506.37 | bwd_inner_microstep: 1506.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150
[2024-06-11 00:13:01,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.73 | bwd_microstep: 849.82 | bwd_inner_microstep: 849.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3775
[2024-06-11 00:13:03,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.72 | bwd_microstep: 1404.59 | bwd_inner_microstep: 1404.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1946
[2024-06-11 00:13:04,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.19 | bwd_microstep: 732.05 | bwd_inner_microstep: 732.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-11 00:13:06,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1512.54 | bwd_inner_microstep: 1512.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3664
[2024-06-11 00:13:08,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.23 | bwd_microstep: 1321.99 | bwd_inner_microstep: 1321.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-11 00:13:10,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.82 | bwd_microstep: 1444.93 | bwd_inner_microstep: 1444.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3592
[2024-06-11 00:13:11,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.37 | bwd_microstep: 1350.69 | bwd_inner_microstep: 1350.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3805
[2024-06-11 00:13:22,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.08 | optimizer_step: 6.58
[2024-06-11 00:13:22,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.39 | bwd_microstep: 9539.28 | bwd_inner_microstep: 1946.78 | bwd_allreduce_microstep: 7592.44 | step_microstep: 38.00
[2024-06-11 00:13:22,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14581.40 | bwd: 46770.25 | bwd_inner: 39176.87 | bwd_allreduce: 7592.69 | step: 39.46
{'loss': 1.1929, 'learning_rate': 4.440559565897826e-06, 'epoch': 0.79}
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3404
[2024-06-11 00:13:24,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.13 | bwd_microstep: 1377.64 | bwd_inner_microstep: 1377.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3884
[2024-06-11 00:13:26,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.17 | bwd_microstep: 1414.57 | bwd_inner_microstep: 1414.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-11 00:13:28,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.04 | bwd_microstep: 1473.47 | bwd_inner_microstep: 1473.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951
[2024-06-11 00:13:29,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.43 | bwd_microstep: 728.39 | bwd_inner_microstep: 728.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2927
[2024-06-11 00:13:30,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.99 | bwd_microstep: 1186.64 | bwd_inner_microstep: 1186.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 00:13:32,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.14 | bwd_microstep: 1548.60 | bwd_inner_microstep: 1548.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-11 00:13:33,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.25 | bwd_microstep: 682.77 | bwd_inner_microstep: 682.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-11 00:13:35,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1256.38 | bwd_inner_microstep: 1256.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-11 00:13:37,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.23 | bwd_microstep: 1526.72 | bwd_inner_microstep: 1526.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-11 00:13:39,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.09 | bwd_microstep: 1347.08 | bwd_inner_microstep: 1347.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2643
[2024-06-11 00:13:41,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.04 | bwd_microstep: 1068.63 | bwd_inner_microstep: 1068.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 00:13:42,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1381.98 | bwd_inner_microstep: 1381.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1953
[2024-06-11 00:13:44,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.40 | bwd_microstep: 891.31 | bwd_inner_microstep: 891.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3668
[2024-06-11 00:13:46,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.71 | bwd_microstep: 1578.11 | bwd_inner_microstep: 1578.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3514
[2024-06-11 00:13:48,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1515.94 | bwd_inner_microstep: 1515.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3675
[2024-06-11 00:13:50,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1480.06 | bwd_inner_microstep: 1480.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3621
[2024-06-11 00:13:52,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.41 | bwd_microstep: 1441.34 | bwd_inner_microstep: 1441.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3966
[2024-06-11 00:13:54,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.14 | bwd_microstep: 1634.74 | bwd_inner_microstep: 1634.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3594
[2024-06-11 00:13:56,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.63 | bwd_microstep: 1451.26 | bwd_inner_microstep: 1451.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-11 00:13:58,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.25 | bwd_microstep: 1293.43 | bwd_inner_microstep: 1293.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3879
[2024-06-11 00:14:00,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.42 | bwd_microstep: 1491.43 | bwd_inner_microstep: 1491.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2293
[2024-06-11 00:14:01,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.58 | bwd_microstep: 914.15 | bwd_inner_microstep: 914.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 00:14:03,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.30 | bwd_microstep: 1557.84 | bwd_inner_microstep: 1557.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2029
[2024-06-11 00:14:05,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.47 | bwd_microstep: 778.66 | bwd_inner_microstep: 778.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1989
[2024-06-11 00:14:06,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.28 | bwd_microstep: 831.51 | bwd_inner_microstep: 831.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3672
[2024-06-11 00:14:08,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.40 | bwd_microstep: 1356.72 | bwd_inner_microstep: 1356.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-11 00:14:10,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.32 | bwd_microstep: 1476.80 | bwd_inner_microstep: 1476.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-11 00:14:12,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1490.67 | bwd_inner_microstep: 1490.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 00:14:14,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1646.45 | bwd_inner_microstep: 1646.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2040
[2024-06-11 00:14:15,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.32 | bwd_microstep: 905.33 | bwd_inner_microstep: 905.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-11 00:14:17,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1478.70 | bwd_inner_microstep: 1478.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-11 00:14:25,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.13 | optimizer_step: 6.61
[2024-06-11 00:14:25,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.36 | bwd_microstep: 7086.97 | bwd_inner_microstep: 1687.95 | bwd_allreduce_microstep: 5398.96 | step_microstep: 38.30
[2024-06-11 00:14:25,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15578.60 | bwd: 47294.30 | bwd_inner: 41894.43 | bwd_allreduce: 5399.19 | step: 39.82
{'loss': 1.185, 'learning_rate': 4.41700443181331e-06, 'epoch': 0.79}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-11 00:14:27,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.28 | bwd_microstep: 1593.97 | bwd_inner_microstep: 1593.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.17
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401
[2024-06-11 00:14:29,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.43 | bwd_microstep: 1272.10 | bwd_inner_microstep: 1272.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 00:14:31,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1348.94 | bwd_inner_microstep: 1348.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229
[2024-06-11 00:14:32,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.97 | bwd_microstep: 956.60 | bwd_inner_microstep: 956.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4123
[2024-06-11 00:14:34,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.18 | bwd_microstep: 1582.86 | bwd_inner_microstep: 1582.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3777
[2024-06-11 00:14:36,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1406.52 | bwd_inner_microstep: 1406.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 00:14:38,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1376.86 | bwd_inner_microstep: 1376.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964
[2024-06-11 00:14:39,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.71 | bwd_microstep: 700.49 | bwd_inner_microstep: 700.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-11 00:14:41,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.39 | bwd_microstep: 1430.23 | bwd_inner_microstep: 1430.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-11 00:14:43,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.41 | bwd_microstep: 1187.74 | bwd_inner_microstep: 1187.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-11 00:14:44,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1256.02 | bwd_inner_microstep: 1256.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-11 00:14:46,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.69 | bwd_microstep: 1343.72 | bwd_inner_microstep: 1343.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3380
[2024-06-11 00:14:48,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.34 | bwd_microstep: 1334.17 | bwd_inner_microstep: 1334.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-11 00:14:50,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.88 | bwd_microstep: 1475.25 | bwd_inner_microstep: 1475.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3653
[2024-06-11 00:14:52,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.19 | bwd_microstep: 1543.40 | bwd_inner_microstep: 1543.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475
[2024-06-11 00:14:54,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.58 | bwd_microstep: 1214.95 | bwd_inner_microstep: 1214.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-11 00:14:56,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.40 | bwd_microstep: 1661.48 | bwd_inner_microstep: 1661.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-11 00:14:58,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 1507.85 | bwd_inner_microstep: 1507.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-11 00:15:01,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.48 | bwd_microstep: 1609.79 | bwd_inner_microstep: 1609.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 00:15:02,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.59 | bwd_microstep: 1253.53 | bwd_inner_microstep: 1253.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3674
[2024-06-11 00:15:04,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.11 | bwd_microstep: 1230.53 | bwd_inner_microstep: 1230.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042
[2024-06-11 00:15:05,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.03 | bwd_microstep: 808.54 | bwd_inner_microstep: 808.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454
[2024-06-11 00:15:07,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.68 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 00:15:09,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.83 | bwd_microstep: 1282.99 | bwd_inner_microstep: 1282.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-11 00:15:11,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.48 | bwd_microstep: 1402.70 | bwd_inner_microstep: 1402.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3828
[2024-06-11 00:15:13,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.73 | bwd_microstep: 1498.35 | bwd_inner_microstep: 1498.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3679
[2024-06-11 00:15:15,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1456.83 | bwd_inner_microstep: 1456.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2496
[2024-06-11 00:15:16,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.84 | bwd_microstep: 895.22 | bwd_inner_microstep: 895.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-11 00:15:18,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.90 | bwd_microstep: 1537.31 | bwd_inner_microstep: 1537.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-11 00:15:20,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1396.49 | bwd_inner_microstep: 1396.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3729
[2024-06-11 00:15:22,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1365.28 | bwd_inner_microstep: 1365.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468
[2024-06-11 00:15:26,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-11 00:15:26,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.22 | bwd_microstep: 3506.96 | bwd_inner_microstep: 1776.50 | bwd_allreduce_microstep: 1730.41 | step_microstep: 38.02
[2024-06-11 00:15:26,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16051.78 | bwd: 44720.84 | bwd_inner: 42989.39 | bwd_allreduce: 1730.71 | step: 39.63
{'loss': 1.1366, 'learning_rate': 4.393504180958166e-06, 'epoch': 0.79}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-11 00:15:27,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.71 | bwd_microstep: 675.75 | bwd_inner_microstep: 675.63 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3393
[2024-06-11 00:15:29,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.85 | bwd_microstep: 1142.89 | bwd_inner_microstep: 1142.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-11 00:15:30,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.51 | bwd_microstep: 1144.36 | bwd_inner_microstep: 1144.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778
[2024-06-11 00:15:32,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.50 | bwd_microstep: 1346.37 | bwd_inner_microstep: 1346.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 00:15:34,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1282.60 | bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-11 00:15:35,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.05 | bwd_microstep: 1186.58 | bwd_inner_microstep: 1186.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1893
[2024-06-11 00:15:36,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.66 | bwd_microstep: 746.14 | bwd_inner_microstep: 746.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-11 00:15:38,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.43 | bwd_microstep: 1295.94 | bwd_inner_microstep: 1295.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3428
[2024-06-11 00:15:40,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.79 | bwd_microstep: 1186.42 | bwd_inner_microstep: 1186.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 00:15:42,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.12 | bwd_microstep: 1252.21 | bwd_inner_microstep: 1252.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-11 00:15:43,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.43 | bwd_microstep: 797.15 | bwd_inner_microstep: 797.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-11 00:15:44,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 794.26 | bwd_inner_microstep: 794.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 00:15:46,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1390.22 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3719
[2024-06-11 00:15:48,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.56 | bwd_microstep: 1559.73 | bwd_inner_microstep: 1559.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2398
[2024-06-11 00:15:49,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.59 | bwd_microstep: 986.23 | bwd_inner_microstep: 986.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-11 00:15:51,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.65 | bwd_microstep: 1353.07 | bwd_inner_microstep: 1353.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-11 00:15:53,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1407.42 | bwd_inner_microstep: 1407.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-11 00:15:55,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.84 | bwd_microstep: 1647.49 | bwd_inner_microstep: 1647.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-11 00:15:57,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1447.71 | bwd_inner_microstep: 1447.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3603
[2024-06-11 00:16:00,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.90 | bwd_microstep: 1704.43 | bwd_inner_microstep: 1704.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-11 00:16:02,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1297.77 | bwd_inner_microstep: 1297.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823
[2024-06-11 00:16:04,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1514.17 | bwd_inner_microstep: 1514.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 00:16:06,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.82 | bwd_microstep: 1653.42 | bwd_inner_microstep: 1653.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-11 00:16:08,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.71 | bwd_microstep: 1655.02 | bwd_inner_microstep: 1654.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 00:16:10,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1253.43 | bwd_inner_microstep: 1253.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-11 00:16:11,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.08 | bwd_microstep: 793.64 | bwd_inner_microstep: 793.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 00:16:13,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1399.38 | bwd_inner_microstep: 1399.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-11 00:16:15,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.18 | bwd_microstep: 1533.81 | bwd_inner_microstep: 1533.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2026
[2024-06-11 00:16:16,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.81 | bwd_microstep: 899.20 | bwd_inner_microstep: 899.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3562
[2024-06-11 00:16:18,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.61 | bwd_microstep: 1453.66 | bwd_inner_microstep: 1453.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-11 00:16:20,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1410.10 | bwd_inner_microstep: 1410.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 00:16:28,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.33 | optimizer_step: 6.59
[2024-06-11 00:16:28,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.02 | bwd_microstep: 7667.24 | bwd_inner_microstep: 1436.82 | bwd_allreduce_microstep: 6230.35 | step_microstep: 38.77
[2024-06-11 00:16:28,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15214.82 | bwd: 46877.85 | bwd_inner: 40646.49 | bwd_allreduce: 6230.63 | step: 40.34
 79%|███████▉  | 1362/1726 [23:33:57<6:13:08, 61.51s/it]
 79%|███████▉  | 1363/1726 [23:34:57<6:08:51, 60.97s/it]
 79%|███████▉  | 1364/1726 [23:35:58<6:09:06, 61.18s/it]
 79%|███████▉  | 1365/1726 [23:37:02<6:11:45, 61.79s/it]
 79%|███████▉  | 1366/1726 [23:38:03<6:09:30, 61.58s/it]
{'loss': 1.1978, 'learning_rate': 4.370058896100171e-06, 'epoch': 0.79}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-11 00:16:31,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.85 | bwd_microstep: 1580.62 | bwd_inner_microstep: 1580.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-11 00:16:33,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.95 | bwd_microstep: 1399.75 | bwd_inner_microstep: 1399.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-11 00:16:33,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.10 | bwd_microstep: 676.37 | bwd_inner_microstep: 676.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 00:16:36,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1646.92 | bwd_inner_microstep: 1646.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2313
[2024-06-11 00:16:37,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.50 | bwd_microstep: 977.73 | bwd_inner_microstep: 977.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785
[2024-06-11 00:16:39,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.73 | bwd_microstep: 1641.29 | bwd_inner_microstep: 1641.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3416
[2024-06-11 00:16:41,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.20 | bwd_microstep: 1182.12 | bwd_inner_microstep: 1182.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-11 00:16:42,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.70 | bwd_microstep: 793.72 | bwd_inner_microstep: 793.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 00:16:44,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1389.51 | bwd_inner_microstep: 1389.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-11 00:16:46,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.87 | bwd_microstep: 1248.81 | bwd_inner_microstep: 1248.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-11 00:16:47,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1253.65 | bwd_inner_microstep: 1253.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909
[2024-06-11 00:16:48,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.89 | bwd_microstep: 687.39 | bwd_inner_microstep: 687.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975
[2024-06-11 00:16:50,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.54 | bwd_microstep: 831.53 | bwd_inner_microstep: 831.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 00:16:52,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1378.78 | bwd_inner_microstep: 1378.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3403
[2024-06-11 00:16:53,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.86 | bwd_microstep: 1294.10 | bwd_inner_microstep: 1294.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2705
[2024-06-11 00:16:55,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.49 | bwd_microstep: 1131.88 | bwd_inner_microstep: 1131.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 00:16:57,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.57 | bwd_microstep: 1375.01 | bwd_inner_microstep: 1374.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3440
[2024-06-11 00:16:58,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.46 | bwd_microstep: 1222.55 | bwd_inner_microstep: 1222.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521
[2024-06-11 00:17:00,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 1417.18 | bwd_inner_microstep: 1417.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-11 00:17:02,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.74 | bwd_microstep: 1418.36 | bwd_inner_microstep: 1418.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687
[2024-06-11 00:17:04,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1432.35 | bwd_inner_microstep: 1432.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-11 00:17:06,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.65 | bwd_microstep: 1523.37 | bwd_inner_microstep: 1523.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2393
[2024-06-11 00:17:08,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.64 | bwd_microstep: 935.99 | bwd_inner_microstep: 935.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612
[2024-06-11 00:17:10,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.19 | bwd_microstep: 1611.15 | bwd_inner_microstep: 1611.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756
[2024-06-11 00:17:12,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.64 | bwd_microstep: 1447.07 | bwd_inner_microstep: 1447.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-11 00:17:14,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.80 | bwd_microstep: 1503.02 | bwd_inner_microstep: 1502.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 00:17:16,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1284.83 | bwd_inner_microstep: 1284.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3434
[2024-06-11 00:17:18,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.48 | bwd_microstep: 1332.23 | bwd_inner_microstep: 1332.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3808
[2024-06-11 00:17:20,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.28 | bwd_microstep: 1477.10 | bwd_inner_microstep: 1477.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567
[2024-06-11 00:17:22,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.98 | bwd_microstep: 1299.23 | bwd_inner_microstep: 1299.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3283
[2024-06-11 00:17:23,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.11 | bwd_microstep: 1319.44 | bwd_inner_microstep: 1319.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3821
[2024-06-11 00:17:30,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-11 00:17:30,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.00 | bwd_microstep: 5975.61 | bwd_inner_microstep: 2312.57 | bwd_allreduce_microstep: 3662.97 | step_microstep: 38.76
[2024-06-11 00:17:30,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15543.77 | bwd: 45688.67 | bwd_inner: 42024.78 | bwd_allreduce: 3663.21 | step: 40.31
{'loss': 1.1907, 'learning_rate': 4.346668659813489e-06, 'epoch': 0.79}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3398
[2024-06-11 00:17:32,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1398.82 | bwd_inner_microstep: 1398.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-11 00:17:34,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.55 | bwd_microstep: 1479.97 | bwd_inner_microstep: 1479.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2328
[2024-06-11 00:17:35,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.95 | bwd_microstep: 985.14 | bwd_inner_microstep: 985.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 00:17:37,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.62 | bwd_microstep: 1481.33 | bwd_inner_microstep: 1481.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-11 00:17:39,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.43 | bwd_microstep: 1299.92 | bwd_inner_microstep: 1299.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2165
[2024-06-11 00:17:40,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.83 | bwd_microstep: 949.49 | bwd_inner_microstep: 949.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-11 00:17:42,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.76 | bwd_microstep: 1185.86 | bwd_inner_microstep: 1185.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-11 00:17:44,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.95 | bwd_microstep: 1285.40 | bwd_inner_microstep: 1285.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3580
[2024-06-11 00:17:46,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.03 | bwd_microstep: 1205.05 | bwd_inner_microstep: 1205.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409
[2024-06-11 00:17:47,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.15 | bwd_microstep: 1308.46 | bwd_inner_microstep: 1308.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3502
[2024-06-11 00:17:49,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.45 | bwd_microstep: 1250.64 | bwd_inner_microstep: 1250.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007
[2024-06-11 00:17:50,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.91 | bwd_microstep: 897.13 | bwd_inner_microstep: 897.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655
[2024-06-11 00:17:53,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.96 | bwd_microstep: 1577.59 | bwd_inner_microstep: 1577.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3705
[2024-06-11 00:17:55,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.75 | bwd_microstep: 1723.18 | bwd_inner_microstep: 1723.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3657
[2024-06-11 00:17:57,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.74 | bwd_microstep: 1478.47 | bwd_inner_microstep: 1478.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3828
[2024-06-11 00:17:59,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.30 | bwd_microstep: 1750.77 | bwd_inner_microstep: 1750.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3947
[2024-06-11 00:18:02,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.05 | bwd_microstep: 1697.08 | bwd_inner_microstep: 1697.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-11 00:18:04,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.44 | bwd_microstep: 1542.87 | bwd_inner_microstep: 1542.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 00:18:06,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.04 | bwd_microstep: 1388.55 | bwd_inner_microstep: 1388.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1962
[2024-06-11 00:18:07,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.39 | bwd_microstep: 733.54 | bwd_inner_microstep: 733.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2721
[2024-06-11 00:18:08,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.21 | bwd_microstep: 1134.29 | bwd_inner_microstep: 1134.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 00:18:10,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1375.19 | bwd_inner_microstep: 1375.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-11 00:18:12,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1516.83 | bwd_inner_microstep: 1516.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1895
[2024-06-11 00:18:13,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.57 | bwd_microstep: 776.57 | bwd_inner_microstep: 776.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-11 00:18:15,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.76 | bwd_microstep: 1295.45 | bwd_inner_microstep: 1295.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949
[2024-06-11 00:18:16,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.16 | bwd_microstep: 728.70 | bwd_inner_microstep: 728.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 00:18:18,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.34 | bwd_microstep: 1285.04 | bwd_inner_microstep: 1285.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-11 00:18:20,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.42 | bwd_microstep: 1656.10 | bwd_inner_microstep: 1656.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819
[2024-06-11 00:18:23,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.09 | bwd_microstep: 1693.15 | bwd_inner_microstep: 1693.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823
[2024-06-11 00:18:25,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.81 | bwd_microstep: 1518.00 | bwd_inner_microstep: 1517.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 00:18:26,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.31 | bwd_microstep: 1256.84 | bwd_inner_microstep: 1256.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2265
[2024-06-11 00:18:33,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-11 00:18:33,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.83 | bwd_microstep: 5934.23 | bwd_inner_microstep: 1100.57 | bwd_allreduce_microstep: 4833.61 | step_microstep: 37.96
[2024-06-11 00:18:33,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15649.74 | bwd: 46789.68 | bwd_inner: 41955.17 | bwd_allreduce: 4833.84 | step: 39.45
{'loss': 1.1753, 'learning_rate': 4.323333554478415e-06, 'epoch': 0.79}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392
[2024-06-11 00:18:35,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.33 | bwd_microstep: 1330.45 | bwd_inner_microstep: 1330.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 00:18:37,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.78 | bwd_microstep: 1477.38 | bwd_inner_microstep: 1477.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3919
[2024-06-11 00:18:39,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1393.14 | bwd_inner_microstep: 1393.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 00:18:40,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1383.18 | bwd_inner_microstep: 1383.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857
[2024-06-11 00:18:41,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.92 | bwd_microstep: 676.08 | bwd_inner_microstep: 676.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894
[2024-06-11 00:18:42,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.94 | bwd_microstep: 682.83 | bwd_inner_microstep: 682.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-11 00:18:45,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.73 | bwd_microstep: 1630.90 | bwd_inner_microstep: 1630.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-11 00:18:47,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.16 | bwd_microstep: 1532.69 | bwd_inner_microstep: 1532.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 00:18:48,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1249.07 | bwd_inner_microstep: 1249.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585
[2024-06-11 00:18:50,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.34 | bwd_microstep: 1307.83 | bwd_inner_microstep: 1307.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942
[2024-06-11 00:18:51,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.72 | bwd_microstep: 727.83 | bwd_inner_microstep: 727.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3714
[2024-06-11 00:18:53,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.75 | bwd_microstep: 1591.27 | bwd_inner_microstep: 1591.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2118
[2024-06-11 00:18:55,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.54 | bwd_microstep: 1021.93 | bwd_inner_microstep: 1021.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-11 00:18:57,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.48 | bwd_microstep: 1499.04 | bwd_inner_microstep: 1499.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840
[2024-06-11 00:18:59,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.64 | bwd_microstep: 1649.29 | bwd_inner_microstep: 1649.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-11 00:19:01,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.94 | bwd_microstep: 1313.04 | bwd_inner_microstep: 1313.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 00:19:03,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.38 | bwd_microstep: 1258.18 | bwd_inner_microstep: 1258.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-11 00:19:05,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.29 | bwd_microstep: 1519.50 | bwd_inner_microstep: 1519.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1418
[2024-06-11 00:19:06,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 216.79 | bwd_microstep: 563.63 | bwd_inner_microstep: 563.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3615
[2024-06-11 00:19:08,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.61 | bwd_microstep: 1341.54 | bwd_inner_microstep: 1341.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-11 00:19:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.64 | bwd_microstep: 1656.76 | bwd_inner_microstep: 1656.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-11 00:19:12,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.23 | bwd_microstep: 1300.40 | bwd_inner_microstep: 1300.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3714
[2024-06-11 00:19:14,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1559.77 | bwd_inner_microstep: 1559.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3825
[2024-06-11 00:19:16,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.47 | bwd_microstep: 1789.48 | bwd_inner_microstep: 1789.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275
[2024-06-11 00:19:17,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.17 | bwd_microstep: 908.40 | bwd_inner_microstep: 908.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3751
[2024-06-11 00:19:20,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.67 | bwd_microstep: 1569.83 | bwd_inner_microstep: 1569.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3593
[2024-06-11 00:19:22,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.21 | bwd_microstep: 1637.84 | bwd_inner_microstep: 1637.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-11 00:19:24,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1490.80 | bwd_inner_microstep: 1490.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-11 00:19:26,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.88 | bwd_microstep: 1598.18 | bwd_inner_microstep: 1598.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3628
[2024-06-11 00:19:28,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.21 | bwd_microstep: 1347.19 | bwd_inner_microstep: 1347.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 00:19:30,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1378.86 | bwd_inner_microstep: 1378.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-11 00:19:35,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.28 | optimizer_step: 6.58
[2024-06-11 00:19:35,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.22 | bwd_microstep: 4129.95 | bwd_inner_microstep: 1442.79 | bwd_allreduce_microstep: 2687.10 | step_microstep: 39.31
[2024-06-11 00:19:35,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15949.58 | bwd: 45516.25 | bwd_inner: 42828.24 | bwd_allreduce: 2687.33 | step: 40.81
{'loss': 1.21, 'learning_rate': 4.30005366228107e-06, 'epoch': 0.79}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3452
[2024-06-11 00:19:37,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.22 | bwd_microstep: 1400.68 | bwd_inner_microstep: 1400.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1929
[2024-06-11 00:19:38,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.27 | bwd_microstep: 851.61 | bwd_inner_microstep: 851.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3463
[2024-06-11 00:19:39,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.18 | bwd_microstep: 1215.12 | bwd_inner_microstep: 1215.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3872
[2024-06-11 00:19:41,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1496.83 | bwd_inner_microstep: 1496.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-11 00:19:44,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.21 | bwd_microstep: 1491.56 | bwd_inner_microstep: 1491.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 00:19:45,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1381.50 | bwd_inner_microstep: 1381.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 00:19:47,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1250.81 | bwd_inner_microstep: 1250.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3715
[2024-06-11 00:19:49,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.34 | bwd_microstep: 1462.25 | bwd_inner_microstep: 1462.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-11 00:19:51,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2211
[2024-06-11 00:19:52,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.83 | bwd_microstep: 959.61 | bwd_inner_microstep: 959.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-11 00:19:53,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.59 | bwd_microstep: 801.39 | bwd_inner_microstep: 801.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 00:19:55,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.86 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-11 00:19:57,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1387.25 | bwd_inner_microstep: 1387.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3440
[2024-06-11 00:19:59,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1392.43 | bwd_inner_microstep: 1392.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692
[2024-06-11 00:20:01,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.34 | bwd_microstep: 1420.22 | bwd_inner_microstep: 1420.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662
[2024-06-11 00:20:03,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.43 | bwd_microstep: 1516.86 | bwd_inner_microstep: 1516.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-11 00:20:05,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.87 | bwd_microstep: 1409.60 | bwd_inner_microstep: 1409.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-11 00:20:07,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.07 | bwd_microstep: 1289.81 | bwd_inner_microstep: 1289.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 00:20:09,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.44 | bwd_microstep: 1656.94 | bwd_inner_microstep: 1656.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3912
[2024-06-11 00:20:11,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.13 | bwd_microstep: 1489.69 | bwd_inner_microstep: 1489.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2119
[2024-06-11 00:20:12,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.00 | bwd_microstep: 829.22 | bwd_inner_microstep: 829.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294
[2024-06-11 00:20:14,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.28 | bwd_microstep: 1078.16 | bwd_inner_microstep: 1078.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 00:20:16,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.81 | bwd_microstep: 1278.75 | bwd_inner_microstep: 1278.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3544
[2024-06-11 00:20:18,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.87 | bwd_microstep: 1588.47 | bwd_inner_microstep: 1588.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2014
[2024-06-11 00:20:19,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.52 | bwd_microstep: 871.60 | bwd_inner_microstep: 871.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2171
[2024-06-11 00:20:20,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.74 | bwd_microstep: 949.27 | bwd_inner_microstep: 949.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-11 00:20:23,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.96 | bwd_microstep: 1597.96 | bwd_inner_microstep: 1597.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3580
[2024-06-11 00:20:25,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.68 | bwd_microstep: 1526.15 | bwd_inner_microstep: 1526.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-11 00:20:26,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.74 | bwd_microstep: 1301.50 | bwd_inner_microstep: 1301.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-11 00:20:28,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.91 | bwd_microstep: 973.46 | bwd_inner_microstep: 973.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-11 00:20:30,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.76 | bwd_microstep: 1610.53 | bwd_inner_microstep: 1610.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-11 00:20:34,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-11 00:20:34,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 3212.76 | bwd_inner_microstep: 1529.55 | bwd_allreduce_microstep: 1683.16 | step_microstep: 37.77
[2024-06-11 00:20:34,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15521.91 | bwd: 43363.35 | bwd_inner: 41679.30 | bwd_allreduce: 1683.38 | step: 39.28
{'loss': 1.1671, 'learning_rate': 4.2768290652131086e-06, 'epoch': 0.79}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-11 00:20:36,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.33 | bwd_microstep: 1349.43 | bwd_inner_microstep: 1349.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-11 00:20:38,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.93 | bwd_microstep: 1349.57 | bwd_inner_microstep: 1349.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867
[2024-06-11 00:20:40,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.98 | bwd_microstep: 1562.10 | bwd_inner_microstep: 1562.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-11 00:20:41,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1255.75 | bwd_inner_microstep: 1255.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1880
[2024-06-11 00:20:42,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.67 | bwd_microstep: 707.66 | bwd_inner_microstep: 707.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-11 00:20:45,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.78 | bwd_microstep: 1548.35 | bwd_inner_microstep: 1548.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3741
[2024-06-11 00:20:47,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1560.20 | bwd_inner_microstep: 1560.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-11 00:20:48,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 1249.64 | bwd_inner_microstep: 1249.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 00:20:50,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1381.83 | bwd_inner_microstep: 1381.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 00:20:52,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.74 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3408
[2024-06-11 00:20:54,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.43 | bwd_microstep: 1197.63 | bwd_inner_microstep: 1197.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-11 00:20:56,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.57 | bwd_microstep: 1187.29 | bwd_inner_microstep: 1187.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-11 00:20:58,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.42 | bwd_microstep: 1528.52 | bwd_inner_microstep: 1528.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-11 00:21:00,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.58 | bwd_microstep: 1419.77 | bwd_inner_microstep: 1419.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-11 00:21:02,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1409.84 | bwd_inner_microstep: 1409.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-11 00:21:03,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.45 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2056
[2024-06-11 00:21:04,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.57 | bwd_microstep: 973.53 | bwd_inner_microstep: 973.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3455
[2024-06-11 00:21:06,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.50 | bwd_microstep: 1302.34 | bwd_inner_microstep: 1302.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-11 00:21:08,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1557.74 | bwd_inner_microstep: 1557.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954
[2024-06-11 00:21:09,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.66 | bwd_microstep: 731.68 | bwd_inner_microstep: 731.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098
[2024-06-11 00:21:10,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.68 | bwd_microstep: 919.90 | bwd_inner_microstep: 919.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-11 00:21:12,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.85 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-11 00:21:14,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.60 | bwd_microstep: 1444.85 | bwd_inner_microstep: 1444.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3696
[2024-06-11 00:21:16,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1331.55 | bwd_inner_microstep: 1331.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-11 00:21:18,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1430.28 | bwd_inner_microstep: 1430.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-11 00:21:20,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.05 | bwd_microstep: 1528.88 | bwd_inner_microstep: 1528.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3806
[2024-06-11 00:21:22,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.19 | bwd_microstep: 1700.90 | bwd_inner_microstep: 1700.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 00:21:25,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.57 | bwd_microstep: 1657.31 | bwd_inner_microstep: 1657.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-11 00:21:27,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.92 | bwd_microstep: 1441.34 | bwd_inner_microstep: 1441.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794
[2024-06-11 00:21:29,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.70 | bwd_microstep: 1455.91 | bwd_inner_microstep: 1455.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-11 00:21:31,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1492.75 | bwd_inner_microstep: 1492.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 00:21:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.07 | optimizer_step: 6.58
[2024-06-11 00:21:35,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 4334.82 | bwd_inner_microstep: 1410.02 | bwd_allreduce_microstep: 2924.75 | step_microstep: 37.62
[2024-06-11 00:21:35,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15868.76 | bwd: 45485.17 | bwd_inner: 42559.51 | bwd_allreduce: 2924.98 | step: 39.14
 79%|███████▉  | 1367/1726 [23:39:05<6:09:59, 61.84s/it]
 79%|███████▉  | 1368/1726 [23:40:07<6:08:28, 61.76s/it]
 79%|███████▉  | 1369/1726 [23:41:10<6:09:15, 62.06s/it]
 79%|███████▉  | 1370/1726 [23:42:11<6:07:46, 61.99s/it]
 79%|███████▉  | 1371/1726 [23:43:11<6:01:50, 61.16s/it]
 79%|███████▉  | 1372/17
{'loss': 1.141, 'learning_rate': 4.253659845071436e-06, 'epoch': 0.79}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-11 00:21:37,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.57 | bwd_microstep: 1238.71 | bwd_inner_microstep: 1238.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 00:21:39,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1348.08 | bwd_inner_microstep: 1348.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 00:21:41,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.54 | bwd_microstep: 1340.13 | bwd_inner_microstep: 1340.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3842
[2024-06-11 00:21:43,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.87 | bwd_microstep: 1327.04 | bwd_inner_microstep: 1327.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3761
[2024-06-11 00:21:45,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.69 | bwd_microstep: 1635.59 | bwd_inner_microstep: 1635.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899
[2024-06-11 00:21:46,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.89 | bwd_microstep: 683.30 | bwd_inner_microstep: 683.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2891
[2024-06-11 00:21:47,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.84 | bwd_microstep: 1088.58 | bwd_inner_microstep: 1088.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 00:21:49,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1384.57 | bwd_inner_microstep: 1384.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3408
[2024-06-11 00:21:51,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.31 | bwd_microstep: 1296.01 | bwd_inner_microstep: 1295.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1891
[2024-06-11 00:21:52,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.25 | bwd_microstep: 746.40 | bwd_inner_microstep: 746.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2301
[2024-06-11 00:21:54,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.78 | bwd_microstep: 1078.70 | bwd_inner_microstep: 1078.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3809
[2024-06-11 00:21:56,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.28 | bwd_microstep: 1716.50 | bwd_inner_microstep: 1716.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960
[2024-06-11 00:21:57,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.61 | bwd_microstep: 894.73 | bwd_inner_microstep: 894.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3677
[2024-06-11 00:22:00,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.77 | bwd_microstep: 1722.97 | bwd_inner_microstep: 1722.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-11 00:22:01,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1251.80 | bwd_inner_microstep: 1251.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-11 00:22:03,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.16 | bwd_microstep: 1346.58 | bwd_inner_microstep: 1346.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 00:22:05,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1392.95 | bwd_inner_microstep: 1392.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-11 00:22:07,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1349.86 | bwd_inner_microstep: 1349.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625
[2024-06-11 00:22:09,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.90 | bwd_microstep: 1310.53 | bwd_inner_microstep: 1310.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3526
[2024-06-11 00:22:11,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.79 | bwd_microstep: 1257.76 | bwd_inner_microstep: 1257.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-11 00:22:13,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.80 | bwd_microstep: 1556.61 | bwd_inner_microstep: 1556.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 00:22:15,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-11 00:22:17,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.32 | bwd_microstep: 1509.15 | bwd_inner_microstep: 1509.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3441
[2024-06-11 00:22:19,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.94 | bwd_microstep: 1408.81 | bwd_inner_microstep: 1408.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3462
[2024-06-11 00:22:21,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.87 | bwd_microstep: 1433.03 | bwd_inner_microstep: 1433.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2292
[2024-06-11 00:22:22,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.70 | bwd_microstep: 1021.17 | bwd_inner_microstep: 1021.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3601
[2024-06-11 00:22:24,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.30 | bwd_microstep: 1533.04 | bwd_inner_microstep: 1533.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-11 00:22:26,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1429.98 | bwd_inner_microstep: 1429.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-11 00:22:28,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.18 | bwd_microstep: 1503.80 | bwd_inner_microstep: 1503.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-11 00:22:30,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.99 | bwd_microstep: 1444.80 | bwd_inner_microstep: 1444.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2262
[2024-06-11 00:22:32,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.99 | bwd_microstep: 1000.82 | bwd_inner_microstep: 1000.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-11 00:22:38,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-11 00:22:38,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.11 | bwd_microstep: 5832.24 | bwd_inner_microstep: 1658.64 | bwd_allreduce_microstep: 4173.55 | step_microstep: 37.84
[2024-06-11 00:22:38,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15741.50 | bwd: 46465.04 | bwd_inner: 42290.59 | bwd_allreduce: 4173.78 | step: 39.34
{'loss': 1.166, 'learning_rate': 4.230546083457941e-06, 'epoch': 0.8}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 00:22:40,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.88 | bwd_microstep: 1266.68 | bwd_inner_microstep: 1266.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-11 00:22:42,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.99 | bwd_microstep: 1276.82 | bwd_inner_microstep: 1276.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928
[2024-06-11 00:22:43,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.65 | bwd_microstep: 725.64 | bwd_inner_microstep: 725.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 00:22:44,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1383.64 | bwd_inner_microstep: 1383.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 00:22:46,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.92 | bwd_microstep: 1246.53 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3777
[2024-06-11 00:22:48,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.21 | bwd_microstep: 1505.64 | bwd_inner_microstep: 1505.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3769
[2024-06-11 00:22:50,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1471.40 | bwd_inner_microstep: 1471.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 00:22:52,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1246.78 | bwd_inner_microstep: 1246.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 00:22:54,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1289.28 | bwd_inner_microstep: 1289.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-11 00:22:56,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1319.11 | bwd_inner_microstep: 1319.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-11 00:22:57,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.45 | bwd_microstep: 796.07 | bwd_inner_microstep: 796.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1901
[2024-06-11 00:22:58,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.05 | bwd_microstep: 777.39 | bwd_inner_microstep: 777.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424
[2024-06-11 00:23:00,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.50 | bwd_microstep: 1280.16 | bwd_inner_microstep: 1280.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2134
[2024-06-11 00:23:01,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.31 | bwd_microstep: 888.72 | bwd_inner_microstep: 888.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-11 00:23:02,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.62 | bwd_microstep: 893.77 | bwd_inner_microstep: 893.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2133
[2024-06-11 00:23:03,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.07 | bwd_microstep: 930.03 | bwd_inner_microstep: 930.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524
[2024-06-11 00:23:05,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.58 | bwd_microstep: 1324.23 | bwd_inner_microstep: 1324.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503
[2024-06-11 00:23:07,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.49 | bwd_microstep: 1581.93 | bwd_inner_microstep: 1581.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3540
[2024-06-11 00:23:10,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.60 | bwd_microstep: 1585.90 | bwd_inner_microstep: 1585.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3436
[2024-06-11 00:23:11,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.55 | bwd_microstep: 1202.72 | bwd_inner_microstep: 1202.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3713
[2024-06-11 00:23:13,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.40 | bwd_microstep: 1462.38 | bwd_inner_microstep: 1462.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2064
[2024-06-11 00:23:15,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.74 | bwd_microstep: 1012.60 | bwd_inner_microstep: 1012.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-11 00:23:17,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1509.76 | bwd_inner_microstep: 1509.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-11 00:23:19,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.02 | bwd_microstep: 1488.36 | bwd_inner_microstep: 1488.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-11 00:23:21,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.92 | bwd_microstep: 1535.95 | bwd_inner_microstep: 1535.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 00:23:23,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1496.82 | bwd_inner_microstep: 1496.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 00:23:25,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.59 | bwd_microstep: 1256.51 | bwd_inner_microstep: 1256.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3614
[2024-06-11 00:23:26,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1246.59 | bwd_inner_microstep: 1246.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 00:23:29,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.42 | bwd_microstep: 1556.72 | bwd_inner_microstep: 1556.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-11 00:23:31,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.67 | bwd_microstep: 1629.77 | bwd_inner_microstep: 1629.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-11 00:23:33,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1495.06 | bwd_inner_microstep: 1495.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2261
[2024-06-11 00:23:40,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.07 | optimizer_step: 6.62
[2024-06-11 00:23:40,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.49 | bwd_microstep: 6537.07 | bwd_inner_microstep: 1042.85 | bwd_allreduce_microstep: 5494.16 | step_microstep: 37.79
[2024-06-11 00:23:40,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15213.05 | bwd: 46220.03 | bwd_inner: 40724.96 | bwd_allreduce: 5494.38 | step: 39.34
{'loss': 1.1119, 'learning_rate': 4.207487861779158e-06, 'epoch': 0.8}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1966
[2024-06-11 00:23:41,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.81 | bwd_microstep: 816.27 | bwd_inner_microstep: 816.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3946
[2024-06-11 00:23:43,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1393.00 | bwd_inner_microstep: 1392.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 00:23:45,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.43 | bwd_microstep: 1373.30 | bwd_inner_microstep: 1373.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-11 00:23:47,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1544.60 | bwd_inner_microstep: 1544.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 00:23:49,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.79 | bwd_microstep: 1373.93 | bwd_inner_microstep: 1373.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470
[2024-06-11 00:23:51,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.06 | bwd_microstep: 1343.54 | bwd_inner_microstep: 1343.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3474
[2024-06-11 00:23:52,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1228.87 | bwd_inner_microstep: 1228.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 00:23:54,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1277.39 | bwd_inner_microstep: 1277.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2458
[2024-06-11 00:23:55,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.46 | bwd_microstep: 948.63 | bwd_inner_microstep: 948.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 00:23:57,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.56 | bwd_microstep: 1389.03 | bwd_inner_microstep: 1389.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-11 00:23:59,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.24 | bwd_microstep: 1313.89 | bwd_inner_microstep: 1313.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2682
[2024-06-11 00:24:00,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 928.92 | bwd_inner_microstep: 928.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3669
[2024-06-11 00:24:03,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.66 | bwd_microstep: 1756.34 | bwd_inner_microstep: 1756.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3480
[2024-06-11 00:24:05,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.82 | bwd_microstep: 1549.15 | bwd_inner_microstep: 1549.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-11 00:24:07,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.57 | bwd_microstep: 1475.35 | bwd_inner_microstep: 1475.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3644
[2024-06-11 00:24:09,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.14 | bwd_microstep: 1347.18 | bwd_inner_microstep: 1347.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 00:24:11,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.06 | bwd_microstep: 1392.53 | bwd_inner_microstep: 1392.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-11 00:24:12,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.72 | bwd_microstep: 806.85 | bwd_inner_microstep: 806.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-11 00:24:14,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1397.71 | bwd_inner_microstep: 1397.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820
[2024-06-11 00:24:16,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.11 | bwd_microstep: 1585.16 | bwd_inner_microstep: 1585.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3611
[2024-06-11 00:24:18,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.71 | bwd_microstep: 1559.96 | bwd_inner_microstep: 1559.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3653
[2024-06-11 00:24:20,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.98 | bwd_microstep: 1653.48 | bwd_inner_microstep: 1653.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 00:24:22,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1375.86 | bwd_inner_microstep: 1375.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-11 00:24:24,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1493.58 | bwd_inner_microstep: 1493.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2292
[2024-06-11 00:24:26,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.33 | bwd_microstep: 1035.13 | bwd_inner_microstep: 1035.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3601
[2024-06-11 00:24:28,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1340.18 | bwd_inner_microstep: 1340.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-11 00:24:30,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.28 | bwd_microstep: 1431.24 | bwd_inner_microstep: 1431.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426
[2024-06-11 00:24:31,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.99 | bwd_microstep: 1257.47 | bwd_inner_microstep: 1257.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-11 00:24:33,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.58 | bwd_microstep: 1484.35 | bwd_inner_microstep: 1484.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3595
[2024-06-11 00:24:35,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.92 | bwd_microstep: 1457.58 | bwd_inner_microstep: 1457.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3813
[2024-06-11 00:24:37,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.00 | bwd_microstep: 1258.78 | bwd_inner_microstep: 1258.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 00:24:42,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.03 | optimizer_step: 6.61
[2024-06-11 00:24:42,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.52 | bwd_microstep: 4535.45 | bwd_inner_microstep: 1863.82 | bwd_allreduce_microstep: 2671.59 | step_microstep: 37.46
[2024-06-11 00:24:42,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16194.44 | bwd: 46124.75 | bwd_inner: 43452.24 | bwd_allreduce: 2671.81 | step: 38.92
{'loss': 1.1617, 'learning_rate': 4.184485261246032e-06, 'epoch': 0.8}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3388
[2024-06-11 00:24:44,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.17 | bwd_microstep: 1237.72 | bwd_inner_microstep: 1237.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 00:24:46,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1247.00 | bwd_inner_microstep: 1246.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3863
[2024-06-11 00:24:48,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1395.13 | bwd_inner_microstep: 1395.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865
[2024-06-11 00:24:50,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.41 | bwd_microstep: 1561.59 | bwd_inner_microstep: 1561.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 00:24:52,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1382.96 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-11 00:24:54,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.25 | bwd_microstep: 1538.17 | bwd_inner_microstep: 1538.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4104
[2024-06-11 00:24:56,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.20 | bwd_microstep: 1734.33 | bwd_inner_microstep: 1734.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-11 00:24:58,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.91 | bwd_microstep: 1252.80 | bwd_inner_microstep: 1252.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 00:25:00,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1283.78 | bwd_inner_microstep: 1283.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1996
[2024-06-11 00:25:01,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.03 | bwd_microstep: 707.73 | bwd_inner_microstep: 707.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 00:25:03,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1278.68 | bwd_inner_microstep: 1278.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1927
[2024-06-11 00:25:04,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.69 | bwd_microstep: 759.26 | bwd_inner_microstep: 759.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-11 00:25:06,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.04 | bwd_microstep: 1635.92 | bwd_inner_microstep: 1635.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 00:25:08,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.12 | bwd_microstep: 1375.04 | bwd_inner_microstep: 1375.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3677
[2024-06-11 00:25:10,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.00 | bwd_microstep: 1665.06 | bwd_inner_microstep: 1665.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-11 00:25:12,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 1341.97 | bwd_inner_microstep: 1341.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3697
[2024-06-11 00:25:14,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 1329.49 | bwd_inner_microstep: 1329.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-11 00:25:16,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.69 | bwd_microstep: 1352.93 | bwd_inner_microstep: 1352.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2179
[2024-06-11 00:25:17,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.73 | bwd_microstep: 918.48 | bwd_inner_microstep: 918.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 00:25:19,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.67 | bwd_microstep: 1381.38 | bwd_inner_microstep: 1381.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607
[2024-06-11 00:25:21,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.67 | bwd_microstep: 1310.44 | bwd_inner_microstep: 1310.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-11 00:25:23,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1480.32 | bwd_inner_microstep: 1480.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3605
[2024-06-11 00:25:25,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1340.25 | bwd_inner_microstep: 1340.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998
[2024-06-11 00:25:26,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.76 | bwd_microstep: 711.02 | bwd_inner_microstep: 710.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-11 00:25:28,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.94 | bwd_microstep: 1661.46 | bwd_inner_microstep: 1661.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-11 00:25:30,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.65 | bwd_microstep: 1507.69 | bwd_inner_microstep: 1507.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-11 00:25:32,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1409.08 | bwd_inner_microstep: 1409.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-11 00:25:34,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.84 | bwd_microstep: 1599.06 | bwd_inner_microstep: 1599.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-11 00:25:36,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.18 | bwd_microstep: 1657.58 | bwd_inner_microstep: 1657.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-11 00:25:38,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.02 | bwd_microstep: 1542.55 | bwd_inner_microstep: 1542.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2283
[2024-06-11 00:25:40,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.08 | bwd_microstep: 1073.00 | bwd_inner_microstep: 1072.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-11 00:25:45,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.11 | optimizer_step: 6.62
[2024-06-11 00:25:45,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 4838.06 | bwd_inner_microstep: 1684.85 | bwd_allreduce_microstep: 3153.15 | step_microstep: 38.10
[2024-06-11 00:25:45,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16157.46 | bwd: 46509.98 | bwd_inner: 43355.92 | bwd_allreduce: 3153.39 | step: 39.52
{'loss': 1.137, 'learning_rate': 4.161538362873596e-06, 'epoch': 0.8}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3548
[2024-06-11 00:25:47,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.96 | bwd_microstep: 1346.84 | bwd_inner_microstep: 1346.77 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3007
[2024-06-11 00:25:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 399.79 | bwd_microstep: 1051.68 | bwd_inner_microstep: 1051.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862
[2024-06-11 00:25:51,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.74 | bwd_microstep: 1561.71 | bwd_inner_microstep: 1561.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 00:25:53,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.00 | bwd_microstep: 1244.48 | bwd_inner_microstep: 1244.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3781
[2024-06-11 00:25:54,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.59 | bwd_microstep: 1347.87 | bwd_inner_microstep: 1347.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2242
[2024-06-11 00:25:56,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.96 | bwd_microstep: 897.84 | bwd_inner_microstep: 897.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-11 00:25:58,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1398.98 | bwd_inner_microstep: 1398.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-11 00:26:00,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1346.14 | bwd_inner_microstep: 1346.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1979
[2024-06-11 00:26:01,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.95 | bwd_microstep: 735.73 | bwd_inner_microstep: 735.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2214
[2024-06-11 00:26:02,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.36 | bwd_microstep: 1052.51 | bwd_inner_microstep: 1052.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 00:26:04,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1250.68 | bwd_inner_microstep: 1250.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3687
[2024-06-11 00:26:06,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.49 | bwd_microstep: 1572.46 | bwd_inner_microstep: 1572.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3680
[2024-06-11 00:26:08,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.08 | bwd_microstep: 1721.63 | bwd_inner_microstep: 1721.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3454
[2024-06-11 00:26:10,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.92 | bwd_microstep: 1187.41 | bwd_inner_microstep: 1187.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1978
[2024-06-11 00:26:11,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.66 | bwd_microstep: 893.79 | bwd_inner_microstep: 893.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 00:26:13,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1395.67 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390
[2024-06-11 00:26:15,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.77 | bwd_microstep: 1241.99 | bwd_inner_microstep: 1241.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3729
[2024-06-11 00:26:17,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.61 | bwd_microstep: 1336.82 | bwd_inner_microstep: 1336.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3696
[2024-06-11 00:26:19,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1432.98 | bwd_inner_microstep: 1432.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 00:26:21,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.60 | bwd_microstep: 1558.23 | bwd_inner_microstep: 1558.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-11 00:26:23,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.90 | bwd_microstep: 1260.62 | bwd_inner_microstep: 1260.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-11 00:26:25,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.20 | bwd_microstep: 1529.73 | bwd_inner_microstep: 1529.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 00:26:27,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1375.92 | bwd_inner_microstep: 1375.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-11 00:26:28,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1414.16 | bwd_inner_microstep: 1414.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3463
[2024-06-11 00:26:30,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.63 | bwd_microstep: 1343.58 | bwd_inner_microstep: 1343.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2953
[2024-06-11 00:26:32,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.18 | bwd_microstep: 1198.97 | bwd_inner_microstep: 1198.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 00:26:34,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.82 | bwd_microstep: 1287.19 | bwd_inner_microstep: 1287.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 00:26:36,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.06 | bwd_microstep: 1548.32 | bwd_inner_microstep: 1548.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2763
[2024-06-11 00:26:38,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.78 | bwd_microstep: 1243.60 | bwd_inner_microstep: 1243.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 00:26:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1346.93 | bwd_inner_microstep: 1346.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-11 00:26:41,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.93 | bwd_microstep: 1148.44 | bwd_inner_microstep: 1148.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-11 00:26:47,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.99 | optimizer_gradients: 4.10 | optimizer_step: 6.58
[2024-06-11 00:26:47,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 5050.35 | bwd_inner_microstep: 1671.09 | bwd_allreduce_microstep: 3379.20 | step_microstep: 40.40
[2024-06-11 00:26:47,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15657.84 | bwd: 45323.25 | bwd_inner: 41943.08 | bwd_allreduce: 3379.47 | step: 41.89
 79%|███████▉  | 1372/1726 [23:44:12<6:01:45, 61.32s/it]
 80%|███████▉  | 1373/1726 [23:45:15<6:02:53, 61.68s/it]
 80%|███████▉  | 1374/1726 [23:46:17<6:01:59, 61.70s/it]
 80%|███████▉  | 1375/1726 [23:47:19<6:02:37, 61.99s/it]
 80%|███████▉  | 1376/1726 [23:48:22<6:03:21, 62.29s/it]
 80%|███████▉  | 1377/1726 [
{'loss': 1.1562, 'learning_rate': 4.138647247480707e-06, 'epoch': 0.8}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469
[2024-06-11 00:26:49,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.27 | bwd_microstep: 1305.81 | bwd_inner_microstep: 1305.62 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-11 00:26:50,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.70 | bwd_microstep: 790.79 | bwd_inner_microstep: 790.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3836
[2024-06-11 00:26:52,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.02 | bwd_microstep: 1356.76 | bwd_inner_microstep: 1356.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 00:26:53,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1382.92 | bwd_inner_microstep: 1382.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-11 00:26:55,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.59 | bwd_microstep: 1343.30 | bwd_inner_microstep: 1343.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-11 00:26:56,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.69 | bwd_microstep: 792.34 | bwd_inner_microstep: 792.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-11 00:26:58,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.10 | bwd_microstep: 1343.21 | bwd_inner_microstep: 1343.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3537
[2024-06-11 00:27:00,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.34 | bwd_microstep: 1244.09 | bwd_inner_microstep: 1244.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408
[2024-06-11 00:27:02,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.91 | bwd_microstep: 1440.40 | bwd_inner_microstep: 1440.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425
[2024-06-11 00:27:04,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1406.11 | bwd_inner_microstep: 1406.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3494
[2024-06-11 00:27:06,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1345.35 | bwd_inner_microstep: 1345.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-11 00:27:08,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1250.06 | bwd_inner_microstep: 1250.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638
[2024-06-11 00:27:09,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.22 | bwd_microstep: 1420.50 | bwd_inner_microstep: 1420.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 00:27:11,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1389.10 | bwd_inner_microstep: 1389.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3655
[2024-06-11 00:27:13,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.48 | bwd_microstep: 1225.66 | bwd_inner_microstep: 1225.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 00:27:15,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1372.10 | bwd_inner_microstep: 1372.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-11 00:27:17,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.82 | bwd_microstep: 1294.95 | bwd_inner_microstep: 1294.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-11 00:27:19,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.48 | bwd_microstep: 1513.27 | bwd_inner_microstep: 1513.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-11 00:27:21,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4102
[2024-06-11 00:27:23,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.17 | bwd_microstep: 1667.69 | bwd_inner_microstep: 1667.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-11 00:27:24,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.04 | bwd_microstep: 801.17 | bwd_inner_microstep: 801.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622
[2024-06-11 00:27:26,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.08 | bwd_microstep: 1612.43 | bwd_inner_microstep: 1612.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-11 00:27:28,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.12 | bwd_microstep: 877.32 | bwd_inner_microstep: 877.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-11 00:27:30,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1351.95 | bwd_inner_microstep: 1351.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926
[2024-06-11 00:27:30,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.52 | bwd_microstep: 697.14 | bwd_inner_microstep: 697.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 00:27:32,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.02 | bwd_microstep: 1283.15 | bwd_inner_microstep: 1283.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-11 00:27:34,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.74 | bwd_microstep: 1342.05 | bwd_inner_microstep: 1342.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2196
[2024-06-11 00:27:35,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.24 | bwd_microstep: 858.60 | bwd_inner_microstep: 858.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3814
[2024-06-11 00:27:38,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.53 | bwd_microstep: 1603.55 | bwd_inner_microstep: 1603.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3406
[2024-06-11 00:27:39,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.75 | bwd_microstep: 1439.77 | bwd_inner_microstep: 1439.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815
[2024-06-11 00:27:42,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.97 | bwd_microstep: 1516.70 | bwd_inner_microstep: 1516.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3740
[2024-06-11 00:27:49,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.07 | optimizer_step: 6.62
[2024-06-11 00:27:49,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.79 | bwd_microstep: 7270.37 | bwd_inner_microstep: 1808.55 | bwd_allreduce_microstep: 5461.77 | step_microstep: 37.74
[2024-06-11 00:27:49,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15501.95 | bwd: 46928.37 | bwd_inner: 41465.56 | bwd_allreduce: 5462.08 | step: 39.30
{'loss': 1.1716, 'learning_rate': 4.11581199568976e-06, 'epoch': 0.8}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 00:27:52,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.40 | bwd_microstep: 1469.33 | bwd_inner_microstep: 1469.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882
[2024-06-11 00:27:53,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.41 | bwd_microstep: 707.99 | bwd_inner_microstep: 707.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3902
[2024-06-11 00:27:55,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1482.84 | bwd_inner_microstep: 1482.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278
[2024-06-11 00:27:56,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.56 | bwd_microstep: 872.90 | bwd_inner_microstep: 872.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522
[2024-06-11 00:27:57,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.75 | bwd_microstep: 1193.18 | bwd_inner_microstep: 1193.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2222
[2024-06-11 00:27:59,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.27 | bwd_microstep: 958.41 | bwd_inner_microstep: 958.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-11 00:28:01,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.49 | bwd_microstep: 1529.44 | bwd_inner_microstep: 1529.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 00:28:03,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.81 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 00:28:05,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1384.36 | bwd_inner_microstep: 1384.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-11 00:28:06,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.76 | bwd_microstep: 1277.52 | bwd_inner_microstep: 1277.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951
[2024-06-11 00:28:07,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.21 | bwd_microstep: 728.46 | bwd_inner_microstep: 728.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 00:28:09,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1286.80 | bwd_inner_microstep: 1286.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975
[2024-06-11 00:28:10,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.98 | bwd_microstep: 830.95 | bwd_inner_microstep: 830.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2011
[2024-06-11 00:28:11,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.64 | bwd_microstep: 739.58 | bwd_inner_microstep: 739.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-11 00:28:13,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.10 | bwd_microstep: 1442.75 | bwd_inner_microstep: 1442.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1895
[2024-06-11 00:28:14,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.44 | bwd_microstep: 681.80 | bwd_inner_microstep: 681.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2781
[2024-06-11 00:28:16,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.58 | bwd_microstep: 1147.51 | bwd_inner_microstep: 1147.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-11 00:28:18,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.84 | bwd_microstep: 1521.26 | bwd_inner_microstep: 1521.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1447
[2024-06-11 00:28:19,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.46 | bwd_microstep: 539.27 | bwd_inner_microstep: 539.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 00:28:21,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.74 | bwd_microstep: 1281.63 | bwd_inner_microstep: 1281.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3735
[2024-06-11 00:28:23,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1433.50 | bwd_inner_microstep: 1433.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 00:28:24,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.76 | bwd_microstep: 1281.93 | bwd_inner_microstep: 1281.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2285
[2024-06-11 00:28:26,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.41 | bwd_microstep: 905.37 | bwd_inner_microstep: 905.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3818
[2024-06-11 00:28:28,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.08 | bwd_microstep: 1486.71 | bwd_inner_microstep: 1486.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-11 00:28:30,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1642.28 | bwd_inner_microstep: 1642.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-11 00:28:32,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.18 | bwd_microstep: 1520.94 | bwd_inner_microstep: 1520.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3611
[2024-06-11 00:28:34,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.71 | bwd_microstep: 1441.11 | bwd_inner_microstep: 1441.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-11 00:28:36,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1443.90 | bwd_inner_microstep: 1443.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583
[2024-06-11 00:28:38,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.06 | bwd_microstep: 1530.07 | bwd_inner_microstep: 1530.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2268
[2024-06-11 00:28:39,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.26 | bwd_microstep: 780.05 | bwd_inner_microstep: 780.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-11 00:28:41,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.89 | bwd_microstep: 1647.09 | bwd_inner_microstep: 1647.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-11 00:28:49,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-11 00:28:49,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.26 | bwd_microstep: 7115.05 | bwd_inner_microstep: 1456.77 | bwd_allreduce_microstep: 5658.22 | step_microstep: 38.47
[2024-06-11 00:28:49,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14583.58 | bwd: 44690.85 | bwd_inner: 39031.71 | bwd_allreduce: 5658.46 | step: 39.86
{'loss': 1.1734, 'learning_rate': 4.0930326879263924e-06, 'epoch': 0.8}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1881
[2024-06-11 00:28:50,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.02 | bwd_microstep: 766.82 | bwd_inner_microstep: 766.73 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3923
[2024-06-11 00:28:52,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.71 | bwd_microstep: 1586.91 | bwd_inner_microstep: 1586.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3852
[2024-06-11 00:28:54,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.73 | bwd_microstep: 1462.63 | bwd_inner_microstep: 1462.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 00:28:56,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.58 | bwd_microstep: 1337.79 | bwd_inner_microstep: 1337.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 00:28:58,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.04 | bwd_microstep: 1380.16 | bwd_inner_microstep: 1380.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741
[2024-06-11 00:29:00,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.97 | bwd_microstep: 1526.37 | bwd_inner_microstep: 1526.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-11 00:29:01,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.83 | bwd_microstep: 792.52 | bwd_inner_microstep: 792.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 00:29:03,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1246.64 | bwd_inner_microstep: 1246.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3756
[2024-06-11 00:29:05,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.98 | bwd_microstep: 1641.32 | bwd_inner_microstep: 1641.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 00:29:07,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1342.86 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3420
[2024-06-11 00:29:09,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.12 | bwd_microstep: 1215.33 | bwd_inner_microstep: 1215.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-11 00:29:11,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1442.42 | bwd_inner_microstep: 1442.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-11 00:29:13,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.79 | bwd_microstep: 1411.95 | bwd_inner_microstep: 1411.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-11 00:29:15,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.31 | bwd_microstep: 1290.46 | bwd_inner_microstep: 1290.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1936
[2024-06-11 00:29:16,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.80 | bwd_microstep: 810.29 | bwd_inner_microstep: 810.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-11 00:29:17,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.31 | bwd_microstep: 1160.08 | bwd_inner_microstep: 1160.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949
[2024-06-11 00:29:18,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.40 | bwd_microstep: 698.33 | bwd_inner_microstep: 698.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-11 00:29:20,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.16 | bwd_microstep: 1529.02 | bwd_inner_microstep: 1528.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014
[2024-06-11 00:29:22,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.93 | bwd_microstep: 803.06 | bwd_inner_microstep: 803.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-11 00:29:23,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.08 | bwd_microstep: 811.25 | bwd_inner_microstep: 811.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 00:29:24,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1256.14 | bwd_inner_microstep: 1256.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-11 00:29:26,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.46 | bwd_microstep: 1297.11 | bwd_inner_microstep: 1297.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 00:29:28,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1401.79 | bwd_inner_microstep: 1401.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-11 00:29:29,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.25 | bwd_microstep: 976.76 | bwd_inner_microstep: 976.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-11 00:29:32,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1513.95 | bwd_inner_microstep: 1513.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-11 00:29:34,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.88 | bwd_microstep: 1665.87 | bwd_inner_microstep: 1665.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589
[2024-06-11 00:29:36,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.08 | bwd_microstep: 1433.01 | bwd_inner_microstep: 1432.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3475
[2024-06-11 00:29:38,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.13 | bwd_microstep: 1332.55 | bwd_inner_microstep: 1332.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3423
[2024-06-11 00:29:40,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.00 | bwd_microstep: 1406.23 | bwd_inner_microstep: 1406.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-11 00:29:41,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.30 | bwd_microstep: 1358.58 | bwd_inner_microstep: 1358.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3078
[2024-06-11 00:29:43,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.10 | bwd_microstep: 1331.04 | bwd_inner_microstep: 1331.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816
[2024-06-11 00:29:51,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-11 00:29:51,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.99 | bwd_microstep: 6810.71 | bwd_inner_microstep: 1711.00 | bwd_allreduce_microstep: 5099.65 | step_microstep: 38.26
[2024-06-11 00:29:51,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15274.58 | bwd: 46039.98 | bwd_inner: 40939.34 | bwd_allreduce: 5099.93 | step: 39.91
{'loss': 1.191, 'learning_rate': 4.070309404419204e-06, 'epoch': 0.8}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 00:29:53,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.44 | bwd_microstep: 1328.27 | bwd_inner_microstep: 1328.14 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 00:29:54,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.54 | bwd_microstep: 1272.08 | bwd_inner_microstep: 1272.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4394
[2024-06-11 00:29:57,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.39 | bwd_microstep: 1605.30 | bwd_inner_microstep: 1605.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3430
[2024-06-11 00:29:58,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.21 | bwd_microstep: 1149.18 | bwd_inner_microstep: 1149.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-11 00:30:00,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1495.29 | bwd_inner_microstep: 1495.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3521
[2024-06-11 00:30:02,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.56 | bwd_microstep: 1193.20 | bwd_inner_microstep: 1193.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 00:30:04,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.72 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 00:30:06,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.79 | bwd_microstep: 1380.92 | bwd_inner_microstep: 1380.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-11 00:30:07,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.72 | bwd_microstep: 1182.25 | bwd_inner_microstep: 1182.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-11 00:30:09,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.52 | bwd_microstep: 1476.60 | bwd_inner_microstep: 1476.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-11 00:30:11,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1296.70 | bwd_inner_microstep: 1296.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1956
[2024-06-11 00:30:12,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.55 | bwd_microstep: 825.10 | bwd_inner_microstep: 825.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-11 00:30:14,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1297.70 | bwd_inner_microstep: 1297.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3687
[2024-06-11 00:30:16,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.17 | bwd_microstep: 1585.13 | bwd_inner_microstep: 1585.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-11 00:30:18,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.28 | bwd_microstep: 1487.50 | bwd_inner_microstep: 1487.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 00:30:20,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1371.11 | bwd_inner_microstep: 1371.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3620
[2024-06-11 00:30:22,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1437.30 | bwd_inner_microstep: 1437.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 00:30:24,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1377.57 | bwd_inner_microstep: 1377.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-11 00:30:26,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.33 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-11 00:30:28,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1411.04 | bwd_inner_microstep: 1411.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-11 00:30:30,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.67 | bwd_microstep: 1389.63 | bwd_inner_microstep: 1389.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3849
[2024-06-11 00:30:32,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.77 | bwd_microstep: 1630.99 | bwd_inner_microstep: 1630.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3717
[2024-06-11 00:30:34,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.92 | bwd_microstep: 1666.28 | bwd_inner_microstep: 1666.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3502
[2024-06-11 00:30:36,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.65 | bwd_microstep: 1220.41 | bwd_inner_microstep: 1220.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-11 00:30:38,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.77 | bwd_microstep: 1503.35 | bwd_inner_microstep: 1503.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3702
[2024-06-11 00:30:40,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.39 | bwd_microstep: 1362.54 | bwd_inner_microstep: 1362.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-11 00:30:42,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1492.42 | bwd_inner_microstep: 1492.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3571
[2024-06-11 00:30:44,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.28 | bwd_microstep: 1571.75 | bwd_inner_microstep: 1571.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 00:30:46,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1380.30 | bwd_inner_microstep: 1380.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-11 00:30:48,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.58 | bwd_microstep: 1540.21 | bwd_inner_microstep: 1540.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-11 00:30:50,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1509.00 | bwd_inner_microstep: 1508.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2918
[2024-06-11 00:30:52,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.03 | optimizer_step: 6.59
[2024-06-11 00:30:52,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.06 | bwd_microstep: 1205.36 | bwd_inner_microstep: 1123.59 | bwd_allreduce_microstep: 81.71 | step_microstep: 37.42
[2024-06-11 00:30:52,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16607.68 | bwd: 44302.48 | bwd_inner: 44219.76 | bwd_allreduce: 82.00 | step: 38.91
{'loss': 1.2084, 'learning_rate': 4.04764222519948e-06, 'epoch': 0.8}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 00:30:54,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1242.85 | bwd_inner_microstep: 1242.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 00:30:56,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.00 | bwd_microstep: 1382.92 | bwd_inner_microstep: 1382.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3863
[2024-06-11 00:30:58,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.27 | bwd_microstep: 1460.03 | bwd_inner_microstep: 1460.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-11 00:31:45,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.32 | bwd_microstep: 1542.84 | bwd_inner_microstep: 1542.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-11 00:31:47,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.27 | bwd_microstep: 1536.42 | bwd_inner_microstep: 1536.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794
[2024-06-11 00:31:49,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.95 | bwd_microstep: 1436.87 | bwd_inner_microstep: 1436.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-11 00:31:51,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.71 | bwd_microstep: 1274.41 | bwd_inner_microstep: 1274.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 00:31:53,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.37 | bwd_microstep: 1237.90 | bwd_inner_microstep: 1237.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-11 00:31:54,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.70 | bwd_microstep: 1387.48 | bwd_inner_microstep: 1387.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872
[2024-06-11 00:31:55,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.30 | bwd_microstep: 677.09 | bwd_inner_microstep: 677.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 00:31:57,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.67 | bwd_microstep: 1476.11 | bwd_inner_microstep: 1476.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 00:31:59,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.67 | bwd_microstep: 1279.86 | bwd_inner_microstep: 1279.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3628
[2024-06-11 00:32:01,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.17 | bwd_microstep: 1532.07 | bwd_inner_microstep: 1532.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3659
[2024-06-11 00:32:03,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1462.00 | bwd_inner_microstep: 1461.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3527
[2024-06-11 00:32:05,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.60 | bwd_microstep: 1555.31 | bwd_inner_microstep: 1555.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-11 00:32:08,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.57 | bwd_microstep: 1480.68 | bwd_inner_microstep: 1480.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638
[2024-06-11 00:32:09,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.49 | bwd_microstep: 1314.24 | bwd_inner_microstep: 1314.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-11 00:32:11,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1491.29 | bwd_inner_microstep: 1491.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 00:32:13,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.92 | bwd_microstep: 1479.59 | bwd_inner_microstep: 1479.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2155
[2024-06-11 00:32:15,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.40 | bwd_microstep: 760.33 | bwd_inner_microstep: 760.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645
[2024-06-11 00:32:16,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1409.56 | bwd_inner_microstep: 1409.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 00:32:18,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1285.36 | bwd_inner_microstep: 1285.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 00:32:20,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.15 | bwd_microstep: 1388.40 | bwd_inner_microstep: 1388.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3606
[2024-06-11 00:32:22,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.25 | bwd_microstep: 1437.89 | bwd_inner_microstep: 1437.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 00:32:24,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.80 | bwd_microstep: 1552.51 | bwd_inner_microstep: 1552.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-11 00:32:27,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.06 | bwd_microstep: 1642.90 | bwd_inner_microstep: 1642.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564
[2024-06-11 00:32:28,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.45 | bwd_microstep: 1326.41 | bwd_inner_microstep: 1326.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-11 00:32:30,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.93 | bwd_microstep: 1403.27 | bwd_inner_microstep: 1403.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-11 00:32:32,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1494.22 | bwd_inner_microstep: 1494.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-11 00:32:35,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.96 | bwd_microstep: 1599.77 | bwd_inner_microstep: 1599.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 00:32:37,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.05 | bwd_microstep: 1473.96 | bwd_inner_microstep: 1473.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2699
[2024-06-11 00:32:38,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.02 | optimizer_step: 6.64
[2024-06-11 00:32:38,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.15 | bwd_microstep: 1070.42 | bwd_inner_microstep: 1061.43 | bwd_allreduce_microstep: 8.94 | step_microstep: 37.56
[2024-06-11 00:32:38,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16522.12 | bwd: 44094.98 | bwd_inner: 44085.13 | bwd_allreduce: 9.17 | step: 39.03
 80%|███████▉  | 1377/1726 [23:49:23<6:00:36, 62.00s/it]
 80%|███████▉  | 1378/1726 [23:50:26<6:00:54, 62.23s/it]
 80%|███████▉  | 1379/1726 [23:51:26<5:55:18, 61.44s/it]
 80%|███████▉  | 1380/1726 [23:52:27<5:54:39, 61.50s/it]
 80%|████████  | 1381/1726 [23:53:29<5:53:11, 61.42s/it]
 80%|████████  | 1382/1726 [23:5
{'loss': 1.2438, 'learning_rate': 4.025031230100913e-06, 'epoch': 0.8}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-11 00:32:40,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.61 | bwd_microstep: 1494.85 | bwd_inner_microstep: 1494.70 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4416
[2024-06-11 00:32:43,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.18 | bwd_microstep: 1711.51 | bwd_inner_microstep: 1711.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4155
[2024-06-11 00:32:45,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.81 | bwd_microstep: 1536.99 | bwd_inner_microstep: 1536.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 00:32:47,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.83 | bwd_microstep: 1372.31 | bwd_inner_microstep: 1372.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-11 00:32:49,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.44 | bwd_microstep: 1390.28 | bwd_inner_microstep: 1390.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 00:32:50,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.47 | bwd_microstep: 1240.08 | bwd_inner_microstep: 1240.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3468
[2024-06-11 00:32:52,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.27 | bwd_microstep: 1180.23 | bwd_inner_microstep: 1180.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 00:32:54,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.74 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-11 00:32:56,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.64 | bwd_microstep: 1430.94 | bwd_inner_microstep: 1430.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3667
[2024-06-11 00:32:58,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.64 | bwd_microstep: 1580.83 | bwd_inner_microstep: 1580.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497
[2024-06-11 00:33:00,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.41 | bwd_microstep: 1508.42 | bwd_inner_microstep: 1508.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-11 00:33:02,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.96 | bwd_microstep: 1515.80 | bwd_inner_microstep: 1515.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3445
[2024-06-11 00:33:04,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.22 | bwd_microstep: 1515.67 | bwd_inner_microstep: 1515.60 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-11 00:33:06,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.19 | bwd_microstep: 1481.19 | bwd_inner_microstep: 1481.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.28
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 00:33:08,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1385.81 | bwd_inner_microstep: 1385.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-11 00:33:10,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.07 | bwd_microstep: 1661.70 | bwd_inner_microstep: 1661.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-11 00:33:12,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.11 | bwd_microstep: 799.29 | bwd_inner_microstep: 799.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-11 00:33:13,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.82 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-11 00:33:16,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.39 | bwd_microstep: 1662.15 | bwd_inner_microstep: 1662.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-11 00:33:18,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.11 | bwd_microstep: 1487.36 | bwd_inner_microstep: 1487.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-11 00:33:20,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1291.44 | bwd_inner_microstep: 1291.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 00:33:21,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1384.10 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1013
[2024-06-11 00:33:22,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 164.15 | bwd_microstep: 428.94 | bwd_inner_microstep: 428.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3756
[2024-06-11 00:33:24,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.56 | bwd_microstep: 1648.24 | bwd_inner_microstep: 1648.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-11 00:33:26,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1312.77 | bwd_inner_microstep: 1312.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3668
[2024-06-11 00:33:28,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.61 | bwd_microstep: 1327.17 | bwd_inner_microstep: 1327.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1926
[2024-06-11 00:33:29,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.63 | bwd_microstep: 758.85 | bwd_inner_microstep: 758.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-11 00:33:31,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1411.03 | bwd_inner_microstep: 1410.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1932
[2024-06-11 00:33:32,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.96 | bwd_microstep: 735.14 | bwd_inner_microstep: 735.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2279
[2024-06-11 00:33:34,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.21 | bwd_microstep: 1073.58 | bwd_inner_microstep: 1073.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-11 00:33:36,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.74 | bwd_microstep: 1449.97 | bwd_inner_microstep: 1449.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-11 00:34:00,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-11 00:34:00,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.79 | bwd_microstep: 24347.94 | bwd_inner_microstep: 1697.77 | bwd_allreduce_microstep: 22650.08 | step_microstep: 38.99
[2024-06-11 00:34:00,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16090.75 | bwd: 65901.07 | bwd_inner: 43249.69 | bwd_allreduce: 22650.44 | step: 41.92
{'loss': 1.2094, 'learning_rate': 4.002476498759303e-06, 'epoch': 0.8}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-11 00:34:02,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.54 | bwd_microstep: 1334.22 | bwd_inner_microstep: 1334.13 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3914
[2024-06-11 00:34:05,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.36 | bwd_microstep: 1628.29 | bwd_inner_microstep: 1628.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 00:34:06,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1382.51 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 00:34:08,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1377.58 | bwd_inner_microstep: 1377.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-11 00:34:10,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.10 | bwd_microstep: 1284.18 | bwd_inner_microstep: 1284.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2446
[2024-06-11 00:34:12,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.20 | bwd_microstep: 1040.10 | bwd_inner_microstep: 1040.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-11 00:34:13,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.95 | bwd_microstep: 1184.61 | bwd_inner_microstep: 1184.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-11 00:34:15,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1525.41 | bwd_inner_microstep: 1525.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 00:34:17,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3677
[2024-06-11 00:34:19,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.91 | bwd_microstep: 1450.66 | bwd_inner_microstep: 1450.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 00:34:28,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1342.87 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3537
[2024-06-11 00:34:30,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1319.16 | bwd_inner_microstep: 1319.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-11 00:34:32,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.32 | bwd_microstep: 1375.99 | bwd_inner_microstep: 1375.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2011
[2024-06-11 00:34:33,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.09 | bwd_microstep: 717.56 | bwd_inner_microstep: 717.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660
[2024-06-11 00:34:35,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.02 | bwd_microstep: 1419.79 | bwd_inner_microstep: 1419.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3516
[2024-06-11 00:34:37,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.24 | bwd_microstep: 1311.84 | bwd_inner_microstep: 1311.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-11 00:34:39,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.46 | bwd_microstep: 1446.41 | bwd_inner_microstep: 1446.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-11 00:34:41,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.62 | bwd_microstep: 1313.60 | bwd_inner_microstep: 1313.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 00:34:43,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.92 | bwd_microstep: 1549.37 | bwd_inner_microstep: 1549.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2150
[2024-06-11 00:34:44,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.67 | bwd_microstep: 946.25 | bwd_inner_microstep: 946.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2111
[2024-06-11 00:34:45,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.64 | bwd_microstep: 917.16 | bwd_inner_microstep: 917.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-11 00:34:47,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.49 | bwd_microstep: 1647.73 | bwd_inner_microstep: 1647.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538
[2024-06-11 00:34:49,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.60 | bwd_microstep: 1292.56 | bwd_inner_microstep: 1292.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-11 00:34:51,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.20 | bwd_microstep: 1399.80 | bwd_inner_microstep: 1399.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-11 00:34:53,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.16 | bwd_microstep: 1182.55 | bwd_inner_microstep: 1182.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2281
[2024-06-11 00:34:54,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.89 | bwd_microstep: 812.36 | bwd_inner_microstep: 812.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3780
[2024-06-11 00:34:56,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.77 | bwd_microstep: 1604.08 | bwd_inner_microstep: 1604.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589
[2024-06-11 00:34:58,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.02 | bwd_microstep: 1423.63 | bwd_inner_microstep: 1423.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723
[2024-06-11 00:35:00,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1429.95 | bwd_inner_microstep: 1429.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 00:35:02,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.80 | bwd_microstep: 1554.33 | bwd_inner_microstep: 1554.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2722
[2024-06-11 00:35:04,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.69 | bwd_microstep: 1167.91 | bwd_inner_microstep: 1167.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2023
[2024-06-11 00:35:05,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.04 | optimizer_step: 6.61
[2024-06-11 00:35:05,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.29 | bwd_microstep: 945.37 | bwd_inner_microstep: 937.18 | bwd_allreduce_microstep: 8.14 | step_microstep: 37.53
[2024-06-11 00:35:05,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15607.90 | bwd: 41574.14 | bwd_inner: 41565.02 | bwd_allreduce: 8.43 | step: 39.07
{'loss': 1.1982, 'learning_rate': 3.979978110612313e-06, 'epoch': 0.8}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-11 00:35:07,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.68 | bwd_microstep: 1383.78 | bwd_inner_microstep: 1383.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3023
[2024-06-11 00:35:09,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.10 | bwd_microstep: 1136.98 | bwd_inner_microstep: 1136.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-11 00:35:10,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.19 | bwd_microstep: 1254.69 | bwd_inner_microstep: 1254.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3868
[2024-06-11 00:35:13,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.68 | bwd_microstep: 1563.43 | bwd_inner_microstep: 1563.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 00:35:15,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.37 | bwd_microstep: 1382.91 | bwd_inner_microstep: 1382.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-11 00:35:16,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.58 | bwd_microstep: 1150.61 | bwd_inner_microstep: 1150.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945
[2024-06-11 00:35:17,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.79 | bwd_microstep: 698.40 | bwd_inner_microstep: 698.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 00:35:19,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1246.62 | bwd_inner_microstep: 1246.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4044
[2024-06-11 00:35:21,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.64 | bwd_microstep: 1421.92 | bwd_inner_microstep: 1421.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-11 00:35:22,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.56 | bwd_microstep: 796.21 | bwd_inner_microstep: 796.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-11 00:35:23,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.63 | bwd_microstep: 793.49 | bwd_inner_microstep: 793.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1913
[2024-06-11 00:35:24,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.63 | bwd_microstep: 748.43 | bwd_inner_microstep: 748.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3655
[2024-06-11 00:35:26,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.46 | bwd_microstep: 1548.00 | bwd_inner_microstep: 1547.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3657
[2024-06-11 00:35:29,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.26 | bwd_microstep: 1768.81 | bwd_inner_microstep: 1768.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3458
[2024-06-11 00:35:31,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.02 | bwd_microstep: 1433.61 | bwd_inner_microstep: 1433.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3831
[2024-06-11 00:35:33,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.89 | bwd_microstep: 1863.45 | bwd_inner_microstep: 1863.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3631
[2024-06-11 00:35:35,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1441.49 | bwd_inner_microstep: 1441.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 551
[2024-06-11 00:35:35,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 97.72 | bwd_microstep: 246.97 | bwd_inner_microstep: 246.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3522
[2024-06-11 00:35:38,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.28 | bwd_microstep: 1523.40 | bwd_inner_microstep: 1523.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3712
[2024-06-11 00:35:39,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.06 | bwd_microstep: 1339.10 | bwd_inner_microstep: 1339.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-11 00:35:41,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1449.72 | bwd_inner_microstep: 1449.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-11 00:35:44,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.31 | bwd_microstep: 1549.92 | bwd_inner_microstep: 1549.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3625
[2024-06-11 00:35:45,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1247.67 | bwd_inner_microstep: 1247.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-11 00:35:47,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1496.79 | bwd_inner_microstep: 1496.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-11 00:35:49,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.12 | bwd_microstep: 1355.57 | bwd_inner_microstep: 1355.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 00:35:51,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.61 | bwd_microstep: 1403.84 | bwd_inner_microstep: 1403.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812
[2024-06-11 00:35:54,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.03 | bwd_microstep: 1726.58 | bwd_inner_microstep: 1726.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2273
[2024-06-11 00:35:55,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.72 | bwd_microstep: 1071.28 | bwd_inner_microstep: 1071.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3734
[2024-06-11 00:35:57,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1468.71 | bwd_inner_microstep: 1468.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2228
[2024-06-11 00:35:58,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.03 | bwd_microstep: 899.57 | bwd_inner_microstep: 899.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-11 00:36:00,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.93 | bwd_microstep: 1503.26 | bwd_inner_microstep: 1503.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 00:36:07,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-11 00:36:07,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.33 | bwd_microstep: 6496.40 | bwd_inner_microstep: 1531.88 | bwd_allreduce_microstep: 4964.48 | step_microstep: 37.71
[2024-06-11 00:36:07,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15430.87 | bwd: 46411.64 | bwd_inner: 41446.26 | bwd_allreduce: 4964.70 | step: 39.16
{'loss': 1.1896, 'learning_rate': 3.957536144899123e-06, 'epoch': 0.8}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-11 00:36:09,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.22 | bwd_microstep: 1334.45 | bwd_inner_microstep: 1334.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-11 00:36:11,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.76 | bwd_microstep: 1276.79 | bwd_inner_microstep: 1276.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289
[2024-06-11 00:36:12,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.45 | bwd_microstep: 873.08 | bwd_inner_microstep: 873.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-11 00:36:15,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.95 | bwd_microstep: 1649.64 | bwd_inner_microstep: 1649.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 00:36:16,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1376.16 | bwd_inner_microstep: 1376.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4059
[2024-06-11 00:36:19,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.15 | bwd_microstep: 1519.62 | bwd_inner_microstep: 1519.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-11 00:36:20,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1249.71 | bwd_inner_microstep: 1249.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2536
[2024-06-11 00:36:21,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.55 | bwd_microstep: 840.97 | bwd_inner_microstep: 840.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484
[2024-06-11 00:36:23,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1411.75 | bwd_inner_microstep: 1411.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3755
[2024-06-11 00:36:26,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.24 | bwd_microstep: 1620.27 | bwd_inner_microstep: 1620.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1853
[2024-06-11 00:36:27,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.19 | bwd_microstep: 735.74 | bwd_inner_microstep: 735.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-11 00:36:29,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 1480.61 | bwd_inner_microstep: 1480.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3662
[2024-06-11 00:36:31,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.09 | bwd_microstep: 1717.46 | bwd_inner_microstep: 1717.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469
[2024-06-11 00:36:33,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.27 | bwd_microstep: 1440.24 | bwd_inner_microstep: 1440.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 00:36:35,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.09 | bwd_microstep: 1381.70 | bwd_inner_microstep: 1381.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909
[2024-06-11 00:36:36,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.66 | bwd_microstep: 687.01 | bwd_inner_microstep: 686.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 00:36:38,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.73 | bwd_microstep: 1287.55 | bwd_inner_microstep: 1287.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-11 00:36:40,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.61 | bwd_microstep: 1656.99 | bwd_inner_microstep: 1656.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-11 00:36:42,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.49 | bwd_microstep: 1416.59 | bwd_inner_microstep: 1416.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-11 00:36:44,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 637
[2024-06-11 00:36:44,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.91 | bwd_microstep: 264.87 | bwd_inner_microstep: 264.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-11 00:36:46,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.55 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-11 00:36:48,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.37 | bwd_microstep: 1459.72 | bwd_inner_microstep: 1459.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856
[2024-06-11 00:36:50,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.29 | bwd_microstep: 1661.34 | bwd_inner_microstep: 1661.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3692
[2024-06-11 00:36:52,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.37 | bwd_microstep: 1620.67 | bwd_inner_microstep: 1620.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-11 00:36:55,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.34 | bwd_microstep: 1509.24 | bwd_inner_microstep: 1509.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-11 00:36:57,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.75 | bwd_microstep: 1512.08 | bwd_inner_microstep: 1512.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-11 00:36:59,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.84 | bwd_microstep: 1490.09 | bwd_inner_microstep: 1490.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 00:37:01,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1494.59 | bwd_inner_microstep: 1494.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-11 00:37:03,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.05 | bwd_microstep: 1342.27 | bwd_inner_microstep: 1342.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3382
[2024-06-11 00:37:04,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.40 | bwd_microstep: 1272.73 | bwd_inner_microstep: 1272.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-11 00:37:08,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.03 | optimizer_step: 6.62
[2024-06-11 00:37:08,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.13 | bwd_microstep: 2812.30 | bwd_inner_microstep: 1631.73 | bwd_allreduce_microstep: 1180.52 | step_microstep: 37.45
[2024-06-11 00:37:08,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15957.09 | bwd: 44070.63 | bwd_inner: 42889.21 | bwd_allreduce: 1180.74 | step: 38.92
{'loss': 1.198, 'learning_rate': 3.9351506806602425e-06, 'epoch': 0.8}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3449
[2024-06-11 00:37:10,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.93 | bwd_microstep: 1548.52 | bwd_inner_microstep: 1548.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3838
[2024-06-11 00:37:12,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.45 | bwd_microstep: 1511.28 | bwd_inner_microstep: 1511.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3854
[2024-06-11 00:37:14,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.08 | bwd_microstep: 1458.23 | bwd_inner_microstep: 1458.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3857
[2024-06-11 00:37:16,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.03 | bwd_microstep: 1660.08 | bwd_inner_microstep: 1660.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 00:37:18,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.11 | bwd_microstep: 1279.41 | bwd_inner_microstep: 1279.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1953
[2024-06-11 00:37:19,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.49 | bwd_microstep: 699.06 | bwd_inner_microstep: 699.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-11 00:37:21,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.13 | bwd_microstep: 1531.99 | bwd_inner_microstep: 1531.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 00:37:23,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1282.42 | bwd_inner_microstep: 1282.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-11 00:37:25,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.98 | bwd_microstep: 1532.23 | bwd_inner_microstep: 1532.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881
[2024-06-11 00:37:26,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.04 | bwd_microstep: 710.58 | bwd_inner_microstep: 710.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2155
[2024-06-11 00:37:27,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.49 | bwd_microstep: 882.33 | bwd_inner_microstep: 882.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3669
[2024-06-11 00:37:30,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.24 | bwd_microstep: 1771.00 | bwd_inner_microstep: 1770.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2836
[2024-06-11 00:37:31,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.11 | bwd_microstep: 1259.97 | bwd_inner_microstep: 1259.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-11 00:37:33,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.49 | bwd_microstep: 1450.67 | bwd_inner_microstep: 1450.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3515
[2024-06-11 00:37:35,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.02 | bwd_microstep: 1514.40 | bwd_inner_microstep: 1514.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653
[2024-06-11 00:37:37,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.66 | bwd_microstep: 1414.84 | bwd_inner_microstep: 1414.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 00:37:40,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1489.81 | bwd_inner_microstep: 1489.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2086
[2024-06-11 00:37:41,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.36 | bwd_microstep: 917.66 | bwd_inner_microstep: 917.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-11 00:37:43,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.16 | bwd_microstep: 1301.52 | bwd_inner_microstep: 1301.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2088
[2024-06-11 00:37:44,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.73 | bwd_microstep: 728.23 | bwd_inner_microstep: 728.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-11 00:37:46,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1397.72 | bwd_inner_microstep: 1397.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-11 00:37:47,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.61 | bwd_microstep: 1387.35 | bwd_inner_microstep: 1387.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-11 00:37:50,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.03 | bwd_microstep: 1551.50 | bwd_inner_microstep: 1551.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812
[2024-06-11 00:37:52,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.97 | bwd_microstep: 1723.00 | bwd_inner_microstep: 1722.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-11 00:37:54,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.10 | bwd_microstep: 1256.24 | bwd_inner_microstep: 1256.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-11 00:37:56,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.98 | bwd_microstep: 1448.32 | bwd_inner_microstep: 1448.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3833
[2024-06-11 00:37:58,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.05 | bwd_microstep: 1729.23 | bwd_inner_microstep: 1729.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468
[2024-06-11 00:38:00,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.86 | bwd_microstep: 1572.96 | bwd_inner_microstep: 1572.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-11 00:38:02,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.17 | bwd_microstep: 1491.77 | bwd_inner_microstep: 1491.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3564
[2024-06-11 00:38:04,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.12 | bwd_microstep: 1206.69 | bwd_inner_microstep: 1206.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-11 00:38:06,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.12 | bwd_microstep: 1649.37 | bwd_inner_microstep: 1649.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3765
[2024-06-11 00:38:09,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.03 | optimizer_step: 6.63
[2024-06-11 00:38:09,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.16 | bwd_microstep: 2446.98 | bwd_inner_microstep: 1549.37 | bwd_allreduce_microstep: 897.56 | step_microstep: 37.65
[2024-06-11 00:38:09,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16342.96 | bwd: 44805.35 | bwd_inner: 43906.90 | bwd_allreduce: 897.78 | step: 39.10
 80%|████████  | 1382/1726 [23:55:15<7:09:03, 74.84s/it]
 80%|████████  | 1383/1726 [23:56:37<7:20:41, 77.09s/it]
 80%|████████  | 1384/1726 [23:57:42<6:58:20, 73.39s/it]
 80%|████████  | 1385/1726 [23:58:44<6:37:59, 70.03s/it]
 80%|████████  | 1386/1726 [23:59:45<6:20:23, 67.13s/it]
 80%|████████  | 1387/1726 [24:00:46
{'loss': 1.1901, 'learning_rate': 3.9128217967371515e-06, 'epoch': 0.8}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3454
[2024-06-11 00:38:11,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.84 | bwd_microstep: 1547.33 | bwd_inner_microstep: 1547.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-11 00:38:13,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1481.12 | bwd_inner_microstep: 1481.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871
[2024-06-11 00:38:16,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.00 | bwd_microstep: 1664.75 | bwd_inner_microstep: 1664.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-11 00:38:17,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.84 | bwd_microstep: 1275.73 | bwd_inner_microstep: 1275.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799
[2024-06-11 00:38:19,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.75 | bwd_microstep: 1446.08 | bwd_inner_microstep: 1446.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2047
[2024-06-11 00:38:21,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.44 | bwd_microstep: 777.34 | bwd_inner_microstep: 777.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-11 00:38:23,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.13 | bwd_microstep: 1442.24 | bwd_inner_microstep: 1442.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-11 00:38:25,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.48 | bwd_microstep: 1630.09 | bwd_inner_microstep: 1630.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2649
[2024-06-11 00:38:26,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.53 | bwd_microstep: 922.82 | bwd_inner_microstep: 922.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2218
[2024-06-11 00:38:27,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.52 | bwd_microstep: 876.75 | bwd_inner_microstep: 876.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-11 00:38:29,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1393.81 | bwd_inner_microstep: 1393.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-11 00:38:30,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.30 | bwd_microstep: 795.10 | bwd_inner_microstep: 795.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882
[2024-06-11 00:38:31,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.65 | bwd_microstep: 681.03 | bwd_inner_microstep: 681.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919
[2024-06-11 00:38:32,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.44 | bwd_microstep: 717.18 | bwd_inner_microstep: 717.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3452
[2024-06-11 00:38:34,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.13 | bwd_microstep: 1202.49 | bwd_inner_microstep: 1202.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3682
[2024-06-11 00:38:36,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.65 | bwd_microstep: 1666.86 | bwd_inner_microstep: 1666.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 00:38:38,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.32 | bwd_microstep: 1380.10 | bwd_inner_microstep: 1380.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500
[2024-06-11 00:38:40,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.56 | bwd_microstep: 1580.49 | bwd_inner_microstep: 1580.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 00:38:42,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1378.36 | bwd_inner_microstep: 1378.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2074
[2024-06-11 00:38:44,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.61 | bwd_microstep: 1013.38 | bwd_inner_microstep: 1013.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3542
[2024-06-11 00:38:46,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.74 | bwd_microstep: 1618.35 | bwd_inner_microstep: 1618.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-11 00:38:47,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.43 | bwd_microstep: 975.98 | bwd_inner_microstep: 975.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 00:38:49,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.29 | bwd_microstep: 1399.46 | bwd_inner_microstep: 1399.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-11 00:38:51,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1492.18 | bwd_inner_microstep: 1492.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3832
[2024-06-11 00:38:54,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.85 | bwd_microstep: 1749.87 | bwd_inner_microstep: 1749.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3389
[2024-06-11 00:38:55,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.48 | bwd_microstep: 1300.00 | bwd_inner_microstep: 1299.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269
[2024-06-11 00:38:57,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.23 | bwd_microstep: 972.41 | bwd_inner_microstep: 972.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-11 00:38:58,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.19 | bwd_microstep: 1185.40 | bwd_inner_microstep: 1185.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-11 00:39:00,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.96 | bwd_microstep: 1507.63 | bwd_inner_microstep: 1507.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3827
[2024-06-11 00:39:03,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.42 | bwd_microstep: 1585.30 | bwd_inner_microstep: 1585.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589
[2024-06-11 00:39:05,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.36 | bwd_microstep: 1438.52 | bwd_inner_microstep: 1438.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443
[2024-06-11 00:39:09,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 00:39:09,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.74 | bwd_microstep: 4226.70 | bwd_inner_microstep: 1309.09 | bwd_allreduce_microstep: 2917.56 | step_microstep: 37.85
[2024-06-11 00:39:09,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15417.33 | bwd: 44324.88 | bwd_inner: 41406.41 | bwd_allreduce: 2917.79 | step: 39.30
{'loss': 1.2146, 'learning_rate': 3.890549571772062e-06, 'epoch': 0.8}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-11 00:39:11,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1462.75 | bwd_inner_microstep: 1462.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3405
[2024-06-11 00:39:13,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.42 | bwd_microstep: 1369.30 | bwd_inner_microstep: 1369.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-11 00:39:15,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1340.63 | bwd_inner_microstep: 1340.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-11 00:39:17,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.82 | bwd_microstep: 1284.21 | bwd_inner_microstep: 1284.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 00:39:19,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1275.46 | bwd_inner_microstep: 1275.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2074
[2024-06-11 00:39:20,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.73 | bwd_microstep: 725.64 | bwd_inner_microstep: 725.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-11 00:39:21,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.68 | bwd_microstep: 798.69 | bwd_inner_microstep: 798.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-11 00:39:23,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1245.71 | bwd_inner_microstep: 1245.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-11 00:39:25,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.53 | bwd_microstep: 1521.70 | bwd_inner_microstep: 1521.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-11 00:39:27,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.14 | bwd_microstep: 1514.63 | bwd_inner_microstep: 1514.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3694
[2024-06-11 00:39:29,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.28 | bwd_microstep: 1793.51 | bwd_inner_microstep: 1793.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3472
[2024-06-11 00:39:31,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.76 | bwd_microstep: 1438.74 | bwd_inner_microstep: 1438.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-11 00:39:33,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.04 | bwd_microstep: 1447.71 | bwd_inner_microstep: 1447.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-11 00:39:35,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.24 | bwd_microstep: 1319.34 | bwd_inner_microstep: 1319.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-11 00:39:37,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.93 | bwd_microstep: 1286.56 | bwd_inner_microstep: 1286.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2013
[2024-06-11 00:39:38,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.52 | bwd_microstep: 834.95 | bwd_inner_microstep: 834.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3467
[2024-06-11 00:39:40,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.06 | bwd_microstep: 1347.35 | bwd_inner_microstep: 1347.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925
[2024-06-11 00:39:41,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.75 | bwd_microstep: 696.13 | bwd_inner_microstep: 696.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-11 00:39:43,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.53 | bwd_microstep: 1456.78 | bwd_inner_microstep: 1456.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540
[2024-06-11 00:39:44,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.08 | bwd_microstep: 1197.63 | bwd_inner_microstep: 1197.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615
[2024-06-11 00:39:46,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.36 | bwd_microstep: 1441.36 | bwd_inner_microstep: 1441.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-11 00:39:48,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1395.42 | bwd_inner_microstep: 1395.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945
[2024-06-11 00:39:49,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.81 | bwd_microstep: 698.08 | bwd_inner_microstep: 698.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809
[2024-06-11 00:39:51,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1475.88 | bwd_inner_microstep: 1475.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 00:39:54,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.68 | bwd_microstep: 1654.65 | bwd_inner_microstep: 1654.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3823
[2024-06-11 00:39:56,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.54 | bwd_microstep: 1751.57 | bwd_inner_microstep: 1751.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-11 00:39:58,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1472.02 | bwd_inner_microstep: 1471.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-11 00:40:00,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.04 | bwd_microstep: 1757.60 | bwd_inner_microstep: 1757.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037
[2024-06-11 00:40:02,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.26 | bwd_microstep: 810.97 | bwd_inner_microstep: 810.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2670
[2024-06-11 00:40:03,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.19 | bwd_microstep: 1120.31 | bwd_inner_microstep: 1120.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3582
[2024-06-11 00:40:05,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1528.12 | bwd_inner_microstep: 1528.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3471
[2024-06-11 00:40:10,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.10 | optimizer_step: 6.62
[2024-06-11 00:40:10,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 4117.04 | bwd_inner_microstep: 1499.11 | bwd_allreduce_microstep: 2617.88 | step_microstep: 37.80
[2024-06-11 00:40:10,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15659.81 | bwd: 44580.46 | bwd_inner: 41961.67 | bwd_allreduce: 2618.12 | step: 39.25
{'loss': 1.1688, 'learning_rate': 3.868334084207637e-06, 'epoch': 0.8}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 00:40:12,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.01 | bwd_microstep: 1334.46 | bwd_inner_microstep: 1334.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474
[2024-06-11 00:40:13,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.56 | bwd_microstep: 1238.78 | bwd_inner_microstep: 1238.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3885
[2024-06-11 00:40:16,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.93 | bwd_microstep: 1578.13 | bwd_inner_microstep: 1578.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-11 00:40:18,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.46 | bwd_microstep: 1450.86 | bwd_inner_microstep: 1450.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4085
[2024-06-11 00:40:20,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.15 | bwd_microstep: 1626.90 | bwd_inner_microstep: 1626.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2407
[2024-06-11 00:40:21,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.94 | bwd_microstep: 936.40 | bwd_inner_microstep: 936.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4144
[2024-06-11 00:40:23,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.39 | bwd_microstep: 1441.50 | bwd_inner_microstep: 1441.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 00:40:25,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1247.51 | bwd_inner_microstep: 1247.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995
[2024-06-11 00:40:26,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 800.25 | bwd_inner_microstep: 800.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1953
[2024-06-11 00:40:27,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.00 | bwd_microstep: 729.07 | bwd_inner_microstep: 729.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3409
[2024-06-11 00:40:29,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.79 | bwd_microstep: 1296.00 | bwd_inner_microstep: 1295.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-11 00:40:31,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1485.65 | bwd_inner_microstep: 1485.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2109
[2024-06-11 00:40:32,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.45 | bwd_microstep: 900.53 | bwd_inner_microstep: 900.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 00:40:34,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1248.11 | bwd_inner_microstep: 1248.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665
[2024-06-11 00:40:36,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.16 | bwd_microstep: 1722.43 | bwd_inner_microstep: 1722.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3826
[2024-06-11 00:40:39,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.49 | bwd_microstep: 1754.85 | bwd_inner_microstep: 1754.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2560
[2024-06-11 00:40:40,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.09 | bwd_microstep: 1094.34 | bwd_inner_microstep: 1094.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533
[2024-06-11 00:40:42,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.41 | bwd_microstep: 1323.71 | bwd_inner_microstep: 1323.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2146
[2024-06-11 00:40:43,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.87 | bwd_microstep: 851.95 | bwd_inner_microstep: 851.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-11 00:40:45,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.25 | bwd_microstep: 1398.31 | bwd_inner_microstep: 1398.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655
[2024-06-11 00:40:47,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.15 | bwd_microstep: 1423.52 | bwd_inner_microstep: 1423.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 00:40:49,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.39 | bwd_microstep: 1656.78 | bwd_inner_microstep: 1656.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 00:40:51,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.08 | bwd_microstep: 1554.76 | bwd_inner_microstep: 1554.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820
[2024-06-11 00:40:54,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.55 | bwd_microstep: 1486.51 | bwd_inner_microstep: 1486.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291
[2024-06-11 00:40:55,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.55 | bwd_microstep: 880.51 | bwd_inner_microstep: 880.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-11 00:40:56,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.86 | bwd_microstep: 806.58 | bwd_inner_microstep: 806.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562
[2024-06-11 00:40:58,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.29 | bwd_microstep: 1428.51 | bwd_inner_microstep: 1428.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3511
[2024-06-11 00:41:00,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.34 | bwd_microstep: 1227.28 | bwd_inner_microstep: 1227.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-11 00:41:01,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.02 | bwd_microstep: 1404.95 | bwd_inner_microstep: 1404.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3815
[2024-06-11 00:41:04,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1495.70 | bwd_inner_microstep: 1495.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 00:41:06,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.33 | bwd_microstep: 1497.50 | bwd_inner_microstep: 1497.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-11 00:41:11,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.07 | optimizer_step: 6.62
[2024-06-11 00:41:11,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.26 | bwd_microstep: 5159.35 | bwd_inner_microstep: 1628.50 | bwd_allreduce_microstep: 3530.80 | step_microstep: 37.91
[2024-06-11 00:41:11,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15640.97 | bwd: 45481.70 | bwd_inner: 41950.01 | bwd_allreduce: 3531.03 | step: 39.38
{'loss': 1.1375, 'learning_rate': 3.846175412286701e-06, 'epoch': 0.81}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492
[2024-06-11 00:41:13,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.64 | bwd_microstep: 1180.08 | bwd_inner_microstep: 1180.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 00:41:15,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.95 | bwd_microstep: 1474.07 | bwd_inner_microstep: 1474.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 00:41:17,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.06 | bwd_microstep: 1292.15 | bwd_inner_microstep: 1292.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 00:41:19,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1243.37 | bwd_inner_microstep: 1243.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-11 00:41:20,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.79 | bwd_microstep: 1284.79 | bwd_inner_microstep: 1284.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 00:41:22,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1380.17 | bwd_inner_microstep: 1380.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 00:41:24,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.58 | bwd_microstep: 1251.66 | bwd_inner_microstep: 1251.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3486
[2024-06-11 00:41:26,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.67 | bwd_microstep: 1412.21 | bwd_inner_microstep: 1412.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-11 00:41:28,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.34 | bwd_microstep: 1279.79 | bwd_inner_microstep: 1279.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3487
[2024-06-11 00:41:30,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.59 | bwd_microstep: 1443.06 | bwd_inner_microstep: 1443.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511
[2024-06-11 00:41:32,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.23 | bwd_microstep: 1442.96 | bwd_inner_microstep: 1442.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-11 00:41:34,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.78 | bwd_microstep: 1508.98 | bwd_inner_microstep: 1508.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3526
[2024-06-11 00:41:36,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.81 | bwd_microstep: 1541.18 | bwd_inner_microstep: 1541.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-11 00:41:38,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1397.94 | bwd_inner_microstep: 1397.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2088
[2024-06-11 00:41:39,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.88 | bwd_microstep: 917.91 | bwd_inner_microstep: 917.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 00:41:41,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1282.84 | bwd_inner_microstep: 1282.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-11 00:41:43,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.59 | bwd_microstep: 1186.82 | bwd_inner_microstep: 1186.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 00:41:45,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.01 | bwd_microstep: 1657.88 | bwd_inner_microstep: 1657.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 00:41:47,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1396.92 | bwd_inner_microstep: 1396.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653
[2024-06-11 00:41:49,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.41 | bwd_microstep: 1420.35 | bwd_inner_microstep: 1420.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-11 00:41:51,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.35 | bwd_microstep: 1341.64 | bwd_inner_microstep: 1341.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-11 00:41:53,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.09 | bwd_microstep: 1549.09 | bwd_inner_microstep: 1549.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-11 00:41:55,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1401.19 | bwd_inner_microstep: 1401.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-11 00:41:57,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.38 | bwd_microstep: 1449.80 | bwd_inner_microstep: 1449.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3703
[2024-06-11 00:41:59,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.57 | bwd_microstep: 1364.09 | bwd_inner_microstep: 1364.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1976
[2024-06-11 00:42:00,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.37 | bwd_microstep: 735.42 | bwd_inner_microstep: 735.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 00:42:02,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.60 | bwd_microstep: 1648.02 | bwd_inner_microstep: 1647.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-11 00:42:04,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.68 | bwd_microstep: 1750.50 | bwd_inner_microstep: 1750.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-11 00:42:06,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.79 | bwd_microstep: 1487.23 | bwd_inner_microstep: 1487.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2553
[2024-06-11 00:42:08,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.57 | bwd_microstep: 965.78 | bwd_inner_microstep: 965.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-11 00:42:10,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1439.99 | bwd_inner_microstep: 1439.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-11 00:42:13,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-11 00:42:13,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.44 | bwd_microstep: 2909.44 | bwd_inner_microstep: 1527.62 | bwd_allreduce_microstep: 1381.77 | step_microstep: 37.80
[2024-06-11 00:42:13,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16316.21 | bwd: 45037.34 | bwd_inner: 43654.67 | bwd_allreduce: 1381.99 | step: 39.21
{'loss': 1.1604, 'learning_rate': 3.824073634051993e-06, 'epoch': 0.81}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419
[2024-06-11 00:42:15,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.92 | bwd_microstep: 1272.89 | bwd_inner_microstep: 1272.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4444
[2024-06-11 00:42:17,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.21 | bwd_microstep: 1722.37 | bwd_inner_microstep: 1722.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3869
[2024-06-11 00:42:19,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.19 | bwd_microstep: 1459.95 | bwd_inner_microstep: 1459.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3794
[2024-06-11 00:42:21,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.61 | bwd_microstep: 1347.67 | bwd_inner_microstep: 1347.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3850
[2024-06-11 00:42:23,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.76 | bwd_microstep: 1660.28 | bwd_inner_microstep: 1660.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747
[2024-06-11 00:42:25,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1432.25 | bwd_inner_microstep: 1432.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-11 00:42:27,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.71 | bwd_microstep: 1345.48 | bwd_inner_microstep: 1345.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-11 00:42:29,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1388.81 | bwd_inner_microstep: 1388.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3414
[2024-06-11 00:42:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.82 | bwd_microstep: 1210.63 | bwd_inner_microstep: 1210.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-11 00:42:33,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.40 | bwd_microstep: 1298.79 | bwd_inner_microstep: 1298.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2090
[2024-06-11 00:42:34,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.67 | bwd_microstep: 821.51 | bwd_inner_microstep: 821.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3755
[2024-06-11 00:42:36,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.71 | bwd_microstep: 1566.95 | bwd_inner_microstep: 1566.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3513
[2024-06-11 00:42:38,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1418.13 | bwd_inner_microstep: 1418.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3664
[2024-06-11 00:42:40,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.70 | bwd_microstep: 1322.93 | bwd_inner_microstep: 1322.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3379
[2024-06-11 00:42:42,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.98 | bwd_microstep: 1362.24 | bwd_inner_microstep: 1362.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-11 00:42:44,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.53 | bwd_microstep: 1427.95 | bwd_inner_microstep: 1427.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-11 00:42:45,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.00 | bwd_microstep: 1406.55 | bwd_inner_microstep: 1406.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 00:42:47,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1354.49 | bwd_inner_microstep: 1354.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-11 00:42:49,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.31 | bwd_microstep: 1183.73 | bwd_inner_microstep: 1183.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-11 00:42:51,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.10 | bwd_microstep: 1536.72 | bwd_inner_microstep: 1536.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-11 00:42:53,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.08 | bwd_microstep: 1415.09 | bwd_inner_microstep: 1415.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3497
[2024-06-11 00:42:55,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.29 | bwd_microstep: 1254.11 | bwd_inner_microstep: 1253.64 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2084
[2024-06-11 00:42:56,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.00 | bwd_microstep: 822.28 | bwd_inner_microstep: 822.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 00:42:58,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1387.36 | bwd_inner_microstep: 1387.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 00:43:00,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.88 | bwd_microstep: 1296.40 | bwd_inner_microstep: 1296.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3428
[2024-06-11 00:43:02,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.95 | bwd_microstep: 1395.88 | bwd_inner_microstep: 1395.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808
[2024-06-11 00:43:04,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.14 | bwd_microstep: 1581.87 | bwd_inner_microstep: 1581.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-11 00:43:06,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.66 | bwd_microstep: 1514.77 | bwd_inner_microstep: 1514.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-11 00:43:08,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.58 | bwd_microstep: 1592.87 | bwd_inner_microstep: 1592.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3417
[2024-06-11 00:43:10,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.61 | bwd_microstep: 1538.79 | bwd_inner_microstep: 1538.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-11 00:43:12,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.26 | bwd_microstep: 1277.90 | bwd_inner_microstep: 1277.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3573
[2024-06-11 00:43:14,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.07 | optimizer_step: 6.62
[2024-06-11 00:43:14,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.57 | bwd_microstep: 1700.46 | bwd_inner_microstep: 1692.77 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.27
[2024-06-11 00:43:14,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16588.73 | bwd: 44318.14 | bwd_inner: 44309.50 | bwd_allreduce: 7.91 | step: 38.77
 80%|████████  | 1387/1726 [24:00:46<6:09:42, 65.44s/it]
 80%|████████  | 1388/1726 [24:01:46<5:59:34, 63.83s/it]
 80%|████████  | 1389/1726 [24:02:47<5:53:00, 62.85s/it]
 81%|████████  | 1390/1726 [24:03:48<5:49:37, 62.43s/it]
 81%|████████  | 1391/1726 [24:04:50<5:47:19, 62.21s/it]
 81%|████████  | 1392/1726 [24:05:51<5:4
{'loss': 1.1747, 'learning_rate': 3.8020288273458493e-06, 'epoch': 0.81}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-11 00:43:16,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.49 | bwd_microstep: 1471.71 | bwd_inner_microstep: 1471.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3999
[2024-06-11 00:43:19,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.13 | bwd_microstep: 1710.09 | bwd_inner_microstep: 1710.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2368
[2024-06-11 00:43:20,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.86 | bwd_microstep: 924.86 | bwd_inner_microstep: 924.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-11 00:43:22,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.51 | bwd_microstep: 1311.86 | bwd_inner_microstep: 1311.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044
[2024-06-11 00:43:23,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.81 | bwd_microstep: 809.09 | bwd_inner_microstep: 809.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 00:43:25,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.04 | bwd_microstep: 1389.45 | bwd_inner_microstep: 1389.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 00:43:27,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.71 | bwd_microstep: 1282.16 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 00:43:28,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.23 | bwd_microstep: 1342.48 | bwd_inner_microstep: 1342.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-11 00:43:30,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.11 | bwd_microstep: 796.30 | bwd_inner_microstep: 796.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3416
[2024-06-11 00:43:31,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.13 | bwd_microstep: 1186.19 | bwd_inner_microstep: 1186.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2476
[2024-06-11 00:43:33,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.23 | bwd_microstep: 958.56 | bwd_inner_microstep: 958.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-11 00:43:35,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.45 | bwd_microstep: 1520.55 | bwd_inner_microstep: 1520.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-11 00:43:36,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 1351.45 | bwd_inner_microstep: 1351.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3396
[2024-06-11 00:43:38,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.67 | bwd_microstep: 1277.75 | bwd_inner_microstep: 1277.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3513
[2024-06-11 00:43:40,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 1410.09 | bwd_inner_microstep: 1410.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 00:43:42,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1380.88 | bwd_inner_microstep: 1380.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3910
[2024-06-11 00:43:45,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.40 | bwd_microstep: 1763.08 | bwd_inner_microstep: 1763.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 00:43:46,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.34 | bwd_microstep: 1402.51 | bwd_inner_microstep: 1402.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-11 00:43:48,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.48 | bwd_microstep: 1398.36 | bwd_inner_microstep: 1398.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3644
[2024-06-11 00:43:50,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.64 | bwd_microstep: 1316.96 | bwd_inner_microstep: 1316.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 664
[2024-06-11 00:43:51,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.59 | bwd_microstep: 280.39 | bwd_inner_microstep: 280.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3834
[2024-06-11 00:43:53,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1494.87 | bwd_inner_microstep: 1494.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-11 00:43:55,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.09 | bwd_microstep: 1627.30 | bwd_inner_microstep: 1627.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3725
[2024-06-11 00:43:57,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.78 | bwd_microstep: 1483.87 | bwd_inner_microstep: 1483.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2434
[2024-06-11 00:43:58,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.17 | bwd_microstep: 1009.72 | bwd_inner_microstep: 1009.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-11 00:44:00,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1253.49 | bwd_inner_microstep: 1253.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2406
[2024-06-11 00:44:02,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.40 | bwd_microstep: 1032.39 | bwd_inner_microstep: 1032.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808
[2024-06-11 00:44:04,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.57 | bwd_microstep: 1581.76 | bwd_inner_microstep: 1581.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2214
[2024-06-11 00:44:05,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.17 | bwd_microstep: 833.80 | bwd_inner_microstep: 833.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076
[2024-06-11 00:44:06,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.22 | bwd_microstep: 916.75 | bwd_inner_microstep: 916.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2068
[2024-06-11 00:44:07,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.97 | bwd_microstep: 754.85 | bwd_inner_microstep: 754.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-11 00:44:14,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-11 00:44:14,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.13 | bwd_microstep: 6887.15 | bwd_inner_microstep: 930.86 | bwd_allreduce_microstep: 5956.24 | step_microstep: 37.72
[2024-06-11 00:44:14,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14657.98 | bwd: 45160.76 | bwd_inner: 39203.59 | bwd_allreduce: 5956.47 | step: 39.32
{'loss': 1.1459, 'learning_rate': 3.7800410698099808e-06, 'epoch': 0.81}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-11 00:44:16,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1441.08 | bwd_inner_microstep: 1440.92 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-11 00:44:18,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.96 | bwd_microstep: 1242.04 | bwd_inner_microstep: 1242.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2447
[2024-06-11 00:44:19,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.20 | bwd_microstep: 941.91 | bwd_inner_microstep: 941.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 00:44:21,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.92 | bwd_microstep: 1371.26 | bwd_inner_microstep: 1371.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-11 00:44:23,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.54 | bwd_microstep: 1549.12 | bwd_inner_microstep: 1549.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-11 00:44:25,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.01 | bwd_microstep: 803.26 | bwd_inner_microstep: 803.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-11 00:44:27,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.00 | bwd_microstep: 1538.59 | bwd_inner_microstep: 1538.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 00:44:29,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.62 | bwd_microstep: 1386.57 | bwd_inner_microstep: 1386.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 00:44:31,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.03 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-11 00:44:32,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.44 | bwd_microstep: 1189.73 | bwd_inner_microstep: 1189.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 00:44:34,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.97 | bwd_microstep: 1286.27 | bwd_inner_microstep: 1286.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513
[2024-06-11 00:44:36,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.70 | bwd_microstep: 1318.58 | bwd_inner_microstep: 1318.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 00:44:38,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.71 | bwd_microstep: 1380.50 | bwd_inner_microstep: 1380.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436
[2024-06-11 00:44:40,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.48 | bwd_microstep: 1377.64 | bwd_inner_microstep: 1377.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3652
[2024-06-11 00:44:42,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.78 | bwd_microstep: 1818.72 | bwd_inner_microstep: 1818.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-11 00:44:44,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.86 | bwd_microstep: 1556.07 | bwd_inner_microstep: 1556.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 00:44:46,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.81 | bwd_microstep: 1281.88 | bwd_inner_microstep: 1281.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3839
[2024-06-11 00:44:48,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.90 | bwd_microstep: 1361.15 | bwd_inner_microstep: 1361.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3831
[2024-06-11 00:44:50,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.51 | bwd_microstep: 1625.80 | bwd_inner_microstep: 1625.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-11 00:44:51,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.86 | bwd_microstep: 696.51 | bwd_inner_microstep: 696.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-11 00:44:53,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1415.22 | bwd_inner_microstep: 1415.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3528
[2024-06-11 00:44:55,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.76 | bwd_microstep: 1359.87 | bwd_inner_microstep: 1359.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468
[2024-06-11 00:44:57,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.40 | bwd_microstep: 1313.29 | bwd_inner_microstep: 1313.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3432
[2024-06-11 00:44:59,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.12 | bwd_microstep: 1312.08 | bwd_inner_microstep: 1312.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-11 00:45:00,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.13 | bwd_microstep: 1257.22 | bwd_inner_microstep: 1257.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-11 00:45:02,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.26 | bwd_microstep: 1454.84 | bwd_inner_microstep: 1454.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-11 00:45:05,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.28 | bwd_microstep: 1601.54 | bwd_inner_microstep: 1601.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 00:45:07,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.18 | bwd_microstep: 1549.07 | bwd_inner_microstep: 1549.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-11 00:45:09,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.71 | bwd_microstep: 1492.78 | bwd_inner_microstep: 1492.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-11 00:45:11,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.41 | bwd_microstep: 1401.36 | bwd_inner_microstep: 1401.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3537
[2024-06-11 00:45:13,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.70 | bwd_microstep: 1662.01 | bwd_inner_microstep: 1661.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3594
[2024-06-11 00:45:16,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.04 | optimizer_step: 6.64
[2024-06-11 00:45:16,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.54 | bwd_microstep: 1878.38 | bwd_inner_microstep: 1809.35 | bwd_allreduce_microstep: 68.98 | step_microstep: 37.57
[2024-06-11 00:45:16,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16500.09 | bwd: 44245.36 | bwd_inner: 44175.35 | bwd_allreduce: 69.27 | step: 39.11
{'loss': 1.164, 'learning_rate': 3.7581104388851363e-06, 'epoch': 0.81}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-11 00:45:17,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.57 | bwd_microstep: 1337.14 | bwd_inner_microstep: 1337.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2973
[2024-06-11 00:45:19,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.65 | bwd_microstep: 1203.20 | bwd_inner_microstep: 1203.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3932
[2024-06-11 00:45:21,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.65 | bwd_microstep: 1624.46 | bwd_inner_microstep: 1624.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3783
[2024-06-11 00:45:23,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1381.21 | bwd_inner_microstep: 1381.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-11 00:45:25,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.85 | bwd_microstep: 1152.14 | bwd_inner_microstep: 1152.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-11 00:45:27,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.02 | bwd_microstep: 1279.12 | bwd_inner_microstep: 1279.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 00:45:28,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.58 | bwd_microstep: 1285.16 | bwd_inner_microstep: 1285.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 00:45:30,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.81 | bwd_microstep: 1387.30 | bwd_inner_microstep: 1387.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-11 00:45:31,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.93 | bwd_microstep: 798.60 | bwd_inner_microstep: 798.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3637
[2024-06-11 00:45:33,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.19 | bwd_microstep: 1434.12 | bwd_inner_microstep: 1434.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2745
[2024-06-11 00:45:35,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.06 | bwd_microstep: 1171.99 | bwd_inner_microstep: 1171.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 00:45:37,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.96 | bwd_microstep: 1383.42 | bwd_inner_microstep: 1383.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3661
[2024-06-11 00:45:39,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.98 | bwd_microstep: 1716.38 | bwd_inner_microstep: 1716.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-11 00:45:41,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.43 | bwd_microstep: 1584.43 | bwd_inner_microstep: 1584.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-11 00:45:43,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.79 | bwd_microstep: 1291.58 | bwd_inner_microstep: 1291.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2019
[2024-06-11 00:45:44,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.74 | bwd_microstep: 714.64 | bwd_inner_microstep: 714.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686
[2024-06-11 00:45:46,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.25 | bwd_microstep: 1528.09 | bwd_inner_microstep: 1528.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-11 00:45:48,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1511.17 | bwd_inner_microstep: 1511.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-11 00:45:50,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1309.39 | bwd_inner_microstep: 1309.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-11 00:45:52,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1414.50 | bwd_inner_microstep: 1414.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3468
[2024-06-11 00:45:54,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.04 | bwd_microstep: 1183.96 | bwd_inner_microstep: 1183.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-11 00:45:56,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.81 | bwd_microstep: 1444.57 | bwd_inner_microstep: 1444.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 00:45:58,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3548
[2024-06-11 00:45:59,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.05 | bwd_microstep: 1231.54 | bwd_inner_microstep: 1231.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-11 00:46:01,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1410.10 | bwd_inner_microstep: 1410.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-11 00:46:03,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.70 | bwd_microstep: 1539.66 | bwd_inner_microstep: 1539.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2051
[2024-06-11 00:46:04,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.48 | bwd_microstep: 817.84 | bwd_inner_microstep: 817.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544
[2024-06-11 00:46:06,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.37 | bwd_microstep: 1428.08 | bwd_inner_microstep: 1428.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814
[2024-06-11 00:46:09,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.61 | bwd_microstep: 1750.19 | bwd_inner_microstep: 1750.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806
[2024-06-11 00:46:11,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.26 | bwd_microstep: 1752.42 | bwd_inner_microstep: 1752.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426
[2024-06-11 00:46:13,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1541.34 | bwd_inner_microstep: 1541.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3582
[2024-06-11 00:46:17,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.07 | optimizer_step: 6.56
[2024-06-11 00:46:17,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.54 | bwd_microstep: 2607.47 | bwd_inner_microstep: 1777.81 | bwd_allreduce_microstep: 829.60 | step_microstep: 37.81
[2024-06-11 00:46:17,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16258.59 | bwd: 44497.97 | bwd_inner: 43667.47 | bwd_allreduce: 829.83 | step: 39.37
{'loss': 1.1582, 'learning_rate': 3.7362370118108947e-06, 'epoch': 0.81}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 00:46:18,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.63 | bwd_microstep: 1236.43 | bwd_inner_microstep: 1236.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3938
[2024-06-11 00:46:21,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.90 | bwd_microstep: 1596.20 | bwd_inner_microstep: 1596.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3895
[2024-06-11 00:46:23,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.47 | bwd_microstep: 1483.46 | bwd_inner_microstep: 1483.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3479
[2024-06-11 00:46:25,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.91 | bwd_microstep: 1549.44 | bwd_inner_microstep: 1549.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 00:46:26,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.46 | bwd_microstep: 1246.67 | bwd_inner_microstep: 1246.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 00:46:28,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1342.41 | bwd_inner_microstep: 1342.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-11 00:46:30,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.19 | bwd_microstep: 1183.32 | bwd_inner_microstep: 1183.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3707
[2024-06-11 00:46:32,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.32 | bwd_microstep: 1329.52 | bwd_inner_microstep: 1329.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 00:46:34,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1386.16 | bwd_inner_microstep: 1386.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-11 00:46:35,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.91 | bwd_microstep: 1158.64 | bwd_inner_microstep: 1158.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-11 00:46:37,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.84 | bwd_microstep: 1414.20 | bwd_inner_microstep: 1414.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3672
[2024-06-11 00:46:39,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.18 | bwd_microstep: 1551.65 | bwd_inner_microstep: 1551.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2058
[2024-06-11 00:46:41,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.68 | bwd_microstep: 912.16 | bwd_inner_microstep: 912.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454
[2024-06-11 00:46:42,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.63 | bwd_microstep: 1314.49 | bwd_inner_microstep: 1314.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3648
[2024-06-11 00:46:45,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.31 | bwd_microstep: 1708.64 | bwd_inner_microstep: 1708.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 00:46:47,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.28 | bwd_microstep: 1243.87 | bwd_inner_microstep: 1243.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3427
[2024-06-11 00:46:49,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.36 | bwd_microstep: 1538.45 | bwd_inner_microstep: 1538.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-11 00:46:50,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.21 | bwd_microstep: 1306.96 | bwd_inner_microstep: 1306.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3826
[2024-06-11 00:46:53,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.60 | bwd_microstep: 1852.69 | bwd_inner_microstep: 1852.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2936
[2024-06-11 00:46:54,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.45 | bwd_microstep: 1001.86 | bwd_inner_microstep: 1001.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-11 00:46:56,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.34 | bwd_microstep: 1481.92 | bwd_inner_microstep: 1481.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3887
[2024-06-11 00:46:59,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1548.82 | bwd_inner_microstep: 1548.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-11 00:47:00,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.14 | bwd_microstep: 974.76 | bwd_inner_microstep: 974.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3674
[2024-06-11 00:47:02,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.74 | bwd_microstep: 1458.30 | bwd_inner_microstep: 1458.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804
[2024-06-11 00:47:04,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.96 | bwd_microstep: 1353.68 | bwd_inner_microstep: 1353.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2017
[2024-06-11 00:47:05,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.73 | bwd_microstep: 906.51 | bwd_inner_microstep: 906.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3799
[2024-06-11 00:47:07,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.86 | bwd_microstep: 1577.63 | bwd_inner_microstep: 1577.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-11 00:47:09,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.48 | bwd_microstep: 1505.67 | bwd_inner_microstep: 1505.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-11 00:47:11,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.79 | bwd_microstep: 1296.61 | bwd_inner_microstep: 1296.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3463
[2024-06-11 00:47:13,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.75 | bwd_microstep: 1340.65 | bwd_inner_microstep: 1340.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3695
[2024-06-11 00:47:15,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1331.23 | bwd_inner_microstep: 1331.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3769
[2024-06-11 00:47:19,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.07 | optimizer_step: 6.61
[2024-06-11 00:47:19,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.09 | bwd_microstep: 3401.91 | bwd_inner_microstep: 1775.25 | bwd_allreduce_microstep: 1626.61 | step_microstep: 37.64
[2024-06-11 00:47:19,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16357.63 | bwd: 45534.94 | bwd_inner: 43907.41 | bwd_allreduce: 1626.83 | step: 39.33
{'loss': 1.165, 'learning_rate': 3.7144208656253476e-06, 'epoch': 0.81}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468
[2024-06-11 00:47:21,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.54 | bwd_microstep: 1568.71 | bwd_inner_microstep: 1568.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4016
[2024-06-11 00:47:23,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.64 | bwd_microstep: 1514.85 | bwd_inner_microstep: 1514.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3804
[2024-06-11 00:47:25,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.90 | bwd_microstep: 1576.87 | bwd_inner_microstep: 1576.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 00:47:27,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.76 | bwd_microstep: 1286.14 | bwd_inner_microstep: 1286.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-11 00:47:29,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.45 | bwd_microstep: 1521.95 | bwd_inner_microstep: 1521.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 00:47:31,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-11 00:47:32,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.82 | bwd_microstep: 796.47 | bwd_inner_microstep: 796.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-11 00:47:34,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.23 | bwd_microstep: 1309.34 | bwd_inner_microstep: 1309.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-11 00:47:35,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 795.61 | bwd_inner_microstep: 795.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-11 00:47:37,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.06 | bwd_microstep: 1280.10 | bwd_inner_microstep: 1280.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 00:47:38,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1248.81 | bwd_inner_microstep: 1248.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3493
[2024-06-11 00:47:40,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.09 | bwd_microstep: 1343.09 | bwd_inner_microstep: 1343.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3658
[2024-06-11 00:47:43,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.79 | bwd_microstep: 1819.21 | bwd_inner_microstep: 1819.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463
[2024-06-11 00:47:45,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.67 | bwd_microstep: 1434.19 | bwd_inner_microstep: 1434.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-11 00:47:47,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 1603.31 | bwd_inner_microstep: 1603.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2140
[2024-06-11 00:47:48,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.43 | bwd_microstep: 738.14 | bwd_inner_microstep: 738.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3840
[2024-06-11 00:47:50,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.95 | bwd_microstep: 1561.78 | bwd_inner_microstep: 1561.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 00:47:52,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1378.35 | bwd_inner_microstep: 1378.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140
[2024-06-11 00:47:53,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.22 | bwd_microstep: 833.69 | bwd_inner_microstep: 833.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-11 00:47:55,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.55 | bwd_microstep: 1458.01 | bwd_inner_microstep: 1457.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618
[2024-06-11 00:47:57,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.73 | bwd_microstep: 1603.42 | bwd_inner_microstep: 1603.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3466
[2024-06-11 00:48:00,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1485.52 | bwd_inner_microstep: 1485.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-11 00:48:02,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.41 | bwd_microstep: 1554.74 | bwd_inner_microstep: 1554.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-11 00:48:04,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1496.19 | bwd_inner_microstep: 1496.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782
[2024-06-11 00:48:06,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.95 | bwd_microstep: 1349.13 | bwd_inner_microstep: 1349.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2292
[2024-06-11 00:48:07,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.49 | bwd_microstep: 816.16 | bwd_inner_microstep: 816.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-11 00:48:09,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.06 | bwd_microstep: 1297.34 | bwd_inner_microstep: 1297.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-11 00:48:11,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1462.81 | bwd_inner_microstep: 1462.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-11 00:48:12,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1395.42 | bwd_inner_microstep: 1395.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3575
[2024-06-11 00:48:14,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1431.13 | bwd_inner_microstep: 1431.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 00:48:16,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1375.80 | bwd_inner_microstep: 1375.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-11 00:48:19,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.03 | optimizer_step: 6.61
[2024-06-11 00:48:19,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.81 | bwd_microstep: 1917.06 | bwd_inner_microstep: 1419.49 | bwd_allreduce_microstep: 497.53 | step_microstep: 37.45
[2024-06-11 00:48:19,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16096.43 | bwd: 43536.12 | bwd_inner: 43037.69 | bwd_allreduce: 497.75 | step: 39.09
 81%|████████  | 1392/1726 [24:05:51<5:44:39, 61.92s/it]
 81%|████████  | 1393/1726 [24:06:51<5:40:41, 61.39s/it]
 81%|████████  | 1394/1726 [24:07:52<5:39:10, 61.30s/it]
 81%|████████  | 1395/1726 [24:08:53<5:37:49, 61.24s/it]
 81%|████████  | 1396/1726 [24:09:56<5:38:27, 61.54s/it]
 81%|████████  | 1397/1726 [24:10:56<5:34:51
{'loss': 1.1512, 'learning_rate': 3.692662077164855e-06, 'epoch': 0.81}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 00:48:21,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.62 | bwd_microstep: 1468.10 | bwd_inner_microstep: 1468.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 00:48:23,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.92 | bwd_microstep: 1381.11 | bwd_inner_microstep: 1381.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-11 00:48:25,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.95 | bwd_microstep: 1581.01 | bwd_inner_microstep: 1580.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4196
[2024-06-11 00:48:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.02 | bwd_microstep: 1549.91 | bwd_inner_microstep: 1549.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2253
[2024-06-11 00:48:28,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.95 | bwd_microstep: 965.49 | bwd_inner_microstep: 965.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 00:48:31,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.69 | bwd_microstep: 1655.60 | bwd_inner_microstep: 1655.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743
[2024-06-11 00:48:33,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.82 | bwd_microstep: 1430.65 | bwd_inner_microstep: 1430.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 00:48:34,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1247.86 | bwd_inner_microstep: 1247.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-11 00:48:37,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1508.38 | bwd_inner_microstep: 1508.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-11 00:48:37,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.82 | bwd_microstep: 703.24 | bwd_inner_microstep: 703.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-11 00:48:39,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.33 | bwd_microstep: 797.90 | bwd_inner_microstep: 797.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-11 00:48:41,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1415.32 | bwd_inner_microstep: 1415.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 00:48:42,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.15 | bwd_microstep: 1377.25 | bwd_inner_microstep: 1377.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2041
[2024-06-11 00:48:44,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.91 | bwd_microstep: 779.35 | bwd_inner_microstep: 779.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3648
[2024-06-11 00:48:46,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.18 | bwd_microstep: 1456.59 | bwd_inner_microstep: 1456.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 00:48:47,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.03 | bwd_microstep: 1339.25 | bwd_inner_microstep: 1339.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495
[2024-06-11 00:48:50,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.89 | bwd_microstep: 1584.78 | bwd_inner_microstep: 1584.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 624
[2024-06-11 00:48:50,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.41 | bwd_microstep: 261.14 | bwd_inner_microstep: 261.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-11 00:48:52,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.78 | bwd_microstep: 1185.03 | bwd_inner_microstep: 1185.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-11 00:48:53,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1248.11 | bwd_inner_microstep: 1248.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-11 00:48:55,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1495.07 | bwd_inner_microstep: 1495.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-11 00:48:56,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.48 | bwd_microstep: 806.92 | bwd_inner_microstep: 806.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939
[2024-06-11 00:48:58,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.43 | bwd_microstep: 732.01 | bwd_inner_microstep: 731.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-11 00:48:59,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.61 | bwd_microstep: 1289.32 | bwd_inner_microstep: 1289.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-11 00:49:01,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.08 | bwd_microstep: 1441.08 | bwd_inner_microstep: 1441.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-11 00:49:03,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.37 | bwd_microstep: 1608.76 | bwd_inner_microstep: 1608.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3810
[2024-06-11 00:49:05,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.58 | bwd_microstep: 1385.37 | bwd_inner_microstep: 1385.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744
[2024-06-11 00:49:08,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.10 | bwd_microstep: 1634.15 | bwd_inner_microstep: 1634.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-11 00:49:10,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.92 | bwd_microstep: 1599.27 | bwd_inner_microstep: 1599.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1936
[2024-06-11 00:49:11,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.30 | bwd_microstep: 758.00 | bwd_inner_microstep: 757.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2659
[2024-06-11 00:49:12,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.09 | bwd_microstep: 1005.72 | bwd_inner_microstep: 1005.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2555
[2024-06-11 00:49:20,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 00:49:20,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.42 | bwd_microstep: 7549.37 | bwd_inner_microstep: 1281.72 | bwd_allreduce_microstep: 6267.59 | step_microstep: 37.66
[2024-06-11 00:49:20,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14910.24 | bwd: 46241.13 | bwd_inner: 39972.63 | bwd_allreduce: 6267.83 | step: 39.13
{'loss': 1.1939, 'learning_rate': 3.6709607230637545e-06, 'epoch': 0.81}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3417
[2024-06-11 00:49:22,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.26 | bwd_microstep: 1506.18 | bwd_inner_microstep: 1506.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2472
[2024-06-11 00:49:24,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.81 | bwd_microstep: 1014.32 | bwd_inner_microstep: 1014.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4065
[2024-06-11 00:49:26,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.52 | bwd_microstep: 1514.52 | bwd_inner_microstep: 1514.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-11 00:49:28,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.16 | bwd_microstep: 1369.87 | bwd_inner_microstep: 1369.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-11 00:49:30,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.45 | bwd_microstep: 1341.68 | bwd_inner_microstep: 1341.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2039
[2024-06-11 00:49:31,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.77 | bwd_microstep: 716.82 | bwd_inner_microstep: 716.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4061
[2024-06-11 00:49:33,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.35 | bwd_microstep: 1618.26 | bwd_inner_microstep: 1618.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-11 00:49:34,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.43 | bwd_microstep: 799.62 | bwd_inner_microstep: 799.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3732
[2024-06-11 00:49:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.28 | bwd_microstep: 1365.59 | bwd_inner_microstep: 1365.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2076
[2024-06-11 00:49:37,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.92 | bwd_microstep: 727.12 | bwd_inner_microstep: 727.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3632
[2024-06-11 00:49:39,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1348.82 | bwd_inner_microstep: 1348.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904
[2024-06-11 00:49:40,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.08 | bwd_microstep: 686.02 | bwd_inner_microstep: 686.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-11 00:49:42,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.28 | bwd_microstep: 1417.64 | bwd_inner_microstep: 1417.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-11 00:49:44,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.13 | bwd_microstep: 1485.69 | bwd_inner_microstep: 1485.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3429
[2024-06-11 00:49:46,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1397.86 | bwd_inner_microstep: 1397.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-11 00:49:47,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.65 | bwd_microstep: 1290.47 | bwd_inner_microstep: 1290.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-11 00:49:50,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1509.58 | bwd_inner_microstep: 1509.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3829
[2024-06-11 00:49:51,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.08 | bwd_microstep: 1417.85 | bwd_inner_microstep: 1417.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3641
[2024-06-11 00:49:53,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.29 | bwd_microstep: 1465.14 | bwd_inner_microstep: 1465.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-11 00:49:54,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.51 | bwd_microstep: 697.33 | bwd_inner_microstep: 697.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-11 00:49:56,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.82 | bwd_microstep: 1460.05 | bwd_inner_microstep: 1460.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2519
[2024-06-11 00:49:58,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.04 | bwd_microstep: 900.85 | bwd_inner_microstep: 900.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3899
[2024-06-11 00:50:00,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1509.60 | bwd_inner_microstep: 1509.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-11 00:50:02,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.51 | bwd_microstep: 1547.53 | bwd_inner_microstep: 1547.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1930
[2024-06-11 00:50:03,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.31 | bwd_microstep: 760.63 | bwd_inner_microstep: 760.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 00:50:05,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.92 | bwd_microstep: 1504.46 | bwd_inner_microstep: 1504.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-11 00:50:07,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.47 | bwd_microstep: 1291.67 | bwd_inner_microstep: 1291.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-11 00:50:09,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1552.24 | bwd_inner_microstep: 1552.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-11 00:50:11,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.18 | bwd_microstep: 1401.50 | bwd_inner_microstep: 1401.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2945
[2024-06-11 00:50:13,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.13 | bwd_microstep: 1288.29 | bwd_inner_microstep: 1288.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389
[2024-06-11 00:50:14,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.23 | bwd_microstep: 1242.09 | bwd_inner_microstep: 1242.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3588
[2024-06-11 00:50:21,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.07 | optimizer_step: 6.59
[2024-06-11 00:50:21,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.11 | bwd_microstep: 5503.00 | bwd_inner_microstep: 2202.03 | bwd_allreduce_microstep: 3300.92 | step_microstep: 37.66
[2024-06-11 00:50:21,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15286.74 | bwd: 44652.31 | bwd_inner: 41350.47 | bwd_allreduce: 3301.15 | step: 39.22
{'loss': 1.2097, 'learning_rate': 3.649316879754099e-06, 'epoch': 0.81}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-11 00:50:23,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1386.98 | bwd_inner_microstep: 1386.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-11 00:50:24,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.03 | bwd_microstep: 1304.92 | bwd_inner_microstep: 1304.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2358
[2024-06-11 00:50:26,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.54 | bwd_microstep: 892.18 | bwd_inner_microstep: 892.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3807
[2024-06-11 00:50:28,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.39 | bwd_microstep: 1507.38 | bwd_inner_microstep: 1507.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-11 00:50:29,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 787.79 | bwd_inner_microstep: 787.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 00:50:31,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1280.59 | bwd_inner_microstep: 1280.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235
[2024-06-11 00:50:32,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.59 | bwd_microstep: 960.46 | bwd_inner_microstep: 960.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606
[2024-06-11 00:50:34,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.85 | bwd_microstep: 1310.73 | bwd_inner_microstep: 1310.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3415
[2024-06-11 00:50:35,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.56 | bwd_microstep: 1181.19 | bwd_inner_microstep: 1181.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2111
[2024-06-11 00:50:37,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.17 | bwd_microstep: 1014.47 | bwd_inner_microstep: 1014.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-11 00:50:39,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.60 | bwd_microstep: 1491.25 | bwd_inner_microstep: 1491.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-11 00:50:41,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.66 | bwd_microstep: 1288.26 | bwd_inner_microstep: 1288.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-11 00:50:42,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.85 | bwd_microstep: 1156.37 | bwd_inner_microstep: 1156.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831
[2024-06-11 00:50:44,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.25 | bwd_microstep: 1451.84 | bwd_inner_microstep: 1451.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1922
[2024-06-11 00:50:45,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.30 | bwd_microstep: 725.70 | bwd_inner_microstep: 725.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520
[2024-06-11 00:50:47,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.97 | bwd_microstep: 1288.71 | bwd_inner_microstep: 1288.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2097
[2024-06-11 00:50:48,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.67 | bwd_microstep: 728.12 | bwd_inner_microstep: 728.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 00:50:50,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1281.84 | bwd_inner_microstep: 1281.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668
[2024-06-11 00:50:52,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1428.05 | bwd_inner_microstep: 1428.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-11 00:50:54,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.47 | bwd_microstep: 1429.09 | bwd_inner_microstep: 1429.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3712
[2024-06-11 00:50:56,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.34 | bwd_microstep: 1463.97 | bwd_inner_microstep: 1463.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-11 00:50:58,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.26 | bwd_microstep: 1382.53 | bwd_inner_microstep: 1382.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-11 00:51:00,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.97 | bwd_microstep: 1657.07 | bwd_inner_microstep: 1657.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-11 00:51:02,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.66 | bwd_microstep: 1654.62 | bwd_inner_microstep: 1654.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3475
[2024-06-11 00:51:04,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.73 | bwd_microstep: 1329.57 | bwd_inner_microstep: 1329.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3707
[2024-06-11 00:51:06,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.41 | bwd_microstep: 1592.09 | bwd_inner_microstep: 1592.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3636
[2024-06-11 00:51:08,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.21 | bwd_microstep: 1446.05 | bwd_inner_microstep: 1446.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 00:51:10,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1244.61 | bwd_inner_microstep: 1244.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3585
[2024-06-11 00:51:12,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.35 | bwd_microstep: 1457.46 | bwd_inner_microstep: 1457.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-11 00:51:14,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.61 | bwd_microstep: 1449.45 | bwd_inner_microstep: 1449.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-11 00:51:16,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.91 | bwd_microstep: 1347.32 | bwd_inner_microstep: 1347.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-11 00:51:23,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-11 00:51:23,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.88 | bwd_microstep: 6796.18 | bwd_inner_microstep: 1782.39 | bwd_allreduce_microstep: 5013.72 | step_microstep: 38.88
[2024-06-11 00:51:23,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15563.62 | bwd: 46716.88 | bwd_inner: 41702.25 | bwd_allreduce: 5013.96 | step: 40.34
{'loss': 1.1438, 'learning_rate': 3.6277306234653953e-06, 'epoch': 0.81}
 81%|████████  | 1397/1726 [24:10:56<5:34:51, 61.07s/it]
 81%|████████  | 1398/1726 [24:11:57<5:34:31, 61.19s/it]
 81%|████████  | 1399/1726 [24:12:57<5:31:59, 60.92s/it]
 81%|████████  | 1400/1726 [24:14:00<5:33:44, 61.42s/it]
[INFO|trainer.py:2936] 2024-06-11 00:51:26,027 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400
[INFO|configuration_utils.py:473] 2024-06-11 00:51:26,030 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/config.json
[INFO|configuration_utils.py:594] 2024-06-11 00:51:26,033 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-11 00:51:34,047 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-11 00:51:34,065 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-11 00:51:34,067 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-11 00:51:34,068 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/added_tokens.json
[2024-06-11 00:51:34,477] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1400 is about to be saved!
[2024-06-11 00:51:34,488] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/mp_rank_00_model_states.pt
[2024-06-11 00:51:34,488] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/mp_rank_00_model_states.pt...
[2024-06-11 00:51:43,074] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/mp_rank_00_model_states.pt.
[2024-06-11 00:51:43,086] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-06-11 00:51:54,643] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-06-11 00:51:54,656] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-06-11 00:51:54,656] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1400 is ready now!
[INFO|trainer.py:3028] 2024-06-11 00:51:54,881 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-800] due to args.save_total_limit
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 00:51:57,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.51 | bwd_microstep: 1269.01 | bwd_inner_microstep: 1268.91 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 00:51:59,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1372.14 | bwd_inner_microstep: 1372.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2643
[2024-06-11 00:52:00,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.94 | bwd_microstep: 1109.36 | bwd_inner_microstep: 1109.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862
[2024-06-11 00:52:02,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1460.57 | bwd_inner_microstep: 1460.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 00:52:04,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1243.47 | bwd_inner_microstep: 1243.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-11 00:52:05,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.71 | bwd_microstep: 790.96 | bwd_inner_microstep: 790.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4134
[2024-06-11 00:52:07,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.20 | bwd_microstep: 1736.00 | bwd_inner_microstep: 1735.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-11 00:52:10,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.43 | bwd_microstep: 1621.02 | bwd_inner_microstep: 1621.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406
[2024-06-11 00:52:17,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.31 | bwd_microstep: 1271.62 | bwd_inner_microstep: 1271.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2907
[2024-06-11 00:52:19,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.15 | bwd_microstep: 1149.51 | bwd_inner_microstep: 1149.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 00:52:28,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.89 | bwd_microstep: 1276.07 | bwd_inner_microstep: 1276.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-11 00:52:30,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.90 | bwd_microstep: 1473.67 | bwd_inner_microstep: 1473.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-11 00:52:32,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1572.70 | bwd_inner_microstep: 1572.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3668
[2024-06-11 00:52:41,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.55 | bwd_microstep: 1604.02 | bwd_inner_microstep: 1603.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 00:52:42,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.77 | bwd_microstep: 1381.12 | bwd_inner_microstep: 1381.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-11 00:52:54,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1391.82 | bwd_inner_microstep: 1391.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420
[2024-06-11 00:52:56,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.93 | bwd_microstep: 1438.47 | bwd_inner_microstep: 1438.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2203
[2024-06-11 00:52:57,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.25 | bwd_microstep: 955.75 | bwd_inner_microstep: 955.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2092
[2024-06-11 00:52:58,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.00 | bwd_microstep: 792.01 | bwd_inner_microstep: 791.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 00:53:00,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.86 | bwd_microstep: 1396.14 | bwd_inner_microstep: 1396.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424
[2024-06-11 00:53:02,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.67 | bwd_microstep: 1276.87 | bwd_inner_microstep: 1276.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719
[2024-06-11 00:53:04,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1338.53 | bwd_inner_microstep: 1338.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-11 00:53:06,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1535.11 | bwd_inner_microstep: 1535.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914
[2024-06-11 00:53:07,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.93 | bwd_microstep: 692.57 | bwd_inner_microstep: 692.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-11 00:53:09,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.77 | bwd_microstep: 1159.89 | bwd_inner_microstep: 1159.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-11 00:53:10,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.15 | bwd_microstep: 1404.59 | bwd_inner_microstep: 1404.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3609
[2024-06-11 00:53:13,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.86 | bwd_microstep: 1582.54 | bwd_inner_microstep: 1582.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-11 00:53:14,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1346.09 | bwd_inner_microstep: 1345.99 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.23
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3780
[2024-06-11 00:53:17,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.88 | bwd_microstep: 1693.21 | bwd_inner_microstep: 1693.09 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-11 00:53:19,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.68 | bwd_microstep: 1477.30 | bwd_inner_microstep: 1477.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-11 00:53:21,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1408.42 | bwd_inner_microstep: 1408.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3568
[2024-06-11 00:53:33,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.31 | optimizer_step: 6.62
[2024-06-11 00:53:33,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.66 | bwd_microstep: 11112.92 | bwd_inner_microstep: 1765.14 | bwd_allreduce_microstep: 9347.70 | step_microstep: 40.61
[2024-06-11 00:53:33,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16136.22 | bwd: 52333.53 | bwd_inner: 42984.62 | bwd_allreduce: 9348.10 | step: 42.65
{'loss': 1.1706, 'learning_rate': 3.6062020302243196e-06, 'epoch': 0.81}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 00:53:35,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.87 | bwd_microstep: 1469.77 | bwd_inner_microstep: 1469.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3936
[2024-06-11 00:53:37,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1420.89 | bwd_inner_microstep: 1420.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-11 00:53:39,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.68 | bwd_microstep: 1448.38 | bwd_inner_microstep: 1448.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 00:53:40,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1277.73 | bwd_inner_microstep: 1277.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3742
[2024-06-11 00:53:42,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.49 | bwd_microstep: 1528.90 | bwd_inner_microstep: 1528.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-11 00:53:45,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.15 | bwd_microstep: 1636.04 | bwd_inner_microstep: 1636.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738
[2024-06-11 00:53:47,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.95 | bwd_microstep: 1434.39 | bwd_inner_microstep: 1434.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 00:53:48,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.60 | bwd_microstep: 1242.89 | bwd_inner_microstep: 1242.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952
[2024-06-11 00:53:49,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.92 | bwd_microstep: 730.80 | bwd_inner_microstep: 730.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1886
[2024-06-11 00:53:50,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.30 | bwd_microstep: 685.33 | bwd_inner_microstep: 685.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2177
[2024-06-11 00:53:52,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.92 | bwd_microstep: 917.67 | bwd_inner_microstep: 917.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970
[2024-06-11 00:53:53,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.95 | bwd_microstep: 888.86 | bwd_inner_microstep: 888.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3418
[2024-06-11 00:53:55,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1311.84 | bwd_inner_microstep: 1311.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-11 00:53:57,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1433.85 | bwd_inner_microstep: 1433.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 00:53:59,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1384.92 | bwd_inner_microstep: 1384.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-11 00:54:01,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.65 | bwd_microstep: 1503.37 | bwd_inner_microstep: 1503.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 00:54:03,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1451.90 | bwd_inner_microstep: 1451.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3606
[2024-06-11 00:54:05,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.98 | bwd_microstep: 1457.71 | bwd_inner_microstep: 1457.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3825
[2024-06-11 00:54:07,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.76 | bwd_microstep: 1490.88 | bwd_inner_microstep: 1490.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3575
[2024-06-11 00:54:09,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.85 | bwd_microstep: 1630.20 | bwd_inner_microstep: 1630.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643
[2024-06-11 00:54:11,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.88 | bwd_microstep: 1318.16 | bwd_inner_microstep: 1318.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3624
[2024-06-11 00:54:13,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.07 | bwd_microstep: 1442.77 | bwd_inner_microstep: 1442.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3593
[2024-06-11 00:54:15,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.78 | bwd_microstep: 1466.41 | bwd_inner_microstep: 1466.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1997
[2024-06-11 00:54:16,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.01 | bwd_microstep: 736.41 | bwd_inner_microstep: 736.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2066
[2024-06-11 00:54:17,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.89 | bwd_microstep: 815.79 | bwd_inner_microstep: 815.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 00:54:19,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1256.32 | bwd_inner_microstep: 1256.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3566
[2024-06-11 00:54:21,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.25 | bwd_microstep: 1562.70 | bwd_inner_microstep: 1562.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-11 00:54:23,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.42 | bwd_microstep: 1357.03 | bwd_inner_microstep: 1357.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3752
[2024-06-11 00:54:25,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1341.72 | bwd_inner_microstep: 1341.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-11 00:54:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.31 | bwd_microstep: 1438.49 | bwd_inner_microstep: 1438.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-11 00:54:28,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3591
[2024-06-11 00:54:39,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-11 00:54:39,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.19 | bwd_microstep: 10357.43 | bwd_inner_microstep: 1766.42 | bwd_allreduce_microstep: 8590.95 | step_microstep: 39.11
[2024-06-11 00:54:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15738.70 | bwd: 50832.70 | bwd_inner: 42240.80 | bwd_allreduce: 8591.18 | step: 40.95
{'loss': 1.1724, 'learning_rate': 3.584731175854479e-06, 'epoch': 0.81}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-11 00:54:41,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1301.77 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 00:54:43,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1372.96 | bwd_inner_microstep: 1372.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3946
[2024-06-11 00:54:46,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.47 | bwd_microstep: 1687.51 | bwd_inner_microstep: 1687.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3912
[2024-06-11 00:54:47,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477
[2024-06-11 00:54:49,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.13 | bwd_microstep: 1215.56 | bwd_inner_microstep: 1215.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3743
[2024-06-11 00:54:51,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.60 | bwd_microstep: 1297.99 | bwd_inner_microstep: 1297.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743
[2024-06-11 00:54:53,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.28 | bwd_microstep: 1430.56 | bwd_inner_microstep: 1430.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-11 00:54:55,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.47 | bwd_microstep: 1403.19 | bwd_inner_microstep: 1403.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 00:54:57,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.59 | bwd_microstep: 1285.88 | bwd_inner_microstep: 1285.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1886
[2024-06-11 00:54:58,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.78 | bwd_microstep: 711.40 | bwd_inner_microstep: 711.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 00:54:59,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.33 | bwd_microstep: 1285.79 | bwd_inner_microstep: 1285.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3670
[2024-06-11 00:55:01,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 1455.59 | bwd_inner_microstep: 1455.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-11 00:55:03,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.70 | bwd_microstep: 1282.37 | bwd_inner_microstep: 1282.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2664
[2024-06-11 00:55:05,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.81 | bwd_microstep: 1216.10 | bwd_inner_microstep: 1216.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-11 00:55:07,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.64 | bwd_microstep: 1529.41 | bwd_inner_microstep: 1529.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 00:55:09,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.83 | bwd_microstep: 1561.78 | bwd_inner_microstep: 1561.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2017
[2024-06-11 00:55:10,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.20 | bwd_microstep: 743.44 | bwd_inner_microstep: 743.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 00:55:12,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1382.67 | bwd_inner_microstep: 1382.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604
[2024-06-11 00:55:14,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1441.11 | bwd_inner_microstep: 1441.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-11 00:55:16,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1327.54 | bwd_inner_microstep: 1327.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-11 00:55:18,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1391.45 | bwd_inner_microstep: 1391.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2104
[2024-06-11 00:55:19,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.60 | bwd_microstep: 921.13 | bwd_inner_microstep: 921.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543
[2024-06-11 00:55:21,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.89 | bwd_microstep: 1328.01 | bwd_inner_microstep: 1327.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-11 00:55:23,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.06 | bwd_microstep: 1288.04 | bwd_inner_microstep: 1288.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480
[2024-06-11 00:55:25,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.39 | bwd_microstep: 1442.72 | bwd_inner_microstep: 1442.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3638
[2024-06-11 00:55:27,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.31 | bwd_microstep: 1657.47 | bwd_inner_microstep: 1657.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811
[2024-06-11 00:55:29,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.73 | bwd_microstep: 1355.76 | bwd_inner_microstep: 1355.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-11 00:55:31,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.60 | bwd_microstep: 1586.24 | bwd_inner_microstep: 1586.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-11 00:55:33,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1559.06 | bwd_inner_microstep: 1559.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2028
[2024-06-11 00:55:35,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.92 | bwd_microstep: 999.08 | bwd_inner_microstep: 999.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-11 00:55:37,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.13 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1542.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-11 00:55:41,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.14 | optimizer_step: 6.63
[2024-06-11 00:55:41,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.55 | bwd_microstep: 3401.79 | bwd_inner_microstep: 1753.90 | bwd_allreduce_microstep: 1647.83 | step_microstep: 38.96
[2024-06-11 00:55:41,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16079.26 | bwd: 44796.47 | bwd_inner: 43147.73 | bwd_allreduce: 1648.06 | step: 40.43
{'loss': 1.1564, 'learning_rate': 3.5633181359760925e-06, 'epoch': 0.81}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-11 00:55:42,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.03 | bwd_microstep: 787.02 | bwd_inner_microstep: 786.89 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848
[2024-06-11 00:55:44,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.93 | bwd_microstep: 1559.49 | bwd_inner_microstep: 1559.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3859
[2024-06-11 00:55:46,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1364.75 | bwd_inner_microstep: 1364.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 00:55:48,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1247.36 | bwd_inner_microstep: 1247.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-11 00:55:49,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1353.03 | bwd_inner_microstep: 1353.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-11 00:55:52,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.39 | bwd_microstep: 1648.31 | bwd_inner_microstep: 1648.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3749
[2024-06-11 00:55:54,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.71 | bwd_microstep: 1437.01 | bwd_inner_microstep: 1436.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408
[2024-06-11 00:55:56,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1442.11 | bwd_inner_microstep: 1442.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 00:55:57,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1288.67 | bwd_inner_microstep: 1288.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3993
[2024-06-11 00:56:00,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.42 | bwd_microstep: 1574.35 | bwd_inner_microstep: 1574.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 00:56:01,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.16 | bwd_microstep: 1285.85 | bwd_inner_microstep: 1285.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3415
[2024-06-11 00:56:03,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.60 | bwd_microstep: 1213.52 | bwd_inner_microstep: 1213.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-11 00:56:05,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.56 | bwd_microstep: 1537.07 | bwd_inner_microstep: 1537.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3517
[2024-06-11 00:56:07,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1517.16 | bwd_inner_microstep: 1517.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 00:56:09,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1474.45 | bwd_inner_microstep: 1474.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3647
[2024-06-11 00:56:12,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.11 | bwd_microstep: 1711.06 | bwd_inner_microstep: 1711.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-11 00:56:14,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.37 | bwd_microstep: 1599.07 | bwd_inner_microstep: 1599.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477
[2024-06-11 00:56:16,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.97 | bwd_microstep: 1214.05 | bwd_inner_microstep: 1214.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-11 00:56:17,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.25 | bwd_microstep: 799.03 | bwd_inner_microstep: 799.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-11 00:56:18,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.92 | bwd_microstep: 974.22 | bwd_inner_microstep: 974.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-11 00:56:20,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1519.55 | bwd_inner_microstep: 1519.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-11 00:56:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.50 | bwd_microstep: 1554.43 | bwd_inner_microstep: 1554.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 00:56:24,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1380.70 | bwd_inner_microstep: 1380.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-11 00:56:26,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.07 | bwd_microstep: 1256.67 | bwd_inner_microstep: 1256.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 00:56:28,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1412.20 | bwd_inner_microstep: 1412.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3588
[2024-06-11 00:56:30,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1336.99 | bwd_inner_microstep: 1336.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 00:56:32,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1401.19 | bwd_inner_microstep: 1401.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-11 00:56:34,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1391.78 | bwd_inner_microstep: 1391.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416
[2024-06-11 00:56:36,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1441.69 | bwd_inner_microstep: 1441.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-11 00:56:38,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.64 | bwd_microstep: 1442.32 | bwd_inner_microstep: 1442.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3771
[2024-06-11 00:56:40,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.59 | bwd_microstep: 1447.25 | bwd_inner_microstep: 1447.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-11 00:56:42,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.26 | optimizer_step: 6.61
[2024-06-11 00:56:42,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.17 | bwd_microstep: 2112.15 | bwd_inner_microstep: 1721.29 | bwd_allreduce_microstep: 390.80 | step_microstep: 38.65
[2024-06-11 00:56:42,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16518.43 | bwd: 44724.51 | bwd_inner: 44332.71 | bwd_allreduce: 391.08 | step: 40.31
{'loss': 1.2194, 'learning_rate': 3.5419629860057915e-06, 'epoch': 0.81}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 00:56:44,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.18 | bwd_microstep: 1273.98 | bwd_inner_microstep: 1273.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4012
[2024-06-11 00:56:46,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 1513.69 | bwd_inner_microstep: 1513.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-11 00:56:48,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.23 | bwd_microstep: 1449.07 | bwd_inner_microstep: 1449.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-11 00:56:50,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1479.18 | bwd_inner_microstep: 1479.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899
[2024-06-11 00:56:51,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.50 | bwd_microstep: 777.19 | bwd_inner_microstep: 777.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3495
[2024-06-11 00:56:53,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.67 | bwd_microstep: 1221.33 | bwd_inner_microstep: 1221.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3695
[2024-06-11 00:56:55,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.72 | bwd_microstep: 1590.33 | bwd_inner_microstep: 1590.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-11 00:56:57,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1250.35 | bwd_inner_microstep: 1250.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-11 00:56:58,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.90 | bwd_microstep: 678.27 | bwd_inner_microstep: 678.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-11 00:56:59,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.12 | bwd_microstep: 812.42 | bwd_inner_microstep: 812.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505
[2024-06-11 00:57:01,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.76 | bwd_microstep: 1315.89 | bwd_inner_microstep: 1315.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3479
[2024-06-11 00:57:03,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1411.99 | bwd_inner_microstep: 1411.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 00:57:05,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1393.09 | bwd_inner_microstep: 1393.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2185
[2024-06-11 00:57:06,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.58 | bwd_microstep: 858.41 | bwd_inner_microstep: 858.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2153
[2024-06-11 00:57:07,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.40 | bwd_microstep: 1045.79 | bwd_inner_microstep: 1045.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-11 00:57:09,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.15 | bwd_microstep: 1279.49 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109
[2024-06-11 00:57:10,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.67 | bwd_microstep: 918.72 | bwd_inner_microstep: 918.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-11 00:57:12,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1493.05 | bwd_inner_microstep: 1493.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3150
[2024-06-11 00:57:14,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1350.44 | bwd_inner_microstep: 1350.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-11 00:57:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.75 | bwd_microstep: 1314.93 | bwd_inner_microstep: 1314.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3618
[2024-06-11 00:57:18,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.50 | bwd_microstep: 1216.35 | bwd_inner_microstep: 1216.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 00:57:20,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.75 | bwd_microstep: 1659.66 | bwd_inner_microstep: 1659.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-11 00:57:22,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.51 | bwd_microstep: 1399.53 | bwd_inner_microstep: 1399.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 00:57:24,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1351.67 | bwd_inner_microstep: 1351.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867
[2024-06-11 00:57:26,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.39 | bwd_microstep: 1679.29 | bwd_inner_microstep: 1679.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753
[2024-06-11 00:57:28,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.46 | bwd_microstep: 1475.09 | bwd_inner_microstep: 1475.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3598
[2024-06-11 00:57:30,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.19 | bwd_microstep: 1674.54 | bwd_inner_microstep: 1674.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-11 00:57:32,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.46 | bwd_microstep: 1398.80 | bwd_inner_microstep: 1398.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599
[2024-06-11 00:57:35,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.20 | bwd_microstep: 1708.58 | bwd_inner_microstep: 1708.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2958
[2024-06-11 00:57:36,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.57 | bwd_microstep: 1101.90 | bwd_inner_microstep: 1101.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-11 00:57:39,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.20 | bwd_microstep: 1606.70 | bwd_inner_microstep: 1606.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760
[2024-06-11 00:57:44,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.30 | optimizer_step: 6.61
[2024-06-11 00:57:44,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.91 | bwd_microstep: 5307.76 | bwd_inner_microstep: 1857.21 | bwd_allreduce_microstep: 3450.49 | step_microstep: 39.95
[2024-06-11 00:57:44,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15822.06 | bwd: 46007.52 | bwd_inner: 42556.11 | bwd_allreduce: 3450.73 | step: 41.58
{'loss': 1.1881, 'learning_rate': 3.520665801156289e-06, 'epoch': 0.81}

 81%|████████  | 1401/1726 [24:16:09<7:23:07, 81.81s/it]
 81%|████████  | 1402/1726 [24:17:16<6:57:38, 77.34s/it]
 81%|████████▏ | 1403/1726 [24:18:17<6:30:18, 72.50s/it]
 81%|████████▏ | 1404/1726 [24:19:19<6:11:32, 69.23s/it]
 81%|████████▏ | 1405/1726 [24:20:21<5:59:04, 67.12s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-11 00:57:46,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.67 | bwd_microstep: 1334.11 | bwd_inner_microstep: 1333.71 | bwd_allreduce_microstep: 0.26 | step_microstep: 0.35
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4384
[2024-06-11 00:57:49,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.40 | bwd_microstep: 1705.87 | bwd_inner_microstep: 1705.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 00:57:51,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1480.16 | bwd_inner_microstep: 1480.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849
[2024-06-11 00:57:53,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.64 | bwd_microstep: 1555.19 | bwd_inner_microstep: 1555.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795
[2024-06-11 00:57:55,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.54 | bwd_microstep: 1450.60 | bwd_inner_microstep: 1450.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-11 00:57:56,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.84 | bwd_microstep: 788.22 | bwd_inner_microstep: 788.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4029
[2024-06-11 00:57:58,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1451.29 | bwd_inner_microstep: 1451.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-11 00:57:59,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.61 | bwd_microstep: 790.40 | bwd_inner_microstep: 790.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1895
[2024-06-11 00:58:00,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.88 | bwd_microstep: 716.49 | bwd_inner_microstep: 716.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3405
[2024-06-11 00:58:02,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.88 | bwd_microstep: 1440.89 | bwd_inner_microstep: 1440.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1940
[2024-06-11 00:58:03,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.51 | bwd_microstep: 823.74 | bwd_inner_microstep: 823.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3410
[2024-06-11 00:58:05,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.10 | bwd_microstep: 1404.37 | bwd_inner_microstep: 1404.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3488
[2024-06-11 00:58:07,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1500.08 | bwd_inner_microstep: 1500.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505
[2024-06-11 00:58:09,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1578.33 | bwd_inner_microstep: 1578.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-11 00:58:11,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1509.86 | bwd_inner_microstep: 1509.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2298
[2024-06-11 00:58:13,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.26 | bwd_microstep: 939.48 | bwd_inner_microstep: 939.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3578
[2024-06-11 00:58:15,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.08 | bwd_microstep: 1532.42 | bwd_inner_microstep: 1532.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-11 00:58:17,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.75 | bwd_microstep: 1292.43 | bwd_inner_microstep: 1292.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2117
[2024-06-11 00:58:18,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.30 | bwd_microstep: 830.01 | bwd_inner_microstep: 829.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055
[2024-06-11 00:58:19,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.46 | bwd_microstep: 911.64 | bwd_inner_microstep: 911.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-11 00:58:21,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.68 | bwd_microstep: 1560.33 | bwd_inner_microstep: 1560.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 00:58:23,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.56 | bwd_microstep: 1562.16 | bwd_inner_microstep: 1562.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-11 00:58:25,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.78 | bwd_microstep: 1296.75 | bwd_inner_microstep: 1296.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486
[2024-06-11 00:58:27,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.52 | bwd_microstep: 1192.07 | bwd_inner_microstep: 1192.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2163
[2024-06-11 00:58:28,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.93 | bwd_microstep: 793.36 | bwd_inner_microstep: 793.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2585
[2024-06-11 00:58:29,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.43 | bwd_microstep: 1073.09 | bwd_inner_microstep: 1073.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2285
[2024-06-11 00:58:31,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.76 | bwd_microstep: 940.53 | bwd_inner_microstep: 940.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3731
[2024-06-11 00:58:33,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.45 | bwd_microstep: 1630.60 | bwd_inner_microstep: 1630.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-11 00:58:35,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.30 | bwd_microstep: 1451.53 | bwd_inner_microstep: 1451.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3665
[2024-06-11 00:58:37,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.70 | bwd_microstep: 1263.74 | bwd_inner_microstep: 1263.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3769
[2024-06-11 00:58:39,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.96 | bwd_microstep: 1313.13 | bwd_inner_microstep: 1313.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 00:58:44,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.32 | optimizer_step: 6.59
[2024-06-11 00:58:44,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 4532.42 | bwd_inner_microstep: 1552.23 | bwd_allreduce_microstep: 2980.11 | step_microstep: 39.85
[2024-06-11 00:58:44,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15171.49 | bwd: 43645.32 | bwd_inner: 40663.92 | bwd_allreduce: 2980.64 | step: 41.93
{'loss': 1.1458, 'learning_rate': 3.4994266564361733e-06, 'epoch': 0.81}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1931
[2024-06-11 00:58:45,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.97 | bwd_microstep: 885.00 | bwd_inner_microstep: 884.87 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 00:58:47,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | bwd_microstep: 1276.47 | bwd_inner_microstep: 1276.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 00:58:48,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.61 | bwd_microstep: 1271.46 | bwd_inner_microstep: 1271.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3858
[2024-06-11 00:58:51,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.72 | bwd_microstep: 1524.28 | bwd_inner_microstep: 1524.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472
[2024-06-11 00:58:52,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.22 | bwd_microstep: 1246.37 | bwd_inner_microstep: 1246.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-11 00:58:54,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1383.17 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-11 00:58:56,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.71 | bwd_microstep: 1302.96 | bwd_inner_microstep: 1302.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-11 00:58:58,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.67 | bwd_microstep: 1538.83 | bwd_inner_microstep: 1538.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 00:59:00,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.22 | bwd_microstep: 1284.13 | bwd_inner_microstep: 1284.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2129
[2024-06-11 00:59:01,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.86 | bwd_microstep: 922.45 | bwd_inner_microstep: 922.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1873
[2024-06-11 00:59:02,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.33 | bwd_microstep: 678.98 | bwd_inner_microstep: 678.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 00:59:04,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.78 | bwd_microstep: 1389.45 | bwd_inner_microstep: 1389.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3940
[2024-06-11 00:59:06,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.85 | bwd_microstep: 1495.15 | bwd_inner_microstep: 1495.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3650
[2024-06-11 00:59:08,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.46 | bwd_microstep: 1665.00 | bwd_inner_microstep: 1664.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-11 00:59:10,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.90 | bwd_microstep: 1451.33 | bwd_inner_microstep: 1451.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3020
[2024-06-11 00:59:12,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.16 | bwd_microstep: 1230.27 | bwd_inner_microstep: 1230.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2235
[2024-06-11 00:59:13,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.71 | bwd_microstep: 868.60 | bwd_inner_microstep: 868.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-11 00:59:15,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.16 | bwd_microstep: 1398.21 | bwd_inner_microstep: 1398.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3606
[2024-06-11 00:59:17,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.15 | bwd_microstep: 1431.83 | bwd_inner_microstep: 1431.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-11 00:59:18,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.09 | bwd_microstep: 797.22 | bwd_inner_microstep: 797.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 614
[2024-06-11 00:59:19,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.45 | bwd_microstep: 261.72 | bwd_inner_microstep: 261.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-11 00:59:20,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.58 | bwd_microstep: 1189.25 | bwd_inner_microstep: 1189.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1987
[2024-06-11 00:59:22,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.52 | bwd_microstep: 865.16 | bwd_inner_microstep: 865.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1972
[2024-06-11 00:59:23,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.45 | bwd_microstep: 892.29 | bwd_inner_microstep: 892.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1993
[2024-06-11 00:59:24,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.34 | bwd_microstep: 898.99 | bwd_inner_microstep: 898.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-11 00:59:26,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1439.84 | bwd_inner_microstep: 1439.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430
[2024-06-11 00:59:28,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.67 | bwd_microstep: 1281.25 | bwd_inner_microstep: 1281.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-11 00:59:30,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.03 | bwd_microstep: 1647.48 | bwd_inner_microstep: 1647.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3741
[2024-06-11 00:59:32,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1368.58 | bwd_inner_microstep: 1368.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-11 00:59:34,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.49 | bwd_microstep: 1289.80 | bwd_inner_microstep: 1289.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-11 00:59:36,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.98 | bwd_microstep: 1497.99 | bwd_inner_microstep: 1497.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-11 00:59:45,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.62 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-11 00:59:45,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 8815.55 | bwd_inner_microstep: 1702.31 | bwd_allreduce_microstep: 7113.19 | step_microstep: 40.82
[2024-06-11 00:59:45,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14705.85 | bwd: 46489.09 | bwd_inner: 39374.90 | bwd_allreduce: 7113.48 | step: 42.39
{'loss': 1.2046, 'learning_rate': 3.478245626649597e-06, 'epoch': 0.82}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3469
[2024-06-11 00:59:47,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.47 | bwd_microstep: 1530.47 | bwd_inner_microstep: 1530.27 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-11 00:59:49,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.75 | bwd_microstep: 1338.04 | bwd_inner_microstep: 1338.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3902
[2024-06-11 00:59:51,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.68 | bwd_microstep: 1679.58 | bwd_inner_microstep: 1679.50 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-11 00:59:53,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.72 | bwd_microstep: 1296.45 | bwd_inner_microstep: 1296.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-11 00:59:55,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.08 | bwd_microstep: 1532.88 | bwd_inner_microstep: 1532.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-11 00:59:56,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.86 | bwd_microstep: 793.04 | bwd_inner_microstep: 793.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-11 00:59:58,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.76 | bwd_microstep: 1338.90 | bwd_inner_microstep: 1338.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3609
[2024-06-11 01:00:00,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.25 | bwd_microstep: 1312.34 | bwd_inner_microstep: 1312.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-11 01:00:01,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.35 | bwd_microstep: 797.76 | bwd_inner_microstep: 797.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420
[2024-06-11 01:00:03,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.12 | bwd_microstep: 1279.04 | bwd_inner_microstep: 1279.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-11 01:00:05,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.46 | bwd_microstep: 1420.87 | bwd_inner_microstep: 1420.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1976
[2024-06-11 01:00:06,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.15 | bwd_microstep: 829.03 | bwd_inner_microstep: 829.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-11 01:00:08,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.54 | bwd_microstep: 1158.36 | bwd_inner_microstep: 1158.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3689
[2024-06-11 01:00:10,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.58 | bwd_microstep: 1549.30 | bwd_inner_microstep: 1549.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755
[2024-06-11 01:00:12,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.84 | bwd_microstep: 1628.23 | bwd_inner_microstep: 1628.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3484
[2024-06-11 01:00:14,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.33 | bwd_microstep: 1549.28 | bwd_inner_microstep: 1549.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-11 01:00:16,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.55 | bwd_microstep: 1615.98 | bwd_inner_microstep: 1615.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-11 01:00:19,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.63 | bwd_microstep: 1605.39 | bwd_inner_microstep: 1605.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 01:00:20,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.23 | bwd_microstep: 1255.40 | bwd_inner_microstep: 1255.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-11 01:00:22,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.25 | bwd_microstep: 1404.85 | bwd_inner_microstep: 1404.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2084
[2024-06-11 01:00:23,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.79 | bwd_microstep: 821.27 | bwd_inner_microstep: 821.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604
[2024-06-11 01:00:25,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.74 | bwd_microstep: 1309.26 | bwd_inner_microstep: 1309.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821
[2024-06-11 01:00:27,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.44 | bwd_microstep: 1355.45 | bwd_inner_microstep: 1355.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-11 01:00:29,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.94 | bwd_microstep: 1610.02 | bwd_inner_microstep: 1609.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2697
[2024-06-11 01:00:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.09 | bwd_microstep: 1129.07 | bwd_inner_microstep: 1129.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-11 01:00:33,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.46 | bwd_microstep: 1404.81 | bwd_inner_microstep: 1404.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601
[2024-06-11 01:00:35,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1508.63 | bwd_inner_microstep: 1508.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-11 01:00:37,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604
[2024-06-11 01:00:39,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.04 | bwd_microstep: 1438.69 | bwd_inner_microstep: 1438.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3816
[2024-06-11 01:00:41,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1402.96 | bwd_inner_microstep: 1402.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-11 01:00:43,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1396.88 | bwd_inner_microstep: 1396.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760
[2024-06-11 01:00:46,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.07 | optimizer_step: 6.59
[2024-06-11 01:00:46,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.02 | bwd_microstep: 2651.72 | bwd_inner_microstep: 1750.37 | bwd_allreduce_microstep: 901.31 | step_microstep: 37.74
[2024-06-11 01:00:46,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16141.43 | bwd: 44357.88 | bwd_inner: 43455.45 | bwd_allreduce: 901.65 | step: 39.50
{'loss': 1.2037, 'learning_rate': 3.457122786396032e-06, 'epoch': 0.82}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1923
[2024-06-11 01:00:47,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.28 | bwd_microstep: 880.09 | bwd_inner_microstep: 879.96 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3953
[2024-06-11 01:00:49,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.95 | bwd_microstep: 1592.25 | bwd_inner_microstep: 1592.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4370
[2024-06-11 01:00:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.46 | bwd_microstep: 1541.82 | bwd_inner_microstep: 1541.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 01:00:53,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.95 | bwd_microstep: 1279.31 | bwd_inner_microstep: 1279.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-11 01:00:55,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.99 | bwd_microstep: 1474.78 | bwd_inner_microstep: 1474.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2444
[2024-06-11 01:00:57,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 949.13 | bwd_inner_microstep: 949.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 01:00:59,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.95 | bwd_microstep: 1383.15 | bwd_inner_microstep: 1383.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 01:01:00,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1252.89 | bwd_inner_microstep: 1252.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3487
[2024-06-11 01:01:02,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.81 | bwd_microstep: 1437.14 | bwd_inner_microstep: 1437.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1941
[2024-06-11 01:01:04,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.67 | bwd_microstep: 889.43 | bwd_inner_microstep: 889.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-11 01:01:05,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.01 | bwd_microstep: 1346.01 | bwd_inner_microstep: 1345.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 01:01:07,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1382.18 | bwd_inner_microstep: 1381.98 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3661
[2024-06-11 01:01:10,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.83 | bwd_microstep: 1715.10 | bwd_inner_microstep: 1715.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-11 01:01:12,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1345.96 | bwd_inner_microstep: 1345.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1943
[2024-06-11 01:01:13,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.76 | bwd_microstep: 729.87 | bwd_inner_microstep: 729.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-11 01:01:14,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.80 | bwd_microstep: 1163.00 | bwd_inner_microstep: 1162.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 619
[2024-06-11 01:01:15,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.22 | bwd_microstep: 261.02 | bwd_inner_microstep: 260.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 01:01:17,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1557.89 | bwd_inner_microstep: 1557.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-11 01:01:19,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.28 | bwd_microstep: 1560.57 | bwd_inner_microstep: 1560.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2118
[2024-06-11 01:01:20,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.98 | bwd_microstep: 926.47 | bwd_inner_microstep: 926.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-11 01:01:22,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1294.86 | bwd_inner_microstep: 1294.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-11 01:01:23,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.71 | bwd_microstep: 801.85 | bwd_inner_microstep: 801.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 01:01:25,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.31 | bwd_microstep: 1557.69 | bwd_inner_microstep: 1557.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-11 01:01:27,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.00 | bwd_microstep: 1459.74 | bwd_inner_microstep: 1459.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3551
[2024-06-11 01:01:29,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1587.18 | bwd_inner_microstep: 1587.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 11, images per sample: 2.75, dynamic token length: 1535
[2024-06-11 01:01:30,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 232.69 | bwd_microstep: 612.84 | bwd_inner_microstep: 612.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-11 01:01:33,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.64 | bwd_microstep: 1620.19 | bwd_inner_microstep: 1620.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3552
[2024-06-11 01:01:34,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1426.91 | bwd_inner_microstep: 1426.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3592
[2024-06-11 01:01:37,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.05 | bwd_microstep: 1650.80 | bwd_inner_microstep: 1650.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-11 01:01:39,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.34 | bwd_microstep: 1499.94 | bwd_inner_microstep: 1499.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-11 01:01:41,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1400.46 | bwd_inner_microstep: 1400.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-11 01:01:46,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-11 01:01:46,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.48 | bwd_microstep: 4826.51 | bwd_inner_microstep: 1534.74 | bwd_allreduce_microstep: 3291.71 | step_microstep: 39.26
[2024-06-11 01:01:46,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15330.18 | bwd: 44407.06 | bwd_inner: 41114.19 | bwd_allreduce: 3292.07 | step: 40.96
{'loss': 1.2128, 'learning_rate': 3.436058210070012e-06, 'epoch': 0.82}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-11 01:01:47,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.64 | bwd_microstep: 788.64 | bwd_inner_microstep: 788.50 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3830
[2024-06-11 01:01:49,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1481.29 | bwd_inner_microstep: 1481.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1979
[2024-06-11 01:01:50,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.73 | bwd_microstep: 858.17 | bwd_inner_microstep: 858.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-11 01:01:52,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.96 | bwd_microstep: 1294.18 | bwd_inner_microstep: 1294.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3793
[2024-06-11 01:01:54,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.18 | bwd_microstep: 1352.04 | bwd_inner_microstep: 1352.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-11 01:01:56,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1295.87 | bwd_inner_microstep: 1295.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-11 01:01:58,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.00 | bwd_microstep: 1428.03 | bwd_inner_microstep: 1428.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3503
[2024-06-11 01:02:00,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.11 | bwd_microstep: 1222.75 | bwd_inner_microstep: 1222.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 01:02:01,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.95 | bwd_microstep: 1288.32 | bwd_inner_microstep: 1288.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 01:02:03,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1481.01 | bwd_inner_microstep: 1480.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 01:02:05,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1354.53 | bwd_inner_microstep: 1354.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-11 01:02:06,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.06 | bwd_microstep: 793.21 | bwd_inner_microstep: 793.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-11 01:02:08,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.47 | bwd_microstep: 1484.37 | bwd_inner_microstep: 1484.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3908
[2024-06-11 01:02:11,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.33 | bwd_microstep: 1662.48 | bwd_inner_microstep: 1662.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-11 01:02:13,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.89 | bwd_microstep: 1286.83 | bwd_inner_microstep: 1286.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3823
[2024-06-11 01:02:15,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.10 | bwd_microstep: 1751.74 | bwd_inner_microstep: 1751.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-11 01:02:17,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.75 | bwd_microstep: 1494.57 | bwd_inner_microstep: 1494.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-11 01:02:19,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.25 | bwd_microstep: 1288.01 | bwd_inner_microstep: 1287.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925
[2024-06-11 01:02:20,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.87 | bwd_microstep: 697.96 | bwd_inner_microstep: 697.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3853
[2024-06-11 01:02:22,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.01 | bwd_microstep: 1668.38 | bwd_inner_microstep: 1668.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3620
[2024-06-11 01:02:24,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.85 | bwd_microstep: 1535.49 | bwd_inner_microstep: 1535.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486
[2024-06-11 01:02:26,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.20 | bwd_microstep: 1315.06 | bwd_inner_microstep: 1315.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3439
[2024-06-11 01:02:28,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1410.33 | bwd_inner_microstep: 1410.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3817
[2024-06-11 01:02:30,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.84 | bwd_microstep: 1507.08 | bwd_inner_microstep: 1507.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-11 01:02:32,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.96 | bwd_microstep: 1504.57 | bwd_inner_microstep: 1504.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-11 01:02:34,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.94 | bwd_microstep: 1655.52 | bwd_inner_microstep: 1655.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-11 01:02:36,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.36 | bwd_microstep: 1310.19 | bwd_inner_microstep: 1310.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-11 01:02:38,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.36 | bwd_microstep: 1300.53 | bwd_inner_microstep: 1300.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 01:02:40,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.86 | bwd_microstep: 1397.91 | bwd_inner_microstep: 1397.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-11 01:02:42,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.88 | bwd_microstep: 1406.84 | bwd_inner_microstep: 1406.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809
[2024-06-11 01:02:44,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.09 | bwd_microstep: 1752.85 | bwd_inner_microstep: 1752.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-11 01:02:49,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.13 | optimizer_step: 6.61
[2024-06-11 01:02:49,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.84 | bwd_microstep: 3833.66 | bwd_inner_microstep: 1764.67 | bwd_allreduce_microstep: 2068.94 | step_microstep: 39.24
[2024-06-11 01:02:49,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16334.94 | bwd: 45902.42 | bwd_inner: 43832.47 | bwd_allreduce: 2069.22 | step: 40.84
{'loss': 1.1339, 'learning_rate': 3.4150519718608744e-06, 'epoch': 0.82}
 81%|████████▏ | 1406/1726 [24:21:20<5:45:15, 64.74s/it]
 82%|████████▏ | 1407/1726 [24:22:22<5:39:04, 63.78s/it]
 82%|████████▏ | 1408/1726 [24:23:23<5:33:21, 62.90s/it]
 82%|████████▏ | 1409/1726 [24:24:23<5:27:51, 62.05s/it]
 82%|████████▏ | 1410/1726 [24:25:25<5:27:39, 62.21s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030
[2024-06-11 01:02:50,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.63 | bwd_microstep: 905.53 | bwd_inner_microstep: 905.46 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-11 01:02:51,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 799.69 | bwd_inner_microstep: 799.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 01:02:53,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1482.61 | bwd_inner_microstep: 1482.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-11 01:02:55,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.31 | bwd_microstep: 1645.98 | bwd_inner_microstep: 1645.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1885
[2024-06-11 01:02:56,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.88 | bwd_microstep: 715.93 | bwd_inner_microstep: 715.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2467
[2024-06-11 01:02:58,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.28 | bwd_microstep: 858.38 | bwd_inner_microstep: 858.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-11 01:02:59,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.02 | bwd_microstep: 1303.38 | bwd_inner_microstep: 1303.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-11 01:03:01,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1389.38 | bwd_inner_microstep: 1389.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3491
[2024-06-11 01:03:03,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1346.37 | bwd_inner_microstep: 1346.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-11 01:03:05,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.28 | bwd_microstep: 1499.39 | bwd_inner_microstep: 1499.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-11 01:03:07,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.86 | bwd_microstep: 1483.30 | bwd_inner_microstep: 1483.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 01:03:09,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1252.23 | bwd_inner_microstep: 1252.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 01:03:11,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.38 | bwd_microstep: 1379.63 | bwd_inner_microstep: 1379.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-11 01:03:13,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 1483.95 | bwd_inner_microstep: 1483.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002
[2024-06-11 01:03:14,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.09 | bwd_microstep: 901.16 | bwd_inner_microstep: 901.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-11 01:03:16,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.21 | bwd_microstep: 1414.30 | bwd_inner_microstep: 1414.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644
[2024-06-11 01:03:18,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.22 | bwd_microstep: 1447.34 | bwd_inner_microstep: 1447.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3439
[2024-06-11 01:03:20,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.36 | bwd_microstep: 1216.13 | bwd_inner_microstep: 1216.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 01:03:22,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1279.59 | bwd_inner_microstep: 1279.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 01:03:24,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1384.68 | bwd_inner_microstep: 1384.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2090
[2024-06-11 01:03:25,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.80 | bwd_microstep: 731.46 | bwd_inner_microstep: 731.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2185
[2024-06-11 01:03:26,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.83 | bwd_microstep: 922.93 | bwd_inner_microstep: 922.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-11 01:03:28,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.70 | bwd_microstep: 1220.04 | bwd_inner_microstep: 1220.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-11 01:03:30,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1562.17 | bwd_inner_microstep: 1562.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 01:03:32,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1403.05 | bwd_inner_microstep: 1403.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2078
[2024-06-11 01:03:33,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.23 | bwd_microstep: 727.65 | bwd_inner_microstep: 727.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-11 01:03:35,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1415.28 | bwd_inner_microstep: 1415.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3574
[2024-06-11 01:03:37,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.70 | bwd_microstep: 1425.33 | bwd_inner_microstep: 1425.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461
[2024-06-11 01:03:38,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.64 | bwd_microstep: 1313.18 | bwd_inner_microstep: 1313.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584
[2024-06-11 01:03:40,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.69 | bwd_microstep: 1537.95 | bwd_inner_microstep: 1537.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-11 01:03:43,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.67 | bwd_microstep: 1470.92 | bwd_inner_microstep: 1470.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-11 01:03:52,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.28 | optimizer_step: 6.62
[2024-06-11 01:03:52,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 8464.59 | bwd_inner_microstep: 1516.64 | bwd_allreduce_microstep: 6947.89 | step_microstep: 39.31
[2024-06-11 01:03:52,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15106.72 | bwd: 47383.53 | bwd_inner: 40434.68 | bwd_allreduce: 6948.15 | step: 40.92
{'loss': 1.1595, 'learning_rate': 3.3941041457524748e-06, 'epoch': 0.82}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3515
[2024-06-11 01:03:53,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.42 | bwd_microstep: 1340.70 | bwd_inner_microstep: 1340.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2406
[2024-06-11 01:03:55,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.77 | bwd_microstep: 998.44 | bwd_inner_microstep: 998.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3924
[2024-06-11 01:03:57,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.30 | bwd_microstep: 1686.15 | bwd_inner_microstep: 1686.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-11 01:03:59,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.06 | bwd_microstep: 1646.25 | bwd_inner_microstep: 1646.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-11 01:04:02,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.59 | bwd_microstep: 1651.76 | bwd_inner_microstep: 1651.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2551
[2024-06-11 01:04:03,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.61 | bwd_microstep: 1031.29 | bwd_inner_microstep: 1031.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-11 01:04:05,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.52 | bwd_microstep: 1291.42 | bwd_inner_microstep: 1291.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-11 01:04:07,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.86 | bwd_microstep: 1530.49 | bwd_inner_microstep: 1530.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 01:04:09,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.33 | bwd_microstep: 1382.28 | bwd_inner_microstep: 1382.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 01:04:11,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.22 | bwd_microstep: 1246.90 | bwd_inner_microstep: 1246.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 01:04:12,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.53 | bwd_microstep: 1287.37 | bwd_inner_microstep: 1287.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2128
[2024-06-11 01:04:14,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.49 | bwd_microstep: 892.31 | bwd_inner_microstep: 892.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3725
[2024-06-11 01:04:16,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1591.94 | bwd_inner_microstep: 1591.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493
[2024-06-11 01:04:18,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.80 | bwd_microstep: 1409.93 | bwd_inner_microstep: 1409.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-11 01:04:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.21 | bwd_microstep: 892.22 | bwd_inner_microstep: 892.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-11 01:04:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1445.59 | bwd_inner_microstep: 1445.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-11 01:04:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.80 | bwd_microstep: 1451.30 | bwd_inner_microstep: 1451.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638
[2024-06-11 01:04:25,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1613.33 | bwd_inner_microstep: 1613.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-11 01:04:27,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.00 | bwd_microstep: 1185.70 | bwd_inner_microstep: 1185.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 01:04:29,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1280.71 | bwd_inner_microstep: 1280.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 01:04:30,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.41 | bwd_microstep: 1282.83 | bwd_inner_microstep: 1282.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1983
[2024-06-11 01:04:31,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.82 | bwd_microstep: 705.37 | bwd_inner_microstep: 705.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-11 01:04:33,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.62 | bwd_microstep: 1313.38 | bwd_inner_microstep: 1313.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-11 01:04:35,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.08 | bwd_microstep: 1556.48 | bwd_inner_microstep: 1556.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014
[2024-06-11 01:04:36,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.14 | bwd_microstep: 806.45 | bwd_inner_microstep: 806.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3554
[2024-06-11 01:04:38,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1421.95 | bwd_inner_microstep: 1421.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3562
[2024-06-11 01:04:41,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.35 | bwd_microstep: 1698.81 | bwd_inner_microstep: 1698.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-11 01:04:43,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.50 | bwd_microstep: 1654.34 | bwd_inner_microstep: 1654.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3539
[2024-06-11 01:04:45,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.63 | bwd_microstep: 1589.94 | bwd_inner_microstep: 1589.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2034
[2024-06-11 01:04:46,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.22 | bwd_microstep: 716.21 | bwd_inner_microstep: 716.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806
[2024-06-11 01:04:48,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.60 | bwd_microstep: 1475.86 | bwd_inner_microstep: 1475.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484
[2024-06-11 01:04:53,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-11 01:04:53,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.72 | bwd_microstep: 3971.47 | bwd_inner_microstep: 1786.97 | bwd_allreduce_microstep: 2184.44 | step_microstep: 39.41
[2024-06-11 01:04:53,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15930.28 | bwd: 45049.21 | bwd_inner: 42863.84 | bwd_allreduce: 2184.67 | step: 40.92
{'loss': 1.1928, 'learning_rate': 3.3732148055229463e-06, 'epoch': 0.82}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-11 01:04:55,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.07 | bwd_microstep: 1339.06 | bwd_inner_microstep: 1338.99 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-11 01:04:56,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1247.29 | bwd_inner_microstep: 1247.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3909
[2024-06-11 01:04:59,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.73 | bwd_microstep: 1487.29 | bwd_inner_microstep: 1487.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 01:05:00,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.40 | bwd_microstep: 1383.46 | bwd_inner_microstep: 1383.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 01:05:02,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.63 | bwd_microstep: 1387.72 | bwd_inner_microstep: 1387.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3807
[2024-06-11 01:05:04,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.91 | bwd_microstep: 1480.81 | bwd_inner_microstep: 1480.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-11 01:05:06,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.56 | bwd_microstep: 1184.26 | bwd_inner_microstep: 1184.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-11 01:05:08,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-11 01:05:10,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1398.90 | bwd_inner_microstep: 1398.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3409
[2024-06-11 01:05:11,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.45 | bwd_microstep: 1211.46 | bwd_inner_microstep: 1211.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3446
[2024-06-11 01:05:13,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.41 | bwd_microstep: 1240.40 | bwd_inner_microstep: 1240.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505
[2024-06-11 01:05:15,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1582.35 | bwd_inner_microstep: 1582.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3675
[2024-06-11 01:05:18,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.57 | bwd_microstep: 1685.67 | bwd_inner_microstep: 1685.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2415
[2024-06-11 01:05:19,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.37 | bwd_microstep: 823.66 | bwd_inner_microstep: 823.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3640
[2024-06-11 01:05:21,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.47 | bwd_microstep: 1761.47 | bwd_inner_microstep: 1761.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 01:05:23,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.05 | bwd_microstep: 1338.04 | bwd_inner_microstep: 1338.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3675
[2024-06-11 01:05:25,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.02 | bwd_microstep: 1404.32 | bwd_inner_microstep: 1404.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1917
[2024-06-11 01:05:26,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.97 | bwd_microstep: 688.41 | bwd_inner_microstep: 688.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438
[2024-06-11 01:05:28,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.36 | bwd_microstep: 1283.20 | bwd_inner_microstep: 1283.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-11 01:05:30,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1399.53 | bwd_inner_microstep: 1399.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3832
[2024-06-11 01:05:32,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.88 | bwd_microstep: 1689.12 | bwd_inner_microstep: 1689.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 01:05:34,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.39 | bwd_microstep: 1493.38 | bwd_inner_microstep: 1493.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-11 01:05:36,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1257.98 | bwd_inner_microstep: 1257.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-11 01:05:38,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.27 | bwd_microstep: 1389.20 | bwd_inner_microstep: 1389.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-11 01:05:40,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.66 | bwd_microstep: 1460.17 | bwd_inner_microstep: 1460.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-11 01:05:42,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.10 | bwd_microstep: 1298.97 | bwd_inner_microstep: 1298.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 01:05:43,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.88 | bwd_microstep: 1352.39 | bwd_inner_microstep: 1352.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 01:05:45,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.35 | bwd_microstep: 1375.48 | bwd_inner_microstep: 1375.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585
[2024-06-11 01:05:47,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.19 | bwd_microstep: 1339.48 | bwd_inner_microstep: 1339.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-11 01:05:49,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1406.12 | bwd_inner_microstep: 1406.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046
[2024-06-11 01:05:50,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.64 | bwd_microstep: 906.38 | bwd_inner_microstep: 906.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3691
[2024-06-11 01:05:55,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.13 | optimizer_step: 6.59
[2024-06-11 01:05:55,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.28 | bwd_microstep: 3646.50 | bwd_inner_microstep: 2067.06 | bwd_allreduce_microstep: 1579.39 | step_microstep: 39.00
[2024-06-11 01:05:55,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16243.45 | bwd: 45227.00 | bwd_inner: 43646.64 | bwd_allreduce: 1579.64 | step: 40.63
{'loss': 1.1653, 'learning_rate': 3.3523840247444394e-06, 'epoch': 0.82}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2630
[2024-06-11 01:05:56,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.39 | bwd_microstep: 1058.18 | bwd_inner_microstep: 1058.04 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3921
[2024-06-11 01:05:58,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.56 | bwd_microstep: 1551.98 | bwd_inner_microstep: 1551.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 01:06:00,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1340.27 | bwd_inner_microstep: 1340.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-11 01:06:02,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.11 | bwd_microstep: 1149.44 | bwd_inner_microstep: 1149.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-11 01:06:03,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1246.66 | bwd_inner_microstep: 1246.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-11 01:06:06,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.05 | bwd_microstep: 1532.02 | bwd_inner_microstep: 1532.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-11 01:06:07,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1253.20 | bwd_inner_microstep: 1253.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-11 01:06:08,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.92 | bwd_microstep: 800.44 | bwd_inner_microstep: 800.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 01:06:10,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.93 | bwd_microstep: 1387.24 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-11 01:06:12,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1416.23 | bwd_inner_microstep: 1416.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-11 01:06:14,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.30 | bwd_microstep: 1531.96 | bwd_inner_microstep: 1531.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3517
[2024-06-11 01:06:16,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1349.47 | bwd_inner_microstep: 1349.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3459
[2024-06-11 01:06:18,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1404.35 | bwd_inner_microstep: 1404.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-11 01:06:19,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 798.28 | bwd_inner_microstep: 798.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-11 01:06:21,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1519.33 | bwd_inner_microstep: 1519.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2074
[2024-06-11 01:06:23,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.43 | bwd_microstep: 915.28 | bwd_inner_microstep: 915.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3634
[2024-06-11 01:06:25,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.76 | bwd_microstep: 1707.45 | bwd_inner_microstep: 1707.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3637
[2024-06-11 01:06:27,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.91 | bwd_microstep: 1709.38 | bwd_inner_microstep: 1709.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-11 01:06:29,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.46 | bwd_microstep: 1162.35 | bwd_inner_microstep: 1162.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-11 01:06:31,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1417.67 | bwd_inner_microstep: 1417.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-11 01:06:33,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.94 | bwd_microstep: 1406.41 | bwd_inner_microstep: 1406.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822
[2024-06-11 01:06:35,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.04 | bwd_microstep: 1522.57 | bwd_inner_microstep: 1522.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490
[2024-06-11 01:06:37,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.65 | bwd_microstep: 1349.56 | bwd_inner_microstep: 1349.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-11 01:06:39,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1248.62 | bwd_inner_microstep: 1248.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3715
[2024-06-11 01:06:40,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1367.78 | bwd_inner_microstep: 1367.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-11 01:06:43,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.47 | bwd_microstep: 1498.38 | bwd_inner_microstep: 1498.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-11 01:06:45,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.22 | bwd_microstep: 1558.71 | bwd_inner_microstep: 1558.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-11 01:06:47,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.09 | bwd_microstep: 1508.23 | bwd_inner_microstep: 1508.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813
[2024-06-11 01:06:49,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.10 | bwd_microstep: 1754.88 | bwd_inner_microstep: 1754.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-11 01:06:51,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.28 | bwd_microstep: 1594.57 | bwd_inner_microstep: 1594.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-11 01:06:54,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.74 | bwd_microstep: 1650.37 | bwd_inner_microstep: 1650.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3813
[2024-06-11 01:07:00,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.13 | optimizer_step: 6.61
[2024-06-11 01:07:00,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.99 | bwd_microstep: 5370.73 | bwd_inner_microstep: 1915.27 | bwd_allreduce_microstep: 3455.40 | step_microstep: 38.93
[2024-06-11 01:07:00,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16564.40 | bwd: 48082.03 | bwd_inner: 44625.60 | bwd_allreduce: 3455.68 | step: 40.72
{'loss': 1.163, 'learning_rate': 3.3316118767828498e-06, 'epoch': 0.82}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498
[2024-06-11 01:07:02,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.19 | bwd_microstep: 1406.37 | bwd_inner_microstep: 1406.28 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-11 01:07:04,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.34 | bwd_microstep: 1475.19 | bwd_inner_microstep: 1475.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3803
[2024-06-11 01:07:06,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.14 | bwd_microstep: 1617.69 | bwd_inner_microstep: 1617.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2427
[2024-06-11 01:07:07,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.54 | bwd_microstep: 908.80 | bwd_inner_microstep: 908.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-11 01:07:09,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897
[2024-06-11 01:07:10,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.12 | bwd_microstep: 685.01 | bwd_inner_microstep: 684.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891
[2024-06-11 01:07:11,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.64 | bwd_microstep: 713.90 | bwd_inner_microstep: 713.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 01:07:13,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1386.92 | bwd_inner_microstep: 1386.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-11 01:07:15,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1315.65 | bwd_inner_microstep: 1315.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3410
[2024-06-11 01:07:16,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.77 | bwd_microstep: 1307.24 | bwd_inner_microstep: 1307.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-11 01:07:19,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.35 | bwd_microstep: 1628.71 | bwd_inner_microstep: 1628.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-11 01:07:21,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.21 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497
[2024-06-11 01:07:23,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1587.54 | bwd_inner_microstep: 1587.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 01:07:25,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.75 | bwd_microstep: 1281.80 | bwd_inner_microstep: 1281.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3074
[2024-06-11 01:07:26,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.71 | bwd_microstep: 1242.35 | bwd_inner_microstep: 1242.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-11 01:07:28,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.78 | bwd_microstep: 974.11 | bwd_inner_microstep: 974.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-11 01:07:30,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.22 | bwd_microstep: 1494.42 | bwd_inner_microstep: 1494.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-11 01:07:32,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.09 | bwd_microstep: 1634.90 | bwd_inner_microstep: 1634.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1876
[2024-06-11 01:07:33,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.10 | bwd_microstep: 710.00 | bwd_inner_microstep: 709.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522
[2024-06-11 01:07:35,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.39 | bwd_microstep: 1199.21 | bwd_inner_microstep: 1199.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-11 01:07:37,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.58 | bwd_microstep: 1600.53 | bwd_inner_microstep: 1600.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-11 01:07:39,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1553.87 | bwd_inner_microstep: 1553.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 01:07:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.25 | bwd_microstep: 1656.18 | bwd_inner_microstep: 1656.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-11 01:07:43,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.33 | bwd_microstep: 1451.95 | bwd_inner_microstep: 1451.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-11 01:07:45,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1355.03 | bwd_inner_microstep: 1355.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 01:07:47,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1509.98 | bwd_inner_microstep: 1509.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-11 01:07:49,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.43 | bwd_microstep: 1455.12 | bwd_inner_microstep: 1455.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3678
[2024-06-11 01:07:52,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.69 | bwd_microstep: 1693.20 | bwd_inner_microstep: 1693.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834
[2024-06-11 01:07:54,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.30 | bwd_microstep: 1754.78 | bwd_inner_microstep: 1754.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-11 01:07:56,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.22 | bwd_microstep: 1302.24 | bwd_inner_microstep: 1302.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933
[2024-06-11 01:07:57,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.31 | bwd_microstep: 822.47 | bwd_inner_microstep: 822.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3584
[2024-06-11 01:08:01,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.13 | optimizer_step: 6.58
[2024-06-11 01:08:01,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 3682.57 | bwd_inner_microstep: 1643.87 | bwd_allreduce_microstep: 2038.64 | step_microstep: 38.90
[2024-06-11 01:08:01,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16015.59 | bwd: 45143.84 | bwd_inner: 43104.21 | bwd_allreduce: 2038.92 | step: 40.43
{'loss': 1.1865, 'learning_rate': 3.310898434797585e-06, 'epoch': 0.82}
 82%|████████▏ | 1411/1726 [24:26:28<5:27:35, 62.40s/it]
 82%|████████▏ | 1412/1726 [24:27:30<5:24:51, 62.08s/it]
 82%|████████▏ | 1413/1726 [24:28:31<5:23:26, 62.00s/it]
 82%|████████▏ | 1414/1726 [24:29:36<5:27:05, 62.90s/it]
 82%|████████▏ | 1415/1726 [24:30:38<5:23:52, 62.48s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 01:08:03,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.46 | bwd_microstep: 1379.31 | bwd_inner_microstep: 1379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-11 01:08:04,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.08 | bwd_microstep: 791.64 | bwd_inner_microstep: 791.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-11 01:08:06,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.56 | bwd_microstep: 1320.02 | bwd_inner_microstep: 1319.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 01:08:08,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.20 | bwd_microstep: 1383.92 | bwd_inner_microstep: 1383.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 01:08:10,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 01:08:12,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.46 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972
[2024-06-11 01:08:13,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.17 | bwd_microstep: 797.77 | bwd_inner_microstep: 797.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 01:08:15,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.49 | bwd_microstep: 1387.15 | bwd_inner_microstep: 1387.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3435
[2024-06-11 01:08:16,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.04 | bwd_microstep: 1313.57 | bwd_inner_microstep: 1313.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660
[2024-06-11 01:08:19,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.92 | bwd_microstep: 1615.38 | bwd_inner_microstep: 1615.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3414
[2024-06-11 01:08:21,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.41 | bwd_microstep: 1374.27 | bwd_inner_microstep: 1374.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409
[2024-06-11 01:08:22,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.08 | bwd_microstep: 1308.90 | bwd_inner_microstep: 1308.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-11 01:08:24,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.00 | bwd_microstep: 887.93 | bwd_inner_microstep: 887.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-11 01:08:26,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1601.48 | bwd_inner_microstep: 1601.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2137
[2024-06-11 01:08:27,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.41 | bwd_microstep: 832.67 | bwd_inner_microstep: 832.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 01:08:29,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1258.12 | bwd_inner_microstep: 1258.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-11 01:08:31,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.81 | bwd_microstep: 1391.60 | bwd_inner_microstep: 1391.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821
[2024-06-11 01:08:33,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.63 | bwd_microstep: 1584.10 | bwd_inner_microstep: 1584.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3824
[2024-06-11 01:08:35,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.91 | bwd_microstep: 1482.93 | bwd_inner_microstep: 1482.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-11 01:08:37,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1380.36 | bwd_inner_microstep: 1380.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276
[2024-06-11 01:08:38,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.29 | bwd_microstep: 879.36 | bwd_inner_microstep: 879.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-11 01:08:40,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1404.47 | bwd_inner_microstep: 1404.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2282
[2024-06-11 01:08:41,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.89 | bwd_microstep: 784.57 | bwd_inner_microstep: 784.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3833
[2024-06-11 01:08:44,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.13 | bwd_microstep: 1859.10 | bwd_inner_microstep: 1859.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-11 01:08:46,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.02 | bwd_microstep: 1625.31 | bwd_inner_microstep: 1625.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-11 01:08:48,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1510.13 | bwd_inner_microstep: 1510.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-11 01:08:51,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 2511.73 | bwd_inner_microstep: 2511.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686
[2024-06-11 01:08:53,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.98 | bwd_microstep: 1527.44 | bwd_inner_microstep: 1527.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-11 01:08:55,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1604.23 | bwd_inner_microstep: 1604.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 01:08:57,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.46 | bwd_microstep: 1277.34 | bwd_inner_microstep: 1277.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777
[2024-06-11 01:08:59,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.95 | bwd_microstep: 1652.37 | bwd_inner_microstep: 1652.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 01:09:02,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.05 | optimizer_step: 6.60
[2024-06-11 01:09:02,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1797.45 | bwd_inner_microstep: 1480.95 | bwd_allreduce_microstep: 316.45 | step_microstep: 37.84
[2024-06-11 01:09:02,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15949.54 | bwd: 44159.44 | bwd_inner: 43842.08 | bwd_allreduce: 316.68 | step: 39.59
{'loss': 1.1516, 'learning_rate': 3.290243771741275e-06, 'epoch': 0.82}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3403
[2024-06-11 01:09:03,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.65 | bwd_microstep: 1301.52 | bwd_inner_microstep: 1301.34 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.18
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3931
[2024-06-11 01:09:06,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1496.69 | bwd_inner_microstep: 1496.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2981
[2024-06-11 01:09:07,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.80 | bwd_microstep: 1050.34 | bwd_inner_microstep: 1050.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 01:09:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.21 | bwd_microstep: 1346.47 | bwd_inner_microstep: 1346.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-11 01:09:11,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.10 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626
[2024-06-11 01:09:12,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.88 | bwd_microstep: 1311.95 | bwd_inner_microstep: 1311.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-11 01:09:15,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.58 | bwd_microstep: 1633.26 | bwd_inner_microstep: 1633.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-11 01:09:16,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.22 | bwd_microstep: 1280.62 | bwd_inner_microstep: 1280.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-11 01:09:18,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1281.57 | bwd_inner_microstep: 1281.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3500
[2024-06-11 01:09:20,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1416.48 | bwd_inner_microstep: 1416.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1950
[2024-06-11 01:09:21,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.79 | bwd_microstep: 850.41 | bwd_inner_microstep: 850.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2152
[2024-06-11 01:09:23,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.75 | bwd_microstep: 909.48 | bwd_inner_microstep: 909.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3413
[2024-06-11 01:09:24,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.94 | bwd_microstep: 1211.62 | bwd_inner_microstep: 1211.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 01:09:26,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1477.52 | bwd_inner_microstep: 1477.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3530
[2024-06-11 01:09:28,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.68 | bwd_microstep: 1451.63 | bwd_inner_microstep: 1451.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-11 01:09:31,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.83 | bwd_microstep: 1717.30 | bwd_inner_microstep: 1717.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931
[2024-06-11 01:09:32,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.58 | bwd_microstep: 697.48 | bwd_inner_microstep: 697.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 01:09:34,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.09 | bwd_microstep: 1392.88 | bwd_inner_microstep: 1392.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-11 01:09:35,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.97 | bwd_microstep: 808.62 | bwd_inner_microstep: 808.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701
[2024-06-11 01:09:37,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1530.06 | bwd_inner_microstep: 1530.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-11 01:09:39,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.75 | bwd_microstep: 1598.86 | bwd_inner_microstep: 1598.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1996
[2024-06-11 01:09:40,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.40 | bwd_microstep: 859.24 | bwd_inner_microstep: 859.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546
[2024-06-11 01:09:42,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.11 | bwd_microstep: 1592.13 | bwd_inner_microstep: 1592.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 01:09:45,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.06 | bwd_microstep: 1545.76 | bwd_inner_microstep: 1545.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 01:09:47,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1507.52 | bwd_inner_microstep: 1507.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3684
[2024-06-11 01:09:48,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.28 | bwd_microstep: 1328.60 | bwd_inner_microstep: 1328.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-11 01:09:50,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.04 | bwd_microstep: 1253.40 | bwd_inner_microstep: 1253.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-11 01:09:52,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.60 | bwd_microstep: 1280.39 | bwd_inner_microstep: 1280.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3769
[2024-06-11 01:09:54,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.02 | bwd_microstep: 1569.53 | bwd_inner_microstep: 1569.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 01:09:56,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1350.05 | bwd_inner_microstep: 1350.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-11 01:09:58,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.48 | bwd_microstep: 1534.82 | bwd_inner_microstep: 1534.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3591
[2024-06-11 01:10:03,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-11 01:10:03,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.42 | bwd_microstep: 4083.39 | bwd_inner_microstep: 1445.98 | bwd_allreduce_microstep: 2637.35 | step_microstep: 38.27
[2024-06-11 01:10:03,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15770.56 | bwd: 44956.11 | bwd_inner: 42317.72 | bwd_allreduce: 2637.66 | step: 39.92
{'loss': 1.1757, 'learning_rate': 3.269647960359532e-06, 'epoch': 0.82}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 01:10:05,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1363.09 | bwd_inner_microstep: 1363.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2440
[2024-06-11 01:10:06,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.51 | bwd_microstep: 1044.44 | bwd_inner_microstep: 1044.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-11 01:10:08,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.39 | bwd_microstep: 1648.77 | bwd_inner_microstep: 1648.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505
[2024-06-11 01:10:10,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.56 | bwd_microstep: 1316.15 | bwd_inner_microstep: 1316.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 01:10:12,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.85 | bwd_microstep: 1385.51 | bwd_inner_microstep: 1385.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788
[2024-06-11 01:10:14,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.64 | bwd_microstep: 1450.63 | bwd_inner_microstep: 1450.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-11 01:10:15,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.84 | bwd_microstep: 701.83 | bwd_inner_microstep: 701.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-11 01:10:17,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.55 | bwd_microstep: 1278.15 | bwd_inner_microstep: 1278.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-11 01:10:18,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.59 | bwd_microstep: 790.85 | bwd_inner_microstep: 790.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2074
[2024-06-11 01:10:19,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.12 | bwd_microstep: 818.92 | bwd_inner_microstep: 818.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3759
[2024-06-11 01:10:21,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.83 | bwd_microstep: 1469.67 | bwd_inner_microstep: 1469.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1923
[2024-06-11 01:10:22,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.20 | bwd_microstep: 771.87 | bwd_inner_microstep: 771.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 01:10:24,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1253.28 | bwd_inner_microstep: 1253.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457
[2024-06-11 01:10:26,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.89 | bwd_microstep: 1343.05 | bwd_inner_microstep: 1343.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-11 01:10:28,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1390.72 | bwd_inner_microstep: 1390.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-11 01:10:30,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1478.77 | bwd_inner_microstep: 1478.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3444
[2024-06-11 01:10:32,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.76 | bwd_microstep: 1478.72 | bwd_inner_microstep: 1478.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914
[2024-06-11 01:10:33,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.02 | bwd_microstep: 687.31 | bwd_inner_microstep: 687.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3547
[2024-06-11 01:10:34,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1229.34 | bwd_inner_microstep: 1229.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-11 01:10:37,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.26 | bwd_microstep: 1547.47 | bwd_inner_microstep: 1547.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-11 01:10:39,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.19 | bwd_microstep: 1648.17 | bwd_inner_microstep: 1648.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-11 01:10:41,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1489.61 | bwd_inner_microstep: 1489.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2084
[2024-06-11 01:10:42,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.54 | bwd_microstep: 918.08 | bwd_inner_microstep: 918.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3556
[2024-06-11 01:10:44,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.11 | bwd_microstep: 1328.04 | bwd_inner_microstep: 1328.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-11 01:10:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1436.04 | bwd_inner_microstep: 1436.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 01:10:48,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1283.97 | bwd_inner_microstep: 1283.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-11 01:10:50,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1499.51 | bwd_inner_microstep: 1499.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 01:10:52,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1555.91 | bwd_inner_microstep: 1555.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-11 01:10:54,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.18 | bwd_microstep: 1423.54 | bwd_inner_microstep: 1423.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 01:10:56,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.40 | bwd_microstep: 1401.49 | bwd_inner_microstep: 1401.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-11 01:10:58,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1504.08 | bwd_inner_microstep: 1504.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-11 01:11:03,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 01:11:03,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.29 | bwd_microstep: 4904.47 | bwd_inner_microstep: 1695.04 | bwd_allreduce_microstep: 3209.38 | step_microstep: 37.81
[2024-06-11 01:11:03,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15554.74 | bwd: 44841.48 | bwd_inner: 41631.20 | bwd_allreduce: 3209.61 | step: 39.26
{'loss': 1.1747, 'learning_rate': 3.2491110731906982e-06, 'epoch': 0.82}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3387
[2024-06-11 01:11:05,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.84 | bwd_microstep: 1326.65 | bwd_inner_microstep: 1326.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 01:11:07,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.88 | bwd_microstep: 1471.71 | bwd_inner_microstep: 1471.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2327
[2024-06-11 01:11:09,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.51 | bwd_microstep: 981.03 | bwd_inner_microstep: 981.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 01:11:10,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755
[2024-06-11 01:11:12,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1339.00 | bwd_inner_microstep: 1338.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-11 01:11:14,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.19 | bwd_microstep: 1279.51 | bwd_inner_microstep: 1279.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2646
[2024-06-11 01:11:15,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.53 | bwd_microstep: 1019.42 | bwd_inner_microstep: 1019.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052
[2024-06-11 01:11:17,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.61 | bwd_microstep: 814.20 | bwd_inner_microstep: 814.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 01:11:19,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1981
[2024-06-11 01:11:20,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.53 | bwd_microstep: 855.57 | bwd_inner_microstep: 855.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2147
[2024-06-11 01:11:21,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.19 | bwd_microstep: 1039.67 | bwd_inner_microstep: 1039.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-11 01:11:23,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.72 | bwd_microstep: 1348.38 | bwd_inner_microstep: 1348.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 01:11:25,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.37 | bwd_microstep: 1245.01 | bwd_inner_microstep: 1244.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 01:11:27,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1376.06 | bwd_inner_microstep: 1376.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-11 01:11:29,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.85 | bwd_microstep: 1556.81 | bwd_inner_microstep: 1556.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3606
[2024-06-11 01:11:31,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.02 | bwd_microstep: 1537.98 | bwd_inner_microstep: 1537.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3634
[2024-06-11 01:11:33,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1343.03 | bwd_inner_microstep: 1343.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3664
[2024-06-11 01:11:35,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.63 | bwd_microstep: 1323.98 | bwd_inner_microstep: 1323.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619
[2024-06-11 01:11:36,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.33 | bwd_microstep: 1310.07 | bwd_inner_microstep: 1310.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 01:11:38,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1385.72 | bwd_inner_microstep: 1385.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3506
[2024-06-11 01:11:40,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1316.39 | bwd_inner_microstep: 1316.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3576
[2024-06-11 01:11:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.88 | bwd_microstep: 1251.82 | bwd_inner_microstep: 1251.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2225
[2024-06-11 01:11:43,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.27 | bwd_microstep: 923.83 | bwd_inner_microstep: 923.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 01:11:45,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.52 | bwd_microstep: 1282.54 | bwd_inner_microstep: 1282.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2190
[2024-06-11 01:11:46,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.15 | bwd_microstep: 858.37 | bwd_inner_microstep: 858.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606
[2024-06-11 01:11:48,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.87 | bwd_microstep: 1308.23 | bwd_inner_microstep: 1308.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2286
[2024-06-11 01:11:49,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.88 | bwd_microstep: 937.26 | bwd_inner_microstep: 937.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-11 01:11:51,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.26 | bwd_microstep: 1642.80 | bwd_inner_microstep: 1642.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-11 01:11:53,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.90 | bwd_microstep: 976.66 | bwd_inner_microstep: 976.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 01:11:55,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1556.42 | bwd_inner_microstep: 1556.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046
[2024-06-11 01:11:56,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.37 | bwd_microstep: 902.51 | bwd_inner_microstep: 902.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379
[2024-06-11 01:12:02,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-11 01:12:02,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.00 | bwd_microstep: 5488.49 | bwd_inner_microstep: 1626.60 | bwd_allreduce_microstep: 3861.84 | step_microstep: 38.90
[2024-06-11 01:12:02,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14855.09 | bwd: 43640.48 | bwd_inner: 39777.73 | bwd_allreduce: 3862.07 | step: 40.37
{'loss': 1.2013, 'learning_rate': 3.2286331825655882e-06, 'epoch': 0.82}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 01:12:04,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1365.12 | bwd_inner_microstep: 1365.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4044
[2024-06-11 01:12:06,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.92 | bwd_microstep: 1649.93 | bwd_inner_microstep: 1649.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 01:12:08,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1242.65 | bwd_inner_microstep: 1242.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 01:12:10,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1377.85 | bwd_inner_microstep: 1377.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 01:12:12,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756
[2024-06-11 01:12:14,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 1442.46 | bwd_inner_microstep: 1442.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3398
[2024-06-11 01:12:16,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.07 | bwd_microstep: 1178.70 | bwd_inner_microstep: 1178.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-11 01:12:18,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1389.49 | bwd_inner_microstep: 1389.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3713
[2024-06-11 01:12:20,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.44 | bwd_microstep: 1459.35 | bwd_inner_microstep: 1459.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 01:12:22,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.54 | bwd_microstep: 1474.41 | bwd_inner_microstep: 1474.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 01:12:23,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1378.08 | bwd_inner_microstep: 1378.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-11 01:12:26,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1480.88 | bwd_inner_microstep: 1480.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 01:12:27,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1375.22 | bwd_inner_microstep: 1375.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3641
[2024-06-11 01:12:29,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.01 | bwd_microstep: 1435.72 | bwd_inner_microstep: 1435.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-11 01:12:31,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.97 | bwd_microstep: 1450.93 | bwd_inner_microstep: 1450.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 01:12:33,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1398.22 | bwd_inner_microstep: 1398.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2069
[2024-06-11 01:12:34,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.39 | bwd_microstep: 753.80 | bwd_inner_microstep: 753.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100
[2024-06-11 01:12:36,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.98 | bwd_microstep: 921.17 | bwd_inner_microstep: 921.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-11 01:12:38,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.65 | bwd_microstep: 1405.56 | bwd_inner_microstep: 1405.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-11 01:12:40,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1414.87 | bwd_inner_microstep: 1414.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-11 01:12:42,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1415.74 | bwd_inner_microstep: 1415.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-11 01:12:44,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.44 | bwd_microstep: 1451.44 | bwd_inner_microstep: 1451.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3541
[2024-06-11 01:12:46,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1520.77 | bwd_inner_microstep: 1520.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590
[2024-06-11 01:12:47,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.51 | bwd_microstep: 1307.42 | bwd_inner_microstep: 1307.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1997
[2024-06-11 01:12:49,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.25 | bwd_microstep: 833.91 | bwd_inner_microstep: 833.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-11 01:12:51,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.43 | bwd_microstep: 1657.85 | bwd_inner_microstep: 1657.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820
[2024-06-11 01:12:53,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1518.41 | bwd_inner_microstep: 1518.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1911
[2024-06-11 01:12:54,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.65 | bwd_microstep: 718.28 | bwd_inner_microstep: 718.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3674
[2024-06-11 01:12:56,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.33 | bwd_microstep: 1585.58 | bwd_inner_microstep: 1585.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2533
[2024-06-11 01:12:58,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.70 | bwd_microstep: 1091.58 | bwd_inner_microstep: 1091.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3563
[2024-06-11 01:13:00,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.32 | bwd_microstep: 1462.62 | bwd_inner_microstep: 1462.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2237
[2024-06-11 01:13:04,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.02 | optimizer_step: 6.58
[2024-06-11 01:13:04,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.46 | bwd_microstep: 3766.96 | bwd_inner_microstep: 983.79 | bwd_allreduce_microstep: 2783.12 | step_microstep: 37.34
[2024-06-11 01:13:04,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15882.59 | bwd: 45307.01 | bwd_inner: 42523.00 | bwd_allreduce: 2783.35 | step: 38.87
{'loss': 1.2181, 'learning_rate': 3.20821436060722e-06, 'epoch': 0.82}
 1415/1726 [24:30:38<5:23:52, 62.48s/it]
 82%|████████▏ | 1416/1726 [24:31:38<5:19:41, 61.87s/it]
 82%|████████▏ | 1417/1726 [24:32:39<5:17:25, 61.63s/it]
 82%|████████▏ | 1418/1726 [24:33:40<5:14:59, 61.36s/it]
 82%|████████▏ | 1419/1726 [24:34:39<5:10:03, 60.60s/it]
 82%|████████▏ | 1420/1726 [24:35:41<5:10:28, 60.88s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 01:13:06,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.85 | bwd_microstep: 1328.96 | bwd_inner_microstep: 1328.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-11 01:13:07,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.78 | bwd_microstep: 1282.52 | bwd_inner_microstep: 1282.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2351
[2024-06-11 01:13:09,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.20 | bwd_microstep: 985.44 | bwd_inner_microstep: 985.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3821
[2024-06-11 01:13:11,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.45 | bwd_microstep: 1509.93 | bwd_inner_microstep: 1509.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-11 01:13:13,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.50 | bwd_microstep: 1283.77 | bwd_inner_microstep: 1283.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961
[2024-06-11 01:13:14,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.03 | bwd_microstep: 703.60 | bwd_inner_microstep: 703.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269
[2024-06-11 01:13:15,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.51 | bwd_microstep: 969.19 | bwd_inner_microstep: 969.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3719
[2024-06-11 01:13:17,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1363.76 | bwd_inner_microstep: 1363.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-11 01:13:19,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.05 | bwd_microstep: 1251.96 | bwd_inner_microstep: 1251.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3702
[2024-06-11 01:13:21,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.69 | bwd_microstep: 1579.03 | bwd_inner_microstep: 1579.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3416
[2024-06-11 01:13:23,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.99 | bwd_microstep: 1327.81 | bwd_inner_microstep: 1327.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3672
[2024-06-11 01:13:25,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.30 | bwd_microstep: 1449.39 | bwd_inner_microstep: 1449.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-11 01:13:27,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.70 | bwd_microstep: 1487.39 | bwd_inner_microstep: 1487.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523
[2024-06-11 01:13:29,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1590.56 | bwd_inner_microstep: 1590.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 01:13:31,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1490.81 | bwd_inner_microstep: 1490.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910
[2024-06-11 01:13:32,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.08 | bwd_microstep: 778.30 | bwd_inner_microstep: 778.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-11 01:13:34,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.29 | bwd_microstep: 1294.34 | bwd_inner_microstep: 1294.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 807
[2024-06-11 01:13:34,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 122.57 | bwd_microstep: 311.99 | bwd_inner_microstep: 311.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3476
[2024-06-11 01:13:36,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1407.46 | bwd_inner_microstep: 1407.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2105
[2024-06-11 01:13:37,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.76 | bwd_microstep: 822.84 | bwd_inner_microstep: 822.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-11 01:13:39,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.37 | bwd_microstep: 1497.31 | bwd_inner_microstep: 1497.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3443
[2024-06-11 01:13:41,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.68 | bwd_microstep: 1332.90 | bwd_inner_microstep: 1332.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3441
[2024-06-11 01:13:43,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.28 | bwd_microstep: 1300.48 | bwd_inner_microstep: 1300.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-11 01:13:45,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1557.76 | bwd_inner_microstep: 1557.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-11 01:13:47,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1497.76 | bwd_inner_microstep: 1497.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-11 01:13:49,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.04 | bwd_microstep: 1405.41 | bwd_inner_microstep: 1405.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3577
[2024-06-11 01:13:51,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.27 | bwd_microstep: 1557.46 | bwd_inner_microstep: 1557.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3601
[2024-06-11 01:13:54,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.62 | bwd_microstep: 1706.20 | bwd_inner_microstep: 1706.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-11 01:13:55,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1346.50 | bwd_inner_microstep: 1346.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-11 01:13:57,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.43 | bwd_microstep: 1458.76 | bwd_inner_microstep: 1458.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3586
[2024-06-11 01:14:00,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.11 | bwd_microstep: 1805.76 | bwd_inner_microstep: 1805.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-11 01:14:05,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.05 | optimizer_step: 6.59
[2024-06-11 01:14:05,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.31 | bwd_microstep: 4313.18 | bwd_inner_microstep: 1985.21 | bwd_allreduce_microstep: 2327.92 | step_microstep: 37.56
[2024-06-11 01:14:05,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15836.07 | bwd: 44998.53 | bwd_inner: 42669.70 | bwd_allreduce: 2328.14 | step: 38.96
{'loss': 1.156, 'learning_rate': 3.1878546792305908e-06, 'epoch': 0.82}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-11 01:14:06,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.45 | bwd_microstep: 779.97 | bwd_inner_microstep: 779.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3945
[2024-06-11 01:14:08,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.17 | bwd_microstep: 1524.75 | bwd_inner_microstep: 1524.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-11 01:14:10,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.13 | bwd_microstep: 1350.87 | bwd_inner_microstep: 1350.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407
[2024-06-11 01:14:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.94 | bwd_microstep: 1208.47 | bwd_inner_microstep: 1208.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-11 01:14:13,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.21 | bwd_microstep: 787.47 | bwd_inner_microstep: 787.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415
[2024-06-11 01:14:14,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.79 | bwd_microstep: 1149.02 | bwd_inner_microstep: 1148.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-11 01:14:16,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.65 | bwd_microstep: 1427.53 | bwd_inner_microstep: 1427.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2158
[2024-06-11 01:14:18,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 878.42 | bwd_inner_microstep: 878.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2630
[2024-06-11 01:14:19,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.19 | bwd_microstep: 1015.90 | bwd_inner_microstep: 1015.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-11 01:14:21,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.89 | bwd_microstep: 1388.72 | bwd_inner_microstep: 1388.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3474
[2024-06-11 01:14:23,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.24 | bwd_microstep: 1214.94 | bwd_inner_microstep: 1214.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927
[2024-06-11 01:14:24,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.91 | bwd_microstep: 691.55 | bwd_inner_microstep: 691.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-11 01:14:26,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1499.95 | bwd_inner_microstep: 1499.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3494
[2024-06-11 01:14:27,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.46 | bwd_microstep: 1343.25 | bwd_inner_microstep: 1343.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-11 01:14:29,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1412.78 | bwd_inner_microstep: 1412.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635
[2024-06-11 01:14:31,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1418.05 | bwd_inner_microstep: 1418.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 899
[2024-06-11 01:14:32,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.75 | bwd_microstep: 371.74 | bwd_inner_microstep: 371.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988
[2024-06-11 01:14:33,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.14 | bwd_microstep: 800.77 | bwd_inner_microstep: 800.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-11 01:14:35,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1253.30 | bwd_inner_microstep: 1253.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-11 01:14:37,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.08 | bwd_microstep: 1489.78 | bwd_inner_microstep: 1489.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-11 01:14:39,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1397.65 | bwd_inner_microstep: 1397.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604
[2024-06-11 01:14:41,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.69 | bwd_microstep: 1440.24 | bwd_inner_microstep: 1440.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-11 01:14:42,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.65 | bwd_microstep: 1183.96 | bwd_inner_microstep: 1183.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542
[2024-06-11 01:14:44,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1326.71 | bwd_inner_microstep: 1326.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 01:14:46,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.15 | bwd_microstep: 1660.81 | bwd_inner_microstep: 1660.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3785
[2024-06-11 01:14:49,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.33 | bwd_microstep: 1575.09 | bwd_inner_microstep: 1575.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-11 01:14:51,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1341.24 | bwd_inner_microstep: 1341.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580
[2024-06-11 01:14:53,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.82 | bwd_microstep: 1596.57 | bwd_inner_microstep: 1596.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1922
[2024-06-11 01:14:54,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.18 | bwd_microstep: 756.88 | bwd_inner_microstep: 756.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3780
[2024-06-11 01:14:56,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.23 | bwd_microstep: 1742.28 | bwd_inner_microstep: 1742.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-11 01:14:58,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.66 | bwd_microstep: 1502.44 | bwd_inner_microstep: 1502.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 01:15:06,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-11 01:15:06,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 6736.31 | bwd_inner_microstep: 1561.80 | bwd_allreduce_microstep: 5174.46 | step_microstep: 37.76
[2024-06-11 01:15:06,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14985.46 | bwd: 45267.45 | bwd_inner: 40092.06 | bwd_allreduce: 5174.69 | step: 39.24
{'loss': 1.154, 'learning_rate': 3.167554210142374e-06, 'epoch': 0.82}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-11 01:15:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.06 | bwd_microstep: 1327.50 | bwd_inner_microstep: 1327.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 01:15:09,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.01 | bwd_microstep: 1240.55 | bwd_inner_microstep: 1240.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-11 01:15:11,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.83 | bwd_microstep: 1641.85 | bwd_inner_microstep: 1641.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1898
[2024-06-11 01:15:12,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.41 | bwd_microstep: 771.49 | bwd_inner_microstep: 771.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 01:15:14,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.43 | bwd_microstep: 1242.08 | bwd_inner_microstep: 1242.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-11 01:15:16,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.45 | bwd_microstep: 1625.37 | bwd_inner_microstep: 1625.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938
[2024-06-11 01:15:17,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.83 | bwd_microstep: 725.91 | bwd_inner_microstep: 725.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 01:15:19,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1387.53 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-11 01:15:22,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.30 | bwd_microstep: 1617.30 | bwd_inner_microstep: 1617.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-11 01:15:23,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1257.78 | bwd_inner_microstep: 1257.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 01:15:25,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.45 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569
[2024-06-11 01:15:27,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.29 | bwd_microstep: 1590.29 | bwd_inner_microstep: 1590.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-11 01:15:29,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1344.44 | bwd_inner_microstep: 1344.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1948
[2024-06-11 01:15:30,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.44 | bwd_microstep: 885.53 | bwd_inner_microstep: 885.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-11 01:15:32,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.53 | bwd_microstep: 1242.83 | bwd_inner_microstep: 1242.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 01:15:34,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.13 | bwd_microstep: 1385.01 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-11 01:15:36,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.77 | bwd_microstep: 1243.18 | bwd_inner_microstep: 1243.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-11 01:15:38,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.24 | bwd_microstep: 1517.48 | bwd_inner_microstep: 1517.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-11 01:15:39,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1253.06 | bwd_inner_microstep: 1253.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-11 01:15:41,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.62 | bwd_microstep: 1459.22 | bwd_inner_microstep: 1459.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-11 01:15:43,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.34 | bwd_microstep: 1356.25 | bwd_inner_microstep: 1356.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2183
[2024-06-11 01:15:45,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.94 | bwd_microstep: 952.84 | bwd_inner_microstep: 952.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-11 01:15:47,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.17 | bwd_microstep: 1405.47 | bwd_inner_microstep: 1405.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-11 01:15:48,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.15 | bwd_microstep: 1159.17 | bwd_inner_microstep: 1159.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-11 01:15:50,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.57 | bwd_microstep: 1451.00 | bwd_inner_microstep: 1450.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042
[2024-06-11 01:15:51,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.92 | bwd_microstep: 908.38 | bwd_inner_microstep: 908.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-11 01:15:53,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.79 | bwd_microstep: 1355.17 | bwd_inner_microstep: 1355.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3845
[2024-06-11 01:15:56,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.38 | bwd_microstep: 1591.19 | bwd_inner_microstep: 1591.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-11 01:15:57,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.70 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815
[2024-06-11 01:15:59,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.68 | bwd_microstep: 1448.76 | bwd_inner_microstep: 1448.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769
[2024-06-11 01:16:01,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.96 | bwd_microstep: 1444.74 | bwd_inner_microstep: 1444.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-11 01:16:50,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 6.61
[2024-06-11 01:16:50,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.12 | bwd_microstep: 48047.10 | bwd_inner_microstep: 1682.02 | bwd_allreduce_microstep: 46365.01 | step_microstep: 38.85
[2024-06-11 01:16:50,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15724.17 | bwd: 88565.35 | bwd_inner: 42199.42 | bwd_allreduce: 46365.25 | step: 40.30
{'loss': 1.1723, 'learning_rate': 3.1473130248407278e-06, 'epoch': 0.82}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 01:16:52,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.03 | bwd_microstep: 1271.55 | bwd_inner_microstep: 1271.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3872
[2024-06-11 01:16:54,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.14 | bwd_microstep: 1451.16 | bwd_inner_microstep: 1451.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-11 01:16:56,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.92 | bwd_microstep: 1640.38 | bwd_inner_microstep: 1640.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242
[2024-06-11 01:16:58,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.81 | bwd_microstep: 958.32 | bwd_inner_microstep: 958.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 01:16:59,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1273.63 | bwd_inner_microstep: 1273.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628
[2024-06-11 01:17:01,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1308.67 | bwd_inner_microstep: 1308.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-11 01:17:03,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.08 | bwd_microstep: 1287.26 | bwd_inner_microstep: 1287.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 01:17:05,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.53 | bwd_microstep: 1387.52 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3509
[2024-06-11 01:18:21,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.33 | bwd_microstep: 1497.96 | bwd_inner_microstep: 1497.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-11 01:18:23,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1374.61 | bwd_inner_microstep: 1374.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-11 01:18:25,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.68 | bwd_microstep: 1336.40 | bwd_inner_microstep: 1336.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-11 01:18:27,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1378.81 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3637
[2024-06-11 01:18:29,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.29 | bwd_microstep: 1690.36 | bwd_inner_microstep: 1690.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-11 01:18:31,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.32 | bwd_microstep: 1374.37 | bwd_inner_microstep: 1374.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2089
[2024-06-11 01:18:32,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.49 | bwd_microstep: 847.49 | bwd_inner_microstep: 847.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 01:18:34,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.67 | bwd_microstep: 1275.53 | bwd_inner_microstep: 1275.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-11 01:18:36,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.86 | bwd_microstep: 1507.38 | bwd_inner_microstep: 1507.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3551
[2024-06-11 01:18:38,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1227.69 | bwd_inner_microstep: 1227.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2227
[2024-06-11 01:18:39,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.62 | bwd_microstep: 860.64 | bwd_inner_microstep: 860.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-11 01:18:41,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.64 | bwd_microstep: 1284.13 | bwd_inner_microstep: 1284.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 01:18:43,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.03 | bwd_microstep: 1277.86 | bwd_inner_microstep: 1277.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 01:18:44,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1279.50 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 01:18:47,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.64 | bwd_microstep: 1552.67 | bwd_inner_microstep: 1552.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-11 01:18:48,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1298.11 | bwd_inner_microstep: 1298.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055
[2024-06-11 01:18:49,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.05 | bwd_microstep: 814.44 | bwd_inner_microstep: 814.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3387
[2024-06-11 01:18:51,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.26 | bwd_microstep: 1335.15 | bwd_inner_microstep: 1335.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-11 01:18:53,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.82 | bwd_microstep: 1343.26 | bwd_inner_microstep: 1343.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-11 01:18:55,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.75 | bwd_microstep: 1444.71 | bwd_inner_microstep: 1444.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3436
[2024-06-11 01:18:57,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.73 | bwd_microstep: 1296.40 | bwd_inner_microstep: 1296.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-11 01:18:59,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.58 | bwd_microstep: 1598.05 | bwd_inner_microstep: 1598.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3811
[2024-06-11 01:19:01,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3599
[2024-06-11 01:19:06,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-11 01:19:06,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 3878.13 | bwd_inner_microstep: 1707.08 | bwd_allreduce_microstep: 2170.99 | step_microstep: 38.18
[2024-06-11 01:19:06,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15907.46 | bwd: 44737.67 | bwd_inner: 42565.78 | bwd_allreduce: 2171.22 | step: 39.66
{'loss': 1.1842, 'learning_rate': 3.127131194615003e-06, 'epoch': 0.82}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3468
[2024-06-11 01:19:08,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1516.29 | bwd_inner_microstep: 1516.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474
[2024-06-11 01:19:09,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.54 | bwd_microstep: 1242.87 | bwd_inner_microstep: 1242.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 01:19:11,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1374.84 | bwd_inner_microstep: 1374.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-11 01:19:13,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.47 | bwd_microstep: 1548.19 | bwd_inner_microstep: 1548.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 01:19:15,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.40 | bwd_microstep: 1247.79 | bwd_inner_microstep: 1247.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-11 01:19:17,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1532.29 | bwd_inner_microstep: 1532.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3742
[2024-06-11 01:19:19,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1496.60 | bwd_inner_microstep: 1496.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-11 01:19:20,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.53 | bwd_microstep: 793.34 | bwd_inner_microstep: 793.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-11 01:19:22,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.61 | bwd_microstep: 797.00 | bwd_inner_microstep: 796.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 01:19:23,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1249.08 | bwd_inner_microstep: 1249.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3522
[2024-06-11 01:19:25,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1226.88 | bwd_inner_microstep: 1226.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1894
[2024-06-11 01:19:26,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.90 | bwd_microstep: 746.06 | bwd_inner_microstep: 746.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 01:19:28,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.32 | bwd_microstep: 1387.96 | bwd_inner_microstep: 1387.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-11 01:19:29,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.64 | bwd_microstep: 790.35 | bwd_inner_microstep: 790.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3442
[2024-06-11 01:19:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.98 | bwd_microstep: 1409.21 | bwd_inner_microstep: 1409.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-11 01:19:33,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.39 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-11 01:19:35,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.87 | bwd_microstep: 1240.16 | bwd_inner_microstep: 1240.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2129
[2024-06-11 01:19:36,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.37 | bwd_microstep: 931.32 | bwd_inner_microstep: 931.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3431
[2024-06-11 01:19:38,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.00 | bwd_microstep: 1539.91 | bwd_inner_microstep: 1539.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3115
[2024-06-11 01:19:40,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.84 | bwd_microstep: 1343.66 | bwd_inner_microstep: 1343.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2909
[2024-06-11 01:19:41,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.31 | bwd_microstep: 1092.35 | bwd_inner_microstep: 1092.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703
[2024-06-11 01:19:43,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.48 | bwd_microstep: 1531.34 | bwd_inner_microstep: 1531.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 01:19:46,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.30 | bwd_microstep: 1557.90 | bwd_inner_microstep: 1557.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150
[2024-06-11 01:19:47,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.86 | bwd_microstep: 849.68 | bwd_inner_microstep: 849.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930
[2024-06-11 01:19:48,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.01 | bwd_microstep: 694.86 | bwd_inner_microstep: 694.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-11 01:19:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.54 | bwd_microstep: 1160.62 | bwd_inner_microstep: 1160.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-11 01:19:51,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.10 | bwd_microstep: 1501.00 | bwd_inner_microstep: 1500.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473
[2024-06-11 01:19:53,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1404.89 | bwd_inner_microstep: 1404.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-11 01:19:55,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.54 | bwd_microstep: 971.03 | bwd_inner_microstep: 971.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2235
[2024-06-11 01:19:56,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.68 | bwd_microstep: 901.59 | bwd_inner_microstep: 901.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-11 01:19:58,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.93 | bwd_microstep: 1201.99 | bwd_inner_microstep: 1201.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044
[2024-06-11 01:20:20,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.10 | optimizer_step: 6.61
[2024-06-11 01:20:20,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.20 | bwd_microstep: 22509.65 | bwd_inner_microstep: 931.19 | bwd_allreduce_microstep: 21578.41 | step_microstep: 37.96
[2024-06-11 01:20:20,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14454.56 | bwd: 60138.89 | bwd_inner: 38559.58 | bwd_allreduce: 21578.63 | step: 39.45
{'loss': 1.2155, 'learning_rate': 3.107008790545494e-06, 'epoch': 0.83}
 82%|████████▏ | 1420/1726 [24:35:41<5:10:28, 60.88s/it]
 82%|████████▏ | 1421/1726 [24:36:42<5:09:53, 60.96s/it]
 82%|████████▏ | 1422/1726 [24:37:42<5:08:16, 60.84s/it]
 82%|████████▏ | 1423/1726 [24:39:27<6:13:34, 73.98s/it]
 83%|████████▎ | 1424/1726 [24:41:42<7:45:05, 92.40s/it]
 83%|████████▎ | 1425/1726 [24:42:57<7:17:14, 87.16s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 01:20:22,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.17 | bwd_microstep: 1322.32 | bwd_inner_microstep: 1322.26 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-11 01:20:24,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1278.79 | bwd_inner_microstep: 1278.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2368
[2024-06-11 01:20:25,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.54 | bwd_microstep: 887.94 | bwd_inner_microstep: 887.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3917
[2024-06-11 01:20:28,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.42 | bwd_microstep: 1681.22 | bwd_inner_microstep: 1681.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-11 01:20:30,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1476.17 | bwd_inner_microstep: 1476.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 01:20:31,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.40 | bwd_microstep: 1246.15 | bwd_inner_microstep: 1246.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 01:20:33,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.06 | bwd_microstep: 1239.99 | bwd_inner_microstep: 1239.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750
[2024-06-11 01:20:35,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1429.99 | bwd_inner_microstep: 1429.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2225
[2024-06-11 01:20:36,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.92 | bwd_microstep: 952.87 | bwd_inner_microstep: 952.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-11 01:20:38,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1350.84 | bwd_inner_microstep: 1350.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3406
[2024-06-11 01:20:40,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1320.91 | bwd_inner_microstep: 1320.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625
[2024-06-11 01:20:42,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1308.48 | bwd_inner_microstep: 1308.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-11 01:20:44,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1332.90 | bwd_inner_microstep: 1332.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-11 01:20:45,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1249.59 | bwd_inner_microstep: 1249.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 01:20:47,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1382.58 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3489
[2024-06-11 01:20:50,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.27 | bwd_microstep: 1572.01 | bwd_inner_microstep: 1571.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-11 01:20:52,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.70 | bwd_microstep: 1512.28 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960
[2024-06-11 01:20:53,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.60 | bwd_microstep: 703.37 | bwd_inner_microstep: 703.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-11 01:20:54,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1253.34 | bwd_inner_microstep: 1253.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 01:20:56,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.70 | bwd_microstep: 1554.62 | bwd_inner_microstep: 1554.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 01:20:58,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1479.90 | bwd_inner_microstep: 1479.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546
[2024-06-11 01:21:00,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.64 | bwd_microstep: 1201.15 | bwd_inner_microstep: 1201.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-11 01:21:02,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.32 | bwd_microstep: 1295.05 | bwd_inner_microstep: 1295.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-11 01:21:04,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.56 | bwd_microstep: 1487.67 | bwd_inner_microstep: 1487.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-11 01:21:06,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.77 | bwd_microstep: 1553.47 | bwd_inner_microstep: 1553.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271
[2024-06-11 01:21:07,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.62 | bwd_microstep: 875.43 | bwd_inner_microstep: 875.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281
[2024-06-11 01:21:09,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.90 | bwd_microstep: 876.63 | bwd_inner_microstep: 876.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3821
[2024-06-11 01:21:11,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.07 | bwd_microstep: 1691.62 | bwd_inner_microstep: 1691.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-11 01:21:13,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.30 | bwd_microstep: 1555.45 | bwd_inner_microstep: 1555.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278
[2024-06-11 01:21:14,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.30 | bwd_microstep: 907.51 | bwd_inner_microstep: 907.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-11 01:21:16,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.85 | bwd_microstep: 1474.01 | bwd_inner_microstep: 1473.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420
[2024-06-11 01:21:22,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 01:21:22,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 4755.12 | bwd_inner_microstep: 1637.48 | bwd_allreduce_microstep: 3117.59 | step_microstep: 37.87
[2024-06-11 01:21:22,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15684.18 | bwd: 45209.39 | bwd_inner: 42090.84 | bwd_allreduce: 3117.84 | step: 39.42
{'loss': 1.2222, 'learning_rate': 3.0869458835032097e-06, 'epoch': 0.83}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2380
[2024-06-11 01:21:23,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.85 | bwd_microstep: 955.02 | bwd_inner_microstep: 954.95 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 01:21:25,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.47 | bwd_microstep: 1340.17 | bwd_inner_microstep: 1340.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475
[2024-06-11 01:21:27,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.18 | bwd_microstep: 1214.09 | bwd_inner_microstep: 1214.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-11 01:21:29,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.03 | bwd_microstep: 1474.81 | bwd_inner_microstep: 1474.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2228
[2024-06-11 01:21:30,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.51 | bwd_microstep: 862.67 | bwd_inner_microstep: 862.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3772
[2024-06-11 01:21:32,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.33 | bwd_microstep: 1742.24 | bwd_inner_microstep: 1742.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772
[2024-06-11 01:21:34,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1541.30 | bwd_inner_microstep: 1541.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-11 01:21:35,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.76 | bwd_microstep: 799.33 | bwd_inner_microstep: 799.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 01:21:37,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1387.13 | bwd_inner_microstep: 1387.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571
[2024-06-11 01:21:39,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1421.12 | bwd_inner_microstep: 1421.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 01:21:41,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1380.04 | bwd_inner_microstep: 1380.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548
[2024-06-11 01:21:43,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.53 | bwd_microstep: 1303.80 | bwd_inner_microstep: 1303.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-11 01:21:45,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.28 | bwd_microstep: 1581.74 | bwd_inner_microstep: 1581.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3474
[2024-06-11 01:21:47,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.63 | bwd_microstep: 1544.39 | bwd_inner_microstep: 1544.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-11 01:21:50,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.95 | bwd_microstep: 1653.69 | bwd_inner_microstep: 1653.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-11 01:21:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.14 | bwd_microstep: 1452.40 | bwd_inner_microstep: 1452.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-11 01:21:53,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.37 | bwd_microstep: 1310.99 | bwd_inner_microstep: 1310.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-11 01:21:55,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.24 | bwd_microstep: 1186.28 | bwd_inner_microstep: 1186.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-11 01:21:57,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.21 | bwd_microstep: 1295.72 | bwd_inner_microstep: 1295.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2286
[2024-06-11 01:21:58,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.06 | bwd_microstep: 1073.42 | bwd_inner_microstep: 1073.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-11 01:22:00,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1404.13 | bwd_inner_microstep: 1404.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-11 01:22:02,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.81 | bwd_microstep: 1498.47 | bwd_inner_microstep: 1498.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279
[2024-06-11 01:22:04,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.26 | bwd_microstep: 876.90 | bwd_inner_microstep: 876.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-11 01:22:05,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.36 | bwd_microstep: 975.17 | bwd_inner_microstep: 975.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-11 01:22:07,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.26 | bwd_microstep: 1554.35 | bwd_inner_microstep: 1554.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3546
[2024-06-11 01:22:09,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.86 | bwd_microstep: 1561.49 | bwd_inner_microstep: 1561.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-11 01:22:11,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1392.11 | bwd_inner_microstep: 1392.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818
[2024-06-11 01:22:13,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.06 | bwd_microstep: 1387.17 | bwd_inner_microstep: 1387.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3783
[2024-06-11 01:22:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.85 | bwd_microstep: 1689.41 | bwd_inner_microstep: 1689.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2196
[2024-06-11 01:22:17,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.84 | bwd_microstep: 1053.29 | bwd_inner_microstep: 1053.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-11 01:22:18,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.47 | bwd_microstep: 790.07 | bwd_inner_microstep: 790.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764
[2024-06-11 01:22:21,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.03 | optimizer_step: 6.61
[2024-06-11 01:22:21,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.34 | bwd_microstep: 2531.02 | bwd_inner_microstep: 1722.94 | bwd_allreduce_microstep: 808.04 | step_microstep: 37.90
[2024-06-11 01:22:21,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15798.31 | bwd: 43233.95 | bwd_inner: 42424.97 | bwd_allreduce: 808.29 | step: 39.39
{'loss': 1.164, 'learning_rate': 3.0669425441495936e-06, 'epoch': 0.83}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 01:22:23,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.87 | bwd_microstep: 1238.91 | bwd_inner_microstep: 1238.85 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1351
[2024-06-11 01:22:24,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.66 | bwd_microstep: 516.84 | bwd_inner_microstep: 516.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-11 01:22:26,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.00 | bwd_microstep: 1486.49 | bwd_inner_microstep: 1486.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 01:22:27,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1251.03 | bwd_inner_microstep: 1251.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 01:22:29,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1247.62 | bwd_inner_microstep: 1247.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 01:22:31,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1385.07 | bwd_inner_microstep: 1385.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4079
[2024-06-11 01:22:33,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.57 | bwd_microstep: 1522.69 | bwd_inner_microstep: 1522.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-11 01:22:35,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.19 | bwd_microstep: 1527.30 | bwd_inner_microstep: 1527.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-11 01:22:37,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.98 | bwd_microstep: 1150.06 | bwd_inner_microstep: 1150.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 01:22:39,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1387.48 | bwd_inner_microstep: 1387.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-11 01:22:40,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1251.34 | bwd_inner_microstep: 1251.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3438
[2024-06-11 01:22:42,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.21 | bwd_microstep: 1219.91 | bwd_inner_microstep: 1219.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408
[2024-06-11 01:22:44,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.41 | bwd_microstep: 1281.53 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-11 01:22:46,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.15 | bwd_microstep: 1446.84 | bwd_inner_microstep: 1446.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661
[2024-06-11 01:22:48,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.33 | bwd_microstep: 1512.07 | bwd_inner_microstep: 1512.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2113
[2024-06-11 01:22:49,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 888.51 | bwd_inner_microstep: 888.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461
[2024-06-11 01:22:51,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.04 | bwd_microstep: 1182.79 | bwd_inner_microstep: 1182.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-11 01:22:53,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1409.00 | bwd_inner_microstep: 1408.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-11 01:22:55,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.02 | bwd_microstep: 1486.71 | bwd_inner_microstep: 1486.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 01:22:57,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.34 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-11 01:22:59,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.79 | bwd_microstep: 1381.87 | bwd_inner_microstep: 1381.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-11 01:23:00,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.29 | bwd_microstep: 1291.30 | bwd_inner_microstep: 1291.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 01:23:02,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.59 | bwd_microstep: 1253.71 | bwd_inner_microstep: 1253.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-11 01:23:04,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.67 | bwd_microstep: 1460.12 | bwd_inner_microstep: 1460.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3679
[2024-06-11 01:23:06,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1399.12 | bwd_inner_microstep: 1399.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-11 01:23:08,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.94 | bwd_microstep: 1556.67 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-11 01:23:10,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.62 | bwd_microstep: 1357.56 | bwd_inner_microstep: 1357.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-11 01:23:12,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.68 | bwd_microstep: 1424.28 | bwd_inner_microstep: 1424.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477
[2024-06-11 01:23:14,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1407.63 | bwd_inner_microstep: 1407.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-11 01:23:16,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1446.51 | bwd_inner_microstep: 1446.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 01:23:18,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.59 | bwd_microstep: 1650.20 | bwd_inner_microstep: 1650.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2902
[2024-06-11 01:23:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.05 | optimizer_step: 6.59
[2024-06-11 01:23:22,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.75 | bwd_microstep: 3707.87 | bwd_inner_microstep: 1382.39 | bwd_allreduce_microstep: 2325.43 | step_microstep: 37.90
[2024-06-11 01:23:22,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15989.91 | bwd: 45116.55 | bwd_inner: 42790.17 | bwd_allreduce: 2325.67 | step: 39.39
{'loss': 1.1931, 'learning_rate': 3.046998842936315e-06, 'epoch': 0.83}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2006
[2024-06-11 01:23:24,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.52 | bwd_microstep: 890.79 | bwd_inner_microstep: 890.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 01:23:26,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.94 | bwd_microstep: 1488.21 | bwd_inner_microstep: 1488.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3955
[2024-06-11 01:23:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.39 | bwd_microstep: 1692.47 | bwd_inner_microstep: 1692.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 01:23:30,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.28 | bwd_microstep: 1244.46 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3847
[2024-06-11 01:23:32,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.95 | bwd_microstep: 1658.03 | bwd_inner_microstep: 1658.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-11 01:23:34,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.28 | bwd_microstep: 1528.75 | bwd_inner_microstep: 1528.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3537
[2024-06-11 01:23:36,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.08 | bwd_microstep: 1228.97 | bwd_inner_microstep: 1228.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 01:23:38,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1247.90 | bwd_inner_microstep: 1247.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-11 01:23:40,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1479.60 | bwd_inner_microstep: 1479.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-11 01:23:41,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.55 | bwd_microstep: 1157.33 | bwd_inner_microstep: 1157.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-11 01:23:43,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.98 | bwd_microstep: 1488.99 | bwd_inner_microstep: 1488.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435
[2024-06-11 01:23:45,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.14 | bwd_microstep: 1282.25 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3697
[2024-06-11 01:23:47,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.41 | bwd_microstep: 1471.98 | bwd_inner_microstep: 1471.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975
[2024-06-11 01:23:48,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.91 | bwd_microstep: 891.40 | bwd_inner_microstep: 891.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519
[2024-06-11 01:23:50,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.48 | bwd_microstep: 1325.35 | bwd_inner_microstep: 1325.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522
[2024-06-11 01:23:52,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.49 | bwd_microstep: 1426.54 | bwd_inner_microstep: 1426.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-11 01:23:54,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.99 | bwd_microstep: 1415.54 | bwd_inner_microstep: 1415.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-11 01:23:56,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.79 | bwd_microstep: 1312.37 | bwd_inner_microstep: 1312.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-11 01:23:58,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.94 | bwd_microstep: 1283.50 | bwd_inner_microstep: 1283.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636
[2024-06-11 01:24:00,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1514.16 | bwd_inner_microstep: 1514.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933
[2024-06-11 01:24:02,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.45 | bwd_microstep: 1600.11 | bwd_inner_microstep: 1600.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-11 01:24:04,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.91 | bwd_microstep: 1280.59 | bwd_inner_microstep: 1280.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2000
[2024-06-11 01:24:05,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.15 | bwd_microstep: 769.88 | bwd_inner_microstep: 769.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605
[2024-06-11 01:24:07,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1310.22 | bwd_inner_microstep: 1310.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-11 01:24:09,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1500.52 | bwd_inner_microstep: 1500.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3902
[2024-06-11 01:24:11,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.49 | bwd_microstep: 1637.12 | bwd_inner_microstep: 1637.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3763
[2024-06-11 01:24:13,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1376.96 | bwd_inner_microstep: 1376.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-11 01:24:15,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1511.38 | bwd_inner_microstep: 1511.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3797
[2024-06-11 01:24:17,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.80 | bwd_microstep: 1513.26 | bwd_inner_microstep: 1513.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3599
[2024-06-11 01:24:19,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.43 | bwd_microstep: 1463.00 | bwd_inner_microstep: 1462.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2713
[2024-06-11 01:24:21,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.18 | bwd_microstep: 1129.13 | bwd_inner_microstep: 1129.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 01:24:27,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.07 | optimizer_step: 6.61
[2024-06-11 01:24:27,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.88 | bwd_microstep: 5686.44 | bwd_inner_microstep: 1755.16 | bwd_allreduce_microstep: 3931.22 | step_microstep: 37.76
[2024-06-11 01:24:27,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16353.58 | bwd: 47807.21 | bwd_inner: 43875.07 | bwd_allreduce: 3931.45 | step: 39.25
{'loss': 1.2133, 'learning_rate': 3.0271148501049796e-06, 'epoch': 0.83}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 01:24:29,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1476.75 | bwd_inner_microstep: 1476.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3998
[2024-06-11 01:24:31,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.82 | bwd_microstep: 1705.07 | bwd_inner_microstep: 1705.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 01:24:34,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.39 | bwd_microstep: 1543.80 | bwd_inner_microstep: 1543.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-11 01:24:35,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.27 | bwd_microstep: 1250.00 | bwd_inner_microstep: 1249.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-11 01:24:37,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.40 | bwd_microstep: 1183.31 | bwd_inner_microstep: 1183.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 01:24:39,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.79 | bwd_microstep: 1245.86 | bwd_inner_microstep: 1245.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-11 01:24:40,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.59 | bwd_microstep: 1295.32 | bwd_inner_microstep: 1295.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 01:24:42,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1384.18 | bwd_inner_microstep: 1384.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3696
[2024-06-11 01:24:44,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.63 | bwd_microstep: 1428.18 | bwd_inner_microstep: 1428.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692
[2024-06-11 01:24:46,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.30 | bwd_microstep: 1522.28 | bwd_inner_microstep: 1522.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 01:24:48,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.12 | bwd_microstep: 1281.69 | bwd_inner_microstep: 1281.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-11 01:24:50,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.89 | bwd_microstep: 1281.40 | bwd_inner_microstep: 1281.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3737
[2024-06-11 01:24:52,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.39 | bwd_microstep: 1666.34 | bwd_inner_microstep: 1666.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-11 01:24:54,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.72 | bwd_microstep: 1512.83 | bwd_inner_microstep: 1512.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2907
[2024-06-11 01:24:56,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.83 | bwd_microstep: 1220.79 | bwd_inner_microstep: 1220.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3685
[2024-06-11 01:24:58,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.70 | bwd_microstep: 1720.34 | bwd_inner_microstep: 1720.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-11 01:25:00,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1344.44 | bwd_inner_microstep: 1344.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3819
[2024-06-11 01:25:03,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.89 | bwd_microstep: 1752.19 | bwd_inner_microstep: 1752.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-11 01:25:05,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1387.87 | bwd_inner_microstep: 1387.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3527
[2024-06-11 01:25:07,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.45 | bwd_microstep: 1583.47 | bwd_inner_microstep: 1583.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 01:25:09,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1463.85 | bwd_inner_microstep: 1463.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3447
[2024-06-11 01:25:11,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.58 | bwd_microstep: 1512.27 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-11 01:25:13,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1404.24 | bwd_inner_microstep: 1404.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 01:25:15,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.94 | bwd_microstep: 1452.24 | bwd_inner_microstep: 1452.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-11 01:25:17,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.35 | bwd_microstep: 1493.81 | bwd_inner_microstep: 1493.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3779
[2024-06-11 01:25:19,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.76 | bwd_microstep: 1573.74 | bwd_inner_microstep: 1573.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-11 01:25:21,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.97 | bwd_microstep: 1656.64 | bwd_inner_microstep: 1656.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-11 01:25:23,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.44 | bwd_microstep: 1454.37 | bwd_inner_microstep: 1454.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385
[2024-06-11 01:25:25,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.73 | bwd_microstep: 1241.82 | bwd_inner_microstep: 1241.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-11 01:25:27,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.70 | bwd_microstep: 1501.22 | bwd_inner_microstep: 1501.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-11 01:25:29,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.18 | bwd_microstep: 1502.40 | bwd_inner_microstep: 1502.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-11 01:25:31,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-11 01:25:31,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.26 | bwd_microstep: 1442.74 | bwd_inner_microstep: 1434.82 | bwd_allreduce_microstep: 7.87 | step_microstep: 37.69
[2024-06-11 01:25:31,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17358.83 | bwd: 46485.47 | bwd_inner: 46476.68 | bwd_allreduce: 8.10 | step: 39.16
 83%|████████▎ | 1425/1726 [24:42:57<7:17:14, 87.16s/it]
 83%|████████▎ | 1426/1726 [24:43:58<6:36:53, 79.38s/it]
 83%|████████▎ | 1427/1726 [24:44:58<6:05:38, 73.37s/it]
 83%|████████▎ | 1428/1726 [24:45:59<5:46:38, 69.79s/it]
 83%|████████▎ | 1429/1726 [24:47:04<5:37:36, 68.20s/it]
 83%|████████▎ | 1430/1726 [24:48:08<5:30:31, 67.00s/it]
{'loss': 1.1703, 'learning_rate': 3.0072906356869145e-06, 'epoch': 0.83}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-11 01:25:33,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1277.84 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 01:25:35,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.41 | bwd_microstep: 1244.20 | bwd_inner_microstep: 1244.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936
[2024-06-11 01:25:36,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.15 | bwd_microstep: 696.72 | bwd_inner_microstep: 696.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-11 01:25:38,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.95 | bwd_microstep: 1499.83 | bwd_inner_microstep: 1499.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 01:25:40,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1381.35 | bwd_inner_microstep: 1381.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 01:25:41,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.71 | bwd_microstep: 1246.22 | bwd_inner_microstep: 1246.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-11 01:25:43,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.43 | bwd_microstep: 1396.63 | bwd_inner_microstep: 1396.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-11 01:25:44,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.73 | bwd_microstep: 790.73 | bwd_inner_microstep: 790.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-11 01:25:45,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.12 | bwd_microstep: 797.60 | bwd_inner_microstep: 797.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2185
[2024-06-11 01:25:47,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.36 | bwd_microstep: 856.81 | bwd_inner_microstep: 856.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-11 01:25:49,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1500.61 | bwd_inner_microstep: 1500.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3931
[2024-06-11 01:25:51,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.09 | bwd_microstep: 1619.41 | bwd_inner_microstep: 1619.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3662
[2024-06-11 01:25:53,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1385.98 | bwd_inner_microstep: 1385.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533
[2024-06-11 01:25:55,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1585.92 | bwd_inner_microstep: 1585.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2384
[2024-06-11 01:25:56,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.35 | bwd_microstep: 933.50 | bwd_inner_microstep: 933.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-11 01:25:58,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.01 | bwd_microstep: 1485.96 | bwd_inner_microstep: 1485.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3688
[2024-06-11 01:26:01,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.86 | bwd_microstep: 1661.54 | bwd_inner_microstep: 1661.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828
[2024-06-11 01:26:03,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.90 | bwd_microstep: 1578.24 | bwd_inner_microstep: 1578.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-11 01:26:05,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.23 | bwd_microstep: 1525.56 | bwd_inner_microstep: 1525.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-11 01:26:07,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.03 | bwd_microstep: 1487.09 | bwd_inner_microstep: 1487.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070
[2024-06-11 01:26:08,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.39 | bwd_microstep: 821.28 | bwd_inner_microstep: 821.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-11 01:26:10,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1258.76 | bwd_inner_microstep: 1258.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795
[2024-06-11 01:26:12,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-11 01:26:14,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1501.53 | bwd_inner_microstep: 1501.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580
[2024-06-11 01:26:16,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.44 | bwd_microstep: 1422.35 | bwd_inner_microstep: 1422.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-11 01:26:18,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.07 | bwd_microstep: 1414.08 | bwd_inner_microstep: 1414.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 01:26:20,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.50 | bwd_microstep: 1381.28 | bwd_inner_microstep: 1381.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-11 01:26:22,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1409.08 | bwd_inner_microstep: 1409.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-11 01:26:24,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1508.37 | bwd_inner_microstep: 1508.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-11 01:26:26,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1560.25 | bwd_inner_microstep: 1560.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031
[2024-06-11 01:26:27,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.26 | bwd_microstep: 805.21 | bwd_inner_microstep: 805.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3770
[2024-06-11 01:26:33,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-11 01:26:33,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.10 | bwd_microstep: 4927.42 | bwd_inner_microstep: 2109.83 | bwd_allreduce_microstep: 2817.54 | step_microstep: 38.10
[2024-06-11 01:26:33,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15870.05 | bwd: 45517.40 | bwd_inner: 42698.82 | bwd_allreduce: 2817.85 | step: 39.64
{'loss': 1.13, 'learning_rate': 2.9875262695028874e-06, 'epoch': 0.83}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 01:26:35,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.59 | bwd_microstep: 1370.71 | bwd_inner_microstep: 1370.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-11 01:26:37,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1245.34 | bwd_inner_microstep: 1245.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-11 01:26:39,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.70 | bwd_microstep: 1457.25 | bwd_inner_microstep: 1457.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475
[2024-06-11 01:26:40,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1307.39 | bwd_inner_microstep: 1307.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 01:26:42,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.49 | bwd_microstep: 1337.79 | bwd_inner_microstep: 1337.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 01:26:44,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1244.61 | bwd_inner_microstep: 1244.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-11 01:26:46,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1249.63 | bwd_inner_microstep: 1249.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-11 01:26:47,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.98 | bwd_microstep: 1185.83 | bwd_inner_microstep: 1185.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423
[2024-06-11 01:26:49,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.13 | bwd_microstep: 1305.86 | bwd_inner_microstep: 1305.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2080
[2024-06-11 01:26:50,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.94 | bwd_microstep: 848.86 | bwd_inner_microstep: 848.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-11 01:26:52,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1414.26 | bwd_inner_microstep: 1414.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3441
[2024-06-11 01:26:54,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.09 | bwd_microstep: 1395.80 | bwd_inner_microstep: 1395.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-11 01:26:56,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.54 | bwd_microstep: 1406.63 | bwd_inner_microstep: 1406.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 01:26:58,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.85 | bwd_microstep: 1282.54 | bwd_inner_microstep: 1282.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458
[2024-06-11 01:27:00,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.67 | bwd_microstep: 1208.17 | bwd_inner_microstep: 1208.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 01:27:01,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.07 | bwd_microstep: 1382.29 | bwd_inner_microstep: 1382.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-11 01:27:04,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.17 | bwd_microstep: 1646.09 | bwd_inner_microstep: 1646.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-11 01:27:06,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.96 | bwd_microstep: 1614.91 | bwd_inner_microstep: 1614.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3824
[2024-06-11 01:27:08,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.72 | bwd_microstep: 1813.08 | bwd_inner_microstep: 1813.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3599
[2024-06-11 01:27:11,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.32 | bwd_microstep: 1672.38 | bwd_inner_microstep: 1672.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-11 01:27:13,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.96 | bwd_microstep: 1280.66 | bwd_inner_microstep: 1280.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-11 01:27:15,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.53 | bwd_microstep: 1644.74 | bwd_inner_microstep: 1644.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3033
[2024-06-11 01:27:16,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.88 | bwd_microstep: 1072.58 | bwd_inner_microstep: 1072.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3429
[2024-06-11 01:27:18,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.67 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-11 01:27:20,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.88 | bwd_microstep: 1311.51 | bwd_inner_microstep: 1311.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-11 01:27:22,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.99 | bwd_microstep: 1400.47 | bwd_inner_microstep: 1400.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 01:27:24,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.60 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-11 01:27:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.75 | bwd_microstep: 1508.76 | bwd_inner_microstep: 1508.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 01:27:28,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.17 | bwd_microstep: 1394.33 | bwd_inner_microstep: 1394.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-11 01:27:29,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1309.80 | bwd_inner_microstep: 1309.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511
[2024-06-11 01:27:31,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.80 | bwd_microstep: 1193.68 | bwd_inner_microstep: 1193.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753
[2024-06-11 01:27:35,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.08 | optimizer_step: 6.58
[2024-06-11 01:27:35,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 3538.05 | bwd_inner_microstep: 1628.34 | bwd_allreduce_microstep: 1909.67 | step_microstep: 37.61
[2024-06-11 01:27:35,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16342.65 | bwd: 45638.49 | bwd_inner: 43727.93 | bwd_allreduce: 1909.90 | step: 39.03
{'loss': 1.149, 'learning_rate': 2.967821821162904e-06, 'epoch': 0.83}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 01:27:37,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.43 | bwd_microstep: 1371.00 | bwd_inner_microstep: 1370.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2406
[2024-06-11 01:27:38,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.75 | bwd_microstep: 999.87 | bwd_inner_microstep: 999.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-11 01:27:41,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.60 | bwd_microstep: 1545.99 | bwd_inner_microstep: 1545.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1885
[2024-06-11 01:27:42,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.72 | bwd_microstep: 743.43 | bwd_inner_microstep: 743.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742
[2024-06-11 01:27:44,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.78 | bwd_microstep: 1634.73 | bwd_inner_microstep: 1634.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 01:27:46,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1389.96 | bwd_inner_microstep: 1389.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-11 01:27:47,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.69 | bwd_microstep: 1187.41 | bwd_inner_microstep: 1187.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-11 01:27:49,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.63 | bwd_microstep: 1423.78 | bwd_inner_microstep: 1423.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 01:27:51,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.82 | bwd_microstep: 1376.20 | bwd_inner_microstep: 1376.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3694
[2024-06-11 01:27:54,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.76 | bwd_microstep: 1720.67 | bwd_inner_microstep: 1720.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-11 01:27:56,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.12 | bwd_microstep: 1481.55 | bwd_inner_microstep: 1481.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 01:27:58,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1483.58 | bwd_inner_microstep: 1483.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 01:28:00,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1373.17 | bwd_inner_microstep: 1373.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-11 01:28:01,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.95 | bwd_microstep: 682.60 | bwd_inner_microstep: 682.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3647
[2024-06-11 01:28:03,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1467.38 | bwd_inner_microstep: 1467.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-11 01:28:05,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 1606.89 | bwd_inner_microstep: 1606.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-11 01:28:07,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1289.49 | bwd_inner_microstep: 1289.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-11 01:28:09,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.95 | bwd_microstep: 1389.72 | bwd_inner_microstep: 1389.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-11 01:28:11,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.12 | bwd_microstep: 1527.07 | bwd_inner_microstep: 1527.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1992
[2024-06-11 01:28:12,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.20 | bwd_microstep: 706.76 | bwd_inner_microstep: 706.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-11 01:28:13,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1280.69 | bwd_inner_microstep: 1280.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009
[2024-06-11 01:28:15,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.15 | bwd_microstep: 805.31 | bwd_inner_microstep: 805.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3440
[2024-06-11 01:28:16,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1379.92 | bwd_inner_microstep: 1379.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-11 01:28:18,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.01 | bwd_microstep: 1458.24 | bwd_inner_microstep: 1458.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3312
[2024-06-11 01:28:20,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.19 | bwd_microstep: 1192.06 | bwd_inner_microstep: 1192.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828
[2024-06-11 01:28:22,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.00 | bwd_microstep: 1390.60 | bwd_inner_microstep: 1390.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-11 01:28:24,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.58 | bwd_microstep: 1448.52 | bwd_inner_microstep: 1448.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3529
[2024-06-11 01:28:26,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.23 | bwd_microstep: 1450.72 | bwd_inner_microstep: 1450.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566
[2024-06-11 01:28:28,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.51 | bwd_microstep: 1594.06 | bwd_inner_microstep: 1594.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 01:28:30,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.43 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-11 01:28:32,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1251.48 | bwd_inner_microstep: 1251.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584
[2024-06-11 01:28:36,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.11 | optimizer_step: 6.57
[2024-06-11 01:28:36,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.22 | bwd_microstep: 3460.96 | bwd_inner_microstep: 1802.16 | bwd_allreduce_microstep: 1658.75 | step_microstep: 37.78
[2024-06-11 01:28:36,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15941.71 | bwd: 44467.17 | bwd_inner: 42807.52 | bwd_allreduce: 1658.98 | step: 39.26
{'loss': 1.1924, 'learning_rate': 2.948177360065918e-06, 'epoch': 0.83}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 01:28:38,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.39 | bwd_microstep: 1472.90 | bwd_inner_microstep: 1472.74 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 01:28:40,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1381.27 | bwd_inner_microstep: 1381.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3898
[2024-06-11 01:28:42,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.23 | bwd_microstep: 1481.44 | bwd_inner_microstep: 1481.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 01:28:44,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1377.73 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754
[2024-06-11 01:28:46,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.96 | bwd_microstep: 1533.65 | bwd_inner_microstep: 1533.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3846
[2024-06-11 01:28:48,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.11 | bwd_microstep: 1661.45 | bwd_inner_microstep: 1661.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3733
[2024-06-11 01:28:50,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1494.75 | bwd_inner_microstep: 1494.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-11 01:28:52,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 1436.33 | bwd_inner_microstep: 1436.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-11 01:28:54,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.25 | bwd_microstep: 1317.35 | bwd_inner_microstep: 1317.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2192
[2024-06-11 01:28:55,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.14 | bwd_microstep: 921.38 | bwd_inner_microstep: 921.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-11 01:28:57,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.55 | bwd_microstep: 798.35 | bwd_inner_microstep: 798.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-11 01:28:58,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1344.50 | bwd_inner_microstep: 1344.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122
[2024-06-11 01:29:00,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.60 | bwd_microstep: 937.08 | bwd_inner_microstep: 937.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-11 01:29:02,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.64 | bwd_microstep: 1533.41 | bwd_inner_microstep: 1533.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3668
[2024-06-11 01:29:04,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.59 | bwd_microstep: 1547.65 | bwd_inner_microstep: 1547.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654
[2024-06-11 01:29:06,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.59 | bwd_microstep: 1617.66 | bwd_inner_microstep: 1617.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-11 01:29:08,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1491.20 | bwd_inner_microstep: 1491.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-11 01:29:10,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.39 | bwd_microstep: 1351.97 | bwd_inner_microstep: 1351.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3691
[2024-06-11 01:29:12,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1431.10 | bwd_inner_microstep: 1431.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-11 01:29:14,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.91 | bwd_microstep: 1426.22 | bwd_inner_microstep: 1426.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2302
[2024-06-11 01:29:15,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.36 | bwd_microstep: 977.22 | bwd_inner_microstep: 977.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584
[2024-06-11 01:29:17,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.62 | bwd_microstep: 1304.07 | bwd_inner_microstep: 1304.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3754
[2024-06-11 01:29:19,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.48 | bwd_microstep: 1307.92 | bwd_inner_microstep: 1307.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562
[2024-06-11 01:29:21,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.04 | bwd_microstep: 1303.31 | bwd_inner_microstep: 1303.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718
[2024-06-11 01:29:23,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.22 | bwd_microstep: 1537.23 | bwd_inner_microstep: 1537.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3481
[2024-06-11 01:29:25,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.64 | bwd_microstep: 1438.30 | bwd_inner_microstep: 1438.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032
[2024-06-11 01:29:26,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.31 | bwd_microstep: 812.30 | bwd_inner_microstep: 812.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3762
[2024-06-11 01:29:28,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.79 | bwd_microstep: 1349.04 | bwd_inner_microstep: 1349.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2275
[2024-06-11 01:29:29,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.47 | bwd_microstep: 814.22 | bwd_inner_microstep: 814.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810
[2024-06-11 01:29:31,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.32 | bwd_microstep: 1362.77 | bwd_inner_microstep: 1362.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3385
[2024-06-11 01:29:33,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.63 | bwd_microstep: 1366.12 | bwd_inner_microstep: 1366.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3861
[2024-06-11 01:29:36,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.01 | optimizer_step: 6.62
[2024-06-11 01:29:36,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 681.15 | bwd_microstep: 1999.50 | bwd_inner_microstep: 1972.83 | bwd_allreduce_microstep: 26.62 | step_microstep: 37.38
[2024-06-11 01:29:36,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16095.97 | bwd: 43129.40 | bwd_inner: 43101.75 | bwd_allreduce: 26.91 | step: 38.92
{'loss': 1.1135, 'learning_rate': 2.9285929553996384e-06, 'epoch': 0.83}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 01:29:37,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1372.31 | bwd_inner_microstep: 1372.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882
[2024-06-11 01:29:38,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.28 | bwd_microstep: 684.01 | bwd_inner_microstep: 683.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 01:29:40,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1481.89 | bwd_inner_microstep: 1481.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 01:29:42,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1395.32 | bwd_inner_microstep: 1395.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427
[2024-06-11 01:29:44,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.43 | bwd_microstep: 1447.74 | bwd_inner_microstep: 1447.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421
[2024-06-11 01:29:46,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.47 | bwd_microstep: 1154.01 | bwd_inner_microstep: 1153.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746
[2024-06-11 01:29:48,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.15 | bwd_microstep: 1538.48 | bwd_inner_microstep: 1538.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1909
[2024-06-11 01:29:49,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.30 | bwd_microstep: 730.41 | bwd_inner_microstep: 730.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 01:29:51,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.22 | bwd_microstep: 1284.85 | bwd_inner_microstep: 1284.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 01:29:53,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1402.99 | bwd_inner_microstep: 1402.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694
[2024-06-11 01:29:55,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.59 | bwd_microstep: 1531.28 | bwd_inner_microstep: 1531.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450
[2024-06-11 01:29:57,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.87 | bwd_microstep: 1409.39 | bwd_inner_microstep: 1409.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 01:29:59,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.85 | bwd_microstep: 1287.23 | bwd_inner_microstep: 1287.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-11 01:30:01,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.69 | bwd_microstep: 1381.95 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 01:30:03,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.03 | bwd_microstep: 1482.38 | bwd_inner_microstep: 1482.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 01:30:04,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1342.63 | bwd_inner_microstep: 1342.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-11 01:30:07,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1491.05 | bwd_inner_microstep: 1491.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-11 01:30:08,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1344.97 | bwd_inner_microstep: 1344.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2097
[2024-06-11 01:30:10,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.15 | bwd_microstep: 831.26 | bwd_inner_microstep: 831.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-11 01:30:11,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1416.63 | bwd_inner_microstep: 1416.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3545
[2024-06-11 01:30:14,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.20 | bwd_microstep: 1523.45 | bwd_inner_microstep: 1523.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-11 01:30:15,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.99 | bwd_microstep: 1297.05 | bwd_inner_microstep: 1297.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-11 01:30:16,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.09 | bwd_microstep: 710.16 | bwd_inner_microstep: 710.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3474
[2024-06-11 01:30:18,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.66 | bwd_microstep: 1235.05 | bwd_inner_microstep: 1235.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474
[2024-06-11 01:30:20,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.87 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2239
[2024-06-11 01:30:21,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.21 | bwd_microstep: 1015.21 | bwd_inner_microstep: 1015.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758
[2024-06-11 01:30:23,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.81 | bwd_microstep: 1442.70 | bwd_inner_microstep: 1442.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809
[2024-06-11 01:30:25,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.94 | bwd_microstep: 1355.75 | bwd_inner_microstep: 1355.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3605
[2024-06-11 01:30:27,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1555.57 | bwd_inner_microstep: 1555.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2030
[2024-06-11 01:30:29,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.26 | bwd_microstep: 1002.41 | bwd_inner_microstep: 1002.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3734
[2024-06-11 01:30:31,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.33 | bwd_microstep: 1556.87 | bwd_inner_microstep: 1556.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2280
[2024-06-11 01:30:38,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-11 01:30:38,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.78 | bwd_microstep: 6310.00 | bwd_inner_microstep: 1142.30 | bwd_allreduce_microstep: 5167.63 | step_microstep: 38.90
[2024-06-11 01:30:38,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15424.86 | bwd: 46421.04 | bwd_inner: 41252.47 | bwd_allreduce: 5167.87 | step: 40.38


 83%|████████▎ | 1430/1726 [24:48:08<5:30:31, 67.00s/it]
 83%|████████▎ | 1431/1726 [24:49:10<5:21:37, 65.42s/it]
 83%|████████▎ | 1432/1726 [24:50:12<5:15:58, 64.49s/it]
 83%|████████▎ | 1433/1726 [24:51:13<5:09:24, 63.36s/it]
 83%|████████▎ | 1434/1726 [24:52:12<5:02:48, 62.22s/it]
 83%|████████▎ | 1435/1726 [24:53:14<5:01:42, 62.21s/it]
{'loss': 1.1851, 'learning_rate': 2.909068676140212e-06, 'epoch': 0.83}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520
[2024-06-11 01:30:40,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.76 | bwd_microstep: 1581.23 | bwd_inner_microstep: 1581.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3485
[2024-06-11 01:30:42,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.05 | bwd_microstep: 1242.51 | bwd_inner_microstep: 1242.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-11 01:30:44,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.81 | bwd_microstep: 1557.17 | bwd_inner_microstep: 1557.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788
[2024-06-11 01:30:46,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.90 | bwd_microstep: 1451.42 | bwd_inner_microstep: 1451.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755
[2024-06-11 01:30:48,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1338.43 | bwd_inner_microstep: 1338.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 01:30:49,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.57 | bwd_microstep: 1377.00 | bwd_inner_microstep: 1376.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-11 01:30:52,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.80 | bwd_microstep: 1625.85 | bwd_inner_microstep: 1625.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-11 01:30:54,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1525.24 | bwd_inner_microstep: 1525.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4025
[2024-06-11 01:30:56,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.02 | bwd_microstep: 1616.01 | bwd_inner_microstep: 1615.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-11 01:30:58,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.19 | bwd_microstep: 1279.40 | bwd_inner_microstep: 1279.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-11 01:31:00,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.53 | bwd_microstep: 1342.05 | bwd_inner_microstep: 1342.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661
[2024-06-11 01:31:02,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1565.87 | bwd_inner_microstep: 1565.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495
[2024-06-11 01:31:04,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.18 | bwd_microstep: 1313.22 | bwd_inner_microstep: 1313.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2905
[2024-06-11 01:31:05,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.39 | bwd_microstep: 1220.01 | bwd_inner_microstep: 1219.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-11 01:31:07,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.36 | bwd_microstep: 1556.89 | bwd_inner_microstep: 1556.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2171
[2024-06-11 01:31:09,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.85 | bwd_microstep: 952.14 | bwd_inner_microstep: 952.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-11 01:31:11,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.25 | bwd_microstep: 1336.33 | bwd_inner_microstep: 1336.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3630
[2024-06-11 01:31:13,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1646.19 | bwd_inner_microstep: 1646.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-11 01:31:15,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.71 | bwd_microstep: 1290.87 | bwd_inner_microstep: 1290.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533
[2024-06-11 01:31:17,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1426.49 | bwd_inner_microstep: 1426.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-11 01:31:18,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.49 | bwd_microstep: 800.03 | bwd_inner_microstep: 800.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3834
[2024-06-11 01:31:20,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.78 | bwd_microstep: 1437.14 | bwd_inner_microstep: 1437.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 01:31:22,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1493.26 | bwd_inner_microstep: 1493.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 01:31:24,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.36 | bwd_microstep: 1647.61 | bwd_inner_microstep: 1647.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 01:31:26,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1340.04 | bwd_inner_microstep: 1340.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 01:31:28,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.07 | bwd_microstep: 1287.02 | bwd_inner_microstep: 1286.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-11 01:31:30,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1479.24 | bwd_inner_microstep: 1479.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 01:31:32,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1376.10 | bwd_inner_microstep: 1376.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-11 01:31:34,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 1372.87 | bwd_inner_microstep: 1372.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-11 01:31:36,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1496.34 | bwd_inner_microstep: 1496.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-11 01:31:38,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.72 | bwd_microstep: 1431.72 | bwd_inner_microstep: 1431.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1207
[2024-06-11 01:31:40,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.05 | optimizer_step: 6.59
[2024-06-11 01:31:40,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 173.29 | bwd_microstep: 2201.44 | bwd_inner_microstep: 515.60 | bwd_allreduce_microstep: 1685.78 | step_microstep: 37.86
[2024-06-11 01:31:40,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16391.58 | bwd: 45607.16 | bwd_inner: 43920.47 | bwd_allreduce: 1686.01 | step: 39.34
{'loss': 1.1599, 'learning_rate': 2.8896045910520663e-06, 'epoch': 0.83}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2657
[2024-06-11 01:31:42,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.51 | bwd_microstep: 1102.86 | bwd_inner_microstep: 1102.79 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3941
[2024-06-11 01:31:44,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.97 | bwd_microstep: 1543.98 | bwd_inner_microstep: 1543.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 01:31:46,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.51 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-11 01:31:48,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.94 | bwd_microstep: 1482.50 | bwd_inner_microstep: 1482.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 01:31:49,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.13 | bwd_microstep: 1283.65 | bwd_inner_microstep: 1283.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 01:31:51,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1387.30 | bwd_inner_microstep: 1387.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 01:31:53,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1249.74 | bwd_inner_microstep: 1249.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 01:31:55,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1250.28 | bwd_inner_microstep: 1250.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-11 01:31:56,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.18 | bwd_microstep: 1191.61 | bwd_inner_microstep: 1191.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 01:31:58,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.74 | bwd_microstep: 1379.01 | bwd_inner_microstep: 1378.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3657
[2024-06-11 01:32:00,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1544.74 | bwd_inner_microstep: 1544.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-11 01:32:03,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.85 | bwd_microstep: 1577.50 | bwd_inner_microstep: 1577.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-11 01:32:04,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.26 | bwd_microstep: 1248.15 | bwd_inner_microstep: 1248.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2164
[2024-06-11 01:32:06,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.23 | bwd_microstep: 884.45 | bwd_inner_microstep: 884.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 01:32:08,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.65 | bwd_microstep: 1380.83 | bwd_inner_microstep: 1380.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829
[2024-06-11 01:32:10,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.90 | bwd_microstep: 1487.77 | bwd_inner_microstep: 1487.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-11 01:32:11,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.86 | bwd_microstep: 1290.81 | bwd_inner_microstep: 1290.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3532
[2024-06-11 01:32:13,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.12 | bwd_microstep: 1197.86 | bwd_inner_microstep: 1197.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-11 01:32:15,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1297.27 | bwd_inner_microstep: 1297.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-11 01:32:17,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1409.20 | bwd_inner_microstep: 1409.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-11 01:32:18,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.78 | bwd_microstep: 1159.26 | bwd_inner_microstep: 1159.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-11 01:32:20,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1397.31 | bwd_inner_microstep: 1397.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941
[2024-06-11 01:32:21,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.99 | bwd_microstep: 702.12 | bwd_inner_microstep: 702.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-11 01:32:23,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.57 | bwd_microstep: 1606.52 | bwd_inner_microstep: 1606.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3437
[2024-06-11 01:32:25,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.53 | bwd_microstep: 1215.33 | bwd_inner_microstep: 1215.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3562
[2024-06-11 01:32:27,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1360.16 | bwd_inner_microstep: 1360.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3609
[2024-06-11 01:32:29,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.91 | bwd_microstep: 1537.75 | bwd_inner_microstep: 1537.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3558
[2024-06-11 01:32:31,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.12 | bwd_microstep: 1358.64 | bwd_inner_microstep: 1358.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379
[2024-06-11 01:32:33,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 1337.95 | bwd_inner_microstep: 1337.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2682
[2024-06-11 01:32:34,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.28 | bwd_microstep: 1108.04 | bwd_inner_microstep: 1108.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3811
[2024-06-11 01:32:37,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.53 | bwd_microstep: 1615.42 | bwd_inner_microstep: 1615.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3025
[2024-06-11 01:32:41,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-11 01:32:41,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.98 | bwd_microstep: 4266.36 | bwd_inner_microstep: 1389.58 | bwd_allreduce_microstep: 2876.73 | step_microstep: 37.97
[2024-06-11 01:32:41,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15860.67 | bwd: 45233.09 | bwd_inner: 42355.41 | bwd_allreduce: 2876.98 | step: 39.51
{'loss': 1.1832, 'learning_rate': 2.870200768687603e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448
[2024-06-11 01:32:43,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.75 | bwd_microstep: 1308.62 | bwd_inner_microstep: 1308.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-11 01:32:45,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.61 | bwd_microstep: 1152.73 | bwd_inner_microstep: 1152.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837
[2024-06-11 01:32:47,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1451.83 | bwd_inner_microstep: 1451.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-11 01:32:48,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.21 | bwd_microstep: 790.79 | bwd_inner_microstep: 790.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 01:32:50,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1246.83 | bwd_inner_microstep: 1246.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 01:32:51,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1284.05 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-11 01:32:54,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.31 | bwd_microstep: 1627.67 | bwd_inner_microstep: 1627.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 01:32:56,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.06 | bwd_microstep: 1343.64 | bwd_inner_microstep: 1343.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2116
[2024-06-11 01:32:57,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.29 | bwd_microstep: 862.12 | bwd_inner_microstep: 862.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3491
[2024-06-11 01:32:59,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.08 | bwd_microstep: 1444.03 | bwd_inner_microstep: 1444.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3496
[2024-06-11 01:33:01,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.95 | bwd_microstep: 1678.16 | bwd_inner_microstep: 1678.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 01:33:03,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1380.21 | bwd_inner_microstep: 1380.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681
[2024-06-11 01:33:05,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.71 | bwd_microstep: 1421.25 | bwd_inner_microstep: 1421.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1953
[2024-06-11 01:33:06,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.66 | bwd_microstep: 893.70 | bwd_inner_microstep: 893.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667
[2024-06-11 01:33:08,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.41 | bwd_microstep: 1427.10 | bwd_inner_microstep: 1427.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3507
[2024-06-11 01:33:10,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.79 | bwd_microstep: 1551.28 | bwd_inner_microstep: 1551.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-11 01:33:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1511.43 | bwd_inner_microstep: 1511.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3826
[2024-06-11 01:33:14,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.07 | bwd_microstep: 1519.68 | bwd_inner_microstep: 1519.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-11 01:33:16,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1397.51 | bwd_inner_microstep: 1397.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 01:33:19,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.09 | bwd_microstep: 1556.79 | bwd_inner_microstep: 1556.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3798
[2024-06-11 01:33:21,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.58 | bwd_microstep: 1749.18 | bwd_inner_microstep: 1749.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836
[2024-06-11 01:33:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.47 | bwd_microstep: 1768.86 | bwd_inner_microstep: 1768.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3833
[2024-06-11 01:33:26,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.27 | bwd_microstep: 1621.40 | bwd_inner_microstep: 1621.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452
[2024-06-11 01:33:27,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.39 | bwd_microstep: 1191.06 | bwd_inner_microstep: 1191.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2185
[2024-06-11 01:33:29,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.44 | bwd_microstep: 1051.77 | bwd_inner_microstep: 1051.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-11 01:33:31,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1349.83 | bwd_inner_microstep: 1349.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-11 01:33:32,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.93 | bwd_microstep: 1256.11 | bwd_inner_microstep: 1256.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3549
[2024-06-11 01:33:34,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1452.58 | bwd_inner_microstep: 1452.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 01:33:36,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1560.59 | bwd_inner_microstep: 1560.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4000
[2024-06-11 01:33:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.95 | bwd_microstep: 1710.41 | bwd_inner_microstep: 1710.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3583
[2024-06-11 01:33:41,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.62 | bwd_microstep: 1803.54 | bwd_inner_microstep: 1803.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2813
[2024-06-11 01:33:57,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.09 | optimizer_step: 6.61
[2024-06-11 01:33:57,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.09 | bwd_microstep: 14775.17 | bwd_inner_microstep: 1383.39 | bwd_allreduce_microstep: 13391.72 | step_microstep: 38.06
[2024-06-11 01:33:57,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16598.52 | bwd: 58139.92 | bwd_inner: 44747.29 | bwd_allreduce: 13391.95 | step: 39.55
{'loss': 1.1188, 'learning_rate': 2.850857277386978e-06, 'epoch': 0.83}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3626
[2024-06-11 01:33:58,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.36 | bwd_microstep: 1264.63 | bwd_inner_microstep: 1264.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4565
[2024-06-11 01:34:01,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 682.61 | bwd_microstep: 1843.66 | bwd_inner_microstep: 1843.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3970
[2024-06-11 01:34:03,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.91 | bwd_microstep: 1693.09 | bwd_inner_microstep: 1693.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3876
[2024-06-11 01:34:05,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1475.82 | bwd_inner_microstep: 1475.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-11 01:34:07,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.02 | bwd_microstep: 1147.97 | bwd_inner_microstep: 1147.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-11 01:34:09,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.78 | bwd_microstep: 1552.32 | bwd_inner_microstep: 1552.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 01:34:11,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.89 | bwd_microstep: 1247.72 | bwd_inner_microstep: 1247.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-11 01:34:13,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.11 | bwd_microstep: 1642.06 | bwd_inner_microstep: 1642.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2516
[2024-06-11 01:34:14,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.51 | bwd_microstep: 961.58 | bwd_inner_microstep: 961.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 01:34:16,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.65 | bwd_microstep: 1280.56 | bwd_inner_microstep: 1280.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-11 01:34:18,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1254.68 | bwd_inner_microstep: 1254.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673
[2024-06-11 01:34:20,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.10 | bwd_microstep: 1327.26 | bwd_inner_microstep: 1327.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-11 01:34:21,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1345.09 | bwd_inner_microstep: 1345.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 01:34:23,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1387.24 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3631
[2024-06-11 01:34:25,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.03 | bwd_microstep: 1451.72 | bwd_inner_microstep: 1451.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-11 01:34:27,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.14 | bwd_microstep: 1144.80 | bwd_inner_microstep: 1144.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-11 01:34:29,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1515.21 | bwd_inner_microstep: 1515.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-11 01:34:31,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1348.69 | bwd_inner_microstep: 1348.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-11 01:34:33,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1397.82 | bwd_inner_microstep: 1397.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473
[2024-06-11 01:34:35,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.99 | bwd_microstep: 1344.07 | bwd_inner_microstep: 1344.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-11 01:34:37,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.23 | bwd_microstep: 1449.49 | bwd_inner_microstep: 1449.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816
[2024-06-11 01:34:39,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.32 | bwd_microstep: 1754.61 | bwd_inner_microstep: 1754.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-11 01:34:41,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.17 | bwd_microstep: 1499.30 | bwd_inner_microstep: 1499.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-11 01:34:43,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.17 | bwd_microstep: 1319.02 | bwd_inner_microstep: 1318.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-11 01:34:45,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1359.79 | bwd_inner_microstep: 1359.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-11 01:34:47,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.48 | bwd_microstep: 1187.85 | bwd_inner_microstep: 1187.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997
[2024-06-11 01:34:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.38 | bwd_microstep: 801.08 | bwd_inner_microstep: 801.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2269
[2024-06-11 01:34:49,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.80 | bwd_microstep: 811.15 | bwd_inner_microstep: 811.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-11 01:34:51,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1637.43 | bwd_inner_microstep: 1637.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 01:34:53,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.46 | bwd_microstep: 1377.59 | bwd_inner_microstep: 1377.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3817
[2024-06-11 01:34:55,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.24 | bwd_microstep: 1721.18 | bwd_inner_microstep: 1721.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3799
[2024-06-11 01:35:01,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.07 | optimizer_step: 6.61
[2024-06-11 01:35:01,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 4795.44 | bwd_inner_microstep: 1734.56 | bwd_allreduce_microstep: 3060.82 | step_microstep: 37.94
[2024-06-11 01:35:01,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16440.91 | bwd: 47339.91 | bwd_inner: 44278.19 | bwd_allreduce: 3061.04 | step: 39.37
{'loss': 1.1472, 'learning_rate': 2.831574185277883e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2926
[2024-06-11 01:35:02,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.24 | bwd_microstep: 1181.75 | bwd_inner_microstep: 1181.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3448
[2024-06-11 01:35:04,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.37 | bwd_microstep: 1216.32 | bwd_inner_microstep: 1216.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3402
[2024-06-11 01:35:06,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.82 | bwd_microstep: 1306.48 | bwd_inner_microstep: 1306.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-11 01:35:08,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.50 | bwd_microstep: 1646.92 | bwd_inner_microstep: 1646.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-11 01:35:10,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.11 | bwd_microstep: 1536.11 | bwd_inner_microstep: 1536.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-11 01:35:12,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1250.58 | bwd_inner_microstep: 1250.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-11 01:35:14,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1250.90 | bwd_inner_microstep: 1250.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3733
[2024-06-11 01:35:16,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.95 | bwd_microstep: 1396.49 | bwd_inner_microstep: 1396.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-11 01:35:16,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.74 | bwd_microstep: 685.27 | bwd_inner_microstep: 685.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651
[2024-06-11 01:35:18,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.27 | bwd_microstep: 1425.38 | bwd_inner_microstep: 1425.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 01:35:20,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1389.81 | bwd_inner_microstep: 1389.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 01:35:22,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.30 | bwd_microstep: 1256.32 | bwd_inner_microstep: 1256.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-11 01:35:24,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.90 | bwd_microstep: 1291.01 | bwd_inner_microstep: 1290.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-11 01:35:26,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.37 | bwd_microstep: 1612.14 | bwd_inner_microstep: 1612.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3631
[2024-06-11 01:35:28,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.65 | bwd_microstep: 1458.90 | bwd_inner_microstep: 1458.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-11 01:35:30,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.62 | bwd_microstep: 1301.42 | bwd_inner_microstep: 1301.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824
[2024-06-11 01:35:32,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.14 | bwd_microstep: 1356.21 | bwd_inner_microstep: 1356.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948
[2024-06-11 01:35:33,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.52 | bwd_microstep: 728.63 | bwd_inner_microstep: 728.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 01:35:35,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1255.54 | bwd_inner_microstep: 1255.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 01:35:37,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.88 | bwd_microstep: 1549.78 | bwd_inner_microstep: 1549.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3641
[2024-06-11 01:35:38,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.77 | bwd_microstep: 1249.61 | bwd_inner_microstep: 1249.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3540
[2024-06-11 01:35:40,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1424.09 | bwd_inner_microstep: 1424.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3806
[2024-06-11 01:35:42,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.93 | bwd_microstep: 1288.51 | bwd_inner_microstep: 1288.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-11 01:35:44,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1506.08 | bwd_inner_microstep: 1506.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3811
[2024-06-11 01:35:46,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.28 | bwd_microstep: 1581.35 | bwd_inner_microstep: 1581.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-11 01:35:48,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1390.36 | bwd_inner_microstep: 1390.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-11 01:35:50,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.75 | bwd_microstep: 1404.41 | bwd_inner_microstep: 1404.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570
[2024-06-11 01:35:52,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.93 | bwd_microstep: 1489.67 | bwd_inner_microstep: 1489.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583
[2024-06-11 01:35:54,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.91 | bwd_microstep: 1524.34 | bwd_inner_microstep: 1524.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021
[2024-06-11 01:35:56,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.44 | bwd_microstep: 809.35 | bwd_inner_microstep: 809.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3481
[2024-06-11 01:35:58,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.48 | bwd_microstep: 1426.92 | bwd_inner_microstep: 1426.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3776
[2024-06-11 01:36:01,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.04 | optimizer_step: 6.61
[2024-06-11 01:36:01,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.70 | bwd_microstep: 3234.92 | bwd_inner_microstep: 1975.84 | bwd_allreduce_microstep: 1259.03 | step_microstep: 37.59
[2024-06-11 01:36:01,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16095.28 | bwd: 44425.62 | bwd_inner: 43165.70 | bwd_allreduce: 1259.25 | step: 39.13


 83%|████████▎ | 1435/1726 [24:53:14<5:01:42, 62.21s/it]
 83%|████████▎ | 1436/1726 [24:54:17<5:00:50, 62.24s/it]
 83%|████████▎ | 1437/1726 [24:55:18<4:58:37, 62.00s/it]
 83%|████████▎ | 1438/1726 [24:56:33<5:16:25, 65.92s/it]
 83%|████████▎ | 1439/1726 [24:57:37<5:12:43, 65.38s/it]
 83%|████████▎ | 1440/1726 [24:58:38<5:05:10, 64.02s/it]
{'loss': 1.2137, 'learning_rate': 2.812351560275246e-06, 'epoch': 0.83}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 01:36:04,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1491.99 | bwd_inner_microstep: 1491.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3991
[2024-06-11 01:36:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.65 | bwd_microstep: 1701.77 | bwd_inner_microstep: 1701.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3888
[2024-06-11 01:36:08,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.14 | bwd_microstep: 1681.26 | bwd_inner_microstep: 1681.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 01:36:10,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.96 | bwd_microstep: 1274.90 | bwd_inner_microstep: 1274.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486
[2024-06-11 01:36:12,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.42 | bwd_microstep: 1348.64 | bwd_inner_microstep: 1348.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-11 01:36:14,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1482.37 | bwd_inner_microstep: 1482.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-11 01:36:16,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.93 | bwd_microstep: 1429.28 | bwd_inner_microstep: 1429.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-11 01:36:17,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.71 | bwd_microstep: 792.51 | bwd_inner_microstep: 792.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 01:36:19,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1380.77 | bwd_inner_microstep: 1380.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3527
[2024-06-11 01:36:21,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.48 | bwd_microstep: 1541.43 | bwd_inner_microstep: 1541.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430
[2024-06-11 01:36:23,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.30 | bwd_microstep: 1451.77 | bwd_inner_microstep: 1451.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474
[2024-06-11 01:36:25,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.11 | bwd_microstep: 1577.66 | bwd_inner_microstep: 1577.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-11 01:36:26,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.74 | bwd_microstep: 679.88 | bwd_inner_microstep: 679.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583
[2024-06-11 01:36:28,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.07 | bwd_microstep: 1238.68 | bwd_inner_microstep: 1238.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-11 01:36:29,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.00 | bwd_microstep: 796.86 | bwd_inner_microstep: 796.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1984
[2024-06-11 01:36:30,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.99 | bwd_microstep: 736.45 | bwd_inner_microstep: 736.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676
[2024-06-11 01:36:32,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.73 | bwd_microstep: 1624.84 | bwd_inner_microstep: 1624.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 01:36:34,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.19 | bwd_microstep: 1283.33 | bwd_inner_microstep: 1283.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-11 01:36:36,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1293.60 | bwd_inner_microstep: 1293.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2107
[2024-06-11 01:36:37,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.64 | bwd_microstep: 923.34 | bwd_inner_microstep: 923.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1984
[2024-06-11 01:36:38,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.03 | bwd_microstep: 706.80 | bwd_inner_microstep: 706.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637
[2024-06-11 01:36:40,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1515.98 | bwd_inner_microstep: 1515.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-11 01:36:42,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.40 | bwd_microstep: 1541.12 | bwd_inner_microstep: 1541.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1998
[2024-06-11 01:36:43,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.93 | bwd_microstep: 862.47 | bwd_inner_microstep: 862.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 01:36:45,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1393.07 | bwd_inner_microstep: 1393.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-11 01:36:47,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.43 | bwd_microstep: 1439.61 | bwd_inner_microstep: 1439.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3383
[2024-06-11 01:36:49,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.08 | bwd_microstep: 1274.46 | bwd_inner_microstep: 1274.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-11 01:36:51,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.39 | bwd_microstep: 1476.84 | bwd_inner_microstep: 1476.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3601
[2024-06-11 01:36:53,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.39 | bwd_microstep: 1702.56 | bwd_inner_microstep: 1702.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3622
[2024-06-11 01:36:55,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1376.22 | bwd_inner_microstep: 1376.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3669
[2024-06-11 01:36:57,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1479.76 | bwd_inner_microstep: 1479.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1194
[2024-06-11 01:37:02,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.06 | optimizer_step: 6.61
[2024-06-11 01:37:02,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 171.63 | bwd_microstep: 3947.93 | bwd_inner_microstep: 512.75 | bwd_allreduce_microstep: 3435.12 | step_microstep: 37.82
[2024-06-11 01:37:02,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.38 | bwd: 44448.17 | bwd_inner: 41012.14 | bwd_allreduce: 3435.35 | step: 39.27
{'loss': 1.19, 'learning_rate': 2.7931894700810703e-06, 'epoch': 0.83}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3466
[2024-06-11 01:37:03,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.65 | bwd_microstep: 1336.03 | bwd_inner_microstep: 1336.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3902
[2024-06-11 01:37:06,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.29 | bwd_microstep: 1515.66 | bwd_inner_microstep: 1515.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 01:37:07,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.20 | bwd_microstep: 1273.81 | bwd_inner_microstep: 1273.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3991
[2024-06-11 01:37:10,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.02 | bwd_microstep: 1702.47 | bwd_inner_microstep: 1702.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-11 01:37:11,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.06 | bwd_microstep: 794.34 | bwd_inner_microstep: 794.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3736
[2024-06-11 01:37:13,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.48 | bwd_microstep: 1429.78 | bwd_inner_microstep: 1429.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 01:37:15,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1282.04 | bwd_inner_microstep: 1282.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3497
[2024-06-11 01:37:16,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.23 | bwd_microstep: 1219.49 | bwd_inner_microstep: 1219.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-11 01:37:17,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 794.00 | bwd_inner_microstep: 793.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 01:37:19,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667
[2024-06-11 01:37:21,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.23 | bwd_microstep: 1323.03 | bwd_inner_microstep: 1323.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 01:37:23,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.19 | bwd_microstep: 1482.90 | bwd_inner_microstep: 1482.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2659
[2024-06-11 01:37:24,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.35 | bwd_microstep: 1120.37 | bwd_inner_microstep: 1120.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-11 01:37:26,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1473.49 | bwd_inner_microstep: 1473.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3637
[2024-06-11 01:37:29,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.34 | bwd_microstep: 1707.06 | bwd_inner_microstep: 1707.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-11 01:37:31,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.88 | bwd_microstep: 1484.39 | bwd_inner_microstep: 1484.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-11 01:37:33,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.27 | bwd_microstep: 1512.56 | bwd_inner_microstep: 1512.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2097
[2024-06-11 01:37:34,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.11 | bwd_microstep: 917.56 | bwd_inner_microstep: 917.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3430
[2024-06-11 01:37:36,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.98 | bwd_microstep: 1392.60 | bwd_inner_microstep: 1392.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1928
[2024-06-11 01:37:37,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.86 | bwd_microstep: 760.16 | bwd_inner_microstep: 760.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 01:37:39,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1376.19 | bwd_inner_microstep: 1376.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-11 01:37:41,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1412.62 | bwd_inner_microstep: 1412.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-11 01:37:43,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.84 | bwd_microstep: 1186.79 | bwd_inner_microstep: 1186.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671
[2024-06-11 01:37:45,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.29 | bwd_microstep: 1429.89 | bwd_inner_microstep: 1429.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3522
[2024-06-11 01:37:47,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.32 | bwd_microstep: 1555.62 | bwd_inner_microstep: 1555.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582
[2024-06-11 01:37:49,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.40 | bwd_microstep: 1305.56 | bwd_inner_microstep: 1305.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428
[2024-06-11 01:37:50,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.03 | bwd_microstep: 1282.47 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-11 01:37:52,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.01 | bwd_microstep: 1181.73 | bwd_inner_microstep: 1181.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-11 01:37:54,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.19 | bwd_microstep: 1457.25 | bwd_inner_microstep: 1457.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-11 01:37:56,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.57 | bwd_microstep: 1458.28 | bwd_inner_microstep: 1458.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3817
[2024-06-11 01:37:59,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.58 | bwd_microstep: 1860.13 | bwd_inner_microstep: 1860.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442
[2024-06-11 01:38:03,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.07 | optimizer_step: 6.63
[2024-06-11 01:38:03,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 4102.73 | bwd_inner_microstep: 1641.60 | bwd_allreduce_microstep: 2461.08 | step_microstep: 37.75
[2024-06-11 01:38:03,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16004.12 | bwd: 45377.29 | bwd_inner: 42915.31 | bwd_allreduce: 2461.31 | step: 39.18
{'loss': 1.2245, 'learning_rate': 2.774087982184124e-06, 'epoch': 0.84}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507
[2024-06-11 01:38:05,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1402.49 | bwd_inner_microstep: 1402.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 4040
[2024-06-11 01:38:07,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.66 | bwd_microstep: 1355.07 | bwd_inner_microstep: 1355.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 01:38:09,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.42 | bwd_microstep: 1241.39 | bwd_inner_microstep: 1241.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 01:38:11,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.98 | bwd_microstep: 1374.25 | bwd_inner_microstep: 1374.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798
[2024-06-11 01:38:13,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.49 | bwd_microstep: 1452.18 | bwd_inner_microstep: 1452.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3782
[2024-06-11 01:38:15,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.90 | bwd_microstep: 1477.99 | bwd_inner_microstep: 1477.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3763
[2024-06-11 01:38:17,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.70 | bwd_microstep: 1606.94 | bwd_inner_microstep: 1606.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-11 01:38:19,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.53 | bwd_microstep: 1149.63 | bwd_inner_microstep: 1149.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 01:38:20,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.67 | bwd_microstep: 1283.76 | bwd_inner_microstep: 1283.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2402
[2024-06-11 01:38:22,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.07 | bwd_microstep: 839.86 | bwd_inner_microstep: 839.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 01:38:23,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.14 | bwd_microstep: 1282.42 | bwd_inner_microstep: 1282.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-11 01:38:25,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1281.13 | bwd_inner_microstep: 1281.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3592
[2024-06-11 01:38:27,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.39 | bwd_microstep: 1212.86 | bwd_inner_microstep: 1212.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3696
[2024-06-11 01:38:29,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 1358.73 | bwd_inner_microstep: 1358.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 01:38:31,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1383.55 | bwd_inner_microstep: 1383.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3501
[2024-06-11 01:38:33,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.80 | bwd_microstep: 1680.25 | bwd_inner_microstep: 1680.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-11 01:38:35,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1486.16 | bwd_inner_microstep: 1486.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-11 01:38:37,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1472.53 | bwd_inner_microstep: 1472.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-11 01:38:39,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.06 | bwd_microstep: 1352.25 | bwd_inner_microstep: 1352.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955
[2024-06-11 01:38:40,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.95 | bwd_microstep: 823.85 | bwd_inner_microstep: 823.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-11 01:38:42,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.79 | bwd_microstep: 1556.23 | bwd_inner_microstep: 1556.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-11 01:38:44,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1392.55 | bwd_inner_microstep: 1392.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-11 01:38:46,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.98 | bwd_microstep: 1517.83 | bwd_inner_microstep: 1517.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 01:38:48,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1258.73 | bwd_inner_microstep: 1258.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 01:38:50,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1397.04 | bwd_inner_microstep: 1397.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490
[2024-06-11 01:38:52,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.28 | bwd_microstep: 1316.63 | bwd_inner_microstep: 1316.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947
[2024-06-11 01:38:53,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.39 | bwd_microstep: 729.26 | bwd_inner_microstep: 729.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-11 01:38:55,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.00 | bwd_microstep: 1410.51 | bwd_inner_microstep: 1410.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 01:38:57,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.32 | bwd_microstep: 1543.88 | bwd_inner_microstep: 1543.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-11 01:38:59,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1550.70 | bwd_inner_microstep: 1550.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 01:39:01,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.56 | bwd_microstep: 1482.25 | bwd_inner_microstep: 1482.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397
[2024-06-11 01:39:04,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.01 | optimizer_step: 6.60
[2024-06-11 01:39:04,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.81 | bwd_microstep: 2330.01 | bwd_inner_microstep: 1591.52 | bwd_allreduce_microstep: 738.44 | step_microstep: 37.53
[2024-06-11 01:39:04,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16142.13 | bwd: 44002.92 | bwd_inner: 43263.58 | bwd_allreduce: 738.67 | step: 38.99
{'loss': 1.1462, 'learning_rate': 2.755047163859763e-06, 'epoch': 0.84}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-11 01:39:06,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921
[2024-06-11 01:39:07,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.29 | bwd_microstep: 786.14 | bwd_inner_microstep: 786.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2369
[2024-06-11 01:39:08,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.41 | bwd_microstep: 995.76 | bwd_inner_microstep: 995.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-11 01:39:10,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.60 | bwd_microstep: 1531.16 | bwd_inner_microstep: 1531.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-11 01:39:11,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.54 | bwd_microstep: 789.99 | bwd_inner_microstep: 789.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-11 01:39:13,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.00 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 01:39:15,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1283.20 | bwd_inner_microstep: 1283.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2689
[2024-06-11 01:39:16,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.52 | bwd_microstep: 1061.99 | bwd_inner_microstep: 1061.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3486
[2024-06-11 01:39:19,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.88 | bwd_microstep: 1509.45 | bwd_inner_microstep: 1509.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3399
[2024-06-11 01:39:20,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.12 | bwd_microstep: 1207.44 | bwd_inner_microstep: 1207.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-11 01:39:22,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.49 | bwd_microstep: 1624.25 | bwd_inner_microstep: 1624.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-11 01:39:25,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.09 | bwd_microstep: 1484.39 | bwd_inner_microstep: 1484.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 01:39:27,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1559.19 | bwd_inner_microstep: 1559.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3503
[2024-06-11 01:39:29,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1445.72 | bwd_inner_microstep: 1445.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 01:39:30,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.65 | bwd_microstep: 1284.98 | bwd_inner_microstep: 1284.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-11 01:39:32,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.60 | bwd_microstep: 1393.09 | bwd_inner_microstep: 1393.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-11 01:39:34,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1510.82 | bwd_inner_microstep: 1510.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494
[2024-06-11 01:39:36,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1413.03 | bwd_inner_microstep: 1413.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091
[2024-06-11 01:39:38,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.54 | bwd_microstep: 821.95 | bwd_inner_microstep: 821.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 01:39:40,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1556.64 | bwd_inner_microstep: 1556.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069
[2024-06-11 01:39:41,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.11 | bwd_microstep: 819.55 | bwd_inner_microstep: 819.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2081
[2024-06-11 01:39:42,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.12 | bwd_microstep: 881.48 | bwd_inner_microstep: 881.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-11 01:39:44,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.94 | bwd_microstep: 1391.39 | bwd_inner_microstep: 1391.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2115
[2024-06-11 01:39:45,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.10 | bwd_microstep: 925.46 | bwd_inner_microstep: 925.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428
[2024-06-11 01:39:47,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1284.88 | bwd_inner_microstep: 1284.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3817
[2024-06-11 01:39:49,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.30 | bwd_microstep: 1511.42 | bwd_inner_microstep: 1511.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3564
[2024-06-11 01:39:51,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.59 | bwd_microstep: 1524.68 | bwd_inner_microstep: 1524.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-11 01:39:53,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1495.20 | bwd_inner_microstep: 1495.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3547
[2024-06-11 01:39:55,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.19 | bwd_microstep: 1523.41 | bwd_inner_microstep: 1523.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-11 01:39:57,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1396.43 | bwd_inner_microstep: 1396.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3456
[2024-06-11 01:39:59,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1400.00 | bwd_inner_microstep: 1399.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3579
[2024-06-11 01:40:05,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.61
[2024-06-11 01:40:05,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.31 | bwd_microstep: 4974.82 | bwd_inner_microstep: 1605.11 | bwd_allreduce_microstep: 3369.65 | step_microstep: 38.64
[2024-06-11 01:40:05,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15556.16 | bwd: 45123.57 | bwd_inner: 41753.00 | bwd_allreduce: 3369.89 | step: 40.17
{'loss': 1.1567, 'learning_rate': 2.7360670821696422e-06, 'epoch': 0.84}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-11 01:40:06,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.79 | bwd_microstep: 1141.86 | bwd_inner_microstep: 1141.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3855
[2024-06-11 01:40:09,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.76 | bwd_microstep: 1655.48 | bwd_inner_microstep: 1655.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4226
[2024-06-11 01:40:11,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.45 | bwd_microstep: 1564.80 | bwd_inner_microstep: 1564.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 01:40:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 1341.04 | bwd_inner_microstep: 1341.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 01:40:15,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1374.72 | bwd_inner_microstep: 1374.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-11 01:40:17,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.40 | bwd_microstep: 1408.84 | bwd_inner_microstep: 1408.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-11 01:40:18,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1252.15 | bwd_inner_microstep: 1252.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-11 01:40:19,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.50 | bwd_microstep: 699.06 | bwd_inner_microstep: 699.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 01:40:21,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1255.66 | bwd_inner_microstep: 1255.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-11 01:40:23,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-11 01:40:25,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.73 | bwd_microstep: 1278.46 | bwd_inner_microstep: 1278.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3650
[2024-06-11 01:40:27,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.71 | bwd_microstep: 1666.25 | bwd_inner_microstep: 1666.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-11 01:40:29,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.20 | bwd_microstep: 1617.82 | bwd_inner_microstep: 1617.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497
[2024-06-11 01:40:31,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.19 | bwd_microstep: 1550.45 | bwd_inner_microstep: 1550.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-11 01:40:32,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.87 | bwd_microstep: 788.21 | bwd_inner_microstep: 788.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1967
[2024-06-11 01:40:33,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.37 | bwd_microstep: 824.45 | bwd_inner_microstep: 824.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-11 01:40:36,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1558.26 | bwd_inner_microstep: 1558.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2437
[2024-06-11 01:40:37,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.23 | bwd_microstep: 995.11 | bwd_inner_microstep: 995.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3531
[2024-06-11 01:40:39,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.51 | bwd_microstep: 1453.70 | bwd_inner_microstep: 1453.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 01:40:41,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.30 | bwd_microstep: 1259.95 | bwd_inner_microstep: 1259.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-11 01:40:43,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.28 | bwd_microstep: 1554.55 | bwd_inner_microstep: 1554.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 01:40:45,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1392.22 | bwd_inner_microstep: 1392.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 01:40:47,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.77 | bwd_microstep: 1375.05 | bwd_inner_microstep: 1375.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 01:40:49,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1558.70 | bwd_inner_microstep: 1558.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3526
[2024-06-11 01:40:51,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.41 | bwd_microstep: 1582.25 | bwd_inner_microstep: 1582.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2180
[2024-06-11 01:40:52,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.52 | bwd_microstep: 952.03 | bwd_inner_microstep: 952.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3571
[2024-06-11 01:40:54,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 1457.82 | bwd_inner_microstep: 1457.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3595
[2024-06-11 01:40:56,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.11 | bwd_microstep: 1373.14 | bwd_inner_microstep: 1373.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2722
[2024-06-11 01:40:58,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.39 | bwd_microstep: 1233.14 | bwd_inner_microstep: 1233.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3383
[2024-06-11 01:41:00,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.31 | bwd_microstep: 1242.34 | bwd_inner_microstep: 1242.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-11 01:41:02,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.80 | bwd_microstep: 1500.85 | bwd_inner_microstep: 1500.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 01:41:07,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.09 | optimizer_step: 6.59
[2024-06-11 01:41:07,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 5120.19 | bwd_inner_microstep: 1411.62 | bwd_allreduce_microstep: 3708.52 | step_microstep: 39.09
[2024-06-11 01:41:07,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15884.14 | bwd: 46315.53 | bwd_inner: 42606.11 | bwd_allreduce: 3708.75 | step: 40.54


 83%|████████▎ | 1440/1726 [24:58:38<5:05:10, 64.02s/it]
 83%|████████▎ | 1441/1726 [24:59:38<4:58:32, 62.85s/it]
 84%|████████▎ | 1442/1726 [25:00:40<4:55:52, 62.51s/it]
 84%|████████▎ | 1443/1726 [25:01:41<4:51:56, 61.90s/it]
 84%|████████▎ | 1444/1726 [25:02:42<4:49:39, 61.63s/it]
 84%|████████▎ | 1445/1726 [25:03:44<4:49:53, 61.90s/it]
{'loss': 1.1724, 'learning_rate': 2.717147803961511e-06, 'epoch': 0.84}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-11 01:41:09,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1383.49 | bwd_inner_microstep: 1383.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 01:41:11,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.29 | bwd_microstep: 1339.83 | bwd_inner_microstep: 1339.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775
[2024-06-11 01:41:13,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.40 | bwd_microstep: 1637.24 | bwd_inner_microstep: 1637.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-11 01:41:15,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1536.74 | bwd_inner_microstep: 1536.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-11 01:41:17,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1394.09 | bwd_inner_microstep: 1394.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3714
[2024-06-11 01:41:19,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1330.12 | bwd_inner_microstep: 1330.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3710
[2024-06-11 01:41:21,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.84 | bwd_microstep: 1456.88 | bwd_inner_microstep: 1456.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475
[2024-06-11 01:41:23,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1411.81 | bwd_inner_microstep: 1411.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1895
[2024-06-11 01:41:24,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.84 | bwd_microstep: 872.76 | bwd_inner_microstep: 872.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 01:41:26,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.64 | bwd_microstep: 1484.39 | bwd_inner_microstep: 1484.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3721
[2024-06-11 01:41:29,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.69 | bwd_microstep: 1728.65 | bwd_inner_microstep: 1728.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-11 01:41:31,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1340.97 | bwd_inner_microstep: 1340.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3404
[2024-06-11 01:41:33,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.81 | bwd_microstep: 1536.34 | bwd_inner_microstep: 1536.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3908
[2024-06-11 01:41:35,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.02 | bwd_microstep: 1543.07 | bwd_inner_microstep: 1543.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444
[2024-06-11 01:41:37,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.95 | bwd_microstep: 1218.68 | bwd_inner_microstep: 1218.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3644
[2024-06-11 01:41:39,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1418.72 | bwd_inner_microstep: 1418.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-11 01:41:41,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.98 | bwd_microstep: 1656.98 | bwd_inner_microstep: 1656.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 955
[2024-06-11 01:41:41,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.37 | bwd_microstep: 379.56 | bwd_inner_microstep: 379.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3698
[2024-06-11 01:41:43,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.89 | bwd_microstep: 1234.29 | bwd_inner_microstep: 1234.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-11 01:41:45,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1394.90 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-11 01:41:47,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1396.67 | bwd_inner_microstep: 1396.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-11 01:41:49,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1511.12 | bwd_inner_microstep: 1511.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3536
[2024-06-11 01:41:51,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.77 | bwd_microstep: 1232.67 | bwd_inner_microstep: 1232.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-11 01:41:53,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.32 | bwd_microstep: 1295.27 | bwd_inner_microstep: 1295.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-11 01:41:55,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.01 | bwd_microstep: 1581.85 | bwd_inner_microstep: 1581.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 01:41:57,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.93 | bwd_microstep: 1461.27 | bwd_inner_microstep: 1461.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1995
[2024-06-11 01:41:58,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.63 | bwd_microstep: 833.62 | bwd_inner_microstep: 833.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-11 01:42:00,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1514.67 | bwd_inner_microstep: 1514.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-11 01:42:02,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.89 | bwd_microstep: 1530.97 | bwd_inner_microstep: 1530.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3589
[2024-06-11 01:42:05,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.28 | bwd_microstep: 1805.04 | bwd_inner_microstep: 1805.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3746
[2024-06-11 01:42:07,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.83 | bwd_microstep: 1738.66 | bwd_inner_microstep: 1738.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3439
[2024-06-11 01:42:09,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.06 | optimizer_step: 6.60
[2024-06-11 01:42:09,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.34 | bwd_microstep: 1585.08 | bwd_inner_microstep: 1577.25 | bwd_allreduce_microstep: 7.78 | step_microstep: 37.68
[2024-06-11 01:42:09,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16681.29 | bwd: 44786.44 | bwd_inner: 44777.74 | bwd_allreduce: 8.00 | step: 39.16
{'loss': 1.2062, 'learning_rate': 2.698289395868965e-06, 'epoch': 0.84}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-11 01:42:11,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1472.42 | bwd_inner_microstep: 1472.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 01:42:13,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1384.31 | bwd_inner_microstep: 1384.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-11 01:42:15,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.07 | bwd_microstep: 1399.93 | bwd_inner_microstep: 1399.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867
[2024-06-11 01:42:17,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.95 | bwd_microstep: 1564.40 | bwd_inner_microstep: 1564.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796
[2024-06-11 01:42:19,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.41 | bwd_microstep: 1650.94 | bwd_inner_microstep: 1650.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 01:42:21,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1387.43 | bwd_inner_microstep: 1387.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-11 01:42:23,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1415.04 | bwd_inner_microstep: 1415.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4053
[2024-06-11 01:42:25,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.18 | bwd_microstep: 1526.32 | bwd_inner_microstep: 1526.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-11 01:42:27,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.29 | bwd_microstep: 1153.60 | bwd_inner_microstep: 1153.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 01:42:29,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.20 | bwd_microstep: 1383.74 | bwd_inner_microstep: 1383.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-11 01:42:31,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.44 | bwd_microstep: 1302.25 | bwd_inner_microstep: 1302.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 01:42:32,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1260.60 | bwd_inner_microstep: 1260.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3449
[2024-06-11 01:42:35,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.22 | bwd_microstep: 1479.91 | bwd_inner_microstep: 1479.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3730
[2024-06-11 01:42:37,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.47 | bwd_microstep: 1730.05 | bwd_inner_microstep: 1730.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 01:42:39,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1381.15 | bwd_inner_microstep: 1381.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-11 01:42:41,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.84 | bwd_microstep: 1493.58 | bwd_inner_microstep: 1493.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-11 01:42:43,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.82 | bwd_microstep: 1625.34 | bwd_inner_microstep: 1625.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-11 01:42:45,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.31 | bwd_microstep: 1289.36 | bwd_inner_microstep: 1289.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3653
[2024-06-11 01:42:47,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1325.71 | bwd_inner_microstep: 1325.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-11 01:42:49,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.80 | bwd_microstep: 1610.84 | bwd_inner_microstep: 1610.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667
[2024-06-11 01:42:51,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.15 | bwd_microstep: 1325.56 | bwd_inner_microstep: 1325.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 01:42:53,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1409.82 | bwd_inner_microstep: 1409.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-11 01:42:54,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.73 | bwd_microstep: 702.32 | bwd_inner_microstep: 702.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3732
[2024-06-11 01:42:56,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.32 | bwd_microstep: 1339.19 | bwd_inner_microstep: 1339.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2102
[2024-06-11 01:42:57,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.80 | bwd_microstep: 854.20 | bwd_inner_microstep: 854.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063
[2024-06-11 01:42:58,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.94 | bwd_microstep: 910.83 | bwd_inner_microstep: 910.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 01:43:00,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.15 | bwd_microstep: 1645.99 | bwd_inner_microstep: 1645.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3572
[2024-06-11 01:43:03,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.45 | bwd_microstep: 1664.08 | bwd_inner_microstep: 1664.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3450
[2024-06-11 01:43:04,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1401.14 | bwd_inner_microstep: 1401.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440
[2024-06-11 01:43:06,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.72 | bwd_microstep: 1452.76 | bwd_inner_microstep: 1452.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-11 01:43:08,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1344.32 | bwd_inner_microstep: 1344.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-11 01:43:14,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.14 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-11 01:43:14,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 4634.54 | bwd_inner_microstep: 1686.59 | bwd_allreduce_microstep: 2947.90 | step_microstep: 38.41
[2024-06-11 01:43:14,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16579.19 | bwd: 47521.71 | bwd_inner: 44572.90 | bwd_allreduce: 2948.13 | step: 39.89
{'loss': 1.2167, 'learning_rate': 2.679491924311226e-06, 'epoch': 0.84}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3414
[2024-06-11 01:43:15,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.27 | bwd_microstep: 1364.99 | bwd_inner_microstep: 1364.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 01:43:17,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.87 | bwd_microstep: 1243.01 | bwd_inner_microstep: 1242.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-11 01:43:19,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1480.70 | bwd_inner_microstep: 1480.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3854
[2024-06-11 01:43:22,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.26 | bwd_microstep: 1657.19 | bwd_inner_microstep: 1657.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759
[2024-06-11 01:43:23,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1435.60 | bwd_inner_microstep: 1435.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3406
[2024-06-11 01:43:25,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.92 | bwd_microstep: 1181.91 | bwd_inner_microstep: 1181.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3743
[2024-06-11 01:43:27,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.50 | bwd_microstep: 1635.23 | bwd_inner_microstep: 1635.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3491
[2024-06-11 01:43:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.51 | bwd_microstep: 1333.01 | bwd_inner_microstep: 1332.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-11 01:43:31,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.25 | bwd_microstep: 1394.87 | bwd_inner_microstep: 1394.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-11 01:43:33,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.78 | bwd_microstep: 1409.52 | bwd_inner_microstep: 1409.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683
[2024-06-11 01:43:35,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.03 | bwd_microstep: 1625.36 | bwd_inner_microstep: 1625.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869
[2024-06-11 01:43:36,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.07 | bwd_microstep: 707.98 | bwd_inner_microstep: 707.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3408
[2024-06-11 01:43:38,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.17 | bwd_microstep: 1291.57 | bwd_inner_microstep: 1291.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-11 01:43:40,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1490.01 | bwd_inner_microstep: 1489.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-11 01:43:42,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.97 | bwd_microstep: 1255.87 | bwd_inner_microstep: 1255.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 01:43:44,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1349.29 | bwd_inner_microstep: 1349.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3612
[2024-06-11 01:43:46,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.39 | bwd_microstep: 1534.47 | bwd_inner_microstep: 1534.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-11 01:43:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.55 | bwd_microstep: 695.83 | bwd_inner_microstep: 695.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-11 01:43:49,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1324.48 | bwd_inner_microstep: 1324.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-11 01:43:50,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1293.51 | bwd_inner_microstep: 1293.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-11 01:43:53,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.37 | bwd_microstep: 1580.32 | bwd_inner_microstep: 1580.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-11 01:43:55,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1499.36 | bwd_inner_microstep: 1499.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436
[2024-06-11 01:43:56,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.20 | bwd_microstep: 1286.68 | bwd_inner_microstep: 1286.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-11 01:43:59,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.35 | bwd_microstep: 1547.70 | bwd_inner_microstep: 1547.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3905
[2024-06-11 01:44:01,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.35 | bwd_microstep: 1696.79 | bwd_inner_microstep: 1696.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-11 01:44:03,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.95 | bwd_microstep: 1457.36 | bwd_inner_microstep: 1457.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-11 01:44:05,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.98 | bwd_microstep: 1157.32 | bwd_inner_microstep: 1157.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-11 01:44:07,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.71 | bwd_microstep: 1621.95 | bwd_inner_microstep: 1621.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 01:44:09,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 1556.23 | bwd_inner_microstep: 1556.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 01:44:11,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1497.92 | bwd_inner_microstep: 1497.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527
[2024-06-11 01:44:13,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1494.21 | bwd_inner_microstep: 1494.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571
[2024-06-11 01:44:15,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.39 | optimizer_gradients: 4.06 | optimizer_step: 6.65
[2024-06-11 01:44:15,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.38 | bwd_microstep: 1541.40 | bwd_inner_microstep: 1533.73 | bwd_allreduce_microstep: 7.63 | step_microstep: 39.03
[2024-06-11 01:44:15,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16684.41 | bwd: 44641.69 | bwd_inner: 44633.17 | bwd_allreduce: 7.85 | step: 40.54
{'loss': 1.1905, 'learning_rate': 2.6607554554928917e-06, 'epoch': 0.84}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-11 01:44:17,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1478.95 | bwd_inner_microstep: 1478.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3908
[2024-06-11 01:44:20,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.46 | bwd_microstep: 1691.23 | bwd_inner_microstep: 1691.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3872
[2024-06-11 01:44:22,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.57 | bwd_microstep: 1398.97 | bwd_inner_microstep: 1398.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-11 01:44:24,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.28 | bwd_microstep: 1560.70 | bwd_inner_microstep: 1560.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 01:44:26,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.90 | bwd_microstep: 1388.18 | bwd_inner_microstep: 1388.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-11 01:44:28,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1549.31 | bwd_inner_microstep: 1549.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-11 01:44:30,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.18 | bwd_microstep: 1385.94 | bwd_inner_microstep: 1385.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-11 01:44:32,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.66 | bwd_microstep: 1388.53 | bwd_inner_microstep: 1388.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 01:44:33,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.75 | bwd_microstep: 1251.33 | bwd_inner_microstep: 1251.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3509
[2024-06-11 01:44:35,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.98 | bwd_microstep: 1250.51 | bwd_inner_microstep: 1250.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3969
[2024-06-11 01:44:38,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.52 | bwd_microstep: 1809.62 | bwd_inner_microstep: 1809.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-11 01:44:39,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1408.43 | bwd_inner_microstep: 1408.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 949
[2024-06-11 01:44:40,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 157.52 | bwd_microstep: 412.33 | bwd_inner_microstep: 412.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3513
[2024-06-11 01:44:42,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1420.20 | bwd_inner_microstep: 1420.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-11 01:44:44,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.40 | bwd_microstep: 1522.78 | bwd_inner_microstep: 1522.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-11 01:44:45,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.08 | bwd_microstep: 790.63 | bwd_inner_microstep: 790.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3648
[2024-06-11 01:44:48,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.17 | bwd_microstep: 1815.13 | bwd_inner_microstep: 1815.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-11 01:44:50,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1489.28 | bwd_inner_microstep: 1489.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-11 01:44:52,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.70 | bwd_microstep: 1557.04 | bwd_inner_microstep: 1557.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-11 01:44:54,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.07 | bwd_microstep: 1495.67 | bwd_inner_microstep: 1495.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099
[2024-06-11 01:44:55,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.51 | bwd_microstep: 921.42 | bwd_inner_microstep: 921.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-11 01:44:57,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.86 | bwd_microstep: 1504.25 | bwd_inner_microstep: 1504.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999
[2024-06-11 01:44:58,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.00 | bwd_microstep: 709.61 | bwd_inner_microstep: 709.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030
[2024-06-11 01:44:59,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.07 | bwd_microstep: 808.72 | bwd_inner_microstep: 808.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-11 01:45:01,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.90 | bwd_microstep: 1452.95 | bwd_inner_microstep: 1452.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521
[2024-06-11 01:45:03,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1396.08 | bwd_inner_microstep: 1396.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 01:45:05,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.56 | bwd_microstep: 1382.32 | bwd_inner_microstep: 1382.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477
[2024-06-11 01:45:07,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.09 | bwd_microstep: 1477.34 | bwd_inner_microstep: 1477.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 01:45:09,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.83 | bwd_microstep: 1549.37 | bwd_inner_microstep: 1549.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 01:45:11,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.00 | bwd_microstep: 1403.47 | bwd_inner_microstep: 1403.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3563
[2024-06-11 01:45:14,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.15 | bwd_microstep: 1629.65 | bwd_inner_microstep: 1629.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-11 01:45:16,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.03 | optimizer_step: 6.58
[2024-06-11 01:45:16,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.12 | bwd_microstep: 1847.78 | bwd_inner_microstep: 1605.13 | bwd_allreduce_microstep: 242.60 | step_microstep: 37.54
[2024-06-11 01:45:16,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16367.28 | bwd: 44147.74 | bwd_inner: 43904.11 | bwd_allreduce: 242.89 | step: 39.07
{'loss': 1.193, 'learning_rate': 2.642080055403704e-06, 'epoch': 0.84}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469
[2024-06-11 01:45:18,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.74 | bwd_microstep: 1577.62 | bwd_inner_microstep: 1577.53 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-11 01:45:20,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1398.68 | bwd_inner_microstep: 1398.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4251
[2024-06-11 01:45:23,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.75 | bwd_microstep: 1766.90 | bwd_inner_microstep: 1766.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866
[2024-06-11 01:45:24,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.32 | bwd_microstep: 677.25 | bwd_inner_microstep: 677.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 01:45:25,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-11 01:45:27,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.04 | bwd_microstep: 1152.41 | bwd_inner_microstep: 1152.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3399
[2024-06-11 01:45:29,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1150.11 | bwd_inner_microstep: 1150.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 01:45:30,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.77 | bwd_microstep: 1285.70 | bwd_inner_microstep: 1285.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 01:45:32,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1247.42 | bwd_inner_microstep: 1247.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954
[2024-06-11 01:45:33,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.54 | bwd_microstep: 701.65 | bwd_inner_microstep: 701.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-11 01:45:35,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1412.54 | bwd_inner_microstep: 1412.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-11 01:45:37,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.96 | bwd_microstep: 1530.37 | bwd_inner_microstep: 1530.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659
[2024-06-11 01:45:39,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.98 | bwd_microstep: 1714.43 | bwd_inner_microstep: 1714.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3549
[2024-06-11 01:45:41,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.26 | bwd_microstep: 1230.89 | bwd_inner_microstep: 1230.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-11 01:45:43,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.86 | bwd_microstep: 1299.80 | bwd_inner_microstep: 1299.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3839
[2024-06-11 01:45:45,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.65 | bwd_microstep: 1360.56 | bwd_inner_microstep: 1360.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3529
[2024-06-11 01:45:47,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.96 | bwd_microstep: 1259.15 | bwd_inner_microstep: 1259.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628
[2024-06-11 01:45:49,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1440.81 | bwd_inner_microstep: 1440.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646
[2024-06-11 01:45:51,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1511.11 | bwd_inner_microstep: 1511.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3486
[2024-06-11 01:45:52,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.40 | bwd_microstep: 1248.65 | bwd_inner_microstep: 1248.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-11 01:45:54,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.49 | bwd_microstep: 975.51 | bwd_inner_microstep: 975.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-11 01:45:56,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.97 | bwd_microstep: 1520.83 | bwd_inner_microstep: 1520.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3431
[2024-06-11 01:45:57,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.72 | bwd_microstep: 1155.22 | bwd_inner_microstep: 1155.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533
[2024-06-11 01:45:59,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.05 | bwd_microstep: 1230.42 | bwd_inner_microstep: 1230.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-11 01:46:01,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.33 | bwd_microstep: 1183.35 | bwd_inner_microstep: 1183.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3600
[2024-06-11 01:46:03,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1539.67 | bwd_inner_microstep: 1539.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3651
[2024-06-11 01:46:05,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.79 | bwd_microstep: 1450.03 | bwd_inner_microstep: 1450.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-11 01:46:07,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1402.49 | bwd_inner_microstep: 1402.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-11 01:46:09,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1360.31 | bwd_inner_microstep: 1360.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3720
[2024-06-11 01:46:11,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1562.91 | bwd_inner_microstep: 1562.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-11 01:46:13,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.03 | bwd_microstep: 1298.03 | bwd_inner_microstep: 1298.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588
[2024-06-11 01:46:15,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.05 | optimizer_step: 6.64
[2024-06-11 01:46:15,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.04 | bwd_microstep: 1740.65 | bwd_inner_microstep: 1732.88 | bwd_allreduce_microstep: 7.72 | step_microstep: 37.70
[2024-06-11 01:46:15,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16044.55 | bwd: 42667.02 | bwd_inner: 42658.32 | bwd_allreduce: 8.00 | step: 39.32
 84%|████████▎ | 1445/1726 [25:03:44<4:49:53, 61.90s/it]
 84%|████████▍ | 1446/1726 [25:04:46<4:48:43, 61.87s/it]
 84%|████████▍ | 1447/1726 [25:05:50<4:51:17, 62.64s/it]
 84%|████████▍ | 1448/1726 [25:06:52<4:48:53, 62.35s/it]
 84%|████████▍ | 1449/1726 [25:07:53<4:45:46, 61.90s/it]
 84%|████████▍ | 1450/1726 [25:08:52<
{'loss': 1.1686, 'learning_rate': 2.623465789818327e-06, 'epoch': 0.84}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-11 01:46:17,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1242.79 | bwd_inner_microstep: 1242.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4434
[2024-06-11 01:46:19,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 675.11 | bwd_microstep: 1823.35 | bwd_inner_microstep: 1823.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 01:46:21,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.12 | bwd_microstep: 1480.64 | bwd_inner_microstep: 1480.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798
[2024-06-11 01:46:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1446.51 | bwd_inner_microstep: 1446.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 01:46:25,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.30 | bwd_microstep: 1282.96 | bwd_inner_microstep: 1282.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-11 01:46:27,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.03 | bwd_microstep: 1529.88 | bwd_inner_microstep: 1529.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 845
[2024-06-11 01:46:28,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.31 | bwd_microstep: 347.56 | bwd_inner_microstep: 347.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-11 01:46:30,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.13 | bwd_microstep: 1283.46 | bwd_inner_microstep: 1283.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-11 01:46:32,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.13 | bwd_microstep: 1525.55 | bwd_inner_microstep: 1525.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3485
[2024-06-11 01:46:33,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1220.32 | bwd_inner_microstep: 1220.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1936
[2024-06-11 01:46:35,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.84 | bwd_microstep: 849.28 | bwd_inner_microstep: 849.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2135
[2024-06-11 01:46:36,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.63 | bwd_microstep: 987.54 | bwd_inner_microstep: 987.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3663
[2024-06-11 01:46:38,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.48 | bwd_microstep: 1550.41 | bwd_inner_microstep: 1550.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-11 01:46:40,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.13 | bwd_microstep: 1706.51 | bwd_inner_microstep: 1706.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2011
[2024-06-11 01:46:42,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.57 | bwd_microstep: 897.75 | bwd_inner_microstep: 897.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-11 01:46:44,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.80 | bwd_microstep: 1709.34 | bwd_inner_microstep: 1709.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-11 01:46:46,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.62 | bwd_microstep: 1481.08 | bwd_inner_microstep: 1481.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 01:46:48,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.14 | bwd_microstep: 1288.91 | bwd_inner_microstep: 1288.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 01:46:50,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 1552.27 | bwd_inner_microstep: 1552.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-11 01:46:52,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1512.41 | bwd_inner_microstep: 1512.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3515
[2024-06-11 01:46:54,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1351.42 | bwd_inner_microstep: 1351.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-11 01:46:56,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.75 | bwd_microstep: 1457.37 | bwd_inner_microstep: 1457.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810
[2024-06-11 01:46:58,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.47 | bwd_microstep: 1761.65 | bwd_inner_microstep: 1761.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2181
[2024-06-11 01:47:00,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.90 | bwd_microstep: 889.00 | bwd_inner_microstep: 888.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-11 01:47:02,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1497.13 | bwd_inner_microstep: 1497.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3263
[2024-06-11 01:47:03,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.48 | bwd_microstep: 1364.77 | bwd_inner_microstep: 1364.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-11 01:47:05,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.87 | bwd_microstep: 1183.91 | bwd_inner_microstep: 1183.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275
[2024-06-11 01:47:06,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.72 | bwd_microstep: 877.35 | bwd_inner_microstep: 877.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-11 01:47:08,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.71 | bwd_microstep: 1402.57 | bwd_inner_microstep: 1402.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-11 01:47:10,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.23 | bwd_microstep: 1456.16 | bwd_inner_microstep: 1456.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-11 01:47:12,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.95 | bwd_microstep: 1298.61 | bwd_inner_microstep: 1298.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-11 01:47:16,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-11 01:47:16,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.21 | bwd_microstep: 3533.45 | bwd_inner_microstep: 2156.06 | bwd_allreduce_microstep: 1377.33 | step_microstep: 38.01
[2024-06-11 01:47:16,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16019.68 | bwd: 44791.95 | bwd_inner: 43413.71 | bwd_allreduce: 1377.57 | step: 39.46
{'loss': 1.2031, 'learning_rate': 2.6049127242961005e-06, 'epoch': 0.84}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 01:47:18,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.59 | bwd_microstep: 1469.72 | bwd_inner_microstep: 1469.56 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3988
[2024-06-11 01:47:20,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.04 | bwd_microstep: 1435.96 | bwd_inner_microstep: 1435.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 01:47:22,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1382.08 | bwd_inner_microstep: 1382.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 01:47:24,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.97 | bwd_microstep: 1274.13 | bwd_inner_microstep: 1274.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 01:47:26,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1375.63 | bwd_inner_microstep: 1375.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 01:47:28,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.82 | bwd_microstep: 1282.34 | bwd_inner_microstep: 1282.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-11 01:47:29,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1248.16 | bwd_inner_microstep: 1248.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3409
[2024-06-11 01:47:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.21 | bwd_microstep: 1179.31 | bwd_inner_microstep: 1179.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-11 01:47:32,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.80 | bwd_microstep: 792.49 | bwd_inner_microstep: 792.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 01:47:34,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.40 | bwd_microstep: 1286.28 | bwd_inner_microstep: 1286.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-11 01:47:36,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.01 | bwd_microstep: 1539.13 | bwd_inner_microstep: 1539.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504
[2024-06-11 01:47:38,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.20 | bwd_microstep: 1314.69 | bwd_inner_microstep: 1314.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-11 01:47:40,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1489.79 | bwd_inner_microstep: 1489.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-11 01:47:42,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1348.48 | bwd_inner_microstep: 1348.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3424
[2024-06-11 01:47:44,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.26 | bwd_microstep: 1407.05 | bwd_inner_microstep: 1407.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2918
[2024-06-11 01:47:45,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.50 | bwd_microstep: 1189.81 | bwd_inner_microstep: 1189.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663
[2024-06-11 01:47:48,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.21 | bwd_microstep: 1717.32 | bwd_inner_microstep: 1717.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3513
[2024-06-11 01:47:50,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1337.70 | bwd_inner_microstep: 1337.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3940
[2024-06-11 01:47:52,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.74 | bwd_microstep: 1599.54 | bwd_inner_microstep: 1599.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582
[2024-06-11 01:47:54,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.13 | bwd_microstep: 1307.79 | bwd_inner_microstep: 1307.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2590
[2024-06-11 01:47:55,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.09 | bwd_microstep: 1070.35 | bwd_inner_microstep: 1070.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-11 01:47:57,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.78 | bwd_microstep: 1491.97 | bwd_inner_microstep: 1491.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-11 01:47:59,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.07 | bwd_microstep: 1158.48 | bwd_inner_microstep: 1158.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 01:48:01,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1279.22 | bwd_inner_microstep: 1279.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3709
[2024-06-11 01:48:02,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1236.21 | bwd_inner_microstep: 1236.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-11 01:48:04,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.83 | bwd_microstep: 1282.32 | bwd_inner_microstep: 1282.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1905
[2024-06-11 01:48:05,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.07 | bwd_microstep: 684.81 | bwd_inner_microstep: 684.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3777
[2024-06-11 01:48:07,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.04 | bwd_microstep: 1259.65 | bwd_inner_microstep: 1259.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562
[2024-06-11 01:48:09,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.42 | bwd_microstep: 1597.19 | bwd_inner_microstep: 1597.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 01:48:11,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1375.13 | bwd_inner_microstep: 1375.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3732
[2024-06-11 01:48:13,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.50 | bwd_microstep: 1733.02 | bwd_inner_microstep: 1733.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-11 01:48:19,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.09 | optimizer_step: 6.63
[2024-06-11 01:48:19,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.06 | bwd_microstep: 5284.43 | bwd_inner_microstep: 1712.54 | bwd_allreduce_microstep: 3571.84 | step_microstep: 38.02
[2024-06-11 01:48:19,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16034.21 | bwd: 46430.22 | bwd_inner: 42857.36 | bwd_allreduce: 3572.13 | step: 39.55
{'loss': 1.1829, 'learning_rate': 2.586420924180837e-06, 'epoch': 0.84}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 01:48:21,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.11 | bwd_microstep: 1369.61 | bwd_inner_microstep: 1369.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 01:48:23,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1373.71 | bwd_inner_microstep: 1373.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-11 01:48:25,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.75 | bwd_microstep: 1648.55 | bwd_inner_microstep: 1648.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492
[2024-06-11 01:48:27,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1413.34 | bwd_inner_microstep: 1413.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 01:48:29,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.76 | bwd_microstep: 1278.68 | bwd_inner_microstep: 1278.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 01:48:31,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-11 01:48:32,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.99 | bwd_microstep: 1278.93 | bwd_inner_microstep: 1278.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-11 01:48:33,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.65 | bwd_microstep: 791.10 | bwd_inner_microstep: 791.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 01:48:35,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1247.51 | bwd_inner_microstep: 1247.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 01:48:37,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.61 | bwd_microstep: 1287.32 | bwd_inner_microstep: 1287.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406
[2024-06-11 01:48:39,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.04 | bwd_microstep: 1278.48 | bwd_inner_microstep: 1278.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2461
[2024-06-11 01:48:40,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.76 | bwd_microstep: 1046.66 | bwd_inner_microstep: 1046.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3501
[2024-06-11 01:48:42,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1552.11 | bwd_inner_microstep: 1552.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-11 01:48:44,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1485.74 | bwd_inner_microstep: 1485.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 01:48:46,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1246.45 | bwd_inner_microstep: 1246.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3632
[2024-06-11 01:48:48,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1249.38 | bwd_inner_microstep: 1249.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3519
[2024-06-11 01:48:50,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1443.82 | bwd_inner_microstep: 1443.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-11 01:48:52,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.45 | bwd_microstep: 1610.17 | bwd_inner_microstep: 1610.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3658
[2024-06-11 01:48:54,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.54 | bwd_microstep: 1423.83 | bwd_inner_microstep: 1423.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-11 01:48:56,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1489.55 | bwd_inner_microstep: 1489.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3523
[2024-06-11 01:48:58,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.45 | bwd_microstep: 1325.49 | bwd_inner_microstep: 1325.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702
[2024-06-11 01:49:00,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 1331.07 | bwd_inner_microstep: 1331.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-11 01:49:02,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.66 | bwd_microstep: 1297.45 | bwd_inner_microstep: 1297.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 01:49:03,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.04 | bwd_microstep: 1287.25 | bwd_inner_microstep: 1287.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424
[2024-06-11 01:49:05,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1449.63 | bwd_inner_microstep: 1449.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 01:49:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.36 | bwd_microstep: 1379.70 | bwd_inner_microstep: 1379.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-11 01:49:09,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1250.95 | bwd_inner_microstep: 1250.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-11 01:49:11,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 01:49:13,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1375.85 | bwd_inner_microstep: 1375.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-11 01:49:15,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.46 | bwd_microstep: 1459.60 | bwd_inner_microstep: 1459.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-11 01:49:17,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.82 | bwd_microstep: 1544.34 | bwd_inner_microstep: 1544.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3577
[2024-06-11 01:49:19,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.00 | optimizer_step: 6.60
[2024-06-11 01:49:19,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.68 | bwd_microstep: 2215.98 | bwd_inner_microstep: 1443.60 | bwd_allreduce_microstep: 772.33 | step_microstep: 37.38
[2024-06-11 01:49:19,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16150.24 | bwd: 43925.16 | bwd_inner: 43151.93 | bwd_allreduce: 772.56 | step: 38.86
{'loss': 1.1475, 'learning_rate': 2.5679904546005507e-06, 'epoch': 0.84}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 01:49:21,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.65 | bwd_microstep: 1373.83 | bwd_inner_microstep: 1373.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857
[2024-06-11 01:49:22,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.36 | bwd_microstep: 674.99 | bwd_inner_microstep: 674.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3847
[2024-06-11 01:49:24,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1362.75 | bwd_inner_microstep: 1362.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 01:49:26,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.15 | bwd_microstep: 1378.98 | bwd_inner_microstep: 1378.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-11 01:49:28,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.91 | bwd_microstep: 1542.98 | bwd_inner_microstep: 1542.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-11 01:49:30,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.72 | bwd_microstep: 1249.41 | bwd_inner_microstep: 1249.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495
[2024-06-11 01:49:32,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.40 | bwd_microstep: 1188.22 | bwd_inner_microstep: 1188.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-11 01:49:33,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.12 | bwd_microstep: 1342.49 | bwd_inner_microstep: 1342.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783
[2024-06-11 01:49:36,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.47 | bwd_microstep: 1650.49 | bwd_inner_microstep: 1650.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-11 01:49:37,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.52 | bwd_microstep: 680.46 | bwd_inner_microstep: 680.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3979
[2024-06-11 01:49:39,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.42 | bwd_microstep: 1745.38 | bwd_inner_microstep: 1745.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638
[2024-06-11 01:49:41,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.13 | bwd_microstep: 1605.15 | bwd_inner_microstep: 1605.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-11 01:49:43,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.15 | bwd_microstep: 1308.54 | bwd_inner_microstep: 1308.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397
[2024-06-11 01:49:45,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 1435.08 | bwd_inner_microstep: 1435.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3465
[2024-06-11 01:49:47,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.20 | bwd_microstep: 1604.68 | bwd_inner_microstep: 1604.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495
[2024-06-11 01:49:49,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.77 | bwd_microstep: 1188.41 | bwd_inner_microstep: 1188.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3520
[2024-06-11 01:49:51,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.03 | bwd_microstep: 1318.08 | bwd_inner_microstep: 1318.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-11 01:49:53,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1496.57 | bwd_inner_microstep: 1496.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291
[2024-06-11 01:49:54,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.17 | bwd_microstep: 975.29 | bwd_inner_microstep: 975.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-11 01:49:56,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.64 | bwd_microstep: 1509.70 | bwd_inner_microstep: 1509.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 01:49:58,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.99 | bwd_microstep: 1380.33 | bwd_inner_microstep: 1380.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3825
[2024-06-11 01:50:00,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.61 | bwd_microstep: 1510.55 | bwd_inner_microstep: 1510.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2383
[2024-06-11 01:50:02,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.92 | bwd_microstep: 933.00 | bwd_inner_microstep: 932.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3605
[2024-06-11 01:50:04,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.64 | bwd_microstep: 1454.18 | bwd_inner_microstep: 1454.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3381
[2024-06-11 01:50:05,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.11 | bwd_microstep: 1367.29 | bwd_inner_microstep: 1367.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-11 01:50:08,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.32 | bwd_microstep: 1528.16 | bwd_inner_microstep: 1528.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-11 01:50:10,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.86 | bwd_microstep: 1449.09 | bwd_inner_microstep: 1449.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055
[2024-06-11 01:50:11,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.56 | bwd_microstep: 815.68 | bwd_inner_microstep: 815.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-11 01:50:13,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1402.78 | bwd_inner_microstep: 1402.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-11 01:50:15,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.29 | bwd_microstep: 1504.57 | bwd_inner_microstep: 1504.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3781
[2024-06-11 01:50:17,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.03 | bwd_microstep: 1352.13 | bwd_inner_microstep: 1352.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-11 01:50:21,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-11 01:50:21,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.88 | bwd_microstep: 3657.59 | bwd_inner_microstep: 1576.67 | bwd_allreduce_microstep: 2080.87 | step_microstep: 37.82
[2024-06-11 01:50:21,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16011.12 | bwd: 44986.85 | bwd_inner: 42905.07 | bwd_allreduce: 2081.10 | step: 39.24
{'loss': 1.1754, 'learning_rate': 2.5496213804672663e-06, 'epoch': 0.84}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3393
[2024-06-11 01:50:23,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 1302.29 | bwd_inner_microstep: 1302.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3064
[2024-06-11 01:50:24,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.33 | bwd_microstep: 1178.58 | bwd_inner_microstep: 1178.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2394
[2024-06-11 01:50:26,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.99 | bwd_microstep: 1000.56 | bwd_inner_microstep: 1000.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801
[2024-06-11 01:50:28,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.66 | bwd_microstep: 1445.71 | bwd_inner_microstep: 1445.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466
[2024-06-11 01:50:29,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.59 | bwd_microstep: 1212.15 | bwd_inner_microstep: 1212.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1900
[2024-06-11 01:50:30,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.68 | bwd_microstep: 774.11 | bwd_inner_microstep: 774.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406
[2024-06-11 01:50:32,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.44 | bwd_microstep: 1277.99 | bwd_inner_microstep: 1277.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-11 01:50:33,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.47 | bwd_microstep: 797.10 | bwd_inner_microstep: 797.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720
[2024-06-11 01:50:35,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1533.27 | bwd_inner_microstep: 1533.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3975
[2024-06-11 01:50:38,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.62 | bwd_microstep: 1606.67 | bwd_inner_microstep: 1606.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3485
[2024-06-11 01:50:39,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.68 | bwd_microstep: 1265.28 | bwd_inner_microstep: 1265.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-11 01:50:42,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.81 | bwd_microstep: 1611.96 | bwd_inner_microstep: 1611.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672
[2024-06-11 01:50:44,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.88 | bwd_microstep: 1718.83 | bwd_inner_microstep: 1718.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2948
[2024-06-11 01:50:45,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.51 | bwd_microstep: 1008.74 | bwd_inner_microstep: 1008.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 01:50:47,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1285.06 | bwd_inner_microstep: 1285.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-11 01:50:49,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.73 | bwd_microstep: 1290.17 | bwd_inner_microstep: 1290.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-11 01:50:51,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1412.66 | bwd_inner_microstep: 1412.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449
[2024-06-11 01:50:52,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.91 | bwd_microstep: 1159.09 | bwd_inner_microstep: 1159.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-11 01:50:54,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.44 | bwd_microstep: 799.55 | bwd_inner_microstep: 799.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-11 01:50:55,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.19 | bwd_microstep: 876.51 | bwd_inner_microstep: 876.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-11 01:50:57,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1387.54 | bwd_inner_microstep: 1387.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-11 01:50:59,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.99 | bwd_microstep: 1535.10 | bwd_inner_microstep: 1535.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1993
[2024-06-11 01:51:00,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.76 | bwd_microstep: 773.10 | bwd_inner_microstep: 773.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820
[2024-06-11 01:51:02,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1387.56 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 01:51:04,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.86 | bwd_microstep: 1551.26 | bwd_inner_microstep: 1551.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3529
[2024-06-11 01:51:06,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.00 | bwd_microstep: 1437.64 | bwd_inner_microstep: 1437.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2425
[2024-06-11 01:51:07,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.00 | bwd_microstep: 1034.26 | bwd_inner_microstep: 1034.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3578
[2024-06-11 01:51:10,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.11 | bwd_microstep: 1631.93 | bwd_inner_microstep: 1631.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-11 01:51:12,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1493.23 | bwd_inner_microstep: 1493.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-11 01:51:14,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.30 | bwd_microstep: 1451.00 | bwd_inner_microstep: 1450.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3464
[2024-06-11 01:51:16,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1523.68 | bwd_inner_microstep: 1523.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385
[2024-06-11 01:51:23,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.18 | optimizer_step: 6.59
[2024-06-11 01:51:23,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.94 | bwd_microstep: 6894.84 | bwd_inner_microstep: 1513.78 | bwd_allreduce_microstep: 5381.00 | step_microstep: 38.27
[2024-06-11 01:51:23,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15411.13 | bwd: 46657.41 | bwd_inner: 41275.50 | bwd_allreduce: 5381.23 | step: 39.72
 84%|████████▍ | 1450/1726 [25:08:52<4:40:48, 61.04s/it]
 84%|████████▍ | 1451/1726 [25:09:53<4:39:55, 61.08s/it]
 84%|████████▍ | 1452/1726 [25:10:56<4:41:16, 61.59s/it]
 84%|████████▍ | 1453/1726 [25:11:56<4:38:37, 61.24s/it]
 84%|████████▍ | 1454/1726 [25:12:58<4:37:43, 61.26s/it]
 84%|████████▍ | 1455
{'loss': 1.1774, 'learning_rate': 2.531313766476757e-06, 'epoch': 0.84}
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1888
[2024-06-11 01:51:24,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.22 | bwd_microstep: 704.29 | bwd_inner_microstep: 704.19 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592
[2024-06-11 01:51:26,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.66 | bwd_microstep: 1405.46 | bwd_inner_microstep: 1405.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415
[2024-06-11 01:51:28,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1304.55 | bwd_inner_microstep: 1304.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-11 01:51:30,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1442.20 | bwd_inner_microstep: 1442.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-11 01:51:32,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.08 | bwd_microstep: 1277.51 | bwd_inner_microstep: 1277.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3748
[2024-06-11 01:51:34,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.11 | bwd_microstep: 1365.38 | bwd_inner_microstep: 1365.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-11 01:51:36,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.92 | bwd_microstep: 1477.34 | bwd_inner_microstep: 1477.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-11 01:51:38,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.01 | bwd_microstep: 1404.45 | bwd_inner_microstep: 1404.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 01:51:39,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1382.50 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-11 01:51:42,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.99 | bwd_microstep: 1534.56 | bwd_inner_microstep: 1534.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 01:51:44,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-11 01:51:46,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.01 | bwd_microstep: 1632.08 | bwd_inner_microstep: 1632.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-11 01:51:48,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.90 | bwd_microstep: 1354.99 | bwd_inner_microstep: 1354.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 01:51:50,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 01:51:52,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1374.10 | bwd_inner_microstep: 1374.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-11 01:51:53,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1254.90 | bwd_inner_microstep: 1254.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2131
[2024-06-11 01:51:55,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.12 | bwd_microstep: 1021.63 | bwd_inner_microstep: 1021.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 01:51:57,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1377.31 | bwd_inner_microstep: 1377.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-11 01:51:58,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.19 | bwd_microstep: 1183.62 | bwd_inner_microstep: 1183.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-11 01:52:00,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.92 | bwd_microstep: 1292.84 | bwd_inner_microstep: 1292.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3753
[2024-06-11 01:52:02,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.72 | bwd_microstep: 1400.90 | bwd_inner_microstep: 1400.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-11 01:52:04,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.68 | bwd_microstep: 1160.33 | bwd_inner_microstep: 1160.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3311
[2024-06-11 01:52:05,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.77 | bwd_microstep: 1228.71 | bwd_inner_microstep: 1228.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031
[2024-06-11 01:52:06,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.65 | bwd_microstep: 809.21 | bwd_inner_microstep: 809.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-11 01:52:08,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1389.09 | bwd_inner_microstep: 1389.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 01:52:10,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1391.75 | bwd_inner_microstep: 1391.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4066
[2024-06-11 01:52:13,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.87 | bwd_microstep: 1618.63 | bwd_inner_microstep: 1618.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473
[2024-06-11 01:52:14,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.84 | bwd_microstep: 1326.06 | bwd_inner_microstep: 1326.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568
[2024-06-11 01:52:17,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.36 | bwd_microstep: 1591.70 | bwd_inner_microstep: 1591.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3063
[2024-06-11 01:52:18,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.97 | bwd_microstep: 1270.97 | bwd_inner_microstep: 1270.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3773
[2024-06-11 01:52:21,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.93 | bwd_microstep: 1737.36 | bwd_inner_microstep: 1737.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3584
[2024-06-11 01:52:25,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.07 | optimizer_step: 6.62
[2024-06-11 01:52:25,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.65 | bwd_microstep: 3366.63 | bwd_inner_microstep: 1668.37 | bwd_allreduce_microstep: 1698.22 | step_microstep: 37.86
[2024-06-11 01:52:25,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16163.49 | bwd: 44941.64 | bwd_inner: 43242.43 | bwd_allreduce: 1698.50 | step: 39.31
{'loss': 1.1836, 'learning_rate': 2.5130676771083585e-06, 'epoch': 0.84}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958
[2024-06-11 01:52:26,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.80 | bwd_microstep: 885.09 | bwd_inner_microstep: 885.00 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3944
[2024-06-11 01:52:28,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.54 | bwd_microstep: 1594.98 | bwd_inner_microstep: 1594.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3873
[2024-06-11 01:52:30,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.17 | bwd_microstep: 1679.60 | bwd_inner_microstep: 1679.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 01:52:32,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.41 | bwd_microstep: 1379.05 | bwd_inner_microstep: 1379.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 01:52:34,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1244.33 | bwd_inner_microstep: 1244.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 01:52:36,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.70 | bwd_microstep: 1281.34 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-11 01:52:37,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.73 | bwd_microstep: 790.11 | bwd_inner_microstep: 790.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-11 01:52:39,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.13 | bwd_microstep: 1287.78 | bwd_inner_microstep: 1287.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 01:52:40,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.79 | bwd_microstep: 1246.55 | bwd_inner_microstep: 1246.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-11 01:52:42,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.51 | bwd_microstep: 1280.46 | bwd_inner_microstep: 1280.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-11 01:52:44,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1251.86 | bwd_inner_microstep: 1251.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501
[2024-06-11 01:52:46,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.84 | bwd_microstep: 1221.90 | bwd_inner_microstep: 1221.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 01:52:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.43 | bwd_microstep: 1480.54 | bwd_inner_microstep: 1480.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970
[2024-06-11 01:52:49,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.91 | bwd_microstep: 892.35 | bwd_inner_microstep: 892.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654
[2024-06-11 01:52:51,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.36 | bwd_microstep: 1612.68 | bwd_inner_microstep: 1612.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 01:52:53,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495
[2024-06-11 01:52:55,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1413.37 | bwd_inner_microstep: 1413.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-11 01:52:57,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.99 | bwd_microstep: 1557.93 | bwd_inner_microstep: 1557.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294
[2024-06-11 01:52:59,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.99 | bwd_microstep: 1072.68 | bwd_inner_microstep: 1072.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 01:53:00,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.99 | bwd_microstep: 1284.33 | bwd_inner_microstep: 1284.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3438
[2024-06-11 01:53:02,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.69 | bwd_microstep: 1300.42 | bwd_inner_microstep: 1300.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-11 01:53:04,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.16 | bwd_microstep: 1613.81 | bwd_inner_microstep: 1613.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827
[2024-06-11 01:53:06,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.13 | bwd_microstep: 1388.61 | bwd_inner_microstep: 1388.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-11 01:53:08,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1392.80 | bwd_inner_microstep: 1392.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-11 01:53:10,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.72 | bwd_microstep: 1494.44 | bwd_inner_microstep: 1494.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3391
[2024-06-11 01:53:12,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.81 | bwd_microstep: 1338.56 | bwd_inner_microstep: 1338.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3813
[2024-06-11 01:53:14,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.06 | bwd_microstep: 1506.63 | bwd_inner_microstep: 1506.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3593
[2024-06-11 01:53:17,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.66 | bwd_microstep: 1705.66 | bwd_inner_microstep: 1705.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192
[2024-06-11 01:53:18,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.98 | bwd_microstep: 956.56 | bwd_inner_microstep: 956.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-11 01:53:20,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1403.36 | bwd_inner_microstep: 1403.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3582
[2024-06-11 01:53:22,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.65 | bwd_microstep: 1251.26 | bwd_inner_microstep: 1251.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773
[2024-06-11 01:53:28,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-11 01:53:28,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 5843.81 | bwd_inner_microstep: 1746.36 | bwd_allreduce_microstep: 4097.40 | step_microstep: 39.43
[2024-06-11 01:53:28,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15991.91 | bwd: 47035.26 | bwd_inner: 42936.88 | bwd_allreduce: 4097.67 | step: 40.91
{'loss': 1.2132, 'learning_rate': 2.494883176624694e-06, 'epoch': 0.84}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 01:53:30,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.83 | bwd_microstep: 1276.09 | bwd_inner_microstep: 1276.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 01:53:32,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1342.35 | bwd_inner_microstep: 1342.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 01:53:33,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1346.50 | bwd_inner_microstep: 1346.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 01:53:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1650.42 | bwd_inner_microstep: 1650.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801
[2024-06-11 01:53:38,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.86 | bwd_microstep: 1444.86 | bwd_inner_microstep: 1444.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-11 01:53:39,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1246.41 | bwd_inner_microstep: 1246.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 01:53:41,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.19 | bwd_microstep: 1396.61 | bwd_inner_microstep: 1396.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 01:53:43,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1245.32 | bwd_inner_microstep: 1245.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-11 01:53:45,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.28 | bwd_microstep: 1527.11 | bwd_inner_microstep: 1527.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 01:53:47,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.21 | bwd_microstep: 1258.43 | bwd_inner_microstep: 1258.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2029
[2024-06-11 01:53:48,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.84 | bwd_microstep: 904.92 | bwd_inner_microstep: 904.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-11 01:53:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.02 | bwd_microstep: 1338.29 | bwd_inner_microstep: 1338.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3512
[2024-06-11 01:53:52,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1519.04 | bwd_inner_microstep: 1519.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3690
[2024-06-11 01:53:54,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.16 | bwd_microstep: 1671.11 | bwd_inner_microstep: 1671.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3646
[2024-06-11 01:53:56,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.40 | bwd_microstep: 1312.80 | bwd_inner_microstep: 1312.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3428
[2024-06-11 01:53:58,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.33 | bwd_microstep: 1470.33 | bwd_inner_microstep: 1470.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 01:54:00,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1389.84 | bwd_inner_microstep: 1389.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651
[2024-06-11 01:54:02,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1413.82 | bwd_inner_microstep: 1413.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 01:54:04,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.49 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-11 01:54:06,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.22 | bwd_microstep: 1489.42 | bwd_inner_microstep: 1489.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-11 01:54:08,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1498.67 | bwd_inner_microstep: 1498.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2137
[2024-06-11 01:54:09,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.42 | bwd_microstep: 932.49 | bwd_inner_microstep: 932.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3685
[2024-06-11 01:54:12,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.38 | bwd_microstep: 1556.03 | bwd_inner_microstep: 1556.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-11 01:54:14,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.37 | bwd_microstep: 1554.90 | bwd_inner_microstep: 1554.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-11 01:54:16,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1432.39 | bwd_inner_microstep: 1432.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-11 01:54:18,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.01 | bwd_microstep: 1507.59 | bwd_inner_microstep: 1507.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2009
[2024-06-11 01:54:19,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.09 | bwd_microstep: 832.19 | bwd_inner_microstep: 832.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2078
[2024-06-11 01:54:20,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.71 | bwd_microstep: 916.91 | bwd_inner_microstep: 916.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3561
[2024-06-11 01:54:22,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.04 | bwd_microstep: 1427.18 | bwd_inner_microstep: 1427.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275
[2024-06-11 01:54:23,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.19 | bwd_microstep: 909.41 | bwd_inner_microstep: 909.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-11 01:54:25,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1399.48 | bwd_inner_microstep: 1399.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-11 01:54:30,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.04 | optimizer_step: 6.61
[2024-06-11 01:54:30,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.52 | bwd_microstep: 3817.71 | bwd_inner_microstep: 2231.56 | bwd_allreduce_microstep: 1586.11 | step_microstep: 37.50
[2024-06-11 01:54:30,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16157.44 | bwd: 45418.75 | bwd_inner: 43831.73 | bwd_allreduce: 1586.34 | step: 39.06
{'loss': 1.2044, 'learning_rate': 2.4767603290714812e-06, 'epoch': 0.84}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503
[2024-06-11 01:54:32,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1578.15 | bwd_inner_microstep: 1578.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3938
[2024-06-11 01:54:34,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.29 | bwd_microstep: 1692.56 | bwd_inner_microstep: 1692.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3874
[2024-06-11 01:54:36,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.14 | bwd_microstep: 1478.76 | bwd_inner_microstep: 1478.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-11 01:54:38,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.18 | bwd_microstep: 970.93 | bwd_inner_microstep: 970.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-11 01:54:40,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.05 | bwd_microstep: 1315.26 | bwd_inner_microstep: 1315.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-11 01:54:41,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432
[2024-06-11 01:54:42,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.29 | bwd_microstep: 1156.13 | bwd_inner_microstep: 1156.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 01:54:44,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1486.00 | bwd_inner_microstep: 1485.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-11 01:54:46,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.80 | bwd_microstep: 1149.26 | bwd_inner_microstep: 1149.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1365
[2024-06-11 01:54:47,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 212.47 | bwd_microstep: 551.98 | bwd_inner_microstep: 551.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3414
[2024-06-11 01:54:48,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.19 | bwd_microstep: 1209.62 | bwd_inner_microstep: 1209.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3501
[2024-06-11 01:54:51,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.03 | bwd_microstep: 1550.37 | bwd_inner_microstep: 1550.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-11 01:54:53,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 1494.31 | bwd_inner_microstep: 1494.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-11 01:54:55,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1483.44 | bwd_inner_microstep: 1483.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-11 01:54:57,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1350.28 | bwd_inner_microstep: 1350.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2966
[2024-06-11 01:54:58,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.22 | bwd_microstep: 1102.30 | bwd_inner_microstep: 1102.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3680
[2024-06-11 01:55:00,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.69 | bwd_microstep: 1695.47 | bwd_inner_microstep: 1695.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 01:55:02,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.99 | bwd_microstep: 1249.22 | bwd_inner_microstep: 1249.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3937
[2024-06-11 01:55:04,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1700.64 | bwd_inner_microstep: 1700.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 01:55:06,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.13 | bwd_microstep: 1296.73 | bwd_inner_microstep: 1296.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2172
[2024-06-11 01:55:07,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.52 | bwd_microstep: 885.44 | bwd_inner_microstep: 885.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3623
[2024-06-11 01:55:09,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1444.82 | bwd_inner_microstep: 1444.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-11 01:55:11,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.41 | bwd_microstep: 974.51 | bwd_inner_microstep: 974.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3940
[2024-06-11 01:55:13,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.55 | bwd_microstep: 1602.37 | bwd_inner_microstep: 1602.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3919
[2024-06-11 01:55:15,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.77 | bwd_microstep: 1691.19 | bwd_inner_microstep: 1691.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 01:55:18,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.92 | bwd_microstep: 1644.89 | bwd_inner_microstep: 1644.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-11 01:55:19,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.97 | bwd_microstep: 697.42 | bwd_inner_microstep: 697.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-11 01:55:21,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.48 | bwd_microstep: 1664.97 | bwd_inner_microstep: 1664.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2039
[2024-06-11 01:55:22,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.65 | bwd_microstep: 844.63 | bwd_inner_microstep: 844.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-11 01:55:24,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1342.90 | bwd_inner_microstep: 1342.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-11 01:55:26,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 1600.20 | bwd_inner_microstep: 1600.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3797
[2024-06-11 01:55:40,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-11 01:55:40,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.85 | bwd_microstep: 13049.66 | bwd_inner_microstep: 1703.26 | bwd_allreduce_microstep: 11346.33 | step_microstep: 38.82
[2024-06-11 01:55:40,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15751.68 | bwd: 53744.90 | bwd_inner: 42397.65 | bwd_allreduce: 11346.58 | step: 40.24
{'loss': 1.1637, 'learning_rate': 2.45869919827729e-06, 'epoch': 0.85}
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3403
[2024-06-11 01:55:42,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.52 | bwd_microstep: 1377.67 | bwd_inner_microstep: 1377.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-11 01:55:43,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.49 | bwd_microstep: 1145.76 | bwd_inner_microstep: 1145.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3883
[2024-06-11 01:55:45,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.75 | bwd_microstep: 1482.19 | bwd_inner_microstep: 1482.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-11 01:55:47,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.98 | bwd_microstep: 1486.77 | bwd_inner_microstep: 1486.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3913
[2024-06-11 01:55:49,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.15 | bwd_microstep: 1547.46 | bwd_inner_microstep: 1547.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2036
[2024-06-11 01:55:51,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.72 | bwd_microstep: 745.95 | bwd_inner_microstep: 745.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 01:55:52,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.49 | bwd_microstep: 1343.68 | bwd_inner_microstep: 1343.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 01:55:54,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.59 | bwd_microstep: 1242.25 | bwd_inner_microstep: 1242.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-11 01:55:56,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.74 | bwd_microstep: 1246.60 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3961
[2024-06-11 01:55:58,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.78 | bwd_microstep: 1558.32 | bwd_inner_microstep: 1558.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3494
[2024-06-11 01:56:00,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.42 | bwd_microstep: 1510.64 | bwd_inner_microstep: 1510.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 01:56:02,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 1374.18 | bwd_inner_microstep: 1374.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496
[2024-06-11 01:56:04,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1574.23 | bwd_inner_microstep: 1574.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3654
[2024-06-11 01:56:07,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.89 | bwd_microstep: 1817.30 | bwd_inner_microstep: 1817.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3465
[2024-06-11 01:56:09,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1452.08 | bwd_inner_microstep: 1452.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3644
[2024-06-11 01:56:11,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.62 | bwd_microstep: 1775.65 | bwd_inner_microstep: 1775.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-11 01:56:13,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.48 | bwd_microstep: 1475.04 | bwd_inner_microstep: 1475.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2121
[2024-06-11 01:56:14,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.41 | bwd_microstep: 924.77 | bwd_inner_microstep: 924.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 01:56:16,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.80 | bwd_microstep: 1340.32 | bwd_inner_microstep: 1340.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-11 01:56:18,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.80 | bwd_microstep: 1255.88 | bwd_inner_microstep: 1255.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3531
[2024-06-11 01:56:20,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.55 | bwd_microstep: 1196.77 | bwd_inner_microstep: 1196.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-11 01:56:22,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1397.01 | bwd_inner_microstep: 1396.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-11 01:56:23,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1387.61 | bwd_inner_microstep: 1387.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3602
[2024-06-11 01:56:25,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.31 | bwd_microstep: 1458.89 | bwd_inner_microstep: 1458.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-11 01:56:27,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 1395.53 | bwd_inner_microstep: 1395.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-11 01:56:30,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.74 | bwd_microstep: 1641.72 | bwd_inner_microstep: 1641.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098
[2024-06-11 01:56:31,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.41 | bwd_microstep: 916.94 | bwd_inner_microstep: 916.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-11 01:56:33,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1453.58 | bwd_inner_microstep: 1453.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3815
[2024-06-11 01:56:35,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 1416.48 | bwd_inner_microstep: 1416.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 01:56:37,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1556.38 | bwd_inner_microstep: 1556.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-11 01:56:39,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1590.98 | bwd_inner_microstep: 1590.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3587
[2024-06-11 01:56:59,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-11 01:56:59,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.67 | bwd_microstep: 19476.88 | bwd_inner_microstep: 1924.25 | bwd_allreduce_microstep: 17552.55 | step_microstep: 39.53
[2024-06-11 01:56:59,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16704.02 | bwd: 62565.57 | bwd_inner: 45012.07 | bwd_allreduce: 17552.80 | step: 40.98
 84%|████████▍ | 1455/1726 [25:14:00<4:38:14, 61.60s/it]
 84%|████████▍ | 1456/1726 [25:15:01<4:36:58, 61.55s/it]
 84%|████████▍ | 1457/1726 [25:16:05<4:38:23, 62.09s/it]
 84%|████████▍ | 1458/1726 [25:17:07<4:37:06, 62.04s/it]
 85%|████████▍ | 1459/1726 [25:18:16<4:46:28, 64.38s/it]
{'loss': 1.2112, 'learning_rate': 2.4406998478533384e-06, 'epoch': 0.85}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-11 01:57:01,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.36 | bwd_microstep: 1427.42 | bwd_inner_microstep: 1427.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-11 01:57:04,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.69 | bwd_microstep: 1674.00 | bwd_inner_microstep: 1673.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 01:57:06,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.55 | bwd_microstep: 1373.53 | bwd_inner_microstep: 1373.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-11 01:57:07,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.51 | bwd_microstep: 1274.02 | bwd_inner_microstep: 1274.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-11 01:57:09,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.25 | bwd_microstep: 1533.11 | bwd_inner_microstep: 1533.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522
[2024-06-11 01:57:11,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.17 | bwd_microstep: 1193.98 | bwd_inner_microstep: 1193.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 01:57:13,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1382.01 | bwd_inner_microstep: 1381.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 01:57:15,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1254.07 | bwd_inner_microstep: 1254.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-11 01:57:16,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.43 | bwd_microstep: 1276.73 | bwd_inner_microstep: 1276.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-11 01:57:18,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.10 | bwd_microstep: 1424.43 | bwd_inner_microstep: 1424.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-11 01:57:20,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.84 | bwd_microstep: 1483.90 | bwd_inner_microstep: 1483.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-11 01:57:23,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.42 | bwd_microstep: 1615.63 | bwd_inner_microstep: 1615.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1962
[2024-06-11 01:57:24,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.07 | bwd_microstep: 856.96 | bwd_inner_microstep: 856.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2465
[2024-06-11 01:57:25,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.36 | bwd_microstep: 1045.66 | bwd_inner_microstep: 1045.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-11 01:57:27,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.23 | bwd_microstep: 1481.06 | bwd_inner_microstep: 1481.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105
[2024-06-11 01:57:29,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.20 | bwd_microstep: 918.53 | bwd_inner_microstep: 918.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-11 01:57:31,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.79 | bwd_microstep: 1390.71 | bwd_inner_microstep: 1390.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1926
[2024-06-11 01:57:32,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.25 | bwd_microstep: 760.52 | bwd_inner_microstep: 760.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-11 01:57:34,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1627.98 | bwd_inner_microstep: 1627.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 01:57:36,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.33 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-11 01:57:37,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.34 | bwd_microstep: 1283.99 | bwd_inner_microstep: 1283.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-11 01:57:39,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1418.08 | bwd_inner_microstep: 1418.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292
[2024-06-11 01:57:41,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.22 | bwd_microstep: 883.92 | bwd_inner_microstep: 883.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2287
[2024-06-11 01:57:42,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.04 | bwd_microstep: 939.18 | bwd_inner_microstep: 939.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 643
[2024-06-11 01:57:42,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.31 | bwd_microstep: 275.81 | bwd_inner_microstep: 275.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-11 01:57:44,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.07 | bwd_microstep: 1509.37 | bwd_inner_microstep: 1509.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3602
[2024-06-11 01:57:46,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.49 | bwd_microstep: 1517.25 | bwd_inner_microstep: 1517.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 01:57:48,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1400.81 | bwd_inner_microstep: 1400.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072
[2024-06-11 01:57:50,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.43 | bwd_microstep: 915.04 | bwd_inner_microstep: 915.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 01:57:52,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1550.02 | bwd_inner_microstep: 1549.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3821
[2024-06-11 01:57:54,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.50 | bwd_microstep: 1750.46 | bwd_inner_microstep: 1750.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 01:58:00,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-11 01:58:00,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.81 | bwd_microstep: 5524.58 | bwd_inner_microstep: 1573.39 | bwd_allreduce_microstep: 3951.14 | step_microstep: 37.89
[2024-06-11 01:58:00,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15387.82 | bwd: 45246.39 | bwd_inner: 41294.33 | bwd_allreduce: 3951.37 | step: 39.44
{'loss': 1.1648, 'learning_rate': 2.4227623411932412e-06, 'epoch': 0.85}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2903
[2024-06-11 01:58:02,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.27 | bwd_microstep: 1209.43 | bwd_inner_microstep: 1209.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 01:58:04,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.34 | bwd_microstep: 1240.40 | bwd_inner_microstep: 1240.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915
[2024-06-11 01:58:06,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.79 | bwd_microstep: 1586.75 | bwd_inner_microstep: 1586.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-11 01:58:08,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.16 | bwd_microstep: 1340.73 | bwd_inner_microstep: 1340.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1874
[2024-06-11 01:58:09,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.60 | bwd_microstep: 679.11 | bwd_inner_microstep: 679.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 01:58:10,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1244.06 | bwd_inner_microstep: 1244.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 01:58:12,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1285.05 | bwd_inner_microstep: 1285.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 01:58:14,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1246.86 | bwd_inner_microstep: 1246.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542
[2024-06-11 01:58:16,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 1450.23 | bwd_inner_microstep: 1450.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-11 01:58:18,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.76 | bwd_microstep: 1342.52 | bwd_inner_microstep: 1342.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3744
[2024-06-11 01:58:20,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.32 | bwd_microstep: 1832.96 | bwd_inner_microstep: 1832.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2909
[2024-06-11 01:58:22,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.14 | bwd_microstep: 1186.73 | bwd_inner_microstep: 1186.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3489
[2024-06-11 01:58:24,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.76 | bwd_microstep: 1542.32 | bwd_inner_microstep: 1542.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 01:58:26,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.75 | bwd_microstep: 1387.23 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3672
[2024-06-11 01:58:28,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.38 | bwd_microstep: 1548.12 | bwd_inner_microstep: 1548.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 01:58:30,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1285.46 | bwd_inner_microstep: 1285.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2124
[2024-06-11 01:58:31,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.94 | bwd_microstep: 766.49 | bwd_inner_microstep: 766.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655
[2024-06-11 01:58:33,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.23 | bwd_microstep: 1422.91 | bwd_inner_microstep: 1422.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 01:58:35,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.10 | bwd_microstep: 1275.26 | bwd_inner_microstep: 1275.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-11 01:58:37,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.85 | bwd_microstep: 1609.18 | bwd_inner_microstep: 1609.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-11 01:58:38,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.52 | bwd_microstep: 974.32 | bwd_inner_microstep: 974.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941
[2024-06-11 01:58:39,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.91 | bwd_microstep: 695.25 | bwd_inner_microstep: 695.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-11 01:58:40,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.15 | bwd_microstep: 801.05 | bwd_inner_microstep: 801.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-11 01:58:42,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1497.54 | bwd_inner_microstep: 1497.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 01:58:45,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.51 | bwd_microstep: 1546.23 | bwd_inner_microstep: 1546.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-11 01:58:46,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.69 | bwd_microstep: 1406.08 | bwd_inner_microstep: 1406.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 01:58:49,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.68 | bwd_microstep: 1649.38 | bwd_inner_microstep: 1649.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3805
[2024-06-11 01:58:51,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.08 | bwd_microstep: 1291.35 | bwd_inner_microstep: 1291.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2274
[2024-06-11 01:58:52,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.00 | bwd_microstep: 824.33 | bwd_inner_microstep: 824.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566
[2024-06-11 01:58:53,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.03 | bwd_microstep: 1299.63 | bwd_inner_microstep: 1299.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2713
[2024-06-11 01:58:55,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.49 | bwd_microstep: 1130.25 | bwd_inner_microstep: 1130.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891
[2024-06-11 01:59:01,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-11 01:59:01,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.62 | bwd_microstep: 6009.53 | bwd_inner_microstep: 953.64 | bwd_allreduce_microstep: 5055.83 | step_microstep: 37.64
[2024-06-11 01:59:01,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15134.09 | bwd: 45606.73 | bwd_inner: 40549.99 | bwd_allreduce: 5056.06 | step: 39.09
{'loss': 1.2392, 'learning_rate': 2.4048867414728004e-06, 'epoch': 0.85}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-11 01:59:03,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1391.42 | bwd_inner_microstep: 1391.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3926
[2024-06-11 01:59:06,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.24 | bwd_microstep: 1689.51 | bwd_inner_microstep: 1689.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-11 01:59:07,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.25 | bwd_microstep: 1241.79 | bwd_inner_microstep: 1241.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787
[2024-06-11 01:59:09,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.36 | bwd_microstep: 1441.86 | bwd_inner_microstep: 1441.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-11 01:59:10,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.02 | bwd_microstep: 707.18 | bwd_inner_microstep: 707.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-11 01:59:13,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.39 | bwd_microstep: 1639.63 | bwd_inner_microstep: 1639.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 01:59:14,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1245.05 | bwd_inner_microstep: 1245.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-11 01:59:15,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.29 | bwd_microstep: 805.88 | bwd_inner_microstep: 805.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484
[2024-06-11 01:59:17,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1411.74 | bwd_inner_microstep: 1411.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3469
[2024-06-11 01:59:20,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.97 | bwd_microstep: 1538.91 | bwd_inner_microstep: 1538.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-11 01:59:22,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.06 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1481.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 01:59:24,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.38 | bwd_microstep: 1503.91 | bwd_inner_microstep: 1503.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3658
[2024-06-11 01:59:26,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.24 | bwd_microstep: 1818.87 | bwd_inner_microstep: 1818.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942
[2024-06-11 01:59:27,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.39 | bwd_microstep: 881.46 | bwd_inner_microstep: 881.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3505
[2024-06-11 01:59:29,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.76 | bwd_microstep: 1223.20 | bwd_inner_microstep: 1223.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3931
[2024-06-11 01:59:31,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1397.56 | bwd_inner_microstep: 1397.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-11 01:59:33,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1402.18 | bwd_inner_microstep: 1402.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2130
[2024-06-11 01:59:34,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.32 | bwd_microstep: 733.84 | bwd_inner_microstep: 733.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 01:59:36,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1252.46 | bwd_inner_microstep: 1252.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-11 01:59:38,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1509.29 | bwd_inner_microstep: 1509.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 01:59:40,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.21 | bwd_microstep: 1400.01 | bwd_inner_microstep: 1399.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-11 01:59:41,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.44 | bwd_microstep: 1183.20 | bwd_inner_microstep: 1183.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 01:59:43,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1552.82 | bwd_inner_microstep: 1552.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2273
[2024-06-11 01:59:45,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.74 | bwd_microstep: 1003.49 | bwd_inner_microstep: 1003.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815
[2024-06-11 01:59:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.01 | bwd_microstep: 1753.37 | bwd_inner_microstep: 1753.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3388
[2024-06-11 01:59:49,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.74 | bwd_microstep: 1438.69 | bwd_inner_microstep: 1438.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420
[2024-06-11 01:59:51,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1442.56 | bwd_inner_microstep: 1442.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3454
[2024-06-11 01:59:53,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1417.66 | bwd_inner_microstep: 1417.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-11 01:59:55,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.52 | bwd_microstep: 1446.98 | bwd_inner_microstep: 1446.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3771
[2024-06-11 01:59:57,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.57 | bwd_microstep: 1447.68 | bwd_inner_microstep: 1447.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3796
[2024-06-11 01:59:59,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.15 | bwd_microstep: 1476.23 | bwd_inner_microstep: 1476.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3763
[2024-06-11 02:00:02,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-11 02:00:02,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 2722.63 | bwd_inner_microstep: 1512.92 | bwd_allreduce_microstep: 1209.66 | step_microstep: 37.84
[2024-06-11 02:00:02,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16151.72 | bwd: 44602.14 | bwd_inner: 43391.58 | bwd_allreduce: 1209.89 | step: 39.28
{'loss': 1.1369, 'learning_rate': 2.3870731116497915e-06, 'epoch': 0.85}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 02:00:04,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.64 | bwd_microstep: 1274.80 | bwd_inner_microstep: 1274.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1884
[2024-06-11 02:00:05,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.95 | bwd_microstep: 771.02 | bwd_inner_microstep: 771.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3921
[2024-06-11 02:00:07,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.96 | bwd_microstep: 1520.17 | bwd_inner_microstep: 1520.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4204
[2024-06-11 02:00:09,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.64 | bwd_microstep: 1458.84 | bwd_inner_microstep: 1458.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-11 02:00:12,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.66 | bwd_microstep: 1546.85 | bwd_inner_microstep: 1546.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-11 02:00:13,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.72 | bwd_microstep: 807.11 | bwd_inner_microstep: 807.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-11 02:00:15,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.48 | bwd_microstep: 1530.99 | bwd_inner_microstep: 1530.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 02:00:17,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.50 | bwd_microstep: 1279.56 | bwd_inner_microstep: 1279.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-11 02:00:18,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.10 | bwd_microstep: 1150.96 | bwd_inner_microstep: 1150.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 02:00:20,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1388.44 | bwd_inner_microstep: 1388.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2115
[2024-06-11 02:00:21,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.09 | bwd_microstep: 982.19 | bwd_inner_microstep: 982.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3498
[2024-06-11 02:00:23,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.12 | bwd_microstep: 1428.08 | bwd_inner_microstep: 1428.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3630
[2024-06-11 02:00:26,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.25 | bwd_microstep: 1676.06 | bwd_inner_microstep: 1676.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-11 02:00:28,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1481.97 | bwd_inner_microstep: 1481.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2094
[2024-06-11 02:00:29,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.54 | bwd_microstep: 918.76 | bwd_inner_microstep: 918.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3631
[2024-06-11 02:00:31,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.43 | bwd_microstep: 1269.82 | bwd_inner_microstep: 1269.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-11 02:00:33,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1613.29 | bwd_inner_microstep: 1613.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-11 02:00:35,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1515.40 | bwd_inner_microstep: 1515.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2090
[2024-06-11 02:00:36,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.78 | bwd_microstep: 918.50 | bwd_inner_microstep: 918.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-11 02:00:37,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.92 | bwd_microstep: 805.94 | bwd_inner_microstep: 805.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-11 02:00:39,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.04 | bwd_microstep: 1290.39 | bwd_inner_microstep: 1290.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-11 02:00:42,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.79 | bwd_microstep: 1658.58 | bwd_inner_microstep: 1658.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2070
[2024-06-11 02:00:43,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.62 | bwd_microstep: 753.62 | bwd_inner_microstep: 753.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242
[2024-06-11 02:00:44,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.16 | bwd_microstep: 969.18 | bwd_inner_microstep: 969.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3549
[2024-06-11 02:00:46,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.08 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2953
[2024-06-11 02:00:48,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.87 | bwd_microstep: 1329.45 | bwd_inner_microstep: 1329.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3600
[2024-06-11 02:00:50,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.81 | bwd_microstep: 1430.22 | bwd_inner_microstep: 1430.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776
[2024-06-11 02:00:52,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.95 | bwd_microstep: 1441.10 | bwd_inner_microstep: 1441.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-11 02:00:54,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.97 | bwd_microstep: 1505.20 | bwd_inner_microstep: 1505.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-11 02:00:56,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1513.83 | bwd_inner_microstep: 1513.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2019
[2024-06-11 02:00:57,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.39 | bwd_microstep: 839.67 | bwd_inner_microstep: 839.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576
[2024-06-11 02:01:03,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-11 02:01:03,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.37 | bwd_microstep: 5369.07 | bwd_inner_microstep: 1922.71 | bwd_allreduce_microstep: 3446.31 | step_microstep: 37.83
[2024-06-11 02:01:03,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15424.65 | bwd: 44996.81 | bwd_inner: 41549.59 | bwd_allreduce: 3446.54 | step: 39.33
{'loss': 1.1548, 'learning_rate': 2.369321514463716e-06, 'epoch': 0.85}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393
[2024-06-11 02:01:05,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1335.92 | bwd_inner_microstep: 1335.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3891
[2024-06-11 02:01:07,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.99 | bwd_microstep: 1583.83 | bwd_inner_microstep: 1583.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-11 02:01:09,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.81 | bwd_microstep: 1240.97 | bwd_inner_microstep: 1240.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513
[2024-06-11 02:01:11,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.68 | bwd_microstep: 1318.54 | bwd_inner_microstep: 1318.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 02:01:13,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1378.61 | bwd_inner_microstep: 1378.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 02:01:15,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.31 | bwd_microstep: 1382.06 | bwd_inner_microstep: 1382.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743
[2024-06-11 02:01:17,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.01 | bwd_microstep: 1430.86 | bwd_inner_microstep: 1430.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-11 02:01:18,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.70 | bwd_microstep: 1246.54 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-11 02:01:20,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.58 | bwd_microstep: 1149.60 | bwd_inner_microstep: 1149.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434
[2024-06-11 02:01:22,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.37 | bwd_microstep: 1308.71 | bwd_inner_microstep: 1308.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3717
[2024-06-11 02:01:24,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.17 | bwd_microstep: 1729.62 | bwd_inner_microstep: 1729.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-11 02:01:26,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1482.13 | bwd_inner_microstep: 1482.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 02:01:28,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.87 | bwd_microstep: 1336.52 | bwd_inner_microstep: 1336.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653
[2024-06-11 02:01:30,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.99 | bwd_microstep: 1609.33 | bwd_inner_microstep: 1609.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3510
[2024-06-11 02:01:32,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.90 | bwd_microstep: 1429.07 | bwd_inner_microstep: 1429.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-11 02:01:34,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1395.20 | bwd_inner_microstep: 1395.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-11 02:01:36,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1288.36 | bwd_inner_microstep: 1288.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1914
[2024-06-11 02:01:37,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 717.41 | bwd_inner_microstep: 717.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-11 02:01:39,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.63 | bwd_microstep: 1402.73 | bwd_inner_microstep: 1402.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-11 02:01:41,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1397.52 | bwd_inner_microstep: 1397.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532
[2024-06-11 02:01:43,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.14 | bwd_microstep: 1293.11 | bwd_inner_microstep: 1293.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058
[2024-06-11 02:01:44,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.94 | bwd_microstep: 817.54 | bwd_inner_microstep: 817.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 02:01:45,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1255.54 | bwd_inner_microstep: 1255.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909
[2024-06-11 02:01:46,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.69 | bwd_microstep: 685.12 | bwd_inner_microstep: 685.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-11 02:01:49,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1555.56 | bwd_inner_microstep: 1555.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-11 02:01:50,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.39 | bwd_microstep: 1300.03 | bwd_inner_microstep: 1300.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3435
[2024-06-11 02:01:52,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.48 | bwd_microstep: 1295.23 | bwd_inner_microstep: 1295.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3433
[2024-06-11 02:01:54,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1443.68 | bwd_inner_microstep: 1443.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 02:01:56,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.44 | bwd_microstep: 1351.00 | bwd_inner_microstep: 1350.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467
[2024-06-11 02:01:58,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.35 | bwd_microstep: 1436.99 | bwd_inner_microstep: 1436.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3089
[2024-06-11 02:02:00,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.31 | bwd_microstep: 1150.49 | bwd_inner_microstep: 1150.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2978
[2024-06-11 02:02:04,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.06 | optimizer_step: 6.60
[2024-06-11 02:02:04,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.33 | bwd_microstep: 4292.24 | bwd_inner_microstep: 1360.87 | bwd_allreduce_microstep: 2931.32 | step_microstep: 37.95
[2024-06-11 02:02:04,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15745.06 | bwd: 45040.07 | bwd_inner: 42107.85 | bwd_allreduce: 2931.55 | step: 39.44
 85%|████████▍ | 1460/1726 [25:19:36<5:05:39, 68.95s/it]
 85%|████████▍ | 1461/1726 [25:20:37<4:53:57, 66.56s/it]
 85%|████████▍ | 1462/1726 [25:21:38<4:45:35, 64.91s/it]
 85%|████████▍ | 1463/1726 [25:22:39<4:39:29, 63.76s/it]
 85%|████████▍ | 1464/1726 [25:23:40<4:34:28, 62.86s/it]
{'loss': 1.2122, 'learning_rate': 2.3516320124356186e-06, 'epoch': 0.85}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3537
[2024-06-11 02:02:06,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.02 | bwd_microstep: 1441.46 | bwd_inner_microstep: 1441.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-11 02:02:08,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.31 | bwd_microstep: 1384.71 | bwd_inner_microstep: 1384.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921
[2024-06-11 02:02:09,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.71 | bwd_microstep: 785.23 | bwd_inner_microstep: 785.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-11 02:02:12,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.32 | bwd_microstep: 1653.70 | bwd_inner_microstep: 1653.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-11 02:02:14,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1479.33 | bwd_inner_microstep: 1479.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 02:02:15,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1246.84 | bwd_inner_microstep: 1246.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-11 02:02:18,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.85 | bwd_microstep: 1642.07 | bwd_inner_microstep: 1642.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-11 02:02:20,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.53 | bwd_microstep: 1628.99 | bwd_inner_microstep: 1628.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-11 02:02:22,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.49 | bwd_microstep: 1247.68 | bwd_inner_microstep: 1247.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440
[2024-06-11 02:02:24,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.34 | bwd_microstep: 1411.37 | bwd_inner_microstep: 1411.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3460
[2024-06-11 02:02:26,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.53 | bwd_microstep: 1427.99 | bwd_inner_microstep: 1427.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3450
[2024-06-11 02:02:27,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.15 | bwd_microstep: 1376.05 | bwd_inner_microstep: 1376.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-11 02:02:29,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.99 | bwd_microstep: 1491.78 | bwd_inner_microstep: 1491.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 02:02:31,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1394.95 | bwd_inner_microstep: 1394.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933
[2024-06-11 02:02:32,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.32 | bwd_microstep: 729.62 | bwd_inner_microstep: 729.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 02:02:34,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.21 | bwd_microstep: 1285.43 | bwd_inner_microstep: 1285.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668
[2024-06-11 02:02:36,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.34 | bwd_microstep: 1418.65 | bwd_inner_microstep: 1418.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-11 02:02:38,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1391.24 | bwd_inner_microstep: 1391.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 02:02:40,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.65 | bwd_microstep: 1354.48 | bwd_inner_microstep: 1354.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-11 02:02:42,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.73 | bwd_microstep: 1253.90 | bwd_inner_microstep: 1253.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3814
[2024-06-11 02:02:44,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.29 | bwd_microstep: 1416.40 | bwd_inner_microstep: 1416.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-11 02:02:45,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.46 | bwd_microstep: 1160.48 | bwd_inner_microstep: 1160.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2272
[2024-06-11 02:02:47,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.62 | bwd_microstep: 973.35 | bwd_inner_microstep: 973.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3512
[2024-06-11 02:02:48,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.20 | bwd_microstep: 1226.93 | bwd_inner_microstep: 1226.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 02:02:50,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1387.56 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3050
[2024-06-11 02:02:52,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.11 | bwd_microstep: 1136.52 | bwd_inner_microstep: 1136.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-11 02:02:54,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.82 | bwd_microstep: 1525.17 | bwd_inner_microstep: 1525.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-11 02:02:56,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.74 | bwd_microstep: 1489.90 | bwd_inner_microstep: 1489.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2057
[2024-06-11 02:02:57,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.80 | bwd_microstep: 1011.77 | bwd_inner_microstep: 1011.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3540
[2024-06-11 02:02:59,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.82 | bwd_microstep: 1415.88 | bwd_inner_microstep: 1415.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-11 02:03:01,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1498.58 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2570
[2024-06-11 02:03:05,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-11 02:03:05,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.68 | bwd_microstep: 3113.32 | bwd_inner_microstep: 1216.27 | bwd_allreduce_microstep: 1896.99 | step_microstep: 37.58
[2024-06-11 02:03:05,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15865.54 | bwd: 44401.35 | bwd_inner: 42503.45 | bwd_allreduce: 1897.22 | step: 39.10
{'loss': 1.1704, 'learning_rate': 2.334004667867824e-06, 'epoch': 0.85}
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3523
[2024-06-11 02:03:07,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1430.27 | bwd_inner_microstep: 1430.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-11 02:03:09,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.03 | bwd_microstep: 1302.64 | bwd_inner_microstep: 1302.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 02:03:11,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.34 | bwd_microstep: 1375.71 | bwd_inner_microstep: 1375.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 02:03:13,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1480.77 | bwd_inner_microstep: 1480.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3763
[2024-06-11 02:03:15,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.82 | bwd_microstep: 1470.81 | bwd_inner_microstep: 1470.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-11 02:03:16,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.29 | bwd_microstep: 790.41 | bwd_inner_microstep: 790.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 02:03:18,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 1339.70 | bwd_inner_microstep: 1339.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3712
[2024-06-11 02:03:19,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.61 | bwd_microstep: 1328.46 | bwd_inner_microstep: 1328.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-11 02:03:21,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 793.52 | bwd_inner_microstep: 793.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-11 02:03:23,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1499.06 | bwd_inner_microstep: 1499.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-11 02:03:25,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.30 | bwd_microstep: 1421.35 | bwd_inner_microstep: 1421.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-11 02:03:26,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.44 | bwd_microstep: 1215.12 | bwd_inner_microstep: 1215.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-11 02:03:28,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.57 | bwd_microstep: 1480.44 | bwd_inner_microstep: 1480.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667
[2024-06-11 02:03:30,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.33 | bwd_microstep: 1417.70 | bwd_inner_microstep: 1417.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516
[2024-06-11 02:03:32,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.73 | bwd_microstep: 1408.86 | bwd_inner_microstep: 1408.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-11 02:03:34,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.74 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3837
[2024-06-11 02:03:36,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.95 | bwd_microstep: 1756.45 | bwd_inner_microstep: 1756.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-11 02:03:38,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.28 | bwd_microstep: 1485.10 | bwd_inner_microstep: 1485.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-11 02:03:40,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.39 | bwd_microstep: 1291.34 | bwd_inner_microstep: 1291.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497
[2024-06-11 02:03:42,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.55 | bwd_microstep: 1187.11 | bwd_inner_microstep: 1187.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665
[2024-06-11 02:03:44,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.30 | bwd_microstep: 1623.15 | bwd_inner_microstep: 1623.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 02:03:46,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.16 | bwd_microstep: 1553.86 | bwd_inner_microstep: 1553.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-11 02:03:48,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.56 | bwd_microstep: 1310.67 | bwd_inner_microstep: 1310.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-11 02:03:50,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.96 | bwd_microstep: 1357.37 | bwd_inner_microstep: 1357.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150
[2024-06-11 02:03:51,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.78 | bwd_microstep: 850.35 | bwd_inner_microstep: 850.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 892
[2024-06-11 02:03:52,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.41 | bwd_microstep: 368.64 | bwd_inner_microstep: 368.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2122
[2024-06-11 02:03:53,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.83 | bwd_microstep: 1025.20 | bwd_inner_microstep: 1025.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-11 02:03:55,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.23 | bwd_microstep: 1538.31 | bwd_inner_microstep: 1538.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2955
[2024-06-11 02:03:57,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.12 | bwd_microstep: 1096.25 | bwd_inner_microstep: 1096.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2289
[2024-06-11 02:03:58,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.90 | bwd_microstep: 937.84 | bwd_inner_microstep: 937.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3589
[2024-06-11 02:04:00,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.63 | bwd_microstep: 1803.04 | bwd_inner_microstep: 1803.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-11 02:04:08,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.09 | optimizer_step: 6.58
[2024-06-11 02:04:08,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.36 | bwd_microstep: 6656.84 | bwd_inner_microstep: 1641.87 | bwd_allreduce_microstep: 5014.92 | step_microstep: 37.79
[2024-06-11 02:04:08,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15604.32 | bwd: 46877.53 | bwd_inner: 41861.71 | bwd_allreduce: 5015.15 | step: 39.23
{'loss': 1.1981, 'learning_rate': 2.3164395428437605e-06, 'epoch': 0.85}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1906
[2024-06-11 02:04:09,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.97 | bwd_microstep: 865.90 | bwd_inner_microstep: 865.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3951
[2024-06-11 02:04:11,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.03 | bwd_microstep: 1594.22 | bwd_inner_microstep: 1594.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3895
[2024-06-11 02:04:13,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.36 | bwd_microstep: 1683.51 | bwd_inner_microstep: 1683.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794
[2024-06-11 02:04:16,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.37 | bwd_microstep: 1645.93 | bwd_inner_microstep: 1645.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 02:04:17,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.67 | bwd_microstep: 1279.19 | bwd_inner_microstep: 1279.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-11 02:04:19,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1248.90 | bwd_inner_microstep: 1248.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 02:04:21,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.03 | bwd_microstep: 1383.81 | bwd_inner_microstep: 1383.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-11 02:04:23,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.28 | bwd_microstep: 1401.55 | bwd_inner_microstep: 1401.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764
[2024-06-11 02:04:25,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.23 | bwd_microstep: 1446.76 | bwd_inner_microstep: 1446.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-11 02:04:27,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.18 | bwd_microstep: 1188.27 | bwd_inner_microstep: 1188.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4072
[2024-06-11 02:04:29,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.36 | bwd_microstep: 1626.20 | bwd_inner_microstep: 1626.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3477
[2024-06-11 02:04:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.83 | bwd_microstep: 1442.69 | bwd_inner_microstep: 1442.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3717
[2024-06-11 02:04:33,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.71 | bwd_microstep: 1664.71 | bwd_inner_microstep: 1664.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3380
[2024-06-11 02:04:35,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.02 | bwd_microstep: 1386.99 | bwd_inner_microstep: 1386.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 02:04:37,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.55 | bwd_microstep: 1483.15 | bwd_inner_microstep: 1483.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482
[2024-06-11 02:04:39,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1404.89 | bwd_inner_microstep: 1404.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 02:04:41,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1276.95 | bwd_inner_microstep: 1276.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-11 02:04:43,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.90 | bwd_microstep: 1180.30 | bwd_inner_microstep: 1180.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-11 02:04:44,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1404.95 | bwd_inner_microstep: 1404.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3476
[2024-06-11 02:04:46,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1328.29 | bwd_inner_microstep: 1328.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 02:04:48,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.26 | bwd_microstep: 1375.72 | bwd_inner_microstep: 1375.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-11 02:04:50,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.96 | bwd_microstep: 1453.10 | bwd_inner_microstep: 1453.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 02:04:52,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1252.70 | bwd_inner_microstep: 1252.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 02:04:54,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.12 | bwd_microstep: 1508.83 | bwd_inner_microstep: 1508.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 02:04:56,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.11 | bwd_microstep: 1568.75 | bwd_inner_microstep: 1568.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3600
[2024-06-11 02:04:58,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.72 | bwd_microstep: 1338.01 | bwd_inner_microstep: 1337.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3822
[2024-06-11 02:05:00,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1588.56 | bwd_inner_microstep: 1588.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3716
[2024-06-11 02:05:02,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.20 | bwd_microstep: 1583.95 | bwd_inner_microstep: 1583.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560
[2024-06-11 02:05:04,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 1430.18 | bwd_inner_microstep: 1430.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2234
[2024-06-11 02:05:06,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.35 | bwd_microstep: 1064.23 | bwd_inner_microstep: 1064.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-11 02:05:08,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.77 | bwd_microstep: 1646.48 | bwd_inner_microstep: 1646.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3644
[2024-06-11 02:05:11,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.65 | optimizer_gradients: 4.03 | optimizer_step: 6.61
[2024-06-11 02:05:11,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 2635.71 | bwd_inner_microstep: 1494.83 | bwd_allreduce_microstep: 1140.83 | step_microstep: 38.35
[2024-06-11 02:05:11,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16854.87 | bwd: 46383.37 | bwd_inner: 45241.62 | bwd_allreduce: 1141.06 | step: 39.87
{'loss': 1.2488, 'learning_rate': 2.2989366992276917e-06, 'epoch': 0.85}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-11 02:05:13,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.84 | bwd_microstep: 1492.53 | bwd_inner_microstep: 1492.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3949
[2024-06-11 02:05:16,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.06 | bwd_microstep: 1555.40 | bwd_inner_microstep: 1555.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2451
[2024-06-11 02:05:17,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.23 | bwd_microstep: 946.55 | bwd_inner_microstep: 946.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301
[2024-06-11 02:05:18,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.73 | bwd_microstep: 973.52 | bwd_inner_microstep: 973.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-11 02:05:19,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.60 | bwd_microstep: 676.12 | bwd_inner_microstep: 676.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750
[2024-06-11 02:05:21,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1536.74 | bwd_inner_microstep: 1536.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-11 02:05:23,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.71 | bwd_microstep: 1300.49 | bwd_inner_microstep: 1300.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059
[2024-06-11 02:05:24,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.94 | bwd_microstep: 814.28 | bwd_inner_microstep: 814.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3481
[2024-06-11 02:05:26,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.42 | bwd_microstep: 1214.44 | bwd_inner_microstep: 1214.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-11 02:05:27,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.31 | bwd_microstep: 790.37 | bwd_inner_microstep: 790.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-11 02:05:29,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.77 | bwd_microstep: 1422.42 | bwd_inner_microstep: 1422.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-11 02:05:31,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.82 | bwd_microstep: 1254.82 | bwd_inner_microstep: 1254.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515
[2024-06-11 02:05:32,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.57 | bwd_microstep: 1190.34 | bwd_inner_microstep: 1190.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1951
[2024-06-11 02:05:34,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.55 | bwd_microstep: 890.17 | bwd_inner_microstep: 890.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-11 02:05:35,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.07 | bwd_microstep: 1410.65 | bwd_inner_microstep: 1410.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385
[2024-06-11 02:05:37,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.01 | bwd_microstep: 1335.35 | bwd_inner_microstep: 1335.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3689
[2024-06-11 02:05:39,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.02 | bwd_microstep: 1264.02 | bwd_inner_microstep: 1264.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3514
[2024-06-11 02:05:41,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1347.04 | bwd_inner_microstep: 1347.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-11 02:05:43,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.76 | bwd_microstep: 1299.84 | bwd_inner_microstep: 1299.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-11 02:05:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1447.11 | bwd_inner_microstep: 1447.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-11 02:05:47,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1551.76 | bwd_inner_microstep: 1551.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535
[2024-06-11 02:05:49,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.22 | bwd_microstep: 1447.71 | bwd_inner_microstep: 1447.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-11 02:05:50,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.66 | bwd_microstep: 696.51 | bwd_inner_microstep: 696.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522
[2024-06-11 02:05:52,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 1422.70 | bwd_inner_microstep: 1422.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3716
[2024-06-11 02:05:54,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1598.51 | bwd_inner_microstep: 1598.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555
[2024-06-11 02:05:56,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.89 | bwd_microstep: 1202.67 | bwd_inner_microstep: 1202.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-11 02:05:57,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.48 | bwd_microstep: 1297.76 | bwd_inner_microstep: 1297.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 02:05:59,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1377.71 | bwd_inner_microstep: 1377.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-11 02:06:01,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.36 | bwd_microstep: 803.83 | bwd_inner_microstep: 803.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3807
[2024-06-11 02:06:03,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.01 | bwd_microstep: 1723.82 | bwd_inner_microstep: 1723.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3440
[2024-06-11 02:06:05,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.38 | bwd_microstep: 1213.29 | bwd_inner_microstep: 1213.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3766
[2024-06-11 02:06:14,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.36 | optimizer_step: 6.58
[2024-06-11 02:06:14,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.46 | bwd_microstep: 8999.02 | bwd_inner_microstep: 2082.58 | bwd_allreduce_microstep: 6916.37 | step_microstep: 39.29
[2024-06-11 02:06:14,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15153.81 | bwd: 47497.52 | bwd_inner: 40580.24 | bwd_allreduce: 6916.61 | step: 40.75
{'loss': 1.1867, 'learning_rate': 2.2814961986645525e-06, 'epoch': 0.85}
 85%|████████▍ | 1465/1726 [25:24:41<4:31:09, 62.34s/it]
 85%|████████▍ | 1466/1726 [25:25:42<4:27:51, 61.81s/it]
 85%|████████▍ | 1467/1726 [25:26:44<4:28:06, 62.11s/it]
 85%|████████▌ | 1468/1726 [25:27:48<4:28:57, 62.55s/it]
 85%|████████▌ | 1469/1726 [25:28:51<4:28:27, 62.68s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 02:06:16,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.12 | bwd_microstep: 1377.78 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 02:06:18,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1377.85 | bwd_inner_microstep: 1377.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 02:06:20,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 1650.94 | bwd_inner_microstep: 1650.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-11 02:06:23,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.27 | bwd_microstep: 1544.31 | bwd_inner_microstep: 1544.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 02:06:24,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1376.43 | bwd_inner_microstep: 1376.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-11 02:06:26,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.67 | bwd_microstep: 1408.89 | bwd_inner_microstep: 1408.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878
[2024-06-11 02:06:27,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.67 | bwd_microstep: 679.86 | bwd_inner_microstep: 679.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 02:06:29,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 1248.75 | bwd_inner_microstep: 1248.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948
[2024-06-11 02:06:30,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.81 | bwd_microstep: 826.49 | bwd_inner_microstep: 826.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-11 02:06:32,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.84 | bwd_microstep: 1314.77 | bwd_inner_microstep: 1314.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063
[2024-06-11 02:06:33,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.42 | bwd_microstep: 814.78 | bwd_inner_microstep: 814.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-11 02:06:35,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.83 | bwd_microstep: 1482.48 | bwd_inner_microstep: 1482.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3442
[2024-06-11 02:06:37,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.94 | bwd_microstep: 1392.50 | bwd_inner_microstep: 1392.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471
[2024-06-11 02:06:39,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.62 | bwd_microstep: 1441.17 | bwd_inner_microstep: 1441.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3676
[2024-06-11 02:06:41,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.11 | bwd_microstep: 1586.83 | bwd_inner_microstep: 1586.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-11 02:06:43,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1290.85 | bwd_inner_microstep: 1290.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841
[2024-06-11 02:06:45,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.59 | bwd_microstep: 1664.55 | bwd_inner_microstep: 1664.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-11 02:06:47,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.57 | bwd_microstep: 1430.76 | bwd_inner_microstep: 1430.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627
[2024-06-11 02:06:49,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1311.52 | bwd_inner_microstep: 1311.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622
[2024-06-11 02:06:51,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1341.89 | bwd_inner_microstep: 1341.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 02:06:53,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1379.71 | bwd_inner_microstep: 1379.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744
[2024-06-11 02:06:55,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.15 | bwd_microstep: 1640.30 | bwd_inner_microstep: 1640.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2184
[2024-06-11 02:06:56,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.33 | bwd_microstep: 919.58 | bwd_inner_microstep: 919.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-11 02:06:58,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.77 | bwd_microstep: 1476.71 | bwd_inner_microstep: 1476.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-11 02:07:00,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1412.09 | bwd_inner_microstep: 1412.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3591
[2024-06-11 02:07:02,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1341.04 | bwd_inner_microstep: 1341.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590
[2024-06-11 02:07:04,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.90 | bwd_microstep: 1309.74 | bwd_inner_microstep: 1309.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3438
[2024-06-11 02:07:06,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1399.53 | bwd_inner_microstep: 1399.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-11 02:07:08,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.20 | bwd_microstep: 1601.19 | bwd_inner_microstep: 1601.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-11 02:07:10,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.64 | bwd_microstep: 1646.67 | bwd_inner_microstep: 1646.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-11 02:07:12,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1401.37 | bwd_inner_microstep: 1401.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3577
[2024-06-11 02:07:16,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-11 02:07:16,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.29 | bwd_microstep: 3220.90 | bwd_inner_microstep: 1761.94 | bwd_allreduce_microstep: 1458.91 | step_microstep: 37.87
[2024-06-11 02:07:16,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16304.77 | bwd: 45312.26 | bwd_inner: 43852.39 | bwd_allreduce: 1459.17 | step: 39.32
{'loss': 1.2503, 'learning_rate': 2.264118102579693e-06, 'epoch': 0.85}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 02:07:18,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1379.11 | bwd_inner_microstep: 1379.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3946
[2024-06-11 02:07:20,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.07 | bwd_microstep: 1526.05 | bwd_inner_microstep: 1526.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3889
[2024-06-11 02:07:22,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.26 | bwd_microstep: 1582.82 | bwd_inner_microstep: 1582.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933
[2024-06-11 02:07:23,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.07 | bwd_microstep: 696.17 | bwd_inner_microstep: 696.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-11 02:07:25,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.89 | bwd_microstep: 1476.60 | bwd_inner_microstep: 1476.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613
[2024-06-11 02:07:27,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.27 | bwd_microstep: 1246.30 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3489
[2024-06-11 02:07:29,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1245.59 | bwd_inner_microstep: 1245.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1955
[2024-06-11 02:07:30,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.61 | bwd_microstep: 730.84 | bwd_inner_microstep: 730.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3746
[2024-06-11 02:07:32,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1638.91 | bwd_inner_microstep: 1638.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-11 02:07:33,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 701.15 | bwd_inner_microstep: 701.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-11 02:07:35,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.44 | bwd_microstep: 1151.68 | bwd_inner_microstep: 1151.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-11 02:07:36,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.41 | bwd_microstep: 798.08 | bwd_inner_microstep: 798.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 02:07:38,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.97 | bwd_microstep: 1377.61 | bwd_inner_microstep: 1377.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3643
[2024-06-11 02:07:40,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.02 | bwd_microstep: 1474.88 | bwd_inner_microstep: 1474.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 02:07:42,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.76 | bwd_microstep: 1485.50 | bwd_inner_microstep: 1485.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-11 02:07:44,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.77 | bwd_microstep: 1521.60 | bwd_inner_microstep: 1521.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2292
[2024-06-11 02:07:45,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.51 | bwd_microstep: 1072.00 | bwd_inner_microstep: 1071.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-11 02:07:47,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1350.50 | bwd_inner_microstep: 1350.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-11 02:07:49,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1538.25 | bwd_inner_microstep: 1538.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3872
[2024-06-11 02:07:51,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.07 | bwd_microstep: 1372.27 | bwd_inner_microstep: 1372.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-11 02:07:53,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.34 | bwd_microstep: 1251.73 | bwd_inner_microstep: 1251.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-11 02:07:55,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1440.60 | bwd_inner_microstep: 1440.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2937
[2024-06-11 02:07:57,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.96 | bwd_microstep: 1193.72 | bwd_inner_microstep: 1193.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 02:07:58,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.41 | bwd_microstep: 1284.29 | bwd_inner_microstep: 1284.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-11 02:08:01,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.97 | bwd_microstep: 1554.99 | bwd_inner_microstep: 1554.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-11 02:08:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1499.39 | bwd_inner_microstep: 1499.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 02:08:05,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.80 | bwd_microstep: 1658.97 | bwd_inner_microstep: 1658.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-11 02:08:07,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-11 02:08:09,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.68 | bwd_microstep: 1612.27 | bwd_inner_microstep: 1612.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3568
[2024-06-11 02:08:11,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.38 | bwd_microstep: 1560.26 | bwd_inner_microstep: 1560.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624
[2024-06-11 02:08:13,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.71 | bwd_microstep: 1544.00 | bwd_inner_microstep: 1543.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3743
[2024-06-11 02:08:19,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.08 | optimizer_step: 6.58
[2024-06-11 02:08:19,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.20 | bwd_microstep: 4821.84 | bwd_inner_microstep: 1925.83 | bwd_allreduce_microstep: 2895.96 | step_microstep: 37.62
[2024-06-11 02:08:19,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16097.61 | bwd: 46192.29 | bwd_inner: 43295.43 | bwd_allreduce: 2896.19 | step: 39.07
{'loss': 1.1697, 'learning_rate': 2.246802472178675e-06, 'epoch': 0.85}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-11 02:08:21,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1467.49 | bwd_inner_microstep: 1467.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3907
[2024-06-11 02:08:23,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.54 | bwd_microstep: 1685.00 | bwd_inner_microstep: 1684.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3879
[2024-06-11 02:08:25,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.32 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 02:08:27,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1389.93 | bwd_inner_microstep: 1389.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696
[2024-06-11 02:08:29,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.17 | bwd_microstep: 1518.89 | bwd_inner_microstep: 1518.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529
[2024-06-11 02:08:31,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1321.44 | bwd_inner_microstep: 1321.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753
[2024-06-11 02:08:33,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.94 | bwd_microstep: 1467.83 | bwd_inner_microstep: 1467.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 02:08:35,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1243.18 | bwd_inner_microstep: 1243.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3925
[2024-06-11 02:08:37,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.99 | bwd_microstep: 1486.38 | bwd_inner_microstep: 1486.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-11 02:08:39,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1409.61 | bwd_inner_microstep: 1409.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3557
[2024-06-11 02:08:41,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.47 | bwd_microstep: 1360.03 | bwd_inner_microstep: 1360.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 02:08:43,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1480.50 | bwd_inner_microstep: 1480.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2097
[2024-06-11 02:08:44,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.12 | bwd_microstep: 1013.89 | bwd_inner_microstep: 1013.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-11 02:08:46,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1344.64 | bwd_inner_microstep: 1344.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1858
[2024-06-11 02:08:47,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.76 | bwd_microstep: 707.59 | bwd_inner_microstep: 707.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 02:08:49,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1385.40 | bwd_inner_microstep: 1385.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-11 02:08:50,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.18 | bwd_microstep: 892.36 | bwd_inner_microstep: 892.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829
[2024-06-11 02:08:52,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.51 | bwd_microstep: 1358.00 | bwd_inner_microstep: 1357.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-11 02:08:54,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.55 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2979
[2024-06-11 02:08:55,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.82 | bwd_microstep: 1105.03 | bwd_inner_microstep: 1105.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 02:08:57,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.34 | bwd_microstep: 1378.27 | bwd_inner_microstep: 1378.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792
[2024-06-11 02:09:00,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.38 | bwd_microstep: 1650.62 | bwd_inner_microstep: 1650.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3609
[2024-06-11 02:09:01,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.13 | bwd_microstep: 1213.35 | bwd_inner_microstep: 1213.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-11 02:09:03,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1559.61 | bwd_inner_microstep: 1559.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167
[2024-06-11 02:09:05,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.12 | bwd_microstep: 855.14 | bwd_inner_microstep: 855.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 02:09:06,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.36 | bwd_microstep: 1298.90 | bwd_inner_microstep: 1298.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 02:09:08,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.71 | bwd_microstep: 1261.38 | bwd_inner_microstep: 1261.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3534
[2024-06-11 02:09:10,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.43 | bwd_microstep: 1356.01 | bwd_inner_microstep: 1355.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-11 02:09:12,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.26 | bwd_microstep: 1396.63 | bwd_inner_microstep: 1396.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509
[2024-06-11 02:09:14,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.03 | bwd_microstep: 1431.52 | bwd_inner_microstep: 1431.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3457
[2024-06-11 02:09:16,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.07 | bwd_microstep: 1425.26 | bwd_inner_microstep: 1425.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3768
[2024-06-11 02:09:20,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-11 02:09:20,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 3532.47 | bwd_inner_microstep: 1553.53 | bwd_allreduce_microstep: 1978.88 | step_microstep: 38.40
[2024-06-11 02:09:20,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16020.52 | bwd: 44856.96 | bwd_inner: 42877.17 | bwd_allreduce: 1979.11 | step: 39.87
{'loss': 1.1904, 'learning_rate': 2.229549368447057e-06, 'epoch': 0.85}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 02:09:22,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1332.73 | bwd_inner_microstep: 1332.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4016
[2024-06-11 02:09:24,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1504.42 | bwd_inner_microstep: 1504.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3833
[2024-06-11 02:09:26,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.50 | bwd_microstep: 1513.38 | bwd_inner_microstep: 1513.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-11 02:09:28,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1377.30 | bwd_inner_microstep: 1377.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796
[2024-06-11 02:09:30,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.11 | bwd_microstep: 1446.15 | bwd_inner_microstep: 1446.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-11 02:09:31,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.19 | bwd_microstep: 794.43 | bwd_inner_microstep: 794.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 02:09:33,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1285.43 | bwd_inner_microstep: 1285.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-11 02:09:35,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-11 02:09:37,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.35 | bwd_microstep: 1392.35 | bwd_inner_microstep: 1392.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 02:09:39,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1394.24 | bwd_inner_microstep: 1394.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497
[2024-06-11 02:09:41,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.61 | bwd_microstep: 1550.15 | bwd_inner_microstep: 1550.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3639
[2024-06-11 02:09:43,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.04 | bwd_microstep: 1375.64 | bwd_inner_microstep: 1375.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655
[2024-06-11 02:09:45,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.19 | bwd_microstep: 1575.98 | bwd_inner_microstep: 1575.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-11 02:09:47,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.87 | bwd_microstep: 1275.33 | bwd_inner_microstep: 1275.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498
[2024-06-11 02:09:49,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1582.49 | bwd_inner_microstep: 1582.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 02:09:51,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.00 | bwd_microstep: 1381.15 | bwd_inner_microstep: 1381.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3843
[2024-06-11 02:09:53,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.61 | bwd_microstep: 1713.03 | bwd_inner_microstep: 1713.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-11 02:09:55,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.00 | bwd_microstep: 1286.59 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3460
[2024-06-11 02:09:57,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.45 | bwd_microstep: 1341.77 | bwd_inner_microstep: 1341.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 02:09:59,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.06 | bwd_microstep: 1558.64 | bwd_inner_microstep: 1558.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-11 02:10:01,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.57 | bwd_microstep: 1613.78 | bwd_inner_microstep: 1613.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-11 02:10:03,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.99 | bwd_microstep: 1404.26 | bwd_inner_microstep: 1404.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547
[2024-06-11 02:10:05,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.45 | bwd_microstep: 1329.32 | bwd_inner_microstep: 1329.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938
[2024-06-11 02:10:06,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.30 | bwd_microstep: 728.40 | bwd_inner_microstep: 728.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026
[2024-06-11 02:10:07,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.91 | bwd_microstep: 806.04 | bwd_inner_microstep: 806.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-11 02:10:09,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.40 | bwd_microstep: 1607.35 | bwd_inner_microstep: 1607.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 02:10:11,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1351.21 | bwd_inner_microstep: 1351.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-11 02:10:13,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.51 | bwd_microstep: 1555.00 | bwd_inner_microstep: 1554.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-11 02:10:15,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.94 | bwd_microstep: 1441.04 | bwd_inner_microstep: 1441.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3420
[2024-06-11 02:10:17,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.12 | bwd_microstep: 1374.30 | bwd_inner_microstep: 1374.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275
[2024-06-11 02:10:18,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.93 | bwd_microstep: 909.40 | bwd_inner_microstep: 909.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-11 02:10:23,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.07 | optimizer_step: 6.58
[2024-06-11 02:10:23,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.16 | bwd_microstep: 3769.87 | bwd_inner_microstep: 1694.40 | bwd_allreduce_microstep: 2075.42 | step_microstep: 37.71
[2024-06-11 02:10:23,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16322.36 | bwd: 45956.31 | bwd_inner: 43879.98 | bwd_allreduce: 2075.65 | step: 39.22
{'loss': 1.2066, 'learning_rate': 2.212358852150187e-06, 'epoch': 0.85}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 02:10:25,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1364.68 | bwd_inner_microstep: 1364.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-11 02:10:26,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.47 | bwd_microstep: 675.39 | bwd_inner_microstep: 675.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3921
[2024-06-11 02:10:28,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1488.64 | bwd_inner_microstep: 1488.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 02:10:29,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1246.60 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2251
[2024-06-11 02:10:31,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.57 | bwd_microstep: 965.45 | bwd_inner_microstep: 965.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-11 02:10:32,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.93 | bwd_microstep: 674.92 | bwd_inner_microstep: 674.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-11 02:10:34,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.28 | bwd_microstep: 1482.43 | bwd_inner_microstep: 1482.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-11 02:10:35,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1291.39 | bwd_inner_microstep: 1291.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-11 02:10:37,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.66 | bwd_microstep: 1257.81 | bwd_inner_microstep: 1257.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497
[2024-06-11 02:10:39,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.66 | bwd_microstep: 1415.99 | bwd_inner_microstep: 1415.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3625
[2024-06-11 02:10:41,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.07 | bwd_microstep: 1562.84 | bwd_inner_microstep: 1562.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465
[2024-06-11 02:10:43,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.51 | bwd_microstep: 1312.43 | bwd_inner_microstep: 1312.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-11 02:10:45,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 1454.39 | bwd_inner_microstep: 1454.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 02:10:47,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1283.28 | bwd_inner_microstep: 1283.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3656
[2024-06-11 02:10:49,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.84 | bwd_microstep: 1823.64 | bwd_inner_microstep: 1823.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-11 02:10:51,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.64 | bwd_microstep: 1525.59 | bwd_inner_microstep: 1525.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3434
[2024-06-11 02:10:53,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.26 | bwd_microstep: 1215.10 | bwd_inner_microstep: 1215.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537
[2024-06-11 02:10:55,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.31 | bwd_microstep: 1199.38 | bwd_inner_microstep: 1199.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548
[2024-06-11 02:10:57,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1428.44 | bwd_inner_microstep: 1428.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-11 02:10:59,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.81 | bwd_microstep: 1420.08 | bwd_inner_microstep: 1420.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3538
[2024-06-11 02:11:01,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1358.05 | bwd_inner_microstep: 1358.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647
[2024-06-11 02:11:03,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1615.24 | bwd_inner_microstep: 1615.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617
[2024-06-11 02:11:05,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1510.11 | bwd_inner_microstep: 1510.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 02:11:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.24 | bwd_microstep: 1569.44 | bwd_inner_microstep: 1569.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-11 02:11:08,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 802.11 | bwd_inner_microstep: 802.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-11 02:11:10,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.46 | bwd_microstep: 1637.49 | bwd_inner_microstep: 1637.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2193
[2024-06-11 02:11:12,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.19 | bwd_microstep: 910.92 | bwd_inner_microstep: 910.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-11 02:11:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.58 | bwd_microstep: 1348.52 | bwd_inner_microstep: 1348.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-11 02:11:16,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.26 | bwd_microstep: 1639.26 | bwd_inner_microstep: 1639.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3575
[2024-06-11 02:11:18,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.58 | bwd_microstep: 1561.23 | bwd_inner_microstep: 1561.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3468
[2024-06-11 02:11:20,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.02 | bwd_microstep: 1505.06 | bwd_inner_microstep: 1505.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-11 02:11:25,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.19 | optimizer_step: 6.62
[2024-06-11 02:11:25,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 4176.00 | bwd_inner_microstep: 1712.20 | bwd_allreduce_microstep: 2463.75 | step_microstep: 39.08
[2024-06-11 02:11:25,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16108.53 | bwd: 45721.89 | bwd_inner: 43257.23 | bwd_allreduce: 2463.98 | step: 40.58
{'loss': 1.151, 'learning_rate': 2.19523098383297e-06, 'epoch': 0.85}
 85%|████████▌ | 1470/1726 [25:29:53<4:26:29, 62.46s/it]
 85%|████████▌ | 1471/1726 [25:30:56<4:25:39, 62.51s/it]
 85%|████████▌ | 1472/1726 [25:31:57<4:22:58, 62.12s/it]
 85%|████████▌ | 1473/1726 [25:32:59<4:22:33, 62.27s/it]
 85%|████████▌ | 1474/1726 [25:34:02<4:21:24, 62.24s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-11 02:11:26,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.55 | bwd_microstep: 782.43 | bwd_inner_microstep: 782.28 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-11 02:11:28,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.38 | bwd_microstep: 1247.28 | bwd_inner_microstep: 1247.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3977
[2024-06-11 02:11:30,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.06 | bwd_microstep: 1408.88 | bwd_inner_microstep: 1408.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 02:11:32,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.96 | bwd_microstep: 1479.27 | bwd_inner_microstep: 1479.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2254
[2024-06-11 02:11:33,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.64 | bwd_microstep: 967.53 | bwd_inner_microstep: 967.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 02:11:35,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1247.57 | bwd_inner_microstep: 1247.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3490
[2024-06-11 02:11:37,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.71 | bwd_microstep: 1333.77 | bwd_inner_microstep: 1333.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 02:11:38,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1246.54 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1899
[2024-06-11 02:11:39,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.66 | bwd_microstep: 813.13 | bwd_inner_microstep: 813.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-11 02:11:42,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.22 | bwd_microstep: 1617.47 | bwd_inner_microstep: 1617.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3654
[2024-06-11 02:11:44,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.14 | bwd_microstep: 1444.61 | bwd_inner_microstep: 1444.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3625
[2024-06-11 02:11:46,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.92 | bwd_microstep: 1812.04 | bwd_inner_microstep: 1812.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3663
[2024-06-11 02:11:49,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.95 | bwd_microstep: 1824.47 | bwd_inner_microstep: 1824.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 02:11:51,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.16 | bwd_microstep: 1481.91 | bwd_inner_microstep: 1481.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3671
[2024-06-11 02:11:53,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.39 | bwd_microstep: 1721.42 | bwd_inner_microstep: 1721.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-11 02:11:55,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.87 | bwd_microstep: 1448.13 | bwd_inner_microstep: 1448.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3743
[2024-06-11 02:11:57,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.02 | bwd_microstep: 1565.17 | bwd_inner_microstep: 1565.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-11 02:11:59,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.60 | bwd_microstep: 1294.87 | bwd_inner_microstep: 1294.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-11 02:12:01,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.66 | bwd_microstep: 1401.53 | bwd_inner_microstep: 1401.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-11 02:12:03,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.08 | bwd_microstep: 1396.21 | bwd_inner_microstep: 1396.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-11 02:12:05,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1513.07 | bwd_inner_microstep: 1513.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2141
[2024-06-11 02:12:06,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.91 | bwd_microstep: 835.43 | bwd_inner_microstep: 835.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 02:12:08,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1560.56 | bwd_inner_microstep: 1560.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2253
[2024-06-11 02:12:10,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.79 | bwd_microstep: 1000.05 | bwd_inner_microstep: 1000.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-11 02:12:12,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.41 | bwd_microstep: 1552.03 | bwd_inner_microstep: 1552.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 02:12:14,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.59 | bwd_microstep: 1661.33 | bwd_inner_microstep: 1661.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3776
[2024-06-11 02:12:16,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.04 | bwd_microstep: 1283.67 | bwd_inner_microstep: 1283.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046
[2024-06-11 02:12:17,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.59 | bwd_microstep: 810.39 | bwd_inner_microstep: 810.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1163
[2024-06-11 02:12:18,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 169.82 | bwd_microstep: 435.45 | bwd_inner_microstep: 435.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2030
[2024-06-11 02:12:19,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.89 | bwd_microstep: 783.22 | bwd_inner_microstep: 783.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3792
[2024-06-11 02:12:21,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1501.54 | bwd_inner_microstep: 1501.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2074
[2024-06-11 02:12:26,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-11 02:12:26,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.69 | bwd_microstep: 4407.70 | bwd_inner_microstep: 1157.96 | bwd_allreduce_microstep: 3249.69 | step_microstep: 37.82
[2024-06-11 02:12:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15477.95 | bwd: 44878.67 | bwd_inner: 41627.98 | bwd_allreduce: 3249.97 | step: 39.33
{'loss': 1.1548, 'learning_rate': 2.178165823819667e-06, 'epoch': 0.85}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-11 02:12:27,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.20 | bwd_microstep: 1336.16 | bwd_inner_microstep: 1336.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 02:12:29,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.28 | bwd_microstep: 1244.46 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-11 02:12:31,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1378.38 | bwd_inner_microstep: 1378.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3517
[2024-06-11 02:12:33,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-11 02:12:35,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1454.22 | bwd_inner_microstep: 1454.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3542
[2024-06-11 02:12:37,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1439.51 | bwd_inner_microstep: 1439.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 02:12:39,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.65 | bwd_microstep: 1286.47 | bwd_inner_microstep: 1286.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 02:12:40,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-11 02:12:42,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.34 | bwd_microstep: 793.51 | bwd_inner_microstep: 793.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-11 02:12:43,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.99 | bwd_microstep: 1297.47 | bwd_inner_microstep: 1297.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3678
[2024-06-11 02:12:45,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.22 | bwd_microstep: 1361.67 | bwd_inner_microstep: 1361.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-11 02:12:47,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.52 | bwd_microstep: 1522.32 | bwd_inner_microstep: 1522.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 02:12:49,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.79 | bwd_microstep: 1355.32 | bwd_inner_microstep: 1355.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988
[2024-06-11 02:12:50,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 897.18 | bwd_inner_microstep: 897.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-11 02:12:53,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.67 | bwd_microstep: 1486.86 | bwd_inner_microstep: 1486.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677
[2024-06-11 02:12:55,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.90 | bwd_microstep: 1628.52 | bwd_inner_microstep: 1628.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2210
[2024-06-11 02:12:56,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.34 | bwd_microstep: 960.18 | bwd_inner_microstep: 960.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-11 02:12:58,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.98 | bwd_microstep: 1191.33 | bwd_inner_microstep: 1191.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 02:13:00,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1380.67 | bwd_inner_microstep: 1380.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 02:13:02,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1398.07 | bwd_inner_microstep: 1398.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471
[2024-06-11 02:13:03,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.66 | bwd_microstep: 1216.70 | bwd_inner_microstep: 1216.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692
[2024-06-11 02:13:05,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1329.70 | bwd_inner_microstep: 1329.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-11 02:13:07,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1319.53 | bwd_inner_microstep: 1319.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2069
[2024-06-11 02:13:08,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.10 | bwd_microstep: 915.25 | bwd_inner_microstep: 915.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-11 02:13:10,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.60 | bwd_microstep: 1451.02 | bwd_inner_microstep: 1450.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655
[2024-06-11 02:13:12,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.86 | bwd_microstep: 1584.23 | bwd_inner_microstep: 1584.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-11 02:13:14,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1508.97 | bwd_inner_microstep: 1508.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3809
[2024-06-11 02:13:17,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.90 | bwd_microstep: 1616.57 | bwd_inner_microstep: 1616.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3767
[2024-06-11 02:13:19,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.03 | bwd_microstep: 1710.51 | bwd_inner_microstep: 1710.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2271
[2024-06-11 02:13:20,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.55 | bwd_microstep: 935.50 | bwd_inner_microstep: 935.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2403
[2024-06-11 02:13:22,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.16 | bwd_microstep: 910.83 | bwd_inner_microstep: 910.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-11 02:13:28,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.92 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-11 02:13:28,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 5927.00 | bwd_inner_microstep: 1700.89 | bwd_allreduce_microstep: 4226.06 | step_microstep: 38.98
[2024-06-11 02:13:28,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15745.77 | bwd: 46475.04 | bwd_inner: 42248.08 | bwd_allreduce: 4226.29 | step: 40.44
{'loss': 1.1534, 'learning_rate': 2.1611634322136934e-06, 'epoch': 0.86}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488
[2024-06-11 02:13:30,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1335.41 | bwd_inner_microstep: 1335.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3937
[2024-06-11 02:13:32,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.77 | bwd_microstep: 1486.90 | bwd_inner_microstep: 1486.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1876
[2024-06-11 02:13:33,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.39 | bwd_microstep: 707.14 | bwd_inner_microstep: 707.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3852
[2024-06-11 02:13:35,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.13 | bwd_microstep: 1626.89 | bwd_inner_microstep: 1626.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4156
[2024-06-11 02:13:37,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.73 | bwd_microstep: 1538.78 | bwd_inner_microstep: 1538.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-11 02:13:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1427.86 | bwd_inner_microstep: 1427.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752
[2024-06-11 02:13:41,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.47 | bwd_microstep: 1536.32 | bwd_inner_microstep: 1536.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 02:13:43,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.89 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3514
[2024-06-11 02:13:45,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1249.39 | bwd_inner_microstep: 1249.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769
[2024-06-11 02:13:47,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.49 | bwd_microstep: 1639.60 | bwd_inner_microstep: 1639.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3496
[2024-06-11 02:13:49,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1416.40 | bwd_inner_microstep: 1416.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452
[2024-06-11 02:13:51,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1447.85 | bwd_inner_microstep: 1447.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3707
[2024-06-11 02:13:53,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.47 | bwd_microstep: 1692.70 | bwd_inner_microstep: 1692.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905
[2024-06-11 02:13:54,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.14 | bwd_microstep: 716.00 | bwd_inner_microstep: 715.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 02:13:56,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2095
[2024-06-11 02:13:58,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.32 | bwd_microstep: 947.11 | bwd_inner_microstep: 947.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1944
[2024-06-11 02:13:59,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.78 | bwd_microstep: 739.95 | bwd_inner_microstep: 739.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 02:14:01,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.72 | bwd_microstep: 1379.38 | bwd_inner_microstep: 1379.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-11 02:14:03,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.90 | bwd_microstep: 1646.04 | bwd_inner_microstep: 1646.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-11 02:14:05,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1508.74 | bwd_inner_microstep: 1508.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-11 02:14:07,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.23 | bwd_microstep: 1504.61 | bwd_inner_microstep: 1504.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-11 02:14:09,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.85 | bwd_microstep: 1451.41 | bwd_inner_microstep: 1451.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 02:14:11,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.46 | bwd_microstep: 1279.79 | bwd_inner_microstep: 1279.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 02:14:13,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.51 | bwd_microstep: 1391.43 | bwd_inner_microstep: 1391.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-11 02:14:15,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1457.80 | bwd_inner_microstep: 1457.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2267
[2024-06-11 02:14:16,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.92 | bwd_microstep: 934.27 | bwd_inner_microstep: 934.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-11 02:14:18,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.43 | bwd_microstep: 1646.89 | bwd_inner_microstep: 1646.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3574
[2024-06-11 02:14:20,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.07 | bwd_microstep: 1566.35 | bwd_inner_microstep: 1566.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1946
[2024-06-11 02:14:22,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.20 | bwd_microstep: 777.85 | bwd_inner_microstep: 777.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799
[2024-06-11 02:14:24,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1550.97 | bwd_inner_microstep: 1550.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-11 02:14:26,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.97 | bwd_microstep: 1544.30 | bwd_inner_microstep: 1544.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-11 02:14:30,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-11 02:14:30,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 3885.06 | bwd_inner_microstep: 1688.03 | bwd_allreduce_microstep: 2196.97 | step_microstep: 38.94
[2024-06-11 02:14:30,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16165.44 | bwd: 45697.83 | bwd_inner: 43499.95 | bwd_allreduce: 2197.19 | step: 40.50
{'loss': 1.1701, 'learning_rate': 2.1442238688973682e-06, 'epoch': 0.86}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 02:14:32,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1370.85 | bwd_inner_microstep: 1370.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3928
[2024-06-11 02:14:34,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1390.98 | bwd_inner_microstep: 1390.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 02:14:36,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.34 | bwd_microstep: 1483.20 | bwd_inner_microstep: 1483.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-11 02:14:38,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1482.74 | bwd_inner_microstep: 1482.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3758
[2024-06-11 02:14:40,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.82 | bwd_microstep: 1640.36 | bwd_inner_microstep: 1640.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 02:14:42,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1384.77 | bwd_inner_microstep: 1384.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709
[2024-06-11 02:14:45,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.35 | bwd_microstep: 1531.83 | bwd_inner_microstep: 1531.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416
[2024-06-11 02:14:46,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.29 | bwd_microstep: 1154.90 | bwd_inner_microstep: 1154.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-11 02:14:48,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1289.59 | bwd_inner_microstep: 1289.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 02:14:50,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1485.07 | bwd_inner_microstep: 1485.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 02:14:52,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1348.23 | bwd_inner_microstep: 1348.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-11 02:14:54,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1251.20 | bwd_inner_microstep: 1251.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3439
[2024-06-11 02:14:55,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1191.15 | bwd_inner_microstep: 1191.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3556
[2024-06-11 02:14:57,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.33 | bwd_microstep: 1452.57 | bwd_inner_microstep: 1452.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3662
[2024-06-11 02:15:00,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.85 | bwd_microstep: 1723.96 | bwd_inner_microstep: 1723.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2747
[2024-06-11 02:15:01,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.86 | bwd_microstep: 947.19 | bwd_inner_microstep: 947.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 02:15:03,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.68 | bwd_microstep: 1246.42 | bwd_inner_microstep: 1246.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3544
[2024-06-11 02:15:05,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.03 | bwd_microstep: 1694.23 | bwd_inner_microstep: 1694.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700
[2024-06-11 02:15:07,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.67 | bwd_microstep: 1628.62 | bwd_inner_microstep: 1628.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-11 02:15:09,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.63 | bwd_microstep: 1525.88 | bwd_inner_microstep: 1525.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1941
[2024-06-11 02:15:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.60 | bwd_microstep: 762.65 | bwd_inner_microstep: 762.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-11 02:15:11,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.17 | bwd_microstep: 793.68 | bwd_inner_microstep: 793.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 02:15:13,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 1391.92 | bwd_inner_microstep: 1391.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825
[2024-06-11 02:15:15,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.70 | bwd_microstep: 1357.71 | bwd_inner_microstep: 1357.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-11 02:15:17,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.85 | bwd_microstep: 978.17 | bwd_inner_microstep: 978.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3758
[2024-06-11 02:15:18,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.08 | bwd_microstep: 1345.73 | bwd_inner_microstep: 1345.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-11 02:15:21,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1498.32 | bwd_inner_microstep: 1498.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-11 02:15:22,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1415.75 | bwd_inner_microstep: 1415.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-11 02:15:25,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.32 | bwd_microstep: 1476.08 | bwd_inner_microstep: 1476.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-11 02:15:26,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1248.60 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-11 02:15:28,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.81 | bwd_microstep: 1543.69 | bwd_inner_microstep: 1543.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3588
[2024-06-11 02:15:30,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.04 | optimizer_step: 6.66
[2024-06-11 02:15:30,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1377.86 | bwd_inner_microstep: 1370.02 | bwd_allreduce_microstep: 7.80 | step_microstep: 37.40
[2024-06-11 02:15:30,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16249.72 | bwd: 43413.92 | bwd_inner: 43405.22 | bwd_allreduce: 8.02 | step: 38.94
{'loss': 1.1923, 'learning_rate': 2.127347193531757e-06, 'epoch': 0.86}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3507
[2024-06-11 02:15:32,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1344.08 | bwd_inner_microstep: 1344.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3904
[2024-06-11 02:15:35,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.73 | bwd_microstep: 1718.08 | bwd_inner_microstep: 1718.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 02:15:37,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1553.85 | bwd_inner_microstep: 1553.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-11 02:15:39,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.03 | bwd_microstep: 1436.25 | bwd_inner_microstep: 1436.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-11 02:15:41,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.11 | bwd_microstep: 1541.38 | bwd_inner_microstep: 1541.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-11 02:15:43,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.44 | bwd_microstep: 1278.92 | bwd_inner_microstep: 1278.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 02:15:44,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1250.71 | bwd_inner_microstep: 1250.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 02:15:46,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.32 | bwd_microstep: 1281.58 | bwd_inner_microstep: 1281.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425
[2024-06-11 02:15:48,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.43 | bwd_microstep: 1155.08 | bwd_inner_microstep: 1155.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2079
[2024-06-11 02:15:49,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.01 | bwd_microstep: 822.14 | bwd_inner_microstep: 822.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-11 02:15:50,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.49 | bwd_microstep: 1151.36 | bwd_inner_microstep: 1151.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-11 02:15:52,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.83 | bwd_microstep: 1377.87 | bwd_inner_microstep: 1377.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400
[2024-06-11 02:15:54,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.95 | bwd_microstep: 1438.06 | bwd_inner_microstep: 1438.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3439
[2024-06-11 02:15:56,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1484.09 | bwd_inner_microstep: 1484.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-11 02:15:58,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.11 | bwd_microstep: 1480.33 | bwd_inner_microstep: 1480.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2022
[2024-06-11 02:16:00,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.59 | bwd_microstep: 903.11 | bwd_inner_microstep: 903.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411
[2024-06-11 02:16:01,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.79 | bwd_microstep: 1312.62 | bwd_inner_microstep: 1312.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-11 02:16:04,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.27 | bwd_microstep: 1646.65 | bwd_inner_microstep: 1646.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 02:16:05,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1254.16 | bwd_inner_microstep: 1254.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2290
[2024-06-11 02:16:07,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.95 | bwd_microstep: 909.02 | bwd_inner_microstep: 908.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-11 02:16:08,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.06 | bwd_microstep: 802.84 | bwd_inner_microstep: 802.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468
[2024-06-11 02:16:10,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.29 | bwd_microstep: 1337.88 | bwd_inner_microstep: 1337.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2140
[2024-06-11 02:16:11,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.94 | bwd_microstep: 737.94 | bwd_inner_microstep: 737.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-11 02:16:13,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.11 | bwd_microstep: 1405.13 | bwd_inner_microstep: 1405.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3806
[2024-06-11 02:16:15,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.82 | bwd_microstep: 1616.85 | bwd_inner_microstep: 1616.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3546
[2024-06-11 02:16:17,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.40 | bwd_microstep: 1231.50 | bwd_inner_microstep: 1231.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 02:16:19,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.17 | bwd_microstep: 1656.16 | bwd_inner_microstep: 1656.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-11 02:16:21,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1319.05 | bwd_inner_microstep: 1319.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2192
[2024-06-11 02:16:22,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.26 | bwd_microstep: 859.88 | bwd_inner_microstep: 859.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2172
[2024-06-11 02:16:23,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.57 | bwd_microstep: 953.23 | bwd_inner_microstep: 953.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590
[2024-06-11 02:16:25,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.91 | bwd_microstep: 1610.05 | bwd_inner_microstep: 1610.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 02:17:15,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.35 | optimizer_step: 6.59
[2024-06-11 02:17:15,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 49443.05 | bwd_inner_microstep: 1547.69 | bwd_allreduce_microstep: 47895.29 | step_microstep: 38.93
[2024-06-11 02:17:15,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15461.85 | bwd: 89312.93 | bwd_inner: 41416.66 | bwd_allreduce: 47895.55 | step: 40.52
{'loss': 1.181, 'learning_rate': 2.1105334655564148e-06, 'epoch': 0.86}
 85%|████████▌ | 1475/1726 [25:35:02<4:18:26, 61.78s/it]
 86%|████████▌ | 1476/1726 [25:36:05<4:18:22, 62.01s/it]
 86%|████████▌ | 1477/1726 [25:37:07<4:17:35, 62.07s/it]
 86%|████████▌ | 1478/1726 [25:38:07<4:13:59, 61.45s/it]
 86%|████████▌ | 1479/1726 [25:39:52<5:06:53, 74.55s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-11 02:17:17,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.17 | bwd_microstep: 1433.16 | bwd_inner_microstep: 1433.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-11 02:17:19,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.44 | bwd_microstep: 1338.92 | bwd_inner_microstep: 1338.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 02:17:21,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1365.69 | bwd_inner_microstep: 1365.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 02:17:23,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.05 | bwd_microstep: 1237.21 | bwd_inner_microstep: 1237.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1859
[2024-06-11 02:17:24,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.58 | bwd_microstep: 675.40 | bwd_inner_microstep: 675.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2203
[2024-06-11 02:17:25,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.45 | bwd_microstep: 948.06 | bwd_inner_microstep: 948.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606
[2024-06-11 02:17:27,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.82 | bwd_microstep: 1301.89 | bwd_inner_microstep: 1301.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3490
[2024-06-11 02:17:29,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1407.23 | bwd_inner_microstep: 1407.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-11 02:18:33,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.82 | bwd_microstep: 1328.22 | bwd_inner_microstep: 1328.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-11 02:18:35,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.35 | bwd_microstep: 1629.49 | bwd_inner_microstep: 1629.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-11 02:18:37,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1331.00 | bwd_inner_microstep: 1330.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-11 02:18:39,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.67 | bwd_microstep: 1603.87 | bwd_inner_microstep: 1603.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507
[2024-06-11 02:18:41,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1400.75 | bwd_inner_microstep: 1400.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492
[2024-06-11 02:18:43,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1304.61 | bwd_inner_microstep: 1304.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-11 02:18:44,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.38 | bwd_microstep: 676.21 | bwd_inner_microstep: 676.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508
[2024-06-11 02:18:45,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.43 | bwd_microstep: 1310.56 | bwd_inner_microstep: 1310.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3587
[2024-06-11 02:18:47,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.17 | bwd_microstep: 1459.68 | bwd_inner_microstep: 1459.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-11 02:18:49,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.53 | bwd_microstep: 1387.61 | bwd_inner_microstep: 1387.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3629
[2024-06-11 02:18:52,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.22 | bwd_microstep: 1630.29 | bwd_inner_microstep: 1630.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 02:18:53,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1379.59 | bwd_inner_microstep: 1379.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-11 02:18:56,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.35 | bwd_microstep: 1500.90 | bwd_inner_microstep: 1500.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108
[2024-06-11 02:18:57,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.79 | bwd_microstep: 819.82 | bwd_inner_microstep: 819.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-11 02:18:58,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.53 | bwd_microstep: 1292.09 | bwd_inner_microstep: 1292.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-11 02:19:00,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.72 | bwd_microstep: 818.91 | bwd_inner_microstep: 818.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606
[2024-06-11 02:19:01,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.01 | bwd_microstep: 1305.89 | bwd_inner_microstep: 1305.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 02:19:04,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.89 | bwd_microstep: 1549.52 | bwd_inner_microstep: 1549.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3771
[2024-06-11 02:19:05,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.15 | bwd_microstep: 1341.11 | bwd_inner_microstep: 1341.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3586
[2024-06-11 02:19:07,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.63 | bwd_microstep: 1306.57 | bwd_inner_microstep: 1306.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-11 02:19:09,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.83 | bwd_microstep: 1485.05 | bwd_inner_microstep: 1485.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2232
[2024-06-11 02:19:11,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.68 | bwd_microstep: 959.93 | bwd_inner_microstep: 959.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-11 02:19:12,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.93 | bwd_microstep: 971.40 | bwd_inner_microstep: 971.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447
[2024-06-11 02:19:18,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-11 02:19:18,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 5764.51 | bwd_inner_microstep: 1639.49 | bwd_allreduce_microstep: 4124.97 | step_microstep: 37.70
[2024-06-11 02:19:18,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.71 | bwd: 45265.14 | bwd_inner: 41139.27 | bwd_allreduce: 4125.20 | step: 39.14
{'loss': 1.1794, 'learning_rate': 2.093782744189217e-06, 'epoch': 0.86}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-11 02:19:20,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.52 | bwd_microstep: 1265.45 | bwd_inner_microstep: 1265.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4027
[2024-06-11 02:19:22,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.30 | bwd_microstep: 1609.31 | bwd_inner_microstep: 1609.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3860
[2024-06-11 02:19:25,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.56 | bwd_microstep: 1661.94 | bwd_inner_microstep: 1661.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3577
[2024-06-11 02:19:26,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.25 | bwd_microstep: 1333.13 | bwd_inner_microstep: 1333.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 02:19:28,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960
[2024-06-11 02:19:29,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.48 | bwd_microstep: 826.88 | bwd_inner_microstep: 826.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-11 02:19:31,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.76 | bwd_microstep: 1297.48 | bwd_inner_microstep: 1297.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 02:19:33,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.34 | bwd_microstep: 1278.33 | bwd_inner_microstep: 1278.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734
[2024-06-11 02:19:35,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.21 | bwd_microstep: 1535.74 | bwd_inner_microstep: 1535.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695
[2024-06-11 02:19:37,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 1528.64 | bwd_inner_microstep: 1528.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-11 02:19:39,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.80 | bwd_microstep: 1282.97 | bwd_inner_microstep: 1282.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3647
[2024-06-11 02:19:41,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.06 | bwd_microstep: 1352.37 | bwd_inner_microstep: 1352.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 02:19:43,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1382.33 | bwd_inner_microstep: 1382.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591
[2024-06-11 02:19:45,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.68 | bwd_microstep: 1309.21 | bwd_inner_microstep: 1309.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 02:19:46,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1247.43 | bwd_inner_microstep: 1247.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 02:19:48,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1388.80 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-11 02:19:51,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.65 | bwd_microstep: 1713.06 | bwd_inner_microstep: 1713.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-11 02:19:53,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.38 | bwd_microstep: 1496.07 | bwd_inner_microstep: 1496.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3626
[2024-06-11 02:19:55,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.14 | bwd_microstep: 1561.76 | bwd_inner_microstep: 1561.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-11 02:19:57,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1350.68 | bwd_inner_microstep: 1350.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 02:19:59,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.60 | bwd_microstep: 1555.80 | bwd_inner_microstep: 1555.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-11 02:20:01,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.30 | bwd_microstep: 1524.21 | bwd_inner_microstep: 1524.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 02:20:03,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1555.39 | bwd_inner_microstep: 1555.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544
[2024-06-11 02:20:05,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.00 | bwd_microstep: 1426.08 | bwd_inner_microstep: 1426.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 02:20:07,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.55 | bwd_microstep: 1256.51 | bwd_inner_microstep: 1256.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1134
[2024-06-11 02:20:07,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 171.76 | bwd_microstep: 447.99 | bwd_inner_microstep: 447.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3570
[2024-06-11 02:20:09,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1463.24 | bwd_inner_microstep: 1463.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3446
[2024-06-11 02:20:11,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.56 | bwd_microstep: 1408.48 | bwd_inner_microstep: 1408.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3471
[2024-06-11 02:20:13,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1329.86 | bwd_inner_microstep: 1329.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-11 02:20:15,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1506.35 | bwd_inner_microstep: 1506.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-11 02:20:17,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.28 | bwd_microstep: 1346.10 | bwd_inner_microstep: 1346.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774
[2024-06-11 02:20:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.29 | optimizer_step: 6.63
[2024-06-11 02:20:58,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.32 | bwd_microstep: 40029.47 | bwd_inner_microstep: 1977.36 | bwd_allreduce_microstep: 38052.04 | step_microstep: 38.60
[2024-06-11 02:20:58,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16592.66 | bwd: 82655.66 | bwd_inner: 44602.70 | bwd_allreduce: 38052.28 | step: 40.00
{'loss': 1.1697, 'learning_rate': 2.077095088426102e-06, 'epoch': 0.86}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 02:21:00,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.19 | bwd_microstep: 1354.00 | bwd_inner_microstep: 1353.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 02:21:02,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1268.59 | bwd_inner_microstep: 1268.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-11 02:21:03,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.27 | bwd_microstep: 1270.97 | bwd_inner_microstep: 1270.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2321
[2024-06-11 02:21:05,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.19 | bwd_microstep: 910.02 | bwd_inner_microstep: 909.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3862
[2024-06-11 02:21:07,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.06 | bwd_microstep: 1653.20 | bwd_inner_microstep: 1653.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3761
[2024-06-11 02:21:09,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1378.52 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 02:21:10,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.11 | bwd_microstep: 1277.05 | bwd_inner_microstep: 1277.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-11 02:21:13,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.26 | bwd_microstep: 1520.95 | bwd_inner_microstep: 1520.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 02:21:45,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.91 | bwd_microstep: 1276.18 | bwd_inner_microstep: 1276.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 02:22:02,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.44 | bwd_microstep: 1373.16 | bwd_inner_microstep: 1373.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702
[2024-06-11 02:22:04,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1410.10 | bwd_inner_microstep: 1410.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405
[2024-06-11 02:22:06,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1295.68 | bwd_inner_microstep: 1295.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-11 02:22:08,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.43 | bwd_microstep: 1481.34 | bwd_inner_microstep: 1481.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1923
[2024-06-11 02:22:09,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.66 | bwd_microstep: 812.56 | bwd_inner_microstep: 812.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3646
[2024-06-11 02:22:11,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1337.99 | bwd_inner_microstep: 1337.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-11 02:22:13,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.99 | bwd_microstep: 1591.40 | bwd_inner_microstep: 1591.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-11 02:22:15,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1377.61 | bwd_inner_microstep: 1377.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-11 02:22:16,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.99 | bwd_microstep: 796.13 | bwd_inner_microstep: 796.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-11 02:22:18,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.50 | bwd_microstep: 1176.98 | bwd_inner_microstep: 1176.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-11 02:22:20,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.60 | bwd_microstep: 1648.95 | bwd_inner_microstep: 1648.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918
[2024-06-11 02:22:21,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.09 | bwd_microstep: 685.24 | bwd_inner_microstep: 685.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3670
[2024-06-11 02:22:23,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.66 | bwd_microstep: 1449.15 | bwd_inner_microstep: 1449.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-11 02:22:25,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.57 | bwd_microstep: 1488.62 | bwd_inner_microstep: 1488.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-11 02:22:27,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1340.56 | bwd_inner_microstep: 1340.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2529
[2024-06-11 02:22:29,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.93 | bwd_microstep: 1036.30 | bwd_inner_microstep: 1036.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 02:22:31,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1379.29 | bwd_inner_microstep: 1379.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 02:22:32,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1393.76 | bwd_inner_microstep: 1393.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-11 02:22:33,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.43 | bwd_microstep: 697.42 | bwd_inner_microstep: 697.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3623
[2024-06-11 02:22:36,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.83 | bwd_microstep: 1803.77 | bwd_inner_microstep: 1803.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-11 02:22:38,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.65 | bwd_microstep: 1590.15 | bwd_inner_microstep: 1590.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-11 02:22:40,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.78 | bwd_microstep: 1345.51 | bwd_inner_microstep: 1345.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-11 02:22:45,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-11 02:22:45,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.92 | bwd_microstep: 4262.10 | bwd_inner_microstep: 1416.22 | bwd_allreduce_microstep: 2845.82 | step_microstep: 38.05
[2024-06-11 02:22:45,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15662.40 | bwd: 44683.26 | bwd_inner: 41836.52 | bwd_allreduce: 2846.06 | step: 39.52
{'loss': 1.1634, 'learning_rate': 2.0604705570409166e-06, 'epoch': 0.86}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965
[2024-06-11 02:22:46,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.20 | bwd_microstep: 881.29 | bwd_inner_microstep: 881.15 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-11 02:22:48,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1487.73 | bwd_inner_microstep: 1487.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3857
[2024-06-11 02:22:50,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.22 | bwd_microstep: 1658.63 | bwd_inner_microstep: 1658.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-11 02:22:52,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.94 | bwd_microstep: 1444.60 | bwd_inner_microstep: 1444.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 02:22:54,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.69 | bwd_microstep: 1277.06 | bwd_inner_microstep: 1277.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948
[2024-06-11 02:22:55,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.97 | bwd_microstep: 825.39 | bwd_inner_microstep: 825.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884
[2024-06-11 02:22:56,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.64 | bwd_microstep: 714.10 | bwd_inner_microstep: 714.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2224
[2024-06-11 02:22:57,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 958.76 | bwd_inner_microstep: 958.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913
[2024-06-11 02:22:59,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.55 | bwd_microstep: 779.48 | bwd_inner_microstep: 779.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-11 02:23:00,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.52 | bwd_microstep: 1282.24 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 02:23:02,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1392.84 | bwd_inner_microstep: 1392.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-11 02:23:04,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1398.01 | bwd_inner_microstep: 1397.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578
[2024-06-11 02:23:06,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.97 | bwd_microstep: 1500.19 | bwd_inner_microstep: 1500.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425
[2024-06-11 02:23:08,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.07 | bwd_microstep: 1408.25 | bwd_inner_microstep: 1408.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2302
[2024-06-11 02:23:10,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.26 | bwd_microstep: 978.79 | bwd_inner_microstep: 978.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-11 02:23:11,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.94 | bwd_microstep: 798.70 | bwd_inner_microstep: 798.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3671
[2024-06-11 02:23:13,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-11 02:23:15,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.26 | bwd_microstep: 1632.02 | bwd_inner_microstep: 1632.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-11 02:23:17,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.48 | bwd_microstep: 1289.97 | bwd_inner_microstep: 1289.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2300
[2024-06-11 02:23:18,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.34 | bwd_microstep: 881.79 | bwd_inner_microstep: 881.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-11 02:23:20,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.73 | bwd_microstep: 1297.00 | bwd_inner_microstep: 1296.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2289
[2024-06-11 02:23:21,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.55 | bwd_microstep: 1022.35 | bwd_inner_microstep: 1022.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3550
[2024-06-11 02:23:23,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.07 | bwd_microstep: 1346.04 | bwd_inner_microstep: 1346.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 02:23:25,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.20 | bwd_microstep: 1401.90 | bwd_inner_microstep: 1401.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 02:23:27,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.39 | bwd_microstep: 1560.40 | bwd_inner_microstep: 1560.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 02:23:29,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1394.19 | bwd_inner_microstep: 1394.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3721
[2024-06-11 02:23:31,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.88 | bwd_microstep: 1601.01 | bwd_inner_microstep: 1600.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3820
[2024-06-11 02:23:34,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.72 | bwd_microstep: 1812.98 | bwd_inner_microstep: 1812.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-11 02:23:36,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.50 | bwd_microstep: 1648.10 | bwd_inner_microstep: 1648.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3792
[2024-06-11 02:23:38,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.69 | bwd_microstep: 1748.65 | bwd_inner_microstep: 1748.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-11 02:23:40,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.77 | bwd_microstep: 1306.64 | bwd_inner_microstep: 1306.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 02:24:03,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.24 | optimizer_step: 6.64
[2024-06-11 02:24:03,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 22291.04 | bwd_inner_microstep: 1552.76 | bwd_allreduce_microstep: 20738.21 | step_microstep: 38.84
[2024-06-11 02:24:03,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15525.10 | bwd: 62405.76 | bwd_inner: 41666.53 | bwd_allreduce: 20738.50 | step: 40.34
{'loss': 1.1499, 'learning_rate': 2.0439092085851685e-06, 'epoch': 0.86}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3555
[2024-06-11 02:24:05,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.66 | bwd_microstep: 1576.35 | bwd_inner_microstep: 1576.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-11 02:24:07,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.31 | bwd_microstep: 1341.64 | bwd_inner_microstep: 1341.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 02:24:09,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.49 | bwd_microstep: 1382.12 | bwd_inner_microstep: 1382.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814
[2024-06-11 02:24:11,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.63 | bwd_microstep: 1476.84 | bwd_inner_microstep: 1476.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750
[2024-06-11 02:24:13,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1334.99 | bwd_inner_microstep: 1334.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059
[2024-06-11 02:24:14,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.65 | bwd_microstep: 814.50 | bwd_inner_microstep: 814.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3792
[2024-06-11 02:24:16,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.53 | bwd_microstep: 1794.57 | bwd_inner_microstep: 1794.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3425
[2024-06-11 02:24:18,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.51 | bwd_microstep: 1180.44 | bwd_inner_microstep: 1180.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-11 02:24:41,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.76 | bwd_microstep: 1467.88 | bwd_inner_microstep: 1467.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3477
[2024-06-11 02:24:43,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.96 | bwd_microstep: 1533.32 | bwd_inner_microstep: 1533.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 02:24:45,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.67 | bwd_microstep: 1479.74 | bwd_inner_microstep: 1479.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 02:24:47,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.29 | bwd_microstep: 1374.13 | bwd_inner_microstep: 1374.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 02:24:49,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.80 | bwd_microstep: 1376.71 | bwd_inner_microstep: 1376.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569
[2024-06-11 02:24:51,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.26 | bwd_microstep: 1201.47 | bwd_inner_microstep: 1201.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 02:24:53,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.84 | bwd_microstep: 1247.75 | bwd_inner_microstep: 1247.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 02:24:54,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.32 | bwd_microstep: 1385.79 | bwd_inner_microstep: 1385.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-11 02:24:57,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.97 | bwd_microstep: 1552.52 | bwd_inner_microstep: 1552.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667
[2024-06-11 02:24:59,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.37 | bwd_microstep: 1615.54 | bwd_inner_microstep: 1615.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 02:25:01,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.07 | bwd_microstep: 1287.29 | bwd_inner_microstep: 1287.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 02:25:03,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.20 | bwd_microstep: 1656.18 | bwd_inner_microstep: 1656.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3598
[2024-06-11 02:25:05,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.69 | bwd_microstep: 1210.96 | bwd_inner_microstep: 1210.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3896
[2024-06-11 02:25:07,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.25 | bwd_microstep: 1489.00 | bwd_inner_microstep: 1488.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-11 02:25:08,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1185.96 | bwd_inner_microstep: 1185.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-11 02:25:10,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1412.09 | bwd_inner_microstep: 1412.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-11 02:25:12,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.40 | bwd_microstep: 1542.44 | bwd_inner_microstep: 1542.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009
[2024-06-11 02:25:14,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.79 | bwd_microstep: 893.45 | bwd_inner_microstep: 893.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 893
[2024-06-11 02:25:14,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.28 | bwd_microstep: 369.02 | bwd_inner_microstep: 368.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-11 02:25:16,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1300.82 | bwd_inner_microstep: 1300.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2049
[2024-06-11 02:25:17,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.49 | bwd_microstep: 873.92 | bwd_inner_microstep: 873.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-11 02:25:19,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.06 | bwd_microstep: 1411.27 | bwd_inner_microstep: 1411.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3401
[2024-06-11 02:25:21,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1396.42 | bwd_inner_microstep: 1396.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3581
[2024-06-11 02:25:23,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.05 | optimizer_step: 6.61
[2024-06-11 02:25:23,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.42 | bwd_microstep: 1401.00 | bwd_inner_microstep: 1393.35 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.41
[2024-06-11 02:25:23,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15930.54 | bwd: 42566.13 | bwd_inner: 42557.62 | bwd_allreduce: 7.83 | step: 38.87
{'loss': 1.1997, 'learning_rate': 2.0274111013878418e-06, 'epoch': 0.86}
 1479/1726 [25:39:52<5:06:53, 74.55s/it]
 86%|████████▌ | 1480/1726 [25:41:55<6:05:04, 89.04s/it]
 86%|████████▌ | 1481/1726 [25:43:35<6:16:29, 92.20s/it]
 86%|████████▌ | 1482/1726 [25:45:21<6:32:48, 96.59s/it]
 86%|████████▌ | 1483/1726 [25:46:40<6:08:55, 91.09s/it]
 86%|████████▌ | 1484/1726 [25:48:00<5:53:56, 87.76s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-11 02:25:25,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.47 | bwd_microstep: 1470.85 | bwd_inner_microstep: 1470.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401
[2024-06-11 02:25:27,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.55 | bwd_microstep: 1209.35 | bwd_inner_microstep: 1209.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298
[2024-06-11 02:25:28,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.50 | bwd_microstep: 972.17 | bwd_inner_microstep: 972.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808
[2024-06-11 02:25:30,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.81 | bwd_microstep: 1380.02 | bwd_inner_microstep: 1379.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478
[2024-06-11 02:25:32,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.08 | bwd_microstep: 1184.74 | bwd_inner_microstep: 1184.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3749
[2024-06-11 02:25:33,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1367.15 | bwd_inner_microstep: 1367.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-11 02:25:35,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.76 | bwd_microstep: 1152.00 | bwd_inner_microstep: 1151.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 02:25:37,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.54 | bwd_microstep: 1387.91 | bwd_inner_microstep: 1387.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-11 02:25:39,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.98 | bwd_microstep: 1531.94 | bwd_inner_microstep: 1531.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2127
[2024-06-11 02:25:40,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.98 | bwd_microstep: 832.86 | bwd_inner_microstep: 832.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569
[2024-06-11 02:25:42,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1365.51 | bwd_inner_microstep: 1365.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2900
[2024-06-11 02:25:44,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.57 | bwd_microstep: 1171.86 | bwd_inner_microstep: 1171.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692
[2024-06-11 02:25:46,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.64 | bwd_microstep: 1458.49 | bwd_inner_microstep: 1458.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-11 02:25:48,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.07 | bwd_microstep: 1477.67 | bwd_inner_microstep: 1477.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1997
[2024-06-11 02:25:49,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.16 | bwd_microstep: 705.81 | bwd_inner_microstep: 705.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519
[2024-06-11 02:25:51,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.51 | bwd_microstep: 1419.76 | bwd_inner_microstep: 1419.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-11 02:25:52,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.18 | bwd_microstep: 791.37 | bwd_inner_microstep: 791.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-11 02:25:53,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.87 | bwd_microstep: 889.55 | bwd_inner_microstep: 889.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-11 02:25:55,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.85 | bwd_microstep: 1529.52 | bwd_inner_microstep: 1529.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 02:25:57,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1349.32 | bwd_inner_microstep: 1349.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033
[2024-06-11 02:25:58,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 809.66 | bwd_inner_microstep: 809.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-11 02:26:00,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1253.86 | bwd_inner_microstep: 1253.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 02:26:02,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1256.16 | bwd_inner_microstep: 1256.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3530
[2024-06-11 02:26:03,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.46 | bwd_microstep: 1226.42 | bwd_inner_microstep: 1226.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449
[2024-06-11 02:26:05,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.59 | bwd_microstep: 1189.97 | bwd_inner_microstep: 1189.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3722
[2024-06-11 02:26:07,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.16 | bwd_microstep: 1370.80 | bwd_inner_microstep: 1370.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050
[2024-06-11 02:26:08,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.02 | bwd_microstep: 911.51 | bwd_inner_microstep: 911.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-11 02:26:10,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1511.73 | bwd_inner_microstep: 1511.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2025
[2024-06-11 02:26:11,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.53 | bwd_microstep: 777.76 | bwd_inner_microstep: 777.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2272
[2024-06-11 02:26:13,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.79 | bwd_microstep: 1001.20 | bwd_inner_microstep: 1001.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 02:26:15,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.73 | bwd_microstep: 1647.93 | bwd_inner_microstep: 1647.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-11 02:26:47,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.58
[2024-06-11 02:26:47,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.89 | bwd_microstep: 31329.12 | bwd_inner_microstep: 1629.42 | bwd_allreduce_microstep: 29699.63 | step_microstep: 38.76
[2024-06-11 02:26:47,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14676.68 | bwd: 68933.98 | bwd_inner: 39233.42 | bwd_allreduce: 29699.88 | step: 40.17
{'loss': 1.1422, 'learning_rate': 2.010976293555189e-06, 'epoch': 0.86}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2028
[2024-06-11 02:26:48,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.38 | bwd_microstep: 896.74 | bwd_inner_microstep: 896.56 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3955
[2024-06-11 02:26:50,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.01 | bwd_microstep: 1583.07 | bwd_inner_microstep: 1583.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3821
[2024-06-11 02:26:52,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1502.82 | bwd_inner_microstep: 1502.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2468
[2024-06-11 02:26:54,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.69 | bwd_microstep: 948.17 | bwd_inner_microstep: 948.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-11 02:26:55,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1272.29 | bwd_inner_microstep: 1272.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3420
[2024-06-11 02:26:57,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.64 | bwd_microstep: 1180.27 | bwd_inner_microstep: 1180.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879
[2024-06-11 02:26:58,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.22 | bwd_microstep: 681.17 | bwd_inner_microstep: 681.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-11 02:27:00,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.13 | bwd_microstep: 1535.09 | bwd_inner_microstep: 1535.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425
[2024-06-11 02:27:49,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.61 | bwd_microstep: 1146.88 | bwd_inner_microstep: 1146.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3727
[2024-06-11 02:27:51,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.97 | bwd_microstep: 1354.07 | bwd_inner_microstep: 1354.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-11 02:27:53,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.80 | bwd_microstep: 1377.40 | bwd_inner_microstep: 1377.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583
[2024-06-11 02:27:55,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.10 | bwd_microstep: 1232.20 | bwd_inner_microstep: 1232.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-11 02:27:57,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.46 | bwd_microstep: 1281.57 | bwd_inner_microstep: 1281.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453
[2024-06-11 02:27:58,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.96 | bwd_microstep: 1275.16 | bwd_inner_microstep: 1275.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-11 02:28:00,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.84 | bwd_microstep: 1570.84 | bwd_inner_microstep: 1570.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3649
[2024-06-11 02:28:03,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.88 | bwd_microstep: 1535.80 | bwd_inner_microstep: 1535.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-11 02:28:04,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.28 | bwd_microstep: 1389.07 | bwd_inner_microstep: 1389.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3627
[2024-06-11 02:28:06,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.06 | bwd_microstep: 1210.64 | bwd_inner_microstep: 1210.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3673
[2024-06-11 02:28:08,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.74 | bwd_microstep: 1449.39 | bwd_inner_microstep: 1449.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-11 02:28:09,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.30 | bwd_microstep: 799.59 | bwd_inner_microstep: 799.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 02:28:11,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1368.10 | bwd_inner_microstep: 1368.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2000
[2024-06-11 02:28:12,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.64 | bwd_microstep: 768.05 | bwd_inner_microstep: 768.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 02:28:14,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1388.54 | bwd_inner_microstep: 1388.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-11 02:28:16,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.02 | bwd_microstep: 1401.57 | bwd_inner_microstep: 1401.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-11 02:28:18,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.74 | bwd_microstep: 1248.94 | bwd_inner_microstep: 1248.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2298
[2024-06-11 02:28:19,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.26 | bwd_microstep: 1003.34 | bwd_inner_microstep: 1003.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3877
[2024-06-11 02:28:22,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 650.25 | bwd_microstep: 1779.15 | bwd_inner_microstep: 1779.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280
[2024-06-11 02:28:23,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.51 | bwd_microstep: 969.19 | bwd_inner_microstep: 969.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-11 02:28:25,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.04 | bwd_microstep: 1640.70 | bwd_inner_microstep: 1640.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-11 02:28:26,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.69 | bwd_microstep: 788.39 | bwd_inner_microstep: 788.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-11 02:28:28,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1428.92 | bwd_inner_microstep: 1428.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3786
[2024-06-11 02:28:37,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-11 02:28:37,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.25 | bwd_microstep: 8401.36 | bwd_inner_microstep: 2014.68 | bwd_allreduce_microstep: 6386.62 | step_microstep: 38.49
[2024-06-11 02:28:37,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15328.81 | bwd: 47408.52 | bwd_inner: 41020.84 | bwd_allreduce: 6386.92 | step: 39.93
{'loss': 1.1004, 'learning_rate': 1.9946048429705133e-06, 'epoch': 0.86}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417
[2024-06-11 02:28:39,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.45 | bwd_microstep: 1433.80 | bwd_inner_microstep: 1433.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3951
[2024-06-11 02:28:42,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.12 | bwd_microstep: 1588.48 | bwd_inner_microstep: 1588.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 02:28:43,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1278.13 | bwd_inner_microstep: 1278.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-11 02:28:45,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.88 | bwd_microstep: 1378.32 | bwd_inner_microstep: 1378.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-11 02:28:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.33 | bwd_microstep: 1147.98 | bwd_inner_microstep: 1147.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-11 02:28:49,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.60 | bwd_microstep: 1276.18 | bwd_inner_microstep: 1276.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-11 02:28:50,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1293.20 | bwd_inner_microstep: 1293.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-11 02:28:52,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.73 | bwd_microstep: 1298.81 | bwd_inner_microstep: 1298.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-11 02:28:54,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1413.98 | bwd_inner_microstep: 1413.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708
[2024-06-11 02:28:56,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1527.46 | bwd_inner_microstep: 1527.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3632
[2024-06-11 02:28:58,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1345.16 | bwd_inner_microstep: 1345.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3674
[2024-06-11 02:29:00,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.34 | bwd_microstep: 1562.30 | bwd_inner_microstep: 1562.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863
[2024-06-11 02:29:01,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.83 | bwd_microstep: 676.61 | bwd_inner_microstep: 676.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-11 02:29:03,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1520.83 | bwd_inner_microstep: 1520.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-11 02:29:05,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.93 | bwd_microstep: 1243.90 | bwd_inner_microstep: 1243.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3677
[2024-06-11 02:29:07,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.11 | bwd_microstep: 1553.41 | bwd_inner_microstep: 1553.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-11 02:29:09,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.58 | bwd_microstep: 1587.81 | bwd_inner_microstep: 1587.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3582
[2024-06-11 02:29:11,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.04 | bwd_microstep: 1205.86 | bwd_inner_microstep: 1205.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-11 02:29:13,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.96 | bwd_microstep: 1613.35 | bwd_inner_microstep: 1613.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 02:29:15,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1388.24 | bwd_inner_microstep: 1388.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3709
[2024-06-11 02:29:17,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.06 | bwd_microstep: 1296.73 | bwd_inner_microstep: 1296.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2163
[2024-06-11 02:29:18,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.67 | bwd_microstep: 759.62 | bwd_inner_microstep: 759.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-11 02:29:20,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1459.68 | bwd_inner_microstep: 1459.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2009
[2024-06-11 02:29:21,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.45 | bwd_microstep: 709.97 | bwd_inner_microstep: 709.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3564
[2024-06-11 02:29:23,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.57 | bwd_microstep: 1346.71 | bwd_inner_microstep: 1346.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3458
[2024-06-11 02:29:25,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.96 | bwd_microstep: 1342.52 | bwd_inner_microstep: 1342.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563
[2024-06-11 02:29:27,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1299.73 | bwd_inner_microstep: 1299.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3741
[2024-06-11 02:29:28,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.51 | bwd_microstep: 1401.62 | bwd_inner_microstep: 1401.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 02:29:31,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.22 | bwd_microstep: 1647.63 | bwd_inner_microstep: 1647.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917
[2024-06-11 02:29:32,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.18 | bwd_microstep: 779.75 | bwd_inner_microstep: 779.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827
[2024-06-11 02:29:34,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.24 | bwd_microstep: 1481.32 | bwd_inner_microstep: 1481.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811
[2024-06-11 02:29:52,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-11 02:29:52,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.95 | bwd_microstep: 17053.75 | bwd_inner_microstep: 1986.10 | bwd_allreduce_microstep: 15067.58 | step_microstep: 38.64
[2024-06-11 02:29:52,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15984.52 | bwd: 57912.87 | bwd_inner: 42844.36 | bwd_allreduce: 15067.82 | step: 40.04
{'loss': 1.1487, 'learning_rate': 1.9782968072939803e-06, 'epoch': 0.86}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-11 02:29:54,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.78 | bwd_microstep: 1458.10 | bwd_inner_microstep: 1458.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472
[2024-06-11 02:29:55,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.84 | bwd_microstep: 1341.42 | bwd_inner_microstep: 1341.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3845
[2024-06-11 02:29:58,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.40 | bwd_microstep: 1601.39 | bwd_inner_microstep: 1601.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-11 02:29:59,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.20 | bwd_microstep: 1149.05 | bwd_inner_microstep: 1149.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821
[2024-06-11 02:30:01,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1381.43 | bwd_inner_microstep: 1381.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421
[2024-06-11 02:30:03,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.70 | bwd_microstep: 1181.12 | bwd_inner_microstep: 1181.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-11 02:30:05,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.34 | bwd_microstep: 1283.32 | bwd_inner_microstep: 1283.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-11 02:30:07,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.77 | bwd_microstep: 1543.07 | bwd_inner_microstep: 1543.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-11 02:30:08,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.31 | bwd_microstep: 1150.46 | bwd_inner_microstep: 1150.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3412
[2024-06-11 02:30:10,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.52 | bwd_microstep: 1306.39 | bwd_inner_microstep: 1306.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3520
[2024-06-11 02:30:12,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.24 | bwd_microstep: 1221.23 | bwd_inner_microstep: 1221.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417
[2024-06-11 02:30:14,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.98 | bwd_microstep: 1366.56 | bwd_inner_microstep: 1366.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3645
[2024-06-11 02:30:16,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1510.75 | bwd_inner_microstep: 1510.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3658
[2024-06-11 02:30:18,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.04 | bwd_microstep: 1813.63 | bwd_inner_microstep: 1813.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3674
[2024-06-11 02:30:21,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.19 | bwd_microstep: 1721.13 | bwd_inner_microstep: 1721.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298
[2024-06-11 02:30:22,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.87 | bwd_microstep: 977.29 | bwd_inner_microstep: 977.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-11 02:30:24,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1345.58 | bwd_inner_microstep: 1345.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1851
[2024-06-11 02:30:25,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.33 | bwd_microstep: 671.94 | bwd_inner_microstep: 671.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428
[2024-06-11 02:30:26,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.04 | bwd_microstep: 1159.72 | bwd_inner_microstep: 1159.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-11 02:30:29,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.49 | bwd_microstep: 1560.34 | bwd_inner_microstep: 1560.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-11 02:30:30,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.10 | bwd_microstep: 1286.60 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189
[2024-06-11 02:30:32,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.65 | bwd_microstep: 858.51 | bwd_inner_microstep: 858.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-11 02:30:33,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.64 | bwd_microstep: 971.23 | bwd_inner_microstep: 971.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3813
[2024-06-11 02:30:35,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.02 | bwd_microstep: 1413.79 | bwd_inner_microstep: 1413.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-11 02:30:37,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.28 | bwd_microstep: 1649.30 | bwd_inner_microstep: 1649.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-11 02:30:39,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.73 | bwd_microstep: 1526.79 | bwd_inner_microstep: 1526.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3756
[2024-06-11 02:30:41,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.80 | bwd_microstep: 1274.70 | bwd_inner_microstep: 1274.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-11 02:30:42,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.10 | bwd_microstep: 875.86 | bwd_inner_microstep: 875.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571
[2024-06-11 02:30:44,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.17 | bwd_microstep: 1418.12 | bwd_inner_microstep: 1418.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059
[2024-06-11 02:30:45,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.64 | bwd_microstep: 909.30 | bwd_inner_microstep: 909.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3581
[2024-06-11 02:30:47,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1455.75 | bwd_inner_microstep: 1455.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033
[2024-06-11 02:31:13,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-11 02:31:13,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.33 | bwd_microstep: 25338.52 | bwd_inner_microstep: 964.46 | bwd_allreduce_microstep: 24374.01 | step_microstep: 37.99
[2024-06-11 02:31:13,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15443.45 | bwd: 65722.44 | bwd_inner: 41347.52 | bwd_allreduce: 24374.24 | step: 39.46
{'loss': 1.1649, 'learning_rate': 1.9620522439624025e-06, 'epoch': 0.86}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 02:31:15,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.92 | bwd_microstep: 1362.31 | bwd_inner_microstep: 1362.12 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 02:31:17,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1345.24 | bwd_inner_microstep: 1345.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 02:31:19,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1546.34 | bwd_inner_microstep: 1546.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789
[2024-06-11 02:31:21,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.18 | bwd_microstep: 1541.69 | bwd_inner_microstep: 1541.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404
[2024-06-11 02:31:23,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.10 | bwd_microstep: 1177.74 | bwd_inner_microstep: 1177.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-11 02:31:25,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1377.85 | bwd_inner_microstep: 1377.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-11 02:31:26,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.49 | bwd_microstep: 1298.36 | bwd_inner_microstep: 1298.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 02:31:28,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1244.03 | bwd_inner_microstep: 1244.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498
[2024-06-11 02:32:16,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.18 | bwd_microstep: 1434.66 | bwd_inner_microstep: 1434.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1914
[2024-06-11 02:32:17,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.65 | bwd_microstep: 774.44 | bwd_inner_microstep: 774.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2038
[2024-06-11 02:32:18,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.55 | bwd_microstep: 900.14 | bwd_inner_microstep: 900.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3658
[2024-06-11 02:32:20,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.66 | bwd_microstep: 1413.86 | bwd_inner_microstep: 1413.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410
[2024-06-11 02:32:22,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.36 | bwd_microstep: 1337.30 | bwd_inner_microstep: 1337.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3488
[2024-06-11 02:32:24,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1563.20 | bwd_inner_microstep: 1563.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3624
[2024-06-11 02:32:26,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1342.13 | bwd_inner_microstep: 1342.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-11 02:32:28,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.76 | bwd_microstep: 1384.78 | bwd_inner_microstep: 1384.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966
[2024-06-11 02:32:29,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.11 | bwd_microstep: 701.53 | bwd_inner_microstep: 701.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1932
[2024-06-11 02:32:30,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.54 | bwd_microstep: 728.62 | bwd_inner_microstep: 728.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514
[2024-06-11 02:32:31,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.20 | bwd_microstep: 1189.44 | bwd_inner_microstep: 1189.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3637
[2024-06-11 02:32:33,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1442.15 | bwd_inner_microstep: 1442.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-11 02:32:35,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1509.63 | bwd_inner_microstep: 1509.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1985
[2024-06-11 02:32:36,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.33 | bwd_microstep: 705.64 | bwd_inner_microstep: 705.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-11 02:32:38,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.52 | bwd_microstep: 1252.79 | bwd_inner_microstep: 1252.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-11 02:32:40,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1248.59 | bwd_inner_microstep: 1248.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2142
[2024-06-11 02:32:41,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.56 | bwd_microstep: 737.74 | bwd_inner_microstep: 737.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-11 02:32:42,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.50 | bwd_microstep: 877.40 | bwd_inner_microstep: 877.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3602
[2024-06-11 02:32:44,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.64 | bwd_microstep: 1535.00 | bwd_inner_microstep: 1534.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2233
[2024-06-11 02:32:46,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.30 | bwd_microstep: 1058.16 | bwd_inner_microstep: 1058.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-11 02:32:47,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.28 | bwd_microstep: 1388.68 | bwd_inner_microstep: 1388.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-11 02:32:50,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1550.86 | bwd_inner_microstep: 1550.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2064
[2024-06-11 02:32:51,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.21 | bwd_microstep: 1010.76 | bwd_inner_microstep: 1010.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3821
[2024-06-11 02:32:58,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-11 02:32:58,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.54 | bwd_microstep: 6458.81 | bwd_inner_microstep: 1810.77 | bwd_allreduce_microstep: 4647.99 | step_microstep: 37.96
[2024-06-11 02:32:58,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14876.35 | bwd: 44439.91 | bwd_inner: 39790.87 | bwd_allreduce: 4648.30 | step: 39.54
{'loss': 1.1968, 'learning_rate': 1.945871210189054e-06, 'epoch': 0.86}
 86%|████████▌ | 1484/1726 [25:48:00<5:53:56, 87.76s/it]
 86%|████████▌ | 1485/1726 [25:49:24<5:47:52, 86.61s/it]
 86%|████████▌ | 1486/1726 [25:51:14<6:15:07, 93.78s/it]
 86%|████████▌ | 1487/1726 [25:52:28<5:50:11, 87.91s/it]
 86%|████████▌ | 1488/1726 [25:53:50<5:41:04, 85.99s/it]
 86%|████████▋ | 1489/1726 [25:55:35<6:02:10, 91.69s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-11 02:33:00,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.43 | bwd_microstep: 1145.44 | bwd_inner_microstep: 1145.27 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-11 02:33:01,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.93 | bwd_microstep: 1149.83 | bwd_inner_microstep: 1149.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 02:33:03,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.98 | bwd_microstep: 1269.34 | bwd_inner_microstep: 1269.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-11 02:33:04,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.51 | bwd_microstep: 676.21 | bwd_inner_microstep: 676.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789
[2024-06-11 02:33:06,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.33 | bwd_microstep: 1446.05 | bwd_inner_microstep: 1446.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-11 02:33:08,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.53 | bwd_microstep: 1528.15 | bwd_inner_microstep: 1528.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 02:33:10,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.63 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-11 02:33:11,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.90 | bwd_microstep: 1147.68 | bwd_inner_microstep: 1147.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1953
[2024-06-11 02:33:12,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.14 | bwd_microstep: 728.44 | bwd_inner_microstep: 728.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 02:33:14,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1378.38 | bwd_inner_microstep: 1378.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3715
[2024-06-11 02:33:16,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.75 | bwd_microstep: 1491.33 | bwd_inner_microstep: 1491.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-11 02:33:18,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1247.42 | bwd_inner_microstep: 1247.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4186
[2024-06-11 02:33:21,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 685.27 | bwd_microstep: 1853.00 | bwd_inner_microstep: 1852.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2644
[2024-06-11 02:33:22,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.37 | bwd_microstep: 1207.49 | bwd_inner_microstep: 1207.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1878
[2024-06-11 02:33:23,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.02 | bwd_microstep: 769.30 | bwd_inner_microstep: 769.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628
[2024-06-11 02:33:26,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1607.14 | bwd_inner_microstep: 1607.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 02:33:28,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.43 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2000
[2024-06-11 02:33:29,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.57 | bwd_microstep: 740.41 | bwd_inner_microstep: 740.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3816
[2024-06-11 02:33:31,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.34 | bwd_microstep: 1600.25 | bwd_inner_microstep: 1600.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3517
[2024-06-11 02:33:33,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.48 | bwd_microstep: 1319.40 | bwd_inner_microstep: 1319.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-11 02:33:35,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1628.95 | bwd_inner_microstep: 1628.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-11 02:33:37,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.30 | bwd_microstep: 1609.23 | bwd_inner_microstep: 1609.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-11 02:33:39,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.36 | bwd_microstep: 1296.29 | bwd_inner_microstep: 1296.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2043
[2024-06-11 02:33:40,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.05 | bwd_microstep: 885.52 | bwd_inner_microstep: 885.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-11 02:33:42,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.32 | bwd_microstep: 1484.33 | bwd_inner_microstep: 1484.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3843
[2024-06-11 02:33:44,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.79 | bwd_microstep: 1513.28 | bwd_inner_microstep: 1513.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-11 02:33:46,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.29 | bwd_microstep: 1372.24 | bwd_inner_microstep: 1372.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947
[2024-06-11 02:33:47,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.54 | bwd_microstep: 697.98 | bwd_inner_microstep: 697.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2646
[2024-06-11 02:33:49,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.65 | bwd_microstep: 1017.36 | bwd_inner_microstep: 1017.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-11 02:33:51,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.99 | bwd_microstep: 1289.79 | bwd_inner_microstep: 1289.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-11 02:33:53,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.88 | bwd_microstep: 1536.76 | bwd_inner_microstep: 1536.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-11 02:35:07,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-11 02:35:07,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.44 | bwd_microstep: 74202.91 | bwd_inner_microstep: 1696.99 | bwd_allreduce_microstep: 72505.86 | step_microstep: 38.89
[2024-06-11 02:35:07,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15370.78 | bwd: 113640.46 | bwd_inner: 41133.55 | bwd_allreduce: 72506.17 | step: 40.45
{'loss': 1.1356, 'learning_rate': 1.9297537629634486e-06, 'epoch': 0.86}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 02:35:09,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.11 | bwd_microstep: 1227.04 | bwd_inner_microstep: 1227.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-11 02:35:10,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.97 | bwd_microstep: 785.52 | bwd_inner_microstep: 785.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2354
[2024-06-11 02:35:12,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.72 | bwd_microstep: 916.48 | bwd_inner_microstep: 916.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874
[2024-06-11 02:35:14,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.49 | bwd_microstep: 1572.51 | bwd_inner_microstep: 1572.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-11 02:35:15,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.79 | bwd_microstep: 1304.41 | bwd_inner_microstep: 1304.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 02:35:17,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.56 | bwd_microstep: 1271.23 | bwd_inner_microstep: 1271.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 02:35:19,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1375.25 | bwd_inner_microstep: 1375.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-11 02:35:20,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 793.73 | bwd_inner_microstep: 793.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-11 02:35:21,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.42 | bwd_microstep: 789.36 | bwd_inner_microstep: 789.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-11 02:35:23,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1469.93 | bwd_inner_microstep: 1469.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-11 02:35:26,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.53 | bwd_microstep: 1593.37 | bwd_inner_microstep: 1593.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663
[2024-06-11 02:35:28,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.72 | bwd_microstep: 1603.40 | bwd_inner_microstep: 1603.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3658
[2024-06-11 02:36:12,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.87 | bwd_microstep: 1531.40 | bwd_inner_microstep: 1531.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624
[2024-06-11 02:36:14,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.23 | bwd_microstep: 1525.82 | bwd_inner_microstep: 1525.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3642
[2024-06-11 02:36:17,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1662.50 | bwd_inner_microstep: 1662.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490
[2024-06-11 02:36:18,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.52 | bwd_microstep: 1333.33 | bwd_inner_microstep: 1333.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-11 02:36:21,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.09 | bwd_microstep: 1499.20 | bwd_inner_microstep: 1499.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-11 02:36:23,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.09 | bwd_microstep: 1441.67 | bwd_inner_microstep: 1441.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-11 02:36:24,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.64 | bwd_microstep: 1245.60 | bwd_inner_microstep: 1245.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436
[2024-06-11 02:36:26,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.50 | bwd_microstep: 1150.81 | bwd_inner_microstep: 1150.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-11 02:36:28,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.97 | bwd_microstep: 1400.65 | bwd_inner_microstep: 1400.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3508
[2024-06-11 02:36:29,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.91 | bwd_microstep: 1189.38 | bwd_inner_microstep: 1189.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3515
[2024-06-11 02:36:31,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.63 | bwd_microstep: 1313.87 | bwd_inner_microstep: 1313.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677
[2024-06-11 02:36:33,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.62 | bwd_microstep: 1514.81 | bwd_inner_microstep: 1514.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 02:36:35,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.07 | bwd_microstep: 1249.92 | bwd_inner_microstep: 1249.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-11 02:36:37,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.37 | bwd_microstep: 1379.88 | bwd_inner_microstep: 1379.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 02:36:39,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.86 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454
[2024-06-11 02:36:41,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.08 | bwd_microstep: 1374.72 | bwd_inner_microstep: 1374.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2182
[2024-06-11 02:36:42,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.29 | bwd_microstep: 952.98 | bwd_inner_microstep: 952.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 02:36:44,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.11 | bwd_microstep: 1282.83 | bwd_inner_microstep: 1282.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013
[2024-06-11 02:36:45,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.67 | bwd_microstep: 803.50 | bwd_inner_microstep: 803.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-11 02:36:52,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-11 02:36:52,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 6638.26 | bwd_inner_microstep: 1799.51 | bwd_allreduce_microstep: 4838.69 | step_microstep: 38.19
[2024-06-11 02:36:52,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15571.14 | bwd: 46592.78 | bwd_inner: 41753.18 | bwd_allreduce: 4838.92 | step: 39.67
{'loss': 1.21, 'learning_rate': 1.913699959051152e-06, 'epoch': 0.86}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3456
[2024-06-11 02:36:54,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1291.14 | bwd_inner_microstep: 1291.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913
[2024-06-11 02:36:55,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.09 | bwd_microstep: 774.91 | bwd_inner_microstep: 774.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-11 02:36:56,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.96 | bwd_microstep: 967.33 | bwd_inner_microstep: 967.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795
[2024-06-11 02:36:59,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.96 | bwd_microstep: 1643.95 | bwd_inner_microstep: 1643.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 02:37:01,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1375.79 | bwd_inner_microstep: 1375.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-11 02:37:02,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 1245.75 | bwd_inner_microstep: 1245.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-11 02:37:04,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.23 | bwd_microstep: 1529.15 | bwd_inner_microstep: 1529.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-11 02:37:06,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1344.89 | bwd_inner_microstep: 1344.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 02:37:08,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1385.55 | bwd_inner_microstep: 1385.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459
[2024-06-11 02:37:10,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1308.48 | bwd_inner_microstep: 1308.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-11 02:37:12,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.20 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-11 02:37:14,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1477.76 | bwd_inner_microstep: 1477.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-11 02:37:16,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.68 | bwd_microstep: 1500.13 | bwd_inner_microstep: 1500.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3500
[2024-06-11 02:37:18,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.74 | bwd_microstep: 1546.80 | bwd_inner_microstep: 1546.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3441
[2024-06-11 02:37:20,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.05 | bwd_microstep: 1514.21 | bwd_inner_microstep: 1514.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-11 02:37:22,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1381.62 | bwd_inner_microstep: 1381.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2560
[2024-06-11 02:37:24,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.86 | bwd_microstep: 1062.78 | bwd_inner_microstep: 1062.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 02:37:26,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.47 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-11 02:37:27,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3679
[2024-06-11 02:37:29,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.17 | bwd_microstep: 1292.77 | bwd_inner_microstep: 1292.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-11 02:37:31,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.72 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934
[2024-06-11 02:37:32,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.48 | bwd_microstep: 725.82 | bwd_inner_microstep: 725.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3819
[2024-06-11 02:37:34,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.99 | bwd_microstep: 1582.19 | bwd_inner_microstep: 1582.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-11 02:37:35,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.14 | bwd_microstep: 878.73 | bwd_inner_microstep: 878.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-11 02:37:38,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.88 | bwd_microstep: 1602.21 | bwd_inner_microstep: 1602.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-11 02:37:40,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.06 | bwd_microstep: 1337.91 | bwd_inner_microstep: 1337.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-11 02:37:41,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.40 | bwd_microstep: 808.21 | bwd_inner_microstep: 808.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695
[2024-06-11 02:37:43,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.27 | bwd_microstep: 1461.48 | bwd_inner_microstep: 1461.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3471
[2024-06-11 02:37:44,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.13 | bwd_microstep: 1266.33 | bwd_inner_microstep: 1266.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2560
[2024-06-11 02:37:46,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.70 | bwd_microstep: 872.67 | bwd_inner_microstep: 872.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-11 02:37:48,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1446.37 | bwd_inner_microstep: 1446.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2234
[2024-06-11 02:37:54,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-11 02:37:54,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.66 | bwd_microstep: 6311.38 | bwd_inner_microstep: 1146.09 | bwd_allreduce_microstep: 5165.22 | step_microstep: 38.79
[2024-06-11 02:37:54,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15369.95 | bwd: 46363.31 | bwd_inner: 41197.16 | bwd_allreduce: 5165.46 | step: 40.28
{'loss': 1.1862, 'learning_rate': 1.8977098549935745e-06, 'epoch': 0.86}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 02:37:56,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1363.74 | bwd_inner_microstep: 1363.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-11 02:37:58,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.19 | bwd_microstep: 1340.73 | bwd_inner_microstep: 1340.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3884
[2024-06-11 02:38:00,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.46 | bwd_microstep: 1645.47 | bwd_inner_microstep: 1645.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-11 02:38:02,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.74 | bwd_microstep: 1310.83 | bwd_inner_microstep: 1310.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 02:38:04,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.59 | bwd_microstep: 1280.31 | bwd_inner_microstep: 1280.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-11 02:38:06,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.76 | bwd_microstep: 1345.82 | bwd_inner_microstep: 1345.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4066
[2024-06-11 02:38:08,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.52 | bwd_microstep: 1619.67 | bwd_inner_microstep: 1619.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 02:38:10,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.49 | bwd_microstep: 1276.55 | bwd_inner_microstep: 1276.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2085
[2024-06-11 02:38:11,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.34 | bwd_microstep: 733.43 | bwd_inner_microstep: 733.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4063
[2024-06-11 02:38:13,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.58 | bwd_microstep: 1722.68 | bwd_inner_microstep: 1722.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-11 02:38:15,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.99 | bwd_microstep: 1349.21 | bwd_inner_microstep: 1349.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-11 02:38:17,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1628.14 | bwd_inner_microstep: 1628.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-11 02:38:18,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.46 | bwd_microstep: 796.25 | bwd_inner_microstep: 796.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497
[2024-06-11 02:38:21,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1576.63 | bwd_inner_microstep: 1576.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-11 02:38:22,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.23 | bwd_microstep: 790.26 | bwd_inner_microstep: 790.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451
[2024-06-11 02:38:24,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.64 | bwd_microstep: 1381.51 | bwd_inner_microstep: 1381.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-11 02:38:26,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1491.20 | bwd_inner_microstep: 1491.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-11 02:38:27,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1345.02 | bwd_inner_microstep: 1344.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2069
[2024-06-11 02:38:29,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.43 | bwd_microstep: 920.21 | bwd_inner_microstep: 920.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 02:38:31,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.87 | bwd_microstep: 1299.78 | bwd_inner_microstep: 1299.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-11 02:38:33,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1405.29 | bwd_inner_microstep: 1405.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3680
[2024-06-11 02:38:34,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.17 | bwd_microstep: 1262.46 | bwd_inner_microstep: 1262.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.99
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-11 02:38:36,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.30 | bwd_microstep: 1399.25 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3718
[2024-06-11 02:38:38,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.83 | bwd_microstep: 1564.85 | bwd_inner_microstep: 1564.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3614
[2024-06-11 02:38:40,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.04 | bwd_microstep: 1539.72 | bwd_inner_microstep: 1539.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3446
[2024-06-11 02:38:42,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.99 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1315.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-11 02:38:44,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1400.78 | bwd_inner_microstep: 1400.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4233
[2024-06-11 02:38:47,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 687.25 | bwd_microstep: 1871.40 | bwd_inner_microstep: 1871.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962
[2024-06-11 02:38:48,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.23 | bwd_microstep: 794.83 | bwd_inner_microstep: 794.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-11 02:38:49,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.55 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3470
[2024-06-11 02:38:51,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.75 | bwd_microstep: 1523.71 | bwd_inner_microstep: 1523.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819
[2024-06-11 02:38:55,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.09 | optimizer_step: 6.59
[2024-06-11 02:38:55,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.50 | bwd_microstep: 3624.81 | bwd_inner_microstep: 1559.32 | bwd_allreduce_microstep: 2065.45 | step_microstep: 37.71
[2024-06-11 02:38:55,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15903.68 | bwd: 44720.42 | bwd_inner: 42654.02 | bwd_allreduce: 2065.70 | step: 39.27
{'loss': 1.1798, 'learning_rate': 1.8817835071077882e-06, 'epoch': 0.86}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-11 02:38:57,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.21 | bwd_microstep: 1435.67 | bwd_inner_microstep: 1435.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-11 02:38:59,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1273.59 | bwd_inner_microstep: 1273.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 839
[2024-06-11 02:39:00,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 132.34 | bwd_microstep: 340.82 | bwd_inner_microstep: 340.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 02:39:02,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 1454.23 | bwd_inner_microstep: 1454.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 02:39:03,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1243.69 | bwd_inner_microstep: 1243.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2705
[2024-06-11 02:39:05,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.31 | bwd_microstep: 1000.63 | bwd_inner_microstep: 1000.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-11 02:39:07,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1349.24 | bwd_inner_microstep: 1349.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-11 02:39:08,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.98 | bwd_microstep: 1150.70 | bwd_inner_microstep: 1150.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 02:39:10,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.39 | bwd_microstep: 1381.82 | bwd_inner_microstep: 1381.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2634
[2024-06-11 02:39:12,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.38 | bwd_microstep: 1112.28 | bwd_inner_microstep: 1112.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2184
[2024-06-11 02:39:13,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.94 | bwd_microstep: 762.66 | bwd_inner_microstep: 762.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-11 02:39:15,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.86 | bwd_microstep: 1484.43 | bwd_inner_microstep: 1484.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674
[2024-06-11 02:39:17,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.17 | bwd_microstep: 1523.98 | bwd_inner_microstep: 1523.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680
[2024-06-11 02:39:19,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.16 | bwd_microstep: 1523.87 | bwd_inner_microstep: 1523.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438
[2024-06-11 02:39:20,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.49 | bwd_microstep: 1154.65 | bwd_inner_microstep: 1154.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-11 02:39:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 1281.98 | bwd_inner_microstep: 1281.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1961
[2024-06-11 02:39:23,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.77 | bwd_microstep: 826.46 | bwd_inner_microstep: 826.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 02:39:25,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.81 | bwd_microstep: 1380.01 | bwd_inner_microstep: 1379.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-11 02:39:28,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.60 | bwd_microstep: 1608.44 | bwd_inner_microstep: 1608.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 02:39:29,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.41 | bwd_microstep: 1372.63 | bwd_inner_microstep: 1372.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3007
[2024-06-11 02:39:31,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.05 | bwd_microstep: 1300.57 | bwd_inner_microstep: 1300.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3528
[2024-06-11 02:39:33,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.25 | bwd_microstep: 1535.49 | bwd_inner_microstep: 1535.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-11 02:39:35,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.53 | bwd_microstep: 1486.63 | bwd_inner_microstep: 1486.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-11 02:39:37,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.30 | bwd_microstep: 975.91 | bwd_inner_microstep: 975.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-11 02:39:39,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.09 | bwd_microstep: 1416.82 | bwd_inner_microstep: 1416.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-11 02:39:41,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.71 | bwd_microstep: 1656.50 | bwd_inner_microstep: 1656.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566
[2024-06-11 02:39:43,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.43 | bwd_microstep: 1599.27 | bwd_inner_microstep: 1599.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722
[2024-06-11 02:39:45,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.18 | bwd_microstep: 1537.08 | bwd_inner_microstep: 1537.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2021
[2024-06-11 02:39:47,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.41 | bwd_microstep: 1726.94 | bwd_inner_microstep: 1726.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773
[2024-06-11 02:39:50,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.06 | bwd_microstep: 1634.48 | bwd_inner_microstep: 1634.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3385
[2024-06-11 02:39:51,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.24 | bwd_microstep: 1368.84 | bwd_inner_microstep: 1368.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3574
[2024-06-11 02:39:57,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.32 | optimizer_step: 6.63
[2024-06-11 02:39:57,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.34 | bwd_microstep: 5344.15 | bwd_inner_microstep: 1494.02 | bwd_allreduce_microstep: 3850.06 | step_microstep: 38.37
[2024-06-11 02:39:57,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15480.60 | bwd: 46244.48 | bwd_inner: 42393.49 | bwd_allreduce: 3850.31 | step: 39.88
 86%|████████▋ | 1489/1726 [25:55:35<6:02:10, 91.69s/it]
 86%|████████▋ | 1490/1726 [25:57:44<6:45:05, 102.99s/it]
 86%|████████▋ | 1491/1726 [25:59:29<6:45:32, 103.54s/it]
 86%|████████▋ | 1492/1726 [26:00:31<5:55:17, 91.10s/it]
 87%|████████▋ | 1493/1726 [26:01:32<5:18:38, 82.06s/it]
 87%|████████▋ | 1494/1726 [26:02:34<4:54:04, 76.06s/it]
{'loss': 1.2018, 'learning_rate': 1.8659209714863013e-06, 'epoch': 0.87}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496
[2024-06-11 02:40:00,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.90 | bwd_microstep: 1571.99 | bwd_inner_microstep: 1571.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-11 02:40:01,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.64 | bwd_microstep: 1145.76 | bwd_inner_microstep: 1145.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 02:40:03,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1380.10 | bwd_inner_microstep: 1380.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474
[2024-06-11 02:40:05,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.68 | bwd_microstep: 1408.13 | bwd_inner_microstep: 1408.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777
[2024-06-11 02:40:07,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.77 | bwd_microstep: 1643.75 | bwd_inner_microstep: 1643.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-11 02:40:09,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 1253.98 | bwd_inner_microstep: 1253.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 02:40:11,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.74 | bwd_microstep: 1385.18 | bwd_inner_microstep: 1385.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411
[2024-06-11 02:40:12,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.39 | bwd_microstep: 1152.58 | bwd_inner_microstep: 1152.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3502
[2024-06-11 02:40:14,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.27 | bwd_microstep: 1335.78 | bwd_inner_microstep: 1335.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-11 02:40:16,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.14 | bwd_microstep: 1185.81 | bwd_inner_microstep: 1185.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-11 02:40:18,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.28 | bwd_microstep: 1317.03 | bwd_inner_microstep: 1317.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2096
[2024-06-11 02:40:19,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.32 | bwd_microstep: 916.57 | bwd_inner_microstep: 916.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3672
[2024-06-11 02:40:21,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.75 | bwd_microstep: 1655.15 | bwd_inner_microstep: 1655.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2730
[2024-06-11 02:40:23,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.64 | bwd_microstep: 1100.24 | bwd_inner_microstep: 1100.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-11 02:40:25,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1386.64 | bwd_inner_microstep: 1386.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-11 02:40:27,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1473.87 | bwd_inner_microstep: 1473.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3646
[2024-06-11 02:40:29,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.19 | bwd_microstep: 1710.47 | bwd_inner_microstep: 1710.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-11 02:40:31,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.97 | bwd_microstep: 1521.49 | bwd_inner_microstep: 1521.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-11 02:40:33,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 1253.20 | bwd_inner_microstep: 1253.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-11 02:40:35,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.46 | bwd_microstep: 1612.31 | bwd_inner_microstep: 1612.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-11 02:40:37,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.45 | bwd_microstep: 1182.08 | bwd_inner_microstep: 1182.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-11 02:40:39,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1428.77 | bwd_inner_microstep: 1428.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2137
[2024-06-11 02:40:40,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.71 | bwd_microstep: 832.05 | bwd_inner_microstep: 832.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1379
[2024-06-11 02:40:41,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.62 | bwd_microstep: 524.79 | bwd_inner_microstep: 524.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3722
[2024-06-11 02:40:43,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.98 | bwd_microstep: 1482.22 | bwd_inner_microstep: 1482.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 02:40:45,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.79 | bwd_microstep: 1554.39 | bwd_inner_microstep: 1554.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-11 02:40:47,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.96 | bwd_microstep: 1502.58 | bwd_inner_microstep: 1502.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-11 02:40:49,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.61 | bwd_microstep: 1455.69 | bwd_inner_microstep: 1455.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821
[2024-06-11 02:40:51,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.20 | bwd_microstep: 1418.55 | bwd_inner_microstep: 1418.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3808
[2024-06-11 02:40:53,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.84 | bwd_microstep: 1751.02 | bwd_inner_microstep: 1751.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3576
[2024-06-11 02:40:55,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.23 | bwd_microstep: 1300.23 | bwd_inner_microstep: 1300.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3588
[2024-06-11 02:41:00,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-11 02:41:00,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.09 | bwd_microstep: 4056.27 | bwd_inner_microstep: 1649.25 | bwd_allreduce_microstep: 2406.96 | step_microstep: 37.87
[2024-06-11 02:41:00,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16202.05 | bwd: 45898.69 | bwd_inner: 43490.82 | bwd_allreduce: 2407.19 | step: 39.37
{'loss': 1.1887, 'learning_rate': 1.850122303996882e-06, 'epoch': 0.87}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-11 02:41:02,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.15 | bwd_microstep: 1478.39 | bwd_inner_microstep: 1478.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-11 02:41:03,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.12 | bwd_microstep: 787.37 | bwd_inner_microstep: 787.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3902
[2024-06-11 02:41:05,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.83 | bwd_microstep: 1387.44 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831
[2024-06-11 02:41:07,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 1387.75 | bwd_inner_microstep: 1387.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4185
[2024-06-11 02:41:09,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.19 | bwd_microstep: 1650.14 | bwd_inner_microstep: 1650.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1943
[2024-06-11 02:41:10,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.01 | bwd_microstep: 759.88 | bwd_inner_microstep: 759.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-11 02:41:11,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.86 | bwd_microstep: 802.56 | bwd_inner_microstep: 802.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705
[2024-06-11 02:41:13,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1526.70 | bwd_inner_microstep: 1526.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967
[2024-06-11 02:41:14,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.82 | bwd_microstep: 796.34 | bwd_inner_microstep: 796.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482
[2024-06-11 02:41:16,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.57 | bwd_microstep: 1312.30 | bwd_inner_microstep: 1312.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-11 02:41:18,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1494.13 | bwd_inner_microstep: 1494.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-11 02:41:21,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.03 | bwd_microstep: 1619.70 | bwd_inner_microstep: 1619.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1894
[2024-06-11 02:41:22,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.43 | bwd_microstep: 836.38 | bwd_inner_microstep: 836.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3652
[2024-06-11 02:41:24,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.69 | bwd_microstep: 1719.90 | bwd_inner_microstep: 1719.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-11 02:41:26,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1389.42 | bwd_inner_microstep: 1389.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3636
[2024-06-11 02:41:28,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.22 | bwd_microstep: 1709.40 | bwd_inner_microstep: 1709.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 02:41:30,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.42 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-11 02:41:32,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1415.44 | bwd_inner_microstep: 1415.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533
[2024-06-11 02:41:34,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1294.99 | bwd_inner_microstep: 1294.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 02:41:36,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3618
[2024-06-11 02:41:38,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.68 | bwd_microstep: 1472.76 | bwd_inner_microstep: 1472.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3725
[2024-06-11 02:41:40,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.96 | bwd_microstep: 1240.78 | bwd_inner_microstep: 1240.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-11 02:41:42,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.83 | bwd_microstep: 1660.81 | bwd_inner_microstep: 1660.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2195
[2024-06-11 02:41:43,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.92 | bwd_microstep: 861.62 | bwd_inner_microstep: 861.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3552
[2024-06-11 02:41:45,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.78 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3590
[2024-06-11 02:41:47,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.98 | bwd_microstep: 1531.68 | bwd_inner_microstep: 1531.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-11 02:41:49,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1399.47 | bwd_inner_microstep: 1399.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-11 02:41:51,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.22 | bwd_microstep: 1334.22 | bwd_inner_microstep: 1334.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3800
[2024-06-11 02:41:53,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.16 | bwd_microstep: 1621.32 | bwd_inner_microstep: 1621.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-11 02:41:55,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.48 | bwd_microstep: 1632.85 | bwd_inner_microstep: 1632.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3813
[2024-06-11 02:41:58,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.94 | bwd_microstep: 1619.67 | bwd_inner_microstep: 1619.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3803
[2024-06-11 02:42:00,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.01 | optimizer_step: 6.59
[2024-06-11 02:42:00,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.95 | bwd_microstep: 2060.95 | bwd_inner_microstep: 1685.24 | bwd_allreduce_microstep: 375.66 | step_microstep: 37.43
[2024-06-11 02:42:00,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16235.79 | bwd: 43922.02 | bwd_inner: 43545.45 | bwd_allreduce: 375.89 | step: 38.83
{'loss': 1.1628, 'learning_rate': 1.8343875602823558e-06, 'epoch': 0.87}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-11 02:42:01,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 785.39 | bwd_inner_microstep: 785.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458
[2024-06-11 02:42:03,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.47 | bwd_microstep: 1210.52 | bwd_inner_microstep: 1210.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 02:42:05,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1375.87 | bwd_inner_microstep: 1375.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-11 02:42:07,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1298.44 | bwd_inner_microstep: 1298.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 02:42:09,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.90 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1278.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787
[2024-06-11 02:42:11,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.68 | bwd_microstep: 1443.71 | bwd_inner_microstep: 1443.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911
[2024-06-11 02:42:12,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.05 | bwd_microstep: 780.04 | bwd_inner_microstep: 780.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3698
[2024-06-11 02:42:14,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.39 | bwd_microstep: 1578.20 | bwd_inner_microstep: 1578.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1925
[2024-06-11 02:42:15,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.58 | bwd_microstep: 819.78 | bwd_inner_microstep: 819.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-11 02:42:17,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1335.04 | bwd_inner_microstep: 1335.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3440
[2024-06-11 02:42:19,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.46 | bwd_microstep: 1311.83 | bwd_inner_microstep: 1311.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2648
[2024-06-11 02:42:20,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.56 | bwd_microstep: 957.07 | bwd_inner_microstep: 957.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3501
[2024-06-11 02:42:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.49 | bwd_microstep: 1550.29 | bwd_inner_microstep: 1550.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657
[2024-06-11 02:42:24,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.54 | bwd_microstep: 1621.73 | bwd_inner_microstep: 1621.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-11 02:42:26,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.07 | bwd_microstep: 1253.47 | bwd_inner_microstep: 1253.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3632
[2024-06-11 02:42:28,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.65 | bwd_microstep: 1544.91 | bwd_inner_microstep: 1544.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660
[2024-06-11 02:42:30,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1426.82 | bwd_inner_microstep: 1426.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-11 02:42:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1399.45 | bwd_inner_microstep: 1399.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-11 02:42:34,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.43 | bwd_microstep: 1458.46 | bwd_inner_microstep: 1458.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 02:42:36,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1290.02 | bwd_inner_microstep: 1290.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-11 02:42:37,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.40 | bwd_microstep: 800.86 | bwd_inner_microstep: 800.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 02:42:39,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1378.21 | bwd_inner_microstep: 1378.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1997
[2024-06-11 02:42:40,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.91 | bwd_microstep: 738.24 | bwd_inner_microstep: 738.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2188
[2024-06-11 02:42:41,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.45 | bwd_microstep: 766.45 | bwd_inner_microstep: 766.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-11 02:42:43,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1396.89 | bwd_inner_microstep: 1396.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996
[2024-06-11 02:42:44,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.13 | bwd_microstep: 802.55 | bwd_inner_microstep: 802.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3826
[2024-06-11 02:42:46,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.15 | bwd_microstep: 1605.03 | bwd_inner_microstep: 1605.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-11 02:42:48,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.91 | bwd_microstep: 1395.60 | bwd_inner_microstep: 1395.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614
[2024-06-11 02:42:50,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.44 | bwd_microstep: 1604.49 | bwd_inner_microstep: 1604.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 02:42:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.99 | bwd_microstep: 1556.07 | bwd_inner_microstep: 1556.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2089
[2024-06-11 02:42:54,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.10 | bwd_microstep: 915.97 | bwd_inner_microstep: 915.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585
[2024-06-11 02:43:01,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.11 | optimizer_step: 6.62
[2024-06-11 02:43:01,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.55 | bwd_microstep: 6955.24 | bwd_inner_microstep: 1817.03 | bwd_allreduce_microstep: 5138.17 | step_microstep: 37.98
[2024-06-11 02:43:01,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15114.61 | bwd: 45634.72 | bwd_inner: 40495.61 | bwd_allreduce: 5138.41 | step: 39.58
{'loss': 1.1346, 'learning_rate': 1.8187167957604047e-06, 'epoch': 0.87}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490
[2024-06-11 02:43:04,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1572.49 | bwd_inner_microstep: 1572.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3936
[2024-06-11 02:43:06,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.63 | bwd_microstep: 1589.30 | bwd_inner_microstep: 1589.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 02:43:07,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.01 | bwd_microstep: 1245.21 | bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-11 02:43:09,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1245.58 | bwd_inner_microstep: 1245.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-11 02:43:10,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.18 | bwd_microstep: 793.49 | bwd_inner_microstep: 793.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3781
[2024-06-11 02:43:12,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.78 | bwd_microstep: 1443.62 | bwd_inner_microstep: 1443.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2238
[2024-06-11 02:43:14,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.07 | bwd_microstep: 927.37 | bwd_inner_microstep: 927.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3721
[2024-06-11 02:43:16,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1559.17 | bwd_inner_microstep: 1559.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086
[2024-06-11 02:43:17,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.01 | bwd_microstep: 818.75 | bwd_inner_microstep: 818.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3403
[2024-06-11 02:43:19,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.63 | bwd_microstep: 1212.38 | bwd_inner_microstep: 1212.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 02:43:20,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.58 | bwd_microstep: 1279.53 | bwd_inner_microstep: 1279.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1869
[2024-06-11 02:43:21,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.40 | bwd_microstep: 803.47 | bwd_inner_microstep: 803.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-11 02:43:23,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.68 | bwd_microstep: 1443.31 | bwd_inner_microstep: 1443.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-11 02:43:26,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1583.85 | bwd_inner_microstep: 1583.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2954
[2024-06-11 02:43:27,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.76 | bwd_microstep: 1012.42 | bwd_inner_microstep: 1012.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3541
[2024-06-11 02:43:29,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.62 | bwd_microstep: 1376.53 | bwd_inner_microstep: 1376.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2178
[2024-06-11 02:43:30,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.42 | bwd_microstep: 951.70 | bwd_inner_microstep: 951.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3458
[2024-06-11 02:43:32,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.66 | bwd_microstep: 1570.19 | bwd_inner_microstep: 1570.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-11 02:43:34,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1407.23 | bwd_inner_microstep: 1407.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-11 02:43:36,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.20 | bwd_microstep: 1294.57 | bwd_inner_microstep: 1294.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-11 02:43:38,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.63 | bwd_microstep: 1191.55 | bwd_inner_microstep: 1191.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3580
[2024-06-11 02:43:40,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.44 | bwd_microstep: 1632.96 | bwd_inner_microstep: 1632.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 02:43:42,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.80 | bwd_microstep: 1352.81 | bwd_inner_microstep: 1352.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-11 02:43:44,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.67 | bwd_microstep: 1508.40 | bwd_inner_microstep: 1508.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-11 02:43:46,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.56 | bwd_microstep: 1503.10 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-11 02:43:48,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.68 | bwd_microstep: 1608.89 | bwd_inner_microstep: 1608.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-11 02:43:50,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.67 | bwd_microstep: 1438.32 | bwd_inner_microstep: 1438.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-11 02:43:52,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.88 | bwd_microstep: 1459.19 | bwd_inner_microstep: 1459.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-11 02:43:54,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.64 | bwd_microstep: 1501.84 | bwd_inner_microstep: 1501.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-11 02:43:56,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.07 | bwd_microstep: 1543.04 | bwd_inner_microstep: 1543.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3581
[2024-06-11 02:43:59,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.21 | bwd_microstep: 1563.75 | bwd_inner_microstep: 1563.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1921
[2024-06-11 02:44:03,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.19 | optimizer_step: 6.58
[2024-06-11 02:44:03,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.88 | bwd_microstep: 3615.91 | bwd_inner_microstep: 926.54 | bwd_allreduce_microstep: 2689.32 | step_microstep: 37.81
[2024-06-11 02:44:03,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15813.45 | bwd: 45049.92 | bwd_inner: 42359.70 | bwd_allreduce: 2689.55 | step: 39.41
{'loss': 1.1652, 'learning_rate': 1.803110065623388e-06, 'epoch': 0.87}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 5207
[2024-06-11 02:44:05,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 695.68 | bwd_microstep: 1819.01 | bwd_inner_microstep: 1818.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3515
[2024-06-11 02:44:07,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.87 | bwd_microstep: 1336.24 | bwd_inner_microstep: 1336.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480
[2024-06-11 02:44:09,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.10 | bwd_microstep: 1438.77 | bwd_inner_microstep: 1438.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 02:44:11,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.16 | bwd_microstep: 1278.75 | bwd_inner_microstep: 1278.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-11 02:44:12,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1250.39 | bwd_inner_microstep: 1250.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 02:44:14,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.16 | bwd_microstep: 1338.96 | bwd_inner_microstep: 1338.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417
[2024-06-11 02:44:16,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.64 | bwd_microstep: 1151.77 | bwd_inner_microstep: 1151.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-11 02:44:18,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1344.56 | bwd_inner_microstep: 1344.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 02:44:19,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1249.02 | bwd_inner_microstep: 1248.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689
[2024-06-11 02:44:22,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.60 | bwd_microstep: 1526.75 | bwd_inner_microstep: 1526.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 02:44:24,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.16 | bwd_microstep: 1485.77 | bwd_inner_microstep: 1485.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-11 02:44:25,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1347.53 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 02:44:27,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.99 | bwd_microstep: 1297.87 | bwd_inner_microstep: 1297.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-11 02:44:29,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1576.56 | bwd_inner_microstep: 1576.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-11 02:44:30,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.21 | bwd_microstep: 703.65 | bwd_inner_microstep: 703.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-11 02:44:32,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 02:44:34,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1389.90 | bwd_inner_microstep: 1389.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672
[2024-06-11 02:44:36,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.92 | bwd_microstep: 1424.64 | bwd_inner_microstep: 1424.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690
[2024-06-11 02:44:38,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1330.46 | bwd_inner_microstep: 1330.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-11 02:44:40,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1411.39 | bwd_inner_microstep: 1411.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-11 02:44:42,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1404.17 | bwd_inner_microstep: 1404.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2149
[2024-06-11 02:44:43,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.25 | bwd_microstep: 852.42 | bwd_inner_microstep: 852.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3752
[2024-06-11 02:44:45,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.76 | bwd_microstep: 1376.27 | bwd_inner_microstep: 1376.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3627
[2024-06-11 02:44:47,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.84 | bwd_microstep: 1540.25 | bwd_inner_microstep: 1540.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-11 02:44:49,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.30 | bwd_microstep: 1558.58 | bwd_inner_microstep: 1558.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3435
[2024-06-11 02:44:51,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.97 | bwd_microstep: 1188.82 | bwd_inner_microstep: 1188.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819
[2024-06-11 02:44:53,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.52 | bwd_microstep: 1522.30 | bwd_inner_microstep: 1522.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-11 02:44:55,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.98 | bwd_microstep: 1443.37 | bwd_inner_microstep: 1443.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3445
[2024-06-11 02:44:57,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.48 | bwd_microstep: 1546.91 | bwd_inner_microstep: 1546.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3732
[2024-06-11 02:44:59,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.00 | bwd_microstep: 1300.38 | bwd_inner_microstep: 1300.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819
[2024-06-11 02:45:01,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1487.49 | bwd_inner_microstep: 1487.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 02:45:03,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.31 | optimizer_step: 6.60
[2024-06-11 02:45:03,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1839.99 | bwd_inner_microstep: 1451.56 | bwd_allreduce_microstep: 388.38 | step_microstep: 37.70
[2024-06-11 02:45:03,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16398.70 | bwd: 44153.17 | bwd_inner: 43763.90 | bwd_allreduce: 388.61 | step: 39.19


 87%|████████▋ | 1494/1726 [26:02:34<4:54:04, 76.06s/it]
 87%|████████▋ | 1495/1726 [26:03:37<4:37:04, 71.97s/it]
 87%|████████▋ | 1496/1726 [26:04:37<4:22:40, 68.52s/it]
 87%|████████▋ | 1497/1726 [26:05:38<4:13:00, 66.29s/it]
 87%|████████▋ | 1498/1726 [26:06:39<4:06:06, 64.77s/it]
 87%|████████▋ | 1499/1726 [26:07:40<4:00:37, 63.60s/it]
{'loss': 1.1638, 'learning_rate': 1.7875674248381237e-06, 'epoch': 0.87}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-11 02:45:06,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1487.95 | bwd_inner_microstep: 1487.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3915
[2024-06-11 02:45:08,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.01 | bwd_microstep: 1689.92 | bwd_inner_microstep: 1689.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-11 02:45:10,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.57 | bwd_microstep: 1286.74 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2310
[2024-06-11 02:45:11,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.13 | bwd_microstep: 818.26 | bwd_inner_microstep: 818.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 02:45:13,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.55 | bwd_microstep: 1281.26 | bwd_inner_microstep: 1281.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3902
[2024-06-11 02:45:15,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1519.11 | bwd_inner_microstep: 1519.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 02:45:17,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.19 | bwd_microstep: 1384.80 | bwd_inner_microstep: 1384.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 02:45:18,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.97 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1885
[2024-06-11 02:45:19,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.62 | bwd_microstep: 712.90 | bwd_inner_microstep: 712.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-11 02:45:21,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.21 | bwd_microstep: 1389.27 | bwd_inner_microstep: 1389.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2111
[2024-06-11 02:45:22,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.00 | bwd_microstep: 762.22 | bwd_inner_microstep: 762.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1953
[2024-06-11 02:45:24,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.46 | bwd_microstep: 857.26 | bwd_inner_microstep: 857.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2978
[2024-06-11 02:45:25,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.05 | bwd_microstep: 1102.77 | bwd_inner_microstep: 1102.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 02:45:27,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.52 | bwd_microstep: 1280.42 | bwd_inner_microstep: 1280.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-11 02:45:29,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.59 | bwd_microstep: 1293.08 | bwd_inner_microstep: 1293.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-11 02:45:31,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1483.77 | bwd_inner_microstep: 1483.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3498
[2024-06-11 02:45:33,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.87 | bwd_microstep: 1548.77 | bwd_inner_microstep: 1548.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665
[2024-06-11 02:45:35,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.49 | bwd_microstep: 1426.98 | bwd_inner_microstep: 1426.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-11 02:45:36,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.92 | bwd_microstep: 795.19 | bwd_inner_microstep: 795.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3507
[2024-06-11 02:45:38,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.66 | bwd_microstep: 1189.84 | bwd_inner_microstep: 1189.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-11 02:45:40,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.20 | bwd_microstep: 1653.53 | bwd_inner_microstep: 1653.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-11 02:45:42,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.06 | bwd_microstep: 1524.68 | bwd_inner_microstep: 1524.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 02:45:44,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1286.59 | bwd_inner_microstep: 1286.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-11 02:45:46,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.81 | bwd_microstep: 1658.82 | bwd_inner_microstep: 1658.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723
[2024-06-11 02:45:48,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.73 | bwd_microstep: 1438.10 | bwd_inner_microstep: 1438.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2018
[2024-06-11 02:45:49,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.50 | bwd_microstep: 839.59 | bwd_inner_microstep: 839.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-11 02:45:51,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.00 | bwd_microstep: 1659.10 | bwd_inner_microstep: 1659.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-11 02:45:53,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.36 | bwd_microstep: 1439.43 | bwd_inner_microstep: 1439.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2060
[2024-06-11 02:45:55,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.61 | bwd_microstep: 846.06 | bwd_inner_microstep: 846.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 02:45:57,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1410.53 | bwd_inner_microstep: 1410.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 02:45:58,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1378.52 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3585
[2024-06-11 02:46:05,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59
[2024-06-11 02:46:05,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.74 | bwd_microstep: 5873.01 | bwd_inner_microstep: 1928.12 | bwd_allreduce_microstep: 3944.82 | step_microstep: 38.74
[2024-06-11 02:46:05,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15536.30 | bwd: 45706.84 | bwd_inner: 41761.09 | bwd_allreduce: 3945.06 | step: 40.15
{'loss': 1.188, 'learning_rate': 1.7720889281457121e-06, 'epoch': 0.87}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3480
[2024-06-11 02:46:07,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.89 | bwd_microstep: 1568.00 | bwd_inner_microstep: 1567.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4362
[2024-06-11 02:46:09,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.09 | bwd_microstep: 1507.60 | bwd_inner_microstep: 1507.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2349
[2024-06-11 02:46:11,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.26 | bwd_microstep: 919.91 | bwd_inner_microstep: 919.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-11 02:46:13,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.16 | bwd_microstep: 1494.38 | bwd_inner_microstep: 1494.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776
[2024-06-11 02:46:15,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.27 | bwd_microstep: 1541.85 | bwd_inner_microstep: 1541.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 02:46:16,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1246.77 | bwd_inner_microstep: 1246.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 02:46:18,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1380.95 | bwd_inner_microstep: 1380.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1916
[2024-06-11 02:46:19,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.87 | bwd_microstep: 717.60 | bwd_inner_microstep: 717.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502
[2024-06-11 02:46:21,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.57 | bwd_microstep: 1187.96 | bwd_inner_microstep: 1187.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907
[2024-06-11 02:46:22,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.69 | bwd_microstep: 777.42 | bwd_inner_microstep: 777.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-11 02:46:24,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.33 | bwd_microstep: 1346.89 | bwd_inner_microstep: 1346.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2950
[2024-06-11 02:46:26,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.66 | bwd_microstep: 1195.46 | bwd_inner_microstep: 1195.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710
[2024-06-11 02:46:28,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.48 | bwd_microstep: 1520.01 | bwd_inner_microstep: 1519.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-11 02:46:30,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.90 | bwd_microstep: 1511.86 | bwd_inner_microstep: 1511.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-11 02:46:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.77 | bwd_microstep: 1295.22 | bwd_inner_microstep: 1295.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501
[2024-06-11 02:46:34,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.58 | bwd_microstep: 1576.27 | bwd_inner_microstep: 1576.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2125
[2024-06-11 02:46:35,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.73 | bwd_microstep: 828.96 | bwd_inner_microstep: 828.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-11 02:46:37,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.80 | bwd_microstep: 1478.89 | bwd_inner_microstep: 1478.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3677
[2024-06-11 02:46:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.19 | bwd_microstep: 1453.86 | bwd_inner_microstep: 1453.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-11 02:46:41,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.75 | bwd_microstep: 1185.92 | bwd_inner_microstep: 1185.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2212
[2024-06-11 02:46:42,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.35 | bwd_microstep: 799.55 | bwd_inner_microstep: 799.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644
[2024-06-11 02:46:44,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.27 | bwd_microstep: 1616.60 | bwd_inner_microstep: 1616.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715
[2024-06-11 02:46:46,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1432.80 | bwd_inner_microstep: 1432.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2022
[2024-06-11 02:46:47,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.32 | bwd_microstep: 745.01 | bwd_inner_microstep: 744.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 02:46:49,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.99 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064
[2024-06-11 02:46:50,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.98 | bwd_microstep: 916.45 | bwd_inner_microstep: 916.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553
[2024-06-11 02:46:52,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.50 | bwd_microstep: 1330.21 | bwd_inner_microstep: 1330.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-11 02:46:54,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1553.57 | bwd_inner_microstep: 1553.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-11 02:46:56,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.40 | bwd_microstep: 1359.43 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3487
[2024-06-11 02:46:58,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.52 | bwd_microstep: 1346.65 | bwd_inner_microstep: 1346.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3597
[2024-06-11 02:47:00,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.68 | bwd_microstep: 1706.26 | bwd_inner_microstep: 1706.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229
[2024-06-11 02:47:07,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.10 | optimizer_step: 6.59
[2024-06-11 02:47:07,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.59 | bwd_microstep: 6307.21 | bwd_inner_microstep: 1088.77 | bwd_allreduce_microstep: 5218.38 | step_microstep: 37.94
[2024-06-11 02:47:07,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15323.19 | bwd: 46229.61 | bwd_inner: 41010.32 | bwd_allreduce: 5218.61 | step: 39.46
{'loss': 1.1947, 'learning_rate': 1.7566746300613325e-06, 'epoch': 0.87}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474
[2024-06-11 02:47:09,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.53 | bwd_microstep: 1568.62 | bwd_inner_microstep: 1568.50 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 02:47:11,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.10 | bwd_microstep: 1242.79 | bwd_inner_microstep: 1242.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2336
[2024-06-11 02:47:12,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.19 | bwd_microstep: 982.17 | bwd_inner_microstep: 982.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463
[2024-06-11 02:47:14,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1237.60 | bwd_inner_microstep: 1237.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2175
[2024-06-11 02:47:15,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.22 | bwd_microstep: 852.50 | bwd_inner_microstep: 852.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-11 02:47:17,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.94 | bwd_microstep: 1248.02 | bwd_inner_microstep: 1247.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 02:47:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1380.62 | bwd_inner_microstep: 1380.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-11 02:47:20,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.14 | bwd_microstep: 1283.85 | bwd_inner_microstep: 1283.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977
[2024-06-11 02:47:22,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.76 | bwd_microstep: 827.43 | bwd_inner_microstep: 827.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-11 02:47:23,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.90 | bwd_microstep: 787.63 | bwd_inner_microstep: 787.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3717
[2024-06-11 02:47:25,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1475.51 | bwd_inner_microstep: 1475.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-11 02:47:27,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1301.05 | bwd_inner_microstep: 1301.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2404
[2024-06-11 02:47:28,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.76 | bwd_microstep: 1036.37 | bwd_inner_microstep: 1036.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3427
[2024-06-11 02:47:30,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.02 | bwd_microstep: 1369.69 | bwd_inner_microstep: 1369.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636
[2024-06-11 02:47:32,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.63 | bwd_microstep: 1603.85 | bwd_inner_microstep: 1603.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670
[2024-06-11 02:47:34,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.50 | bwd_microstep: 1625.04 | bwd_inner_microstep: 1625.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 02:47:36,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.49 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3449
[2024-06-11 02:47:38,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1300.80 | bwd_inner_microstep: 1300.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2419
[2024-06-11 02:47:39,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.95 | bwd_microstep: 939.14 | bwd_inner_microstep: 939.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830
[2024-06-11 02:47:41,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.90 | bwd_microstep: 1460.97 | bwd_inner_microstep: 1460.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3900
[2024-06-11 02:47:43,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1395.96 | bwd_inner_microstep: 1395.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3684
[2024-06-11 02:47:45,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1234.20 | bwd_inner_microstep: 1234.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 02:47:47,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1355.16 | bwd_inner_microstep: 1355.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 02:47:49,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1393.50 | bwd_inner_microstep: 1393.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-11 02:47:50,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.63 | bwd_microstep: 711.08 | bwd_inner_microstep: 711.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 02:47:52,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1396.20 | bwd_inner_microstep: 1396.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814
[2024-06-11 02:47:54,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.97 | bwd_microstep: 1581.79 | bwd_inner_microstep: 1581.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-11 02:47:56,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1348.44 | bwd_inner_microstep: 1348.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-11 02:47:57,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.64 | bwd_microstep: 1304.44 | bwd_inner_microstep: 1304.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071
[2024-06-11 02:47:59,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.43 | bwd_microstep: 913.35 | bwd_inner_microstep: 913.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3483
[2024-06-11 02:48:01,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.59 | bwd_microstep: 1343.90 | bwd_inner_microstep: 1343.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2261
[2024-06-11 02:48:10,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.21 | optimizer_step: 6.62
[2024-06-11 02:48:10,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.66 | bwd_microstep: 9228.76 | bwd_inner_microstep: 1102.37 | bwd_allreduce_microstep: 8126.32 | step_microstep: 38.77
[2024-06-11 02:48:10,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14937.80 | bwd: 48070.22 | bwd_inner: 39942.87 | bwd_allreduce: 8126.62 | step: 40.36
{'loss': 1.1484, 'learning_rate': 1.7413245848740734e-06, 'epoch': 0.87}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3499
[2024-06-11 02:48:12,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.84 | bwd_microstep: 1498.34 | bwd_inner_microstep: 1498.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-11 02:48:14,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1393.02 | bwd_inner_microstep: 1392.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-11 02:48:16,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.86 | bwd_microstep: 1143.37 | bwd_inner_microstep: 1143.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3881
[2024-06-11 02:48:18,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.34 | bwd_microstep: 1443.86 | bwd_inner_microstep: 1443.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937
[2024-06-11 02:48:19,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.69 | bwd_microstep: 759.28 | bwd_inner_microstep: 759.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3485
[2024-06-11 02:48:21,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.96 | bwd_microstep: 1216.67 | bwd_inner_microstep: 1216.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-11 02:48:22,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.62 | bwd_microstep: 1311.94 | bwd_inner_microstep: 1311.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420
[2024-06-11 02:48:24,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.79 | bwd_microstep: 1152.26 | bwd_inner_microstep: 1152.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713
[2024-06-11 02:48:26,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.13 | bwd_microstep: 1528.07 | bwd_inner_microstep: 1528.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-11 02:48:28,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1634.35 | bwd_inner_microstep: 1634.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 02:48:30,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1479.86 | bwd_inner_microstep: 1479.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-11 02:48:32,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.94 | bwd_microstep: 1482.64 | bwd_inner_microstep: 1482.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3717
[2024-06-11 02:48:35,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1560.33 | bwd_inner_microstep: 1560.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515
[2024-06-11 02:48:37,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1410.83 | bwd_inner_microstep: 1410.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1946
[2024-06-11 02:48:38,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.29 | bwd_microstep: 730.40 | bwd_inner_microstep: 730.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-11 02:48:39,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.08 | bwd_microstep: 1398.57 | bwd_inner_microstep: 1398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3951
[2024-06-11 02:48:42,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.96 | bwd_microstep: 1670.58 | bwd_inner_microstep: 1670.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3640
[2024-06-11 02:48:44,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.47 | bwd_microstep: 1579.79 | bwd_inner_microstep: 1579.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-11 02:48:46,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.53 | bwd_microstep: 1396.16 | bwd_inner_microstep: 1396.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-11 02:48:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1557.13 | bwd_inner_microstep: 1557.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668
[2024-06-11 02:48:50,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.46 | bwd_microstep: 1421.33 | bwd_inner_microstep: 1421.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-11 02:48:52,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1410.99 | bwd_inner_microstep: 1410.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756
[2024-06-11 02:48:54,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.43 | bwd_microstep: 1546.94 | bwd_inner_microstep: 1546.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-11 02:48:56,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.30 | bwd_microstep: 1533.35 | bwd_inner_microstep: 1533.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-11 02:48:58,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.52 | bwd_microstep: 1252.47 | bwd_inner_microstep: 1252.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-11 02:49:00,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.75 | bwd_microstep: 1555.87 | bwd_inner_microstep: 1555.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 02:49:02,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1375.82 | bwd_inner_microstep: 1375.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560
[2024-06-11 02:49:04,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1427.13 | bwd_inner_microstep: 1427.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-11 02:49:06,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1479.94 | bwd_inner_microstep: 1479.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3820
[2024-06-11 02:49:09,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.15 | bwd_microstep: 1856.48 | bwd_inner_microstep: 1856.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3807
[2024-06-11 02:49:11,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.70 | bwd_microstep: 1820.45 | bwd_inner_microstep: 1820.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3849
[2024-06-11 02:49:13,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.05 | optimizer_step: 6.61
[2024-06-11 02:49:13,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1536.21 | bwd_inner_microstep: 1528.46 | bwd_allreduce_microstep: 7.70 | step_microstep: 37.51
[2024-06-11 02:49:13,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16985.83 | bwd: 45564.46 | bwd_inner: 45555.83 | bwd_allreduce: 7.92 | step: 39.07
{'loss': 1.1815, 'learning_rate': 1.726038846646707e-06, 'epoch': 0.87}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-11 02:49:15,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1337.71 | bwd_inner_microstep: 1337.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3602
[2024-06-11 02:49:17,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1242.11 | bwd_inner_microstep: 1242.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 02:49:18,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.19 | bwd_microstep: 1246.58 | bwd_inner_microstep: 1246.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-11 02:49:20,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1345.08 | bwd_inner_microstep: 1345.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-11 02:49:22,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.97 | bwd_microstep: 1445.74 | bwd_inner_microstep: 1445.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3779
[2024-06-11 02:49:24,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1394.95 | bwd_inner_microstep: 1394.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189
[2024-06-11 02:49:25,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.19 | bwd_microstep: 856.62 | bwd_inner_microstep: 856.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 02:49:27,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.40 | bwd_microstep: 1283.54 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-11 02:49:28,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.96 | bwd_microstep: 796.46 | bwd_inner_microstep: 796.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052
[2024-06-11 02:49:29,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 816.14 | bwd_inner_microstep: 816.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-11 02:49:32,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1497.78 | bwd_inner_microstep: 1497.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3402
[2024-06-11 02:49:33,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.20 | bwd_microstep: 1366.48 | bwd_inner_microstep: 1366.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3498
[2024-06-11 02:49:35,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.27 | bwd_microstep: 1429.35 | bwd_inner_microstep: 1429.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099
[2024-06-11 02:49:37,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.77 | bwd_microstep: 920.88 | bwd_inner_microstep: 920.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-11 02:49:39,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1348.92 | bwd_inner_microstep: 1348.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 02:49:40,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.94 | bwd_microstep: 1280.01 | bwd_inner_microstep: 1279.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 02:49:42,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.41 | bwd_microstep: 1487.39 | bwd_inner_microstep: 1487.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-11 02:49:44,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1556.91 | bwd_inner_microstep: 1556.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 02:49:46,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1287.31 | bwd_inner_microstep: 1287.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-11 02:49:48,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.79 | bwd_microstep: 1412.94 | bwd_inner_microstep: 1412.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-11 02:49:50,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.40 | bwd_microstep: 1396.74 | bwd_inner_microstep: 1396.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3517
[2024-06-11 02:49:52,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.16 | bwd_microstep: 1192.61 | bwd_inner_microstep: 1192.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699
[2024-06-11 02:49:54,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1333.28 | bwd_inner_microstep: 1333.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-11 02:49:56,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.56 | bwd_microstep: 1656.94 | bwd_inner_microstep: 1656.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3521
[2024-06-11 02:49:58,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1323.83 | bwd_inner_microstep: 1323.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-11 02:50:00,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.56 | bwd_microstep: 1432.72 | bwd_inner_microstep: 1432.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276
[2024-06-11 02:50:01,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.75 | bwd_microstep: 1071.71 | bwd_inner_microstep: 1071.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3595
[2024-06-11 02:50:03,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.50 | bwd_microstep: 1554.49 | bwd_inner_microstep: 1554.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-11 02:50:05,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1508.88 | bwd_inner_microstep: 1508.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2221
[2024-06-11 02:50:07,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.55 | bwd_microstep: 862.02 | bwd_inner_microstep: 862.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-11 02:50:08,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.38 | bwd_microstep: 1253.28 | bwd_inner_microstep: 1253.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273
[2024-06-11 02:50:13,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-11 02:50:13,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.05 | bwd_microstep: 4120.99 | bwd_inner_microstep: 1104.01 | bwd_allreduce_microstep: 3016.93 | step_microstep: 38.16
[2024-06-11 02:50:13,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15355.61 | bwd: 44060.41 | bwd_inner: 41042.56 | bwd_allreduce: 3017.16 | step: 39.73


 87%|████████▋ | 1499/1726 [26:07:40<4:00:37, 63.60s/it]
 87%|████████▋ | 1500/1726 [26:08:42<3:57:16, 62.99s/it]
 87%|████████▋ | 1501/1726 [26:09:44<3:54:58, 62.66s/it]
 87%|████████▋ | 1502/1726 [26:10:47<3:54:41, 62.87s/it]
 87%|████████▋ | 1503/1726 [26:11:50<3:53:40, 62.87s/it]
 87%|████████▋ | 1504/1726 [26:12:50<3:49:10, 61.94s/it]
{'loss': 1.1971, 'learning_rate': 1.7108174692155266e-06, 'epoch': 0.87}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3379
[2024-06-11 02:50:14,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.89 | bwd_microstep: 1141.12 | bwd_inner_microstep: 1141.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 02:50:16,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.71 | bwd_microstep: 1241.55 | bwd_inner_microstep: 1241.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3866
[2024-06-11 02:50:18,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.26 | bwd_microstep: 1659.41 | bwd_inner_microstep: 1659.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-11 02:50:21,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1556.42 | bwd_inner_microstep: 1556.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789
[2024-06-11 02:50:23,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.05 | bwd_microstep: 1454.36 | bwd_inner_microstep: 1454.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-11 02:50:24,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.20 | bwd_microstep: 1251.14 | bwd_inner_microstep: 1251.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 02:50:26,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.31 | bwd_microstep: 1387.39 | bwd_inner_microstep: 1387.25 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-11 02:50:28,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.02 | bwd_microstep: 1344.82 | bwd_inner_microstep: 1344.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-11 02:50:29,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.81 | bwd_microstep: 800.37 | bwd_inner_microstep: 800.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-11 02:50:31,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-11 02:50:33,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.45 | bwd_microstep: 1253.12 | bwd_inner_microstep: 1253.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2001
[2024-06-11 02:50:34,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.03 | bwd_microstep: 831.29 | bwd_inner_microstep: 831.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2092
[2024-06-11 02:50:35,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.62 | bwd_microstep: 1015.53 | bwd_inner_microstep: 1015.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649
[2024-06-11 02:50:38,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.98 | bwd_microstep: 1517.15 | bwd_inner_microstep: 1517.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514
[2024-06-11 02:50:40,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1581.11 | bwd_inner_microstep: 1581.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2008
[2024-06-11 02:50:41,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.91 | bwd_microstep: 738.62 | bwd_inner_microstep: 738.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-11 02:50:43,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 1392.96 | bwd_inner_microstep: 1392.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3539
[2024-06-11 02:50:45,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1426.26 | bwd_inner_microstep: 1426.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898
[2024-06-11 02:50:46,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.77 | bwd_microstep: 684.09 | bwd_inner_microstep: 684.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 02:50:48,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1460.22 | bwd_inner_microstep: 1460.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 02:50:50,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1396.24 | bwd_inner_microstep: 1396.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-11 02:50:51,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.50 | bwd_microstep: 1152.53 | bwd_inner_microstep: 1152.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 02:50:53,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.66 | bwd_microstep: 1358.21 | bwd_inner_microstep: 1358.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-11 02:50:55,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.95 | bwd_microstep: 1550.17 | bwd_inner_microstep: 1550.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-11 02:50:57,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1558.00 | bwd_inner_microstep: 1557.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3568
[2024-06-11 02:50:59,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1360.63 | bwd_inner_microstep: 1360.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2233
[2024-06-11 02:51:01,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.59 | bwd_microstep: 963.23 | bwd_inner_microstep: 963.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2474
[2024-06-11 02:51:02,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.46 | bwd_microstep: 1124.36 | bwd_inner_microstep: 1124.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-11 02:51:04,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.89 | bwd_microstep: 1750.03 | bwd_inner_microstep: 1750.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-11 02:51:06,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.17 | bwd_microstep: 976.79 | bwd_inner_microstep: 976.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815
[2024-06-11 02:51:08,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1355.82 | bwd_inner_microstep: 1355.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276
[2024-06-11 02:51:14,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.56
[2024-06-11 02:51:14,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.41 | bwd_microstep: 6089.66 | bwd_inner_microstep: 994.50 | bwd_allreduce_microstep: 5095.09 | step_microstep: 39.05
[2024-06-11 02:51:14,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15185.17 | bwd: 45764.10 | bwd_inner: 40667.99 | bwd_allreduce: 5095.37 | step: 40.69
{'loss': 1.182, 'learning_rate': 1.6956605061901377e-06, 'epoch': 0.87}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 02:51:16,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.10 | bwd_microstep: 1466.87 | bwd_inner_microstep: 1466.72 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3946
[2024-06-11 02:51:18,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.88 | bwd_microstep: 1594.22 | bwd_inner_microstep: 1594.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 02:51:20,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.73 | bwd_microstep: 1341.10 | bwd_inner_microstep: 1341.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-11 02:51:23,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.66 | bwd_microstep: 1650.02 | bwd_inner_microstep: 1649.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 02:51:24,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1239.70 | bwd_inner_microstep: 1239.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2205
[2024-06-11 02:51:25,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.66 | bwd_microstep: 859.13 | bwd_inner_microstep: 859.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-11 02:51:26,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.80 | bwd_microstep: 676.34 | bwd_inner_microstep: 676.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584
[2024-06-11 02:51:28,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.59 | bwd_microstep: 1299.92 | bwd_inner_microstep: 1299.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 02:51:30,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1247.27 | bwd_inner_microstep: 1247.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-11 02:51:32,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1251.07 | bwd_inner_microstep: 1251.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954
[2024-06-11 02:51:33,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.51 | bwd_microstep: 732.18 | bwd_inner_microstep: 732.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101
[2024-06-11 02:51:34,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.71 | bwd_microstep: 822.01 | bwd_inner_microstep: 821.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 02:51:36,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1376.83 | bwd_inner_microstep: 1376.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-11 02:51:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1242.73 | bwd_inner_microstep: 1242.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3641
[2024-06-11 02:51:40,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.62 | bwd_microstep: 1542.43 | bwd_inner_microstep: 1542.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-11 02:51:41,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.50 | bwd_microstep: 686.33 | bwd_inner_microstep: 686.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3672
[2024-06-11 02:51:42,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1232.05 | bwd_inner_microstep: 1232.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-11 02:51:44,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1394.93 | bwd_inner_microstep: 1394.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-11 02:51:46,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.89 | bwd_microstep: 1611.73 | bwd_inner_microstep: 1611.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-11 02:51:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.44 | bwd_microstep: 1556.54 | bwd_inner_microstep: 1556.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-11 02:51:50,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1392.81 | bwd_inner_microstep: 1392.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2034
[2024-06-11 02:51:51,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.09 | bwd_microstep: 715.35 | bwd_inner_microstep: 715.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3597
[2024-06-11 02:51:54,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.22 | bwd_microstep: 1643.65 | bwd_inner_microstep: 1643.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443
[2024-06-11 02:51:56,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1355.64 | bwd_inner_microstep: 1355.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 02:51:58,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1555.27 | bwd_inner_microstep: 1555.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3751
[2024-06-11 02:52:00,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1378.56 | bwd_inner_microstep: 1378.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806
[2024-06-11 02:52:02,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1483.34 | bwd_inner_microstep: 1483.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721
[2024-06-11 02:52:04,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1334.29 | bwd_inner_microstep: 1334.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-11 02:52:06,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.00 | bwd_microstep: 1636.45 | bwd_inner_microstep: 1636.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2274
[2024-06-11 02:52:07,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.42 | bwd_microstep: 814.59 | bwd_inner_microstep: 814.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3803
[2024-06-11 02:52:09,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.22 | bwd_microstep: 1668.75 | bwd_inner_microstep: 1668.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3568
[2024-06-11 02:52:15,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 02:52:15,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.46 | bwd_microstep: 4827.08 | bwd_inner_microstep: 2015.20 | bwd_allreduce_microstep: 2811.82 | step_microstep: 37.64
[2024-06-11 02:52:15,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15493.48 | bwd: 44629.20 | bwd_inner: 41816.37 | bwd_allreduce: 2812.11 | step: 39.21
{'loss': 1.1976, 'learning_rate': 1.6805680109532962e-06, 'epoch': 0.87}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 02:52:17,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1365.78 | bwd_inner_microstep: 1365.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-11 02:52:18,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.61 | bwd_microstep: 1150.48 | bwd_inner_microstep: 1150.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 02:52:20,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.93 | bwd_microstep: 1381.96 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2643
[2024-06-11 02:52:22,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.54 | bwd_microstep: 1150.79 | bwd_inner_microstep: 1150.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-11 02:52:23,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.63 | bwd_microstep: 790.78 | bwd_inner_microstep: 790.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788
[2024-06-11 02:52:25,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.40 | bwd_microstep: 1451.41 | bwd_inner_microstep: 1451.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-11 02:52:26,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.97 | bwd_microstep: 797.40 | bwd_inner_microstep: 797.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-11 02:52:28,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.03 | bwd_microstep: 1249.80 | bwd_inner_microstep: 1249.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 02:52:29,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.67 | bwd_microstep: 1389.28 | bwd_inner_microstep: 1389.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-11 02:52:31,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.48 | bwd_microstep: 1158.76 | bwd_inner_microstep: 1158.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507
[2024-06-11 02:52:33,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.38 | bwd_microstep: 1317.49 | bwd_inner_microstep: 1317.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-11 02:52:35,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.50 | bwd_microstep: 1373.64 | bwd_inner_microstep: 1373.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410
[2024-06-11 02:52:37,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1438.75 | bwd_inner_microstep: 1438.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655
[2024-06-11 02:52:39,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.17 | bwd_microstep: 1612.90 | bwd_inner_microstep: 1612.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2744
[2024-06-11 02:52:41,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.67 | bwd_microstep: 1076.13 | bwd_inner_microstep: 1076.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3513
[2024-06-11 02:52:43,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.88 | bwd_microstep: 1448.51 | bwd_inner_microstep: 1448.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3515
[2024-06-11 02:52:44,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.42 | bwd_microstep: 1252.00 | bwd_inner_microstep: 1251.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3518
[2024-06-11 02:52:46,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.27 | bwd_microstep: 1440.06 | bwd_inner_microstep: 1440.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3855
[2024-06-11 02:52:48,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.47 | bwd_microstep: 1563.46 | bwd_inner_microstep: 1563.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3552
[2024-06-11 02:52:50,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.31 | bwd_microstep: 1233.91 | bwd_inner_microstep: 1233.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-11 02:52:52,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.93 | bwd_microstep: 1390.13 | bwd_inner_microstep: 1390.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-11 02:52:54,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.91 | bwd_microstep: 1319.38 | bwd_inner_microstep: 1319.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-11 02:52:56,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.89 | bwd_microstep: 1461.30 | bwd_inner_microstep: 1461.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-11 02:52:57,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.01 | bwd_microstep: 702.96 | bwd_inner_microstep: 702.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3444
[2024-06-11 02:52:59,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1299.92 | bwd_inner_microstep: 1299.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-11 02:53:01,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1496.55 | bwd_inner_microstep: 1496.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2070
[2024-06-11 02:53:02,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.51 | bwd_microstep: 850.24 | bwd_inner_microstep: 850.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-11 02:53:04,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.02 | bwd_microstep: 1550.76 | bwd_inner_microstep: 1550.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734
[2024-06-11 02:53:06,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1431.15 | bwd_inner_microstep: 1431.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689
[2024-06-11 02:53:08,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1521.89 | bwd_inner_microstep: 1521.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602
[2024-06-11 02:53:10,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1508.32 | bwd_inner_microstep: 1508.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-11 02:53:17,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.10 | optimizer_step: 6.61
[2024-06-11 02:53:17,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.09 | bwd_microstep: 6562.55 | bwd_inner_microstep: 1874.84 | bwd_allreduce_microstep: 4687.65 | step_microstep: 37.97
[2024-06-11 02:53:17,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15682.12 | bwd: 46738.45 | bwd_inner: 42049.89 | bwd_allreduce: 4687.88 | step: 39.54
{'loss': 1.1316, 'learning_rate': 1.6655400366606867e-06, 'epoch': 0.87}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3470
[2024-06-11 02:53:20,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.82 | bwd_microstep: 1562.46 | bwd_inner_microstep: 1562.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3903
[2024-06-11 02:53:22,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.34 | bwd_microstep: 1680.78 | bwd_inner_microstep: 1680.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849
[2024-06-11 02:53:24,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.23 | bwd_microstep: 1658.94 | bwd_inner_microstep: 1658.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 02:53:26,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 02:53:28,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1246.49 | bwd_inner_microstep: 1246.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-11 02:53:30,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.01 | bwd_microstep: 1481.86 | bwd_inner_microstep: 1481.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 02:53:32,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.57 | bwd_microstep: 1283.43 | bwd_inner_microstep: 1283.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 02:53:33,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1282.74 | bwd_inner_microstep: 1282.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 02:53:35,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1246.65 | bwd_inner_microstep: 1246.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649
[2024-06-11 02:53:37,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 1423.03 | bwd_inner_microstep: 1423.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-11 02:53:39,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.46 | bwd_microstep: 1285.65 | bwd_inner_microstep: 1285.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-11 02:53:41,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.27 | bwd_microstep: 1321.02 | bwd_inner_microstep: 1320.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-11 02:53:43,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.20 | bwd_microstep: 1486.27 | bwd_inner_microstep: 1486.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-11 02:53:45,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1348.02 | bwd_inner_microstep: 1347.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3703
[2024-06-11 02:53:47,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.69 | bwd_microstep: 1654.71 | bwd_inner_microstep: 1654.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 02:53:49,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-11 02:53:51,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1512.34 | bwd_inner_microstep: 1512.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 02:53:52,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1252.43 | bwd_inner_microstep: 1252.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 02:53:54,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1349.73 | bwd_inner_microstep: 1349.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925
[2024-06-11 02:53:55,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.87 | bwd_microstep: 726.75 | bwd_inner_microstep: 726.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-11 02:53:57,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.12 | bwd_microstep: 1524.58 | bwd_inner_microstep: 1524.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-11 02:53:59,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.14 | bwd_microstep: 1295.90 | bwd_inner_microstep: 1295.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 02:54:01,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.68 | bwd_microstep: 1298.24 | bwd_inner_microstep: 1298.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479
[2024-06-11 02:54:03,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.61 | bwd_microstep: 1314.95 | bwd_inner_microstep: 1314.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3439
[2024-06-11 02:54:05,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.35 | bwd_microstep: 1299.02 | bwd_inner_microstep: 1298.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3563
[2024-06-11 02:54:06,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1264.63 | bwd_inner_microstep: 1264.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834
[2024-06-11 02:54:09,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.96 | bwd_microstep: 1753.53 | bwd_inner_microstep: 1753.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2227
[2024-06-11 02:54:10,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.69 | bwd_microstep: 863.08 | bwd_inner_microstep: 863.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3835
[2024-06-11 02:54:12,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.28 | bwd_microstep: 1690.44 | bwd_inner_microstep: 1690.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801
[2024-06-11 02:54:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.40 | bwd_microstep: 1750.64 | bwd_inner_microstep: 1750.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-11 02:54:17,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.25 | bwd_microstep: 1406.80 | bwd_inner_microstep: 1406.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2015
[2024-06-11 02:54:19,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.02 | optimizer_step: 6.63
[2024-06-11 02:54:19,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.45 | bwd_microstep: 1898.40 | bwd_inner_microstep: 991.03 | bwd_allreduce_microstep: 907.32 | step_microstep: 37.32
[2024-06-11 02:54:19,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16381.50 | bwd: 44826.70 | bwd_inner: 43918.49 | bwd_allreduce: 907.55 | step: 38.74
{'loss': 1.1546, 'learning_rate': 1.6505766362407571e-06, 'epoch': 0.87}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-11 02:54:21,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1470.43 | bwd_inner_microstep: 1470.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 02:54:23,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1380.69 | bwd_inner_microstep: 1380.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2384
[2024-06-11 02:54:24,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.56 | bwd_microstep: 997.74 | bwd_inner_microstep: 997.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3823
[2024-06-11 02:54:26,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.30 | bwd_microstep: 1483.56 | bwd_inner_microstep: 1483.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803
[2024-06-11 02:54:29,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.67 | bwd_microstep: 1650.68 | bwd_inner_microstep: 1650.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4179
[2024-06-11 02:54:31,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.49 | bwd_microstep: 1483.53 | bwd_inner_microstep: 1483.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-11 02:54:33,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.18 | bwd_microstep: 1638.27 | bwd_inner_microstep: 1638.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-11 02:54:35,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.62 | bwd_microstep: 1542.49 | bwd_inner_microstep: 1542.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2178
[2024-06-11 02:54:36,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.22 | bwd_microstep: 950.75 | bwd_inner_microstep: 950.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 02:54:38,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1252.56 | bwd_inner_microstep: 1252.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 02:54:40,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-11 02:54:42,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.76 | bwd_microstep: 1278.43 | bwd_inner_microstep: 1278.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3665
[2024-06-11 02:54:44,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1403.95 | bwd_inner_microstep: 1403.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-11 02:54:46,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.91 | bwd_microstep: 1342.86 | bwd_inner_microstep: 1342.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2930
[2024-06-11 02:54:47,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.23 | bwd_microstep: 1240.05 | bwd_inner_microstep: 1240.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-11 02:54:48,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.97 | bwd_microstep: 792.42 | bwd_inner_microstep: 792.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 02:54:50,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.31 | bwd_microstep: 1252.22 | bwd_inner_microstep: 1252.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2012
[2024-06-11 02:54:51,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.03 | bwd_microstep: 835.41 | bwd_inner_microstep: 835.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3461
[2024-06-11 02:54:53,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.51 | bwd_microstep: 1325.73 | bwd_inner_microstep: 1325.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3538
[2024-06-11 02:54:55,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1356.28 | bwd_inner_microstep: 1356.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 02:54:57,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.80 | bwd_microstep: 1660.20 | bwd_inner_microstep: 1660.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-11 02:54:59,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 1433.68 | bwd_inner_microstep: 1433.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692
[2024-06-11 02:55:01,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.40 | bwd_microstep: 1432.08 | bwd_inner_microstep: 1432.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3812
[2024-06-11 02:55:03,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.03 | bwd_microstep: 1402.53 | bwd_inner_microstep: 1402.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 02:55:05,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.80 | bwd_microstep: 1657.56 | bwd_inner_microstep: 1657.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-11 02:55:07,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1392.83 | bwd_inner_microstep: 1392.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3570
[2024-06-11 02:55:09,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.43 | bwd_microstep: 1528.28 | bwd_inner_microstep: 1528.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3694
[2024-06-11 02:55:11,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.85 | bwd_microstep: 1325.77 | bwd_inner_microstep: 1325.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-11 02:55:13,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1405.82 | bwd_inner_microstep: 1405.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3750
[2024-06-11 02:55:16,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.81 | bwd_microstep: 1840.94 | bwd_inner_microstep: 1840.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-11 02:55:18,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1493.56 | bwd_inner_microstep: 1493.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-11 02:55:21,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.01 | optimizer_step: 6.62
[2024-06-11 02:55:21,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.18 | bwd_microstep: 3050.97 | bwd_inner_microstep: 1665.67 | bwd_allreduce_microstep: 1385.25 | step_microstep: 37.52
[2024-06-11 02:55:21,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16498.64 | bwd: 45692.02 | bwd_inner: 44305.84 | bwd_allreduce: 1385.48 | step: 39.02


 87%|████████▋ | 1504/1726 [26:12:50<3:49:10, 61.94s/it]
 87%|████████▋ | 1505/1726 [26:13:51<3:47:24, 61.74s/it]
 87%|████████▋ | 1506/1726 [26:14:51<3:44:57, 61.35s/it]
 87%|████████▋ | 1507/1726 [26:15:54<3:45:28, 61.77s/it]
 87%|████████▋ | 1508/1726 [26:16:56<3:44:11, 61.70s/it]
 87%|████████▋ | 1509/1726 [26:17:58<3:44:03,
{'loss': 1.2004, 'learning_rate': 1.6356778623945223e-06, 'epoch': 0.87}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 02:55:23,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1372.21 | bwd_inner_microstep: 1372.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4205
[2024-06-11 02:55:26,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.77 | bwd_microstep: 1751.27 | bwd_inner_microstep: 1751.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-11 02:55:28,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.30 | bwd_microstep: 1646.40 | bwd_inner_microstep: 1646.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778
[2024-06-11 02:55:30,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.64 | bwd_microstep: 1639.99 | bwd_inner_microstep: 1639.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964
[2024-06-11 02:55:31,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.06 | bwd_microstep: 794.06 | bwd_inner_microstep: 794.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477
[2024-06-11 02:55:33,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.14 | bwd_microstep: 1214.33 | bwd_inner_microstep: 1214.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 02:55:35,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1384.84 | bwd_inner_microstep: 1384.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3486
[2024-06-11 02:55:37,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1332.33 | bwd_inner_microstep: 1332.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 02:55:39,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.01 | bwd_microstep: 1388.16 | bwd_inner_microstep: 1388.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-11 02:55:41,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.21 | bwd_microstep: 1310.82 | bwd_inner_microstep: 1310.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675
[2024-06-11 02:55:43,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.20 | bwd_microstep: 1615.18 | bwd_inner_microstep: 1615.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3500
[2024-06-11 02:55:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.26 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-11 02:55:47,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1485.86 | bwd_inner_microstep: 1485.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3431
[2024-06-11 02:55:49,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1397.49 | bwd_inner_microstep: 1397.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-11 02:55:51,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.13 | bwd_microstep: 1709.34 | bwd_inner_microstep: 1709.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3391
[2024-06-11 02:55:53,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.75 | bwd_microstep: 1337.50 | bwd_inner_microstep: 1337.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-11 02:55:55,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1513.87 | bwd_inner_microstep: 1513.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-11 02:55:57,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1510.72 | bwd_inner_microstep: 1510.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3630
[2024-06-11 02:55:59,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.98 | bwd_microstep: 1646.35 | bwd_inner_microstep: 1646.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287
[2024-06-11 02:56:01,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.15 | bwd_microstep: 880.72 | bwd_inner_microstep: 880.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3516
[2024-06-11 02:56:03,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.29 | bwd_microstep: 1581.42 | bwd_inner_microstep: 1581.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893
[2024-06-11 02:56:04,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.44 | bwd_microstep: 714.97 | bwd_inner_microstep: 714.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-11 02:56:06,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.74 | bwd_microstep: 1410.35 | bwd_inner_microstep: 1410.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3613
[2024-06-11 02:56:08,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.65 | bwd_microstep: 1569.47 | bwd_inner_microstep: 1569.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-11 02:56:10,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.27 | bwd_microstep: 1489.88 | bwd_inner_microstep: 1489.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 02:56:12,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1379.14 | bwd_inner_microstep: 1379.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-11 02:56:14,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1255.23 | bwd_inner_microstep: 1255.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2179
[2024-06-11 02:56:15,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.07 | bwd_microstep: 954.72 | bwd_inner_microstep: 954.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-11 02:56:17,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1389.43 | bwd_inner_microstep: 1389.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-11 02:56:19,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.03 | bwd_microstep: 1537.93 | bwd_inner_microstep: 1537.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 02:56:21,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1554.48 | bwd_inner_microstep: 1554.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3772
[2024-06-11 02:56:23,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.62
[2024-06-11 02:56:23,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.66 | bwd_microstep: 1365.39 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 87.08 | step_microstep: 37.57
[2024-06-11 02:56:23,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16581.09 | bwd: 44542.11 | bwd_inner: 44454.13 | bwd_allreduce: 87.31 | step: 39.06
{'loss': 1.2242, 'learning_rate': 1.620843767595388e-06, 'epoch': 0.87}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864
[2024-06-11 02:56:24,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.27 | bwd_microstep: 670.20 | bwd_inner_microstep: 670.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-11 02:56:25,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.27 | bwd_microstep: 788.23 | bwd_inner_microstep: 788.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862
[2024-06-11 02:56:27,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.85 | bwd_microstep: 1564.63 | bwd_inner_microstep: 1564.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1871
[2024-06-11 02:56:28,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.45 | bwd_microstep: 743.15 | bwd_inner_microstep: 743.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3564
[2024-06-11 02:56:30,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.67 | bwd_microstep: 1350.39 | bwd_inner_microstep: 1350.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 02:56:32,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.37 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-11 02:56:34,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.16 | bwd_microstep: 1486.26 | bwd_inner_microstep: 1486.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 02:56:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.65 | bwd_microstep: 1285.62 | bwd_inner_microstep: 1285.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-11 02:56:38,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1963
[2024-06-11 02:56:39,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.66 | bwd_microstep: 734.22 | bwd_inner_microstep: 734.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2008
[2024-06-11 02:56:40,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.60 | bwd_microstep: 741.89 | bwd_inner_microstep: 741.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3925
[2024-06-11 02:56:42,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.51 | bwd_microstep: 1430.26 | bwd_inner_microstep: 1430.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 02:56:43,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3626
[2024-06-11 02:56:46,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1538.87 | bwd_inner_microstep: 1538.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3671
[2024-06-11 02:56:47,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.33 | bwd_microstep: 1328.59 | bwd_inner_microstep: 1328.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3656
[2024-06-11 02:56:50,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.70 | bwd_microstep: 1656.84 | bwd_inner_microstep: 1656.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 02:56:51,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1255.46 | bwd_inner_microstep: 1255.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 02:56:53,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.36 | bwd_microstep: 1260.82 | bwd_inner_microstep: 1260.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2105
[2024-06-11 02:56:55,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.70 | bwd_microstep: 965.37 | bwd_inner_microstep: 965.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3526
[2024-06-11 02:56:57,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.87 | bwd_microstep: 1634.34 | bwd_inner_microstep: 1634.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1893
[2024-06-11 02:56:58,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.01 | bwd_microstep: 776.06 | bwd_inner_microstep: 776.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3819
[2024-06-11 02:57:00,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.33 | bwd_microstep: 1617.11 | bwd_inner_microstep: 1617.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-11 02:57:02,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1394.72 | bwd_inner_microstep: 1394.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1896
[2024-06-11 02:57:03,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 810.92 | bwd_inner_microstep: 810.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-11 02:57:05,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.44 | bwd_microstep: 1548.77 | bwd_inner_microstep: 1548.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446
[2024-06-11 02:57:07,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.67 | bwd_microstep: 1286.79 | bwd_inner_microstep: 1286.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-11 02:57:09,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.56 | bwd_microstep: 1617.26 | bwd_inner_microstep: 1617.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 02:57:12,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.12 | bwd_microstep: 1650.22 | bwd_inner_microstep: 1650.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 899
[2024-06-11 02:57:12,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.03 | bwd_microstep: 371.53 | bwd_inner_microstep: 371.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555
[2024-06-11 02:57:14,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.24 | bwd_microstep: 1203.52 | bwd_inner_microstep: 1203.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-11 02:57:16,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1399.31 | bwd_inner_microstep: 1399.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2007
[2024-06-11 02:57:25,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-11 02:57:25,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.35 | bwd_microstep: 8521.41 | bwd_inner_microstep: 801.20 | bwd_allreduce_microstep: 7720.15 | step_microstep: 38.20
[2024-06-11 02:57:25,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14583.87 | bwd: 46695.97 | bwd_inner: 38974.87 | bwd_allreduce: 7720.39 | step: 39.67
{'loss': 1.1542, 'learning_rate': 1.606074404088962e-06, 'epoch': 0.88}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3389
[2024-06-11 02:57:26,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.59 | bwd_microstep: 1295.19 | bwd_inner_microstep: 1295.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3943
[2024-06-11 02:57:28,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.08 | bwd_microstep: 1590.12 | bwd_inner_microstep: 1590.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-11 02:57:31,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.32 | bwd_microstep: 1555.09 | bwd_inner_microstep: 1555.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2884
[2024-06-11 02:57:32,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.60 | bwd_microstep: 1182.50 | bwd_inner_microstep: 1182.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784
[2024-06-11 02:57:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.28 | bwd_microstep: 1645.30 | bwd_inner_microstep: 1645.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 02:57:36,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-11 02:57:38,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1293.82 | bwd_inner_microstep: 1293.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 02:57:40,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.46 | bwd_microstep: 1384.96 | bwd_inner_microstep: 1384.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484
[2024-06-11 02:57:42,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.63 | bwd_microstep: 1311.24 | bwd_inner_microstep: 1311.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-11 02:57:44,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.96 | bwd_microstep: 1337.96 | bwd_inner_microstep: 1337.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3708
[2024-06-11 02:57:46,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.71 | bwd_microstep: 1718.89 | bwd_inner_microstep: 1718.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3420
[2024-06-11 02:57:48,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1357.56 | bwd_inner_microstep: 1357.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3119
[2024-06-11 02:57:50,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1247.54 | bwd_inner_microstep: 1247.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-11 02:57:52,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.99 | bwd_microstep: 1623.08 | bwd_inner_microstep: 1623.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-11 02:57:54,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.05 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-11 02:57:56,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1189.87 | bwd_inner_microstep: 1189.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473
[2024-06-11 02:57:57,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.44 | bwd_microstep: 1182.81 | bwd_inner_microstep: 1182.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565
[2024-06-11 02:57:59,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.21 | bwd_microstep: 1333.15 | bwd_inner_microstep: 1333.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 02:58:01,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1251.29 | bwd_inner_microstep: 1251.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-11 02:58:02,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.96 | bwd_microstep: 696.07 | bwd_inner_microstep: 696.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452
[2024-06-11 02:58:03,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.46 | bwd_microstep: 1188.93 | bwd_inner_microstep: 1188.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-11 02:58:05,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.35 | bwd_microstep: 1488.33 | bwd_inner_microstep: 1488.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-11 02:58:07,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.53 | bwd_microstep: 1408.74 | bwd_inner_microstep: 1408.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-11 02:58:10,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.33 | bwd_microstep: 1656.63 | bwd_inner_microstep: 1656.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-11 02:58:12,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.42 | bwd_microstep: 1559.20 | bwd_inner_microstep: 1559.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-11 02:58:13,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.08 | bwd_microstep: 1184.51 | bwd_inner_microstep: 1184.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-11 02:58:15,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1300.93 | bwd_inner_microstep: 1300.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-11 02:58:17,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.09 | bwd_microstep: 1304.24 | bwd_inner_microstep: 1304.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3567
[2024-06-11 02:58:19,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.37 | bwd_microstep: 1431.25 | bwd_inner_microstep: 1431.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3803
[2024-06-11 02:58:21,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.46 | bwd_microstep: 1480.95 | bwd_inner_microstep: 1480.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-11 02:58:23,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1350.36 | bwd_inner_microstep: 1350.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592
[2024-06-11 02:58:25,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.01 | optimizer_step: 6.61
[2024-06-11 02:58:25,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.77 | bwd_microstep: 1862.42 | bwd_inner_microstep: 1756.09 | bwd_allreduce_microstep: 106.29 | step_microstep: 37.34
[2024-06-11 02:58:25,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16490.11 | bwd: 44141.17 | bwd_inner: 44033.99 | bwd_allreduce: 106.51 | step: 38.81
{'loss': 1.1846, 'learning_rate': 1.5913698238928632e-06, 'epoch': 0.88}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 02:58:27,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.72 | bwd_microstep: 1375.71 | bwd_inner_microstep: 1375.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-11 02:58:29,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1405.46 | bwd_inner_microstep: 1405.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2432
[2024-06-11 02:58:31,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.44 | bwd_microstep: 1006.39 | bwd_inner_microstep: 1006.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3537
[2024-06-11 02:58:33,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.12 | bwd_microstep: 1456.41 | bwd_inner_microstep: 1456.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3402
[2024-06-11 02:58:34,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.05 | bwd_microstep: 1181.34 | bwd_inner_microstep: 1181.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957
[2024-06-11 02:58:35,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 701.59 | bwd_inner_microstep: 701.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714
[2024-06-11 02:58:37,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.86 | bwd_microstep: 1530.87 | bwd_inner_microstep: 1530.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 02:58:39,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.44 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 02:58:41,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1247.52 | bwd_inner_microstep: 1247.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490
[2024-06-11 02:58:43,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1347.77 | bwd_inner_microstep: 1347.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 02:58:45,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1389.57 | bwd_inner_microstep: 1389.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 02:58:47,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.37 | bwd_microstep: 1282.65 | bwd_inner_microstep: 1282.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-11 02:58:48,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.12 | bwd_microstep: 889.45 | bwd_inner_microstep: 889.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-11 02:58:50,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.45 | bwd_microstep: 1485.55 | bwd_inner_microstep: 1485.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3634
[2024-06-11 02:58:52,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.73 | bwd_microstep: 1463.10 | bwd_inner_microstep: 1463.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-11 02:58:54,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.26 | bwd_microstep: 1193.66 | bwd_inner_microstep: 1193.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 02:58:55,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1284.18 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988
[2024-06-11 02:58:57,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.66 | bwd_microstep: 833.93 | bwd_inner_microstep: 833.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-11 02:58:59,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1611.77 | bwd_inner_microstep: 1611.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-11 02:59:01,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1454.92 | bwd_inner_microstep: 1454.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-11 02:59:03,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.46 | bwd_microstep: 1355.28 | bwd_inner_microstep: 1355.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 02:59:05,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.45 | bwd_microstep: 1563.56 | bwd_inner_microstep: 1563.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3545
[2024-06-11 02:59:07,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.85 | bwd_microstep: 1358.40 | bwd_inner_microstep: 1358.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3807
[2024-06-11 02:59:09,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.56 | bwd_microstep: 1686.38 | bwd_inner_microstep: 1686.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 02:59:11,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.83 | bwd_microstep: 1383.11 | bwd_inner_microstep: 1383.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551
[2024-06-11 02:59:13,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1295.82 | bwd_inner_microstep: 1295.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2144
[2024-06-11 02:59:14,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.56 | bwd_microstep: 833.62 | bwd_inner_microstep: 833.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-11 02:59:16,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.67 | bwd_microstep: 1642.53 | bwd_inner_microstep: 1642.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-11 02:59:18,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.40 | bwd_microstep: 1603.89 | bwd_inner_microstep: 1603.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-11 02:59:21,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.67 | bwd_microstep: 1627.72 | bwd_inner_microstep: 1627.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-11 02:59:23,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.72 | bwd_microstep: 1417.09 | bwd_inner_microstep: 1417.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2275
[2024-06-11 02:59:30,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 02:59:30,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.43 | bwd_microstep: 7294.98 | bwd_inner_microstep: 1418.06 | bwd_allreduce_microstep: 5876.87 | step_microstep: 37.81
[2024-06-11 02:59:30,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15853.38 | bwd: 48587.21 | bwd_inner: 42709.43 | bwd_allreduce: 5877.10 | step: 39.28
{'loss': 1.2078, 'learning_rate': 1.5767300787965512e-06, 'epoch': 0.88}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-11 02:59:32,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1465.37 | bwd_inner_microstep: 1465.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4348
[2024-06-11 02:59:35,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.19 | bwd_microstep: 1696.23 | bwd_inner_microstep: 1696.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3865
[2024-06-11 02:59:37,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.98 | bwd_microstep: 1519.97 | bwd_inner_microstep: 1519.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 02:59:39,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1396.09 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2693
[2024-06-11 02:59:40,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.11 | bwd_microstep: 1027.82 | bwd_inner_microstep: 1027.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-11 02:59:42,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.79 | bwd_microstep: 1246.29 | bwd_inner_microstep: 1246.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 02:59:44,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.68 | bwd_microstep: 1244.78 | bwd_inner_microstep: 1244.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3745
[2024-06-11 02:59:45,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1337.10 | bwd_inner_microstep: 1337.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3743
[2024-06-11 02:59:48,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.12 | bwd_microstep: 1625.29 | bwd_inner_microstep: 1625.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404
[2024-06-11 02:59:49,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1309.39 | bwd_inner_microstep: 1309.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-11 02:59:51,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1341.24 | bwd_inner_microstep: 1341.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3408
[2024-06-11 02:59:53,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.68 | bwd_microstep: 1404.92 | bwd_inner_microstep: 1404.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497
[2024-06-11 02:59:55,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.57 | bwd_microstep: 1547.19 | bwd_inner_microstep: 1547.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-11 02:59:57,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.99 | bwd_microstep: 1558.99 | bwd_inner_microstep: 1558.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3389
[2024-06-11 02:59:59,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.16 | bwd_microstep: 1301.97 | bwd_inner_microstep: 1301.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-11 03:00:01,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.21 | bwd_microstep: 1342.60 | bwd_inner_microstep: 1342.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-11 03:00:03,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.27 | bwd_microstep: 1291.26 | bwd_inner_microstep: 1291.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-11 03:00:05,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1490.37 | bwd_inner_microstep: 1490.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 03:00:07,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1377.57 | bwd_inner_microstep: 1377.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605
[2024-06-11 03:00:09,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1536.52 | bwd_inner_microstep: 1536.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-11 03:00:11,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1511.90 | bwd_inner_microstep: 1511.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2000
[2024-06-11 03:00:12,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.94 | bwd_microstep: 705.64 | bwd_inner_microstep: 705.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3695
[2024-06-11 03:00:14,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1332.56 | bwd_inner_microstep: 1332.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-11 03:00:16,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.75 | bwd_microstep: 1653.36 | bwd_inner_microstep: 1653.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-11 03:00:18,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 1402.19 | bwd_inner_microstep: 1402.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 03:00:20,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1251.36 | bwd_inner_microstep: 1251.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2677
[2024-06-11 03:00:22,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.49 | bwd_microstep: 1218.81 | bwd_inner_microstep: 1218.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-11 03:00:24,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.72 | bwd_microstep: 1506.82 | bwd_inner_microstep: 1506.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3560
[2024-06-11 03:00:26,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.71 | bwd_microstep: 1539.06 | bwd_inner_microstep: 1539.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 03:00:28,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.46 | bwd_microstep: 1653.11 | bwd_inner_microstep: 1653.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-11 03:00:30,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1496.21 | bwd_inner_microstep: 1496.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-11 03:00:32,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.05 | optimizer_step: 6.63
[2024-06-11 03:00:32,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1508.73 | bwd_inner_microstep: 1443.23 | bwd_allreduce_microstep: 65.45 | step_microstep: 37.53
[2024-06-11 03:00:32,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16702.82 | bwd: 44840.73 | bwd_inner: 44774.38 | bwd_allreduce: 65.68 | step: 39.06
 87%|████████▋ | 1509/1726 [26:17:58<3:44:03, 61.95s/it]
 87%|████████▋ | 1510/1726 [26:19:00<3:42:29, 61.80s/it]
 88%|████████▊ | 1511/1726 [26:20:01<3:41:14, 61.74s/it]
 88%|████████▊ | 1512/1726 [26:21:02<3:39:22, 61.51s/it]
 88%|████████▊ | 1513/1726 [26:22:07<3:41:49, 62.49s/it]
 88%|████████▊ | 1514/1726 [26
{'loss': 1.1792, 'learning_rate': 1.5621552203611234e-06, 'epoch': 0.88}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 03:00:34,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.59 | bwd_microstep: 1240.12 | bwd_inner_microstep: 1240.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509
[2024-06-11 03:00:36,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.86 | bwd_microstep: 1318.16 | bwd_inner_microstep: 1318.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446
[2024-06-11 03:00:37,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.25 | bwd_microstep: 1185.05 | bwd_inner_microstep: 1185.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3858
[2024-06-11 03:00:40,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.65 | bwd_microstep: 1663.05 | bwd_inner_microstep: 1663.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-11 03:00:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.65 | bwd_microstep: 1641.56 | bwd_inner_microstep: 1641.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 03:00:44,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.62 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-11 03:00:46,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1446.25 | bwd_inner_microstep: 1446.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2144
[2024-06-11 03:00:47,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.05 | bwd_microstep: 926.85 | bwd_inner_microstep: 926.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679
[2024-06-11 03:00:49,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.89 | bwd_microstep: 1425.14 | bwd_inner_microstep: 1425.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 03:00:51,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.43 | bwd_microstep: 1283.54 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947
[2024-06-11 03:00:52,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.87 | bwd_microstep: 789.91 | bwd_inner_microstep: 789.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2145
[2024-06-11 03:00:53,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.21 | bwd_microstep: 945.09 | bwd_inner_microstep: 945.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3469
[2024-06-11 03:00:55,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.20 | bwd_microstep: 1425.72 | bwd_inner_microstep: 1425.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 03:00:57,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1374.11 | bwd_inner_microstep: 1374.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-11 03:00:58,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 791.79 | bwd_inner_microstep: 791.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3641
[2024-06-11 03:01:00,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.44 | bwd_microstep: 1709.62 | bwd_inner_microstep: 1709.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3625
[2024-06-11 03:01:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.68 | bwd_microstep: 1441.25 | bwd_inner_microstep: 1441.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-11 03:01:05,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1499.75 | bwd_inner_microstep: 1499.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-11 03:01:06,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.60 | bwd_microstep: 1164.91 | bwd_inner_microstep: 1164.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-11 03:01:07,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.89 | bwd_microstep: 805.53 | bwd_inner_microstep: 805.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965
[2024-06-11 03:01:08,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.64 | bwd_microstep: 796.10 | bwd_inner_microstep: 796.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668
[2024-06-11 03:01:10,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.15 | bwd_microstep: 1428.56 | bwd_inner_microstep: 1428.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 03:01:13,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.22 | bwd_microstep: 1656.60 | bwd_inner_microstep: 1656.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540
[2024-06-11 03:01:15,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.24 | bwd_microstep: 1496.26 | bwd_inner_microstep: 1496.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3720
[2024-06-11 03:01:17,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1368.51 | bwd_inner_microstep: 1368.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3605
[2024-06-11 03:01:18,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.02 | bwd_microstep: 1341.24 | bwd_inner_microstep: 1341.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3468
[2024-06-11 03:01:20,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.01 | bwd_microstep: 1242.55 | bwd_inner_microstep: 1242.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3817
[2024-06-11 03:01:22,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.84 | bwd_microstep: 1604.54 | bwd_inner_microstep: 1604.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-11 03:01:25,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.79 | bwd_microstep: 1646.88 | bwd_inner_microstep: 1646.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3030
[2024-06-11 03:01:26,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.39 | bwd_microstep: 1070.61 | bwd_inner_microstep: 1070.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-11 03:01:28,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.21 | bwd_microstep: 1447.60 | bwd_inner_microstep: 1447.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 03:01:35,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-11 03:01:35,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 5854.40 | bwd_inner_microstep: 1541.38 | bwd_allreduce_microstep: 4312.97 | step_microstep: 37.79
[2024-06-11 03:01:35,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15694.78 | bwd: 46418.85 | bwd_inner: 42104.98 | bwd_allreduce: 4313.20 | step: 39.28
{'loss': 1.1518, 'learning_rate': 1.5476452999191626e-06, 'epoch': 0.88}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957
[2024-06-11 03:01:36,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.62 | bwd_microstep: 886.20 | bwd_inner_microstep: 886.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-11 03:01:38,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1299.32 | bwd_inner_microstep: 1299.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908
[2024-06-11 03:01:40,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1588.43 | bwd_inner_microstep: 1588.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-11 03:01:42,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.33 | bwd_microstep: 1306.65 | bwd_inner_microstep: 1306.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 03:01:44,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1377.94 | bwd_inner_microstep: 1377.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1939
[2024-06-11 03:01:45,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.70 | bwd_microstep: 759.07 | bwd_inner_microstep: 759.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 03:01:46,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.24 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-11 03:01:47,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.09 | bwd_microstep: 791.10 | bwd_inner_microstep: 791.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-11 03:01:49,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.23 | bwd_microstep: 1279.05 | bwd_inner_microstep: 1279.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2245
[2024-06-11 03:01:50,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.78 | bwd_microstep: 901.40 | bwd_inner_microstep: 901.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504
[2024-06-11 03:01:52,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1442.78 | bwd_inner_microstep: 1442.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 4023
[2024-06-11 03:01:55,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 682.59 | bwd_microstep: 1875.48 | bwd_inner_microstep: 1875.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 03:01:57,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.17 | bwd_microstep: 1283.80 | bwd_inner_microstep: 1283.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2392
[2024-06-11 03:01:58,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.01 | bwd_microstep: 1028.98 | bwd_inner_microstep: 1028.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3694
[2024-06-11 03:02:00,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.96 | bwd_microstep: 1548.04 | bwd_inner_microstep: 1548.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 03:02:02,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1288.17 | bwd_inner_microstep: 1288.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514
[2024-06-11 03:02:04,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.46 | bwd_microstep: 1448.80 | bwd_inner_microstep: 1448.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093
[2024-06-11 03:02:05,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.07 | bwd_microstep: 916.82 | bwd_inner_microstep: 916.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-11 03:02:07,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1443.34 | bwd_inner_microstep: 1443.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3521
[2024-06-11 03:02:09,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.19 | bwd_microstep: 1516.84 | bwd_inner_microstep: 1516.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-11 03:02:11,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.08 | bwd_microstep: 788.85 | bwd_inner_microstep: 788.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-11 03:02:13,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1429.47 | bwd_inner_microstep: 1429.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2098
[2024-06-11 03:02:14,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.48 | bwd_microstep: 886.31 | bwd_inner_microstep: 886.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940
[2024-06-11 03:02:15,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.21 | bwd_microstep: 696.75 | bwd_inner_microstep: 696.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 03:02:17,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1385.28 | bwd_inner_microstep: 1385.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-11 03:02:19,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1492.93 | bwd_inner_microstep: 1492.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-11 03:02:20,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.84 | bwd_microstep: 1279.19 | bwd_inner_microstep: 1279.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-11 03:02:23,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.08 | bwd_microstep: 1608.42 | bwd_inner_microstep: 1608.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 03:02:24,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.24 | bwd_microstep: 1285.12 | bwd_inner_microstep: 1285.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-11 03:02:27,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.54 | bwd_microstep: 1552.37 | bwd_inner_microstep: 1552.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458
[2024-06-11 03:02:28,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.72 | bwd_microstep: 1310.54 | bwd_inner_microstep: 1310.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769
[2024-06-11 03:02:36,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-11 03:02:36,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.00 | bwd_microstep: 7171.55 | bwd_inner_microstep: 1862.77 | bwd_allreduce_microstep: 5308.73 | step_microstep: 37.73
[2024-06-11 03:02:36,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15154.61 | bwd: 46152.26 | bwd_inner: 40842.62 | bwd_allreduce: 5308.95 | step: 39.18
{'loss': 1.1708, 'learning_rate': 1.5332003685745279e-06, 'epoch': 0.88}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 03:02:38,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1366.93 | bwd_inner_microstep: 1366.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3955
[2024-06-11 03:02:40,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.53 | bwd_microstep: 1591.57 | bwd_inner_microstep: 1591.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 03:02:42,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.54 | bwd_microstep: 1292.02 | bwd_inner_microstep: 1291.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 03:02:44,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.09 | bwd_microstep: 1240.98 | bwd_inner_microstep: 1240.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 03:02:46,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1383.61 | bwd_inner_microstep: 1383.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-11 03:02:47,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1250.45 | bwd_inner_microstep: 1250.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422
[2024-06-11 03:02:49,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.23 | bwd_microstep: 1184.11 | bwd_inner_microstep: 1184.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-11 03:02:51,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.88 | bwd_microstep: 1302.39 | bwd_inner_microstep: 1302.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065
[2024-06-11 03:02:52,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.80 | bwd_microstep: 817.81 | bwd_inner_microstep: 817.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-11 03:02:54,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.58 | bwd_microstep: 1529.62 | bwd_inner_microstep: 1529.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2128
[2024-06-11 03:02:55,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.46 | bwd_microstep: 927.34 | bwd_inner_microstep: 927.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3439
[2024-06-11 03:02:57,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1311.02 | bwd_inner_microstep: 1310.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 03:02:59,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1469.26 | bwd_inner_microstep: 1469.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-11 03:03:01,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1245.11 | bwd_inner_microstep: 1245.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3619
[2024-06-11 03:03:03,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.46 | bwd_microstep: 1432.11 | bwd_inner_microstep: 1432.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2875
[2024-06-11 03:03:05,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.57 | bwd_microstep: 1116.73 | bwd_inner_microstep: 1116.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1990
[2024-06-11 03:03:06,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.75 | bwd_microstep: 861.77 | bwd_inner_microstep: 861.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3531
[2024-06-11 03:03:08,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.59 | bwd_microstep: 1455.07 | bwd_inner_microstep: 1455.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1972
[2024-06-11 03:03:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.19 | bwd_microstep: 828.04 | bwd_inner_microstep: 828.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3511
[2024-06-11 03:03:11,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1438.52 | bwd_inner_microstep: 1438.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-11 03:03:13,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.24 | bwd_microstep: 1424.38 | bwd_inner_microstep: 1424.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685
[2024-06-11 03:03:15,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.40 | bwd_microstep: 1431.57 | bwd_inner_microstep: 1431.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3470
[2024-06-11 03:03:17,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.83 | bwd_microstep: 1247.43 | bwd_inner_microstep: 1247.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-11 03:03:18,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.39 | bwd_microstep: 1411.37 | bwd_inner_microstep: 1411.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-11 03:03:20,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.84 | bwd_microstep: 1300.62 | bwd_inner_microstep: 1300.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 03:03:22,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.42 | bwd_microstep: 1546.73 | bwd_inner_microstep: 1546.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-11 03:03:24,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.06 | bwd_microstep: 1433.25 | bwd_inner_microstep: 1433.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3592
[2024-06-11 03:03:26,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 1462.80 | bwd_inner_microstep: 1462.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-11 03:03:28,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.01 | bwd_microstep: 1501.99 | bwd_inner_microstep: 1501.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3569
[2024-06-11 03:03:31,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.18 | bwd_microstep: 1528.10 | bwd_inner_microstep: 1528.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-11 03:03:33,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.25 | bwd_microstep: 1402.42 | bwd_inner_microstep: 1402.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-11 03:03:38,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.27 | optimizer_step: 6.60
[2024-06-11 03:03:38,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 4579.89 | bwd_inner_microstep: 1585.05 | bwd_allreduce_microstep: 2994.75 | step_microstep: 39.81
[2024-06-11 03:03:38,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15837.07 | bwd: 45315.01 | bwd_inner: 42319.31 | bwd_allreduce: 2995.00 | step: 41.26
{'loss': 1.1462, 'learning_rate': 1.518820477202203e-06, 'epoch': 0.88}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 03:03:40,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1347.76 | bwd_inner_microstep: 1347.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2380
[2024-06-11 03:03:41,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.57 | bwd_microstep: 947.59 | bwd_inner_microstep: 947.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-11 03:03:43,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.65 | bwd_microstep: 1651.48 | bwd_inner_microstep: 1651.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 03:03:45,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1380.28 | bwd_inner_microstep: 1380.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2244
[2024-06-11 03:03:46,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.00 | bwd_microstep: 899.46 | bwd_inner_microstep: 899.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279
[2024-06-11 03:03:48,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.94 | bwd_microstep: 906.58 | bwd_inner_microstep: 906.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-11 03:03:49,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.51 | bwd_microstep: 1350.57 | bwd_inner_microstep: 1350.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-11 03:03:51,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1399.73 | bwd_inner_microstep: 1399.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1974
[2024-06-11 03:03:52,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.82 | bwd_microstep: 750.08 | bwd_inner_microstep: 750.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1944
[2024-06-11 03:03:54,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.43 | bwd_microstep: 840.30 | bwd_inner_microstep: 840.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 03:03:55,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.59 | bwd_microstep: 1393.93 | bwd_inner_microstep: 1393.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431
[2024-06-11 03:03:57,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.86 | bwd_microstep: 1370.62 | bwd_inner_microstep: 1370.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2226
[2024-06-11 03:03:59,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.50 | bwd_microstep: 992.60 | bwd_inner_microstep: 992.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3399
[2024-06-11 03:04:01,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.85 | bwd_microstep: 1434.56 | bwd_inner_microstep: 1434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419
[2024-06-11 03:04:03,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 1439.33 | bwd_inner_microstep: 1439.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655
[2024-06-11 03:04:05,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.49 | bwd_microstep: 1617.39 | bwd_inner_microstep: 1617.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3004
[2024-06-11 03:04:07,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.74 | bwd_microstep: 1298.81 | bwd_inner_microstep: 1298.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624
[2024-06-11 03:04:09,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1508.94 | bwd_inner_microstep: 1508.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-11 03:04:11,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.61 | bwd_microstep: 1612.18 | bwd_inner_microstep: 1612.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-11 03:04:13,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.33 | bwd_microstep: 1459.60 | bwd_inner_microstep: 1459.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464
[2024-06-11 03:04:15,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1310.08 | bwd_inner_microstep: 1310.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461
[2024-06-11 03:04:17,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.02 | bwd_microstep: 1213.75 | bwd_inner_microstep: 1213.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-11 03:04:18,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.66 | bwd_microstep: 1292.93 | bwd_inner_microstep: 1292.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 03:04:20,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.10 | bwd_microstep: 1352.55 | bwd_inner_microstep: 1352.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2062
[2024-06-11 03:04:21,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.40 | bwd_microstep: 894.83 | bwd_inner_microstep: 894.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-11 03:04:23,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1393.26 | bwd_inner_microstep: 1393.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-11 03:04:25,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.24 | bwd_microstep: 1339.07 | bwd_inner_microstep: 1339.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2039
[2024-06-11 03:04:26,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.21 | bwd_microstep: 903.94 | bwd_inner_microstep: 903.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3811
[2024-06-11 03:04:28,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.31 | bwd_microstep: 1388.46 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-11 03:04:30,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1401.72 | bwd_inner_microstep: 1401.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434
[2024-06-11 03:04:32,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.94 | bwd_microstep: 1373.78 | bwd_inner_microstep: 1373.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-11 03:04:39,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-11 03:04:39,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 6574.05 | bwd_inner_microstep: 1684.88 | bwd_allreduce_microstep: 4889.10 | step_microstep: 38.59
[2024-06-11 03:04:39,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15344.94 | bwd: 46040.24 | bwd_inner: 41150.21 | bwd_allreduce: 4889.34 | step: 40.07
{'loss': 1.1446, 'learning_rate': 1.504505676448076e-06, 'epoch': 0.88}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-11 03:04:41,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.65 | bwd_microstep: 1331.47 | bwd_inner_microstep: 1331.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851
[2024-06-11 03:04:43,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.17 | bwd_microstep: 1454.58 | bwd_inner_microstep: 1454.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 03:04:45,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.60 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 03:04:47,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.38 | bwd_microstep: 1341.11 | bwd_inner_microstep: 1341.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 03:04:49,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.19 | bwd_microstep: 1549.82 | bwd_inner_microstep: 1549.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471
[2024-06-11 03:04:51,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.47 | bwd_microstep: 1214.19 | bwd_inner_microstep: 1214.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 03:04:53,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1278.90 | bwd_inner_microstep: 1278.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-11 03:04:54,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1278.29 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 03:04:56,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.53 | bwd_microstep: 1281.46 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-11 03:04:58,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.96 | bwd_microstep: 1351.44 | bwd_inner_microstep: 1351.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-11 03:05:00,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.04 | bwd_microstep: 1261.07 | bwd_inner_microstep: 1261.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-11 03:05:02,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.32 | bwd_microstep: 1440.63 | bwd_inner_microstep: 1440.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469
[2024-06-11 03:05:04,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.47 | bwd_microstep: 1568.53 | bwd_inner_microstep: 1568.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2046
[2024-06-11 03:05:05,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.29 | bwd_microstep: 838.04 | bwd_inner_microstep: 838.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-11 03:05:07,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.33 | bwd_microstep: 1386.46 | bwd_inner_microstep: 1386.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 03:05:09,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1400.88 | bwd_inner_microstep: 1400.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945
[2024-06-11 03:05:10,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.13 | bwd_microstep: 696.27 | bwd_inner_microstep: 696.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1952
[2024-06-11 03:05:11,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.83 | bwd_microstep: 837.50 | bwd_inner_microstep: 837.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518
[2024-06-11 03:05:13,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1488.41 | bwd_inner_microstep: 1488.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3660
[2024-06-11 03:05:15,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.35 | bwd_microstep: 1451.25 | bwd_inner_microstep: 1451.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 615
[2024-06-11 03:05:15,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.61 | bwd_microstep: 262.89 | bwd_inner_microstep: 262.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 03:05:17,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1373.36 | bwd_inner_microstep: 1373.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-11 03:05:19,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.27 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004
[2024-06-11 03:05:20,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.64 | bwd_microstep: 800.70 | bwd_inner_microstep: 800.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-11 03:05:22,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.51 | bwd_microstep: 1559.94 | bwd_inner_microstep: 1559.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3690
[2024-06-11 03:05:24,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.94 | bwd_microstep: 1363.09 | bwd_inner_microstep: 1363.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2053
[2024-06-11 03:05:26,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.29 | bwd_microstep: 1008.67 | bwd_inner_microstep: 1008.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2075
[2024-06-11 03:05:27,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.84 | bwd_microstep: 945.62 | bwd_inner_microstep: 945.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-11 03:05:28,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.49 | bwd_microstep: 792.08 | bwd_inner_microstep: 792.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698
[2024-06-11 03:05:30,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.45 | bwd_microstep: 1424.06 | bwd_inner_microstep: 1424.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806
[2024-06-11 03:05:32,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.11 | bwd_microstep: 1755.64 | bwd_inner_microstep: 1755.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3576
[2024-06-11 03:05:42,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-11 03:05:42,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.66 | bwd_microstep: 9138.22 | bwd_inner_microstep: 1455.96 | bwd_allreduce_microstep: 7682.19 | step_microstep: 38.71
[2024-06-11 03:05:42,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14902.84 | bwd: 47538.10 | bwd_inner: 39854.98 | bwd_allreduce: 7682.43 | step: 40.26
 88%|████████▊ | 1514/1726 [26:23:09<3:40:08, 62.31s/it]
 88%|████████▊ | 1515/1726 [26:24:11<3:39:15, 62.35s/it]
 88%|████████▊ | 1516/1726 [26:25:13<3:37:27, 62.13s/it]
 88%|████████▊ | 1517/1726 [26:26:14<3:35:44, 61.94s/it]
 88%|████████▊ | 1518/1726 [26:27:16<3:34:28, 61.87s/it]
{'loss': 1.2086, 'learning_rate': 1.4902560167288105e-06, 'epoch': 0.88}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3457
[2024-06-11 03:05:44,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1488.85 | bwd_inner_microstep: 1488.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3872
[2024-06-11 03:05:46,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.10 | bwd_microstep: 1657.64 | bwd_inner_microstep: 1657.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-11 03:05:48,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.99 | bwd_microstep: 1387.44 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-11 03:05:50,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 03:05:52,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1248.98 | bwd_inner_microstep: 1248.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-11 03:05:54,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.16 | bwd_microstep: 1526.71 | bwd_inner_microstep: 1526.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3478
[2024-06-11 03:05:56,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.43 | bwd_microstep: 1247.20 | bwd_inner_microstep: 1247.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427
[2024-06-11 03:05:58,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.01 | bwd_microstep: 1277.06 | bwd_inner_microstep: 1277.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-11 03:05:59,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1276.87 | bwd_inner_microstep: 1276.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3680
[2024-06-11 03:06:01,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.21 | bwd_microstep: 1444.63 | bwd_inner_microstep: 1444.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425
[2024-06-11 03:06:03,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1402.11 | bwd_inner_microstep: 1402.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-11 03:06:05,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.35 | bwd_microstep: 1342.83 | bwd_inner_microstep: 1342.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-11 03:06:07,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.70 | bwd_microstep: 1479.98 | bwd_inner_microstep: 1479.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446
[2024-06-11 03:06:09,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.14 | bwd_microstep: 1215.09 | bwd_inner_microstep: 1215.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 03:06:11,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.77 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 03:06:12,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1274.39 | bwd_inner_microstep: 1274.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3629
[2024-06-11 03:06:14,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1339.95 | bwd_inner_microstep: 1339.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-11 03:06:16,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.65 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 03:06:18,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1394.22 | bwd_inner_microstep: 1394.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827
[2024-06-11 03:06:20,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.13 | bwd_microstep: 1479.84 | bwd_inner_microstep: 1479.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-11 03:06:22,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1493.97 | bwd_inner_microstep: 1493.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-11 03:06:24,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.02 | bwd_microstep: 1389.01 | bwd_inner_microstep: 1388.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-11 03:06:26,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.29 | bwd_microstep: 1296.38 | bwd_inner_microstep: 1296.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-11 03:06:28,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.14 | bwd_microstep: 1445.45 | bwd_inner_microstep: 1445.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 03:06:30,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1388.37 | bwd_inner_microstep: 1388.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3629
[2024-06-11 03:06:32,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1391.67 | bwd_inner_microstep: 1391.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060
[2024-06-11 03:06:33,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.02 | bwd_microstep: 813.36 | bwd_inner_microstep: 813.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3708
[2024-06-11 03:06:35,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.26 | bwd_microstep: 1590.38 | bwd_inner_microstep: 1590.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3563
[2024-06-11 03:06:37,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.46 | bwd_microstep: 1586.74 | bwd_inner_microstep: 1586.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3525
[2024-06-11 03:06:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.77 | bwd_microstep: 1659.46 | bwd_inner_microstep: 1659.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3707
[2024-06-11 03:06:41,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1457.54 | bwd_inner_microstep: 1457.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 03:06:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.04 | optimizer_step: 6.63
[2024-06-11 03:06:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1442.04 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.45
[2024-06-11 03:06:43,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16608.62 | bwd: 44302.04 | bwd_inner: 44293.54 | bwd_allreduce: 7.83 | step: 38.92
{'loss': 1.1973, 'learning_rate': 1.4760715482316301e-06, 'epoch': 0.88}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-11 03:06:45,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1485.83 | bwd_inner_microstep: 1485.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4151
[2024-06-11 03:06:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1570.24 | bwd_inner_microstep: 1570.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3903
[2024-06-11 03:06:50,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1484.67 | bwd_inner_microstep: 1484.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862
[2024-06-11 03:06:52,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.59 | bwd_microstep: 1562.36 | bwd_inner_microstep: 1562.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3822
[2024-06-11 03:06:54,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.04 | bwd_microstep: 1290.89 | bwd_inner_microstep: 1290.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492
[2024-06-11 03:06:55,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.74 | bwd_microstep: 1186.91 | bwd_inner_microstep: 1186.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 03:06:57,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.43 | bwd_microstep: 1389.38 | bwd_inner_microstep: 1389.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2243
[2024-06-11 03:06:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.19 | bwd_microstep: 965.59 | bwd_inner_microstep: 965.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-11 03:07:00,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.11 | bwd_microstep: 1390.42 | bwd_inner_microstep: 1390.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-11 03:07:02,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1391.85 | bwd_inner_microstep: 1391.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3607
[2024-06-11 03:07:04,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1438.34 | bwd_inner_microstep: 1438.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3491
[2024-06-11 03:07:06,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.26 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-11 03:07:08,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.12 | bwd_microstep: 1277.76 | bwd_inner_microstep: 1277.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2939
[2024-06-11 03:07:10,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.71 | bwd_microstep: 1284.82 | bwd_inner_microstep: 1284.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 03:07:11,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1348.05 | bwd_inner_microstep: 1348.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654
[2024-06-11 03:07:14,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1519.48 | bwd_inner_microstep: 1519.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682
[2024-06-11 03:07:16,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.61 | bwd_microstep: 1521.42 | bwd_inner_microstep: 1521.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1998
[2024-06-11 03:07:17,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.22 | bwd_microstep: 898.08 | bwd_inner_microstep: 898.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-11 03:07:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.58 | bwd_microstep: 1311.35 | bwd_inner_microstep: 1311.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-11 03:07:20,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.19 | bwd_microstep: 788.19 | bwd_inner_microstep: 788.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-11 03:07:22,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.53 | bwd_microstep: 1290.70 | bwd_inner_microstep: 1290.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-11 03:07:24,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1403.52 | bwd_inner_microstep: 1403.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3714
[2024-06-11 03:07:25,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1333.75 | bwd_inner_microstep: 1333.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 03:07:28,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1558.19 | bwd_inner_microstep: 1558.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-11 03:07:29,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1400.80 | bwd_inner_microstep: 1400.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 03:07:31,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1283.71 | bwd_inner_microstep: 1283.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3566
[2024-06-11 03:07:33,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.92 | bwd_microstep: 1527.60 | bwd_inner_microstep: 1527.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3557
[2024-06-11 03:07:36,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 1588.03 | bwd_inner_microstep: 1588.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-11 03:07:38,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.95 | bwd_microstep: 1652.60 | bwd_inner_microstep: 1652.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-11 03:07:40,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.68 | bwd_microstep: 1630.20 | bwd_inner_microstep: 1630.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814
[2024-06-11 03:07:42,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.89 | bwd_microstep: 1753.30 | bwd_inner_microstep: 1753.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2810
[2024-06-11 03:07:46,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.07 | optimizer_step: 6.59
[2024-06-11 03:07:46,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.49 | bwd_microstep: 3150.16 | bwd_inner_microstep: 1301.22 | bwd_allreduce_microstep: 1848.89 | step_microstep: 37.64
[2024-06-11 03:07:46,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16449.05 | bwd: 45926.83 | bwd_inner: 44077.04 | bwd_allreduce: 1849.11 | step: 39.11
{'loss': 1.1417, 'learning_rate': 1.4619523209141573e-06, 'epoch': 0.88}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-11 03:07:48,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.02 | bwd_microstep: 1472.81 | bwd_inner_microstep: 1472.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4669
[2024-06-11 03:07:51,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.99 | bwd_microstep: 1777.70 | bwd_inner_microstep: 1777.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3856
[2024-06-11 03:07:53,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1463.79 | bwd_inner_microstep: 1463.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-11 03:07:55,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.27 | bwd_microstep: 1447.32 | bwd_inner_microstep: 1447.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-11 03:07:56,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.29 | bwd_microstep: 1342.06 | bwd_inner_microstep: 1342.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-11 03:07:58,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1278.85 | bwd_inner_microstep: 1278.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 03:08:00,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1244.23 | bwd_inner_microstep: 1244.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-11 03:08:02,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1386.99 | bwd_inner_microstep: 1386.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-11 03:08:04,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.36 | bwd_microstep: 1638.99 | bwd_inner_microstep: 1638.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-11 03:08:06,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.18 | bwd_microstep: 1246.26 | bwd_inner_microstep: 1246.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 03:08:08,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.43 | bwd_microstep: 1381.06 | bwd_inner_microstep: 1381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2151
[2024-06-11 03:08:09,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.27 | bwd_microstep: 880.17 | bwd_inner_microstep: 880.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3536
[2024-06-11 03:08:11,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.08 | bwd_microstep: 1197.55 | bwd_inner_microstep: 1197.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952
[2024-06-11 03:08:12,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.79 | bwd_microstep: 887.41 | bwd_inner_microstep: 887.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836
[2024-06-11 03:08:14,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.86 | bwd_microstep: 1752.53 | bwd_inner_microstep: 1752.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 03:08:16,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.77 | bwd_microstep: 1547.91 | bwd_inner_microstep: 1547.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2108
[2024-06-11 03:08:18,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.88 | bwd_microstep: 918.82 | bwd_inner_microstep: 918.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-11 03:08:20,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.40 | bwd_microstep: 1355.76 | bwd_inner_microstep: 1355.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-11 03:08:21,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.68 | bwd_microstep: 1291.83 | bwd_inner_microstep: 1291.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-11 03:08:23,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1248.40 | bwd_inner_microstep: 1248.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 03:08:25,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.11 | bwd_microstep: 1553.33 | bwd_inner_microstep: 1553.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 03:08:27,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1554.70 | bwd_inner_microstep: 1554.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-11 03:08:29,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.11 | bwd_microstep: 1389.15 | bwd_inner_microstep: 1389.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-11 03:08:31,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.56 | bwd_microstep: 1555.05 | bwd_inner_microstep: 1555.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2145
[2024-06-11 03:08:33,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.91 | bwd_microstep: 850.28 | bwd_inner_microstep: 850.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3618
[2024-06-11 03:08:35,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.65 | bwd_microstep: 1372.05 | bwd_inner_microstep: 1372.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3564
[2024-06-11 03:08:37,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.57 | bwd_microstep: 1459.00 | bwd_inner_microstep: 1458.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783
[2024-06-11 03:08:39,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1446.79 | bwd_inner_microstep: 1446.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-11 03:08:40,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.45 | bwd_microstep: 1348.78 | bwd_inner_microstep: 1348.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400
[2024-06-11 03:08:42,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.37 | bwd_microstep: 1435.96 | bwd_inner_microstep: 1435.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3455
[2024-06-11 03:08:44,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1503.45 | bwd_inner_microstep: 1503.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3589
[2024-06-11 03:08:48,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.05 | optimizer_step: 6.60
[2024-06-11 03:08:48,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.66 | bwd_microstep: 3255.68 | bwd_inner_microstep: 1731.54 | bwd_allreduce_microstep: 1524.09 | step_microstep: 37.56
[2024-06-11 03:08:48,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16379.00 | bwd: 45484.68 | bwd_inner: 43959.67 | bwd_allreduce: 1524.32 | step: 39.04
{'loss': 1.1692, 'learning_rate': 1.4478983845042493e-06, 'epoch': 0.88}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 03:08:50,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.50 | bwd_microstep: 1370.19 | bwd_inner_microstep: 1370.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 03:08:52,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1389.39 | bwd_inner_microstep: 1389.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3826
[2024-06-11 03:08:54,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.53 | bwd_microstep: 1511.28 | bwd_inner_microstep: 1511.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-11 03:08:56,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.25 | bwd_microstep: 1445.61 | bwd_inner_microstep: 1445.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795
[2024-06-11 03:08:58,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.90 | bwd_microstep: 1644.74 | bwd_inner_microstep: 1644.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405
[2024-06-11 03:09:00,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.84 | bwd_microstep: 1175.44 | bwd_inner_microstep: 1175.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-11 03:09:01,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.19 | bwd_microstep: 791.28 | bwd_inner_microstep: 791.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3495
[2024-06-11 03:09:03,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.51 | bwd_microstep: 1247.82 | bwd_inner_microstep: 1247.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-11 03:09:05,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.40 | bwd_microstep: 1501.67 | bwd_inner_microstep: 1501.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421
[2024-06-11 03:09:07,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.66 | bwd_microstep: 1182.94 | bwd_inner_microstep: 1182.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-11 03:09:08,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.17 | bwd_microstep: 1288.71 | bwd_inner_microstep: 1288.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690
[2024-06-11 03:09:11,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.48 | bwd_microstep: 1613.57 | bwd_inner_microstep: 1613.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-11 03:09:13,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.41 | bwd_microstep: 1486.68 | bwd_inner_microstep: 1486.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676
[2024-06-11 03:09:15,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1620.42 | bwd_inner_microstep: 1620.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-11 03:09:17,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.48 | bwd_microstep: 1627.23 | bwd_inner_microstep: 1627.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522
[2024-06-11 03:09:19,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1487.71 | bwd_inner_microstep: 1487.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1981
[2024-06-11 03:09:20,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.50 | bwd_microstep: 856.14 | bwd_inner_microstep: 856.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3518
[2024-06-11 03:09:22,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1415.11 | bwd_inner_microstep: 1415.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-11 03:09:24,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1407.58 | bwd_inner_microstep: 1407.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629
[2024-06-11 03:09:26,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1415.41 | bwd_inner_microstep: 1415.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687
[2024-06-11 03:09:28,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1431.05 | bwd_inner_microstep: 1431.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 03:09:30,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1380.42 | bwd_inner_microstep: 1380.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-11 03:09:32,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 1431.57 | bwd_inner_microstep: 1431.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841
[2024-06-11 03:09:34,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.02 | bwd_microstep: 1465.74 | bwd_inner_microstep: 1465.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-11 03:09:36,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.20 | bwd_microstep: 1615.99 | bwd_inner_microstep: 1615.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-11 03:09:38,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.35 | bwd_microstep: 1553.91 | bwd_inner_microstep: 1553.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-11 03:09:40,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.81 | bwd_microstep: 1345.02 | bwd_inner_microstep: 1344.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605
[2024-06-11 03:09:42,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1533.91 | bwd_inner_microstep: 1533.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2401
[2024-06-11 03:09:44,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.11 | bwd_microstep: 1034.25 | bwd_inner_microstep: 1034.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764
[2024-06-11 03:09:46,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.23 | bwd_microstep: 1536.32 | bwd_inner_microstep: 1536.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 03:09:48,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.80 | bwd_microstep: 1649.91 | bwd_inner_microstep: 1649.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587
[2024-06-11 03:09:51,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.98 | optimizer_gradients: 4.02 | optimizer_step: 6.60
[2024-06-11 03:09:51,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 2328.28 | bwd_inner_microstep: 1666.03 | bwd_allreduce_microstep: 662.20 | step_microstep: 37.61
[2024-06-11 03:09:51,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16777.80 | bwd: 45785.31 | bwd_inner: 45122.21 | bwd_allreduce: 662.43 | step: 39.05
{'loss': 1.2033, 'learning_rate': 1.4339097884997787e-06, 'epoch': 0.88}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401
[2024-06-11 03:09:53,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.47 | bwd_microstep: 1358.69 | bwd_inner_microstep: 1358.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-11 03:09:55,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1240.41 | bwd_inner_microstep: 1240.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 03:09:57,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1282.40 | bwd_inner_microstep: 1282.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465
[2024-06-11 03:09:58,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.13 | bwd_microstep: 1337.83 | bwd_inner_microstep: 1337.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-11 03:10:00,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1247.91 | bwd_inner_microstep: 1247.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 03:10:02,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.54 | bwd_microstep: 1280.89 | bwd_inner_microstep: 1280.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-11 03:10:04,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.48 | bwd_microstep: 1292.20 | bwd_inner_microstep: 1292.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-11 03:10:05,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 790.01 | bwd_inner_microstep: 789.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-11 03:10:06,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.98 | bwd_microstep: 1149.89 | bwd_inner_microstep: 1149.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921
[2024-06-11 03:10:08,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.78 | bwd_microstep: 789.59 | bwd_inner_microstep: 789.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3701
[2024-06-11 03:10:09,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1290.36 | bwd_inner_microstep: 1290.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2479
[2024-06-11 03:10:11,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.77 | bwd_microstep: 980.70 | bwd_inner_microstep: 980.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390
[2024-06-11 03:10:12,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.06 | bwd_microstep: 1240.25 | bwd_inner_microstep: 1240.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-11 03:10:14,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.37 | bwd_microstep: 1353.20 | bwd_inner_microstep: 1353.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-11 03:10:16,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1389.94 | bwd_inner_microstep: 1389.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2120
[2024-06-11 03:10:18,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.84 | bwd_microstep: 1022.87 | bwd_inner_microstep: 1022.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3599
[2024-06-11 03:10:20,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.51 | bwd_microstep: 1554.46 | bwd_inner_microstep: 1554.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-11 03:10:21,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.21 | bwd_microstep: 1159.33 | bwd_inner_microstep: 1159.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 03:10:23,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1560.47 | bwd_inner_microstep: 1560.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2104
[2024-06-11 03:10:25,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.74 | bwd_microstep: 821.65 | bwd_inner_microstep: 821.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-11 03:10:27,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.10 | bwd_microstep: 1451.01 | bwd_inner_microstep: 1450.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 950
[2024-06-11 03:10:27,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.22 | bwd_microstep: 379.32 | bwd_inner_microstep: 379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2289
[2024-06-11 03:10:28,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.74 | bwd_microstep: 910.96 | bwd_inner_microstep: 910.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-11 03:10:31,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.14 | bwd_microstep: 1658.72 | bwd_inner_microstep: 1658.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 03:10:33,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.22 | bwd_microstep: 1654.05 | bwd_inner_microstep: 1654.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3561
[2024-06-11 03:10:35,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1266.38 | bwd_inner_microstep: 1266.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3438
[2024-06-11 03:10:37,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1378.17 | bwd_inner_microstep: 1378.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-11 03:10:39,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1394.26 | bwd_inner_microstep: 1394.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796
[2024-06-11 03:10:41,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1452.59 | bwd_inner_microstep: 1452.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-11 03:10:42,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.40 | bwd_microstep: 877.04 | bwd_inner_microstep: 877.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3575
[2024-06-11 03:10:44,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.49 | bwd_microstep: 1693.70 | bwd_inner_microstep: 1693.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2118
[2024-06-11 03:10:51,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.09 | optimizer_step: 6.58
[2024-06-11 03:10:51,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.11 | bwd_microstep: 6830.19 | bwd_inner_microstep: 1056.32 | bwd_allreduce_microstep: 5773.82 | step_microstep: 37.94
[2024-06-11 03:10:51,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14735.32 | bwd: 45089.47 | bwd_inner: 39314.75 | bwd_allreduce: 5774.05 | step: 39.43
 88%|████████▊ | 1519/1726 [26:28:19<3:34:22, 62.14s/it]
 88%|████████▊ | 1520/1726 [26:29:20<3:32:25, 61.87s/it]
 88%|████████▊ | 1521/1726 [26:30:23<3:32:14, 62.12s/it]
 88%|████████▊ | 1522/1726 [26:31:25<3:31:17, 62.14s/it]
 88%|████████▊ | 1523/1726 [26:32:28<3:31:01, 62.37s/it]
{'loss': 1.1447, 'learning_rate': 1.419986582168522e-06, 'epoch': 0.88}
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3474
[2024-06-11 03:10:53,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.46 | bwd_microstep: 1495.22 | bwd_inner_microstep: 1495.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-11 03:10:55,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.06 | bwd_microstep: 1487.70 | bwd_inner_microstep: 1487.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3492
[2024-06-11 03:10:57,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.32 | bwd_microstep: 1345.64 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3883
[2024-06-11 03:10:59,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1444.85 | bwd_inner_microstep: 1444.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792
[2024-06-11 03:11:02,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.44 | bwd_microstep: 1647.11 | bwd_inner_microstep: 1647.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-11 03:11:03,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.17 | bwd_microstep: 820.08 | bwd_inner_microstep: 820.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3524
[2024-06-11 03:11:05,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.64 | bwd_microstep: 1352.42 | bwd_inner_microstep: 1352.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3430
[2024-06-11 03:11:06,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.43 | bwd_microstep: 1152.94 | bwd_inner_microstep: 1152.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3414
[2024-06-11 03:11:08,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.09 | bwd_microstep: 1213.29 | bwd_inner_microstep: 1213.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 03:11:10,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.19 | bwd_microstep: 1403.36 | bwd_inner_microstep: 1403.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776
[2024-06-11 03:11:12,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.68 | bwd_microstep: 1437.28 | bwd_inner_microstep: 1437.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529
[2024-06-11 03:11:14,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.87 | bwd_microstep: 1291.27 | bwd_inner_microstep: 1291.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511
[2024-06-11 03:11:16,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1417.43 | bwd_inner_microstep: 1417.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 03:11:17,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.06 | bwd_microstep: 1381.79 | bwd_inner_microstep: 1381.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 03:11:19,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.19 | bwd_microstep: 1378.30 | bwd_inner_microstep: 1378.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3653
[2024-06-11 03:11:22,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.59 | bwd_microstep: 1584.12 | bwd_inner_microstep: 1584.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297
[2024-06-11 03:11:23,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.01 | bwd_microstep: 878.06 | bwd_inner_microstep: 878.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530
[2024-06-11 03:11:25,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.49 | bwd_microstep: 1590.94 | bwd_inner_microstep: 1590.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2391
[2024-06-11 03:11:26,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.04 | bwd_microstep: 935.28 | bwd_inner_microstep: 935.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627
[2024-06-11 03:11:28,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1507.88 | bwd_inner_microstep: 1507.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2088
[2024-06-11 03:11:30,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.13 | bwd_microstep: 1014.30 | bwd_inner_microstep: 1014.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2933
[2024-06-11 03:11:31,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.28 | bwd_microstep: 1187.89 | bwd_inner_microstep: 1187.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2139
[2024-06-11 03:11:33,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.68 | bwd_microstep: 1025.19 | bwd_inner_microstep: 1025.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3615
[2024-06-11 03:11:35,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.44 | bwd_microstep: 1340.63 | bwd_inner_microstep: 1340.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-11 03:11:37,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.56 | bwd_microstep: 1635.13 | bwd_inner_microstep: 1635.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 03:11:39,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1279.89 | bwd_inner_microstep: 1279.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 03:11:41,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1352.77 | bwd_inner_microstep: 1352.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429
[2024-06-11 03:11:42,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.46 | bwd_microstep: 1153.31 | bwd_inner_microstep: 1153.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3628
[2024-06-11 03:11:44,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.55 | bwd_microstep: 1461.75 | bwd_inner_microstep: 1461.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497
[2024-06-11 03:11:46,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.88 | bwd_microstep: 1509.32 | bwd_inner_microstep: 1509.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3641
[2024-06-11 03:11:48,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1312.45 | bwd_inner_microstep: 1312.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3852
[2024-06-11 03:11:52,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.61
[2024-06-11 03:11:52,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.89 | bwd_microstep: 3610.59 | bwd_inner_microstep: 1770.20 | bwd_allreduce_microstep: 1840.34 | step_microstep: 37.77
[2024-06-11 03:11:52,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15951.30 | bwd: 44648.22 | bwd_inner: 42806.98 | bwd_allreduce: 1840.56 | step: 39.19
{'loss': 1.1345, 'learning_rate': 1.406128814547929e-06, 'epoch': 0.88}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-11 03:11:54,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.48 | bwd_microstep: 1341.31 | bwd_inner_microstep: 1341.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3985
[2024-06-11 03:11:56,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.31 | bwd_microstep: 1507.71 | bwd_inner_microstep: 1507.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3913
[2024-06-11 03:11:58,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.97 | bwd_microstep: 1636.04 | bwd_inner_microstep: 1636.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4247
[2024-06-11 03:12:01,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.64 | bwd_microstep: 1761.81 | bwd_inner_microstep: 1761.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1870
[2024-06-11 03:12:02,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.36 | bwd_microstep: 706.92 | bwd_inner_microstep: 706.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-11 03:12:03,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.91 | bwd_microstep: 1148.22 | bwd_inner_microstep: 1148.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 03:12:05,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1399.46 | bwd_inner_microstep: 1399.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948
[2024-06-11 03:12:06,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.21 | bwd_microstep: 728.55 | bwd_inner_microstep: 728.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 03:12:08,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1282.71 | bwd_inner_microstep: 1282.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-11 03:12:09,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.62 | bwd_microstep: 792.14 | bwd_inner_microstep: 792.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-11 03:12:12,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.72 | bwd_microstep: 1624.03 | bwd_inner_microstep: 1624.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877
[2024-06-11 03:12:12,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.21 | bwd_microstep: 679.04 | bwd_inner_microstep: 679.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 03:12:14,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1351.91 | bwd_inner_microstep: 1351.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659
[2024-06-11 03:12:16,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.76 | bwd_microstep: 1441.63 | bwd_inner_microstep: 1441.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-11 03:12:18,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.63 | bwd_microstep: 1485.28 | bwd_inner_microstep: 1485.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 03:12:20,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.40 | bwd_microstep: 1481.11 | bwd_inner_microstep: 1481.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2050
[2024-06-11 03:12:22,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.40 | bwd_microstep: 939.44 | bwd_inner_microstep: 939.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2399
[2024-06-11 03:12:23,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.59 | bwd_microstep: 1057.62 | bwd_inner_microstep: 1057.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 03:12:25,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1381.21 | bwd_inner_microstep: 1381.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009
[2024-06-11 03:12:26,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.03 | bwd_microstep: 739.71 | bwd_inner_microstep: 739.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3459
[2024-06-11 03:12:28,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.68 | bwd_microstep: 1211.52 | bwd_inner_microstep: 1211.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3710
[2024-06-11 03:12:30,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.19 | bwd_microstep: 1333.66 | bwd_inner_microstep: 1333.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-11 03:12:32,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.14 | bwd_microstep: 1549.72 | bwd_inner_microstep: 1549.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-11 03:12:34,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1252.28 | bwd_inner_microstep: 1252.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 03:12:35,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.35 | bwd_microstep: 1391.32 | bwd_inner_microstep: 1391.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674
[2024-06-11 03:12:37,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.71 | bwd_microstep: 1421.47 | bwd_inner_microstep: 1421.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-11 03:12:39,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1450.18 | bwd_inner_microstep: 1450.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3748
[2024-06-11 03:12:41,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.32 | bwd_microstep: 1340.30 | bwd_inner_microstep: 1340.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2057
[2024-06-11 03:12:42,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.66 | bwd_microstep: 850.00 | bwd_inner_microstep: 849.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3819
[2024-06-11 03:12:45,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.20 | bwd_microstep: 1817.20 | bwd_inner_microstep: 1817.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3762
[2024-06-11 03:12:47,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 667.39 | bwd_microstep: 1844.67 | bwd_inner_microstep: 1844.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-11 03:12:51,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.08 | optimizer_step: 6.64
[2024-06-11 03:12:51,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 3146.28 | bwd_inner_microstep: 1568.78 | bwd_allreduce_microstep: 1577.45 | step_microstep: 37.77
[2024-06-11 03:12:51,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15485.47 | bwd: 43094.44 | bwd_inner: 41516.09 | bwd_allreduce: 1577.68 | step: 39.19
{'loss': 1.2023, 'learning_rate': 1.3923365344450002e-06, 'epoch': 0.88}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3457
[2024-06-11 03:12:53,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.75 | bwd_microstep: 1233.70 | bwd_inner_microstep: 1233.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4761
[2024-06-11 03:12:55,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.56 | bwd_microstep: 1779.07 | bwd_inner_microstep: 1779.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015
[2024-06-11 03:12:56,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.37 | bwd_microstep: 801.00 | bwd_inner_microstep: 800.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765
[2024-06-11 03:12:59,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1534.92 | bwd_inner_microstep: 1534.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-11 03:13:01,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.45 | bwd_microstep: 1549.09 | bwd_inner_microstep: 1549.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 03:13:02,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1248.66 | bwd_inner_microstep: 1248.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4022
[2024-06-11 03:13:05,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.23 | bwd_microstep: 1710.64 | bwd_inner_microstep: 1710.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-11 03:13:07,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1488.88 | bwd_inner_microstep: 1488.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-11 03:13:09,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.70 | bwd_microstep: 1293.83 | bwd_inner_microstep: 1293.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-11 03:13:10,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.34 | bwd_microstep: 1278.01 | bwd_inner_microstep: 1277.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3911
[2024-06-11 03:13:13,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.09 | bwd_microstep: 1525.73 | bwd_inner_microstep: 1525.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480
[2024-06-11 03:13:14,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.50 | bwd_microstep: 1311.73 | bwd_inner_microstep: 1311.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 03:13:16,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.48 | bwd_microstep: 1379.26 | bwd_inner_microstep: 1379.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-11 03:13:18,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1345.75 | bwd_inner_microstep: 1345.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410
[2024-06-11 03:13:20,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.80 | bwd_microstep: 1366.67 | bwd_inner_microstep: 1366.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1900
[2024-06-11 03:13:21,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.07 | bwd_microstep: 713.03 | bwd_inner_microstep: 713.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-11 03:13:23,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.64 | bwd_microstep: 1354.36 | bwd_inner_microstep: 1354.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-11 03:13:24,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.61 | bwd_microstep: 1157.46 | bwd_inner_microstep: 1157.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-11 03:13:27,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.33 | bwd_microstep: 1523.08 | bwd_inner_microstep: 1523.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3689
[2024-06-11 03:13:28,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.76 | bwd_microstep: 1326.69 | bwd_inner_microstep: 1326.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 03:13:31,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1554.30 | bwd_inner_microstep: 1554.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-11 03:13:32,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1399.77 | bwd_inner_microstep: 1399.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3622
[2024-06-11 03:13:34,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.37 | bwd_microstep: 1246.02 | bwd_inner_microstep: 1246.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641
[2024-06-11 03:13:36,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1411.84 | bwd_inner_microstep: 1411.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3532
[2024-06-11 03:13:38,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1323.66 | bwd_inner_microstep: 1323.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-11 03:13:40,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.24 | bwd_microstep: 1349.12 | bwd_inner_microstep: 1349.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719
[2024-06-11 03:13:42,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.20 | bwd_microstep: 1625.10 | bwd_inner_microstep: 1625.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3597
[2024-06-11 03:13:44,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.98 | bwd_microstep: 1665.11 | bwd_inner_microstep: 1665.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3608
[2024-06-11 03:13:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.23 | bwd_microstep: 1575.10 | bwd_inner_microstep: 1575.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-11 03:13:49,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1489.76 | bwd_inner_microstep: 1489.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3586
[2024-06-11 03:13:51,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.73 | bwd_microstep: 1670.49 | bwd_inner_microstep: 1670.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3616
[2024-06-11 03:13:56,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.09 | optimizer_step: 6.58
[2024-06-11 03:13:56,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.03 | bwd_microstep: 4510.89 | bwd_inner_microstep: 1874.37 | bwd_allreduce_microstep: 2636.46 | step_microstep: 37.85
[2024-06-11 03:13:56,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16805.32 | bwd: 47742.74 | bwd_inner: 45105.37 | bwd_allreduce: 2636.70 | step: 39.40
{'loss': 1.1702, 'learning_rate': 1.3786097904360563e-06, 'epoch': 0.88}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-11 03:13:58,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.69 | bwd_microstep: 1331.20 | bwd_inner_microstep: 1331.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3948
[2024-06-11 03:14:00,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1497.64 | bwd_inner_microstep: 1497.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 03:14:02,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1275.39 | bwd_inner_microstep: 1275.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871
[2024-06-11 03:14:04,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.71 | bwd_microstep: 1660.86 | bwd_inner_microstep: 1660.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 03:14:06,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.46 | bwd_microstep: 1273.47 | bwd_inner_microstep: 1273.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-11 03:14:08,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1247.45 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-11 03:14:09,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-11 03:14:11,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1413.76 | bwd_inner_microstep: 1413.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 03:14:13,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1386.76 | bwd_inner_microstep: 1386.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1977
[2024-06-11 03:14:14,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.83 | bwd_microstep: 704.86 | bwd_inner_microstep: 704.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3412
[2024-06-11 03:14:16,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.25 | bwd_microstep: 1181.63 | bwd_inner_microstep: 1181.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3918
[2024-06-11 03:14:18,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.94 | bwd_microstep: 1788.54 | bwd_inner_microstep: 1788.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3670
[2024-06-11 03:14:21,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.75 | bwd_microstep: 1715.87 | bwd_inner_microstep: 1715.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397
[2024-06-11 03:14:23,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.43 | bwd_microstep: 1432.91 | bwd_inner_microstep: 1432.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3603
[2024-06-11 03:14:25,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.98 | bwd_microstep: 1428.15 | bwd_inner_microstep: 1428.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3626
[2024-06-11 03:14:27,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.04 | bwd_microstep: 1701.46 | bwd_inner_microstep: 1701.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3507
[2024-06-11 03:14:29,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.83 | bwd_microstep: 1542.94 | bwd_inner_microstep: 1542.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-11 03:14:30,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.04 | bwd_microstep: 793.67 | bwd_inner_microstep: 793.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2590
[2024-06-11 03:14:32,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.72 | bwd_microstep: 1072.12 | bwd_inner_microstep: 1072.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-11 03:14:34,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1491.73 | bwd_inner_microstep: 1491.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3525
[2024-06-11 03:14:36,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.80 | bwd_microstep: 1356.81 | bwd_inner_microstep: 1356.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453
[2024-06-11 03:14:37,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.12 | bwd_microstep: 1159.39 | bwd_inner_microstep: 1159.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 03:14:39,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1373.51 | bwd_inner_microstep: 1373.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-11 03:14:41,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.97 | bwd_microstep: 1395.55 | bwd_inner_microstep: 1395.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 03:14:43,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1280.29 | bwd_inner_microstep: 1280.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601
[2024-06-11 03:14:45,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.69 | bwd_microstep: 1605.34 | bwd_inner_microstep: 1605.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-11 03:14:46,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.83 | bwd_microstep: 973.13 | bwd_inner_microstep: 973.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 03:14:49,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1557.90 | bwd_inner_microstep: 1557.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3598
[2024-06-11 03:14:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.96 | bwd_microstep: 1449.59 | bwd_inner_microstep: 1449.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-11 03:14:53,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.31 | bwd_microstep: 1643.88 | bwd_inner_microstep: 1643.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790
[2024-06-11 03:14:55,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.64 | bwd_microstep: 1639.75 | bwd_inner_microstep: 1639.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819
[2024-06-11 03:14:57,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.05 | optimizer_step: 6.63
[2024-06-11 03:14:57,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.38 | bwd_microstep: 1695.37 | bwd_inner_microstep: 1687.68 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.60
[2024-06-11 03:14:57,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16580.00 | bwd: 44459.47 | bwd_inner: 44450.88 | bwd_allreduce: 7.88 | step: 39.11
{'loss': 1.1656, 'learning_rate': 1.3649486308666314e-06, 'epoch': 0.89}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3394
[2024-06-11 03:14:59,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.82 | bwd_microstep: 1147.32 | bwd_inner_microstep: 1147.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 03:15:01,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1385.08 | bwd_inner_microstep: 1385.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4108
[2024-06-11 03:15:03,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.49 | bwd_microstep: 1733.44 | bwd_inner_microstep: 1733.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466
[2024-06-11 03:15:05,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.24 | bwd_microstep: 1210.27 | bwd_inner_microstep: 1210.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502
[2024-06-11 03:15:07,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1320.62 | bwd_inner_microstep: 1320.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1944
[2024-06-11 03:15:08,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.88 | bwd_microstep: 729.56 | bwd_inner_microstep: 729.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 03:15:10,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.14 | bwd_microstep: 1387.37 | bwd_inner_microstep: 1387.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 03:15:12,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.86 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657
[2024-06-11 03:15:14,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.80 | bwd_microstep: 1424.76 | bwd_inner_microstep: 1424.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 03:15:15,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1387.55 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-11 03:15:17,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1251.16 | bwd_inner_microstep: 1251.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2521
[2024-06-11 03:15:19,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.07 | bwd_microstep: 935.01 | bwd_inner_microstep: 934.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-11 03:15:20,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.79 | bwd_microstep: 797.04 | bwd_inner_microstep: 797.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-11 03:15:22,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.44 | bwd_microstep: 1513.68 | bwd_inner_microstep: 1513.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2150
[2024-06-11 03:15:23,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.16 | bwd_microstep: 1045.25 | bwd_inner_microstep: 1045.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 03:15:25,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.25 | bwd_microstep: 1246.44 | bwd_inner_microstep: 1246.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667
[2024-06-11 03:15:27,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.60 | bwd_microstep: 1419.24 | bwd_inner_microstep: 1419.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626
[2024-06-11 03:15:29,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.99 | bwd_microstep: 1612.49 | bwd_inner_microstep: 1612.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3531
[2024-06-11 03:15:31,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.31 | bwd_microstep: 1586.24 | bwd_inner_microstep: 1586.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2580
[2024-06-11 03:15:33,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.11 | bwd_microstep: 1046.81 | bwd_inner_microstep: 1046.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2432
[2024-06-11 03:15:34,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.70 | bwd_microstep: 879.04 | bwd_inner_microstep: 879.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 03:15:36,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1379.34 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2178
[2024-06-11 03:15:37,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.33 | bwd_microstep: 957.16 | bwd_inner_microstep: 957.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-11 03:15:39,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1532.06 | bwd_inner_microstep: 1532.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-11 03:15:41,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.61 | bwd_microstep: 1419.89 | bwd_inner_microstep: 1419.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3567
[2024-06-11 03:15:43,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.08 | bwd_microstep: 1530.43 | bwd_inner_microstep: 1530.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-11 03:15:45,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1491.57 | bwd_inner_microstep: 1491.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919
[2024-06-11 03:15:46,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.00 | bwd_microstep: 718.61 | bwd_inner_microstep: 718.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3573
[2024-06-11 03:15:49,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.12 | bwd_microstep: 1698.44 | bwd_inner_microstep: 1698.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584
[2024-06-11 03:15:51,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.58 | bwd_microstep: 1501.28 | bwd_inner_microstep: 1501.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-11 03:15:52,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 793.68 | bwd_inner_microstep: 793.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-11 03:16:00,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.58
[2024-06-11 03:16:00,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 7814.44 | bwd_inner_microstep: 1640.50 | bwd_allreduce_microstep: 6173.88 | step_microstep: 37.88
[2024-06-11 03:16:00,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15308.65 | bwd: 47178.40 | bwd_inner: 41003.60 | bwd_allreduce: 6174.11 | step: 39.37
 88%|████████▊ | 1524/1726 [26:33:28<3:27:43, 61.70s/it]
 88%|████████▊ | 1525/1726 [26:34:29<3:25:55, 61.47s/it]
 88%|████████▊ | 1526/1726 [26:35:28<3:22:20, 60.70s/it]
 88%|████████▊ | 1527/1726 [26:36:33<3:25:29, 61.96s/it]
 89%|████████▊ | 1528/1726 [26:37:34<3:23:53, 61.79s/it]
{'loss': 1.1391, 'learning_rate': 1.3513531038512517e-06, 'epoch': 0.89}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-11 03:16:02,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.32 | bwd_microstep: 1472.87 | bwd_inner_microstep: 1472.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 03:16:04,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.62 | bwd_microstep: 1274.89 | bwd_inner_microstep: 1274.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 03:16:06,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1546.11 | bwd_inner_microstep: 1546.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 03:16:08,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.43 | bwd_microstep: 1245.00 | bwd_inner_microstep: 1244.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-11 03:16:10,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.25 | bwd_microstep: 1474.06 | bwd_inner_microstep: 1474.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-11 03:16:12,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1344.24 | bwd_inner_microstep: 1344.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785
[2024-06-11 03:16:14,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.85 | bwd_microstep: 1639.71 | bwd_inner_microstep: 1639.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-11 03:16:15,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.63 | bwd_microstep: 792.04 | bwd_inner_microstep: 792.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 03:16:17,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.82 | bwd_microstep: 1387.75 | bwd_inner_microstep: 1387.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3733
[2024-06-11 03:16:19,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.47 | bwd_microstep: 1561.79 | bwd_inner_microstep: 1561.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3675
[2024-06-11 03:16:22,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.93 | bwd_microstep: 1821.72 | bwd_inner_microstep: 1821.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2326
[2024-06-11 03:16:23,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.43 | bwd_microstep: 919.52 | bwd_inner_microstep: 919.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-11 03:16:25,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1574.01 | bwd_inner_microstep: 1573.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-11 03:16:27,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1482.15 | bwd_inner_microstep: 1482.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-11 03:16:29,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.01 | bwd_microstep: 1484.49 | bwd_inner_microstep: 1484.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-11 03:16:30,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.33 | bwd_microstep: 798.37 | bwd_inner_microstep: 798.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 03:16:32,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.39 | bwd_microstep: 1392.45 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467
[2024-06-11 03:16:34,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.22 | bwd_microstep: 1184.54 | bwd_inner_microstep: 1184.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-11 03:16:36,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1497.77 | bwd_inner_microstep: 1497.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 03:16:38,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1379.33 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-11 03:16:40,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 1409.47 | bwd_inner_microstep: 1409.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3695
[2024-06-11 03:16:42,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.32 | bwd_microstep: 1360.82 | bwd_inner_microstep: 1360.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3609
[2024-06-11 03:16:44,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.62 | bwd_microstep: 1341.54 | bwd_inner_microstep: 1341.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-11 03:16:46,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.45 | bwd_microstep: 1538.47 | bwd_inner_microstep: 1538.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 03:16:48,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1553.47 | bwd_inner_microstep: 1553.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3721
[2024-06-11 03:16:50,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1382.24 | bwd_inner_microstep: 1382.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2246
[2024-06-11 03:16:51,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.71 | bwd_microstep: 1001.57 | bwd_inner_microstep: 1001.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-11 03:16:52,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.16 | bwd_microstep: 809.31 | bwd_inner_microstep: 809.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802
[2024-06-11 03:16:54,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.48 | bwd_microstep: 1549.14 | bwd_inner_microstep: 1549.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2039
[2024-06-11 03:16:56,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.37 | bwd_microstep: 903.81 | bwd_inner_microstep: 903.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-11 03:16:57,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.30 | bwd_microstep: 1336.80 | bwd_inner_microstep: 1336.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-11 03:17:23,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.34 | optimizer_step: 6.59
[2024-06-11 03:17:23,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.45 | bwd_microstep: 25065.52 | bwd_inner_microstep: 1801.84 | bwd_allreduce_microstep: 23263.57 | step_microstep: 40.43
[2024-06-11 03:17:23,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16060.36 | bwd: 66525.00 | bwd_inner: 43260.47 | bwd_allreduce: 23263.84 | step: 41.89
{'loss': 1.2178, 'learning_rate': 1.3378232572732985e-06, 'epoch': 0.89}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3529
[2024-06-11 03:17:25,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.75 | bwd_microstep: 1436.06 | bwd_inner_microstep: 1435.97 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3964
[2024-06-11 03:17:28,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.65 | bwd_microstep: 1688.43 | bwd_inner_microstep: 1688.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2341
[2024-06-11 03:17:29,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.94 | bwd_microstep: 885.65 | bwd_inner_microstep: 885.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-11 03:17:31,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.61 | bwd_microstep: 1444.57 | bwd_inner_microstep: 1444.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2079
[2024-06-11 03:17:32,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.85 | bwd_microstep: 818.34 | bwd_inner_microstep: 818.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 03:17:34,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1352.69 | bwd_inner_microstep: 1352.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 03:17:35,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1245.13 | bwd_inner_microstep: 1245.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435
[2024-06-11 03:17:37,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.14 | bwd_microstep: 1154.28 | bwd_inner_microstep: 1154.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3710
[2024-06-11 03:17:39,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.75 | bwd_microstep: 1555.31 | bwd_inner_microstep: 1555.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-11 03:17:41,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1344.64 | bwd_inner_microstep: 1344.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411
[2024-06-11 03:17:43,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1342.12 | bwd_inner_microstep: 1342.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2107
[2024-06-11 03:17:44,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.36 | bwd_microstep: 1019.52 | bwd_inner_microstep: 1019.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 03:17:46,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-11 03:17:48,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1411.06 | bwd_inner_microstep: 1411.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3626
[2024-06-11 03:17:50,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.83 | bwd_microstep: 1365.58 | bwd_inner_microstep: 1365.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695
[2024-06-11 03:17:52,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.32 | bwd_microstep: 1426.97 | bwd_inner_microstep: 1426.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526
[2024-06-11 03:17:54,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.68 | bwd_microstep: 1199.42 | bwd_inner_microstep: 1199.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533
[2024-06-11 03:17:55,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.60 | bwd_microstep: 1327.44 | bwd_inner_microstep: 1327.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-11 03:17:56,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.31 | bwd_microstep: 799.35 | bwd_inner_microstep: 799.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-11 03:17:59,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.10 | bwd_microstep: 1560.16 | bwd_inner_microstep: 1560.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795
[2024-06-11 03:18:01,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1656.11 | bwd_inner_microstep: 1656.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-11 03:18:03,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1427.16 | bwd_inner_microstep: 1427.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3739
[2024-06-11 03:18:05,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1368.13 | bwd_inner_microstep: 1368.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-11 03:18:06,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.23 | bwd_microstep: 977.67 | bwd_inner_microstep: 977.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 893
[2024-06-11 03:18:07,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.69 | bwd_microstep: 369.62 | bwd_inner_microstep: 369.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-11 03:18:08,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 978.11 | bwd_inner_microstep: 978.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431
[2024-06-11 03:18:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.08 | bwd_microstep: 1406.77 | bwd_inner_microstep: 1406.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-11 03:18:12,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.15 | bwd_microstep: 1432.49 | bwd_inner_microstep: 1432.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-11 03:18:13,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.58 | bwd_microstep: 701.56 | bwd_inner_microstep: 701.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3578
[2024-06-11 03:18:15,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.24 | bwd_microstep: 1542.92 | bwd_inner_microstep: 1542.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 03:18:17,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.95 | bwd_microstep: 1254.16 | bwd_inner_microstep: 1254.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3573
[2024-06-11 03:18:23,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.18 | optimizer_step: 6.62
[2024-06-11 03:18:23,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.54 | bwd_microstep: 5983.39 | bwd_inner_microstep: 1762.37 | bwd_allreduce_microstep: 4220.96 | step_microstep: 39.70
[2024-06-11 03:18:23,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15111.84 | bwd: 44721.88 | bwd_inner: 40499.93 | bwd_allreduce: 4221.25 | step: 41.28
{'loss': 1.1358, 'learning_rate': 1.3243591387848164e-06, 'epoch': 0.89}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 03:18:25,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.14 | bwd_microstep: 1367.30 | bwd_inner_microstep: 1367.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 03:18:27,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.95 | bwd_microstep: 1281.89 | bwd_inner_microstep: 1281.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467
[2024-06-11 03:18:29,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.09 | bwd_microstep: 1436.69 | bwd_inner_microstep: 1436.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 03:18:31,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1382.08 | bwd_inner_microstep: 1382.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407
[2024-06-11 03:18:33,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.22 | bwd_microstep: 1211.82 | bwd_inner_microstep: 1211.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472
[2024-06-11 03:18:35,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.17 | bwd_microstep: 1342.87 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3481
[2024-06-11 03:18:36,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.60 | bwd_microstep: 1330.93 | bwd_inner_microstep: 1330.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867
[2024-06-11 03:18:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.60 | bwd_microstep: 678.03 | bwd_inner_microstep: 678.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-11 03:18:39,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1255.00 | bwd_inner_microstep: 1254.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-11 03:18:41,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1347.39 | bwd_inner_microstep: 1347.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-11 03:18:43,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.10 | bwd_microstep: 1607.21 | bwd_inner_microstep: 1607.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-11 03:18:45,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1342.55 | bwd_inner_microstep: 1342.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2967
[2024-06-11 03:18:47,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.26 | bwd_microstep: 1198.57 | bwd_inner_microstep: 1198.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-11 03:18:49,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.11 | bwd_microstep: 1409.89 | bwd_inner_microstep: 1409.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 03:18:50,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1288.66 | bwd_inner_microstep: 1288.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 03:18:52,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1387.08 | bwd_inner_microstep: 1387.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-11 03:18:54,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1292.10 | bwd_inner_microstep: 1292.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122
[2024-06-11 03:18:55,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.92 | bwd_microstep: 929.78 | bwd_inner_microstep: 929.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 03:18:57,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1393.04 | bwd_inner_microstep: 1393.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-11 03:18:59,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1258.17 | bwd_inner_microstep: 1258.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-11 03:19:01,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1287.75 | bwd_inner_microstep: 1287.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-11 03:19:03,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1506.78 | bwd_inner_microstep: 1506.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-11 03:19:05,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1406.11 | bwd_inner_microstep: 1406.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-11 03:19:06,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.19 | bwd_microstep: 972.60 | bwd_inner_microstep: 972.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-11 03:19:07,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.28 | bwd_microstep: 973.73 | bwd_inner_microstep: 973.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3556
[2024-06-11 03:19:10,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.26 | bwd_microstep: 1526.21 | bwd_inner_microstep: 1526.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 03:19:11,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1378.97 | bwd_inner_microstep: 1378.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2021
[2024-06-11 03:19:13,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.30 | bwd_microstep: 898.50 | bwd_inner_microstep: 898.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-11 03:19:15,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.57 | bwd_microstep: 1451.08 | bwd_inner_microstep: 1451.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1858
[2024-06-11 03:19:16,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.87 | bwd_microstep: 708.24 | bwd_inner_microstep: 708.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-11 03:19:18,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.74 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-11 03:19:25,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-11 03:19:25,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 6521.19 | bwd_inner_microstep: 1643.28 | bwd_allreduce_microstep: 4877.86 | step_microstep: 40.24
[2024-06-11 03:19:25,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15267.29 | bwd: 45779.19 | bwd_inner: 40900.37 | bwd_allreduce: 4878.10 | step: 41.82
{'loss': 1.1787, 'learning_rate': 1.3109607958063641e-06, 'epoch': 0.89}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 03:19:27,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.98 | bwd_microstep: 1371.85 | bwd_inner_microstep: 1371.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3976
[2024-06-11 03:19:29,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.40 | bwd_microstep: 1603.48 | bwd_inner_microstep: 1603.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3959
[2024-06-11 03:19:31,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.00 | bwd_microstep: 1695.83 | bwd_inner_microstep: 1695.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3893
[2024-06-11 03:19:33,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.48 | bwd_microstep: 1583.64 | bwd_inner_microstep: 1583.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 03:19:35,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.69 | bwd_microstep: 1376.26 | bwd_inner_microstep: 1376.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762
[2024-06-11 03:19:37,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.72 | bwd_microstep: 1539.09 | bwd_inner_microstep: 1539.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882
[2024-06-11 03:19:38,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.84 | bwd_microstep: 680.16 | bwd_inner_microstep: 680.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 03:19:40,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.80 | bwd_microstep: 1245.05 | bwd_inner_microstep: 1245.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737
[2024-06-11 03:19:42,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.44 | bwd_microstep: 1534.41 | bwd_inner_microstep: 1534.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-11 03:19:43,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.91 | bwd_microstep: 796.09 | bwd_inner_microstep: 796.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3400
[2024-06-11 03:19:45,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.19 | bwd_microstep: 1290.86 | bwd_inner_microstep: 1290.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3904
[2024-06-11 03:19:47,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.72 | bwd_microstep: 1444.19 | bwd_inner_microstep: 1444.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3714
[2024-06-11 03:19:50,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.09 | bwd_microstep: 1831.24 | bwd_inner_microstep: 1831.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3382
[2024-06-11 03:19:51,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.15 | bwd_microstep: 1144.01 | bwd_inner_microstep: 1143.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 03:19:53,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.47 | bwd_microstep: 1484.09 | bwd_inner_microstep: 1484.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3537
[2024-06-11 03:19:55,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.26 | bwd_microstep: 1325.85 | bwd_inner_microstep: 1325.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 03:19:57,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1283.44 | bwd_inner_microstep: 1283.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 03:19:59,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.70 | bwd_microstep: 1388.47 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 03:20:01,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1386.70 | bwd_inner_microstep: 1386.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 03:20:02,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.63 | bwd_microstep: 1279.73 | bwd_inner_microstep: 1279.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-11 03:20:04,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1285.67 | bwd_inner_microstep: 1285.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3817
[2024-06-11 03:20:06,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.72 | bwd_microstep: 1259.76 | bwd_inner_microstep: 1259.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-11 03:20:08,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1353.02 | bwd_inner_microstep: 1352.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 03:20:10,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.90 | bwd_inner_microstep: 1283.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 03:20:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1495.84 | bwd_inner_microstep: 1495.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 03:20:14,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.55 | bwd_microstep: 1557.38 | bwd_inner_microstep: 1557.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2131
[2024-06-11 03:20:15,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.72 | bwd_microstep: 929.84 | bwd_inner_microstep: 929.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3819
[2024-06-11 03:20:17,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1500.44 | bwd_inner_microstep: 1500.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-11 03:20:19,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1553.27 | bwd_inner_microstep: 1553.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2241
[2024-06-11 03:20:21,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.44 | bwd_microstep: 928.06 | bwd_inner_microstep: 928.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3594
[2024-06-11 03:20:23,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.12 | bwd_microstep: 1671.49 | bwd_inner_microstep: 1671.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2031
[2024-06-11 03:20:29,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.09 | optimizer_step: 6.61
[2024-06-11 03:20:29,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.06 | bwd_microstep: 5583.73 | bwd_inner_microstep: 1035.78 | bwd_allreduce_microstep: 4547.90 | step_microstep: 37.88
[2024-06-11 03:20:29,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16080.83 | bwd: 47686.87 | bwd_inner: 43138.03 | bwd_allreduce: 4548.14 | step: 39.43
{'loss': 1.1558, 'learning_rate': 1.297628275526832e-06, 'epoch': 0.89}
 89%|████████▊ | 1529/1726 [26:38:37<3:23:52, 62.10s/it]
 89%|████████▊ | 1530/1726 [26:40:00<3:43:16, 68.35s/it]
 89%|████████▊ | 1531/1726 [26:41:00<3:34:09, 65.90s/it]
 89%|████████▉ | 1532/1726 [26:42:02<3:28:41, 64.55s/it]
 89%|████████▉ | 1533/1726 [26:43:06<3:27:12, 64.42s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-11 03:20:31,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.51 | bwd_microstep: 1438.63 | bwd_inner_microstep: 1438.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2403
[2024-06-11 03:20:32,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.01 | bwd_microstep: 998.78 | bwd_inner_microstep: 998.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867
[2024-06-11 03:20:34,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.36 | bwd_microstep: 1559.78 | bwd_inner_microstep: 1559.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-11 03:20:36,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.11 | bwd_microstep: 1481.79 | bwd_inner_microstep: 1481.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 03:20:38,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1377.82 | bwd_inner_microstep: 1377.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-11 03:20:39,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.98 | bwd_microstep: 803.68 | bwd_inner_microstep: 803.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-11 03:20:41,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.25 | bwd_microstep: 1150.11 | bwd_inner_microstep: 1150.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-11 03:20:43,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 03:20:45,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1284.82 | bwd_inner_microstep: 1284.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2955
[2024-06-11 03:20:46,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.59 | bwd_microstep: 1167.96 | bwd_inner_microstep: 1167.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3410
[2024-06-11 03:20:48,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.73 | bwd_microstep: 1152.56 | bwd_inner_microstep: 1152.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-11 03:20:50,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1351.58 | bwd_inner_microstep: 1351.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3482
[2024-06-11 03:20:52,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1335.87 | bwd_inner_microstep: 1335.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-11 03:20:54,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.73 | bwd_microstep: 1441.48 | bwd_inner_microstep: 1441.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 03:20:55,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.03 | bwd_microstep: 1291.09 | bwd_inner_microstep: 1291.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3105
[2024-06-11 03:20:57,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.78 | bwd_microstep: 1147.36 | bwd_inner_microstep: 1147.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 03:20:59,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.69 | bwd_microstep: 1285.77 | bwd_inner_microstep: 1285.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-11 03:21:01,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.37 | bwd_microstep: 1507.85 | bwd_inner_microstep: 1507.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450
[2024-06-11 03:21:02,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.29 | bwd_microstep: 1159.37 | bwd_inner_microstep: 1159.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-11 03:21:04,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1375.71 | bwd_inner_microstep: 1375.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461
[2024-06-11 03:21:06,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.69 | bwd_microstep: 1279.41 | bwd_inner_microstep: 1279.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-11 03:21:08,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.65 | bwd_microstep: 1557.61 | bwd_inner_microstep: 1557.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 03:21:10,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.88 | bwd_microstep: 1256.43 | bwd_inner_microstep: 1256.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-11 03:21:12,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1487.21 | bwd_inner_microstep: 1487.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 03:21:14,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.64 | bwd_microstep: 1557.95 | bwd_inner_microstep: 1557.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808
[2024-06-11 03:21:16,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1358.41 | bwd_inner_microstep: 1358.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2205
[2024-06-11 03:21:18,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.04 | bwd_microstep: 990.08 | bwd_inner_microstep: 990.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3728
[2024-06-11 03:21:19,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1367.60 | bwd_inner_microstep: 1367.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3593
[2024-06-11 03:21:21,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.54 | bwd_microstep: 1369.05 | bwd_inner_microstep: 1369.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2899
[2024-06-11 03:21:23,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.18 | bwd_microstep: 1278.69 | bwd_inner_microstep: 1278.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3600
[2024-06-11 03:21:25,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.86 | bwd_microstep: 1739.53 | bwd_inner_microstep: 1739.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3619
[2024-06-11 03:21:31,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.13 | optimizer_step: 6.62
[2024-06-11 03:21:31,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.10 | bwd_microstep: 4740.25 | bwd_inner_microstep: 1777.50 | bwd_allreduce_microstep: 2962.69 | step_microstep: 39.10
[2024-06-11 03:21:31,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15926.13 | bwd: 45647.62 | bwd_inner: 42684.02 | bwd_allreduce: 2962.92 | step: 40.70
{'loss': 1.2138, 'learning_rate': 1.2843616249032874e-06, 'epoch': 0.89}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5199
[2024-06-11 03:21:34,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 760.52 | bwd_microstep: 2014.23 | bwd_inner_microstep: 2014.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3976
[2024-06-11 03:21:36,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 1402.89 | bwd_inner_microstep: 1402.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3896
[2024-06-11 03:21:38,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.56 | bwd_microstep: 1611.95 | bwd_inner_microstep: 1611.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-11 03:21:40,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.18 | bwd_microstep: 1649.15 | bwd_inner_microstep: 1649.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 03:21:42,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1281.80 | bwd_inner_microstep: 1281.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3481
[2024-06-11 03:21:44,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.05 | bwd_microstep: 1410.94 | bwd_inner_microstep: 1410.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 03:21:46,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.53 | bwd_microstep: 1283.74 | bwd_inner_microstep: 1283.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-11 03:21:48,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.93 | bwd_microstep: 1525.18 | bwd_inner_microstep: 1525.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451
[2024-06-11 03:21:49,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.86 | bwd_microstep: 1257.22 | bwd_inner_microstep: 1257.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3641
[2024-06-11 03:21:51,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.85 | bwd_microstep: 1350.91 | bwd_inner_microstep: 1350.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975
[2024-06-11 03:21:52,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.74 | bwd_microstep: 829.91 | bwd_inner_microstep: 829.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3496
[2024-06-11 03:21:54,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1346.48 | bwd_inner_microstep: 1346.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-11 03:21:56,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609
[2024-06-11 03:21:58,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.32 | bwd_microstep: 1609.58 | bwd_inner_microstep: 1609.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407
[2024-06-11 03:22:00,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.52 | bwd_microstep: 1214.26 | bwd_inner_microstep: 1214.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3526
[2024-06-11 03:22:02,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1439.84 | bwd_inner_microstep: 1439.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991
[2024-06-11 03:22:03,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.14 | bwd_microstep: 801.83 | bwd_inner_microstep: 801.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007
[2024-06-11 03:22:04,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.28 | bwd_microstep: 803.48 | bwd_inner_microstep: 803.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-11 03:22:06,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.73 | bwd_microstep: 1585.06 | bwd_inner_microstep: 1585.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-11 03:22:09,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.79 | bwd_microstep: 1522.17 | bwd_inner_microstep: 1522.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820
[2024-06-11 03:22:10,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.98 | bwd_microstep: 1355.20 | bwd_inner_microstep: 1355.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-11 03:22:13,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.32 | bwd_microstep: 1560.01 | bwd_inner_microstep: 1559.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-11 03:22:14,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1282.88 | bwd_inner_microstep: 1282.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-11 03:22:16,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1414.59 | bwd_inner_microstep: 1414.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605
[2024-06-11 03:22:18,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1506.03 | bwd_inner_microstep: 1506.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456
[2024-06-11 03:22:20,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.94 | bwd_microstep: 1545.90 | bwd_inner_microstep: 1545.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2028
[2024-06-11 03:22:22,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.05 | bwd_microstep: 836.50 | bwd_inner_microstep: 836.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3644
[2024-06-11 03:22:24,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.36 | bwd_microstep: 1575.74 | bwd_inner_microstep: 1575.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-11 03:22:26,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.38 | bwd_microstep: 1589.68 | bwd_inner_microstep: 1589.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808
[2024-06-11 03:22:28,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.54 | bwd_microstep: 1384.67 | bwd_inner_microstep: 1384.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3594
[2024-06-11 03:22:30,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.38 | bwd_microstep: 1675.91 | bwd_inner_microstep: 1675.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809
[2024-06-11 03:22:33,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.09 | optimizer_step: 6.63
[2024-06-11 03:22:33,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.98 | bwd_microstep: 1790.67 | bwd_inner_microstep: 1782.61 | bwd_allreduce_microstep: 8.00 | step_microstep: 38.48
[2024-06-11 03:22:33,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16707.26 | bwd: 44802.40 | bwd_inner: 44793.49 | bwd_allreduce: 8.23 | step: 40.07
{'loss': 1.1549, 'learning_rate': 1.2711608906608098e-06, 'epoch': 0.89}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-11 03:22:35,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.24 | bwd_microstep: 1477.97 | bwd_inner_microstep: 1477.77 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876
[2024-06-11 03:22:36,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.55 | bwd_microstep: 680.30 | bwd_inner_microstep: 680.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4299
[2024-06-11 03:22:38,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.34 | bwd_microstep: 1546.22 | bwd_inner_microstep: 1546.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3850
[2024-06-11 03:22:40,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.86 | bwd_microstep: 1660.63 | bwd_inner_microstep: 1660.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480
[2024-06-11 03:22:42,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.59 | bwd_microstep: 1186.21 | bwd_inner_microstep: 1186.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 03:22:43,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.66 | bwd_microstep: 1248.12 | bwd_inner_microstep: 1248.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-11 03:22:46,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.28 | bwd_microstep: 1539.38 | bwd_inner_microstep: 1539.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2376
[2024-06-11 03:22:47,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.12 | bwd_microstep: 1028.59 | bwd_inner_microstep: 1028.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3405
[2024-06-11 03:22:49,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.04 | bwd_microstep: 1402.40 | bwd_inner_microstep: 1402.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-11 03:22:51,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.35 | bwd_microstep: 1348.46 | bwd_inner_microstep: 1348.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 03:22:53,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1389.59 | bwd_inner_microstep: 1389.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3917
[2024-06-11 03:22:55,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.48 | bwd_microstep: 1603.26 | bwd_inner_microstep: 1603.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637
[2024-06-11 03:22:57,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.58 | bwd_microstep: 1607.15 | bwd_inner_microstep: 1607.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3415
[2024-06-11 03:22:59,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.96 | bwd_microstep: 1376.08 | bwd_inner_microstep: 1376.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3675
[2024-06-11 03:23:01,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.28 | bwd_microstep: 1726.09 | bwd_inner_microstep: 1726.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 03:23:03,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.73 | bwd_microstep: 1391.80 | bwd_inner_microstep: 1391.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3531
[2024-06-11 03:23:06,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.35 | bwd_microstep: 1592.29 | bwd_inner_microstep: 1592.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-11 03:23:07,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.51 | bwd_microstep: 788.77 | bwd_inner_microstep: 788.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 03:23:09,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1371.60 | bwd_inner_microstep: 1371.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3501
[2024-06-11 03:23:11,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.44 | bwd_microstep: 1630.46 | bwd_inner_microstep: 1630.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 03:23:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.92 | bwd_microstep: 1380.75 | bwd_inner_microstep: 1380.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-11 03:23:15,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1484.48 | bwd_inner_microstep: 1484.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3534
[2024-06-11 03:23:17,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.31 | bwd_microstep: 1558.00 | bwd_inner_microstep: 1557.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3460
[2024-06-11 03:23:19,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 1325.16 | bwd_inner_microstep: 1325.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-11 03:23:21,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.93 | bwd_microstep: 1647.05 | bwd_inner_microstep: 1647.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-11 03:23:23,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1518.26 | bwd_inner_microstep: 1518.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3810
[2024-06-11 03:23:25,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.50 | bwd_microstep: 1514.17 | bwd_inner_microstep: 1514.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-11 03:23:27,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1453.27 | bwd_inner_microstep: 1453.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724
[2024-06-11 03:23:29,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.70 | bwd_microstep: 1440.04 | bwd_inner_microstep: 1440.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-11 03:23:31,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1495.79 | bwd_inner_microstep: 1495.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558
[2024-06-11 03:23:33,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.35 | bwd_microstep: 1402.84 | bwd_inner_microstep: 1402.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-11 03:23:35,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.02 | optimizer_step: 6.62
[2024-06-11 03:23:35,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.01 | bwd_microstep: 1294.50 | bwd_inner_microstep: 1286.21 | bwd_allreduce_microstep: 8.24 | step_microstep: 37.58
[2024-06-11 03:23:35,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16841.03 | bwd: 45109.74 | bwd_inner: 45100.46 | bwd_allreduce: 8.55 | step: 39.24
{'loss': 1.16, 'learning_rate': 1.2580261192923126e-06, 'epoch': 0.89}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1887
[2024-06-11 03:23:36,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.57 | bwd_microstep: 837.37 | bwd_inner_microstep: 837.24 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3976
[2024-06-11 03:23:38,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.84 | bwd_microstep: 1604.83 | bwd_inner_microstep: 1604.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 03:23:40,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1247.78 | bwd_inner_microstep: 1247.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-11 03:23:41,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.16 | bwd_microstep: 695.01 | bwd_inner_microstep: 694.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-11 03:23:43,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.05 | bwd_microstep: 1648.31 | bwd_inner_microstep: 1648.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 03:23:45,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1280.64 | bwd_inner_microstep: 1280.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 03:23:47,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.95 | bwd_microstep: 1282.16 | bwd_inner_microstep: 1282.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705
[2024-06-11 03:23:49,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1525.57 | bwd_inner_microstep: 1525.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 03:23:51,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.84 | bwd_microstep: 1289.87 | bwd_inner_microstep: 1289.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-11 03:23:53,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.49 | bwd_microstep: 1278.76 | bwd_inner_microstep: 1278.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 03:23:54,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.79 | bwd_microstep: 1373.47 | bwd_inner_microstep: 1373.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3504
[2024-06-11 03:23:57,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.35 | bwd_microstep: 1576.06 | bwd_inner_microstep: 1576.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508
[2024-06-11 03:23:59,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1410.04 | bwd_inner_microstep: 1410.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3510
[2024-06-11 03:24:01,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.11 | bwd_microstep: 1555.76 | bwd_inner_microstep: 1555.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-11 03:24:03,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.32 | bwd_microstep: 1490.76 | bwd_inner_microstep: 1490.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3661
[2024-06-11 03:24:05,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1385.24 | bwd_inner_microstep: 1385.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 03:24:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.69 | bwd_microstep: 1385.98 | bwd_inner_microstep: 1385.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3528
[2024-06-11 03:24:08,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.19 | bwd_microstep: 1325.63 | bwd_inner_microstep: 1325.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2010
[2024-06-11 03:24:09,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.45 | bwd_microstep: 741.95 | bwd_inner_microstep: 741.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-11 03:24:11,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.79 | bwd_microstep: 1461.31 | bwd_inner_microstep: 1461.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-11 03:24:14,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 1510.71 | bwd_inner_microstep: 1510.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 03:24:15,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.60 | bwd_microstep: 1386.77 | bwd_inner_microstep: 1386.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-11 03:24:17,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1389.78 | bwd_inner_microstep: 1389.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 03:24:19,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.89 | bwd_microstep: 1284.73 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1940
[2024-06-11 03:24:20,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.99 | bwd_microstep: 767.31 | bwd_inner_microstep: 767.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 03:24:22,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1554.82 | bwd_inner_microstep: 1554.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-11 03:24:25,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.31 | bwd_microstep: 1580.31 | bwd_inner_microstep: 1580.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-11 03:24:27,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.65 | bwd_microstep: 1603.49 | bwd_inner_microstep: 1603.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3776
[2024-06-11 03:24:29,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1592.29 | bwd_inner_microstep: 1592.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2698
[2024-06-11 03:24:31,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.49 | bwd_microstep: 1229.01 | bwd_inner_microstep: 1228.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-11 03:24:33,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.60 | bwd_microstep: 1594.41 | bwd_inner_microstep: 1594.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3006
[2024-06-11 03:24:38,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-11 03:24:38,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.15 | bwd_microstep: 4795.08 | bwd_inner_microstep: 1401.91 | bwd_allreduce_microstep: 3393.08 | step_microstep: 40.55
[2024-06-11 03:24:38,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16131.20 | bwd: 46685.25 | bwd_inner: 43291.11 | bwd_allreduce: 3393.39 | step: 42.22
{'loss': 1.1929, 'learning_rate': 1.244957357058394e-06, 'epoch': 0.89}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 03:24:40,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.68 | bwd_microstep: 1375.13 | bwd_inner_microstep: 1375.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3947
[2024-06-11 03:24:42,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.96 | bwd_microstep: 1455.71 | bwd_inner_microstep: 1455.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 03:24:44,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1238.93 | bwd_inner_microstep: 1238.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-11 03:24:46,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.13 | bwd_microstep: 1542.84 | bwd_inner_microstep: 1542.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766
[2024-06-11 03:24:48,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.11 | bwd_microstep: 1640.38 | bwd_inner_microstep: 1640.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499
[2024-06-11 03:24:50,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.27 | bwd_microstep: 1190.79 | bwd_inner_microstep: 1190.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 03:24:52,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.79 | bwd_microstep: 1251.29 | bwd_inner_microstep: 1251.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-11 03:24:53,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.71 | bwd_microstep: 1289.06 | bwd_inner_microstep: 1289.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3693
[2024-06-11 03:24:55,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1388.57 | bwd_inner_microstep: 1388.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3553
[2024-06-11 03:24:57,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1427.40 | bwd_inner_microstep: 1427.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-11 03:24:59,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.08 | bwd_microstep: 1515.26 | bwd_inner_microstep: 1515.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-11 03:25:01,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.44 | bwd_microstep: 1379.99 | bwd_inner_microstep: 1379.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500
[2024-06-11 03:25:03,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.39 | bwd_microstep: 1584.80 | bwd_inner_microstep: 1584.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497
[2024-06-11 03:25:06,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.07 | bwd_microstep: 1583.33 | bwd_inner_microstep: 1583.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512
[2024-06-11 03:25:08,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.84 | bwd_microstep: 1413.06 | bwd_inner_microstep: 1413.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982
[2024-06-11 03:25:09,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.30 | bwd_microstep: 800.88 | bwd_inner_microstep: 800.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843
[2024-06-11 03:25:11,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.49 | bwd_microstep: 1663.11 | bwd_inner_microstep: 1663.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835
[2024-06-11 03:25:13,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.74 | bwd_microstep: 1465.43 | bwd_inner_microstep: 1465.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2125
[2024-06-11 03:25:14,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.40 | bwd_microstep: 767.69 | bwd_inner_microstep: 767.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 03:25:16,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.06 | bwd_microstep: 1655.68 | bwd_inner_microstep: 1655.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-11 03:25:18,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.21 | bwd_microstep: 1160.93 | bwd_inner_microstep: 1160.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2110
[2024-06-11 03:25:19,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.31 | bwd_microstep: 857.71 | bwd_inner_microstep: 857.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3722
[2024-06-11 03:25:21,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1242.41 | bwd_inner_microstep: 1242.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3608
[2024-06-11 03:25:23,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.87 | bwd_microstep: 1440.00 | bwd_inner_microstep: 1439.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3549
[2024-06-11 03:25:25,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.51 | bwd_microstep: 1236.76 | bwd_inner_microstep: 1236.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-11 03:25:26,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 1399.32 | bwd_inner_microstep: 1399.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3812
[2024-06-11 03:25:29,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1500.96 | bwd_inner_microstep: 1500.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599
[2024-06-11 03:25:31,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.98 | bwd_microstep: 1599.76 | bwd_inner_microstep: 1599.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751
[2024-06-11 03:25:33,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.94 | bwd_microstep: 1535.45 | bwd_inner_microstep: 1535.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 03:25:35,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.58 | bwd_microstep: 1649.14 | bwd_inner_microstep: 1649.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570
[2024-06-11 03:25:37,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.21 | bwd_microstep: 1600.03 | bwd_inner_microstep: 1600.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2198
[2024-06-11 03:25:40,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.10 | optimizer_step: 6.60
[2024-06-11 03:25:40,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.12 | bwd_microstep: 2577.27 | bwd_inner_microstep: 1089.83 | bwd_allreduce_microstep: 1487.38 | step_microstep: 38.63
[2024-06-11 03:25:40,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16374.46 | bwd: 45429.09 | bwd_inner: 43940.80 | bwd_allreduce: 1487.61 | step: 40.37
{'loss': 1.1995, 'learning_rate': 1.2319546499871616e-06, 'epoch': 0.89}
 89%|████████▉ | 1534/1726 [26:44:08<3:23:44, 63.67s/it]
 89%|████████▉ | 1535/1726 [26:45:09<3:20:56, 63.12s/it]
 89%|████████▉ | 1536/1726 [26:46:12<3:19:06, 62.88s/it]
 89%|████████▉ | 1537/1726 [26:47:15<3:18:20, 62.97s/it]
 89%|████████▉ | 1538/1726 [26:48:17<3:16:32, 62.73s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-11 03:25:41,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.39 | bwd_microstep: 781.81 | bwd_inner_microstep: 781.74 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3984
[2024-06-11 03:25:44,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.36 | bwd_microstep: 1801.59 | bwd_inner_microstep: 1801.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3907
[2024-06-11 03:25:46,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1655.41 | bwd_inner_microstep: 1655.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-11 03:25:48,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1351.18 | bwd_inner_microstep: 1351.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842
[2024-06-11 03:25:50,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.40 | bwd_microstep: 1560.50 | bwd_inner_microstep: 1560.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-11 03:25:52,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.09 | bwd_microstep: 1299.09 | bwd_inner_microstep: 1299.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-11 03:25:54,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.03 | bwd_microstep: 1214.49 | bwd_inner_microstep: 1214.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 03:25:56,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-11 03:25:57,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.30 | bwd_microstep: 1310.91 | bwd_inner_microstep: 1310.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489
[2024-06-11 03:25:59,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.73 | bwd_microstep: 1187.23 | bwd_inner_microstep: 1187.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 03:26:01,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.95 | bwd_microstep: 1286.38 | bwd_inner_microstep: 1286.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2117
[2024-06-11 03:26:02,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.87 | bwd_microstep: 734.41 | bwd_inner_microstep: 734.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-11 03:26:04,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1349.31 | bwd_inner_microstep: 1349.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-11 03:26:05,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.97 | bwd_microstep: 1245.25 | bwd_inner_microstep: 1245.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3384
[2024-06-11 03:26:07,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.67 | bwd_microstep: 1366.34 | bwd_inner_microstep: 1366.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 03:26:09,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1341.40 | bwd_inner_microstep: 1341.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593
[2024-06-11 03:26:11,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.58 | bwd_microstep: 1311.96 | bwd_inner_microstep: 1311.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3639
[2024-06-11 03:26:13,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.50 | bwd_microstep: 1563.40 | bwd_inner_microstep: 1563.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3835
[2024-06-11 03:26:15,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.94 | bwd_microstep: 1585.28 | bwd_inner_microstep: 1585.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 03:26:17,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1399.58 | bwd_inner_microstep: 1399.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-11 03:26:19,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.96 | bwd_microstep: 1460.97 | bwd_inner_microstep: 1460.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-11 03:26:21,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.00 | bwd_microstep: 1290.54 | bwd_inner_microstep: 1290.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064
[2024-06-11 03:26:22,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.38 | bwd_microstep: 914.99 | bwd_inner_microstep: 914.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-11 03:26:24,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1295.74 | bwd_inner_microstep: 1295.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555
[2024-06-11 03:26:26,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.15 | bwd_microstep: 1497.67 | bwd_inner_microstep: 1497.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818
[2024-06-11 03:26:28,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.91 | bwd_microstep: 1586.30 | bwd_inner_microstep: 1586.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3556
[2024-06-11 03:26:30,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1458.59 | bwd_inner_microstep: 1458.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3652
[2024-06-11 03:26:32,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1386.90 | bwd_inner_microstep: 1386.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3613
[2024-06-11 03:26:34,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.06 | bwd_microstep: 1543.07 | bwd_inner_microstep: 1543.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2185
[2024-06-11 03:26:36,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.93 | bwd_microstep: 956.10 | bwd_inner_microstep: 956.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3752
[2024-06-11 03:26:38,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.46 | bwd_microstep: 1437.39 | bwd_inner_microstep: 1437.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-11 03:26:44,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.31 | optimizer_step: 6.60
[2024-06-11 03:26:44,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 5166.98 | bwd_inner_microstep: 1621.65 | bwd_allreduce_microstep: 3545.22 | step_microstep: 40.45
[2024-06-11 03:26:44,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16122.43 | bwd: 46725.68 | bwd_inner: 43179.41 | bwd_allreduce: 3545.51 | step: 42.17
{'loss': 1.1667, 'learning_rate': 1.2190180438740895e-06, 'epoch': 0.89}
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3483
[2024-06-11 03:26:46,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.63 | bwd_microstep: 1525.19 | bwd_inner_microstep: 1524.99 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.19
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4108
[2024-06-11 03:26:48,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.30 | bwd_microstep: 1627.87 | bwd_inner_microstep: 1627.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3883
[2024-06-11 03:26:50,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1383.18 | bwd_inner_microstep: 1383.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-11 03:26:52,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.21 | bwd_microstep: 1392.90 | bwd_inner_microstep: 1392.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 03:26:54,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1488.37 | bwd_inner_microstep: 1488.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783
[2024-06-11 03:26:56,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.96 | bwd_microstep: 1546.97 | bwd_inner_microstep: 1546.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1991
[2024-06-11 03:26:57,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.38 | bwd_microstep: 835.08 | bwd_inner_microstep: 835.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1888
[2024-06-11 03:26:58,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.60 | bwd_microstep: 681.29 | bwd_inner_microstep: 681.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 03:27:00,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.51 | bwd_microstep: 1485.80 | bwd_inner_microstep: 1485.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-11 03:27:02,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.52 | bwd_microstep: 1505.33 | bwd_inner_microstep: 1505.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2006
[2024-06-11 03:27:03,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.69 | bwd_microstep: 774.82 | bwd_inner_microstep: 774.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-11 03:27:05,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.35 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2485
[2024-06-11 03:27:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.08 | bwd_microstep: 1143.91 | bwd_inner_microstep: 1143.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-11 03:27:08,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.05 | bwd_microstep: 1346.20 | bwd_inner_microstep: 1346.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3397
[2024-06-11 03:27:10,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1306.27 | bwd_inner_microstep: 1306.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587
[2024-06-11 03:27:12,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1338.32 | bwd_inner_microstep: 1338.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1997
[2024-06-11 03:27:13,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.23 | bwd_microstep: 897.76 | bwd_inner_microstep: 897.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4024
[2024-06-11 03:27:15,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.64 | bwd_microstep: 1418.92 | bwd_inner_microstep: 1418.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2125
[2024-06-11 03:27:17,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.44 | bwd_microstep: 959.74 | bwd_inner_microstep: 959.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-11 03:27:18,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.52 | bwd_microstep: 1163.44 | bwd_inner_microstep: 1163.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 03:27:20,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3626
[2024-06-11 03:27:22,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.45 | bwd_microstep: 1219.54 | bwd_inner_microstep: 1219.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-11 03:27:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.82 | bwd_microstep: 1163.11 | bwd_inner_microstep: 1163.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-11 03:27:25,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.85 | bwd_microstep: 1302.25 | bwd_inner_microstep: 1302.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-11 03:27:27,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.78 | bwd_microstep: 1563.45 | bwd_inner_microstep: 1563.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3834
[2024-06-11 03:27:29,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.26 | bwd_microstep: 1490.04 | bwd_inner_microstep: 1490.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483
[2024-06-11 03:27:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.95 | bwd_microstep: 1217.70 | bwd_inner_microstep: 1217.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 03:27:33,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.04 | bwd_microstep: 1557.07 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-11 03:27:35,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.74 | bwd_microstep: 1527.05 | bwd_inner_microstep: 1527.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2131
[2024-06-11 03:27:36,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.28 | bwd_microstep: 836.25 | bwd_inner_microstep: 836.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-11 03:27:38,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.44 | bwd_microstep: 1467.63 | bwd_inner_microstep: 1467.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-11 03:27:45,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.26 | optimizer_step: 6.61
[2024-06-11 03:27:45,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.85 | bwd_microstep: 6341.49 | bwd_inner_microstep: 1870.93 | bwd_allreduce_microstep: 4470.50 | step_microstep: 39.55
[2024-06-11 03:27:45,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15472.25 | bwd: 46076.08 | bwd_inner: 41604.51 | bwd_allreduce: 4470.82 | step: 41.38
{'loss': 1.1129, 'learning_rate': 1.2061475842818337e-06, 'epoch': 0.89}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1911
[2024-06-11 03:27:47,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.73 | bwd_microstep: 865.58 | bwd_inner_microstep: 865.47 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2913
[2024-06-11 03:27:48,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.13 | bwd_microstep: 1085.85 | bwd_inner_microstep: 1085.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3848
[2024-06-11 03:27:50,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.35 | bwd_microstep: 1490.89 | bwd_inner_microstep: 1490.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4116
[2024-06-11 03:27:53,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.76 | bwd_microstep: 1734.38 | bwd_inner_microstep: 1734.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3762
[2024-06-11 03:27:55,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.00 | bwd_microstep: 1566.26 | bwd_inner_microstep: 1566.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 03:27:56,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1247.67 | bwd_inner_microstep: 1247.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 03:27:58,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.43 | bwd_microstep: 1276.81 | bwd_inner_microstep: 1276.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705
[2024-06-11 03:28:00,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.79 | bwd_microstep: 1624.10 | bwd_inner_microstep: 1624.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3419
[2024-06-11 03:28:02,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.48 | bwd_microstep: 1296.62 | bwd_inner_microstep: 1296.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2178
[2024-06-11 03:28:03,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.80 | bwd_microstep: 885.68 | bwd_inner_microstep: 885.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-11 03:28:05,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1354.20 | bwd_inner_microstep: 1354.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 03:28:07,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.39 | bwd_microstep: 1382.48 | bwd_inner_microstep: 1382.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2005
[2024-06-11 03:28:08,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.62 | bwd_microstep: 830.41 | bwd_inner_microstep: 830.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-11 03:28:10,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.08 | bwd_microstep: 1488.56 | bwd_inner_microstep: 1488.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3988
[2024-06-11 03:28:13,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.04 | bwd_microstep: 1700.83 | bwd_inner_microstep: 1700.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-11 03:28:15,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1482.84 | bwd_inner_microstep: 1482.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3416
[2024-06-11 03:28:17,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.25 | bwd_microstep: 1471.58 | bwd_inner_microstep: 1471.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-11 03:28:19,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1511.16 | bwd_inner_microstep: 1511.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3469
[2024-06-11 03:28:21,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 1428.05 | bwd_inner_microstep: 1428.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-11 03:28:23,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1416.82 | bwd_inner_microstep: 1416.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-11 03:28:25,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.19 | bwd_microstep: 1390.73 | bwd_inner_microstep: 1390.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3713
[2024-06-11 03:28:27,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.49 | bwd_microstep: 1461.65 | bwd_inner_microstep: 1461.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-11 03:28:28,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.84 | bwd_microstep: 802.25 | bwd_inner_microstep: 802.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 03:28:30,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.67 | bwd_microstep: 1554.71 | bwd_inner_microstep: 1554.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-11 03:28:32,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1402.29 | bwd_inner_microstep: 1402.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-11 03:28:33,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.34 | bwd_microstep: 972.86 | bwd_inner_microstep: 972.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-11 03:28:35,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.08 | bwd_microstep: 1216.29 | bwd_inner_microstep: 1216.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-11 03:28:37,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.35 | bwd_microstep: 1493.88 | bwd_inner_microstep: 1493.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2352
[2024-06-11 03:28:38,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.49 | bwd_microstep: 892.46 | bwd_inner_microstep: 892.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3559
[2024-06-11 03:28:40,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.36 | bwd_microstep: 1233.07 | bwd_inner_microstep: 1233.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1990
[2024-06-11 03:28:41,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.50 | bwd_microstep: 736.60 | bwd_inner_microstep: 736.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3736
[2024-06-11 03:28:48,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.32 | optimizer_step: 6.60
[2024-06-11 03:28:48,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.19 | bwd_microstep: 5831.18 | bwd_inner_microstep: 1619.94 | bwd_allreduce_microstep: 4211.19 | step_microstep: 39.31
[2024-06-11 03:28:48,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15628.90 | bwd: 46128.75 | bwd_inner: 41916.56 | bwd_allreduce: 4211.47 | step: 40.83
{'loss': 1.176, 'learning_rate': 1.1933433165400854e-06, 'epoch': 0.89}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 03:28:49,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.58 | bwd_microstep: 1334.86 | bwd_inner_microstep: 1334.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 03:28:51,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1381.62 | bwd_inner_microstep: 1381.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 03:28:53,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.67 | bwd_microstep: 1548.17 | bwd_inner_microstep: 1548.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-11 03:28:56,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.50 | bwd_microstep: 1550.22 | bwd_inner_microstep: 1550.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 03:28:57,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.09 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728
[2024-06-11 03:28:59,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.93 | bwd_microstep: 1437.48 | bwd_inner_microstep: 1437.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 03:29:01,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1388.71 | bwd_inner_microstep: 1388.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-11 03:29:03,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.28 | bwd_microstep: 1289.70 | bwd_inner_microstep: 1289.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-11 03:29:05,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.05 | bwd_microstep: 1533.26 | bwd_inner_microstep: 1533.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3408
[2024-06-11 03:29:07,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.34 | bwd_microstep: 1325.17 | bwd_inner_microstep: 1325.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689
[2024-06-11 03:29:09,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.07 | bwd_microstep: 1615.89 | bwd_inner_microstep: 1615.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-11 03:29:11,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.07 | bwd_microstep: 1447.53 | bwd_inner_microstep: 1447.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511
[2024-06-11 03:29:13,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.87 | bwd_microstep: 1585.70 | bwd_inner_microstep: 1585.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2087
[2024-06-11 03:29:15,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.28 | bwd_microstep: 1014.80 | bwd_inner_microstep: 1014.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3651
[2024-06-11 03:29:17,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1382.37 | bwd_inner_microstep: 1382.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606
[2024-06-11 03:29:18,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1311.24 | bwd_inner_microstep: 1311.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-11 03:29:21,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.17 | bwd_microstep: 1609.22 | bwd_inner_microstep: 1609.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623
[2024-06-11 03:29:23,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1612.64 | bwd_inner_microstep: 1612.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 03:29:25,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1257.34 | bwd_inner_microstep: 1257.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557
[2024-06-11 03:29:27,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.73 | bwd_microstep: 1428.82 | bwd_inner_microstep: 1428.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2435
[2024-06-11 03:29:28,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.91 | bwd_microstep: 854.72 | bwd_inner_microstep: 854.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3527
[2024-06-11 03:29:30,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1229.73 | bwd_inner_microstep: 1229.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725
[2024-06-11 03:29:32,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 1442.91 | bwd_inner_microstep: 1442.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3808
[2024-06-11 03:29:33,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.65 | bwd_microstep: 1291.08 | bwd_inner_microstep: 1291.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709
[2024-06-11 03:29:35,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.07 | bwd_microstep: 1436.86 | bwd_inner_microstep: 1436.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-11 03:29:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1404.96 | bwd_inner_microstep: 1404.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544
[2024-06-11 03:29:39,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1297.65 | bwd_inner_microstep: 1297.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-11 03:29:41,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.05 | bwd_microstep: 1475.61 | bwd_inner_microstep: 1475.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-11 03:29:43,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.43 | bwd_microstep: 1357.10 | bwd_inner_microstep: 1357.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 2957
[2024-06-11 03:29:45,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.77 | bwd_microstep: 1362.45 | bwd_inner_microstep: 1362.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3608
[2024-06-11 03:29:47,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1572.28 | bwd_inner_microstep: 1572.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2953
[2024-06-11 03:29:49,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.05 | optimizer_step: 6.62
[2024-06-11 03:29:49,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.02 | bwd_microstep: 1383.49 | bwd_inner_microstep: 1322.79 | bwd_allreduce_microstep: 60.65 | step_microstep: 38.32
[2024-06-11 03:29:49,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16579.07 | bwd: 44446.84 | bwd_inner: 44385.29 | bwd_allreduce: 60.88 | step: 39.83
{'loss': 1.145, 'learning_rate': 1.1806052857454087e-06, 'epoch': 0.89}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-11 03:29:51,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.07 | bwd_microstep: 1279.50 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4154
[2024-06-11 03:29:53,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.89 | bwd_microstep: 1650.20 | bwd_inner_microstep: 1650.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 03:29:55,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1247.87 | bwd_inner_microstep: 1247.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-11 03:29:57,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1493.52 | bwd_inner_microstep: 1493.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.39
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-11 03:29:59,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.34 | bwd_microstep: 1479.00 | bwd_inner_microstep: 1478.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2217
[2024-06-11 03:30:00,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.99 | bwd_microstep: 893.12 | bwd_inner_microstep: 893.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 03:30:02,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1388.02 | bwd_inner_microstep: 1388.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 03:30:04,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1398.86 | bwd_inner_microstep: 1398.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-11 03:30:06,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.24 | bwd_microstep: 1438.26 | bwd_inner_microstep: 1438.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1990
[2024-06-11 03:30:07,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.15 | bwd_microstep: 901.97 | bwd_inner_microstep: 901.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 03:30:09,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1382.41 | bwd_inner_microstep: 1382.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379
[2024-06-11 03:30:11,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.65 | bwd_microstep: 1241.86 | bwd_inner_microstep: 1241.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3414
[2024-06-11 03:30:12,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.08 | bwd_microstep: 1186.70 | bwd_inner_microstep: 1186.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-11 03:30:14,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.72 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3398
[2024-06-11 03:30:16,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.63 | bwd_microstep: 1407.71 | bwd_inner_microstep: 1407.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-11 03:30:18,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1344.19 | bwd_inner_microstep: 1344.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3593
[2024-06-11 03:30:20,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.50 | bwd_microstep: 1212.41 | bwd_inner_microstep: 1212.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3978
[2024-06-11 03:30:22,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.12 | bwd_microstep: 1613.58 | bwd_inner_microstep: 1613.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1997
[2024-06-11 03:30:23,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.64 | bwd_microstep: 786.69 | bwd_inner_microstep: 786.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3676
[2024-06-11 03:30:25,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.22 | bwd_microstep: 1280.82 | bwd_inner_microstep: 1280.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-11 03:30:27,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1280.14 | bwd_inner_microstep: 1280.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702
[2024-06-11 03:30:29,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.63 | bwd_microstep: 1628.54 | bwd_inner_microstep: 1628.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3546
[2024-06-11 03:30:31,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.88 | bwd_microstep: 1427.13 | bwd_inner_microstep: 1427.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-11 03:30:33,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.15 | bwd_microstep: 1557.26 | bwd_inner_microstep: 1557.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3787
[2024-06-11 03:30:35,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.39 | bwd_microstep: 1354.63 | bwd_inner_microstep: 1354.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2177
[2024-06-11 03:30:36,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.87 | bwd_microstep: 888.97 | bwd_inner_microstep: 888.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3564
[2024-06-11 03:30:38,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.55 | bwd_microstep: 1428.40 | bwd_inner_microstep: 1428.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-11 03:30:39,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.41 | bwd_microstep: 687.37 | bwd_inner_microstep: 687.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-11 03:30:41,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.88 | bwd_microstep: 1534.57 | bwd_inner_microstep: 1534.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3803
[2024-06-11 03:30:43,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.11 | bwd_microstep: 1500.43 | bwd_inner_microstep: 1500.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424
[2024-06-11 03:30:45,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1545.29 | bwd_inner_microstep: 1545.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3809
[2024-06-11 03:30:49,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-11 03:30:49,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.48 | bwd_microstep: 2439.74 | bwd_inner_microstep: 2093.18 | bwd_allreduce_microstep: 346.50 | step_microstep: 38.88
[2024-06-11 03:30:49,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15974.28 | bwd: 43383.99 | bwd_inner: 43036.59 | bwd_allreduce: 346.73 | step: 41.78
{'loss': 1.1578, 'learning_rate': 1.1679335367610855e-06, 'epoch': 0.89}
 89%|████████▉ | 1539/1726 [26:49:20<3:15:56, 62.87s/it]
 89%|████████▉ | 1540/1726 [26:50:22<3:13:59, 62.58s/it]
 89%|████████▉ | 1541/1726 [26:51:24<3:12:30, 62.43s/it]
 89%|████████▉ | 1542/1726 [26:52:26<3:10:29, 62.12s/it]
 89%|████████▉ | 1543/1726 [26:53:25<3:07:14, 61.39s/it]
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-11 03:30:50,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.83 | bwd_microstep: 1274.11 | bwd_inner_microstep: 1273.92 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 03:30:52,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1394.49 | bwd_inner_microstep: 1394.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866
[2024-06-11 03:30:54,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.90 | bwd_microstep: 1568.69 | bwd_inner_microstep: 1568.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3784
[2024-06-11 03:30:56,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.55 | bwd_microstep: 1349.23 | bwd_inner_microstep: 1349.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-11 03:30:58,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.16 | bwd_microstep: 1389.77 | bwd_inner_microstep: 1389.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736
[2024-06-11 03:31:00,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.43 | bwd_microstep: 1533.68 | bwd_inner_microstep: 1533.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-11 03:31:01,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 797.68 | bwd_inner_microstep: 797.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920
[2024-06-11 03:31:02,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.64 | bwd_microstep: 689.84 | bwd_inner_microstep: 689.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 03:31:04,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.88 | bwd_microstep: 1291.11 | bwd_inner_microstep: 1291.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2176
[2024-06-11 03:31:06,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.36 | bwd_microstep: 1013.26 | bwd_inner_microstep: 1013.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 03:31:08,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1376.95 | bwd_inner_microstep: 1376.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2141
[2024-06-11 03:31:09,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.01 | bwd_microstep: 929.43 | bwd_inner_microstep: 929.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3525
[2024-06-11 03:31:11,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.83 | bwd_microstep: 1541.90 | bwd_inner_microstep: 1541.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-11 03:31:13,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.14 | bwd_microstep: 1620.81 | bwd_inner_microstep: 1620.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2991
[2024-06-11 03:31:15,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.17 | bwd_microstep: 1200.91 | bwd_inner_microstep: 1200.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642
[2024-06-11 03:31:17,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.49 | bwd_microstep: 1713.24 | bwd_inner_microstep: 1713.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3649
[2024-06-11 03:31:20,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.37 | bwd_microstep: 1716.11 | bwd_inner_microstep: 1716.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-11 03:31:22,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1476.77 | bwd_inner_microstep: 1476.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-11 03:31:24,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.95 | bwd_microstep: 1581.39 | bwd_inner_microstep: 1581.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-11 03:31:26,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.78 | bwd_microstep: 1557.16 | bwd_inner_microstep: 1557.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-11 03:31:28,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.42 | bwd_microstep: 1472.39 | bwd_inner_microstep: 1472.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-11 03:31:30,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1495.20 | bwd_inner_microstep: 1495.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3532
[2024-06-11 03:31:32,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.51 | bwd_microstep: 1687.92 | bwd_inner_microstep: 1687.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-11 03:31:34,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1486.83 | bwd_inner_microstep: 1486.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 03:31:37,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.86 | bwd_microstep: 1646.78 | bwd_inner_microstep: 1646.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2085
[2024-06-11 03:31:38,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.44 | bwd_microstep: 820.66 | bwd_inner_microstep: 820.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-11 03:31:40,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.27 | bwd_microstep: 1453.33 | bwd_inner_microstep: 1453.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-11 03:31:42,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.10 | bwd_microstep: 1452.33 | bwd_inner_microstep: 1452.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-11 03:31:44,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1309.91 | bwd_inner_microstep: 1309.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812
[2024-06-11 03:31:46,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.11 | bwd_microstep: 1487.05 | bwd_inner_microstep: 1487.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-11 03:31:48,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1409.75 | bwd_inner_microstep: 1409.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-11 03:31:50,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.07 | optimizer_step: 6.64
[2024-06-11 03:31:50,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.91 | bwd_microstep: 2277.62 | bwd_inner_microstep: 1780.84 | bwd_allreduce_microstep: 496.74 | step_microstep: 37.70
[2024-06-11 03:31:50,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16529.10 | bwd: 45016.37 | bwd_inner: 44518.58 | bwd_allreduce: 497.04 | step: 39.26
{'loss': 1.1787, 'learning_rate': 1.155328114216947e-06, 'epoch': 0.89}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400
[2024-06-11 03:31:52,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.59 | bwd_microstep: 1303.97 | bwd_inner_microstep: 1303.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 03:31:54,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1383.98 | bwd_inner_microstep: 1383.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401
[2024-06-11 03:31:56,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.89 | bwd_microstep: 1209.42 | bwd_inner_microstep: 1209.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3853
[2024-06-11 03:31:58,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1370.06 | bwd_inner_microstep: 1370.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-11 03:31:59,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.17 | bwd_microstep: 798.06 | bwd_inner_microstep: 798.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3749
[2024-06-11 03:32:01,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.26 | bwd_microstep: 1641.03 | bwd_inner_microstep: 1641.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-11 03:32:03,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.23 | bwd_microstep: 1286.88 | bwd_inner_microstep: 1286.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708
[2024-06-11 03:32:05,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.24 | bwd_microstep: 1426.48 | bwd_inner_microstep: 1426.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707
[2024-06-11 03:32:07,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.60 | bwd_microstep: 1630.64 | bwd_inner_microstep: 1630.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696
[2024-06-11 03:32:09,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.32 | bwd_microstep: 1531.32 | bwd_inner_microstep: 1531.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-11 03:32:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.72 | bwd_microstep: 799.15 | bwd_inner_microstep: 799.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 03:32:12,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1401.72 | bwd_inner_microstep: 1401.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1888
[2024-06-11 03:32:13,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.20 | bwd_microstep: 686.26 | bwd_inner_microstep: 686.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632
[2024-06-11 03:32:15,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1409.18 | bwd_inner_microstep: 1409.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 03:32:17,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1391.75 | bwd_inner_microstep: 1391.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-11 03:32:19,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.49 | bwd_microstep: 1586.89 | bwd_inner_microstep: 1586.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-11 03:32:21,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1280.87 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3426
[2024-06-11 03:32:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.20 | bwd_microstep: 1371.69 | bwd_inner_microstep: 1371.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3703
[2024-06-11 03:32:25,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.86 | bwd_microstep: 1727.57 | bwd_inner_microstep: 1727.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3429
[2024-06-11 03:32:27,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.03 | bwd_microstep: 1513.23 | bwd_inner_microstep: 1513.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-11 03:32:30,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.72 | bwd_microstep: 1532.33 | bwd_inner_microstep: 1532.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-11 03:32:32,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.34 | bwd_microstep: 1537.03 | bwd_inner_microstep: 1537.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-11 03:32:34,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.54 | bwd_microstep: 1403.77 | bwd_inner_microstep: 1403.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2919
[2024-06-11 03:32:35,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.32 | bwd_microstep: 1130.62 | bwd_inner_microstep: 1130.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3561
[2024-06-11 03:32:37,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1465.20 | bwd_inner_microstep: 1465.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 03:32:39,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1375.98 | bwd_inner_microstep: 1375.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-11 03:32:41,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1403.45 | bwd_inner_microstep: 1403.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-11 03:32:43,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1515.83 | bwd_inner_microstep: 1515.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-11 03:32:45,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1301.67 | bwd_inner_microstep: 1301.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2049
[2024-06-11 03:32:46,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.47 | bwd_microstep: 843.12 | bwd_inner_microstep: 843.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3620
[2024-06-11 03:32:48,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.50 | bwd_microstep: 1516.13 | bwd_inner_microstep: 1516.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-11 03:33:10,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.64 | optimizer_step: 6.61
[2024-06-11 03:33:10,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.22 | bwd_microstep: 21402.31 | bwd_inner_microstep: 1867.99 | bwd_allreduce_microstep: 19534.22 | step_microstep: 41.51
[2024-06-11 03:33:10,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16231.18 | bwd: 63177.58 | bwd_inner: 43642.27 | bwd_allreduce: 19534.55 | step: 43.09
{'loss': 1.1732, 'learning_rate': 1.1427890625092265e-06, 'epoch': 0.9}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4614
[2024-06-11 03:33:13,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.84 | bwd_microstep: 1650.82 | bwd_inner_microstep: 1650.72 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3891
[2024-06-11 03:33:15,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.88 | bwd_microstep: 1578.77 | bwd_inner_microstep: 1578.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-11 03:33:17,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.89 | bwd_microstep: 1340.04 | bwd_inner_microstep: 1340.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766
[2024-06-11 03:33:19,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.34 | bwd_microstep: 1443.29 | bwd_inner_microstep: 1443.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 03:33:20,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1372.82 | bwd_inner_microstep: 1372.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484
[2024-06-11 03:33:22,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.23 | bwd_microstep: 1217.54 | bwd_inner_microstep: 1217.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1898
[2024-06-11 03:33:23,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.65 | bwd_microstep: 712.48 | bwd_inner_microstep: 712.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559
[2024-06-11 03:33:25,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1490.87 | bwd_inner_microstep: 1490.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-11 03:33:27,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.39 | bwd_microstep: 1277.58 | bwd_inner_microstep: 1277.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573
[2024-06-11 03:33:29,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.79 | bwd_microstep: 1497.08 | bwd_inner_microstep: 1497.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-11 03:33:31,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1409.99 | bwd_inner_microstep: 1409.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3645
[2024-06-11 03:33:33,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.17 | bwd_microstep: 1441.51 | bwd_inner_microstep: 1441.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971
[2024-06-11 03:33:34,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.13 | bwd_microstep: 891.19 | bwd_inner_microstep: 891.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3684
[2024-06-11 03:33:36,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.58 | bwd_microstep: 1551.81 | bwd_inner_microstep: 1551.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-11 03:33:54,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.41 | bwd_microstep: 1242.85 | bwd_inner_microstep: 1242.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2296
[2024-06-11 03:33:55,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.67 | bwd_microstep: 1001.40 | bwd_inner_microstep: 1001.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-11 03:33:57,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.82 | bwd_microstep: 1281.05 | bwd_inner_microstep: 1281.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1078
[2024-06-11 03:33:57,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 161.97 | bwd_microstep: 416.85 | bwd_inner_microstep: 416.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 03:33:59,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1388.50 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-11 03:34:01,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.08 | bwd_microstep: 1488.42 | bwd_inner_microstep: 1488.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-11 03:34:03,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1393.41 | bwd_inner_microstep: 1393.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-11 03:34:05,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1387.41 | bwd_inner_microstep: 1387.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-11 03:34:07,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1346.13 | bwd_inner_microstep: 1346.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3604
[2024-06-11 03:34:09,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.16 | bwd_microstep: 1537.12 | bwd_inner_microstep: 1537.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-11 03:34:10,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.29 | bwd_microstep: 975.52 | bwd_inner_microstep: 975.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3899
[2024-06-11 03:34:13,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.23 | bwd_microstep: 1522.40 | bwd_inner_microstep: 1522.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009
[2024-06-11 03:34:14,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.33 | bwd_microstep: 739.91 | bwd_inner_microstep: 739.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1463
[2024-06-11 03:34:14,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.44 | bwd_microstep: 541.97 | bwd_inner_microstep: 541.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817
[2024-06-11 03:34:17,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.48 | bwd_microstep: 1757.44 | bwd_inner_microstep: 1757.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3592
[2024-06-11 03:34:19,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.75 | bwd_microstep: 1663.90 | bwd_inner_microstep: 1663.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3582
[2024-06-11 03:34:21,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.14 | bwd_microstep: 1626.53 | bwd_inner_microstep: 1626.04 | bwd_allreduce_microstep: 0.24 | step_microstep: 0.37
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-11 03:35:17,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.27 | optimizer_step: 6.61
[2024-06-11 03:35:17,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.24 | bwd_microstep: 55470.59 | bwd_inner_microstep: 1809.39 | bwd_allreduce_microstep: 53661.12 | step_microstep: 39.50
[2024-06-11 03:35:17,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15664.74 | bwd: 95657.25 | bwd_inner: 41994.71 | bwd_allreduce: 53661.69 | step: 41.62
{'loss': 1.1604, 'learning_rate': 1.1303164258003974e-06, 'epoch': 0.9}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459
[2024-06-11 03:35:19,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.33 | bwd_microstep: 1416.44 | bwd_inner_microstep: 1416.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-11 03:35:21,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.51 | bwd_microstep: 1475.02 | bwd_inner_microstep: 1474.59 | bwd_allreduce_microstep: 0.22 | step_microstep: 0.36
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791
[2024-06-11 03:35:23,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.73 | bwd_microstep: 1540.04 | bwd_inner_microstep: 1540.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4138
[2024-06-11 03:35:26,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1533.92 | bwd_inner_microstep: 1533.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4080
[2024-06-11 03:35:28,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.83 | bwd_microstep: 1718.82 | bwd_inner_microstep: 1718.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 03:35:30,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1377.14 | bwd_inner_microstep: 1377.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-11 03:35:32,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.65 | bwd_microstep: 1627.97 | bwd_inner_microstep: 1627.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-11 03:35:33,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.30 | bwd_microstep: 797.39 | bwd_inner_microstep: 797.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3429
[2024-06-11 03:35:35,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.86 | bwd_microstep: 1278.05 | bwd_inner_microstep: 1278.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2428
[2024-06-11 03:35:36,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.82 | bwd_microstep: 1097.30 | bwd_inner_microstep: 1097.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3687
[2024-06-11 03:35:39,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.03 | bwd_microstep: 1722.75 | bwd_inner_microstep: 1722.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3692
[2024-06-11 03:35:41,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.20 | bwd_microstep: 1564.56 | bwd_inner_microstep: 1564.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2128
[2024-06-11 03:35:42,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.48 | bwd_microstep: 826.05 | bwd_inner_microstep: 826.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 03:35:44,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1372.76 | bwd_inner_microstep: 1372.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496
[2024-06-11 03:35:46,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.98 | bwd_microstep: 1574.97 | bwd_inner_microstep: 1574.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880
[2024-06-11 03:35:49,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.05 | bwd_microstep: 1686.66 | bwd_inner_microstep: 1686.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-11 03:35:50,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.24 | bwd_microstep: 1254.32 | bwd_inner_microstep: 1254.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 03:35:52,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.28 | bwd_microstep: 1558.32 | bwd_inner_microstep: 1558.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2063
[2024-06-11 03:35:54,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.87 | bwd_microstep: 845.49 | bwd_inner_microstep: 845.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462
[2024-06-11 03:35:55,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.68 | bwd_microstep: 1310.44 | bwd_inner_microstep: 1310.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999
[2024-06-11 03:35:57,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.92 | bwd_microstep: 800.40 | bwd_inner_microstep: 800.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-11 03:35:58,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.89 | bwd_microstep: 804.46 | bwd_inner_microstep: 804.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-11 03:36:00,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1515.88 | bwd_inner_microstep: 1515.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-11 03:36:02,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1286.98 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597
[2024-06-11 03:36:04,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1438.66 | bwd_inner_microstep: 1438.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-11 03:36:05,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1283.27 | bwd_inner_microstep: 1283.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3709
[2024-06-11 03:36:08,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.43 | bwd_microstep: 1696.83 | bwd_inner_microstep: 1696.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3558
[2024-06-11 03:36:10,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.61 | bwd_microstep: 1561.77 | bwd_inner_microstep: 1561.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-11 03:36:12,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1403.28 | bwd_inner_microstep: 1403.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806
[2024-06-11 03:36:14,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.60 | bwd_microstep: 1521.92 | bwd_inner_microstep: 1521.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3585
[2024-06-11 03:36:16,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.24 | bwd_microstep: 1651.83 | bwd_inner_microstep: 1651.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-11 03:36:19,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.07 | optimizer_step: 6.57
[2024-06-11 03:36:19,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 2428.44 | bwd_inner_microstep: 1661.03 | bwd_allreduce_microstep: 767.36 | step_microstep: 38.48
[2024-06-11 03:36:19,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16401.55 | bwd: 44972.16 | bwd_inner: 44203.55 | bwd_allreduce: 767.80 | step: 40.42
{'loss': 1.1832, 'learning_rate': 1.1179102480190208e-06, 'epoch': 0.9}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1921
[2024-06-11 03:36:20,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.37 | bwd_microstep: 817.87 | bwd_inner_microstep: 817.74 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 03:36:22,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.90 | bwd_microstep: 1278.64 | bwd_inner_microstep: 1278.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3872
[2024-06-11 03:36:24,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.30 | bwd_microstep: 1466.40 | bwd_inner_microstep: 1466.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436
[2024-06-11 03:36:26,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1376.93 | bwd_inner_microstep: 1376.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-11 03:36:27,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.32 | bwd_microstep: 716.55 | bwd_inner_microstep: 716.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 03:36:29,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1382.55 | bwd_inner_microstep: 1382.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-11 03:36:31,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.54 | bwd_microstep: 1640.18 | bwd_inner_microstep: 1640.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872
[2024-06-11 03:36:32,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.47 | bwd_microstep: 678.35 | bwd_inner_microstep: 678.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1453
[2024-06-11 03:36:33,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 227.82 | bwd_microstep: 601.57 | bwd_inner_microstep: 601.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406
[2024-06-11 03:36:35,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 1340.19 | bwd_inner_microstep: 1340.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-11 03:36:37,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1507.83 | bwd_inner_microstep: 1507.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2126
[2024-06-11 03:36:38,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.34 | bwd_microstep: 1022.02 | bwd_inner_microstep: 1021.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 03:36:40,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1338.53 | bwd_inner_microstep: 1338.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3656
[2024-06-11 03:36:43,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.06 | bwd_microstep: 1783.22 | bwd_inner_microstep: 1783.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-11 03:36:44,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.57 | bwd_microstep: 1191.52 | bwd_inner_microstep: 1191.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495
[2024-06-11 03:36:46,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.34 | bwd_microstep: 1189.74 | bwd_inner_microstep: 1189.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3498
[2024-06-11 03:36:48,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1251.23 | bwd_inner_microstep: 1251.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-11 03:36:50,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.10 | bwd_microstep: 1451.89 | bwd_inner_microstep: 1451.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446
[2024-06-11 03:36:51,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.39 | bwd_microstep: 1189.01 | bwd_inner_microstep: 1188.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3772
[2024-06-11 03:36:53,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.81 | bwd_microstep: 1250.47 | bwd_inner_microstep: 1250.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3818
[2024-06-11 03:36:55,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.02 | bwd_microstep: 1292.14 | bwd_inner_microstep: 1292.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-11 03:36:57,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1286.44 | bwd_inner_microstep: 1286.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2145
[2024-06-11 03:36:58,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.99 | bwd_microstep: 852.67 | bwd_inner_microstep: 852.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2141
[2024-06-11 03:36:59,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.97 | bwd_microstep: 867.26 | bwd_inner_microstep: 867.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867
[2024-06-11 03:37:01,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.75 | bwd_microstep: 1566.37 | bwd_inner_microstep: 1566.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-11 03:37:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.00 | bwd_microstep: 875.82 | bwd_inner_microstep: 875.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753
[2024-06-11 03:37:05,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.29 | bwd_microstep: 1639.70 | bwd_inner_microstep: 1639.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3052
[2024-06-11 03:37:06,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.10 | bwd_microstep: 1172.37 | bwd_inner_microstep: 1172.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414
[2024-06-11 03:37:08,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.29 | bwd_microstep: 1444.93 | bwd_inner_microstep: 1444.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270
[2024-06-11 03:37:09,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.80 | bwd_microstep: 969.46 | bwd_inner_microstep: 969.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758
[2024-06-11 03:37:11,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.82 | bwd_microstep: 1438.11 | bwd_inner_microstep: 1438.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-11 03:37:19,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.36 | optimizer_step: 6.59
[2024-06-11 03:37:19,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 7130.91 | bwd_inner_microstep: 1526.22 | bwd_allreduce_microstep: 5604.61 | step_microstep: 39.60
[2024-06-11 03:37:19,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14716.45 | bwd: 45010.89 | bwd_inner: 39405.26 | bwd_allreduce: 5604.89 | step: 41.15
{'loss': 1.185, 'learning_rate': 1.1055705728595955e-06, 'epoch': 0.9}
 89%|████████▉ | 1543/1726 [26:53:25<3:07:14, 61.39s/it]
 89%|████████▉ | 1544/1726 [26:54:27<3:06:40, 61.54s/it]
 90%|████████▉ | 1545/1726 [26:55:47<3:22:08, 67.01s/it]
 90%|████████▉ | 1546/1726 [26:57:54<4:15:06, 85.04s/it]
 90%|████████▉ | 1547/1726 [26:58:56<3:52:50, 78.05s/it]
 90%|████████▉ | 1548/1726 [26:59:56<3:35:32, 72.65s/it]
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2454
[2024-06-11 03:37:21,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.28 | bwd_microstep: 1034.26 | bwd_inner_microstep: 1034.16 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3084
[2024-06-11 03:37:22,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.42 | bwd_microstep: 1088.02 | bwd_inner_microstep: 1087.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-11 03:37:24,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.24 | bwd_microstep: 1478.75 | bwd_inner_microstep: 1478.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.92
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-11 03:37:25,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.53 | bwd_microstep: 790.57 | bwd_inner_microstep: 790.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-11 03:37:27,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1345.10 | bwd_inner_microstep: 1345.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 03:37:29,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.39 | bwd_microstep: 1385.12 | bwd_inner_microstep: 1385.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 03:37:31,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1341.68 | bwd_inner_microstep: 1341.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 03:37:33,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.76 | bwd_microstep: 1283.57 | bwd_inner_microstep: 1283.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-11 03:37:35,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.43 | bwd_microstep: 1433.31 | bwd_inner_microstep: 1433.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-11 03:37:37,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.22 | bwd_microstep: 1450.22 | bwd_inner_microstep: 1450.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-11 03:37:38,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 794.56 | bwd_inner_microstep: 794.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2148
[2024-06-11 03:37:39,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.32 | bwd_microstep: 820.92 | bwd_inner_microstep: 820.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565
[2024-06-11 03:37:41,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.84 | bwd_microstep: 1331.67 | bwd_inner_microstep: 1331.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3485
[2024-06-11 03:37:43,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.77 | bwd_microstep: 1545.24 | bwd_inner_microstep: 1545.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-11 03:37:45,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.45 | bwd_microstep: 1487.59 | bwd_inner_microstep: 1487.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542
[2024-06-11 03:37:47,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.82 | bwd_microstep: 1455.40 | bwd_inner_microstep: 1455.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1922
[2024-06-11 03:37:48,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.61 | bwd_microstep: 725.92 | bwd_inner_microstep: 725.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3500
[2024-06-11 03:37:50,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1250.42 | bwd_inner_microstep: 1250.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-11 03:37:51,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.24 | bwd_microstep: 805.12 | bwd_inner_microstep: 805.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3509
[2024-06-11 03:37:53,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 1529.13 | bwd_inner_microstep: 1529.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3421
[2024-06-11 03:37:55,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.71 | bwd_microstep: 1314.87 | bwd_inner_microstep: 1314.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3797
[2024-06-11 03:37:57,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.42 | bwd_microstep: 1616.98 | bwd_inner_microstep: 1616.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 03:37:59,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1376.71 | bwd_inner_microstep: 1376.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612
[2024-06-11 03:38:01,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.11 | bwd_microstep: 1347.56 | bwd_inner_microstep: 1347.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3689
[2024-06-11 03:38:03,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.10 | bwd_microstep: 1392.81 | bwd_inner_microstep: 1392.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-11 03:38:05,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.31 | bwd_microstep: 1646.21 | bwd_inner_microstep: 1646.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3943
[2024-06-11 03:38:07,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.20 | bwd_microstep: 1803.08 | bwd_inner_microstep: 1803.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073
[2024-06-11 03:38:08,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.75 | bwd_microstep: 818.83 | bwd_inner_microstep: 818.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763
[2024-06-11 03:38:11,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.23 | bwd_microstep: 1646.94 | bwd_inner_microstep: 1646.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-11 03:38:13,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.51 | bwd_microstep: 1647.44 | bwd_inner_microstep: 1647.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2272
[2024-06-11 03:38:14,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.14 | bwd_microstep: 874.56 | bwd_inner_microstep: 874.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-11 03:38:20,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.58
[2024-06-11 03:38:20,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.55 | bwd_microstep: 5620.07 | bwd_inner_microstep: 1458.34 | bwd_allreduce_microstep: 4161.66 | step_microstep: 38.54
[2024-06-11 03:38:20,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15401.67 | bwd: 45482.64 | bwd_inner: 41319.97 | bwd_allreduce: 4161.95 | step: 42.17
{'loss': 1.2078, 'learning_rate': 1.0932974437823884e-06, 'epoch': 0.9}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3412
[2024-06-11 03:38:22,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1301.53 | bwd_inner_microstep: 1301.46 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 03:38:24,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.25 | bwd_microstep: 1342.50 | bwd_inner_microstep: 1342.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3868
[2024-06-11 03:38:26,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.44 | bwd_microstep: 1564.90 | bwd_inner_microstep: 1564.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 03:38:28,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.54 | bwd_microstep: 1554.89 | bwd_inner_microstep: 1554.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-11 03:38:30,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.34 | bwd_microstep: 1479.35 | bwd_inner_microstep: 1479.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 03:38:32,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.82 | bwd_microstep: 1481.31 | bwd_inner_microstep: 1481.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 03:38:34,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.27 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717
[2024-06-11 03:38:36,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.22 | bwd_microstep: 1436.00 | bwd_inner_microstep: 1435.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 03:38:38,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1391.78 | bwd_inner_microstep: 1391.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 03:38:40,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1390.39 | bwd_inner_microstep: 1390.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499
[2024-06-11 03:38:42,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1417.09 | bwd_inner_microstep: 1417.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907
[2024-06-11 03:38:43,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.91 | bwd_microstep: 779.89 | bwd_inner_microstep: 779.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667
[2024-06-11 03:38:45,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.93 | bwd_microstep: 1523.24 | bwd_inner_microstep: 1523.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-11 03:38:47,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.62 | bwd_microstep: 1584.54 | bwd_inner_microstep: 1584.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-11 03:38:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1340.37 | bwd_inner_microstep: 1340.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 03:38:51,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1374.04 | bwd_inner_microstep: 1374.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 03:38:53,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.00 | bwd_microstep: 1281.75 | bwd_inner_microstep: 1281.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-11 03:38:55,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.35 | bwd_microstep: 1661.52 | bwd_inner_microstep: 1661.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616
[2024-06-11 03:38:57,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.55 | bwd_microstep: 1345.27 | bwd_inner_microstep: 1345.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2888
[2024-06-11 03:38:59,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.12 | bwd_microstep: 1085.07 | bwd_inner_microstep: 1085.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-11 03:39:01,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1431.26 | bwd_inner_microstep: 1431.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-11 03:39:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.42 | bwd_microstep: 1455.87 | bwd_inner_microstep: 1455.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-11 03:39:04,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.60 | bwd_microstep: 1283.38 | bwd_inner_microstep: 1283.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488
[2024-06-11 03:39:06,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.39 | bwd_microstep: 1412.73 | bwd_inner_microstep: 1412.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-11 03:39:08,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.28 | bwd_microstep: 1447.85 | bwd_inner_microstep: 1447.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-11 03:39:09,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.81 | bwd_microstep: 798.26 | bwd_inner_microstep: 798.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-11 03:39:11,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.87 | bwd_microstep: 1250.40 | bwd_inner_microstep: 1250.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3598
[2024-06-11 03:39:13,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.14 | bwd_microstep: 1213.30 | bwd_inner_microstep: 1213.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 03:39:15,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.34 | bwd_microstep: 1377.22 | bwd_inner_microstep: 1377.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2233
[2024-06-11 03:39:16,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.23 | bwd_microstep: 868.50 | bwd_inner_microstep: 868.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-11 03:39:18,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.07 | bwd_microstep: 1447.01 | bwd_inner_microstep: 1446.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949
[2024-06-11 03:39:23,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-11 03:39:23,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.76 | bwd_microstep: 4694.27 | bwd_inner_microstep: 789.46 | bwd_allreduce_microstep: 3904.75 | step_microstep: 39.03
[2024-06-11 03:39:23,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15890.51 | bwd: 46405.21 | bwd_inner: 42499.50 | bwd_allreduce: 3905.00 | step: 40.61
{'loss': 1.1253, 'learning_rate': 1.0810909040132977e-06, 'epoch': 0.9}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 03:39:25,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.49 | bwd_microstep: 1268.15 | bwd_inner_microstep: 1268.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-11 03:39:26,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.89 | bwd_microstep: 694.20 | bwd_inner_microstep: 694.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 03:39:28,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1450.42 | bwd_inner_microstep: 1450.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-11 03:39:30,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.36 | bwd_microstep: 1289.19 | bwd_inner_microstep: 1289.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-11 03:39:32,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1454.86 | bwd_inner_microstep: 1454.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 03:39:33,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1384.95 | bwd_inner_microstep: 1384.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-11 03:39:35,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.10 | bwd_microstep: 1186.18 | bwd_inner_microstep: 1186.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 03:39:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1385.45 | bwd_inner_microstep: 1385.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 03:39:39,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.07 | bwd_microstep: 1277.76 | bwd_inner_microstep: 1277.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-11 03:39:41,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1397.20 | bwd_inner_microstep: 1397.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3684
[2024-06-11 03:39:43,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.26 | bwd_microstep: 1505.59 | bwd_inner_microstep: 1505.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672
[2024-06-11 03:39:45,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.05 | bwd_microstep: 1614.29 | bwd_inner_microstep: 1614.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3680
[2024-06-11 03:39:47,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.32 | bwd_microstep: 1606.73 | bwd_inner_microstep: 1606.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3507
[2024-06-11 03:39:49,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.88 | bwd_microstep: 1366.39 | bwd_inner_microstep: 1366.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499
[2024-06-11 03:39:51,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.41 | bwd_microstep: 1577.65 | bwd_inner_microstep: 1577.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3644
[2024-06-11 03:39:53,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1413.54 | bwd_inner_microstep: 1413.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530
[2024-06-11 03:39:55,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1325.88 | bwd_inner_microstep: 1325.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3834
[2024-06-11 03:39:57,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.65 | bwd_microstep: 1358.68 | bwd_inner_microstep: 1358.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453
[2024-06-11 03:39:59,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.57 | bwd_microstep: 1289.55 | bwd_inner_microstep: 1289.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3149
[2024-06-11 03:40:00,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.22 | bwd_microstep: 1254.48 | bwd_inner_microstep: 1254.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574
[2024-06-11 03:40:02,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.03 | bwd_microstep: 1302.73 | bwd_inner_microstep: 1302.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616
[2024-06-11 03:40:04,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.86 | bwd_microstep: 1311.31 | bwd_inner_microstep: 1311.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-11 03:40:06,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1493.06 | bwd_inner_microstep: 1493.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3608
[2024-06-11 03:40:08,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.23 | bwd_microstep: 1212.89 | bwd_inner_microstep: 1212.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-11 03:40:10,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1394.15 | bwd_inner_microstep: 1394.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016
[2024-06-11 03:40:11,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.40 | bwd_microstep: 710.35 | bwd_inner_microstep: 710.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-11 03:40:13,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.82 | bwd_microstep: 1257.17 | bwd_inner_microstep: 1257.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3729
[2024-06-11 03:40:14,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1366.76 | bwd_inner_microstep: 1366.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2728
[2024-06-11 03:40:16,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.04 | bwd_microstep: 1234.02 | bwd_inner_microstep: 1234.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3534
[2024-06-11 03:40:18,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.54 | bwd_microstep: 1342.23 | bwd_inner_microstep: 1342.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-11 03:40:20,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.69 | bwd_microstep: 1544.41 | bwd_inner_microstep: 1544.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587
[2024-06-11 03:40:25,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-11 03:40:25,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.50 | bwd_microstep: 3929.91 | bwd_inner_microstep: 1814.13 | bwd_allreduce_microstep: 2115.73 | step_microstep: 39.24
[2024-06-11 03:40:25,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16110.17 | bwd: 45200.12 | bwd_inner: 43083.49 | bwd_allreduce: 2115.96 | step: 40.83
{'loss': 1.2279, 'learning_rate': 1.0689509965436918e-06, 'epoch': 0.9}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 03:40:26,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1276.50 | bwd_inner_microstep: 1276.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-11 03:40:28,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.96 | bwd_microstep: 1276.32 | bwd_inner_microstep: 1276.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3895
[2024-06-11 03:40:31,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.89 | bwd_microstep: 1682.88 | bwd_inner_microstep: 1682.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 03:40:33,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.63 | bwd_microstep: 1646.86 | bwd_inner_microstep: 1646.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3478
[2024-06-11 03:40:35,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.18 | bwd_microstep: 1261.66 | bwd_inner_microstep: 1261.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3480
[2024-06-11 03:40:36,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.95 | bwd_microstep: 1213.86 | bwd_inner_microstep: 1213.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 03:40:38,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.79 | bwd_microstep: 1484.12 | bwd_inner_microstep: 1484.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 03:40:40,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.63 | bwd_microstep: 1288.53 | bwd_inner_microstep: 1288.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869
[2024-06-11 03:40:41,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.56 | bwd_microstep: 714.77 | bwd_inner_microstep: 714.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942
[2024-06-11 03:40:42,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.52 | bwd_microstep: 819.00 | bwd_inner_microstep: 818.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 03:40:44,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1381.44 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487
[2024-06-11 03:40:46,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.12 | bwd_microstep: 1581.44 | bwd_inner_microstep: 1581.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3699
[2024-06-11 03:40:49,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.22 | bwd_microstep: 1726.41 | bwd_inner_microstep: 1726.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 03:40:51,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1375.41 | bwd_inner_microstep: 1375.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2133
[2024-06-11 03:40:52,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.59 | bwd_microstep: 928.85 | bwd_inner_microstep: 928.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-11 03:40:54,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.34 | bwd_microstep: 1345.83 | bwd_inner_microstep: 1345.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3421
[2024-06-11 03:40:55,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.29 | bwd_microstep: 1230.97 | bwd_inner_microstep: 1230.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677
[2024-06-11 03:40:58,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.42 | bwd_microstep: 1629.57 | bwd_inner_microstep: 1629.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506
[2024-06-11 03:41:00,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1348.67 | bwd_inner_microstep: 1348.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970
[2024-06-11 03:41:01,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.95 | bwd_microstep: 796.81 | bwd_inner_microstep: 796.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-11 03:41:02,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.20 | bwd_microstep: 1186.48 | bwd_inner_microstep: 1186.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836
[2024-06-11 03:41:04,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.07 | bwd_microstep: 1559.23 | bwd_inner_microstep: 1558.69 | bwd_allreduce_microstep: 0.31 | step_microstep: 0.36
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2302
[2024-06-11 03:41:06,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.57 | bwd_microstep: 848.31 | bwd_inner_microstep: 848.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-11 03:41:07,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.19 | bwd_microstep: 1255.94 | bwd_inner_microstep: 1255.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 03:41:09,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.59 | bwd_microstep: 1301.49 | bwd_inner_microstep: 1301.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613
[2024-06-11 03:41:11,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1514.73 | bwd_inner_microstep: 1514.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609
[2024-06-11 03:41:13,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.20 | bwd_microstep: 1611.84 | bwd_inner_microstep: 1611.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280
[2024-06-11 03:41:15,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.13 | bwd_microstep: 879.88 | bwd_inner_microstep: 879.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 03:41:16,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3427
[2024-06-11 03:41:18,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.00 | bwd_microstep: 1376.64 | bwd_inner_microstep: 1376.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-11 03:41:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1414.17 | bwd_inner_microstep: 1413.91 | bwd_allreduce_microstep: 0.21 | step_microstep: 0.30
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-11 03:41:27,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-11 03:41:27,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.13 | bwd_microstep: 5952.00 | bwd_inner_microstep: 1677.89 | bwd_allreduce_microstep: 4274.05 | step_microstep: 39.07
[2024-06-11 03:41:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15650.31 | bwd: 46197.20 | bwd_inner: 41921.57 | bwd_allreduce: 4274.78 | step: 41.20
{'loss': 1.1327, 'learning_rate': 1.0568777641302663e-06, 'epoch': 0.9}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2055
[2024-06-11 03:41:28,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.03 | bwd_microstep: 872.59 | bwd_inner_microstep: 872.41 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2342
[2024-06-11 03:41:29,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.77 | bwd_microstep: 984.04 | bwd_inner_microstep: 984.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-11 03:41:32,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1480.34 | bwd_inner_microstep: 1480.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-11 03:41:34,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.11 | bwd_microstep: 1543.66 | bwd_inner_microstep: 1543.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 03:41:35,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1283.38 | bwd_inner_microstep: 1283.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-11 03:41:37,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1396.02 | bwd_inner_microstep: 1395.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-11 03:41:39,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1253.73 | bwd_inner_microstep: 1253.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-11 03:41:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.31 | bwd_microstep: 793.15 | bwd_inner_microstep: 793.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 03:41:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.80 | bwd_microstep: 1245.67 | bwd_inner_microstep: 1245.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-11 03:41:44,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.31 | bwd_microstep: 1424.10 | bwd_inner_microstep: 1424.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 03:41:46,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992
[2024-06-11 03:41:47,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.50 | bwd_microstep: 892.76 | bwd_inner_microstep: 892.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3667
[2024-06-11 03:41:49,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.65 | bwd_microstep: 1687.58 | bwd_inner_microstep: 1687.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-11 03:41:51,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.43 | bwd_microstep: 1352.52 | bwd_inner_microstep: 1352.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 03:41:53,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.72 | bwd_microstep: 1477.35 | bwd_inner_microstep: 1477.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-11 03:41:55,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.34 | bwd_microstep: 1477.55 | bwd_inner_microstep: 1477.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2144
[2024-06-11 03:41:56,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.42 | bwd_microstep: 932.35 | bwd_inner_microstep: 932.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465
[2024-06-11 03:41:58,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.75 | bwd_microstep: 1477.72 | bwd_inner_microstep: 1477.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-11 03:42:01,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.73 | bwd_microstep: 1584.63 | bwd_inner_microstep: 1584.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-11 03:42:03,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.88 | bwd_microstep: 1498.07 | bwd_inner_microstep: 1498.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 03:42:04,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-11 03:42:06,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1253.33 | bwd_inner_microstep: 1253.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-11 03:42:08,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.09 | bwd_microstep: 1309.16 | bwd_inner_microstep: 1309.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609
[2024-06-11 03:42:10,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.59 | bwd_microstep: 1406.77 | bwd_inner_microstep: 1406.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555
[2024-06-11 03:42:12,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.34 | bwd_microstep: 1297.31 | bwd_inner_microstep: 1297.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-11 03:42:14,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.34 | bwd_microstep: 1431.87 | bwd_inner_microstep: 1431.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723
[2024-06-11 03:42:16,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1433.39 | bwd_inner_microstep: 1433.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603
[2024-06-11 03:42:18,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.07 | bwd_microstep: 1411.33 | bwd_inner_microstep: 1411.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-11 03:42:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1511.84 | bwd_inner_microstep: 1511.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956
[2024-06-11 03:42:21,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 701.67 | bwd_inner_microstep: 701.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3579
[2024-06-11 03:42:23,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.45 | bwd_microstep: 1697.76 | bwd_inner_microstep: 1697.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-11 03:42:28,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.10 | optimizer_step: 6.62
[2024-06-11 03:42:28,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 3886.82 | bwd_inner_microstep: 1630.54 | bwd_allreduce_microstep: 2256.23 | step_microstep: 37.98
[2024-06-11 03:42:28,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15746.29 | bwd: 44535.30 | bwd_inner: 42278.03 | bwd_allreduce: 2256.53 | step: 39.60
 90%|████████▉ | 1548/1726 [26:59:56<3:35:32, 72.65s/it]
 90%|████████▉ | 1549/1726 [27:00:57<3:24:13, 69.23s/it]
 90%|████████▉ | 1550/1726 [27:02:00<3:17:16, 67.25s/it]
 90%|████████▉ | 1551/1726 [27:03:01<3:11:15, 65.57s/it]
 90%|████████▉ | 1552/1726 [27:04:04<3:07:13, 64.56s/it]
 90%|████████▉ | 1553/1726 [27:05:04<3:02:44, 63.38s/it]
{'loss': 1.1744, 'learning_rate': 1.0448712492948743e-06, 'epoch': 0.9}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457
[2024-06-11 03:42:30,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.18 | bwd_microstep: 1563.21 | bwd_inner_microstep: 1563.14 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.23
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3620
[2024-06-11 03:42:32,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.22 | bwd_microstep: 1537.97 | bwd_inner_microstep: 1537.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845
[2024-06-11 03:42:34,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.24 | bwd_microstep: 1656.30 | bwd_inner_microstep: 1656.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-11 03:42:36,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1550.28 | bwd_inner_microstep: 1550.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3576
[2024-06-11 03:42:38,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.91 | bwd_microstep: 1363.76 | bwd_inner_microstep: 1363.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-11 03:42:40,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.52 | bwd_microstep: 1643.83 | bwd_inner_microstep: 1643.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3718
[2024-06-11 03:42:42,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.37 | bwd_microstep: 1296.81 | bwd_inner_microstep: 1296.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754
[2024-06-11 03:42:44,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.71 | bwd_microstep: 1541.59 | bwd_inner_microstep: 1541.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420
[2024-06-11 03:42:46,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.20 | bwd_microstep: 1153.62 | bwd_inner_microstep: 1153.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436
[2024-06-11 03:42:48,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.11 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2147
[2024-06-11 03:42:49,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.21 | bwd_microstep: 946.17 | bwd_inner_microstep: 946.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3542
[2024-06-11 03:42:51,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1355.47 | bwd_inner_microstep: 1355.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-11 03:42:53,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.15 | bwd_microstep: 1488.97 | bwd_inner_microstep: 1488.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948
[2024-06-11 03:42:54,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.07 | bwd_microstep: 706.23 | bwd_inner_microstep: 706.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-11 03:42:55,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 793.43 | bwd_inner_microstep: 793.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3547
[2024-06-11 03:42:57,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.80 | bwd_microstep: 1454.91 | bwd_inner_microstep: 1454.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-11 03:42:59,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1298.38 | bwd_inner_microstep: 1298.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-11 03:43:00,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.77 | bwd_microstep: 978.15 | bwd_inner_microstep: 978.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3903
[2024-06-11 03:43:02,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.35 | bwd_microstep: 1592.78 | bwd_inner_microstep: 1592.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-11 03:43:05,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1610.16 | bwd_inner_microstep: 1610.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-11 03:43:06,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.96 | bwd_microstep: 801.19 | bwd_inner_microstep: 801.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2131
[2024-06-11 03:43:07,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.79 | bwd_microstep: 927.23 | bwd_inner_microstep: 927.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-11 03:43:09,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.83 | bwd_microstep: 1500.47 | bwd_inner_microstep: 1500.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1969
[2024-06-11 03:43:10,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.18 | bwd_microstep: 828.95 | bwd_inner_microstep: 828.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-11 03:43:12,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.87 | bwd_microstep: 1259.66 | bwd_inner_microstep: 1259.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 03:43:14,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1391.89 | bwd_inner_microstep: 1391.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-11 03:43:16,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1557.56 | bwd_inner_microstep: 1557.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3566
[2024-06-11 03:43:18,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1544.03 | bwd_inner_microstep: 1544.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3813
[2024-06-11 03:43:20,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 1501.44 | bwd_inner_microstep: 1501.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-11 03:43:23,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.89 | bwd_microstep: 1602.87 | bwd_inner_microstep: 1602.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3801
[2024-06-11 03:43:25,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.46 | bwd_microstep: 1687.14 | bwd_inner_microstep: 1687.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-11 03:43:30,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.55 | optimizer_gradients: 4.24 | optimizer_step: 6.61
[2024-06-11 03:43:30,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.43 | bwd_microstep: 4542.39 | bwd_inner_microstep: 2137.04 | bwd_allreduce_microstep: 2405.27 | step_microstep: 40.25
[2024-06-11 03:43:30,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16116.30 | bwd: 46058.33 | bwd_inner: 43652.07 | bwd_allreduce: 2405.55 | step: 41.96
{'loss': 1.1841, 'learning_rate': 1.0329314943244117e-06, 'epoch': 0.9}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 03:43:32,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.89 | bwd_microstep: 1475.31 | bwd_inner_microstep: 1475.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 03:43:34,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1278.42 | bwd_inner_microstep: 1278.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-11 03:43:36,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1455.98 | bwd_inner_microstep: 1455.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586
[2024-06-11 03:43:38,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1405.66 | bwd_inner_microstep: 1405.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2310
[2024-06-11 03:43:39,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.35 | bwd_microstep: 884.81 | bwd_inner_microstep: 884.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-11 03:43:41,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.42 | bwd_microstep: 1285.38 | bwd_inner_microstep: 1285.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 03:43:43,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1385.62 | bwd_inner_microstep: 1385.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-11 03:43:45,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.64 | bwd_microstep: 1287.20 | bwd_inner_microstep: 1287.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-11 03:43:47,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.78 | bwd_microstep: 1531.12 | bwd_inner_microstep: 1531.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-11 03:43:48,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.78 | bwd_microstep: 1255.65 | bwd_inner_microstep: 1255.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 03:43:50,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.06 | bwd_inner_microstep: 1385.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945
[2024-06-11 03:43:51,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.49 | bwd_microstep: 729.98 | bwd_inner_microstep: 729.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-11 03:43:53,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1257.33 | bwd_inner_microstep: 1257.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-11 03:43:54,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.64 | bwd_microstep: 793.60 | bwd_inner_microstep: 793.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 03:43:56,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1343.00 | bwd_inner_microstep: 1342.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 03:43:58,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1376.37 | bwd_inner_microstep: 1376.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3606
[2024-06-11 03:44:00,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.25 | bwd_microstep: 1704.83 | bwd_inner_microstep: 1704.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 03:44:02,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1353.89 | bwd_inner_microstep: 1353.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 03:44:04,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1256.87 | bwd_inner_microstep: 1256.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-11 03:44:06,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1262.20 | bwd_inner_microstep: 1262.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-11 03:44:08,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1496.54 | bwd_inner_microstep: 1496.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550
[2024-06-11 03:44:10,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.08 | bwd_microstep: 1402.25 | bwd_inner_microstep: 1402.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100
[2024-06-11 03:44:11,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.51 | bwd_microstep: 921.10 | bwd_inner_microstep: 921.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-11 03:44:12,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.43 | bwd_microstep: 802.32 | bwd_inner_microstep: 802.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-11 03:44:14,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.26 | bwd_microstep: 1660.65 | bwd_inner_microstep: 1660.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 03:44:17,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.40 | bwd_microstep: 1660.00 | bwd_inner_microstep: 1659.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2291
[2024-06-11 03:44:18,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.28 | bwd_microstep: 1073.41 | bwd_inner_microstep: 1073.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3834
[2024-06-11 03:44:20,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.48 | bwd_microstep: 1502.54 | bwd_inner_microstep: 1502.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3806
[2024-06-11 03:44:22,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.50 | bwd_microstep: 1355.33 | bwd_inner_microstep: 1355.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813
[2024-06-11 03:44:24,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.99 | bwd_microstep: 1357.82 | bwd_inner_microstep: 1357.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278
[2024-06-11 03:44:25,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.27 | bwd_microstep: 878.80 | bwd_inner_microstep: 878.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584
[2024-06-11 03:44:31,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.15 | optimizer_step: 6.58
[2024-06-11 03:44:31,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.08 | bwd_microstep: 4906.23 | bwd_inner_microstep: 1616.17 | bwd_allreduce_microstep: 3290.00 | step_microstep: 38.71
[2024-06-11 03:44:31,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15462.03 | bwd: 44725.28 | bwd_inner: 41434.38 | bwd_allreduce: 3290.23 | step: 40.22
{'loss': 1.1215, 'learning_rate': 1.0210585412706187e-06, 'epoch': 0.9}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-11 03:44:32,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.87 | bwd_microstep: 1266.03 | bwd_inner_microstep: 1265.82 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3574
[2024-06-11 03:44:34,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1428.65 | bwd_inner_microstep: 1428.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-11 03:44:36,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.85 | bwd_microstep: 1449.83 | bwd_inner_microstep: 1449.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 03:44:38,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1386.78 | bwd_inner_microstep: 1386.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787
[2024-06-11 03:44:40,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.77 | bwd_microstep: 1644.36 | bwd_inner_microstep: 1644.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-11 03:44:42,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1246.68 | bwd_inner_microstep: 1246.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 03:44:44,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.74 | bwd_microstep: 1384.00 | bwd_inner_microstep: 1383.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-11 03:44:46,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1348.38 | bwd_inner_microstep: 1348.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 03:44:48,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1249.65 | bwd_inner_microstep: 1249.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 03:44:50,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1481.24 | bwd_inner_microstep: 1481.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447
[2024-06-11 03:44:52,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.24 | bwd_microstep: 1375.94 | bwd_inner_microstep: 1375.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-11 03:44:54,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.07 | bwd_microstep: 1523.20 | bwd_inner_microstep: 1523.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-11 03:44:56,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1344.92 | bwd_inner_microstep: 1344.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-11 03:44:57,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1343.13 | bwd_inner_microstep: 1343.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3399
[2024-06-11 03:44:59,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.54 | bwd_microstep: 1310.61 | bwd_inner_microstep: 1310.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1972
[2024-06-11 03:45:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.35 | bwd_microstep: 733.96 | bwd_inner_microstep: 733.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472
[2024-06-11 03:45:02,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.14 | bwd_microstep: 1315.30 | bwd_inner_microstep: 1315.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3525
[2024-06-11 03:45:04,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1437.95 | bwd_inner_microstep: 1437.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-11 03:45:06,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.55 | bwd_microstep: 1551.54 | bwd_inner_microstep: 1551.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-11 03:45:07,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.46 | bwd_microstep: 806.41 | bwd_inner_microstep: 806.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-11 03:45:09,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.12 | bwd_microstep: 1534.93 | bwd_inner_microstep: 1534.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3830
[2024-06-11 03:45:12,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.43 | bwd_microstep: 1824.83 | bwd_inner_microstep: 1824.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 03:45:14,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.12 | bwd_microstep: 1288.93 | bwd_inner_microstep: 1288.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2029
[2024-06-11 03:45:15,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.83 | bwd_microstep: 779.29 | bwd_inner_microstep: 779.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 03:45:17,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.55 | bwd_microstep: 1404.63 | bwd_inner_microstep: 1404.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-11 03:45:19,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1509.13 | bwd_inner_microstep: 1509.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 03:45:21,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1254.50 | bwd_inner_microstep: 1254.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3768
[2024-06-11 03:45:23,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.26 | bwd_microstep: 1847.69 | bwd_inner_microstep: 1847.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3581
[2024-06-11 03:45:25,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.73 | bwd_microstep: 1333.38 | bwd_inner_microstep: 1333.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1988
[2024-06-11 03:45:26,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.09 | bwd_microstep: 707.07 | bwd_inner_microstep: 707.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2034
[2024-06-11 03:45:27,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.94 | bwd_microstep: 716.17 | bwd_inner_microstep: 716.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3560
[2024-06-11 03:45:31,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.29 | optimizer_step: 6.59
[2024-06-11 03:45:31,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.99 | bwd_microstep: 3665.88 | bwd_inner_microstep: 2237.55 | bwd_allreduce_microstep: 1428.26 | step_microstep: 40.30
[2024-06-11 03:45:31,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15920.43 | bwd: 44495.04 | bwd_inner: 43065.70 | bwd_allreduce: 1428.59 | step: 43.07
{'loss': 1.2027, 'learning_rate': 1.0092524319499853e-06, 'epoch': 0.9}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-11 03:45:32,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.90 | bwd_microstep: 780.88 | bwd_inner_microstep: 780.82 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2380
[2024-06-11 03:45:34,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.10 | bwd_microstep: 960.60 | bwd_inner_microstep: 960.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 03:45:35,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1250.89 | bwd_inner_microstep: 1250.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2343
[2024-06-11 03:45:37,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.68 | bwd_microstep: 984.77 | bwd_inner_microstep: 984.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-11 03:45:39,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1456.05 | bwd_inner_microstep: 1456.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760
[2024-06-11 03:45:41,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.90 | bwd_microstep: 1640.49 | bwd_inner_microstep: 1640.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 03:45:43,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433
[2024-06-11 03:45:45,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.16 | bwd_microstep: 1280.45 | bwd_inner_microstep: 1280.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 03:45:47,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1555.27 | bwd_inner_microstep: 1555.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 03:45:49,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1482.43 | bwd_inner_microstep: 1482.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 03:45:51,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.07 | bwd_microstep: 1389.07 | bwd_inner_microstep: 1389.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002
[2024-06-11 03:45:52,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 896.47 | bwd_inner_microstep: 896.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1950
[2024-06-11 03:45:53,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.55 | bwd_microstep: 822.40 | bwd_inner_microstep: 822.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 03:45:55,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1250.24 | bwd_inner_microstep: 1250.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424
[2024-06-11 03:45:57,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1539.95 | bwd_inner_microstep: 1539.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3631
[2024-06-11 03:45:59,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1572.79 | bwd_inner_microstep: 1572.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3668
[2024-06-11 03:46:01,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.04 | bwd_microstep: 1358.66 | bwd_inner_microstep: 1358.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681
[2024-06-11 03:46:03,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1525.44 | bwd_inner_microstep: 1525.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2010
[2024-06-11 03:46:04,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.26 | bwd_microstep: 709.93 | bwd_inner_microstep: 709.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2122
[2024-06-11 03:46:05,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.10 | bwd_microstep: 831.93 | bwd_inner_microstep: 831.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 03:46:07,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1383.91 | bwd_inner_microstep: 1383.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-11 03:46:09,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558
[2024-06-11 03:46:11,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.80 | bwd_microstep: 1430.17 | bwd_inner_microstep: 1430.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-11 03:46:13,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.10 | bwd_microstep: 1499.34 | bwd_inner_microstep: 1499.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 03:46:15,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1275.96 | bwd_inner_microstep: 1275.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-11 03:46:17,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.41 | bwd_microstep: 1425.81 | bwd_inner_microstep: 1425.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3796
[2024-06-11 03:46:19,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.75 | bwd_microstep: 1506.94 | bwd_inner_microstep: 1506.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3478
[2024-06-11 03:46:21,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.38 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-11 03:46:23,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.96 | bwd_microstep: 1594.45 | bwd_inner_microstep: 1594.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3430
[2024-06-11 03:46:25,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.53 | bwd_microstep: 1408.37 | bwd_inner_microstep: 1408.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-11 03:46:27,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.52 | bwd_microstep: 1552.91 | bwd_inner_microstep: 1552.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580
[2024-06-11 03:46:36,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.42 | optimizer_step: 6.64
[2024-06-11 03:46:36,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.35 | bwd_microstep: 8128.38 | bwd_inner_microstep: 1652.10 | bwd_allreduce_microstep: 6476.21 | step_microstep: 40.14
[2024-06-11 03:46:36,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15650.46 | bwd: 48417.90 | bwd_inner: 41940.72 | bwd_allreduce: 6476.47 | step: 41.72
{'loss': 1.1824, 'learning_rate': 9.975132079435635e-07, 'epoch': 0.9}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466
[2024-06-11 03:46:38,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.68 | bwd_microstep: 1403.54 | bwd_inner_microstep: 1403.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-11 03:46:39,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.50 | bwd_microstep: 792.94 | bwd_inner_microstep: 792.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3882
[2024-06-11 03:46:41,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.21 | bwd_microstep: 1489.36 | bwd_inner_microstep: 1489.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868
[2024-06-11 03:46:42,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.42 | bwd_microstep: 677.69 | bwd_inner_microstep: 677.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-11 03:46:44,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1374.05 | bwd_inner_microstep: 1374.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 03:46:46,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.90 | bwd_microstep: 1403.12 | bwd_inner_microstep: 1403.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 03:46:48,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1390.08 | bwd_inner_microstep: 1390.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 03:46:49,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1246.37 | bwd_inner_microstep: 1246.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759
[2024-06-11 03:46:51,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.30 | bwd_microstep: 1436.40 | bwd_inner_microstep: 1436.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 03:46:53,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1398.85 | bwd_inner_microstep: 1398.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2469
[2024-06-11 03:46:55,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.26 | bwd_microstep: 1021.01 | bwd_inner_microstep: 1020.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 03:46:57,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.50 | bwd_microstep: 1387.79 | bwd_inner_microstep: 1387.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2131
[2024-06-11 03:46:58,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.16 | bwd_microstep: 987.87 | bwd_inner_microstep: 987.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 03:47:00,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.36 | bwd_microstep: 1487.17 | bwd_inner_microstep: 1487.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3697
[2024-06-11 03:47:02,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.88 | bwd_microstep: 1447.07 | bwd_inner_microstep: 1447.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3497
[2024-06-11 03:47:04,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.02 | bwd_microstep: 1613.06 | bwd_inner_microstep: 1613.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505
[2024-06-11 03:47:06,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1418.37 | bwd_inner_microstep: 1418.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-11 03:47:08,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.78 | bwd_microstep: 1491.44 | bwd_inner_microstep: 1491.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-11 03:47:10,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.73 | bwd_microstep: 1357.02 | bwd_inner_microstep: 1356.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1914
[2024-06-11 03:47:11,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.78 | bwd_microstep: 752.33 | bwd_inner_microstep: 752.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 03:47:13,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.41 | bwd_microstep: 1390.93 | bwd_inner_microstep: 1390.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3537
[2024-06-11 03:47:15,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1327.55 | bwd_inner_microstep: 1327.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2236
[2024-06-11 03:47:16,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.18 | bwd_microstep: 866.39 | bwd_inner_microstep: 866.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532
[2024-06-11 03:47:18,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1426.13 | bwd_inner_microstep: 1426.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612
[2024-06-11 03:47:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1342.60 | bwd_inner_microstep: 1342.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-11 03:47:22,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.49 | bwd_microstep: 1550.06 | bwd_inner_microstep: 1550.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2189
[2024-06-11 03:47:23,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.76 | bwd_microstep: 955.62 | bwd_inner_microstep: 955.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533
[2024-06-11 03:47:26,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.93 | bwd_microstep: 1584.16 | bwd_inner_microstep: 1584.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775
[2024-06-11 03:47:28,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.24 | bwd_microstep: 1744.36 | bwd_inner_microstep: 1744.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572
[2024-06-11 03:47:30,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 1491.64 | bwd_inner_microstep: 1491.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3447
[2024-06-11 03:47:32,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1414.57 | bwd_inner_microstep: 1414.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457
[2024-06-11 03:47:37,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.46 | optimizer_step: 6.60
[2024-06-11 03:47:37,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.64 | bwd_microstep: 4660.22 | bwd_inner_microstep: 1467.57 | bwd_allreduce_microstep: 3192.58 | step_microstep: 39.91
[2024-06-11 03:47:37,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15720.71 | bwd: 45329.78 | bwd_inner: 42136.27 | bwd_allreduce: 3192.82 | step: 41.77
 90%|████████▉ | 1553/1726 [27:05:04<3:02:44, 63.38s/it]
 90%|█████████ | 1554/1726 [27:06:07<3:00:57, 63.13s/it]
 90%|█████████ | 1555/1726 [27:07:07<2:57:40, 62.34s/it]
 90%|█████████ | 1556/1726 [27:08:08<2:55:18, 61.87s/it]
 90%|█████████ | 1557/1726 [27:09:12<2:56:24, 62.63s/it]
 90%|█████████ | 1558/1726 [27:10:14<2:54:20, 62.27s/it]
{'loss': 1.1748, 'learning_rate': 9.858409105968337e-07, 'epoch': 0.9}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-11 03:47:39,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.93 | bwd_microstep: 1476.25 | bwd_inner_microstep: 1476.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 03:47:41,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 1244.87 | bwd_inner_microstep: 1244.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899
[2024-06-11 03:47:43,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.13 | bwd_microstep: 1582.56 | bwd_inner_microstep: 1582.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 03:47:45,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1283.88 | bwd_inner_microstep: 1283.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 03:47:47,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1251.56 | bwd_inner_microstep: 1251.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 03:47:49,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1387.53 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-11 03:47:51,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.32 | bwd_microstep: 1631.71 | bwd_inner_microstep: 1631.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-11 03:47:52,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.68 | bwd_microstep: 1194.74 | bwd_inner_microstep: 1194.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 722
[2024-06-11 03:47:53,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.17 | bwd_microstep: 295.69 | bwd_inner_microstep: 295.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1968
[2024-06-11 03:47:54,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.90 | bwd_microstep: 854.40 | bwd_inner_microstep: 854.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623
[2024-06-11 03:47:56,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1407.09 | bwd_inner_microstep: 1407.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3418
[2024-06-11 03:47:58,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.85 | bwd_microstep: 1210.04 | bwd_inner_microstep: 1210.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658
[2024-06-11 03:48:00,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.91 | bwd_microstep: 1519.07 | bwd_inner_microstep: 1519.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3963
[2024-06-11 03:48:02,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.85 | bwd_microstep: 1700.55 | bwd_inner_microstep: 1700.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 03:48:04,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1284.35 | bwd_inner_microstep: 1284.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-11 03:48:06,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.49 | bwd_microstep: 1275.40 | bwd_inner_microstep: 1275.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 03:48:08,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1380.33 | bwd_inner_microstep: 1380.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-11 03:48:10,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.40 | bwd_microstep: 1461.33 | bwd_inner_microstep: 1461.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3508
[2024-06-11 03:48:11,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1191.48 | bwd_inner_microstep: 1191.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697
[2024-06-11 03:48:13,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.94 | bwd_microstep: 1531.42 | bwd_inner_microstep: 1531.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-11 03:48:15,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.22 | bwd_microstep: 1150.02 | bwd_inner_microstep: 1149.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819
[2024-06-11 03:48:17,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.42 | bwd_microstep: 1359.37 | bwd_inner_microstep: 1359.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3710
[2024-06-11 03:48:19,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.18 | bwd_microstep: 1660.84 | bwd_inner_microstep: 1660.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2087
[2024-06-11 03:48:20,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.46 | bwd_microstep: 1016.38 | bwd_inner_microstep: 1016.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564
[2024-06-11 03:48:22,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.43 | bwd_microstep: 1330.99 | bwd_inner_microstep: 1330.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-11 03:48:24,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.45 | bwd_microstep: 976.79 | bwd_inner_microstep: 976.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3758
[2024-06-11 03:48:25,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.68 | bwd_microstep: 1279.17 | bwd_inner_microstep: 1279.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-11 03:48:28,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1495.97 | bwd_inner_microstep: 1495.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-11 03:48:30,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.00 | bwd_microstep: 1655.05 | bwd_inner_microstep: 1655.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469
[2024-06-11 03:48:32,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.33 | bwd_microstep: 1572.36 | bwd_inner_microstep: 1572.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-11 03:48:34,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.94 | bwd_microstep: 1344.48 | bwd_inner_microstep: 1344.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-11 03:48:38,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-11 03:48:38,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.45 | bwd_microstep: 3394.43 | bwd_inner_microstep: 1810.71 | bwd_allreduce_microstep: 1583.67 | step_microstep: 38.16
[2024-06-11 03:48:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15954.63 | bwd: 44400.09 | bwd_inner: 42815.44 | bwd_allreduce: 1583.95 | step: 39.74
{'loss': 1.1705, 'learning_rate': 9.742355810195804e-07, 'epoch': 0.9}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559
[2024-06-11 03:48:40,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1322.90 | bwd_inner_microstep: 1322.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862
[2024-06-11 03:48:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.91 | bwd_microstep: 1565.02 | bwd_inner_microstep: 1565.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830
[2024-06-11 03:48:44,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.31 | bwd_microstep: 1355.26 | bwd_inner_microstep: 1355.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 03:48:46,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.62 | bwd_microstep: 1282.10 | bwd_inner_microstep: 1282.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-11 03:48:47,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.27 | bwd_microstep: 1406.81 | bwd_inner_microstep: 1406.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524
[2024-06-11 03:48:49,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1324.57 | bwd_inner_microstep: 1324.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 03:48:51,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1383.71 | bwd_inner_microstep: 1383.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-11 03:48:52,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.71 | bwd_microstep: 686.93 | bwd_inner_microstep: 686.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707
[2024-06-11 03:48:54,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.46 | bwd_microstep: 1428.12 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-11 03:48:56,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.07 | bwd_microstep: 1150.00 | bwd_inner_microstep: 1149.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2153
[2024-06-11 03:48:57,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.07 | bwd_microstep: 899.95 | bwd_inner_microstep: 899.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 03:48:59,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1375.17 | bwd_inner_microstep: 1375.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2624
[2024-06-11 03:49:00,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.39 | bwd_microstep: 1017.25 | bwd_inner_microstep: 1017.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-11 03:49:02,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.35 | bwd_microstep: 1353.06 | bwd_inner_microstep: 1353.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-11 03:49:04,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1648.59 | bwd_inner_microstep: 1648.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 03:49:06,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.24 | bwd_microstep: 1255.38 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466
[2024-06-11 03:49:08,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.24 | bwd_microstep: 1183.22 | bwd_inner_microstep: 1183.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3627
[2024-06-11 03:49:10,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 1443.33 | bwd_inner_microstep: 1443.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577
[2024-06-11 03:49:12,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.74 | bwd_microstep: 1407.72 | bwd_inner_microstep: 1407.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3508
[2024-06-11 03:49:14,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.77 | bwd_microstep: 1681.75 | bwd_inner_microstep: 1681.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-11 03:49:15,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.51 | bwd_microstep: 789.18 | bwd_inner_microstep: 789.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-11 03:49:17,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1390.92 | bwd_inner_microstep: 1390.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-11 03:49:18,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.66 | bwd_microstep: 787.17 | bwd_inner_microstep: 787.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443
[2024-06-11 03:49:20,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.78 | bwd_microstep: 1158.24 | bwd_inner_microstep: 1158.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 03:49:22,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1283.38 | bwd_inner_microstep: 1283.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3760
[2024-06-11 03:49:24,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.75 | bwd_microstep: 1468.67 | bwd_inner_microstep: 1468.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3813
[2024-06-11 03:49:26,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.04 | bwd_microstep: 1603.25 | bwd_inner_microstep: 1603.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3568
[2024-06-11 03:49:28,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.95 | bwd_microstep: 1526.61 | bwd_inner_microstep: 1526.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-11 03:49:30,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.60 | bwd_microstep: 1396.24 | bwd_inner_microstep: 1396.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771
[2024-06-11 03:49:32,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.94 | bwd_microstep: 1543.11 | bwd_inner_microstep: 1543.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-11 03:49:34,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.32 | bwd_microstep: 1160.23 | bwd_inner_microstep: 1160.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770
[2024-06-11 03:49:38,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-11 03:49:38,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.19 | bwd_microstep: 3808.90 | bwd_inner_microstep: 1870.20 | bwd_allreduce_microstep: 1938.64 | step_microstep: 38.82
[2024-06-11 03:49:38,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15747.68 | bwd: 44086.75 | bwd_inner: 42147.20 | bwd_allreduce: 1938.88 | step: 40.38
{'loss': 1.0872, 'learning_rate': 9.626972600856966e-07, 'epoch': 0.9}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 03:49:40,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1363.83 | bwd_inner_microstep: 1363.71 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2441
[2024-06-11 03:49:41,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.61 | bwd_microstep: 945.49 | bwd_inner_microstep: 945.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3841
[2024-06-11 03:49:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.01 | bwd_microstep: 1560.41 | bwd_inner_microstep: 1560.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2339
[2024-06-11 03:49:45,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.13 | bwd_microstep: 891.70 | bwd_inner_microstep: 891.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-11 03:49:46,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1291.69 | bwd_inner_microstep: 1291.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 03:49:48,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.86 | bwd_microstep: 1384.46 | bwd_inner_microstep: 1384.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-11 03:49:50,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1479.18 | bwd_inner_microstep: 1479.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3499
[2024-06-11 03:49:52,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.47 | bwd_microstep: 1221.59 | bwd_inner_microstep: 1221.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-11 03:49:54,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1251.42 | bwd_inner_microstep: 1251.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 03:49:56,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1248.63 | bwd_inner_microstep: 1248.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-11 03:49:57,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.47 | bwd_microstep: 1283.00 | bwd_inner_microstep: 1282.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-11 03:49:59,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.02 | bwd_microstep: 1342.46 | bwd_inner_microstep: 1342.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-11 03:50:01,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.19 | bwd_microstep: 1315.41 | bwd_inner_microstep: 1315.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099
[2024-06-11 03:50:02,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.38 | bwd_microstep: 923.29 | bwd_inner_microstep: 923.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 03:50:04,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1485.55 | bwd_inner_microstep: 1485.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-11 03:50:06,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1483.39 | bwd_inner_microstep: 1483.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3637
[2024-06-11 03:50:09,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.13 | bwd_microstep: 1647.54 | bwd_inner_microstep: 1647.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-11 03:50:10,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.60 | bwd_microstep: 1183.70 | bwd_inner_microstep: 1183.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3857
[2024-06-11 03:50:12,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.20 | bwd_microstep: 1317.29 | bwd_inner_microstep: 1317.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3625
[2024-06-11 03:50:14,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.16 | bwd_microstep: 1457.00 | bwd_inner_microstep: 1456.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933
[2024-06-11 03:50:15,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.31 | bwd_microstep: 727.39 | bwd_inner_microstep: 727.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 03:50:17,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.40 | bwd_microstep: 1285.45 | bwd_inner_microstep: 1285.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-11 03:50:19,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1355.49 | bwd_inner_microstep: 1355.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820
[2024-06-11 03:50:21,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1388.12 | bwd_inner_microstep: 1388.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2183
[2024-06-11 03:50:22,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.68 | bwd_microstep: 919.98 | bwd_inner_microstep: 919.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3571
[2024-06-11 03:50:24,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.11 | bwd_microstep: 1482.30 | bwd_inner_microstep: 1482.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454
[2024-06-11 03:50:26,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.56 | bwd_microstep: 1161.93 | bwd_inner_microstep: 1161.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813
[2024-06-11 03:50:28,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.64 | bwd_microstep: 1460.60 | bwd_inner_microstep: 1460.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-11 03:50:29,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.10 | bwd_microstep: 1337.03 | bwd_inner_microstep: 1337.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1944
[2024-06-11 03:50:31,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.48 | bwd_microstep: 823.70 | bwd_inner_microstep: 823.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3815
[2024-06-11 03:50:33,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.96 | bwd_microstep: 1853.58 | bwd_inner_microstep: 1853.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3432
[2024-06-11 03:50:41,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.14 | optimizer_step: 6.60
[2024-06-11 03:50:41,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.04 | bwd_microstep: 6918.94 | bwd_inner_microstep: 1565.26 | bwd_allreduce_microstep: 5353.62 | step_microstep: 39.02
[2024-06-11 03:50:41,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15507.40 | bwd: 46791.57 | bwd_inner: 41436.93 | bwd_allreduce: 5353.90 | step: 40.58
{'loss': 1.1616, 'learning_rate': 9.512259884331021e-07, 'epoch': 0.9}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-11 03:50:43,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.60 | bwd_microstep: 1460.67 | bwd_inner_microstep: 1460.60 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 03:50:44,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.02 | bwd_microstep: 1281.84 | bwd_inner_microstep: 1281.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2369
[2024-06-11 03:50:46,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.74 | bwd_microstep: 993.65 | bwd_inner_microstep: 993.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410
[2024-06-11 03:50:48,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1281.55 | bwd_inner_microstep: 1281.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 03:50:50,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 1383.71 | bwd_inner_microstep: 1383.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871
[2024-06-11 03:50:50,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.36 | bwd_microstep: 677.83 | bwd_inner_microstep: 677.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455
[2024-06-11 03:50:52,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1455.77 | bwd_inner_microstep: 1455.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764
[2024-06-11 03:50:55,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.07 | bwd_microstep: 1540.59 | bwd_inner_microstep: 1540.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-11 03:50:56,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.12 | bwd_microstep: 1188.53 | bwd_inner_microstep: 1188.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3567
[2024-06-11 03:50:58,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.87 | bwd_microstep: 1597.13 | bwd_inner_microstep: 1597.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728
[2024-06-11 03:51:01,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.17 | bwd_microstep: 1625.47 | bwd_inner_microstep: 1625.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1923
[2024-06-11 03:51:02,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.72 | bwd_microstep: 818.47 | bwd_inner_microstep: 818.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662
[2024-06-11 03:51:04,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.78 | bwd_microstep: 1624.42 | bwd_inner_microstep: 1624.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655
[2024-06-11 03:51:06,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.64 | bwd_microstep: 1526.16 | bwd_inner_microstep: 1526.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912
[2024-06-11 03:51:07,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.12 | bwd_microstep: 683.98 | bwd_inner_microstep: 683.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710
[2024-06-11 03:51:09,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.06 | bwd_microstep: 1534.61 | bwd_inner_microstep: 1534.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2110
[2024-06-11 03:51:10,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.29 | bwd_microstep: 824.59 | bwd_inner_microstep: 824.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684
[2024-06-11 03:51:13,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.28 | bwd_microstep: 1627.06 | bwd_inner_microstep: 1627.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3823
[2024-06-11 03:51:15,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.28 | bwd_microstep: 1484.52 | bwd_inner_microstep: 1484.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-11 03:51:16,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.52 | bwd_microstep: 1254.55 | bwd_inner_microstep: 1254.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3520
[2024-06-11 03:51:18,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.84 | bwd_microstep: 1220.62 | bwd_inner_microstep: 1220.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3388
[2024-06-11 03:51:20,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.53 | bwd_microstep: 1275.05 | bwd_inner_microstep: 1275.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-11 03:51:22,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1503.62 | bwd_inner_microstep: 1503.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463
[2024-06-11 03:51:24,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.69 | bwd_microstep: 1436.85 | bwd_inner_microstep: 1436.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-11 03:51:26,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.75 | bwd_microstep: 1530.64 | bwd_inner_microstep: 1530.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 03:51:28,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.90 | bwd_microstep: 1274.87 | bwd_inner_microstep: 1274.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3826
[2024-06-11 03:51:30,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1421.28 | bwd_inner_microstep: 1421.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3724
[2024-06-11 03:51:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1559.23 | bwd_inner_microstep: 1559.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-11 03:51:34,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.62 | bwd_microstep: 1305.78 | bwd_inner_microstep: 1305.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2243
[2024-06-11 03:51:35,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.20 | bwd_microstep: 993.70 | bwd_inner_microstep: 993.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-11 03:51:37,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1399.43 | bwd_inner_microstep: 1399.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816
[2024-06-11 03:53:16,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.63
[2024-06-11 03:53:16,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.32 | bwd_microstep: 98276.27 | bwd_inner_microstep: 1715.80 | bwd_allreduce_microstep: 96560.39 | step_microstep: 39.00
[2024-06-11 03:53:16,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15843.22 | bwd: 139062.45 | bwd_inner: 42501.08 | bwd_allreduce: 96560.67 | step: 40.56
{'loss': 1.1784, 'learning_rate': 9.398218064635478e-07, 'epoch': 0.9}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-11 03:53:18,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1330.75 | bwd_inner_microstep: 1330.55 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3935
[2024-06-11 03:53:20,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.70 | bwd_microstep: 1483.00 | bwd_inner_microstep: 1482.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479
[2024-06-11 03:53:22,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.33 | bwd_microstep: 1472.68 | bwd_inner_microstep: 1472.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 03:53:24,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.07 | bwd_microstep: 1373.96 | bwd_inner_microstep: 1373.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 03:53:26,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1274.37 | bwd_inner_microstep: 1274.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3780
[2024-06-11 03:53:27,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.99 | bwd_microstep: 1392.16 | bwd_inner_microstep: 1392.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 03:53:29,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.57 | bwd_microstep: 1400.43 | bwd_inner_microstep: 1400.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2524
[2024-06-11 03:53:31,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.41 | bwd_microstep: 899.20 | bwd_inner_microstep: 899.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710
[2024-06-11 03:53:33,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.74 | bwd_microstep: 1426.65 | bwd_inner_microstep: 1426.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702
[2024-06-11 03:53:35,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.95 | bwd_microstep: 1527.14 | bwd_inner_microstep: 1527.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 03:53:37,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1293.14 | bwd_inner_microstep: 1293.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2495
[2024-06-11 03:53:38,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.05 | bwd_microstep: 1021.39 | bwd_inner_microstep: 1021.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3419
[2024-06-11 03:53:40,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.25 | bwd_microstep: 1375.59 | bwd_inner_microstep: 1375.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467
[2024-06-11 03:53:42,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.21 | bwd_microstep: 1438.21 | bwd_inner_microstep: 1438.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525
[2024-06-11 03:53:44,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.46 | bwd_microstep: 1582.87 | bwd_inner_microstep: 1582.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2133
[2024-06-11 03:53:45,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.54 | bwd_microstep: 831.61 | bwd_inner_microstep: 831.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-11 03:53:47,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1607.34 | bwd_inner_microstep: 1607.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-11 03:53:49,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.34 | bwd_microstep: 1431.21 | bwd_inner_microstep: 1431.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706
[2024-06-11 03:53:52,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.61 | bwd_microstep: 1527.91 | bwd_inner_microstep: 1527.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-11 03:53:53,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.39 | bwd_microstep: 1259.56 | bwd_inner_microstep: 1259.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-11 03:53:56,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.80 | bwd_microstep: 1650.67 | bwd_inner_microstep: 1650.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3856
[2024-06-11 03:53:58,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.46 | bwd_microstep: 1558.52 | bwd_inner_microstep: 1558.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-11 03:54:00,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.18 | bwd_microstep: 1520.77 | bwd_inner_microstep: 1520.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3106
[2024-06-11 03:54:02,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1341.07 | bwd_inner_microstep: 1341.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3966
[2024-06-11 03:54:04,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.64 | bwd_microstep: 1471.73 | bwd_inner_microstep: 1471.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2216
[2024-06-11 03:54:05,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.69 | bwd_microstep: 864.88 | bwd_inner_microstep: 864.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3562
[2024-06-11 03:54:07,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1265.14 | bwd_inner_microstep: 1265.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770
[2024-06-11 03:54:09,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.13 | bwd_microstep: 1444.36 | bwd_inner_microstep: 1444.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-11 03:54:10,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.55 | bwd_microstep: 1308.28 | bwd_inner_microstep: 1308.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2059
[2024-06-11 03:54:12,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.46 | bwd_microstep: 864.20 | bwd_inner_microstep: 864.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-11 03:54:14,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.09 | bwd_microstep: 1651.42 | bwd_inner_microstep: 1651.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436
[2024-06-11 03:54:32,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.27 | optimizer_step: 6.61
[2024-06-11 03:54:32,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.48 | bwd_microstep: 17695.97 | bwd_inner_microstep: 1417.73 | bwd_allreduce_microstep: 16278.17 | step_microstep: 39.49
[2024-06-11 03:54:32,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16161.70 | bwd: 59586.21 | bwd_inner: 43306.96 | bwd_allreduce: 16278.49 | step: 41.04


 90%|█████████ | 1558/1726 [27:10:14<2:54:20, 62.27s/it]
 90%|█████████ | 1559/1726 [27:11:15<2:51:59, 61.80s/it]
 90%|█████████ | 1560/1726 [27:12:15<2:49:37, 61.31s/it]
 90%|█████████ | 1561/1726 [27:13:17<2:49:41, 61.71s/it]
 90%|█████████ | 1562/1726 [27:15:53<4:05:22, 89.77s/it]
 91%|█████████ | 1563/1726 [27:17:09<3:52:50, 85.71s/it]
{'loss': 1.1384, 'learning_rate': 9.284847543425113e-07, 'epoch': 0.91}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-11 03:54:34,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1331.14 | bwd_inner_microstep: 1331.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3861
[2024-06-11 03:54:36,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1455.46 | bwd_inner_microstep: 1455.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-11 03:54:38,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1374.93 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483
[2024-06-11 03:54:40,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.65 | bwd_microstep: 1212.84 | bwd_inner_microstep: 1212.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-11 03:54:42,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.09 | bwd_microstep: 1476.39 | bwd_inner_microstep: 1476.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-11 03:54:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1274.99 | bwd_inner_microstep: 1274.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 03:54:45,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.28 | bwd_microstep: 1278.50 | bwd_inner_microstep: 1278.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617
[2024-06-11 03:54:47,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.07 | bwd_microstep: 1306.19 | bwd_inner_microstep: 1306.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3545
[2024-06-11 03:54:49,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.04 | bwd_microstep: 1418.61 | bwd_inner_microstep: 1418.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 03:54:51,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.08 | bwd_microstep: 1482.39 | bwd_inner_microstep: 1482.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2334
[2024-06-11 03:54:52,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.18 | bwd_microstep: 981.98 | bwd_inner_microstep: 981.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643
[2024-06-11 03:54:55,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.25 | bwd_microstep: 1706.02 | bwd_inner_microstep: 1705.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-11 03:54:57,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.99 | bwd_microstep: 1398.39 | bwd_inner_microstep: 1398.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 03:54:59,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685
[2024-06-11 03:55:01,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.15 | bwd_microstep: 1521.94 | bwd_inner_microstep: 1521.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 03:55:03,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1380.10 | bwd_inner_microstep: 1380.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3753
[2024-06-11 03:55:05,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.65 | bwd_microstep: 1587.18 | bwd_inner_microstep: 1587.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587
[2024-06-11 03:55:07,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1334.38 | bwd_inner_microstep: 1334.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994
[2024-06-11 03:55:53,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.12 | bwd_microstep: 795.62 | bwd_inner_microstep: 795.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826
[2024-06-11 03:55:54,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.68 | bwd_microstep: 1347.11 | bwd_inner_microstep: 1347.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-11 03:55:56,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.62 | bwd_microstep: 1210.41 | bwd_inner_microstep: 1210.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-11 03:55:57,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 793.83 | bwd_inner_microstep: 793.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-11 03:55:59,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.64 | bwd_microstep: 1291.33 | bwd_inner_microstep: 1291.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785
[2024-06-11 03:56:01,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.23 | bwd_microstep: 1637.86 | bwd_inner_microstep: 1637.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1981
[2024-06-11 03:56:02,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.38 | bwd_microstep: 703.99 | bwd_inner_microstep: 703.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 03:56:04,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.16 | bwd_microstep: 1371.93 | bwd_inner_microstep: 1371.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-11 03:56:06,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.62 | bwd_microstep: 1449.03 | bwd_inner_microstep: 1449.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-11 03:56:08,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 1457.37 | bwd_inner_microstep: 1457.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1978
[2024-06-11 03:56:09,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.76 | bwd_microstep: 765.22 | bwd_inner_microstep: 765.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920
[2024-06-11 03:56:10,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.38 | bwd_microstep: 716.07 | bwd_inner_microstep: 716.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807
[2024-06-11 03:56:13,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.54 | bwd_microstep: 1745.83 | bwd_inner_microstep: 1745.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3800
[2024-06-11 03:56:20,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.60
[2024-06-11 03:56:20,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.53 | bwd_microstep: 6927.40 | bwd_inner_microstep: 1981.48 | bwd_allreduce_microstep: 4945.84 | step_microstep: 40.41
[2024-06-11 03:56:20,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15745.94 | bwd: 47114.07 | bwd_inner: 42167.31 | bwd_allreduce: 4946.08 | step: 41.85
{'loss': 1.1611, 'learning_rate': 9.172148719990237e-07, 'epoch': 0.91}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 03:56:22,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.79 | bwd_microstep: 1376.81 | bwd_inner_microstep: 1376.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2318
[2024-06-11 03:56:23,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.77 | bwd_microstep: 882.43 | bwd_inner_microstep: 882.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-11 03:56:26,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.99 | bwd_microstep: 1545.82 | bwd_inner_microstep: 1545.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 03:56:27,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1373.27 | bwd_inner_microstep: 1373.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 03:56:29,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.29 | bwd_microstep: 1373.78 | bwd_inner_microstep: 1373.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3807
[2024-06-11 03:56:31,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1379.98 | bwd_inner_microstep: 1379.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-11 03:56:33,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.18 | bwd_microstep: 1542.57 | bwd_inner_microstep: 1542.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 03:56:35,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.72 | bwd_microstep: 1383.85 | bwd_inner_microstep: 1383.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 03:56:37,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.69 | bwd_microstep: 1245.33 | bwd_inner_microstep: 1245.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-11 03:56:39,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-11 03:56:40,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.25 | bwd_microstep: 1287.50 | bwd_inner_microstep: 1287.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 03:56:42,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1959
[2024-06-11 03:56:43,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.57 | bwd_microstep: 765.09 | bwd_inner_microstep: 765.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3503
[2024-06-11 03:56:45,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.91 | bwd_microstep: 1438.76 | bwd_inner_microstep: 1438.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498
[2024-06-11 03:56:48,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.97 | bwd_microstep: 1579.32 | bwd_inner_microstep: 1579.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3956
[2024-06-11 03:56:50,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.31 | bwd_microstep: 1793.91 | bwd_inner_microstep: 1793.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816
[2024-06-11 03:56:52,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1484.00 | bwd_inner_microstep: 1483.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524
[2024-06-11 03:56:54,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.59 | bwd_microstep: 1419.98 | bwd_inner_microstep: 1419.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-11 03:56:56,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1407.14 | bwd_inner_microstep: 1407.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 03:56:58,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1251.44 | bwd_inner_microstep: 1251.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3860
[2024-06-11 03:57:00,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.44 | bwd_microstep: 1368.12 | bwd_inner_microstep: 1368.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700
[2024-06-11 03:57:02,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1427.93 | bwd_inner_microstep: 1427.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-11 03:57:04,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.45 | bwd_microstep: 1406.89 | bwd_inner_microstep: 1406.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-11 03:57:05,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.85 | bwd_microstep: 1295.00 | bwd_inner_microstep: 1294.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3679
[2024-06-11 03:57:07,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.08 | bwd_microstep: 1458.55 | bwd_inner_microstep: 1458.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 03:57:10,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1644.48 | bwd_inner_microstep: 1644.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 03:57:11,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.54 | bwd_microstep: 1257.38 | bwd_inner_microstep: 1257.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085
[2024-06-11 03:57:13,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.07 | bwd_microstep: 851.49 | bwd_inner_microstep: 851.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-11 03:57:15,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.27 | bwd_microstep: 1410.37 | bwd_inner_microstep: 1410.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719
[2024-06-11 03:57:17,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.94 | bwd_microstep: 1533.08 | bwd_inner_microstep: 1533.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 714
[2024-06-11 03:57:17,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 117.45 | bwd_microstep: 289.69 | bwd_inner_microstep: 289.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564
[2024-06-11 03:57:24,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.01 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-11 03:57:24,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.17 | bwd_microstep: 6145.02 | bwd_inner_microstep: 1810.77 | bwd_allreduce_microstep: 4334.19 | step_microstep: 38.16
[2024-06-11 03:57:24,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16003.79 | bwd: 47255.63 | bwd_inner: 42920.54 | bwd_allreduce: 4334.42 | step: 39.63
{'loss': 1.2075, 'learning_rate': 9.060121991255566e-07, 'epoch': 0.91}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 03:57:26,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.83 | bwd_microstep: 1270.70 | bwd_inner_microstep: 1270.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 03:57:28,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.99 | bwd_microstep: 1380.48 | bwd_inner_microstep: 1380.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2348
[2024-06-11 03:57:29,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.00 | bwd_microstep: 981.95 | bwd_inner_microstep: 981.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3780
[2024-06-11 03:57:31,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1474.89 | bwd_inner_microstep: 1474.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899
[2024-06-11 03:57:32,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.16 | bwd_microstep: 712.92 | bwd_inner_microstep: 712.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 03:57:34,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1384.42 | bwd_inner_microstep: 1384.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 03:57:36,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.95 | bwd_microstep: 1250.73 | bwd_inner_microstep: 1250.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 03:57:37,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.38 | bwd_microstep: 1386.47 | bwd_inner_microstep: 1386.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945
[2024-06-11 03:57:39,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.88 | bwd_microstep: 793.03 | bwd_inner_microstep: 793.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3431
[2024-06-11 03:57:40,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.93 | bwd_microstep: 1393.35 | bwd_inner_microstep: 1393.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3686
[2024-06-11 03:57:43,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.80 | bwd_microstep: 1566.42 | bwd_inner_microstep: 1566.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3391
[2024-06-11 03:57:44,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.45 | bwd_microstep: 1335.91 | bwd_inner_microstep: 1335.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 03:57:46,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1384.51 | bwd_inner_microstep: 1384.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-11 03:57:48,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 1248.87 | bwd_inner_microstep: 1248.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-11 03:57:50,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.83 | bwd_microstep: 1494.27 | bwd_inner_microstep: 1494.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 03:57:52,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1479.48 | bwd_inner_microstep: 1479.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 03:57:54,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1348.99 | bwd_inner_microstep: 1348.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3640
[2024-06-11 03:57:56,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.23 | bwd_microstep: 1708.69 | bwd_inner_microstep: 1708.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3641
[2024-06-11 03:57:59,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.90 | bwd_microstep: 1614.10 | bwd_inner_microstep: 1614.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-11 03:58:01,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.45 | bwd_microstep: 1657.32 | bwd_inner_microstep: 1657.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-11 03:58:03,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.48 | bwd_microstep: 1521.79 | bwd_inner_microstep: 1521.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515
[2024-06-11 03:58:05,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.13 | bwd_microstep: 1287.74 | bwd_inner_microstep: 1287.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2025
[2024-06-11 03:58:06,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.61 | bwd_microstep: 714.94 | bwd_inner_microstep: 714.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-11 03:58:08,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1393.29 | bwd_inner_microstep: 1393.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-11 03:58:09,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.16 | bwd_microstep: 918.39 | bwd_inner_microstep: 918.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2013
[2024-06-11 03:58:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.10 | bwd_microstep: 900.83 | bwd_inner_microstep: 900.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2068
[2024-06-11 03:58:12,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.27 | bwd_microstep: 1009.65 | bwd_inner_microstep: 1009.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3800
[2024-06-11 03:58:14,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.74 | bwd_microstep: 1451.16 | bwd_inner_microstep: 1451.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 03:58:16,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1377.82 | bwd_inner_microstep: 1377.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3903
[2024-06-11 03:58:18,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1426.00 | bwd_inner_microstep: 1425.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 03:58:20,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1554.96 | bwd_inner_microstep: 1554.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-11 03:58:26,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.58
[2024-06-11 03:58:26,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.40 | bwd_microstep: 5246.98 | bwd_inner_microstep: 1881.47 | bwd_allreduce_microstep: 3365.45 | step_microstep: 40.74
[2024-06-11 03:58:26,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15711.29 | bwd: 45671.09 | bwd_inner: 42304.72 | bwd_allreduce: 3365.69 | step: 42.27
{'loss': 1.1893, 'learning_rate': 8.948767751778598e-07, 'epoch': 0.91}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453
[2024-06-11 03:58:27,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.39 | bwd_microstep: 1280.00 | bwd_inner_microstep: 1279.83 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-11 03:58:29,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1350.28 | bwd_inner_microstep: 1350.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862
[2024-06-11 03:58:31,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1459.38 | bwd_inner_microstep: 1459.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-11 03:58:33,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.61 | bwd_microstep: 1549.40 | bwd_inner_microstep: 1549.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862
[2024-06-11 03:58:36,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.99 | bwd_microstep: 1563.33 | bwd_inner_microstep: 1563.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 03:58:37,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.22 | bwd_microstep: 1386.36 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 03:58:39,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1384.12 | bwd_inner_microstep: 1384.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-11 03:58:41,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1397.51 | bwd_inner_microstep: 1397.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637
[2024-06-11 03:58:43,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1418.96 | bwd_inner_microstep: 1418.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2162
[2024-06-11 03:58:45,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.50 | bwd_microstep: 949.00 | bwd_inner_microstep: 948.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-11 03:58:46,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1253.55 | bwd_inner_microstep: 1253.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-11 03:58:48,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1346.25 | bwd_inner_microstep: 1346.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434
[2024-06-11 03:58:50,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1407.70 | bwd_inner_microstep: 1407.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 03:58:52,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.35 | bwd_microstep: 1486.98 | bwd_inner_microstep: 1486.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-11 03:58:54,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.90 | bwd_microstep: 1450.44 | bwd_inner_microstep: 1450.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-11 03:58:56,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.03 | bwd_microstep: 1481.77 | bwd_inner_microstep: 1481.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2005
[2024-06-11 03:58:57,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.05 | bwd_microstep: 833.09 | bwd_inner_microstep: 833.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3459
[2024-06-11 03:58:59,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.75 | bwd_microstep: 1229.44 | bwd_inner_microstep: 1229.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3662
[2024-06-11 03:59:01,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.51 | bwd_microstep: 1542.50 | bwd_inner_microstep: 1542.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635
[2024-06-11 03:59:03,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1506.04 | bwd_inner_microstep: 1506.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2684
[2024-06-11 03:59:05,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.59 | bwd_microstep: 1025.54 | bwd_inner_microstep: 1025.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113
[2024-06-11 03:59:06,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.84 | bwd_microstep: 828.05 | bwd_inner_microstep: 828.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-11 03:59:08,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.94 | bwd_microstep: 1502.68 | bwd_inner_microstep: 1502.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2081
[2024-06-11 03:59:09,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.53 | bwd_microstep: 725.59 | bwd_inner_microstep: 725.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820
[2024-06-11 03:59:11,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.26 | bwd_microstep: 1481.72 | bwd_inner_microstep: 1481.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3759
[2024-06-11 03:59:13,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1250.12 | bwd_inner_microstep: 1250.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 03:59:14,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.67 | bwd_microstep: 1290.34 | bwd_inner_microstep: 1290.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655
[2024-06-11 03:59:16,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.00 | bwd_microstep: 1422.52 | bwd_inner_microstep: 1422.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-11 03:59:19,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.37 | bwd_microstep: 1642.74 | bwd_inner_microstep: 1642.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-11 03:59:21,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.05 | bwd_microstep: 1508.16 | bwd_inner_microstep: 1508.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-11 03:59:23,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.14 | bwd_microstep: 1402.52 | bwd_inner_microstep: 1402.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3814
[2024-06-11 03:59:27,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-11 03:59:27,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 3287.26 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 1840.39 | step_microstep: 37.98
[2024-06-11 03:59:27,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15974.53 | bwd: 44643.35 | bwd_inner: 42801.93 | bwd_allreduce: 1840.68 | step: 39.55
{'loss': 1.1215, 'learning_rate': 8.83808639374848e-07, 'epoch': 0.91}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-11 03:59:28,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1365.35 | bwd_inner_microstep: 1365.24 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3969
[2024-06-11 03:59:31,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.60 | bwd_microstep: 1600.54 | bwd_inner_microstep: 1600.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-11 03:59:32,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.22 | bwd_microstep: 786.61 | bwd_inner_microstep: 786.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-11 03:59:34,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 1295.92 | bwd_inner_microstep: 1295.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749
[2024-06-11 03:59:36,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.85 | bwd_microstep: 1536.65 | bwd_inner_microstep: 1536.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420
[2024-06-11 03:59:37,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.83 | bwd_microstep: 1251.06 | bwd_inner_microstep: 1251.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 03:59:39,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.50 | bwd_inner_microstep: 1283.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 03:59:41,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1250.45 | bwd_inner_microstep: 1250.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3410
[2024-06-11 03:59:42,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.26 | bwd_microstep: 1150.02 | bwd_inner_microstep: 1150.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-11 03:59:44,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.78 | bwd_microstep: 1190.72 | bwd_inner_microstep: 1190.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-11 03:59:46,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.96 | bwd_microstep: 1248.28 | bwd_inner_microstep: 1248.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 03:59:48,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.44 | bwd_microstep: 1283.64 | bwd_inner_microstep: 1283.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1956
[2024-06-11 03:59:49,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.26 | bwd_microstep: 889.69 | bwd_inner_microstep: 889.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 03:59:51,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.14 | bwd_microstep: 1380.37 | bwd_inner_microstep: 1380.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512
[2024-06-11 03:59:53,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1320.64 | bwd_inner_microstep: 1320.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-11 03:59:55,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.86 | bwd_microstep: 1439.33 | bwd_inner_microstep: 1439.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517
[2024-06-11 03:59:56,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.97 | bwd_microstep: 1288.93 | bwd_inner_microstep: 1288.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 03:59:58,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.81 | bwd_microstep: 1544.20 | bwd_inner_microstep: 1544.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-11 04:00:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1317.47 | bwd_inner_microstep: 1317.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-11 04:00:02,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1348.86 | bwd_inner_microstep: 1348.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-11 04:00:04,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2111
[2024-06-11 04:00:05,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.46 | bwd_microstep: 825.92 | bwd_inner_microstep: 825.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3522
[2024-06-11 04:00:07,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.93 | bwd_microstep: 1452.67 | bwd_inner_microstep: 1452.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-11 04:00:09,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1386.33 | bwd_inner_microstep: 1386.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-11 04:00:11,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1491.99 | bwd_inner_microstep: 1491.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3570
[2024-06-11 04:00:13,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1459.73 | bwd_inner_microstep: 1459.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059
[2024-06-11 04:00:14,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.98 | bwd_microstep: 820.73 | bwd_inner_microstep: 820.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481
[2024-06-11 04:00:16,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.21 | bwd_microstep: 1572.63 | bwd_inner_microstep: 1572.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562
[2024-06-11 04:00:19,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.55 | bwd_microstep: 1591.86 | bwd_inner_microstep: 1591.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1897
[2024-06-11 04:00:20,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.94 | bwd_microstep: 777.12 | bwd_inner_microstep: 777.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2995
[2024-06-11 04:00:21,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.00 | bwd_microstep: 1239.62 | bwd_inner_microstep: 1239.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3577
[2024-06-11 04:00:25,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.19 | optimizer_step: 6.57
[2024-06-11 04:00:25,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.61 | bwd_microstep: 3545.73 | bwd_inner_microstep: 1528.30 | bwd_allreduce_microstep: 2017.38 | step_microstep: 38.05
[2024-06-11 04:00:25,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15429.20 | bwd: 43217.52 | bwd_inner: 41199.14 | bwd_allreduce: 2017.66 | step: 39.64
 91%|█████████ | 1563/1726 [27:17:09<3:52:50, 85.71s/it]
 91%|█████████ | 1564/1726 [27:18:57<4:09:32, 92.43s/it]
 91%|█████████ | 1565/1726 [27:20:01<3:44:47, 83.78s/it]
 91%|█████████ | 1566/1726 [27:21:02<3:25:45, 77.16s/it]
 91%|█████████ | 1567/1726 [27:22:03<3:11:35, 72.30s/it]
 91%|█████████ | 1568/1726 [27:23:02<2:59:51, 68.30s/it]
{'loss': 1.1624, 'learning_rate': 8.728078306984322e-07, 'epoch': 0.91}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469
[2024-06-11 04:00:28,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.34 | bwd_microstep: 1569.20 | bwd_inner_microstep: 1569.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2218
[2024-06-11 04:00:29,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.15 | bwd_microstep: 902.30 | bwd_inner_microstep: 902.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931
[2024-06-11 04:00:30,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.42 | bwd_microstep: 794.76 | bwd_inner_microstep: 794.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3770
[2024-06-11 04:00:32,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.86 | bwd_microstep: 1402.83 | bwd_inner_microstep: 1402.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 04:00:34,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1385.20 | bwd_inner_microstep: 1385.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-11 04:00:35,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.27 | bwd_microstep: 810.64 | bwd_inner_microstep: 810.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 04:00:37,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1358.79 | bwd_inner_microstep: 1358.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-11 04:00:38,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 789.44 | bwd_inner_microstep: 789.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 04:00:40,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1386.60 | bwd_inner_microstep: 1386.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3730
[2024-06-11 04:00:42,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1565.01 | bwd_inner_microstep: 1564.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2161
[2024-06-11 04:00:43,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.86 | bwd_microstep: 980.93 | bwd_inner_microstep: 980.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3667
[2024-06-11 04:00:46,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.33 | bwd_microstep: 1687.64 | bwd_inner_microstep: 1687.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-11 04:00:48,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.54 | bwd_microstep: 1627.36 | bwd_inner_microstep: 1627.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3528
[2024-06-11 04:00:50,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.91 | bwd_microstep: 1541.73 | bwd_inner_microstep: 1541.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3662
[2024-06-11 04:00:52,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.89 | bwd_microstep: 1717.70 | bwd_inner_microstep: 1717.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3525
[2024-06-11 04:00:55,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.12 | bwd_microstep: 1690.03 | bwd_inner_microstep: 1690.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 04:00:57,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.96 | bwd_microstep: 1392.57 | bwd_inner_microstep: 1392.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 04:00:59,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 1356.45 | bwd_inner_microstep: 1356.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 04:01:00,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1277.44 | bwd_inner_microstep: 1277.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2373
[2024-06-11 04:01:02,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.11 | bwd_microstep: 905.64 | bwd_inner_microstep: 905.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3727
[2024-06-11 04:01:04,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.22 | bwd_microstep: 1601.91 | bwd_inner_microstep: 1601.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2023
[2024-06-11 04:01:05,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.04 | bwd_microstep: 714.49 | bwd_inner_microstep: 714.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-11 04:01:07,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.62 | bwd_microstep: 1521.09 | bwd_inner_microstep: 1521.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-11 04:01:09,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.06 | bwd_microstep: 1284.69 | bwd_inner_microstep: 1284.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-11 04:01:11,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.92 | bwd_microstep: 1403.31 | bwd_inner_microstep: 1403.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-11 04:01:13,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.49 | bwd_microstep: 1536.01 | bwd_inner_microstep: 1535.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-11 04:01:15,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1441.08 | bwd_inner_microstep: 1441.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951
[2024-06-11 04:01:16,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.32 | bwd_microstep: 702.05 | bwd_inner_microstep: 702.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-11 04:01:18,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1395.39 | bwd_inner_microstep: 1395.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768
[2024-06-11 04:01:20,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.17 | bwd_microstep: 1649.22 | bwd_inner_microstep: 1649.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3581
[2024-06-11 04:01:22,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.04 | bwd_microstep: 1547.42 | bwd_inner_microstep: 1547.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3538
[2024-06-11 04:01:26,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.99 | optimizer_gradients: 4.04 | optimizer_step: 6.60
[2024-06-11 04:01:26,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.64 | bwd_microstep: 3305.60 | bwd_inner_microstep: 1869.78 | bwd_allreduce_microstep: 1435.78 | step_microstep: 37.80
[2024-06-11 04:01:26,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15886.81 | bwd: 44244.54 | bwd_inner: 42807.83 | bwd_allreduce: 1436.02 | step: 39.28
{'loss': 1.1727, 'learning_rate': 8.618743878934022e-07, 'epoch': 0.91}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-11 04:01:28,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.25 | bwd_microstep: 1475.83 | bwd_inner_microstep: 1475.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3954
[2024-06-11 04:01:30,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.04 | bwd_microstep: 1593.05 | bwd_inner_microstep: 1593.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 04:01:32,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.97 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-11 04:01:33,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.35 | bwd_microstep: 675.28 | bwd_inner_microstep: 675.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-11 04:01:35,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1539.33 | bwd_inner_microstep: 1539.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 04:01:37,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1286.60 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 04:01:39,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1480.54 | bwd_inner_microstep: 1480.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-11 04:01:41,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1287.87 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2448
[2024-06-11 04:01:42,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.51 | bwd_microstep: 978.13 | bwd_inner_microstep: 978.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3481
[2024-06-11 04:01:44,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.65 | bwd_microstep: 1409.90 | bwd_inner_microstep: 1409.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-11 04:01:46,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1489.31 | bwd_inner_microstep: 1489.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952
[2024-06-11 04:01:47,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.61 | bwd_microstep: 884.74 | bwd_inner_microstep: 884.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-11 04:01:49,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1390.50 | bwd_inner_microstep: 1390.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1974
[2024-06-11 04:01:50,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.01 | bwd_microstep: 889.81 | bwd_inner_microstep: 889.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3663
[2024-06-11 04:01:52,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.83 | bwd_microstep: 1484.35 | bwd_inner_microstep: 1484.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-11 04:01:54,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.08 | bwd_microstep: 1483.27 | bwd_inner_microstep: 1483.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3641
[2024-06-11 04:01:56,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1313.02 | bwd_inner_microstep: 1313.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 636
[2024-06-11 04:01:57,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.95 | bwd_microstep: 263.90 | bwd_inner_microstep: 263.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 04:01:59,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1391.15 | bwd_inner_microstep: 1391.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1983
[2024-06-11 04:02:00,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.82 | bwd_microstep: 706.00 | bwd_inner_microstep: 705.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448
[2024-06-11 04:02:01,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.06 | bwd_microstep: 1156.15 | bwd_inner_microstep: 1156.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599
[2024-06-11 04:02:03,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.27 | bwd_microstep: 1310.73 | bwd_inner_microstep: 1310.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-11 04:02:05,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1510.67 | bwd_inner_microstep: 1510.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597
[2024-06-11 04:02:07,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.71 | bwd_microstep: 1404.51 | bwd_inner_microstep: 1404.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3613
[2024-06-11 04:02:09,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.98 | bwd_microstep: 1467.28 | bwd_inner_microstep: 1467.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561
[2024-06-11 04:02:11,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.51 | bwd_microstep: 1501.70 | bwd_inner_microstep: 1501.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805
[2024-06-11 04:02:13,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.05 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600
[2024-06-11 04:02:16,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.04 | bwd_microstep: 1596.76 | bwd_inner_microstep: 1596.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584
[2024-06-11 04:02:17,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1425.92 | bwd_inner_microstep: 1425.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457
[2024-06-11 04:02:19,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.31 | bwd_microstep: 1434.50 | bwd_inner_microstep: 1434.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-11 04:02:21,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1314.53 | bwd_inner_microstep: 1314.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3001
[2024-06-11 04:02:27,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.09 | optimizer_step: 6.56
[2024-06-11 04:02:27,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 5437.78 | bwd_inner_microstep: 1518.76 | bwd_allreduce_microstep: 3918.97 | step_microstep: 37.92
[2024-06-11 04:02:27,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15495.36 | bwd: 45476.13 | bwd_inner: 41556.25 | bwd_allreduce: 3919.20 | step: 39.40
{'loss': 1.1421, 'learning_rate': 8.510083494672905e-07, 'epoch': 0.91}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1919
[2024-06-11 04:02:28,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.70 | bwd_microstep: 870.62 | bwd_inner_microstep: 870.50 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3896
[2024-06-11 04:02:31,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.79 | bwd_microstep: 1485.48 | bwd_inner_microstep: 1485.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 04:02:32,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1373.98 | bwd_inner_microstep: 1373.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 04:02:34,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1274.50 | bwd_inner_microstep: 1274.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757
[2024-06-11 04:02:36,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.27 | bwd_microstep: 1639.31 | bwd_inner_microstep: 1639.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 04:02:38,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-11 04:02:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1410.68 | bwd_inner_microstep: 1410.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-11 04:02:41,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.26 | bwd_microstep: 796.95 | bwd_inner_microstep: 796.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969
[2024-06-11 04:02:42,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.38 | bwd_microstep: 795.36 | bwd_inner_microstep: 795.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 04:02:44,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.65 | bwd_microstep: 1388.28 | bwd_inner_microstep: 1388.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1914
[2024-06-11 04:02:45,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.64 | bwd_microstep: 717.73 | bwd_inner_microstep: 717.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-11 04:02:46,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.35 | bwd_microstep: 803.28 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-11 04:02:48,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.60 | bwd_microstep: 1282.18 | bwd_inner_microstep: 1282.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 04:02:50,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.31 | bwd_microstep: 1255.67 | bwd_inner_microstep: 1255.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2729
[2024-06-11 04:02:51,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.56 | bwd_microstep: 1118.07 | bwd_inner_microstep: 1118.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3410
[2024-06-11 04:02:54,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.31 | bwd_microstep: 1503.61 | bwd_inner_microstep: 1503.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3829
[2024-06-11 04:02:56,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.03 | bwd_microstep: 1583.06 | bwd_inner_microstep: 1583.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3625
[2024-06-11 04:02:58,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.18 | bwd_microstep: 1531.57 | bwd_inner_microstep: 1531.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-11 04:03:00,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.24 | bwd_microstep: 1494.67 | bwd_inner_microstep: 1494.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702
[2024-06-11 04:03:02,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.75 | bwd_microstep: 1330.32 | bwd_inner_microstep: 1330.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-11 04:03:04,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.08 | bwd_microstep: 1398.30 | bwd_inner_microstep: 1398.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542
[2024-06-11 04:03:06,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.72 | bwd_microstep: 1486.63 | bwd_inner_microstep: 1486.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 04:03:08,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1553.32 | bwd_inner_microstep: 1553.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 04:03:11,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 2238.31 | bwd_inner_microstep: 2238.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297
[2024-06-11 04:03:12,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.85 | bwd_microstep: 978.71 | bwd_inner_microstep: 978.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-11 04:03:14,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.72 | bwd_microstep: 1346.77 | bwd_inner_microstep: 1346.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3387
[2024-06-11 04:03:15,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.49 | bwd_microstep: 1145.06 | bwd_inner_microstep: 1145.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3059
[2024-06-11 04:03:17,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1269.03 | bwd_inner_microstep: 1269.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278
[2024-06-11 04:03:19,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.98 | bwd_microstep: 971.43 | bwd_inner_microstep: 971.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815
[2024-06-11 04:03:21,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.43 | bwd_microstep: 1597.03 | bwd_inner_microstep: 1597.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239
[2024-06-11 04:03:22,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.20 | bwd_microstep: 962.98 | bwd_inner_microstep: 962.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564
[2024-06-11 04:03:28,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.14 | optimizer_step: 6.60
[2024-06-11 04:03:28,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.43 | bwd_microstep: 5314.12 | bwd_inner_microstep: 1461.22 | bwd_allreduce_microstep: 3852.84 | step_microstep: 38.93
[2024-06-11 04:03:28,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15117.44 | bwd: 45200.74 | bwd_inner: 41346.89 | bwd_allreduce: 3853.12 | step: 40.40
{'loss': 1.2051, 'learning_rate': 8.402097536902221e-07, 'epoch': 0.91}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 04:03:30,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1365.99 | bwd_inner_microstep: 1365.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473
[2024-06-11 04:03:31,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.97 | bwd_microstep: 1210.63 | bwd_inner_microstep: 1210.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-11 04:03:34,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.51 | bwd_microstep: 1550.33 | bwd_inner_microstep: 1550.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3792
[2024-06-11 04:03:36,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1478.17 | bwd_inner_microstep: 1478.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-11 04:03:37,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.42 | bwd_microstep: 1148.77 | bwd_inner_microstep: 1148.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 04:03:39,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1245.61 | bwd_inner_microstep: 1245.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755
[2024-06-11 04:03:41,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1535.59 | bwd_inner_microstep: 1535.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 04:03:43,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1283.93 | bwd_inner_microstep: 1283.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-11 04:03:45,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.10 | bwd_microstep: 1539.15 | bwd_inner_microstep: 1539.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4031
[2024-06-11 04:03:47,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.04 | bwd_microstep: 1564.41 | bwd_inner_microstep: 1564.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3445
[2024-06-11 04:03:49,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.69 | bwd_microstep: 1214.35 | bwd_inner_microstep: 1214.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627
[2024-06-11 04:03:51,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.91 | bwd_microstep: 1424.20 | bwd_inner_microstep: 1424.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836
[2024-06-11 04:03:53,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.07 | bwd_microstep: 1758.30 | bwd_inner_microstep: 1758.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-11 04:03:55,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1406.85 | bwd_inner_microstep: 1406.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-11 04:03:57,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1343.55 | bwd_inner_microstep: 1343.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638
[2024-06-11 04:03:59,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1606.02 | bwd_inner_microstep: 1605.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3661
[2024-06-11 04:04:01,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.31 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993
[2024-06-11 04:04:02,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 803.22 | bwd_inner_microstep: 803.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703
[2024-06-11 04:04:04,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1433.35 | bwd_inner_microstep: 1433.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-11 04:04:06,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1417.35 | bwd_inner_microstep: 1417.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-11 04:04:08,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1254.78 | bwd_inner_microstep: 1254.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3520
[2024-06-11 04:04:10,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.43 | bwd_microstep: 1317.94 | bwd_inner_microstep: 1317.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3609
[2024-06-11 04:04:12,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.56 | bwd_microstep: 1453.03 | bwd_inner_microstep: 1453.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3816
[2024-06-11 04:04:14,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1386.05 | bwd_inner_microstep: 1386.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604
[2024-06-11 04:04:15,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1409.01 | bwd_inner_microstep: 1408.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-11 04:04:17,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.28 | bwd_microstep: 1441.55 | bwd_inner_microstep: 1441.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-11 04:04:19,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1405.30 | bwd_inner_microstep: 1405.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-11 04:04:21,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1387.25 | bwd_inner_microstep: 1387.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427
[2024-06-11 04:04:23,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1444.00 | bwd_inner_microstep: 1443.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-11 04:04:25,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.06 | bwd_microstep: 1436.44 | bwd_inner_microstep: 1436.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-11 04:04:27,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1402.90 | bwd_inner_microstep: 1402.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2267
[2024-06-11 04:04:29,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-11 04:04:29,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.91 | bwd_microstep: 1104.81 | bwd_inner_microstep: 962.46 | bwd_allreduce_microstep: 142.30 | step_microstep: 37.77
[2024-06-11 04:04:29,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16432.07 | bwd: 44055.98 | bwd_inner: 43912.78 | bwd_allreduce: 142.52 | step: 39.24
{'loss': 1.1591, 'learning_rate': 8.29478638594805e-07, 'epoch': 0.91}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-11 04:04:31,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.85 | bwd_microstep: 1497.76 | bwd_inner_microstep: 1497.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409
[2024-06-11 04:04:32,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.21 | bwd_microstep: 1148.66 | bwd_inner_microstep: 1148.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-11 04:04:34,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1483.31 | bwd_inner_microstep: 1483.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-11 04:04:37,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1495.45 | bwd_inner_microstep: 1495.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-11 04:04:38,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.29 | bwd_microstep: 1184.61 | bwd_inner_microstep: 1184.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-11 04:04:40,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1511.69 | bwd_inner_microstep: 1511.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 04:04:42,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.70 | bwd_microstep: 1286.31 | bwd_inner_microstep: 1286.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-11 04:04:44,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.61 | bwd_microstep: 1630.26 | bwd_inner_microstep: 1630.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3623
[2024-06-11 04:04:46,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.02 | bwd_microstep: 1343.69 | bwd_inner_microstep: 1343.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482
[2024-06-11 04:04:48,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.45 | bwd_microstep: 1413.08 | bwd_inner_microstep: 1413.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-11 04:04:50,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.28 | bwd_microstep: 1415.95 | bwd_inner_microstep: 1415.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2176
[2024-06-11 04:04:51,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.62 | bwd_microstep: 1050.76 | bwd_inner_microstep: 1050.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-11 04:04:54,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.76 | bwd_microstep: 1482.11 | bwd_inner_microstep: 1482.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 04:04:55,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1380.47 | bwd_inner_microstep: 1380.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3514
[2024-06-11 04:04:57,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1345.26 | bwd_inner_microstep: 1345.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939
[2024-06-11 04:05:00,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.13 | bwd_microstep: 1690.37 | bwd_inner_microstep: 1690.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-11 04:05:02,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1396.59 | bwd_inner_microstep: 1396.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639
[2024-06-11 04:05:03,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1408.71 | bwd_inner_microstep: 1408.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3740
[2024-06-11 04:05:05,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.40 | bwd_microstep: 1340.58 | bwd_inner_microstep: 1340.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-11 04:05:07,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1391.18 | bwd_inner_microstep: 1391.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693
[2024-06-11 04:05:09,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.43 | bwd_microstep: 1426.71 | bwd_inner_microstep: 1426.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 04:05:11,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.31 | bwd_microstep: 1557.18 | bwd_inner_microstep: 1557.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638
[2024-06-11 04:05:14,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1616.47 | bwd_inner_microstep: 1616.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108
[2024-06-11 04:05:15,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.43 | bwd_microstep: 824.31 | bwd_inner_microstep: 824.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2811
[2024-06-11 04:05:16,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.39 | bwd_microstep: 1056.48 | bwd_inner_microstep: 1056.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3609
[2024-06-11 04:05:18,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.25 | bwd_microstep: 1469.81 | bwd_inner_microstep: 1469.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076
[2024-06-11 04:05:19,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.13 | bwd_microstep: 823.92 | bwd_inner_microstep: 823.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 04:05:21,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.86 | bwd_microstep: 1401.83 | bwd_inner_microstep: 1401.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-11 04:05:23,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.23 | bwd_microstep: 1448.52 | bwd_inner_microstep: 1448.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-11 04:05:25,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1392.98 | bwd_inner_microstep: 1392.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2027
[2024-06-11 04:05:26,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.28 | bwd_microstep: 904.95 | bwd_inner_microstep: 904.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3582
[2024-06-11 04:05:29,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.03 | optimizer_step: 6.60
[2024-06-11 04:05:29,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.64 | bwd_microstep: 1980.68 | bwd_inner_microstep: 1761.39 | bwd_allreduce_microstep: 219.25 | step_microstep: 37.55
[2024-06-11 04:05:29,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16229.89 | bwd: 43800.65 | bwd_inner: 43580.51 | bwd_allreduce: 219.47 | step: 39.04


 91%|█████████ | 1568/1726 [27:23:02<2:59:51, 68.30s/it]
 91%|█████████ | 1569/1726 [27:24:03<2:52:34, 65.95s/it]
 91%|█████████ | 1570/1726 [27:25:04<2:47:51, 64.56s/it]
 91%|█████████ | 1571/1726 [27:26:05<2:43:44, 63.38s/it]
 91%|█████████ | 1572/1726 [27:27:05<2:40:42, 62.62s/it]
 91%|█████████ | 1573/1726 [27:28:06<2:37:56,
{'loss': 1.1449, 'learning_rate': 8.188150419759577e-07, 'epoch': 0.91}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3394
[2024-06-11 04:05:31,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.19 | bwd_microstep: 1269.97 | bwd_inner_microstep: 1269.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1920
[2024-06-11 04:05:32,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.08 | bwd_microstep: 827.89 | bwd_inner_microstep: 827.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2689
[2024-06-11 04:05:34,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.91 | bwd_microstep: 1081.91 | bwd_inner_microstep: 1081.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-11 04:05:36,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.43 | bwd_microstep: 1538.93 | bwd_inner_microstep: 1538.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1922
[2024-06-11 04:05:37,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.04 | bwd_microstep: 758.10 | bwd_inner_microstep: 758.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-11 04:05:39,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1529.89 | bwd_inner_microstep: 1529.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-11 04:05:41,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1436.44 | bwd_inner_microstep: 1436.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-11 04:05:43,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.69 | bwd_microstep: 1413.27 | bwd_inner_microstep: 1413.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 04:05:45,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.96 | bwd_microstep: 1380.22 | bwd_inner_microstep: 1380.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 04:05:47,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.24 | bwd_microstep: 1483.53 | bwd_inner_microstep: 1483.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520
[2024-06-11 04:05:49,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.74 | bwd_microstep: 1482.81 | bwd_inner_microstep: 1482.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1990
[2024-06-11 04:05:50,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.88 | bwd_microstep: 829.85 | bwd_inner_microstep: 829.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 04:05:52,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.28 | bwd_microstep: 1479.73 | bwd_inner_microstep: 1479.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691
[2024-06-11 04:05:54,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1526.79 | bwd_inner_microstep: 1526.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030
[2024-06-11 04:05:55,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.35 | bwd_microstep: 903.44 | bwd_inner_microstep: 903.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971
[2024-06-11 04:05:56,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.77 | bwd_microstep: 703.63 | bwd_inner_microstep: 703.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-11 04:05:58,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.94 | bwd_microstep: 1548.22 | bwd_inner_microstep: 1548.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142
[2024-06-11 04:06:00,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.27 | bwd_microstep: 833.60 | bwd_inner_microstep: 833.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 04:06:01,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.85 | bwd_microstep: 1376.61 | bwd_inner_microstep: 1376.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2176
[2024-06-11 04:06:03,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.37 | bwd_microstep: 889.13 | bwd_inner_microstep: 889.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-11 04:06:04,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.82 | bwd_microstep: 1295.49 | bwd_inner_microstep: 1295.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692
[2024-06-11 04:06:06,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.87 | bwd_microstep: 1432.14 | bwd_inner_microstep: 1432.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551
[2024-06-11 04:06:08,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.65 | bwd_microstep: 1206.51 | bwd_inner_microstep: 1206.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3464
[2024-06-11 04:06:10,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.70 | bwd_microstep: 1227.97 | bwd_inner_microstep: 1227.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602
[2024-06-11 04:06:12,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.64 | bwd_microstep: 1607.77 | bwd_inner_microstep: 1607.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 04:06:14,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.86 | bwd_microstep: 1660.50 | bwd_inner_microstep: 1660.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 04:06:16,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1553.74 | bwd_inner_microstep: 1553.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569
[2024-06-11 04:06:18,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1300.82 | bwd_inner_microstep: 1300.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 04:06:20,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.69 | bwd_microstep: 1470.76 | bwd_inner_microstep: 1470.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-11 04:06:22,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1405.44 | bwd_inner_microstep: 1405.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2088
[2024-06-11 04:06:24,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.61 | bwd_microstep: 1014.43 | bwd_inner_microstep: 1014.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-11 04:07:17,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-11 04:07:17,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 53194.54 | bwd_inner_microstep: 1750.65 | bwd_allreduce_microstep: 51443.82 | step_microstep: 39.12
[2024-06-11 04:07:17,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15349.76 | bwd: 92664.09 | bwd_inner: 41219.34 | bwd_allreduce: 51444.07 | step: 40.58
{'loss': 1.1917, 'learning_rate': 8.082190013908242e-07, 'epoch': 0.91}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-11 04:07:19,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1468.36 | bwd_inner_microstep: 1468.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-11 04:07:21,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.37 | bwd_microstep: 786.16 | bwd_inner_microstep: 786.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4347
[2024-06-11 04:07:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.74 | bwd_microstep: 1793.21 | bwd_inner_microstep: 1793.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 04:07:25,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.13 | bwd_microstep: 1274.10 | bwd_inner_microstep: 1274.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 04:07:27,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.25 | bwd_microstep: 1278.09 | bwd_inner_microstep: 1278.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488
[2024-06-11 04:07:28,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.82 | bwd_microstep: 1342.76 | bwd_inner_microstep: 1342.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-11 04:07:30,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 790.34 | bwd_inner_microstep: 790.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480
[2024-06-11 04:07:31,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.41 | bwd_microstep: 1182.89 | bwd_inner_microstep: 1182.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-11 04:07:33,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1286.04 | bwd_inner_microstep: 1286.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2134
[2024-06-11 04:08:21,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.96 | bwd_microstep: 853.09 | bwd_inner_microstep: 853.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-11 04:08:23,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.94 | bwd_microstep: 1610.90 | bwd_inner_microstep: 1610.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-11 04:08:25,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.53 | bwd_microstep: 1314.13 | bwd_inner_microstep: 1314.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1949
[2024-06-11 04:08:26,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.35 | bwd_microstep: 879.89 | bwd_inner_microstep: 879.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3678
[2024-06-11 04:08:28,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.54 | bwd_microstep: 1361.96 | bwd_inner_microstep: 1361.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-11 04:08:30,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.56 | bwd_microstep: 1280.75 | bwd_inner_microstep: 1280.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-11 04:08:32,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1398.03 | bwd_inner_microstep: 1398.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395
[2024-06-11 04:08:34,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.54 | bwd_microstep: 1238.33 | bwd_inner_microstep: 1238.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2098
[2024-06-11 04:08:35,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.76 | bwd_microstep: 817.94 | bwd_inner_microstep: 817.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451
[2024-06-11 04:08:36,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.01 | bwd_microstep: 1186.40 | bwd_inner_microstep: 1186.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-11 04:08:38,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.86 | bwd_microstep: 1452.07 | bwd_inner_microstep: 1452.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3611
[2024-06-11 04:08:40,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.80 | bwd_microstep: 1458.52 | bwd_inner_microstep: 1458.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1930
[2024-06-11 04:08:42,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.19 | bwd_microstep: 843.13 | bwd_inner_microstep: 843.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917
[2024-06-11 04:08:43,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.47 | bwd_microstep: 776.94 | bwd_inner_microstep: 776.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-11 04:08:45,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.42 | bwd_microstep: 1544.09 | bwd_inner_microstep: 1544.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-11 04:08:47,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.21 | bwd_microstep: 1487.98 | bwd_inner_microstep: 1487.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-11 04:08:49,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.03 | bwd_microstep: 1395.35 | bwd_inner_microstep: 1395.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3557
[2024-06-11 04:08:51,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.98 | bwd_microstep: 1518.91 | bwd_inner_microstep: 1518.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3704
[2024-06-11 04:08:53,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.33 | bwd_microstep: 1450.21 | bwd_inner_microstep: 1450.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-11 04:08:55,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.64 | bwd_microstep: 1486.29 | bwd_inner_microstep: 1486.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574
[2024-06-11 04:08:57,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1496.24 | bwd_inner_microstep: 1496.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-11 04:08:59,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.26 | bwd_microstep: 1296.44 | bwd_inner_microstep: 1296.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576
[2024-06-11 04:09:01,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.07 | optimizer_step: 6.67
[2024-06-11 04:09:01,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.25 | bwd_microstep: 1677.35 | bwd_inner_microstep: 1669.64 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.48
[2024-06-11 04:09:01,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15399.48 | bwd: 41026.89 | bwd_inner: 41018.34 | bwd_allreduce: 7.88 | step: 38.96
{'loss': 1.179, 'learning_rate': 7.976905541585967e-07, 'epoch': 0.91}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938
[2024-06-11 04:09:02,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.45 | bwd_microstep: 783.08 | bwd_inner_microstep: 783.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3899
[2024-06-11 04:09:04,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.52 | bwd_microstep: 1481.63 | bwd_inner_microstep: 1481.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404
[2024-06-11 04:09:06,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.09 | bwd_microstep: 1182.00 | bwd_inner_microstep: 1181.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3855
[2024-06-11 04:09:08,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.62 | bwd_microstep: 1561.63 | bwd_inner_microstep: 1561.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3767
[2024-06-11 04:09:10,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.84 | bwd_microstep: 1570.80 | bwd_inner_microstep: 1570.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782
[2024-06-11 04:09:12,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1453.63 | bwd_inner_microstep: 1453.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2263
[2024-06-11 04:09:13,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 872.93 | bwd_inner_microstep: 872.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-11 04:09:16,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1534.11 | bwd_inner_microstep: 1534.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 04:09:17,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.85 | bwd_microstep: 1388.96 | bwd_inner_microstep: 1388.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 04:09:19,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1387.19 | bwd_inner_microstep: 1387.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-11 04:09:21,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1253.84 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-11 04:09:23,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1417.94 | bwd_inner_microstep: 1417.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674
[2024-06-11 04:09:25,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.22 | bwd_microstep: 1683.11 | bwd_inner_microstep: 1683.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3668
[2024-06-11 04:09:28,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.95 | bwd_microstep: 1584.10 | bwd_inner_microstep: 1584.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497
[2024-06-11 04:09:30,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1516.18 | bwd_inner_microstep: 1516.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-11 04:09:32,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1443.84 | bwd_inner_microstep: 1443.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3509
[2024-06-11 04:09:34,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.38 | bwd_microstep: 1683.38 | bwd_inner_microstep: 1683.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526
[2024-06-11 04:09:36,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.33 | bwd_microstep: 1295.30 | bwd_inner_microstep: 1295.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2098
[2024-06-11 04:09:37,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.86 | bwd_microstep: 883.63 | bwd_inner_microstep: 883.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-11 04:09:39,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935
[2024-06-11 04:09:40,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.05 | bwd_microstep: 726.33 | bwd_inner_microstep: 726.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3684
[2024-06-11 04:09:42,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1328.29 | bwd_inner_microstep: 1328.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281
[2024-06-11 04:09:43,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.71 | bwd_microstep: 974.15 | bwd_inner_microstep: 974.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063
[2024-06-11 04:09:44,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.07 | bwd_microstep: 921.42 | bwd_inner_microstep: 921.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 04:09:46,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1375.34 | bwd_inner_microstep: 1375.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3533
[2024-06-11 04:09:48,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.29 | bwd_microstep: 1449.46 | bwd_inner_microstep: 1449.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3820
[2024-06-11 04:09:51,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.84 | bwd_microstep: 1853.39 | bwd_inner_microstep: 1853.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 04:09:53,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.41 | bwd_microstep: 1664.03 | bwd_inner_microstep: 1664.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3491
[2024-06-11 04:09:55,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 1544.19 | bwd_inner_microstep: 1544.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 04:09:57,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.34 | bwd_microstep: 1383.84 | bwd_inner_microstep: 1383.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629
[2024-06-11 04:09:59,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.48 | bwd_microstep: 1707.66 | bwd_inner_microstep: 1707.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3471
[2024-06-11 04:10:45,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.26 | optimizer_step: 6.59
[2024-06-11 04:10:45,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.16 | bwd_microstep: 44683.32 | bwd_inner_microstep: 1611.02 | bwd_allreduce_microstep: 43072.23 | step_microstep: 39.85
[2024-06-11 04:10:45,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16304.19 | bwd: 86978.97 | bwd_inner: 43905.79 | bwd_allreduce: 43072.47 | step: 41.34
{'loss': 1.1869, 'learning_rate': 7.872297373604154e-07, 'epoch': 0.91}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477
[2024-06-11 04:10:47,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.39 | bwd_microstep: 1397.97 | bwd_inner_microstep: 1397.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2454
[2024-06-11 04:10:48,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.74 | bwd_microstep: 941.41 | bwd_inner_microstep: 941.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 04:10:50,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.62 | bwd_microstep: 1336.14 | bwd_inner_microstep: 1336.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848
[2024-06-11 04:10:52,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.65 | bwd_microstep: 1552.01 | bwd_inner_microstep: 1551.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 04:10:54,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.86 | bwd_microstep: 1379.58 | bwd_inner_microstep: 1379.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3872
[2024-06-11 04:10:56,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.60 | bwd_microstep: 1557.75 | bwd_inner_microstep: 1557.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-11 04:10:58,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.38 | bwd_microstep: 1416.04 | bwd_inner_microstep: 1416.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 916
[2024-06-11 04:10:59,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.37 | bwd_microstep: 372.14 | bwd_inner_microstep: 372.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732
[2024-06-11 04:11:18,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.69 | bwd_microstep: 1522.13 | bwd_inner_microstep: 1522.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1962
[2024-06-11 04:11:19,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.12 | bwd_microstep: 819.88 | bwd_inner_microstep: 819.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-11 04:11:21,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1432.92 | bwd_inner_microstep: 1432.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1952
[2024-06-11 04:11:23,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.86 | bwd_microstep: 910.97 | bwd_inner_microstep: 910.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470
[2024-06-11 04:11:25,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.83 | bwd_microstep: 1338.18 | bwd_inner_microstep: 1338.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3934
[2024-06-11 04:11:27,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.70 | bwd_microstep: 1584.28 | bwd_inner_microstep: 1584.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3505
[2024-06-11 04:11:29,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.94 | bwd_microstep: 1670.32 | bwd_inner_microstep: 1670.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829
[2024-06-11 04:11:31,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.23 | bwd_microstep: 1742.81 | bwd_inner_microstep: 1742.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-11 04:11:33,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.79 | bwd_microstep: 1507.73 | bwd_inner_microstep: 1507.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-11 04:11:35,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.85 | bwd_microstep: 1459.84 | bwd_inner_microstep: 1459.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981
[2024-06-11 04:11:37,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.48 | bwd_microstep: 796.26 | bwd_inner_microstep: 796.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 04:11:39,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1387.18 | bwd_inner_microstep: 1387.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2089
[2024-06-11 04:11:40,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.91 | bwd_microstep: 824.99 | bwd_inner_microstep: 824.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 04:11:42,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1346.00 | bwd_inner_microstep: 1345.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442
[2024-06-11 04:11:43,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1444.01 | bwd_inner_microstep: 1443.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3638
[2024-06-11 04:11:46,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.40 | bwd_microstep: 1703.88 | bwd_inner_microstep: 1703.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552
[2024-06-11 04:11:48,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1490.76 | bwd_inner_microstep: 1490.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563
[2024-06-11 04:11:50,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1420.81 | bwd_inner_microstep: 1420.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2034
[2024-06-11 04:11:51,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.89 | bwd_microstep: 840.54 | bwd_inner_microstep: 840.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819
[2024-06-11 04:11:53,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.64 | bwd_microstep: 1716.93 | bwd_inner_microstep: 1716.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569
[2024-06-11 04:11:55,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 1425.80 | bwd_inner_microstep: 1425.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773
[2024-06-11 04:11:57,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.89 | bwd_microstep: 1449.14 | bwd_inner_microstep: 1449.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 04:12:00,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.15 | bwd_microstep: 1651.47 | bwd_inner_microstep: 1651.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-11 04:12:03,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-11 04:12:03,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.00 | bwd_microstep: 3156.50 | bwd_inner_microstep: 995.03 | bwd_allreduce_microstep: 2161.41 | step_microstep: 38.25
[2024-06-11 04:12:03,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15790.42 | bwd: 44596.39 | bwd_inner: 42433.94 | bwd_allreduce: 2161.72 | step: 39.72
{'loss': 1.1548, 'learning_rate': 7.768365878392225e-07, 'epoch': 0.91}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053
[2024-06-11 04:12:04,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.84 | bwd_microstep: 905.26 | bwd_inner_microstep: 905.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4017
[2024-06-11 04:12:07,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.32 | bwd_microstep: 1604.92 | bwd_inner_microstep: 1604.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-11 04:12:08,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.25 | bwd_microstep: 1277.24 | bwd_inner_microstep: 1277.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3792
[2024-06-11 04:12:10,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.48 | bwd_microstep: 1450.85 | bwd_inner_microstep: 1450.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521
[2024-06-11 04:12:12,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.33 | bwd_microstep: 1291.01 | bwd_inner_microstep: 1290.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3628
[2024-06-11 04:12:14,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1377.41 | bwd_inner_microstep: 1377.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 04:12:16,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1390.50 | bwd_inner_microstep: 1390.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3897
[2024-06-11 04:12:18,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.03 | bwd_microstep: 1591.47 | bwd_inner_microstep: 1591.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1940
[2024-06-11 04:12:19,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.58 | bwd_microstep: 730.17 | bwd_inner_microstep: 730.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417
[2024-06-11 04:12:21,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.45 | bwd_microstep: 1280.26 | bwd_inner_microstep: 1280.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-11 04:12:23,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.28 | bwd_microstep: 1248.86 | bwd_inner_microstep: 1248.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-11 04:12:25,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.33 | bwd_microstep: 1527.53 | bwd_inner_microstep: 1527.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-11 04:12:27,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.14 | bwd_microstep: 1251.73 | bwd_inner_microstep: 1251.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2989
[2024-06-11 04:12:28,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.62 | bwd_microstep: 1013.98 | bwd_inner_microstep: 1013.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2480
[2024-06-11 04:12:29,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.58 | bwd_microstep: 1015.36 | bwd_inner_microstep: 1015.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467
[2024-06-11 04:12:31,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.49 | bwd_microstep: 1437.35 | bwd_inner_microstep: 1437.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-11 04:12:34,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.77 | bwd_microstep: 1611.59 | bwd_inner_microstep: 1611.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-11 04:12:35,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.93 | bwd_microstep: 800.27 | bwd_inner_microstep: 800.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3636
[2024-06-11 04:12:37,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 1347.30 | bwd_inner_microstep: 1347.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-11 04:12:39,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.65 | bwd_microstep: 1656.23 | bwd_inner_microstep: 1656.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 04:12:41,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1385.93 | bwd_inner_microstep: 1385.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665
[2024-06-11 04:12:43,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1325.31 | bwd_inner_microstep: 1325.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-11 04:12:45,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.57 | bwd_microstep: 1429.49 | bwd_inner_microstep: 1429.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2422
[2024-06-11 04:12:46,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.64 | bwd_microstep: 940.91 | bwd_inner_microstep: 940.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2184
[2024-06-11 04:12:47,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.32 | bwd_microstep: 827.53 | bwd_inner_microstep: 827.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785
[2024-06-11 04:12:49,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.51 | bwd_microstep: 1647.72 | bwd_inner_microstep: 1647.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-11 04:12:51,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.52 | bwd_microstep: 1486.33 | bwd_inner_microstep: 1486.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457
[2024-06-11 04:12:53,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1406.04 | bwd_inner_microstep: 1406.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728
[2024-06-11 04:12:55,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.66 | bwd_microstep: 1536.61 | bwd_inner_microstep: 1536.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3575
[2024-06-11 04:12:57,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.01 | bwd_microstep: 1530.82 | bwd_inner_microstep: 1530.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3562
[2024-06-11 04:13:00,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.41 | bwd_microstep: 1526.17 | bwd_inner_microstep: 1526.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403
[2024-06-11 04:13:06,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.08 | optimizer_step: 6.56
[2024-06-11 04:13:06,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.39 | bwd_microstep: 5841.22 | bwd_inner_microstep: 1632.15 | bwd_allreduce_microstep: 4209.02 | step_microstep: 37.83
[2024-06-11 04:13:06,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15838.13 | bwd: 46693.40 | bwd_inner: 42483.46 | bwd_allreduce: 4209.24 | step: 39.28
 91%|█████████ | 1573/1726 [27:28:06<2:37:56, 61.94s/it]
 91%|█████████ | 1574/1726 [27:29:54<3:12:11, 75.86s/it]
 91%|█████████▏| 1575/1726 [27:31:38<3:31:55, 84.21s/it]
 91%|█████████▏| 1576/1726 [27:33:22<3:45:05, 90.04s/it]
 91%|█████████▏| 1577/1726 [27:34:40<3:34:54, 86.54s/it]
{'loss': 1.1629, 'learning_rate': 7.665111421996329e-07, 'epoch': 0.91}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-11 04:13:08,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.69 | bwd_microstep: 1570.39 | bwd_inner_microstep: 1570.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1342
[2024-06-11 04:13:09,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 198.10 | bwd_microstep: 516.22 | bwd_inner_microstep: 516.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-11 04:13:11,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.17 | bwd_microstep: 1482.31 | bwd_inner_microstep: 1482.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-11 04:13:13,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.31 | bwd_microstep: 1274.66 | bwd_inner_microstep: 1274.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 04:13:15,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.92 | bwd_microstep: 1383.17 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481
[2024-06-11 04:13:17,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.71 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 04:13:18,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1378.23 | bwd_inner_microstep: 1378.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412
[2024-06-11 04:13:20,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.68 | bwd_microstep: 1250.14 | bwd_inner_microstep: 1250.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 04:13:22,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.10 | bwd_microstep: 1386.84 | bwd_inner_microstep: 1386.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3397
[2024-06-11 04:13:24,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.86 | bwd_microstep: 1276.02 | bwd_inner_microstep: 1276.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3671
[2024-06-11 04:13:26,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.31 | bwd_microstep: 1480.50 | bwd_inner_microstep: 1480.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 04:13:28,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.62 | bwd_microstep: 1483.17 | bwd_inner_microstep: 1483.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3659
[2024-06-11 04:13:30,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.05 | bwd_microstep: 1819.92 | bwd_inner_microstep: 1819.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-11 04:13:32,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1392.58 | bwd_inner_microstep: 1392.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1923
[2024-06-11 04:13:34,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.53 | bwd_microstep: 820.36 | bwd_inner_microstep: 820.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-11 04:13:36,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1612.01 | bwd_inner_microstep: 1611.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516
[2024-06-11 04:13:38,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.24 | bwd_microstep: 1485.89 | bwd_inner_microstep: 1485.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-11 04:13:40,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.86 | bwd_microstep: 1631.84 | bwd_inner_microstep: 1631.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448
[2024-06-11 04:13:42,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1313.76 | bwd_inner_microstep: 1313.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3802
[2024-06-11 04:13:44,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.23 | bwd_microstep: 1753.72 | bwd_inner_microstep: 1753.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-11 04:13:46,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1397.13 | bwd_inner_microstep: 1397.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109
[2024-06-11 04:13:47,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.32 | bwd_microstep: 920.06 | bwd_inner_microstep: 920.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618
[2024-06-11 04:13:49,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1414.93 | bwd_inner_microstep: 1414.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710
[2024-06-11 04:13:51,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1429.82 | bwd_inner_microstep: 1429.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3606
[2024-06-11 04:13:53,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 1480.76 | bwd_inner_microstep: 1480.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3806
[2024-06-11 04:13:55,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1497.63 | bwd_inner_microstep: 1497.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3468
[2024-06-11 04:13:57,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.91 | bwd_microstep: 1245.47 | bwd_inner_microstep: 1245.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823
[2024-06-11 04:13:59,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.86 | bwd_microstep: 1357.42 | bwd_inner_microstep: 1357.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934
[2024-06-11 04:14:00,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.25 | bwd_microstep: 726.88 | bwd_inner_microstep: 726.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3456
[2024-06-11 04:14:02,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.34 | bwd_microstep: 1377.96 | bwd_inner_microstep: 1377.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2040
[2024-06-11 04:14:03,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.11 | bwd_microstep: 745.44 | bwd_inner_microstep: 745.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-11 04:14:07,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-11 04:14:07,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.47 | bwd_microstep: 3273.01 | bwd_inner_microstep: 1697.45 | bwd_allreduce_microstep: 1575.51 | step_microstep: 37.54
[2024-06-11 04:14:07,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15991.09 | bwd: 44562.57 | bwd_inner: 42986.16 | bwd_allreduce: 1575.74 | step: 39.05
{'loss': 1.1981, 'learning_rate': 7.562534368078167e-07, 'epoch': 0.91}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-11 04:14:09,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1488.55 | bwd_inner_microstep: 1488.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 04:14:11,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1379.23 | bwd_inner_microstep: 1379.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3881
[2024-06-11 04:14:13,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.28 | bwd_microstep: 1682.94 | bwd_inner_microstep: 1682.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-11 04:14:15,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.40 | bwd_microstep: 1653.30 | bwd_inner_microstep: 1653.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-11 04:14:16,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.33 | bwd_microstep: 678.68 | bwd_inner_microstep: 678.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-11 04:14:18,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1388.94 | bwd_inner_microstep: 1388.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3489
[2024-06-11 04:14:20,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.41 | bwd_microstep: 1232.30 | bwd_inner_microstep: 1232.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3403
[2024-06-11 04:14:22,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.16 | bwd_microstep: 1370.77 | bwd_inner_microstep: 1370.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411
[2024-06-11 04:14:24,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.37 | bwd_microstep: 1182.33 | bwd_inner_microstep: 1182.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 04:14:25,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1250.49 | bwd_inner_microstep: 1250.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-11 04:14:27,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.89 | bwd_microstep: 1524.46 | bwd_inner_microstep: 1524.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505
[2024-06-11 04:14:29,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.84 | bwd_microstep: 1317.25 | bwd_inner_microstep: 1317.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502
[2024-06-11 04:14:31,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.29 | bwd_microstep: 1410.81 | bwd_inner_microstep: 1410.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-11 04:14:33,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.77 | bwd_microstep: 1485.84 | bwd_inner_microstep: 1485.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441
[2024-06-11 04:14:35,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.96 | bwd_microstep: 1444.62 | bwd_inner_microstep: 1444.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3404
[2024-06-11 04:14:37,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.36 | bwd_microstep: 1436.09 | bwd_inner_microstep: 1436.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-11 04:14:39,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.39 | bwd_microstep: 1486.19 | bwd_inner_microstep: 1486.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-11 04:14:41,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.40 | bwd_microstep: 1649.14 | bwd_inner_microstep: 1649.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 04:14:43,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.02 | bwd_microstep: 1288.33 | bwd_inner_microstep: 1288.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2053
[2024-06-11 04:14:44,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.19 | bwd_microstep: 722.25 | bwd_inner_microstep: 722.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3623
[2024-06-11 04:14:46,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.13 | bwd_microstep: 1475.72 | bwd_inner_microstep: 1475.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3815
[2024-06-11 04:14:48,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.10 | bwd_microstep: 1530.86 | bwd_inner_microstep: 1530.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668
[2024-06-11 04:14:50,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.51 | bwd_microstep: 1427.89 | bwd_inner_microstep: 1427.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3722
[2024-06-11 04:14:52,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.87 | bwd_microstep: 1369.39 | bwd_inner_microstep: 1369.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488
[2024-06-11 04:14:54,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.18 | bwd_microstep: 1317.98 | bwd_inner_microstep: 1317.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2018
[2024-06-11 04:14:55,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.21 | bwd_microstep: 777.16 | bwd_inner_microstep: 777.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-11 04:14:57,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1511.68 | bwd_inner_microstep: 1511.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-11 04:14:58,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.72 | bwd_microstep: 811.79 | bwd_inner_microstep: 811.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2043
[2024-06-11 04:15:00,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.04 | bwd_microstep: 907.12 | bwd_inner_microstep: 907.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3602
[2024-06-11 04:15:01,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.23 | bwd_microstep: 1245.36 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-11 04:15:03,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.55 | bwd_microstep: 1494.50 | bwd_inner_microstep: 1494.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4091
[2024-06-11 04:15:08,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-11 04:15:08,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 670.69 | bwd_microstep: 3656.31 | bwd_inner_microstep: 2090.61 | bwd_allreduce_microstep: 1565.65 | step_microstep: 38.04
[2024-06-11 04:15:08,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15984.10 | bwd: 44598.30 | bwd_inner: 43031.73 | bwd_allreduce: 1565.88 | step: 39.58
{'loss': 1.2043, 'learning_rate': 7.46063507791357e-07, 'epoch': 0.92}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-11 04:15:10,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.90 | bwd_microstep: 1290.21 | bwd_inner_microstep: 1290.05 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 04:15:12,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1378.41 | bwd_inner_microstep: 1378.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876
[2024-06-11 04:15:14,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.68 | bwd_microstep: 1581.38 | bwd_inner_microstep: 1581.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514
[2024-06-11 04:15:15,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.10 | bwd_microstep: 1188.22 | bwd_inner_microstep: 1188.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 04:15:17,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.02 | bwd_microstep: 1546.26 | bwd_inner_microstep: 1546.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 04:15:19,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.83 | bwd_microstep: 1247.80 | bwd_inner_microstep: 1247.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3535
[2024-06-11 04:15:21,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.79 | bwd_microstep: 1441.85 | bwd_inner_microstep: 1441.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727
[2024-06-11 04:15:23,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.70 | bwd_microstep: 1635.31 | bwd_inner_microstep: 1635.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 04:15:25,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1250.21 | bwd_inner_microstep: 1250.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942
[2024-06-11 04:15:26,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.85 | bwd_microstep: 698.02 | bwd_inner_microstep: 698.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3526
[2024-06-11 04:15:28,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.54 | bwd_microstep: 1417.99 | bwd_inner_microstep: 1417.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762
[2024-06-11 04:15:30,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.73 | bwd_microstep: 1633.32 | bwd_inner_microstep: 1633.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506
[2024-06-11 04:15:32,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.23 | bwd_microstep: 1418.11 | bwd_inner_microstep: 1418.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-11 04:15:34,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.31 | bwd_microstep: 1340.25 | bwd_inner_microstep: 1340.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917
[2024-06-11 04:15:35,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.66 | bwd_microstep: 781.41 | bwd_inner_microstep: 781.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3671
[2024-06-11 04:15:37,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.99 | bwd_microstep: 1551.01 | bwd_inner_microstep: 1550.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-11 04:15:39,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048
[2024-06-11 04:15:40,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.29 | bwd_microstep: 812.11 | bwd_inner_microstep: 812.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1994
[2024-06-11 04:15:41,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.25 | bwd_microstep: 708.25 | bwd_inner_microstep: 708.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3979
[2024-06-11 04:15:44,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 664.20 | bwd_microstep: 1809.71 | bwd_inner_microstep: 1809.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842
[2024-06-11 04:15:46,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.33 | bwd_microstep: 1564.12 | bwd_inner_microstep: 1564.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734
[2024-06-11 04:15:48,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.05 | bwd_microstep: 1440.64 | bwd_inner_microstep: 1440.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715
[2024-06-11 04:15:50,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.03 | bwd_microstep: 1636.71 | bwd_inner_microstep: 1636.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555
[2024-06-11 04:15:52,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.80 | bwd_microstep: 1329.88 | bwd_inner_microstep: 1329.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915
[2024-06-11 04:15:53,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.64 | bwd_microstep: 688.83 | bwd_inner_microstep: 688.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2280
[2024-06-11 04:15:54,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.01 | bwd_microstep: 784.41 | bwd_inner_microstep: 784.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3467
[2024-06-11 04:15:56,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.40 | bwd_microstep: 1542.72 | bwd_inner_microstep: 1542.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 04:15:58,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.24 | bwd_microstep: 1503.29 | bwd_inner_microstep: 1503.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3555
[2024-06-11 04:16:00,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1377.52 | bwd_inner_microstep: 1377.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-11 04:16:03,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.86 | bwd_microstep: 1660.33 | bwd_inner_microstep: 1660.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597
[2024-06-11 04:16:05,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1608.31 | bwd_inner_microstep: 1608.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270
[2024-06-11 04:16:10,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 04:16:10,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.10 | bwd_microstep: 5028.04 | bwd_inner_microstep: 1131.25 | bwd_allreduce_microstep: 3896.74 | step_microstep: 37.81
[2024-06-11 04:16:10,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15777.60 | bwd: 46283.03 | bwd_inner: 42385.22 | bwd_allreduce: 3897.05 | step: 39.34
{'loss': 1.1591, 'learning_rate': 7.359413910391322e-07, 'epoch': 0.92}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 04:16:12,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.70 | bwd_microstep: 1331.11 | bwd_inner_microstep: 1331.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-11 04:16:14,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.90 | bwd_microstep: 1471.61 | bwd_inner_microstep: 1471.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 04:16:16,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1397.11 | bwd_inner_microstep: 1397.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-11 04:16:17,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.77 | bwd_microstep: 787.00 | bwd_inner_microstep: 786.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863
[2024-06-11 04:16:18,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.37 | bwd_microstep: 675.90 | bwd_inner_microstep: 675.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-11 04:16:20,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.27 | bwd_microstep: 1473.82 | bwd_inner_microstep: 1473.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 04:16:22,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1379.63 | bwd_inner_microstep: 1379.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 04:16:24,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1281.14 | bwd_inner_microstep: 1281.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733
[2024-06-11 04:16:26,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.29 | bwd_microstep: 1629.58 | bwd_inner_microstep: 1629.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3689
[2024-06-11 04:16:28,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.43 | bwd_microstep: 1549.57 | bwd_inner_microstep: 1549.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-11 04:16:30,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.05 | bwd_microstep: 1526.61 | bwd_inner_microstep: 1526.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3586
[2024-06-11 04:16:32,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.65 | bwd_microstep: 1368.98 | bwd_inner_microstep: 1368.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-11 04:16:34,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1483.89 | bwd_inner_microstep: 1483.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3505
[2024-06-11 04:16:36,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1511.53 | bwd_inner_microstep: 1511.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3382
[2024-06-11 04:16:38,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.17 | bwd_microstep: 1438.04 | bwd_inner_microstep: 1438.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3832
[2024-06-11 04:16:41,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.07 | bwd_microstep: 1760.48 | bwd_inner_microstep: 1760.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392
[2024-06-11 04:16:42,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.78 | bwd_microstep: 1243.78 | bwd_inner_microstep: 1243.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1931
[2024-06-11 04:16:44,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.91 | bwd_microstep: 849.68 | bwd_inner_microstep: 849.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-11 04:16:45,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.73 | bwd_microstep: 1307.78 | bwd_inner_microstep: 1307.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3834
[2024-06-11 04:16:47,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.35 | bwd_microstep: 1516.73 | bwd_inner_microstep: 1516.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-11 04:16:50,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.88 | bwd_microstep: 1553.21 | bwd_inner_microstep: 1553.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2138
[2024-06-11 04:16:51,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.59 | bwd_microstep: 832.95 | bwd_inner_microstep: 832.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2210
[2024-06-11 04:16:52,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.61 | bwd_microstep: 957.72 | bwd_inner_microstep: 957.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3545
[2024-06-11 04:16:54,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.00 | bwd_microstep: 1690.35 | bwd_inner_microstep: 1690.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2066
[2024-06-11 04:16:56,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.88 | bwd_microstep: 1009.84 | bwd_inner_microstep: 1009.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424
[2024-06-11 04:16:58,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1251.26 | bwd_inner_microstep: 1251.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632
[2024-06-11 04:17:00,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.33 | bwd_microstep: 1441.21 | bwd_inner_microstep: 1441.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 575
[2024-06-11 04:17:00,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 99.82 | bwd_microstep: 250.59 | bwd_inner_microstep: 250.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811
[2024-06-11 04:17:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.12 | bwd_microstep: 1485.07 | bwd_inner_microstep: 1485.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609
[2024-06-11 04:17:04,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.54 | bwd_microstep: 1609.23 | bwd_inner_microstep: 1609.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591
[2024-06-11 04:17:06,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.63 | bwd_microstep: 1404.96 | bwd_inner_microstep: 1404.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575
[2024-06-11 04:17:11,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-11 04:17:11,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.73 | bwd_microstep: 3918.20 | bwd_inner_microstep: 1809.56 | bwd_allreduce_microstep: 2108.59 | step_microstep: 37.77
[2024-06-11 04:17:11,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15695.13 | bwd: 44388.59 | bwd_inner: 42279.10 | bwd_allreduce: 2108.82 | step: 39.17
{'loss': 1.1987, 'learning_rate': 7.258871222011832e-07, 'epoch': 0.92}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1863
[2024-06-11 04:17:12,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.19 | bwd_microstep: 760.16 | bwd_inner_microstep: 760.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-11 04:17:14,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.14 | bwd_microstep: 1313.90 | bwd_inner_microstep: 1313.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3871
[2024-06-11 04:17:16,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.49 | bwd_microstep: 1471.54 | bwd_inner_microstep: 1471.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 04:17:18,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.33 | bwd_microstep: 1482.63 | bwd_inner_microstep: 1482.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400
[2024-06-11 04:17:19,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.47 | bwd_microstep: 1276.00 | bwd_inner_microstep: 1275.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750
[2024-06-11 04:17:21,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.57 | bwd_microstep: 1539.14 | bwd_inner_microstep: 1539.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-11 04:17:24,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1486.74 | bwd_inner_microstep: 1486.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1943
[2024-06-11 04:17:25,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.50 | bwd_microstep: 759.91 | bwd_inner_microstep: 759.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-11 04:17:27,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1417.24 | bwd_inner_microstep: 1417.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075
[2024-06-11 04:17:28,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.10 | bwd_microstep: 822.79 | bwd_inner_microstep: 822.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3441
[2024-06-11 04:17:29,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1188.13 | bwd_inner_microstep: 1188.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-11 04:17:31,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1509.64 | bwd_inner_microstep: 1509.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 04:17:33,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1399.10 | bwd_inner_microstep: 1399.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-11 04:17:35,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1506.52 | bwd_inner_microstep: 1506.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 04:17:37,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.32 | bwd_microstep: 1376.50 | bwd_inner_microstep: 1376.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-11 04:17:39,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1403.89 | bwd_inner_microstep: 1403.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525
[2024-06-11 04:17:41,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1423.81 | bwd_inner_microstep: 1423.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3984
[2024-06-11 04:17:43,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1504.31 | bwd_inner_microstep: 1504.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2931
[2024-06-11 04:17:45,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.88 | bwd_microstep: 1284.11 | bwd_inner_microstep: 1284.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-11 04:17:47,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.64 | bwd_microstep: 1484.85 | bwd_inner_microstep: 1484.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3547
[2024-06-11 04:17:49,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.58 | bwd_microstep: 1442.92 | bwd_inner_microstep: 1442.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-11 04:17:51,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.14 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-11 04:17:53,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.86 | bwd_microstep: 1513.46 | bwd_inner_microstep: 1513.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580
[2024-06-11 04:17:55,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.77 | bwd_microstep: 1502.55 | bwd_inner_microstep: 1502.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-11 04:17:57,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1405.00 | bwd_inner_microstep: 1404.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607
[2024-06-11 04:17:59,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1312.02 | bwd_inner_microstep: 1312.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808
[2024-06-11 04:18:01,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1384.27 | bwd_inner_microstep: 1384.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3594
[2024-06-11 04:18:02,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.40 | bwd_microstep: 1213.79 | bwd_inner_microstep: 1213.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2210
[2024-06-11 04:18:04,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.29 | bwd_microstep: 863.45 | bwd_inner_microstep: 863.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2497
[2024-06-11 04:18:05,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.84 | bwd_microstep: 1057.91 | bwd_inner_microstep: 1057.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3604
[2024-06-11 04:18:07,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.90 | bwd_microstep: 1706.45 | bwd_inner_microstep: 1706.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2263
[2024-06-11 04:18:11,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.08 | optimizer_step: 6.63
[2024-06-11 04:18:11,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.68 | bwd_microstep: 3496.30 | bwd_inner_microstep: 1099.16 | bwd_allreduce_microstep: 2397.09 | step_microstep: 37.70
[2024-06-11 04:18:11,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15748.91 | bwd: 44558.94 | bwd_inner: 42160.91 | bwd_allreduce: 2397.33 | step: 39.18
 91%|█████████▏| 1578/1726 [27:35:43<3:15:56, 79.44s/it]
 91%|█████████▏| 1579/1726 [27:36:44<3:00:59, 73.87s/it]
 92%|█████████▏| 1580/1726 [27:37:45<2:50:18, 69.99s/it]
 92%|█████████▏| 1581/1726 [27:38:47<2:43:37, 67.71s/it]
 92%|█████████▏| 1582/1726 [27:39:47<2:37:14, 65.52s/it]
{'loss': 1.1493, 'learning_rate': 7.15900736688595e-07, 'epoch': 0.92}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2459
[2024-06-11 04:18:13,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.99 | bwd_microstep: 1030.88 | bwd_inner_microstep: 1030.80 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 04:18:15,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1375.30 | bwd_inner_microstep: 1375.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-11 04:18:17,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.73 | bwd_microstep: 1382.89 | bwd_inner_microstep: 1382.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-11 04:18:19,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.57 | bwd_microstep: 1651.67 | bwd_inner_microstep: 1651.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 528
[2024-06-11 04:18:19,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 97.45 | bwd_microstep: 240.75 | bwd_inner_microstep: 240.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729
[2024-06-11 04:18:21,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.16 | bwd_microstep: 1462.60 | bwd_inner_microstep: 1462.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 04:18:23,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1244.00 | bwd_inner_microstep: 1243.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-11 04:18:24,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.51 | bwd_microstep: 788.20 | bwd_inner_microstep: 788.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 04:18:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.81 | bwd_microstep: 1246.57 | bwd_inner_microstep: 1246.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-11 04:18:28,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.93 | bwd_microstep: 1427.59 | bwd_inner_microstep: 1427.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-11 04:18:30,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1345.01 | bwd_inner_microstep: 1344.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3487
[2024-06-11 04:18:31,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.93 | bwd_microstep: 1394.20 | bwd_inner_microstep: 1394.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-11 04:18:33,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1340.84 | bwd_inner_microstep: 1340.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 04:18:35,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.15 | bwd_microstep: 1475.06 | bwd_inner_microstep: 1475.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465
[2024-06-11 04:18:37,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.85 | bwd_microstep: 1434.56 | bwd_inner_microstep: 1434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3607
[2024-06-11 04:18:39,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.00 | bwd_microstep: 1460.38 | bwd_inner_microstep: 1460.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1987
[2024-06-11 04:18:40,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.04 | bwd_microstep: 768.48 | bwd_inner_microstep: 768.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502
[2024-06-11 04:18:42,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.12 | bwd_microstep: 1317.94 | bwd_inner_microstep: 1317.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 04:18:44,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1285.12 | bwd_inner_microstep: 1285.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3818
[2024-06-11 04:18:46,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.00 | bwd_microstep: 1415.58 | bwd_inner_microstep: 1415.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-11 04:18:48,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.99 | bwd_microstep: 1289.75 | bwd_inner_microstep: 1289.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441
[2024-06-11 04:18:49,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1251.22 | bwd_inner_microstep: 1251.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-11 04:18:51,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1459.48 | bwd_inner_microstep: 1459.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797
[2024-06-11 04:18:54,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.93 | bwd_microstep: 1548.49 | bwd_inner_microstep: 1548.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3539
[2024-06-11 04:18:56,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.55 | bwd_microstep: 1524.70 | bwd_inner_microstep: 1524.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-11 04:18:57,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.80 | bwd_microstep: 973.51 | bwd_inner_microstep: 973.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-11 04:18:59,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.63 | bwd_microstep: 1404.48 | bwd_inner_microstep: 1404.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 925
[2024-06-11 04:19:00,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.21 | bwd_microstep: 375.61 | bwd_inner_microstep: 375.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3429
[2024-06-11 04:19:01,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.45 | bwd_microstep: 1198.53 | bwd_inner_microstep: 1198.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-11 04:19:03,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1393.52 | bwd_inner_microstep: 1393.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3579
[2024-06-11 04:19:05,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.94 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046
[2024-06-11 04:19:13,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.55 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-11 04:19:13,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.46 | bwd_microstep: 7926.13 | bwd_inner_microstep: 1036.85 | bwd_allreduce_microstep: 6889.22 | step_microstep: 38.60
[2024-06-11 04:19:13,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14904.92 | bwd: 46683.01 | bwd_inner: 39792.82 | bwd_allreduce: 6889.49 | step: 40.06
{'loss': 1.1881, 'learning_rate': 7.059822696733598e-07, 'epoch': 0.92}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491
[2024-06-11 04:19:15,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1392.72 | bwd_inner_microstep: 1392.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 04:19:17,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1393.60 | bwd_inner_microstep: 1393.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3880
[2024-06-11 04:19:19,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.87 | bwd_microstep: 1485.43 | bwd_inner_microstep: 1485.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471
[2024-06-11 04:19:21,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.87 | bwd_microstep: 1408.34 | bwd_inner_microstep: 1408.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262
[2024-06-11 04:19:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.35 | bwd_microstep: 966.37 | bwd_inner_microstep: 966.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3758
[2024-06-11 04:19:24,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.66 | bwd_microstep: 1340.12 | bwd_inner_microstep: 1340.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 04:19:26,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.34 | bwd_microstep: 1279.49 | bwd_inner_microstep: 1279.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-11 04:19:28,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1248.51 | bwd_inner_microstep: 1248.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749
[2024-06-11 04:19:30,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1538.96 | bwd_inner_microstep: 1538.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-11 04:19:31,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.62 | bwd_microstep: 1186.65 | bwd_inner_microstep: 1186.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3694
[2024-06-11 04:19:34,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.02 | bwd_microstep: 1690.94 | bwd_inner_microstep: 1690.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3512
[2024-06-11 04:19:36,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.01 | bwd_microstep: 1450.46 | bwd_inner_microstep: 1450.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2217
[2024-06-11 04:19:37,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.11 | bwd_microstep: 960.02 | bwd_inner_microstep: 959.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-11 04:19:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.92 | bwd_microstep: 1613.90 | bwd_inner_microstep: 1613.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 04:19:41,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1374.47 | bwd_inner_microstep: 1374.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-11 04:19:44,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.03 | bwd_microstep: 1659.37 | bwd_inner_microstep: 1659.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-11 04:19:45,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1417.39 | bwd_inner_microstep: 1417.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 04:19:47,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.13 | bwd_microstep: 1276.90 | bwd_inner_microstep: 1276.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-11 04:19:49,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.05 | bwd_microstep: 1182.92 | bwd_inner_microstep: 1182.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 04:19:51,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.31 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3818
[2024-06-11 04:19:53,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.06 | bwd_microstep: 1603.22 | bwd_inner_microstep: 1603.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3624
[2024-06-11 04:19:55,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.28 | bwd_microstep: 1311.67 | bwd_inner_microstep: 1311.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3608
[2024-06-11 04:19:57,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.29 | bwd_microstep: 1674.53 | bwd_inner_microstep: 1674.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-11 04:19:59,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1345.67 | bwd_inner_microstep: 1345.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 04:20:01,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.03 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3553
[2024-06-11 04:20:03,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.57 | bwd_microstep: 1555.78 | bwd_inner_microstep: 1555.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3421
[2024-06-11 04:20:05,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.00 | bwd_microstep: 1402.54 | bwd_inner_microstep: 1402.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1900
[2024-06-11 04:20:06,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.56 | bwd_microstep: 728.30 | bwd_inner_microstep: 728.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571
[2024-06-11 04:20:08,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1304.86 | bwd_inner_microstep: 1304.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-11 04:20:10,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1393.75 | bwd_inner_microstep: 1393.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 04:20:12,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.81 | bwd_microstep: 1285.57 | bwd_inner_microstep: 1285.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192
[2024-06-11 04:20:16,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-11 04:20:16,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 4153.90 | bwd_inner_microstep: 1082.67 | bwd_allreduce_microstep: 3071.18 | step_microstep: 38.23
[2024-06-11 04:20:16,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16187.47 | bwd: 46403.39 | bwd_inner: 43331.32 | bwd_allreduce: 3071.40 | step: 39.74
{'loss': 1.2365, 'learning_rate': 6.961317560882741e-07, 'epoch': 0.92}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 04:20:18,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.15 | bwd_microstep: 1370.96 | bwd_inner_microstep: 1370.88 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.12
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2943
[2024-06-11 04:20:20,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.85 | bwd_microstep: 1193.95 | bwd_inner_microstep: 1193.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 04:20:22,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.84 | bwd_microstep: 1380.03 | bwd_inner_microstep: 1380.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2266
[2024-06-11 04:20:23,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.93 | bwd_microstep: 968.04 | bwd_inner_microstep: 968.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-11 04:20:25,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.27 | bwd_microstep: 1454.46 | bwd_inner_microstep: 1454.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 04:20:27,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1244.12 | bwd_inner_microstep: 1244.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 04:20:29,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.73 | bwd_microstep: 1545.58 | bwd_inner_microstep: 1545.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-11 04:20:30,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.38 | bwd_microstep: 794.31 | bwd_inner_microstep: 794.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 04:20:32,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-11 04:20:34,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.73 | bwd_microstep: 1529.00 | bwd_inner_microstep: 1528.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 04:20:36,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.21 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 04:20:38,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.56 | bwd_microstep: 1380.31 | bwd_inner_microstep: 1380.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473
[2024-06-11 04:20:40,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1404.03 | bwd_inner_microstep: 1404.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-11 04:20:42,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.45 | bwd_microstep: 1448.76 | bwd_inner_microstep: 1448.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 04:20:44,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1481.75 | bwd_inner_microstep: 1481.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3946
[2024-06-11 04:20:46,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.31 | bwd_microstep: 1601.87 | bwd_inner_microstep: 1601.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-11 04:20:48,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.19 | bwd_microstep: 1256.11 | bwd_inner_microstep: 1256.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 04:20:49,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1283.13 | bwd_inner_microstep: 1283.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648
[2024-06-11 04:20:51,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1407.96 | bwd_inner_microstep: 1407.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669
[2024-06-11 04:20:53,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1425.58 | bwd_inner_microstep: 1425.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-11 04:20:55,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1281.73 | bwd_inner_microstep: 1281.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-11 04:20:57,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.73 | bwd_microstep: 1455.69 | bwd_inner_microstep: 1455.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2512
[2024-06-11 04:20:58,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.80 | bwd_microstep: 1058.75 | bwd_inner_microstep: 1058.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3674
[2024-06-11 04:21:00,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.30 | bwd_microstep: 1454.97 | bwd_inner_microstep: 1454.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723
[2024-06-11 04:21:03,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.00 | bwd_microstep: 1638.03 | bwd_inner_microstep: 1638.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-11 04:21:05,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.02 | bwd_microstep: 1544.40 | bwd_inner_microstep: 1544.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588
[2024-06-11 04:21:07,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1410.24 | bwd_inner_microstep: 1410.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2241
[2024-06-11 04:21:08,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.31 | bwd_microstep: 964.90 | bwd_inner_microstep: 964.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007
[2024-06-11 04:21:09,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.21 | bwd_microstep: 907.22 | bwd_inner_microstep: 907.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-11 04:21:11,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1438.07 | bwd_inner_microstep: 1438.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2264
[2024-06-11 04:21:13,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.37 | bwd_microstep: 1066.56 | bwd_inner_microstep: 1066.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-11 04:21:18,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.11 | optimizer_step: 6.63
[2024-06-11 04:21:18,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.43 | bwd_microstep: 4177.44 | bwd_inner_microstep: 1740.79 | bwd_allreduce_microstep: 2436.60 | step_microstep: 37.95
[2024-06-11 04:21:18,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15934.46 | bwd: 45238.55 | bwd_inner: 42800.98 | bwd_allreduce: 2436.86 | step: 39.48
{'loss': 1.1499, 'learning_rate': 6.863492306267927e-07, 'epoch': 0.92}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-11 04:21:20,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.67 | bwd_microstep: 1475.47 | bwd_inner_microstep: 1475.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 04:21:22,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1380.86 | bwd_inner_microstep: 1380.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3835
[2024-06-11 04:21:24,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.97 | bwd_microstep: 1416.94 | bwd_inner_microstep: 1416.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-11 04:21:25,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1342.35 | bwd_inner_microstep: 1342.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2503
[2024-06-11 04:21:27,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.62 | bwd_microstep: 1024.81 | bwd_inner_microstep: 1024.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 04:21:29,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1496.02 | bwd_inner_microstep: 1496.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-11 04:21:31,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.74 | bwd_microstep: 1531.92 | bwd_inner_microstep: 1531.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710
[2024-06-11 04:21:33,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.58 | bwd_microstep: 1630.19 | bwd_inner_microstep: 1630.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-11 04:21:35,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1250.76 | bwd_inner_microstep: 1250.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3618
[2024-06-11 04:21:37,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1445.00 | bwd_inner_microstep: 1444.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4099
[2024-06-11 04:21:39,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.65 | bwd_microstep: 1531.87 | bwd_inner_microstep: 1531.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633
[2024-06-11 04:21:41,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.10 | bwd_microstep: 1706.37 | bwd_inner_microstep: 1706.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-11 04:21:44,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.80 | bwd_microstep: 1587.93 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016
[2024-06-11 04:21:45,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.18 | bwd_microstep: 710.74 | bwd_inner_microstep: 710.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2175
[2024-06-11 04:21:46,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.17 | bwd_microstep: 887.98 | bwd_inner_microstep: 887.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705
[2024-06-11 04:21:48,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1617.81 | bwd_inner_microstep: 1617.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2110
[2024-06-11 04:21:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.33 | bwd_microstep: 824.48 | bwd_inner_microstep: 824.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2052
[2024-06-11 04:21:50,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.50 | bwd_microstep: 723.42 | bwd_inner_microstep: 723.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-11 04:21:52,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.27 | bwd_microstep: 1508.66 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616
[2024-06-11 04:21:54,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.35 | bwd_microstep: 1508.42 | bwd_inner_microstep: 1508.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621
[2024-06-11 04:21:56,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.07 | bwd_microstep: 1405.59 | bwd_inner_microstep: 1405.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3829
[2024-06-11 04:21:58,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.83 | bwd_microstep: 1520.68 | bwd_inner_microstep: 1520.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-11 04:22:00,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.00 | bwd_microstep: 1286.08 | bwd_inner_microstep: 1286.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 04:22:02,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1377.09 | bwd_inner_microstep: 1377.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009
[2024-06-11 04:22:03,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.23 | bwd_microstep: 742.55 | bwd_inner_microstep: 742.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820
[2024-06-11 04:22:05,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.88 | bwd_microstep: 1461.19 | bwd_inner_microstep: 1461.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3578
[2024-06-11 04:22:07,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.51 | bwd_microstep: 1349.06 | bwd_inner_microstep: 1349.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3919
[2024-06-11 04:22:09,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.80 | bwd_microstep: 1596.29 | bwd_inner_microstep: 1596.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 585
[2024-06-11 04:22:10,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.25 | bwd_microstep: 255.97 | bwd_inner_microstep: 255.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594
[2024-06-11 04:22:11,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.54 | bwd_microstep: 1307.55 | bwd_inner_microstep: 1307.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3589
[2024-06-11 04:22:14,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.90 | bwd_microstep: 1648.20 | bwd_inner_microstep: 1648.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-11 04:22:18,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-11 04:22:18,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 4232.43 | bwd_inner_microstep: 1614.25 | bwd_allreduce_microstep: 2618.13 | step_microstep: 37.98
[2024-06-11 04:22:18,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15627.64 | bwd: 44784.70 | bwd_inner: 42165.66 | bwd_allreduce: 2618.35 | step: 39.46
{'loss': 1.1387, 'learning_rate': 6.766347277429175e-07, 'epoch': 0.92}
 92%|█████████▏| 1583/1726 [27:40:48<2:32:39, 64.06s/it]
 92%|█████████▏| 1584/1726 [27:41:50<2:30:04, 63.41s/it]
 92%|█████████▏| 1585/1726 [27:42:53<2:28:40, 63.27s/it]
 92%|█████████▏| 1586/1726 [27:43:54<2:26:23, 62.74s/it]
 92%|█████████▏| 1587/1726 [27:44:55<2:23:57, 62.14s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-11 04:22:20,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1465.98 | bwd_inner_microstep: 1465.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3948
[2024-06-11 04:22:23,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.52 | bwd_microstep: 1592.12 | bwd_inner_microstep: 1592.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777
[2024-06-11 04:22:25,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.65 | bwd_microstep: 1442.09 | bwd_inner_microstep: 1442.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 04:22:26,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.89 | bwd_microstep: 1282.28 | bwd_inner_microstep: 1282.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-11 04:22:28,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.44 | bwd_microstep: 1545.46 | bwd_inner_microstep: 1545.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-11 04:22:30,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.83 | bwd_microstep: 1276.72 | bwd_inner_microstep: 1276.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445
[2024-06-11 04:22:32,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1447.90 | bwd_inner_microstep: 1447.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-11 04:22:33,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.75 | bwd_microstep: 789.47 | bwd_inner_microstep: 789.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-11 04:22:35,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1402.36 | bwd_inner_microstep: 1402.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-11 04:22:37,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1247.05 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 04:22:39,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.26 | bwd_microstep: 1384.79 | bwd_inner_microstep: 1384.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-11 04:22:41,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.99 | bwd_microstep: 1402.24 | bwd_inner_microstep: 1402.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-11 04:22:43,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.20 | bwd_microstep: 1283.76 | bwd_inner_microstep: 1283.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4189
[2024-06-11 04:22:45,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.73 | bwd_microstep: 1553.74 | bwd_inner_microstep: 1553.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 04:22:47,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1283.72 | bwd_inner_microstep: 1283.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3601
[2024-06-11 04:22:49,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.68 | bwd_microstep: 1459.76 | bwd_inner_microstep: 1459.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-11 04:22:51,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1489.13 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-11 04:22:52,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.60 | bwd_microstep: 803.52 | bwd_inner_microstep: 803.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530
[2024-06-11 04:22:54,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.77 | bwd_microstep: 1589.02 | bwd_inner_microstep: 1589.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1996
[2024-06-11 04:22:55,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.61 | bwd_microstep: 892.83 | bwd_inner_microstep: 892.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103
[2024-06-11 04:22:56,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.80 | bwd_microstep: 920.71 | bwd_inner_microstep: 920.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3647
[2024-06-11 04:22:58,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1442.09 | bwd_inner_microstep: 1442.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-11 04:23:01,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.30 | bwd_microstep: 1499.13 | bwd_inner_microstep: 1499.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3721
[2024-06-11 04:23:03,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.55 | bwd_microstep: 1601.63 | bwd_inner_microstep: 1601.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048
[2024-06-11 04:23:04,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.39 | bwd_microstep: 809.14 | bwd_inner_microstep: 809.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920
[2024-06-11 04:23:05,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.43 | bwd_microstep: 716.77 | bwd_inner_microstep: 716.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3637
[2024-06-11 04:23:07,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.67 | bwd_microstep: 1662.42 | bwd_inner_microstep: 1662.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3817
[2024-06-11 04:23:09,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1527.26 | bwd_inner_microstep: 1527.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 04:23:11,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.18 | bwd_microstep: 1254.74 | bwd_inner_microstep: 1254.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-11 04:23:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.17 | bwd_microstep: 1512.58 | bwd_inner_microstep: 1512.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3800
[2024-06-11 04:23:15,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.70 | bwd_microstep: 1450.35 | bwd_inner_microstep: 1450.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3569
[2024-06-11 04:23:19,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.77 | optimizer_gradients: 4.04 | optimizer_step: 6.59
[2024-06-11 04:23:19,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.85 | bwd_microstep: 3564.47 | bwd_inner_microstep: 1772.84 | bwd_allreduce_microstep: 1791.59 | step_microstep: 39.46
[2024-06-11 04:23:19,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15937.38 | bwd: 44595.26 | bwd_inner: 42802.76 | bwd_allreduce: 1791.81 | step: 41.00
{'loss': 1.1754, 'learning_rate': 6.669882816510776e-07, 'epoch': 0.92}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 04:23:21,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.86 | bwd_microstep: 1364.81 | bwd_inner_microstep: 1364.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 04:23:23,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1378.26 | bwd_inner_microstep: 1378.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-11 04:23:25,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.00 | bwd_microstep: 1275.68 | bwd_inner_microstep: 1275.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262
[2024-06-11 04:23:26,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 971.54 | bwd_inner_microstep: 971.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 04:23:28,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.84 | bwd_microstep: 1400.82 | bwd_inner_microstep: 1400.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3745
[2024-06-11 04:23:30,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.96 | bwd_microstep: 1640.03 | bwd_inner_microstep: 1640.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2190
[2024-06-11 04:23:32,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.70 | bwd_microstep: 954.48 | bwd_inner_microstep: 954.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2179
[2024-06-11 04:23:33,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.67 | bwd_microstep: 856.89 | bwd_inner_microstep: 856.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691
[2024-06-11 04:23:35,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.06 | bwd_microstep: 1629.37 | bwd_inner_microstep: 1629.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-11 04:23:36,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.17 | bwd_microstep: 686.28 | bwd_inner_microstep: 686.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 04:23:38,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1288.78 | bwd_inner_microstep: 1288.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3649
[2024-06-11 04:23:40,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.62 | bwd_microstep: 1548.56 | bwd_inner_microstep: 1548.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3086
[2024-06-11 04:23:42,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.59 | bwd_microstep: 1292.71 | bwd_inner_microstep: 1292.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505
[2024-06-11 04:23:44,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.36 | bwd_microstep: 1585.40 | bwd_inner_microstep: 1585.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3910
[2024-06-11 04:23:46,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.42 | bwd_microstep: 1622.09 | bwd_inner_microstep: 1622.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-11 04:23:48,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1391.29 | bwd_inner_microstep: 1391.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 04:23:50,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.74 | bwd_microstep: 1492.40 | bwd_inner_microstep: 1492.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-11 04:23:52,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.59 | bwd_microstep: 1253.79 | bwd_inner_microstep: 1253.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 04:23:54,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.12 | bwd_microstep: 1256.94 | bwd_inner_microstep: 1256.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625
[2024-06-11 04:23:56,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1414.98 | bwd_inner_microstep: 1414.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-11 04:23:58,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.26 | bwd_microstep: 1460.98 | bwd_inner_microstep: 1460.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-11 04:23:59,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.49 | bwd_microstep: 794.78 | bwd_inner_microstep: 794.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-11 04:24:00,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.24 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 04:24:03,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.99 | bwd_microstep: 1555.25 | bwd_inner_microstep: 1555.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273
[2024-06-11 04:24:04,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.88 | bwd_microstep: 874.90 | bwd_inner_microstep: 874.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-11 04:24:06,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.17 | bwd_microstep: 1473.97 | bwd_inner_microstep: 1473.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-11 04:24:08,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.18 | bwd_microstep: 1655.45 | bwd_inner_microstep: 1655.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-11 04:24:09,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.34 | bwd_microstep: 877.72 | bwd_inner_microstep: 877.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-11 04:24:11,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1493.09 | bwd_inner_microstep: 1493.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-11 04:24:13,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.70 | bwd_microstep: 1473.03 | bwd_inner_microstep: 1473.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-11 04:24:16,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.36 | bwd_microstep: 1645.53 | bwd_inner_microstep: 1645.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3736
[2024-06-11 04:24:24,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.61
[2024-06-11 04:24:24,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 667.28 | bwd_microstep: 7929.94 | bwd_inner_microstep: 2079.60 | bwd_allreduce_microstep: 5850.27 | step_microstep: 38.37
[2024-06-11 04:24:24,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15943.50 | bwd: 48791.37 | bwd_inner: 42940.17 | bwd_allreduce: 5850.51 | step: 39.91
{'loss': 1.1902, 'learning_rate': 6.574099263260092e-07, 'epoch': 0.92}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2440
[2024-06-11 04:24:26,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.55 | bwd_microstep: 1000.32 | bwd_inner_microstep: 1000.22 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-11 04:24:27,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.54 | bwd_microstep: 800.93 | bwd_inner_microstep: 800.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875
[2024-06-11 04:24:29,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 1578.48 | bwd_inner_microstep: 1578.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 04:24:31,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1375.46 | bwd_inner_microstep: 1375.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3922
[2024-06-11 04:24:33,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.82 | bwd_microstep: 1485.77 | bwd_inner_microstep: 1485.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 702
[2024-06-11 04:24:33,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.82 | bwd_microstep: 285.06 | bwd_inner_microstep: 285.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 04:24:35,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.78 | bwd_microstep: 1280.42 | bwd_inner_microstep: 1280.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3747
[2024-06-11 04:24:37,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.65 | bwd_microstep: 1635.02 | bwd_inner_microstep: 1635.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616
[2024-06-11 04:24:39,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.06 | bwd_microstep: 1342.28 | bwd_inner_microstep: 1342.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-11 04:24:41,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1388.11 | bwd_inner_microstep: 1388.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-11 04:24:42,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.07 | bwd_microstep: 696.27 | bwd_inner_microstep: 696.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1981
[2024-06-11 04:24:43,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.07 | bwd_microstep: 828.46 | bwd_inner_microstep: 828.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2088
[2024-06-11 04:24:44,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.93 | bwd_microstep: 851.63 | bwd_inner_microstep: 851.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-11 04:24:46,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.51 | bwd_microstep: 1295.07 | bwd_inner_microstep: 1295.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2146
[2024-06-11 04:24:48,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.95 | bwd_microstep: 945.18 | bwd_inner_microstep: 945.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-11 04:24:49,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.03 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-11 04:24:50,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.43 | bwd_microstep: 788.33 | bwd_inner_microstep: 788.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511
[2024-06-11 04:24:52,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.97 | bwd_microstep: 1316.55 | bwd_inner_microstep: 1316.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673
[2024-06-11 04:24:54,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.75 | bwd_microstep: 1515.33 | bwd_inner_microstep: 1515.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-11 04:24:56,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.81 | bwd_microstep: 1476.95 | bwd_inner_microstep: 1476.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 04:24:58,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1376.63 | bwd_inner_microstep: 1376.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615
[2024-06-11 04:25:00,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.71 | bwd_microstep: 1431.00 | bwd_inner_microstep: 1430.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 04:25:02,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.77 | bwd_microstep: 1279.28 | bwd_inner_microstep: 1279.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692
[2024-06-11 04:25:04,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1329.91 | bwd_inner_microstep: 1329.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3634
[2024-06-11 04:25:06,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 1542.90 | bwd_inner_microstep: 1542.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-11 04:25:08,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.68 | bwd_microstep: 1288.90 | bwd_inner_microstep: 1288.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3820
[2024-06-11 04:25:10,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.48 | bwd_microstep: 1292.73 | bwd_inner_microstep: 1292.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-11 04:25:11,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.38 | bwd_microstep: 1313.24 | bwd_inner_microstep: 1313.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273
[2024-06-11 04:25:13,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.80 | bwd_microstep: 879.85 | bwd_inner_microstep: 879.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-11 04:25:14,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.16 | bwd_microstep: 1374.62 | bwd_inner_microstep: 1374.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3300
[2024-06-11 04:25:16,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.22 | bwd_microstep: 1230.72 | bwd_inner_microstep: 1230.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 04:25:26,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.09 | optimizer_step: 6.63
[2024-06-11 04:25:26,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 9599.72 | bwd_inner_microstep: 1436.12 | bwd_allreduce_microstep: 8163.55 | step_microstep: 38.14
[2024-06-11 04:25:26,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14586.61 | bwd: 47109.64 | bwd_inner: 38945.10 | bwd_allreduce: 8163.83 | step: 39.64
{'loss': 1.1754, 'learning_rate': 6.478996955026251e-07, 'epoch': 0.92}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 04:25:28,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.86 | bwd_microstep: 1372.60 | bwd_inner_microstep: 1372.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-11 04:25:30,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.92 | bwd_microstep: 1269.72 | bwd_inner_microstep: 1269.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 04:25:32,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1480.53 | bwd_inner_microstep: 1480.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2961
[2024-06-11 04:25:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.91 | bwd_microstep: 1096.24 | bwd_inner_microstep: 1096.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 04:25:35,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.41 | bwd_microstep: 1244.54 | bwd_inner_microstep: 1244.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496
[2024-06-11 04:25:37,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.33 | bwd_microstep: 1188.04 | bwd_inner_microstep: 1188.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 04:25:39,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1278.94 | bwd_inner_microstep: 1278.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488
[2024-06-11 04:25:41,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1408.93 | bwd_inner_microstep: 1408.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718
[2024-06-11 04:25:43,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.08 | bwd_microstep: 1628.04 | bwd_inner_microstep: 1628.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-11 04:25:44,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.33 | bwd_microstep: 790.99 | bwd_inner_microstep: 790.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-11 04:25:46,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.39 | bwd_microstep: 1516.61 | bwd_inner_microstep: 1516.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4144
[2024-06-11 04:25:49,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.66 | bwd_microstep: 1839.25 | bwd_inner_microstep: 1839.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 04:25:51,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1390.54 | bwd_inner_microstep: 1390.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3519
[2024-06-11 04:25:52,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.44 | bwd_microstep: 1333.09 | bwd_inner_microstep: 1333.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-11 04:25:54,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.13 | bwd_microstep: 1521.49 | bwd_inner_microstep: 1521.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2297
[2024-06-11 04:25:56,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.58 | bwd_microstep: 1069.16 | bwd_inner_microstep: 1069.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3469
[2024-06-11 04:25:58,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.01 | bwd_microstep: 1503.50 | bwd_inner_microstep: 1503.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-11 04:25:59,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.50 | bwd_microstep: 799.05 | bwd_inner_microstep: 799.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2300
[2024-06-11 04:26:00,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.35 | bwd_microstep: 879.08 | bwd_inner_microstep: 879.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-11 04:26:03,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.81 | bwd_microstep: 1605.49 | bwd_inner_microstep: 1605.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064
[2024-06-11 04:26:04,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.18 | bwd_microstep: 914.31 | bwd_inner_microstep: 914.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 04:26:06,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.12 | bwd_microstep: 1352.25 | bwd_inner_microstep: 1352.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 04:26:07,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1255.44 | bwd_inner_microstep: 1255.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-11 04:26:09,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1394.95 | bwd_inner_microstep: 1394.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-11 04:26:11,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.39 | bwd_microstep: 1321.89 | bwd_inner_microstep: 1321.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-11 04:26:13,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1512.28 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811
[2024-06-11 04:26:15,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 1450.98 | bwd_inner_microstep: 1450.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-11 04:26:17,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.67 | bwd_microstep: 1627.60 | bwd_inner_microstep: 1627.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582
[2024-06-11 04:26:19,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1404.96 | bwd_inner_microstep: 1404.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589
[2024-06-11 04:26:22,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1606.52 | bwd_inner_microstep: 1606.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-11 04:26:24,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.59 | bwd_microstep: 1484.05 | bwd_inner_microstep: 1484.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3576
[2024-06-11 04:26:29,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.13 | optimizer_step: 6.58
[2024-06-11 04:26:29,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.40 | bwd_microstep: 4946.77 | bwd_inner_microstep: 1349.18 | bwd_allreduce_microstep: 3597.53 | step_microstep: 38.34
[2024-06-11 04:26:29,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15987.43 | bwd: 46487.85 | bwd_inner: 42889.41 | bwd_allreduce: 3597.77 | step: 39.79
{'loss': 1.1658, 'learning_rate': 6.384576226759165e-07, 'epoch': 0.92}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-11 04:26:31,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.07 | bwd_microstep: 1345.62 | bwd_inner_microstep: 1345.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4034
[2024-06-11 04:26:33,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.82 | bwd_microstep: 1714.35 | bwd_inner_microstep: 1714.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3852
[2024-06-11 04:26:35,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.73 | bwd_microstep: 1457.38 | bwd_inner_microstep: 1457.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870
[2024-06-11 04:26:36,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.26 | bwd_microstep: 680.74 | bwd_inner_microstep: 680.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3476
[2024-06-11 04:26:38,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.87 | bwd_microstep: 1214.64 | bwd_inner_microstep: 1214.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 04:26:40,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 04:26:42,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1388.78 | bwd_inner_microstep: 1388.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 04:26:44,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1281.82 | bwd_inner_microstep: 1281.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681
[2024-06-11 04:26:45,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.59 | bwd_microstep: 1323.25 | bwd_inner_microstep: 1323.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487
[2024-06-11 04:26:47,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.77 | bwd_microstep: 1187.78 | bwd_inner_microstep: 1187.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-11 04:26:49,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.00 | bwd_microstep: 1401.33 | bwd_inner_microstep: 1401.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498
[2024-06-11 04:26:51,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1412.51 | bwd_inner_microstep: 1412.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640
[2024-06-11 04:26:53,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.55 | bwd_microstep: 1603.17 | bwd_inner_microstep: 1603.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 04:26:55,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1357.31 | bwd_inner_microstep: 1357.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 665
[2024-06-11 04:26:55,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.94 | bwd_microstep: 278.00 | bwd_inner_microstep: 277.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2685
[2024-06-11 04:26:57,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.43 | bwd_microstep: 1024.28 | bwd_inner_microstep: 1024.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2041
[2024-06-11 04:26:58,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.57 | bwd_microstep: 838.55 | bwd_inner_microstep: 838.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3515
[2024-06-11 04:27:00,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.85 | bwd_microstep: 1319.17 | bwd_inner_microstep: 1319.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 04:27:02,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.03 | bwd_microstep: 1385.34 | bwd_inner_microstep: 1385.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-11 04:27:04,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.63 | bwd_microstep: 1658.85 | bwd_inner_microstep: 1658.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 04:27:06,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1553.09 | bwd_inner_microstep: 1553.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3459
[2024-06-11 04:27:08,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1241.64 | bwd_inner_microstep: 1241.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-11 04:27:10,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.34 | bwd_microstep: 1291.87 | bwd_inner_microstep: 1291.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-11 04:27:12,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1398.89 | bwd_inner_microstep: 1398.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673
[2024-06-11 04:27:13,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.61 | bwd_microstep: 1326.91 | bwd_inner_microstep: 1326.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 04:27:15,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.56 | bwd_microstep: 1377.61 | bwd_inner_microstep: 1377.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-11 04:27:17,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.44 | bwd_microstep: 1282.15 | bwd_inner_microstep: 1282.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3593
[2024-06-11 04:27:19,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.40 | bwd_microstep: 1456.17 | bwd_inner_microstep: 1456.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3635
[2024-06-11 04:27:21,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.77 | bwd_microstep: 1710.46 | bwd_inner_microstep: 1710.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-11 04:27:23,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1395.16 | bwd_inner_microstep: 1395.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2033
[2024-06-11 04:27:25,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.80 | bwd_microstep: 931.68 | bwd_inner_microstep: 931.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3580
[2024-06-11 04:27:30,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.07 | optimizer_step: 6.61
[2024-06-11 04:27:30,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.15 | bwd_microstep: 4188.25 | bwd_inner_microstep: 1915.80 | bwd_allreduce_microstep: 2272.40 | step_microstep: 37.68
[2024-06-11 04:27:30,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15717.72 | bwd: 44412.01 | bwd_inner: 42138.70 | bwd_allreduce: 2272.63 | step: 39.11
{'loss': 1.1994, 'learning_rate': 6.290837411008044e-07, 'epoch': 0.92}
/1726 [27:44:55<2:23:57, 62.14s/it]
 92%|█████████▏| 1588/1726 [27:45:56<2:22:03, 61.76s/it]
 92%|█████████▏| 1589/1726 [27:47:01<2:23:17, 62.76s/it]
 92%|█████████▏| 1590/1726 [27:48:03<2:21:44, 62.53s/it]
 92%|█████████▏| 1591/1726 [27:49:06<2:20:52, 62.61s/it]
 92%|█████████▏| 1592/1726 [27:50:06<2:18:23, 61.97s/it]
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2027
[2024-06-11 04:27:31,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.88 | bwd_microstep: 804.14 | bwd_inner_microstep: 804.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412
[2024-06-11 04:27:32,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.12 | bwd_microstep: 1279.07 | bwd_inner_microstep: 1279.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401
[2024-06-11 04:27:34,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.38 | bwd_microstep: 1371.71 | bwd_inner_microstep: 1371.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 04:27:36,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1245.71 | bwd_inner_microstep: 1245.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 04:27:38,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-11 04:27:40,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.50 | bwd_microstep: 1149.33 | bwd_inner_microstep: 1149.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 04:27:41,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1245.76 | bwd_inner_microstep: 1245.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3748
[2024-06-11 04:27:43,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1373.91 | bwd_inner_microstep: 1373.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-11 04:27:45,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1151.85 | bwd_inner_microstep: 1151.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-11 04:27:47,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1355.38 | bwd_inner_microstep: 1355.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549
[2024-06-11 04:27:48,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 1297.99 | bwd_inner_microstep: 1297.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3439
[2024-06-11 04:27:50,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.15 | bwd_microstep: 1185.75 | bwd_inner_microstep: 1185.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-11 04:27:51,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.33 | bwd_microstep: 792.63 | bwd_inner_microstep: 792.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2131
[2024-06-11 04:27:53,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.40 | bwd_microstep: 930.91 | bwd_inner_microstep: 930.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3434
[2024-06-11 04:27:55,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.71 | bwd_microstep: 1476.64 | bwd_inner_microstep: 1476.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-11 04:27:57,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1492.79 | bwd_inner_microstep: 1492.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663
[2024-06-11 04:27:59,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.85 | bwd_microstep: 1619.94 | bwd_inner_microstep: 1619.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3379
[2024-06-11 04:28:01,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.66 | bwd_microstep: 1400.80 | bwd_inner_microstep: 1400.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1953
[2024-06-11 04:28:02,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.21 | bwd_microstep: 779.01 | bwd_inner_microstep: 778.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2277
[2024-06-11 04:28:03,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.63 | bwd_microstep: 936.70 | bwd_inner_microstep: 936.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-11 04:28:05,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.57 | bwd_microstep: 1491.59 | bwd_inner_microstep: 1491.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-11 04:28:07,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1296.47 | bwd_inner_microstep: 1296.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618
[2024-06-11 04:28:09,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.26 | bwd_microstep: 1610.62 | bwd_inner_microstep: 1610.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3558
[2024-06-11 04:28:11,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.04 | bwd_microstep: 1249.71 | bwd_inner_microstep: 1249.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 04:28:13,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.98 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-11 04:28:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1401.48 | bwd_inner_microstep: 1401.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-11 04:28:17,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.98 | bwd_microstep: 1631.02 | bwd_inner_microstep: 1630.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-11 04:28:19,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.54 | bwd_microstep: 1508.11 | bwd_inner_microstep: 1508.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469
[2024-06-11 04:28:21,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.90 | bwd_microstep: 1242.26 | bwd_inner_microstep: 1242.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3468
[2024-06-11 04:28:23,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1359.80 | bwd_inner_microstep: 1359.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 04:28:24,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1255.02 | bwd_inner_microstep: 1254.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579
[2024-06-11 04:28:30,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.22 | optimizer_step: 6.59
[2024-06-11 04:28:30,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.94 | bwd_microstep: 4588.49 | bwd_inner_microstep: 1456.15 | bwd_allreduce_microstep: 3132.28 | step_microstep: 37.96
[2024-06-11 04:28:30,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15412.82 | bwd: 44294.04 | bwd_inner: 41160.85 | bwd_allreduce: 3132.51 | step: 39.45
{'loss': 1.2023, 'learning_rate': 6.197780837920598e-07, 'epoch': 0.92}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426
[2024-06-11 04:28:31,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.63 | bwd_microstep: 1332.58 | bwd_inner_microstep: 1332.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4020
[2024-06-11 04:28:34,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.11 | bwd_microstep: 1610.98 | bwd_inner_microstep: 1610.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 04:28:35,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.20 | bwd_microstep: 1280.86 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 04:28:37,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1245.97 | bwd_inner_microstep: 1245.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851
[2024-06-11 04:28:39,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.07 | bwd_microstep: 1563.15 | bwd_inner_microstep: 1563.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446
[2024-06-11 04:28:41,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.44 | bwd_microstep: 1449.45 | bwd_inner_microstep: 1449.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 04:28:43,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1253.95 | bwd_inner_microstep: 1253.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3782
[2024-06-11 04:28:45,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.40 | bwd_microstep: 1506.58 | bwd_inner_microstep: 1506.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-11 04:28:47,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1285.83 | bwd_inner_microstep: 1285.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-11 04:28:49,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1386.44 | bwd_inner_microstep: 1386.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418
[2024-06-11 04:28:50,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.12 | bwd_microstep: 1183.38 | bwd_inner_microstep: 1183.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 04:28:52,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1384.04 | bwd_inner_microstep: 1384.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-11 04:28:54,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1523.75 | bwd_inner_microstep: 1523.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656
[2024-06-11 04:28:57,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.88 | bwd_microstep: 1541.21 | bwd_inner_microstep: 1541.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 04:28:58,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.88 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960
[2024-06-11 04:29:00,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.53 | bwd_microstep: 894.60 | bwd_inner_microstep: 894.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3460
[2024-06-11 04:29:02,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.87 | bwd_microstep: 1537.11 | bwd_inner_microstep: 1537.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3520
[2024-06-11 04:29:04,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.78 | bwd_microstep: 1684.74 | bwd_inner_microstep: 1684.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3661
[2024-06-11 04:29:06,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.65 | bwd_microstep: 1481.40 | bwd_inner_microstep: 1481.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 04:29:08,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1285.41 | bwd_inner_microstep: 1285.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-11 04:29:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1390.56 | bwd_inner_microstep: 1390.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3509
[2024-06-11 04:29:12,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.77 | bwd_microstep: 1223.41 | bwd_inner_microstep: 1223.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827
[2024-06-11 04:29:14,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1521.79 | bwd_inner_microstep: 1521.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3732
[2024-06-11 04:29:16,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.24 | bwd_microstep: 1607.72 | bwd_inner_microstep: 1607.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3728
[2024-06-11 04:29:18,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.13 | bwd_microstep: 1481.71 | bwd_inner_microstep: 1481.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814
[2024-06-11 04:29:20,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.40 | bwd_microstep: 1759.52 | bwd_inner_microstep: 1759.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3738
[2024-06-11 04:29:22,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.14 | bwd_microstep: 1460.92 | bwd_inner_microstep: 1460.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449
[2024-06-11 04:29:24,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1455.38 | bwd_inner_microstep: 1455.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3582
[2024-06-11 04:29:26,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 1430.61 | bwd_inner_microstep: 1430.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 04:29:29,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.29 | bwd_microstep: 1650.33 | bwd_inner_microstep: 1650.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-11 04:29:30,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.11 | bwd_microstep: 1406.06 | bwd_inner_microstep: 1406.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-11 04:29:33,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.02 | optimizer_step: 6.61
[2024-06-11 04:29:33,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.02 | bwd_microstep: 1599.10 | bwd_inner_microstep: 1556.35 | bwd_allreduce_microstep: 42.71 | step_microstep: 37.45
[2024-06-11 04:29:33,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17044.83 | bwd: 45703.28 | bwd_inner: 45659.68 | bwd_allreduce: 42.93 | step: 38.89
{'loss': 1.1508, 'learning_rate': 6.105406835241545e-07, 'epoch': 0.92}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-11 04:29:35,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1576.46 | bwd_inner_microstep: 1576.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-11 04:29:37,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.16 | bwd_microstep: 1478.94 | bwd_inner_microstep: 1478.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3862
[2024-06-11 04:29:39,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.80 | bwd_microstep: 1665.28 | bwd_inner_microstep: 1665.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-11 04:29:41,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.96 | bwd_microstep: 1546.75 | bwd_inner_microstep: 1546.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-11 04:29:43,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1483.12 | bwd_inner_microstep: 1483.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735
[2024-06-11 04:29:46,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.26 | bwd_microstep: 1635.63 | bwd_inner_microstep: 1635.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964
[2024-06-11 04:29:47,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 702.35 | bwd_inner_microstep: 702.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-11 04:29:48,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.84 | bwd_microstep: 1286.60 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-11 04:29:50,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.13 | bwd_microstep: 1278.53 | bwd_inner_microstep: 1278.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 04:29:52,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.90 | bwd_microstep: 1384.22 | bwd_inner_microstep: 1384.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476
[2024-06-11 04:29:54,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.41 | bwd_microstep: 1341.62 | bwd_inner_microstep: 1341.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2826
[2024-06-11 04:29:55,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.47 | bwd_microstep: 1061.75 | bwd_inner_microstep: 1061.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503
[2024-06-11 04:29:58,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.14 | bwd_microstep: 1575.88 | bwd_inner_microstep: 1575.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2145
[2024-06-11 04:29:59,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.14 | bwd_microstep: 847.42 | bwd_inner_microstep: 847.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1970
[2024-06-11 04:30:00,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.07 | bwd_microstep: 842.85 | bwd_inner_microstep: 842.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666
[2024-06-11 04:30:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.78 | bwd_microstep: 1329.71 | bwd_inner_microstep: 1329.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 04:30:04,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.90 | bwd_microstep: 1388.14 | bwd_inner_microstep: 1388.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-11 04:30:06,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1377.36 | bwd_inner_microstep: 1377.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629
[2024-06-11 04:30:08,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.50 | bwd_microstep: 1612.00 | bwd_inner_microstep: 1611.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-11 04:30:10,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.70 | bwd_microstep: 1295.35 | bwd_inner_microstep: 1295.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-11 04:30:12,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1459.31 | bwd_inner_microstep: 1459.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3820
[2024-06-11 04:30:14,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.68 | bwd_microstep: 1625.65 | bwd_inner_microstep: 1625.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3719
[2024-06-11 04:30:16,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1368.05 | bwd_inner_microstep: 1368.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-11 04:30:18,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1299.19 | bwd_inner_microstep: 1299.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 04:30:20,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.98 | bwd_microstep: 1560.17 | bwd_inner_microstep: 1560.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3600
[2024-06-11 04:30:21,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.81 | bwd_microstep: 1213.64 | bwd_inner_microstep: 1213.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2382
[2024-06-11 04:30:23,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.21 | bwd_microstep: 966.62 | bwd_inner_microstep: 966.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3564
[2024-06-11 04:30:25,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1529.63 | bwd_inner_microstep: 1529.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3448
[2024-06-11 04:30:27,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.76 | bwd_microstep: 1481.84 | bwd_inner_microstep: 1481.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3590
[2024-06-11 04:30:29,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1339.26 | bwd_inner_microstep: 1339.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3837
[2024-06-11 04:30:31,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.20 | bwd_microstep: 1620.95 | bwd_inner_microstep: 1620.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805
[2024-06-11 04:30:34,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.04 | optimizer_gradients: 4.04 | optimizer_step: 6.58
[2024-06-11 04:30:34,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.59 | bwd_microstep: 2120.66 | bwd_inner_microstep: 1881.32 | bwd_allreduce_microstep: 239.29 | step_microstep: 37.74
[2024-06-11 04:30:34,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16423.28 | bwd: 44294.94 | bwd_inner: 44054.76 | bwd_allreduce: 239.52 | step: 39.21
{'loss': 1.1776, 'learning_rate': 6.013715728311664e-07, 'epoch': 0.92}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462
[2024-06-11 04:30:36,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.80 | bwd_microstep: 1472.50 | bwd_inner_microstep: 1472.34 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 04:30:38,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1257.96 | bwd_inner_microstep: 1257.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470
[2024-06-11 04:30:39,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.42 | bwd_microstep: 1311.34 | bwd_inner_microstep: 1311.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 04:30:41,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.82 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748
[2024-06-11 04:30:43,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.56 | bwd_microstep: 1638.65 | bwd_inner_microstep: 1638.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-11 04:30:44,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.94 | bwd_microstep: 788.25 | bwd_inner_microstep: 788.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 04:30:46,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1385.93 | bwd_inner_microstep: 1385.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3699
[2024-06-11 04:30:49,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.72 | bwd_microstep: 1627.54 | bwd_inner_microstep: 1627.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022
[2024-06-11 04:30:50,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.33 | bwd_microstep: 807.12 | bwd_inner_microstep: 807.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2449
[2024-06-11 04:30:51,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.20 | bwd_microstep: 978.29 | bwd_inner_microstep: 978.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1901
[2024-06-11 04:30:52,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.62 | bwd_microstep: 717.86 | bwd_inner_microstep: 717.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560
[2024-06-11 04:31:02,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.55 | bwd_microstep: 1292.61 | bwd_inner_microstep: 1292.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 04:31:04,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.92 | bwd_microstep: 1368.06 | bwd_inner_microstep: 1368.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-11 04:31:06,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.02 | bwd_microstep: 1338.58 | bwd_inner_microstep: 1338.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652
[2024-06-11 04:31:08,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.05 | bwd_microstep: 1611.52 | bwd_inner_microstep: 1611.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3631
[2024-06-11 04:31:10,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.47 | bwd_microstep: 1650.13 | bwd_inner_microstep: 1650.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3630
[2024-06-11 04:31:12,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.88 | bwd_microstep: 1260.13 | bwd_inner_microstep: 1260.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 04:31:14,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 1551.63 | bwd_inner_microstep: 1551.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511
[2024-06-11 04:31:16,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.52 | bwd_microstep: 1413.00 | bwd_inner_microstep: 1412.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-11 04:31:18,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.75 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3497
[2024-06-11 04:31:20,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.06 | bwd_microstep: 1217.83 | bwd_inner_microstep: 1217.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-11 04:31:21,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.56 | bwd_microstep: 1291.17 | bwd_inner_microstep: 1291.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 04:31:23,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1259.70 | bwd_inner_microstep: 1259.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 04:31:25,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1552.93 | bwd_inner_microstep: 1552.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 04:31:27,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.48 | bwd_microstep: 1256.80 | bwd_inner_microstep: 1256.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-11 04:31:28,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.71 | bwd_microstep: 878.39 | bwd_inner_microstep: 878.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2063
[2024-06-11 04:31:29,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.64 | bwd_microstep: 877.28 | bwd_inner_microstep: 877.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3529
[2024-06-11 04:31:31,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.29 | bwd_microstep: 1197.73 | bwd_inner_microstep: 1197.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584
[2024-06-11 04:31:33,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1429.18 | bwd_inner_microstep: 1429.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3426
[2024-06-11 04:31:35,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1480.95 | bwd_inner_microstep: 1480.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565
[2024-06-11 04:31:37,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.79 | bwd_microstep: 1593.86 | bwd_inner_microstep: 1593.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2160
[2024-06-11 04:31:40,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.09 | optimizer_gradients: 4.03 | optimizer_step: 6.57
[2024-06-11 04:31:40,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.42 | bwd_microstep: 2412.02 | bwd_inner_microstep: 963.45 | bwd_allreduce_microstep: 1448.53 | step_microstep: 38.05
[2024-06-11 04:31:40,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15418.78 | bwd: 42557.82 | bwd_inner: 41108.27 | bwd_allreduce: 1448.82 | step: 39.58
{'loss': 1.2242, 'learning_rate': 5.922707840066544e-07, 'epoch': 0.92}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 04:31:42,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1271.22 | bwd_inner_microstep: 1271.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 04:31:44,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.32 | bwd_microstep: 1478.05 | bwd_inner_microstep: 1478.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844
[2024-06-11 04:31:46,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.21 | bwd_microstep: 1558.81 | bwd_inner_microstep: 1558.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 04:31:48,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1276.99 | bwd_inner_microstep: 1276.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753
[2024-06-11 04:31:50,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1441.57 | bwd_inner_microstep: 1441.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893
[2024-06-11 04:31:51,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.49 | bwd_microstep: 713.82 | bwd_inner_microstep: 713.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3766
[2024-06-11 04:31:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.22 | bwd_microstep: 1590.48 | bwd_inner_microstep: 1590.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 04:31:55,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1385.00 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 04:31:57,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1384.48 | bwd_inner_microstep: 1384.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2021
[2024-06-11 04:31:58,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.81 | bwd_microstep: 718.42 | bwd_inner_microstep: 718.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 04:32:00,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.70 | bwd_microstep: 1248.19 | bwd_inner_microstep: 1248.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435
[2024-06-11 04:32:02,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.31 | bwd_microstep: 1442.54 | bwd_inner_microstep: 1442.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 04:32:04,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.79 | bwd_microstep: 1480.90 | bwd_inner_microstep: 1480.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464
[2024-06-11 04:32:06,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1466.01 | bwd_inner_microstep: 1465.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3888
[2024-06-11 04:32:08,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.32 | bwd_microstep: 1528.62 | bwd_inner_microstep: 1528.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 04:32:10,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.76 | bwd_microstep: 1283.99 | bwd_inner_microstep: 1283.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 04:32:11,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515
[2024-06-11 04:32:13,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1392.75 | bwd_inner_microstep: 1392.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3644
[2024-06-11 04:32:15,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.75 | bwd_microstep: 1312.14 | bwd_inner_microstep: 1312.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1954
[2024-06-11 04:32:16,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.58 | bwd_microstep: 828.86 | bwd_inner_microstep: 828.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526
[2024-06-11 04:32:18,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.97 | bwd_microstep: 1193.15 | bwd_inner_microstep: 1193.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680
[2024-06-11 04:32:20,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.60 | bwd_microstep: 1425.14 | bwd_inner_microstep: 1425.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3537
[2024-06-11 04:32:22,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.25 | bwd_microstep: 1415.09 | bwd_inner_microstep: 1415.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-11 04:32:24,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1413.18 | bwd_inner_microstep: 1413.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3553
[2024-06-11 04:32:26,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.25 | bwd_microstep: 1342.98 | bwd_inner_microstep: 1342.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-11 04:32:27,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.20 | bwd_microstep: 792.70 | bwd_inner_microstep: 792.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-11 04:32:29,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1473.53 | bwd_inner_microstep: 1473.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-11 04:32:31,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1395.47 | bwd_inner_microstep: 1395.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2799
[2024-06-11 04:32:32,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.00 | bwd_microstep: 1165.68 | bwd_inner_microstep: 1165.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-11 04:32:34,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1487.72 | bwd_inner_microstep: 1487.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-11 04:32:37,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.72 | bwd_microstep: 1597.21 | bwd_inner_microstep: 1597.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570
[2024-06-11 04:32:43,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-11 04:32:43,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.23 | bwd_microstep: 6139.68 | bwd_inner_microstep: 2089.54 | bwd_allreduce_microstep: 4050.08 | step_microstep: 38.07
[2024-06-11 04:32:43,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15910.25 | bwd: 46928.47 | bwd_inner: 42877.48 | bwd_allreduce: 4050.31 | step: 39.59
 92%|█████████▏| 1592/1726 [27:50:06<2:18:23, 61.97s/it]
 92%|█████████▏| 1593/1726 [27:51:06<2:16:04, 61.39s/it]
 92%|█████████▏| 1594/1726 [27:52:09<2:16:10, 61.90s/it]
 92%|█████████▏| 1595/1726 [27:53:10<2:14:35, 61.64s/it]
 92%|█████████▏| 1596/1726 [27:54:17<2:16:38, 63.06s/it]
 93%|█████████▎| 1597/1726 [27:55:20<2:15:39, 63.10s/it]
{'loss': 1.183, 'learning_rate': 5.832383491035499e-07, 'epoch': 0.93}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952
[2024-06-11 04:32:45,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.54 | bwd_microstep: 884.92 | bwd_inner_microstep: 884.74 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3947
[2024-06-11 04:32:47,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.31 | bwd_microstep: 1691.77 | bwd_inner_microstep: 1691.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4007
[2024-06-11 04:32:49,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.32 | bwd_microstep: 1606.45 | bwd_inner_microstep: 1606.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-11 04:32:51,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.39 | bwd_microstep: 1345.24 | bwd_inner_microstep: 1345.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3785
[2024-06-11 04:32:53,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.83 | bwd_microstep: 1443.77 | bwd_inner_microstep: 1443.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 04:32:55,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.90 | bwd_microstep: 1276.91 | bwd_inner_microstep: 1276.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3555
[2024-06-11 04:32:57,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.41 | bwd_microstep: 1456.68 | bwd_inner_microstep: 1456.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 04:32:58,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1247.71 | bwd_inner_microstep: 1247.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 04:33:00,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-11 04:33:01,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.53 | bwd_microstep: 682.41 | bwd_inner_microstep: 682.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2009
[2024-06-11 04:33:02,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.26 | bwd_microstep: 835.36 | bwd_inner_microstep: 835.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-11 04:33:05,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.94 | bwd_microstep: 1625.72 | bwd_inner_microstep: 1625.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3669
[2024-06-11 04:33:07,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.77 | bwd_microstep: 1821.06 | bwd_inner_microstep: 1821.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1905
[2024-06-11 04:33:08,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.82 | bwd_microstep: 778.73 | bwd_inner_microstep: 778.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487
[2024-06-11 04:33:10,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.46 | bwd_microstep: 1411.64 | bwd_inner_microstep: 1411.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438
[2024-06-11 04:33:12,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1348.95 | bwd_inner_microstep: 1348.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 04:33:14,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.52 | bwd_microstep: 1284.24 | bwd_inner_microstep: 1284.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2497
[2024-06-11 04:33:15,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.43 | bwd_microstep: 957.90 | bwd_inner_microstep: 957.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725
[2024-06-11 04:33:17,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.55 | bwd_microstep: 1338.13 | bwd_inner_microstep: 1338.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-11 04:33:19,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.83 | bwd_microstep: 1491.69 | bwd_inner_microstep: 1491.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519
[2024-06-11 04:33:21,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 1488.47 | bwd_inner_microstep: 1488.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154
[2024-06-11 04:33:22,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.83 | bwd_microstep: 849.12 | bwd_inner_microstep: 849.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535
[2024-06-11 04:33:24,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1293.58 | bwd_inner_microstep: 1293.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-11 04:33:26,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.44 | bwd_microstep: 1394.35 | bwd_inner_microstep: 1394.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273
[2024-06-11 04:33:27,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.46 | bwd_microstep: 880.20 | bwd_inner_microstep: 880.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1584
[2024-06-11 04:33:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 224.77 | bwd_microstep: 591.81 | bwd_inner_microstep: 591.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3558
[2024-06-11 04:33:30,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.81 | bwd_microstep: 1524.98 | bwd_inner_microstep: 1524.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-11 04:33:32,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.03 | bwd_microstep: 1281.90 | bwd_inner_microstep: 1281.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3571
[2024-06-11 04:33:34,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.93 | bwd_microstep: 1330.66 | bwd_inner_microstep: 1330.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-11 04:33:36,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.18 | bwd_microstep: 1546.20 | bwd_inner_microstep: 1546.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408
[2024-06-11 04:33:38,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1346.41 | bwd_inner_microstep: 1346.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3745
[2024-06-11 04:33:45,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.84 | optimizer_gradients: 4.23 | optimizer_step: 6.62
[2024-06-11 04:33:45,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.43 | bwd_microstep: 6749.51 | bwd_inner_microstep: 1774.29 | bwd_allreduce_microstep: 4975.15 | step_microstep: 39.97
[2024-06-11 04:33:45,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15368.69 | bwd: 46191.09 | bwd_inner: 41214.88 | bwd_allreduce: 4975.46 | step: 41.53
{'loss': 1.1716, 'learning_rate': 5.742742999340411e-07, 'epoch': 0.93}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-11 04:33:47,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.56 | bwd_microstep: 1268.94 | bwd_inner_microstep: 1268.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-11 04:33:49,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1245.08 | bwd_inner_microstep: 1245.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476
[2024-06-11 04:33:51,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1480.08 | bwd_inner_microstep: 1480.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 04:33:52,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.53 | bwd_microstep: 1244.12 | bwd_inner_microstep: 1244.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-11 04:33:55,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.69 | bwd_microstep: 1538.19 | bwd_inner_microstep: 1538.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768
[2024-06-11 04:33:57,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1436.79 | bwd_inner_microstep: 1436.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422
[2024-06-11 04:33:58,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.74 | bwd_microstep: 1182.49 | bwd_inner_microstep: 1182.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-11 04:34:00,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.41 | bwd_microstep: 1287.75 | bwd_inner_microstep: 1287.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042
[2024-06-11 04:34:01,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.04 | bwd_microstep: 810.98 | bwd_inner_microstep: 810.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-11 04:34:03,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1393.43 | bwd_inner_microstep: 1393.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-11 04:34:05,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1291.82 | bwd_inner_microstep: 1291.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651
[2024-06-11 04:34:07,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.19 | bwd_microstep: 1321.62 | bwd_inner_microstep: 1321.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-11 04:34:09,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 1482.86 | bwd_inner_microstep: 1482.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3696
[2024-06-11 04:34:11,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.14 | bwd_microstep: 1554.85 | bwd_inner_microstep: 1554.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 04:34:13,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1379.75 | bwd_inner_microstep: 1379.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457
[2024-06-11 04:34:15,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.68 | bwd_microstep: 1341.98 | bwd_inner_microstep: 1341.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 04:34:16,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.83 | bwd_microstep: 1339.26 | bwd_inner_microstep: 1339.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 04:34:18,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.94 | bwd_microstep: 1483.48 | bwd_inner_microstep: 1483.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2087
[2024-06-11 04:34:20,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.01 | bwd_microstep: 912.11 | bwd_inner_microstep: 912.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3667
[2024-06-11 04:34:22,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.55 | bwd_microstep: 1550.05 | bwd_inner_microstep: 1550.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 04:34:24,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598
[2024-06-11 04:34:26,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1501.87 | bwd_inner_microstep: 1501.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2135
[2024-06-11 04:34:27,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.57 | bwd_microstep: 930.03 | bwd_inner_microstep: 930.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 04:34:29,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.70 | bwd_microstep: 1656.09 | bwd_inner_microstep: 1656.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 04:34:31,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.73 | bwd_microstep: 1358.98 | bwd_inner_microstep: 1358.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192
[2024-06-11 04:34:33,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.35 | bwd_microstep: 957.09 | bwd_inner_microstep: 957.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-11 04:34:35,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.93 | bwd_microstep: 1403.08 | bwd_inner_microstep: 1403.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3598
[2024-06-11 04:34:37,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.07 | bwd_microstep: 1464.25 | bwd_inner_microstep: 1464.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-11 04:34:39,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.26 | bwd_microstep: 1650.41 | bwd_inner_microstep: 1650.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3808
[2024-06-11 04:34:41,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.64 | bwd_microstep: 1604.48 | bwd_inner_microstep: 1604.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-11 04:34:43,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.20 | bwd_microstep: 1181.85 | bwd_inner_microstep: 1181.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564
[2024-06-11 04:34:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-11 04:34:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.54 | bwd_microstep: 3919.76 | bwd_inner_microstep: 1693.88 | bwd_allreduce_microstep: 2225.83 | step_microstep: 37.80
[2024-06-11 04:34:47,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16147.64 | bwd: 45561.27 | bwd_inner: 43334.54 | bwd_allreduce: 2226.06 | step: 39.29
{'loss': 1.1702, 'learning_rate': 5.653786680694629e-07, 'epoch': 0.93}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-11 04:34:49,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1468.10 | bwd_inner_microstep: 1468.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 04:34:51,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.84 | bwd_microstep: 1282.58 | bwd_inner_microstep: 1282.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-11 04:34:53,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1478.72 | bwd_inner_microstep: 1478.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3768
[2024-06-11 04:34:55,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.58 | bwd_microstep: 1342.97 | bwd_inner_microstep: 1342.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-11 04:34:57,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.07 | bwd_microstep: 1310.29 | bwd_inner_microstep: 1310.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 04:34:59,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1384.77 | bwd_inner_microstep: 1384.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 04:35:00,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.93 | bwd_microstep: 1285.50 | bwd_inner_microstep: 1285.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-11 04:35:02,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.08 | bwd_microstep: 793.18 | bwd_inner_microstep: 793.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-11 04:35:04,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1476.49 | bwd_inner_microstep: 1476.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 04:35:05,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.93 | bwd_microstep: 1339.85 | bwd_inner_microstep: 1339.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629
[2024-06-11 04:35:07,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.71 | bwd_microstep: 1323.03 | bwd_inner_microstep: 1323.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 04:35:09,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.66 | bwd_microstep: 1351.75 | bwd_inner_microstep: 1351.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3510
[2024-06-11 04:35:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.19 | bwd_microstep: 1432.59 | bwd_inner_microstep: 1432.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2161
[2024-06-11 04:35:12,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.08 | bwd_microstep: 884.52 | bwd_inner_microstep: 884.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3700
[2024-06-11 04:35:15,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.47 | bwd_microstep: 1720.95 | bwd_inner_microstep: 1720.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837
[2024-06-11 04:35:17,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1460.29 | bwd_inner_microstep: 1460.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635
[2024-06-11 04:35:19,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.91 | bwd_microstep: 1612.84 | bwd_inner_microstep: 1612.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631
[2024-06-11 04:35:21,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1512.20 | bwd_inner_microstep: 1512.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 04:35:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1459.91 | bwd_inner_microstep: 1459.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2071
[2024-06-11 04:35:24,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.76 | bwd_microstep: 725.82 | bwd_inner_microstep: 725.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817
[2024-06-11 04:35:26,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.34 | bwd_microstep: 1355.42 | bwd_inner_microstep: 1355.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-11 04:35:28,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1561.64 | bwd_inner_microstep: 1561.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3016
[2024-06-11 04:35:30,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.93 | bwd_microstep: 1245.09 | bwd_inner_microstep: 1245.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-11 04:35:32,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.65 | bwd_microstep: 1476.54 | bwd_inner_microstep: 1476.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 04:35:34,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.44 | bwd_microstep: 1282.81 | bwd_inner_microstep: 1282.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-11 04:35:35,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.54 | bwd_microstep: 797.35 | bwd_inner_microstep: 797.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484
[2024-06-11 04:35:36,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.57 | bwd_microstep: 1186.50 | bwd_inner_microstep: 1186.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-11 04:35:38,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.70 | bwd_microstep: 1284.44 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-11 04:35:40,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1408.52 | bwd_inner_microstep: 1408.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650
[2024-06-11 04:35:42,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.94 | bwd_microstep: 1515.47 | bwd_inner_microstep: 1515.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-11 04:35:44,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.39 | bwd_microstep: 970.72 | bwd_inner_microstep: 970.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 04:35:51,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.09 | optimizer_step: 6.62
[2024-06-11 04:35:51,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.97 | bwd_microstep: 6424.27 | bwd_inner_microstep: 1692.13 | bwd_allreduce_microstep: 4732.08 | step_microstep: 37.85
[2024-06-11 04:35:51,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15821.16 | bwd: 47155.16 | bwd_inner: 42422.14 | bwd_allreduce: 4732.32 | step: 39.39
{'loss': 1.154, 'learning_rate': 5.565514848401887e-07, 'epoch': 0.93}
 93%|█████████▎| 1597/1726 [27:55:20<2:15:39, 63.10s/it]
 93%|█████████▎| 1598/1726 [27:56:22<2:13:50, 62.74s/it]
 93%|█████████▎| 1599/1726 [27:57:24<2:12:21, 62.53s/it]
 93%|█████████▎| 1600/1726 [27:58:27<2:11:48, 62.77s/it]
[INFO|trainer.py:2936] 2024-06-11 04:35:53,554 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600
[INFO|configuration_utils.py:473] 2024-06-11 04:35:53,558 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/config.json
[INFO|configuration_utils.py:594] 2024-06-11 04:35:53,561 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-11 04:36:01,298 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-11 04:36:01,332 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-11 04:36:01,359 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-11 04:36:01,372 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/added_tokens.json
[2024-06-11 04:36:01,712] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1600 is about to be saved!
[2024-06-11 04:36:01,728] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/mp_rank_00_model_states.pt
[2024-06-11 04:36:01,728] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/mp_rank_00_model_states.pt...
[2024-06-11 04:36:09,775] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/mp_rank_00_model_states.pt.
[2024-06-11 04:36:09,813] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-06-11 04:36:21,305] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-06-11 04:36:21,321] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-06-11 04:36:21,321] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1600 is ready now!
[INFO|trainer.py:3028] 2024-06-11 04:36:21,503 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-1000] due to args.save_total_limit
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 04:36:24,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1361.75 | bwd_inner_microstep: 1361.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 04:36:25,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1335.17 | bwd_inner_microstep: 1335.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838
[2024-06-11 04:36:28,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.81 | bwd_microstep: 1544.90 | bwd_inner_microstep: 1544.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-11 04:36:30,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.83 | bwd_microstep: 1640.59 | bwd_inner_microstep: 1640.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 04:36:32,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.36 | bwd_microstep: 1547.30 | bwd_inner_microstep: 1547.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1862
[2024-06-11 04:36:33,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.46 | bwd_microstep: 705.44 | bwd_inner_microstep: 705.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-11 04:36:35,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.86 | bwd_microstep: 1435.22 | bwd_inner_microstep: 1435.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-11 04:36:36,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.51 | bwd_microstep: 796.39 | bwd_inner_microstep: 796.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 04:36:55,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1382.34 | bwd_inner_microstep: 1382.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-11 04:36:57,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1277.99 | bwd_inner_microstep: 1277.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487
[2024-06-11 04:36:59,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1405.99 | bwd_inner_microstep: 1405.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1981
[2024-06-11 04:37:00,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.35 | bwd_microstep: 764.30 | bwd_inner_microstep: 764.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-11 04:37:02,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.79 | bwd_microstep: 1375.46 | bwd_inner_microstep: 1375.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425
[2024-06-11 04:37:04,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.28 | bwd_microstep: 1404.25 | bwd_inner_microstep: 1404.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 04:37:06,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1387.06 | bwd_inner_microstep: 1386.83 | bwd_allreduce_microstep: 0.18 | step_microstep: 0.26
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 04:37:08,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.11 | bwd_microstep: 1380.14 | bwd_inner_microstep: 1380.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641
[2024-06-11 04:37:10,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1510.20 | bwd_inner_microstep: 1510.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 04:37:12,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.18 | bwd_microstep: 1401.32 | bwd_inner_microstep: 1401.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920
[2024-06-11 04:37:13,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.98 | bwd_microstep: 719.17 | bwd_inner_microstep: 719.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649
[2024-06-11 04:37:15,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.28 | bwd_microstep: 1413.74 | bwd_inner_microstep: 1413.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063
[2024-06-11 04:37:16,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.37 | bwd_microstep: 812.20 | bwd_inner_microstep: 812.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531
[2024-06-11 04:37:18,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.67 | bwd_microstep: 1293.85 | bwd_inner_microstep: 1293.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-11 04:37:20,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.68 | bwd_microstep: 1554.52 | bwd_inner_microstep: 1554.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819
[2024-06-11 04:37:22,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.79 | bwd_microstep: 1517.67 | bwd_inner_microstep: 1517.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-11 04:37:24,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1557.54 | bwd_inner_microstep: 1557.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-11 04:37:25,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.46 | bwd_microstep: 802.58 | bwd_inner_microstep: 802.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-11 04:37:27,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.97 | bwd_microstep: 1347.22 | bwd_inner_microstep: 1347.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574
[2024-06-11 04:37:29,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1499.29 | bwd_inner_microstep: 1499.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-11 04:37:31,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1452.46 | bwd_inner_microstep: 1452.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934
[2024-06-11 04:37:32,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.29 | bwd_microstep: 726.89 | bwd_inner_microstep: 726.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3436
[2024-06-11 04:37:34,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.71 | bwd_microstep: 1510.03 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 04:37:43,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.62
[2024-06-11 04:37:43,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 7510.08 | bwd_inner_microstep: 1543.53 | bwd_allreduce_microstep: 5966.47 | step_microstep: 39.04
[2024-06-11 04:37:43,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15511.15 | bwd: 47373.12 | bwd_inner: 41405.47 | bwd_allreduce: 5966.90 | step: 40.78
{'loss': 1.1878, 'learning_rate': 5.477927813355056e-07, 'epoch': 0.93}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526
[2024-06-11 04:37:45,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.70 | bwd_microstep: 1477.61 | bwd_inner_microstep: 1477.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-11 04:37:47,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.95 | bwd_microstep: 1529.30 | bwd_inner_microstep: 1529.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2313
[2024-06-11 04:37:48,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.84 | bwd_microstep: 882.55 | bwd_inner_microstep: 882.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-11 04:37:50,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1506.19 | bwd_inner_microstep: 1506.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 04:37:52,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.74 | bwd_microstep: 1380.90 | bwd_inner_microstep: 1380.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3848
[2024-06-11 04:37:54,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.00 | bwd_microstep: 1762.19 | bwd_inner_microstep: 1762.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405
[2024-06-11 04:37:56,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.46 | bwd_microstep: 1179.95 | bwd_inner_microstep: 1179.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-11 04:37:58,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1250.58 | bwd_inner_microstep: 1250.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3986
[2024-06-11 04:38:00,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.12 | bwd_microstep: 1706.97 | bwd_inner_microstep: 1706.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404
[2024-06-11 04:38:02,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.56 | bwd_microstep: 1310.09 | bwd_inner_microstep: 1310.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-11 04:38:04,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1345.78 | bwd_inner_microstep: 1345.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3671
[2024-06-11 04:38:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.71 | bwd_microstep: 1593.14 | bwd_inner_microstep: 1593.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460
[2024-06-11 04:38:08,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.91 | bwd_microstep: 1287.20 | bwd_inner_microstep: 1287.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-11 04:38:10,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.45 | bwd_microstep: 1340.91 | bwd_inner_microstep: 1340.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3667
[2024-06-11 04:38:12,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.04 | bwd_microstep: 1562.23 | bwd_inner_microstep: 1562.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3769
[2024-06-11 04:38:14,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.31 | bwd_microstep: 1601.98 | bwd_inner_microstep: 1601.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-11 04:38:16,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.38 | bwd_microstep: 1285.69 | bwd_inner_microstep: 1285.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-11 04:38:17,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.93 | bwd_microstep: 1290.05 | bwd_inner_microstep: 1290.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841
[2024-06-11 04:38:19,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.98 | bwd_microstep: 1465.28 | bwd_inner_microstep: 1465.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-11 04:38:22,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.86 | bwd_microstep: 1639.44 | bwd_inner_microstep: 1639.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3642
[2024-06-11 04:38:24,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.71 | bwd_microstep: 1345.52 | bwd_inner_microstep: 1345.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828
[2024-06-11 04:38:26,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.65 | bwd_microstep: 1481.01 | bwd_inner_microstep: 1480.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-11 04:38:28,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.44 | bwd_microstep: 1350.18 | bwd_inner_microstep: 1350.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2514
[2024-06-11 04:38:29,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.73 | bwd_microstep: 960.86 | bwd_inner_microstep: 960.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-11 04:38:31,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1415.09 | bwd_inner_microstep: 1415.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2049
[2024-06-11 04:38:32,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.66 | bwd_microstep: 910.59 | bwd_inner_microstep: 910.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-11 04:38:34,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.20 | bwd_microstep: 1181.12 | bwd_inner_microstep: 1181.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 04:38:36,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1377.96 | bwd_inner_microstep: 1377.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3772
[2024-06-11 04:38:38,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.68 | bwd_microstep: 1737.93 | bwd_inner_microstep: 1737.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-11 04:38:40,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.83 | bwd_microstep: 1528.58 | bwd_inner_microstep: 1528.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 04:38:42,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.56 | bwd_microstep: 1341.80 | bwd_inner_microstep: 1341.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586
[2024-06-11 04:38:45,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.06 | optimizer_gradients: 4.02 | optimizer_step: 6.62
[2024-06-11 04:38:45,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.61 | bwd_microstep: 1988.41 | bwd_inner_microstep: 1696.13 | bwd_allreduce_microstep: 292.22 | step_microstep: 39.27
[2024-06-11 04:38:45,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16699.96 | bwd: 45017.10 | bwd_inner: 44723.98 | bwd_allreduce: 292.45 | step: 40.78
{'loss': 1.1753, 'learning_rate': 5.391025884035239e-07, 'epoch': 0.93}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2468
[2024-06-11 04:38:46,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.18 | bwd_microstep: 948.02 | bwd_inner_microstep: 947.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012
[2024-06-11 04:38:47,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.51 | bwd_microstep: 802.38 | bwd_inner_microstep: 802.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 04:38:49,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1379.19 | bwd_inner_microstep: 1379.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-11 04:38:50,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.30 | bwd_microstep: 790.27 | bwd_inner_microstep: 790.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 04:38:52,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1385.18 | bwd_inner_microstep: 1385.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3780
[2024-06-11 04:38:54,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.69 | bwd_microstep: 1348.22 | bwd_inner_microstep: 1348.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 04:38:56,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1243.88 | bwd_inner_microstep: 1243.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-11 04:38:57,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.28 | bwd_microstep: 797.58 | bwd_inner_microstep: 797.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908
[2024-06-11 04:38:58,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.35 | bwd_microstep: 686.24 | bwd_inner_microstep: 686.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1966
[2024-06-11 04:38:59,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.18 | bwd_microstep: 853.77 | bwd_inner_microstep: 853.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2872
[2024-06-11 04:39:00,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.91 | bwd_microstep: 1175.45 | bwd_inner_microstep: 1175.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3478
[2024-06-11 04:39:03,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.76 | bwd_microstep: 1573.98 | bwd_inner_microstep: 1573.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3502
[2024-06-11 04:39:05,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.40 | bwd_microstep: 1679.93 | bwd_inner_microstep: 1679.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-11 04:39:07,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.86 | bwd_microstep: 1289.34 | bwd_inner_microstep: 1289.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2448
[2024-06-11 04:39:08,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.80 | bwd_microstep: 949.08 | bwd_inner_microstep: 949.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610
[2024-06-11 04:39:10,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1341.25 | bwd_inner_microstep: 1341.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515
[2024-06-11 04:39:12,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.40 | bwd_microstep: 1487.05 | bwd_inner_microstep: 1487.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-11 04:39:14,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.19 | bwd_microstep: 1459.38 | bwd_inner_microstep: 1459.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3827
[2024-06-11 04:39:16,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.07 | bwd_microstep: 1263.14 | bwd_inner_microstep: 1263.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3838
[2024-06-11 04:39:18,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1393.86 | bwd_inner_microstep: 1393.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444
[2024-06-11 04:39:19,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.65 | bwd_microstep: 1353.15 | bwd_inner_microstep: 1353.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557
[2024-06-11 04:39:21,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1429.99 | bwd_inner_microstep: 1429.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3679
[2024-06-11 04:39:23,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.68 | bwd_microstep: 1391.66 | bwd_inner_microstep: 1391.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285
[2024-06-11 04:39:25,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.32 | bwd_microstep: 880.22 | bwd_inner_microstep: 880.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-11 04:39:26,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.15 | bwd_microstep: 1303.21 | bwd_inner_microstep: 1303.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3553
[2024-06-11 04:39:28,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.70 | bwd_microstep: 1417.23 | bwd_inner_microstep: 1417.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3378
[2024-06-11 04:39:30,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.32 | bwd_microstep: 1339.97 | bwd_inner_microstep: 1339.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025
[2024-06-11 04:39:31,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.23 | bwd_microstep: 898.44 | bwd_inner_microstep: 898.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2249
[2024-06-11 04:39:33,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.32 | bwd_microstep: 1062.38 | bwd_inner_microstep: 1062.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2259
[2024-06-11 04:39:34,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.90 | bwd_microstep: 808.74 | bwd_inner_microstep: 808.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 04:39:36,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.74 | bwd_microstep: 1282.84 | bwd_inner_microstep: 1282.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625
[2024-06-11 04:39:45,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.19 | optimizer_step: 6.61
[2024-06-11 04:39:45,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 8638.37 | bwd_inner_microstep: 1470.57 | bwd_allreduce_microstep: 7167.73 | step_microstep: 38.35
[2024-06-11 04:39:45,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14415.18 | bwd: 45653.43 | bwd_inner: 38484.73 | bwd_allreduce: 7167.99 | step: 39.86
{'loss': 1.1944, 'learning_rate': 5.304809366510566e-07, 'epoch': 0.93}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-11 04:39:47,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.24 | bwd_microstep: 1574.72 | bwd_inner_microstep: 1574.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477
[2024-06-11 04:39:49,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1377.48 | bwd_inner_microstep: 1377.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3405
[2024-06-11 04:39:51,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1293.19 | bwd_inner_microstep: 1293.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3837
[2024-06-11 04:39:53,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.55 | bwd_microstep: 1484.96 | bwd_inner_microstep: 1484.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3837
[2024-06-11 04:39:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1484.91 | bwd_inner_microstep: 1484.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 04:39:57,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1384.05 | bwd_inner_microstep: 1384.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 04:39:59,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1245.59 | bwd_inner_microstep: 1245.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3731
[2024-06-11 04:40:01,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.12 | bwd_microstep: 1630.26 | bwd_inner_microstep: 1630.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-11 04:40:03,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.24 | bwd_microstep: 1559.97 | bwd_inner_microstep: 1559.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3458
[2024-06-11 04:40:05,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.74 | bwd_microstep: 1342.04 | bwd_inner_microstep: 1342.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-11 04:40:06,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.19 | bwd_microstep: 1180.30 | bwd_inner_microstep: 1180.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2408
[2024-06-11 04:40:08,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.19 | bwd_microstep: 937.69 | bwd_inner_microstep: 937.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4126
[2024-06-11 04:40:10,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.08 | bwd_microstep: 1442.54 | bwd_inner_microstep: 1442.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3453
[2024-06-11 04:40:12,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.08 | bwd_microstep: 1377.95 | bwd_inner_microstep: 1377.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1964
[2024-06-11 04:40:13,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.81 | bwd_microstep: 889.53 | bwd_inner_microstep: 889.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-11 04:40:15,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1408.46 | bwd_inner_microstep: 1408.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3544
[2024-06-11 04:40:17,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1452.65 | bwd_inner_microstep: 1452.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724
[2024-06-11 04:40:19,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.51 | bwd_microstep: 1626.10 | bwd_inner_microstep: 1626.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3634
[2024-06-11 04:40:21,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.93 | bwd_microstep: 1311.62 | bwd_inner_microstep: 1311.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-11 04:40:23,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.23 | bwd_microstep: 1602.73 | bwd_inner_microstep: 1602.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3623
[2024-06-11 04:40:25,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.62 | bwd_microstep: 1535.24 | bwd_inner_microstep: 1535.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3821
[2024-06-11 04:40:28,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.50 | bwd_microstep: 1852.99 | bwd_inner_microstep: 1852.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824
[2024-06-11 04:40:30,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1453.63 | bwd_inner_microstep: 1453.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122
[2024-06-11 04:40:31,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.72 | bwd_microstep: 927.13 | bwd_inner_microstep: 927.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3724
[2024-06-11 04:40:33,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.24 | bwd_microstep: 1594.58 | bwd_inner_microstep: 1594.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 04:40:35,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-11 04:40:37,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1554.09 | bwd_inner_microstep: 1554.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-11 04:40:39,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.15 | bwd_microstep: 1440.50 | bwd_inner_microstep: 1440.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595
[2024-06-11 04:40:41,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.37 | bwd_microstep: 1505.45 | bwd_inner_microstep: 1505.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797
[2024-06-11 04:40:43,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.49 | bwd_microstep: 1556.82 | bwd_inner_microstep: 1556.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052
[2024-06-11 04:40:45,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.42 | bwd_microstep: 911.33 | bwd_inner_microstep: 911.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-11 04:40:47,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.03 | optimizer_step: 6.63
[2024-06-11 04:40:47,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1531.34 | bwd_inner_microstep: 1523.66 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.45
[2024-06-11 04:40:47,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16694.77 | bwd: 44755.69 | bwd_inner: 44747.16 | bwd_allreduce: 7.86 | step: 38.93
{'loss': 1.1298, 'learning_rate': 5.219278564435204e-07, 'epoch': 0.93}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549
[2024-06-11 04:40:49,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1488.09 | bwd_inner_microstep: 1488.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 04:40:51,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.85 | bwd_microstep: 1278.59 | bwd_inner_microstep: 1278.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3905
[2024-06-11 04:40:53,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.73 | bwd_microstep: 1484.45 | bwd_inner_microstep: 1484.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3958
[2024-06-11 04:40:55,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.58 | bwd_microstep: 1696.74 | bwd_inner_microstep: 1696.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4159
[2024-06-11 04:40:57,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.65 | bwd_microstep: 1545.39 | bwd_inner_microstep: 1545.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2333
[2024-06-11 04:40:58,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.17 | bwd_microstep: 982.18 | bwd_inner_microstep: 982.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 04:41:00,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.36 | bwd_microstep: 1380.81 | bwd_inner_microstep: 1380.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3558
[2024-06-11 04:41:02,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.86 | bwd_microstep: 1249.04 | bwd_inner_microstep: 1249.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-11 04:41:03,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.62 | bwd_microstep: 792.25 | bwd_inner_microstep: 792.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803
[2024-06-11 04:41:05,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.28 | bwd_microstep: 1553.74 | bwd_inner_microstep: 1553.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-11 04:41:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1254.12 | bwd_inner_microstep: 1254.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638
[2024-06-11 04:41:09,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1314.89 | bwd_inner_microstep: 1314.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502
[2024-06-11 04:41:11,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.57 | bwd_microstep: 1328.91 | bwd_inner_microstep: 1328.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 04:41:13,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1378.33 | bwd_inner_microstep: 1378.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3640
[2024-06-11 04:41:15,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.29 | bwd_microstep: 1657.01 | bwd_inner_microstep: 1656.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2216
[2024-06-11 04:41:16,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.77 | bwd_microstep: 925.68 | bwd_inner_microstep: 925.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3670
[2024-06-11 04:41:18,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.60 | bwd_microstep: 1548.44 | bwd_inner_microstep: 1548.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 04:41:20,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1355.02 | bwd_inner_microstep: 1354.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-11 04:41:22,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1390.13 | bwd_inner_microstep: 1390.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-11 04:41:24,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1395.50 | bwd_inner_microstep: 1395.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1995
[2024-06-11 04:41:25,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.14 | bwd_microstep: 708.86 | bwd_inner_microstep: 708.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-11 04:41:27,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1396.89 | bwd_inner_microstep: 1396.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3485
[2024-06-11 04:41:29,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.30 | bwd_microstep: 1444.60 | bwd_inner_microstep: 1444.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 04:41:31,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1246.79 | bwd_inner_microstep: 1246.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3318
[2024-06-11 04:41:32,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.37 | bwd_microstep: 1230.18 | bwd_inner_microstep: 1230.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559
[2024-06-11 04:41:35,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.45 | bwd_microstep: 1588.49 | bwd_inner_microstep: 1588.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562
[2024-06-11 04:41:37,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1421.48 | bwd_inner_microstep: 1421.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1905
[2024-06-11 04:41:38,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.33 | bwd_microstep: 745.15 | bwd_inner_microstep: 745.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-11 04:41:40,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.59 | bwd_microstep: 1449.68 | bwd_inner_microstep: 1449.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564
[2024-06-11 04:41:42,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.60 | bwd_microstep: 1600.33 | bwd_inner_microstep: 1600.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770
[2024-06-11 04:41:44,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.44 | bwd_microstep: 1541.67 | bwd_inner_microstep: 1541.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3755
[2024-06-11 04:41:50,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-11 04:41:50,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 5143.09 | bwd_inner_microstep: 1550.97 | bwd_allreduce_microstep: 3592.06 | step_microstep: 37.76
[2024-06-11 04:41:50,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16037.13 | bwd: 46516.54 | bwd_inner: 42923.58 | bwd_allreduce: 3592.29 | step: 39.22
{'loss': 1.1499, 'learning_rate': 5.134433779048186e-07, 'epoch': 0.93}

 93%|█████████▎| 1601/1726 [28:00:19<2:41:30, 77.53s/it]
 93%|█████████▎| 1602/1726 [28:01:21<2:30:37, 72.89s/it]
 93%|█████████▎| 1603/1726 [28:02:22<2:21:44, 69.14s/it]
 93%|█████████▎| 1604/1726 [28:03:24<2:16:05, 66.93s/it]
 93%|█████████▎| 1605/1726 [28:04:26<2:12:31, 65.72s/it]
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2652
[2024-06-11 04:41:51,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.76 | bwd_microstep: 1106.16 | bwd_inner_microstep: 1106.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 04:41:53,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.15 | bwd_microstep: 1338.24 | bwd_inner_microstep: 1338.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3965
[2024-06-11 04:41:55,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.67 | bwd_microstep: 1686.06 | bwd_inner_microstep: 1686.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 04:41:57,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.66 | bwd_microstep: 1375.35 | bwd_inner_microstep: 1375.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810
[2024-06-11 04:41:59,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.80 | bwd_microstep: 1477.49 | bwd_inner_microstep: 1477.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3718
[2024-06-11 04:42:01,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.20 | bwd_microstep: 1459.31 | bwd_inner_microstep: 1459.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 04:42:03,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1387.60 | bwd_inner_microstep: 1387.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756
[2024-06-11 04:42:05,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.84 | bwd_microstep: 1537.30 | bwd_inner_microstep: 1537.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 04:42:07,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.89 | bwd_microstep: 1248.18 | bwd_inner_microstep: 1248.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 04:42:09,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1255.45 | bwd_inner_microstep: 1255.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3411
[2024-06-11 04:42:11,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.16 | bwd_microstep: 1367.87 | bwd_inner_microstep: 1367.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504
[2024-06-11 04:42:13,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1416.44 | bwd_inner_microstep: 1416.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680
[2024-06-11 04:42:15,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.11 | bwd_microstep: 1617.44 | bwd_inner_microstep: 1617.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3936
[2024-06-11 04:42:17,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.14 | bwd_microstep: 1793.89 | bwd_inner_microstep: 1793.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524
[2024-06-11 04:42:20,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.50 | bwd_microstep: 1587.10 | bwd_inner_microstep: 1587.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645
[2024-06-11 04:42:22,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.77 | bwd_microstep: 1607.78 | bwd_inner_microstep: 1607.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 04:42:24,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.57 | bwd_microstep: 1382.77 | bwd_inner_microstep: 1382.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 04:42:26,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1397.70 | bwd_inner_microstep: 1397.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3617
[2024-06-11 04:42:27,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1341.93 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3840
[2024-06-11 04:42:30,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.68 | bwd_microstep: 1707.92 | bwd_inner_microstep: 1707.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840
[2024-06-11 04:42:32,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.02 | bwd_microstep: 1659.76 | bwd_inner_microstep: 1659.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-11 04:42:34,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.21 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3826
[2024-06-11 04:42:36,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.10 | bwd_microstep: 1700.62 | bwd_inner_microstep: 1700.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-11 04:42:38,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1491.10 | bwd_inner_microstep: 1491.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544
[2024-06-11 04:42:40,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1427.77 | bwd_inner_microstep: 1427.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605
[2024-06-11 04:42:43,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.09 | bwd_microstep: 1610.04 | bwd_inner_microstep: 1610.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-11 04:42:44,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1349.01 | bwd_inner_microstep: 1348.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542
[2024-06-11 04:42:46,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1455.30 | bwd_inner_microstep: 1455.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3813
[2024-06-11 04:42:49,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.99 | bwd_microstep: 1617.47 | bwd_inner_microstep: 1617.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3470
[2024-06-11 04:42:51,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.07 | bwd_microstep: 1541.99 | bwd_inner_microstep: 1541.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-11 04:42:53,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.12 | bwd_microstep: 1507.17 | bwd_inner_microstep: 1507.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3811
[2024-06-11 04:42:55,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.04 | optimizer_step: 6.64
[2024-06-11 04:42:55,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.83 | bwd_microstep: 1740.83 | bwd_inner_microstep: 1733.16 | bwd_allreduce_microstep: 7.62 | step_microstep: 37.50
[2024-06-11 04:42:55,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17684.09 | bwd: 47606.92 | bwd_inner: 47598.41 | bwd_allreduce: 7.85 | step: 39.01
{'loss': 1.1236, 'learning_rate': 5.05027530917237e-07, 'epoch': 0.93}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472
[2024-06-11 04:42:57,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.36 | bwd_microstep: 1477.79 | bwd_inner_microstep: 1477.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3928
[2024-06-11 04:43:00,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.95 | bwd_microstep: 1592.28 | bwd_inner_microstep: 1592.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 04:43:01,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1251.30 | bwd_inner_microstep: 1251.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796
[2024-06-11 04:43:04,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.45 | bwd_microstep: 1651.90 | bwd_inner_microstep: 1651.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-11 04:43:05,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.09 | bwd_microstep: 1403.19 | bwd_inner_microstep: 1403.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-11 04:43:07,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.94 | bwd_microstep: 1285.70 | bwd_inner_microstep: 1285.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-11 04:43:08,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.79 | bwd_microstep: 682.08 | bwd_inner_microstep: 682.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-11 04:43:10,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.71 | bwd_microstep: 1437.40 | bwd_inner_microstep: 1437.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-11 04:43:12,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1289.44 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143
[2024-06-11 04:43:13,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.35 | bwd_microstep: 932.03 | bwd_inner_microstep: 932.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 04:43:15,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.98 | bwd_microstep: 1289.64 | bwd_inner_microstep: 1289.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552
[2024-06-11 04:43:17,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.15 | bwd_microstep: 1297.03 | bwd_inner_microstep: 1297.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3718
[2024-06-11 04:43:19,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.35 | bwd_microstep: 1573.79 | bwd_inner_microstep: 1573.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2185
[2024-06-11 04:43:20,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.53 | bwd_microstep: 1051.71 | bwd_inner_microstep: 1051.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3634
[2024-06-11 04:43:23,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.67 | bwd_microstep: 1539.21 | bwd_inner_microstep: 1539.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-11 04:43:24,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1349.79 | bwd_inner_microstep: 1349.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-11 04:43:26,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.43 | bwd_microstep: 1313.45 | bwd_inner_microstep: 1313.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2154
[2024-06-11 04:43:28,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.33 | bwd_microstep: 946.71 | bwd_inner_microstep: 946.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-11 04:43:29,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.90 | bwd_microstep: 800.31 | bwd_inner_microstep: 800.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3657
[2024-06-11 04:43:31,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.53 | bwd_microstep: 1492.62 | bwd_inner_microstep: 1492.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3538
[2024-06-11 04:43:33,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.14 | bwd_microstep: 1416.91 | bwd_inner_microstep: 1416.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999
[2024-06-11 04:43:34,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.02 | bwd_microstep: 710.38 | bwd_inner_microstep: 710.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615
[2024-06-11 04:43:36,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1435.63 | bwd_inner_microstep: 1435.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-11 04:43:38,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.54 | bwd_microstep: 1455.92 | bwd_inner_microstep: 1455.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-11 04:43:40,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.73 | bwd_microstep: 1513.75 | bwd_inner_microstep: 1513.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 606
[2024-06-11 04:43:40,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.12 | bwd_microstep: 259.06 | bwd_inner_microstep: 259.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474
[2024-06-11 04:43:42,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.77 | bwd_microstep: 1282.04 | bwd_inner_microstep: 1282.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794
[2024-06-11 04:43:44,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1552.35 | bwd_inner_microstep: 1552.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-11 04:43:46,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.27 | bwd_microstep: 1448.33 | bwd_inner_microstep: 1448.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-11 04:43:48,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1394.93 | bwd_inner_microstep: 1394.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3260
[2024-06-11 04:43:50,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.79 | bwd_microstep: 1316.06 | bwd_inner_microstep: 1316.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3588
[2024-06-11 04:43:57,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.08 | optimizer_step: 6.60
[2024-06-11 04:43:57,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 6344.83 | bwd_inner_microstep: 1768.04 | bwd_allreduce_microstep: 4576.74 | step_microstep: 37.79
[2024-06-11 04:43:57,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15359.06 | bwd: 45787.55 | bwd_inner: 41209.91 | bwd_allreduce: 4576.97 | step: 39.25
{'loss': 1.1884, 'learning_rate': 4.966803451213475e-07, 'epoch': 0.93}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043
[2024-06-11 04:43:58,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 803.21 | bwd_inner_microstep: 803.14 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415
[2024-06-11 04:44:00,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.56 | bwd_microstep: 1277.54 | bwd_inner_microstep: 1277.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-11 04:44:02,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.55 | bwd_microstep: 1494.12 | bwd_inner_microstep: 1494.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832
[2024-06-11 04:44:04,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1485.17 | bwd_inner_microstep: 1485.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-11 04:44:05,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.04 | bwd_microstep: 789.51 | bwd_inner_microstep: 789.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-11 04:44:06,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.74 | bwd_microstep: 789.66 | bwd_inner_microstep: 789.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3724
[2024-06-11 04:44:08,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1493.67 | bwd_inner_microstep: 1493.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3749
[2024-06-11 04:44:10,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.73 | bwd_microstep: 1636.85 | bwd_inner_microstep: 1636.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-11 04:44:12,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.43 | bwd_microstep: 1149.95 | bwd_inner_microstep: 1149.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 04:44:14,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1247.19 | bwd_inner_microstep: 1247.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416
[2024-06-11 04:44:15,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.06 | bwd_microstep: 1278.37 | bwd_inner_microstep: 1278.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3673
[2024-06-11 04:44:17,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.41 | bwd_microstep: 1548.72 | bwd_inner_microstep: 1548.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 04:44:19,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1347.17 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3704
[2024-06-11 04:44:22,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.13 | bwd_microstep: 1591.85 | bwd_inner_microstep: 1591.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-11 04:44:23,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1381.64 | bwd_inner_microstep: 1381.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2118
[2024-06-11 04:44:25,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.50 | bwd_microstep: 859.65 | bwd_inner_microstep: 859.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3432
[2024-06-11 04:44:27,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.80 | bwd_microstep: 1510.13 | bwd_inner_microstep: 1510.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694
[2024-06-11 04:44:29,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.17 | bwd_microstep: 1531.38 | bwd_inner_microstep: 1531.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2070
[2024-06-11 04:44:30,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.84 | bwd_microstep: 754.49 | bwd_inner_microstep: 754.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929
[2024-06-11 04:44:31,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.55 | bwd_microstep: 697.38 | bwd_inner_microstep: 697.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2413
[2024-06-11 04:44:32,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.09 | bwd_microstep: 940.53 | bwd_inner_microstep: 940.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 04:44:34,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.88 | bwd_microstep: 1281.47 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-11 04:44:36,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.87 | bwd_microstep: 1276.98 | bwd_inner_microstep: 1276.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-11 04:44:38,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1346.67 | bwd_inner_microstep: 1346.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-11 04:44:39,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.57 | bwd_microstep: 1295.54 | bwd_inner_microstep: 1295.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-11 04:44:41,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.59 | bwd_microstep: 1525.59 | bwd_inner_microstep: 1525.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-11 04:44:44,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.34 | bwd_microstep: 1613.04 | bwd_inner_microstep: 1613.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-11 04:44:46,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.90 | bwd_microstep: 1422.29 | bwd_inner_microstep: 1422.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806
[2024-06-11 04:44:48,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.23 | bwd_microstep: 1654.83 | bwd_inner_microstep: 1654.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-11 04:44:50,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1347.04 | bwd_inner_microstep: 1347.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-11 04:44:52,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.27 | bwd_microstep: 1590.59 | bwd_inner_microstep: 1590.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468
[2024-06-11 04:44:56,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-11 04:44:56,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.15 | bwd_microstep: 3770.41 | bwd_inner_microstep: 1653.02 | bwd_allreduce_microstep: 2117.34 | step_microstep: 37.62
[2024-06-11 04:44:56,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15504.12 | bwd: 43732.65 | bwd_inner: 41614.36 | bwd_allreduce: 2117.60 | step: 39.12
{'loss': 1.1861, 'learning_rate': 4.884018499158938e-07, 'epoch': 0.93}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3482
[2024-06-11 04:44:58,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.56 | bwd_microstep: 1431.30 | bwd_inner_microstep: 1431.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352
[2024-06-11 04:45:00,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.60 | bwd_microstep: 986.27 | bwd_inner_microstep: 986.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-11 04:45:02,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.89 | bwd_microstep: 1447.98 | bwd_inner_microstep: 1447.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 04:45:04,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-11 04:45:06,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1481.53 | bwd_inner_microstep: 1481.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719
[2024-06-11 04:45:08,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.73 | bwd_microstep: 1633.46 | bwd_inner_microstep: 1633.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472
[2024-06-11 04:45:10,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.34 | bwd_microstep: 1244.09 | bwd_inner_microstep: 1244.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-11 04:45:11,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.53 | bwd_microstep: 793.20 | bwd_inner_microstep: 793.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1945
[2024-06-11 04:45:12,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.92 | bwd_microstep: 841.16 | bwd_inner_microstep: 841.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-11 04:45:14,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1346.73 | bwd_inner_microstep: 1346.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 04:45:16,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.33 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3276
[2024-06-11 04:45:18,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.47 | bwd_microstep: 1447.19 | bwd_inner_microstep: 1447.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674
[2024-06-11 04:45:20,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.82 | bwd_microstep: 1616.83 | bwd_inner_microstep: 1616.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-11 04:45:22,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.76 | bwd_microstep: 1489.99 | bwd_inner_microstep: 1489.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-11 04:45:24,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.16 | bwd_microstep: 1483.30 | bwd_inner_microstep: 1483.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2187
[2024-06-11 04:45:25,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.64 | bwd_microstep: 950.43 | bwd_inner_microstep: 950.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 04:45:27,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.06 | bwd_microstep: 1488.97 | bwd_inner_microstep: 1488.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3610
[2024-06-11 04:45:30,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.63 | bwd_microstep: 1809.22 | bwd_inner_microstep: 1809.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-11 04:45:31,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.67 | bwd_microstep: 685.30 | bwd_inner_microstep: 685.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 04:45:32,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1250.65 | bwd_inner_microstep: 1250.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-11 04:45:34,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.84 | bwd_microstep: 801.73 | bwd_inner_microstep: 801.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 04:45:35,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1283.05 | bwd_inner_microstep: 1283.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628
[2024-06-11 04:45:37,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.87 | bwd_microstep: 1446.45 | bwd_inner_microstep: 1446.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433
[2024-06-11 04:45:39,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1251.15 | bwd_inner_microstep: 1251.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 04:45:41,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1251.26 | bwd_inner_microstep: 1251.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-11 04:45:43,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1400.55 | bwd_inner_microstep: 1400.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2057
[2024-06-11 04:45:44,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.87 | bwd_microstep: 911.39 | bwd_inner_microstep: 911.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 04:45:46,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1377.00 | bwd_inner_microstep: 1376.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281
[2024-06-11 04:45:47,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.64 | bwd_microstep: 911.68 | bwd_inner_microstep: 911.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851
[2024-06-11 04:45:49,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.18 | bwd_microstep: 1460.83 | bwd_inner_microstep: 1460.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-11 04:45:51,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.14 | bwd_microstep: 1182.50 | bwd_inner_microstep: 1182.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-11 04:45:57,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.08 | optimizer_step: 6.59
[2024-06-11 04:45:57,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.68 | bwd_microstep: 5181.41 | bwd_inner_microstep: 1814.85 | bwd_allreduce_microstep: 3366.52 | step_microstep: 37.71
[2024-06-11 04:45:57,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15379.29 | bwd: 44622.27 | bwd_inner: 41254.86 | bwd_allreduce: 3366.74 | step: 39.16
{'loss': 1.2099, 'learning_rate': 4.801920744576949e-07, 'epoch': 0.93}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553
[2024-06-11 04:45:59,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.53 | bwd_microstep: 1580.44 | bwd_inner_microstep: 1580.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3955
[2024-06-11 04:46:01,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.74 | bwd_microstep: 1594.44 | bwd_inner_microstep: 1594.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 04:46:03,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.43 | bwd_microstep: 1245.15 | bwd_inner_microstep: 1245.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-11 04:46:05,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.66 | bwd_microstep: 1543.76 | bwd_inner_microstep: 1543.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3739
[2024-06-11 04:46:07,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.53 | bwd_microstep: 1634.06 | bwd_inner_microstep: 1634.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-11 04:46:09,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.67 | bwd_microstep: 1185.17 | bwd_inner_microstep: 1185.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-11 04:46:11,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.00 | bwd_microstep: 1530.55 | bwd_inner_microstep: 1530.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585
[2024-06-11 04:46:13,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1306.61 | bwd_inner_microstep: 1306.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865
[2024-06-11 04:46:14,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.08 | bwd_microstep: 675.41 | bwd_inner_microstep: 675.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 04:46:15,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1251.32 | bwd_inner_microstep: 1251.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-11 04:46:17,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1413.53 | bwd_inner_microstep: 1413.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3956
[2024-06-11 04:46:20,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.97 | bwd_microstep: 1810.78 | bwd_inner_microstep: 1810.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3643
[2024-06-11 04:46:22,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.85 | bwd_microstep: 1539.58 | bwd_inner_microstep: 1539.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2729
[2024-06-11 04:46:24,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.36 | bwd_microstep: 1200.25 | bwd_inner_microstep: 1200.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819
[2024-06-11 04:46:26,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.47 | bwd_microstep: 1718.96 | bwd_inner_microstep: 1718.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2323
[2024-06-11 04:46:27,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.32 | bwd_microstep: 889.71 | bwd_inner_microstep: 889.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 915
[2024-06-11 04:46:28,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.38 | bwd_microstep: 374.75 | bwd_inner_microstep: 374.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520
[2024-06-11 04:46:30,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1408.52 | bwd_inner_microstep: 1408.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-11 04:46:32,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.06 | bwd_microstep: 1491.90 | bwd_inner_microstep: 1491.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-11 04:46:34,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.78 | bwd_microstep: 1410.08 | bwd_inner_microstep: 1410.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-11 04:46:36,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 1554.18 | bwd_inner_microstep: 1554.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-11 04:46:38,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 1554.45 | bwd_inner_microstep: 1554.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1997
[2024-06-11 04:46:39,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.07 | bwd_microstep: 737.53 | bwd_inner_microstep: 737.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536
[2024-06-11 04:46:41,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1493.81 | bwd_inner_microstep: 1493.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-11 04:46:43,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1351.53 | bwd_inner_microstep: 1351.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2193
[2024-06-11 04:46:44,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.71 | bwd_microstep: 862.15 | bwd_inner_microstep: 862.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3588
[2024-06-11 04:46:46,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.60 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2691
[2024-06-11 04:46:48,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.68 | bwd_microstep: 1115.32 | bwd_inner_microstep: 1115.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1395
[2024-06-11 04:46:48,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.43 | bwd_microstep: 527.16 | bwd_inner_microstep: 527.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3715
[2024-06-11 04:46:50,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.70 | bwd_microstep: 1397.20 | bwd_inner_microstep: 1397.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3565
[2024-06-11 04:46:52,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.94 | bwd_microstep: 1523.35 | bwd_inner_microstep: 1523.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-11 04:46:57,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.08 | optimizer_step: 6.57
[2024-06-11 04:46:57,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 4257.70 | bwd_inner_microstep: 1743.03 | bwd_allreduce_microstep: 2514.61 | step_microstep: 37.81
[2024-06-11 04:46:57,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15642.30 | bwd: 44607.60 | bwd_inner: 42092.09 | bwd_allreduce: 2514.84 | step: 39.23
{'loss': 1.1777, 'learning_rate': 4.720510476615348e-07, 'epoch': 0.93}
5/1726 [28:04:26<2:12:31, 65.72s/it]
 93%|█████████▎| 1606/1726 [28:05:32<2:11:23, 65.70s/it]
 93%|█████████▎| 1607/1726 [28:06:34<2:07:47, 64.43s/it]
 93%|█████████▎| 1608/1726 [28:07:33<2:03:50, 62.97s/it]
 93%|█████████▎| 1609/1726 [28:08:33<2:01:14, 62.18s/it]
 93%|█████████▎| 1610/1726 [28:09:34<1:59:16, 61.70s/it]
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-11 04:46:59,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.84 | bwd_microstep: 1301.74 | bwd_inner_microstep: 1301.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-11 04:47:01,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1356.66 | bwd_inner_microstep: 1356.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3870
[2024-06-11 04:47:03,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.26 | bwd_microstep: 1466.34 | bwd_inner_microstep: 1466.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394
[2024-06-11 04:47:05,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.96 | bwd_microstep: 1243.67 | bwd_inner_microstep: 1243.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-11 04:47:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.70 | bwd_microstep: 1534.45 | bwd_inner_microstep: 1534.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-11 04:47:09,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1501.15 | bwd_inner_microstep: 1501.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884
[2024-06-11 04:47:10,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.95 | bwd_microstep: 680.54 | bwd_inner_microstep: 680.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429
[2024-06-11 04:47:11,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.35 | bwd_microstep: 1155.52 | bwd_inner_microstep: 1155.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-11 04:47:13,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.40 | bwd_microstep: 1487.25 | bwd_inner_microstep: 1487.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-11 04:47:16,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.86 | bwd_microstep: 1558.92 | bwd_inner_microstep: 1558.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3513
[2024-06-11 04:47:18,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.95 | bwd_microstep: 1505.28 | bwd_inner_microstep: 1505.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3524
[2024-06-11 04:47:20,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.66 | bwd_microstep: 1538.98 | bwd_inner_microstep: 1538.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-11 04:47:22,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.81 | bwd_microstep: 1447.46 | bwd_inner_microstep: 1447.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3428
[2024-06-11 04:47:24,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.01 | bwd_microstep: 1297.63 | bwd_inner_microstep: 1297.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-11 04:47:25,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1344.89 | bwd_inner_microstep: 1344.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 04:47:27,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.02 | bwd_microstep: 1287.23 | bwd_inner_microstep: 1287.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713
[2024-06-11 04:47:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.93 | bwd_microstep: 1430.86 | bwd_inner_microstep: 1430.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926
[2024-06-11 04:47:30,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.64 | bwd_microstep: 696.99 | bwd_inner_microstep: 696.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297
[2024-06-11 04:47:31,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.05 | bwd_microstep: 881.64 | bwd_inner_microstep: 881.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 04:47:33,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1376.33 | bwd_inner_microstep: 1376.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3733
[2024-06-11 04:47:35,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1336.94 | bwd_inner_microstep: 1336.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874
[2024-06-11 04:47:37,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.68 | bwd_microstep: 1583.72 | bwd_inner_microstep: 1583.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-11 04:47:39,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1414.50 | bwd_inner_microstep: 1414.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 04:47:41,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1284.86 | bwd_inner_microstep: 1284.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834
[2024-06-11 04:47:43,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.36 | bwd_microstep: 1556.97 | bwd_inner_microstep: 1556.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-11 04:47:45,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1558.00 | bwd_inner_microstep: 1557.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613
[2024-06-11 04:47:47,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.20 | bwd_microstep: 1246.93 | bwd_inner_microstep: 1246.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-11 04:47:48,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.22 | bwd_microstep: 809.48 | bwd_inner_microstep: 809.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-11 04:47:50,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1280.59 | bwd_inner_microstep: 1280.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 04:47:52,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.97 | bwd_microstep: 1348.11 | bwd_inner_microstep: 1348.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3939
[2024-06-11 04:47:54,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.10 | bwd_microstep: 1623.26 | bwd_inner_microstep: 1623.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456
[2024-06-11 04:47:57,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.02 | optimizer_step: 6.57
[2024-06-11 04:47:57,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.68 | bwd_microstep: 2308.14 | bwd_inner_microstep: 1668.10 | bwd_allreduce_microstep: 639.98 | step_microstep: 37.55
[2024-06-11 04:47:57,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15989.07 | bwd: 43445.07 | bwd_inner: 42804.19 | bwd_allreduce: 640.21 | step: 39.07
{'loss': 1.1689, 'learning_rate': 4.6397879820006874e-07, 'epoch': 0.93}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 04:47:59,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.67 | bwd_microstep: 1236.20 | bwd_inner_microstep: 1236.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 04:48:01,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.78 | bwd_microstep: 1284.46 | bwd_inner_microstep: 1284.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 04:48:02,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.72 | bwd_microstep: 1258.20 | bwd_inner_microstep: 1258.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829
[2024-06-11 04:48:04,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.29 | bwd_microstep: 1555.05 | bwd_inner_microstep: 1555.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4201
[2024-06-11 04:48:07,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.65 | bwd_microstep: 1754.31 | bwd_inner_microstep: 1754.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460
[2024-06-11 04:48:08,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.30 | bwd_microstep: 1180.07 | bwd_inner_microstep: 1180.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480
[2024-06-11 04:48:10,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.94 | bwd_microstep: 1342.95 | bwd_inner_microstep: 1342.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744
[2024-06-11 04:48:12,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.49 | bwd_microstep: 1533.67 | bwd_inner_microstep: 1533.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2136
[2024-06-11 04:48:14,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.63 | bwd_microstep: 830.92 | bwd_inner_microstep: 830.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 04:48:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.10 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-11 04:48:17,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1393.99 | bwd_inner_microstep: 1393.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493
[2024-06-11 04:48:19,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1415.40 | bwd_inner_microstep: 1415.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-11 04:48:20,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.60 | bwd_microstep: 788.85 | bwd_inner_microstep: 788.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3494
[2024-06-11 04:48:22,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1552.11 | bwd_inner_microstep: 1552.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621
[2024-06-11 04:48:25,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.09 | bwd_microstep: 1598.68 | bwd_inner_microstep: 1598.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-11 04:48:26,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.87 | bwd_microstep: 1288.52 | bwd_inner_microstep: 1288.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 04:48:28,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1386.59 | bwd_inner_microstep: 1386.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630
[2024-06-11 04:48:31,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.56 | bwd_microstep: 1609.15 | bwd_inner_microstep: 1609.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3518
[2024-06-11 04:48:32,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.84 | bwd_microstep: 1320.40 | bwd_inner_microstep: 1320.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830
[2024-06-11 04:48:35,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.45 | bwd_microstep: 1756.07 | bwd_inner_microstep: 1756.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3562
[2024-06-11 04:48:37,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1332.43 | bwd_inner_microstep: 1332.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3676
[2024-06-11 04:48:39,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1478.10 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2013
[2024-06-11 04:48:40,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.35 | bwd_microstep: 867.38 | bwd_inner_microstep: 867.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-11 04:48:42,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.52 | bwd_microstep: 1602.38 | bwd_inner_microstep: 1602.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533
[2024-06-11 04:48:44,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.02 | bwd_microstep: 1583.08 | bwd_inner_microstep: 1583.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384
[2024-06-11 04:48:46,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1435.05 | bwd_inner_microstep: 1435.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585
[2024-06-11 04:48:48,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.47 | bwd_microstep: 1530.92 | bwd_inner_microstep: 1530.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-11 04:48:50,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.49 | bwd_microstep: 1275.83 | bwd_inner_microstep: 1275.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 04:48:52,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.14 | bwd_microstep: 1289.88 | bwd_inner_microstep: 1289.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2191
[2024-06-11 04:48:53,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.17 | bwd_microstep: 795.32 | bwd_inner_microstep: 795.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493
[2024-06-11 04:48:55,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.23 | bwd_microstep: 1189.67 | bwd_inner_microstep: 1189.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-11 04:48:59,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.07 | optimizer_step: 6.59
[2024-06-11 04:48:59,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 3455.40 | bwd_inner_microstep: 1433.32 | bwd_allreduce_microstep: 2022.02 | step_microstep: 37.73
[2024-06-11 04:48:59,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16103.50 | bwd: 45204.73 | bwd_inner: 43181.80 | bwd_allreduce: 2022.25 | step: 39.29
{'loss': 1.2202, 'learning_rate': 4.559753545037171e-07, 'epoch': 0.93}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 04:49:01,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1336.95 | bwd_inner_microstep: 1336.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933
[2024-06-11 04:49:03,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.14 | bwd_microstep: 1592.63 | bwd_inner_microstep: 1592.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2317
[2024-06-11 04:49:04,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.20 | bwd_microstep: 883.17 | bwd_inner_microstep: 883.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-11 04:49:06,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.74 | bwd_microstep: 1247.15 | bwd_inner_microstep: 1247.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699
[2024-06-11 04:49:08,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.62 | bwd_microstep: 1524.35 | bwd_inner_microstep: 1524.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 04:49:10,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1388.33 | bwd_inner_microstep: 1388.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4125
[2024-06-11 04:49:12,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.95 | bwd_microstep: 1637.43 | bwd_inner_microstep: 1637.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 04:49:14,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1381.22 | bwd_inner_microstep: 1381.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-11 04:49:16,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1254.28 | bwd_inner_microstep: 1254.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899
[2024-06-11 04:49:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 715.09 | bwd_inner_microstep: 715.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1922
[2024-06-11 04:49:18,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.65 | bwd_microstep: 741.05 | bwd_inner_microstep: 741.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676
[2024-06-11 04:49:20,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.52 | bwd_microstep: 1515.67 | bwd_inner_microstep: 1515.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636
[2024-06-11 04:49:22,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1410.63 | bwd_inner_microstep: 1410.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-11 04:49:24,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.27 | bwd_microstep: 1482.44 | bwd_inner_microstep: 1482.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669
[2024-06-11 04:49:26,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.26 | bwd_microstep: 1513.26 | bwd_inner_microstep: 1513.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4034
[2024-06-11 04:49:28,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.38 | bwd_microstep: 1709.60 | bwd_inner_microstep: 1709.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851
[2024-06-11 04:49:30,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.46 | bwd_microstep: 1651.61 | bwd_inner_microstep: 1651.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-11 04:49:32,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1348.59 | bwd_inner_microstep: 1348.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2015
[2024-06-11 04:49:33,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.71 | bwd_microstep: 897.90 | bwd_inner_microstep: 897.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2034
[2024-06-11 04:49:35,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.21 | bwd_microstep: 811.32 | bwd_inner_microstep: 811.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3817
[2024-06-11 04:49:37,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.14 | bwd_microstep: 1609.94 | bwd_inner_microstep: 1609.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758
[2024-06-11 04:49:39,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.12 | bwd_microstep: 1449.31 | bwd_inner_microstep: 1449.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3845
[2024-06-11 04:49:41,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.01 | bwd_microstep: 1494.90 | bwd_inner_microstep: 1494.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006
[2024-06-11 04:49:42,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.09 | bwd_microstep: 830.53 | bwd_inner_microstep: 830.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278
[2024-06-11 04:49:43,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.88 | bwd_microstep: 882.07 | bwd_inner_microstep: 882.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-11 04:49:45,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1451.98 | bwd_inner_microstep: 1451.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581
[2024-06-11 04:49:47,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1398.39 | bwd_inner_microstep: 1398.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2438
[2024-06-11 04:49:48,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.34 | bwd_microstep: 851.70 | bwd_inner_microstep: 851.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-11 04:49:50,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.96 | bwd_microstep: 1403.72 | bwd_inner_microstep: 1403.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2511
[2024-06-11 04:49:52,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.80 | bwd_microstep: 959.37 | bwd_inner_microstep: 959.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-11 04:49:54,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.69 | bwd_microstep: 1459.37 | bwd_inner_microstep: 1459.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568
[2024-06-11 04:50:00,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-11 04:50:00,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.45 | bwd_microstep: 5613.62 | bwd_inner_microstep: 1457.93 | bwd_allreduce_microstep: 4155.63 | step_microstep: 38.28
[2024-06-11 04:50:00,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15421.62 | bwd: 45447.58 | bwd_inner: 41291.03 | bwd_allreduce: 4155.87 | step: 39.71
{'loss': 1.1107, 'learning_rate': 4.480407447605673e-07, 'epoch': 0.93}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533
[2024-06-11 04:50:02,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.94 | bwd_microstep: 1479.16 | bwd_inner_microstep: 1479.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 04:50:04,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.59 | bwd_microstep: 1276.14 | bwd_inner_microstep: 1276.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3479
[2024-06-11 04:50:05,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.49 | bwd_microstep: 1239.57 | bwd_inner_microstep: 1239.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033
[2024-06-11 04:50:07,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 808.13 | bwd_inner_microstep: 808.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2231
[2024-06-11 04:50:08,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.66 | bwd_microstep: 862.33 | bwd_inner_microstep: 862.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 04:50:10,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1380.54 | bwd_inner_microstep: 1380.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-11 04:50:11,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.25 | bwd_microstep: 1148.55 | bwd_inner_microstep: 1148.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-11 04:50:13,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1485.04 | bwd_inner_microstep: 1485.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-11 04:50:15,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.02 | bwd_microstep: 1252.90 | bwd_inner_microstep: 1252.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-11 04:50:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.61 | bwd_microstep: 790.27 | bwd_inner_microstep: 790.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3505
[2024-06-11 04:50:18,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.37 | bwd_microstep: 1549.67 | bwd_inner_microstep: 1549.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079
[2024-06-11 04:50:19,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.93 | bwd_microstep: 915.37 | bwd_inner_microstep: 915.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-11 04:50:21,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.81 | bwd_microstep: 794.70 | bwd_inner_microstep: 794.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1980
[2024-06-11 04:50:22,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.86 | bwd_microstep: 895.66 | bwd_inner_microstep: 895.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-11 04:50:24,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1393.68 | bwd_inner_microstep: 1393.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509
[2024-06-11 04:50:26,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1483.89 | bwd_inner_microstep: 1483.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3626
[2024-06-11 04:50:28,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.65 | bwd_microstep: 1772.66 | bwd_inner_microstep: 1772.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510
[2024-06-11 04:50:30,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1578.24 | bwd_inner_microstep: 1578.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2112
[2024-06-11 04:50:32,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.42 | bwd_microstep: 857.47 | bwd_inner_microstep: 857.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292
[2024-06-11 04:50:33,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.38 | bwd_microstep: 878.62 | bwd_inner_microstep: 878.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-11 04:50:35,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.04 | bwd_microstep: 1612.99 | bwd_inner_microstep: 1612.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-11 04:50:37,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1297.87 | bwd_inner_microstep: 1297.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2513
[2024-06-11 04:50:38,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.06 | bwd_microstep: 960.98 | bwd_inner_microstep: 960.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456
[2024-06-11 04:50:40,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.15 | bwd_microstep: 1160.03 | bwd_inner_microstep: 1160.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3613
[2024-06-11 04:50:42,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.36 | bwd_microstep: 1342.91 | bwd_inner_microstep: 1342.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3771
[2024-06-11 04:50:44,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.50 | bwd_microstep: 1578.38 | bwd_inner_microstep: 1578.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3380
[2024-06-11 04:50:46,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.46 | bwd_microstep: 1273.75 | bwd_inner_microstep: 1273.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-11 04:50:48,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.07 | bwd_microstep: 1454.46 | bwd_inner_microstep: 1454.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-11 04:50:49,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1411.31 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 04:50:51,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.05 | bwd_microstep: 1286.54 | bwd_inner_microstep: 1286.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585
[2024-06-11 04:50:53,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1402.44 | bwd_inner_microstep: 1402.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-11 04:51:02,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.09 | optimizer_step: 6.59
[2024-06-11 04:51:02,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.06 | bwd_microstep: 8077.82 | bwd_inner_microstep: 1459.61 | bwd_allreduce_microstep: 6618.16 | step_microstep: 39.05
[2024-06-11 04:51:02,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14948.97 | bwd: 46702.12 | bwd_inner: 40083.06 | bwd_allreduce: 6618.39 | step: 40.58
{'loss': 1.1265, 'learning_rate': 4.4017499691627384e-07, 'epoch': 0.94}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474
[2024-06-11 04:51:04,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.83 | bwd_microstep: 1569.50 | bwd_inner_microstep: 1569.42 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-11 04:51:06,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.54 | bwd_microstep: 1473.12 | bwd_inner_microstep: 1473.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3487
[2024-06-11 04:51:08,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.65 | bwd_microstep: 1242.08 | bwd_inner_microstep: 1242.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-11 04:51:10,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.65 | bwd_microstep: 1272.84 | bwd_inner_microstep: 1272.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058
[2024-06-11 04:51:11,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.92 | bwd_microstep: 816.16 | bwd_inner_microstep: 816.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 04:51:12,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.01 | bwd_microstep: 1283.14 | bwd_inner_microstep: 1283.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-11 04:51:15,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.50 | bwd_microstep: 1546.33 | bwd_inner_microstep: 1546.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407
[2024-06-11 04:51:16,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.92 | bwd_microstep: 1151.24 | bwd_inner_microstep: 1151.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510
[2024-06-11 04:51:18,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1320.03 | bwd_inner_microstep: 1320.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-11 04:51:19,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.73 | bwd_microstep: 792.32 | bwd_inner_microstep: 792.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 04:51:21,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1387.99 | bwd_inner_microstep: 1387.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447
[2024-06-11 04:51:23,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1255.04 | bwd_inner_microstep: 1255.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 04:51:25,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1393.67 | bwd_inner_microstep: 1393.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3674
[2024-06-11 04:51:27,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1387.18 | bwd_inner_microstep: 1387.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 04:51:28,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1343.72 | bwd_inner_microstep: 1343.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3526
[2024-06-11 04:51:31,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.51 | bwd_microstep: 1546.79 | bwd_inner_microstep: 1546.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3501
[2024-06-11 04:51:33,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.88 | bwd_microstep: 1510.43 | bwd_inner_microstep: 1510.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 04:51:35,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1340.39 | bwd_inner_microstep: 1340.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105
[2024-06-11 04:51:36,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.49 | bwd_microstep: 918.38 | bwd_inner_microstep: 918.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653
[2024-06-11 04:51:38,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.45 | bwd_microstep: 1620.46 | bwd_inner_microstep: 1620.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3647
[2024-06-11 04:51:40,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.63 | bwd_microstep: 1345.90 | bwd_inner_microstep: 1345.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-11 04:51:42,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1394.10 | bwd_inner_microstep: 1394.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-11 04:51:44,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 1607.28 | bwd_inner_microstep: 1607.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2001
[2024-06-11 04:51:45,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.48 | bwd_microstep: 711.98 | bwd_inner_microstep: 711.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-11 04:51:47,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.59 | bwd_microstep: 1510.54 | bwd_inner_microstep: 1510.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815
[2024-06-11 04:51:49,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1654.29 | bwd_inner_microstep: 1654.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 04:51:51,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.85 | bwd_microstep: 1550.88 | bwd_inner_microstep: 1550.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-11 04:51:54,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.21 | bwd_microstep: 1551.25 | bwd_inner_microstep: 1551.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-11 04:51:56,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.61 | bwd_microstep: 1469.83 | bwd_inner_microstep: 1469.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042
[2024-06-11 04:51:57,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.04 | bwd_microstep: 907.67 | bwd_inner_microstep: 907.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-11 04:51:59,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1510.09 | bwd_inner_microstep: 1510.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476
[2024-06-11 04:52:04,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-11 04:52:04,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.16 | bwd_microstep: 4463.15 | bwd_inner_microstep: 1624.78 | bwd_allreduce_microstep: 2838.31 | step_microstep: 38.88
[2024-06-11 04:52:04,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16005.28 | bwd: 45847.81 | bwd_inner: 43008.52 | bwd_allreduce: 2838.58 | step: 40.48
|█████████▎| 1610/1726 [28:09:34<1:59:16, 61.70s/it]
 93%|█████████▎| 1611/1726 [28:10:34<1:57:08, 61.12s/it]
 93%|█████████▎| 1612/1726 [28:11:35<1:56:25, 61.28s/it]
 93%|█████████▎| 1613/1726 [28:12:37<1:55:21, 61.25s/it]
 94%|█████████▎| 1614/1726 [28:13:39<1:54:44, 61.47s/it]
 94%|█████████▎| 1615/1726 [28:14:41<1:54:07, 61.69s/it]
{'loss': 1.2352, 'learning_rate': 4.3237813867396117e-07, 'epoch': 0.94}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-11 04:52:06,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.72 | bwd_microstep: 1468.32 | bwd_inner_microstep: 1468.18 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.23
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-11 04:52:08,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.10 | bwd_microstep: 1389.03 | bwd_inner_microstep: 1389.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3454
[2024-06-11 04:52:10,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.40 | bwd_microstep: 1216.35 | bwd_inner_microstep: 1216.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-11 04:52:12,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.30 | bwd_microstep: 1558.39 | bwd_inner_microstep: 1558.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-11 04:52:14,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1550.63 | bwd_inner_microstep: 1550.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418
[2024-06-11 04:52:16,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.11 | bwd_microstep: 1157.62 | bwd_inner_microstep: 1157.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 04:52:17,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1387.70 | bwd_inner_microstep: 1387.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 04:52:19,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1248.35 | bwd_inner_microstep: 1248.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407
[2024-06-11 04:52:21,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.47 | bwd_microstep: 1281.82 | bwd_inner_microstep: 1281.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3406
[2024-06-11 04:52:23,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.66 | bwd_microstep: 1295.83 | bwd_inner_microstep: 1295.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 04:52:25,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.43 | bwd_microstep: 1487.83 | bwd_inner_microstep: 1487.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1927
[2024-06-11 04:52:26,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.79 | bwd_microstep: 882.53 | bwd_inner_microstep: 882.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-11 04:52:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1485.89 | bwd_inner_microstep: 1485.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421
[2024-06-11 04:52:30,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.62 | bwd_microstep: 1445.60 | bwd_inner_microstep: 1445.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 04:52:32,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.78 | bwd_microstep: 1481.05 | bwd_inner_microstep: 1481.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098
[2024-06-11 04:52:33,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.70 | bwd_microstep: 920.13 | bwd_inner_microstep: 920.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3424
[2024-06-11 04:52:35,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.81 | bwd_microstep: 1161.76 | bwd_inner_microstep: 1161.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3634
[2024-06-11 04:52:37,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.55 | bwd_microstep: 1379.72 | bwd_inner_microstep: 1379.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 04:52:39,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.32 | bwd_microstep: 1280.79 | bwd_inner_microstep: 1280.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-11 04:52:41,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1613.00 | bwd_inner_microstep: 1612.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1968
[2024-06-11 04:52:42,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.54 | bwd_microstep: 733.72 | bwd_inner_microstep: 733.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-11 04:52:44,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.26 | bwd_microstep: 1192.79 | bwd_inner_microstep: 1192.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545
[2024-06-11 04:52:45,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1328.55 | bwd_inner_microstep: 1328.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-11 04:52:48,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1509.40 | bwd_inner_microstep: 1509.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3565
[2024-06-11 04:52:49,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.07 | bwd_microstep: 1205.85 | bwd_inner_microstep: 1205.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 04:52:51,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1557.00 | bwd_inner_microstep: 1556.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740
[2024-06-11 04:52:53,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.85 | bwd_microstep: 1444.97 | bwd_inner_microstep: 1444.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727
[2024-06-11 04:52:55,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1536.26 | bwd_inner_microstep: 1536.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2056
[2024-06-11 04:52:57,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.80 | bwd_microstep: 914.77 | bwd_inner_microstep: 914.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2284
[2024-06-11 04:52:58,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.82 | bwd_microstep: 854.09 | bwd_inner_microstep: 854.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051
[2024-06-11 04:52:59,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.52 | bwd_microstep: 912.20 | bwd_inner_microstep: 912.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2979
[2024-06-11 04:53:20,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.32 | optimizer_step: 6.59
[2024-06-11 04:53:20,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.84 | bwd_microstep: 19920.01 | bwd_inner_microstep: 1379.47 | bwd_allreduce_microstep: 18540.46 | step_microstep: 40.44
[2024-06-11 04:53:20,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15424.75 | bwd: 59801.99 | bwd_inner: 41260.46 | bwd_allreduce: 18540.76 | step: 42.12
{'loss': 1.1617, 'learning_rate': 4.2465019749411864e-07, 'epoch': 0.94}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 04:53:22,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.37 | bwd_microstep: 1461.68 | bwd_inner_microstep: 1461.62 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-11 04:53:23,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.47 | bwd_microstep: 793.01 | bwd_inner_microstep: 792.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2383
[2024-06-11 04:53:24,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.99 | bwd_microstep: 996.45 | bwd_inner_microstep: 996.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-11 04:53:26,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1338.36 | bwd_inner_microstep: 1338.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 04:53:28,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.44 | bwd_microstep: 1274.24 | bwd_inner_microstep: 1274.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793
[2024-06-11 04:53:30,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.91 | bwd_microstep: 1541.20 | bwd_inner_microstep: 1541.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3863
[2024-06-11 04:53:32,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.09 | bwd_microstep: 1651.94 | bwd_inner_microstep: 1651.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-11 04:53:34,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.88 | bwd_microstep: 1478.95 | bwd_inner_microstep: 1478.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4081
[2024-06-11 04:53:36,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.98 | bwd_microstep: 1529.91 | bwd_inner_microstep: 1529.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708
[2024-06-11 04:54:25,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.17 | bwd_microstep: 1607.47 | bwd_inner_microstep: 1607.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693
[2024-06-11 04:54:27,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1511.24 | bwd_inner_microstep: 1511.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-11 04:54:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.03 | bwd_microstep: 795.94 | bwd_inner_microstep: 795.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 04:54:30,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.42 | bwd_microstep: 1378.99 | bwd_inner_microstep: 1378.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3422
[2024-06-11 04:54:32,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.10 | bwd_microstep: 1270.79 | bwd_inner_microstep: 1270.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 04:54:34,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.79 | bwd_microstep: 1364.66 | bwd_inner_microstep: 1364.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 04:54:36,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1394.89 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444
[2024-06-11 04:54:38,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 1443.71 | bwd_inner_microstep: 1443.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3530
[2024-06-11 04:54:40,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.17 | bwd_microstep: 1648.95 | bwd_inner_microstep: 1648.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3729
[2024-06-11 04:54:42,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1330.98 | bwd_inner_microstep: 1330.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050
[2024-06-11 04:54:43,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.87 | bwd_microstep: 812.74 | bwd_inner_microstep: 812.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823
[2024-06-11 04:54:45,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.02 | bwd_microstep: 1449.10 | bwd_inner_microstep: 1449.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534
[2024-06-11 04:54:47,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1321.63 | bwd_inner_microstep: 1321.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 04:54:49,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1395.05 | bwd_inner_microstep: 1395.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-11 04:54:50,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.99 | bwd_microstep: 1184.34 | bwd_inner_microstep: 1184.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2047
[2024-06-11 04:54:51,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.74 | bwd_microstep: 715.41 | bwd_inner_microstep: 715.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277
[2024-06-11 04:54:52,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.71 | bwd_microstep: 876.09 | bwd_inner_microstep: 876.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3559
[2024-06-11 04:54:54,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.75 | bwd_microstep: 1263.60 | bwd_inner_microstep: 1263.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 04:54:56,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.85 | bwd_microstep: 1554.96 | bwd_inner_microstep: 1554.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-11 04:54:59,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.22 | bwd_microstep: 1754.46 | bwd_inner_microstep: 1754.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-11 04:55:01,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.97 | bwd_microstep: 1636.39 | bwd_inner_microstep: 1636.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665
[2024-06-11 04:55:03,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.18 | bwd_microstep: 1522.38 | bwd_inner_microstep: 1522.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3587
[2024-06-11 04:55:05,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.05 | optimizer_gradients: 4.14 | optimizer_step: 6.68
[2024-06-11 04:55:05,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1605.82 | bwd_inner_microstep: 1597.74 | bwd_allreduce_microstep: 8.03 | step_microstep: 40.14
[2024-06-11 04:55:05,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16059.77 | bwd: 42905.36 | bwd_inner: 42896.37 | bwd_allreduce: 8.29 | step: 41.76
{'loss': 1.1581, 'learning_rate': 4.1699120059452093e-07, 'epoch': 0.94}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3406
[2024-06-11 04:55:07,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.55 | bwd_microstep: 1439.22 | bwd_inner_microstep: 1439.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3993
[2024-06-11 04:55:09,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.26 | bwd_microstep: 1507.40 | bwd_inner_microstep: 1507.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 04:55:11,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1344.74 | bwd_inner_microstep: 1344.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-11 04:55:13,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.19 | bwd_microstep: 974.16 | bwd_inner_microstep: 974.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 04:55:14,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1252.94 | bwd_inner_microstep: 1252.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 04:55:16,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1383.78 | bwd_inner_microstep: 1383.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-11 04:55:17,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.17 | bwd_microstep: 788.17 | bwd_inner_microstep: 788.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 04:55:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1280.72 | bwd_inner_microstep: 1280.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 04:55:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1279.54 | bwd_inner_microstep: 1279.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 04:55:23,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1287.03 | bwd_inner_microstep: 1287.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077
[2024-06-11 04:55:24,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.84 | bwd_microstep: 820.71 | bwd_inner_microstep: 820.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-11 04:55:26,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1383.83 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713
[2024-06-11 04:55:28,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.94 | bwd_microstep: 1633.42 | bwd_inner_microstep: 1633.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2141
[2024-06-11 04:55:29,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.71 | bwd_microstep: 962.88 | bwd_inner_microstep: 962.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2116
[2024-06-11 04:55:31,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.90 | bwd_microstep: 957.48 | bwd_inner_microstep: 957.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-11 04:55:33,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1342.45 | bwd_inner_microstep: 1342.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072
[2024-06-11 04:55:34,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.63 | bwd_microstep: 915.76 | bwd_inner_microstep: 915.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2026
[2024-06-11 04:55:35,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.90 | bwd_microstep: 902.35 | bwd_inner_microstep: 902.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395
[2024-06-11 04:55:37,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1342.46 | bwd_inner_microstep: 1342.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3422
[2024-06-11 04:55:39,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.35 | bwd_microstep: 1294.13 | bwd_inner_microstep: 1294.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666
[2024-06-11 04:55:41,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.72 | bwd_microstep: 1721.46 | bwd_inner_microstep: 1721.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-11 04:55:43,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.50 | bwd_microstep: 1533.14 | bwd_inner_microstep: 1532.76 | bwd_allreduce_microstep: 0.20 | step_microstep: 0.31
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542
[2024-06-11 04:55:45,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.35 | bwd_microstep: 1401.63 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2178
[2024-06-11 04:55:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.36 | bwd_microstep: 1055.18 | bwd_inner_microstep: 1055.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738
[2024-06-11 04:55:49,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.58 | bwd_microstep: 1539.88 | bwd_inner_microstep: 1539.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3593
[2024-06-11 04:55:51,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.88 | bwd_microstep: 1671.20 | bwd_inner_microstep: 1671.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-11 04:55:53,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.17 | bwd_microstep: 1453.02 | bwd_inner_microstep: 1452.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-11 04:55:55,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.32 | bwd_microstep: 1503.11 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 04:55:57,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1413.82 | bwd_inner_microstep: 1413.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 04:55:59,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.49 | bwd_microstep: 1359.89 | bwd_inner_microstep: 1359.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-11 04:56:01,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.02 | bwd_microstep: 1412.77 | bwd_inner_microstep: 1412.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-11 04:56:28,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.60
[2024-06-11 04:56:28,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 26974.98 | bwd_inner_microstep: 1529.86 | bwd_allreduce_microstep: 25445.04 | step_microstep: 39.44
[2024-06-11 04:56:28,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15522.76 | bwd: 67133.32 | bwd_inner: 41686.88 | bwd_allreduce: 25445.61 | step: 41.74
{'loss': 1.1922, 'learning_rate': 4.094011749501103e-07, 'epoch': 0.94}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942
[2024-06-11 04:56:29,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.46 | bwd_microstep: 781.18 | bwd_inner_microstep: 781.04 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 720
[2024-06-11 04:56:30,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.23 | bwd_microstep: 292.62 | bwd_inner_microstep: 292.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483
[2024-06-11 04:56:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1473.18 | bwd_inner_microstep: 1473.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-11 04:56:34,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.11 | bwd_microstep: 1541.96 | bwd_inner_microstep: 1541.80 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.27
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 04:56:36,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.55 | bwd_microstep: 1280.20 | bwd_inner_microstep: 1280.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-11 04:56:38,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.09 | bwd_microstep: 1244.80 | bwd_inner_microstep: 1244.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2211
[2024-06-11 04:56:39,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.74 | bwd_microstep: 952.45 | bwd_inner_microstep: 952.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-11 04:56:41,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.00 | bwd_microstep: 1406.29 | bwd_inner_microstep: 1406.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-11 04:57:11,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.77 | bwd_microstep: 1510.32 | bwd_inner_microstep: 1510.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3694
[2024-06-11 04:57:13,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.67 | bwd_microstep: 1674.62 | bwd_inner_microstep: 1674.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-11 04:57:15,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.51 | bwd_microstep: 1252.13 | bwd_inner_microstep: 1252.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-11 04:57:17,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.34 | bwd_microstep: 1343.87 | bwd_inner_microstep: 1343.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-11 04:57:19,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.21 | bwd_microstep: 1376.39 | bwd_inner_microstep: 1376.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 04:57:21,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 1349.94 | bwd_inner_microstep: 1349.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457
[2024-06-11 04:57:23,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1299.86 | bwd_inner_microstep: 1299.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032
[2024-06-11 04:57:24,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.11 | bwd_microstep: 895.65 | bwd_inner_microstep: 895.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661
[2024-06-11 04:57:26,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.19 | bwd_microstep: 1604.36 | bwd_inner_microstep: 1604.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-11 04:57:28,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.13 | bwd_microstep: 1503.22 | bwd_inner_microstep: 1503.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995
[2024-06-11 04:57:29,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.09 | bwd_microstep: 891.62 | bwd_inner_microstep: 891.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2158
[2024-06-11 04:57:31,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.57 | bwd_microstep: 851.24 | bwd_inner_microstep: 851.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3627
[2024-06-11 04:57:33,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.73 | bwd_microstep: 1438.26 | bwd_inner_microstep: 1438.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543
[2024-06-11 04:57:34,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.42 | bwd_microstep: 1321.90 | bwd_inner_microstep: 1321.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939
[2024-06-11 04:57:35,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.41 | bwd_microstep: 697.91 | bwd_inner_microstep: 697.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2273
[2024-06-11 04:57:37,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.04 | bwd_microstep: 905.39 | bwd_inner_microstep: 905.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-11 04:57:39,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.83 | bwd_microstep: 1649.93 | bwd_inner_microstep: 1649.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-11 04:57:40,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.49 | bwd_microstep: 800.31 | bwd_inner_microstep: 800.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-11 04:57:42,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.37 | bwd_microstep: 1275.98 | bwd_inner_microstep: 1275.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3774
[2024-06-11 04:57:44,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.96 | bwd_microstep: 1345.09 | bwd_inner_microstep: 1345.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639
[2024-06-11 04:57:46,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.94 | bwd_microstep: 1568.51 | bwd_inner_microstep: 1568.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1857
[2024-06-11 04:57:47,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.14 | bwd_microstep: 770.96 | bwd_inner_microstep: 770.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3804
[2024-06-11 04:57:49,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.92 | bwd_microstep: 1751.01 | bwd_inner_microstep: 1750.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3814
[2024-06-11 04:58:17,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-11 04:58:17,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.59 | bwd_microstep: 26921.40 | bwd_inner_microstep: 1917.95 | bwd_allreduce_microstep: 25003.39 | step_microstep: 39.38
[2024-06-11 04:58:17,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14876.47 | bwd: 64972.62 | bwd_inner: 39968.06 | bwd_allreduce: 25003.80 | step: 41.44
{'loss': 1.1368, 'learning_rate': 4.0188014729292125e-07, 'epoch': 0.94}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-11 04:58:19,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.86 | bwd_microstep: 1379.17 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-11 04:58:21,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.17 | bwd_microstep: 1476.67 | bwd_inner_microstep: 1476.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822
[2024-06-11 04:58:23,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.11 | bwd_microstep: 1505.30 | bwd_inner_microstep: 1505.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934
[2024-06-11 04:58:24,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.21 | bwd_microstep: 785.55 | bwd_inner_microstep: 785.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4209
[2024-06-11 04:58:26,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.00 | bwd_microstep: 1746.50 | bwd_inner_microstep: 1746.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476
[2024-06-11 04:58:28,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.46 | bwd_microstep: 1305.42 | bwd_inner_microstep: 1305.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-11 04:58:30,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.54 | bwd_microstep: 1185.61 | bwd_inner_microstep: 1185.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956
[2024-06-11 04:58:31,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.07 | bwd_microstep: 789.54 | bwd_inner_microstep: 789.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 04:58:33,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1281.31 | bwd_inner_microstep: 1281.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-11 04:58:35,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1392.95 | bwd_inner_microstep: 1392.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436
[2024-06-11 04:58:37,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.88 | bwd_microstep: 1375.36 | bwd_inner_microstep: 1375.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3515
[2024-06-11 04:58:38,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.17 | bwd_microstep: 1220.32 | bwd_inner_microstep: 1220.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3641
[2024-06-11 04:58:40,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.44 | bwd_microstep: 1644.47 | bwd_inner_microstep: 1644.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2456
[2024-06-11 04:58:42,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.14 | bwd_microstep: 948.33 | bwd_inner_microstep: 948.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466
[2024-06-11 04:58:44,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.49 | bwd_microstep: 1436.13 | bwd_inner_microstep: 1436.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3486
[2024-06-11 04:58:46,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.82 | bwd_microstep: 1440.69 | bwd_inner_microstep: 1440.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3640
[2024-06-11 04:58:48,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1570.64 | bwd_inner_microstep: 1570.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425
[2024-06-11 04:58:50,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.03 | bwd_microstep: 1407.38 | bwd_inner_microstep: 1407.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 04:58:52,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.83 | bwd_microstep: 1381.08 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522
[2024-06-11 04:58:54,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.25 | bwd_microstep: 1421.70 | bwd_inner_microstep: 1421.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025
[2024-06-11 04:58:55,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.87 | bwd_microstep: 903.64 | bwd_inner_microstep: 903.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-11 04:58:57,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.83 | bwd_microstep: 1656.75 | bwd_inner_microstep: 1656.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3667
[2024-06-11 04:58:59,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.78 | bwd_microstep: 1585.63 | bwd_inner_microstep: 1585.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 04:59:01,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.88 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 04:59:03,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1256.03 | bwd_inner_microstep: 1256.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3673
[2024-06-11 04:59:05,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.77 | bwd_microstep: 1591.33 | bwd_inner_microstep: 1591.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3570
[2024-06-11 04:59:07,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.91 | bwd_microstep: 1430.35 | bwd_inner_microstep: 1430.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2091
[2024-06-11 04:59:08,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.63 | bwd_microstep: 883.41 | bwd_inner_microstep: 883.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-11 04:59:10,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1390.37 | bwd_inner_microstep: 1390.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616
[2024-06-11 04:59:12,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.85 | bwd_microstep: 1313.12 | bwd_inner_microstep: 1313.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432
[2024-06-11 04:59:14,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.37 | bwd_microstep: 1153.97 | bwd_inner_microstep: 1153.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804
[2024-06-11 04:59:17,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.14 | optimizer_step: 6.62
[2024-06-11 04:59:17,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.12 | bwd_microstep: 3184.46 | bwd_inner_microstep: 1634.08 | bwd_allreduce_microstep: 1550.32 | step_microstep: 38.72
[2024-06-11 04:59:17,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15965.17 | bwd: 44328.21 | bwd_inner: 42776.88 | bwd_allreduce: 1550.60 | step: 40.30


 94%|█████████▎| 1615/1726 [28:14:41<1:54:07, 61.69s/it]
 94%|█████████▎| 1616/1726 [28:15:56<2:00:43, 65.85s/it]
 94%|█████████▎| 1617/1726 [28:17:42<2:21:22, 77.82s/it]
 94%|█████████▎| 1618/1726 [28:19:05<2:22:53, 79.38s/it]
 94%|█████████▍| 1619/1726 [28:20:54<2:37:06, 88.10s/it]
 94%|█████████▍| 1620/1726 [28:21:54<2:21:05, 7
{'loss': 1.1664, 'learning_rate': 3.9442814411197125e-07, 'epoch': 0.94}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-11 04:59:19,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.67 | bwd_microstep: 1470.78 | bwd_inner_microstep: 1470.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 04:59:21,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1378.34 | bwd_inner_microstep: 1378.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2039
[2024-06-11 04:59:22,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.35 | bwd_microstep: 718.63 | bwd_inner_microstep: 718.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780
[2024-06-11 04:59:25,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.66 | bwd_microstep: 1545.59 | bwd_inner_microstep: 1545.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3482
[2024-06-11 04:59:26,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.00 | bwd_microstep: 1245.34 | bwd_inner_microstep: 1245.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401
[2024-06-11 04:59:28,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.26 | bwd_microstep: 1150.45 | bwd_inner_microstep: 1150.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-11 04:59:30,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.71 | bwd_microstep: 1187.75 | bwd_inner_microstep: 1187.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2992
[2024-06-11 04:59:31,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.69 | bwd_microstep: 1101.62 | bwd_inner_microstep: 1101.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457
[2024-06-11 04:59:33,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.11 | bwd_microstep: 1278.54 | bwd_inner_microstep: 1278.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-11 04:59:34,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.17 | bwd_microstep: 792.59 | bwd_inner_microstep: 792.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912
[2024-06-11 04:59:35,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.22 | bwd_microstep: 687.14 | bwd_inner_microstep: 687.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 04:59:37,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.52 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007
[2024-06-11 04:59:38,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.30 | bwd_microstep: 899.10 | bwd_inner_microstep: 899.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3664
[2024-06-11 04:59:40,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.33 | bwd_microstep: 1717.63 | bwd_inner_microstep: 1717.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 04:59:42,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1278.29 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 04:59:44,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1393.79 | bwd_inner_microstep: 1393.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507
[2024-06-11 04:59:46,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.40 | bwd_microstep: 1288.76 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-11 04:59:48,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1250.52 | bwd_inner_microstep: 1250.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531
[2024-06-11 04:59:49,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.91 | bwd_microstep: 1328.11 | bwd_inner_microstep: 1328.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2122
[2024-06-11 04:59:51,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.27 | bwd_microstep: 828.99 | bwd_inner_microstep: 828.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507
[2024-06-11 04:59:52,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.45 | bwd_microstep: 1318.19 | bwd_inner_microstep: 1318.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687
[2024-06-11 04:59:55,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.53 | bwd_microstep: 1530.46 | bwd_inner_microstep: 1530.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449
[2024-06-11 04:59:56,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1386.36 | bwd_inner_microstep: 1386.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3609
[2024-06-11 04:59:58,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.38 | bwd_microstep: 1430.78 | bwd_inner_microstep: 1430.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3565
[2024-06-11 05:00:01,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.63 | bwd_microstep: 1562.36 | bwd_inner_microstep: 1562.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2266
[2024-06-11 05:00:02,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.94 | bwd_microstep: 970.97 | bwd_inner_microstep: 970.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-11 05:00:04,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.30 | bwd_microstep: 1650.73 | bwd_inner_microstep: 1650.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562
[2024-06-11 05:00:06,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1398.05 | bwd_inner_microstep: 1398.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 05:00:08,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.87 | bwd_microstep: 1384.17 | bwd_inner_microstep: 1384.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3562
[2024-06-11 05:00:10,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.05 | bwd_microstep: 1523.88 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414
[2024-06-11 05:00:12,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.08 | bwd_microstep: 1310.11 | bwd_inner_microstep: 1310.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-11 05:00:20,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.52 | optimizer_step: 6.61
[2024-06-11 05:00:20,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 7916.82 | bwd_inner_microstep: 1407.66 | bwd_allreduce_microstep: 6509.06 | step_microstep: 41.02
[2024-06-11 05:00:20,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15282.85 | bwd: 47304.55 | bwd_inner: 40794.47 | bwd_allreduce: 6509.32 | step: 42.61
{'loss': 1.0814, 'learning_rate': 3.8704519165317923e-07, 'epoch': 0.94}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3478
[2024-06-11 05:00:23,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.78 | bwd_microstep: 1569.46 | bwd_inner_microstep: 1569.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3402
[2024-06-11 05:00:24,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1398.59 | bwd_inner_microstep: 1398.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477
[2024-06-11 05:00:26,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.48 | bwd_microstep: 1216.49 | bwd_inner_microstep: 1216.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 05:00:28,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1286.21 | bwd_inner_microstep: 1286.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584
[2024-06-11 05:00:30,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1299.69 | bwd_inner_microstep: 1299.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-11 05:00:31,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.42 | bwd_microstep: 795.24 | bwd_inner_microstep: 795.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 05:00:33,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1289.81 | bwd_inner_microstep: 1289.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1956
[2024-06-11 05:00:34,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.73 | bwd_microstep: 855.50 | bwd_inner_microstep: 855.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2901
[2024-06-11 05:00:35,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.60 | bwd_microstep: 1091.92 | bwd_inner_microstep: 1091.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 05:00:37,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1250.26 | bwd_inner_microstep: 1250.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507
[2024-06-11 05:00:39,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.65 | bwd_microstep: 1382.18 | bwd_inner_microstep: 1382.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-11 05:00:41,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1490.57 | bwd_inner_microstep: 1490.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2329
[2024-06-11 05:00:42,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.93 | bwd_microstep: 989.72 | bwd_inner_microstep: 989.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2945
[2024-06-11 05:00:44,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.78 | bwd_microstep: 1095.96 | bwd_inner_microstep: 1095.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 05:00:46,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.07 | bwd_microstep: 1404.26 | bwd_inner_microstep: 1404.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819
[2024-06-11 05:00:48,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.75 | bwd_microstep: 1691.09 | bwd_inner_microstep: 1691.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1949
[2024-06-11 05:00:49,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.58 | bwd_microstep: 825.84 | bwd_inner_microstep: 825.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615
[2024-06-11 05:00:51,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1408.74 | bwd_inner_microstep: 1408.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-11 05:00:53,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1398.63 | bwd_inner_microstep: 1398.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 05:00:55,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1283.50 | bwd_inner_microstep: 1283.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-11 05:00:57,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1391.30 | bwd_inner_microstep: 1391.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 05:00:59,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-11 05:01:01,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1497.52 | bwd_inner_microstep: 1497.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-11 05:01:03,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.43 | bwd_microstep: 1286.83 | bwd_inner_microstep: 1286.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-11 05:01:04,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1250.21 | bwd_inner_microstep: 1250.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3609
[2024-06-11 05:01:06,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 1343.00 | bwd_inner_microstep: 1342.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2027
[2024-06-11 05:01:07,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.60 | bwd_microstep: 809.07 | bwd_inner_microstep: 809.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-11 05:01:09,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.40 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 05:01:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.65 | bwd_microstep: 1280.68 | bwd_inner_microstep: 1280.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-11 05:01:13,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1560.62 | bwd_inner_microstep: 1560.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2224
[2024-06-11 05:01:15,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.80 | bwd_microstep: 960.67 | bwd_inner_microstep: 960.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3568
[2024-06-11 05:01:19,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.13 | optimizer_step: 6.62
[2024-06-11 05:01:19,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.67 | bwd_microstep: 4143.41 | bwd_inner_microstep: 1522.09 | bwd_allreduce_microstep: 2621.26 | step_microstep: 38.49
[2024-06-11 05:01:19,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15197.08 | bwd: 43334.26 | bwd_inner: 40712.08 | bwd_allreduce: 2621.49 | step: 40.21
{'loss': 1.1835, 'learning_rate': 3.797313159192628e-07, 'epoch': 0.94}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 05:01:21,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.63 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4695
[2024-06-11 05:01:24,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 697.17 | bwd_microstep: 1876.12 | bwd_inner_microstep: 1876.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3857
[2024-06-11 05:01:25,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1362.63 | bwd_inner_microstep: 1362.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 05:01:27,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1393.76 | bwd_inner_microstep: 1393.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2215
[2024-06-11 05:01:29,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.23 | bwd_microstep: 955.15 | bwd_inner_microstep: 955.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2782
[2024-06-11 05:01:30,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.38 | bwd_microstep: 1053.54 | bwd_inner_microstep: 1053.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 05:01:32,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1285.58 | bwd_inner_microstep: 1285.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477
[2024-06-11 05:01:34,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.72 | bwd_microstep: 1185.46 | bwd_inner_microstep: 1185.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 05:01:35,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1247.95 | bwd_inner_microstep: 1247.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2138
[2024-06-11 05:01:37,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.84 | bwd_microstep: 1022.61 | bwd_inner_microstep: 1022.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 05:01:38,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1246.11 | bwd_inner_microstep: 1246.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 05:01:40,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1340.65 | bwd_inner_microstep: 1340.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 05:01:42,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.10 | bwd_microstep: 1372.24 | bwd_inner_microstep: 1372.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-11 05:01:44,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.38 | bwd_microstep: 1329.03 | bwd_inner_microstep: 1329.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-11 05:01:46,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.54 | bwd_microstep: 1484.17 | bwd_inner_microstep: 1484.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658
[2024-06-11 05:01:48,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1616.60 | bwd_inner_microstep: 1616.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 05:01:50,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1390.18 | bwd_inner_microstep: 1390.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-11 05:01:52,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.46 | bwd_microstep: 1290.09 | bwd_inner_microstep: 1290.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3531
[2024-06-11 05:01:54,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1356.93 | bwd_inner_microstep: 1356.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282
[2024-06-11 05:01:55,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.81 | bwd_microstep: 877.19 | bwd_inner_microstep: 877.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927
[2024-06-11 05:01:56,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.34 | bwd_microstep: 697.67 | bwd_inner_microstep: 697.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814
[2024-06-11 05:01:58,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.58 | bwd_microstep: 1452.97 | bwd_inner_microstep: 1452.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 05:02:00,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.57 | bwd_microstep: 1284.71 | bwd_inner_microstep: 1284.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1931
[2024-06-11 05:02:01,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.18 | bwd_microstep: 760.23 | bwd_inner_microstep: 760.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2078
[2024-06-11 05:02:02,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.05 | bwd_microstep: 855.88 | bwd_inner_microstep: 855.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3716
[2024-06-11 05:02:04,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.44 | bwd_microstep: 1336.13 | bwd_inner_microstep: 1336.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-11 05:02:06,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1396.73 | bwd_inner_microstep: 1396.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536
[2024-06-11 05:02:08,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.97 | bwd_microstep: 1398.86 | bwd_inner_microstep: 1398.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3589
[2024-06-11 05:02:10,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.51 | bwd_microstep: 1368.25 | bwd_inner_microstep: 1368.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3808
[2024-06-11 05:02:12,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.47 | bwd_microstep: 1418.27 | bwd_inner_microstep: 1418.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-11 05:02:13,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.15 | bwd_microstep: 973.39 | bwd_inner_microstep: 973.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3047
[2024-06-11 05:03:49,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.69 | optimizer_step: 6.60
[2024-06-11 05:03:49,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.93 | bwd_microstep: 95267.24 | bwd_inner_microstep: 1495.03 | bwd_allreduce_microstep: 93772.13 | step_microstep: 41.07
[2024-06-11 05:03:49,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15129.37 | bwd: 134141.56 | bwd_inner: 40368.49 | bwd_allreduce: 93772.39 | step: 42.55
{'loss': 1.1262, 'learning_rate': 3.7248654266965665e-07, 'epoch': 0.94}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-11 05:03:51,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.25 | bwd_microstep: 1464.84 | bwd_inner_microstep: 1464.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 05:03:53,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1372.38 | bwd_inner_microstep: 1372.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-11 05:03:54,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.70 | bwd_microstep: 1143.93 | bwd_inner_microstep: 1143.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992
[2024-06-11 05:03:55,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 796.91 | bwd_inner_microstep: 796.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3796
[2024-06-11 05:03:57,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.31 | bwd_microstep: 1346.29 | bwd_inner_microstep: 1346.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499
[2024-06-11 05:03:59,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.92 | bwd_microstep: 1476.27 | bwd_inner_microstep: 1476.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 05:04:01,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.54 | bwd_microstep: 1338.05 | bwd_inner_microstep: 1338.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 05:04:03,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.89 | bwd_microstep: 1280.84 | bwd_inner_microstep: 1280.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 05:04:05,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1382.95 | bwd_inner_microstep: 1382.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-11 05:04:53,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.01 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2565
[2024-06-11 05:04:55,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.21 | bwd_microstep: 995.98 | bwd_inner_microstep: 995.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-11 05:04:57,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.73 | bwd_microstep: 1488.89 | bwd_inner_microstep: 1488.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-11 05:04:59,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.84 | bwd_microstep: 1478.81 | bwd_inner_microstep: 1478.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2965
[2024-06-11 05:05:01,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.91 | bwd_microstep: 1239.64 | bwd_inner_microstep: 1239.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2137
[2024-06-11 05:05:02,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.14 | bwd_microstep: 921.60 | bwd_inner_microstep: 921.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-11 05:05:04,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1496.13 | bwd_inner_microstep: 1496.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507
[2024-06-11 05:05:06,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.88 | bwd_microstep: 1565.17 | bwd_inner_microstep: 1565.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898
[2024-06-11 05:05:07,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.28 | bwd_microstep: 679.82 | bwd_inner_microstep: 679.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181
[2024-06-11 05:05:08,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.53 | bwd_microstep: 855.79 | bwd_inner_microstep: 855.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828
[2024-06-11 05:05:10,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.25 | bwd_microstep: 1547.81 | bwd_inner_microstep: 1547.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3628
[2024-06-11 05:05:12,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.73 | bwd_microstep: 1271.62 | bwd_inner_microstep: 1271.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-11 05:05:14,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.09 | bwd_microstep: 1152.25 | bwd_inner_microstep: 1152.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-11 05:05:16,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.05 | bwd_microstep: 1402.64 | bwd_inner_microstep: 1402.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 05:05:18,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.54 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 653
[2024-06-11 05:05:18,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.94 | bwd_microstep: 276.69 | bwd_inner_microstep: 276.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 05:05:20,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.43 | bwd_microstep: 1374.89 | bwd_inner_microstep: 1374.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2290
[2024-06-11 05:05:21,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.28 | bwd_microstep: 905.47 | bwd_inner_microstep: 905.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-11 05:05:23,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1408.01 | bwd_inner_microstep: 1407.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5438
[2024-06-11 05:05:26,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 776.33 | bwd_microstep: 2089.72 | bwd_inner_microstep: 2089.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3805
[2024-06-11 05:05:28,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.52 | bwd_microstep: 1385.50 | bwd_inner_microstep: 1385.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814
[2024-06-11 05:05:30,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.50 | bwd_microstep: 1644.16 | bwd_inner_microstep: 1644.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805
[2024-06-11 05:05:36,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.12 | optimizer_step: 6.65
[2024-06-11 05:05:36,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.64 | bwd_microstep: 5537.23 | bwd_inner_microstep: 1741.09 | bwd_allreduce_microstep: 3796.08 | step_microstep: 38.17
[2024-06-11 05:05:36,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15367.69 | bwd: 44888.98 | bwd_inner: 41091.99 | bwd_allreduce: 3796.31 | step: 39.66
{'loss': 1.174, 'learning_rate': 3.653108974204145e-07, 'epoch': 0.94}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3406
[2024-06-11 05:05:38,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.02 | bwd_microstep: 1280.67 | bwd_inner_microstep: 1280.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851
[2024-06-11 05:05:40,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.55 | bwd_microstep: 1454.47 | bwd_inner_microstep: 1454.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3900
[2024-06-11 05:05:42,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.48 | bwd_microstep: 1581.47 | bwd_inner_microstep: 1581.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2303
[2024-06-11 05:05:43,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.11 | bwd_microstep: 908.99 | bwd_inner_microstep: 908.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 05:05:45,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.60 | bwd_microstep: 1377.99 | bwd_inner_microstep: 1377.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938
[2024-06-11 05:05:46,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.78 | bwd_microstep: 696.61 | bwd_inner_microstep: 696.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-11 05:05:48,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1413.45 | bwd_inner_microstep: 1413.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493
[2024-06-11 05:05:50,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.11 | bwd_microstep: 1384.69 | bwd_inner_microstep: 1384.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 05:05:52,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1386.66 | bwd_inner_microstep: 1386.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 05:05:54,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1387.45 | bwd_inner_microstep: 1387.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3423
[2024-06-11 05:05:56,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.33 | bwd_microstep: 1184.22 | bwd_inner_microstep: 1184.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 05:05:58,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1346.31 | bwd_inner_microstep: 1346.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-11 05:05:59,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.28 | bwd_microstep: 1315.85 | bwd_inner_microstep: 1315.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3633
[2024-06-11 05:06:01,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.16 | bwd_microstep: 1537.35 | bwd_inner_microstep: 1537.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-11 05:06:03,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1450.84 | bwd_inner_microstep: 1450.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3638
[2024-06-11 05:06:06,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.09 | bwd_microstep: 1741.04 | bwd_inner_microstep: 1741.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829
[2024-06-11 05:06:08,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.40 | bwd_microstep: 1643.69 | bwd_inner_microstep: 1643.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3692
[2024-06-11 05:06:10,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1484.03 | bwd_inner_microstep: 1484.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-11 05:06:12,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1430.32 | bwd_inner_microstep: 1430.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3829
[2024-06-11 05:06:14,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.97 | bwd_microstep: 1264.05 | bwd_inner_microstep: 1264.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3804
[2024-06-11 05:06:16,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.86 | bwd_microstep: 1747.53 | bwd_inner_microstep: 1747.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 05:06:18,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.37 | bwd_microstep: 1394.75 | bwd_inner_microstep: 1394.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474
[2024-06-11 05:06:20,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1478.04 | bwd_inner_microstep: 1478.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 05:06:22,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.41 | bwd_microstep: 1553.13 | bwd_inner_microstep: 1553.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 05:06:24,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1283.10 | bwd_inner_microstep: 1283.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-11 05:06:26,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1554.31 | bwd_inner_microstep: 1554.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066
[2024-06-11 05:06:28,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.62 | bwd_microstep: 918.36 | bwd_inner_microstep: 918.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3707
[2024-06-11 05:06:29,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.44 | bwd_microstep: 1294.57 | bwd_inner_microstep: 1294.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020
[2024-06-11 05:06:31,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.20 | bwd_microstep: 805.91 | bwd_inner_microstep: 805.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 05:06:33,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.59 | bwd_microstep: 1642.50 | bwd_inner_microstep: 1642.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812
[2024-06-11 05:06:35,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.72 | bwd_microstep: 1754.83 | bwd_inner_microstep: 1754.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413
[2024-06-11 05:07:21,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-11 05:07:21,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.51 | bwd_microstep: 45793.74 | bwd_inner_microstep: 1447.80 | bwd_allreduce_microstep: 44345.87 | step_microstep: 39.67
[2024-06-11 05:07:21,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16425.86 | bwd: 88490.92 | bwd_inner: 44144.13 | bwd_allreduce: 44346.11 | step: 41.09
 94%|█████████▍| 1620/1726 [28:21:54<2:21:05, 79.86s/it]
 94%|█████████▍| 1621/1726 [28:22:57<2:10:52, 74.78s/it]
 94%|█████████▍| 1622/1726 [28:23:56<2:01:20, 70.01s/it]
 94%|█████████▍| 1623/1726 [28:26:26<2:41:10, 93.89s/it]
 94%|█████████▍| 1624/1726 [28:28:13<2:46:29, 97.94s/it]
{'loss': 1.1827, 'learning_rate': 3.5820440544411807e-07, 'epoch': 0.94}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546
[2024-06-11 05:07:24,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.82 | bwd_microstep: 1572.08 | bwd_inner_microstep: 1572.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3944
[2024-06-11 05:07:26,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.44 | bwd_microstep: 1684.89 | bwd_inner_microstep: 1684.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3900
[2024-06-11 05:07:28,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.95 | bwd_microstep: 1574.61 | bwd_inner_microstep: 1574.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275
[2024-06-11 05:07:29,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.89 | bwd_microstep: 964.68 | bwd_inner_microstep: 964.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049
[2024-06-11 05:07:31,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.25 | bwd_microstep: 810.03 | bwd_inner_microstep: 810.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 05:07:32,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.75 | bwd_microstep: 1241.97 | bwd_inner_microstep: 1241.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 05:07:34,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1379.86 | bwd_inner_microstep: 1379.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3745
[2024-06-11 05:07:36,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.72 | bwd_microstep: 1631.22 | bwd_inner_microstep: 1631.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889
[2024-06-11 05:07:54,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.48 | bwd_microstep: 683.83 | bwd_inner_microstep: 683.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 05:07:56,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.17 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-11 05:07:59,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1287.04 | bwd_inner_microstep: 1287.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-11 05:08:02,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1500.61 | bwd_inner_microstep: 1500.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3491
[2024-06-11 05:08:04,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.99 | bwd_microstep: 1504.63 | bwd_inner_microstep: 1504.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2984
[2024-06-11 05:08:06,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.68 | bwd_microstep: 1036.55 | bwd_inner_microstep: 1036.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 05:08:07,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.73 | bwd_microstep: 1374.63 | bwd_inner_microstep: 1374.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 05:08:09,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1345.32 | bwd_inner_microstep: 1345.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 05:08:11,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1249.32 | bwd_inner_microstep: 1249.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-11 05:08:13,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.39 | bwd_microstep: 1604.90 | bwd_inner_microstep: 1604.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681
[2024-06-11 05:08:15,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.05 | bwd_microstep: 1521.28 | bwd_inner_microstep: 1521.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637
[2024-06-11 05:08:17,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1418.69 | bwd_inner_microstep: 1418.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 05:08:19,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.48 | bwd_microstep: 1254.04 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-11 05:08:21,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1493.58 | bwd_inner_microstep: 1493.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3701
[2024-06-11 05:08:23,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.35 | bwd_microstep: 1359.42 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964
[2024-06-11 05:08:24,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.33 | bwd_microstep: 703.03 | bwd_inner_microstep: 703.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-11 05:08:26,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.08 | bwd_microstep: 1452.42 | bwd_inner_microstep: 1452.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-11 05:08:28,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.02 | bwd_microstep: 1295.28 | bwd_inner_microstep: 1295.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3430
[2024-06-11 05:08:30,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.50 | bwd_microstep: 1308.35 | bwd_inner_microstep: 1308.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572
[2024-06-11 05:08:31,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.10 | bwd_microstep: 1299.90 | bwd_inner_microstep: 1299.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126
[2024-06-11 05:08:33,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.68 | bwd_microstep: 921.24 | bwd_inner_microstep: 921.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3577
[2024-06-11 05:08:35,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.73 | bwd_microstep: 1541.54 | bwd_inner_microstep: 1541.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590
[2024-06-11 05:08:37,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.12 | bwd_microstep: 1424.91 | bwd_inner_microstep: 1424.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3607
[2024-06-11 05:08:40,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.03 | optimizer_step: 6.59
[2024-06-11 05:08:40,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.98 | bwd_microstep: 2164.43 | bwd_inner_microstep: 1827.31 | bwd_allreduce_microstep: 337.07 | step_microstep: 37.31
[2024-06-11 05:08:40,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15884.97 | bwd: 42884.16 | bwd_inner: 42546.20 | bwd_allreduce: 337.29 | step: 38.84
{'loss': 1.2129, 'learning_rate': 3.511670917698018e-07, 'epoch': 0.94}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1868
[2024-06-11 05:08:41,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.52 | bwd_microstep: 763.72 | bwd_inner_microstep: 763.65 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3983
[2024-06-11 05:08:43,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1601.41 | bwd_inner_microstep: 1601.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867
[2024-06-11 05:08:45,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.70 | bwd_microstep: 1561.49 | bwd_inner_microstep: 1561.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3871
[2024-06-11 05:08:47,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.15 | bwd_microstep: 1467.05 | bwd_inner_microstep: 1467.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3494
[2024-06-11 05:08:49,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.74 | bwd_microstep: 1219.46 | bwd_inner_microstep: 1219.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2252
[2024-06-11 05:08:50,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.51 | bwd_microstep: 964.00 | bwd_inner_microstep: 963.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498
[2024-06-11 05:08:52,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.13 | bwd_microstep: 1284.48 | bwd_inner_microstep: 1284.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-11 05:08:54,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1251.98 | bwd_inner_microstep: 1251.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3521
[2024-06-11 05:08:55,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.20 | bwd_microstep: 1226.52 | bwd_inner_microstep: 1226.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427
[2024-06-11 05:08:57,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.03 | bwd_microstep: 1250.35 | bwd_inner_microstep: 1250.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494
[2024-06-11 05:08:59,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1315.13 | bwd_inner_microstep: 1315.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3527
[2024-06-11 05:09:01,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.71 | bwd_microstep: 1453.56 | bwd_inner_microstep: 1453.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-11 05:09:02,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.44 | bwd_microstep: 808.94 | bwd_inner_microstep: 808.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3696
[2024-06-11 05:09:04,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.21 | bwd_microstep: 1583.45 | bwd_inner_microstep: 1583.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2390
[2024-06-11 05:09:05,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.95 | bwd_microstep: 930.08 | bwd_inner_microstep: 930.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-11 05:09:07,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.41 | bwd_microstep: 1486.99 | bwd_inner_microstep: 1486.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-11 05:09:09,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 1421.08 | bwd_inner_microstep: 1421.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-11 05:09:10,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.77 | bwd_microstep: 675.95 | bwd_inner_microstep: 675.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-11 05:09:12,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.00 | bwd_microstep: 1528.12 | bwd_inner_microstep: 1528.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-11 05:09:14,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1450.56 | bwd_inner_microstep: 1450.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-11 05:09:17,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.94 | bwd_microstep: 1490.24 | bwd_inner_microstep: 1490.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-11 05:09:19,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.69 | bwd_microstep: 1493.87 | bwd_inner_microstep: 1493.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418
[2024-06-11 05:09:20,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.04 | bwd_microstep: 1184.07 | bwd_inner_microstep: 1184.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2024
[2024-06-11 05:09:21,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.58 | bwd_microstep: 714.58 | bwd_inner_microstep: 714.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 05:09:23,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1556.84 | bwd_inner_microstep: 1556.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3629
[2024-06-11 05:09:25,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.33 | bwd_microstep: 1473.91 | bwd_inner_microstep: 1473.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-11 05:09:27,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.52 | bwd_microstep: 1440.71 | bwd_inner_microstep: 1440.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2017
[2024-06-11 05:09:28,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.93 | bwd_microstep: 742.81 | bwd_inner_microstep: 742.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950
[2024-06-11 05:09:29,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.84 | bwd_microstep: 701.80 | bwd_inner_microstep: 701.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279
[2024-06-11 05:09:31,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.02 | bwd_microstep: 907.86 | bwd_inner_microstep: 907.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596
[2024-06-11 05:09:32,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.16 | bwd_microstep: 1310.55 | bwd_inner_microstep: 1310.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602
[2024-06-11 05:09:41,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.22 | optimizer_step: 6.63
[2024-06-11 05:09:41,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 8208.89 | bwd_inner_microstep: 1617.55 | bwd_allreduce_microstep: 6591.30 | step_microstep: 37.93
[2024-06-11 05:09:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14917.29 | bwd: 46470.45 | bwd_inner: 39878.20 | bwd_allreduce: 6591.55 | step: 39.39
{'loss': 1.1475, 'learning_rate': 3.441989811828417e-07, 'epoch': 0.94}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416
[2024-06-11 05:09:43,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1437.52 | bwd_inner_microstep: 1437.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 05:09:45,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.48 | bwd_microstep: 1394.34 | bwd_inner_microstep: 1394.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-11 05:09:47,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.45 | bwd_microstep: 1346.81 | bwd_inner_microstep: 1346.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 05:09:49,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.04 | bwd_microstep: 1279.52 | bwd_inner_microstep: 1279.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402
[2024-06-11 05:09:50,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.90 | bwd_microstep: 1149.08 | bwd_inner_microstep: 1149.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-11 05:09:52,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1386.66 | bwd_inner_microstep: 1386.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3603
[2024-06-11 05:09:54,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1341.53 | bwd_inner_microstep: 1341.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2171
[2024-06-11 05:09:55,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.34 | bwd_microstep: 853.09 | bwd_inner_microstep: 853.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455
[2024-06-11 05:09:57,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1322.60 | bwd_inner_microstep: 1322.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694
[2024-06-11 05:09:59,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.74 | bwd_microstep: 1523.15 | bwd_inner_microstep: 1523.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-11 05:10:01,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.22 | bwd_microstep: 1483.08 | bwd_inner_microstep: 1483.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577
[2024-06-11 05:10:03,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1490.94 | bwd_inner_microstep: 1490.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-11 05:10:05,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1477.40 | bwd_inner_microstep: 1477.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-11 05:10:07,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1474.74 | bwd_inner_microstep: 1474.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-11 05:10:09,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1425.00 | bwd_inner_microstep: 1424.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-11 05:10:12,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.70 | bwd_microstep: 1522.32 | bwd_inner_microstep: 1522.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3520
[2024-06-11 05:10:13,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.35 | bwd_microstep: 1366.88 | bwd_inner_microstep: 1366.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-11 05:10:15,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.62 | bwd_microstep: 1294.48 | bwd_inner_microstep: 1294.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3930
[2024-06-11 05:10:17,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1527.25 | bwd_inner_microstep: 1527.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-11 05:10:19,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1423.86 | bwd_inner_microstep: 1423.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944
[2024-06-11 05:10:20,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.97 | bwd_microstep: 698.29 | bwd_inner_microstep: 698.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-11 05:10:22,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1508.69 | bwd_inner_microstep: 1508.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723
[2024-06-11 05:10:24,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 1436.73 | bwd_inner_microstep: 1436.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2075
[2024-06-11 05:10:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.22 | bwd_microstep: 880.66 | bwd_inner_microstep: 880.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538
[2024-06-11 05:10:27,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1398.35 | bwd_inner_microstep: 1398.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 05:10:30,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.45 | bwd_microstep: 1562.85 | bwd_inner_microstep: 1562.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820
[2024-06-11 05:10:32,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.42 | bwd_microstep: 1388.45 | bwd_inner_microstep: 1388.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441
[2024-06-11 05:10:33,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1282.46 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-11 05:10:35,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1351.53 | bwd_inner_microstep: 1351.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778
[2024-06-11 05:10:37,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1447.56 | bwd_inner_microstep: 1447.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3820
[2024-06-11 05:10:40,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.45 | bwd_microstep: 1822.59 | bwd_inner_microstep: 1822.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584
[2024-06-11 05:10:42,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-11 05:10:42,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.47 | bwd_microstep: 1973.52 | bwd_inner_microstep: 1409.14 | bwd_allreduce_microstep: 564.34 | step_microstep: 37.75
[2024-06-11 05:10:42,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16339.01 | bwd: 44271.96 | bwd_inner: 43706.72 | bwd_allreduce: 564.56 | step: 39.22
{'loss': 1.1476, 'learning_rate': 3.3730009822488864e-07, 'epoch': 0.94}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-11 05:10:44,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1332.54 | bwd_inner_microstep: 1332.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3852
[2024-06-11 05:10:46,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.08 | bwd_microstep: 1458.77 | bwd_inner_microstep: 1458.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-11 05:10:48,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.22 | bwd_microstep: 1447.81 | bwd_inner_microstep: 1447.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750
[2024-06-11 05:10:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.56 | bwd_microstep: 1436.54 | bwd_inner_microstep: 1436.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-11 05:10:52,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.72 | bwd_microstep: 1247.18 | bwd_inner_microstep: 1247.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753
[2024-06-11 05:10:54,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.90 | bwd_microstep: 1538.16 | bwd_inner_microstep: 1538.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2037
[2024-06-11 05:10:55,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.30 | bwd_microstep: 716.48 | bwd_inner_microstep: 716.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758
[2024-06-11 05:10:57,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.07 | bwd_microstep: 1543.08 | bwd_inner_microstep: 1543.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-11 05:10:59,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.23 | bwd_microstep: 1251.18 | bwd_inner_microstep: 1251.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700
[2024-06-11 05:11:01,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.75 | bwd_microstep: 1423.81 | bwd_inner_microstep: 1423.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3483
[2024-06-11 05:11:03,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1405.02 | bwd_inner_microstep: 1405.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-11 05:11:05,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1514.93 | bwd_inner_microstep: 1514.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616
[2024-06-11 05:11:07,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1424.71 | bwd_inner_microstep: 1424.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974
[2024-06-11 05:11:08,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.76 | bwd_microstep: 800.36 | bwd_inner_microstep: 800.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3431
[2024-06-11 05:11:10,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.10 | bwd_microstep: 1215.22 | bwd_inner_microstep: 1215.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487
[2024-06-11 05:11:12,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.81 | bwd_microstep: 1582.10 | bwd_inner_microstep: 1582.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3139
[2024-06-11 05:11:13,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.38 | bwd_microstep: 1253.61 | bwd_inner_microstep: 1253.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-11 05:11:15,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.13 | bwd_microstep: 1483.28 | bwd_inner_microstep: 1483.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-11 05:11:17,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 793.86 | bwd_inner_microstep: 793.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503
[2024-06-11 05:11:19,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1415.07 | bwd_inner_microstep: 1415.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-11 05:11:21,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 1608.82 | bwd_inner_microstep: 1608.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 05:11:23,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433
[2024-06-11 05:11:24,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.89 | bwd_microstep: 1152.87 | bwd_inner_microstep: 1152.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811
[2024-06-11 05:11:27,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.50 | bwd_microstep: 1654.31 | bwd_inner_microstep: 1654.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 05:11:28,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.63 | bwd_microstep: 1397.50 | bwd_inner_microstep: 1397.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3703
[2024-06-11 05:11:30,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1330.07 | bwd_inner_microstep: 1330.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3811
[2024-06-11 05:11:33,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.40 | bwd_microstep: 1693.30 | bwd_inner_microstep: 1693.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3576
[2024-06-11 05:11:35,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.89 | bwd_microstep: 1632.10 | bwd_inner_microstep: 1632.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578
[2024-06-11 05:11:37,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1401.66 | bwd_inner_microstep: 1401.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727
[2024-06-11 05:11:39,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1432.84 | bwd_inner_microstep: 1432.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-11 05:11:41,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1498.58 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797
[2024-06-11 05:11:43,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-11 05:11:43,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.84 | bwd_microstep: 1489.01 | bwd_inner_microstep: 1481.30 | bwd_allreduce_microstep: 7.67 | step_microstep: 37.78
[2024-06-11 05:11:43,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16428.04 | bwd: 43955.26 | bwd_inner: 43946.69 | bwd_allreduce: 7.90 | step: 39.33
{'loss': 1.1369, 'learning_rate': 3.3047046719377305e-07, 'epoch': 0.94}
 94%|█████████▍| 1625/1726 [28:29:58<2:48:33, 100.13s/it]
 94%|█████████▍| 1626/1726 [28:31:16<2:35:51, 93.52s/it]
 94%|█████████▍| 1627/1726 [28:32:18<2:18:33, 83.97s/it]
 94%|█████████▍| 1628/1726 [28:33:19<2:05:52, 77.06s/it]
 94%|█████████▍| 1629/1726 [28:34:20<1:56:39, 72.16s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-11 05:11:45,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.05 | bwd_microstep: 1343.59 | bwd_inner_microstep: 1343.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-11 05:11:47,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.46 | bwd_microstep: 1344.31 | bwd_inner_microstep: 1344.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 05:11:49,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-11 05:11:50,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.69 | bwd_microstep: 1303.32 | bwd_inner_microstep: 1303.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-11 05:11:51,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.45 | bwd_microstep: 791.63 | bwd_inner_microstep: 791.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 05:11:53,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.88 | bwd_microstep: 1286.94 | bwd_inner_microstep: 1286.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-11 05:11:55,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1253.63 | bwd_inner_microstep: 1253.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919
[2024-06-11 05:11:56,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.68 | bwd_microstep: 717.80 | bwd_inner_microstep: 717.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-11 05:11:58,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.63 | bwd_microstep: 1484.30 | bwd_inner_microstep: 1484.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3755
[2024-06-11 05:12:00,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.32 | bwd_microstep: 1462.91 | bwd_inner_microstep: 1462.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3450
[2024-06-11 05:12:02,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1299.80 | bwd_inner_microstep: 1299.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989
[2024-06-11 05:12:03,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.20 | bwd_microstep: 896.11 | bwd_inner_microstep: 896.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3623
[2024-06-11 05:12:05,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.54 | bwd_microstep: 1708.98 | bwd_inner_microstep: 1708.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 05:12:07,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1344.75 | bwd_inner_microstep: 1344.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-11 05:12:09,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1342.43 | bwd_inner_microstep: 1342.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529
[2024-06-11 05:12:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.42 | bwd_microstep: 1414.45 | bwd_inner_microstep: 1414.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567
[2024-06-11 05:12:13,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1298.29 | bwd_inner_microstep: 1298.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2174
[2024-06-11 05:12:14,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.26 | bwd_microstep: 887.89 | bwd_inner_microstep: 887.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691
[2024-06-11 05:12:16,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.82 | bwd_microstep: 1617.68 | bwd_inner_microstep: 1617.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610
[2024-06-11 05:12:19,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.10 | bwd_microstep: 1601.53 | bwd_inner_microstep: 1601.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3626
[2024-06-11 05:12:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.88 | bwd_microstep: 1709.53 | bwd_inner_microstep: 1709.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3747
[2024-06-11 05:12:23,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.46 | bwd_microstep: 1373.18 | bwd_inner_microstep: 1373.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3737
[2024-06-11 05:12:25,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1433.44 | bwd_inner_microstep: 1433.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 05:12:27,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1298.05 | bwd_inner_microstep: 1298.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-11 05:12:29,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1499.89 | bwd_inner_microstep: 1499.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 05:12:31,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.96 | bwd_microstep: 1556.24 | bwd_inner_microstep: 1556.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-11 05:12:33,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.68 | bwd_microstep: 1454.16 | bwd_inner_microstep: 1454.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-11 05:12:35,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1512.95 | bwd_inner_microstep: 1512.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809
[2024-06-11 05:12:37,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.85 | bwd_microstep: 1655.30 | bwd_inner_microstep: 1655.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-11 05:12:39,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1560.09 | bwd_inner_microstep: 1560.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3602
[2024-06-11 05:12:41,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1534.21 | bwd_inner_microstep: 1534.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593
[2024-06-11 05:12:44,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.07 | optimizer_step: 6.62
[2024-06-11 05:12:44,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.66 | bwd_microstep: 1535.15 | bwd_inner_microstep: 1527.21 | bwd_allreduce_microstep: 7.89 | step_microstep: 37.51
[2024-06-11 05:12:44,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16378.12 | bwd: 43903.89 | bwd_inner: 43895.10 | bwd_allreduce: 8.12 | step: 38.97
{'loss': 1.1472, 'learning_rate': 3.2371011214342053e-07, 'epoch': 0.94}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-11 05:12:46,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.61 | bwd_microstep: 1405.57 | bwd_inner_microstep: 1405.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 05:12:47,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1382.55 | bwd_inner_microstep: 1382.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845
[2024-06-11 05:12:50,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.56 | bwd_microstep: 1560.34 | bwd_inner_microstep: 1560.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 05:12:51,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.07 | bwd_microstep: 1342.22 | bwd_inner_microstep: 1342.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 05:12:53,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.34 | bwd_microstep: 1484.94 | bwd_inner_microstep: 1484.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3732
[2024-06-11 05:12:55,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1335.07 | bwd_inner_microstep: 1335.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685
[2024-06-11 05:12:58,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.30 | bwd_microstep: 1626.05 | bwd_inner_microstep: 1626.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694
[2024-06-11 05:13:00,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.59 | bwd_microstep: 1529.64 | bwd_inner_microstep: 1529.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-11 05:13:02,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1345.92 | bwd_inner_microstep: 1345.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957
[2024-06-11 05:13:03,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.38 | bwd_microstep: 892.19 | bwd_inner_microstep: 892.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3387
[2024-06-11 05:13:05,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.55 | bwd_microstep: 1336.89 | bwd_inner_microstep: 1336.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-11 05:13:06,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1344.75 | bwd_inner_microstep: 1344.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512
[2024-06-11 05:13:08,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3602
[2024-06-11 05:13:11,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.79 | bwd_microstep: 1567.02 | bwd_inner_microstep: 1566.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1919
[2024-06-11 05:13:12,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.46 | bwd_microstep: 782.54 | bwd_inner_microstep: 782.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-11 05:13:13,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 794.24 | bwd_inner_microstep: 794.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076
[2024-06-11 05:13:14,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.68 | bwd_microstep: 822.29 | bwd_inner_microstep: 822.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 05:13:16,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.02 | bwd_microstep: 1288.81 | bwd_inner_microstep: 1288.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3588
[2024-06-11 05:13:18,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1340.93 | bwd_inner_microstep: 1340.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276
[2024-06-11 05:13:19,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 974.23 | bwd_inner_microstep: 974.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067
[2024-06-11 05:13:20,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.17 | bwd_microstep: 816.45 | bwd_inner_microstep: 816.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-11 05:13:21,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.33 | bwd_microstep: 975.84 | bwd_inner_microstep: 975.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949
[2024-06-11 05:13:22,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.37 | bwd_microstep: 700.35 | bwd_inner_microstep: 700.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-11 05:13:24,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1457.94 | bwd_inner_microstep: 1457.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2272
[2024-06-11 05:13:26,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.43 | bwd_microstep: 878.31 | bwd_inner_microstep: 878.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565
[2024-06-11 05:13:27,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.10 | bwd_microstep: 1304.21 | bwd_inner_microstep: 1304.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3825
[2024-06-11 05:13:29,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 1391.94 | bwd_inner_microstep: 1391.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3585
[2024-06-11 05:13:31,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.44 | bwd_microstep: 1425.39 | bwd_inner_microstep: 1425.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3574
[2024-06-11 05:13:33,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.23 | bwd_microstep: 1444.68 | bwd_inner_microstep: 1444.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2223
[2024-06-11 05:13:35,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.69 | bwd_microstep: 1059.41 | bwd_inner_microstep: 1059.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3563
[2024-06-11 05:13:37,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.25 | bwd_microstep: 1471.90 | bwd_inner_microstep: 1471.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455
[2024-06-11 05:13:44,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.62
[2024-06-11 05:13:44,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 6887.62 | bwd_inner_microstep: 1532.14 | bwd_allreduce_microstep: 5355.43 | step_microstep: 37.98
[2024-06-11 05:13:44,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14924.87 | bwd: 45351.63 | bwd_inner: 39995.26 | bwd_allreduce: 5355.67 | step: 39.47
{'loss': 1.1699, 'learning_rate': 3.17019056883765e-07, 'epoch': 0.94}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3472
[2024-06-11 05:13:46,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1563.99 | bwd_inner_microstep: 1563.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2399
[2024-06-11 05:13:48,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.14 | bwd_microstep: 998.50 | bwd_inner_microstep: 998.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3501
[2024-06-11 05:13:50,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1445.03 | bwd_inner_microstep: 1445.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 05:13:52,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.65 | bwd_microstep: 1380.30 | bwd_inner_microstep: 1380.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 05:13:54,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1390.38 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 05:13:55,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1284.15 | bwd_inner_microstep: 1284.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 05:13:57,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1286.81 | bwd_inner_microstep: 1286.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506
[2024-06-11 05:13:59,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.83 | bwd_microstep: 1287.69 | bwd_inner_microstep: 1287.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 05:14:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1384.26 | bwd_inner_microstep: 1384.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510
[2024-06-11 05:14:02,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.69 | bwd_microstep: 1222.70 | bwd_inner_microstep: 1222.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3684
[2024-06-11 05:14:04,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.39 | bwd_microstep: 1417.02 | bwd_inner_microstep: 1416.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495
[2024-06-11 05:14:07,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.53 | bwd_microstep: 1579.65 | bwd_inner_microstep: 1579.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3834
[2024-06-11 05:14:09,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.41 | bwd_microstep: 1614.90 | bwd_inner_microstep: 1614.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3673
[2024-06-11 05:14:11,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.43 | bwd_microstep: 1719.65 | bwd_inner_microstep: 1719.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3542
[2024-06-11 05:14:14,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.10 | bwd_microstep: 1692.99 | bwd_inner_microstep: 1692.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492
[2024-06-11 05:14:15,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.78 | bwd_microstep: 1205.62 | bwd_inner_microstep: 1205.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1986
[2024-06-11 05:14:16,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.94 | bwd_microstep: 834.21 | bwd_inner_microstep: 834.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-11 05:14:18,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.00 | bwd_microstep: 1397.79 | bwd_inner_microstep: 1397.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-11 05:14:20,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.85 | bwd_microstep: 1514.31 | bwd_inner_microstep: 1514.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472
[2024-06-11 05:14:22,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.92 | bwd_microstep: 1247.93 | bwd_inner_microstep: 1247.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-11 05:14:24,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.55 | bwd_microstep: 1435.42 | bwd_inner_microstep: 1435.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619
[2024-06-11 05:14:26,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.29 | bwd_microstep: 1614.31 | bwd_inner_microstep: 1614.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3697
[2024-06-11 05:14:28,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.40 | bwd_microstep: 1383.80 | bwd_inner_microstep: 1383.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292
[2024-06-11 05:14:30,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.75 | bwd_microstep: 981.88 | bwd_inner_microstep: 981.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-11 05:14:32,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1417.20 | bwd_inner_microstep: 1417.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715
[2024-06-11 05:14:34,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.49 | bwd_microstep: 1436.65 | bwd_inner_microstep: 1436.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2018
[2024-06-11 05:14:35,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.14 | bwd_microstep: 863.74 | bwd_inner_microstep: 863.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3698
[2024-06-11 05:14:37,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.99 | bwd_microstep: 1587.02 | bwd_inner_microstep: 1587.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3017
[2024-06-11 05:14:39,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1326.35 | bwd_inner_microstep: 1325.21 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.09
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3786
[2024-06-11 05:14:41,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.43 | bwd_microstep: 1696.76 | bwd_inner_microstep: 1696.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3826
[2024-06-11 05:14:43,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.29 | bwd_microstep: 1690.99 | bwd_inner_microstep: 1690.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492
[2024-06-11 05:14:46,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.17 | optimizer_step: 6.65
[2024-06-11 05:14:46,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.84 | bwd_microstep: 1620.28 | bwd_inner_microstep: 1611.96 | bwd_allreduce_microstep: 8.27 | step_microstep: 37.87
[2024-06-11 05:14:46,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16592.30 | bwd: 44522.34 | bwd_inner: 44512.95 | bwd_allreduce: 8.63 | step: 39.62
{'loss': 1.1481, 'learning_rate': 3.103973249806691e-07, 'epoch': 0.95}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 05:14:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1377.98 | bwd_inner_microstep: 1377.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911
[2024-06-11 05:14:50,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.41 | bwd_microstep: 1690.07 | bwd_inner_microstep: 1690.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480
[2024-06-11 05:14:52,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937
[2024-06-11 05:14:53,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.29 | bwd_microstep: 792.38 | bwd_inner_microstep: 792.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2095
[2024-06-11 05:14:54,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.88 | bwd_microstep: 774.58 | bwd_inner_microstep: 774.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-11 05:14:55,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.95 | bwd_microstep: 791.08 | bwd_inner_microstep: 791.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-11 05:14:56,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.99 | bwd_microstep: 711.32 | bwd_inner_microstep: 711.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786
[2024-06-11 05:14:58,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.66 | bwd_microstep: 1549.10 | bwd_inner_microstep: 1549.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 05:15:00,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1392.47 | bwd_inner_microstep: 1392.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406
[2024-06-11 05:15:02,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.24 | bwd_microstep: 1276.47 | bwd_inner_microstep: 1276.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3717
[2024-06-11 05:15:04,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.41 | bwd_microstep: 1702.36 | bwd_inner_microstep: 1702.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2307
[2024-06-11 05:15:06,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.40 | bwd_microstep: 981.65 | bwd_inner_microstep: 981.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 05:15:08,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1341.84 | bwd_inner_microstep: 1341.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3457
[2024-06-11 05:15:09,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.78 | bwd_microstep: 1424.99 | bwd_inner_microstep: 1424.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3827
[2024-06-11 05:15:12,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.46 | bwd_microstep: 1618.92 | bwd_inner_microstep: 1618.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688
[2024-06-11 05:15:14,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.99 | bwd_microstep: 1526.67 | bwd_inner_microstep: 1526.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-11 05:15:16,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1394.45 | bwd_inner_microstep: 1394.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826
[2024-06-11 05:15:18,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.10 | bwd_microstep: 1659.09 | bwd_inner_microstep: 1659.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3507
[2024-06-11 05:15:20,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.21 | bwd_microstep: 1223.41 | bwd_inner_microstep: 1223.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 05:15:22,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.33 | bwd_microstep: 1558.99 | bwd_inner_microstep: 1558.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438
[2024-06-11 05:15:23,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.64 | bwd_microstep: 1158.27 | bwd_inner_microstep: 1158.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822
[2024-06-11 05:15:26,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.21 | bwd_microstep: 1487.11 | bwd_inner_microstep: 1487.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 05:15:27,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.46 | bwd_microstep: 1352.53 | bwd_inner_microstep: 1352.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638
[2024-06-11 05:15:29,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1504.15 | bwd_inner_microstep: 1504.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3607
[2024-06-11 05:15:32,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.97 | bwd_microstep: 1703.51 | bwd_inner_microstep: 1703.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477
[2024-06-11 05:15:34,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.73 | bwd_microstep: 1314.27 | bwd_inner_microstep: 1314.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2299
[2024-06-11 05:15:35,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.54 | bwd_microstep: 1022.25 | bwd_inner_microstep: 1022.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3587
[2024-06-11 05:15:37,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.33 | bwd_microstep: 1435.39 | bwd_inner_microstep: 1435.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591
[2024-06-11 05:15:39,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.21 | bwd_microstep: 1599.32 | bwd_inner_microstep: 1599.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602
[2024-06-11 05:15:41,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.40 | bwd_microstep: 1439.11 | bwd_inner_microstep: 1439.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-11 05:15:43,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.23 | bwd_microstep: 1450.12 | bwd_inner_microstep: 1450.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-11 05:15:47,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.25 | optimizer_step: 6.62
[2024-06-11 05:15:47,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 3679.41 | bwd_inner_microstep: 1578.29 | bwd_allreduce_microstep: 2101.04 | step_microstep: 39.40
[2024-06-11 05:15:47,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16098.37 | bwd: 45375.57 | bwd_inner: 43273.45 | bwd_allreduce: 2101.36 | step: 41.11
{'loss': 1.1526, 'learning_rate': 3.038449397558396e-07, 'epoch': 0.95}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-11 05:15:50,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.66 | bwd_microstep: 1577.03 | bwd_inner_microstep: 1577.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4025
[2024-06-11 05:15:52,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1540.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581
[2024-06-11 05:15:54,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.82 | bwd_microstep: 1504.77 | bwd_inner_microstep: 1504.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-11 05:15:56,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.58 | bwd_microstep: 1650.43 | bwd_inner_microstep: 1650.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 05:15:58,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1381.27 | bwd_inner_microstep: 1381.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3424
[2024-06-11 05:16:00,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.13 | bwd_microstep: 1151.63 | bwd_inner_microstep: 1151.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2190
[2024-06-11 05:16:01,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.92 | bwd_microstep: 951.62 | bwd_inner_microstep: 951.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495
[2024-06-11 05:16:03,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.96 | bwd_microstep: 1284.87 | bwd_inner_microstep: 1284.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1883
[2024-06-11 05:16:04,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.71 | bwd_microstep: 682.04 | bwd_inner_microstep: 682.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3680
[2024-06-11 05:16:06,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.10 | bwd_microstep: 1568.81 | bwd_inner_microstep: 1568.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440
[2024-06-11 05:16:08,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.35 | bwd_microstep: 1345.27 | bwd_inner_microstep: 1345.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-11 05:16:10,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.16 | bwd_microstep: 1615.88 | bwd_inner_microstep: 1615.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3482
[2024-06-11 05:16:12,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.01 | bwd_microstep: 1429.49 | bwd_inner_microstep: 1429.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714
[2024-06-11 05:16:14,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.77 | bwd_microstep: 1624.35 | bwd_inner_microstep: 1624.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670
[2024-06-11 05:16:16,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1511.90 | bwd_inner_microstep: 1511.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941
[2024-06-11 05:16:17,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.34 | bwd_microstep: 730.13 | bwd_inner_microstep: 730.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-11 05:16:19,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1514.21 | bwd_inner_microstep: 1514.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619
[2024-06-11 05:16:21,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1510.33 | bwd_inner_microstep: 1510.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580
[2024-06-11 05:16:23,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1399.26 | bwd_inner_microstep: 1399.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2284
[2024-06-11 05:16:25,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.34 | bwd_microstep: 908.89 | bwd_inner_microstep: 908.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505
[2024-06-11 05:16:27,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-11 05:16:29,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.82 | bwd_microstep: 1525.85 | bwd_inner_microstep: 1525.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283
[2024-06-11 05:16:30,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.78 | bwd_microstep: 879.91 | bwd_inner_microstep: 879.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614
[2024-06-11 05:16:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1610.92 | bwd_inner_microstep: 1610.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2995
[2024-06-11 05:16:34,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.84 | bwd_microstep: 1109.41 | bwd_inner_microstep: 1109.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-11 05:16:36,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.36 | bwd_microstep: 1552.16 | bwd_inner_microstep: 1552.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2883
[2024-06-11 05:16:37,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.53 | bwd_microstep: 1209.73 | bwd_inner_microstep: 1209.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3558
[2024-06-11 05:16:39,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.64 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522
[2024-06-11 05:16:41,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1294.47 | bwd_inner_microstep: 1294.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805
[2024-06-11 05:16:43,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1449.94 | bwd_inner_microstep: 1449.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426
[2024-06-11 05:16:45,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1447.43 | bwd_inner_microstep: 1447.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3377
[2024-06-11 05:16:50,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-11 05:16:50,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.56 | bwd_microstep: 4295.96 | bwd_inner_microstep: 1294.73 | bwd_allreduce_microstep: 3001.17 | step_microstep: 39.32
[2024-06-11 05:16:50,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16048.15 | bwd: 46059.35 | bwd_inner: 43054.73 | bwd_allreduce: 3001.41 | step: 40.89
{'loss': 1.2055, 'learning_rate': 2.9736192428674093e-07, 'epoch': 0.95}
 94%|█████████▍| 1630/1726 [28:35:20<1:49:55, 68.70s/it]
 94%|█████████▍| 1631/1726 [28:36:21<1:44:55, 66.27s/it]
 95%|█████████▍| 1632/1726 [28:37:22<1:41:33, 64.83s/it]
 95%|█████████▍| 1633/1726 [28:38:24<1:39:05, 63.93s/it]
 95%|█████████▍| 1634/1726 [28:39:27<1:37:20, 63.49s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 05:16:52,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1375.20 | bwd_inner_microstep: 1375.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-11 05:16:53,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.89 | bwd_microstep: 1180.62 | bwd_inner_microstep: 1180.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1865
[2024-06-11 05:16:55,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.61 | bwd_microstep: 741.31 | bwd_inner_microstep: 741.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 05:16:56,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.90 | bwd_microstep: 1386.03 | bwd_inner_microstep: 1386.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3883
[2024-06-11 05:16:59,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.34 | bwd_microstep: 1517.20 | bwd_inner_microstep: 1517.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-11 05:17:00,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1251.61 | bwd_inner_microstep: 1251.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769
[2024-06-11 05:17:02,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.33 | bwd_microstep: 1437.65 | bwd_inner_microstep: 1437.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-11 05:17:04,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1345.95 | bwd_inner_microstep: 1345.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2693
[2024-06-11 05:17:06,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.89 | bwd_microstep: 1020.38 | bwd_inner_microstep: 1020.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3722
[2024-06-11 05:17:08,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.85 | bwd_microstep: 1490.29 | bwd_inner_microstep: 1490.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3670
[2024-06-11 05:17:09,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.01 | bwd_microstep: 1326.33 | bwd_inner_microstep: 1326.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 749
[2024-06-11 05:17:10,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.00 | bwd_microstep: 302.07 | bwd_inner_microstep: 302.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1886
[2024-06-11 05:17:11,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.65 | bwd_microstep: 682.72 | bwd_inner_microstep: 682.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3674
[2024-06-11 05:17:13,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.35 | bwd_microstep: 1570.56 | bwd_inner_microstep: 1570.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3475
[2024-06-11 05:17:15,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.23 | bwd_microstep: 1344.12 | bwd_inner_microstep: 1344.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-11 05:17:17,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.89 | bwd_microstep: 1489.09 | bwd_inner_microstep: 1489.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1901
[2024-06-11 05:17:18,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.27 | bwd_microstep: 779.35 | bwd_inner_microstep: 779.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-11 05:17:20,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.44 | bwd_microstep: 1354.82 | bwd_inner_microstep: 1354.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3840
[2024-06-11 05:17:22,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 675.64 | bwd_microstep: 1859.38 | bwd_inner_microstep: 1859.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3602
[2024-06-11 05:17:24,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.73 | bwd_microstep: 1538.20 | bwd_inner_microstep: 1538.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 911
[2024-06-11 05:17:25,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.97 | bwd_microstep: 371.94 | bwd_inner_microstep: 371.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619
[2024-06-11 05:17:27,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.51 | bwd_microstep: 1417.73 | bwd_inner_microstep: 1417.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 957
[2024-06-11 05:17:27,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.95 | bwd_microstep: 381.37 | bwd_inner_microstep: 381.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-11 05:17:30,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.06 | bwd_microstep: 1637.75 | bwd_inner_microstep: 1637.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-11 05:17:32,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 1499.85 | bwd_inner_microstep: 1499.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3557
[2024-06-11 05:17:34,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.48 | bwd_microstep: 1333.20 | bwd_inner_microstep: 1333.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 698
[2024-06-11 05:17:34,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.65 | bwd_microstep: 286.27 | bwd_inner_microstep: 286.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584
[2024-06-11 05:17:36,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1308.77 | bwd_inner_microstep: 1308.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3805
[2024-06-11 05:17:38,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.44 | bwd_microstep: 1583.63 | bwd_inner_microstep: 1583.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461
[2024-06-11 05:17:40,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.47 | bwd_microstep: 1473.76 | bwd_inner_microstep: 1473.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3802
[2024-06-11 05:17:42,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.80 | bwd_microstep: 1716.50 | bwd_inner_microstep: 1716.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-11 05:17:51,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.00 | optimizer_gradients: 4.28 | optimizer_step: 6.58
[2024-06-11 05:17:51,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.82 | bwd_microstep: 7600.38 | bwd_inner_microstep: 1867.00 | bwd_allreduce_microstep: 5733.30 | step_microstep: 41.59
[2024-06-11 05:17:51,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14842.44 | bwd: 45604.05 | bwd_inner: 39869.82 | bwd_allreduce: 5733.55 | step: 43.11
{'loss': 1.1403, 'learning_rate': 2.909483014065195e-07, 'epoch': 0.95}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2011
[2024-06-11 05:17:52,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.74 | bwd_microstep: 893.48 | bwd_inner_microstep: 893.37 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2514
[2024-06-11 05:17:53,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.87 | bwd_microstep: 940.46 | bwd_inner_microstep: 940.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537
[2024-06-11 05:17:55,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.93 | bwd_microstep: 1289.51 | bwd_inner_microstep: 1289.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-11 05:17:57,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 1474.49 | bwd_inner_microstep: 1474.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533
[2024-06-11 05:17:59,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1229.75 | bwd_inner_microstep: 1229.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1387
[2024-06-11 05:18:00,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.20 | bwd_microstep: 525.00 | bwd_inner_microstep: 524.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949
[2024-06-11 05:18:01,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.25 | bwd_microstep: 792.78 | bwd_inner_microstep: 792.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957
[2024-06-11 05:18:02,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.36 | bwd_microstep: 795.73 | bwd_inner_microstep: 795.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2476
[2024-06-11 05:18:03,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.85 | bwd_microstep: 954.41 | bwd_inner_microstep: 954.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4048
[2024-06-11 05:18:06,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.78 | bwd_microstep: 1814.00 | bwd_inner_microstep: 1813.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520
[2024-06-11 05:18:08,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.84 | bwd_microstep: 1584.40 | bwd_inner_microstep: 1584.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 05:18:10,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.82 | bwd_microstep: 1480.43 | bwd_inner_microstep: 1480.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 05:18:12,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1377.23 | bwd_inner_microstep: 1377.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1912
[2024-06-11 05:18:13,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.34 | bwd_microstep: 778.60 | bwd_inner_microstep: 778.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-11 05:18:15,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.47 | bwd_microstep: 1611.79 | bwd_inner_microstep: 1611.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-11 05:18:17,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.03 | bwd_microstep: 1487.73 | bwd_inner_microstep: 1487.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651
[2024-06-11 05:18:19,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.39 | bwd_microstep: 1614.75 | bwd_inner_microstep: 1614.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-11 05:18:21,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.32 | bwd_microstep: 1353.38 | bwd_inner_microstep: 1353.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2172
[2024-06-11 05:18:23,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.85 | bwd_microstep: 1012.93 | bwd_inner_microstep: 1012.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 05:18:25,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.43 | bwd_microstep: 1492.80 | bwd_inner_microstep: 1492.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 05:18:26,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1287.39 | bwd_inner_microstep: 1287.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-11 05:18:28,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1419.04 | bwd_inner_microstep: 1419.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 941
[2024-06-11 05:18:29,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.79 | bwd_microstep: 378.84 | bwd_inner_microstep: 378.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828
[2024-06-11 05:18:31,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1452.78 | bwd_inner_microstep: 1452.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-11 05:18:33,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1514.84 | bwd_inner_microstep: 1514.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765
[2024-06-11 05:18:35,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.16 | bwd_microstep: 1644.57 | bwd_inner_microstep: 1644.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3564
[2024-06-11 05:18:37,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.47 | bwd_microstep: 1267.50 | bwd_inner_microstep: 1267.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 05:18:39,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.32 | bwd_microstep: 1288.18 | bwd_inner_microstep: 1288.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806
[2024-06-11 05:18:41,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.85 | bwd_microstep: 1459.31 | bwd_inner_microstep: 1459.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437
[2024-06-11 05:18:42,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.42 | bwd_microstep: 1161.31 | bwd_inner_microstep: 1161.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936
[2024-06-11 05:18:43,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.56 | bwd_microstep: 789.22 | bwd_inner_microstep: 789.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-11 05:18:52,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.40 | optimizer_step: 6.60
[2024-06-11 05:18:52,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.27 | bwd_microstep: 7877.13 | bwd_inner_microstep: 1876.47 | bwd_allreduce_microstep: 6000.57 | step_microstep: 40.67
[2024-06-11 05:18:52,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14840.38 | bwd: 46043.81 | bwd_inner: 40042.21 | bwd_allreduce: 6000.86 | step: 42.32
{'loss': 1.129, 'learning_rate': 2.8460409370392405e-07, 'epoch': 0.95}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3553
[2024-06-11 05:18:54,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1355.47 | bwd_inner_microstep: 1355.36 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459
[2024-06-11 05:18:56,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1381.71 | bwd_inner_microstep: 1381.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 05:18:58,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1277.43 | bwd_inner_microstep: 1277.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-11 05:18:59,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1251.01 | bwd_inner_microstep: 1250.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797
[2024-06-11 05:19:01,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1452.62 | bwd_inner_microstep: 1452.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-11 05:19:03,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1387.38 | bwd_inner_microstep: 1387.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-11 05:19:05,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1347.48 | bwd_inner_microstep: 1347.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2245
[2024-06-11 05:19:06,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.65 | bwd_microstep: 967.31 | bwd_inner_microstep: 967.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-11 05:19:08,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 846
[2024-06-11 05:19:09,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.19 | bwd_microstep: 350.32 | bwd_inner_microstep: 350.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950
[2024-06-11 05:19:10,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.16 | bwd_microstep: 792.58 | bwd_inner_microstep: 792.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509
[2024-06-11 05:19:12,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.87 | bwd_microstep: 1438.89 | bwd_inner_microstep: 1438.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506
[2024-06-11 05:19:14,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1448.87 | bwd_inner_microstep: 1448.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2016
[2024-06-11 05:19:15,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.92 | bwd_microstep: 902.20 | bwd_inner_microstep: 902.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461
[2024-06-11 05:19:17,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1408.49 | bwd_inner_microstep: 1408.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535
[2024-06-11 05:19:19,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.45 | bwd_microstep: 1456.44 | bwd_inner_microstep: 1456.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666
[2024-06-11 05:19:21,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.24 | bwd_microstep: 1424.68 | bwd_inner_microstep: 1424.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3492
[2024-06-11 05:19:23,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1434.29 | bwd_inner_microstep: 1434.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3523
[2024-06-11 05:19:25,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1555.96 | bwd_inner_microstep: 1555.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630
[2024-06-11 05:19:27,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1606.02 | bwd_inner_microstep: 1606.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3897
[2024-06-11 05:19:29,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.56 | bwd_microstep: 1588.34 | bwd_inner_microstep: 1588.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-11 05:19:31,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1384.11 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573
[2024-06-11 05:19:33,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1301.77 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-11 05:19:35,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.19 | bwd_microstep: 1281.84 | bwd_inner_microstep: 1281.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810
[2024-06-11 05:19:37,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.41 | bwd_microstep: 1752.77 | bwd_inner_microstep: 1752.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3829
[2024-06-11 05:19:39,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1405.71 | bwd_inner_microstep: 1405.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-11 05:19:41,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.28 | bwd_microstep: 1556.97 | bwd_inner_microstep: 1556.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513
[2024-06-11 05:19:43,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.25 | bwd_microstep: 1290.93 | bwd_inner_microstep: 1290.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 05:19:45,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.00 | bwd_microstep: 1658.65 | bwd_inner_microstep: 1658.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3586
[2024-06-11 05:19:48,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.31 | bwd_microstep: 1533.71 | bwd_inner_microstep: 1533.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-11 05:19:49,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1399.33 | bwd_inner_microstep: 1399.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 05:19:54,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.12 | optimizer_step: 6.59
[2024-06-11 05:19:54,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 3681.15 | bwd_inner_microstep: 1422.77 | bwd_allreduce_microstep: 2258.32 | step_microstep: 38.58
[2024-06-11 05:19:54,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16057.76 | bwd: 45326.08 | bwd_inner: 43066.77 | bwd_allreduce: 2258.61 | step: 40.63
{'loss': 1.1928, 'learning_rate': 2.7832932352322094e-07, 'epoch': 0.95}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391
[2024-06-11 05:19:55,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.97 | bwd_microstep: 1238.28 | bwd_inner_microstep: 1238.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3878
[2024-06-11 05:19:57,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1485.66 | bwd_inner_microstep: 1485.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901
[2024-06-11 05:20:00,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.17 | bwd_microstep: 1585.24 | bwd_inner_microstep: 1585.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-11 05:20:02,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.97 | bwd_microstep: 1481.63 | bwd_inner_microstep: 1481.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3786
[2024-06-11 05:20:04,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.67 | bwd_microstep: 1448.08 | bwd_inner_microstep: 1448.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742
[2024-06-11 05:20:06,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.28 | bwd_microstep: 1633.63 | bwd_inner_microstep: 1633.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 05:20:08,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1386.50 | bwd_inner_microstep: 1386.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 05:20:10,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.35 | bwd_microstep: 1387.35 | bwd_inner_microstep: 1387.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-11 05:20:12,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1290.90 | bwd_inner_microstep: 1290.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 05:20:13,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.95 | bwd_microstep: 1287.78 | bwd_inner_microstep: 1287.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3680
[2024-06-11 05:20:15,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.84 | bwd_microstep: 1485.66 | bwd_inner_microstep: 1485.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 05:20:17,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.15 | bwd_microstep: 1257.36 | bwd_inner_microstep: 1257.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2114
[2024-06-11 05:20:18,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.75 | bwd_microstep: 922.93 | bwd_inner_microstep: 922.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2209
[2024-06-11 05:20:20,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.83 | bwd_microstep: 1055.66 | bwd_inner_microstep: 1055.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 05:20:22,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.62 | bwd_microstep: 1363.71 | bwd_inner_microstep: 1363.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3543
[2024-06-11 05:20:23,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.05 | bwd_microstep: 1201.77 | bwd_inner_microstep: 1201.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3836
[2024-06-11 05:20:26,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.43 | bwd_microstep: 1692.42 | bwd_inner_microstep: 1692.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-11 05:20:28,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.47 | bwd_microstep: 1523.39 | bwd_inner_microstep: 1523.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827
[2024-06-11 05:20:30,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1391.28 | bwd_inner_microstep: 1391.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-11 05:20:32,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1395.63 | bwd_inner_microstep: 1395.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3569
[2024-06-11 05:20:33,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.62 | bwd_microstep: 1236.08 | bwd_inner_microstep: 1236.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297
[2024-06-11 05:20:35,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.94 | bwd_microstep: 983.74 | bwd_inner_microstep: 983.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831
[2024-06-11 05:20:37,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.59 | bwd_microstep: 1663.43 | bwd_inner_microstep: 1663.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-11 05:20:39,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.91 | bwd_microstep: 1198.26 | bwd_inner_microstep: 1198.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-11 05:20:41,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1409.46 | bwd_inner_microstep: 1409.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-11 05:20:42,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.72 | bwd_microstep: 1189.59 | bwd_inner_microstep: 1189.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3775
[2024-06-11 05:20:45,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.08 | bwd_microstep: 1574.38 | bwd_inner_microstep: 1574.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812
[2024-06-11 05:20:47,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.01 | bwd_microstep: 1514.84 | bwd_inner_microstep: 1514.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3444
[2024-06-11 05:20:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.59 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.24 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574
[2024-06-11 05:20:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1496.46 | bwd_inner_microstep: 1496.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 05:20:53,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1551.43 | bwd_inner_microstep: 1551.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935
[2024-06-11 05:20:56,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.12 | optimizer_step: 6.59
[2024-06-11 05:20:56,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.08 | bwd_microstep: 2570.21 | bwd_inner_microstep: 824.55 | bwd_allreduce_microstep: 1745.60 | step_microstep: 39.07
[2024-06-11 05:20:56,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16279.92 | bwd: 45285.18 | bwd_inner: 43538.39 | bwd_allreduce: 1745.97 | step: 41.31
{'loss': 1.2094, 'learning_rate': 2.7212401296411675e-07, 'epoch': 0.95}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1887
[2024-06-11 05:20:57,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.03 | bwd_microstep: 866.87 | bwd_inner_microstep: 866.74 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3908
[2024-06-11 05:20:59,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1485.89 | bwd_inner_microstep: 1485.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-11 05:21:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.21 | bwd_microstep: 1379.27 | bwd_inner_microstep: 1379.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 05:21:03,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 1382.40 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-11 05:21:04,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.80 | bwd_microstep: 679.05 | bwd_inner_microstep: 679.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-11 05:21:05,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.36 | bwd_microstep: 1276.28 | bwd_inner_microstep: 1276.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913
[2024-06-11 05:21:06,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.44 | bwd_microstep: 686.90 | bwd_inner_microstep: 686.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3409
[2024-06-11 05:21:08,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.47 | bwd_microstep: 1181.83 | bwd_inner_microstep: 1181.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1377
[2024-06-11 05:21:09,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 214.58 | bwd_microstep: 556.40 | bwd_inner_microstep: 556.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424
[2024-06-11 05:21:11,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1345.46 | bwd_inner_microstep: 1345.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489
[2024-06-11 05:21:13,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2110
[2024-06-11 05:21:14,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.70 | bwd_microstep: 1017.77 | bwd_inner_microstep: 1017.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465
[2024-06-11 05:21:16,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.43 | bwd_microstep: 1340.72 | bwd_inner_microstep: 1340.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3496
[2024-06-11 05:21:18,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.68 | bwd_microstep: 1553.88 | bwd_inner_microstep: 1553.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3665
[2024-06-11 05:21:20,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.06 | bwd_microstep: 1465.93 | bwd_inner_microstep: 1465.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-11 05:21:22,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.11 | bwd_microstep: 1284.17 | bwd_inner_microstep: 1284.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587
[2024-06-11 05:21:24,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1311.99 | bwd_inner_microstep: 1311.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828
[2024-06-11 05:21:26,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.32 | bwd_microstep: 1390.74 | bwd_inner_microstep: 1390.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472
[2024-06-11 05:21:27,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.11 | bwd_microstep: 1184.09 | bwd_inner_microstep: 1184.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2242
[2024-06-11 05:21:28,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.98 | bwd_microstep: 902.56 | bwd_inner_microstep: 902.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2924
[2024-06-11 05:21:30,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.97 | bwd_microstep: 1191.04 | bwd_inner_microstep: 1191.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-11 05:21:32,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.37 | bwd_microstep: 1399.07 | bwd_inner_microstep: 1399.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1384
[2024-06-11 05:21:33,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 204.14 | bwd_microstep: 527.25 | bwd_inner_microstep: 527.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812
[2024-06-11 05:21:35,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.19 | bwd_microstep: 1575.21 | bwd_inner_microstep: 1575.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3389
[2024-06-11 05:21:37,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.56 | bwd_microstep: 1366.01 | bwd_inner_microstep: 1365.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817
[2024-06-11 05:21:39,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.94 | bwd_microstep: 1748.93 | bwd_inner_microstep: 1748.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-11 05:21:41,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.62 | bwd_microstep: 1314.85 | bwd_inner_microstep: 1314.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3770
[2024-06-11 05:21:43,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.55 | bwd_microstep: 1375.91 | bwd_inner_microstep: 1375.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762
[2024-06-11 05:21:45,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.77 | bwd_microstep: 1470.56 | bwd_inner_microstep: 1470.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3561
[2024-06-11 05:21:47,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.36 | bwd_microstep: 1381.85 | bwd_inner_microstep: 1381.41 | bwd_allreduce_microstep: 0.21 | step_microstep: 0.36
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583
[2024-06-11 05:21:49,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.52 | bwd_microstep: 1305.31 | bwd_inner_microstep: 1305.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-11 05:21:57,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.28 | optimizer_step: 6.62
[2024-06-11 05:21:57,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.08 | bwd_microstep: 7424.96 | bwd_inner_microstep: 1804.23 | bwd_allreduce_microstep: 5620.66 | step_microstep: 42.50
[2024-06-11 05:21:57,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15030.20 | bwd: 45781.43 | bwd_inner: 40159.40 | bwd_allreduce: 5621.18 | step: 44.77
▍| 1634/1726 [28:39:27<1:37:20, 63.49s/it]
 95%|█████████▍| 1635/1726 [28:40:27<1:35:03, 62.68s/it]
 95%|█████████▍| 1636/1726 [28:41:29<1:33:21, 62.24s/it]
 95%|█████████▍| 1637/1726 [28:42:30<1:32:06, 62.09s/it]
 95%|█████████▍| 1638/1726 [28:43:32<1:31:00, 62.05s/it]
 95%|█████████▍| 1639/1726 [28:44:34<1:29:35, 61.79s/it]
{'loss': 1.1314, 'learning_rate': 2.6598818388168246e-07, 'epoch': 0.95}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-11 05:21:59,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.82 | bwd_microstep: 1481.57 | bwd_inner_microstep: 1481.44 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005
[2024-06-11 05:22:00,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.90 | bwd_microstep: 798.22 | bwd_inner_microstep: 798.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-11 05:22:02,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1343.61 | bwd_inner_microstep: 1343.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3792
[2024-06-11 05:22:04,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1547.97 | bwd_inner_microstep: 1547.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850
[2024-06-11 05:22:06,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1459.19 | bwd_inner_microstep: 1459.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 05:22:08,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.54 | bwd_microstep: 1278.54 | bwd_inner_microstep: 1278.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 05:22:09,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1247.21 | bwd_inner_microstep: 1247.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 05:22:11,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.19 | bwd_microstep: 1245.53 | bwd_inner_microstep: 1245.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767
[2024-06-11 05:22:13,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.92 | bwd_microstep: 1437.93 | bwd_inner_microstep: 1437.62 | bwd_allreduce_microstep: 0.21 | step_microstep: 0.33
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3430
[2024-06-11 05:22:15,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.92 | bwd_microstep: 1187.40 | bwd_inner_microstep: 1187.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 05:22:17,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1287.53 | bwd_inner_microstep: 1287.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 05:22:19,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3484
[2024-06-11 05:22:21,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.30 | bwd_microstep: 1508.84 | bwd_inner_microstep: 1508.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3449
[2024-06-11 05:22:22,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.49 | bwd_microstep: 1317.57 | bwd_inner_microstep: 1317.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656
[2024-06-11 05:22:24,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 1423.21 | bwd_inner_microstep: 1423.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490
[2024-06-11 05:22:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.15 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436
[2024-06-11 05:22:28,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.32 | bwd_microstep: 1167.22 | bwd_inner_microstep: 1167.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3679
[2024-06-11 05:22:30,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.24 | bwd_microstep: 1356.48 | bwd_inner_microstep: 1356.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 05:22:32,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.15 | bwd_microstep: 1377.35 | bwd_inner_microstep: 1377.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436
[2024-06-11 05:22:33,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.06 | bwd_microstep: 1156.16 | bwd_inner_microstep: 1156.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452
[2024-06-11 05:22:35,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.86 | bwd_microstep: 1160.59 | bwd_inner_microstep: 1160.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 05:22:37,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.35 | bwd_microstep: 1661.95 | bwd_inner_microstep: 1661.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820
[2024-06-11 05:22:39,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.14 | bwd_microstep: 1658.99 | bwd_inner_microstep: 1658.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-11 05:22:41,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.31 | bwd_microstep: 978.77 | bwd_inner_microstep: 978.50 | bwd_allreduce_microstep: 0.15 | step_microstep: 0.18
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3670
[2024-06-11 05:22:43,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.52 | bwd_microstep: 1660.41 | bwd_inner_microstep: 1660.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593
[2024-06-11 05:22:45,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1345.14 | bwd_inner_microstep: 1345.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292
[2024-06-11 05:22:46,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.32 | bwd_microstep: 882.20 | bwd_inner_microstep: 882.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491
[2024-06-11 05:22:48,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1477.67 | bwd_inner_microstep: 1477.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 05:22:50,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.00 | bwd_microstep: 1384.78 | bwd_inner_microstep: 1384.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807
[2024-06-11 05:22:52,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.02 | bwd_microstep: 1449.36 | bwd_inner_microstep: 1449.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434
[2024-06-11 05:22:54,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.75 | bwd_microstep: 1313.18 | bwd_inner_microstep: 1313.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729
[2024-06-11 05:22:58,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.36 | optimizer_step: 6.63
[2024-06-11 05:22:58,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.05 | bwd_microstep: 3224.37 | bwd_inner_microstep: 1747.51 | bwd_allreduce_microstep: 1476.79 | step_microstep: 39.59
[2024-06-11 05:22:58,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16059.50 | bwd: 44490.75 | bwd_inner: 43012.35 | bwd_allreduce: 1477.44 | step: 42.03
{'loss': 1.1699, 'learning_rate': 2.5992185788627834e-07, 'epoch': 0.95}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423
[2024-06-11 05:23:00,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.72 | bwd_microstep: 1442.40 | bwd_inner_microstep: 1442.29 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.15
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945
[2024-06-11 05:23:01,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.24 | bwd_microstep: 697.95 | bwd_inner_microstep: 697.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3860
[2024-06-11 05:23:03,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.12 | bwd_microstep: 1458.63 | bwd_inner_microstep: 1458.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775
[2024-06-11 05:23:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1438.30 | bwd_inner_microstep: 1438.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782
[2024-06-11 05:23:07,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.63 | bwd_microstep: 1350.85 | bwd_inner_microstep: 1350.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 05:23:08,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1299.68 | bwd_inner_microstep: 1299.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3518
[2024-06-11 05:23:10,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.11 | bwd_microstep: 1224.51 | bwd_inner_microstep: 1224.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417
[2024-06-11 05:23:12,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1345.42 | bwd_inner_microstep: 1345.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862
[2024-06-11 05:23:14,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.41 | bwd_microstep: 1463.53 | bwd_inner_microstep: 1463.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3512
[2024-06-11 05:23:16,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.22 | bwd_microstep: 1192.13 | bwd_inner_microstep: 1192.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2901
[2024-06-11 05:23:17,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.47 | bwd_microstep: 1092.02 | bwd_inner_microstep: 1091.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487
[2024-06-11 05:23:19,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.30 | bwd_microstep: 1322.33 | bwd_inner_microstep: 1322.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3983
[2024-06-11 05:23:22,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 695.22 | bwd_microstep: 1904.13 | bwd_inner_microstep: 1904.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911
[2024-06-11 05:23:23,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.28 | bwd_microstep: 686.79 | bwd_inner_microstep: 686.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3644
[2024-06-11 05:23:25,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.81 | bwd_microstep: 1815.78 | bwd_inner_microstep: 1815.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 05:23:27,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.86 | bwd_microstep: 1245.86 | bwd_inner_microstep: 1245.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423
[2024-06-11 05:23:28,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.26 | bwd_microstep: 1151.97 | bwd_inner_microstep: 1151.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 05:23:30,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.74 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819
[2024-06-11 05:23:32,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.44 | bwd_microstep: 1546.28 | bwd_inner_microstep: 1546.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3717
[2024-06-11 05:23:34,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1366.71 | bwd_inner_microstep: 1366.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3539
[2024-06-11 05:23:36,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.88 | bwd_microstep: 1467.16 | bwd_inner_microstep: 1466.95 | bwd_allreduce_microstep: 0.13 | step_microstep: 0.25
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610
[2024-06-11 05:23:38,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1407.83 | bwd_inner_microstep: 1407.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 05:23:40,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.49 | bwd_microstep: 1258.04 | bwd_inner_microstep: 1258.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605
[2024-06-11 05:23:42,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.96 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999
[2024-06-11 05:23:43,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.43 | bwd_microstep: 741.36 | bwd_inner_microstep: 741.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286
[2024-06-11 05:23:44,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.20 | bwd_microstep: 975.10 | bwd_inner_microstep: 975.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 05:23:46,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1277.71 | bwd_inner_microstep: 1277.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 05:23:48,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1482.03 | bwd_inner_microstep: 1482.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1986
[2024-06-11 05:23:49,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.01 | bwd_microstep: 706.67 | bwd_inner_microstep: 706.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-11 05:23:51,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1498.55 | bwd_inner_microstep: 1498.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589
[2024-06-11 05:23:53,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1506.37 | bwd_inner_microstep: 1506.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-11 05:23:58,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.57
[2024-06-11 05:23:58,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.83 | bwd_microstep: 3877.76 | bwd_inner_microstep: 2098.77 | bwd_allreduce_microstep: 1778.91 | step_microstep: 40.17
[2024-06-11 05:23:58,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15662.07 | bwd: 43942.24 | bwd_inner: 42162.11 | bwd_allreduce: 1779.37 | step: 42.05
{'loss': 1.161, 'learning_rate': 2.539250563434736e-07, 'epoch': 0.95}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-11 05:24:00,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.39 | bwd_microstep: 1335.55 | bwd_inner_microstep: 1335.45 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 05:24:01,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1250.46 | bwd_inner_microstep: 1250.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3888
[2024-06-11 05:24:03,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.52 | bwd_microstep: 1584.53 | bwd_inner_microstep: 1584.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744
[2024-06-11 05:24:06,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.07 | bwd_microstep: 1631.90 | bwd_inner_microstep: 1631.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939
[2024-06-11 05:24:07,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.84 | bwd_microstep: 822.97 | bwd_inner_microstep: 822.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492
[2024-06-11 05:24:09,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.10 | bwd_microstep: 1221.52 | bwd_inner_microstep: 1221.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-11 05:24:10,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.57 | bwd_microstep: 1151.23 | bwd_inner_microstep: 1151.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894
[2024-06-11 05:24:11,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.54 | bwd_microstep: 684.09 | bwd_inner_microstep: 684.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421
[2024-06-11 05:24:13,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.60 | bwd_microstep: 1153.85 | bwd_inner_microstep: 1153.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696
[2024-06-11 05:24:15,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1529.27 | bwd_inner_microstep: 1529.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3499
[2024-06-11 05:24:17,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.72 | bwd_microstep: 1623.46 | bwd_inner_microstep: 1623.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659
[2024-06-11 05:24:19,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.86 | bwd_microstep: 1715.38 | bwd_inner_microstep: 1715.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068
[2024-06-11 05:24:21,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.50 | bwd_microstep: 914.81 | bwd_inner_microstep: 914.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569
[2024-06-11 05:24:22,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.03 | bwd_microstep: 1209.01 | bwd_inner_microstep: 1208.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490
[2024-06-11 05:24:24,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.60 | bwd_microstep: 1221.44 | bwd_inner_microstep: 1221.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633
[2024-06-11 05:24:26,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.61 | bwd_microstep: 1408.94 | bwd_inner_microstep: 1408.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 05:24:28,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.42 | bwd_microstep: 1660.06 | bwd_inner_microstep: 1660.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511
[2024-06-11 05:24:30,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1350.26 | bwd_inner_microstep: 1350.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 05:24:32,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.54 | bwd_microstep: 1656.08 | bwd_inner_microstep: 1656.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.31
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3822
[2024-06-11 05:24:34,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.17 | bwd_microstep: 1263.09 | bwd_inner_microstep: 1263.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 05:24:36,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.12 | bwd_microstep: 1352.61 | bwd_inner_microstep: 1352.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778
[2024-06-11 05:24:38,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1456.28 | bwd_inner_microstep: 1456.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3601
[2024-06-11 05:24:40,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.10 | bwd_microstep: 1554.93 | bwd_inner_microstep: 1554.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 05:24:42,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1383.74 | bwd_inner_microstep: 1383.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1409
[2024-06-11 05:24:43,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.98 | bwd_microstep: 532.00 | bwd_inner_microstep: 531.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 05:24:45,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1404.32 | bwd_inner_microstep: 1404.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-11 05:24:47,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.28 | bwd_microstep: 1406.72 | bwd_inner_microstep: 1406.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3430
[2024-06-11 05:24:49,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.88 | bwd_microstep: 1473.47 | bwd_inner_microstep: 1473.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650
[2024-06-11 05:24:51,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.66 | bwd_microstep: 1585.15 | bwd_inner_microstep: 1585.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592
[2024-06-11 05:24:53,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.77 | bwd_microstep: 1705.11 | bwd_inner_microstep: 1705.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 05:24:55,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1340.38 | bwd_inner_microstep: 1340.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3389
[2024-06-11 05:25:02,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.26 | optimizer_step: 6.64
[2024-06-11 05:25:02,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.78 | bwd_microstep: 6105.02 | bwd_inner_microstep: 1444.64 | bwd_allreduce_microstep: 4660.30 | step_microstep: 40.52
[2024-06-11 05:25:02,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16026.92 | bwd: 47687.72 | bwd_inner: 43026.36 | bwd_allreduce: 4660.60 | step: 42.70
{'loss': 1.1667, 'learning_rate': 2.479978003739669e-07, 'epoch': 0.95}
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3429
[2024-06-11 05:25:04,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.36 | bwd_microstep: 1384.83 | bwd_inner_microstep: 1384.63 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2459
[2024-06-11 05:25:05,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.59 | bwd_microstep: 919.74 | bwd_inner_microstep: 919.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 05:25:07,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1271.99 | bwd_inner_microstep: 1271.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928
[2024-06-11 05:25:08,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.27 | bwd_microstep: 786.28 | bwd_inner_microstep: 786.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755
[2024-06-11 05:25:10,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.41 | bwd_microstep: 1635.33 | bwd_inner_microstep: 1635.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-11 05:25:12,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.00 | bwd_microstep: 1286.19 | bwd_inner_microstep: 1286.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 05:25:14,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.20 | step_microstep: 0.39
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3748
[2024-06-11 05:25:16,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.94 | bwd_microstep: 1540.75 | bwd_inner_microstep: 1540.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 05:25:18,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.18 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500
[2024-06-11 05:25:20,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.26 | bwd_microstep: 1317.76 | bwd_inner_microstep: 1317.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1945
[2024-06-11 05:25:21,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.19 | bwd_microstep: 885.92 | bwd_inner_microstep: 885.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617
[2024-06-11 05:25:23,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1316.09 | bwd_inner_microstep: 1316.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065
[2024-06-11 05:25:24,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.31 | bwd_microstep: 824.28 | bwd_inner_microstep: 824.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1982
[2024-06-11 05:25:25,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.13 | bwd_microstep: 735.15 | bwd_inner_microstep: 735.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963
[2024-06-11 05:25:26,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.13 | bwd_microstep: 826.08 | bwd_inner_microstep: 826.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 05:25:28,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1376.71 | bwd_inner_microstep: 1376.44 | bwd_allreduce_microstep: 0.14 | step_microstep: 0.22
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070
[2024-06-11 05:25:29,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.09 | bwd_microstep: 918.54 | bwd_inner_microstep: 918.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523
[2024-06-11 05:25:31,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.58 | bwd_microstep: 1294.45 | bwd_inner_microstep: 1294.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501
[2024-06-11 05:25:33,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.44 | bwd_microstep: 1322.18 | bwd_inner_microstep: 1322.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3545
[2024-06-11 05:25:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.78 | bwd_microstep: 1200.90 | bwd_inner_microstep: 1200.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971
[2024-06-11 05:25:36,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.95 | bwd_microstep: 704.64 | bwd_inner_microstep: 704.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 05:25:38,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.23 | bwd_microstep: 1654.85 | bwd_inner_microstep: 1654.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 05:25:40,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.33 | bwd_microstep: 1549.85 | bwd_inner_microstep: 1549.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-11 05:25:42,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1511.70 | bwd_inner_microstep: 1511.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818
[2024-06-11 05:25:44,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.63 | bwd_microstep: 1390.02 | bwd_inner_microstep: 1389.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523
[2024-06-11 05:25:46,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1489.06 | bwd_inner_microstep: 1489.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730
[2024-06-11 05:25:48,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.03 | bwd_microstep: 1636.44 | bwd_inner_microstep: 1636.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3810
[2024-06-11 05:25:50,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.06 | bwd_microstep: 1500.47 | bwd_inner_microstep: 1500.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192
[2024-06-11 05:25:52,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.14 | bwd_microstep: 957.02 | bwd_inner_microstep: 956.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 05:25:54,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1393.85 | bwd_inner_microstep: 1393.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3030
[2024-06-11 05:25:55,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.18 | bwd_microstep: 1329.03 | bwd_inner_microstep: 1329.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3860
[2024-06-11 05:26:05,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.00 | optimizer_gradients: 4.27 | optimizer_step: 6.64
[2024-06-11 05:26:05,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.57 | bwd_microstep: 8555.43 | bwd_inner_microstep: 1705.44 | bwd_allreduce_microstep: 6849.91 | step_microstep: 40.38
[2024-06-11 05:26:05,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15114.87 | bwd: 47294.88 | bwd_inner: 40443.34 | bwd_allreduce: 6850.56 | step: 42.92
{'loss': 1.234, 'learning_rate': 2.4214011085352815e-07, 'epoch': 0.95}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466
[2024-06-11 05:26:07,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1464.56 | bwd_inner_microstep: 1464.41 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3907
[2024-06-11 05:26:09,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.06 | bwd_microstep: 1482.51 | bwd_inner_microstep: 1482.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427
[2024-06-11 05:26:11,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.89 | bwd_microstep: 1443.63 | bwd_inner_microstep: 1443.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1233
[2024-06-11 05:26:11,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 185.29 | bwd_microstep: 483.06 | bwd_inner_microstep: 483.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778
[2024-06-11 05:26:13,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.97 | bwd_microstep: 1544.86 | bwd_inner_microstep: 1544.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-11 05:26:16,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.52 | bwd_microstep: 1636.10 | bwd_inner_microstep: 1635.96 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.21
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419
[2024-06-11 05:26:17,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.78 | bwd_microstep: 1180.79 | bwd_inner_microstep: 1180.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3440
[2024-06-11 05:26:19,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.14 | bwd_microstep: 1214.54 | bwd_inner_microstep: 1214.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446
[2024-06-11 05:26:21,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.65 | bwd_microstep: 1219.31 | bwd_inner_microstep: 1219.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181
[2024-06-11 05:26:22,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.07 | bwd_microstep: 856.72 | bwd_inner_microstep: 856.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448
[2024-06-11 05:26:24,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.01 | bwd_microstep: 1315.28 | bwd_inner_microstep: 1315.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492
[2024-06-11 05:26:26,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.81 | bwd_microstep: 1482.02 | bwd_inner_microstep: 1481.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-11 05:26:28,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.87 | bwd_microstep: 1344.76 | bwd_inner_microstep: 1344.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3663
[2024-06-11 05:26:30,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.99 | bwd_microstep: 1443.29 | bwd_inner_microstep: 1443.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656
[2024-06-11 05:26:32,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.11 | bwd_microstep: 1515.07 | bwd_inner_microstep: 1515.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686
[2024-06-11 05:26:34,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.16 | bwd_microstep: 1512.44 | bwd_inner_microstep: 1512.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509
[2024-06-11 05:26:36,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.68 | bwd_microstep: 1443.05 | bwd_inner_microstep: 1443.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3472
[2024-06-11 05:26:38,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.04 | bwd_microstep: 1505.80 | bwd_inner_microstep: 1505.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672
[2024-06-11 05:26:40,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.68 | bwd_microstep: 1617.05 | bwd_inner_microstep: 1616.92 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.28
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465
[2024-06-11 05:26:42,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.76 | bwd_microstep: 1436.79 | bwd_inner_microstep: 1436.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414
[2024-06-11 05:26:44,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.00 | bwd_microstep: 1280.65 | bwd_inner_microstep: 1280.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431
[2024-06-11 05:26:46,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.05 | bwd_microstep: 1347.21 | bwd_inner_microstep: 1347.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3843
[2024-06-11 05:26:48,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.37 | bwd_microstep: 1698.12 | bwd_inner_microstep: 1698.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3587
[2024-06-11 05:27:24,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.99 | bwd_microstep: 1426.91 | bwd_inner_microstep: 1426.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652
[2024-06-11 05:27:26,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1513.42 | bwd_inner_microstep: 1513.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3811
[2024-06-11 05:27:28,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1589.96 | bwd_inner_microstep: 1589.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-11 05:27:31,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1551.47 | bwd_inner_microstep: 1551.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-11 05:27:33,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.22 | bwd_microstep: 1549.35 | bwd_inner_microstep: 1549.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-11 05:27:35,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.81 | bwd_microstep: 1282.59 | bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-11 05:27:37,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.24 | bwd_microstep: 1548.29 | bwd_inner_microstep: 1548.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759
[2024-06-11 05:27:39,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.31 | bwd_microstep: 1447.18 | bwd_inner_microstep: 1447.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573
[2024-06-11 05:27:48,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-11 05:27:48,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 8621.18 | bwd_inner_microstep: 1575.27 | bwd_allreduce_microstep: 7045.85 | step_microstep: 38.93
[2024-06-11 05:27:48,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16739.67 | bwd: 51998.00 | bwd_inner: 44950.88 | bwd_allreduce: 7046.32 | step: 41.27
 95%|█████████▍| 1639/1726 [28:44:34<1:29:35, 61.79s/it]
 95%|█████████▌| 1640/1726 [28:45:34<1:28:11, 61.53s/it]
 95%|█████████▌| 1641/1726 [28:46:34<1:26:30, 61.06s/it]
 95%|█████████▌| 1642/1726 [28:47:39<1:26:45, 61.97s/it]
 95%|█████████▌| 1643/1726 [28:48:41<1:26:04, 62.22s/it]
 95%|█████████▌| 1644/1726 [28:50:25<1:41:51, 74.53s/it]
{'loss': 1.1751, 'learning_rate': 2.3635200841290784e-07, 'epoch': 0.95}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2060
[2024-06-11 05:28:27,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.28 | bwd_microstep: 866.00 | bwd_inner_microstep: 865.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.20
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1344
[2024-06-11 05:28:28,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 197.24 | bwd_microstep: 515.18 | bwd_inner_microstep: 515.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3872
[2024-06-11 05:28:30,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.01 | bwd_microstep: 1482.94 | bwd_inner_microstep: 1482.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-11 05:28:32,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.86 | bwd_microstep: 1276.48 | bwd_inner_microstep: 1276.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755
[2024-06-11 05:28:34,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.94 | bwd_microstep: 1630.06 | bwd_inner_microstep: 1630.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 05:28:36,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.72 | bwd_microstep: 1243.60 | bwd_inner_microstep: 1243.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-11 05:28:37,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1346.28 | bwd_inner_microstep: 1346.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1880
[2024-06-11 05:28:38,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.80 | bwd_microstep: 708.92 | bwd_inner_microstep: 708.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3766
[2024-06-11 05:28:40,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.37 | bwd_microstep: 1305.89 | bwd_inner_microstep: 1305.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 05:28:42,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1245.52 | bwd_inner_microstep: 1245.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003
[2024-06-11 05:28:43,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.51 | bwd_microstep: 800.33 | bwd_inner_microstep: 800.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534
[2024-06-11 05:28:45,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.08 | bwd_microstep: 1294.94 | bwd_inner_microstep: 1294.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1984
[2024-06-11 05:28:46,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.79 | bwd_microstep: 734.88 | bwd_inner_microstep: 734.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452
[2024-06-11 05:28:48,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1316.77 | bwd_inner_microstep: 1316.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655
[2024-06-11 05:28:50,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.34 | bwd_microstep: 1611.68 | bwd_inner_microstep: 1611.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679
[2024-06-11 05:28:52,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.94 | bwd_microstep: 1618.17 | bwd_inner_microstep: 1618.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643
[2024-06-11 05:28:54,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1406.96 | bwd_inner_microstep: 1406.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 05:28:56,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1341.71 | bwd_inner_microstep: 1341.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3831
[2024-06-11 05:28:58,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.49 | bwd_microstep: 1360.85 | bwd_inner_microstep: 1360.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495
[2024-06-11 05:29:00,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.95 | bwd_microstep: 1319.55 | bwd_inner_microstep: 1319.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3605
[2024-06-11 05:29:02,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1569.71 | bwd_inner_microstep: 1569.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3687
[2024-06-11 05:29:04,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1491.10 | bwd_inner_microstep: 1490.65 | bwd_allreduce_microstep: 0.24 | step_microstep: 0.29
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3378
[2024-06-11 05:29:06,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.57 | bwd_microstep: 1241.04 | bwd_inner_microstep: 1241.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2024
[2024-06-11 05:29:07,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.60 | bwd_microstep: 841.04 | bwd_inner_microstep: 841.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445
[2024-06-11 05:29:08,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.75 | bwd_microstep: 1158.33 | bwd_inner_microstep: 1158.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-11 05:29:10,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.11 | bwd_microstep: 1349.19 | bwd_inner_microstep: 1349.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2231
[2024-06-11 05:29:12,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.52 | bwd_microstep: 867.94 | bwd_inner_microstep: 867.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006
[2024-06-11 05:29:13,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.61 | bwd_microstep: 833.83 | bwd_inner_microstep: 833.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-11 05:29:15,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.56 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-11 05:29:17,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.00 | bwd_microstep: 1651.13 | bwd_inner_microstep: 1651.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731
[2024-06-11 05:29:19,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.26 | bwd_microstep: 1535.12 | bwd_inner_microstep: 1535.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-11 05:29:25,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.28 | optimizer_step: 6.60
[2024-06-11 05:29:25,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.01 | bwd_microstep: 5413.65 | bwd_inner_microstep: 1575.70 | bwd_allreduce_microstep: 3837.87 | step_microstep: 39.94
[2024-06-11 05:29:25,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14936.01 | bwd: 43776.59 | bwd_inner: 39937.28 | bwd_allreduce: 3838.39 | step: 42.09
{'loss': 1.183, 'learning_rate': 2.3063351343777241e-07, 'epoch': 0.95}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409
[2024-06-11 05:29:27,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.79 | bwd_microstep: 1336.42 | bwd_inner_microstep: 1336.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925
[2024-06-11 05:29:29,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.19 | bwd_microstep: 1589.56 | bwd_inner_microstep: 1589.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3410
[2024-06-11 05:29:31,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.04 | bwd_microstep: 1326.03 | bwd_inner_microstep: 1326.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 05:29:33,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.23 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2613
[2024-06-11 05:29:34,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.20 | bwd_microstep: 1063.35 | bwd_inner_microstep: 1063.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 05:29:36,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1297.10 | bwd_inner_microstep: 1297.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448
[2024-06-11 05:29:38,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1252.49 | bwd_inner_microstep: 1252.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403
[2024-06-11 05:29:39,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.07 | bwd_microstep: 1147.32 | bwd_inner_microstep: 1147.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-11 05:29:41,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.10 | bwd_microstep: 1307.40 | bwd_inner_microstep: 1307.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-11 05:29:43,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.04 | bwd_microstep: 1376.95 | bwd_inner_microstep: 1376.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3406
[2024-06-11 05:29:45,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1507.31 | bwd_inner_microstep: 1507.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2356
[2024-06-11 05:29:46,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.71 | bwd_microstep: 950.85 | bwd_inner_microstep: 950.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3375
[2024-06-11 05:29:48,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1239.71 | bwd_inner_microstep: 1239.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450
[2024-06-11 05:29:50,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.46 | bwd_microstep: 1444.12 | bwd_inner_microstep: 1444.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 05:29:52,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.28 | bwd_microstep: 1375.90 | bwd_inner_microstep: 1375.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3426
[2024-06-11 05:29:54,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.18 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 05:29:56,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.52 | bwd_microstep: 1387.16 | bwd_inner_microstep: 1387.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626
[2024-06-11 05:29:58,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1521.25 | bwd_inner_microstep: 1521.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2073
[2024-06-11 05:29:59,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.21 | bwd_microstep: 917.09 | bwd_inner_microstep: 917.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-11 05:30:01,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.18 | bwd_microstep: 1514.81 | bwd_inner_microstep: 1514.64 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.19
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 05:30:03,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.32 | bwd_microstep: 1389.95 | bwd_inner_microstep: 1389.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452
[2024-06-11 05:30:05,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.64 | bwd_microstep: 1192.31 | bwd_inner_microstep: 1192.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513
[2024-06-11 05:30:07,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.46 | bwd_microstep: 1591.22 | bwd_inner_microstep: 1591.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2276
[2024-06-11 05:30:08,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.68 | bwd_microstep: 788.77 | bwd_inner_microstep: 788.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-11 05:30:10,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1454.18 | bwd_inner_microstep: 1454.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-11 05:30:12,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1349.47 | bwd_inner_microstep: 1349.02 | bwd_allreduce_microstep: 0.22 | step_microstep: 0.35
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3603
[2024-06-11 05:30:14,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.01 | bwd_microstep: 1537.81 | bwd_inner_microstep: 1537.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3829
[2024-06-11 05:30:16,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.58 | bwd_microstep: 1588.19 | bwd_inner_microstep: 1588.09 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.18
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3576
[2024-06-11 05:30:18,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.43 | bwd_microstep: 1464.16 | bwd_inner_microstep: 1464.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-11 05:30:20,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.03 | bwd_microstep: 1546.35 | bwd_inner_microstep: 1546.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3461
[2024-06-11 05:30:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.47 | bwd_microstep: 1439.47 | bwd_inner_microstep: 1439.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-11 05:30:27,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.63
[2024-06-11 05:30:27,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.21 | bwd_microstep: 4293.02 | bwd_inner_microstep: 1741.85 | bwd_allreduce_microstep: 2551.09 | step_microstep: 40.86
[2024-06-11 05:30:27,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16094.02 | bwd: 45722.91 | bwd_inner: 43170.26 | bwd_allreduce: 2551.77 | step: 43.19
{'loss': 1.2201, 'learning_rate': 2.2498464606863334e-07, 'epoch': 0.95}
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2640
[2024-06-11 05:30:29,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.93 | bwd_microstep: 1103.30 | bwd_inner_microstep: 1103.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462
[2024-06-11 05:30:30,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.12 | bwd_microstep: 1274.60 | bwd_inner_microstep: 1274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3808
[2024-06-11 05:30:33,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.38 | bwd_microstep: 1598.00 | bwd_inner_microstep: 1597.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-11 05:30:35,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1498.79 | bwd_inner_microstep: 1498.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-11 05:30:36,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1251.55 | bwd_inner_microstep: 1251.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 05:30:38,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.46 | bwd_microstep: 1250.72 | bwd_inner_microstep: 1250.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-11 05:30:40,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.63 | bwd_microstep: 1429.63 | bwd_inner_microstep: 1429.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-11 05:30:42,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.80 | bwd_microstep: 1382.80 | bwd_inner_microstep: 1382.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3560
[2024-06-11 05:30:44,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.39 | bwd_microstep: 1350.03 | bwd_inner_microstep: 1350.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-11 05:30:46,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1474.84 | bwd_inner_microstep: 1474.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673
[2024-06-11 05:30:48,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.04 | bwd_microstep: 1610.17 | bwd_inner_microstep: 1610.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569
[2024-06-11 05:30:50,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1496.76 | bwd_inner_microstep: 1496.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-11 05:30:52,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.12 | bwd_microstep: 1419.93 | bwd_inner_microstep: 1419.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494
[2024-06-11 05:30:54,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.88 | bwd_microstep: 1579.57 | bwd_inner_microstep: 1579.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982
[2024-06-11 05:30:56,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.85 | bwd_microstep: 891.34 | bwd_inner_microstep: 891.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678
[2024-06-11 05:30:58,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.66 | bwd_microstep: 1422.72 | bwd_inner_microstep: 1422.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-11 05:31:00,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1411.05 | bwd_inner_microstep: 1411.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3831
[2024-06-11 05:31:02,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.30 | bwd_microstep: 1757.65 | bwd_inner_microstep: 1757.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3567
[2024-06-11 05:31:05,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.89 | bwd_microstep: 2422.99 | bwd_inner_microstep: 2422.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514
[2024-06-11 05:31:07,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.58 | bwd_microstep: 1289.44 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3822
[2024-06-11 05:31:09,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.58 | bwd_microstep: 1415.12 | bwd_inner_microstep: 1415.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-11 05:31:10,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1293.04 | bwd_inner_microstep: 1293.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-11 05:31:12,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.04 | bwd_microstep: 1180.73 | bwd_inner_microstep: 1180.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3543
[2024-06-11 05:31:14,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.64 | bwd_microstep: 1230.53 | bwd_inner_microstep: 1230.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465
[2024-06-11 05:31:16,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1374.67 | bwd_inner_microstep: 1374.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-11 05:31:18,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1551.25 | bwd_inner_microstep: 1551.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2045
[2024-06-11 05:31:19,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.80 | bwd_microstep: 716.69 | bwd_inner_microstep: 716.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-11 05:31:21,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.94 | bwd_microstep: 1345.53 | bwd_inner_microstep: 1345.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775
[2024-06-11 05:31:23,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.73 | bwd_microstep: 1745.39 | bwd_inner_microstep: 1745.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3540
[2024-06-11 05:31:25,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.39 | bwd_microstep: 1435.82 | bwd_inner_microstep: 1435.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2406
[2024-06-11 05:31:27,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.10 | bwd_microstep: 1064.45 | bwd_inner_microstep: 1064.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583
[2024-06-11 05:31:29,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-11 05:31:29,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1498.67 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 8.23 | step_microstep: 38.80
[2024-06-11 05:31:29,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16360.83 | bwd: 44767.81 | bwd_inner: 44758.57 | bwd_allreduce: 8.51 | step: 40.92
{'loss': 1.1501, 'learning_rate': 2.1940542620076723e-07, 'epoch': 0.95}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 05:31:31,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.10 | bwd_microstep: 1343.51 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3454
[2024-06-11 05:31:32,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.49 | bwd_microstep: 1414.91 | bwd_inner_microstep: 1414.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1343
[2024-06-11 05:31:33,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 208.84 | bwd_microstep: 544.80 | bwd_inner_microstep: 544.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 05:31:35,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.64 | bwd_microstep: 1271.05 | bwd_inner_microstep: 1271.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039
[2024-06-11 05:31:36,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.60 | bwd_microstep: 808.88 | bwd_inner_microstep: 808.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-11 05:31:38,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.98 | bwd_microstep: 1477.82 | bwd_inner_microstep: 1477.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 05:31:40,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1282.23 | bwd_inner_microstep: 1282.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-11 05:31:42,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1371.07 | bwd_inner_microstep: 1371.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 05:31:44,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1248.75 | bwd_inner_microstep: 1248.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2046
[2024-06-11 05:31:45,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.70 | bwd_microstep: 780.41 | bwd_inner_microstep: 780.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 05:31:47,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1397.24 | bwd_inner_microstep: 1397.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3440
[2024-06-11 05:31:49,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.72 | bwd_microstep: 1401.40 | bwd_inner_microstep: 1401.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3441
[2024-06-11 05:31:50,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1409.74 | bwd_inner_microstep: 1409.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 05:31:53,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1479.83 | bwd_inner_microstep: 1479.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-11 05:31:55,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.47 | bwd_microstep: 1482.12 | bwd_inner_microstep: 1482.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3516
[2024-06-11 05:31:56,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.11 | bwd_microstep: 1194.96 | bwd_inner_microstep: 1194.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2444
[2024-06-11 05:31:57,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.71 | bwd_microstep: 887.25 | bwd_inner_microstep: 887.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-11 05:32:00,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.17 | bwd_microstep: 1660.44 | bwd_inner_microstep: 1660.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-11 05:32:02,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1511.25 | bwd_inner_microstep: 1511.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-11 05:32:04,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.85 | bwd_microstep: 1415.36 | bwd_inner_microstep: 1415.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284
[2024-06-11 05:32:05,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.10 | bwd_microstep: 972.82 | bwd_inner_microstep: 972.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-11 05:32:07,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 1303.23 | bwd_inner_microstep: 1303.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644
[2024-06-11 05:32:09,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1443.51 | bwd_inner_microstep: 1443.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2993
[2024-06-11 05:32:11,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.53 | bwd_microstep: 1299.86 | bwd_inner_microstep: 1299.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3809
[2024-06-11 05:32:13,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.41 | bwd_microstep: 1415.76 | bwd_inner_microstep: 1415.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989
[2024-06-11 05:32:14,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 799.55 | bwd_inner_microstep: 799.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-11 05:32:16,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.66 | bwd_microstep: 1253.54 | bwd_inner_microstep: 1253.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2241
[2024-06-11 05:32:17,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.02 | bwd_microstep: 871.37 | bwd_inner_microstep: 871.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3588
[2024-06-11 05:32:19,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.13 | bwd_microstep: 1306.79 | bwd_inner_microstep: 1306.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3585
[2024-06-11 05:32:21,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1569.24 | bwd_inner_microstep: 1569.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3574
[2024-06-11 05:32:23,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.04 | bwd_microstep: 1451.16 | bwd_inner_microstep: 1451.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3773
[2024-06-11 05:32:30,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.09 | optimizer_step: 6.60
[2024-06-11 05:32:30,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.20 | bwd_microstep: 6338.07 | bwd_inner_microstep: 1515.26 | bwd_allreduce_microstep: 4822.75 | step_microstep: 37.95
[2024-06-11 05:32:30,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15172.63 | bwd: 45407.94 | bwd_inner: 40584.22 | bwd_allreduce: 4823.01 | step: 39.50
{'loss': 1.1617, 'learning_rate': 2.138958734841623e-07, 'epoch': 0.95}
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2403
[2024-06-11 05:32:31,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.57 | bwd_microstep: 957.50 | bwd_inner_microstep: 957.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-11 05:32:33,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1247.14 | bwd_inner_microstep: 1247.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 05:32:34,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1244.13 | bwd_inner_microstep: 1244.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 05:32:36,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.30 | bwd_microstep: 1370.95 | bwd_inner_microstep: 1370.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-11 05:32:38,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1481.50 | bwd_inner_microstep: 1481.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3539
[2024-06-11 05:32:40,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.70 | bwd_microstep: 1199.66 | bwd_inner_microstep: 1199.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 05:32:42,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.23 | bwd_microstep: 1383.05 | bwd_inner_microstep: 1383.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754
[2024-06-11 05:32:44,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.64 | bwd_microstep: 1638.49 | bwd_inner_microstep: 1638.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455
[2024-06-11 05:32:46,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.04 | bwd_microstep: 1318.75 | bwd_inner_microstep: 1318.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-11 05:32:48,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1478.71 | bwd_inner_microstep: 1478.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3635
[2024-06-11 05:32:50,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.97 | bwd_microstep: 1682.71 | bwd_inner_microstep: 1682.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 05:32:52,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.26 | bwd_microstep: 1250.08 | bwd_inner_microstep: 1250.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429
[2024-06-11 05:32:54,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.49 | bwd_microstep: 1440.55 | bwd_inner_microstep: 1440.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943
[2024-06-11 05:32:55,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.84 | bwd_microstep: 697.45 | bwd_inner_microstep: 697.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3516
[2024-06-11 05:32:57,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.81 | bwd_microstep: 1228.68 | bwd_inner_microstep: 1228.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 05:32:58,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1255.50 | bwd_inner_microstep: 1255.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639
[2024-06-11 05:33:01,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.34 | bwd_microstep: 1513.12 | bwd_inner_microstep: 1513.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-11 05:33:02,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1396.88 | bwd_inner_microstep: 1396.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-11 05:33:04,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1289.18 | bwd_inner_microstep: 1289.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1404
[2024-06-11 05:33:05,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.64 | bwd_microstep: 528.59 | bwd_inner_microstep: 528.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3822
[2024-06-11 05:33:07,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.91 | bwd_microstep: 1587.54 | bwd_inner_microstep: 1587.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543
[2024-06-11 05:33:09,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1296.62 | bwd_inner_microstep: 1296.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450
[2024-06-11 05:33:11,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1354.40 | bwd_inner_microstep: 1354.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140
[2024-06-11 05:33:12,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.30 | bwd_microstep: 834.18 | bwd_inner_microstep: 834.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741
[2024-06-11 05:33:14,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.01 | bwd_microstep: 1538.06 | bwd_inner_microstep: 1538.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730
[2024-06-11 05:33:16,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.17 | bwd_microstep: 1538.54 | bwd_inner_microstep: 1538.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3808
[2024-06-11 05:33:18,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.27 | bwd_microstep: 1618.23 | bwd_inner_microstep: 1618.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3698
[2024-06-11 05:33:20,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1455.87 | bwd_inner_microstep: 1455.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288
[2024-06-11 05:33:22,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.06 | bwd_microstep: 974.87 | bwd_inner_microstep: 974.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-11 05:33:24,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.43 | bwd_microstep: 1478.33 | bwd_inner_microstep: 1478.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3803
[2024-06-11 05:33:26,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.03 | bwd_microstep: 1752.87 | bwd_inner_microstep: 1752.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 05:33:30,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.17 | optimizer_step: 6.63
[2024-06-11 05:33:30,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.10 | bwd_microstep: 3280.23 | bwd_inner_microstep: 1867.89 | bwd_allreduce_microstep: 1412.26 | step_microstep: 38.90
[2024-06-11 05:33:30,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15960.00 | bwd: 44312.39 | bwd_inner: 42899.19 | bwd_allreduce: 1412.50 | step: 40.36


 95%|█████████▌| 1644/1726 [28:50:25<1:41:51, 74.53s/it]
 95%|█████████▌| 1645/1726 [28:52:02<1:49:46, 81.31s/it]
 95%|█████████▌| 1646/1726 [28:53:04<1:40:46, 75.58s/it]
 95%|█████████▌| 1647/1726 [28:54:05<1:33:56, 71.35s/it]
 95%|█████████▌| 1648/1726 [28:55:06<1:28:41, 68.22s/it]
 96%|█████████▌| 1649/1726 [28:56:07<1:
{'loss': 1.1828, 'learning_rate': 2.0845600732342987e-07, 'epoch': 0.96}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-11 05:33:32,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.45 | bwd_microstep: 1449.59 | bwd_inner_microstep: 1449.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4343
[2024-06-11 05:33:34,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.22 | bwd_microstep: 1601.21 | bwd_inner_microstep: 1601.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 05:33:36,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.16 | bwd_microstep: 1286.41 | bwd_inner_microstep: 1286.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469
[2024-06-11 05:33:38,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.57 | bwd_microstep: 1238.27 | bwd_inner_microstep: 1238.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 05:33:40,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1382.78 | bwd_inner_microstep: 1382.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3746
[2024-06-11 05:33:42,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.72 | bwd_microstep: 1469.25 | bwd_inner_microstep: 1469.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-11 05:33:43,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.84 | bwd_microstep: 1152.42 | bwd_inner_microstep: 1152.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2464
[2024-06-11 05:33:45,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.61 | bwd_microstep: 952.04 | bwd_inner_microstep: 952.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-11 05:33:47,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.62 | bwd_microstep: 1521.91 | bwd_inner_microstep: 1521.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748
[2024-06-11 05:33:49,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.98 | bwd_microstep: 1640.08 | bwd_inner_microstep: 1640.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 05:33:51,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1246.08 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 05:33:53,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.61 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478
[2024-06-11 05:33:54,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1312.46 | bwd_inner_microstep: 1312.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510
[2024-06-11 05:33:56,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1394.18 | bwd_inner_microstep: 1394.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3486
[2024-06-11 05:33:58,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1266.95 | bwd_inner_microstep: 1266.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3502
[2024-06-11 05:34:00,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.65 | bwd_microstep: 1549.99 | bwd_inner_microstep: 1549.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3616
[2024-06-11 05:34:03,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1708.90 | bwd_inner_microstep: 1708.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3526
[2024-06-11 05:34:05,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.27 | bwd_microstep: 1455.30 | bwd_inner_microstep: 1455.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3679
[2024-06-11 05:34:07,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.51 | bwd_microstep: 1728.72 | bwd_inner_microstep: 1728.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830
[2024-06-11 05:34:09,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.80 | bwd_microstep: 1661.87 | bwd_inner_microstep: 1661.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2062
[2024-06-11 05:34:11,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.81 | bwd_microstep: 946.08 | bwd_inner_microstep: 946.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2069
[2024-06-11 05:34:12,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.31 | bwd_microstep: 976.95 | bwd_inner_microstep: 976.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 05:34:14,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.88 | bwd_microstep: 1385.26 | bwd_inner_microstep: 1385.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-11 05:34:15,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.23 | bwd_microstep: 1161.42 | bwd_inner_microstep: 1161.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-11 05:34:17,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.37 | bwd_microstep: 1457.26 | bwd_inner_microstep: 1457.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-11 05:34:20,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.43 | bwd_microstep: 1555.09 | bwd_inner_microstep: 1555.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-11 05:34:22,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1402.94 | bwd_inner_microstep: 1402.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-11 05:34:23,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1399.05 | bwd_inner_microstep: 1399.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3768
[2024-06-11 05:34:26,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.19 | bwd_microstep: 1741.08 | bwd_inner_microstep: 1741.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-11 05:34:28,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.36 | bwd_microstep: 1400.19 | bwd_inner_microstep: 1400.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-11 05:34:30,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.61 | bwd_microstep: 1400.13 | bwd_inner_microstep: 1400.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3443
[2024-06-11 05:34:32,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.98 | optimizer_gradients: 4.18 | optimizer_step: 6.59
[2024-06-11 05:34:32,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.58 | bwd_microstep: 1352.94 | bwd_inner_microstep: 1219.64 | bwd_allreduce_microstep: 133.25 | step_microstep: 37.91
[2024-06-11 05:34:32,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16572.20 | bwd: 44481.77 | bwd_inner: 44347.61 | bwd_allreduce: 133.48 | step: 39.45
{'loss': 1.1761, 'learning_rate': 2.0308584687775745e-07, 'epoch': 0.96}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989
[2024-06-11 05:34:33,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.79 | bwd_microstep: 889.30 | bwd_inner_microstep: 889.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-11 05:34:35,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.51 | bwd_microstep: 1279.08 | bwd_inner_microstep: 1279.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-11 05:34:37,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.19 | bwd_microstep: 1560.86 | bwd_inner_microstep: 1560.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1426
[2024-06-11 05:34:38,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 217.45 | bwd_microstep: 567.88 | bwd_inner_microstep: 567.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763
[2024-06-11 05:34:40,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.07 | bwd_microstep: 1541.47 | bwd_inner_microstep: 1541.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481
[2024-06-11 05:34:41,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.37 | bwd_microstep: 1284.12 | bwd_inner_microstep: 1284.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-11 05:34:43,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.41 | bwd_microstep: 1150.85 | bwd_inner_microstep: 1150.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884
[2024-06-11 05:34:44,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.22 | bwd_microstep: 682.46 | bwd_inner_microstep: 682.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3412
[2024-06-11 05:34:46,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.74 | bwd_microstep: 1332.76 | bwd_inner_microstep: 1332.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-11 05:34:48,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.84 | bwd_microstep: 1486.76 | bwd_inner_microstep: 1486.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2139
[2024-06-11 05:34:49,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.98 | bwd_microstep: 1025.37 | bwd_inner_microstep: 1025.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3702
[2024-06-11 05:34:52,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.69 | bwd_microstep: 1724.51 | bwd_inner_microstep: 1724.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3449
[2024-06-11 05:34:54,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1419.89 | bwd_inner_microstep: 1419.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1967
[2024-06-11 05:34:55,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.49 | bwd_microstep: 734.18 | bwd_inner_microstep: 734.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648
[2024-06-11 05:34:57,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.34 | bwd_microstep: 1626.98 | bwd_inner_microstep: 1626.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3454
[2024-06-11 05:34:59,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.03 | bwd_microstep: 1192.54 | bwd_inner_microstep: 1192.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-11 05:35:01,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.96 | bwd_microstep: 1485.70 | bwd_inner_microstep: 1485.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427
[2024-06-11 05:35:02,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.10 | bwd_microstep: 1346.22 | bwd_inner_microstep: 1346.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3490
[2024-06-11 05:35:04,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1417.53 | bwd_inner_microstep: 1417.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948
[2024-06-11 05:35:05,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.50 | bwd_microstep: 697.79 | bwd_inner_microstep: 697.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2290
[2024-06-11 05:35:07,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.02 | bwd_microstep: 879.16 | bwd_inner_microstep: 879.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509
[2024-06-11 05:35:08,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.80 | bwd_microstep: 1292.84 | bwd_inner_microstep: 1292.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3438
[2024-06-11 05:35:10,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.00 | bwd_microstep: 1187.34 | bwd_inner_microstep: 1187.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608
[2024-06-11 05:35:12,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1414.94 | bwd_inner_microstep: 1414.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605
[2024-06-11 05:35:14,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.56 | bwd_microstep: 1430.23 | bwd_inner_microstep: 1430.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791
[2024-06-11 05:35:16,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.38 | bwd_microstep: 1659.71 | bwd_inner_microstep: 1659.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2411
[2024-06-11 05:35:18,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.11 | bwd_microstep: 969.71 | bwd_inner_microstep: 969.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 05:35:19,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.51 | bwd_microstep: 1254.94 | bwd_inner_microstep: 1254.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565
[2024-06-11 05:35:21,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.19 | bwd_microstep: 1498.89 | bwd_inner_microstep: 1498.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768
[2024-06-11 05:35:24,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.70 | bwd_microstep: 1543.42 | bwd_inner_microstep: 1543.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-11 05:35:26,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.63 | bwd_microstep: 1647.19 | bwd_inner_microstep: 1647.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3793
[2024-06-11 05:35:30,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-11 05:35:30,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.63 | bwd_microstep: 3506.06 | bwd_inner_microstep: 2236.97 | bwd_allreduce_microstep: 1269.03 | step_microstep: 38.93
[2024-06-11 05:35:30,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15286.33 | bwd: 42730.72 | bwd_inner: 41460.75 | bwd_allreduce: 1269.27 | step: 40.49
{'loss': 1.1595, 'learning_rate': 1.9778541106081572e-07, 'epoch': 0.96}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 05:35:32,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.54 | bwd_microstep: 1241.06 | bwd_inner_microstep: 1241.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3926
[2024-06-11 05:35:34,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.92 | bwd_microstep: 1687.12 | bwd_inner_microstep: 1687.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 05:35:36,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.13 | bwd_microstep: 1242.83 | bwd_inner_microstep: 1242.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 05:35:38,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.03 | bwd_microstep: 1378.84 | bwd_inner_microstep: 1378.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 05:35:39,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1283.74 | bwd_inner_microstep: 1283.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3757
[2024-06-11 05:35:41,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1471.52 | bwd_inner_microstep: 1471.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 05:35:43,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.50 | bwd_microstep: 1380.24 | bwd_inner_microstep: 1380.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779
[2024-06-11 05:35:46,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.75 | bwd_microstep: 1653.55 | bwd_inner_microstep: 1653.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499
[2024-06-11 05:35:48,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1390.95 | bwd_inner_microstep: 1390.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505
[2024-06-11 05:35:50,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.83 | bwd_microstep: 1484.84 | bwd_inner_microstep: 1484.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509
[2024-06-11 05:35:52,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1416.99 | bwd_inner_microstep: 1416.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3510
[2024-06-11 05:35:53,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1333.82 | bwd_inner_microstep: 1333.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3462
[2024-06-11 05:35:55,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.80 | bwd_microstep: 1213.30 | bwd_inner_microstep: 1213.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-11 05:35:57,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.10 | bwd_microstep: 1489.69 | bwd_inner_microstep: 1489.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687
[2024-06-11 05:35:59,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.00 | bwd_microstep: 1616.94 | bwd_inner_microstep: 1616.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381
[2024-06-11 05:36:01,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1336.44 | bwd_inner_microstep: 1336.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-11 05:36:03,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.42 | bwd_microstep: 1491.93 | bwd_inner_microstep: 1491.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3517
[2024-06-11 05:36:05,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1615.99 | bwd_inner_microstep: 1615.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3533
[2024-06-11 05:36:08,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1519.99 | bwd_inner_microstep: 1519.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581
[2024-06-11 05:36:09,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.93 | bwd_microstep: 1308.46 | bwd_inner_microstep: 1308.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419
[2024-06-11 05:36:11,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.20 | bwd_microstep: 1344.49 | bwd_inner_microstep: 1344.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-11 05:36:13,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.53 | bwd_microstep: 1401.61 | bwd_inner_microstep: 1401.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-11 05:36:15,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.44 | bwd_microstep: 1302.07 | bwd_inner_microstep: 1302.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451
[2024-06-11 05:36:17,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1447.40 | bwd_inner_microstep: 1447.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 526
[2024-06-11 05:36:17,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 97.33 | bwd_microstep: 240.40 | bwd_inner_microstep: 240.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025
[2024-06-11 05:36:19,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.10 | bwd_microstep: 901.12 | bwd_inner_microstep: 901.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532
[2024-06-11 05:36:21,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.02 | bwd_microstep: 1492.67 | bwd_inner_microstep: 1492.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2425
[2024-06-11 05:36:22,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.23 | bwd_microstep: 1035.05 | bwd_inner_microstep: 1035.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3813
[2024-06-11 05:36:24,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.45 | bwd_microstep: 1698.96 | bwd_inner_microstep: 1698.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3570
[2024-06-11 05:36:27,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.65 | bwd_microstep: 1526.40 | bwd_inner_microstep: 1526.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-11 05:36:29,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.85 | bwd_microstep: 1608.33 | bwd_inner_microstep: 1608.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 05:36:31,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.30 | optimizer_step: 6.61
[2024-06-11 05:36:31,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.91 | bwd_microstep: 1420.56 | bwd_inner_microstep: 1412.12 | bwd_allreduce_microstep: 8.37 | step_microstep: 38.92
[2024-06-11 05:36:31,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16417.13 | bwd: 43977.31 | bwd_inner: 43968.02 | bwd_allreduce: 8.61 | step: 40.59
{'loss': 1.1985, 'learning_rate': 1.9255471854071616e-07, 'epoch': 0.96}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467
[2024-06-11 05:36:33,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.85 | bwd_microstep: 1441.44 | bwd_inner_microstep: 1441.28 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2386
[2024-06-11 05:36:34,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.80 | bwd_microstep: 904.71 | bwd_inner_microstep: 904.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404
[2024-06-11 05:36:36,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.30 | bwd_microstep: 1279.50 | bwd_inner_microstep: 1279.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-11 05:36:38,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.06 | bwd_microstep: 1642.41 | bwd_inner_microstep: 1642.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-11 05:36:40,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1398.12 | bwd_inner_microstep: 1398.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 05:36:42,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.13 | bwd_microstep: 1376.39 | bwd_inner_microstep: 1376.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760
[2024-06-11 05:36:44,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 1436.18 | bwd_inner_microstep: 1436.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2220
[2024-06-11 05:36:45,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.31 | bwd_microstep: 958.13 | bwd_inner_microstep: 958.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 05:36:47,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1390.75 | bwd_inner_microstep: 1390.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 05:36:49,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3542
[2024-06-11 05:36:51,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.84 | bwd_microstep: 1260.62 | bwd_inner_microstep: 1260.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-11 05:36:52,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1383.48 | bwd_inner_microstep: 1383.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3445
[2024-06-11 05:36:54,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.71 | bwd_microstep: 1217.95 | bwd_inner_microstep: 1217.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-11 05:36:56,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1290.10 | bwd_inner_microstep: 1290.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659
[2024-06-11 05:36:58,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.10 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2991
[2024-06-11 05:37:00,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.16 | bwd_microstep: 1204.64 | bwd_inner_microstep: 1204.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3632
[2024-06-11 05:37:01,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.01 | bwd_microstep: 1396.34 | bwd_inner_microstep: 1396.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 05:37:03,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1279.08 | bwd_inner_microstep: 1279.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 05:37:05,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1396.45 | bwd_inner_microstep: 1396.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 05:37:07,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1391.91 | bwd_inner_microstep: 1391.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2125
[2024-06-11 05:37:08,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.93 | bwd_microstep: 991.61 | bwd_inner_microstep: 991.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463
[2024-06-11 05:37:10,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.53 | bwd_microstep: 1407.89 | bwd_inner_microstep: 1407.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3867
[2024-06-11 05:37:12,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1272.65 | bwd_inner_microstep: 1272.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517
[2024-06-11 05:37:14,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1490.42 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543
[2024-06-11 05:37:16,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.13 | bwd_microstep: 1496.88 | bwd_inner_microstep: 1496.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3451
[2024-06-11 05:37:18,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.62 | bwd_microstep: 1477.63 | bwd_inner_microstep: 1477.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819
[2024-06-11 05:37:20,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.88 | bwd_microstep: 1450.75 | bwd_inner_microstep: 1450.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687
[2024-06-11 05:37:23,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.54 | bwd_microstep: 1618.47 | bwd_inner_microstep: 1618.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591
[2024-06-11 05:37:25,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.41 | bwd_microstep: 1597.44 | bwd_inner_microstep: 1597.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2277
[2024-06-11 05:37:26,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.59 | bwd_microstep: 1070.08 | bwd_inner_microstep: 1070.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-11 05:37:29,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.35 | bwd_microstep: 1637.10 | bwd_inner_microstep: 1637.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2030
[2024-06-11 05:37:32,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.20 | optimizer_step: 6.62
[2024-06-11 05:37:32,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.55 | bwd_microstep: 3233.80 | bwd_inner_microstep: 853.97 | bwd_allreduce_microstep: 2379.78 | step_microstep: 37.96
[2024-06-11 05:37:32,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15954.22 | bwd: 45058.80 | bwd_inner: 42677.98 | bwd_allreduce: 2380.07 | step: 39.48
{'loss': 1.1893, 'learning_rate': 1.87393787739929e-07, 'epoch': 0.96}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457
[2024-06-11 05:37:34,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.12 | bwd_microstep: 1362.92 | bwd_inner_microstep: 1362.78 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3885
[2024-06-11 05:37:36,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.14 | bwd_microstep: 1679.52 | bwd_inner_microstep: 1679.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-11 05:37:38,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.57 | bwd_microstep: 1446.02 | bwd_inner_microstep: 1446.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848
[2024-06-11 05:37:40,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.23 | bwd_microstep: 1557.88 | bwd_inner_microstep: 1557.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 05:37:42,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1245.07 | bwd_inner_microstep: 1245.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929
[2024-06-11 05:37:43,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.83 | bwd_microstep: 790.24 | bwd_inner_microstep: 790.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420
[2024-06-11 05:37:45,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1341.33 | bwd_inner_microstep: 1341.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1377
[2024-06-11 05:37:46,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 201.30 | bwd_microstep: 525.12 | bwd_inner_microstep: 525.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 05:37:48,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1385.44 | bwd_inner_microstep: 1385.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3982
[2024-06-11 05:37:50,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.23 | bwd_microstep: 1523.09 | bwd_inner_microstep: 1523.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2654
[2024-06-11 05:37:51,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.47 | bwd_microstep: 952.29 | bwd_inner_microstep: 952.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377
[2024-06-11 05:37:53,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.13 | bwd_microstep: 1333.81 | bwd_inner_microstep: 1333.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674
[2024-06-11 05:37:55,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.12 | bwd_microstep: 1690.43 | bwd_inner_microstep: 1690.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3657
[2024-06-11 05:37:57,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.85 | bwd_microstep: 1519.35 | bwd_inner_microstep: 1519.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 05:37:59,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1246.38 | bwd_inner_microstep: 1246.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-11 05:38:01,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1344.33 | bwd_inner_microstep: 1344.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506
[2024-06-11 05:38:03,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.94 | bwd_microstep: 1484.65 | bwd_inner_microstep: 1484.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3505
[2024-06-11 05:38:05,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1430.88 | bwd_inner_microstep: 1430.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3658
[2024-06-11 05:38:07,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.25 | bwd_microstep: 1667.59 | bwd_inner_microstep: 1667.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437
[2024-06-11 05:38:09,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1347.84 | bwd_inner_microstep: 1347.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3676
[2024-06-11 05:38:11,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.44 | bwd_microstep: 1547.95 | bwd_inner_microstep: 1547.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288
[2024-06-11 05:38:13,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.59 | bwd_microstep: 878.26 | bwd_inner_microstep: 878.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-11 05:38:15,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1509.40 | bwd_inner_microstep: 1509.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2029
[2024-06-11 05:38:16,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.71 | bwd_microstep: 714.77 | bwd_inner_microstep: 714.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540
[2024-06-11 05:38:17,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.42 | bwd_microstep: 1295.11 | bwd_inner_microstep: 1295.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2242
[2024-06-11 05:38:19,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.38 | bwd_microstep: 807.24 | bwd_inner_microstep: 807.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2545
[2024-06-11 05:38:20,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.78 | bwd_microstep: 1063.21 | bwd_inner_microstep: 1063.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815
[2024-06-11 05:38:22,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.13 | bwd_microstep: 1599.38 | bwd_inner_microstep: 1599.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3732
[2024-06-11 05:38:24,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.86 | bwd_microstep: 1274.66 | bwd_inner_microstep: 1274.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3780
[2024-06-11 05:38:26,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.94 | bwd_microstep: 1502.06 | bwd_inner_microstep: 1502.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3586
[2024-06-11 05:38:28,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.56 | bwd_microstep: 1288.02 | bwd_inner_microstep: 1288.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-11 05:38:34,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.16 | optimizer_step: 6.60
[2024-06-11 05:38:34,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 5597.56 | bwd_inner_microstep: 1558.66 | bwd_allreduce_microstep: 4038.84 | step_microstep: 39.40
[2024-06-11 05:38:34,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15623.49 | bwd: 45951.85 | bwd_inner: 41911.97 | bwd_allreduce: 4039.14 | step: 40.97
 96%|█████████▌| 1649/1726 [28:56:07<1:24:37, 65.94s/it]
 96%|█████████▌| 1650/1726 [28:57:08<1:21:47, 64.58s/it]
 96%|█████████▌| 1651/1726 [28:58:07<1:18:23, 62.71s/it]
 96%|█████████▌| 1652/1726 [28:59:07<1:16:37, 62.12s/it]
 96%|█████████▌| 1653/1726 [29:00:09<1:15:18, 61.89s/it]
{'loss': 1.1477, 'learning_rate': 1.823026368352232e-07, 'epoch': 0.96}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3388
[2024-06-11 05:38:36,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.04 | bwd_microstep: 1294.21 | bwd_inner_microstep: 1294.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3461
[2024-06-11 05:38:38,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.17 | bwd_microstep: 1336.20 | bwd_inner_microstep: 1336.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 05:38:39,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1349.42 | bwd_inner_microstep: 1349.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 05:38:42,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.51 | bwd_microstep: 1546.59 | bwd_inner_microstep: 1546.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460
[2024-06-11 05:38:44,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1474.00 | bwd_inner_microstep: 1473.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-11 05:38:46,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.28 | bwd_microstep: 1630.24 | bwd_inner_microstep: 1630.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 05:38:48,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1250.24 | bwd_inner_microstep: 1250.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954
[2024-06-11 05:38:49,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.27 | bwd_microstep: 701.21 | bwd_inner_microstep: 701.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687
[2024-06-11 05:38:51,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.05 | bwd_microstep: 1431.37 | bwd_inner_microstep: 1431.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486
[2024-06-11 05:38:52,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.35 | bwd_microstep: 1342.92 | bwd_inner_microstep: 1342.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683
[2024-06-11 05:38:55,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.16 | bwd_microstep: 1515.95 | bwd_inner_microstep: 1515.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-11 05:38:57,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.75 | bwd_microstep: 1476.22 | bwd_inner_microstep: 1476.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2047
[2024-06-11 05:38:58,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.29 | bwd_microstep: 717.97 | bwd_inner_microstep: 717.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3998
[2024-06-11 05:39:00,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.05 | bwd_microstep: 1608.41 | bwd_inner_microstep: 1608.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 05:39:02,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.86 | bwd_microstep: 1553.65 | bwd_inner_microstep: 1553.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 05:39:04,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1402.98 | bwd_inner_microstep: 1402.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019
[2024-06-11 05:39:05,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.62 | bwd_microstep: 805.81 | bwd_inner_microstep: 805.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494
[2024-06-11 05:39:07,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1189.84 | bwd_inner_microstep: 1189.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-11 05:39:09,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1488.10 | bwd_inner_microstep: 1488.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2990
[2024-06-11 05:39:10,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.71 | bwd_microstep: 1294.76 | bwd_inner_microstep: 1294.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3421
[2024-06-11 05:39:13,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.56 | bwd_microstep: 1470.84 | bwd_inner_microstep: 1470.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598
[2024-06-11 05:39:14,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1408.75 | bwd_inner_microstep: 1408.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2277
[2024-06-11 05:39:16,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.38 | bwd_microstep: 1069.22 | bwd_inner_microstep: 1069.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3601
[2024-06-11 05:39:18,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.39 | bwd_microstep: 1214.02 | bwd_inner_microstep: 1214.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811
[2024-06-11 05:39:19,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.07 | bwd_microstep: 1355.96 | bwd_inner_microstep: 1355.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3386
[2024-06-11 05:39:21,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.24 | bwd_microstep: 1436.21 | bwd_inner_microstep: 1436.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2275
[2024-06-11 05:39:23,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.37 | bwd_microstep: 940.10 | bwd_inner_microstep: 940.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591
[2024-06-11 05:39:25,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.68 | bwd_microstep: 1598.09 | bwd_inner_microstep: 1598.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-11 05:39:27,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.40 | bwd_microstep: 1411.51 | bwd_inner_microstep: 1411.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625
[2024-06-11 05:39:29,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1602.28 | bwd_inner_microstep: 1602.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.97
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3732
[2024-06-11 05:39:31,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.86 | bwd_microstep: 1730.19 | bwd_inner_microstep: 1730.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3816
[2024-06-11 05:39:37,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-11 05:39:37,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.11 | bwd_microstep: 5319.07 | bwd_inner_microstep: 1806.79 | bwd_allreduce_microstep: 3512.20 | step_microstep: 39.87
[2024-06-11 05:39:37,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16147.97 | bwd: 46966.36 | bwd_inner: 43453.22 | bwd_allreduce: 3512.44 | step: 43.29
{'loss': 1.191, 'learning_rate': 1.7728128375760877e-07, 'epoch': 0.96}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455
[2024-06-11 05:39:39,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.35 | bwd_microstep: 1150.76 | bwd_inner_microstep: 1150.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3983
[2024-06-11 05:39:41,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1533.63 | bwd_inner_microstep: 1533.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845
[2024-06-11 05:39:43,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.04 | bwd_microstep: 1660.05 | bwd_inner_microstep: 1660.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796
[2024-06-11 05:39:45,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.10 | bwd_microstep: 1454.95 | bwd_inner_microstep: 1454.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788
[2024-06-11 05:39:48,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.60 | bwd_microstep: 1547.65 | bwd_inner_microstep: 1547.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441
[2024-06-11 05:39:49,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.26 | bwd_microstep: 1345.20 | bwd_inner_microstep: 1345.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 05:39:51,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1392.52 | bwd_inner_microstep: 1392.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 05:39:53,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1279.94 | bwd_inner_microstep: 1279.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1874
[2024-06-11 05:39:54,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.76 | bwd_microstep: 680.50 | bwd_inner_microstep: 680.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686
[2024-06-11 05:39:56,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.04 | bwd_microstep: 1523.09 | bwd_inner_microstep: 1523.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682
[2024-06-11 05:39:58,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.90 | bwd_microstep: 1628.26 | bwd_inner_microstep: 1628.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671
[2024-06-11 05:40:01,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.83 | bwd_microstep: 1521.62 | bwd_inner_microstep: 1521.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673
[2024-06-11 05:40:03,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.73 | bwd_microstep: 1423.50 | bwd_inner_microstep: 1423.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-11 05:40:05,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.48 | bwd_microstep: 1476.98 | bwd_inner_microstep: 1476.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460
[2024-06-11 05:40:06,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.24 | bwd_microstep: 1407.60 | bwd_inner_microstep: 1407.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492
[2024-06-11 05:40:09,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1580.99 | bwd_inner_microstep: 1580.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3578
[2024-06-11 05:40:11,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.43 | bwd_microstep: 1336.25 | bwd_inner_microstep: 1336.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920
[2024-06-11 05:40:11,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.79 | bwd_microstep: 686.52 | bwd_inner_microstep: 686.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-11 05:40:14,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.09 | bwd_microstep: 1657.45 | bwd_inner_microstep: 1657.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3835
[2024-06-11 05:40:16,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.17 | bwd_microstep: 1359.42 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-11 05:40:18,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1510.19 | bwd_inner_microstep: 1510.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529
[2024-06-11 05:40:20,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1493.41 | bwd_inner_microstep: 1493.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-11 05:40:22,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1397.92 | bwd_inner_microstep: 1397.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3699
[2024-06-11 05:40:24,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.30 | bwd_microstep: 1427.49 | bwd_inner_microstep: 1427.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603
[2024-06-11 05:40:25,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.22 | bwd_microstep: 1312.24 | bwd_inner_microstep: 1312.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 05:40:27,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.78 | bwd_microstep: 1288.50 | bwd_inner_microstep: 1288.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3573
[2024-06-11 05:40:29,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1362.57 | bwd_inner_microstep: 1362.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 05:40:31,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1498.39 | bwd_inner_microstep: 1498.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725
[2024-06-11 05:40:33,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.07 | bwd_microstep: 1627.68 | bwd_inner_microstep: 1627.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3581
[2024-06-11 05:40:36,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.90 | bwd_microstep: 1565.58 | bwd_inner_microstep: 1565.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2552
[2024-06-11 05:40:37,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.24 | bwd_microstep: 1062.05 | bwd_inner_microstep: 1062.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2047
[2024-06-11 05:40:44,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.30 | optimizer_step: 6.58
[2024-06-11 05:40:44,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.78 | bwd_microstep: 6040.42 | bwd_inner_microstep: 1037.63 | bwd_allreduce_microstep: 5002.72 | step_microstep: 40.05
[2024-06-11 05:40:44,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16488.19 | bwd: 49233.35 | bwd_inner: 44229.70 | bwd_allreduce: 5002.96 | step: 41.59
{'loss': 1.2292, 'learning_rate': 1.7232974619226572e-07, 'epoch': 0.96}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041
[2024-06-11 05:40:45,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.22 | bwd_microstep: 800.97 | bwd_inner_microstep: 800.86 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 05:40:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.24 | bwd_microstep: 1279.22 | bwd_inner_microstep: 1279.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836
[2024-06-11 05:40:48,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.05 | bwd_microstep: 1448.49 | bwd_inner_microstep: 1448.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3841
[2024-06-11 05:40:50,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1484.72 | bwd_inner_microstep: 1484.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228
[2024-06-11 05:40:52,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.00 | bwd_microstep: 955.71 | bwd_inner_microstep: 955.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745
[2024-06-11 05:40:54,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.66 | bwd_microstep: 1532.83 | bwd_inner_microstep: 1532.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 05:40:56,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.07 | bwd_microstep: 1475.57 | bwd_inner_microstep: 1475.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413
[2024-06-11 05:40:58,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.00 | bwd_microstep: 1149.84 | bwd_inner_microstep: 1149.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1675
[2024-06-11 05:40:58,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 252.72 | bwd_microstep: 665.86 | bwd_inner_microstep: 665.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 05:41:00,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.22 | bwd_microstep: 1278.12 | bwd_inner_microstep: 1278.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 05:41:02,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1385.02 | bwd_inner_microstep: 1384.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1972
[2024-06-11 05:41:03,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.45 | bwd_microstep: 828.77 | bwd_inner_microstep: 828.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2685
[2024-06-11 05:41:05,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.14 | bwd_microstep: 1119.81 | bwd_inner_microstep: 1119.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521
[2024-06-11 05:41:07,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.95 | bwd_microstep: 1449.83 | bwd_inner_microstep: 1449.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439
[2024-06-11 05:41:09,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.01 | bwd_microstep: 1442.32 | bwd_inner_microstep: 1442.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 05:41:11,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.18 | bwd_microstep: 1356.62 | bwd_inner_microstep: 1356.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446
[2024-06-11 05:41:12,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.52 | bwd_microstep: 1218.53 | bwd_inner_microstep: 1218.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 620
[2024-06-11 05:41:13,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.47 | bwd_microstep: 260.44 | bwd_inner_microstep: 260.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470
[2024-06-11 05:41:14,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.54 | bwd_microstep: 1184.42 | bwd_inner_microstep: 1184.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3616
[2024-06-11 05:41:16,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1246.57 | bwd_inner_microstep: 1246.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453
[2024-06-11 05:41:18,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1258.37 | bwd_inner_microstep: 1258.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 05:41:20,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1290.81 | bwd_inner_microstep: 1290.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3383
[2024-06-11 05:41:22,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.12 | bwd_microstep: 1435.64 | bwd_inner_microstep: 1435.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2021
[2024-06-11 05:41:23,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.81 | bwd_microstep: 714.92 | bwd_inner_microstep: 714.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815
[2024-06-11 05:41:25,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.81 | bwd_microstep: 1486.44 | bwd_inner_microstep: 1486.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519
[2024-06-11 05:41:26,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.79 | bwd_microstep: 1292.57 | bwd_inner_microstep: 1292.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-11 05:41:28,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.64 | bwd_microstep: 1450.76 | bwd_inner_microstep: 1450.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3440
[2024-06-11 05:41:30,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.51 | bwd_microstep: 1136.15 | bwd_inner_microstep: 1136.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 05:41:32,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1555.46 | bwd_inner_microstep: 1555.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3470
[2024-06-11 05:41:34,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.43 | bwd_microstep: 1408.99 | bwd_inner_microstep: 1408.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3612
[2024-06-11 05:41:36,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1573.86 | bwd_inner_microstep: 1573.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3027
[2024-06-11 05:41:45,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.08 | optimizer_step: 6.61
[2024-06-11 05:41:45,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.75 | bwd_microstep: 8161.44 | bwd_inner_microstep: 1278.74 | bwd_allreduce_microstep: 6882.65 | step_microstep: 37.88
[2024-06-11 05:41:45,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14771.81 | bwd: 46329.09 | bwd_inner: 39445.44 | bwd_allreduce: 6882.92 | step: 39.36
{'loss': 1.1811, 'learning_rate': 1.6744804157848183e-07, 'epoch': 0.96}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-11 05:41:47,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.54 | bwd_microstep: 1467.76 | bwd_inner_microstep: 1467.57 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3509
[2024-06-11 05:41:49,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.25 | bwd_microstep: 1217.91 | bwd_inner_microstep: 1217.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471
[2024-06-11 05:41:51,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1478.36 | bwd_inner_microstep: 1478.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3807
[2024-06-11 05:41:53,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.54 | bwd_microstep: 1413.37 | bwd_inner_microstep: 1413.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514
[2024-06-11 05:41:54,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1316.77 | bwd_inner_microstep: 1316.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-11 05:41:56,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.37 | bwd_microstep: 787.61 | bwd_inner_microstep: 787.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-11 05:41:58,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1430.87 | bwd_inner_microstep: 1430.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3441
[2024-06-11 05:41:59,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.40 | bwd_microstep: 1185.71 | bwd_inner_microstep: 1185.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438
[2024-06-11 05:42:01,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.63 | bwd_microstep: 1152.16 | bwd_inner_microstep: 1152.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002
[2024-06-11 05:42:02,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.44 | bwd_microstep: 803.87 | bwd_inner_microstep: 803.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 05:42:04,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1254.65 | bwd_inner_microstep: 1254.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-11 05:42:06,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.61 | bwd_microstep: 1623.07 | bwd_inner_microstep: 1623.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2333
[2024-06-11 05:42:07,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.24 | bwd_microstep: 795.50 | bwd_inner_microstep: 795.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2298
[2024-06-11 05:42:08,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.63 | bwd_microstep: 1069.94 | bwd_inner_microstep: 1069.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634
[2024-06-11 05:42:11,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.98 | bwd_microstep: 1615.23 | bwd_inner_microstep: 1615.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511
[2024-06-11 05:42:13,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1483.20 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-11 05:42:15,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.95 | bwd_microstep: 1486.41 | bwd_inner_microstep: 1486.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-11 05:42:17,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.46 | bwd_microstep: 1317.86 | bwd_inner_microstep: 1317.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 05:42:19,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3684
[2024-06-11 05:42:21,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.84 | bwd_microstep: 1487.76 | bwd_inner_microstep: 1487.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-11 05:42:23,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.43 | bwd_microstep: 1549.17 | bwd_inner_microstep: 1549.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459
[2024-06-11 05:42:24,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.93 | bwd_microstep: 1183.79 | bwd_inner_microstep: 1183.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446
[2024-06-11 05:42:26,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.14 | bwd_microstep: 1159.36 | bwd_inner_microstep: 1159.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568
[2024-06-11 05:42:28,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.59 | bwd_microstep: 1498.04 | bwd_inner_microstep: 1498.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 05:42:30,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1560.24 | bwd_inner_microstep: 1560.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926
[2024-06-11 05:42:31,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.96 | bwd_microstep: 788.12 | bwd_inner_microstep: 788.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2613
[2024-06-11 05:42:33,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.21 | bwd_microstep: 1047.91 | bwd_inner_microstep: 1047.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3553
[2024-06-11 05:42:35,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1455.74 | bwd_inner_microstep: 1455.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561
[2024-06-11 05:42:37,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.29 | bwd_microstep: 1332.05 | bwd_inner_microstep: 1332.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 05:42:39,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.17 | bwd_microstep: 1557.21 | bwd_inner_microstep: 1557.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3767
[2024-06-11 05:42:41,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.91 | bwd_microstep: 1572.53 | bwd_inner_microstep: 1572.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3567
[2024-06-11 05:42:47,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.44 | optimizer_step: 6.58
[2024-06-11 05:42:47,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.73 | bwd_microstep: 5525.35 | bwd_inner_microstep: 1377.45 | bwd_allreduce_microstep: 4147.82 | step_microstep: 40.05
[2024-06-11 05:42:47,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15647.42 | bwd: 46003.76 | bwd_inner: 41854.86 | bwd_allreduce: 4148.15 | step: 41.58
{'loss': 1.22, 'learning_rate': 1.6263618710959494e-07, 'epoch': 0.96}
 96%|█████████▌| 1654/1726 [29:01:11<1:14:16, 61.90s/it]
 96%|█████████▌| 1655/1726 [29:02:14<1:13:48, 62.37s/it]
 96%|█████████▌| 1656/1726 [29:03:20<1:14:03, 63.48s/it]
 96%|█████████▌| 1657/1726 [29:04:22<1:12:17, 62.86s/it]
 96%|█████████▌| 1658/1726 [29:05:24<1:10:57, 62.60s/it]
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-11 05:42:49,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.54 | bwd_microstep: 1236.23 | bwd_inner_microstep: 1236.17 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4304
[2024-06-11 05:42:51,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.20 | bwd_microstep: 1583.58 | bwd_inner_microstep: 1583.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415
[2024-06-11 05:42:53,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1306.81 | bwd_inner_microstep: 1306.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465
[2024-06-11 05:42:55,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.64 | bwd_microstep: 1405.06 | bwd_inner_microstep: 1405.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1867
[2024-06-11 05:42:56,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.09 | bwd_microstep: 709.09 | bwd_inner_microstep: 709.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494
[2024-06-11 05:42:58,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.46 | bwd_microstep: 1384.66 | bwd_inner_microstep: 1384.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4015
[2024-06-11 05:43:00,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.16 | bwd_microstep: 1609.06 | bwd_inner_microstep: 1609.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.69
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-11 05:43:01,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.78 | bwd_microstep: 794.82 | bwd_inner_microstep: 794.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479
[2024-06-11 05:43:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.40 | bwd_microstep: 1311.51 | bwd_inner_microstep: 1311.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1994
[2024-06-11 05:43:04,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.54 | bwd_microstep: 898.04 | bwd_inner_microstep: 898.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-11 05:43:06,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.30 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 05:43:08,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1474.50 | bwd_inner_microstep: 1474.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-11 05:43:10,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.81 | bwd_microstep: 1480.27 | bwd_inner_microstep: 1480.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-11 05:43:12,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.35 | bwd_microstep: 1481.97 | bwd_inner_microstep: 1481.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-11 05:43:14,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.37 | bwd_microstep: 1281.74 | bwd_inner_microstep: 1281.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633
[2024-06-11 05:43:16,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.32 | bwd_microstep: 1513.30 | bwd_inner_microstep: 1513.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2157
[2024-06-11 05:43:17,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.64 | bwd_microstep: 949.23 | bwd_inner_microstep: 949.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 05:43:19,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 1353.00 | bwd_inner_microstep: 1352.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-11 05:43:21,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1255.43 | bwd_inner_microstep: 1255.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3630
[2024-06-11 05:43:23,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1315.42 | bwd_inner_microstep: 1315.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818
[2024-06-11 05:43:25,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.00 | bwd_microstep: 1358.96 | bwd_inner_microstep: 1358.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835
[2024-06-11 05:43:27,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1556.00 | bwd_inner_microstep: 1555.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008
[2024-06-11 05:43:28,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.48 | bwd_microstep: 803.28 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3603
[2024-06-11 05:43:29,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.88 | bwd_microstep: 1211.79 | bwd_inner_microstep: 1211.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 05:43:31,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1391.37 | bwd_inner_microstep: 1391.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3429
[2024-06-11 05:43:33,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.71 | bwd_microstep: 1281.44 | bwd_inner_microstep: 1281.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3806
[2024-06-11 05:43:35,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.45 | bwd_microstep: 1598.18 | bwd_inner_microstep: 1598.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3633
[2024-06-11 05:43:37,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.45 | bwd_microstep: 1540.65 | bwd_inner_microstep: 1540.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064
[2024-06-11 05:43:39,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.67 | bwd_microstep: 914.10 | bwd_inner_microstep: 914.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583
[2024-06-11 05:43:41,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.97 | bwd_microstep: 1599.97 | bwd_inner_microstep: 1599.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803
[2024-06-11 05:43:43,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.80 | bwd_microstep: 1452.97 | bwd_inner_microstep: 1452.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3749
[2024-06-11 05:43:47,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.07 | optimizer_step: 6.60
[2024-06-11 05:43:47,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.56 | bwd_microstep: 3697.91 | bwd_inner_microstep: 1927.84 | bwd_allreduce_microstep: 1770.02 | step_microstep: 38.24
[2024-06-11 05:43:47,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15802.28 | bwd: 44235.21 | bwd_inner: 42464.24 | bwd_allreduce: 1770.26 | step: 42.49
{'loss': 1.1647, 'learning_rate': 1.5789419973293306e-07, 'epoch': 0.96}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3406
[2024-06-11 05:43:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1441.07 | bwd_inner_microstep: 1441.00 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2413
[2024-06-11 05:43:51,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.95 | bwd_microstep: 1001.39 | bwd_inner_microstep: 1001.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980
[2024-06-11 05:43:52,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.19 | bwd_microstep: 795.17 | bwd_inner_microstep: 795.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3797
[2024-06-11 05:43:54,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.65 | bwd_microstep: 1415.68 | bwd_inner_microstep: 1415.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2232
[2024-06-11 05:43:55,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.65 | bwd_microstep: 965.42 | bwd_inner_microstep: 965.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405
[2024-06-11 05:43:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.44 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 05:43:59,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.21 | bwd_microstep: 1282.08 | bwd_inner_microstep: 1282.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721
[2024-06-11 05:44:01,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.67 | bwd_microstep: 1631.06 | bwd_inner_microstep: 1631.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 05:44:03,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1388.19 | bwd_inner_microstep: 1388.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406
[2024-06-11 05:44:05,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1248.72 | bwd_inner_microstep: 1248.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3569
[2024-06-11 05:44:06,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.64 | bwd_microstep: 1237.36 | bwd_inner_microstep: 1237.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711
[2024-06-11 05:44:08,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1519.23 | bwd_inner_microstep: 1519.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3504
[2024-06-11 05:44:10,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.80 | bwd_microstep: 1434.83 | bwd_inner_microstep: 1434.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672
[2024-06-11 05:44:13,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.62 | bwd_microstep: 1616.02 | bwd_inner_microstep: 1615.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3519
[2024-06-11 05:44:15,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.76 | bwd_microstep: 1556.65 | bwd_inner_microstep: 1556.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2886
[2024-06-11 05:44:16,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.09 | bwd_microstep: 1123.92 | bwd_inner_microstep: 1123.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2362
[2024-06-11 05:44:18,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 399.03 | bwd_microstep: 1086.16 | bwd_inner_microstep: 1086.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445
[2024-06-11 05:44:19,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1255.29 | bwd_inner_microstep: 1255.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401
[2024-06-11 05:44:21,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.16 | bwd_microstep: 1371.69 | bwd_inner_microstep: 1371.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-11 05:44:23,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.52 | bwd_microstep: 1181.14 | bwd_inner_microstep: 1181.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2558
[2024-06-11 05:44:24,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.93 | bwd_microstep: 1034.55 | bwd_inner_microstep: 1034.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3503
[2024-06-11 05:44:26,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1319.79 | bwd_inner_microstep: 1319.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085
[2024-06-11 05:44:27,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.19 | bwd_microstep: 851.67 | bwd_inner_microstep: 851.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599
[2024-06-11 05:44:29,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1406.04 | bwd_inner_microstep: 1406.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545
[2024-06-11 05:44:31,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1297.22 | bwd_inner_microstep: 1297.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2452
[2024-06-11 05:44:32,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.41 | bwd_microstep: 923.15 | bwd_inner_microstep: 923.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-11 05:44:35,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.47 | bwd_microstep: 1527.32 | bwd_inner_microstep: 1527.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1920
[2024-06-11 05:44:36,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.62 | bwd_microstep: 811.81 | bwd_inner_microstep: 811.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3587
[2024-06-11 05:44:38,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.28 | bwd_microstep: 1528.39 | bwd_inner_microstep: 1528.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776
[2024-06-11 05:44:40,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.66 | bwd_microstep: 1647.26 | bwd_inner_microstep: 1647.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823
[2024-06-11 05:44:42,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.28 | bwd_microstep: 1582.87 | bwd_inner_microstep: 1582.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2014
[2024-06-11 05:44:50,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.18 | optimizer_step: 6.60
[2024-06-11 05:44:50,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.85 | bwd_microstep: 7771.36 | bwd_inner_microstep: 877.26 | bwd_allreduce_microstep: 6894.04 | step_microstep: 39.16
[2024-06-11 05:44:50,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15181.03 | bwd: 47539.07 | bwd_inner: 40644.06 | bwd_allreduce: 6894.31 | step: 40.67
{'loss': 1.1898, 'learning_rate': 1.5322209614975214e-07, 'epoch': 0.96}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-11 05:44:52,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.86 | bwd_microstep: 1332.60 | bwd_inner_microstep: 1332.47 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3881
[2024-06-11 05:44:54,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.10 | bwd_microstep: 1477.73 | bwd_inner_microstep: 1477.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2317
[2024-06-11 05:44:55,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.00 | bwd_microstep: 880.38 | bwd_inner_microstep: 880.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3563
[2024-06-11 05:44:58,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.88 | bwd_microstep: 1455.90 | bwd_inner_microstep: 1455.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-11 05:44:59,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.33 | bwd_microstep: 1184.01 | bwd_inner_microstep: 1183.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-11 05:45:01,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1389.75 | bwd_inner_microstep: 1389.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-11 05:45:03,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.27 | bwd_microstep: 1527.78 | bwd_inner_microstep: 1527.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952
[2024-06-11 05:45:04,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 789.27 | bwd_inner_microstep: 789.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-11 05:45:05,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.61 | bwd_microstep: 701.55 | bwd_inner_microstep: 701.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710
[2024-06-11 05:45:07,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.31 | bwd_microstep: 1530.31 | bwd_inner_microstep: 1530.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542
[2024-06-11 05:45:09,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1326.07 | bwd_inner_microstep: 1326.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3487
[2024-06-11 05:45:11,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.17 | bwd_microstep: 1341.70 | bwd_inner_microstep: 1341.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-11 05:45:13,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.57 | bwd_microstep: 1500.94 | bwd_inner_microstep: 1500.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383
[2024-06-11 05:45:15,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.23 | bwd_microstep: 1336.02 | bwd_inner_microstep: 1335.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518
[2024-06-11 05:45:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.40 | bwd_microstep: 1575.63 | bwd_inner_microstep: 1575.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 05:45:19,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1386.12 | bwd_inner_microstep: 1386.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2532
[2024-06-11 05:45:20,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.76 | bwd_microstep: 962.57 | bwd_inner_microstep: 962.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 05:45:22,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1499.97 | bwd_inner_microstep: 1499.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-11 05:45:24,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.69 | bwd_microstep: 797.46 | bwd_inner_microstep: 797.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-11 05:45:26,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.16 | bwd_microstep: 1655.94 | bwd_inner_microstep: 1655.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3474
[2024-06-11 05:45:28,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.92 | bwd_microstep: 1216.12 | bwd_inner_microstep: 1216.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821
[2024-06-11 05:45:30,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.99 | bwd_microstep: 1580.52 | bwd_inner_microstep: 1580.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3676
[2024-06-11 05:45:32,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.43 | bwd_microstep: 1756.17 | bwd_inner_microstep: 1756.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-11 05:45:34,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.37 | bwd_microstep: 1379.73 | bwd_inner_microstep: 1379.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-11 05:45:36,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.89 | bwd_microstep: 1345.10 | bwd_inner_microstep: 1345.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2086
[2024-06-11 05:45:37,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.34 | bwd_microstep: 1012.70 | bwd_inner_microstep: 1012.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-11 05:45:39,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.05 | bwd_microstep: 1549.20 | bwd_inner_microstep: 1549.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3686
[2024-06-11 05:45:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.58 | bwd_microstep: 1829.78 | bwd_inner_microstep: 1829.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567
[2024-06-11 05:45:44,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.70 | bwd_microstep: 1493.07 | bwd_inner_microstep: 1493.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733
[2024-06-11 05:45:46,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.44 | bwd_microstep: 1536.24 | bwd_inner_microstep: 1536.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747
[2024-06-11 05:45:48,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.98 | bwd_microstep: 1542.55 | bwd_inner_microstep: 1542.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-11 05:45:55,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.16 | optimizer_step: 6.61
[2024-06-11 05:45:55,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.28 | bwd_microstep: 6010.83 | bwd_inner_microstep: 1755.02 | bwd_allreduce_microstep: 4255.76 | step_microstep: 38.89
[2024-06-11 05:45:55,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16207.45 | bwd: 47903.76 | bwd_inner: 43646.95 | bwd_allreduce: 4256.05 | step: 40.65
{'loss': 1.1844, 'learning_rate': 1.4861989281517386e-07, 'epoch': 0.96}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558
[2024-06-11 05:45:57,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.73 | bwd_microstep: 1287.94 | bwd_inner_microstep: 1287.75 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3858
[2024-06-11 05:45:59,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.18 | bwd_microstep: 1560.11 | bwd_inner_microstep: 1560.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 05:46:01,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.68 | bwd_microstep: 1402.23 | bwd_inner_microstep: 1402.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810
[2024-06-11 05:46:03,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.38 | bwd_microstep: 1648.96 | bwd_inner_microstep: 1648.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 05:46:05,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1388.17 | bwd_inner_microstep: 1388.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-11 05:46:07,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.93 | bwd_microstep: 1158.82 | bwd_inner_microstep: 1158.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-11 05:46:08,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.75 | bwd_microstep: 797.54 | bwd_inner_microstep: 797.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062
[2024-06-11 05:46:09,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.62 | bwd_microstep: 815.82 | bwd_inner_microstep: 815.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3616
[2024-06-11 05:46:10,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.17 | bwd_microstep: 1217.56 | bwd_inner_microstep: 1217.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614
[2024-06-11 05:46:13,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.93 | bwd_microstep: 1508.32 | bwd_inner_microstep: 1508.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3685
[2024-06-11 05:46:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.84 | bwd_microstep: 1586.75 | bwd_inner_microstep: 1586.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2024
[2024-06-11 05:46:16,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.86 | bwd_microstep: 744.09 | bwd_inner_microstep: 744.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-11 05:46:18,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.68 | bwd_microstep: 1281.44 | bwd_inner_microstep: 1281.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520
[2024-06-11 05:46:20,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1575.93 | bwd_inner_microstep: 1575.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512
[2024-06-11 05:46:22,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.43 | bwd_microstep: 1577.91 | bwd_inner_microstep: 1577.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3688
[2024-06-11 05:46:24,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.73 | bwd_microstep: 1690.39 | bwd_inner_microstep: 1690.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3389
[2024-06-11 05:46:26,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.19 | bwd_microstep: 1303.94 | bwd_inner_microstep: 1303.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3631
[2024-06-11 05:46:28,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1573.18 | bwd_inner_microstep: 1573.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3690
[2024-06-11 05:46:30,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.45 | bwd_microstep: 1577.36 | bwd_inner_microstep: 1577.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522
[2024-06-11 05:46:32,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1392.86 | bwd_inner_microstep: 1392.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-11 05:46:34,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.21 | bwd_microstep: 1295.42 | bwd_inner_microstep: 1295.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 05:46:36,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.17 | bwd_microstep: 1657.16 | bwd_inner_microstep: 1657.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471
[2024-06-11 05:46:38,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.46 | bwd_microstep: 1184.18 | bwd_inner_microstep: 1184.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556
[2024-06-11 05:46:40,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1401.36 | bwd_inner_microstep: 1401.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3584
[2024-06-11 05:46:42,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.61 | bwd_microstep: 1239.66 | bwd_inner_microstep: 1239.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3632
[2024-06-11 05:46:43,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.15 | bwd_microstep: 1314.32 | bwd_inner_microstep: 1314.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549
[2024-06-11 05:46:45,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1329.90 | bwd_inner_microstep: 1329.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286
[2024-06-11 05:46:47,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.63 | bwd_microstep: 880.49 | bwd_inner_microstep: 880.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817
[2024-06-11 05:46:48,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1387.28 | bwd_inner_microstep: 1387.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576
[2024-06-11 05:46:50,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.04 | bwd_microstep: 1494.51 | bwd_inner_microstep: 1494.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822
[2024-06-11 05:46:52,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.03 | bwd_microstep: 1393.74 | bwd_inner_microstep: 1393.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1977
[2024-06-11 05:46:57,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.14 | optimizer_step: 6.57
[2024-06-11 05:46:57,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.62 | bwd_microstep: 4464.00 | bwd_inner_microstep: 834.23 | bwd_allreduce_microstep: 3629.71 | step_microstep: 38.92
[2024-06-11 05:46:57,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15885.44 | bwd: 46131.37 | bwd_inner: 42500.60 | bwd_allreduce: 3630.02 | step: 40.49
{'loss': 1.1454, 'learning_rate': 1.4408760593813463e-07, 'epoch': 0.96}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407
[2024-06-11 05:46:59,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.22 | bwd_microstep: 1236.51 | bwd_inner_microstep: 1236.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482
[2024-06-11 05:47:01,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.58 | bwd_microstep: 1186.49 | bwd_inner_microstep: 1186.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3868
[2024-06-11 05:47:03,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.43 | bwd_microstep: 1469.09 | bwd_inner_microstep: 1469.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2304
[2024-06-11 05:47:04,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.18 | bwd_microstep: 908.42 | bwd_inner_microstep: 908.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3758
[2024-06-11 05:47:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.24 | bwd_microstep: 1469.92 | bwd_inner_microstep: 1469.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411
[2024-06-11 05:47:08,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.89 | bwd_microstep: 1182.52 | bwd_inner_microstep: 1182.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-11 05:47:09,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.08 | bwd_microstep: 1293.18 | bwd_inner_microstep: 1293.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407
[2024-06-11 05:47:11,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.43 | bwd_microstep: 1344.76 | bwd_inner_microstep: 1344.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493
[2024-06-11 05:47:13,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.70 | bwd_microstep: 1483.45 | bwd_inner_microstep: 1483.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988
[2024-06-11 05:47:14,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.50 | bwd_microstep: 898.86 | bwd_inner_microstep: 898.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417
[2024-06-11 05:47:16,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.21 | bwd_microstep: 1369.72 | bwd_inner_microstep: 1369.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917
[2024-06-11 05:47:17,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.51 | bwd_microstep: 785.43 | bwd_inner_microstep: 785.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533
[2024-06-11 05:47:20,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.43 | bwd_microstep: 1585.48 | bwd_inner_microstep: 1585.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2723
[2024-06-11 05:47:21,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.09 | bwd_microstep: 1076.36 | bwd_inner_microstep: 1076.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3519
[2024-06-11 05:47:23,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.12 | bwd_microstep: 1241.62 | bwd_inner_microstep: 1241.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3469
[2024-06-11 05:47:25,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.83 | bwd_microstep: 1341.30 | bwd_inner_microstep: 1341.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2185
[2024-06-11 05:47:26,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.81 | bwd_microstep: 805.89 | bwd_inner_microstep: 805.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547
[2024-06-11 05:47:28,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.60 | bwd_microstep: 1296.20 | bwd_inner_microstep: 1296.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-11 05:47:30,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.86 | bwd_microstep: 1661.61 | bwd_inner_microstep: 1661.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790
[2024-06-11 05:47:32,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1554.39 | bwd_inner_microstep: 1554.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 05:47:34,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.64 | bwd_microstep: 1399.28 | bwd_inner_microstep: 1399.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 05:47:36,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1277.18 | bwd_inner_microstep: 1277.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-11 05:47:37,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.04 | bwd_microstep: 975.86 | bwd_inner_microstep: 975.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613
[2024-06-11 05:47:39,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.93 | bwd_microstep: 1609.46 | bwd_inner_microstep: 1609.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568
[2024-06-11 05:47:41,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.67 | bwd_microstep: 1302.24 | bwd_inner_microstep: 1302.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807
[2024-06-11 05:47:43,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.78 | bwd_microstep: 1357.52 | bwd_inner_microstep: 1357.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 05:47:45,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.17 | bwd_microstep: 1665.04 | bwd_inner_microstep: 1665.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3781
[2024-06-11 05:47:47,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1495.02 | bwd_inner_microstep: 1494.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436
[2024-06-11 05:47:49,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.11 | bwd_microstep: 1312.57 | bwd_inner_microstep: 1312.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442
[2024-06-11 05:47:51,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3596
[2024-06-11 05:47:53,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.47 | bwd_microstep: 1705.35 | bwd_inner_microstep: 1705.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807
[2024-06-11 05:47:59,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.23 | optimizer_step: 6.60
[2024-06-11 05:47:59,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 4706.76 | bwd_inner_microstep: 1753.76 | bwd_allreduce_microstep: 2952.94 | step_microstep: 39.50
[2024-06-11 05:47:59,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15810.79 | bwd: 45346.38 | bwd_inner: 42392.52 | bwd_allreduce: 2953.18 | step: 40.93
{'loss': 1.146, 'learning_rate': 1.396252514813279e-07, 'epoch': 0.96}
 96%|█████████▌| 1659/1726 [29:06:24<1:09:09, 61.94s/it]
 96%|█████████▌| 1660/1726 [29:07:27<1:08:30, 62.27s/it]
 96%|█████████▌| 1661/1726 [29:08:32<1:08:10, 62.93s/it]
 96%|█████████▋| 1662/1726 [29:09:34<1:06:56, 62.76s/it]
 96%|█████████▋| 1663/1726 [29:10:35<1:05:29, 62.38s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 05:48:01,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.13 | bwd_microstep: 1342.14 | bwd_inner_microstep: 1341.99 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390
[2024-06-11 05:48:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.11 | bwd_microstep: 1240.52 | bwd_inner_microstep: 1240.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-11 05:48:04,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1343.46 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2302
[2024-06-11 05:48:05,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.54 | bwd_microstep: 974.67 | bwd_inner_microstep: 974.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402
[2024-06-11 05:48:07,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.74 | bwd_microstep: 1277.07 | bwd_inner_microstep: 1277.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 05:48:09,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1281.29 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559
[2024-06-11 05:48:11,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.11 | bwd_microstep: 1298.12 | bwd_inner_microstep: 1298.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2718
[2024-06-11 05:48:12,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.11 | bwd_microstep: 969.39 | bwd_inner_microstep: 969.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 05:48:14,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1248.07 | bwd_inner_microstep: 1248.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404
[2024-06-11 05:48:16,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.85 | bwd_microstep: 1181.59 | bwd_inner_microstep: 1181.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706
[2024-06-11 05:48:18,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.68 | bwd_microstep: 1620.46 | bwd_inner_microstep: 1620.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-11 05:48:19,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.24 | bwd_microstep: 797.03 | bwd_inner_microstep: 797.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494
[2024-06-11 05:48:21,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1483.66 | bwd_inner_microstep: 1483.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385
[2024-06-11 05:48:23,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1334.59 | bwd_inner_microstep: 1334.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3486
[2024-06-11 05:48:25,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.70 | bwd_microstep: 1510.37 | bwd_inner_microstep: 1510.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589
[2024-06-11 05:48:27,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.79 | bwd_microstep: 1315.02 | bwd_inner_microstep: 1314.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3652
[2024-06-11 05:48:28,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.41 | bwd_microstep: 1326.13 | bwd_inner_microstep: 1326.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 05:48:30,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1385.54 | bwd_inner_microstep: 1385.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512
[2024-06-11 05:48:32,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1289.03 | bwd_inner_microstep: 1289.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697
[2024-06-11 05:48:34,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.85 | bwd_microstep: 1631.29 | bwd_inner_microstep: 1631.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-11 05:48:37,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.11 | bwd_microstep: 1528.35 | bwd_inner_microstep: 1528.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-11 05:48:38,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1254.38 | bwd_inner_microstep: 1254.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429
[2024-06-11 05:48:40,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1252.72 | bwd_inner_microstep: 1252.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2170
[2024-06-11 05:48:41,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.89 | bwd_microstep: 856.57 | bwd_inner_microstep: 856.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2180
[2024-06-11 05:48:42,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.79 | bwd_microstep: 764.08 | bwd_inner_microstep: 764.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-11 05:48:44,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.04 | bwd_microstep: 1407.03 | bwd_inner_microstep: 1407.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 05:48:46,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3720
[2024-06-11 05:48:48,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.66 | bwd_microstep: 1467.99 | bwd_inner_microstep: 1467.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-11 05:48:50,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1492.36 | bwd_inner_microstep: 1492.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-11 05:48:52,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.61 | bwd_microstep: 1351.10 | bwd_inner_microstep: 1351.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3749
[2024-06-11 05:48:54,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.55 | bwd_microstep: 1572.58 | bwd_inner_microstep: 1572.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435
[2024-06-11 05:49:00,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.59 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-11 05:49:00,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.30 | bwd_microstep: 4808.40 | bwd_inner_microstep: 1765.40 | bwd_allreduce_microstep: 3042.94 | step_microstep: 39.25
[2024-06-11 05:49:00,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15589.94 | bwd: 44986.04 | bwd_inner: 41942.08 | bwd_allreduce: 3043.24 | step: 40.82
{'loss': 1.1778, 'learning_rate': 1.3523284516113955e-07, 'epoch': 0.96}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 05:49:01,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1242.66 | bwd_inner_microstep: 1242.48 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-11 05:49:02,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.30 | bwd_microstep: 785.90 | bwd_inner_microstep: 785.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 05:49:05,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 1550.59 | bwd_inner_microstep: 1550.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-11 05:49:07,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1548.79 | bwd_inner_microstep: 1548.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-11 05:49:08,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.32 | bwd_microstep: 1246.83 | bwd_inner_microstep: 1246.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 05:49:10,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.52 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 05:49:12,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1383.56 | bwd_inner_microstep: 1383.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904
[2024-06-11 05:49:13,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.63 | bwd_microstep: 684.67 | bwd_inner_microstep: 684.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438
[2024-06-11 05:49:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 1253.57 | bwd_inner_microstep: 1253.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958
[2024-06-11 05:49:16,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.47 | bwd_microstep: 793.61 | bwd_inner_microstep: 793.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2924
[2024-06-11 05:49:17,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.80 | bwd_microstep: 1126.76 | bwd_inner_microstep: 1126.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-11 05:49:19,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.66 | bwd_microstep: 1437.93 | bwd_inner_microstep: 1437.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411
[2024-06-11 05:49:21,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1256.49 | bwd_inner_microstep: 1256.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3835
[2024-06-11 05:49:24,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.36 | bwd_microstep: 1755.02 | bwd_inner_microstep: 1754.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 05:49:26,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.29 | bwd_microstep: 1376.91 | bwd_inner_microstep: 1376.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 05:49:27,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.32 | bwd_microstep: 1374.36 | bwd_inner_microstep: 1374.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-11 05:49:30,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.48 | bwd_microstep: 1517.51 | bwd_inner_microstep: 1517.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2072
[2024-06-11 05:49:31,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.92 | bwd_microstep: 848.28 | bwd_inner_microstep: 848.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-11 05:49:32,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.32 | bwd_microstep: 789.55 | bwd_inner_microstep: 789.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625
[2024-06-11 05:49:34,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.85 | bwd_microstep: 1508.67 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918
[2024-06-11 05:49:35,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.22 | bwd_microstep: 688.56 | bwd_inner_microstep: 688.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-11 05:49:37,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1456.95 | bwd_inner_microstep: 1456.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-11 05:49:39,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1415.02 | bwd_inner_microstep: 1414.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817
[2024-06-11 05:49:41,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.29 | bwd_microstep: 1657.44 | bwd_inner_microstep: 1657.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3468
[2024-06-11 05:49:43,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.90 | bwd_microstep: 1214.70 | bwd_inner_microstep: 1214.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-11 05:49:45,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1452.03 | bwd_inner_microstep: 1452.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2092
[2024-06-11 05:49:46,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.10 | bwd_microstep: 919.11 | bwd_inner_microstep: 919.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598
[2024-06-11 05:49:48,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.20 | bwd_microstep: 1609.19 | bwd_inner_microstep: 1609.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807
[2024-06-11 05:49:50,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.61 | bwd_microstep: 1631.57 | bwd_inner_microstep: 1631.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469
[2024-06-11 05:49:52,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.76 | bwd_microstep: 1405.63 | bwd_inner_microstep: 1405.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601
[2024-06-11 05:49:54,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1414.51 | bwd_inner_microstep: 1414.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 05:50:02,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.30 | optimizer_step: 6.56
[2024-06-11 05:50:02,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 7005.38 | bwd_inner_microstep: 1547.40 | bwd_allreduce_microstep: 5457.90 | step_microstep: 39.94
[2024-06-11 05:50:02,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15379.95 | bwd: 46632.37 | bwd_inner: 41173.40 | bwd_allreduce: 5458.21 | step: 41.73
{'loss': 1.2043, 'learning_rate': 1.30910402447606e-07, 'epoch': 0.96}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468
[2024-06-11 05:50:04,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.12 | bwd_microstep: 1567.09 | bwd_inner_microstep: 1567.01 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401
[2024-06-11 05:50:06,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1341.15 | bwd_inner_microstep: 1341.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555
[2024-06-11 05:50:08,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838
[2024-06-11 05:50:10,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1451.58 | bwd_inner_microstep: 1451.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2924
[2024-06-11 05:50:11,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.44 | bwd_microstep: 1093.81 | bwd_inner_microstep: 1093.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 05:50:14,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.60 | bwd_microstep: 1552.51 | bwd_inner_microstep: 1552.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 05:50:15,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1379.70 | bwd_inner_microstep: 1379.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 05:50:17,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.32 | bwd_microstep: 1381.48 | bwd_inner_microstep: 1381.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 855
[2024-06-11 05:50:18,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.59 | bwd_microstep: 379.79 | bwd_inner_microstep: 379.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1974
[2024-06-11 05:50:19,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.05 | bwd_microstep: 895.02 | bwd_inner_microstep: 894.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507
[2024-06-11 05:50:21,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.03 | bwd_microstep: 1483.96 | bwd_inner_microstep: 1483.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2092
[2024-06-11 05:50:22,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.60 | bwd_microstep: 821.10 | bwd_inner_microstep: 821.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-11 05:50:23,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 790.00 | bwd_inner_microstep: 789.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509
[2024-06-11 05:50:26,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1584.61 | bwd_inner_microstep: 1584.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3479
[2024-06-11 05:50:28,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1446.29 | bwd_inner_microstep: 1446.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609
[2024-06-11 05:50:30,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.30 | bwd_microstep: 1503.88 | bwd_inner_microstep: 1503.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3825
[2024-06-11 05:50:32,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.62 | bwd_microstep: 1719.48 | bwd_inner_microstep: 1719.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-11 05:50:34,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.75 | bwd_microstep: 1420.03 | bwd_inner_microstep: 1420.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3507
[2024-06-11 05:50:36,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.81 | bwd_microstep: 1333.89 | bwd_inner_microstep: 1333.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836
[2024-06-11 05:50:38,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.99 | bwd_microstep: 1662.61 | bwd_inner_microstep: 1662.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 05:50:40,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1251.24 | bwd_inner_microstep: 1251.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299
[2024-06-11 05:50:41,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.82 | bwd_microstep: 976.50 | bwd_inner_microstep: 976.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613
[2024-06-11 05:50:43,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1413.41 | bwd_inner_microstep: 1413.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616
[2024-06-11 05:50:45,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.63 | bwd_microstep: 1311.03 | bwd_inner_microstep: 1311.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269
[2024-06-11 05:50:46,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 970.96 | bwd_inner_microstep: 970.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2548
[2024-06-11 05:50:48,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.25 | bwd_microstep: 1094.22 | bwd_inner_microstep: 1094.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518
[2024-06-11 05:50:49,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.18 | bwd_microstep: 1192.53 | bwd_inner_microstep: 1192.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-11 05:50:52,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1497.80 | bwd_inner_microstep: 1497.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599
[2024-06-11 05:50:54,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.06 | bwd_microstep: 1703.97 | bwd_inner_microstep: 1703.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3436
[2024-06-11 05:50:56,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.01 | bwd_microstep: 1334.04 | bwd_inner_microstep: 1334.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805
[2024-06-11 05:50:58,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.04 | bwd_microstep: 1449.26 | bwd_inner_microstep: 1449.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3773
[2024-06-11 05:51:06,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.13 | optimizer_step: 6.60
[2024-06-11 05:51:06,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.83 | bwd_microstep: 7250.96 | bwd_inner_microstep: 1550.60 | bwd_allreduce_microstep: 5700.30 | step_microstep: 38.30
[2024-06-11 05:51:06,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15602.36 | bwd: 47647.02 | bwd_inner: 41945.74 | bwd_allreduce: 5700.57 | step: 39.84
{'loss': 1.1698, 'learning_rate': 1.2665793856434516e-07, 'epoch': 0.97}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-11 05:51:07,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.12 | bwd_microstep: 1286.92 | bwd_inner_microstep: 1286.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700
[2024-06-11 05:51:09,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.91 | bwd_microstep: 1522.38 | bwd_inner_microstep: 1522.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3838
[2024-06-11 05:51:12,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.12 | bwd_microstep: 1513.20 | bwd_inner_microstep: 1513.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3880
[2024-06-11 05:51:13,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1415.95 | bwd_inner_microstep: 1415.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544
[2024-06-11 05:51:15,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1425.20 | bwd_inner_microstep: 1425.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1887
[2024-06-11 05:51:16,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.57 | bwd_microstep: 681.13 | bwd_inner_microstep: 681.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 05:51:18,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3496
[2024-06-11 05:51:20,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1251.83 | bwd_inner_microstep: 1251.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2940
[2024-06-11 05:51:22,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.49 | bwd_microstep: 1099.53 | bwd_inner_microstep: 1099.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474
[2024-06-11 05:51:23,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.87 | bwd_microstep: 1313.39 | bwd_inner_microstep: 1313.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963
[2024-06-11 05:51:25,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.32 | bwd_microstep: 893.87 | bwd_inner_microstep: 893.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3561
[2024-06-11 05:51:27,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.34 | bwd_microstep: 1694.93 | bwd_inner_microstep: 1694.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3664
[2024-06-11 05:51:29,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1483.17 | bwd_inner_microstep: 1483.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517
[2024-06-11 05:51:31,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.67 | bwd_microstep: 1584.30 | bwd_inner_microstep: 1584.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440
[2024-06-11 05:51:33,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.76 | bwd_microstep: 1283.21 | bwd_inner_microstep: 1283.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3529
[2024-06-11 05:51:35,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.06 | bwd_microstep: 1449.07 | bwd_inner_microstep: 1449.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3560
[2024-06-11 05:51:37,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.91 | bwd_microstep: 1334.70 | bwd_inner_microstep: 1334.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508
[2024-06-11 05:51:39,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1487.18 | bwd_inner_microstep: 1487.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829
[2024-06-11 05:51:41,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.49 | bwd_microstep: 1461.94 | bwd_inner_microstep: 1461.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-11 05:51:42,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.95 | bwd_microstep: 977.13 | bwd_inner_microstep: 977.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672
[2024-06-11 05:51:44,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.82 | bwd_microstep: 1527.39 | bwd_inner_microstep: 1527.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606
[2024-06-11 05:51:46,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1513.40 | bwd_inner_microstep: 1513.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936
[2024-06-11 05:51:47,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.69 | bwd_microstep: 729.73 | bwd_inner_microstep: 729.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3550
[2024-06-11 05:51:50,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.85 | bwd_microstep: 1567.01 | bwd_inner_microstep: 1566.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1999
[2024-06-11 05:51:51,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.56 | bwd_microstep: 836.12 | bwd_inner_microstep: 836.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 615
[2024-06-11 05:51:51,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.74 | bwd_microstep: 259.99 | bwd_inner_microstep: 259.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431
[2024-06-11 05:51:53,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.20 | bwd_microstep: 1281.63 | bwd_inner_microstep: 1281.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-11 05:51:55,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.19 | bwd_microstep: 1487.12 | bwd_inner_microstep: 1487.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-11 05:51:57,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.39 | bwd_microstep: 1450.68 | bwd_inner_microstep: 1450.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630
[2024-06-11 05:51:59,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.35 | bwd_microstep: 1405.26 | bwd_inner_microstep: 1405.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3766
[2024-06-11 05:52:01,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.08 | bwd_microstep: 1566.22 | bwd_inner_microstep: 1566.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3615
[2024-06-11 05:52:09,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.31 | optimizer_step: 6.60
[2024-06-11 05:52:09,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.69 | bwd_microstep: 7289.32 | bwd_inner_microstep: 1660.24 | bwd_allreduce_microstep: 5629.00 | step_microstep: 39.99
[2024-06-11 05:52:09,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15589.89 | bwd: 47460.45 | bwd_inner: 41830.51 | bwd_allreduce: 5629.25 | step: 41.55
{'loss': 1.156, 'learning_rate': 1.224754684885099e-07, 'epoch': 0.97}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052
[2024-06-11 05:52:10,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.81 | bwd_microstep: 907.22 | bwd_inner_microstep: 907.08 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562
[2024-06-11 05:52:12,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1495.35 | bwd_inner_microstep: 1495.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861
[2024-06-11 05:52:14,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.35 | bwd_microstep: 1557.98 | bwd_inner_microstep: 1557.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496
[2024-06-11 05:52:16,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.67 | bwd_microstep: 1483.30 | bwd_inner_microstep: 1483.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545
[2024-06-11 05:52:18,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.26 | bwd_microstep: 1395.77 | bwd_inner_microstep: 1395.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2233
[2024-06-11 05:52:20,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.46 | bwd_microstep: 864.02 | bwd_inner_microstep: 863.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 05:52:21,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1281.33 | bwd_inner_microstep: 1281.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968
[2024-06-11 05:52:23,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.09 | bwd_microstep: 892.11 | bwd_inner_microstep: 892.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 05:52:25,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.73 | bwd_microstep: 1411.30 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3729
[2024-06-11 05:52:27,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.69 | bwd_microstep: 1732.59 | bwd_inner_microstep: 1732.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4017
[2024-06-11 05:52:29,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.76 | bwd_microstep: 1704.33 | bwd_inner_microstep: 1704.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977
[2024-06-11 05:52:30,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.40 | bwd_microstep: 798.00 | bwd_inner_microstep: 797.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 05:52:32,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.18 | bwd_microstep: 1383.51 | bwd_inner_microstep: 1383.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1865
[2024-06-11 05:52:33,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.83 | bwd_microstep: 709.03 | bwd_inner_microstep: 709.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869
[2024-06-11 05:52:34,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.92 | bwd_microstep: 677.93 | bwd_inner_microstep: 677.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402
[2024-06-11 05:52:36,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1436.07 | bwd_inner_microstep: 1436.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640
[2024-06-11 05:52:38,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1314.96 | bwd_inner_microstep: 1314.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 05:52:40,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.24 | bwd_microstep: 1489.46 | bwd_inner_microstep: 1489.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530
[2024-06-11 05:52:42,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1293.27 | bwd_inner_microstep: 1293.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3828
[2024-06-11 05:52:44,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.76 | bwd_microstep: 1511.21 | bwd_inner_microstep: 1511.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3539
[2024-06-11 05:52:46,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.97 | bwd_microstep: 1594.31 | bwd_inner_microstep: 1594.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3482
[2024-06-11 05:52:48,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.02 | bwd_microstep: 1441.32 | bwd_inner_microstep: 1441.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3553
[2024-06-11 05:53:07,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1410.96 | bwd_inner_microstep: 1410.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-11 05:53:09,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.29 | bwd_microstep: 1651.74 | bwd_inner_microstep: 1651.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476
[2024-06-11 05:53:11,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.67 | bwd_microstep: 1374.88 | bwd_inner_microstep: 1374.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719
[2024-06-11 05:53:13,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.37 | bwd_microstep: 1433.35 | bwd_inner_microstep: 1433.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3777
[2024-06-11 05:53:15,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.89 | bwd_microstep: 1347.91 | bwd_inner_microstep: 1347.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561
[2024-06-11 05:53:17,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1295.41 | bwd_inner_microstep: 1295.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822
[2024-06-11 05:53:19,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.02 | bwd_microstep: 1555.41 | bwd_inner_microstep: 1555.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616
[2024-06-11 05:53:21,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.62 | bwd_microstep: 1307.04 | bwd_inner_microstep: 1307.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3389
[2024-06-11 05:53:23,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1396.42 | bwd_inner_microstep: 1396.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3388
[2024-06-11 05:53:25,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.06 | optimizer_step: 6.62
[2024-06-11 05:53:25,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1468.34 | bwd_inner_microstep: 1460.28 | bwd_allreduce_microstep: 8.01 | step_microstep: 38.76
[2024-06-11 05:53:25,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15962.63 | bwd: 42615.84 | bwd_inner: 42606.82 | bwd_allreduce: 8.29 | step: 40.35
 96%|█████████▋| 1663/1726 [29:10:35<1:05:29, 62.38s/it]
 96%|█████████▋| 1664/1726 [29:11:36<1:04:00, 61.94s/it]
 96%|█████████▋| 1665/1726 [29:12:39<1:03:05, 62.06s/it]
 97%|█████████▋| 1666/1726 [29:13:42<1:02:31, 62.52s/it]
 97%|█████████▋| 1667/1726 [29:14:46<1:01:44, 62.78s/it]
 97%|█████████▋| 1668/1726 [29:16:02<1:04:29, 66.71s/it]
{'loss': 1.1779, 'learning_rate': 1.1836300695074354e-07, 'epoch': 0.97}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4476
[2024-06-11 05:53:27,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.25 | bwd_microstep: 1812.88 | bwd_inner_microstep: 1812.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3923
[2024-06-11 05:53:29,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.30 | bwd_microstep: 1423.19 | bwd_inner_microstep: 1423.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 05:53:31,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.21 | bwd_microstep: 1343.19 | bwd_inner_microstep: 1343.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400
[2024-06-11 05:53:33,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.18 | bwd_microstep: 1181.85 | bwd_inner_microstep: 1181.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3600
[2024-06-11 05:53:35,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1438.42 | bwd_inner_microstep: 1438.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-11 05:53:37,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.68 | bwd_microstep: 1431.57 | bwd_inner_microstep: 1431.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-11 05:53:38,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.13 | bwd_microstep: 1150.08 | bwd_inner_microstep: 1150.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3598
[2024-06-11 05:53:40,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.30 | bwd_microstep: 1310.50 | bwd_inner_microstep: 1310.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-11 05:53:42,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.96 | bwd_microstep: 1531.34 | bwd_inner_microstep: 1531.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570
[2024-06-11 05:53:44,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1395.51 | bwd_inner_microstep: 1395.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502
[2024-06-11 05:53:46,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1585.82 | bwd_inner_microstep: 1585.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3492
[2024-06-11 05:53:49,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1534.58 | bwd_inner_microstep: 1534.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506
[2024-06-11 05:53:50,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.76 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3638
[2024-06-11 05:53:53,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.81 | bwd_microstep: 1621.46 | bwd_inner_microstep: 1621.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637
[2024-06-11 05:53:55,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.28 | bwd_microstep: 1514.45 | bwd_inner_microstep: 1514.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690
[2024-06-11 05:53:57,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1330.57 | bwd_inner_microstep: 1330.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3638
[2024-06-11 05:53:58,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.79 | bwd_microstep: 1280.86 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631
[2024-06-11 05:54:00,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1416.81 | bwd_inner_microstep: 1416.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592
[2024-06-11 05:54:03,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.64 | bwd_microstep: 1606.85 | bwd_inner_microstep: 1606.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820
[2024-06-11 05:54:05,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.79 | bwd_microstep: 1554.69 | bwd_inner_microstep: 1554.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1974
[2024-06-11 05:54:06,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.51 | bwd_microstep: 703.42 | bwd_inner_microstep: 703.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-11 05:54:08,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.51 | bwd_microstep: 1537.51 | bwd_inner_microstep: 1537.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-11 05:54:10,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.44 | bwd_microstep: 1310.23 | bwd_inner_microstep: 1310.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541
[2024-06-11 05:54:12,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.57 | bwd_microstep: 1497.42 | bwd_inner_microstep: 1497.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-11 05:54:13,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1353.27 | bwd_inner_microstep: 1353.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-11 05:54:16,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1496.97 | bwd_inner_microstep: 1496.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-11 05:54:17,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.54 | bwd_microstep: 1395.33 | bwd_inner_microstep: 1395.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 05:54:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1394.39 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 05:54:21,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1474.04 | bwd_inner_microstep: 1474.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-11 05:54:23,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.76 | bwd_microstep: 1251.08 | bwd_inner_microstep: 1251.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3594
[2024-06-11 05:54:25,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.60 | bwd_microstep: 1363.63 | bwd_inner_microstep: 1363.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3795
[2024-06-11 05:56:00,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.30 | optimizer_step: 6.58
[2024-06-11 05:56:00,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.37 | bwd_microstep: 94215.50 | bwd_inner_microstep: 1981.51 | bwd_allreduce_microstep: 92233.92 | step_microstep: 40.01
[2024-06-11 05:56:00,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16960.24 | bwd: 137838.80 | bwd_inner: 45603.96 | bwd_allreduce: 92234.16 | step: 41.46
{'loss': 1.1635, 'learning_rate': 1.1432056843511342e-07, 'epoch': 0.97}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472
[2024-06-11 05:56:02,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1397.28 | bwd_inner_microstep: 1397.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 05:56:04,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.79 | bwd_microstep: 1369.63 | bwd_inner_microstep: 1369.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3883
[2024-06-11 05:56:06,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.24 | bwd_microstep: 1671.98 | bwd_inner_microstep: 1671.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832
[2024-06-11 05:56:08,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.06 | bwd_microstep: 1446.25 | bwd_inner_microstep: 1446.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821
[2024-06-11 05:56:10,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.16 | bwd_microstep: 1444.37 | bwd_inner_microstep: 1444.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500
[2024-06-11 05:56:12,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.12 | bwd_microstep: 1280.78 | bwd_inner_microstep: 1280.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428
[2024-06-11 05:56:14,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1341.28 | bwd_inner_microstep: 1341.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474
[2024-06-11 05:56:15,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.82 | bwd_microstep: 1187.32 | bwd_inner_microstep: 1187.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025
[2024-06-11 05:56:17,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.32 | bwd_microstep: 808.88 | bwd_inner_microstep: 808.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406
[2024-06-11 05:56:18,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.29 | bwd_microstep: 1147.71 | bwd_inner_microstep: 1147.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2167
[2024-06-11 05:56:19,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.80 | bwd_microstep: 788.55 | bwd_inner_microstep: 788.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3812
[2024-06-11 05:56:21,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.96 | bwd_microstep: 1622.19 | bwd_inner_microstep: 1622.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504
[2024-06-11 05:56:23,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 1482.31 | bwd_inner_microstep: 1482.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3938
[2024-06-11 05:56:26,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.29 | bwd_microstep: 1758.53 | bwd_inner_microstep: 1758.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 05:56:28,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2103
[2024-06-11 05:56:29,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.01 | bwd_microstep: 952.84 | bwd_inner_microstep: 952.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667
[2024-06-11 05:56:31,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.16 | bwd_microstep: 1425.34 | bwd_inner_microstep: 1425.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922
[2024-06-11 05:56:32,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 696.58 | bwd_inner_microstep: 696.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-11 05:56:34,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.15 | bwd_microstep: 1279.67 | bwd_inner_microstep: 1279.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3573
[2024-06-11 05:56:36,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.76 | bwd_microstep: 1235.42 | bwd_inner_microstep: 1235.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-11 05:56:37,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1394.41 | bwd_inner_microstep: 1394.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465
[2024-06-11 05:56:39,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.08 | bwd_microstep: 1212.76 | bwd_inner_microstep: 1212.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3824
[2024-06-11 05:56:41,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1386.89 | bwd_inner_microstep: 1386.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-11 05:56:43,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.38 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1284.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3549
[2024-06-11 05:56:45,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.74 | bwd_microstep: 1420.76 | bwd_inner_microstep: 1420.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3684
[2024-06-11 05:56:47,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.38 | bwd_microstep: 1588.80 | bwd_inner_microstep: 1588.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3798
[2024-06-11 05:56:49,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.74 | bwd_microstep: 1413.93 | bwd_inner_microstep: 1413.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431
[2024-06-11 05:56:51,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1254.10 | bwd_inner_microstep: 1254.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2481
[2024-06-11 05:56:52,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.54 | bwd_microstep: 1062.89 | bwd_inner_microstep: 1062.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3576
[2024-06-11 05:56:54,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.69 | bwd_microstep: 1528.73 | bwd_inner_microstep: 1528.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465
[2024-06-11 05:56:56,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.62 | bwd_microstep: 1570.09 | bwd_inner_microstep: 1570.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-11 05:57:01,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-11 05:57:01,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.39 | bwd_microstep: 4313.79 | bwd_inner_microstep: 1813.57 | bwd_allreduce_microstep: 2500.16 | step_microstep: 38.94
[2024-06-11 05:57:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15918.39 | bwd: 45152.17 | bwd_inner: 42651.11 | bwd_allreduce: 2500.39 | step: 40.45
{'loss': 1.2243, 'learning_rate': 1.1034816717906405e-07, 'epoch': 0.97}
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3476
[2024-06-11 05:57:04,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.50 | bwd_microstep: 1534.67 | bwd_inner_microstep: 1534.58 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3871
[2024-06-11 05:57:06,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.82 | bwd_microstep: 1628.08 | bwd_inner_microstep: 1628.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-11 05:57:08,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1492.28 | bwd_inner_microstep: 1492.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3915
[2024-06-11 05:57:10,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.89 | bwd_microstep: 1688.56 | bwd_inner_microstep: 1688.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516
[2024-06-11 05:57:12,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1387.07 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 05:57:14,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.87 | bwd_microstep: 1246.77 | bwd_inner_microstep: 1246.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-11 05:57:15,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.12 | bwd_microstep: 792.64 | bwd_inner_microstep: 792.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501
[2024-06-11 05:57:17,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.98 | bwd_microstep: 1388.28 | bwd_inner_microstep: 1388.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612
[2024-06-11 05:57:19,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.29 | bwd_microstep: 1310.11 | bwd_inner_microstep: 1310.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466
[2024-06-11 05:57:20,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1277.80 | bwd_inner_microstep: 1277.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447
[2024-06-11 05:57:22,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.52 | bwd_microstep: 1281.14 | bwd_inner_microstep: 1281.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 05:57:24,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.31 | bwd_microstep: 1279.29 | bwd_inner_microstep: 1279.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4812
[2024-06-11 05:57:26,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 692.65 | bwd_microstep: 1830.16 | bwd_inner_microstep: 1830.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1998
[2024-06-11 05:57:28,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.53 | bwd_microstep: 892.92 | bwd_inner_microstep: 892.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-11 05:57:30,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.20 | bwd_microstep: 1347.49 | bwd_inner_microstep: 1347.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2143
[2024-06-11 05:57:31,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.17 | bwd_microstep: 957.07 | bwd_inner_microstep: 957.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2169
[2024-06-11 05:57:32,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.95 | bwd_microstep: 994.92 | bwd_inner_microstep: 994.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 05:57:34,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3672
[2024-06-11 05:57:36,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.31 | bwd_microstep: 1356.25 | bwd_inner_microstep: 1356.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618
[2024-06-11 05:57:38,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1509.75 | bwd_inner_microstep: 1509.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-11 05:57:40,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.02 | bwd_microstep: 1407.88 | bwd_inner_microstep: 1407.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3820
[2024-06-11 05:57:42,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.88 | bwd_microstep: 1257.06 | bwd_inner_microstep: 1257.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761
[2024-06-11 05:57:44,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1439.60 | bwd_inner_microstep: 1439.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486
[2024-06-11 05:57:45,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.61 | bwd_microstep: 1186.31 | bwd_inner_microstep: 1186.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591
[2024-06-11 05:57:47,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.79 | bwd_microstep: 1307.28 | bwd_inner_microstep: 1307.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721
[2024-06-11 05:57:49,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.77 | bwd_microstep: 1534.01 | bwd_inner_microstep: 1533.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3556
[2024-06-11 05:57:51,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.88 | bwd_microstep: 1261.91 | bwd_inner_microstep: 1261.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915
[2024-06-11 05:57:52,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.26 | bwd_microstep: 686.15 | bwd_inner_microstep: 686.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3440
[2024-06-11 05:57:54,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.07 | bwd_microstep: 1471.28 | bwd_inner_microstep: 1471.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 05:57:56,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.52 | bwd_microstep: 1644.59 | bwd_inner_microstep: 1644.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815
[2024-06-11 05:57:58,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.55 | bwd_microstep: 1583.49 | bwd_inner_microstep: 1583.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 05:58:04,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.24 | optimizer_step: 6.59
[2024-06-11 05:58:04,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 4731.30 | bwd_inner_microstep: 1576.09 | bwd_allreduce_microstep: 3155.12 | step_microstep: 39.63
[2024-06-11 05:58:04,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16025.41 | bwd: 46043.89 | bwd_inner: 42887.75 | bwd_allreduce: 3155.41 | step: 41.14
{'loss': 1.151, 'learning_rate': 1.0644581717337288e-07, 'epoch': 0.97}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 5015
[2024-06-11 05:58:06,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 675.36 | bwd_microstep: 1786.81 | bwd_inner_microstep: 1786.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3481
[2024-06-11 05:58:08,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.25 | bwd_microstep: 1214.84 | bwd_inner_microstep: 1214.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3870
[2024-06-11 05:58:10,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.25 | bwd_microstep: 1463.10 | bwd_inner_microstep: 1463.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-11 05:58:12,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.07 | bwd_microstep: 1481.34 | bwd_inner_microstep: 1481.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806
[2024-06-11 05:58:14,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.42 | bwd_microstep: 1512.27 | bwd_inner_microstep: 1512.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 05:58:16,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1281.87 | bwd_inner_microstep: 1281.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3733
[2024-06-11 05:58:18,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1496.05 | bwd_inner_microstep: 1496.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959
[2024-06-11 05:58:19,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.35 | bwd_microstep: 795.15 | bwd_inner_microstep: 795.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711
[2024-06-11 05:58:21,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1630.86 | bwd_inner_microstep: 1630.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915
[2024-06-11 05:58:22,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.23 | bwd_microstep: 687.37 | bwd_inner_microstep: 687.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 05:58:24,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.69 | bwd_microstep: 1391.57 | bwd_inner_microstep: 1391.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3627
[2024-06-11 05:58:26,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1475.26 | bwd_inner_microstep: 1475.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2943
[2024-06-11 05:58:28,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.14 | bwd_microstep: 1194.38 | bwd_inner_microstep: 1194.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2504
[2024-06-11 05:58:29,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.06 | bwd_microstep: 1056.84 | bwd_inner_microstep: 1056.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2743
[2024-06-11 05:58:31,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.34 | bwd_microstep: 1046.06 | bwd_inner_microstep: 1046.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389
[2024-06-11 05:58:32,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.24 | bwd_microstep: 1245.79 | bwd_inner_microstep: 1245.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530
[2024-06-11 05:58:35,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1490.74 | bwd_inner_microstep: 1490.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514
[2024-06-11 05:58:37,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.08 | bwd_microstep: 1486.58 | bwd_inner_microstep: 1486.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587
[2024-06-11 05:58:39,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1411.39 | bwd_inner_microstep: 1411.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2290
[2024-06-11 05:58:40,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.48 | bwd_microstep: 940.36 | bwd_inner_microstep: 940.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554
[2024-06-11 05:58:42,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1502.89 | bwd_inner_microstep: 1502.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509
[2024-06-11 05:58:44,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1355.24 | bwd_inner_microstep: 1355.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2097
[2024-06-11 05:58:45,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.42 | bwd_microstep: 827.74 | bwd_inner_microstep: 827.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473
[2024-06-11 05:58:47,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1376.67 | bwd_inner_microstep: 1376.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832
[2024-06-11 05:58:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.13 | bwd_microstep: 1493.33 | bwd_inner_microstep: 1493.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396
[2024-06-11 05:58:51,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1247.07 | bwd_inner_microstep: 1247.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3873
[2024-06-11 05:58:53,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.53 | bwd_microstep: 1580.02 | bwd_inner_microstep: 1580.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 05:58:55,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.22 | bwd_microstep: 1546.46 | bwd_inner_microstep: 1546.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571
[2024-06-11 05:58:57,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1592.77 | bwd_inner_microstep: 1592.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782
[2024-06-11 05:58:59,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.83 | bwd_microstep: 1639.06 | bwd_inner_microstep: 1639.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-11 05:59:01,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1487.94 | bwd_inner_microstep: 1487.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-11 05:59:04,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.11 | optimizer_step: 6.61
[2024-06-11 05:59:04,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.85 | bwd_microstep: 2357.32 | bwd_inner_microstep: 1490.01 | bwd_allreduce_microstep: 867.25 | step_microstep: 38.71
[2024-06-11 05:59:04,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16123.65 | bwd: 44095.15 | bwd_inner: 43226.99 | bwd_allreduce: 867.48 | step: 40.18
{'loss': 1.1802, 'learning_rate': 1.0261353216209691e-07, 'epoch': 0.97}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408
[2024-06-11 05:59:06,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.46 | bwd_microstep: 1241.75 | bwd_inner_microstep: 1241.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 05:59:08,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.86 | bwd_microstep: 1381.27 | bwd_inner_microstep: 1381.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934
[2024-06-11 05:59:09,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.01 | bwd_microstep: 694.90 | bwd_inner_microstep: 694.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3857
[2024-06-11 05:59:11,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1362.87 | bwd_inner_microstep: 1362.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871
[2024-06-11 05:59:12,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.04 | bwd_microstep: 708.54 | bwd_inner_microstep: 708.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467
[2024-06-11 05:59:14,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1407.64 | bwd_inner_microstep: 1407.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-11 05:59:16,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1252.44 | bwd_inner_microstep: 1252.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-11 05:59:17,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1346.57 | bwd_inner_microstep: 1346.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3430
[2024-06-11 05:59:19,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.52 | bwd_microstep: 1157.85 | bwd_inner_microstep: 1157.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4012
[2024-06-11 05:59:21,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.62 | bwd_microstep: 1443.65 | bwd_inner_microstep: 1443.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 05:59:23,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.86 | bwd_microstep: 1488.05 | bwd_inner_microstep: 1488.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3538
[2024-06-11 05:59:25,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.40 | bwd_microstep: 1564.20 | bwd_inner_microstep: 1564.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661
[2024-06-11 05:59:27,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.24 | bwd_microstep: 1571.95 | bwd_inner_microstep: 1571.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-11 05:59:30,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.84 | bwd_microstep: 1586.75 | bwd_inner_microstep: 1586.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681
[2024-06-11 05:59:32,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.53 | bwd_microstep: 1614.75 | bwd_inner_microstep: 1614.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 05:59:34,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1353.54 | bwd_inner_microstep: 1353.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527
[2024-06-11 05:59:36,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1422.55 | bwd_inner_microstep: 1422.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1969
[2024-06-11 05:59:37,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.37 | bwd_microstep: 831.43 | bwd_inner_microstep: 831.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3532
[2024-06-11 05:59:39,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.47 | bwd_microstep: 1450.14 | bwd_inner_microstep: 1450.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2102
[2024-06-11 05:59:40,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.78 | bwd_microstep: 922.26 | bwd_inner_microstep: 922.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 05:59:42,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1557.13 | bwd_inner_microstep: 1557.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3556
[2024-06-11 05:59:44,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.46 | bwd_microstep: 1568.55 | bwd_inner_microstep: 1568.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716
[2024-06-11 05:59:46,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1535.30 | bwd_inner_microstep: 1535.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718
[2024-06-11 05:59:48,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.65 | bwd_microstep: 1443.95 | bwd_inner_microstep: 1443.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-11 05:59:50,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.13 | bwd_microstep: 1394.60 | bwd_inner_microstep: 1394.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2018
[2024-06-11 05:59:51,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.81 | bwd_microstep: 716.91 | bwd_inner_microstep: 716.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3717
[2024-06-11 05:59:53,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1337.31 | bwd_inner_microstep: 1337.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559
[2024-06-11 05:59:55,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.54 | bwd_microstep: 1332.41 | bwd_inner_microstep: 1332.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271
[2024-06-11 05:59:56,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.66 | bwd_microstep: 970.95 | bwd_inner_microstep: 970.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595
[2024-06-11 05:59:58,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1308.56 | bwd_inner_microstep: 1308.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3587
[2024-06-11 06:00:00,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1630.82 | bwd_inner_microstep: 1630.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270
[2024-06-11 06:00:06,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.60
[2024-06-11 06:00:06,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.26 | bwd_microstep: 4721.70 | bwd_inner_microstep: 990.28 | bwd_allreduce_microstep: 3731.36 | step_microstep: 38.06
[2024-06-11 06:00:06,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15550.52 | bwd: 45321.30 | bwd_inner: 41589.02 | bwd_allreduce: 3731.60 | step: 39.59


 97%|█████████▋| 1668/1726 [29:16:02<1:04:29, 66.71s/it]
 97%|█████████▋| 1669/1726 [29:18:37<1:28:34, 93.24s/it]
 97%|█████████▋| 1670/1726 [29:19:38<1:18:06, 83.69s/it]
 97%|█████████▋| 1671/1726 [29:20:41<1:10:51, 77.31s/it]


 97%|█████████▋| 1671/1726 [29:20:41<1:10:51, 77.31s/it]
 97%|█████████▋| 1672/1726 [29:21:41<1:05:03, 72.28s/it]


 97%|█████████▋| 1672/1726 [29:21:41<1:05:03, 72.28s/it]
 97%|█████████▋| 1673/1726 [29:22:42<1:00:54, 68.96s/it]
{'loss': 1.1712, 'learning_rate': 9.885132564252386e-08, 'epoch': 0.97}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 06:00:07,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.79 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.62 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404
[2024-06-11 06:00:09,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.98 | bwd_microstep: 1146.02 | bwd_inner_microstep: 1146.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850
[2024-06-11 06:00:11,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.56 | bwd_microstep: 1468.16 | bwd_inner_microstep: 1468.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3848
[2024-06-11 06:00:13,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1464.80 | bwd_inner_microstep: 1464.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 06:00:15,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.02 | bwd_microstep: 1279.23 | bwd_inner_microstep: 1279.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-11 06:00:17,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.26 | bwd_microstep: 1403.53 | bwd_inner_microstep: 1403.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1997
[2024-06-11 06:00:18,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.56 | bwd_microstep: 710.45 | bwd_inner_microstep: 710.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414
[2024-06-11 06:00:19,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.18 | bwd_microstep: 1153.33 | bwd_inner_microstep: 1153.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447
[2024-06-11 06:00:21,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1285.02 | bwd_inner_microstep: 1284.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1915
[2024-06-11 06:00:22,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.65 | bwd_microstep: 786.63 | bwd_inner_microstep: 786.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510
[2024-06-11 06:00:24,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1485.19 | bwd_inner_microstep: 1485.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683
[2024-06-11 06:00:27,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.84 | bwd_microstep: 1722.54 | bwd_inner_microstep: 1722.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3929
[2024-06-11 06:00:29,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.45 | bwd_microstep: 1789.29 | bwd_inner_microstep: 1789.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647
[2024-06-11 06:00:31,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1510.69 | bwd_inner_microstep: 1510.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 656
[2024-06-11 06:00:32,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.89 | bwd_microstep: 277.53 | bwd_inner_microstep: 277.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3633
[2024-06-11 06:00:33,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1316.96 | bwd_inner_microstep: 1316.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628
[2024-06-11 06:00:35,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.33 | bwd_microstep: 1409.62 | bwd_inner_microstep: 1409.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3509
[2024-06-11 06:00:37,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.44 | bwd_microstep: 1192.15 | bwd_inner_microstep: 1192.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753
[2024-06-11 06:00:39,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.33 | bwd_microstep: 1643.97 | bwd_inner_microstep: 1643.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629
[2024-06-11 06:00:41,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1512.77 | bwd_inner_microstep: 1512.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546
[2024-06-11 06:00:43,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1494.51 | bwd_inner_microstep: 1494.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726
[2024-06-11 06:00:46,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.25 | bwd_microstep: 1638.90 | bwd_inner_microstep: 1638.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2112
[2024-06-11 06:00:47,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.30 | bwd_microstep: 925.59 | bwd_inner_microstep: 925.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001
[2024-06-11 06:00:48,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.82 | bwd_microstep: 800.91 | bwd_inner_microstep: 800.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757
[2024-06-11 06:00:50,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.94 | bwd_microstep: 1547.62 | bwd_inner_microstep: 1547.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606
[2024-06-11 06:00:52,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.53 | bwd_microstep: 1613.05 | bwd_inner_microstep: 1609.93 | bwd_allreduce_microstep: 3.05 | step_microstep: 0.15
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1949
[2024-06-11 06:00:53,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.38 | bwd_microstep: 776.95 | bwd_inner_microstep: 776.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3098
[2024-06-11 06:00:55,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.02 | bwd_microstep: 1057.55 | bwd_inner_microstep: 1057.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3265
[2024-06-11 06:00:57,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1481.76 | bwd_inner_microstep: 1481.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816
[2024-06-11 06:00:59,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.56 | bwd_microstep: 1455.62 | bwd_inner_microstep: 1455.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3592
[2024-06-11 06:01:01,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.28 | bwd_microstep: 1516.04 | bwd_inner_microstep: 1516.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924
[2024-06-11 06:01:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.35 | optimizer_step: 6.58
[2024-06-11 06:01:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.83 | bwd_microstep: 3669.94 | bwd_inner_microstep: 1033.70 | bwd_allreduce_microstep: 2636.18 | step_microstep: 40.08
[2024-06-11 06:01:05,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15324.45 | bwd: 43879.14 | bwd_inner: 41238.82 | bwd_allreduce: 2639.55 | step: 41.99
{'loss': 1.0958, 'learning_rate': 9.51592108651278e-08, 'epoch': 0.97}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481
[2024-06-11 06:01:07,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.93 | bwd_microstep: 1468.87 | bwd_inner_microstep: 1468.79 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406
[2024-06-11 06:01:09,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.18 | bwd_microstep: 1277.24 | bwd_inner_microstep: 1277.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400
[2024-06-11 06:01:11,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.76 | bwd_microstep: 1369.64 | bwd_inner_microstep: 1369.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830
[2024-06-11 06:01:13,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1558.87 | bwd_inner_microstep: 1558.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796
[2024-06-11 06:01:15,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.45 | bwd_microstep: 1651.62 | bwd_inner_microstep: 1651.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 06:01:17,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1346.72 | bwd_inner_microstep: 1346.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-11 06:01:19,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.18 | bwd_microstep: 1633.13 | bwd_inner_microstep: 1633.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518
[2024-06-11 06:01:21,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1289.13 | bwd_inner_microstep: 1289.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403
[2024-06-11 06:01:23,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.74 | bwd_microstep: 1278.86 | bwd_inner_microstep: 1278.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569
[2024-06-11 06:01:25,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.11 | bwd_microstep: 1404.68 | bwd_inner_microstep: 1404.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-11 06:01:26,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.57 | bwd_microstep: 1156.65 | bwd_inner_microstep: 1156.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983
[2024-06-11 06:01:28,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.65 | bwd_microstep: 798.07 | bwd_inner_microstep: 798.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-11 06:01:30,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1529.31 | bwd_inner_microstep: 1529.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666
[2024-06-11 06:01:32,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.89 | bwd_microstep: 1612.58 | bwd_inner_microstep: 1612.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3659
[2024-06-11 06:01:34,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.57 | bwd_microstep: 1549.67 | bwd_inner_microstep: 1549.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688
[2024-06-11 06:01:36,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1623.86 | bwd_inner_microstep: 1623.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423
[2024-06-11 06:01:38,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.02 | bwd_microstep: 1307.73 | bwd_inner_microstep: 1307.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3635
[2024-06-11 06:01:40,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.67 | bwd_microstep: 1438.54 | bwd_inner_microstep: 1438.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447
[2024-06-11 06:01:42,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.38 | bwd_microstep: 1158.00 | bwd_inner_microstep: 1157.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3626
[2024-06-11 06:01:44,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.49 | bwd_microstep: 1654.81 | bwd_inner_microstep: 1654.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030
[2024-06-11 06:01:45,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.53 | bwd_microstep: 904.80 | bwd_inner_microstep: 904.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-11 06:01:47,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.21 | bwd_microstep: 1647.18 | bwd_inner_microstep: 1647.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-11 06:01:49,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.26 | bwd_microstep: 1427.01 | bwd_inner_microstep: 1426.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024
[2024-06-11 06:01:51,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.20 | bwd_microstep: 808.12 | bwd_inner_microstep: 808.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3609
[2024-06-11 06:01:52,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.55 | bwd_microstep: 1375.60 | bwd_inner_microstep: 1375.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-11 06:01:54,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1313.88 | bwd_inner_microstep: 1313.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473
[2024-06-11 06:01:56,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.90 | bwd_microstep: 1313.16 | bwd_inner_microstep: 1313.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2090
[2024-06-11 06:01:57,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.93 | bwd_microstep: 757.65 | bwd_inner_microstep: 757.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3816
[2024-06-11 06:01:59,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.45 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-11 06:02:01,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.68 | bwd_microstep: 1545.14 | bwd_inner_microstep: 1545.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389
[2024-06-11 06:02:03,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.89 | bwd_microstep: 1339.40 | bwd_inner_microstep: 1339.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3564
[2024-06-11 06:02:06,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.59
[2024-06-11 06:02:06,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.16 | bwd_microstep: 2832.80 | bwd_inner_microstep: 1605.58 | bwd_allreduce_microstep: 1227.14 | step_microstep: 39.43
[2024-06-11 06:02:06,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16192.02 | bwd: 44758.35 | bwd_inner: 43530.22 | bwd_allreduce: 1227.43 | step: 41.03
{'loss': 1.2308, 'learning_rate': 9.153720083351358e-08, 'epoch': 0.97}
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1873
[2024-06-11 06:02:07,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.41 | bwd_microstep: 736.16 | bwd_inner_microstep: 736.10 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-11 06:02:09,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.91 | bwd_microstep: 790.52 | bwd_inner_microstep: 790.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1972
[2024-06-11 06:02:10,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.62 | bwd_microstep: 738.97 | bwd_inner_microstep: 738.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299
[2024-06-11 06:02:11,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.57 | bwd_microstep: 973.39 | bwd_inner_microstep: 973.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 06:02:13,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1379.14 | bwd_inner_microstep: 1379.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493
[2024-06-11 06:02:15,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.89 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440
[2024-06-11 06:02:16,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1256.41 | bwd_inner_microstep: 1256.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428
[2024-06-11 06:02:18,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.22 | bwd_microstep: 1252.86 | bwd_inner_microstep: 1252.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500
[2024-06-11 06:02:20,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.41 | bwd_microstep: 1194.36 | bwd_inner_microstep: 1194.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432
[2024-06-11 06:02:21,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.91 | bwd_microstep: 1254.73 | bwd_inner_microstep: 1254.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1886
[2024-06-11 06:02:22,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.50 | bwd_microstep: 711.22 | bwd_inner_microstep: 711.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643
[2024-06-11 06:02:25,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.05 | bwd_microstep: 1512.42 | bwd_inner_microstep: 1512.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514
[2024-06-11 06:02:26,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.73 | bwd_microstep: 1384.73 | bwd_inner_microstep: 1384.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3985
[2024-06-11 06:02:29,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 694.63 | bwd_microstep: 1910.20 | bwd_inner_microstep: 1910.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3690
[2024-06-11 06:02:31,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1375.24 | bwd_inner_microstep: 1375.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 06:02:33,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1384.35 | bwd_inner_microstep: 1384.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821
[2024-06-11 06:02:35,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.06 | bwd_microstep: 1652.54 | bwd_inner_microstep: 1652.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664
[2024-06-11 06:02:37,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.62 | bwd_microstep: 1418.71 | bwd_inner_microstep: 1418.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588
[2024-06-11 06:02:39,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.75 | bwd_microstep: 1507.34 | bwd_inner_microstep: 1507.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-11 06:02:41,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.10 | bwd_microstep: 1603.46 | bwd_inner_microstep: 1603.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3714
[2024-06-11 06:02:44,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.15 | bwd_microstep: 1581.42 | bwd_inner_microstep: 1581.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-11 06:02:46,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.98 | bwd_microstep: 1604.24 | bwd_inner_microstep: 1604.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446
[2024-06-11 06:02:48,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1382.69 | bwd_inner_microstep: 1382.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453
[2024-06-11 06:02:50,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1454.98 | bwd_inner_microstep: 1454.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450
[2024-06-11 06:02:51,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1284.77 | bwd_inner_microstep: 1284.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2171
[2024-06-11 06:02:53,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.25 | bwd_microstep: 857.45 | bwd_inner_microstep: 857.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3630
[2024-06-11 06:02:55,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.08 | bwd_microstep: 1363.09 | bwd_inner_microstep: 1363.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703
[2024-06-11 06:02:57,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.28 | bwd_microstep: 1632.26 | bwd_inner_microstep: 1632.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051
[2024-06-11 06:02:58,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.19 | bwd_microstep: 911.78 | bwd_inner_microstep: 911.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 06:03:00,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.86 | bwd_microstep: 1384.47 | bwd_inner_microstep: 1384.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 06:03:02,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1380.00 | bwd_inner_microstep: 1379.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583
[2024-06-11 06:03:07,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-11 06:03:07,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.20 | bwd_microstep: 4103.58 | bwd_inner_microstep: 1577.22 | bwd_allreduce_microstep: 2526.30 | step_microstep: 39.31
[2024-06-11 06:03:07,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15518.62 | bwd: 44263.55 | bwd_inner: 41736.30 | bwd_allreduce: 2526.56 | step: 40.93
{'loss': 1.1386, 'learning_rate': 8.798530830438579e-08, 'epoch': 0.97}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1922
[2024-06-11 06:03:08,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.12 | bwd_microstep: 877.12 | bwd_inner_microstep: 877.00 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3996
[2024-06-11 06:03:10,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.23 | bwd_microstep: 1606.03 | bwd_inner_microstep: 1606.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788
[2024-06-11 06:03:12,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.08 | bwd_microstep: 1642.29 | bwd_inner_microstep: 1642.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 06:03:14,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1398.28 | bwd_inner_microstep: 1398.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 06:03:16,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1355.17 | bwd_inner_microstep: 1355.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-11 06:03:18,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.25 | bwd_microstep: 1632.76 | bwd_inner_microstep: 1632.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809
[2024-06-11 06:03:20,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.71 | bwd_microstep: 1454.51 | bwd_inner_microstep: 1454.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-11 06:03:21,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 796.90 | bwd_inner_microstep: 796.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720
[2024-06-11 06:03:23,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1442.95 | bwd_inner_microstep: 1442.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534
[2024-06-11 06:03:25,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1397.17 | bwd_inner_microstep: 1397.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3406
[2024-06-11 06:03:27,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.94 | bwd_microstep: 1403.48 | bwd_inner_microstep: 1403.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 06:03:29,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.81 | bwd_microstep: 1482.78 | bwd_inner_microstep: 1482.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3428
[2024-06-11 06:03:31,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.65 | bwd_microstep: 1511.36 | bwd_inner_microstep: 1511.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3625
[2024-06-11 06:03:34,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1538.88 | bwd_inner_microstep: 1538.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428
[2024-06-11 06:03:35,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.65 | bwd_microstep: 1156.25 | bwd_inner_microstep: 1156.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626
[2024-06-11 06:03:37,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1416.79 | bwd_inner_microstep: 1416.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-11 06:03:39,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.88 | bwd_microstep: 1656.59 | bwd_inner_microstep: 1656.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485
[2024-06-11 06:03:41,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.01 | bwd_microstep: 1186.92 | bwd_inner_microstep: 1186.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670
[2024-06-11 06:03:43,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.80 | bwd_microstep: 1429.77 | bwd_inner_microstep: 1429.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-11 06:03:45,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 1510.03 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143
[2024-06-11 06:03:46,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.53 | bwd_microstep: 833.69 | bwd_inner_microstep: 833.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612
[2024-06-11 06:03:48,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.74 | bwd_microstep: 1314.00 | bwd_inner_microstep: 1313.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910
[2024-06-11 06:03:49,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.15 | bwd_microstep: 687.24 | bwd_inner_microstep: 687.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-11 06:03:51,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.04 | bwd_microstep: 1288.49 | bwd_inner_microstep: 1288.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3512
[2024-06-11 06:03:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.33 | bwd_microstep: 1224.28 | bwd_inner_microstep: 1224.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3586
[2024-06-11 06:03:54,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1338.89 | bwd_inner_microstep: 1338.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3816
[2024-06-11 06:03:57,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.04 | bwd_microstep: 1687.58 | bwd_inner_microstep: 1687.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806
[2024-06-11 06:03:59,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.81 | bwd_microstep: 1516.42 | bwd_inner_microstep: 1516.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3577
[2024-06-11 06:04:01,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1459.34 | bwd_inner_microstep: 1459.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2065
[2024-06-11 06:04:02,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.95 | bwd_microstep: 950.48 | bwd_inner_microstep: 950.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985
[2024-06-11 06:04:03,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.76 | bwd_microstep: 799.55 | bwd_inner_microstep: 799.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-11 06:04:07,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.40 | optimizer_step: 6.59
[2024-06-11 06:04:07,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.91 | bwd_microstep: 3640.49 | bwd_inner_microstep: 914.04 | bwd_allreduce_microstep: 2726.38 | step_microstep: 39.90
[2024-06-11 06:04:07,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15629.19 | bwd: 44636.52 | bwd_inner: 41909.12 | bwd_allreduce: 2726.67 | step: 41.44
{'loss': 1.1698, 'learning_rate': 8.450354578748876e-08, 'epoch': 0.97}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413
[2024-06-11 06:04:09,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.41 | bwd_microstep: 1369.49 | bwd_inner_microstep: 1369.34 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698
[2024-06-11 06:04:11,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.34 | bwd_microstep: 1625.98 | bwd_inner_microstep: 1625.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 06:04:13,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1386.60 | bwd_inner_microstep: 1386.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840
[2024-06-11 06:04:15,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.00 | bwd_microstep: 1654.95 | bwd_inner_microstep: 1654.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 06:04:17,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.41 | bwd_microstep: 1247.18 | bwd_inner_microstep: 1247.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 06:04:19,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.45 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 06:04:21,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1385.63 | bwd_inner_microstep: 1385.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3733
[2024-06-11 06:04:23,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.20 | bwd_microstep: 1337.33 | bwd_inner_microstep: 1337.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736
[2024-06-11 06:04:25,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.38 | bwd_microstep: 1634.33 | bwd_inner_microstep: 1634.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 06:04:27,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.44 | bwd_microstep: 1290.61 | bwd_inner_microstep: 1290.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488
[2024-06-11 06:04:29,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.05 | bwd_microstep: 1390.20 | bwd_inner_microstep: 1390.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963
[2024-06-11 06:04:30,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.11 | bwd_microstep: 798.92 | bwd_inner_microstep: 798.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3502
[2024-06-11 06:04:32,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.44 | bwd_microstep: 1437.82 | bwd_inner_microstep: 1437.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3727
[2024-06-11 06:04:34,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.61 | bwd_microstep: 1729.86 | bwd_inner_microstep: 1729.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3697
[2024-06-11 06:04:36,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.59 | bwd_microstep: 1689.69 | bwd_inner_microstep: 1689.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-11 06:04:38,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1246.77 | bwd_inner_microstep: 1246.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683
[2024-06-11 06:04:41,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.34 | bwd_microstep: 1725.08 | bwd_inner_microstep: 1725.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2673
[2024-06-11 06:04:42,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.93 | bwd_microstep: 1120.66 | bwd_inner_microstep: 1120.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-11 06:04:44,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1387.72 | bwd_inner_microstep: 1387.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-11 06:04:46,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1410.48 | bwd_inner_microstep: 1410.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919
[2024-06-11 06:04:47,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.04 | bwd_microstep: 687.41 | bwd_inner_microstep: 687.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3460
[2024-06-11 06:04:49,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.22 | bwd_microstep: 1437.25 | bwd_inner_microstep: 1437.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 604
[2024-06-11 06:04:49,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 101.99 | bwd_microstep: 260.37 | bwd_inner_microstep: 260.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526
[2024-06-11 06:04:51,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1395.94 | bwd_inner_microstep: 1395.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553
[2024-06-11 06:04:53,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.14 | bwd_microstep: 1396.78 | bwd_inner_microstep: 1396.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-11 06:04:55,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.19 | bwd_microstep: 1652.24 | bwd_inner_microstep: 1652.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2299
[2024-06-11 06:04:57,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.67 | bwd_microstep: 1009.99 | bwd_inner_microstep: 1009.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531
[2024-06-11 06:04:59,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1393.64 | bwd_inner_microstep: 1393.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821
[2024-06-11 06:05:01,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1414.05 | bwd_inner_microstep: 1414.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799
[2024-06-11 06:05:03,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.48 | bwd_microstep: 1447.99 | bwd_inner_microstep: 1447.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439
[2024-06-11 06:05:05,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1347.16 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-11 06:05:09,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 19.56 | optimizer_gradients: 4.14 | optimizer_step: 6.64
[2024-06-11 06:05:09,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.51 | bwd_microstep: 4311.36 | bwd_inner_microstep: 1620.78 | bwd_allreduce_microstep: 2690.52 | step_microstep: 41.80
[2024-06-11 06:05:09,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16067.72 | bwd: 45869.74 | bwd_inner: 43178.19 | bwd_allreduce: 2690.81 | step: 43.59


 97%|█████████▋| 1673/1726 [29:22:42<1:00:54, 68.96s/it]
 97%|█████████▋| 1674/1726 [29:23:42<57:19, 66.14s/it]
 97%|█████████▋| 1675/1726 [29:24:43<54:59, 64.69s/it]
 97%|█████████▋| 1676/1726 [29:25:43<52:45, 63.32s/it]
 97%|█████████▋| 1677/1726 [29:26:44<51:02, 62.51s/it]
 97%|█████████▋| 1678/1726 [29:27:46<49:57, 62.44s/it]
{'loss': 1.2019, 'learning_rate': 8.109192554557333e-08, 'epoch': 0.97}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 06:05:11,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1369.44 | bwd_inner_microstep: 1369.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2350
[2024-06-11 06:05:13,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.55 | bwd_microstep: 920.46 | bwd_inner_microstep: 920.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928
[2024-06-11 06:05:14,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.54 | bwd_microstep: 723.56 | bwd_inner_microstep: 723.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839
[2024-06-11 06:05:16,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.70 | bwd_microstep: 1649.81 | bwd_inner_microstep: 1649.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 06:05:18,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.23 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3707
[2024-06-11 06:05:20,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.36 | bwd_microstep: 1457.42 | bwd_inner_microstep: 1457.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 06:05:22,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.03 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-11 06:05:23,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.52 | bwd_microstep: 1153.69 | bwd_inner_microstep: 1153.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704
[2024-06-11 06:05:25,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.33 | bwd_microstep: 1527.69 | bwd_inner_microstep: 1527.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 06:05:27,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.72 | bwd_microstep: 1287.90 | bwd_inner_microstep: 1287.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2900
[2024-06-11 06:05:29,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.90 | bwd_microstep: 1031.61 | bwd_inner_microstep: 1031.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447
[2024-06-11 06:05:30,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1350.67 | bwd_inner_microstep: 1350.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3511
[2024-06-11 06:05:33,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.39 | bwd_microstep: 1535.69 | bwd_inner_microstep: 1535.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3643
[2024-06-11 06:05:35,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.65 | bwd_microstep: 1617.52 | bwd_inner_microstep: 1617.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2296
[2024-06-11 06:05:36,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.15 | bwd_microstep: 880.42 | bwd_inner_microstep: 880.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3540
[2024-06-11 06:05:38,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.32 | bwd_microstep: 1229.87 | bwd_inner_microstep: 1229.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667
[2024-06-11 06:05:40,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1427.37 | bwd_inner_microstep: 1427.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986
[2024-06-11 06:05:41,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.47 | bwd_microstep: 798.65 | bwd_inner_microstep: 798.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1386
[2024-06-11 06:05:42,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 212.61 | bwd_microstep: 555.60 | bwd_inner_microstep: 555.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2330
[2024-06-11 06:05:43,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.30 | bwd_microstep: 889.72 | bwd_inner_microstep: 889.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823
[2024-06-11 06:05:45,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1559.95 | bwd_inner_microstep: 1559.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-11 06:05:47,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.51 | bwd_microstep: 1400.46 | bwd_inner_microstep: 1400.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729
[2024-06-11 06:05:49,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1469.50 | bwd_inner_microstep: 1469.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3575
[2024-06-11 06:05:51,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.43 | bwd_microstep: 1242.29 | bwd_inner_microstep: 1242.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554
[2024-06-11 06:05:52,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1300.12 | bwd_inner_microstep: 1300.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3711
[2024-06-11 06:05:54,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1427.53 | bwd_inner_microstep: 1427.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927
[2024-06-11 06:05:55,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.21 | bwd_microstep: 792.76 | bwd_inner_microstep: 792.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-11 06:05:58,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.52 | bwd_microstep: 1602.05 | bwd_inner_microstep: 1602.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422
[2024-06-11 06:06:00,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.42 | bwd_microstep: 1378.47 | bwd_inner_microstep: 1378.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3791
[2024-06-11 06:06:02,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.33 | bwd_microstep: 1858.05 | bwd_inner_microstep: 1858.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2900
[2024-06-11 06:06:04,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.12 | bwd_microstep: 1183.62 | bwd_inner_microstep: 1183.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1868
[2024-06-11 06:06:15,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.30 | optimizer_step: 6.60
[2024-06-11 06:06:15,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.06 | bwd_microstep: 10466.07 | bwd_inner_microstep: 882.09 | bwd_allreduce_microstep: 9583.90 | step_microstep: 39.99
[2024-06-11 06:06:15,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15018.18 | bwd: 49752.77 | bwd_inner: 40167.94 | bwd_allreduce: 9584.14 | step: 41.52
{'loss': 1.1877, 'learning_rate': 7.775045959434568e-08, 'epoch': 0.97}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-11 06:06:17,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1471.39 | bwd_inner_microstep: 1471.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2369
[2024-06-11 06:06:18,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.92 | bwd_microstep: 1089.54 | bwd_inner_microstep: 1089.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3469
[2024-06-11 06:06:20,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1338.75 | bwd_inner_microstep: 1338.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400
[2024-06-11 06:06:22,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.56 | bwd_microstep: 1175.41 | bwd_inner_microstep: 1175.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467
[2024-06-11 06:06:24,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 06:06:25,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.81 | bwd_microstep: 1275.10 | bwd_inner_microstep: 1275.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-11 06:06:27,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1341.97 | bwd_inner_microstep: 1341.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3481
[2024-06-11 06:06:29,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.28 | bwd_microstep: 1340.21 | bwd_inner_microstep: 1340.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3736
[2024-06-11 06:06:31,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.54 | bwd_microstep: 1461.84 | bwd_inner_microstep: 1461.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-11 06:06:33,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1414.29 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 06:06:35,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.29 | bwd_microstep: 1384.31 | bwd_inner_microstep: 1384.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973
[2024-06-11 06:06:36,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.55 | bwd_microstep: 799.34 | bwd_inner_microstep: 799.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1983
[2024-06-11 06:06:37,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.92 | bwd_microstep: 846.51 | bwd_inner_microstep: 846.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557
[2024-06-11 06:06:39,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.10 | bwd_microstep: 1392.26 | bwd_inner_microstep: 1392.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3444
[2024-06-11 06:06:41,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1299.06 | bwd_inner_microstep: 1299.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3667
[2024-06-11 06:06:43,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.63 | bwd_microstep: 1492.44 | bwd_inner_microstep: 1492.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505
[2024-06-11 06:06:45,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.85 | bwd_microstep: 1574.30 | bwd_inner_microstep: 1574.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640
[2024-06-11 06:06:47,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1409.56 | bwd_inner_microstep: 1409.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-11 06:06:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.57 | bwd_microstep: 1246.26 | bwd_inner_microstep: 1246.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-11 06:06:51,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.64 | bwd_microstep: 1555.27 | bwd_inner_microstep: 1555.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624
[2024-06-11 06:06:53,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.79 | bwd_microstep: 1409.23 | bwd_inner_microstep: 1409.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-11 06:06:55,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.82 | bwd_microstep: 1283.18 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3200
[2024-06-11 06:06:56,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.08 | bwd_microstep: 1249.21 | bwd_inner_microstep: 1249.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3705
[2024-06-11 06:06:59,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.87 | bwd_microstep: 1474.45 | bwd_inner_microstep: 1474.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-11 06:07:00,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.63 | bwd_microstep: 976.11 | bwd_inner_microstep: 976.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3517
[2024-06-11 06:07:02,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1444.19 | bwd_inner_microstep: 1444.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2072
[2024-06-11 06:07:03,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.30 | bwd_microstep: 820.84 | bwd_inner_microstep: 820.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3622
[2024-06-11 06:07:05,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.11 | bwd_microstep: 1675.29 | bwd_inner_microstep: 1675.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3566
[2024-06-11 06:07:07,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.48 | bwd_microstep: 1430.13 | bwd_inner_microstep: 1430.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774
[2024-06-11 06:07:10,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.78 | bwd_microstep: 1635.42 | bwd_inner_microstep: 1635.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2231
[2024-06-11 06:07:11,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.54 | bwd_microstep: 929.64 | bwd_inner_microstep: 929.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3811
[2024-06-11 06:07:17,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.30 | optimizer_step: 6.58
[2024-06-11 06:07:17,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.47 | bwd_microstep: 5370.94 | bwd_inner_microstep: 2023.86 | bwd_allreduce_microstep: 3347.03 | step_microstep: 38.91
[2024-06-11 06:07:17,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15885.45 | bwd: 46087.90 | bwd_inner: 42739.94 | bwd_allreduce: 3347.26 | step: 40.66
{'loss': 1.19, 'learning_rate': 7.447915970243414e-08, 'epoch': 0.97}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-11 06:07:19,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.16 | bwd_microstep: 1239.33 | bwd_inner_microstep: 1239.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-11 06:07:21,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1399.98 | bwd_inner_microstep: 1399.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-11 06:07:22,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.40 | bwd_microstep: 1379.37 | bwd_inner_microstep: 1379.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424
[2024-06-11 06:07:24,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.29 | bwd_microstep: 1446.10 | bwd_inner_microstep: 1446.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783
[2024-06-11 06:07:27,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.81 | bwd_microstep: 1647.52 | bwd_inner_microstep: 1647.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2927
[2024-06-11 06:07:28,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.36 | bwd_microstep: 1092.49 | bwd_inner_microstep: 1092.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906
[2024-06-11 06:07:29,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.62 | bwd_microstep: 684.87 | bwd_inner_microstep: 684.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944
[2024-06-11 06:07:30,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 793.69 | bwd_inner_microstep: 793.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425
[2024-06-11 06:07:32,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1346.60 | bwd_inner_microstep: 1346.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 06:07:34,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.29 | bwd_microstep: 1389.54 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521
[2024-06-11 06:07:36,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.63 | bwd_microstep: 1491.37 | bwd_inner_microstep: 1491.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618
[2024-06-11 06:07:38,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1317.31 | bwd_inner_microstep: 1317.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647
[2024-06-11 06:07:40,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1410.94 | bwd_inner_microstep: 1410.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925
[2024-06-11 06:07:41,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.53 | bwd_microstep: 791.36 | bwd_inner_microstep: 791.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498
[2024-06-11 06:07:43,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.97 | bwd_microstep: 1483.46 | bwd_inner_microstep: 1483.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-11 06:07:45,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.02 | bwd_microstep: 1578.56 | bwd_inner_microstep: 1578.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2103
[2024-06-11 06:07:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.04 | bwd_microstep: 971.79 | bwd_inner_microstep: 971.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-11 06:07:48,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1295.15 | bwd_inner_microstep: 1295.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990
[2024-06-11 06:07:49,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.16 | bwd_microstep: 807.49 | bwd_inner_microstep: 807.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524
[2024-06-11 06:07:51,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.03 | bwd_microstep: 1292.30 | bwd_inner_microstep: 1292.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-11 06:07:53,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.16 | bwd_microstep: 1313.25 | bwd_inner_microstep: 1313.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 06:07:55,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.13 | bwd_microstep: 1661.90 | bwd_inner_microstep: 1661.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1983
[2024-06-11 06:07:56,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.02 | bwd_microstep: 707.36 | bwd_inner_microstep: 707.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 06:07:58,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.41 | bwd_microstep: 1282.70 | bwd_inner_microstep: 1282.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419
[2024-06-11 06:08:00,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1281.71 | bwd_inner_microstep: 1281.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 06:08:02,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1284.28 | bwd_inner_microstep: 1284.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2283
[2024-06-11 06:08:03,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.94 | bwd_microstep: 1038.51 | bwd_inner_microstep: 1038.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3595
[2024-06-11 06:08:05,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.87 | bwd_microstep: 1656.00 | bwd_inner_microstep: 1655.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426
[2024-06-11 06:08:08,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.30 | bwd_microstep: 1550.98 | bwd_inner_microstep: 1550.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622
[2024-06-11 06:08:09,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.21 | bwd_microstep: 1418.79 | bwd_inner_microstep: 1418.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3592
[2024-06-11 06:08:12,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.40 | bwd_microstep: 1674.84 | bwd_inner_microstep: 1674.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3752
[2024-06-11 06:08:17,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.11 | optimizer_step: 6.61
[2024-06-11 06:08:17,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 4562.67 | bwd_inner_microstep: 1769.18 | bwd_allreduce_microstep: 2793.44 | step_microstep: 38.37
[2024-06-11 06:08:17,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15374.94 | bwd: 44292.20 | bwd_inner: 41497.86 | bwd_allreduce: 2793.67 | step: 39.97
{'loss': 1.1472, 'learning_rate': 7.12780373913402e-08, 'epoch': 0.97}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490
[2024-06-11 06:08:19,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.18 | bwd_microstep: 1340.22 | bwd_inner_microstep: 1340.13 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3950
[2024-06-11 06:08:21,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1491.61 | bwd_inner_microstep: 1491.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-11 06:08:23,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1341.17 | bwd_inner_microstep: 1341.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834
[2024-06-11 06:08:25,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.68 | bwd_microstep: 1651.11 | bwd_inner_microstep: 1651.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3737
[2024-06-11 06:08:27,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.53 | bwd_microstep: 1436.05 | bwd_inner_microstep: 1436.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3424
[2024-06-11 06:08:29,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.85 | bwd_microstep: 1180.71 | bwd_inner_microstep: 1180.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-11 06:08:30,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1250.81 | bwd_inner_microstep: 1250.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 06:08:32,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.66 | bwd_microstep: 1284.80 | bwd_inner_microstep: 1284.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715
[2024-06-11 06:08:34,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1436.02 | bwd_inner_microstep: 1436.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3514
[2024-06-11 06:08:36,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.09 | bwd_microstep: 1431.10 | bwd_inner_microstep: 1431.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2476
[2024-06-11 06:08:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.63 | bwd_microstep: 858.60 | bwd_inner_microstep: 858.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-11 06:08:39,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.59 | bwd_microstep: 1490.36 | bwd_inner_microstep: 1490.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2120
[2024-06-11 06:08:41,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.90 | bwd_microstep: 923.51 | bwd_inner_microstep: 923.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3684
[2024-06-11 06:08:43,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.73 | bwd_microstep: 1694.93 | bwd_inner_microstep: 1694.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2000
[2024-06-11 06:08:44,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.87 | bwd_microstep: 896.62 | bwd_inner_microstep: 896.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3449
[2024-06-11 06:08:46,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1399.29 | bwd_inner_microstep: 1399.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539
[2024-06-11 06:08:48,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1294.97 | bwd_inner_microstep: 1294.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530
[2024-06-11 06:08:50,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.77 | bwd_microstep: 1585.53 | bwd_inner_microstep: 1585.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 06:08:52,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1496.38 | bwd_inner_microstep: 1496.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630
[2024-06-11 06:08:54,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.37 | bwd_microstep: 1604.53 | bwd_inner_microstep: 1604.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 06:08:56,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1554.26 | bwd_inner_microstep: 1554.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 06:08:59,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1506.32 | bwd_inner_microstep: 1506.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3724
[2024-06-11 06:09:00,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1369.56 | bwd_inner_microstep: 1369.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608
[2024-06-11 06:09:03,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.50 | bwd_microstep: 1609.53 | bwd_inner_microstep: 1609.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608
[2024-06-11 06:09:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.82 | bwd_inner_microstep: 1508.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558
[2024-06-11 06:09:07,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.83 | bwd_microstep: 1589.72 | bwd_inner_microstep: 1589.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808
[2024-06-11 06:09:09,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.50 | bwd_microstep: 1659.81 | bwd_inner_microstep: 1659.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2078
[2024-06-11 06:09:10,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.81 | bwd_microstep: 817.83 | bwd_inner_microstep: 817.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-11 06:09:12,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.69 | bwd_microstep: 976.76 | bwd_inner_microstep: 976.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3766
[2024-06-11 06:09:13,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.64 | bwd_microstep: 1250.79 | bwd_inner_microstep: 1250.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3573
[2024-06-11 06:09:15,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.37 | bwd_microstep: 1210.65 | bwd_inner_microstep: 1210.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2266
[2024-06-11 06:09:21,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.16 | optimizer_step: 6.56
[2024-06-11 06:09:21,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.80 | bwd_microstep: 5882.36 | bwd_inner_microstep: 994.77 | bwd_allreduce_microstep: 4887.54 | step_microstep: 38.99
[2024-06-11 06:09:21,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16065.26 | bwd: 48024.73 | bwd_inner: 43136.19 | bwd_allreduce: 4887.82 | step: 40.58
{'loss': 1.1586, 'learning_rate': 6.814710393539869e-08, 'epoch': 0.97}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432
[2024-06-11 06:09:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.37 | bwd_microstep: 1437.79 | bwd_inner_microstep: 1437.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939
[2024-06-11 06:09:26,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.02 | bwd_microstep: 1687.35 | bwd_inner_microstep: 1687.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843
[2024-06-11 06:09:28,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.85 | bwd_microstep: 1460.02 | bwd_inner_microstep: 1459.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 06:09:30,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1377.98 | bwd_inner_microstep: 1377.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 06:09:31,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.75 | bwd_microstep: 1382.43 | bwd_inner_microstep: 1382.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757
[2024-06-11 06:09:33,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.82 | bwd_microstep: 1434.21 | bwd_inner_microstep: 1434.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492
[2024-06-11 06:09:35,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1384.86 | bwd_inner_microstep: 1384.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3594
[2024-06-11 06:09:37,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.91 | bwd_microstep: 1435.10 | bwd_inner_microstep: 1435.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 06:09:39,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1248.73 | bwd_inner_microstep: 1248.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2207
[2024-06-11 06:09:40,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.64 | bwd_microstep: 957.77 | bwd_inner_microstep: 957.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497
[2024-06-11 06:09:42,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1287.86 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2979
[2024-06-11 06:09:44,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.65 | bwd_microstep: 1137.10 | bwd_inner_microstep: 1137.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1991
[2024-06-11 06:09:45,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.34 | bwd_microstep: 707.59 | bwd_inner_microstep: 707.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3648
[2024-06-11 06:09:47,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1574.22 | bwd_inner_microstep: 1574.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3506
[2024-06-11 06:09:49,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.51 | bwd_microstep: 1552.95 | bwd_inner_microstep: 1552.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 06:09:51,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.64 | bwd_microstep: 1349.18 | bwd_inner_microstep: 1349.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828
[2024-06-11 06:09:53,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.20 | bwd_microstep: 1584.77 | bwd_inner_microstep: 1584.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479
[2024-06-11 06:09:55,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1281.58 | bwd_inner_microstep: 1281.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-11 06:09:56,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.21 | bwd_microstep: 977.20 | bwd_inner_microstep: 977.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543
[2024-06-11 06:09:58,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1398.72 | bwd_inner_microstep: 1398.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434
[2024-06-11 06:10:00,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1255.37 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440
[2024-06-11 06:10:01,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.14 | bwd_microstep: 1155.61 | bwd_inner_microstep: 1155.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804
[2024-06-11 06:10:04,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1654.31 | bwd_inner_microstep: 1654.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-11 06:10:06,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1398.93 | bwd_inner_microstep: 1398.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827
[2024-06-11 06:10:08,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1465.08 | bwd_inner_microstep: 1465.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-11 06:10:10,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1395.09 | bwd_inner_microstep: 1395.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3547
[2024-06-11 06:10:12,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.37 | bwd_microstep: 1541.36 | bwd_inner_microstep: 1541.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3641
[2024-06-11 06:10:14,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1444.99 | bwd_inner_microstep: 1444.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-11 06:10:16,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1498.82 | bwd_inner_microstep: 1498.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3559
[2024-06-11 06:10:18,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.33 | bwd_microstep: 1527.00 | bwd_inner_microstep: 1526.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3566
[2024-06-11 06:10:20,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.73 | bwd_microstep: 1368.58 | bwd_inner_microstep: 1368.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729
[2024-06-11 06:10:22,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.05 | optimizer_step: 6.62
[2024-06-11 06:10:22,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.17 | bwd_microstep: 1952.46 | bwd_inner_microstep: 1541.69 | bwd_allreduce_microstep: 410.71 | step_microstep: 86.82
[2024-06-11 06:10:22,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16394.17 | bwd: 44315.04 | bwd_inner: 43903.42 | bwd_allreduce: 410.94 | step: 88.46


 97%|█████████▋| 1678/1726 [29:27:46<49:57, 62.44s/it]
 97%|█████████▋| 1679/1726 [29:28:51<49:32, 63.24s/it]
 97%|█████████▋| 1680/1726 [29:29:54<48:16, 62.97s/it]
 97%|█████████▋| 1681/1726 [29:30:54<46:33, 62.08s/it]
 97%|█████████▋| 1682/1726 [29:31:58<46:02, 62.79s/it]
 98%|█████████▊| 1683/1726 [29:32:59<44:38, 62.28s/
{'loss': 1.1642, 'learning_rate': 6.508637036174215e-08, 'epoch': 0.98}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401
[2024-06-11 06:10:24,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1433.04 | bwd_inner_microstep: 1433.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 06:10:26,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.64 | bwd_microstep: 1374.84 | bwd_inner_microstep: 1374.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474
[2024-06-11 06:10:28,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.94 | bwd_microstep: 1383.31 | bwd_inner_microstep: 1383.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-11 06:10:30,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.70 | bwd_microstep: 1485.50 | bwd_inner_microstep: 1485.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 06:10:32,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1383.77 | bwd_inner_microstep: 1383.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4190
[2024-06-11 06:10:34,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.55 | bwd_microstep: 1654.95 | bwd_inner_microstep: 1654.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-11 06:10:36,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1344.85 | bwd_inner_microstep: 1344.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 06:10:38,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1388.73 | bwd_inner_microstep: 1388.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 06:10:40,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 06:10:42,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1285.78 | bwd_inner_microstep: 1285.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525
[2024-06-11 06:10:44,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1294.36 | bwd_inner_microstep: 1294.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2130
[2024-06-11 06:10:45,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.89 | bwd_microstep: 974.62 | bwd_inner_microstep: 974.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-11 06:10:47,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.11 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584
[2024-06-11 06:10:49,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.95 | bwd_microstep: 1304.46 | bwd_inner_microstep: 1304.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490
[2024-06-11 06:10:51,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.87 | bwd_microstep: 1478.42 | bwd_inner_microstep: 1478.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1924
[2024-06-11 06:10:52,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.03 | bwd_microstep: 847.44 | bwd_inner_microstep: 847.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2897
[2024-06-11 06:10:53,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.50 | bwd_microstep: 1091.15 | bwd_inner_microstep: 1091.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515
[2024-06-11 06:10:55,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.26 | bwd_microstep: 1579.28 | bwd_inner_microstep: 1579.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3561
[2024-06-11 06:10:57,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.63 | bwd_microstep: 1446.49 | bwd_inner_microstep: 1446.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603
[2024-06-11 06:11:00,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.34 | bwd_microstep: 1603.83 | bwd_inner_microstep: 1603.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982
[2024-06-11 06:11:01,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.52 | bwd_microstep: 896.52 | bwd_inner_microstep: 896.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3825
[2024-06-11 06:11:03,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.86 | bwd_microstep: 1681.29 | bwd_inner_microstep: 1681.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 620
[2024-06-11 06:11:04,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.12 | bwd_microstep: 263.62 | bwd_inner_microstep: 263.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-11 06:11:06,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.39 | bwd_microstep: 1612.10 | bwd_inner_microstep: 1612.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589
[2024-06-11 06:11:08,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1406.59 | bwd_inner_microstep: 1406.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3813
[2024-06-11 06:11:10,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.13 | bwd_microstep: 1478.42 | bwd_inner_microstep: 1478.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-11 06:11:12,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1300.12 | bwd_inner_microstep: 1300.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3504
[2024-06-11 06:11:14,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.97 | bwd_microstep: 1433.30 | bwd_inner_microstep: 1433.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797
[2024-06-11 06:11:16,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.42 | bwd_microstep: 1655.01 | bwd_inner_microstep: 1654.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-11 06:11:18,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1403.57 | bwd_inner_microstep: 1403.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3587
[2024-06-11 06:11:19,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.14 | bwd_microstep: 1210.49 | bwd_inner_microstep: 1210.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3776
[2024-06-11 06:11:25,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.42 | optimizer_step: 6.61
[2024-06-11 06:11:25,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.52 | bwd_microstep: 4840.62 | bwd_inner_microstep: 1976.15 | bwd_allreduce_microstep: 2864.39 | step_microstep: 39.97
[2024-06-11 06:11:25,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16080.75 | bwd: 46105.03 | bwd_inner: 43239.71 | bwd_allreduce: 2864.63 | step: 41.57
{'loss': 1.1887, 'learning_rate': 6.209584745025643e-08, 'epoch': 0.98}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463
[2024-06-11 06:11:27,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.46 | bwd_microstep: 1430.41 | bwd_inner_microstep: 1430.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3994
[2024-06-11 06:11:29,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.55 | bwd_microstep: 1540.87 | bwd_inner_microstep: 1540.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 06:11:31,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1252.47 | bwd_inner_microstep: 1252.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-11 06:11:33,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.91 | bwd_microstep: 1342.96 | bwd_inner_microstep: 1342.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 06:11:35,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1380.98 | bwd_inner_microstep: 1380.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032
[2024-06-11 06:11:36,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.29 | bwd_microstep: 807.61 | bwd_inner_microstep: 807.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400
[2024-06-11 06:11:37,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.27 | bwd_microstep: 1152.29 | bwd_inner_microstep: 1152.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492
[2024-06-11 06:11:39,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.34 | bwd_microstep: 1284.98 | bwd_inner_microstep: 1284.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408
[2024-06-11 06:11:41,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.55 | bwd_microstep: 1151.12 | bwd_inner_microstep: 1151.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3694
[2024-06-11 06:11:43,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.26 | bwd_microstep: 1455.61 | bwd_inner_microstep: 1455.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 06:11:44,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1288.77 | bwd_inner_microstep: 1288.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3675
[2024-06-11 06:11:47,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.19 | bwd_microstep: 1565.83 | bwd_inner_microstep: 1565.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3019
[2024-06-11 06:11:48,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.05 | bwd_microstep: 1232.09 | bwd_inner_microstep: 1232.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 06:11:50,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.06 | bwd_microstep: 1282.98 | bwd_inner_microstep: 1282.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490
[2024-06-11 06:11:52,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1574.18 | bwd_inner_microstep: 1574.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3516
[2024-06-11 06:11:54,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.26 | bwd_microstep: 1548.89 | bwd_inner_microstep: 1548.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458
[2024-06-11 06:11:56,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.81 | bwd_microstep: 1212.36 | bwd_inner_microstep: 1212.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538
[2024-06-11 06:11:58,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.38 | bwd_microstep: 1492.31 | bwd_inner_microstep: 1492.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624
[2024-06-11 06:12:00,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1613.00 | bwd_inner_microstep: 1612.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028
[2024-06-11 06:12:02,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.92 | bwd_microstep: 808.68 | bwd_inner_microstep: 808.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579
[2024-06-11 06:12:04,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.45 | bwd_microstep: 1598.05 | bwd_inner_microstep: 1598.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1494
[2024-06-11 06:12:05,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 231.06 | bwd_microstep: 612.40 | bwd_inner_microstep: 612.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2151
[2024-06-11 06:12:06,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.54 | bwd_microstep: 803.70 | bwd_inner_microstep: 803.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560
[2024-06-11 06:12:08,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1500.59 | bwd_inner_microstep: 1500.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2153
[2024-06-11 06:12:09,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.47 | bwd_microstep: 854.21 | bwd_inner_microstep: 854.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3735
[2024-06-11 06:12:11,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.43 | bwd_microstep: 1442.50 | bwd_inner_microstep: 1442.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-11 06:12:13,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1298.67 | bwd_inner_microstep: 1298.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818
[2024-06-11 06:12:15,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.16 | bwd_microstep: 1388.01 | bwd_inner_microstep: 1387.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3812
[2024-06-11 06:12:17,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.12 | bwd_microstep: 1801.29 | bwd_inner_microstep: 1801.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586
[2024-06-11 06:12:19,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.35 | bwd_microstep: 1506.78 | bwd_inner_microstep: 1506.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3877
[2024-06-11 06:12:21,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1388.77 | bwd_inner_microstep: 1388.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762
[2024-06-11 06:12:26,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.15 | optimizer_step: 6.61
[2024-06-11 06:12:26,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.70 | bwd_microstep: 4072.93 | bwd_inner_microstep: 1919.02 | bwd_allreduce_microstep: 2153.86 | step_microstep: 38.11
[2024-06-11 06:12:26,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15772.93 | bwd: 44686.31 | bwd_inner: 42531.54 | bwd_allreduce: 2154.09 | step: 39.66
{'loss': 1.1466, 'learning_rate': 5.917554573354967e-08, 'epoch': 0.98}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-11 06:12:28,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1387.66 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3501
[2024-06-11 06:12:29,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1252.14 | bwd_inner_microstep: 1252.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853
[2024-06-11 06:12:32,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.78 | bwd_microstep: 1559.51 | bwd_inner_microstep: 1559.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751
[2024-06-11 06:12:34,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.17 | bwd_microstep: 1635.60 | bwd_inner_microstep: 1635.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 06:12:36,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946
[2024-06-11 06:12:37,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 798.69 | bwd_inner_microstep: 798.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-11 06:12:38,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.40 | bwd_microstep: 1292.56 | bwd_inner_microstep: 1292.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1961
[2024-06-11 06:12:40,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.18 | bwd_microstep: 826.27 | bwd_inner_microstep: 826.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-11 06:12:42,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.37 | bwd_microstep: 1641.04 | bwd_inner_microstep: 1641.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952
[2024-06-11 06:12:43,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.87 | bwd_microstep: 726.16 | bwd_inner_microstep: 726.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 06:12:45,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.12 | bwd_microstep: 1376.05 | bwd_inner_microstep: 1376.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429
[2024-06-11 06:12:47,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1344.34 | bwd_inner_microstep: 1344.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3518
[2024-06-11 06:12:49,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.20 | bwd_microstep: 1646.40 | bwd_inner_microstep: 1646.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482
[2024-06-11 06:12:51,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.75 | bwd_microstep: 1476.11 | bwd_inner_microstep: 1476.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3681
[2024-06-11 06:12:53,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.49 | bwd_microstep: 1720.01 | bwd_inner_microstep: 1719.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 06:12:55,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1341.19 | bwd_inner_microstep: 1341.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3502
[2024-06-11 06:12:57,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.79 | bwd_microstep: 1528.77 | bwd_inner_microstep: 1528.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-11 06:12:59,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.25 | bwd_microstep: 1187.05 | bwd_inner_microstep: 1187.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3017
[2024-06-11 06:13:00,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.11 | bwd_microstep: 1133.99 | bwd_inner_microstep: 1133.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825
[2024-06-11 06:13:03,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.18 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-11 06:13:05,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1283.87 | bwd_inner_microstep: 1283.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547
[2024-06-11 06:13:06,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1396.77 | bwd_inner_microstep: 1396.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820
[2024-06-11 06:13:09,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1483.50 | bwd_inner_microstep: 1483.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592
[2024-06-11 06:13:11,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.10 | bwd_microstep: 1509.32 | bwd_inner_microstep: 1509.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825
[2024-06-11 06:13:13,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1463.97 | bwd_inner_microstep: 1463.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3563
[2024-06-11 06:13:15,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.01 | bwd_microstep: 1365.43 | bwd_inner_microstep: 1365.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3724
[2024-06-11 06:13:17,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.85 | bwd_microstep: 1704.17 | bwd_inner_microstep: 1704.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3824
[2024-06-11 06:13:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.12 | bwd_microstep: 1519.69 | bwd_inner_microstep: 1519.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650
[2024-06-11 06:13:21,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.67 | bwd_microstep: 1589.79 | bwd_inner_microstep: 1589.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3391
[2024-06-11 06:13:23,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.88 | bwd_microstep: 1437.55 | bwd_inner_microstep: 1437.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3612
[2024-06-11 06:13:25,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.75 | bwd_microstep: 1372.40 | bwd_inner_microstep: 1372.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3584
[2024-06-11 06:13:27,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.13 | optimizer_step: 6.65
[2024-06-11 06:13:27,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.01 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1478.31 | bwd_allreduce_microstep: 8.01 | step_microstep: 38.12
[2024-06-11 06:13:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16583.13 | bwd: 44391.28 | bwd_inner: 44382.25 | bwd_allreduce: 8.29 | step: 39.74
{'loss': 1.1506, 'learning_rate': 5.632547549690559e-08, 'epoch': 0.98}
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3416
[2024-06-11 06:13:29,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.81 | bwd_microstep: 1398.10 | bwd_inner_microstep: 1398.01 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458
[2024-06-11 06:13:31,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1275.02 | bwd_inner_microstep: 1274.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945
[2024-06-11 06:13:32,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.19 | bwd_microstep: 698.12 | bwd_inner_microstep: 698.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833
[2024-06-11 06:13:34,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.95 | bwd_microstep: 1661.50 | bwd_inner_microstep: 1661.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2244
[2024-06-11 06:13:35,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.46 | bwd_microstep: 869.69 | bwd_inner_microstep: 869.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 06:13:37,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.67 | bwd_microstep: 1284.66 | bwd_inner_microstep: 1284.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-11 06:13:39,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.82 | bwd_microstep: 1495.47 | bwd_inner_microstep: 1495.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3545
[2024-06-11 06:13:41,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.37 | bwd_microstep: 1355.96 | bwd_inner_microstep: 1355.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-11 06:13:43,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1286.72 | bwd_inner_microstep: 1286.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499
[2024-06-11 06:13:45,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.05 | bwd_microstep: 1288.83 | bwd_inner_microstep: 1288.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2504
[2024-06-11 06:13:46,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.45 | bwd_microstep: 990.28 | bwd_inner_microstep: 990.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2013
[2024-06-11 06:13:47,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.65 | bwd_microstep: 743.05 | bwd_inner_microstep: 743.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667
[2024-06-11 06:13:49,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.61 | bwd_microstep: 1615.12 | bwd_inner_microstep: 1615.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-11 06:13:51,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.07 | bwd_microstep: 1151.53 | bwd_inner_microstep: 1151.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403
[2024-06-11 06:13:53,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1344.33 | bwd_inner_microstep: 1344.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-11 06:13:55,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.65 | bwd_microstep: 1489.51 | bwd_inner_microstep: 1489.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-11 06:13:57,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.34 | bwd_microstep: 1492.09 | bwd_inner_microstep: 1492.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533
[2024-06-11 06:13:59,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1398.92 | bwd_inner_microstep: 1398.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3536
[2024-06-11 06:14:01,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.45 | bwd_microstep: 1356.53 | bwd_inner_microstep: 1356.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835
[2024-06-11 06:14:03,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.84 | bwd_microstep: 1655.97 | bwd_inner_microstep: 1655.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 06:14:05,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1279.92 | bwd_inner_microstep: 1279.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992
[2024-06-11 06:14:06,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.70 | bwd_microstep: 896.08 | bwd_inner_microstep: 896.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2405
[2024-06-11 06:14:07,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.47 | bwd_microstep: 878.71 | bwd_inner_microstep: 878.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572
[2024-06-11 06:14:09,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1396.34 | bwd_inner_microstep: 1396.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1993
[2024-06-11 06:14:10,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.77 | bwd_microstep: 738.00 | bwd_inner_microstep: 737.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467
[2024-06-11 06:14:12,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.42 | bwd_microstep: 1183.06 | bwd_inner_microstep: 1183.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-11 06:14:14,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1415.34 | bwd_inner_microstep: 1415.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061
[2024-06-11 06:14:15,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.67 | bwd_microstep: 914.64 | bwd_inner_microstep: 914.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 06:14:17,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1380.77 | bwd_inner_microstep: 1380.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823
[2024-06-11 06:14:19,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.09 | bwd_microstep: 1516.97 | bwd_inner_microstep: 1516.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 06:14:21,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1402.11 | bwd_inner_microstep: 1402.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579
[2024-06-11 06:14:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.20 | optimizer_step: 6.60
[2024-06-11 06:14:28,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.05 | bwd_microstep: 6778.34 | bwd_inner_microstep: 1578.25 | bwd_allreduce_microstep: 5200.01 | step_microstep: 38.46
[2024-06-11 06:14:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15173.14 | bwd: 45631.71 | bwd_inner: 40430.70 | bwd_allreduce: 5200.31 | step: 40.00
{'loss': 1.1441, 'learning_rate': 5.3545646778263575e-08, 'epoch': 0.98}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2460
[2024-06-11 06:14:30,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.48 | bwd_microstep: 938.89 | bwd_inner_microstep: 938.74 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470
[2024-06-11 06:14:31,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.63 | bwd_microstep: 1210.67 | bwd_inner_microstep: 1210.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3840
[2024-06-11 06:14:33,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1401.68 | bwd_inner_microstep: 1401.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779
[2024-06-11 06:14:35,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.11 | bwd_microstep: 1439.51 | bwd_inner_microstep: 1439.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796
[2024-06-11 06:14:37,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.22 | bwd_microstep: 1644.79 | bwd_inner_microstep: 1644.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599
[2024-06-11 06:14:40,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.02 | bwd_microstep: 1505.32 | bwd_inner_microstep: 1505.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-11 06:14:42,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.76 | bwd_microstep: 1636.48 | bwd_inner_microstep: 1636.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-11 06:14:43,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.27 | bwd_microstep: 1157.04 | bwd_inner_microstep: 1157.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451
[2024-06-11 06:14:45,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.79 | bwd_microstep: 1286.43 | bwd_inner_microstep: 1286.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455
[2024-06-11 06:14:47,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.18 | bwd_microstep: 1314.43 | bwd_inner_microstep: 1314.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524
[2024-06-11 06:14:49,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1384.44 | bwd_inner_microstep: 1384.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3518
[2024-06-11 06:14:51,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.77 | bwd_microstep: 1548.34 | bwd_inner_microstep: 1548.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3714
[2024-06-11 06:14:53,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.35 | bwd_microstep: 1728.66 | bwd_inner_microstep: 1728.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3492
[2024-06-11 06:14:55,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.77 | bwd_microstep: 1529.26 | bwd_inner_microstep: 1529.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3654
[2024-06-11 06:14:58,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.08 | bwd_microstep: 1482.45 | bwd_inner_microstep: 1482.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2125
[2024-06-11 06:14:59,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.44 | bwd_microstep: 764.71 | bwd_inner_microstep: 764.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3901
[2024-06-11 06:15:01,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.79 | bwd_microstep: 1689.52 | bwd_inner_microstep: 1689.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549
[2024-06-11 06:15:03,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1397.87 | bwd_inner_microstep: 1397.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3720
[2024-06-11 06:15:05,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1336.77 | bwd_inner_microstep: 1336.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472
[2024-06-11 06:15:06,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.16 | bwd_microstep: 1280.84 | bwd_inner_microstep: 1280.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3727
[2024-06-11 06:15:08,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.44 | bwd_microstep: 1269.85 | bwd_inner_microstep: 1269.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-11 06:15:10,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1416.08 | bwd_inner_microstep: 1416.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 06:15:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1283.13 | bwd_inner_microstep: 1283.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470
[2024-06-11 06:15:14,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1377.64 | bwd_inner_microstep: 1377.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546
[2024-06-11 06:15:16,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.17 | bwd_microstep: 1329.54 | bwd_inner_microstep: 1329.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450
[2024-06-11 06:15:17,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.36 | bwd_microstep: 1255.75 | bwd_inner_microstep: 1255.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576
[2024-06-11 06:15:19,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.51 | bwd_microstep: 1330.61 | bwd_inner_microstep: 1330.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2231
[2024-06-11 06:15:21,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.19 | bwd_microstep: 929.58 | bwd_inner_microstep: 929.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 06:15:23,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.37 | bwd_microstep: 1545.68 | bwd_inner_microstep: 1545.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3589
[2024-06-11 06:15:25,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.00 | bwd_microstep: 1672.41 | bwd_inner_microstep: 1672.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2945
[2024-06-11 06:15:26,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.53 | bwd_microstep: 1038.52 | bwd_inner_microstep: 1038.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 06:15:31,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.18 | optimizer_step: 6.58
[2024-06-11 06:15:31,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.48 | bwd_microstep: 3886.24 | bwd_inner_microstep: 1534.26 | bwd_allreduce_microstep: 2351.92 | step_microstep: 39.06
[2024-06-11 06:15:31,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16305.34 | bwd: 46013.13 | bwd_inner: 43660.19 | bwd_allreduce: 2352.21 | step: 40.64
 98%|█████████▊| 1683/1726 [29:32:59<44:38, 62.28s/it]
 98%|█████████▊| 1684/1726 [29:34:02<43:39, 62.36s/it]
 98%|█████████▊| 1685/1726 [29:35:03<42:17, 61.89s/it]
 98%|█████████▊| 1686/1726 [29:36:04<41:08, 61.72s/it]
 98%|█████████▊| 1687/1726 [29:37:05<40:00, 61.55s/it]
 98%|█████████▊| 1688/1726 [29:38:08<39:11, 6
{'loss': 1.1412, 'learning_rate': 5.083606936815866e-08, 'epoch': 0.98}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3503
[2024-06-11 06:15:33,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1338.54 | bwd_inner_microstep: 1338.35 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3927
[2024-06-11 06:15:35,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.30 | bwd_microstep: 1692.60 | bwd_inner_microstep: 1692.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878
[2024-06-11 06:15:37,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.23 | bwd_microstep: 1580.60 | bwd_inner_microstep: 1580.30 | bwd_allreduce_microstep: 0.16 | step_microstep: 0.27
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 06:15:39,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.47 | bwd_microstep: 1277.82 | bwd_inner_microstep: 1277.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3728
[2024-06-11 06:15:41,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.74 | bwd_microstep: 1333.36 | bwd_inner_microstep: 1333.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400
[2024-06-11 06:15:43,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1245.32 | bwd_inner_microstep: 1245.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 06:15:45,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1386.10 | bwd_inner_microstep: 1386.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4189
[2024-06-11 06:15:47,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.76 | bwd_microstep: 1566.34 | bwd_inner_microstep: 1566.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494
[2024-06-11 06:15:49,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.48 | bwd_microstep: 1416.41 | bwd_inner_microstep: 1416.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933
[2024-06-11 06:15:50,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.26 | bwd_microstep: 824.94 | bwd_inner_microstep: 824.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3438
[2024-06-11 06:15:52,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1402.82 | bwd_inner_microstep: 1402.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2273
[2024-06-11 06:15:53,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.69 | bwd_microstep: 909.23 | bwd_inner_microstep: 909.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422
[2024-06-11 06:15:55,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.74 | bwd_microstep: 1444.27 | bwd_inner_microstep: 1444.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660
[2024-06-11 06:15:57,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1522.50 | bwd_inner_microstep: 1522.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951
[2024-06-11 06:15:58,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.84 | bwd_microstep: 796.98 | bwd_inner_microstep: 796.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475
[2024-06-11 06:16:00,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.60 | bwd_microstep: 1219.13 | bwd_inner_microstep: 1219.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 06:16:02,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.94 | bwd_microstep: 1383.76 | bwd_inner_microstep: 1383.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454
[2024-06-11 06:16:03,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.56 | bwd_microstep: 1158.34 | bwd_inner_microstep: 1158.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2710
[2024-06-11 06:16:05,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.68 | bwd_microstep: 1132.96 | bwd_inner_microstep: 1132.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-11 06:16:07,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1401.75 | bwd_inner_microstep: 1401.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2919
[2024-06-11 06:16:09,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.82 | bwd_microstep: 1128.24 | bwd_inner_microstep: 1128.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826
[2024-06-11 06:16:11,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1492.68 | bwd_inner_microstep: 1492.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 06:16:13,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1555.71 | bwd_inner_microstep: 1555.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1983
[2024-06-11 06:16:14,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.06 | bwd_microstep: 705.42 | bwd_inner_microstep: 705.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987
[2024-06-11 06:16:15,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.18 | bwd_microstep: 797.01 | bwd_inner_microstep: 796.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471
[2024-06-11 06:16:17,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.72 | bwd_microstep: 1347.22 | bwd_inner_microstep: 1347.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284
[2024-06-11 06:16:18,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.75 | bwd_microstep: 1073.24 | bwd_inner_microstep: 1073.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3452
[2024-06-11 06:16:20,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.45 | bwd_microstep: 1517.46 | bwd_inner_microstep: 1517.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421
[2024-06-11 06:16:22,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.23 | bwd_microstep: 1282.63 | bwd_inner_microstep: 1282.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522
[2024-06-11 06:16:24,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.68 | bwd_microstep: 1591.82 | bwd_inner_microstep: 1591.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3597
[2024-06-11 06:16:26,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.79 | bwd_microstep: 1573.94 | bwd_inner_microstep: 1573.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2402
[2024-06-11 06:16:31,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.13 | optimizer_step: 6.61
[2024-06-11 06:16:31,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.14 | bwd_microstep: 3721.13 | bwd_inner_microstep: 1208.19 | bwd_allreduce_microstep: 2512.88 | step_microstep: 38.21
[2024-06-11 06:16:31,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15421.85 | bwd: 43820.35 | bwd_inner: 41306.07 | bwd_allreduce: 2513.36 | step: 40.33
{'loss': 1.1881, 'learning_rate': 4.819675280971492e-08, 'epoch': 0.98}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467
[2024-06-11 06:16:32,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.85 | bwd_microstep: 1307.35 | bwd_inner_microstep: 1307.23 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 06:16:34,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.76 | bwd_microstep: 1376.92 | bwd_inner_microstep: 1376.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-11 06:16:36,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1483.57 | bwd_inner_microstep: 1483.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818
[2024-06-11 06:16:39,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.58 | bwd_microstep: 1653.71 | bwd_inner_microstep: 1653.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4199
[2024-06-11 06:16:41,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.76 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409
[2024-06-11 06:16:43,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1246.96 | bwd_inner_microstep: 1246.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744
[2024-06-11 06:16:45,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.02 | bwd_microstep: 1466.74 | bwd_inner_microstep: 1466.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932
[2024-06-11 06:16:46,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 789.79 | bwd_inner_microstep: 789.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 06:16:47,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1280.74 | bwd_inner_microstep: 1280.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 06:16:49,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.24 | bwd_microstep: 1389.15 | bwd_inner_microstep: 1389.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-11 06:16:50,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.77 | bwd_microstep: 680.26 | bwd_inner_microstep: 680.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1964
[2024-06-11 06:16:52,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.50 | bwd_microstep: 854.79 | bwd_inner_microstep: 854.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1973
[2024-06-11 06:16:53,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.01 | bwd_microstep: 859.57 | bwd_inner_microstep: 859.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3501
[2024-06-11 06:16:55,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.21 | bwd_microstep: 1447.11 | bwd_inner_microstep: 1447.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511
[2024-06-11 06:16:57,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1387.69 | bwd_inner_microstep: 1387.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-11 06:16:59,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1384.42 | bwd_inner_microstep: 1384.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634
[2024-06-11 06:17:01,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1510.26 | bwd_inner_microstep: 1510.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826
[2024-06-11 06:17:03,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.92 | bwd_microstep: 1459.60 | bwd_inner_microstep: 1459.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 06:17:05,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.98 | bwd_microstep: 1396.75 | bwd_inner_microstep: 1396.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448
[2024-06-11 06:17:06,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.33 | bwd_microstep: 1283.47 | bwd_inner_microstep: 1283.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530
[2024-06-11 06:17:08,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1391.80 | bwd_inner_microstep: 1391.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512
[2024-06-11 06:17:10,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.53 | bwd_microstep: 1488.99 | bwd_inner_microstep: 1488.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818
[2024-06-11 06:17:12,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1553.07 | bwd_inner_microstep: 1553.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 612
[2024-06-11 06:17:13,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.05 | bwd_microstep: 260.46 | bwd_inner_microstep: 260.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3544
[2024-06-11 06:17:15,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.60 | bwd_microstep: 1230.00 | bwd_inner_microstep: 1229.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994
[2024-06-11 06:17:16,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.59 | bwd_microstep: 830.79 | bwd_inner_microstep: 830.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546
[2024-06-11 06:17:18,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1397.25 | bwd_inner_microstep: 1397.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-11 06:17:20,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1396.17 | bwd_inner_microstep: 1396.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434
[2024-06-11 06:17:22,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.24 | bwd_microstep: 1445.73 | bwd_inner_microstep: 1445.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3464
[2024-06-11 06:17:23,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.13 | bwd_microstep: 1345.11 | bwd_inner_microstep: 1345.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725
[2024-06-11 06:17:26,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1532.50 | bwd_inner_microstep: 1532.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2024
[2024-06-11 06:17:32,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.20 | optimizer_step: 6.59
[2024-06-11 06:17:32,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.00 | bwd_microstep: 6202.03 | bwd_inner_microstep: 1033.44 | bwd_allreduce_microstep: 5168.52 | step_microstep: 39.76
[2024-06-11 06:17:32,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15264.99 | bwd: 45989.89 | bwd_inner: 40820.33 | bwd_allreduce: 5168.82 | step: 41.51
{'loss': 1.1216, 'learning_rate': 4.562770639858549e-08, 'epoch': 0.98}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-11 06:17:34,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1369.71 | bwd_inner_microstep: 1369.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528
[2024-06-11 06:17:36,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1392.33 | bwd_inner_microstep: 1392.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838
[2024-06-11 06:17:38,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.86 | bwd_microstep: 1453.18 | bwd_inner_microstep: 1452.90 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785
[2024-06-11 06:17:40,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.04 | bwd_microstep: 1549.85 | bwd_inner_microstep: 1549.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-11 06:17:42,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1414.37 | bwd_inner_microstep: 1414.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882
[2024-06-11 06:17:43,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.65 | bwd_microstep: 716.41 | bwd_inner_microstep: 716.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880
[2024-06-11 06:17:44,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.21 | bwd_microstep: 680.72 | bwd_inner_microstep: 680.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3708
[2024-06-11 06:17:46,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.17 | bwd_microstep: 1598.05 | bwd_inner_microstep: 1598.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964
[2024-06-11 06:17:47,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 703.91 | bwd_inner_microstep: 703.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513
[2024-06-11 06:17:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.36 | bwd_microstep: 1196.01 | bwd_inner_microstep: 1195.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686
[2024-06-11 06:17:51,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.00 | bwd_microstep: 1619.04 | bwd_inner_microstep: 1619.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2893
[2024-06-11 06:17:53,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.98 | bwd_microstep: 1189.64 | bwd_inner_microstep: 1189.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-11 06:17:55,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1346.10 | bwd_inner_microstep: 1346.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3700
[2024-06-11 06:17:57,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.30 | bwd_microstep: 1589.47 | bwd_inner_microstep: 1589.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3831
[2024-06-11 06:17:59,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.98 | bwd_microstep: 1515.70 | bwd_inner_microstep: 1515.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2093
[2024-06-11 06:18:00,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.59 | bwd_microstep: 948.26 | bwd_inner_microstep: 948.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521
[2024-06-11 06:18:02,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.48 | bwd_microstep: 1451.54 | bwd_inner_microstep: 1451.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-11 06:18:04,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1256.63 | bwd_inner_microstep: 1256.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 06:18:06,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1555.55 | bwd_inner_microstep: 1555.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452
[2024-06-11 06:18:08,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1351.70 | bwd_inner_microstep: 1351.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677
[2024-06-11 06:18:10,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1431.82 | bwd_inner_microstep: 1431.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611
[2024-06-11 06:18:12,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.52 | bwd_microstep: 1313.05 | bwd_inner_microstep: 1313.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 06:18:13,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.65 | bwd_microstep: 1285.55 | bwd_inner_microstep: 1285.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1624
[2024-06-11 06:18:14,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 245.24 | bwd_microstep: 646.81 | bwd_inner_microstep: 646.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816
[2024-06-11 06:18:16,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.42 | bwd_microstep: 1355.02 | bwd_inner_microstep: 1354.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603
[2024-06-11 06:18:18,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1508.09 | bwd_inner_microstep: 1508.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573
[2024-06-11 06:18:21,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.57 | bwd_microstep: 1591.55 | bwd_inner_microstep: 1591.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3560
[2024-06-11 06:18:23,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.78 | bwd_microstep: 1566.02 | bwd_inner_microstep: 1566.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3814
[2024-06-11 06:18:25,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.25 | bwd_microstep: 1423.77 | bwd_inner_microstep: 1423.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3572
[2024-06-11 06:18:27,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1424.17 | bwd_inner_microstep: 1424.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3430
[2024-06-11 06:18:29,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.16 | bwd_microstep: 1378.50 | bwd_inner_microstep: 1378.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2032
[2024-06-11 06:18:33,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.10 | optimizer_step: 6.58
[2024-06-11 06:18:33,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.45 | bwd_microstep: 4041.84 | bwd_inner_microstep: 997.70 | bwd_allreduce_microstep: 3044.08 | step_microstep: 37.90
[2024-06-11 06:18:33,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15617.16 | bwd: 44864.40 | bwd_inner: 41819.22 | bwd_allreduce: 3044.40 | step: 39.73
{'loss': 1.1603, 'learning_rate': 4.3128939182941474e-08, 'epoch': 0.98}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3014
[2024-06-11 06:18:35,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.51 | bwd_microstep: 1217.83 | bwd_inner_microstep: 1217.66 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3965
[2024-06-11 06:18:37,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.46 | bwd_microstep: 1695.32 | bwd_inner_microstep: 1695.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552
[2024-06-11 06:18:39,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1398.72 | bwd_inner_microstep: 1398.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-11 06:18:40,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.49 | bwd_microstep: 972.05 | bwd_inner_microstep: 972.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807
[2024-06-11 06:18:43,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.75 | bwd_microstep: 1653.11 | bwd_inner_microstep: 1653.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277
[2024-06-11 06:18:44,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.01 | bwd_microstep: 973.95 | bwd_inner_microstep: 973.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 06:18:46,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.11 | bwd_microstep: 1386.14 | bwd_inner_microstep: 1386.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408
[2024-06-11 06:18:48,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.18 | bwd_microstep: 1280.32 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 06:18:49,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1246.87 | bwd_inner_microstep: 1246.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-11 06:18:51,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1250.22 | bwd_inner_microstep: 1250.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3696
[2024-06-11 06:18:53,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.17 | bwd_microstep: 1488.88 | bwd_inner_microstep: 1488.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422
[2024-06-11 06:18:55,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1346.68 | bwd_inner_microstep: 1346.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2652
[2024-06-11 06:18:56,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.35 | bwd_microstep: 934.53 | bwd_inner_microstep: 934.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-11 06:18:58,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.66 | bwd_microstep: 1484.53 | bwd_inner_microstep: 1484.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3666
[2024-06-11 06:19:01,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.48 | bwd_microstep: 1823.06 | bwd_inner_microstep: 1823.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-11 06:19:03,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1450.07 | bwd_inner_microstep: 1450.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2098
[2024-06-11 06:19:04,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.90 | bwd_microstep: 948.81 | bwd_inner_microstep: 948.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3450
[2024-06-11 06:19:06,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.70 | bwd_microstep: 1220.97 | bwd_inner_microstep: 1220.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838
[2024-06-11 06:19:08,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.11 | bwd_microstep: 1463.70 | bwd_inner_microstep: 1463.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480
[2024-06-11 06:19:10,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.78 | bwd_microstep: 1284.31 | bwd_inner_microstep: 1284.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080
[2024-06-11 06:19:11,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.56 | bwd_microstep: 821.96 | bwd_inner_microstep: 821.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731
[2024-06-11 06:19:13,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.55 | bwd_microstep: 1441.95 | bwd_inner_microstep: 1441.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614
[2024-06-11 06:19:15,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1414.57 | bwd_inner_microstep: 1414.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-11 06:19:16,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.45 | bwd_microstep: 1281.70 | bwd_inner_microstep: 1281.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 06:19:18,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1282.41 | bwd_inner_microstep: 1282.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2150
[2024-06-11 06:19:20,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.79 | bwd_microstep: 949.22 | bwd_inner_microstep: 949.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2087
[2024-06-11 06:19:21,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.36 | bwd_microstep: 966.41 | bwd_inner_microstep: 966.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3584
[2024-06-11 06:19:23,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1451.98 | bwd_inner_microstep: 1451.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3617
[2024-06-11 06:19:25,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.91 | bwd_microstep: 1544.18 | bwd_inner_microstep: 1544.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781
[2024-06-11 06:19:27,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.62 | bwd_microstep: 1546.24 | bwd_inner_microstep: 1546.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622
[2024-06-11 06:19:29,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1507.26 | bwd_inner_microstep: 1507.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-11 06:19:35,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.14 | optimizer_step: 6.58
[2024-06-11 06:19:35,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 4723.60 | bwd_inner_microstep: 1579.90 | bwd_allreduce_microstep: 3143.64 | step_microstep: 39.28
[2024-06-11 06:19:35,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15760.84 | bwd: 45451.58 | bwd_inner: 42306.90 | bwd_allreduce: 3143.93 | step: 40.93
{'loss': 1.1953, 'learning_rate': 4.070045996342975e-08, 'epoch': 0.98}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3483
[2024-06-11 06:19:37,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.18 | bwd_microstep: 1569.98 | bwd_inner_microstep: 1569.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.17
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4185
[2024-06-11 06:19:39,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.65 | bwd_microstep: 1510.49 | bwd_inner_microstep: 1510.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3395
[2024-06-11 06:19:41,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.28 | bwd_microstep: 1375.23 | bwd_inner_microstep: 1375.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3848
[2024-06-11 06:19:43,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.44 | bwd_microstep: 1493.31 | bwd_inner_microstep: 1493.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843
[2024-06-11 06:19:45,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.72 | bwd_microstep: 1659.90 | bwd_inner_microstep: 1659.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 06:19:47,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.83 | bwd_microstep: 1341.13 | bwd_inner_microstep: 1341.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537
[2024-06-11 06:19:49,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1396.88 | bwd_inner_microstep: 1396.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 06:19:51,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1386.07 | bwd_inner_microstep: 1386.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2179
[2024-06-11 06:19:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.95 | bwd_microstep: 950.37 | bwd_inner_microstep: 950.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705
[2024-06-11 06:19:54,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.85 | bwd_microstep: 1524.15 | bwd_inner_microstep: 1524.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 06:19:56,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1250.87 | bwd_inner_microstep: 1250.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1996
[2024-06-11 06:19:57,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.12 | bwd_microstep: 707.36 | bwd_inner_microstep: 707.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3430
[2024-06-11 06:19:59,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.00 | bwd_microstep: 1214.61 | bwd_inner_microstep: 1214.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979
[2024-06-11 06:20:00,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.62 | bwd_microstep: 894.02 | bwd_inner_microstep: 893.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503
[2024-06-11 06:20:02,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.51 | bwd_microstep: 1486.52 | bwd_inner_microstep: 1486.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3513
[2024-06-11 06:20:04,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.20 | bwd_microstep: 1616.39 | bwd_inner_microstep: 1616.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472
[2024-06-11 06:20:06,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 1375.60 | bwd_inner_microstep: 1375.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399
[2024-06-11 06:20:08,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1245.62 | bwd_inner_microstep: 1245.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563
[2024-06-11 06:20:10,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.32 | bwd_microstep: 1396.91 | bwd_inner_microstep: 1396.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-11 06:20:11,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1351.41 | bwd_inner_microstep: 1351.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 06:20:13,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1356.82 | bwd_inner_microstep: 1356.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456
[2024-06-11 06:20:15,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1350.64 | bwd_inner_microstep: 1350.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451
[2024-06-11 06:20:17,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.16 | bwd_microstep: 1168.12 | bwd_inner_microstep: 1168.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-11 06:20:19,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1297.85 | bwd_inner_microstep: 1297.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-11 06:20:20,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1317.02 | bwd_inner_microstep: 1317.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 06:20:22,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.86 | bwd_microstep: 1353.05 | bwd_inner_microstep: 1353.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2280
[2024-06-11 06:20:24,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.80 | bwd_microstep: 1003.01 | bwd_inner_microstep: 1002.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3823
[2024-06-11 06:20:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1418.93 | bwd_inner_microstep: 1418.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802
[2024-06-11 06:20:28,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.53 | bwd_microstep: 1643.04 | bwd_inner_microstep: 1643.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3055
[2024-06-11 06:20:30,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.44 | bwd_microstep: 1331.86 | bwd_inner_microstep: 1331.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434
[2024-06-11 06:20:32,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.90 | bwd_microstep: 1311.14 | bwd_inner_microstep: 1311.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2048
[2024-06-11 06:20:35,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.34 | optimizer_step: 6.61
[2024-06-11 06:20:35,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.71 | bwd_microstep: 3514.27 | bwd_inner_microstep: 997.65 | bwd_allreduce_microstep: 2516.56 | step_microstep: 38.60
[2024-06-11 06:20:35,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15793.45 | bwd: 44812.62 | bwd_inner: 42295.00 | bwd_allreduce: 2516.88 | step: 40.22
 98%|█████████▊| 1688/1726 [29:38:08<39:11, 61.88s/it]
 98%|█████████▊| 1689/1726 [29:39:07<37:44, 61.20s/it]
 98%|█████████▊| 1690/1726 [29:40:09<36:47, 61.32s/it]
 98%|█████████▊| 1691/1726 [29:41:10<35:41, 61.17s/it]
 98%|█████████▊| 1692/1726 [29:42:11<34:43, 61.29s/it]
 98%|█████████▊| 1693/1726 [29:43:12<33
{'loss': 1.1836, 'learning_rate': 3.834227729313966e-08, 'epoch': 0.98}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414
[2024-06-11 06:20:37,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.99 | bwd_microstep: 1337.22 | bwd_inner_microstep: 1337.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 06:20:39,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.02 | bwd_microstep: 1375.28 | bwd_inner_microstep: 1375.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416
[2024-06-11 06:20:41,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1245.11 | bwd_inner_microstep: 1245.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-11 06:20:43,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.23 | bwd_microstep: 1289.18 | bwd_inner_microstep: 1289.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-11 06:20:44,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.14 | bwd_microstep: 789.49 | bwd_inner_microstep: 789.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1865
[2024-06-11 06:20:45,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.42 | bwd_microstep: 741.63 | bwd_inner_microstep: 741.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-11 06:20:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.64 | bwd_microstep: 798.65 | bwd_inner_microstep: 798.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497
[2024-06-11 06:20:48,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1390.06 | bwd_inner_microstep: 1390.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2180
[2024-06-11 06:20:49,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.94 | bwd_microstep: 951.70 | bwd_inner_microstep: 951.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430
[2024-06-11 06:20:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1346.90 | bwd_inner_microstep: 1346.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 06:20:53,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1387.83 | bwd_inner_microstep: 1387.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3706
[2024-06-11 06:20:55,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.35 | bwd_microstep: 1671.13 | bwd_inner_microstep: 1671.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561
[2024-06-11 06:20:57,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.65 | bwd_microstep: 1598.66 | bwd_inner_microstep: 1598.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3660
[2024-06-11 06:21:00,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.53 | bwd_microstep: 1654.93 | bwd_inner_microstep: 1654.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421
[2024-06-11 06:21:01,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.96 | bwd_microstep: 1248.52 | bwd_inner_microstep: 1248.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006
[2024-06-11 06:21:03,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.60 | bwd_microstep: 833.80 | bwd_inner_microstep: 833.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415
[2024-06-11 06:21:04,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.71 | bwd_microstep: 1343.41 | bwd_inner_microstep: 1343.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3700
[2024-06-11 06:21:06,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1379.84 | bwd_inner_microstep: 1379.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648
[2024-06-11 06:21:08,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.95 | bwd_microstep: 1517.64 | bwd_inner_microstep: 1517.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984
[2024-06-11 06:21:10,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.85 | bwd_microstep: 802.16 | bwd_inner_microstep: 802.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 06:21:12,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-11 06:21:13,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1398.82 | bwd_inner_microstep: 1398.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659
[2024-06-11 06:21:16,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.93 | bwd_microstep: 1722.70 | bwd_inner_microstep: 1722.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3597
[2024-06-11 06:21:18,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 1310.14 | bwd_inner_microstep: 1310.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-11 06:21:20,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1510.38 | bwd_inner_microstep: 1510.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-11 06:21:22,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.24 | bwd_microstep: 1447.51 | bwd_inner_microstep: 1447.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045
[2024-06-11 06:21:23,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.90 | bwd_microstep: 814.16 | bwd_inner_microstep: 814.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-11 06:21:25,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.91 | bwd_microstep: 1251.05 | bwd_inner_microstep: 1251.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-11 06:21:27,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.98 | bwd_microstep: 1439.58 | bwd_inner_microstep: 1439.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 06:21:29,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.87 | bwd_microstep: 1656.68 | bwd_inner_microstep: 1656.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-11 06:21:31,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.44 | bwd_microstep: 1644.45 | bwd_inner_microstep: 1644.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-11 06:21:36,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-11 06:21:36,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.74 | bwd_microstep: 4766.18 | bwd_inner_microstep: 1702.91 | bwd_allreduce_microstep: 3063.22 | step_microstep: 38.88
[2024-06-11 06:21:36,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15597.12 | bwd: 45061.47 | bwd_inner: 41997.32 | bwd_allreduce: 3063.46 | step: 40.43
{'loss': 1.1871, 'learning_rate': 3.6054399477576384e-08, 'epoch': 0.98}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477
[2024-06-11 06:21:38,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1399.55 | bwd_inner_microstep: 1399.39 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.19
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3952
[2024-06-11 06:21:41,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.25 | bwd_microstep: 1694.77 | bwd_inner_microstep: 1694.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784
[2024-06-11 06:21:43,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.58 | bwd_microstep: 1455.39 | bwd_inner_microstep: 1455.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405
[2024-06-11 06:21:45,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1307.90 | bwd_inner_microstep: 1307.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953
[2024-06-11 06:21:46,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.38 | bwd_microstep: 792.96 | bwd_inner_microstep: 792.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483
[2024-06-11 06:21:47,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.07 | bwd_microstep: 1187.72 | bwd_inner_microstep: 1187.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2079
[2024-06-11 06:21:48,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.40 | bwd_microstep: 726.30 | bwd_inner_microstep: 726.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480
[2024-06-11 06:21:50,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.75 | bwd_microstep: 1387.05 | bwd_inner_microstep: 1387.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486
[2024-06-11 06:21:52,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1284.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3954
[2024-06-11 06:21:54,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.59 | bwd_microstep: 1505.45 | bwd_inner_microstep: 1505.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3500
[2024-06-11 06:21:56,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1416.80 | bwd_inner_microstep: 1416.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663
[2024-06-11 06:21:58,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.43 | bwd_microstep: 1615.78 | bwd_inner_microstep: 1615.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3738
[2024-06-11 06:22:01,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.31 | bwd_microstep: 1731.02 | bwd_inner_microstep: 1730.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 06:22:03,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.34 | bwd_microstep: 1388.64 | bwd_inner_microstep: 1388.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3942
[2024-06-11 06:22:05,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.77 | bwd_microstep: 1729.69 | bwd_inner_microstep: 1729.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 06:22:07,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.49 | bwd_microstep: 1249.37 | bwd_inner_microstep: 1249.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514
[2024-06-11 06:22:09,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1416.25 | bwd_inner_microstep: 1416.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 06:22:11,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1374.73 | bwd_inner_microstep: 1374.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-11 06:22:13,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1504.21 | bwd_inner_microstep: 1504.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605
[2024-06-11 06:22:15,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.80 | bwd_microstep: 1430.98 | bwd_inner_microstep: 1430.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828
[2024-06-11 06:22:17,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.32 | bwd_microstep: 1582.15 | bwd_inner_microstep: 1582.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3578
[2024-06-11 06:22:19,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1422.53 | bwd_inner_microstep: 1422.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-11 06:22:21,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.31 | bwd_microstep: 1551.01 | bwd_inner_microstep: 1550.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640
[2024-06-11 06:22:23,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.75 | bwd_microstep: 1520.65 | bwd_inner_microstep: 1520.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018
[2024-06-11 06:22:24,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.52 | bwd_microstep: 809.60 | bwd_inner_microstep: 809.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398
[2024-06-11 06:22:26,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1342.55 | bwd_inner_microstep: 1342.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-11 06:22:28,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.87 | bwd_microstep: 1757.32 | bwd_inner_microstep: 1757.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548
[2024-06-11 06:22:30,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.38 | bwd_microstep: 1496.01 | bwd_inner_microstep: 1495.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-11 06:22:33,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1553.14 | bwd_inner_microstep: 1553.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3594
[2024-06-11 06:22:34,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.39 | bwd_microstep: 1212.77 | bwd_inner_microstep: 1212.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541
[2024-06-11 06:22:36,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.60 | bwd_microstep: 1399.95 | bwd_inner_microstep: 1399.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554
[2024-06-11 06:22:38,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.07 | optimizer_step: 6.65
[2024-06-11 06:22:38,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1444.52 | bwd_inner_microstep: 1436.52 | bwd_allreduce_microstep: 7.95 | step_microstep: 37.81
[2024-06-11 06:22:38,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16684.50 | bwd: 44691.70 | bwd_inner: 44682.73 | bwd_allreduce: 8.24 | step: 39.49
{'loss': 1.1749, 'learning_rate': 3.383683457463649e-08, 'epoch': 0.98}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445
[2024-06-11 06:22:40,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.69 | bwd_microstep: 1354.45 | bwd_inner_microstep: 1354.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-11 06:22:41,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.91 | bwd_microstep: 699.83 | bwd_inner_microstep: 699.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458
[2024-06-11 06:22:43,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.67 | bwd_microstep: 1482.73 | bwd_inner_microstep: 1482.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3890
[2024-06-11 06:22:45,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.94 | bwd_microstep: 1590.96 | bwd_inner_microstep: 1590.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 06:22:47,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.67 | bwd_microstep: 1244.35 | bwd_inner_microstep: 1244.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-11 06:22:49,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.66 | bwd_microstep: 1455.61 | bwd_inner_microstep: 1455.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418
[2024-06-11 06:22:51,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1251.79 | bwd_inner_microstep: 1251.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435
[2024-06-11 06:22:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.83 | bwd_microstep: 1254.24 | bwd_inner_microstep: 1254.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-11 06:22:55,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 1525.94 | bwd_inner_microstep: 1525.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490
[2024-06-11 06:22:57,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.53 | bwd_microstep: 1388.56 | bwd_inner_microstep: 1388.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485
[2024-06-11 06:22:58,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.95 | bwd_microstep: 1287.49 | bwd_inner_microstep: 1287.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-11 06:23:00,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.01 | bwd_microstep: 1527.33 | bwd_inner_microstep: 1527.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1908
[2024-06-11 06:23:01,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.86 | bwd_microstep: 779.24 | bwd_inner_microstep: 779.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942
[2024-06-11 06:23:03,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.82 | bwd_microstep: 820.38 | bwd_inner_microstep: 820.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423
[2024-06-11 06:23:04,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1346.37 | bwd_inner_microstep: 1346.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497
[2024-06-11 06:23:07,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.33 | bwd_microstep: 1483.78 | bwd_inner_microstep: 1483.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2407
[2024-06-11 06:23:08,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.46 | bwd_microstep: 1033.00 | bwd_inner_microstep: 1032.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3674
[2024-06-11 06:23:10,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.75 | bwd_microstep: 1724.43 | bwd_inner_microstep: 1724.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-11 06:23:12,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.32 | bwd_microstep: 1472.69 | bwd_inner_microstep: 1472.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2288
[2024-06-11 06:23:13,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.67 | bwd_microstep: 785.51 | bwd_inner_microstep: 785.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525
[2024-06-11 06:23:15,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1395.88 | bwd_inner_microstep: 1395.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822
[2024-06-11 06:23:17,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.24 | bwd_microstep: 1460.71 | bwd_inner_microstep: 1460.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921
[2024-06-11 06:23:18,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.86 | bwd_microstep: 695.24 | bwd_inner_microstep: 695.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539
[2024-06-11 06:23:20,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1393.04 | bwd_inner_microstep: 1393.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 06:23:22,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1378.79 | bwd_inner_microstep: 1378.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285
[2024-06-11 06:23:24,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.46 | bwd_microstep: 975.46 | bwd_inner_microstep: 975.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516
[2024-06-11 06:23:25,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1292.62 | bwd_inner_microstep: 1292.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 06:23:27,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.56 | bwd_microstep: 1411.45 | bwd_inner_microstep: 1411.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3592
[2024-06-11 06:23:29,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.12 | bwd_microstep: 1431.21 | bwd_inner_microstep: 1431.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3767
[2024-06-11 06:23:31,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1467.54 | bwd_inner_microstep: 1467.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3583
[2024-06-11 06:23:34,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.42 | bwd_microstep: 1698.78 | bwd_inner_microstep: 1698.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428
[2024-06-11 06:23:39,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.14 | optimizer_step: 6.64
[2024-06-11 06:23:39,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 4448.89 | bwd_inner_microstep: 1648.67 | bwd_allreduce_microstep: 2800.15 | step_microstep: 39.37
[2024-06-11 06:23:39,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15570.36 | bwd: 44558.33 | bwd_inner: 41757.26 | bwd_allreduce: 2800.38 | step: 40.91
{'loss': 1.1932, 'learning_rate': 3.1689590394570204e-08, 'epoch': 0.98}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493
[2024-06-11 06:23:41,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.02 | bwd_microstep: 1400.03 | bwd_inner_microstep: 1400.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857
[2024-06-11 06:23:42,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.43 | bwd_microstep: 674.32 | bwd_inner_microstep: 674.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3910
[2024-06-11 06:23:44,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1493.47 | bwd_inner_microstep: 1493.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4393
[2024-06-11 06:23:46,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.29 | bwd_microstep: 1644.01 | bwd_inner_microstep: 1643.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881
[2024-06-11 06:23:47,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.45 | bwd_microstep: 679.67 | bwd_inner_microstep: 679.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478
[2024-06-11 06:23:49,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.21 | bwd_microstep: 1483.24 | bwd_inner_microstep: 1483.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401
[2024-06-11 06:23:51,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.25 | bwd_microstep: 1245.98 | bwd_inner_microstep: 1245.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712
[2024-06-11 06:23:53,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1431.11 | bwd_inner_microstep: 1431.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511
[2024-06-11 06:23:54,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.11 | bwd_microstep: 1290.45 | bwd_inner_microstep: 1290.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619
[2024-06-11 06:23:56,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.17 | bwd_microstep: 1314.95 | bwd_inner_microstep: 1314.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3402
[2024-06-11 06:23:58,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.44 | bwd_microstep: 1371.15 | bwd_inner_microstep: 1371.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-11 06:24:00,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.11 | bwd_microstep: 1419.97 | bwd_inner_microstep: 1419.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-11 06:24:02,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.39 | bwd_microstep: 1345.43 | bwd_inner_microstep: 1345.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 06:24:04,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.79 | bwd_microstep: 1374.48 | bwd_inner_microstep: 1374.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3504
[2024-06-11 06:24:06,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1553.03 | bwd_inner_microstep: 1553.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2509
[2024-06-11 06:24:07,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.12 | bwd_microstep: 1062.90 | bwd_inner_microstep: 1062.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 06:24:09,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1252.92 | bwd_inner_microstep: 1252.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3429
[2024-06-11 06:24:11,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.34 | bwd_microstep: 1192.61 | bwd_inner_microstep: 1192.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3520
[2024-06-11 06:24:13,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.99 | bwd_microstep: 1226.29 | bwd_inner_microstep: 1226.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-11 06:24:15,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1512.19 | bwd_inner_microstep: 1512.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3705
[2024-06-11 06:24:16,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1334.09 | bwd_inner_microstep: 1334.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532
[2024-06-11 06:24:18,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1396.20 | bwd_inner_microstep: 1396.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513
[2024-06-11 06:24:20,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.83 | bwd_microstep: 1487.60 | bwd_inner_microstep: 1487.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528
[2024-06-11 06:24:22,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1295.53 | bwd_inner_microstep: 1295.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919
[2024-06-11 06:24:23,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.05 | bwd_microstep: 688.59 | bwd_inner_microstep: 688.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2349
[2024-06-11 06:24:25,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.81 | bwd_microstep: 1023.49 | bwd_inner_microstep: 1023.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789
[2024-06-11 06:24:27,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.80 | bwd_microstep: 1446.23 | bwd_inner_microstep: 1446.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3813
[2024-06-11 06:24:29,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.44 | bwd_microstep: 1585.97 | bwd_inner_microstep: 1585.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2484
[2024-06-11 06:24:30,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.36 | bwd_microstep: 1026.83 | bwd_inner_microstep: 1026.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3570
[2024-06-11 06:24:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.45 | bwd_microstep: 1331.83 | bwd_inner_microstep: 1331.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563
[2024-06-11 06:24:34,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.29 | bwd_microstep: 1502.21 | bwd_inner_microstep: 1502.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2973
[2024-06-11 06:24:38,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.18 | optimizer_step: 6.59
[2024-06-11 06:24:38,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.37 | bwd_microstep: 3064.93 | bwd_inner_microstep: 1248.16 | bwd_allreduce_microstep: 1816.71 | step_microstep: 39.90
[2024-06-11 06:24:38,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15476.95 | bwd: 43151.77 | bwd_inner: 41333.29 | bwd_allreduce: 1816.93 | step: 41.51
{'loss': 1.1929, 'learning_rate': 2.9612674499961413e-08, 'epoch': 0.98}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939
[2024-06-11 06:24:39,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.95 | bwd_microstep: 780.71 | bwd_inner_microstep: 780.52 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 06:24:41,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1379.55 | bwd_inner_microstep: 1379.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878
[2024-06-11 06:24:43,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.24 | bwd_microstep: 1581.23 | bwd_inner_microstep: 1581.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295
[2024-06-11 06:24:44,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.44 | bwd_microstep: 877.67 | bwd_inner_microstep: 877.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2057
[2024-06-11 06:24:45,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.97 | bwd_microstep: 764.87 | bwd_inner_microstep: 764.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811
[2024-06-11 06:24:47,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.33 | bwd_microstep: 1549.31 | bwd_inner_microstep: 1549.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503
[2024-06-11 06:24:49,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1390.85 | bwd_inner_microstep: 1390.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1916
[2024-06-11 06:24:50,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.64 | bwd_microstep: 719.38 | bwd_inner_microstep: 719.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-11 06:24:52,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1281.60 | bwd_inner_microstep: 1281.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 11, images per sample: 2.75, dynamic token length: 1466
[2024-06-11 06:24:53,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 223.93 | bwd_microstep: 595.18 | bwd_inner_microstep: 595.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753
[2024-06-11 06:24:55,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.96 | bwd_microstep: 1464.09 | bwd_inner_microstep: 1464.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3613
[2024-06-11 06:24:57,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.49 | bwd_microstep: 1462.48 | bwd_inner_microstep: 1462.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3744
[2024-06-11 06:24:59,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.81 | bwd_microstep: 1435.33 | bwd_inner_microstep: 1435.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 06:25:01,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.44 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507
[2024-06-11 06:25:03,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.12 | bwd_microstep: 1581.40 | bwd_inner_microstep: 1581.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2123
[2024-06-11 06:25:04,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.53 | bwd_microstep: 860.08 | bwd_inner_microstep: 860.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-11 06:25:06,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1473.26 | bwd_inner_microstep: 1473.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 06:25:08,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1257.35 | bwd_inner_microstep: 1257.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564
[2024-06-11 06:25:10,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1402.72 | bwd_inner_microstep: 1402.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832
[2024-06-11 06:25:12,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.98 | bwd_microstep: 1556.76 | bwd_inner_microstep: 1556.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2069
[2024-06-11 06:25:13,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.01 | bwd_microstep: 752.99 | bwd_inner_microstep: 752.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3559
[2024-06-11 06:25:15,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1236.92 | bwd_inner_microstep: 1236.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557
[2024-06-11 06:25:17,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1501.07 | bwd_inner_microstep: 1501.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282
[2024-06-11 06:25:18,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.81 | bwd_microstep: 976.57 | bwd_inner_microstep: 976.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451
[2024-06-11 06:25:20,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.02 | bwd_microstep: 1192.72 | bwd_inner_microstep: 1192.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436
[2024-06-11 06:25:22,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1455.58 | bwd_inner_microstep: 1455.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593
[2024-06-11 06:25:24,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1406.15 | bwd_inner_microstep: 1406.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3725
[2024-06-11 06:25:26,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.45 | bwd_microstep: 1728.93 | bwd_inner_microstep: 1728.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229
[2024-06-11 06:25:27,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.33 | bwd_microstep: 960.31 | bwd_inner_microstep: 960.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2232
[2024-06-11 06:25:29,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.88 | bwd_microstep: 961.85 | bwd_inner_microstep: 961.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579
[2024-06-11 06:25:31,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.98 | bwd_microstep: 1504.25 | bwd_inner_microstep: 1504.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510
[2024-06-11 06:25:43,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.62
[2024-06-11 06:25:43,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 11399.83 | bwd_inner_microstep: 1448.35 | bwd_allreduce_microstep: 9951.42 | step_microstep: 39.59
[2024-06-11 06:25:43,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14868.87 | bwd: 49775.28 | bwd_inner: 39822.81 | bwd_allreduce: 9951.72 | step: 41.32
 98%|█████████▊| 1693/1726 [29:43:12<33:39, 61.19s/it]
 98%|█████████▊| 1694/1726 [29:44:13<32:36, 61.13s/it]
 98%|█████████▊| 1695/1726 [29:45:15<31:40, 61.31s/it]
 98%|█████████▊| 1696/1726 [29:46:15<30:31, 61.06s/it]
 98%|█████████▊| 1697/1726 [29:47:14<29:12, 60.43s/it]
 98%|█████████▊| 1698/1726 [29:48
{'loss': 1.1637, 'learning_rate': 2.760609420569882e-08, 'epoch': 0.98}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463
[2024-06-11 06:25:45,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1371.32 | bwd_inner_microstep: 1371.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2389
[2024-06-11 06:25:46,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.05 | bwd_microstep: 999.34 | bwd_inner_microstep: 999.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-11 06:25:48,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.24 | bwd_microstep: 1650.73 | bwd_inner_microstep: 1650.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3851
[2024-06-11 06:25:50,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.26 | bwd_microstep: 1485.42 | bwd_inner_microstep: 1485.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774
[2024-06-11 06:25:52,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.54 | bwd_microstep: 1438.37 | bwd_inner_microstep: 1438.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941
[2024-06-11 06:25:53,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.90 | bwd_microstep: 791.06 | bwd_inner_microstep: 791.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 06:25:55,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411
[2024-06-11 06:25:57,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.97 | bwd_microstep: 1180.41 | bwd_inner_microstep: 1180.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-11 06:25:59,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.25 | bwd_microstep: 1487.47 | bwd_inner_microstep: 1487.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 06:26:01,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1253.18 | bwd_inner_microstep: 1253.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 06:26:02,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.91 | bwd_microstep: 1288.74 | bwd_inner_microstep: 1288.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442
[2024-06-11 06:26:04,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.44 | bwd_microstep: 1157.93 | bwd_inner_microstep: 1157.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3668
[2024-06-11 06:26:06,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1324.37 | bwd_inner_microstep: 1324.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3601
[2024-06-11 06:26:08,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.59 | bwd_microstep: 1469.04 | bwd_inner_microstep: 1469.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3492
[2024-06-11 06:26:10,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.34 | bwd_microstep: 1369.26 | bwd_inner_microstep: 1369.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645
[2024-06-11 06:26:12,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.03 | bwd_microstep: 1713.70 | bwd_inner_microstep: 1713.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3635
[2024-06-11 06:26:14,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.18 | bwd_microstep: 1535.56 | bwd_inner_microstep: 1535.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-11 06:26:16,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1394.23 | bwd_inner_microstep: 1394.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2700
[2024-06-11 06:26:18,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.69 | bwd_microstep: 1130.58 | bwd_inner_microstep: 1130.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 06:26:20,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.66 | bwd_microstep: 1381.86 | bwd_inner_microstep: 1381.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 06:26:22,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1379.37 | bwd_inner_microstep: 1379.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-11 06:26:24,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1495.06 | bwd_inner_microstep: 1495.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3091
[2024-06-11 06:26:25,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.29 | bwd_microstep: 1064.01 | bwd_inner_microstep: 1063.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001
[2024-06-11 06:26:26,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.18 | bwd_microstep: 898.41 | bwd_inner_microstep: 898.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551
[2024-06-11 06:26:28,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1430.48 | bwd_inner_microstep: 1430.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1943
[2024-06-11 06:26:29,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.67 | bwd_microstep: 762.20 | bwd_inner_microstep: 762.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 06:26:31,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.62 | bwd_microstep: 1257.49 | bwd_inner_microstep: 1257.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3004
[2024-06-11 06:26:33,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.05 | bwd_microstep: 1020.74 | bwd_inner_microstep: 1020.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3592
[2024-06-11 06:26:34,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1337.92 | bwd_inner_microstep: 1337.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3729
[2024-06-11 06:26:36,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.54 | bwd_microstep: 1338.87 | bwd_inner_microstep: 1338.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568
[2024-06-11 06:26:38,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1397.08 | bwd_inner_microstep: 1397.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3551
[2024-06-11 06:26:45,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.58
[2024-06-11 06:26:45,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.42 | bwd_microstep: 6054.85 | bwd_inner_microstep: 1724.78 | bwd_allreduce_microstep: 4329.95 | step_microstep: 41.02
[2024-06-11 06:26:45,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15675.58 | bwd: 46239.56 | bwd_inner: 41908.63 | bwd_allreduce: 4330.22 | step: 42.63
{'loss': 1.1207, 'learning_rate': 2.566985657894483e-08, 'epoch': 0.98}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399
[2024-06-11 06:26:47,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.30 | bwd_microstep: 1270.20 | bwd_inner_microstep: 1270.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943
[2024-06-11 06:26:48,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.66 | bwd_microstep: 792.66 | bwd_inner_microstep: 792.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839
[2024-06-11 06:26:50,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.21 | bwd_microstep: 1553.72 | bwd_inner_microstep: 1553.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 06:26:52,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.59 | bwd_microstep: 1344.00 | bwd_inner_microstep: 1343.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810
[2024-06-11 06:26:54,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1456.55 | bwd_inner_microstep: 1456.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739
[2024-06-11 06:26:56,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1530.59 | bwd_inner_microstep: 1530.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494
[2024-06-11 06:26:58,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.80 | bwd_microstep: 1283.69 | bwd_inner_microstep: 1283.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425
[2024-06-11 06:26:59,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1249.69 | bwd_inner_microstep: 1249.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-11 06:27:01,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.68 | bwd_microstep: 799.12 | bwd_inner_microstep: 799.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3447
[2024-06-11 06:27:02,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.97 | bwd_microstep: 1207.75 | bwd_inner_microstep: 1207.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2095
[2024-06-11 06:27:03,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.12 | bwd_microstep: 854.84 | bwd_inner_microstep: 854.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2666
[2024-06-11 06:27:05,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.85 | bwd_microstep: 1084.77 | bwd_inner_microstep: 1084.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3650
[2024-06-11 06:27:07,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.65 | bwd_microstep: 1716.05 | bwd_inner_microstep: 1716.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2963
[2024-06-11 06:27:09,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.01 | bwd_microstep: 1198.43 | bwd_inner_microstep: 1198.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3638
[2024-06-11 06:27:11,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.16 | bwd_microstep: 1434.80 | bwd_inner_microstep: 1434.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837
[2024-06-11 06:27:13,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.10 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 06:27:15,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.91 | bwd_microstep: 1401.81 | bwd_inner_microstep: 1401.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 611
[2024-06-11 06:27:15,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.83 | bwd_microstep: 258.83 | bwd_inner_microstep: 258.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-11 06:27:17,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1511.78 | bwd_inner_microstep: 1511.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439
[2024-06-11 06:27:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1256.49 | bwd_inner_microstep: 1256.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822
[2024-06-11 06:27:21,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.39 | bwd_microstep: 1645.61 | bwd_inner_microstep: 1645.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3522
[2024-06-11 06:27:23,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1518.81 | bwd_inner_microstep: 1518.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 06:27:25,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.16 | bwd_microstep: 1358.82 | bwd_inner_microstep: 1358.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3556
[2024-06-11 06:27:27,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.23 | bwd_microstep: 1260.21 | bwd_inner_microstep: 1260.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574
[2024-06-11 06:27:29,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1499.62 | bwd_inner_microstep: 1499.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288
[2024-06-11 06:27:30,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.91 | bwd_microstep: 877.39 | bwd_inner_microstep: 877.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2013
[2024-06-11 06:27:31,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.48 | bwd_microstep: 771.38 | bwd_inner_microstep: 771.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 06:27:33,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.06 | bwd_microstep: 1383.27 | bwd_inner_microstep: 1383.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1532
[2024-06-11 06:27:34,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 217.37 | bwd_microstep: 564.25 | bwd_inner_microstep: 564.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2952
[2024-06-11 06:27:36,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.11 | bwd_microstep: 1099.20 | bwd_inner_microstep: 1099.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769
[2024-06-11 06:27:38,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.72 | bwd_microstep: 1551.12 | bwd_inner_microstep: 1551.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464
[2024-06-11 06:27:46,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.36 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-11 06:27:46,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.35 | bwd_microstep: 7571.77 | bwd_inner_microstep: 1325.16 | bwd_allreduce_microstep: 6246.54 | step_microstep: 41.68
[2024-06-11 06:27:46,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14806.85 | bwd: 45866.24 | bwd_inner: 39618.73 | bwd_allreduce: 6246.78 | step: 43.53
{'loss': 1.1581, 'learning_rate': 2.3803968439117807e-08, 'epoch': 0.98}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413
[2024-06-11 06:27:48,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.59 | bwd_microstep: 1360.26 | bwd_inner_microstep: 1360.18 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459
[2024-06-11 06:27:50,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.92 | bwd_microstep: 1472.75 | bwd_inner_microstep: 1472.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290
[2024-06-11 06:27:51,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.06 | bwd_microstep: 972.30 | bwd_inner_microstep: 972.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508
[2024-06-11 06:27:53,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.46 | bwd_microstep: 1285.27 | bwd_inner_microstep: 1285.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3789
[2024-06-11 06:27:55,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.53 | bwd_microstep: 1380.76 | bwd_inner_microstep: 1380.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4079
[2024-06-11 06:27:57,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.38 | bwd_microstep: 1425.30 | bwd_inner_microstep: 1425.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767
[2024-06-11 06:27:59,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 1543.13 | bwd_inner_microstep: 1543.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935
[2024-06-11 06:28:00,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.57 | bwd_microstep: 788.62 | bwd_inner_microstep: 788.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-11 06:28:01,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 792.77 | bwd_inner_microstep: 792.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546
[2024-06-11 06:28:03,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.12 | bwd_microstep: 1297.18 | bwd_inner_microstep: 1297.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1959
[2024-06-11 06:28:04,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.09 | bwd_microstep: 764.08 | bwd_inner_microstep: 764.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653
[2024-06-11 06:28:06,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.33 | bwd_microstep: 1612.53 | bwd_inner_microstep: 1612.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 06:28:08,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1379.32 | bwd_inner_microstep: 1379.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540
[2024-06-11 06:28:10,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1390.86 | bwd_inner_microstep: 1390.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-11 06:28:12,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.92 | bwd_microstep: 1523.15 | bwd_inner_microstep: 1523.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634
[2024-06-11 06:28:14,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1411.84 | bwd_inner_microstep: 1411.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 06:28:16,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.90 | bwd_microstep: 1281.36 | bwd_inner_microstep: 1281.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463
[2024-06-11 06:28:18,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1312.53 | bwd_inner_microstep: 1312.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 06:28:20,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.20 | bwd_microstep: 1356.26 | bwd_inner_microstep: 1356.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954
[2024-06-11 06:28:21,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.38 | bwd_microstep: 702.30 | bwd_inner_microstep: 702.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935
[2024-06-11 06:28:22,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.97 | bwd_microstep: 699.73 | bwd_inner_microstep: 699.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 06:28:23,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.57 | bwd_microstep: 1377.74 | bwd_inner_microstep: 1377.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3789
[2024-06-11 06:28:26,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.37 | bwd_microstep: 1750.41 | bwd_inner_microstep: 1750.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3544
[2024-06-11 06:28:28,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.68 | bwd_microstep: 1356.99 | bwd_inner_microstep: 1356.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3809
[2024-06-11 06:28:30,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.79 | bwd_microstep: 1584.23 | bwd_inner_microstep: 1584.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3491
[2024-06-11 06:28:32,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.59 | bwd_microstep: 1347.01 | bwd_inner_microstep: 1346.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189
[2024-06-11 06:28:33,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.61 | bwd_microstep: 861.35 | bwd_inner_microstep: 861.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181
[2024-06-11 06:28:34,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.87 | bwd_microstep: 955.50 | bwd_inner_microstep: 955.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-11 06:28:36,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.95 | bwd_microstep: 1461.09 | bwd_inner_microstep: 1461.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471
[2024-06-11 06:28:38,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1286.12 | bwd_inner_microstep: 1286.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2211
[2024-06-11 06:28:39,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.91 | bwd_microstep: 866.77 | bwd_inner_microstep: 866.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755
[2024-06-11 06:28:50,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.64
[2024-06-11 06:28:50,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.27 | bwd_microstep: 9560.74 | bwd_inner_microstep: 1860.89 | bwd_allreduce_microstep: 7699.78 | step_microstep: 40.01
[2024-06-11 06:28:50,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15101.20 | bwd: 48160.27 | bwd_inner: 40459.50 | bwd_allreduce: 7700.06 | step: 41.59
{'loss': 1.1754, 'learning_rate': 2.2008436357869866e-08, 'epoch': 0.99}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916
[2024-06-11 06:28:51,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.21 | bwd_microstep: 774.75 | bwd_inner_microstep: 774.59 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-11 06:28:53,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.89 | bwd_microstep: 1473.71 | bwd_inner_microstep: 1473.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3885
[2024-06-11 06:28:55,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.12 | bwd_microstep: 1479.03 | bwd_inner_microstep: 1479.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4186
[2024-06-11 06:28:57,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.45 | bwd_microstep: 1612.30 | bwd_inner_microstep: 1612.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922
[2024-06-11 06:28:58,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.24 | bwd_microstep: 817.20 | bwd_inner_microstep: 817.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 06:29:00,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.53 | bwd_microstep: 1374.60 | bwd_inner_microstep: 1374.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 06:29:02,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1247.11 | bwd_inner_microstep: 1247.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 06:29:04,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.40 | bwd_microstep: 1382.42 | bwd_inner_microstep: 1382.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716
[2024-06-11 06:29:06,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.78 | bwd_microstep: 1465.12 | bwd_inner_microstep: 1465.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 06:29:07,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1353.02 | bwd_inner_microstep: 1352.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497
[2024-06-11 06:29:09,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1413.11 | bwd_inner_microstep: 1413.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418
[2024-06-11 06:29:11,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.25 | bwd_microstep: 1183.35 | bwd_inner_microstep: 1183.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2145
[2024-06-11 06:29:12,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.61 | bwd_microstep: 1006.45 | bwd_inner_microstep: 1006.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508
[2024-06-11 06:29:15,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1579.68 | bwd_inner_microstep: 1579.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500
[2024-06-11 06:29:17,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.64 | bwd_microstep: 1481.42 | bwd_inner_microstep: 1481.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2425
[2024-06-11 06:29:18,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.42 | bwd_microstep: 1032.63 | bwd_inner_microstep: 1032.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498
[2024-06-11 06:29:20,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1387.66 | bwd_inner_microstep: 1387.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-11 06:29:22,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.17 | bwd_microstep: 1509.02 | bwd_inner_microstep: 1508.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3606
[2024-06-11 06:29:24,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1245.51 | bwd_inner_microstep: 1245.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-11 06:29:25,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.46 | bwd_microstep: 975.24 | bwd_inner_microstep: 975.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623
[2024-06-11 06:29:27,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1508.97 | bwd_inner_microstep: 1508.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536
[2024-06-11 06:29:29,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.16 | bwd_microstep: 1294.59 | bwd_inner_microstep: 1294.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-11 06:29:31,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.82 | bwd_microstep: 1549.49 | bwd_inner_microstep: 1549.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-11 06:29:33,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1375.54 | bwd_inner_microstep: 1375.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922
[2024-06-11 06:29:34,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.62 | bwd_microstep: 696.37 | bwd_inner_microstep: 696.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752
[2024-06-11 06:29:36,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.89 | bwd_microstep: 1532.75 | bwd_inner_microstep: 1532.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-11 06:29:38,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1496.77 | bwd_inner_microstep: 1496.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3526
[2024-06-11 06:29:40,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1455.79 | bwd_inner_microstep: 1455.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2089
[2024-06-11 06:29:42,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.69 | bwd_microstep: 917.75 | bwd_inner_microstep: 917.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808
[2024-06-11 06:29:44,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.25 | bwd_microstep: 1548.41 | bwd_inner_microstep: 1548.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-11 06:29:45,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1246.30 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3587
[2024-06-11 06:29:50,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.17 | optimizer_step: 6.61
[2024-06-11 06:29:50,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 4200.95 | bwd_inner_microstep: 1851.46 | bwd_allreduce_microstep: 2349.44 | step_microstep: 39.03
[2024-06-11 06:29:50,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15735.04 | bwd: 44617.03 | bwd_inner: 42266.56 | bwd_allreduce: 2349.73 | step: 40.71
{'loss': 1.1812, 'learning_rate': 2.0283266659051338e-08, 'epoch': 0.99}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-11 06:29:52,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.24 | bwd_microstep: 1408.20 | bwd_inner_microstep: 1408.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 06:29:54,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1378.95 | bwd_inner_microstep: 1378.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2376
[2024-06-11 06:29:55,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.34 | bwd_microstep: 997.05 | bwd_inner_microstep: 997.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399
[2024-06-11 06:29:57,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.64 | bwd_microstep: 1338.77 | bwd_inner_microstep: 1338.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467
[2024-06-11 06:29:59,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.69 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724
[2024-06-11 06:30:01,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.36 | bwd_microstep: 1530.71 | bwd_inner_microstep: 1530.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416
[2024-06-11 06:30:03,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1343.33 | bwd_inner_microstep: 1343.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2492
[2024-06-11 06:30:05,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.74 | bwd_microstep: 954.91 | bwd_inner_microstep: 954.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482
[2024-06-11 06:30:06,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.26 | bwd_microstep: 1284.12 | bwd_inner_microstep: 1284.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507
[2024-06-11 06:30:08,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1416.73 | bwd_inner_microstep: 1416.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493
[2024-06-11 06:30:10,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.09 | bwd_microstep: 1416.66 | bwd_inner_microstep: 1416.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982
[2024-06-11 06:30:11,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.49 | bwd_microstep: 830.29 | bwd_inner_microstep: 830.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3508
[2024-06-11 06:30:13,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.82 | bwd_microstep: 1445.70 | bwd_inner_microstep: 1445.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2164
[2024-06-11 06:30:15,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.39 | bwd_microstep: 978.72 | bwd_inner_microstep: 978.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485
[2024-06-11 06:30:17,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.72 | bwd_microstep: 1480.69 | bwd_inner_microstep: 1480.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642
[2024-06-11 06:30:19,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1411.31 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415
[2024-06-11 06:30:21,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1437.74 | bwd_inner_microstep: 1437.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651
[2024-06-11 06:30:23,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.72 | bwd_microstep: 1324.38 | bwd_inner_microstep: 1324.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632
[2024-06-11 06:30:25,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.25 | bwd_microstep: 1608.54 | bwd_inner_microstep: 1608.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-11 06:30:27,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1413.75 | bwd_inner_microstep: 1413.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821
[2024-06-11 06:30:29,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.69 | bwd_microstep: 1556.67 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470
[2024-06-11 06:30:31,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.88 | bwd_microstep: 1284.25 | bwd_inner_microstep: 1284.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2277
[2024-06-11 06:30:32,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.77 | bwd_microstep: 785.45 | bwd_inner_microstep: 785.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3742
[2024-06-11 06:30:34,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1383.88 | bwd_inner_microstep: 1383.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 06:30:36,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1377.44 | bwd_inner_microstep: 1377.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547
[2024-06-11 06:30:38,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1494.79 | bwd_inner_microstep: 1494.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496
[2024-06-11 06:30:40,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.04 | bwd_microstep: 1384.48 | bwd_inner_microstep: 1384.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995
[2024-06-11 06:30:41,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.95 | bwd_microstep: 895.34 | bwd_inner_microstep: 895.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810
[2024-06-11 06:30:43,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.25 | bwd_microstep: 1556.68 | bwd_inner_microstep: 1556.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 06:30:45,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1287.33 | bwd_inner_microstep: 1287.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 06:30:46,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.83 | bwd_microstep: 1258.36 | bwd_inner_microstep: 1258.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590
[2024-06-11 06:30:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.19 | optimizer_step: 6.60
[2024-06-11 06:30:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.89 | bwd_microstep: 5369.62 | bwd_inner_microstep: 2252.83 | bwd_allreduce_microstep: 3116.73 | step_microstep: 39.23
[2024-06-11 06:30:52,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15860.49 | bwd: 46015.63 | bwd_inner: 42897.96 | bwd_allreduce: 3116.96 | step: 40.88
 98%|█████████▊| 1698/1726 [29:48:19<28:50, 61.80s/it]
 98%|█████████▊| 1699/1726 [29:49:22<27:52, 61.94s/it]
 98%|█████████▊| 1700/1726 [29:50:23<26:43, 61.67s/it]
 99%|█████████▊| 1701/1726 [29:51:26<25:56, 62.25s/it]
 99%|█████████▊| 1702/1726 [29:52:27<24:42, 61.79s/it]
 99%|█████████▊| 1703/1726
{'loss': 1.1587, 'learning_rate': 1.862846541870633e-08, 'epoch': 0.99}
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1252
[2024-06-11 06:30:53,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 184.63 | bwd_microstep: 479.98 | bwd_inner_microstep: 479.83 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1866
[2024-06-11 06:30:54,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.06 | bwd_microstep: 740.31 | bwd_inner_microstep: 740.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3906
[2024-06-11 06:30:56,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.60 | bwd_microstep: 1589.35 | bwd_inner_microstep: 1589.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486
[2024-06-11 06:30:58,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.72 | bwd_microstep: 1479.66 | bwd_inner_microstep: 1479.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769
[2024-06-11 06:31:00,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.60 | bwd_microstep: 1341.97 | bwd_inner_microstep: 1341.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545
[2024-06-11 06:31:02,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.55 | bwd_microstep: 1492.21 | bwd_inner_microstep: 1492.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418
[2024-06-11 06:31:04,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.64 | bwd_microstep: 1344.59 | bwd_inner_microstep: 1344.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971
[2024-06-11 06:31:05,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.69 | bwd_microstep: 703.24 | bwd_inner_microstep: 703.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-11 06:31:07,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.68 | bwd_microstep: 1485.19 | bwd_inner_microstep: 1485.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504
[2024-06-11 06:31:09,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1390.52 | bwd_inner_microstep: 1390.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556
[2024-06-11 06:31:11,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.37 | bwd_microstep: 1505.74 | bwd_inner_microstep: 1505.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1990
[2024-06-11 06:31:12,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.73 | bwd_microstep: 860.56 | bwd_inner_microstep: 860.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389
[2024-06-11 06:31:14,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.91 | bwd_microstep: 1240.87 | bwd_inner_microstep: 1240.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2132
[2024-06-11 06:31:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.34 | bwd_microstep: 890.41 | bwd_inner_microstep: 890.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3611
[2024-06-11 06:31:18,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.25 | bwd_microstep: 1702.44 | bwd_inner_microstep: 1702.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3652
[2024-06-11 06:31:20,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.42 | bwd_microstep: 1780.96 | bwd_inner_microstep: 1780.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2131
[2024-06-11 06:31:22,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.26 | bwd_microstep: 1021.73 | bwd_inner_microstep: 1021.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2483
[2024-06-11 06:31:23,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.73 | bwd_microstep: 1052.34 | bwd_inner_microstep: 1052.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524
[2024-06-11 06:31:25,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.31 | bwd_microstep: 1487.37 | bwd_inner_microstep: 1487.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2093
[2024-06-11 06:31:26,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.82 | bwd_microstep: 851.43 | bwd_inner_microstep: 851.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544
[2024-06-11 06:31:28,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1391.87 | bwd_inner_microstep: 1391.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2986
[2024-06-11 06:31:30,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.47 | bwd_microstep: 1141.11 | bwd_inner_microstep: 1141.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439
[2024-06-11 06:31:32,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.33 | bwd_microstep: 1283.80 | bwd_inner_microstep: 1283.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 06:31:34,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.41 | bwd_microstep: 1551.70 | bwd_inner_microstep: 1551.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 06:31:36,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1383.38 | bwd_inner_microstep: 1383.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006
[2024-06-11 06:31:37,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.87 | bwd_microstep: 805.11 | bwd_inner_microstep: 805.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451
[2024-06-11 06:31:39,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.04 | bwd_microstep: 1413.27 | bwd_inner_microstep: 1413.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719
[2024-06-11 06:31:40,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.99 | bwd_microstep: 1338.20 | bwd_inner_microstep: 1338.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3468
[2024-06-11 06:31:42,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.98 | bwd_microstep: 1182.96 | bwd_inner_microstep: 1182.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 06:31:44,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.00 | bwd_microstep: 1557.00 | bwd_inner_microstep: 1556.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668
[2024-06-11 06:31:46,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1528.04 | bwd_inner_microstep: 1528.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817
[2024-06-11 06:32:21,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.31 | optimizer_step: 6.62
[2024-06-11 06:32:21,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 34067.15 | bwd_inner_microstep: 1882.79 | bwd_allreduce_microstep: 32184.29 | step_microstep: 40.12
[2024-06-11 06:32:21,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15144.00 | bwd: 73084.52 | bwd_inner: 40899.19 | bwd_allreduce: 32184.59 | step: 41.65
{'loss': 1.138, 'learning_rate': 1.7044038465030553e-08, 'epoch': 0.99}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418
[2024-06-11 06:32:23,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.15 | bwd_microstep: 1434.81 | bwd_inner_microstep: 1434.62 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397
[2024-06-11 06:32:25,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.61 | bwd_microstep: 1149.99 | bwd_inner_microstep: 1149.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 06:32:27,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3841
[2024-06-11 06:32:29,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1553.96 | bwd_inner_microstep: 1553.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3767
[2024-06-11 06:32:31,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.76 | bwd_microstep: 1338.65 | bwd_inner_microstep: 1338.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2067
[2024-06-11 06:32:32,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.18 | bwd_microstep: 727.07 | bwd_inner_microstep: 727.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422
[2024-06-11 06:32:33,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.47 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495
[2024-06-11 06:32:35,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1382.65 | bwd_inner_microstep: 1382.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 06:32:37,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1378.42 | bwd_inner_microstep: 1378.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768
[2024-06-11 06:32:39,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.60 | bwd_microstep: 1436.83 | bwd_inner_microstep: 1436.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-11 06:32:40,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.88 | bwd_microstep: 796.98 | bwd_inner_microstep: 796.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659
[2024-06-11 06:32:42,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.10 | bwd_microstep: 1523.73 | bwd_inner_microstep: 1523.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2159
[2024-06-11 06:32:43,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.31 | bwd_microstep: 852.84 | bwd_inner_microstep: 852.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1964
[2024-06-11 06:32:44,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.88 | bwd_microstep: 736.09 | bwd_inner_microstep: 736.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2500
[2024-06-11 06:32:46,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.36 | bwd_microstep: 1147.41 | bwd_inner_microstep: 1147.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534
[2024-06-11 06:32:48,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.33 | bwd_microstep: 1491.61 | bwd_inner_microstep: 1491.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663
[2024-06-11 06:32:50,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1425.23 | bwd_inner_microstep: 1425.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833
[2024-06-11 06:32:52,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.49 | bwd_microstep: 1564.72 | bwd_inner_microstep: 1564.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1978
[2024-06-11 06:32:53,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.01 | bwd_microstep: 735.62 | bwd_inner_microstep: 735.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3534
[2024-06-11 06:32:55,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.30 | bwd_microstep: 1419.46 | bwd_inner_microstep: 1419.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448
[2024-06-11 06:32:57,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.86 | bwd_microstep: 1315.14 | bwd_inner_microstep: 1315.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582
[2024-06-11 06:32:59,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.55 | bwd_microstep: 1502.06 | bwd_inner_microstep: 1502.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 06:33:01,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.15 | bwd_microstep: 1558.08 | bwd_inner_microstep: 1558.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-11 06:33:03,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.48 | bwd_microstep: 1606.24 | bwd_inner_microstep: 1606.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-11 06:33:05,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.24 | bwd_microstep: 1445.39 | bwd_inner_microstep: 1445.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-11 06:33:08,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 1543.60 | bwd_inner_microstep: 1543.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2035
[2024-06-11 06:33:09,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.20 | bwd_microstep: 903.28 | bwd_inner_microstep: 903.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437
[2024-06-11 06:33:11,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.82 | bwd_microstep: 1447.00 | bwd_inner_microstep: 1446.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593
[2024-06-11 06:33:13,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.16 | bwd_microstep: 1603.52 | bwd_inner_microstep: 1603.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577
[2024-06-11 06:33:15,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.66 | bwd_microstep: 1594.53 | bwd_inner_microstep: 1594.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585
[2024-06-11 06:33:17,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.58 | bwd_microstep: 1498.74 | bwd_inner_microstep: 1498.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3656
[2024-06-11 06:33:21,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-11 06:33:21,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 3277.81 | bwd_inner_microstep: 1675.45 | bwd_allreduce_microstep: 1602.31 | step_microstep: 38.78
[2024-06-11 06:33:21,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15767.78 | bwd: 44018.87 | bwd_inner: 42415.51 | bwd_allreduce: 1602.62 | step: 40.37
{'loss': 1.1392, 'learning_rate': 1.552999137836908e-08, 'epoch': 0.99}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453
[2024-06-11 06:33:23,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1348.02 | bwd_inner_microstep: 1347.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3967
[2024-06-11 06:33:25,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.62 | bwd_microstep: 1698.29 | bwd_inner_microstep: 1698.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4291
[2024-06-11 06:33:28,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.44 | bwd_microstep: 1777.52 | bwd_inner_microstep: 1777.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466
[2024-06-11 06:33:30,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.06 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464
[2024-06-11 06:33:32,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1277.54 | bwd_inner_microstep: 1277.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3751
[2024-06-11 06:33:33,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1400.53 | bwd_inner_microstep: 1400.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558
[2024-06-11 06:33:36,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1500.68 | bwd_inner_microstep: 1500.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3442
[2024-06-11 06:33:37,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1324.52 | bwd_inner_microstep: 1324.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489
[2024-06-11 06:33:39,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.67 | bwd_microstep: 1388.89 | bwd_inner_microstep: 1388.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 06:33:41,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1285.29 | bwd_inner_microstep: 1285.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954
[2024-06-11 06:33:42,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.72 | bwd_microstep: 795.13 | bwd_inner_microstep: 795.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940
[2024-06-11 06:33:43,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.04 | bwd_microstep: 790.80 | bwd_inner_microstep: 790.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902
[2024-06-11 06:33:44,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.82 | bwd_microstep: 684.69 | bwd_inner_microstep: 684.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3673
[2024-06-11 06:33:46,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.15 | bwd_microstep: 1447.99 | bwd_inner_microstep: 1447.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1950
[2024-06-11 06:33:47,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.56 | bwd_microstep: 854.36 | bwd_inner_microstep: 854.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434
[2024-06-11 06:33:49,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.94 | bwd_microstep: 1157.68 | bwd_inner_microstep: 1157.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621
[2024-06-11 06:33:51,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.95 | bwd_microstep: 1511.49 | bwd_inner_microstep: 1511.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663
[2024-06-11 06:33:53,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.53 | bwd_microstep: 1522.14 | bwd_inner_microstep: 1522.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443
[2024-06-11 06:33:55,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1253.43 | bwd_inner_microstep: 1253.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976
[2024-06-11 06:33:56,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.55 | bwd_microstep: 797.21 | bwd_inner_microstep: 797.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-11 06:33:58,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.79 | bwd_microstep: 1256.04 | bwd_inner_microstep: 1256.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527
[2024-06-11 06:34:00,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1395.71 | bwd_inner_microstep: 1395.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531
[2024-06-11 06:34:02,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1490.70 | bwd_inner_microstep: 1490.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979
[2024-06-11 06:34:03,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.09 | bwd_microstep: 800.57 | bwd_inner_microstep: 800.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442
[2024-06-11 06:34:05,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1253.26 | bwd_inner_microstep: 1253.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276
[2024-06-11 06:34:06,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.26 | bwd_microstep: 1002.72 | bwd_inner_microstep: 1002.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422
[2024-06-11 06:34:08,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.33 | bwd_microstep: 1374.04 | bwd_inner_microstep: 1374.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 06:34:10,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.26 | bwd_microstep: 1377.76 | bwd_inner_microstep: 1377.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3548
[2024-06-11 06:34:12,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.39 | bwd_microstep: 1587.08 | bwd_inner_microstep: 1587.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580
[2024-06-11 06:34:14,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.02 | bwd_microstep: 1598.77 | bwd_inner_microstep: 1598.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3725
[2024-06-11 06:34:16,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.27 | bwd_microstep: 1515.42 | bwd_inner_microstep: 1515.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814
[2024-06-11 06:34:21,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.10 | optimizer_gradients: 4.14 | optimizer_step: 6.59
[2024-06-11 06:34:21,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.09 | bwd_microstep: 4258.16 | bwd_inner_microstep: 2310.15 | bwd_allreduce_microstep: 1947.94 | step_microstep: 40.34
[2024-06-11 06:34:21,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15576.46 | bwd: 44107.49 | bwd_inner: 42158.63 | bwd_allreduce: 1948.18 | step: 41.95
{'loss': 1.196, 'learning_rate': 1.408632949118971e-08, 'epoch': 0.99}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495
[2024-06-11 06:34:23,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.21 | bwd_microstep: 1474.94 | bwd_inner_microstep: 1474.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467
[2024-06-11 06:34:25,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.67 | bwd_microstep: 1275.23 | bwd_inner_microstep: 1275.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509
[2024-06-11 06:34:27,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1386.69 | bwd_inner_microstep: 1386.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843
[2024-06-11 06:34:29,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.69 | bwd_microstep: 1557.19 | bwd_inner_microstep: 1557.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3794
[2024-06-11 06:34:31,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1507.53 | bwd_inner_microstep: 1507.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 06:34:33,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1381.01 | bwd_inner_microstep: 1380.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1439
[2024-06-11 06:34:34,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 218.00 | bwd_microstep: 567.42 | bwd_inner_microstep: 567.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971
[2024-06-11 06:34:35,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.85 | bwd_microstep: 794.00 | bwd_inner_microstep: 793.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412
[2024-06-11 06:34:37,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.87 | bwd_microstep: 1152.53 | bwd_inner_microstep: 1152.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948
[2024-06-11 06:34:38,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 700.57 | bwd_inner_microstep: 700.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684
[2024-06-11 06:34:40,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.88 | bwd_microstep: 1522.11 | bwd_inner_microstep: 1522.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666
[2024-06-11 06:34:42,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.96 | bwd_microstep: 1512.46 | bwd_inner_microstep: 1512.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708
[2024-06-11 06:34:44,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.71 | bwd_microstep: 1617.71 | bwd_inner_microstep: 1617.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3649
[2024-06-11 06:34:46,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.35 | bwd_microstep: 1607.50 | bwd_inner_microstep: 1607.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3646
[2024-06-11 06:34:48,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.95 | bwd_microstep: 1676.62 | bwd_inner_microstep: 1676.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520
[2024-06-11 06:34:50,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.65 | bwd_microstep: 1417.74 | bwd_inner_microstep: 1417.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2098
[2024-06-11 06:34:52,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.91 | bwd_microstep: 854.44 | bwd_inner_microstep: 854.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488
[2024-06-11 06:34:53,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.65 | bwd_microstep: 1187.00 | bwd_inner_microstep: 1186.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463
[2024-06-11 06:34:55,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.84 | bwd_microstep: 1311.61 | bwd_inner_microstep: 1311.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-11 06:34:57,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.27 | bwd_microstep: 1413.67 | bwd_inner_microstep: 1413.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616
[2024-06-11 06:34:59,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.99 | bwd_microstep: 1611.16 | bwd_inner_microstep: 1611.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2151
[2024-06-11 06:35:00,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.36 | bwd_microstep: 759.90 | bwd_inner_microstep: 759.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475
[2024-06-11 06:35:02,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1281.38 | bwd_inner_microstep: 1281.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4330
[2024-06-11 06:35:05,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 697.77 | bwd_microstep: 1904.53 | bwd_inner_microstep: 1904.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.38
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819
[2024-06-11 06:35:07,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.65 | bwd_microstep: 1389.89 | bwd_inner_microstep: 1389.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600
[2024-06-11 06:35:09,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.06 | bwd_microstep: 1505.97 | bwd_inner_microstep: 1505.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2012
[2024-06-11 06:35:10,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.63 | bwd_microstep: 833.56 | bwd_inner_microstep: 833.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2010
[2024-06-11 06:35:11,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.64 | bwd_microstep: 853.91 | bwd_inner_microstep: 853.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3807
[2024-06-11 06:35:14,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.16 | bwd_microstep: 1855.24 | bwd_inner_microstep: 1855.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3449
[2024-06-11 06:35:16,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1417.49 | bwd_inner_microstep: 1417.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582
[2024-06-11 06:35:18,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.75 | bwd_microstep: 1605.49 | bwd_inner_microstep: 1605.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2661
[2024-06-11 06:35:23,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.18 | optimizer_step: 6.63
[2024-06-11 06:35:23,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.25 | bwd_microstep: 4472.87 | bwd_inner_microstep: 1259.61 | bwd_allreduce_microstep: 3213.20 | step_microstep: 38.31
[2024-06-11 06:35:23,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15689.68 | bwd: 45409.38 | bwd_inner: 42195.27 | bwd_allreduce: 3213.42 | step: 41.17
{'loss': 1.1484, 'learning_rate': 1.2713057888060764e-08, 'epoch': 0.99}
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540
[2024-06-11 06:35:24,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.87 | bwd_microstep: 1199.40 | bwd_inner_microstep: 1199.29 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414
[2024-06-11 06:35:26,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.26 | bwd_microstep: 1245.56 | bwd_inner_microstep: 1245.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-11 06:35:28,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.18 | bwd_microstep: 1295.45 | bwd_inner_microstep: 1295.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4150
[2024-06-11 06:35:30,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.75 | bwd_microstep: 1504.35 | bwd_inner_microstep: 1504.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 06:35:32,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923
[2024-06-11 06:35:33,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.26 | bwd_microstep: 787.72 | bwd_inner_microstep: 787.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3774
[2024-06-11 06:35:36,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.71 | bwd_microstep: 1851.43 | bwd_inner_microstep: 1851.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-11 06:35:37,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1438.37 | bwd_inner_microstep: 1438.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471
[2024-06-11 06:35:39,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.87 | bwd_microstep: 1312.93 | bwd_inner_microstep: 1312.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-11 06:35:41,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.14 | bwd_microstep: 1382.03 | bwd_inner_microstep: 1382.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1954
[2024-06-11 06:35:42,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.22 | bwd_microstep: 851.61 | bwd_inner_microstep: 851.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-11 06:35:44,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1350.48 | bwd_inner_microstep: 1350.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1906
[2024-06-11 06:35:45,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.85 | bwd_microstep: 874.39 | bwd_inner_microstep: 874.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446
[2024-06-11 06:35:47,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.95 | bwd_microstep: 1353.96 | bwd_inner_microstep: 1353.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3631
[2024-06-11 06:35:49,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1538.46 | bwd_inner_microstep: 1538.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923
[2024-06-11 06:35:50,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.05 | bwd_microstep: 696.30 | bwd_inner_microstep: 696.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630
[2024-06-11 06:35:53,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1513.06 | bwd_inner_microstep: 1513.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1937
[2024-06-11 06:35:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.45 | bwd_microstep: 728.38 | bwd_inner_microstep: 728.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826
[2024-06-11 06:35:56,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.71 | bwd_microstep: 1558.97 | bwd_inner_microstep: 1558.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664
[2024-06-11 06:35:58,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.46 | bwd_microstep: 1510.56 | bwd_inner_microstep: 1510.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529
[2024-06-11 06:36:00,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1393.10 | bwd_inner_microstep: 1393.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2286
[2024-06-11 06:36:01,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.75 | bwd_microstep: 1023.78 | bwd_inner_microstep: 1023.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610
[2024-06-11 06:36:03,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.59 | bwd_microstep: 1512.76 | bwd_inner_microstep: 1512.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 06:36:05,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.77 | bwd_microstep: 1552.85 | bwd_inner_microstep: 1552.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3464
[2024-06-11 06:36:07,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1401.69 | bwd_inner_microstep: 1401.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289
[2024-06-11 06:36:09,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.25 | bwd_microstep: 972.00 | bwd_inner_microstep: 971.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274
[2024-06-11 06:36:10,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.53 | bwd_microstep: 976.93 | bwd_inner_microstep: 976.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3809
[2024-06-11 06:36:12,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.53 | bwd_microstep: 1384.83 | bwd_inner_microstep: 1384.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2273
[2024-06-11 06:36:13,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.81 | bwd_microstep: 905.68 | bwd_inner_microstep: 905.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-11 06:36:15,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.83 | bwd_microstep: 1638.47 | bwd_inner_microstep: 1638.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3820
[2024-06-11 06:36:18,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.47 | bwd_microstep: 1857.93 | bwd_inner_microstep: 1857.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751
[2024-06-11 06:36:26,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.29 | optimizer_step: 6.56
[2024-06-11 06:36:26,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.86 | bwd_microstep: 7236.60 | bwd_inner_microstep: 1857.53 | bwd_allreduce_microstep: 5379.02 | step_microstep: 38.86
[2024-06-11 06:36:26,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15535.61 | bwd: 47239.76 | bwd_inner: 41859.74 | bwd_allreduce: 5379.31 | step: 40.66
 99%|█████████▊| 1703/1726 [29:53:29<23:44, 61.92s/it]
 99%|█████████▊| 1704/1726 [29:54:58<25:38, 69.91s/it]
 99%|█████████▉| 1705/1726 [29:55:58<23:26, 66.98s/it]
 99%|█████████▉| 1706/1726 [29:56:58<21:37, 64.90s/it]
 99%|█████████▉| 1707/1726 [29:57:59<20:13, 63.87s/it]
 99%|█████████▉| 1708
{'loss': 1.1576, 'learning_rate': 1.1410181405639986e-08, 'epoch': 0.99}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 06:36:28,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.58 | bwd_microstep: 1466.20 | bwd_inner_microstep: 1466.12 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3474
[2024-06-11 06:36:30,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.06 | bwd_microstep: 1339.46 | bwd_inner_microstep: 1339.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-11 06:36:32,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.29 | bwd_microstep: 1550.42 | bwd_inner_microstep: 1550.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813
[2024-06-11 06:36:34,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.62 | bwd_microstep: 1651.15 | bwd_inner_microstep: 1651.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404
[2024-06-11 06:36:36,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.91 | bwd_microstep: 1341.04 | bwd_inner_microstep: 1341.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491
[2024-06-11 06:36:38,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1385.96 | bwd_inner_microstep: 1385.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275
[2024-06-11 06:36:39,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.21 | bwd_microstep: 905.18 | bwd_inner_microstep: 905.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 06:36:41,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011
[2024-06-11 06:36:42,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 803.92 | bwd_inner_microstep: 803.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-11 06:36:44,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.73 | bwd_microstep: 1286.01 | bwd_inner_microstep: 1285.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408
[2024-06-11 06:36:46,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1438.32 | bwd_inner_microstep: 1438.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501
[2024-06-11 06:36:48,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.37 | bwd_microstep: 1483.59 | bwd_inner_microstep: 1483.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489
[2024-06-11 06:36:50,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.85 | bwd_microstep: 1487.38 | bwd_inner_microstep: 1487.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434
[2024-06-11 06:36:52,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1347.25 | bwd_inner_microstep: 1347.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3655
[2024-06-11 06:36:54,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.83 | bwd_microstep: 1451.70 | bwd_inner_microstep: 1451.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417
[2024-06-11 06:36:56,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.44 | bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3630
[2024-06-11 06:36:58,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.81 | bwd_microstep: 1473.70 | bwd_inner_microstep: 1473.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3521
[2024-06-11 06:36:59,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.30 | bwd_microstep: 1322.88 | bwd_inner_microstep: 1322.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457
[2024-06-11 06:37:01,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.18 | bwd_microstep: 1183.49 | bwd_inner_microstep: 1183.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527
[2024-06-11 06:37:03,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.82 | bwd_microstep: 1293.16 | bwd_inner_microstep: 1293.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704
[2024-06-11 06:37:05,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.82 | bwd_microstep: 1430.82 | bwd_inner_microstep: 1430.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 06:37:07,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1394.49 | bwd_inner_microstep: 1394.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832
[2024-06-11 06:37:09,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.46 | bwd_microstep: 1658.81 | bwd_inner_microstep: 1658.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442
[2024-06-11 06:37:11,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.62 | bwd_microstep: 1286.72 | bwd_inner_microstep: 1286.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557
[2024-06-11 06:37:13,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1424.12 | bwd_inner_microstep: 1424.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809
[2024-06-11 06:37:15,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.14 | bwd_microstep: 1551.61 | bwd_inner_microstep: 1551.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491
[2024-06-11 06:37:17,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1291.23 | bwd_inner_microstep: 1291.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812
[2024-06-11 06:37:19,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.35 | bwd_microstep: 1455.78 | bwd_inner_microstep: 1455.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3577
[2024-06-11 06:37:21,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1550.70 | bwd_inner_microstep: 1550.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1889
[2024-06-11 06:37:22,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.97 | bwd_microstep: 806.79 | bwd_inner_microstep: 806.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820
[2024-06-11 06:37:24,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.95 | bwd_microstep: 1580.28 | bwd_inner_microstep: 1580.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3773
[2024-06-11 06:37:27,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.08 | optimizer_step: 6.62
[2024-06-11 06:37:27,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.39 | bwd_microstep: 2391.36 | bwd_inner_microstep: 1771.35 | bwd_allreduce_microstep: 619.97 | step_microstep: 37.76
[2024-06-11 06:37:27,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16358.21 | bwd: 44659.98 | bwd_inner: 44039.05 | bwd_allreduce: 620.23 | step: 39.30
{'loss': 1.1595, 'learning_rate': 1.017770463264789e-08, 'epoch': 0.99}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413
[2024-06-11 06:37:29,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1241.98 | bwd_inner_microstep: 1241.83 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1939
[2024-06-11 06:37:30,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.17 | bwd_microstep: 852.29 | bwd_inner_microstep: 852.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 06:37:32,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1278.64 | bwd_inner_microstep: 1278.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478
[2024-06-11 06:37:34,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1278.29 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933
[2024-06-11 06:37:35,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487
[2024-06-11 06:37:37,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.00 | bwd_microstep: 1280.61 | bwd_inner_microstep: 1280.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069
[2024-06-11 06:37:38,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.24 | bwd_microstep: 818.68 | bwd_inner_microstep: 818.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 06:37:39,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.73 | bwd_microstep: 1283.51 | bwd_inner_microstep: 1283.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475
[2024-06-11 06:37:41,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.02 | bwd_microstep: 1479.06 | bwd_inner_microstep: 1479.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2675
[2024-06-11 06:37:43,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.95 | bwd_microstep: 1084.27 | bwd_inner_microstep: 1084.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683
[2024-06-11 06:37:45,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.77 | bwd_microstep: 1719.30 | bwd_inner_microstep: 1719.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3481
[2024-06-11 06:37:47,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.60 | bwd_microstep: 1510.04 | bwd_inner_microstep: 1510.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3636
[2024-06-11 06:37:49,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1489.06 | bwd_inner_microstep: 1489.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3654
[2024-06-11 06:37:51,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.26 | bwd_microstep: 1382.05 | bwd_inner_microstep: 1382.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390
[2024-06-11 06:37:53,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.70 | bwd_microstep: 1337.22 | bwd_inner_microstep: 1337.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519
[2024-06-11 06:37:55,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.25 | bwd_microstep: 1388.95 | bwd_inner_microstep: 1388.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460
[2024-06-11 06:37:57,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.92 | bwd_microstep: 1314.37 | bwd_inner_microstep: 1314.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590
[2024-06-11 06:37:59,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1506.82 | bwd_inner_microstep: 1506.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513
[2024-06-11 06:38:01,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.87 | bwd_microstep: 1323.16 | bwd_inner_microstep: 1323.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813
[2024-06-11 06:38:03,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.09 | bwd_microstep: 1556.05 | bwd_inner_microstep: 1556.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-11 06:38:05,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.39 | bwd_microstep: 1287.73 | bwd_inner_microstep: 1287.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 06:38:07,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.61 | bwd_microstep: 1284.14 | bwd_inner_microstep: 1284.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575
[2024-06-11 06:38:08,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.36 | bwd_microstep: 1302.50 | bwd_inner_microstep: 1302.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3623
[2024-06-11 06:38:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.35 | bwd_microstep: 1346.40 | bwd_inner_microstep: 1346.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475
[2024-06-11 06:38:12,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.60 | bwd_microstep: 1189.24 | bwd_inner_microstep: 1189.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935
[2024-06-11 06:38:13,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.18 | bwd_microstep: 728.18 | bwd_inner_microstep: 728.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473
[2024-06-11 06:38:15,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.22 | bwd_microstep: 1473.29 | bwd_inner_microstep: 1473.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3581
[2024-06-11 06:38:17,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.05 | bwd_microstep: 1455.14 | bwd_inner_microstep: 1455.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2614
[2024-06-11 06:38:18,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.35 | bwd_microstep: 1110.17 | bwd_inner_microstep: 1110.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3762
[2024-06-11 06:38:21,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.47 | bwd_microstep: 1572.29 | bwd_inner_microstep: 1572.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379
[2024-06-11 06:38:23,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.06 | bwd_microstep: 1432.17 | bwd_inner_microstep: 1432.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802
[2024-06-11 06:38:31,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.29 | optimizer_step: 6.63
[2024-06-11 06:38:31,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.56 | bwd_microstep: 7461.81 | bwd_inner_microstep: 1640.33 | bwd_allreduce_microstep: 5821.41 | step_microstep: 39.93
[2024-06-11 06:38:31,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15607.31 | bwd: 47557.93 | bwd_inner: 41735.48 | bwd_allreduce: 5821.70 | step: 41.64
{'loss': 1.1589, 'learning_rate': 9.015631909863321e-09, 'epoch': 0.99}
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-11 06:38:32,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.16 | bwd_microstep: 1267.39 | bwd_inner_microstep: 1267.21 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08
dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2713
[2024-06-11 06:38:34,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.75 | bwd_microstep: 999.61 | bwd_inner_microstep: 999.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3900
[2024-06-11 06:38:36,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.70 | bwd_microstep: 1688.79 | bwd_inner_microstep: 1688.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430
[2024-06-11 06:38:38,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1248.80 | bwd_inner_microstep: 1248.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295
[2024-06-11 06:38:39,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.99 | bwd_microstep: 973.07 | bwd_inner_microstep: 973.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502
[2024-06-11 06:38:41,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1392.66 | bwd_inner_microstep: 1392.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-11 06:38:42,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.70 | bwd_microstep: 701.36 | bwd_inner_microstep: 701.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3426
[2024-06-11 06:38:44,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.16 | bwd_microstep: 1152.05 | bwd_inner_microstep: 1152.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3629
[2024-06-11 06:38:45,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.20 | bwd_microstep: 1218.16 | bwd_inner_microstep: 1218.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968
[2024-06-11 06:38:47,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.16 | bwd_microstep: 795.75 | bwd_inner_microstep: 795.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 06:38:48,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1251.45 | bwd_inner_microstep: 1251.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 932
[2024-06-11 06:38:49,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.24 | bwd_microstep: 376.64 | bwd_inner_microstep: 376.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 06:38:51,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.50 | bwd_microstep: 1349.60 | bwd_inner_microstep: 1349.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2978
[2024-06-11 06:38:52,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.63 | bwd_microstep: 1295.68 | bwd_inner_microstep: 1295.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737
[2024-06-11 06:38:55,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.49 | bwd_microstep: 1627.29 | bwd_inner_microstep: 1627.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519
[2024-06-11 06:38:57,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.01 | bwd_microstep: 1420.98 | bwd_inner_microstep: 1420.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1975
[2024-06-11 06:38:58,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.21 | bwd_microstep: 733.62 | bwd_inner_microstep: 733.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385
[2024-06-11 06:39:00,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.95 | bwd_microstep: 1437.96 | bwd_inner_microstep: 1437.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2091
[2024-06-11 06:39:01,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.70 | bwd_microstep: 923.01 | bwd_inner_microstep: 922.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3834
[2024-06-11 06:39:03,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.50 | bwd_microstep: 1510.65 | bwd_inner_microstep: 1510.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 06:39:05,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.23 | bwd_microstep: 1379.34 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721
[2024-06-11 06:39:07,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.47 | bwd_microstep: 1441.42 | bwd_inner_microstep: 1441.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490
[2024-06-11 06:39:09,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.00 | bwd_microstep: 1190.36 | bwd_inner_microstep: 1190.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-11 06:39:11,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.48 | bwd_microstep: 1639.25 | bwd_inner_microstep: 1639.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3669
[2024-06-11 06:39:13,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.63 | bwd_microstep: 1327.47 | bwd_inner_microstep: 1327.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566
[2024-06-11 06:39:15,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.43 | bwd_microstep: 1504.31 | bwd_inner_microstep: 1504.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726
[2024-06-11 06:39:17,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.09 | bwd_microstep: 1539.27 | bwd_inner_microstep: 1539.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567
[2024-06-11 06:39:19,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819
[2024-06-11 06:39:21,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.09 | bwd_microstep: 1519.99 | bwd_inner_microstep: 1519.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479
[2024-06-11 06:39:23,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.43 | bwd_microstep: 1380.37 | bwd_inner_microstep: 1380.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3473
[2024-06-11 06:39:25,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.51 | bwd_microstep: 1429.78 | bwd_inner_microstep: 1429.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2194
[2024-06-11 06:39:32,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-11 06:39:32,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.33 | bwd_microstep: 6396.14 | bwd_inner_microstep: 1196.71 | bwd_allreduce_microstep: 5199.37 | step_microstep: 39.48
[2024-06-11 06:39:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15043.12 | bwd: 45518.23 | bwd_inner: 40317.84 | bwd_allreduce: 5199.66 | step: 41.10
{'loss': 1.1577, 'learning_rate': 7.923967330099036e-09, 'epoch': 0.99}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-11 06:39:34,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.86 | bwd_microstep: 1472.61 | bwd_inner_microstep: 1472.51 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4009
[2024-06-11 06:39:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.95 | bwd_microstep: 1542.92 | bwd_inner_microstep: 1542.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898
[2024-06-11 06:39:38,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1585.66 | bwd_inner_microstep: 1585.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4272
[2024-06-11 06:39:40,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.93 | bwd_microstep: 1667.17 | bwd_inner_microstep: 1667.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772
[2024-06-11 06:39:42,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.15 | bwd_microstep: 1637.88 | bwd_inner_microstep: 1637.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405
[2024-06-11 06:39:44,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1241.77 | bwd_inner_microstep: 1241.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607
[2024-06-11 06:39:46,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 1413.40 | bwd_inner_microstep: 1413.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916
[2024-06-11 06:39:47,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.77 | bwd_microstep: 686.31 | bwd_inner_microstep: 686.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469
[2024-06-11 06:39:49,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.69 | bwd_microstep: 1277.07 | bwd_inner_microstep: 1277.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487
[2024-06-11 06:39:51,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1383.34 | bwd_inner_microstep: 1383.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3927
[2024-06-11 06:39:53,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.65 | bwd_microstep: 1686.31 | bwd_inner_microstep: 1686.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421
[2024-06-11 06:39:55,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.94 | bwd_microstep: 1341.99 | bwd_inner_microstep: 1341.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3515
[2024-06-11 06:39:57,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1517.34 | bwd_inner_microstep: 1517.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632
[2024-06-11 06:39:59,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.56 | bwd_microstep: 1502.62 | bwd_inner_microstep: 1502.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443
[2024-06-11 06:40:01,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.98 | bwd_microstep: 1448.48 | bwd_inner_microstep: 1448.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3480
[2024-06-11 06:40:03,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.36 | bwd_microstep: 1537.77 | bwd_inner_microstep: 1537.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817
[2024-06-11 06:40:05,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1550.38 | bwd_inner_microstep: 1550.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2089
[2024-06-11 06:40:07,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.31 | bwd_microstep: 914.51 | bwd_inner_microstep: 914.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-11 06:40:09,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.45 | bwd_microstep: 1646.71 | bwd_inner_microstep: 1646.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827
[2024-06-11 06:40:11,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.14 | bwd_microstep: 1648.61 | bwd_inner_microstep: 1648.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 06:40:13,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.58 | bwd_microstep: 1254.65 | bwd_inner_microstep: 1254.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3830
[2024-06-11 06:40:15,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.86 | bwd_microstep: 1388.98 | bwd_inner_microstep: 1388.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004
[2024-06-11 06:40:16,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.17 | bwd_microstep: 708.42 | bwd_inner_microstep: 708.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485
[2024-06-11 06:40:18,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1379.52 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2658
[2024-06-11 06:40:19,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.11 | bwd_microstep: 924.49 | bwd_inner_microstep: 924.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3809
[2024-06-11 06:40:21,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.06 | bwd_microstep: 1574.92 | bwd_inner_microstep: 1574.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594
[2024-06-11 06:40:23,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1405.81 | bwd_inner_microstep: 1405.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779
[2024-06-11 06:40:25,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.73 | bwd_microstep: 1549.34 | bwd_inner_microstep: 1549.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553
[2024-06-11 06:40:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.29 | bwd_microstep: 1331.11 | bwd_inner_microstep: 1331.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827
[2024-06-11 06:40:29,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.77 | bwd_microstep: 1558.55 | bwd_inner_microstep: 1558.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796
[2024-06-11 06:40:31,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.79 | bwd_microstep: 1552.76 | bwd_inner_microstep: 1552.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3489
[2024-06-11 06:40:33,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-11 06:40:33,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.27 | bwd_microstep: 1444.30 | bwd_inner_microstep: 1345.52 | bwd_allreduce_microstep: 98.73 | step_microstep: 37.48
[2024-06-11 06:40:33,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16657.48 | bwd: 44775.71 | bwd_inner: 44675.99 | bwd_allreduce: 99.01 | step: 39.11
{'loss': 1.1325, 'learning_rate': 6.902714738192817e-09, 'epoch': 0.99}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402
[2024-06-11 06:40:35,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.84 | bwd_microstep: 1338.53 | bwd_inner_microstep: 1338.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3954
[2024-06-11 06:40:37,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1401.63 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417
[2024-06-11 06:40:39,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1247.85 | bwd_inner_microstep: 1247.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2301
[2024-06-11 06:40:40,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.22 | bwd_microstep: 929.10 | bwd_inner_microstep: 929.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3775
[2024-06-11 06:40:42,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.48 | bwd_microstep: 1472.96 | bwd_inner_microstep: 1472.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734
[2024-06-11 06:40:44,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.80 | bwd_microstep: 1634.10 | bwd_inner_microstep: 1634.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3407
[2024-06-11 06:40:46,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.83 | bwd_microstep: 1311.34 | bwd_inner_microstep: 1311.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-11 06:40:48,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1288.24 | bwd_inner_microstep: 1288.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 06:40:50,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.36 | bwd_microstep: 1284.22 | bwd_inner_microstep: 1284.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1985
[2024-06-11 06:40:51,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.97 | bwd_microstep: 740.27 | bwd_inner_microstep: 740.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763
[2024-06-11 06:40:53,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1440.00 | bwd_inner_microstep: 1439.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504
[2024-06-11 06:40:55,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.65 | bwd_microstep: 1324.49 | bwd_inner_microstep: 1324.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485
[2024-06-11 06:40:57,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.82 | bwd_microstep: 1316.31 | bwd_inner_microstep: 1316.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080
[2024-06-11 06:40:58,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.88 | bwd_microstep: 919.07 | bwd_inner_microstep: 919.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3428
[2024-06-11 06:41:00,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.63 | bwd_microstep: 1299.84 | bwd_inner_microstep: 1299.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-11 06:41:02,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.27 | bwd_microstep: 1453.54 | bwd_inner_microstep: 1453.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-11 06:41:03,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1384.24 | bwd_inner_microstep: 1384.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650
[2024-06-11 06:41:05,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.18 | bwd_microstep: 1419.28 | bwd_inner_microstep: 1419.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3635
[2024-06-11 06:41:08,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.45 | bwd_microstep: 1576.93 | bwd_inner_microstep: 1576.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3513
[2024-06-11 06:41:10,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.68 | bwd_microstep: 1558.25 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828
[2024-06-11 06:41:12,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.64 | bwd_microstep: 1662.18 | bwd_inner_microstep: 1662.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 06:41:14,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.57 | bwd_microstep: 1257.91 | bwd_inner_microstep: 1257.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437
[2024-06-11 06:41:16,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1254.03 | bwd_inner_microstep: 1254.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565
[2024-06-11 06:41:17,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.73 | bwd_microstep: 1404.80 | bwd_inner_microstep: 1404.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033
[2024-06-11 06:41:19,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.64 | bwd_microstep: 840.05 | bwd_inner_microstep: 840.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3432
[2024-06-11 06:41:21,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1515.16 | bwd_inner_microstep: 1515.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815
[2024-06-11 06:41:23,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1558.62 | bwd_inner_microstep: 1558.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496
[2024-06-11 06:41:25,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.70 | bwd_microstep: 1290.48 | bwd_inner_microstep: 1290.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3455
[2024-06-11 06:41:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1404.17 | bwd_inner_microstep: 1404.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2074
[2024-06-11 06:41:28,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.52 | bwd_microstep: 1011.98 | bwd_inner_microstep: 1011.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2047
[2024-06-11 06:41:29,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.90 | bwd_microstep: 1006.09 | bwd_inner_microstep: 1006.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283
[2024-06-11 06:41:35,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.11 | optimizer_step: 6.59
[2024-06-11 06:41:35,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.32 | bwd_microstep: 5130.93 | bwd_inner_microstep: 1104.33 | bwd_allreduce_microstep: 4026.54 | step_microstep: 38.34
[2024-06-11 06:41:35,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15499.50 | bwd: 45676.61 | bwd_inner: 41649.13 | bwd_allreduce: 4026.77 | step: 39.96
 99%|█████████▉| 1708/1726 [29:59:03<19:05, 63.64s/it]
 99%|█████████▉| 1709/1726 [30:00:04<17:50, 62.96s/it]
 99%|█████████▉| 1710/1726 [30:01:07<16:49, 63.12s/it]
 99%|█████████▉| 1711/1726 [30:02:08<15:36, 62.46s/it]
 99%|█████████▉| 1712/1726 [30:03:10<14:31, 62.25s/it]
{'loss': 1.1937, 'learning_rate': 5.951877730991928e-09, 'epoch': 0.99}
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492
[2024-06-11 06:41:37,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.07 | bwd_microstep: 1575.53 | bwd_inner_microstep: 1575.31 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542
[2024-06-11 06:41:39,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.16 | bwd_microstep: 1290.18 | bwd_inner_microstep: 1290.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 06:41:41,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.87 | bwd_microstep: 1478.00 | bwd_inner_microstep: 1477.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578
[2024-06-11 06:41:43,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.67 | bwd_microstep: 1303.80 | bwd_inner_microstep: 1303.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483
[2024-06-11 06:41:45,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.29 | bwd_microstep: 1384.91 | bwd_inner_microstep: 1384.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4141
[2024-06-11 06:41:47,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.06 | bwd_microstep: 1569.36 | bwd_inner_microstep: 1569.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488
[2024-06-11 06:41:49,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.86 | bwd_microstep: 1284.60 | bwd_inner_microstep: 1284.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3984
[2024-06-11 06:41:51,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.59 | bwd_microstep: 1505.28 | bwd_inner_microstep: 1505.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600
[2024-06-11 06:41:52,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.76 | bwd_microstep: 1314.44 | bwd_inner_microstep: 1314.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3482
[2024-06-11 06:41:55,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.42 | bwd_microstep: 1530.50 | bwd_inner_microstep: 1530.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 06:41:56,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1248.64 | bwd_inner_microstep: 1248.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 06:41:58,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.67 | bwd_microstep: 1357.91 | bwd_inner_microstep: 1357.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544
[2024-06-11 06:42:00,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1493.13 | bwd_inner_microstep: 1493.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077
[2024-06-11 06:42:01,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.51 | bwd_microstep: 823.43 | bwd_inner_microstep: 823.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454
[2024-06-11 06:42:03,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1349.15 | bwd_inner_microstep: 1349.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679
[2024-06-11 06:42:05,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.83 | bwd_microstep: 1527.97 | bwd_inner_microstep: 1527.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2102
[2024-06-11 06:42:07,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.11 | bwd_microstep: 921.12 | bwd_inner_microstep: 921.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446
[2024-06-11 06:42:08,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.95 | bwd_microstep: 1255.98 | bwd_inner_microstep: 1255.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448
[2024-06-11 06:42:10,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.81 | bwd_microstep: 1348.45 | bwd_inner_microstep: 1348.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2091
[2024-06-11 06:42:12,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.80 | bwd_microstep: 1014.70 | bwd_inner_microstep: 1014.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3818
[2024-06-11 06:42:14,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.38 | bwd_microstep: 1602.34 | bwd_inner_microstep: 1602.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 06:42:16,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1256.31 | bwd_inner_microstep: 1256.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774
[2024-06-11 06:42:18,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.65 | bwd_microstep: 1545.40 | bwd_inner_microstep: 1545.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814
[2024-06-11 06:42:20,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.31 | bwd_microstep: 1545.18 | bwd_inner_microstep: 1545.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3827
[2024-06-11 06:42:22,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.85 | bwd_microstep: 1622.45 | bwd_inner_microstep: 1622.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3580
[2024-06-11 06:42:24,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.81 | bwd_microstep: 1364.17 | bwd_inner_microstep: 1364.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551
[2024-06-11 06:42:26,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1421.54 | bwd_inner_microstep: 1421.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242
[2024-06-11 06:42:27,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.70 | bwd_microstep: 969.72 | bwd_inner_microstep: 969.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008
[2024-06-11 06:42:28,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.52 | bwd_microstep: 721.14 | bwd_inner_microstep: 721.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722
[2024-06-11 06:42:31,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.45 | bwd_microstep: 1640.48 | bwd_inner_microstep: 1640.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771
[2024-06-11 06:42:33,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.16 | bwd_microstep: 1637.27 | bwd_inner_microstep: 1637.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2706
[2024-06-11 06:42:39,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.16 | optimizer_step: 6.62
[2024-06-11 06:42:39,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.97 | bwd_microstep: 5703.13 | bwd_inner_microstep: 1185.79 | bwd_allreduce_microstep: 4517.28 | step_microstep: 38.99
[2024-06-11 06:42:39,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16071.09 | bwd: 47606.22 | bwd_inner: 43087.85 | bwd_allreduce: 4517.60 | step: 40.56
{'loss': 1.1631, 'learning_rate': 5.071459657339794e-09, 'epoch': 0.99}
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457
[2024-06-11 06:42:41,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.26 | bwd_microstep: 1464.87 | bwd_inner_microstep: 1464.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476
[2024-06-11 06:42:43,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1281.23 | bwd_inner_microstep: 1281.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390
[2024-06-11 06:42:44,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.72 | bwd_microstep: 1238.11 | bwd_inner_microstep: 1238.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2396
[2024-06-11 06:42:46,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.04 | bwd_microstep: 999.49 | bwd_inner_microstep: 999.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 06:42:48,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1244.95 | bwd_inner_microstep: 1244.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 06:42:49,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1382.42 | bwd_inner_microstep: 1382.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733
[2024-06-11 06:42:51,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 1428.75 | bwd_inner_microstep: 1428.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 06:42:53,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.91 | bwd_microstep: 1277.31 | bwd_inner_microstep: 1277.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1881
[2024-06-11 06:42:54,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.39 | bwd_microstep: 742.33 | bwd_inner_microstep: 742.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1900
[2024-06-11 06:42:55,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.34 | bwd_microstep: 870.68 | bwd_inner_microstep: 870.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606
[2024-06-11 06:42:57,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1404.41 | bwd_inner_microstep: 1404.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2558
[2024-06-11 06:42:59,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.61 | bwd_microstep: 872.63 | bwd_inner_microstep: 872.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455
[2024-06-11 06:43:00,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.32 | bwd_microstep: 1287.26 | bwd_inner_microstep: 1287.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504
[2024-06-11 06:43:02,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1315.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3655
[2024-06-11 06:43:04,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.71 | bwd_microstep: 1651.66 | bwd_inner_microstep: 1651.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824
[2024-06-11 06:43:07,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1555.20 | bwd_inner_microstep: 1555.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505
[2024-06-11 06:43:08,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.48 | bwd_microstep: 1191.50 | bwd_inner_microstep: 1191.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518
[2024-06-11 06:43:10,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.04 | bwd_microstep: 1394.53 | bwd_inner_microstep: 1394.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 06:43:12,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.11 | bwd_microstep: 1256.39 | bwd_inner_microstep: 1256.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831
[2024-06-11 06:43:14,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.49 | bwd_microstep: 1559.23 | bwd_inner_microstep: 1559.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978
[2024-06-11 06:43:15,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.46 | bwd_microstep: 797.35 | bwd_inner_microstep: 797.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893
[2024-06-11 06:43:17,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.00 | bwd_microstep: 1691.66 | bwd_inner_microstep: 1691.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607
[2024-06-11 06:43:20,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.79 | bwd_microstep: 1511.18 | bwd_inner_microstep: 1511.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570
[2024-06-11 06:43:21,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.75 | bwd_microstep: 1302.28 | bwd_inner_microstep: 1302.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824
[2024-06-11 06:43:24,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.69 | bwd_microstep: 1580.67 | bwd_inner_microstep: 1580.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3601
[2024-06-11 06:43:25,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.31 | bwd_microstep: 1369.27 | bwd_inner_microstep: 1369.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823
[2024-06-11 06:43:28,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.69 | bwd_microstep: 1649.33 | bwd_inner_microstep: 1649.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620
[2024-06-11 06:43:30,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1606.19 | bwd_inner_microstep: 1606.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2260
[2024-06-11 06:43:31,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.81 | bwd_microstep: 970.55 | bwd_inner_microstep: 970.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820
[2024-06-11 06:43:34,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.96 | bwd_microstep: 1755.96 | bwd_inner_microstep: 1755.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2220
[2024-06-11 06:43:35,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.91 | bwd_microstep: 1011.74 | bwd_inner_microstep: 1011.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2212
[2024-06-11 06:43:40,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.28 | optimizer_step: 6.61
[2024-06-11 06:43:40,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.75 | bwd_microstep: 4153.89 | bwd_inner_microstep: 982.58 | bwd_allreduce_microstep: 3171.23 | step_microstep: 40.21
[2024-06-11 06:43:40,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15532.95 | bwd: 44818.14 | bwd_inner: 41645.97 | bwd_allreduce: 3171.47 | step: 41.72
{'loss': 1.2006, 'learning_rate': 4.261463618062678e-09, 'epoch': 0.99}
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3576
[2024-06-11 06:43:41,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.29 | bwd_microstep: 1351.32 | bwd_inner_microstep: 1351.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3915
[2024-06-11 06:43:44,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.63 | bwd_microstep: 1494.53 | bwd_inner_microstep: 1494.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470
[2024-06-11 06:43:46,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1480.06 | bwd_inner_microstep: 1480.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4107
[2024-06-11 06:43:48,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.01 | bwd_microstep: 1599.83 | bwd_inner_microstep: 1599.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782
[2024-06-11 06:43:50,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.62 | bwd_microstep: 1545.79 | bwd_inner_microstep: 1545.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517
[2024-06-11 06:43:52,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.65 | bwd_microstep: 1392.03 | bwd_inner_microstep: 1392.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-11 06:43:53,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.23 | bwd_microstep: 800.61 | bwd_inner_microstep: 800.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960
[2024-06-11 06:43:54,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.04 | bwd_microstep: 796.35 | bwd_inner_microstep: 796.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504
[2024-06-11 06:43:56,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.76 | bwd_microstep: 1287.81 | bwd_inner_microstep: 1287.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3599
[2024-06-11 06:43:58,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.55 | bwd_microstep: 1532.81 | bwd_inner_microstep: 1532.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3674
[2024-06-11 06:44:00,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.72 | bwd_microstep: 1481.38 | bwd_inner_microstep: 1481.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1910
[2024-06-11 06:44:01,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.32 | bwd_microstep: 717.88 | bwd_inner_microstep: 717.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513
[2024-06-11 06:44:03,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3649
[2024-06-11 06:44:05,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.19 | bwd_microstep: 1566.12 | bwd_inner_microstep: 1566.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384
[2024-06-11 06:44:07,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1341.33 | bwd_inner_microstep: 1341.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3944
[2024-06-11 06:44:09,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.26 | bwd_microstep: 1656.41 | bwd_inner_microstep: 1656.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825
[2024-06-11 06:44:11,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.62 | bwd_microstep: 1554.58 | bwd_inner_microstep: 1554.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523
[2024-06-11 06:44:13,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469
[2024-06-11 06:44:15,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3682
[2024-06-11 06:44:17,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.56 | bwd_microstep: 1234.49 | bwd_inner_microstep: 1234.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276
[2024-06-11 06:44:18,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.85 | bwd_microstep: 879.41 | bwd_inner_microstep: 879.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675
[2024-06-11 06:44:20,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.75 | bwd_microstep: 1427.36 | bwd_inner_microstep: 1427.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2012
[2024-06-11 06:44:21,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.28 | bwd_microstep: 711.32 | bwd_inner_microstep: 711.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604
[2024-06-11 06:44:23,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1511.22 | bwd_inner_microstep: 1511.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2062
[2024-06-11 06:44:24,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.25 | bwd_microstep: 724.82 | bwd_inner_microstep: 724.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046
[2024-06-11 06:44:25,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.12 | bwd_microstep: 812.62 | bwd_inner_microstep: 812.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421
[2024-06-11 06:44:27,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.42 | bwd_microstep: 1184.59 | bwd_inner_microstep: 1184.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2273
[2024-06-11 06:44:28,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.45 | bwd_microstep: 845.13 | bwd_inner_microstep: 845.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436
[2024-06-11 06:44:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1354.92 | bwd_inner_microstep: 1354.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559
[2024-06-11 06:44:32,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1396.11 | bwd_inner_microstep: 1396.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3583
[2024-06-11 06:44:34,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.96 | bwd_microstep: 1699.27 | bwd_inner_microstep: 1699.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465
[2024-06-11 06:44:39,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.22 | optimizer_step: 6.58
[2024-06-11 06:44:39,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.03 | bwd_microstep: 4660.72 | bwd_inner_microstep: 1435.58 | bwd_allreduce_microstep: 3225.07 | step_microstep: 39.16
[2024-06-11 06:44:39,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15316.14 | bwd: 44196.30 | bwd_inner: 40970.30 | bwd_allreduce: 3225.31 | step: 40.70
{'loss': 1.168, 'learning_rate': 3.5218924659607966e-09, 'epoch': 0.99}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464
[2024-06-11 06:44:41,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.63 | bwd_microstep: 1363.31 | bwd_inner_microstep: 1363.13 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484
[2024-06-11 06:44:43,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1478.52 | bwd_inner_microstep: 1478.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899
[2024-06-11 06:44:46,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.12 | bwd_microstep: 1585.41 | bwd_inner_microstep: 1585.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3417
[2024-06-11 06:44:47,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1294.27 | bwd_inner_microstep: 1294.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750
[2024-06-11 06:44:49,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.84 | bwd_microstep: 1343.46 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488
[2024-06-11 06:44:51,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.25 | bwd_microstep: 1485.29 | bwd_inner_microstep: 1485.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-11 06:44:53,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3720
[2024-06-11 06:44:55,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1494.91 | bwd_inner_microstep: 1494.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-11 06:44:56,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.88 | bwd_microstep: 698.98 | bwd_inner_microstep: 698.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 06:44:58,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1391.38 | bwd_inner_microstep: 1391.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449
[2024-06-11 06:45:00,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.70 | bwd_microstep: 1359.82 | bwd_inner_microstep: 1359.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3680
[2024-06-11 06:45:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1374.25 | bwd_inner_microstep: 1374.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3464
[2024-06-11 06:45:04,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 1420.52 | bwd_inner_microstep: 1420.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431
[2024-06-11 06:45:06,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1443.50 | bwd_inner_microstep: 1443.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3646
[2024-06-11 06:45:08,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.05 | bwd_microstep: 1545.33 | bwd_inner_microstep: 1545.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088
[2024-06-11 06:45:09,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.38 | bwd_microstep: 822.35 | bwd_inner_microstep: 822.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652
[2024-06-11 06:45:11,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.79 | bwd_microstep: 1426.32 | bwd_inner_microstep: 1426.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-11 06:45:12,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.92 | bwd_microstep: 796.35 | bwd_inner_microstep: 796.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072
[2024-06-11 06:45:13,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.69 | bwd_microstep: 919.51 | bwd_inner_microstep: 919.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591
[2024-06-11 06:45:15,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.61 | bwd_microstep: 1505.04 | bwd_inner_microstep: 1505.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2122
[2024-06-11 06:45:17,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.63 | bwd_microstep: 862.62 | bwd_inner_microstep: 862.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615
[2024-06-11 06:45:18,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.18 | bwd_microstep: 1313.36 | bwd_inner_microstep: 1313.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3676
[2024-06-11 06:45:21,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.44 | bwd_microstep: 1553.90 | bwd_inner_microstep: 1553.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2280
[2024-06-11 06:45:22,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.67 | bwd_microstep: 907.63 | bwd_inner_microstep: 907.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2166
[2024-06-11 06:45:23,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.03 | bwd_microstep: 1047.52 | bwd_inner_microstep: 1047.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432
[2024-06-11 06:45:25,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1348.71 | bwd_inner_microstep: 1348.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561
[2024-06-11 06:45:27,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1396.13 | bwd_inner_microstep: 1396.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3551
[2024-06-11 06:45:29,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.43 | bwd_microstep: 1447.43 | bwd_inner_microstep: 1447.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3579
[2024-06-11 06:45:31,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.11 | bwd_microstep: 1526.98 | bwd_inner_microstep: 1526.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2020
[2024-06-11 06:45:32,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.38 | bwd_microstep: 837.57 | bwd_inner_microstep: 837.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767
[2024-06-11 06:45:35,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.45 | bwd_microstep: 1644.27 | bwd_inner_microstep: 1644.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458
[2024-06-11 06:45:40,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.31 | optimizer_step: 6.60
[2024-06-11 06:45:40,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.12 | bwd_microstep: 4916.54 | bwd_inner_microstep: 1324.99 | bwd_allreduce_microstep: 3591.49 | step_microstep: 39.71
[2024-06-11 06:45:40,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15385.75 | bwd: 44801.06 | bwd_inner: 41208.50 | bwd_allreduce: 3591.80 | step: 41.26
{'loss': 1.1652, 'learning_rate': 2.8527488058038844e-09, 'epoch': 0.99}
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1412
[2024-06-11 06:45:41,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 215.96 | bwd_microstep: 558.41 | bwd_inner_microstep: 558.29 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462
[2024-06-11 06:45:43,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1376.17 | bwd_inner_microstep: 1376.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471
[2024-06-11 06:45:45,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1384.60 | bwd_inner_microstep: 1384.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468
[2024-06-11 06:45:47,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1374.68 | bwd_inner_microstep: 1374.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3855
[2024-06-11 06:45:48,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1367.22 | bwd_inner_microstep: 1367.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484
[2024-06-11 06:45:50,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1279.77 | bwd_inner_microstep: 1279.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560
[2024-06-11 06:45:52,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.48 | bwd_microstep: 1395.77 | bwd_inner_microstep: 1395.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726
[2024-06-11 06:45:54,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1438.64 | bwd_inner_microstep: 1438.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449
[2024-06-11 06:45:56,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1257.95 | bwd_inner_microstep: 1257.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707
[2024-06-11 06:45:58,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.34 | bwd_microstep: 1530.56 | bwd_inner_microstep: 1530.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1253
[2024-06-11 06:45:59,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 191.20 | bwd_microstep: 502.23 | bwd_inner_microstep: 502.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672
[2024-06-11 06:46:01,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.06 | bwd_microstep: 1720.66 | bwd_inner_microstep: 1720.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642
[2024-06-11 06:46:03,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.29 | bwd_microstep: 1611.41 | bwd_inner_microstep: 1611.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690
[2024-06-11 06:46:05,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.40 | bwd_microstep: 1526.28 | bwd_inner_microstep: 1526.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994
[2024-06-11 06:46:06,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.72 | bwd_microstep: 829.00 | bwd_inner_microstep: 828.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3850
[2024-06-11 06:46:09,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 679.31 | bwd_microstep: 1864.55 | bwd_inner_microstep: 1864.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-11 06:46:11,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1494.55 | bwd_inner_microstep: 1494.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3936
[2024-06-11 06:46:13,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.20 | bwd_microstep: 1446.44 | bwd_inner_microstep: 1446.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3489
[2024-06-11 06:46:15,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.25 | bwd_microstep: 1334.15 | bwd_inner_microstep: 1334.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661
[2024-06-11 06:46:17,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1568.81 | bwd_inner_microstep: 1568.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3864
[2024-06-11 06:46:19,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.78 | bwd_microstep: 1709.11 | bwd_inner_microstep: 1709.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3444
[2024-06-11 06:46:21,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1412.14 | bwd_inner_microstep: 1412.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955
[2024-06-11 06:46:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.30 | bwd_microstep: 698.86 | bwd_inner_microstep: 698.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553
[2024-06-11 06:46:24,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.55 | bwd_microstep: 1298.42 | bwd_inner_microstep: 1298.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-11 06:46:26,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.15 | bwd_microstep: 1559.77 | bwd_inner_microstep: 1559.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2190
[2024-06-11 06:46:28,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.27 | bwd_microstep: 860.38 | bwd_inner_microstep: 860.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575
[2024-06-11 06:46:29,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1402.02 | bwd_inner_microstep: 1401.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568
[2024-06-11 06:46:31,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.73 | bwd_microstep: 1301.88 | bwd_inner_microstep: 1301.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459
[2024-06-11 06:46:33,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1278.28 | bwd_inner_microstep: 1278.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575
[2024-06-11 06:46:35,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1491.39 | bwd_inner_microstep: 1491.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812
[2024-06-11 06:46:37,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.70 | bwd_microstep: 1555.30 | bwd_inner_microstep: 1555.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497
[2024-06-11 06:46:43,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.29 | optimizer_step: 6.61
[2024-06-11 06:46:43,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 4729.49 | bwd_inner_microstep: 1478.90 | bwd_allreduce_microstep: 3250.52 | step_microstep: 39.92
[2024-06-11 06:46:43,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16028.14 | bwd: 46158.89 | bwd_inner: 42907.35 | bwd_allreduce: 3250.80 | step: 41.56
 99%|█████████▉| 1713/1726 [30:04:12<13:26, 62.03s/it]
 99%|█████████▉| 1714/1726 [30:05:16<12:31, 62.63s/it]
 99%|█████████▉| 1715/1726 [30:06:16<11:22, 62.05s/it]
 99%|█████████▉| 1716/1726 [30:07:16<10:13, 61.39s/it]
 99%|█████████▉| 1717/1726 [30:08:17<09:10, 61.13s/it]
100%|████████
{'loss': 1.1529, 'learning_rate': 2.2540349943089844e-09, 'epoch': 1.0}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461
[2024-06-11 06:46:44,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1361.87 | bwd_inner_microstep: 1361.78 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.12
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3415
[2024-06-11 06:46:46,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.63 | bwd_microstep: 1181.19 | bwd_inner_microstep: 1181.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961
[2024-06-11 06:46:47,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.41 | bwd_microstep: 791.63 | bwd_inner_microstep: 791.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576
[2024-06-11 06:46:49,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1398.48 | bwd_inner_microstep: 1398.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801
[2024-06-11 06:46:51,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.45 | bwd_microstep: 1649.52 | bwd_inner_microstep: 1649.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1897
[2024-06-11 06:46:52,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.98 | bwd_microstep: 777.02 | bwd_inner_microstep: 776.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2018
[2024-06-11 06:46:54,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.22 | bwd_microstep: 775.35 | bwd_inner_microstep: 775.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4039
[2024-06-11 06:46:56,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.13 | bwd_microstep: 1517.65 | bwd_inner_microstep: 1517.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955
[2024-06-11 06:46:57,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.60 | bwd_microstep: 798.58 | bwd_inner_microstep: 798.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3076
[2024-06-11 06:46:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1309.03 | bwd_inner_microstep: 1309.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3016
[2024-06-11 06:47:00,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.13 | bwd_microstep: 1133.73 | bwd_inner_microstep: 1133.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3649
[2024-06-11 06:47:02,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.52 | bwd_microstep: 1482.26 | bwd_inner_microstep: 1482.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455
[2024-06-11 06:47:04,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.18 | bwd_microstep: 1263.26 | bwd_inner_microstep: 1263.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3519
[2024-06-11 06:47:06,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1351.58 | bwd_inner_microstep: 1351.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3646
[2024-06-11 06:47:08,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.67 | bwd_microstep: 1763.81 | bwd_inner_microstep: 1763.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 06:47:10,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.73 | bwd_microstep: 1381.96 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456
[2024-06-11 06:47:12,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.50 | bwd_microstep: 1284.19 | bwd_inner_microstep: 1284.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3386
[2024-06-11 06:47:14,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.45 | bwd_microstep: 1302.19 | bwd_inner_microstep: 1302.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436
[2024-06-11 06:47:15,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.41 | bwd_microstep: 1309.47 | bwd_inner_microstep: 1309.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458
[2024-06-11 06:47:17,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.67 | bwd_microstep: 1374.94 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014
[2024-06-11 06:47:18,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 805.82 | bwd_inner_microstep: 805.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3715
[2024-06-11 06:47:20,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.24 | bwd_microstep: 1337.46 | bwd_inner_microstep: 1337.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3823
[2024-06-11 06:47:22,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.27 | bwd_microstep: 1389.30 | bwd_inner_microstep: 1389.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2437
[2024-06-11 06:47:24,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.41 | bwd_microstep: 952.77 | bwd_inner_microstep: 952.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3593
[2024-06-11 06:47:26,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.94 | bwd_microstep: 1534.86 | bwd_inner_microstep: 1534.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017
[2024-06-11 06:47:27,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.66 | bwd_microstep: 810.74 | bwd_inner_microstep: 810.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502
[2024-06-11 06:47:29,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.41 | bwd_microstep: 1292.30 | bwd_inner_microstep: 1292.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595
[2024-06-11 06:47:31,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1411.09 | bwd_inner_microstep: 1411.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594
[2024-06-11 06:47:33,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1507.62 | bwd_inner_microstep: 1507.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3769
[2024-06-11 06:47:35,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1502.13 | bwd_inner_microstep: 1502.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602
[2024-06-11 06:47:37,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1404.86 | bwd_inner_microstep: 1404.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816
[2024-06-11 06:47:45,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.14 | optimizer_step: 6.62
[2024-06-11 06:47:45,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.68 | bwd_microstep: 7932.29 | bwd_inner_microstep: 1756.14 | bwd_allreduce_microstep: 6176.09 | step_microstep: 38.99
[2024-06-11 06:47:45,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15263.35 | bwd: 47088.98 | bwd_inner: 40911.90 | bwd_allreduce: 6176.37 | step: 40.56
{'loss': 1.21, 'learning_rate': 1.7257531401448924e-09, 'epoch': 1.0}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398
[2024-06-11 06:47:47,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.21 | bwd_microstep: 1235.29 | bwd_inner_microstep: 1235.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2505
[2024-06-11 06:47:48,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.23 | bwd_microstep: 1021.47 | bwd_inner_microstep: 1021.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413
[2024-06-11 06:47:50,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.29 | bwd_microstep: 1342.32 | bwd_inner_microstep: 1342.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764
[2024-06-11 06:47:52,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.79 | bwd_microstep: 1635.39 | bwd_inner_microstep: 1635.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397
[2024-06-11 06:47:54,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.48 | bwd_microstep: 1244.53 | bwd_inner_microstep: 1244.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489
[2024-06-11 06:47:56,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.03 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735
[2024-06-11 06:47:58,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.65 | bwd_microstep: 1528.24 | bwd_inner_microstep: 1528.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483
[2024-06-11 06:48:00,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.47 | bwd_microstep: 1215.12 | bwd_inner_microstep: 1215.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4047
[2024-06-11 06:48:02,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.33 | bwd_microstep: 1714.31 | bwd_inner_microstep: 1714.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2633
[2024-06-11 06:48:03,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.94 | bwd_microstep: 953.48 | bwd_inner_microstep: 953.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404
[2024-06-11 06:48:05,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1245.77 | bwd_inner_microstep: 1245.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405
[2024-06-11 06:48:07,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1340.52 | bwd_inner_microstep: 1340.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434
[2024-06-11 06:48:09,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.86 | bwd_microstep: 1278.39 | bwd_inner_microstep: 1278.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2344
[2024-06-11 06:48:10,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.23 | bwd_microstep: 924.07 | bwd_inner_microstep: 924.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3909
[2024-06-11 06:48:12,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.41 | bwd_microstep: 1587.66 | bwd_inner_microstep: 1587.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386
[2024-06-11 06:48:14,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.44 | bwd_microstep: 1336.82 | bwd_inner_microstep: 1336.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497
[2024-06-11 06:48:16,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1551.35 | bwd_inner_microstep: 1551.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3837
[2024-06-11 06:48:18,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.38 | bwd_microstep: 1580.06 | bwd_inner_microstep: 1580.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3463
[2024-06-11 06:48:20,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.83 | bwd_microstep: 1217.89 | bwd_inner_microstep: 1217.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686
[2024-06-11 06:48:22,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.49 | bwd_microstep: 1438.17 | bwd_inner_microstep: 1438.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596
[2024-06-11 06:48:24,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1512.67 | bwd_inner_microstep: 1512.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3838
[2024-06-11 06:48:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.56 | bwd_microstep: 1591.55 | bwd_inner_microstep: 1591.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3658
[2024-06-11 06:48:28,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1429.26 | bwd_inner_microstep: 1429.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620
[2024-06-11 06:48:30,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1318.44 | bwd_inner_microstep: 1318.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3657
[2024-06-11 06:48:32,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.42 | bwd_microstep: 1519.88 | bwd_inner_microstep: 1519.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2140
[2024-06-11 06:48:33,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.31 | bwd_microstep: 804.01 | bwd_inner_microstep: 803.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3832
[2024-06-11 06:48:35,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.37 | bwd_microstep: 1360.67 | bwd_inner_microstep: 1360.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3781
[2024-06-11 06:48:37,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.19 | bwd_microstep: 1486.20 | bwd_inner_microstep: 1486.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611
[2024-06-11 06:48:39,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.95 | bwd_microstep: 1514.08 | bwd_inner_microstep: 1514.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2036
[2024-06-11 06:48:40,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.43 | bwd_microstep: 810.32 | bwd_inner_microstep: 810.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913
[2024-06-11 06:48:42,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.98 | bwd_microstep: 780.71 | bwd_inner_microstep: 780.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454
[2024-06-11 06:48:46,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.15 | optimizer_step: 6.60
[2024-06-11 06:48:46,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 3971.02 | bwd_inner_microstep: 1421.96 | bwd_allreduce_microstep: 2549.00 | step_microstep: 39.11
[2024-06-11 06:48:46,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15750.16 | bwd: 44772.92 | bwd_inner: 42223.02 | bwd_allreduce: 2549.23 | step: 40.58
{'loss': 1.1683, 'learning_rate': 1.2679051039188317e-09, 'epoch': 1.0}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 06:48:48,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1374.83 | bwd_inner_microstep: 1374.68 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419
[2024-06-11 06:48:50,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.34 | bwd_microstep: 1148.20 | bwd_inner_microstep: 1148.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3481
[2024-06-11 06:48:52,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.22 | bwd_microstep: 1441.53 | bwd_inner_microstep: 1441.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405
[2024-06-11 06:48:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.67 | bwd_microstep: 1146.41 | bwd_inner_microstep: 1146.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3413
[2024-06-11 06:48:55,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.25 | bwd_microstep: 1179.94 | bwd_inner_microstep: 1179.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-11 06:48:57,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.41 | bwd_microstep: 1282.08 | bwd_inner_microstep: 1282.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505
[2024-06-11 06:48:58,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.02 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1285.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-11 06:48:59,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.88 | bwd_microstep: 797.65 | bwd_inner_microstep: 797.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1969
[2024-06-11 06:49:01,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.78 | bwd_microstep: 827.44 | bwd_inner_microstep: 827.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487
[2024-06-11 06:49:03,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.79 | bwd_microstep: 1481.27 | bwd_inner_microstep: 1481.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482
[2024-06-11 06:49:05,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1377.24 | bwd_inner_microstep: 1377.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419
[2024-06-11 06:49:06,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1249.71 | bwd_inner_microstep: 1249.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715
[2024-06-11 06:49:08,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.71 | bwd_microstep: 1530.13 | bwd_inner_microstep: 1530.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3653
[2024-06-11 06:49:11,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.44 | bwd_microstep: 1683.00 | bwd_inner_microstep: 1682.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3467
[2024-06-11 06:49:13,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.39 | bwd_microstep: 1573.43 | bwd_inner_microstep: 1573.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3536
[2024-06-11 06:49:15,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1326.36 | bwd_inner_microstep: 1326.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451
[2024-06-11 06:49:17,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.21 | bwd_microstep: 1353.68 | bwd_inner_microstep: 1353.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.35
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3502
[2024-06-11 06:49:19,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1441.96 | bwd_inner_microstep: 1441.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446
[2024-06-11 06:49:20,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.11 | bwd_microstep: 1379.69 | bwd_inner_microstep: 1379.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716
[2024-06-11 06:49:22,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.02 | bwd_microstep: 1439.09 | bwd_inner_microstep: 1439.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818
[2024-06-11 06:49:24,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.57 | bwd_microstep: 1459.15 | bwd_inner_microstep: 1459.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2065
[2024-06-11 06:49:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.12 | bwd_microstep: 785.72 | bwd_inner_microstep: 785.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463
[2024-06-11 06:49:27,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.09 | bwd_microstep: 1187.15 | bwd_inner_microstep: 1187.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814
[2024-06-11 06:49:29,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.51 | bwd_microstep: 1356.12 | bwd_inner_microstep: 1356.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557
[2024-06-11 06:49:31,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.92 | bwd_microstep: 1304.46 | bwd_inner_microstep: 1304.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3513
[2024-06-11 06:49:33,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.03 | bwd_microstep: 1349.07 | bwd_inner_microstep: 1349.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3563
[2024-06-11 06:49:35,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1363.96 | bwd_inner_microstep: 1363.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580
[2024-06-11 06:49:36,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.76 | bwd_microstep: 1304.13 | bwd_inner_microstep: 1304.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2162
[2024-06-11 06:49:38,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.45 | bwd_microstep: 853.63 | bwd_inner_microstep: 853.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808
[2024-06-11 06:49:40,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.69 | bwd_microstep: 1453.39 | bwd_inner_microstep: 1453.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812
[2024-06-11 06:49:42,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.66 | bwd_microstep: 1644.94 | bwd_inner_microstep: 1644.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594
[2024-06-11 06:49:47,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.28 | optimizer_step: 6.58
[2024-06-11 06:49:47,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.38 | bwd_microstep: 4044.51 | bwd_inner_microstep: 1810.32 | bwd_allreduce_microstep: 2234.11 | step_microstep: 39.79
[2024-06-11 06:49:47,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15787.35 | bwd: 44425.97 | bwd_inner: 42190.79 | bwd_allreduce: 2234.40 | step: 41.67
{'loss': 1.1316, 'learning_rate': 8.804924981653529e-10, 'epoch': 1.0}
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463
[2024-06-11 06:49:49,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1396.19 | bwd_inner_microstep: 1396.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860
[2024-06-11 06:49:51,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.32 | bwd_microstep: 1559.90 | bwd_inner_microstep: 1559.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2331
[2024-06-11 06:49:52,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.90 | bwd_microstep: 981.43 | bwd_inner_microstep: 981.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826
[2024-06-11 06:49:54,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.90 | bwd_microstep: 1582.52 | bwd_inner_microstep: 1582.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794
[2024-06-11 06:49:56,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.78 | bwd_microstep: 1445.65 | bwd_inner_microstep: 1445.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483
[2024-06-11 06:49:58,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.19 | bwd_microstep: 1283.48 | bwd_inner_microstep: 1283.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400
[2024-06-11 06:50:00,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1343.10 | bwd_inner_microstep: 1343.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3497
[2024-06-11 06:50:02,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.75 | bwd_microstep: 1236.90 | bwd_inner_microstep: 1236.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411
[2024-06-11 06:50:03,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.56 | bwd_microstep: 1282.01 | bwd_inner_microstep: 1281.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537
[2024-06-11 06:50:05,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.06 | bwd_microstep: 1482.88 | bwd_inner_microstep: 1482.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621
[2024-06-11 06:50:07,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.89 | bwd_microstep: 1313.81 | bwd_inner_microstep: 1313.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-11 06:50:09,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.06 | bwd_microstep: 1343.41 | bwd_inner_microstep: 1343.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489
[2024-06-11 06:50:11,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.97 | bwd_microstep: 1314.58 | bwd_inner_microstep: 1314.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412
[2024-06-11 06:50:13,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1341.16 | bwd_inner_microstep: 1341.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633
[2024-06-11 06:50:15,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1604.18 | bwd_inner_microstep: 1604.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3649
[2024-06-11 06:50:17,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.78 | bwd_microstep: 1820.77 | bwd_inner_microstep: 1820.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3901
[2024-06-11 06:50:20,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.05 | bwd_microstep: 1685.10 | bwd_inner_microstep: 1685.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2310
[2024-06-11 06:50:21,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.05 | bwd_microstep: 887.34 | bwd_inner_microstep: 887.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998
[2024-06-11 06:50:22,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.21 | bwd_microstep: 803.95 | bwd_inner_microstep: 803.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473
[2024-06-11 06:50:24,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.88 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3456
[2024-06-11 06:50:26,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.93 | bwd_microstep: 1190.85 | bwd_inner_microstep: 1190.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2292
[2024-06-11 06:50:27,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.47 | bwd_microstep: 816.49 | bwd_inner_microstep: 816.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906
[2024-06-11 06:50:28,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.16 | bwd_microstep: 685.26 | bwd_inner_microstep: 685.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566
[2024-06-11 06:50:30,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1404.56 | bwd_inner_microstep: 1404.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600
[2024-06-11 06:50:32,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1415.32 | bwd_inner_microstep: 1415.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548
[2024-06-11 06:50:33,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1395.03 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571
[2024-06-11 06:50:35,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.12 | bwd_microstep: 1401.24 | bwd_inner_microstep: 1401.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541
[2024-06-11 06:50:37,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1301.73 | bwd_inner_microstep: 1301.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2194
[2024-06-11 06:50:39,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 956.53 | bwd_inner_microstep: 956.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575
[2024-06-11 06:50:41,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1494.81 | bwd_inner_microstep: 1494.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820
[2024-06-11 06:50:43,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 1524.19 | bwd_inner_microstep: 1524.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433
[2024-06-11 06:50:49,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.13 | optimizer_step: 6.61
[2024-06-11 06:50:49,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 5446.65 | bwd_inner_microstep: 1523.35 | bwd_allreduce_microstep: 3923.24 | step_microstep: 39.10
[2024-06-11 06:50:49,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15731.62 | bwd: 46024.13 | bwd_inner: 42099.97 | bwd_allreduce: 3923.48 | step: 40.83
{'loss': 1.1186, 'learning_rate': 5.63516687352994e-10, 'epoch': 1.0}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 06:50:50,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.85 | bwd_microstep: 1274.74 | bwd_inner_microstep: 1274.67 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3914
[2024-06-11 06:50:53,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.75 | bwd_microstep: 1587.74 | bwd_inner_microstep: 1587.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2137
[2024-06-11 06:50:54,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.50 | bwd_microstep: 891.41 | bwd_inner_microstep: 891.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474
[2024-06-11 06:50:56,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.00 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742
[2024-06-11 06:50:58,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.67 | bwd_microstep: 1429.07 | bwd_inner_microstep: 1429.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491
[2024-06-11 06:50:59,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.67 | bwd_microstep: 1186.83 | bwd_inner_microstep: 1186.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425
[2024-06-11 06:51:01,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1440.43 | bwd_inner_microstep: 1440.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717
[2024-06-11 06:51:03,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1529.70 | bwd_inner_microstep: 1529.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3426
[2024-06-11 06:51:05,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.89 | bwd_microstep: 1296.12 | bwd_inner_microstep: 1296.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501
[2024-06-11 06:51:07,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.08 | bwd_microstep: 1285.64 | bwd_inner_microstep: 1285.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3450
[2024-06-11 06:51:09,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.51 | bwd_microstep: 1312.47 | bwd_inner_microstep: 1312.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410
[2024-06-11 06:51:10,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.82 | bwd_microstep: 1243.67 | bwd_inner_microstep: 1243.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3381
[2024-06-11 06:51:12,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.92 | bwd_microstep: 1361.72 | bwd_inner_microstep: 1361.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503
[2024-06-11 06:51:14,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.83 | bwd_microstep: 1287.41 | bwd_inner_microstep: 1287.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 640
[2024-06-11 06:51:15,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 105.41 | bwd_microstep: 265.28 | bwd_inner_microstep: 265.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301
[2024-06-11 06:51:16,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.17 | bwd_microstep: 975.25 | bwd_inner_microstep: 975.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525
[2024-06-11 06:51:18,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1493.01 | bwd_inner_microstep: 1492.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946
[2024-06-11 06:51:19,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.82 | bwd_microstep: 698.39 | bwd_inner_microstep: 698.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827
[2024-06-11 06:51:21,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.03 | bwd_microstep: 1361.21 | bwd_inner_microstep: 1361.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615
[2024-06-11 06:51:23,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.20 | bwd_microstep: 1510.58 | bwd_inner_microstep: 1510.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620
[2024-06-11 06:51:25,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1511.85 | bwd_inner_microstep: 1511.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535
[2024-06-11 06:51:27,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1495.54 | bwd_inner_microstep: 1495.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693
[2024-06-11 06:51:29,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.24 | bwd_microstep: 1627.98 | bwd_inner_microstep: 1627.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484
[2024-06-11 06:51:31,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1379.44 | bwd_inner_microstep: 1379.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3615
[2024-06-11 06:51:33,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.88 | bwd_microstep: 1535.26 | bwd_inner_microstep: 1535.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2061
[2024-06-11 06:51:34,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.00 | bwd_microstep: 848.37 | bwd_inner_microstep: 848.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422
[2024-06-11 06:51:36,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.65 | bwd_microstep: 1165.69 | bwd_inner_microstep: 1165.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2937
[2024-06-11 06:51:38,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.68 | bwd_microstep: 1325.25 | bwd_inner_microstep: 1325.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551
[2024-06-11 06:51:40,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.43 | bwd_microstep: 1498.85 | bwd_inner_microstep: 1498.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2019
[2024-06-11 06:51:41,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.63 | bwd_microstep: 903.62 | bwd_inner_microstep: 903.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3586
[2024-06-11 06:51:43,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.38 | bwd_microstep: 1670.34 | bwd_inner_microstep: 1670.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816
[2024-06-11 06:51:50,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.15 | optimizer_step: 6.59
[2024-06-11 06:51:50,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.18 | bwd_microstep: 6029.43 | bwd_inner_microstep: 1868.68 | bwd_allreduce_microstep: 4160.69 | step_microstep: 38.99
[2024-06-11 06:51:50,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15450.25 | bwd: 45665.32 | bwd_inner: 41503.66 | bwd_allreduce: 4160.95 | step: 40.55
100%|█████████▉| 1718/1726 [30:09:19<08:12, 61.55s/it]
100%|█████████▉| 1719/1726 [30:10:22<07:13, 61.89s/it]
100%|█████████▉| 1720/1726 [30:11:23<06:09, 61.58s/it]
100%|█████████▉| 1721/1726 [30:12:23<05:06, 61.27s/it]
100%|█████████▉| 1722/1726 [30:13:25<04:06, 61.52s/it]
{'loss': 1.153, 'learning_rate': 3.1697878786873804e-10, 'epoch': 1.0}
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551
[2024-06-11 06:51:53,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 2679.75 | bwd_inner_microstep: 2679.66 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.13
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463
[2024-06-11 06:51:55,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1278.57 | bwd_inner_microstep: 1278.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844
[2024-06-11 06:51:57,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.99 | bwd_microstep: 1656.27 | bwd_inner_microstep: 1656.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753
[2024-06-11 06:51:59,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1440.88 | bwd_inner_microstep: 1440.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444
[2024-06-11 06:52:01,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.19 | bwd_microstep: 1153.18 | bwd_inner_microstep: 1153.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784
[2024-06-11 06:52:03,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.60 | bwd_microstep: 1545.84 | bwd_inner_microstep: 1545.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477
[2024-06-11 06:52:05,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1279.13 | bwd_inner_microstep: 1279.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478
[2024-06-11 06:52:07,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.49 | bwd_microstep: 1385.14 | bwd_inner_microstep: 1385.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423
[2024-06-11 06:52:09,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1252.26 | bwd_inner_microstep: 1252.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509
[2024-06-11 06:52:11,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 1438.56 | bwd_inner_microstep: 1438.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3400
[2024-06-11 06:52:12,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.10 | bwd_microstep: 1211.23 | bwd_inner_microstep: 1211.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3610
[2024-06-11 06:52:14,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.39 | bwd_microstep: 1467.64 | bwd_inner_microstep: 1467.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3699
[2024-06-11 06:52:17,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.23 | bwd_microstep: 1653.62 | bwd_inner_microstep: 1653.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515
[2024-06-11 06:52:19,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.93 | bwd_microstep: 1447.12 | bwd_inner_microstep: 1447.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611
[2024-06-11 06:52:20,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.68 | bwd_microstep: 1404.07 | bwd_inner_microstep: 1404.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486
[2024-06-11 06:52:22,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.27 | bwd_microstep: 1386.32 | bwd_inner_microstep: 1386.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452
[2024-06-11 06:52:24,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1256.65 | bwd_inner_microstep: 1256.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502
[2024-06-11 06:52:26,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1484.29 | bwd_inner_microstep: 1484.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456
[2024-06-11 06:52:28,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1255.92 | bwd_inner_microstep: 1255.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723
[2024-06-11 06:52:30,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.26 | bwd_microstep: 1533.78 | bwd_inner_microstep: 1533.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824
[2024-06-11 06:52:32,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.06 | bwd_microstep: 1655.06 | bwd_inner_microstep: 1655.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520
[2024-06-11 06:52:34,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1388.33 | bwd_inner_microstep: 1388.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539
[2024-06-11 06:52:36,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.29 | bwd_microstep: 1492.65 | bwd_inner_microstep: 1492.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806
[2024-06-11 06:52:38,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1557.04 | bwd_inner_microstep: 1557.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3656
[2024-06-11 06:52:40,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.14 | bwd_microstep: 1326.71 | bwd_inner_microstep: 1326.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476
[2024-06-11 06:52:42,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.06 | bwd_microstep: 1189.41 | bwd_inner_microstep: 1189.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3691
[2024-06-11 06:52:44,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.92 | bwd_microstep: 1361.02 | bwd_inner_microstep: 1361.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454
[2024-06-11 06:52:46,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1448.89 | bwd_inner_microstep: 1448.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2024
[2024-06-11 06:52:47,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.29 | bwd_microstep: 906.33 | bwd_inner_microstep: 906.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631
[2024-06-11 06:52:49,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1606.31 | bwd_inner_microstep: 1606.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3559
[2024-06-11 06:52:51,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.89 | bwd_microstep: 1520.05 | bwd_inner_microstep: 1520.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3741
[2024-06-11 06:52:54,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.08 | optimizer_step: 6.58
[2024-06-11 06:52:54,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1976.12 | bwd_inner_microstep: 1722.97 | bwd_allreduce_microstep: 253.09 | step_microstep: 39.23
[2024-06-11 06:52:54,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16821.09 | bwd: 46638.15 | bwd_inner: 46384.08 | bwd_allreduce: 253.37 | step: 41.04
{'loss': 1.1706, 'learning_rate': 1.4087966801579201e-10, 'epoch': 1.0}
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403
[2024-06-11 06:52:56,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.03 | bwd_microstep: 1175.44 | bwd_inner_microstep: 1175.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3915
[2024-06-11 06:52:58,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.71 | bwd_microstep: 1521.95 | bwd_inner_microstep: 1521.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796
[2024-06-11 06:53:00,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.94 | bwd_microstep: 1456.06 | bwd_inner_microstep: 1456.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834
[2024-06-11 06:53:02,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.06 | bwd_microstep: 1453.49 | bwd_inner_microstep: 1453.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762
[2024-06-11 06:53:04,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.19 | bwd_microstep: 1445.99 | bwd_inner_microstep: 1445.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948
[2024-06-11 06:53:05,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.05 | bwd_microstep: 792.71 | bwd_inner_microstep: 792.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480
[2024-06-11 06:53:07,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.34 | bwd_microstep: 1485.41 | bwd_inner_microstep: 1485.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1900
[2024-06-11 06:53:08,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.22 | bwd_microstep: 734.98 | bwd_inner_microstep: 734.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469
[2024-06-11 06:53:10,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.05 | bwd_microstep: 1482.17 | bwd_inner_microstep: 1482.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607
[2024-06-11 06:53:12,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.31 | bwd_microstep: 1604.87 | bwd_inner_microstep: 1604.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377
[2024-06-11 06:53:14,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.73 | bwd_microstep: 1338.33 | bwd_inner_microstep: 1338.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966
[2024-06-11 06:53:15,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 798.41 | bwd_inner_microstep: 798.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765
[2024-06-11 06:53:17,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.79 | bwd_microstep: 1440.01 | bwd_inner_microstep: 1439.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402
[2024-06-11 06:53:19,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1245.88 | bwd_inner_microstep: 1245.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504
[2024-06-11 06:53:21,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.66 | bwd_microstep: 1448.80 | bwd_inner_microstep: 1448.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528
[2024-06-11 06:53:23,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1496.35 | bwd_inner_microstep: 1496.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3656
[2024-06-11 06:53:25,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1481.98 | bwd_inner_microstep: 1481.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452
[2024-06-11 06:53:27,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.88 | bwd_microstep: 1286.32 | bwd_inner_microstep: 1286.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3634
[2024-06-11 06:53:29,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.70 | bwd_microstep: 1543.66 | bwd_inner_microstep: 1543.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460
[2024-06-11 06:53:31,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1375.96 | bwd_inner_microstep: 1375.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822
[2024-06-11 06:53:33,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1391.21 | bwd_inner_microstep: 1391.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620
[2024-06-11 06:53:35,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.88 | bwd_microstep: 1410.53 | bwd_inner_microstep: 1410.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475
[2024-06-11 06:53:37,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1382.10 | bwd_inner_microstep: 1382.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550
[2024-06-11 06:53:38,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1299.68 | bwd_inner_microstep: 1299.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958
[2024-06-11 06:53:39,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.97 | bwd_microstep: 703.54 | bwd_inner_microstep: 703.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3549
[2024-06-11 06:53:41,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.36 | bwd_microstep: 1201.15 | bwd_inner_microstep: 1201.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553
[2024-06-11 06:53:43,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.59 | bwd_microstep: 1499.92 | bwd_inner_microstep: 1499.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781
[2024-06-11 06:53:45,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.59 | bwd_microstep: 1647.23 | bwd_inner_microstep: 1647.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3820
[2024-06-11 06:53:48,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.87 | bwd_microstep: 1619.76 | bwd_inner_microstep: 1619.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574
[2024-06-11 06:53:49,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.09 | bwd_microstep: 1402.13 | bwd_inner_microstep: 1402.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781
[2024-06-11 06:53:52,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.72 | bwd_microstep: 1648.68 | bwd_inner_microstep: 1648.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581
[2024-06-11 06:53:57,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.25 | optimizer_step: 6.61
[2024-06-11 06:53:57,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 4280.50 | bwd_inner_microstep: 2090.37 | bwd_allreduce_microstep: 2190.06 | step_microstep: 39.58
[2024-06-11 06:53:57,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16250.07 | bwd: 46095.21 | bwd_inner: 43904.23 | bwd_allreduce: 2190.30 | step: 41.38
{'loss': 1.165, 'learning_rate': 3.521994801580775e-11, 'epoch': 1.0}
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014
[2024-06-11 06:53:58,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.93 | bwd_microstep: 889.54 | bwd_inner_microstep: 889.35 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3961
[2024-06-11 06:54:00,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1493.34 | bwd_inner_microstep: 1493.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878
[2024-06-11 06:54:01,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.29 | bwd_microstep: 739.98 | bwd_inner_microstep: 739.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3923
[2024-06-11 06:54:03,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1392.19 | bwd_inner_microstep: 1392.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1956
[2024-06-11 06:54:04,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.19 | bwd_microstep: 824.83 | bwd_inner_microstep: 824.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403
[2024-06-11 06:54:06,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759
[2024-06-11 06:54:08,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.95 | bwd_microstep: 1541.22 | bwd_inner_microstep: 1541.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3733
[2024-06-11 06:54:10,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1367.23 | bwd_inner_microstep: 1367.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444
[2024-06-11 06:54:12,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1254.13 | bwd_inner_microstep: 1254.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617
[2024-06-11 06:54:14,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1414.69 | bwd_inner_microstep: 1414.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3101
[2024-06-11 06:54:15,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.95 | bwd_microstep: 1151.97 | bwd_inner_microstep: 1151.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975
[2024-06-11 06:54:16,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.76 | bwd_microstep: 797.61 | bwd_inner_microstep: 797.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676
[2024-06-11 06:54:18,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.43 | bwd_microstep: 1432.88 | bwd_inner_microstep: 1432.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508
[2024-06-11 06:54:20,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.63 | bwd_microstep: 1393.26 | bwd_inner_microstep: 1393.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3664
[2024-06-11 06:54:22,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.05 | bwd_microstep: 1478.34 | bwd_inner_microstep: 1478.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628
[2024-06-11 06:54:24,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.19 | bwd_microstep: 1513.25 | bwd_inner_microstep: 1513.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415
[2024-06-11 06:54:26,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3499
[2024-06-11 06:54:28,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.39 | bwd_microstep: 1446.93 | bwd_inner_microstep: 1446.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3631
[2024-06-11 06:54:30,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.32 | bwd_microstep: 1658.39 | bwd_inner_microstep: 1658.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2080
[2024-06-11 06:54:31,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.43 | bwd_microstep: 883.31 | bwd_inner_microstep: 883.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3618
[2024-06-11 06:54:34,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.51 | bwd_microstep: 1709.57 | bwd_inner_microstep: 1709.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397
[2024-06-11 06:54:36,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.34 | bwd_microstep: 1345.16 | bwd_inner_microstep: 1345.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3653
[2024-06-11 06:54:38,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.62 | bwd_microstep: 1716.04 | bwd_inner_microstep: 1716.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712
[2024-06-11 06:54:40,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.09 | bwd_microstep: 1529.54 | bwd_inner_microstep: 1529.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2182
[2024-06-11 06:54:41,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.32 | bwd_microstep: 953.88 | bwd_inner_microstep: 953.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926
[2024-06-11 06:54:42,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.25 | bwd_microstep: 696.56 | bwd_inner_microstep: 696.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590
[2024-06-11 06:54:44,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1407.55 | bwd_inner_microstep: 1407.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500
[2024-06-11 06:54:46,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.56 | bwd_microstep: 1387.94 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596
[2024-06-11 06:54:48,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.91 | bwd_microstep: 1406.75 | bwd_inner_microstep: 1406.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468
[2024-06-11 06:54:50,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1283.22 | bwd_inner_microstep: 1283.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800
[2024-06-11 06:54:52,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1552.78 | bwd_inner_microstep: 1552.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612
[2024-06-11 06:54:59,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.27 | optimizer_step: 6.57
[2024-06-11 06:54:59,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 6616.12 | bwd_inner_microstep: 1579.57 | bwd_allreduce_microstep: 5036.47 | step_microstep: 39.78
[2024-06-11 06:54:59,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15523.38 | bwd: 46773.30 | bwd_inner: 41735.75 | bwd_allreduce: 5036.80 | step: 41.48
{'loss': 1.1662, 'learning_rate': 0.0, 'epoch': 1.0}
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fa0769bf9f0>
Failed to load image: playground/data/ocr_vqa/images/60170921.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0768c3900>
Failed to load image: playground/data/ocr_vqa/images/157851231X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0768aeb30>
Failed to load image: playground/data/ocr_vqa/images/930031571.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0769346d0>
Failed to load image: playground/data/ocr_vqa/images/771573936.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa072b7bef0>
Failed to load image: playground/data/ocr_vqa/images/60164255.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0769d5f90>
Failed to load image: playground/data/ocr_vqa/images/316142778.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0769569a0>
Failed to load image: playground/data/ocr_vqa/images/715308904.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0769a3cc0>
Failed to load image: playground/data/ocr_vqa/images/312244452.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa072b97270>
Failed to load image: playground/data/ocr_vqa/images/201112973.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0768c0270>
Failed to load image: playground/data/ocr_vqa/images/521506743.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0769ec2c0>
Failed to load image: playground/data/ocr_vqa/images/2067009559.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa0769f8540>
Failed to load image: playground/data/ocr_vqa/images/425099369.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fbe7ecaa2c0>
Failed to load image: playground/data/ocr_vqa/images/816512019.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe7ecaaf40>
Failed to load image: playground/data/ocr_vqa/images/883880075.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe7d005c70>
Failed to load image: playground/data/ocr_vqa/images/966424603.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe9745b1d0>
Failed to load image: playground/data/ocr_vqa/images/739704516.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe7ee35540>
Failed to load image: playground/data/ocr_vqa/images/292796048.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe80449770>
Failed to load image: playground/data/ocr_vqa/images/037550267X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe7ecb77c0>
Failed to load image: playground/data/ocr_vqa/images/3928819232.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe7ecb76d0>
Failed to load image: playground/data/ocr_vqa/images/878576991.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe7edb6360>
Failed to load image: playground/data/ocr_vqa/images/786405511.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe7ed5f7c0>
Failed to load image: playground/data/ocr_vqa/images/553069403.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe9745b450>
Failed to load image: playground/data/ocr_vqa/images/800756614.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbe7ecb7cc0>
Failed to load image: playground/data/ocr_vqa/images/679445765.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f690f301540>
Failed to load image: playground/data/ocr_vqa/images/930016238.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f690f2e3c70>
Failed to load image: playground/data/ocr_vqa/images/763115509.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f691108c0e0>
Failed to load image: playground/data/ocr_vqa/images/1576730867.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f691101f6d0>
Failed to load image: playground/data/ocr_vqa/images/393701719.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6a4c76bd10>
Failed to load image: playground/data/ocr_vqa/images/937274461.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f69110de900>
Failed to load image: playground/data/ocr_vqa/images/761500413.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6910f0aa40>
Failed to load image: playground/data/ocr_vqa/images/914625179.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f691101f3b0>
Failed to load image: playground/data/ocr_vqa/images/28628594.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f7ba50a14a0>
Failed to load image: playground/data/ocr_vqa/images/830816011.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba5101ea0>
Failed to load image: playground/data/ocr_vqa/images/899330266.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba5090f90>
Failed to load image: playground/data/ocr_vqa/images/674035275.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6dbc4f0>
Failed to load image: playground/data/ocr_vqa/images/785282394.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6df4c70>
Failed to load image: playground/data/ocr_vqa/images/965776611.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6ebcef0>
Failed to load image: playground/data/ocr_vqa/images/B00XIZWWNC.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
string index out of range
Failed to load image: playground/data/coco/train2017/000000178275.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fedd2299c20>
Failed to load image: playground/data/ocr_vqa/images/471243787.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fedd2251a90>
Failed to load image: playground/data/ocr_vqa/images/912111364.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fedd21ceea0>
Failed to load image: playground/data/ocr_vqa/images/688116191.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fedd21219a0>
Failed to load image: playground/data/ocr_vqa/images/671025627.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f07462bf720>
Failed to load image: playground/data/ocr_vqa/images/1881174034.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07462ab090>
Failed to load image: playground/data/ocr_vqa/images/898620996.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07462bf270>
Failed to load image: playground/data/ocr_vqa/images/393090027.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0746364720>
Failed to load image: playground/data/ocr_vqa/images/1560523573.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07463db9f0>
Failed to load image: playground/data/ocr_vqa/images/892390263.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07462aacc0>
Failed to load image: playground/data/ocr_vqa/images/060960323X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07463c55e0>
Failed to load image: playground/data/ocr_vqa/images/517884283.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f074255ee50>
Failed to load image: playground/data/ocr_vqa/images/394532643.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07462ab9f0>
Failed to load image: playground/data/ocr_vqa/images/871563932.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0742587e50>
Failed to load image: playground/data/ocr_vqa/images/789401592.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f074623e5e0>
Failed to load image: playground/data/ocr_vqa/images/28624084.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07463a8cc0>
Failed to load image: playground/data/ocr_vqa/images/446387355.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07463a8860>
Failed to load image: playground/data/ocr_vqa/images/810940183.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f07462aab30>
Failed to load image: playground/data/ocr_vqa/images/1569750882.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6db6310>
Failed to load image: playground/data/ocr_vqa/images/080740604X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6e20a40>
Failed to load image: playground/data/ocr_vqa/images/966542606.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6e20ea0>
Failed to load image: playground/data/ocr_vqa/images/412132710.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6db69f0>
Failed to load image: playground/data/ocr_vqa/images/1566866391.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6d87ae0>
Failed to load image: playground/data/ocr_vqa/images/067173363X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6e91540>
Failed to load image: playground/data/ocr_vqa/images/953735702.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7ba6e20900>
Failed to load image: playground/data/ocr_vqa/images/465069347.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fbf83c090e0>
Failed to load image: playground/data/ocr_vqa/images/739714864.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83cf19f0>
Failed to load image: playground/data/ocr_vqa/images/038794740X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83bb65e0>
Failed to load image: playground/data/ocr_vqa/images/812931432.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83c75a40>
Failed to load image: playground/data/ocr_vqa/images/1860340660.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83c75b30>
Failed to load image: playground/data/ocr_vqa/images/1574301012.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83c75ae0>
Failed to load image: playground/data/ocr_vqa/images/789401509.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83c59e00>
Failed to load image: playground/data/ocr_vqa/images/1564965112.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83c20e00>
Failed to load image: playground/data/ocr_vqa/images/078710339X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83b16bd0>
Failed to load image: playground/data/ocr_vqa/images/688118127.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf8c093b30>
Failed to load image: playground/data/ocr_vqa/images/188673206X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fbf83cf1040>
Failed to load image: playground/data/ocr_vqa/images/785807209.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f1258f51b30>
Failed to load image: playground/data/ocr_vqa/images/938076140.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258f37d10>
Failed to load image: playground/data/ocr_vqa/images/078942049X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258fb3400>
Failed to load image: playground/data/ocr_vqa/images/1566250420.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258f51b80>
Failed to load image: playground/data/ocr_vqa/images/395539331.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258e3dea0>
Failed to load image: playground/data/ocr_vqa/images/967695104.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1265d899f0>
Failed to load image: playground/data/ocr_vqa/images/1557987203.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258eb4c20>
Failed to load image: playground/data/ocr_vqa/images/64462013.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258f1d130>
Failed to load image: playground/data/ocr_vqa/images/1579771009.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f1ffb2b1590>
Failed to load image: playground/data/ocr_vqa/images/134412052.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f2023de50e0>
Failed to load image: playground/data/ocr_vqa/images/1570280762.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1ffb181720>
Failed to load image: playground/data/ocr_vqa/images/785805516.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1ffb2aa2c0>
Failed to load image: playground/data/ocr_vqa/images/893148423.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1ffb21c450>
Failed to load image: playground/data/ocr_vqa/images/671567918.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1ffb205270>
Failed to load image: playground/data/ocr_vqa/images/960536205.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1ffb0c5590>
Failed to load image: playground/data/ocr_vqa/images/761522751.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1ffb0c5c70>
Failed to load image: playground/data/ocr_vqa/images/789408554.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1ffb0c5810>
Failed to load image: playground/data/ocr_vqa/images/185343342X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1ffb2ac130>
Failed to load image: playground/data/ocr_vqa/images/785808841.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f0a8c2ddcc0>
Failed to load image: playground/data/ocr_vqa/images/1571971459.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c2218b0>
Failed to load image: playground/data/ocr_vqa/images/073970477X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8a5b2ef0>
Failed to load image: playground/data/ocr_vqa/images/185230863X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c244040>
Failed to load image: playground/data/ocr_vqa/images/933261004.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c3b2d10>
Failed to load image: playground/data/ocr_vqa/images/933821131.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c3b2ae0>
Failed to load image: playground/data/ocr_vqa/images/1855326779.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c293f40>
Failed to load image: playground/data/ocr_vqa/images/670886939.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c355590>
Failed to load image: playground/data/ocr_vqa/images/345414810.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c2fc400>
Failed to load image: playground/data/ocr_vqa/images/679844023.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fd8c7bbbd10>
Failed to load image: playground/data/ocr_vqa/images/1580170536.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c3dbf400>
Failed to load image: playground/data/ocr_vqa/images/312187114.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c7c117c0>
Failed to load image: playground/data/ocr_vqa/images/1559920696.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c3dbf6d0>
Failed to load image: playground/data/ocr_vqa/images/1572240466.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c7b41130>
Failed to load image: playground/data/ocr_vqa/images/1886947694.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c3da4d10>
Failed to load image: playground/data/ocr_vqa/images/375400664.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c7a1a7c0>
Failed to load image: playground/data/ocr_vqa/images/786884061.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c6bc7180>
Failed to load image: playground/data/ocr_vqa/images/133099156.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c3dcb360>
Failed to load image: playground/data/ocr_vqa/images/668053984.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fd8c3da42c0>
Failed to load image: playground/data/ocr_vqa/images/805034676.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f121e5d4900>
Failed to load image: playground/data/ocr_vqa/images/739702505.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f121e5d49a0>
Failed to load image: playground/data/ocr_vqa/images/812903390.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f121e4f0220>
Failed to load image: playground/data/ocr_vqa/images/936783109.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f121e55fcc0>
Failed to load image: playground/data/ocr_vqa/images/761307842.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f121e54a450>
Failed to load image: playground/data/ocr_vqa/images/1559920734.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f121c7ae9a0>
Failed to load image: playground/data/ocr_vqa/images/1555951295.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f121e4ff8b0>
Failed to load image: playground/data/ocr_vqa/images/1580621783.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f121e4afef0>
Failed to load image: playground/data/ocr_vqa/images/958315434.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fc137a29450>
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fee4314f3b0>
Failed to load image: playground/data/ocr_vqa/images/762704519.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee430e3270>
Failed to load image: playground/data/ocr_vqa/images/125476604.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee431414a0>
Failed to load image: playground/data/ocr_vqa/images/821411896.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee43156a40>
Failed to load image: playground/data/ocr_vqa/images/870331612.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee43156400>
Failed to load image: playground/data/ocr_vqa/images/739715100.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee431721d0>
Failed to load image: playground/data/ocr_vqa/images/894550225.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee40398b30>
Failed to load image: playground/data/ocr_vqa/images/962726303.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee403b7a40>
Failed to load image: playground/data/ocr_vqa/images/860208656.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee431bc450>
Failed to load image: playground/data/ocr_vqa/images/156347185X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fee43054310>
Failed to load image: playground/data/ocr_vqa/images/966355903.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f6c551cbea0>
Failed to load image: playground/data/ocr_vqa/images/1862045852.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6c56f7dc70>
Failed to load image: playground/data/ocr_vqa/images/156458321X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6c56fd5860>
Failed to load image: playground/data/ocr_vqa/images/870675656.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6c56f7cdb0>
Failed to load image: playground/data/ocr_vqa/images/809235269.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6c551cb680>
Failed to load image: playground/data/ocr_vqa/images/810928949.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6c56ffc130>
Failed to load image: playground/data/ocr_vqa/images/1568811012.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6c551cb720>
Failed to load image: playground/data/ocr_vqa/images/78821185.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6c551ba680>
Failed to load image: playground/data/ocr_vqa/images/870695916.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f6c56f12540>
Failed to load image: playground/data/ocr_vqa/images/B011M9LHUO.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f84909a3400>
Failed to load image: playground/data/ocr_vqa/images/1885928017.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f8490979680>
Failed to load image: playground/data/ocr_vqa/images/60191341.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f849370c310>
Failed to load image: playground/data/ocr_vqa/images/739715593.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f84909a3360>
Failed to load image: playground/data/ocr_vqa/images/1560445513.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f849370c860>
Failed to load image: playground/data/ocr_vqa/images/931580390.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f84937adae0>
Failed to load image: playground/data/ocr_vqa/images/849930987.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f8493807c20>
Failed to load image: playground/data/ocr_vqa/images/679761799.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f84909944f0>
Failed to load image: playground/data/ocr_vqa/images/664220789.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f76939792c0>
Failed to load image: playground/data/ocr_vqa/images/877792356.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f76938c1e50>
Failed to load image: playground/data/ocr_vqa/images/739704745.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f76938b8540>
Failed to load image: playground/data/ocr_vqa/images/1564582914.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7693876e00>
Failed to load image: playground/data/ocr_vqa/images/1568590644.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7693836d10>
Failed to load image: playground/data/ocr_vqa/images/679888268.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f76938b1900>
Failed to load image: playground/data/ocr_vqa/images/B00WTKH3HC.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f76939a4270>
Failed to load image: playground/data/ocr_vqa/images/1574770225.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f76939a44f0>
Failed to load image: playground/data/ocr_vqa/images/915801841.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f769395e450>
Failed to load image: playground/data/ocr_vqa/images/965150739.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258f5dd10>
Failed to load image: playground/data/ocr_vqa/images/289800900.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258f51720>
Failed to load image: playground/data/ocr_vqa/images/761511253.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f1258fb39a0>
Failed to load image: playground/data/ocr_vqa/images/1557488789.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f125ab30d60>
Failed to load image: playground/data/ocr_vqa/images/553062204.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7ff1db14d090>
Failed to load image: playground/data/ocr_vqa/images/739714600.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7ff1db1f6a40>
Failed to load image: playground/data/ocr_vqa/images/299130002.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7ff1db252040>
Failed to load image: playground/data/ocr_vqa/images/1570611912.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7ff1db0bff90>
Failed to load image: playground/data/ocr_vqa/images/B00XLZW19O.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7ff1db1f6220>
Failed to load image: playground/data/ocr_vqa/images/877936293.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7ff1dc5459a0>
Failed to load image: playground/data/ocr_vqa/images/B007K53FQ4.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f3c79659e00>
Failed to load image: playground/data/ocr_vqa/images/525941290.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f3c795bb5e0>
Failed to load image: playground/data/ocr_vqa/images/739714023.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f3c795e44f0>
Failed to load image: playground/data/ocr_vqa/images/785270965.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f3c794ff3b0>
Failed to load image: playground/data/ocr_vqa/images/706377273.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f3c7a971a90>
Failed to load image: playground/data/ocr_vqa/images/780800370.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f3c795d0630>
Failed to load image: playground/data/ocr_vqa/images/553353500.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f3c77608040>
Failed to load image: playground/data/ocr_vqa/images/1564583031.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f3c776088b0>
Failed to load image: playground/data/ocr_vqa/images/70359148.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f3c795e40e0>
Failed to load image: playground/data/ocr_vqa/images/3797306210.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c3cfae0>
Failed to load image: playground/data/ocr_vqa/images/1556507488.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f0a8c39fb30>
Failed to load image: playground/data/ocr_vqa/images/966586611.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fa73ad14310>
Failed to load image: playground/data/ocr_vqa/images/673384772.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a427810>
Failed to load image: playground/data/ocr_vqa/images/819183482.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a3e29a0>
Failed to load image: playground/data/ocr_vqa/images/139642625.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73ad31860>
Failed to load image: playground/data/ocr_vqa/images/295968265.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a404860>
Failed to load image: playground/data/ocr_vqa/images/823412385.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73ad1fdb0>
Failed to load image: playground/data/ocr_vqa/images/818405988.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a404e00>
Failed to load image: playground/data/ocr_vqa/images/29344506.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a3305e0>
Failed to load image: playground/data/ocr_vqa/images/882894293.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a3fd9f0>
Failed to load image: playground/data/ocr_vqa/images/345351452.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a47ef90>
Failed to load image: playground/data/ocr_vqa/images/231072430.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a3f4f90>
Failed to load image: playground/data/ocr_vqa/images/093727464X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa73a37c090>
Failed to load image: playground/data/ocr_vqa/images/051763547X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fe0e79bdcc0>
Failed to load image: playground/data/ocr_vqa/images/3980621146.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe0e7926f90>
Failed to load image: playground/data/ocr_vqa/images/1580910068.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe0e7934630>
Failed to load image: playground/data/ocr_vqa/images/1564583015.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe0fffd5540>
Failed to load image: playground/data/ocr_vqa/images/471148288.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe0e78779f0>
Failed to load image: playground/data/ocr_vqa/images/1570761116.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Failed to load image: playground/data/ocr_vqa/images/472084798.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc137a25540>
Failed to load image: playground/data/ocr_vqa/images/823929493.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc137a1e9a0>
Failed to load image: playground/data/ocr_vqa/images/739704133.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc1379cf7c0>
Failed to load image: playground/data/ocr_vqa/images/1570282358.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc13790b810>
Failed to load image: playground/data/ocr_vqa/images/877793417.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc137956270>
Failed to load image: playground/data/ocr_vqa/images/807085707.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc1379ea900>
Failed to load image: playground/data/ocr_vqa/images/006270110X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc1349ca3b0>
Failed to load image: playground/data/ocr_vqa/images/835607240.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc13909a3b0>
Failed to load image: playground/data/ocr_vqa/images/942627458.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc13799d3b0>
Failed to load image: playground/data/ocr_vqa/images/1882419065.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fc137a25f40>
Failed to load image: playground/data/ocr_vqa/images/912818034.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f88b76541d0>
Failed to load image: playground/data/ocr_vqa/images/934710171.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b1c0d0e0>
Failed to load image: playground/data/ocr_vqa/images/268016860.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b5808bd0>
Failed to load image: playground/data/ocr_vqa/images/749517735.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b756aa40>
Failed to load image: playground/data/ocr_vqa/images/71351817.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b75f1c20>
Failed to load image: playground/data/ocr_vqa/images/375501983.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b1c2aa90>
Failed to load image: playground/data/ocr_vqa/images/810955563.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b1c2a6d0>
Failed to load image: playground/data/ocr_vqa/images/1563705176.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b76541d0>
Failed to load image: playground/data/ocr_vqa/images/030724055X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b1c0aea0>
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f7344952b80>
Failed to load image: playground/data/ocr_vqa/images/832904651.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7344871220>
Failed to load image: playground/data/ocr_vqa/images/875730434.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7344882ea0>
Failed to load image: playground/data/ocr_vqa/images/789404427.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7342a824f0>
Failed to load image: playground/data/ocr_vqa/images/812015320.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73447cbd10>
Failed to load image: playground/data/ocr_vqa/images/737303360.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7344880950>
Failed to load image: playground/data/ocr_vqa/images/750219378.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f734475c450>
Failed to load image: playground/data/ocr_vqa/images/688151175.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7344880cc0>
Failed to load image: playground/data/ocr_vqa/images/1579901387.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7342a82900>
Failed to load image: playground/data/ocr_vqa/images/205260780.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7344882270>
Failed to load image: playground/data/ocr_vqa/images/750223391.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7344882ae0>
Failed to load image: playground/data/ocr_vqa/images/1560984503.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f7344859d60>
Failed to load image: playground/data/ocr_vqa/images/811726819.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f734467f7c0>
Failed to load image: playground/data/ocr_vqa/images/1890838004.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fb319419040>
Failed to load image: playground/data/ocr_vqa/images/3980621154.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb31b1ae1d0>
Failed to load image: playground/data/ocr_vqa/images/3884452762.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb31b1202c0>
Failed to load image: playground/data/ocr_vqa/images/930410629.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb31943c5e0>
Failed to load image: playground/data/ocr_vqa/images/1568362021.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb31b118f90>
Failed to load image: playground/data/ocr_vqa/images/1890916196.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb31b11b3b0>
Failed to load image: playground/data/ocr_vqa/images/031476271X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7efe10cf8130>
Failed to load image: playground/data/ocr_vqa/images/1883323703.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe0debb400>
Failed to load image: playground/data/ocr_vqa/images/717281434.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe0de9ed10>
Failed to load image: playground/data/ocr_vqa/images/1566864941.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe10c01e50>
Failed to load image: playground/data/ocr_vqa/images/1879505460.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe187b8d10>
Failed to load image: playground/data/web-celebrity/images/Lee_Byung-hun2.jpg, the dataset is: sharegpt4v_instruct_gpt4-vision_cap100k
cannot identify image file <_io.BytesIO object at 0x7efe10b44e50>
Failed to load image: playground/data/ocr_vqa/images/391040952.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe10c27c20>
Failed to load image: playground/data/ocr_vqa/images/1561701289.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe10cb19f0>
Failed to load image: playground/data/ocr_vqa/images/2061514022.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe10cac9a0>
Failed to load image: playground/data/ocr_vqa/images/739705539.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe10cb1f40>
Failed to load image: playground/data/ocr_vqa/images/1878239589.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7efe10cac590>
Failed to load image: playground/data/ocr_vqa/images/1887089160.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fcfb9126f40>
Failed to load image: playground/data/ocr_vqa/images/471178705.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fcfba0d6360>
Failed to load image: playground/data/ocr_vqa/images/1564582922.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fcfc42a2c20>
Failed to load image: playground/data/web-celebrity/images/Lee_Byung-hun.jpg, the dataset is: sharegpt4v_instruct_gpt4-vision_cap100k
cannot identify image file <_io.BytesIO object at 0x7fcfba0e2ae0>
Failed to load image: playground/data/ocr_vqa/images/155850835X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fcfba161590>
Failed to load image: playground/data/ocr_vqa/images/B013RVJ7KW.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fcfba073f40>
Failed to load image: playground/data/ocr_vqa/images/939302322.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fcfba141ef0>
Failed to load image: playground/data/ocr_vqa/images/1562614479.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fcfba1441d0>
Failed to load image: playground/data/ocr_vqa/images/531202852.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe2136a0680>
Failed to load image: playground/data/ocr_vqa/images/1567615317.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe0e79341d0>
Failed to load image: playground/data/ocr_vqa/images/093062596X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe0e784b310>
Failed to load image: playground/data/ocr_vqa/images/393314286.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe0e78c25e0>
Failed to load image: playground/data/ocr_vqa/images/968297072.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fe0e77964f0>
Failed to load image: playground/data/ocr_vqa/images/067944680X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Failed to load image: playground/data/ocr_vqa/images/1555951120.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f88b75d7e00>
Failed to load image: playground/data/ocr_vqa/images/B005FOFNA8.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb3193c1e00>
Failed to load image: playground/data/ocr_vqa/images/412076217.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb31b19ea90>
Failed to load image: playground/data/ocr_vqa/images/055305340X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb31b19f040>
Failed to load image: playground/data/ocr_vqa/images/1575662698.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fb31b0e5d60>
Failed to load image: playground/data/ocr_vqa/images/749521643.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fcfb9fce9a0>
Failed to load image: playground/data/ocr_vqa/images/739715534.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fcfba19d810>
Failed to load image: playground/data/ocr_vqa/images/471024961.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f178a669cc0>
Failed to load image: playground/data/ocr_vqa/images/1564588963.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
string index out of range
Failed to load image: playground/data/coco/train2017/000000047952.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a595950>
Failed to load image: playground/data/ocr_vqa/images/968557902.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a536a90>
Failed to load image: playground/data/ocr_vqa/images/471550833.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178863df90>
Failed to load image: playground/data/ocr_vqa/images/1574320793.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a5ad0e0>
Failed to load image: playground/data/ocr_vqa/images/1878239376.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a692180>
Failed to load image: playground/data/ocr_vqa/images/1562613480.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178863df90>
Failed to load image: playground/data/ocr_vqa/images/892133252.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a692e00>
Failed to load image: playground/data/ocr_vqa/images/28608194.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a5ad630>
Failed to load image: playground/data/ocr_vqa/images/1570761493.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a554860>
Failed to load image: playground/data/ocr_vqa/images/135707978.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a595310>
Failed to load image: playground/data/ocr_vqa/images/870408712.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a423c70>
Failed to load image: playground/data/ocr_vqa/images/962770124.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a6408b0>
Failed to load image: playground/data/ocr_vqa/images/B00XLX3W9O.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a6694a0>
Failed to load image: playground/data/ocr_vqa/images/962289027.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f178a60e6d0>
Failed to load image: playground/data/ocr_vqa/images/415913756.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fa70a785d60>
Failed to load image: playground/data/ocr_vqa/images/893463264.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa70a790680>
Failed to load image: playground/data/ocr_vqa/images/B01577TUTC.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa70a7a3f40>
Failed to load image: playground/data/ocr_vqa/images/1560443952.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa70a7a39a0>
Failed to load image: playground/data/ocr_vqa/images/671766627.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa70a8a57c0>
Failed to load image: playground/data/ocr_vqa/images/377570177X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa70a88f590>
Failed to load image: playground/data/ocr_vqa/images/1558743030.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa70b1559f0>
Failed to load image: playground/data/ocr_vqa/images/4544040604.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fa70b149810>
Failed to load image: playground/data/web-celebrity/images/Choi_Min-sik2.jpg, the dataset is: sharegpt4v_instruct_gpt4-vision_cap100k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7fca6ee6fea0>
Failed to load image: playground/data/ocr_vqa/images/014005667X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fca6ee50090>
Failed to load image: playground/data/ocr_vqa/images/843129697.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fca6ee87ef0>
Failed to load image: playground/data/ocr_vqa/images/155583468X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fca6ee8f2c0>
Failed to load image: playground/data/web-celebrity/images/Choi_Min-sik.jpg, the dataset is: sharegpt4v_instruct_gpt4-vision_cap100k
cannot identify image file <_io.BytesIO object at 0x7fca6ee62a90>
Failed to load image: playground/data/ocr_vqa/images/201624508.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fca6ee874f0>
Failed to load image: playground/data/ocr_vqa/images/570035651.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fca6eef19f0>
Failed to load image: playground/data/ocr_vqa/images/945397690.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7fca6ed12220>
Failed to load image: playground/data/ocr_vqa/images/3575228701.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f73e34eae50>
Failed to load image: playground/data/ocr_vqa/images/679441662.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e14b83b0>
Failed to load image: playground/data/ocr_vqa/images/316051772.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e3482270>
Failed to load image: playground/data/ocr_vqa/images/963479903.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e3452db0>
Failed to load image: playground/data/ocr_vqa/images/087033512X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e3482090>
Failed to load image: playground/data/ocr_vqa/images/471542989.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e3452ae0>
Failed to load image: playground/data/ocr_vqa/images/521256771.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e34eaa90>
Failed to load image: playground/data/ocr_vqa/images/1558216103.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e34ea630>
Failed to load image: playground/data/ocr_vqa/images/051770353X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e34ea450>
Failed to load image: playground/data/ocr_vqa/images/811820580.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e24fbdb0>
Failed to load image: playground/data/ocr_vqa/images/B001KBBD4A.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e34523b0>
Failed to load image: playground/data/ocr_vqa/images/688121675.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e3421770>
Failed to load image: playground/data/ocr_vqa/images/876043082.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f73e34f7cc0>
Failed to load image: playground/data/ocr_vqa/images/1560449152.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7faaa3116ae0>
Failed to load image: playground/data/ocr_vqa/images/081182568X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7faaa3a69b80>
Failed to load image: playground/data/ocr_vqa/images/939302349.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7faaa13b8f40>
Failed to load image: playground/data/ocr_vqa/images/471523771.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7faaa3a73ef0>
Failed to load image: playground/data/ocr_vqa/images/1565540581.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7faaa3116c20>
Failed to load image: playground/data/ocr_vqa/images/879801654.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
cannot identify image file <_io.BytesIO object at 0x7f5ba6a7cdb0>
Failed to load image: playground/data/ocr_vqa/images/840734921.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f5ba698d900>
Failed to load image: playground/data/ocr_vqa/images/1561700940.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f5ba6b13360>
Failed to load image: playground/data/ocr_vqa/images/891960856.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f5ba69fbc20>
Failed to load image: playground/data/ocr_vqa/images/70224889.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f5ba6a918b0>
Failed to load image: playground/data/ocr_vqa/images/810943794.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f5ba73b29f0>
Failed to load image: playground/data/ocr_vqa/images/933478186.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
cannot identify image file <_io.BytesIO object at 0x7f5ba6a9f360>
Failed to load image: playground/data/ocr_vqa/images/1883323460.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 422, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_4d_kernel", __class__._4d_kernel)
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 145, in _update_autotune_table
    autotune_table = cache_manager.load()
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 73, in load
    with open(self.file_path, 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/wangwenhai/.triton/autotune/Fp16Matmul_4d_kernel.pickle'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 422, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_4d_kernel", __class__._4d_kernel)
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 145, in _update_autotune_table
    autotune_table = cache_manager.load()
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 73, in load
    with open(self.file_path, 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/wangwenhai/.triton/autotune/Fp16Matmul_4d_kernel.pickle'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 422, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_4d_kernel", __class__._4d_kernel)
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 145, in _update_autotune_table
    autotune_table = cache_manager.load()
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 73, in load
    with open(self.file_path, 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/wangwenhai/.triton/autotune/Fp16Matmul_4d_kernel.pickle'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table
    fp16_matmul._update_autotune_table()
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 422, in _update_autotune_table
    TritonMatmul._update_autotune_table(__class__.__name__ + "_4d_kernel", __class__._4d_kernel)
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 145, in _update_autotune_table
    autotune_table = cache_manager.load()
  File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 73, in load
    with open(self.file_path, 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/wangwenhai/.triton/autotune/Fp16Matmul_4d_kernel.pickle'
100%|█████████▉| 1723/1726 [30:14:27<03:04, 61.50s/it]
100%|█████████▉| 1724/1726 [30:15:31<02:04, 62.20s/it]
100%|█████████▉| 1725/1726 [30:16:33<01:02, 62.35s/it]
100%|██████████| 1726/1726 [30:17:36<00:00, 62.45s/it]
[INFO|trainer.py:1962] 2024-06-11 06:55:06,153 >>

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 109063.037, 'train_samples_per_second': 16.207, 'train_steps_per_second': 0.016, 'train_loss': 1.2428148501441487, 'epoch': 1.0}



100%|██████████| 1726/1726 [30:17:42<00:00, 62.45s/it]
100%|██████████| 1726/1726 [30:17:42<00:00, 63.19s/it]
[INFO|trainer.py:2936] 2024-06-11 06:55:08,482 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re
[INFO|configuration_utils.py:473] 2024-06-11 06:55:08,486 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/config.json
[INFO|configuration_utils.py:594] 2024-06-11 06:55:08,491 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-11 06:55:16,362 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-11 06:55:16,433 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-11 06:55:16,444 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-11 06:55:16,450 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/added_tokens.json
***** train metrics *****
  epoch                    =               1.0
  train_loss               =            1.2428
  train_runtime            = 1 day, 6:17:43.03
  train_samples            =           1767531
  train_samples_per_second =            16.207
  train_steps_per_second   =             0.016
